
Latest Posts

SRE Incident Management: Overview, Techniques, and Tools

In the world of a site reliability engineer (SRE), failure is not only an option but expected. Systems, web applications, servers, and devices are all prone to performance issues and unexpected outages at some point. It is an unavoidable fact. These unexpected failures can lead to huge revenue losses, loss of customer trust, and, depending on the industry, fines. Fortunately, SRE incident management is one of the core practices used to limit the disruption caused by unexpected issues.

Monitoring Distributed Systems

There was a time when standing up a website or application was simple and straightforward, without the complex networks involved today. Web developers and administrators did not have to worry about, or even consider, the complexity of today’s distributed systems. The recipe was straightforward. Do you have a database? Check. Do you have a web server? Check. Great, your system was ready to be deployed.

SRE Principles: The 7 Fundamental Rules

In one of our previous articles, we discussed what an SRE is, what they do, and some of the common responsibilities that a typical SRE may have, like supporting operations, dealing with trouble tickets and incident response, and general system monitoring and observability. In this article, we will take a deeper dive into the various SRE principles and guidelines that a site reliability engineer practices in their role.

Top 13 Site Reliability Engineer (SRE) Tools

The role and responsibilities of a site reliability engineer (SRE) may vary depending on the size of the organization. For the most part, a site reliability engineer is focused on multiple tasks and projects at one time, so for most SREs, the various tools they use reflect their ever-evolving responsibilities. A typical SRE is busy automating, cleaning up code, upgrading servers, and continually monitoring dashboards for performance, so they are going to need more tools in that toolbelt.

What is a Site Reliability Engineer (SRE)?

A site reliability engineer, or SRE, is a role that encompasses aspects of both software engineering and operations/infrastructure. It also encompasses a strategy and set of practices and principles across service offerings and is closely tied to DevOps and operations. The term site reliability engineering first came into existence at Google in 2003 when a site reliability team was created. At that time, the team was made up of software engineers.

Email Infrastructure Monitoring Checklist

A lot of time and resources are invested in making sure your customers get your emails. This is where email infrastructure comes in handy. While you have limited control over user interaction with your emails, monitoring email infrastructure is in your hands. Email infrastructure usually consists of your server and domain configuration, server performance, IP address, mail agents, and more. And to make sure your email infrastructure is in perfect working order, you need to constantly monitor it.

Monitoring Serverless Applications

Serverless. It’s likely you’ve already come across this term somewhere, but what exactly does it mean? To start, serverless, or serverless computing, doesn’t really mean there aren’t servers involved, because there are. Rather, it refers to the fact that the responsibility for managing, scaling, provisioning, and maintaining those resources now belongs to cloud providers, such as Amazon Web Services (AWS), Google Cloud Platform, Microsoft Azure, and others.

How to Optimize Websites for Ad Publishers

As an ad publisher, your revenue depends on two main factors: traffic to your site and ad optimization. A lot of the focus goes into the practices and processes of driving traffic to your site from an SEO perspective, but what if visitors have a less-than-ideal experience once they get to your site? All the effort and time that went into creating your site and driving traffic to it would be for nothing if the visitor lands on your page and doesn’t take any action.

What Does My Website Look Like From China? Test and Monitor Performance from China

In the current age of the Internet, it’s common practice to build a website to run your online business. With networks spanning the world, you can, in theory, do business without boundaries. However, just as each country has its borders, the Internet is not a world without controls. In fact, every country has its own laws and rules governing this virtual world. And the situation is especially different where China’s Internet environment is involved.

Internal Applications: Monitoring from Behind Your Firewall

As companies decide whether or not to move ahead with an “everything in the cloud” strategy for providing consumer-facing applications, enterprise applications are also taking a new shape, with web-based applications supporting internal business operations. These applications live inside the private network of the organization and often have role-based access.