For our latest StackPod episode, we invited Hyke’s DevOps team lead and AWS Cloud architect: Yousef Sedky. Axiom Telecom is one of the largest telephone retailers in the United Arab Emirates and Saudi Arabia and Hyke, its sister company, is a distribution platform for mobile products.
Site Reliability Engineering (SRE) teams and Platform Engineering teams share similar goals -- like maximizing automation and reducing toil -- and similar methodologies. But they have different priorities, and use somewhat different tools to achieve them. What are SREs, what are platform engineers, and how are the two roles similar and different? This article explains.
While DevOps focuses on software, HugOps focuses on the people behind the software. HugOps is a way to show empathy and appreciation for the real people who are involved in building, shipping, and running software. It’s a way to acknowledge and celebrate those – the Site Reliability Engineers (SREs), SysAdmins, Engineers, and Support Staff – who are working tirelessly behind the scenes to keep the services that we rely on running smoothly.
Observability is what defines a strong SRE team. In this blog, we have covered the importance of observability, and how SREs can leverage it to enhance their business. Observability is the practice of assessing a system's internal state by observing its external outputs. Through instrumentation, systems can provide telemetry such as metrics, traces, and logs that help organizations better understand, debug, maintain and evolve their platforms.
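To make the three telemetry types concrete, here is a minimal sketch using only the Python standard library. The `observed` helper and the in-memory `metrics`, `spans`, and `logs` stores are hypothetical names invented for this illustration; a real system would ship this data to an observability backend rather than keep it in process.

```python
import json
import time
import uuid
from collections import Counter
from contextlib import contextmanager

# Hypothetical in-memory stores standing in for a real telemetry backend.
metrics = Counter()  # metrics: counts per operation and outcome
spans = []           # traces: one timed span per operation
logs = []            # logs: one JSON line per operation


@contextmanager
def observed(operation, trace_id=None):
    """Record a metric, a span, and a log line for the wrapped block."""
    trace_id = trace_id or uuid.uuid4().hex
    start = time.perf_counter()
    outcome = "ok"
    try:
        yield trace_id
    except Exception:
        outcome = "error"
        raise  # telemetry must not swallow the failure
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        metrics[f"{operation}.{outcome}"] += 1
        spans.append({"trace_id": trace_id, "operation": operation,
                      "duration_ms": duration_ms})
        logs.append(json.dumps({"trace_id": trace_id,
                                "operation": operation,
                                "outcome": outcome}))
```

Wrapping a unit of work in `with observed("checkout"):` leaves behind one counter increment, one span, and one log line that share a trace ID, which is what lets an engineer move from an aggregate metric to the specific request behind it.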
In this new era that we are moving into, what does successful reliability look like for modern teams, and what are the requirements that will enable us to bring better reliability to our applications and systems? With new ways of working, we explore how organizations should implement better service reliability and the different challenges teams are facing.
Reliability is important to everybody in a business. There’s a common misconception that it’s just important to engineers. We must change this mindset and think of reliability as a team sport that everyone needs to be part of. As an organization, there are five key phases to implementing effective reliability across teams.
In order to achieve high levels of reliability for services and products, businesses should consider the three fundamental pillars of reliability: monitoring, release engineering and simplicity.
Site Reliability Engineering (SRE) practice was established by Google nearly 20 years ago, and was popularized with Google’s monumental SRE Book. Everyone’s been attempting to follow that iconic path ever since.
SRE (site reliability engineering) is a discipline used by software engineering and IT teams to proactively build and maintain more reliable services. SRE is a functional way to apply software development solutions to IT operations problems. From IT monitoring to software delivery to incident response – site reliability engineers are focused on building and monitoring anything in production that improves service resiliency without harming development speed.
Here is one more blog topic stemming from the weekly office hours that we hold with the field team here at Shipa. In our last office hours, I was asked, “What is the difference between DevOps Engineers and SREs?” Both professions are emerging disciplines and cultures that continue to evolve and play an important role in technology organizations. I’ve been fortunate to have written and spoken about this before, but this post takes a fresh look at what the two domains try to accomplish.
Software development moves quickly to keep up with client requirements, and it needs to do so safely. DevOps engineers can help with this; however, it is not possible without observability.
If you’re an SRE, you might view AIOps with great excitement. By automating complex workflows and troubleshooting processes, AIOps could make your life as an SRE much easier. Alternatively, SREs may choose to view AIOps with disdain. They might think of AIOps as just a fancy buzzword that doesn’t live up to its promises, and that can become a distraction from the SRE tools that really matter. Which perspective is right?
SREs and Devs are used to solving problems even when an awkward or inefficient way is the only way. In AppScope 1.0, SREs and Devs have a new alternative to standard methods, one that the AppScope team thinks will make that problem-solving a lot more fun. We on the AppScope team constantly hear firsthand about life in the SRE trenches. For this blog, we “interview” a fictional SRE/Dev whose thoughts and comments are a mash-up of things we’ve heard from real people we know.
In today’s world, the performance of your IT systems has a direct impact on your brand reputation and overall business revenue. A “good enough” approach to software performance is no longer good enough. This has led to the growing importance of SREs and a shift to more sophisticated, advanced observability that requires moving beyond basic on/off monitoring to advanced monitoring techniques.
When are you smarter than your playbooks, and when are your playbooks smarter than you? That’s a question that engineers rarely step back to consider. The rational, disciplined parts of our minds tell us that the playbooks we are supposed to follow were carefully designed and tested, and that we should stick to them at all costs.
Building a successful monitoring process for your application is essential for high availability. In the first of this three-part blog series, Safeer discusses the four key SRE Golden Signals for metrics-driven measurement, and the role they play in the overall context of monitoring. Monitoring is the cornerstone of operating any software system or application effectively. The more visibility you have into the software and hardware systems, the better you are at serving your customers. It tells you whether you are on the right track and, if not, by how much you are missing the mark.
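As a rough illustration of metrics-driven measurement, the sketch below summarizes one time window of request data into the four golden signals as named in Google's SRE book: latency, traffic, errors, and saturation. The `Request` record, the `golden_signals` function, and the choice of CPU utilization as the saturation measure are assumptions made for this example, not something taken from the series itself.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Request:
    """One served request (hypothetical record shape)."""
    duration_ms: float
    status: int


def golden_signals(requests: Sequence[Request], window_s: float,
                   cpu_utilization: float) -> dict[str, float]:
    """Summarize latency, traffic, errors, and saturation for one window."""
    n = len(requests)
    if n == 0:
        return {"latency_p95_ms": 0.0, "traffic_rps": 0.0,
                "error_rate": 0.0, "saturation": cpu_utilization}
    durations = sorted(r.duration_ms for r in requests)
    # Nearest-rank 95th percentile, in integer math to avoid float rounding.
    rank = (95 * n + 99) // 100
    # Assumption: only 5xx responses count as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_ms": durations[rank - 1],  # latency
        "traffic_rps": n / window_s,            # traffic
        "error_rate": errors / n,               # errors
        "saturation": cpu_utilization,          # saturation (assumed: CPU)
    }
```

Which resource best represents saturation depends on the service: memory, queue depth, or connection pool usage may be more telling than CPU.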