May 2022

SRE basics: Understanding SLAs, SLOs and SLIs

May 27, 2022 By Aimee Pearcy In Reliably

SLAs, SLOs and SLIs are fundamental to site reliability engineering (SRE), but what are they and why are they important for delivering services?

Error Budgets: Ultimate SRE Guide For Teams

May 26, 2022 By Samadrita Ghosh In Reliably

No engineered system can guarantee 100% uptime. Unforeseen failures are bound to cause downtime or a poor experience for customers, so it is best practice to allow a margin for plausible failures. An error budget is that margin of error: the customer is informed of it beforehand, so that a decided number of hours of system failure is tolerated.
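As a rough illustration of the arithmetic behind an error budget, the sketch below converts an availability target into allowed downtime for a window. The function names, the 30-day window and the 99.9% target are assumptions chosen for the example; they are not taken from the post.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Return the allowed downtime, in minutes, for an availability target.

    slo_target is a fraction such as 0.999 (hypothetical example value).
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in the range (0, 1]")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining_minutes(
    slo_target: float, window_days: int, downtime_minutes: float
) -> float:
    """Return how much of the budget is left; negative means it is overspent."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes


if __name__ == "__main__":
    budget = error_budget_minutes(0.999, window_days=30)
    print(f"Budget for 99.9% over 30 days: {budget:.1f} minutes")
    left = budget_remaining_minutes(0.999, 30, 10.0)
    print(f"Remaining after 10 minutes of downtime: {left:.1f} minutes")
```

Under these assumptions, a 99.9% target over 30 days leaves roughly 43 minutes of tolerated downtime. A real agreement may define the window and what counts as downtime differently.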

10 Reasons You Need A Service Level Agreement & Why It's important

May 26, 2022 By Mbaoma Mary In Reliably

A Service Level Agreement (SLA) consists of many service commitments. It is an essential part of a contract between two or more parties to outsource software development or software support, specifying the duties and the quality and type of service a company will provide to a customer for a fee.

Shift Left Reliability meetup - May Fifteen minutes or bust

May 20, 2022 By Reliably In Reliably

There is a yawning gap opening up between the best and the rest — the elite top few percent of engineering teams are making incredible gains year on year in velocity, reliability and human compatibility, whilst the bottom 50% are actually losing ground. The loss has nothing to do with engineering ability. Take an engineer out of an elite-performing team and place them in the bottom 50%, and they become subpar too; take an engineer out of a mediocre team and embed them in an elite team, and they are pulling their weight within the year.

The Journey Of Building Reliability And Scaling Your Systems

May 14, 2022 By Stoyan Yanev In Reliably

Starting small and scaling your systems to serve billions of requests per month is never an easy path, so how do you build an infrastructure whilst making the right decisions and compromises for your services? Choosing the right technology stack and ensuring your CI/CD pipeline is reliable are two key steps towards this, and we will explore both.

What Does It Mean To Build Resilient Service Applications?

May 14, 2022 By Yan Cui In Reliably

Resilience is the capacity to recover quickly from difficulties. It is not about preventing failures, but about being able to recover from them quickly. As Amazon’s CTO Werner Vogels famously said, ‘everything fails all the time’. It’s a fact of life that failures will inevitably happen, but what we can do is build applications that can withstand different kinds of failures. For example, in a data center, hardware is going to fail all the time.

Software Reliability Metrics That Matter To Engineers

May 11, 2022 By Ben Johnson In Reliably

Software reliability is the probability of failure-free operation of a computer program for a specified period of time in a specified environment. It is critical for validation, where it helps to determine characteristics such as system performance, functional compatibility, maintenance, competency, installation coverage and process documentation continuance. Measuring software reliability helps you to identify and fix bugs, improve performance, and test features.
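To make "probability of failure-free operation for a specified period" concrete, here is a minimal sketch that assumes a constant failure rate (the exponential model), which is one common simplification and not necessarily the model the post uses. The function name and the example numbers are hypothetical.

```python
import math


def reliability(failure_rate_per_hour: float, hours: float) -> float:
    """Probability of running `hours` without failure, assuming a constant
    failure rate (exponential model): R(t) = exp(-lambda * t)."""
    if failure_rate_per_hour < 0 or hours < 0:
        raise ValueError("failure rate and duration must be non-negative")
    return math.exp(-failure_rate_per_hour * hours)


if __name__ == "__main__":
    # Hypothetical example: one failure per 1,000 hours on average.
    rate = 1 / 1000
    for t in (24, 168, 720):
        print(f"R({t} h) = {reliability(rate, t):.3f}")
```

Under this assumption the mean time between failures is the reciprocal of the failure rate, so the same figure can be derived from an observed MTBF.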

DevOps Vs SRE: The Main Differences

May 8, 2022 By Aimee Pearcy In Reliably

Site reliability engineering (SRE) is a set of principles that incorporates aspects of software engineering into IT operations. It takes tasks that would typically have been done manually by operations teams and gives them to engineers to solve using software and automation. This helps to create a bridge between development and operations teams. The concept of SRE was created by Google back in 2003. Since then, it has been adopted by thousands of organizations all over the world.

Observability Vs Monitoring: What's The Difference?

May 8, 2022 By Mbaoma Mary In Reliably

Clients expect prompt implementation of changes to their software, and this requirement motivates site reliability engineers to incorporate reliability into applications. The healthy practice of observability and monitoring can improve the reliability and security of software systems. Monitoring is the recording and interpretation of data from software systems to keep track of their performance.

NewsKit API: The journey of building reliability into our systems at News UK

May 3, 2022 By Reliably In Reliably

Starting small and growing to serve billions of requests per month is never an easy path. Stoyan Yanev, Principal Engineer, and Krasimir Petrov, Senior Software Engineer at News UK, will show how they built their infrastructure and the decisions and compromises they had to make along the way. The talk will center on the NewsKit API and the importance of reliability before opening up a discussion among the group.

How To Reduce Technical Debt

May 2, 2022 By Aimee Pearcy In Reliably

Technical debt is the implied cost of the additional work that is required when a team chooses a quick, easy solution that is limited, instead of a more well-thought-out, higher-quality solution that would take longer. Essentially, it’s what happens when teams prioritize speed over quality. Examples of technical debt include untested code, unreadable code, dead code, duplicated code, or outdated documentation.
