Latest Posts

Objectively Speaking: Understanding the Power of Objectives

Objectives help monitor different aspects of your services and systems, such as latencies, error rates, the number of open PRs, the age of a bug, and more. Each of these can drift away from what we consider acceptable, and an objective is essentially a statement of that boundary: it defines what ‘good’ looks like.

How Do You Measure Technical Debt?

Technical debt is one of the trade-offs today’s software teams make to speed up development and, in return, shorten time to market, which is mission-critical for most start-ups. Instead of dwelling on implementation details or trying to cover edge cases that may affect only a small fraction of end users at an early stage of development, agile teams prioritize early and continuous delivery.

Improving Reliability With OKR Initiatives

‘OKR’, which stands for ‘Objectives and Key Results’, is a goal-management framework designed to define goals and track outcomes. It differs from typical goal-setting techniques in that it aims to set very ambitious goals that encourage teams to flex their creativity. Google, LinkedIn, Twitter, and other successful companies use OKRs to create measurable goals and to make sure team members are aligned and engaged.

What Is A Site Reliability Engineer And Why You Need One

Site reliability engineering (SRE) takes on work that would typically be done by operations teams, but assigns it to engineers with software experience who solve problems with software. Google created the concept of SRE in 2003, after a team of software engineers was asked to make Google’s sites more scalable, reliable, and efficient. They described SRE as what happens ‘when you treat operations as if it’s a software problem’.

Reliability Testing For SRE

Reliability testing is a software testing technique designed to make sure that a piece of software meets customer requirements and to identify any faults in the product before it is delivered to the customer. It is key to improving the design, functionality, and ultimately the quality of software. It should be performed at every stage of development, and it encompasses everything from unit testing to full system testing.

HugOps During Downtime: Building Empathetic Teams

While DevOps focuses on software, HugOps focuses on the people behind the software. HugOps is a way to show empathy and appreciation for the real people involved in building, shipping, and running software. It’s a way to acknowledge and celebrate the Site Reliability Engineers (SREs), sysadmins, engineers, and support staff who work tirelessly behind the scenes to keep the services we rely on running smoothly.

Implementing Service Reliability In The World Of Remote Teams

As we move into this new era, what does successful reliability look like for modern teams, and what requirements will enable us to bring better reliability to our applications and systems? With new ways of working in mind, we explore how organizations should implement better service reliability and the challenges teams are facing.

Five Phases Of Effective Reliability Within Organizations

Reliability is important to everybody in a business, yet there’s a common misconception that it matters only to engineers. We must change this mindset and think of reliability as a team sport that everyone needs to be part of. For an organization, there are five key phases to implementing effective reliability across teams.