Latest News

ObservabilityCON Day 4 recap: a panel discussion on observability (and its future), the benefits of Chaos Engineering, and an observability demo showcase

Over the past four days, Grafana Labs' ObservabilityCON 2020 brought together the Grafana community for talks dedicated to observability. We hope you enjoyed all of the sessions, which are now available on demand and linked from the schedule on the event page. The conference wrapped up with predictions and advice from observability experts, lessons in failure, and Grafana Labs team members showcasing ways Grafana and other tools fit into an observability workflow.

Looking back on Chaos Conf 2020

It’s already been a week since we closed our third annual Chaos Conf! While we were forced to take the conference online, this meant that more of you could join us. Over 3,500 people signed up to help make this the world’s largest Chaos Engineering conference. That’s 5x more than 2019, and nearly 10x more than 2018! This is a testament to the growth of Chaos Engineering as a practice across many different industries and around the world.

Is your microservice a distributed monolith?

Your team has decided to migrate your monolithic application to a microservices architecture. You’ve modularized your business logic, containerized your codebase, allowed your developers to do polyglot programming, replaced function calls with API calls, built a Kubernetes environment, and fine-tuned your deployment strategy. But soon after hitting deploy, you start noticing problems.

Technology Business Management and Chaos Engineering

Get started with Gremlin’s Chaos Engineering tools to safely, securely, and simply inject failure into your systems to find weaknesses before they cause customer-facing issues. Technology Business Management (TBM) is a decision-making tool that helps organizations maximize the business value of information technology (IT) spending by adjusting management practices. With TBM, IT is transformed to run like a business instead of merely a cost center.

Understanding your application's critical path

Don’t wait for an incident to focus on reliability. Learn concrete steps for preventing incidents in the first place in our two-part series, Planning and Architecting for Reliability. It’s 3 a.m. You’re lying comfortably in bed when suddenly your phone starts screeching. It’s an automated high-severity alert telling you that your company’s web application is down. Exhausted, you open the website on your phone and do some basic tests.

Client-side chaos: Making your front end more reliable

The concept of Chaos Engineering is most often applied to backend systems, but for teams building websites and web applications, this is only half of the story.

Announcing Shared Scenarios to Promote a Culture of Reliability

Today, Gremlin is excited to announce the ability to share a Scenario across your entire organization. This allows you to build up a library of reliability exercises that are customized to your company’s applications and technology.

What your company can learn from the Bank of England's resilience proposal

Learn how to modernize your financial systems with confidence while mitigating risk (and meeting compliance). This article was originally published on TechCrunch. The outages at RBS, TSB, and Visa left millions of people unable to deposit their paychecks, pay their bills, acquire new loans, and more.

Is your Grafana dashboard ready to spot chaos?

When it comes to systems reliability, you wouldn’t normally think that unleashing additional chaos would actually be helpful, would you? As more engineering teams moved toward microservice-based architectures for cloud applications over the course of this past decade, many of them didn’t change their testing strategies.

Announcing Chaos Conf 2020 (Online): Be Prepared For Moments That Matter

We’re excited to announce the third annual Chaos Conf! Given the COVID-19 pandemic this year, we will be holding this event fully online for the health and safety of attendees. The unforeseen impact of this virus on our lives, our businesses, and our software highlights the importance of preparing for the unexpected, which is reflected in this year's theme: Prepare for moments that matter. Chaos Conf will take place over the course of three days: October 6–8.