What is SRE?
Site Reliability Engineering (SRE) is a practice for managing the reliability of systems that began at Google in the early 2000s. Ben Treynor Sloss from Google started the first SRE team and coined the name.
Here at FireHydrant we are always looking for ways to improve and simplify incident management. Today we’re happy to announce a set of changes to the incident and retrospective pages that further simplify the Incident Command Center. To make it easier to stay up to date on the status of your incident, we have made the incident timeline permanently viewable on your Incident Command Center. You can adjust the width of the timeline so that the most important information stays in view at all times.
To make it even easier to open incidents, FireHydrant now lets you open an incident from Slack in a single click. When an alert is ingested into FireHydrant, a message posts to a channel of your choosing, letting you open an incident from there. When the incident is opened, it pulls in all the data from the PagerDuty alert and configures your incident with that data. Now you can go from an alert firing in PagerDuty to an open FireHydrant incident with all of your automated processes in under 5 seconds.
Communication is one of the hardest things to do well while responding to incidents. At FireHydrant, we’ve focused on helping people communicate well within their teams when responding to incidents, and also after the fact during post-incident reviews. But what about communicating with your customers? During an incident, your customers want to know that you’re aware of the problem and are working to mitigate or resolve it.
The first HashiConf Digital event was held online on June 22-24. The event was meant to be HashiConf Amsterdam, but the team pivoted and moved it online because of COVID-19. My employer, FireHydrant, was a sponsor, and I was happy to have a chance to attend. The event was very well organized, which is even more impressive given that the team had to shift it online.
Headquartered in Oakland, California, LaunchDarkly is a feature management platform that empowers all teams to safely deliver and control software through feature flags. By separating code deployments from feature releases, LaunchDarkly enables teams to deploy faster, reduce risk, and iterate continuously. Over 1000 organizations use LaunchDarkly to build, operate, and learn from their software.
In today’s world, systems are becoming more and more complex. Because of this complexity, it’s no longer a matter of “if” our systems will fail but “when”. To manage expectations for when our systems do fail, we need look no further than our Service Level Agreement.