Context is Key: Additive vs. Subtractive Topology

Understanding the context of an IT incident can greatly reduce MTTR and make it easier to determine the root cause. In an IT environment, "context" refers to the subset of information needed to troubleshoot and diagnose an incident or event. In some scenarios, the context may be the downstream dependencies affected when a high-availability pair of firewalls goes offline; in others, it may be a datastore that multiple VMs are contending for.
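
As a rough illustration of the firewall scenario above, the sketch below walks a dependency graph to collect everything downstream of a failed node. The node names and the TOPOLOGY mapping are hypothetical, invented for this example and not taken from any product; a real topology would come from discovery or a CMDB.

```python
from collections import deque

# Hypothetical topology: each key feeds the nodes listed as its value.
TOPOLOGY = {
    "fw-pair": ["core-switch"],
    "core-switch": ["esx-host-1", "esx-host-2"],
    "esx-host-1": ["vm-web", "vm-db"],
    "esx-host-2": ["vm-cache"],
    "vm-web": [],
    "vm-db": [],
    "vm-cache": [],
    "backup-server": [],  # not behind the firewall pair in this example
}


def downstream(topology, failed):
    """Return every node reachable from `failed`, in breadth-first order."""
    seen = {failed}
    queue = deque([failed])
    affected = []
    while queue:
        node = queue.popleft()
        for child in topology.get(node, []):
            if child not in seen:
                seen.add(child)
                affected.append(child)
                queue.append(child)
    return affected


if __name__ == "__main__":
    # The "context" for a firewall outage: only the nodes that depend on it.
    print(downstream(TOPOLOGY, "fw-pair"))
```

In this sketch, the unrelated backup-server never appears in the result, which is the point of scoping an incident to its context.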

PyCon 2019 - Scout brings APM for Python

The 2019 edition of PyCon USA takes place over the next few days in Cleveland, Ohio. Scout is delighted to be there, sharing our APM tool with the Python community. Plus, we'll have great t-shirts and stickers for you, and we love to get geeky - one of our lead product engineers, plus two of our smart support engineers, are working the booth, ready to help you figure out your Python performance problems.

Monitor Microsoft Hyper-V with Datadog

Hyper-V is a hardware virtualization platform used to create and run virtual machines on Windows host systems. Hyper-V allocates resources from the physical hosts it runs on to the virtual machines it creates. If those resources are spread too thin, virtual machines may suffer slow performance and startup failures. With our new integration, you can monitor the health of every layer of your Hyper-V stack: physical hosts, virtual machines, and all of the applications and services running on them.

The Lifecycle of a Request

Most Rails developers should be pretty familiar with this workflow: open up a controller file in your editor, write some Ruby code inside an action method, visit the corresponding URL in the browser, and the code you just wrote comes alive. But have you thought about how any of this works? How did typing a URL into your browser's address bar turn into a method call on your controllers? Who actually calls your methods?

When In Doubt, Add More Spans: A Tale of Tracing and Testing In Production

Recently, Toshok was telling a story about the kind of thing he talks about a lot: improving the performance of some endpoint or page or other. Obviously, we spend a lot of time thinking about how to improve the experience of our users, but what caught my attention this time was that what he was describing sounded like a new kind of testing in production, so I asked him to go into a bit more detail.

How To Get Real-Time Visibility Into Serverless Apps

As CEO and co-founder of IOpipe, Adam Johnson works with both individual developers and engineering teams at global enterprises to get real-time visibility into the detailed behaviors of their serverless applications. According to The New Stack’s 2018 ebook, serverless adoption has grown by 75 percent since 2017, but developers continue to cite concerns about application performance, risk, and monitoring as drawbacks to building on a serverless architecture.

Incident Review: Caches are Good, Except When They Are Bad

Between Wednesday, April 17th and Friday, April 26th, Honeycomb had four separate periods of downtime affecting the Honeycomb API, resulting in approximately 38 minutes of total downtime. At Honeycomb, we believe that visibility into production services is important, especially when service outages are making your users unhappy. We take the impact of outages on our customers seriously, and believe that transparency is key to your trusting and using our service.