The latest news and information on observability for complex systems and related technologies.
From its inception as a powerhouse for logging, Elastic Observability has grown into a comprehensive solution for full-stack, multi-cloud and hybrid-cloud observability. Given the increasing complexity of the cloud-native world, the major challenge for observability is twofold: getting deeper and more frictionless visibility at all levels of applications, services, and infrastructure, and making sense of the overwhelming amount of data that is available.
Debugging is an unavoidable part of software development, especially in production. You can often find yourself in “debugging hell,” where an enormous amount of debugging consumes all your time and keeps the project from progressing. According to a report by the University of Cambridge, programmers spend almost 50% of their time debugging. So how can we make production debugging more effective and less time-consuming?
With Datadog’s Dash conference right around the corner, we at Cortex have been thinking a lot about best practices for observability. To get the most out of an application performance monitoring (APM) vendor like Datadog, you want to make sure monitoring and observability are built into launch and production readiness checklists.
Another month has come to a close, so I’m back again to take you through what’s new and noteworthy from the month of September. If you missed last month’s blog, this will be a monthly recurring series to keep you posted with the latest and greatest at Honeycomb. There’s a ton to cover, so I’ll dispense with the preamble and dive right in.
Two years ago I wrote a piece in The New Stack about the Future of Ops Careers. Towards the end, I wrote: I described the second category as “operations engineering minus the infrastructure,” dedicated to evaluating and assembling a production stack of third-party platform providers, enabling software engineers to self-serve their services and own their own code in production. I said: That second category I was describing now has a name. We call those teams "platform engineering."
So far in our series on scaling observability for game launches, we’ve discussed ways to 1) quickly analyze large volumes of telemetry data and 2) ensure high-quality telemetry data for more effective analysis at lower costs. These blogs outline best practices for scaling observability on game launch day, which is necessary to ensure high performance across all infrastructure components, with no lag, no glitches, and no bugs.
Organizations today are under pressure to stay ahead and keep IT applications and infrastructure running optimally. That means their IT teams are tasked with making sure that functions move along smoothly while minimizing downtime. To keep the lights on, enterprises add whatever domain-specific tools they need. However, these tools are often reactive, and not nearly robust enough to handle complex application topologies.
Digital transformation requires organizational evolution. Constant demand for rapid delivery of upgrades and new products forces change. Surely, the old days of managing monolithic applications housed in private servers are over. Applications consist of virtualized, containerized, and serverless code that’s networked via APIs across a hybrid infrastructure of public and private clouds.