
A Practical Guide to Python Application Performance Monitoring (APM)

When your Python app starts slowing down (queries taking longer, memory creeping up, API calls lagging), basic server metrics won’t tell you why. You need to see what’s happening inside the application itself. That’s the role of Application Performance Monitoring (APM). It gives you a breakdown of database queries, external API calls, memory usage, error rates, and more, so you can connect the dots between code and performance.

Serverless Applications: Why Monitoring is Essential for Speed and Reliability

Serverless applications are becoming the go-to architecture for modern developers. Startups and enterprises alike are adopting them for their scalability, cost-efficiency, and flexibility. However, these advantages come with unique challenges, especially when it comes to monitoring. Traditional server monitoring tools fail to capture short-lived functions, making dedicated serverless monitoring essential for maintaining performance and reliability.

DORA Compliance Software Options And Use Cases

DORA entered into application on January 17, 2025, and since then, DORA compliance software, such as Spektion, has become an essential part of many DORA-compliant workflows. However, in this article, we go beyond just one software solution and round up the most common DORA compliance software categories that covered entities are currently using. We also examine what they excel at and how they come together in the context of DORA compliance.

Top tips to keep calm when everything is needed ASAP

Top tips is a weekly column where we highlight what’s trending in the tech world and list ways to explore these trends. This week, we’re looking at how to keep your cool when everything lands on your desk with an ASAP tag. There’s always that day at work. Meetings are stacked back-to-back, emails pile up faster than you can open them, and just when you think you’ve got a handle on things, your boss drops the golden line: “Can you get this done today?”

How should Prometheus handle OpenTelemetry resource attributes?

Note: A version of this post originally appeared on the OpenTelemetry blog. Victoria Nduka is a user experience designer and open source contributor making her way into the cloud native space. She writes about design, accessibility, and open source with the same curiosity she brings to her work. On May 29, I wrapped up my mentorship with Prometheus through the Linux Foundation mentorship program.

The core KPIs of LLM performance (and how to track them)

A few months ago, I built an MCP server for Toronto’s Open Data portal so an agent could fetch datasets relevant to a user’s question. I threw the first version together, skimmed the code, and everything looked fine. Then I asked Claude: “What are all the traffic-related data sources for the city of Toronto?” The tool call fired. I got relevant results. And then I hit an error: “Conversation is too long, please start a new conversation.” I had only asked one question.

Understanding Incident Response vs Incident Remediation

At a high level, incident remediation is part of the incident response process. An incident response plan manages the incident lifecycle across planning, detection, investigation, and recovery. Meanwhile, incident remediation focuses on identifying root causes and implementing measures to prevent future occurrences.