The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Manual vs. AI-Driven Alert Triage and RCA: Who Will Win?

Curious to see how AI actually performs in a real-world production scenario? Watch the webinar “AI-Driven Alert Triage and RCA” with Logz.io Customer Success Engineer Seth King. Below, we also cover the main highlights of the webinar. AI promises to make engineers more efficient and agile by shortening processes and surfacing insights that help drive decisions.

Introducing the Coralogix SLO Center

Are you struggling to define reliability targets? Teams nowadays are turning to Service Level Objectives (SLOs), reliability targets that can be used to define how much you can play around with your systems before users are affected too much. While they're a great way of defining reliability targets, they are difficult to manage. That's why we built the SLO Center. One place to define, track, zoom into, and stay on top of all your reliability targets and error budgets - so you can be sure when you can experiment, and when it's best to stay safe.
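To make the idea of an error budget concrete, here is a minimal sketch of the arithmetic behind a request-based SLO. It is not the SLO Center's implementation; the function name `error_budget`, its parameters, and the example numbers are all assumptions chosen for illustration.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Compute a request-based error budget (hypothetical helper, for illustration).

    slo_target is the fraction of requests that must succeed, e.g. 0.999.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0 or failed_requests < 0:
        raise ValueError("request counts must be positive / non-negative")

    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    remaining = allowed_failures - failed_requests

    return {
        "allowed_failures": allowed_failures,
        "remaining_failures": remaining,
        # Share of the budget still unspent; negative means the SLO is breached.
        "remaining_fraction": remaining / allowed_failures,
        "breached": remaining < 0,
    }


if __name__ == "__main__":
    # Assumed example: 99.9% target, 1,000,000 requests, 250 failures so far.
    print(error_budget(0.999, 1_000_000, 250))
```

With most of the budget unspent there is room to experiment; once `remaining_fraction` nears zero, it is time to stay safe.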

AI Replay Summaries in Sentry Arrive!

Replays in Sentry are awesome. With one property in your Sentry config you can start capturing video-like replays of user interactions with your application, but the problem is... you still have to watch them... but not anymore! AI replay summaries take your replays and run the events through an LLM to summarize the events that happened in them. They are broken up into chapters, with the breadcrumb sequences embedded in, so you can quickly get context on what's happening in every replay.

Nothing about today's Internet stays in one place... so why does your monitoring?

Users are mobile. Apps are elastic. Traffic shifts constantly across clouds, ISPs, and geographies. Monitoring needs to adapt to that reality. You need visibility that moves with your users and your applications, wherever they go, however they connect. The Internet is now your application fabric. And your monitoring strategy should reflect that!

3 Signs You've Outgrown Scripts and Spreadsheets for Network Configs

In the early days of any IT operation, pragmatism rules. Most network teams start with what’s readily available—custom scripts, Excel spreadsheets, shared network drives, and tribal knowledge. It’s cost-effective and familiar. But as your organization grows, so does the complexity of your network. Devices multiply, configurations diversify, and the operational risk of keeping everything “stitched together” with manual methods increases exponentially.

Weaponized AI vs. AI Driven Security Posture Management: Why the Battle Starts in Misconfigurations

On August 5, 2025, at Black Hat 2025 in Las Vegas, Abnormal AI officially launched its Security Posture Management for Microsoft 365. This release marks a critical turning point. In an era where attackers weaponize AI to uncover and exploit misconfigured cloud environments at machine speed, reactive security simply can’t keep pace. Threat actors are now leveraging automated AI to scan systems, identify configuration drift, escalate privileges, and deploy zero‑day exploits in seconds.

Balancing Speed and Safety with Continuous Delivery

The benefits of continuous delivery are well known these days: rapid feedback, speed of innovation, reduced fault recovery time, and increased confidence in release processes. Along the same lines, those who release less frequently are likely to encounter more stress. Continuous delivery is a spectrum; it doesn’t have to mean blasting every commit to all production environments at once. So, how do we strike a balance between speed and safety?
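One common way to avoid releasing to everyone at once is a percentage-based rollout. The sketch below is a generic illustration, not the approach described in the article: the names `rollout_bucket` and `in_rollout`, the salt value, and the choice of SHA-256 for bucketing are all assumptions.

```python
import hashlib


def rollout_bucket(unit_id: str, salt: str = "release-1") -> int:
    """Map an ID (user, host, tenant) to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{salt}:{unit_id}".encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then reduce to 100 buckets.
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(unit_id: str, percent: int, salt: str = "release-1") -> bool:
    """Return True if this ID falls inside the current rollout percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return rollout_bucket(unit_id, salt) < percent


if __name__ == "__main__":
    # Hypothetical IDs; widen the percentage step by step as confidence grows.
    users = [f"user-{i}" for i in range(1000)]
    for percent in (1, 10, 50, 100):
        exposed = sum(in_rollout(u, percent) for u in users)
        print(f"{percent:>3}% rollout reaches {exposed} of {len(users)} users")
```

Because the bucket depends only on the ID and the salt, anyone included at 10% stays included at 50%, so raising the percentage only ever adds exposure.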

Boosting Session Replay performance on iOS with View Renderer V2

After we made Session Replay GA for Mobile, adoption rose quickly and more feedback reached us. In less great news, our Apple SDK users reported that the performance overhead of Session Replay on older iOS devices made their apps unusable. So we went on a journey to find the culprit and found a solution that yielded 4-5x better performance in our benchmarks.

Size-capped telemetry storage with ClickHouse and Coroot

Cloud platforms make it incredibly easy to store data. Object storage feels endless, and block volumes can be resized anytime. That’s great, until you check the cost. In some cases, like financial transactions, storage costs are tiny compared to the value of the data. But observability is a different story. Logs, traces, and profiles can be extremely detailed and often take up more space than the actual business data. Yes, there are situations where logs need to be kept for compliance reasons.

Can External Data Predict System Failures?

Something critical just went down. Again. So you troubleshoot and find out everything's clean - logs, metrics, nothing seems out of the ordinary. You didn't think to look out the window, right? Let's rewind a couple of hours. The temperature spiked 15 degrees outside, the humidity was at 90% and a storm came out of nowhere. Meanwhile, your edge device is sitting in a box on a pole somewhere; it never stood a chance.