Operations | Monitoring | ITSM | DevOps | Cloud

From Data Lake to Lakehouse. Why Cribl is Preparing for the Agentic #ai Era #telemetry

Customers asked for a simpler way to store and access telemetry data, and Cribl delivered. First came Cribl Lake. Cost-effective data storage, flexible access, and identity-based authorization instead of infrastructure-based access rules. A simple way to retain data at rest and run slow, inexpensive analytics when needed. But the story did not end there.

Why Gaining Control of Your #telemetry Data Is a Game Changer

Disconnected pipelines. Unknown data sources. Costs that do not add up. Many teams struggle to answer a simple question. What data do we have and where is it going? In this clip, a Cribl customer explains how bringing all telemetry data together changed everything. With Cribl, their team can finally see what they collect, where it flows, and what it costs. That clarity unlocked smarter reduction, better routing decisions, and major optimization across security and observability workflows.

AI wrote the code, but can you trust it? #aicoding #integration #cursor #devops #speedscale

Using AI coding tools like Cursor is fast, but it leaves a massive question: Is the new code going to break production? We solve this by combining Cursor with Proxymock! I take a live traffic snapshot of my running app, feed it back to the AI, and instantly run realistic integration tests locally. It's the only way to get true confidence before you push. Watch the full video below!

OnlineOrNot's lessons from Cloudflare's outage on 2025-11-18

On 2025-11-18 at 11:48 UTC, Cloudflare declared an incident affecting its global network, which also affected OnlineOrNot. OnlineOrNot monitors websites, APIs, web apps, and cron jobs, and also provides status pages. Although we partially mitigated the issue by enabling a fallback to AWS-based monitoring, between 13:00 UTC and 14:33 UTC failing checks went unreported, heartbeat checks over-reported, and status pages were unavailable.

Five ITOps best practices to stay ahead during major third-party outages

When external providers fail, whether it was the CrowdStrike outage last year, the AWS outage last month, or the Cloudflare DNS outage yesterday, the symptoms inside your environment often look like internal issues: timeouts, login failures, API errors, service degradation, or sudden spikes in dependency-related alerts. It’s natural for teams to start searching through their own infrastructure first, but none of these symptoms clearly points to your systems as the root cause.

AI-Suggested Alert Thresholds for Mobile Telemetry

Life is pretty good. I’ve shipped a mobile app and I’m (happily) drowning in telemetry. Battery impact, time in foreground/background per screen, crash rates, slow frames, network retries – the works. The data is brilliant; the challenge is turning those signals into reliable alerts that catch real issues relevant to my app’s functions. So… what should I actually listen for, and where should I set the thresholds?
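One common starting point for the threshold question, sketched here as an illustration and not as the article's own method, is to take a high percentile of a metric's recent history and add some headroom. The function name `suggest_threshold`, the 95th-percentile default, and the 1.2 headroom factor are all assumptions chosen for the example.

```python
from statistics import quantiles


def suggest_threshold(samples, percentile=95, headroom=1.2):
    """Suggest an alert threshold from historical samples of one metric.

    Hypothetical helper: returns the given percentile of past values
    multiplied by a headroom factor, so normal variation stays quiet.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    # quantiles(..., n=100) returns 99 cut points; index p-1 is the p-th percentile.
    cuts = quantiles(samples, n=100, method="inclusive")
    return cuts[percentile - 1] * headroom


if __name__ == "__main__":
    # Made-up slow-frame percentages per session, for illustration only.
    slow_frame_pct = [1.2, 0.8, 1.5, 2.1, 0.9, 1.1, 3.0, 1.4, 1.0, 1.3]
    print(f"suggested threshold: {suggest_threshold(slow_frame_pct):.2f}%")
```

A threshold derived this way is only a first guess: it assumes the historical window was healthy, and it should be reviewed per metric (crash rate and battery impact rarely deserve the same percentile or headroom).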

EP1: Getting started with ServiceDesk Plus MSP Cloud

Join us for a step-by-step tutorial on how to effectively configure your instance, customize your help desk, and automate processes using ServiceDesk Plus MSP Cloud. Also, learn how to leverage essential PSA features such as Timesheets, Billings, and Resource Management to streamline your MSP operations and achieve operational efficiency right from the start.