
How to Keep Your System Visible in the Age of Remote Working

Monitoring IT infrastructure and services has always been essential. However, with the exponential increase in remote users since the pandemic, your IT monitoring system and security measures need an upgrade. Consider this scenario: at the end of a workday, you are notified that one of your critical services has gone down, but five different teams support different processes of that service.

Setting better SLOs using Google's Golden Signals

To many engineers, the idea that you can accurately and comprehensively track your application's user experience with just a few simple metrics might sound far-fetched. Yet four metrics aim to do exactly that. Known as the four Golden Signals (latency, traffic, errors, and saturation), they should be a core part of your observability and reliability practices.
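To make the idea concrete, here is a minimal sketch that reduces one time window of request samples to the four signals. The `Request` type, the rule that 5xx responses count as errors, and measuring saturation as in-flight requests divided by capacity are illustrative assumptions, not the behavior of any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float
    status: int  # HTTP status code of the response


def percentile(values, p):
    """Nearest-rank percentile (p in 1..100) of a non-empty list."""
    s = sorted(values)
    k = (p * len(s) + 99) // 100  # integer ceil(p/100 * n), avoids float rounding
    return s[max(k, 1) - 1]


def golden_signals(requests, window_s, in_flight, capacity):
    """Summarise one time window of request samples as the four Golden Signals."""
    total = len(requests)
    errors = [r for r in requests if r.status >= 500]
    ok_latencies = [r.latency_ms for r in requests if r.status < 500]
    return {
        # Latency: measured on successful requests only, so fast failures
        # don't make the service look quicker than it is.
        "latency_p99_ms": percentile(ok_latencies, 99) if ok_latencies else None,
        # Traffic: demand on the system, here requests per second.
        "traffic_rps": total / window_s,
        # Errors: share of requests that failed (5xx treated as failures here).
        "error_rate": len(errors) / total if total else 0.0,
        # Saturation: how "full" the service is, e.g. busy workers / pool size.
        "saturation": in_flight / capacity,
    }
```

Separating latency of successful and failed requests is a deliberate choice: a burst of fast errors would otherwise pull the latency percentile down and hide a real problem.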

Building confidence with Cortex Discovery Audit

A microservices catalog is only useful if you are confident that everything in it is accurate and will stay that way. How can you be sure your catalog remains up to date? When you find an asset in the catalog, should you still double-check GitHub? The service catalog is supposed to be your single source of truth; if you have to search several places for what you need, it defeats its purpose.
Sponsored Post
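The kind of check described above, comparing the catalog with what actually exists in source control, can be sketched as a set difference. This is not how Cortex's Discovery Audit is implemented; the function name, the repository identifiers, and the owner field are hypothetical, and in practice the discovered list would come from scanning something like a GitHub organization.

```python
def audit_catalog(catalog, discovered_repos):
    """Diff a service catalog against repositories found in source control.

    catalog: mapping of repository identifier -> owning team (None if unset).
    discovered_repos: repository identifiers found by scanning source control.
    """
    catalogued = set(catalog)
    discovered = set(discovered_repos)
    return {
        # Repos that exist in source control but nobody has catalogued yet.
        "missing_from_catalog": sorted(discovered - catalogued),
        # Catalog entries whose repository no longer exists (deleted or renamed).
        "stale_entries": sorted(catalogued - discovered),
        # Entries that are present but incomplete (no owner recorded).
        "unowned_entries": sorted(r for r, owner in catalog.items() if not owner),
    }
```

Running a check like this on a schedule, rather than once, is what keeps the catalog trustworthy as repositories are added, renamed, and retired.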

Production Data Simulation: Record in One Environment, Replay in Another

Have you ever had code that breaks in production even though everything runs correctly in your dev environment? This is hard to debug: once something is in production you have limited information, and you can't easily change the code and try alternatives. Replicating the production environment in non-production environments brings its own challenges. Speedscale production data simulation lets you securely capture production application traffic, normalize the data, and replay it directly in your dev environment.
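The record, normalize, and replay loop can be illustrated in a few lines. This is a generic sketch of the idea, not Speedscale's API: requests and responses are plain dicts, the two handler functions stand in for your service in each environment, and the set of headers treated as sensitive or volatile is an assumption.

```python
import copy

# Assumed set of headers that are secret or change on every request.
VOLATILE_HEADERS = {"authorization", "date", "x-request-id"}


def record(handler, request, log):
    """Serve a request and keep a copy of the request/response pair."""
    response = handler(request)
    log.append({"request": copy.deepcopy(request), "response": response})
    return response


def normalize(entry):
    """Redact volatile or sensitive headers so the capture is safe to reuse."""
    req = copy.deepcopy(entry["request"])
    req["headers"] = {
        k: ("<redacted>" if k.lower() in VOLATILE_HEADERS else v)
        for k, v in req["headers"].items()
    }
    return {"request": req, "response": entry["response"]}


def replay(handler, log):
    """Re-send captured requests and report responses that differ."""
    mismatches = []
    for entry in log:
        actual = handler(entry["request"])
        if actual != entry["response"]:
            mismatches.append((entry["request"], entry["response"], actual))
    return mismatches
```

Each mismatch pairs the original request with the expected and actual responses, which is the information you lack when a bug only shows up in production.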

How we do realtime response with incident.io, Sentry & PagerDuty

Like most tech companies, we use an on-call rota and various alerting tools so that we can respond to incidents before they’re reported. Proactively identifying issues and communicating with customers helps us provide great experiences and builds trust. Internally, we’ve been using these alerting tools together with our auto-create incidents feature. It has made responding to the pager much smoother: it’s one less thing to do when you get paged at 2am.

A Guide to the OpenTelemetry Collector

This article gives you a quick overview of the key attributes you should know to get started with the OpenTelemetry Collector in your next telemetry project. The Collector plays an important role in any project that involves distributed tracing. Simply put, it is a data pipeline service that receives, processes, and exports telemetry data.
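Conceptually, a Collector pipeline chains receivers (which ingest data), processors (which filter, enrich, or batch it), and exporters (which send it to one or more backends). The real Collector is a Go binary configured in YAML; the Python below only mirrors that data flow to illustrate the idea, and the class, the lambdas, and the attribute key in the usage example are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Span = dict  # a telemetry item, simplified to a dict for illustration


@dataclass
class Pipeline:
    receivers: list[Callable[[], Iterable[Span]]]        # pull items in
    processors: list[Callable[[list[Span]], list[Span]]]  # applied in order
    exporters: list[Callable[[list[Span]], None]]        # fan out to backends

    def run_once(self):
        # Gather everything the receivers currently have.
        batch = [item for receive in self.receivers for item in receive()]
        # Each processor takes the output of the previous one.
        for process in self.processors:
            batch = process(batch)
        # Every exporter receives the same processed batch.
        for export in self.exporters:
            export(batch)
        return batch
```

For example, a pipeline that drops health-check spans and tags the rest with an environment attribute before exporting:

```python
exported = []
pipeline = Pipeline(
    receivers=[lambda: [{"name": "GET /health"}, {"name": "GET /orders"}]],
    processors=[
        lambda b: [s for s in b if s["name"] != "GET /health"],
        lambda b: [{**s, "deployment.environment": "prod"} for s in b],
    ],
    exporters=[exported.extend],
)
pipeline.run_once()
```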

Complete Guide to Endpoint Backup

Without a doubt, a data recovery solution is essential when it comes to maintaining security and business continuity. Backups give you important survival options when ransomware hits, a laptop is lost, or someone accidentally deletes a folder full of important files. Without those safe, secure, redundant copies of your most important data, you’d be left out in the cold.

IoT Project Lifecycle: Key considerations for OTA updates at scale [Part IV]

From entertainment to security, automation is now pervasive. Intelligent devices are transforming our homes and enriching our lives, making them more efficient, productive, and environmentally friendly. Most embedded devices run Linux, and the number of these devices is poised to keep growing.

Rising IT costs: What to watch out for

It seems like every conversation is about inflation lately. Everything is getting more expensive, and the news cycle suggests there is little chance of that abating. Inflation and supply chain challenges are having a knock-on effect on cloud adoption and network usage. We’ve already seen some of the big providers increase their prices. So what’s to be done? Can technology also offer solutions for curbing rising IT costs?