Incident Review - Google Cloud Outage has Widespread Downstream Impact

Outages on the Internet always catch you by surprise, whether you are the end user or the Head of SRE or DevOps trying to keep a clear mind while you execute your incident playbook. As people in charge of ensuring reliable services for our customers, our normal experience of outages involves surfing a deluge of fire alarms and video calls as we work to solve the problem as quickly as we can. We often forget, therefore, what an outage means to the end user.

Reverse Connect for Azure Virtual Desktops (AVD)

AVD and eG Enterprise have something in common. Can you take a wild guess? Listening on open TCP ports is an extremely bad practice for cloud architectures, because it exposes products and services to incoming traffic from malicious parties. We at eG Innovations avoid this in our own products (see details). Microsoft has adopted the same best practice for Azure Virtual Desktops (AVD).

How to Restore Databases From Native SQL Server Backups

In my previous post, Native SQL Server Backup Types and How-To Guide, I discussed the main types of native SQL Server backups and various backup options. Backups are critical to restoring databases quickly, but there isn’t much benefit to having backup files sitting around if you aren’t prepared and don’t know when and how to perform the restores.

Detailed Insight, Right on Time: Introducing Scheduled Alerts

Logz.io customers, here’s some big product news that we think you’ll be excited to hear. Scheduled Alerts, an altogether new manner of alerting, is coming your way. That’s right, get ready to utilize a whole new world of alerts that weren’t previously available in the Logz.io platform.

Istio Log Analysis Guide

Istio has quickly become a cornerstone of most Kubernetes clusters. As your container orchestration platform scales, Istio embeds functionality into the fabric of your cluster that makes monitoring, observability, and flexibility much more straightforward. However, it leaves us with our next question – how do we monitor Istio? This Istio log analysis guide will help you get to the bottom of what your Istio platform is doing.

What is AIOps?

AIOps is an approach to managing the exponential growth of IT operations and the complexity of new technology through the application of artificial intelligence (AI). IT infrastructure increasingly relies on complicated deployments, multi-cloud architectures, and huge amounts of data. Traditionally, the tech industry responds to complexity by applying extra brainpower to the problem, bringing in more engineers, developers, and management.

Video: The new simple, scalable deployment for Grafana Loki and Grafana Enterprise Logs

With the recent release of Loki 2.4 and Grafana Enterprise Logs 1.2, we’re excited to introduce a new deployment architecture. Previously, if you wanted to scale a Loki installation, your options were: 1) run multiple instances of a single binary (not recommended!), or 2) run Loki as microservices. The first option was easy, but it led to brittle environments where a heavy query load could take down data ingestion and problems were often difficult to debug.

Introducing Logz.io Event Management: Accelerating Collaborative Threat Response

In the domain of cyber threat response, there’s a critical resource that every organization is desperately seeking to maximize: time. It’s not like today’s DevOps teams aren’t already ruthlessly focused on optimizing their work to unlock the greater potential of their human talent. Enabling your organization to identify and address production issues faster – and increase focus on innovation – is the primary reason why Logz.io and its observability platform exist.

Challenges maintaining Prometheus LTS

In this article, we’ll cover the three main challenges you may face when maintaining your own Prometheus LTS solution. From the beginning, Prometheus stated that it wasn’t a long-term metrics store; the expectation was that somebody would eventually create long-term storage (LTS) for Prometheus metrics. Currently, several open-source projects provide long-term storage (Prometheus LTS). These community projects are ahead of the rest: Cortex, Thanos, and M3.

Development Environment Observability with Sentry

At Sentry, we’re always looking for innovative ways to dogfood our product. Over the last year we added Sentry’s error monitoring to our developer environment so that we could better understand its health. In this blog post I’m going to touch on how fragile local development environments can be, how we used Sentry to bring observability to what’s happening in them, and what outcomes it has driven for our engineering organization.