Reduce Your Logging Volume by 60% and Save 40% of Your Logging Costs with Lightrun Log Optimizer

As organizations adopt more FinOps Foundation practices and work to optimize their cloud-computing costs, engineering plays an essential role in that maturity. Application troubleshooting today still relies heavily on static logs and legacy telemetry that developers added either when first writing their applications or during troubleshooting sessions where telemetry was missing and more logs had to be added ad hoc.

How Monitoring, Observability & Telemetry Come Together for Business Resilience

Systems going down because of an unforeseen incident? Got problems with your app or website? Is your audience missing out on products and services because your load times are too slow? Then monitoring, observability, and telemetry should be of interest to you! This long article covers all three, starting with the concepts and how they work.

Epinio Meets s3gw

Since its first version, Epinio has used an internal S3 endpoint to store users' projects as aggregated tarballs. The internal engine's pipeline then downloads and stages these objects and, finally, deploys them into the Kubernetes cluster as consumable applications. Because Epinio uses S3 only as an internal, private service, S3 can be thought of here as an ephemeral cache for temporary objects.

8 Incident Management Tools You Need To Consider In 2023

You're probably aware that downtime is expensive, but do you know how expensive? The short answer: very. According to the Ponemon Institute, outages cost organizations an average of $9,000 per minute (or $540,000 per hour). That's why companies of all sizes are investing in incident management tools to reduce downtime and improve the customer experience.

CircleCI Outages: Have They Kept Their Promises in 2022?

At the beginning of April 2022, a massive disruption at CircleCI made large portions of its cloud offering unavailable to users worldwide. It occurred after CircleCI deployed a change to its front end and an auto-vacuum job on one of its core databases. During the outage, CircleCI users were unable to run tests or deploy code. Afterward, CircleCI promised to prevent these kinds of disruptions in the future.

Python Logging Tutorial: How-To, Basic Examples & Best Practices

Logging is the practice of recording a program's activities and data. It is an important part of developing, debugging, and running software, because it helps developers track their program's behavior, understand its flow, and discover unexpected scenarios and problems. Log records are especially helpful when a developer has to debug or maintain someone else's code.
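As a minimal sketch of these ideas using only Python's standard `logging` module, the example below configures a logger with a level, a handler, and a format, then records events from an instrumented function. The logger name `orders` and the `build_logger` and `process_order` functions are hypothetical, chosen only for illustration; output goes to an in-memory buffer so it is easy to inspect.

```python
import io
import logging


def build_logger(name: str, stream, level: int = logging.INFO) -> logging.Logger:
    """Hypothetical helper: return a logger that writes formatted records to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(level)  # records below this level are dropped
    handler = logging.StreamHandler(stream)
    # Each line shows the severity, the logger name, and the message.
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    logger.addHandler(handler)
    # Keep records out of the root logger's handlers to avoid duplicate output.
    logger.propagate = False
    return logger


def process_order(logger: logging.Logger, order_id: int, quantity: int) -> bool:
    """Hypothetical business function instrumented with log statements."""
    logger.debug("received order %s", order_id)  # below INFO, so filtered out here
    if quantity <= 0:
        logger.warning("order %s rejected: invalid quantity %s", order_id, quantity)
        return False
    logger.info("order %s accepted (quantity=%s)", order_id, quantity)
    return True


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger("orders", buffer)
    process_order(log, 1, 3)
    process_order(log, 2, 0)
    print(buffer.getvalue(), end="")
```

Running the script prints one INFO line for the accepted order and one WARNING line for the rejected one; the DEBUG call produces nothing because the logger's level is INFO. Passing arguments as `%s` placeholders rather than pre-formatting the string lets `logging` skip the formatting work for records that are filtered out.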

Why You Can't Have AIOps Without Data Engineering

There’s a familiar saying: garbage in, garbage out. For ITOps, this applies directly to data engineering. Craig Ferrara, BigPanda’s Area Vice President of Value and Adoption, says that data hygiene (putting good data in to get good data out) is the core of data engineering, and that it requires ITOps teams to examine their data before integrating with an AIOps solution.

How Synthetic Transaction Monitoring Provides Complete Site Visibility

We’ve all been in the situation before: it’s Friday at 5 PM and the only on-call engineer available to handle incidents is about to hit the slopes. Unfortunately, at that very moment, a customer reports to support that they are unable to access the company’s ecommerce website to complete a purchase. Internal monitoring systems seem quiet and services appear available on internal health dashboards.