
RESOLVE '22: Expert predictions for AIOps 2022-2025

BigPanda’s RESOLVE '22 conference hosted a number of luminaries from the AIOps and IT Ops world, so naturally we asked for their thoughts on the future of the market and where they see AIOps going in the next few years. Our guests for the session, titled "Expert predictions for AIOps 2022-2025", came from the press, the investor community, the analyst community and the vendor world.

Open-source storage for beginners with Ceph

Modern organisations have become reliant on their IT capabilities, and at the heart of that infrastructure is a growing need to store data, be it transactional databases, file shares, or burgeoning data lakes for business analytics. Traditionally, storage needs have been catered to by big iron hardware vendors, but over the last decade, more and more organisations have turned to open-source solutions such as Ceph running on commodity hardware.

How to Perform Geolocation Testing to Ensure Your Website Works Globally

So, you have launched a website intending to reach a worldwide audience? If you're running a business, this could be the first step to growing your brand. But is your website really ready to go global? After all, just because your website works for a user in the United States doesn't mean it will be accessible to a user in Japan. For one, not everyone speaks the same language. Does your website offer translation for users visiting from different global locations?

Configuring an OpenTelemetry Collector to connect to BindPlane OP

BindPlane OP is the first open-source, vendor-agnostic agent and pipeline management tool. It makes it easy to deploy, configure, and manage agents on thousands of sources, and to ship metrics, logs, and traces to any destination. This blog shows you how to configure an existing OpenTelemetry Collector from any source to connect to BindPlane OP without needing to remove or reinstall the collector.

What is Kubernetes CrashLoopBackOff? And how to fix it

CrashLoopBackOff is a Kubernetes state representing a restart loop that is happening in a Pod: a container in the Pod is started, but crashes and is then restarted, over and over again. Kubernetes waits an increasing back-off time between restarts to give you a chance to fix the error. As such, CrashLoopBackOff is not an error in itself, but indicates that an underlying error is preventing a Pod from starting properly.
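
The increasing back-off can be pictured with a minimal sketch. It assumes the commonly documented kubelet behaviour of a 10-second initial delay that doubles after each crash and is capped at five minutes; those numbers and the helper name backoff_delays are illustrative assumptions, not something stated in the excerpt above.

```python
from itertools import islice


def backoff_delays(initial=10, factor=2, cap=300):
    """Yield successive restart delays in seconds.

    The defaults (10 s start, doubling, 300 s cap) are assumed values
    used for illustration only.
    """
    delay = initial
    while True:
        yield min(delay, cap)
        # Grow the delay for the next restart, never exceeding the cap.
        delay = min(delay * factor, cap)


# Delays before the first seven restarts of a crashing container.
first_seven = list(islice(backoff_delays(), 7))
print(first_seven)  # [10, 20, 40, 80, 160, 300, 300]
```

Under these assumptions, a container that keeps crashing is retried quickly at first and then roughly once every five minutes, which is why a Pod can sit in CrashLoopBackOff for a long time without any visible activity.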

An Introduction to PromQL: How to Write Simple Queries

PromQL is a flexible language designed to make it easy for users to perform ad-hoc queries against their data. By default, Prometheus indexes all of the fields in each metric except for source and target. Prometheus is an open-source tool that lets you monitor Kubernetes clusters and applications. It collects data from monitoring targets by scraping their HTTP metrics endpoints.

New in Grafana Alerting: File provisioning

We are happy to announce that file provisioning for Grafana Alerting has arrived in Grafana 9.1. This feature enables you to configure your whole alerting stack using files on disk, as you may already do with data sources or dashboards. The Terraform Grafana provider has also been updated to allow the provisioning of Grafana Alerting resources.

What are Canary Deployments and Why are they Important?

Every modification to software comes with the potential for production problems. Application failures often have serious consequences which can result in a loss of revenue and a poor customer experience. Additionally, organizations constantly try to improve their services for a better customer experience. How can you minimize the chance of error and update your application with confidence?

Intro to OEE

Efficient manufacturing is important for saving companies time, money, and energy. Making decisions based on data can improve efficiency, but there’s a lot of data to sort through. Manufacturing equipment contains many sensors, especially in the IIoT space. Overall Equipment Effectiveness (OEE) was first described by Seiichi Nakajima in the mid-twentieth century as part of his Total Productive Maintenance (TPM) method.

Ansible Key Terms: Getting Started

If you’re a systems administrator, there’s a good chance you’ve heard of Ansible. But if you’re not familiar with the tool or are just getting started with it, there are some key terms and concepts you need to know. Here we will give you an overview of Ansible, from its origins to the latest features. We’ll also cover some of the key terminology associated with Ansible so you can start using it effectively right away.