Operations | Monitoring | ITSM | DevOps | Cloud


Ask a Partner About the Mistakes That Were Made

Did you hear about the failed video conferencing project? During a global stakeholder meeting, an IT team deployed new video conferencing software without testing a default echo feature. Every participant heard their voice echoing back, disrupting the meeting until someone identified and fixed the setting. While relatively minor, this incident demonstrates how overlooked technical details can compromise professional operations. Consider the failed ERP rollout by a food distributor.

How to do Agentless Monitoring with check_by_ssh

Check plugins are the foundation of Icinga 2. Icinga 2 executes them and maps their return values to Host or Service objects. Everything else builds on top of that. These check plugins can come from the Monitoring Plugins project or be custom-written. Whatever their origin, they are the building blocks of an Icinga monitoring stack. If a plugin goes CRITICAL, Icinga 2 alerts the sysadmin.
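To make the "return value" idea concrete, here is a minimal sketch of a custom check plugin. It follows the Monitoring Plugins exit-code convention (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The function names and the 80%/90% thresholds are assumptions chosen for illustration and are not taken from the article.

```python
import shutil
import sys

# Exit codes defined by the Monitoring Plugins convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_disk(used_percent, warn=80.0, crit=90.0):
    """Map a disk usage percentage to an (exit_code, message) pair.

    The thresholds are hypothetical defaults for this sketch.
    """
    if used_percent >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_percent:.1f}% used"
    if used_percent >= warn:
        return WARNING, f"DISK WARNING - {used_percent:.1f}% used"
    return OK, f"DISK OK - {used_percent:.1f}% used"


def main(path="/"):
    """Check one filesystem, print a status line, return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Anything that prevents the check from running is reported as UNKNOWN.
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    used_percent = usage.used / usage.total * 100
    code, message = evaluate_disk(used_percent)
    print(message)
    return code


if __name__ == "__main__":
    sys.exit(main())
```

The monitoring system reads the first line of output as the status text and the process exit code as the state, so a script like this could in principle be run locally or through a remote execution wrapper such as check_by_ssh.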

Manage All Your App Notifications in One Place with AppSignal

Alerts and notifications are the backbone of any Application Performance Monitoring (APM) tool, ensuring your team is immediately aware of critical issues. At AppSignal, we’re always improving our toolkit to help you stay ahead of problems before they impact performance or reliability. We've made huge improvements to how you can manage your app notifications and alerts with AppSignal.

Ex-Roblox SRE's take on SRE vs. DevOps

Former Roblox Sr. Engineering Manager Denys Pashutynski clarifies the fundamental difference between SRE and DevOps roles: SREs handle the customer-facing production edge while DevOps focuses on background automation. From The Incidentally Reliable podcast - real stories from the trenches of site reliability engineering. Made by SREs for SREs and hosted by Zenduty. Zenduty is a revolutionary incident management platform that gives you greater control and automation over the incident management lifecycle.

The One Thing Most Engineers Don't Understand (But Should)

How can engineering teams have a bigger impact on the bottom line? By thinking beyond code. Most engineers love to build and solve problems. But in a business, building for the sake of building isn’t enough. Even the cleanest code is just an expensive distraction if it doesn’t move the needle.

Diagnosing ActiveMQ broker performance issues with log analysis

Apache ActiveMQ is a widely used message broker that enables seamless communication between distributed applications. However, as the volume of messages increases, performance bottlenecks can arise, leading to slow message processing, high latency, broker crashes, and out-of-memory (OOM) errors. OOM errors are among the most critical of these issues: they occur when the broker exceeds its allocated heap memory, and they can result in service failures, message loss, and prolonged downtime.
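As a rough illustration of what log analysis for OOM errors can look like, the sketch below counts `java.lang.OutOfMemoryError` entries by reason. The sample lines are invented for this example and do not reproduce ActiveMQ's actual log layout. The function name is also hypothetical.

```python
import re
from collections import Counter

# Matches the standard JVM error name, optionally followed by ": <reason>".
OOM_PATTERN = re.compile(r"java\.lang\.OutOfMemoryError(?::\s*(?P<reason>.+))?")


def count_oom_errors(lines):
    """Return a Counter of OOM reasons found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = OOM_PATTERN.search(line)
        if match:
            reason = (match.group("reason") or "unspecified").strip()
            counts[reason] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    "2024-01-01 10:00:00 | INFO  | Connector started",
    "2024-01-01 10:05:00 | ERROR | java.lang.OutOfMemoryError: Java heap space",
    "2024-01-01 10:06:00 | ERROR | java.lang.OutOfMemoryError: GC overhead limit exceeded",
    "2024-01-01 10:07:00 | ERROR | java.lang.OutOfMemoryError: Java heap space",
]

if __name__ == "__main__":
    for reason, count in count_oom_errors(sample).most_common():
        print(f"{count:3d}  {reason}")
```

In practice the same function could be fed an open log file instead of the sample list, since it only needs an iterable of lines.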

How to leverage AI to enhance network monitoring in retail: A CXO's guide

The retail industry has evolved into a mix of physical stores, e-commerce, digital payments, and omnichannel interactions. Now, GenAI has been added to this mix, which changes how people shop, how retailers operate, and how employees work. While this shift creates opportunities for retailers of all sizes, it also presents serious challenges in maintaining network performance and staying compliant with industry regulations.

Diagnosing and resolving the 500 internal server error with Apache and Tomcat logs

The dreaded 500 internal server error is a common challenge for web administrators, often signaling a disruption in server operations. Diagnosing the root cause requires in-depth visibility into both web server and application behavior. In this blog, we’ll explore how log management tools simplify the diagnosis and resolution of 500 errors by leveraging insights from both Apache and Tomcat logs.
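For a sense of the kind of analysis involved, here is a minimal sketch that counts which request paths returned status 500 in an Apache-style access log. It assumes the Common Log Format request and status fields. The sample entries and the function name are made up for illustration.

```python
import re
from collections import Counter

# Captures the quoted request line and the three-digit status that follows it.
REQUEST_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})\b'
)


def paths_with_500(lines):
    """Count request paths whose response status was 500."""
    counts = Counter()
    for line in lines:
        match = REQUEST_PATTERN.search(line)
        if match and match.group("status") == "500":
            counts[match.group("path")] += 1
    return counts


# Invented access-log lines in Common Log Format, for illustration only.
sample = [
    '127.0.0.1 - - [10/Oct/2024:13:55:36 +0000] "GET /app/orders HTTP/1.1" 500 1234',
    '127.0.0.1 - - [10/Oct/2024:13:55:37 +0000] "GET /index.html HTTP/1.1" 200 512',
    '127.0.0.1 - - [10/Oct/2024:13:55:38 +0000] "POST /app/orders HTTP/1.1" 500 98',
]

if __name__ == "__main__":
    for path, count in paths_with_500(sample).most_common():
        print(f"{count:3d}  {path}")
```

The timestamps of the affected requests can then be used to look up the matching exceptions in the application server's logs.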

Getting started with Snyk dashboards

If you are involved in software development you will probably be aware of the ever-growing menace of supply chain attacks. These are attempts by attackers to insert malicious code into code libraries which might be downloaded or referenced by developers. Many modern frameworks can install hundreds or even thousands of dependencies, so the potential attack surface can be huge. As well as code libraries, attackers can also attempt to conceal malware in sources such as Docker images or CDNs.

Getting started with Postgres dashboards

In the last few years, Postgres has experienced a meteoric rise in popularity. A relational database that not long ago was relatively unknown outside of academic circles has now eclipsed MySQL as the most popular database for developers in the most recent Stack Overflow user survey. Why has it achieved such impressive popularity with developers?