
Gett replaces paging tool with Exigence to achieve IR excellence

“By the time a pager alerts you to a problem, it’s too late to think about how to manage the incident.” (Google SRE Workbook) Gett, a global leader in urban mobility and corporate travel tech, knew that relying on its incumbent paging system and siloed manual processes for incident management was no longer sustainable. Any delay in response and service restoration could jeopardize customer satisfaction and business continuity.

DevOps - Roles and Responsibilities

As DevOps grows within the tech industry, it continues to play a vital role in modern software development by bridging the gap between development and operations. DevOps engineers juggle a wide range of tasks in their daily work, combining coding, automation, system management, and team collaboration. In this blog, we’ll explore their core responsibilities, highlight essential best practices, and show how solutions like OnPage can help streamline their workflows.

April 2025 Update - Fully Redesigned Signl Center, Shift Tiers with Escalations, AI Shift and Duty Scheduling, and a new Chat View for the Mobile App

With our latest April update, we are setting a new benchmark in incident management excellence. The Signl Center in our web portal has undergone a major redesign, delivering a superior, more intuitive layout, enhanced tracking of notifications and escalation workflows, and an upgraded incident chat — redefining how operations and maintenance teams coordinate under pressure.

How to get alerted when your EC2 instance shuts down

Some of your most critical infrastructure runs on AWS EC2, so it's pretty damn important to know when your EC2 instances shut down. Sure, chances are someone in your organisation will start kicking and screaming within 30 minutes of a particularly important instance shutting down, but we can do better than that. When it comes to monitoring and customers (whether inside your org or outside), being proactive wins you a lot of points.

A Guide to OpenTelemetry Tracing in Distributed Systems

Understanding what’s happening inside your applications is key to keeping them performing well and reliably. OpenTelemetry tracing is an open-source, flexible solution that lets you monitor your distributed systems without locking you into a specific vendor. This guide walks you through everything you need to know about OpenTelemetry tracing, from the basics to more advanced techniques, with practical tips for troubleshooting common issues along the way.

Apache Tomcat Performance Monitoring: Basics and Troubleshooting Tips

When Java web applications experience slowdowns or crashes, the culprit is often the Tomcat server. For DevOps engineers overseeing critical applications, proactive monitoring is crucial for ensuring optimal performance and reliability. In this guide, we'll explore the essential aspects of monitoring Apache Tomcat servers, focusing on the key metrics to track, setting up robust monitoring systems, and troubleshooting common performance issues that could impact your application’s stability.

Extra Factor Authentication: how to create zero trust IAM with third-party IdPs

Identity management is vitally important in cybersecurity. Every time someone tries to access your networks, systems, or resources, it’s critical to verify that the attempt is valid and legitimate, and that it matches a real, authenticated user. In cybersecurity, this is usually handled through Identity and Access Management (IAM), most commonly by using third-party Identity Providers (IdPs).

How Often Has GitHub Gone Down? A Data-Backed Look at 2024 Outages

GitHub, a platform offering version control and collaboration services for software development, plays a pivotal role in managing code, tracking issues and pull requests, and deploying software. As millions of developers and businesses rely on GitHub's infrastructure, its reliability is crucial. Tracking GitHub's outages and understanding their frequency is essential, particularly for organizations that depend on the platform for critical processes.

Industry Recognition Validates Resolve's Leadership in Agentic Automation

In the fast-moving world of IT operations, Gartner’s research provides critical insight into where the market is heading and which vendors are leading the charge. In the past year, Resolve earned three powerful validations of its innovation and impact. These distinctions reflect more than technical capabilities. They reinforce Resolve’s mission to drive a new era of intelligent, autonomous IT orchestration through agentic automation.

Cribl and Palo Alto Networks Launch Partnership with Cortex XSIAM Integration

Cribl’s powerful data processing engine is designed specifically for IT and Security teams, enabling organizations to take control of their ever-growing data volumes. It simplifies the management, processing, and analysis of telemetry data, such as logs, metrics, and traces, generated across complex digital environments. This gives organizations the choice, control, and flexibility to manage and analyze data, allowing them to adapt to evolving needs and strategies.