SCOM 2022: What to expect and what you need to know

Microsoft's System Center Operations Manager team has been busy with new features and updates for SCOM. The most recent version, SCOM 2019, came out on March 14th, 2019. The upcoming SCOM 2022 has been announced and is due to be globally available in Q1 2022. This blog post provides an overview of some of the features you can expect to see in SCOM 2022 and how those changes may affect your experience of SCOM monitoring as a user or customer.

Strengthen Your Cloud Ops with Preventive Healing

The cloud is driving enterprise digital transformation. Gartner predicts that by 2026, public cloud spending will exceed 45% of all enterprise IT spending, a 2.5x growth from 2021. Enterprises globally are accelerating application modernization and embracing the cloud, which is giving rise to a few key trends. One is the rise of Software-as-a-Service (SaaS) adoption: organizations are increasingly using applications whose implementation and infrastructure they have little or no control over.

The Persistent Threat of Downtime in Banking and How to Solve it

At 8:54 pm on November 1, 2020, a customer of HDFC Bank complained on Twitter that the bank’s services, such as internet banking and ATMs, were down. More customers started raising similar issues over the next couple of hours, saying that UPI, credit card, and debit card transactions weren’t working either. Finally, at 11:55 pm, the bank confirmed that one of its data centers had suffered an outage. “Restoration shouldn’t take long,” they promised.

Seven Critical Capabilities to Look for in an AIOps Tool

In 2017, McAfee found that an average enterprise uses 464 custom applications. A large enterprise — a company with over 50,000 employees — uses 788 custom apps! The more applications you have, the more complex your application environment is. This means that you are more susceptible to outages. So, the tolerance for downtime is impossibly low. Mission-critical applications must be available at all times.

Observability in Practice

After years of helping developers monitor and debug their production systems, we couldn’t help but notice a pattern across many of them: they roughly know that metrics and traces should help them get the answers they need, but they are unfamiliar with how metrics and traces work, and how they fit into the bigger observability world. This post is an introduction to how we see observability in practice, and a loose roadmap for exploring observability concepts in the posts to come.

AWS Fargate Monitoring

How do you perform AWS Fargate monitoring? Today, we’ll discuss the background of AWS Fargate and using Retrace to monitor your code. As companies evolve from a monolithic architecture to microservice architectures, some common challenges often surface that companies must address during the journey. In this post, we’ll discuss one of these challenges: observability and how to do it in AWS Fargate.

Updated ELK Stack Guide For 2022 (Installation, Tutorials & More)

The ELK Stack has millions of users globally due to its effectiveness for log management, SIEM, alerting, data analytics, e-commerce site search and data visualisation. In this extensive guide (updated for 2021) we cover all of the essential basics you need to know to get started with installing ELK, exploring its most popular use cases and the leading integrations you’ll want to start ingesting your logs and metrics data from.

The eCommerce Holiday Calendar for DevOps

Seasonal spikes in consumer activity are expected, if not depended on, by online retailers throughout the calendar year. However, as shoppers rush to compete over door-buster deals and order holiday must-haves, web traffic escalates to levels standard resource allocation cannot easily sustain. This spike in traffic can lead to unresponsive checkouts, lost or abandoned carts, and slow-loading pages, ultimately resulting in thousands of dollars in lost revenue.

3 Ways Ops Teams Benefit From LM Logs

Sifting through logs in real time or post-mortem to pinpoint the problem can take hours – and is often like trying to find the needle in the alert/log haystack. Further, keeping the troubleshooting process efficient can be a challenge because of context switching and the reliance on manual interpretation of events and technology-specific knowledge.