
Infrastructure as Code - IAC for Azure

Infrastructure as code, along with automated deployment and scaling up and down in Azure, is becoming the new normal. Solution architects and system administrators are becoming coders, and scripting is becoming part of their day-to-day job. In parallel, a raft of vendors is offering products that try to avoid this need to script and to address the shortage of staff with the skills to script and code this now-necessary functionality.

How Uptime.com Can Help Troubleshoot a Server Outage

Everyone has heard about the 3 AM wakeup call, but what about those troublesome issues that dig at your team and eat away at your SLA hours? Hard-to-diagnose issues can strike at any time. They drain your team, hurt morale, impede the customer experience… it’s just a whole mess. These incidents test what “response” really means to your organization, because fixing them is not always a simple task. Something has gone wrong.

Logging Agents Vs Log Libraries

Log management has been around for a long time, but how we manage our logs has changed profoundly over the years. For effective log management, there are times when you may have to trade off the new for the old, and vice versa. A clear understanding of log agents and log libraries will help assess what works best for different applications and infrastructures.

Root cause analysis using Metric Correlations

As the complexity of systems and applications continues to grow, the number of metrics that need to be monitored grows in parallel. Whether you’re a DevOps engineer, an SRE, or a developer building the code yourself, many of these components may be fragmented across your infrastructure, making it increasingly difficult to identify the root cause when you experience downtime or abnormal behavior.
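
One simple way to narrow down candidates is to correlate each metric with the metric that shows the symptom. The sketch below is only an illustration of that idea, not the method any particular product uses: the function names `pearson` and `rank_by_correlation`, the metric names, and the sample values are all hypothetical. It computes the Pearson correlation between a symptom series and each candidate series and ranks the candidates by the absolute value of the coefficient. A high correlation only suggests where to look first; it does not prove causation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series carries no signal; treat it as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(symptom, candidates):
    """Return (name, coefficient) pairs, strongest absolute correlation first."""
    scores = {name: pearson(symptom, series) for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical samples taken at the same six timestamps.
    latency_ms = [100, 110, 105, 300, 320, 115]
    candidates = {
        "db_connections": [20, 22, 21, 60, 64, 23],
        "cpu_percent": [50, 50, 50, 50, 50, 50],
        "queue_depth": [5, 3, 6, 4, 5, 4],
    }
    for name, score in rank_by_correlation(latency_ms, candidates):
        print(f"{name}: {score:+.2f}")
```

In this made-up data set the connection count moves in step with latency, so it ranks first and would be the first place to investigate.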

Assign Read-Only Access to Users in Logz.io

Cloud monitoring and observability can involve all kinds of stakeholders. From DevOps engineers to site reliability engineers to software engineers, there are many reasons today’s technical roles would want to see exactly what is happening in production, and why specific events are happening. However, does that mean you’d want everyone in the company to access all of the data?

Full visibility of Microsoft Azure cloud service health - resolve issues before they impact on your customers.

For many large digital enterprises, Microsoft Azure and the private cloud offering Azure Stack Hub have emerged as the strategic cloud platforms of choice. Azure offers an open and flexible platform on which to quickly build, deploy and manage applications at scale.

Smarter CPU Testing - How to Benchmark Kaby Lake & Haswell Memory Latency

Modern CPUs are complex beasts with billions of transistors. This complexity in hardware brings indeterminacy even in simple software algorithms. Let’s benchmark a simple list traversal. Does the average node access latency correspond to, say, a CPU cache latency? Let’s test it! Here we benchmark access latency for lists with different numbers of nodes. All the lists are contiguous in memory, traversed sequentially, and have 4 KB of padding between the next pointers.
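
The list layout described above can be sketched as follows. This is a minimal illustration of the data structure only, under stated assumptions: the 8-byte next field, the wrap-around from the last node to the first, and the names `build_list`, `traverse`, and `ns_per_access` are all hypothetical and not taken from the original benchmark. Because this sketch is written in Python, interpreter overhead dominates the timings, so the numbers it prints should not be read as cache latencies; a real measurement would need a compiled language.

```python
import struct
import time

PADDING = 4096               # 4 KB between consecutive next pointers, as in the text
NEXT = struct.Struct("<Q")   # assumption: 8-byte "next" field holding a byte offset


def build_list(num_nodes):
    """Lay out num_nodes nodes contiguously; each stores the offset of the next."""
    buf = bytearray(num_nodes * PADDING)
    for i in range(num_nodes):
        # Assumption: the last node points back to the first so traversal can loop.
        next_offset = ((i + 1) % num_nodes) * PADDING
        NEXT.pack_into(buf, i * PADDING, next_offset)
    return buf


def traverse(buf, steps):
    """Follow next pointers for `steps` hops and return the final offset."""
    offset = 0
    for _ in range(steps):
        (offset,) = NEXT.unpack_from(buf, offset)
    return offset


def ns_per_access(num_nodes, steps=100_000):
    """Average wall-clock nanoseconds per hop, including interpreter overhead."""
    buf = build_list(num_nodes)
    traverse(buf, num_nodes)  # warm-up pass that touches every node once
    start = time.perf_counter_ns()
    traverse(buf, steps)
    return (time.perf_counter_ns() - start) / steps


if __name__ == "__main__":
    for num_nodes in (8, 64, 512, 4096):
        print(f"{num_nodes:5d} nodes: {ns_per_access(num_nodes):.1f} ns per access")
```

Each hop depends on the value loaded by the previous one, which is what makes a pointer-chasing traversal a latency test rather than a throughput test.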

An Introduction to Distributed Tracing

There’s no strict definition of a distributed system. But generally speaking, if you have reached a point where you’re running more than five interdependent services at once, that means you’re running a distributed system. It also means you are more than likely experiencing difficulties when troubleshooting using traditional debugging tools. Unfortunately, pulling up multiple tools, each built for a monolithic world, doesn’t help pinpoint the problem.

Incident Review - Akamai Performance Degradation Slows Down Major Websites Worldwide

This summer has seen a series of outages and performance degradations from some of the world’s most widely used CDNs, including the June 8, 2021 Fastly outage (owing to DNS or configuration issues) and an Akamai outage on July 22, 2021 (also likely caused by DNS failure).