Operations | Monitoring | ITSM | DevOps | Cloud

3 Major Ways To Improve AWS Lambda Performance

This piece began as three separate blog posts, which we have combined into one. In it, we lay out three ways you can improve your AWS Lambda performance. So much has been written about Lambda cold starts. It’s easily one of the most talked-about, yet most misunderstood, topics when it comes to Lambda. Depending on who you talk to, you will likely get different advice on how best to reduce cold starts.

10 Tools and Techniques to Test Your IT Infrastructure Resilience

Many of our customers are large enterprises with critical, highly available, and secure infrastructures. This means that they spend (as do we) a lot of time proactively investigating and stress-testing systems. Indeed, we and many other vendors also provide tools within our products to assist in “kicking the tyres”. However small or large your enterprise is, it’s a methodology and mindset that you can embrace, with plenty of free and open-source tools out there to assist you.

Give Monitoring a Shot

If you hang out around a particular segment of the SolarWinds® crowd, you’re likely to hear the story of how monitoring helped one former Head Geek™ score front row tickets to Aerosmith. This is not that story. This story was, however, inspired by that story. The original story involved the aforementioned Head Geek, Destiny Bertucci, using SolarWinds Web Performance Monitor (WPM) to monitor the ticket sales website.

Get started with distributed tracing and Grafana Tempo using foobar, a demo written in Python

Daniel is a Site Reliability Engineer at k6.io. He’s especially interested in observability, distributed systems, and open source. During his free time, he helps maintain Grafana Tempo, an easy-to-use, high-scale distributed tracing backend. Distributed tracing is a way to track the path of requests through the application. It’s especially useful when you’re working on a microservice architecture.

[Report] The 2021 State of Digital Operations Management

It’s no secret to anyone working in technology that IT’s operating world is becoming more demanding and complex. Digital transformation, hybrid working, exponentially increasing data volumes, greater security risks, and expanding global regulations are all driving up business demands and expectations for reliable and robust technology operations.

Overcoming Fear, Anxiety, and Mistrust to Gain Stakeholder Alignment

Most of us have encountered a situation like the following at some point in our careers: something either isn’t working right or could be working better. You’ve been through the process to understand the problem and identify solutions. (Check out my blog post “4 Steps to Efficiently Solve Problems” if you’re still working on understanding the problem.) Now it’s time to pick a solution—and you can’t get stakeholders to agree on one.

Introducing advanced user management for large teams

If we look at the number of sites that our users monitor, we can split our user base into two large groups. Teams in the first group only monitor one or a couple of sites. The second group monitors 30 or more sites. We've just launched new features that make user management more flexible for large teams. In this blog post, we'd like to tell you all about them.

Sites can now be grouped

Our users sometimes have a large number of applications that are being monitored by Oh Dear. Some of these applications are related to each other. Think for instance of a marketing site and an API that are part of the same application. To better emphasise that some of the things that are monitored are related, you can now use groups. When you start monitoring a site at Oh Dear, you can now optionally specify a group name.

OpenTelemetry Trace 1.0 is now available

For decades, application development and operations teams have struggled with the best way to generate, collect, and analyze telemetry data from systems and apps. In 2010, we discussed our approach to telemetry and tracing in the Dapper papers, which eventually spawned the open-source OpenCensus project, which merged with OpenTracing to become OpenTelemetry.

Monitor these Metrics to Keep your Servers Controlled

By definition, a server is a piece of computer software or hardware that provides functionality to other devices or programs, called clients. System administrators often ask a common question about server performance: why is my server down? When server monitoring and management are inefficient, it becomes very difficult to correctly analyze the complex and unpredictable information in a data center, and hard to find the reason for a server outage.