Blog

Cloud Atlas, Episode 3: The Big Bang

An easy way to understand what the early cloud did is to think of it as a public utility. Just as buildings depend on a common set of utilities (gas, electricity, and water), software projects depend on a common set of services: compute, storage, and database. “Compute” refers to the processing power it takes to run the software.

Monitor OpenAI API and GPT models with OpenTelemetry and Elastic

ChatGPT is so hot right now, it broke the internet. As an avid user of ChatGPT and a developer of ChatGPT applications, I am incredibly excited by the possibilities of this technology. What I see happening is that there will be exponential growth of ChatGPT-based solutions, and people are going to need to monitor those solutions.

eBPF Explained: Why it's Important for Observability

eBPF is a powerful technical framework to see every interaction between an application and the Linux kernel it relies on. eBPF allows us to get granular visibility into network activity, resource utilization, file access, and much more. It has become a primary method for observability of our applications on premises and in the cloud. In this post, we’ll explore in depth how eBPF works, its use cases, and how we can use it today specifically for container monitoring.

Monitoring and troubleshooting - Apache error log file analysis

Your Apache HTTP server access and error logs contain a wealth of actionable insights about potential server configuration and web application issues. The problem is that this information is hidden within millions of log messages, so you need analytics to extract these insights efficiently and respond to problems before they impact your users. Apache log analysis revolves around two activities: monitoring and troubleshooting.

Chaos Engineering Tools: Myth vs Fact

With so many Chaos Engineering tools available, it’s no surprise that SRE and platform leaders are doing their homework when choosing a platform to help them build and scale their Chaos Engineering programs. But like anything else you can research on the internet, there’s a lot of noise and hype that you need to wade through. Gremlin has worked with Reliability Engineering teams at hundreds of companies with the most sensitive workloads since 2016.

Create a service catalog that grows with you

When your incident response process is centered around a service catalog, responders can more quickly pinpoint the service or functionality that’s down, bring in the relevant team or experts, and get to solving the problem faster. Saving even a few minutes can have a big impact on decreasing the costs around incidents and outages, so having up-to-date service details at your fingertips can make all the difference.

What is log management in DevOps?

DevOps teams are used to working with data that is spread out across lots of different systems and environments. In organizations that have achieved tight collaboration with security teams to transition to DevSecOps, this is even more true! Log management is part of how all these teams keep track of information and make vital business decisions. It’s important to take a moment to understand what is meant by log management.

Managed vs. Unmanaged Disks in Azure

Microsoft Azure is a leading cloud computing platform offering a wide range of services to cater to the needs of businesses across various domains. One of the popular services is Azure Storage, which allows organizations to store, access, and manage their data in a secure and scalable manner. When it comes to deploying virtual machines (VMs) in Azure, organizations need to make a critical decision between Managed and Unmanaged Disks.

The Ultimate Guide to Onboarding New Work-From-Home Employees

Does it not strike you as strange that work from home did not end even after the world coped with the COVID crisis? We will not examine the reasons behind the continuation of work-from-home culture here. However, it is important to recognize that work from home is here to stay, and organizations need to adapt to it quickly. One of the major concerns for organizations today is onboarding new work-from-home employees.

What is log management in security?

Cyber crimes are expected to cost the world roughly $10.5 trillion per year by 2025, according to Cybersecurity Ventures. And these attacks don’t just cost money. Businesses impacted by these kinds of crimes can expect to experience not only financial losses but also loss of productivity, damage to their reputation, potential legal liabilities and more.