Jaeger Persistent Storage With Elasticsearch, Cassandra & Kafka

Running systems in production brings requirements for high availability, resilience, and recovery from failure. With cloud-native applications this becomes even more critical, because the baseline assumption in such environments is that compute nodes will suffer outages, Kubernetes nodes will go down, and microservice instances are likely to fail, yet the service is still expected to stay up and running.

The OpsRamp Monitor: February

Don’t stop thinking about tomorrow: for enterprise IT leaders everywhere, it’s no longer enough to lead well today and have teeth in the business. You must now be prepared for uncertainty and change of every kind and scope, and according to Accenture, very few organizations are there yet. In a recent report, the consultancy finds that only 7% of organizations are “future-ready”.

Defense Department Cybersecurity: All Ahead on Zero Trust

The Defense Department’s quick and successful pivot to a remote workforce last spring, via its Commercial Virtual Remote (CVR) environment, proved that the ability to operate fully from anywhere in the world has already arrived. Gone are the days of thousands of civilian employees heading into the Pentagon or other installations every day. However, this new, dispersed workforce brings increased risks for network security. As my colleague Bill Wright expertly noted last summer.

Monitoring DigitalOcean Billing with InfluxDB

I’ve always had a good experience using DigitalOcean, a cloud infrastructure provider that offers developers services for deploying and scaling applications that run simultaneously on multiple computers. I’ve used DigitalOcean a lot for my personal projects, for example to host my personal blog, its stats, and a NextCloud instance, all running in Kubernetes.

NiCE VMware Management Pack 5.3

Virtualization is part of many IT environments and a very effective way to reduce expenses while boosting efficiency and flexibility. The NiCE VMware Management Pack enables advanced health and performance monitoring for VMware, helping you leverage your existing investment, reduce costs, save time, and build the efficiencies that shape a future-proof business.

Use Datadog geomaps to visualize your app data by location

Being able to track and aggregate data by region is important when monitoring your application. It can show where errors and latency are occurring, where security threats are originating, and more. Now, you can use Datadog geomaps to visualize data on a color-coded world map. This helps you spot geographic patterns at a glance, such as where users are experiencing outages, how app revenue breaks down by country, or whether a surge in requests is coming from one particular location.

How to Find Memory Leaks in Websites and Web Applications

Knowing how your users interact with your web application and how they experience it is crucial to providing the best possible experience. So what do you need to know? Start with metrics such as page load times, HTTP request times, and Web Vitals like time to first byte (TTFB) and first contentful paint (FCP). If you use Sematext Experience, you’ll also see a number of other useful metrics for your websites and web applications. However, metrics alone are only part of the whole picture.

Single-Tenant Cloud vs Multi-Tenant Cloud

In this article, we discuss the advantages and disadvantages of single-tenant and multi-tenant clouds. So let us get started! In the past decade, adoption of cloud computing has been off the charts. For a long time, most companies (primarily enterprises) managed their own IT infrastructure, and so reaped the benefits of isolation, privacy, and greater management control. This is what is known as a single-tenant cloud architecture, i.e.

Four ways to send SCOM alerts to ServiceNow

If you work with Microsoft System Center Operations Manager (SCOM) and ServiceNow, then you will be familiar with the fear of missing a critical infrastructure alert! But fear no more: we have just the ticket! Imagine if you could get these two tools working together to fully synchronize your alerts and incidents for the lifetime of an issue. You’d be living the ITSM dream, right? So here are our top four methods for making this dream a reality.

Overview of Incident Lifecycle in SRE

Incidents that disrupt services are unavoidable, but every breakdown is an opportunity to learn and improve. Our latest blog post is a deep dive into best practices to follow across the lifecycle of an incident, helping teams build a sustainable and reliable product the SRE way. As the saying goes, “Every problem we face is a blessing in disguise”.