Operations | Monitoring | ITSM | DevOps | Cloud

Drive continuous improvement with shareable postmortems in Opsgenie

It’s a given that customers expect software and IT services to be high-performing and always on. And, because incidents and downtime will always be a thing, we believe that how you respond can make or break the customer experience. We’ve learned this lesson firsthand while refining our own incident management process over the last decade.

Lifting the Index Size Limit of Prometheus with Postings Compression

Prometheus’s TSDB (time series database) keeps recent data in memory and older data on persistent storage in the form of blocks. Each block has its own index that maps series to the chunks containing the data samples. During Google Summer of Code 2019, I mentored Alec Wang on lifting the size limitations of that index. The work described below is up for review and should be merged soon.
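To give a feel for what compressing postings can mean, here is a minimal sketch in Python. It is not the scheme from the work described above, and it does not reflect Prometheus's actual index format. It assumes a postings list is a sorted list of integer series IDs, and it applies a generic technique: store the gaps between neighbouring IDs and write each gap as a variable-length integer. The function names are hypothetical.

```python
from typing import Iterable, List


def encode_postings(ids: Iterable[int]) -> bytes:
    """Delta-encode sorted series IDs and write each gap as a varint.

    Illustrative only: a generic delta + varint scheme, not Prometheus's format.
    """
    out = bytearray()
    prev = 0
    for sid in ids:
        gap = sid - prev
        if gap < 0:
            raise ValueError("postings must be sorted in ascending order")
        prev = sid
        # Varint: 7 payload bits per byte, high bit set on all but the last byte.
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_postings(data: bytes) -> List[int]:
    """Reverse encode_postings: read varint gaps and rebuild the absolute IDs."""
    ids: List[int] = []
    prev = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            # Continuation bit set: more payload bytes follow for this gap.
            shift += 7
        else:
            prev += gap
            ids.append(prev)
            gap = 0
            shift = 0
    return ids


if __name__ == "__main__":
    series_ids = [1000, 1001, 1005, 1300]  # made-up IDs
    packed = encode_postings(series_ids)
    print(len(packed), decode_postings(packed) == series_ids)
```

Because neighbouring IDs in a sorted list tend to be close together, most gaps fit in a single byte, which is where the savings come from.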

Run your tests in parallel with Codefresh and Knapsack Pro

One of the best-known problems in application testing is how long the test suites take to run. Integration tests, in particular, are usually very slow to execute, and depending on the type of application, they can take several minutes (or even hours in extreme cases) to produce a final result. You can reduce the test execution time with several techniques, but one of the most effective methods is running your tests in parallel.
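The core idea behind parallel test runs can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Codefresh or Knapsack Pro work internally: it assumes you already have a recorded duration for each test file, and it uses a simple greedy heuristic (longest test first, always onto the least-loaded node) to balance the work. The function name and the file names are hypothetical.

```python
import heapq
from typing import Dict, List


def split_tests(durations: Dict[str, float], nodes: int) -> List[List[str]]:
    """Assign test files to parallel nodes so total runtimes stay balanced.

    Greedy heuristic: take the slowest remaining test and give it to the
    node with the smallest total so far.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")

    # Min-heap of (total_seconds_assigned, node_index).
    heap = [(0.0, i) for i in range(nodes)]
    heapq.heapify(heap)
    buckets: List[List[str]] = [[] for _ in range(nodes)]

    # Slowest first; ties are broken by name so the result is deterministic.
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        buckets[idx].append(name)
        heapq.heappush(heap, (total + secs, idx))
    return buckets


if __name__ == "__main__":
    # Made-up timings in seconds.
    timings = {
        "checkout_spec": 90,
        "search_spec": 60,
        "login_spec": 45,
        "profile_spec": 30,
        "health_spec": 15,
    }
    for node, files in enumerate(split_tests(timings, nodes=2)):
        print(node, files)
```

With the sample timings, both nodes end up with 120 seconds of work, so the wall-clock time is roughly half of the 240 seconds a serial run would need.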

It Came From Below

I’m going to assume most people who read this blog are familiar with PagerDuty. But just in case anyone isn’t, PagerDuty is a tool we use in IT to notify us if some predefined check has failed. Maybe a key process has died or maybe we’re not seeing our expected traffic volume or maybe our server has stopped responding to ping. Whatever it is, PagerDuty will relentlessly, remorselessly, and loudly notify whoever is on call that something needs attention.

Extending the Competitive Advantage in Telecom

The telecom industry has always seemed to navigate well through tech changes. As the industry has evolved, it’s managed to transform from landline to mobile carriers, then from voice calls to messaging and data-centric networks. In many developed markets, telcos are creating ecosystems for the data-driven economy. The next frontier is shaping up to be one driven by machine learning (ML) and artificial intelligence (AI).

Using Jaeger with Eclipse Che

As explained on the Eclipse Che website, “Che brings your Kubernetes application into your development environment and provides an in-browser IDE, allowing you to code, build, test and run applications exactly as they run on production from any machine”. However, when deployed in your production environment, those same applications can be monitored with observability tools to understand their performance and inform future improvements.

Gartner Symposium 2019 Top Trends Impacting IT Infrastructure and Operations Management

OpsRamp was a sponsor of Gartner Symposium in Orlando last week, where CIOs and top executives gathered to share knowledge on the changing role of IT operations, DevOps adoption, and anything to do with cloud migration, monitoring, and management. Gartner analysts and researchers presented on several trends in the world of IT operations, and 10 of them offered key insights into the direction of our industry:

How to Read, Search, and Analyze AWS CloudTrail Logs

In a recent post, we talked about AWS CloudTrail and saw how CloudTrail can capture histories of every API call made to any resource or service in an AWS account. These event logs can be invaluable for auditing, compliance, and governance. We also saw where CloudTrail logs are saved and how they are structured. Enabling a trail in your AWS account is only half the task.
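As a small taste of the other half, here is a minimal Python sketch of searching a CloudTrail log file. It relies on the documented layout of a log file as a JSON object with a top-level "Records" array, where each record carries fields such as eventName, eventSource, and userIdentity. The sample events, user names, and IP addresses are made up, and the helper names are hypothetical. In practice the files are delivered gzip-compressed to S3, so you would decompress them first.

```python
import json
from collections import Counter
from typing import Any, Dict, Iterator, Optional

# Made-up sample in the shape of a CloudTrail log file (fields trimmed).
SAMPLE_LOG = """
{"Records": [
  {"eventTime": "2019-10-28T10:15:00Z", "eventSource": "ec2.amazonaws.com",
   "eventName": "RunInstances", "awsRegion": "us-east-1",
   "sourceIPAddress": "203.0.113.10",
   "userIdentity": {"type": "IAMUser", "userName": "alice"}},
  {"eventTime": "2019-10-28T10:20:00Z", "eventSource": "s3.amazonaws.com",
   "eventName": "DeleteBucket", "awsRegion": "us-east-1",
   "sourceIPAddress": "203.0.113.25",
   "userIdentity": {"type": "IAMUser", "userName": "bob"}},
  {"eventTime": "2019-10-28T10:30:00Z", "eventSource": "ec2.amazonaws.com",
   "eventName": "RunInstances", "awsRegion": "eu-west-1",
   "sourceIPAddress": "203.0.113.10",
   "userIdentity": {"type": "IAMUser", "userName": "alice"}}
]}
"""


def iter_events(raw: str, event_name: Optional[str] = None) -> Iterator[Dict[str, Any]]:
    """Yield CloudTrail records, optionally keeping only one eventName."""
    for record in json.loads(raw).get("Records", []):
        if event_name is None or record.get("eventName") == event_name:
            yield record


def count_by_event_name(raw: str) -> Counter:
    """Count how many times each API call appears in the log file."""
    return Counter(record.get("eventName") for record in iter_events(raw))


if __name__ == "__main__":
    print(count_by_event_name(SAMPLE_LOG).most_common())
    for event in iter_events(SAMPLE_LOG, event_name="DeleteBucket"):
        user = event.get("userIdentity", {}).get("userName")
        print(event["eventTime"], user, event["sourceIPAddress"])
```

A script like this works for spot checks on a handful of files. For larger volumes, a query service or log management tool is usually a better fit.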