The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Introducing Cloud Cost Intelligence for Snowflake

Here at CloudZero, we work with some of the top software-driven companies out there. Like us, they're building their products on Amazon Web Services (AWS), along with whatever best-of-breed providers meet their needs. It's no secret that Snowflake has seen serious success in recent years. For many companies, including CloudZero, Snowflake is the data warehouse provider of choice and an essential component of delivering their products.

How to monitor your AWS servers via MetricFire

In this article, we explore the basics of monitoring Amazon Web Services (AWS) by feeding metrics to Grafana in two ways: through Hosted Graphite's agent and through Hosted Graphite's AWS add-on. This lets us monitor metrics from applications and servers hosted in AWS with clarity and depth. This article assumes you have already created a Hosted Graphite account.

What Is Interconnection And Why Is It So Important To Enterprises?

Enterprise network connectivity has evolved in line with changing business needs over the last few decades. As the sudden shift to remote working in 2020 showed, that evolution is now accelerating in response to environmental change. This makes interconnection more important than ever to the modern enterprise.

Tyler Wells on building a culture of reliability at Twilio

What does reliability look like at a company that has thousands of employees and provides critical communication services to over 150,000 customers? We talked with Tyler Wells, Senior Director of Engineering at Twilio, to learn how he and his team created a culture of reliability at Twilio. He talked in depth about his experiences developing reliability goals, building reliability practices, and aligning engineering teams on these objectives.

Achieving the Observability Imperative Requires AI

The Shift to Observability

Over the last six months, unified monitoring, log management, and event management vendors have reoriented their technology portfolios towards Observability, often without any change to the underlying functionality. In doing so, they have generated a fair amount of confusion in the market.

The Future of Kubernetes on DevOps Radio

In this episode of DevOps Radio, Shipa’s CEO and Founder Bruno Andrade joins host Brian Dawson to discuss his thoughts on the future of Kubernetes. DevOps Radio is a CloudBees-sponsored podcast series. Hosting experts from around the industry, the show dives into what it takes to successfully develop, deliver and deploy software in today’s ever-changing business environment. From DevOps to Docker, each episode features real-world insights and a few stories, tips, industry scoop and more.

How to build your own incident management process

IT incident management is a fundamental operational process designed to ensure rapid service restoration. This process is typically assigned to the help desk but is also very much entrenched in the day-to-day of DevOps. When incident management goes right, service is restored quickly and the impact on productivity, continuity, and customer satisfaction is minimal.

7 Tips On Building And Maintaining An SRE Team In Your Company

In today's "always on" world, reliability is a primary business KPI. Plant a culture of reliability by following these 7 simple tips for building a solid SRE team in your organization. Many of today's hottest jobs didn't exist at the turn of the millennium: social media managers, data scientists, and growth hackers were unheard of back then. Another relatively new and in-demand role is the Site Reliability Engineer, or SRE.

Take the first step toward SRE with Cloud Operations Sandbox

At Google Cloud, we strive to bring Site Reliability Engineering (SRE) culture to our customers not only through training on organizational best practices, but also with the tools you need to run successful cloud services. Part and parcel of that is comprehensive observability tooling—logging, monitoring, tracing, profiling and debugging—which can help you troubleshoot production issues faster, increase release velocity and improve service reliability.

Level Up 2020 Highlights

Hear from LogicMonitor leadership on some of the biggest announcements and additions to the LM product suite in 2020. We released an array of features that give IT and DevOps teams full visibility into every corner of their infrastructure, and with the addition of LM Logs, we're on a mission to provide an extensible, fully unified observability platform.