
The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Are Your Data Centers and IT Closets Prepared for the Next CrowdStrike Event?

On July 19, 2024, a major IT disaster struck when CrowdStrike, a cybersecurity firm, inadvertently pushed a faulty “sensor configuration update” for its Falcon Sensor software. The update caused roughly 8.5 million Windows devices to crash. The impact was severe, affecting airlines, banking systems, and healthcare networks, and recovery was laborious because impacted devices required manual intervention.

Make Smarter AI Apps with RAG with Joey DeVilla

Join @JoeyDeVilla as he explores retrieval-augmented generation (RAG). Learn how this AI technique combines machine learning models with external information retrieval to improve accuracy. Using a Jupyter notebook and Star Wars characters, Joey explains RAG's principles and demonstrates live coding examples with LangChain and OpenAI. Get hands-on insights into how RAG addresses hallucinations and outdated information.
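To make the retrieve-then-generate idea concrete, here is a minimal sketch in plain Python. It is not the notebook from the session and does not use LangChain or OpenAI. The tiny document list, the bag-of-words scoring, and the function names (`retrieve`, `build_prompt`) are all assumptions chosen for illustration. A real system would typically use embeddings and send the resulting prompt to a language model.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base (illustrative only, not from the session).
DOCUMENTS = [
    "Luke Skywalker is a Jedi who trained under Yoda on Dagobah.",
    "Han Solo is the captain of the Millennium Falcon.",
    "Leia Organa is a leader of the Rebel Alliance.",
]

def tokenize(text):
    """Lowercase the text and count its alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: cosine(q, tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, documents, k=1):
    """Augment the question with retrieved context before generation."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

if __name__ == "__main__":
    # The printed prompt is what would be passed to a language model.
    print(build_prompt("Who is the captain of the Millennium Falcon?", DOCUMENTS))
```

Grounding the prompt in retrieved text is what lets the model answer from current, external information instead of relying only on what it memorized during training.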

Is it possible for data centers to be eco-friendly?

Data centers are major energy consumers, emitting CO2 equivalent to 13 million cars in North America alone. Emissions arise from primary operations and from auxiliary systems such as cooling and backup generators. Inefficiencies in IT equipment management and a lack of detailed emissions analysis hinder targeted reduction strategies. So, is it possible to make them more sustainable? The short answer is yes.

How our data team handles incidents

Historically, data teams have not been closely involved in the incident management process (at least, not in the traditional “get woken up at 2AM by a SEV0” sense). But with the growing involvement of data (and therefore data teams) in core business processes, decision making, and user-facing products, data-related incidents are increasingly common and more important than ever.

DCIM tools are expensive. How is @hyperviewhq making them more accessible to everyone?

The @hyperviewhq CEO explains why customers refer to Hyperview as the 'Tesla of DCIM software'. It's sold direct to customers, pricing is fully transparent, and artificial intelligence is built into the foundation of the platform.

How role-based access control (RBAC) works in Gremlin

Reliability testing and Chaos Engineering are essential for finding reliability risks and improving the resiliency of systems. Gremlin makes it easy to do so, but not every engineer needs access to the same experiments, systems, or services. That’s why we released customizable role-based access controls (RBAC), letting Gremlin customers control which actions their users can perform in Gremlin.
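As a rough illustration of the general RBAC idea (not Gremlin's actual roles, permissions, or API), the sketch below maps hypothetical role names to sets of allowed actions and checks a user's request against them. Every name here (`ROLE_PERMISSIONS`, `User`, `is_allowed`, and the role and action strings) is an assumption made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical roles and actions; a real product defines its own.
ROLE_PERMISSIONS = {
    "viewer": {"view_experiments"},
    "operator": {"view_experiments", "run_experiment"},
    "admin": {"view_experiments", "run_experiment", "manage_users"},
}

@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)

def is_allowed(user, action):
    """Allow the action if any of the user's roles grants it."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in user.roles)

if __name__ == "__main__":
    alice = User("alice", {"operator"})
    bob = User("bob", {"viewer"})
    print(is_allowed(alice, "run_experiment"))  # operator role grants this action
    print(is_allowed(bob, "run_experiment"))    # viewer role does not
```

Because permissions attach to roles and not to individual users, changing what a group of engineers can do means editing one role definition instead of every account.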