Improving Resilience for GenAI Workloads on AWS

GenAI can do incredible things, but like any technology, its success depends on how we implement and use it. Without proper implementation, GenAI failures can pose significant risks to your organization's reputation and customer trust, leading to real financial impact. And like any other application, regulatory rules, SLAs, and reliability standards still apply to GenAI. With more companies integrating GenAI into their systems and products, it’s essential to make sure GenAI workloads and applications are highly available to deliver an exceptional user experience.

AI Agents: Hype or Reality?

A few years ago, it was all about Blockchain; before that, IoT, then Big Data, and even earlier, the Cloud. Each era brought a paradigm shift of sorts, drawing huge investments and promises. Some delivered, some didn’t, but they each brought advancement in tech. Today, we find ourselves fully embracing the AI hype cycle that started circa 2022 with OpenAI.

Beyond the Hype Blog Part 2 - DeepSeek and Other AI Models

The recent introduction of the DeepSeek R1 (DeepSeek) Large Language Model (LLM) has shaken up the AI landscape, suggesting that new low-cost and open-sourced providers could enter the market. This disruption creates huge opportunities for service providers to drive innovation and for their vendors and suppliers to enhance or innovate in economically feasible ways.

It's time for a new approach: Edwin AI solves ITOps biggest challenges with agentic AI

For years, the term “AIOps” has been tossed around, but for IT teams, it hasn’t really brought the change it promised. Gartner coined the term, promising that machine learning and AI would forever change how we manage IT operations. Yet, the reality has been underwhelming. For most teams, traditional AIOps has amounted to little more than event management with a shiny new label.

The Role of Facial Recognition Cameras in Modern Surveillance AI Technologies

Facial recognition cameras have rapidly emerged as one of the most advanced tools in modern surveillance AI technologies, transforming security measures across industries. These intelligent systems integrate artificial intelligence with real-time data processing to identify individuals with remarkable accuracy, enhancing law enforcement, border control, and corporate security. Surveillance systems that leverage facial recognition cameras provide unmatched capabilities in monitoring and streamlining identity verification, making them an indispensable asset in today's security landscape.

Three reliability best practices when using AI agents for coding

One of the biggest causes of outages and incidents is good old-fashioned human error. Despite all of our best intentions, we can still make mistakes, like forgetting to change defaults, making small typos, or leaving conflicting timeouts in the code. It’s why 27.8% of unplanned outages are caused by someone making a change to the environment. Fortunately, reliability testing can help you catch these errors before they cause outages.

AI Governance in 2025: A Full Perspective on Governance in Artificial Intelligence

In a world where artificial intelligence (AI) is leaping forward — growing at a CAGR of almost 36% from 2024 to 2030 — questions about governance and ethics with the use of AI are surfacing. As humans continue to develop AI systems, it is crucial to establish proper guidelines to ensure powerful technologies like generative AI and adaptive AI are used in a responsible manner.

How to avoid blowing the budget on Azure AI

So you had a great day playing with really awesome new tech, solving big business challenges, and feeling like you really nailed it. Then you wake up the next day to an alert from Azure telling you you've blown your monthly budget, and it's only the first week of the month. We've all been there... right? Using any cloud service comes with a cost, but for most services the budget risk is low. Cost calculated daily isn't a problem when usage is predictable, but not everything works like that.

How to Achieve Ethical Quality Assurance (QA) for Your Software Using Artificial Intelligence (AI)

As the use of artificial intelligence (AI) for software testing and quality assurance (QA) becomes increasingly prevalent, there are ethical considerations that must be addressed to ensure fairness, transparency, and accountability.

Graylog Parsing Rules and AI Oh My!

In the log aggregation game, the biggest difficulty you face can be setting up parsing rules for your logs. To qualify this statement: simply getting log files into Graylog is easy. Graylog also has out-of-the-box parsing for a wide variety of common log sources, so if your logs fall into one of the many categories that have a dedicated Input, have a dedicated Illuminate component, or use a defined Syslog format, then yes, parsing logs is also easy.

Weaving AI into SIGNL4

Over the past two years, artificial intelligence (AI) has experienced remarkable growth, significantly influencing various sectors and daily life. In 2023, the release of advanced large language models (LLMs), such as OpenAI’s GPT-4 and Google DeepMind’s Gemini, marked a pivotal shift by enabling AI systems to process and generate diverse data types, including text, images, and audio.

Empowering DevOps Teams: Overcoming IT Complexity with Advanced AI + Automation

As IT environments become more complex, larger, and inundated with data, DevOps teams encounter significant obstacles that make efficient operations more challenging. The heightened complexity can create difficulties in maintaining visibility and control across hybrid IT ecosystems. Additionally, the substantial volume of data generated can overwhelm resource-constrained DevOps teams, making it difficult to extract valuable insights and make informed decisions.

Operational excellence in the age of AI and Automation

The future of operations is here with PagerDuty's groundbreaking AI and automation innovations. Learn how PagerDuty AI agents, powered by PagerDuty Advance, and new use cases like security incident management and LLMOps can help your organization achieve operational excellence to reduce cost, mitigate the risk of outages, and accelerate innovation.

The One Where We Meet Cribl Copilot

We’re kicking off our new live weekly product demo series, streaming on YouTube, X, and LinkedIn! Each week, we’ll dive into the latest features and hidden gems from the Cribl Suite of tools to help you unlock the full potential of your telemetry data. For our first session, we’re thrilled to welcome Nikhil Mungel, the visionary behind Cribl Copilot. This AI-powered assistant is designed to instantly surface answers from the documentation and build pipelines with just a simple request.

How to make your AI-as-a-Service more resilient

When you think about “AI reliability,” what comes to mind? If you’re like most people, you’re probably thinking of generative AI model accuracy, like responses from ChatGPT, Stable Diffusion, and Sora. While this is certainly important, there’s an even more fundamental type of reliability: the reliability of the infrastructure that your AI models and applications are running on. AI infrastructure is complex, distributed, and automated, making it highly susceptible to failure.

How AI is impacting Africa's connectivity landscape

Artificial Intelligence (AI) is reshaping industries worldwide, and Sub-Saharan Africa is no exception. Across the region, governments, businesses, and start-ups are recognising the potential of AI to drive economic growth, improve efficiencies, and enhance decision-making. Yet, as AI adoption accelerates, so does the demand for robust digital infrastructure, including high-performance computing, data centres, and connectivity.

Kubernetes for AI Workloads

Kubernetes has been facilitating container orchestration for around a decade for both stateful and stateless application workloads. With the recent rise of AI and the advent of tools like Kubeflow and Argo Workflows, Kubernetes is also becoming a first-class citizen when it comes to running AI workloads. When you are training a model on K8s, you may be tweaking many parameters and have to test each of them one by one.

Optimizing Observability Data Volume and Cost with AI

Struggling with high observability costs? In this video, Jade Lassery breaks down the challenges of managing excessive data and skyrocketing expenses. She introduces the Logz.io AI agent, a powerful solution designed to optimize data usage, reduce unnecessary costs, and improve efficiency. Learn how to take control of your observability spending while maintaining high performance. Watch now to discover smarter data management strategies!

Troubleshoot Kubernetes Performance Issues with AI

Struggling with Kubernetes performance issues? This video introduces an AI-powered agent designed to help users quickly identify and resolve bottlenecks. By analyzing logs, the AI detects performance issues, streamlining troubleshooting and improving system efficiency. Watch now to see how AI can simplify Kubernetes performance management and keep your infrastructure running smoothly!

CI/CD requirements for generative AI

CI/CD for generative AI applications presents unique challenges in model deployment, testing, and monitoring. Unlike traditional software applications, generative AI systems involve large model artifacts, complex dependencies, and specialized hardware requirements, making a sophisticated CI/CD pipeline essential for reliable delivery. As organizations embrace generative AI technologies, the need for specialized CI/CD solutions becomes critical.

AI Wearables: Why Startups Have the Advantage Over Big Tech

Big tech has the resources, but startups have the real advantage in AI wearables: speed, agility, and the freedom to take risks. Right now, the AI wearable market is in the wildcard phase—no dominant device, no set form factor, and no clear winner. That’s a massive opportunity for smaller teams that can move fast, test in the field, and refine in real time. Unlike big tech, startups don’t need a five-year roadmap. They can launch quickly, experiment aggressively, and pivot without worrying about shareholders.

Meta's Big Bet on AI Wearables

Meta is making a massive push into AI wearables, with at least six new devices launching in 2025. But here’s the catch—this wasn’t originally about AI. Meta built its hardware for the metaverse, only to find itself at the center of the AI revolution. With over 1 million Ray-Ban smart glasses already sold (and a goal of 5 million in 2025), it’s clear there’s demand. But can Meta actually scale this initiative from within, or will they lean on brand partnerships like Oakley to expand?

How Finance Teams Are Using AI To Drive Profitability

It’s getting increasingly difficult to both be a conscious human being with an internet connection and to be unaware of AI. From Jamie Dimon’s bullish stance to Elon Musk’s dire predictions to the art world’s raging debate (and uncanny experiments) over whether it can ever be used ethically, AI has an iron grip on our collective imagination, and businesses are scrambling to outspend each other on the way to making it drive sustainable profit.

Finding Root Cause Quickly with Logz.io AI Agent

In the video, Jade Lassery discusses how to effectively manage complex environments, especially when faced with unexpected spikes in errors. She introduces a Logz.io AI agent prompt that assists users in quickly identifying the root cause of these issues. By simply asking the right questions, users can streamline their troubleshooting process and enhance their operational efficiency.

What Developers Can Learn from EdTech's AI Revolution to Transform User Experience

EdTech platforms are changing. Thanks to artificial intelligence (AI), the learning landscape is experiencing massive innovations. Personalization and gamification are among the most apparent changes. Learners can now also receive real-time and accurate feedback. In turn, this leads to more dynamic and engaging experiences that redefine education and set a new standard for user-centric design. Developers across industries can learn from these advancements.

How are your staff using AI? Why you need a company AI policy

As generative AI tools like ChatGPT and Gemini continue to revolutionize the way we work, offering benefits such as increased efficiency and productivity, their adoption has seen a significant surge in workplaces throughout 2024, with 75% of employees globally reporting that they used AI tools at work, according to Gartner.

How to leverage AI to enhance network monitoring in retail: A CXO's guide

The retail industry has evolved into a mix of physical stores, e-commerce, digital payments, and omnichannel interactions. Now, GenAI has been added to this mix, which changes how people shop, how retailers operate, and how employees work. While this shift creates opportunities for retailers of all sizes, it also presents serious challenges in maintaining network performance and staying compliant with industry regulations.

Five Key Benefits of Creative Fabrica's AI Font Generator: Transform Your Typography

The field of digital creativity is dynamic and innovative. New ideas, trends, and digital tools emerge daily, rewriting the market rules and posing new challenges for creators. Graphic designers, marketers, crafters, and artists have to adapt quickly. Often, the capacity to create custom visuals and fancy fonts can significantly enhance their projects and boost their competitive advantage. This is where Creative Fabrica's Font Generator steps in. This font style generator is an advanced font editor powered by artificial intelligence (AI), and it offers numerous advantages over conventional font design methods.

Maximizing B2B Sales Efficiency with AI-Powered Outreach

Sales outreach will become easier, smarter, and more scalable through greater use of artificial intelligence. Traditionally, B2B sales outreach has been a manual process of cold calling, sending follow-up emails, and even qualifying who within a company is the right person to pitch to. In an analog, non-AI world, reaching out takes time and effort. With AI handling outreach, it doesn't have to be like that anymore. Automation changes the way people engage, creating effective engagement systems that target precise interests and audiences.

AI & Gartner's Strategic Roadmap Timeline for Cybersecurity - A Perspective from Teneo

The integration of artificial intelligence (AI) presents both unprecedented opportunities and emerging threats. Gartner’s Strategic Roadmap for Cybersecurity Leadership emphasizes the need for adaptive strategies that align with business objectives and technological advancements. Concurrently, the UK’s National Cyber Security Centre (NCSC) has highlighted the dual-edged nature of AI in its report on the impact of AI on cyber threats.

AI in Production with GitHub's Sean Goedecke

In this episode, we sit down with Sean Goedecke, Staff Software Engineer at GitHub, to discuss where LLMs fit into real-world development. Sean shares how he’s using LLMs and where he’s drawing the line for AI assistance in the codebases he manages, though, as he says, this might all change by next summer. Sean also weighs in on how LLMs could assist SREs during outages, especially when you’re only half-awake at 3 a.m. after a rather inconvenient page.

Can AI Help with Writing IT Blog Posts?

Blogging is essential for IT professionals to share knowledge, explain complex topics, and establish thought leadership. However, writing clear, engaging, and technically accurate blog posts can be challenging, especially when dealing with highly specialized subjects. Many IT experts are skilled at problem-solving and coding but may struggle to translate their insights into easy-to-read content. AI-based writing tools can help here: IT professionals can use them to create targeted content that is more engaging for audiences.

How Caching Improves Software Performance: A Developer's Guide

Every millisecond counts in modern software development. Whether you're optimizing a high-traffic web application, a complex SaaS platform, or a resource-intensive AI system, caching is one of the most powerful yet often overlooked techniques to dramatically boost performance. This guide will take developers beyond the basics, diving into real-world strategies, pitfalls, and best practices for implementing caching effectively.
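
As a rough illustration of the basic idea, the sketch below memoizes a slow lookup with Python's standard `functools.lru_cache`. The `expensive_lookup` function and the `"user:42"` key are hypothetical stand-ins for a database query, remote API call, or model inference; they are not taken from the guide itself.

```python
from functools import lru_cache

call_count = 0  # counts how often the underlying work really runs


@lru_cache(maxsize=128)
def expensive_lookup(key: str) -> str:
    """Hypothetical stand-in for a slow call (database, remote API, model)."""
    global call_count
    call_count += 1
    return key.upper()


first = expensive_lookup("user:42")   # cache miss: the function body runs
second = expensive_lookup("user:42")  # cache hit: the stored result is returned
info = expensive_lookup.cache_info()  # hits, misses, maxsize, currsize
```

Here the second call never reaches the function body, which is the whole point of a cache. A bounded `maxsize` keeps memory use predictable, at the price of evicting the least recently used entries.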

Fine-tuning a Pre-trained GenAI Model - A Complete Guide

Ever been on the receiving end of a useless chatbot response? Imagine asking, “Why is my 5G down in this area?” and getting, “Try restarting your device.” No context. No understanding. No real help. The issue here, however, isn’t bad intent—it’s that the bot doesn’t understand telecom-specific language or service outage patterns. Instead of easing your support load, it’s driving frustrated customers to jam your support lines.

7 considerations when building your ML architecture

As the number of organizations moving their ML projects to production is growing, the need to build reliable, scalable architecture has become a more pressing concern. According to BCG (Boston Consulting Group), only 6% of organizations are investing in upskilling their workforce in AI skills. For any organization seeking to reach AI maturity, this skills gap is likely to cause disruption.

Aiven AI Insights - Ongoing Performance Actionable Insights

The Aiven Platform is more than a collection of open source services for streaming, storing and analyzing data. The platform ensures that all services run reliably and securely in the clouds of your choice, are observable, and can easily be integrated with each other and with external 3rd party tools.

Introducing relaxAI: The smart AI assistant you can trust

We’re excited to launch relaxAI, an AI assistant designed with one paramount focus: your privacy. In a world where AI tools are becoming indispensable but concerns about data usage are at an all-time high, relaxAI has been created as an assistant you can trust by combining cutting-edge AI capabilities with an unwavering commitment to security and transparency.

Digital Asset Management for Game Development Success

Creating a game is like building a massive puzzle; every piece (textures, models, audio files, and scripts) must fit perfectly. Managing these digital assets can quickly become overwhelming without the right tools and strategies. That's where digital asset management (DAM) steps in to save the day.

Building Production-Ready AI Infrastructure: How Megaport and Vultr Are Solving the Enterprise Challenge

In bridging traditional enterprise environments with modern GPU resources, we're helping organizations build AI infrastructure that's truly ready for production workloads. Co-authored by Duncan Ng, Vice President Solutions Engineering, Vultr. As enterprises move from AI experimentation to production deployment, most are realizing a fundamental truth: Successful AI adoption requires more than just access to GPU computing power.

The Advanced Data Compression Techniques That Quietly Power Logz.io's AI Observability Agents

As an observability leader, at Logz.io, we pride ourselves on continuous innovation. That’s why, last year, we released our AI agents to revolutionize observability by helping businesses, and their engineering and DevOps teams, automate data analysis and root cause analysis. The primary way in which engineering and DevOps teams interact with the agents is by asking performance, troubleshooting, and optimization-related questions.

AI in Embedded Systems: A Black Box You Must Learn To Control

AI isn’t predictable; it adapts, making embedded engineering even more complex. A model that works in the lab might fail in the real world. So, how do successful teams deploy AI at the edge? They A/B test models in the field, because controlled environments aren't enough, and they collect real-world performance data, where observability tools are key. AI deployment isn’t a one-and-done process. It requires constant iteration and real-world validation.

AI in 2025: is it an agentic year?

2024 was the GenAI year. With new and more performant LLMs and a higher number of projects rolled out to production, adoption of GenAI doubled compared to the previous year (source: Gartner). In the same report, organizations answered that they are using AI in more than one part of their business, with 65% of respondents mentioning they use GenAI in one function.

How Sony improved IT incident management with AI-powered context and event correlation

Ben Narramore, Director of Global Operations and Service Management at PlayStation, describes the impacts of adopting BigPanda AIOps on Sony’s operations, processes, and workflows. To learn more, watch the full webinar on How Sony expanded AIOps insights to Incident Management teams.

Revolutionizing IT Service Management: AI-Powered Transformation with Ivanti Neurons

In today's rapidly evolving digital landscape, IT service management (ITSM) is undergoing a radical transformation. Join our exclusive webinar to discover how Ivanti Neurons is leveraging artificial intelligence to redefine IT support, streamline operations, and drive unprecedented efficiency.

The Modern Data Center: How AI is Reshaping Infrastructure

The traditional data center is undergoing a dramatic transformation. As artificial intelligence reshapes industries from healthcare to financial services, it’s not just the applications that are changing—the very infrastructure powering these innovations requires a fundamental rethinking. Today’s data center bears little resemblance to the server rooms of the past.

The Impact of Technology on Educating the Next Generation of Programmers

Technology is reshaping how aspiring programmers learn, practice, and master coding skills. With rapid advancements in artificial intelligence, cloud computing, and interactive learning platforms, traditional methods of teaching programming are evolving. The shift is not just about access to information; it's about engagement, personalization, and efficiency.

AI-Driven Healthcare: How AIOps is Revolutionizing Medical IT Operations

The healthcare industry is undergoing a massive digital transformation, driven by artificial intelligence (AI) and automation. Among the most promising innovations is Artificial Intelligence for IT Operations (AIOps), which is reshaping how medical IT infrastructures are managed. From private practice billing services to laboratory billing services and patient management software, AIOps is bringing efficiency, accuracy, and cost savings to healthcare organizations.

Generative AI vs. Predictive AI: The Strategic Choice For Business Leaders

2022 was the "Model T" moment for generative AI, with ChatGPT making its grand entrance and shaking up digital transformation the way Ford changed mobility. The excitement and trust in AI are so real that VCs have funneled $3.9 billion into generative AI in Q3 2024 alone. 49% of companies are allocating new budgets to AI initiatives, and firms like Marsh McLennan have already deployed 40+ AI applications in production.

The AI Model Showdown - LLaMA 3.3-70B vs. Claude 3.5 Sonnet v2 vs. DeepSeek-R1/V3

Following all the hype and bluster around DeepSeek’s arrival in the AI landscape, and its ability to crash the share value of AI’s poster child (Nvidia) overnight, we wanted to conduct a rigorous evaluation at Komodor. We tested DeepSeek’s models head-to-head against industry leaders in solving real-world Kubernetes challenges.

Solve Problems Faster with New, Smarter AI and Integrations in Splunk Observability

As businesses scale across hybrid and multi-cloud environments and integrate AI-powered technologies, complexity grows — and with it, the risk of performance degradation and cost of downtime. To avoid facing customer-impacting IT issues, organizations need better ways to correlate data across environments, detect anomalies before they escalate, and resolve incidents more efficiently. That’s where Splunk and Cisco come in.

Jekyll and Hyde: Taming AI Security with Automation

AI offers a world of promise for security teams, including potential for advanced threat detection, automated response capabilities, and enhanced data analysis for cybersecurity. But the same technology that supports cybersecurity teams can also be weaponized by threat actors: a true “Good vs. Evil” or “Jekyll and Hyde” scenario.

Edge AI is a Game-Changer for Embedded Devices

AI at the edge is built for embedded systems. It doesn't need tons of compute power: most of the heavy lifting happens during training, so the models run efficiently on minimal hardware. With microcontrollers like STM32N6 optimizing for AI workloads, the potential is growing fast. Is AI at the edge part of your embedded strategy this year?

The Future of SEO: Predictions and Preparations

Search Engine Optimization, better known as SEO, has been part of digital marketing since 2000. As technology advances and the way users interact online changes, SEO changes as well. This matters particularly to businesses and marketers who want to improve their standing on the internet. An SEO agency has the industry knowledge to help update strategies so they match current algorithms and practices.

Using a transformer-based text embeddings model to reduce Sentry alerts by 40% and cut through noise

Sentry uses Issue Grouping to aggregate identical errors and prevent duplicate issues from being created, and duplicate alerts being sent. One of the chief complaints we’ve heard from our users is that in some cases the existing algorithm did not sufficiently group similar errors together, and Sentry would create separate issues and alerts, causing unnecessary disruption–or at least annoyance–to developers.
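
To make the idea of grouping similar (rather than only identical) errors concrete, here is a minimal sketch that groups items by the cosine similarity of their embedding vectors. It is not Sentry's algorithm: the greedy strategy, the `0.9` threshold, the function names, and the three-dimensional toy vectors are all assumptions chosen for illustration, and a real system would get its vectors from a text embeddings model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def group_errors(embeddings, threshold=0.9):
    """Greedy grouping: an error joins the first group whose first member
    is at least `threshold` similar to it; otherwise it starts a new group."""
    groups = []  # each group is a list of indices into `embeddings`
    for i, vec in enumerate(embeddings):
        for group in groups:
            if cosine_similarity(embeddings[group[0]], vec) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Toy vectors standing in for embeddings of three error messages.
toy_embeddings = [
    [1.0, 0.0, 0.1],  # error A
    [0.9, 0.1, 0.1],  # near-duplicate of error A
    [0.0, 1.0, 0.0],  # unrelated error
]
groups = group_errors(toy_embeddings)  # [[0, 1], [2]]
```

With these toy inputs the first two errors land in one group and the third stands alone, so only two issues (and two alerts) would be raised instead of three. The threshold controls the trade-off: set it too low and unrelated errors merge, too high and near-duplicates stay separate.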

How AI-powered anomaly detection is transforming APM for SREs

Site reliability engineers (SREs) often face challenges in keeping an organization’s sites running smoothly as the complexity of distributed systems steadily increases. With the rise of microservices, cloud-native architectures, and massive data volumes, manual monitoring and troubleshooting are no longer sustainable. SREs must navigate hurdles like alert fatigue, incident response delays, and the constant pressure to maintain system reliability.

How AI Can Misinterpret Data and Lead to Errors

While AI systems can analyze vast amounts of data quickly, they may also misinterpret that data and produce significant errors. Understanding how AI misjudgments occur helps improve algorithms and keep their results accurate. From biases in data to linguistic ambiguities, various factors can contribute to an AI's misinterpretation of information. Read on for a closer look at how these systems work and why you should address these issues.

How AI is Transforming the Way We Analyze Data

In 1956, when IBM's engineers unveiled the first hard disk drive, it stored only five megabytes, an amount dwarfed today by a single high-quality photo on your smartphone. But that wasn't the fascinating part; it was the vision. They anticipated a future where data would not only be stored but also analyzed on an unprecedented scale. Fast forward to the 21st century, and data is growing exponentially. Every second, trillions of bytes are created, tracked, and stored across the globe. But storing it isn't the challenge anymore; making sense of it is.

Unlocking Business Value Through Generative AI in Retail

Generative AI has now become an essential tool in reshaping retail as we know it. Major players, including Amazon, Walmart, Carrefour, and more, have integrated GenAI into their operations, unlocking unprecedented efficiency and personalization in their strategies. As AI continues to evolve, it’s no longer just about retail media and advertising—it’s revolutionizing every aspect of the retail landscape, from product development to customer service.

A Leader's Framework for Enterprise AI Platform Investment

Almost every business needs AI, but it’s not needed everywhere. Yes, you read it right. AI, though it transforms entire business models, comes with a price tag. A 2022 survey by McKinsey found that only 27% of companies using AI have successfully scaled their initiatives across the organization. This highlights a key challenge—adopting AI without a clear strategy can lead to wasted resources and minimal return on investment.

Software white labelling for businesses

White labelling is a business model increasingly used worldwide. An external company produces goods, and sellers offer them under their own brands. This pattern works quite well in the digital industry. White-label AI software, therefore, can boost profitability without months of custom development. In this article, we will look at some of the details.

The Rise of the AI Humanizer: Breathing Life into Robotic Text

The evolution of AI writing tools has been nothing short of transformative, reshaping how we approach content creation across industries, from crafting compelling marketing copy to automating complex code generation. Having followed this journey closely as a tech media writer, I've seen both the promise and the pitfalls of these technologies. One challenge, however, has remained stubbornly persistent: the unmistakably robotic tone that often betrays the origins of AI-generated text.