July 2021

Spring Boot - Monitor your application with OpenTelemetry & SigNoz

In this video, we show a step-by-step process to monitor your Spring Boot application with OpenTelemetry. We use SigNoz as the backend and visualisation UI. SigNoz is an open source alternative to DataDog, NewRelic, etc. We natively support OpenTelemetry-based instrumentation. You can instrument any application written in a language/framework supported by OpenTelemetry and visualise metrics and traces in SigNoz.
How You Can Make Your Database More Efficient

Data is the lifeblood of your business, critical to its survival and success. It delivers insights into customers' specific needs, helping you better understand them and deliver a more tailored user experience. With data playing such a key role in whether modern businesses sink or swim, it's vitally important to optimize your database to ensure data is insightful, relevant, and actionable, providing the end user with the best possible experience.

Monitor AWS FSx audit logs with Datadog

Amazon FSx for Windows File Server is a fully managed file storage service built on Windows Server. Migrating on-premise Windows file systems to a managed service like FSx enables organizations to reduce operational overhead and take advantage of the flexibility and scalability of the cloud. But having visibility into file access activity across their environment is key for security and compliance requirements, particularly in sectors such as financial services and healthcare.

What Is Network Latency: Complete Guide on How to Check, Measure and Reduce It to Improve Performance

So you finally launched your service worldwide, great! The next thing you’ll see is thousands and thousands of people flooding into your amazing website from all corners of the world expecting to have the same experience regardless of their location. Here is where things get tricky. Having an infrastructure that will support the expansion of your service across the globe without sacrificing user experience is going to be really tough, as distance will introduce latency.

Monitoring serverless applications with AWS CloudWatch alarms

Running any application in production assumes reliable monitoring to be in place and serverless applications are no exception. As modern cloud applications get more and more distributed and complex, the challenge of monitoring availability, performance, and cost gets increasingly difficult. Unfortunately, there isn’t much offered right out of the box from cloud providers.

What is Incident Management in IT and Why does it matter?

Incident management is the process of identifying and resolving problems that occur in IT services. Incident Management is also used as a metric to measure the health of the IT Service Desk. Let’s discuss what incident management is, why it matters to your business, and how you can apply it to your organization.

Introducing Redgate's flexible-hybrid working model

The world of work has changed dramatically over the last year and, at Redgate, we’ve been reflecting on what this might mean for our business, and the ways that we work together in the future. From 2022 onwards, we’re aiming to offer Redgate staff a flexible-hybrid working model across our global organization.

Chapter Nine: In Which Dinesh Experiments with Chaos Engineering

Another day, another drama! This one, though, is very much of my own making. I have been wanting to try my hand at a bit of chaos engineering for some time now but C&Js just hasn’t been ready. Sarah’s been up for it too, though, at Animapanions. And now that our CIO, Charlie, has seen MTTR drop across every single technology team, thanks to the rollout of Moogsoft and the new incident management system (kudos to James), it’s pilot day.

Why MSPs Need End-to-End Visibility

The “shared responsibility” model of the cloud puts most of the control with Microsoft, despite the MSP being responsible. As shown below, Microsoft puts very little responsibility into the hands of the customer (or, in your case, the MSP), which is why most MSPs stick to the basic onboarding-related tasks as their Microsoft 365 offering. But the customer’s reliance on Microsoft 365 has changed in the last 14 months… and so have their expectations.

How to Maximize the Performance of Your Kubernetes Deployment

With Kubernetes emerging as a strong choice for container orchestration for many organizations, monitoring in Kubernetes environments is essential to application performance. The impact of poor application or infrastructure performance in the era of cloud computing and as-a-service delivery models is more significant than ever. How many of us today have more than two rideshare apps or more than three food delivery apps?

How To Create A Cloud Center Of Excellence (CCOE)

Establishing a Cloud Center of Excellence (CCOE) is an important milestone in every company’s cloud computing journey. While it is usually not the first milestone — which is focused on delivering value to customers — this milestone is reached as the company grows and scales. Turning ad-hoc cloud initiatives into a company-wide strategy, the CCOE helps the business to manage the cloud process and adopt best practices.

Bring Xray Out of the Box with Dependency and Binary Scanning

Shifting left security means you, the developer, catching and fixing vulnerabilities and license violations early in the SDLC. That’s why Xray scans binaries pushed to Artifactory by your builds, and alerts you when there are issues with your dependencies. But catching them earlier, even before checking in code, can be important for developers shifting left.

SQL Made Simple for vSphere-powered Data Centers

Today we’re announcing a new feature of VMware Tanzu SQL called Data Management for VMware Tanzu. It offers a convenient user interface that simplifies the operation, automation, and scalability of Tanzu SQL databases (Postgres and MySQL); this same convenience is also available from a comprehensive set of APIs.

Splunk Machine Learning Toolkit Overview

You no longer have to be a data scientist to bring intelligence to your Splunk data. The Machine Learning Toolkit (MLTK), available for free on Splunkbase, is a purpose-built tool that extends Splunk Processing Language (SPL) with machine learning algorithms, new commands, and powerful visualizations. This video provides a high-level overview of MLTK and previews the use cases that it supports.

Splunk Mobile - Overview (in 60s)

Splunk Mobile enables you to unlock value from your data anywhere at any time. Regardless of your role or level of technical expertise, you can use Splunk Mobile to view dashboards and take action from your mobile device. Whether you’re a C-suite executive looking for a report, a NOC manager investigating an issue, or a SOC analyst uncovering an anomaly, getting answers has never been more convenient with the power of Splunk in the palm of your hands. Splunk Mobile is made for all organizations and roles, including yours.

Splunk Mobile - Backend Summary (in 60s)

Get to know the Secure Gateway Splunk app, which allows you to deploy and manage your fleet of mobile devices at scale. Plus, take a peek behind the scenes to learn how Splunk Secure Gateway facilitates communication between mobile devices and Splunk platform instances using an end-to-end encrypted cloud service called Spacebridge. Finally, get the latest on Spacebridge compliance and data privacy, since Spacebridge has now been certified to meet SOC2, Type 2 and ISO 27001 standards and is HIPAA and PCI-DSS compliant.

Splunk Cloud Monitoring Console on Mobile (in 60s)

The Cloud Monitoring Console (CMC) lets Splunk Cloud administrators view information about the status and performance of their Splunk Cloud deployment at a glance. On Splunk Mobile, you can access many of the same CMC dashboards as on Splunk Web. Whether you’re interested in your users, indexes, searches, or ingest volume, you can access this data on the go or from the comfort of your own couch.

Splunk On-Call prevents and cuts downtime episode length by half

Your Answer: Escalate the right alerts to the right on-call people for fast collaboration and issue resolution with Splunk On-Call. Reduce burn-out and make on-call suck less with a complete ChatOps experience that's integrated with your IT stack and incident reporting.

CircleCI Server 3.1 Demo | Server Metrics, Backup & Restore plus Runners

Learn how to use server metrics, backup and restore, and CircleCI runners on server 3.1. The latest version of server is designed to meet the strictest security, compliance, and regulatory restraints. This self-hosted solution offers the ability to scale under load and run multiple services at once, all within a team's Kubernetes cluster and network with the full CircleCI cloud experience.

How to sell your manager on CI/CD

Continuous integration seems like a smart choice, right? Why would anyone think that integrating your code into the product as soon as possible is a bad idea? Let me take you back to August 2000, when a fresh-faced young engineer was starting her first engineering role. She was given a desk, a computer, and a detailed project plan that included a release date three months in the future.

Continuous integration for Rust applications

Rust is a powerful language built on the promise of performance and reliability. With no runtime or garbage collector, it easily runs in any environment and can be integrated into any existing language or framework. With the advent of WebAssembly, Rust has become even more valued in the web development space. Rust’s seamless peering with Node.js to build highly performant functionalities has made it a delight for web developers.

Automate your releases with CircleCI and the GitHub CLI orb

Last year, GitHub announced the release of their new CLI tool. The new gh CLI wraps around the standard git CLI and offers a suite of additional GitHub.com specific commands. These new commands include the ability to create a new pull request and to create a release directly from your terminal. We here on the CircleCI Community and Partner Engineering team at CircleCI use the gh pr checkout command all the time to safely test pull requests from the community (you!) on our various orbs.

DX NetOps Network Monitoring Software Helps Reduce Noise and Increase Efficiency with New Event Filters

DX NetOps 21.2 network monitoring software continues to innovate and improve the scale, speed, and simplicity of network operations with a focused set of high-value features and capabilities. The latest release of Broadcom’s DX NetOps 21.2 network monitoring software delivers expanded capabilities to monitor and assure your software-defined networking (SDN), and network functions virtualization (NFV) deployments.

News Roundup, July 30, 2021: What's Happening in AIOps, ITOps, and IT Monitoring

Happy System Administrator Appreciation Day! This special day was created in the 1990s to acknowledge and celebrate the tech gurus who assure that our computers, printers, and servers are working and in good condition. So, after thanking your hard-working system admin team, check out the latest in AIOps, ITOps, and IT infrastructure monitoring.

What Is an Intrusion Detection System (IDS)?

More personal and proprietary data is available online than ever before—and many malicious actors want to get ahold of this valuable information. Using an intrusion detection system (IDS) is essential to the protection of your network and on-premises devices. Intrusion detection systems are designed to identify suspicious and malicious activity through network traffic, and an intrusion detection system (IDS) enables you to discover whether your network is being attacked.

Announcing our $55M Series C Round Funding to further our storage-less data vision

It’s been an exciting year here at Coralogix. We welcomed our 2,000th customer (more than doubling our customer base) and almost tripled our revenue. We also announced our Series B Funding and started to scale our R&D teams and go-to-market strategy. Most exciting, though, was last September when we launched Streamaⓒ – our stateful streaming analytics pipeline. And the excitement continues!

Unlocking New Possibilities with CloudHedge and IBM Edge Computing

Edge computing is gaining huge momentum lately, and with the onset of 5G, the opportunities are endless. Moreover, it brings computation and data storage closer to where the data is generated, which enables better control, reduces costs, provides faster, actionable insights, and supports continuous operations. In fact, by 2025, 75% of enterprise data will be processed at the edge, compared to only 10% today, as predicted by Gartner.

Webinar: How to Survive in The Ever-Changing IT World

Today's IT world is changing extremely rapidly in terms of technologies used, hardware and software lifecycle management, trends and hypes. All companies within the industry strive to keep pace with these changes, ensuring their software uses up-to-date technologies and finding the best talent experienced with modern technologies. You need to adapt pretty fast so that you can survive.

GitOps with Argo and Crossplane - Civo Online Meetup #10

Join Viktor Farcic and Anais Urlichs in this meetup as we explore Crossplane through the Civo Crossplane Provider. We will showcase how to create Civo Kubernetes clusters through the Civo Crossplane Provider, look at GitOps best practices to manage all of your resources in Kubernetes, and lastly provide an overview of how you can take GitOps deployments to the next level with ArgoCD.

Troubleshooting Elasticsearch ILM: Common issues and fixes

Hiya! Our Elasticsearch team is continually improving our Index Lifecycle Management (ILM) feature. When I first joined Elastic Support, I quickly got up to speed via our Automate rollover with ILM tutorial. I noticed after helping multiple users set up ILM that escalations mainly emerge from a handful of configuration issues. In the following sections, I’d like to cover frequent tickets, diagnostic flow, and common error recoveries. All commands shown can be run via Kibana’s Dev Tools.

Detecting unusual network activity with Elastic Security and machine learning

As we’ve shown in a previous blog, search-based detection rules and Elastic’s machine learning-based anomaly detection can be a powerful way to identify rare and unusual activity in cloud API logs. Now, as of Elastic Security 7.13, we’ve introduced a new set of unsupervised machine learning jobs for network data, and accompanying alert rules, several of which look for geographic anomalies.

MariaDB-as-a-Service in Jelastic Cloud Platform

Jelastic MariaDB-as-a-Service is a result of our many years of experience with MariaDB hosting and analysis of the best practices on the platform. It is an automation that does all the "hard" work behind the scenes, providing you with a ready-to-work solution in a matter of minutes. MariaDB-as-a-Service powered by Jelastic PaaS offers numerous benefits, among them.

Most frequently asked questions surrounding Google's Cloud Operations Sandbox

Cloud Operations Sandbox serves as a simulation tool for budding SREs to learn the best practices from Google and apply them to real cloud services. In this blog, we have compiled a list of FAQs surrounding the use of Google's Cloud Operations Sandbox. The Google SRE sandbox provides an easy way to get started with the core skills you need to become an SRE.

The Top 21 Grafana Dashboards & Visualisations

In our guide on the best Grafana dashboard examples, we wanted to show you some of the best ways you can use Grafana for a variety of different use cases across your organisation. Whether you are a software architect or a lead DevOps engineer, Grafana is used to make analysis and data visualisation far easier to conduct for busy engineering and technical teams throughout the world.

Why Cloud-Native SIEM?

The SIEM is a central point where data is collected and correlated, and as we move to consume more cloud services and data sets the SIEM itself must also change in architecture. Architecture change is hard to make for existing products. Calling a product a ‘cloud solution’ is not the same as taking an on-premises product and hosting it for customers. It means building a new SIEM for a new world. There are a lot of reasons users seek new SIEMs.

Kubernetes 1.22 - What's new?

This release brings 56 enhancements, an increase from 50 in Kubernetes 1.21 and 43 in Kubernetes 1.20. Of those 56 enhancements, 13 are graduating to Stable, a whopping 24 are existing features that keep improving, and 16 are completely new. It’s great to see so many new features focusing on security, like the replacement for the Pod Security Policies, a rootless mode, and enabling Seccomp by default. Also, watch out for all the deprecations and removals in this version!

Queryless vs. Query-less. Faster Insights and Better Observer Experience with Span Analytics

In one of my previous blogs I explained how important it is for a modern observability platform to provide “the observers” full, flexible access to all raw telemetry. Observability’s promise to find unknown unknowns relies directly on the ability to perform fast, powerful, and multidimensional high-cardinality analysis of raw data, to uncover previously unknown patterns that have not yet been visualized as a metric, dashboard panel or an alert or anomaly event.

Logging and Monitoring: A Match Made in Software Heaven

All code and no logging makes your application a black box system. Similarly, all logging and no monitoring makes analyzing performance complicated and inconvenient. The goal is to achieve better visibility into the operations of your application, its status, performance, and overall health. Making this information easily accessible presents more context about the critical incidents and surfaces actionable insights for optimizing performance.

How we're working with the Elastic team to make the Elasticsearch data source for Grafana even more powerful

Back in March, we announced that Grafana Labs was partnering with Elastic to build an official Elasticsearch plugin for Grafana. As our CEO Raj Dutt wrote at the time, our “big tent” philosophy “means that we want to support data sources that our users are passionate about. Elasticsearch is one of the most popular data platforms that can be visualized in Grafana.”

How to Use Cargo Repositories in Artifactory

For five years running, Rust has taken the top spot in Stack Overflow’s survey of most loved programming languages. Seen by many as the next step after C/C++, the language is fast becoming embraced by embedded device developers and as a robust system for IoT. At JFrog, we took notice and are eager to welcome Rust developers to the empowerment of robust binaries management and how it contributes to continuous integration.

JFrog detects malicious PyPI packages stealing credit cards and injecting code

Software package repositories are becoming a popular target for supply chain attacks. Recently, there has been news about malware attacks on popular repositories like npm, PyPI, and RubyGems. Developers are blindly trusting repositories and installing packages from these sources, assuming they are secure.

Announcing the GA of the LogDNA Configuration API and LogDNA Terraform Provider

We’re excited to announce that our Configuration API and Terraform Provider are now generally available for all LogDNA customers. We received tremendous feedback from our public beta release and, based on that feedback, we are enabling several new features with the GA release that allow for more programmatic workflows with LogDNA. First, we are enabling Preset Alerts as a new resource that can be configured with the configuration API as well as within Terraform.

Tale of the Beagle (Or It Doesn't Scale-Except When It Does)

If there’s one thing folks working in internet services love saying, it’s: "Yeah, sure, but that won’t scale." It’s an easy complaint to make, but in this post, we’ll walk through building a service using an approach that doesn’t scale in order to learn more about the problem. (And, in the process, discover that it actually did scale much longer than one would expect.)

"Frodo, We Aren't in the Shire Anymore": The Importance of a Customer Journey & How to Avoid Wrecking It

Fans of Lord of the Rings — otherwise known as “Ringers” — never grow weary of reading or watching Frodo and his fellow Hobbits journey through Middle Earth on an epic quest to Mordor (where rumor has it there now exists a very stylish Starbucks at the base of Mount Doom). Well, customers who visit a website are on an important journey as well.

How Log Analytics Powers Cloud Operations, Part II: Use Cases

Cloud computing shapes the ability of enterprises to transform themselves and compete in the 2020s. By renting elastic cloud resources, enterprises can support new customer platforms, distributed workforces, and back-office operations. The cross-functional discipline of CloudOps helps enterprises realize the promise of cloud computing by optimizing applications and infrastructure on cloud platforms.

5 Most Common API Errors and How to Fix Them

As software grows more complex, more and more software projects rely on API integrations to run. Some of the most common API use cases involve pulling in external data that’s crucial to the function of your application. This includes weather data, financial data, or even syncing with another service your customer wants to share data with. However, the risk with API development lies in the interaction with code you didn’t write—and usually cannot see—that needs debugging.

7 paw-some traits you didn't know about Freddy AI

Watch this video and find out the seven ways in which Freddy AI helps your IT agents work smarter and faster with intelligent recommendations. Does your organization have someone like Freddy AI to empower your support team and maximize their productivity? No? Then allow us to tell you a little something about our enterprise-grade AI engine, Freddy AI! Empower your agents with intelligent recommendations, and free up their valuable time.

How to start and grow a system administrator career

July 30 is System Administrator Appreciation Day. We honor all 33,000 ServiceNow system administrators who help make the world of work better for people. From overseeing instance security updates and critical configurations to proactively maintaining instance health, you are critical to the success of ServiceNow projects. Thank you. To show our appreciation, we want to ensure you know how to prepare for every stage of the system administrator journey.

A Sneak Peek at the "Calico Certified Operator: AWS Expert" Course

Recently, we released our new “Calico Certified Operator: AWS Expert” course. You can read more about why we created this course and how it can benefit your organization in the introductory blog post. This blog post is different; it’s an opportunity for you, the potential learner, to get a glimpse of just a few interesting parts of the course. You won’t learn all the answers here, but you’ll learn some of the questions!

Rollbar Tip of the Day: Linking to AWS CloudWatch logs from Rollbar

Learn how to link to log data in AWS CloudWatch from Rollbar to help you quickly understand the root cause of an error. Rollbar is the leading continuous code improvement platform that proactively discovers, predicts, and remediates errors with real-time AI-assisted workflows. With Rollbar, developers continually improve their code and constantly innovate rather than spending time monitoring, investigating, and debugging.

Dun & Bradstreet Reduces Mean Time to Resolution with xMatters

How does a business continue to improve its incident management processes, when it’s already using some of the best tools on the market? Join Nick Romanelli, Site Reliability Engineering Lead at Dun & Bradstreet, and Zoe Na, Customer Success Manager at xMatters, as they discuss how Dun & Bradstreet has been able to use xMatters to reduce MTTR and streamline major incident management. With their innovative use of Flow Designer, Dun & Bradstreet have created unique workflows that you’re going to want to know about!

Annual SolarWinds Study Reveals Opportunities for Business and IT Collaboration in Managing Enterprise Risk Driven by Internal and External Security Threats

SolarWinds IT Trends Report 2021: Building a Secure Future examines how technology professionals perceive the evolving state of risk in today's business environment following the internal impact of COVID-19 IT policies and exposure to external breaches. SolarWinds introduces the Secure by Design program as a guide for an industry-wide approach to help prevent future cyberattacks.

Boosting performance with network monitoring solutions

Technological advances and emerging networking concepts are constantly shaping our IT infrastructure. Networks are no longer limited by traditional networking constraints such as their static nature, but are continually evolving to improve efficiency by spanning wired, wireless, virtual, and hybrid IT environments. This IT evolution drives organizations to advance digitally and support computational requirements to meet their business objectives.

Use Datadog Session Replay to view real-time user journeys

When developing large, customer-facing applications, it’s paramount to have visibility into real user behavior in order to optimize your UX. Without a direct view into what users are actually doing when navigating your app, it can be difficult to reproduce bugs and understand how aspects of your frontend design are causing user frustration and churn. With Datadog RUM’s Session Replay feature, currently available in beta, you can watch individual user sessions using a video-like interface.

NiCE Log File Monitor Management Pack 2.0 for Microsoft SCOM

The NiCE Log File Monitor Management Pack 2.0 is a FREE solution supporting the SCOM Community in next-level log file analysis. It helps IT performance and security data analysts identify errors causing transactions and queries to take too long or not run at all. Software-related bugs, security issues, or erroneous configurations that impact website or application performance are figured out quickly by employing improved templates for alert rules, performance rules, or monitors.

AI in Predictive Maintenance and Forecasting

Industry 4.0 is taking every industry by storm, with unprecedented advancement driven by innovative technology solutions. Its key technologies, such as automation, AI, ML, data analytics, and IoT, enable industries to drive business operations with automated data-driven intelligence. Integrating the physical and digital systems, the manufacturing industry is increasingly adopting intelligent manufacturing in this Industry 4.0 era.

How Cox Automotive's IT Operations Team Relies On Monitoring To Help Bring 27 Company Brands and Over 700 Applications Under One Roof

Cox Automotive is a global company with over 40,000 auto dealer clients across five continents. The company, which houses Kelley Blue Book, Autotrader, and 25 other brands, was built through acquisitions. Its IT Operations team is tasked with bringing them together under the Cox Automotive umbrella and ensuring “a good, consistent experience” for its customers worldwide.

How Redox quickly identify and resolve database performance issues

In the IT team at Redox, I wear two hats: Software Development Manager and DBA. I’m the only DBA in the team, so if anything goes wrong it’s my job to identify and fix it. As you might imagine, this can be challenging when being a DBA isn’t your full-time role. Based in Sydney, Redox is one of the leading chemical and ingredients distributors in the world, with over 350 employees across our locations in Australia, New Zealand, Malaysia, and the US.

Releasing Icinga Web v2.9.2

Today we’re announcing the general availability of Icinga Web v2.7.6, v2.8.4 and v2.9.2. All are standard bugfix releases and include fixes found by the community since the latest releases. You can find all issues related to this release on our Roadmap. Please make sure to also check the respective upgrading section in the documentation. This release is accompanied by the minor releases v2.7.6 and v2.8.4 which include the fix for the flattened custom variables.

Which Sources of Metrics Should You Use for Optimization?

Welcome back to my series about metrics for optimization. In the last blog, I discussed the meaning of optimization, determining what you are trying to optimize, and learning how to discover if your enterprise’s optimization goal is a business or technology goal. If you did not get the chance to read my last blog, I suggest you do so as you need to understand what optimization is to progress to our next topic, which is the sources of metrics you should use.

Logz.io Delivers Cloud Native Monitoring to the Azure Marketplace

Logz.io is proud to launch a new partnership with Microsoft that enables Azure customers to directly integrate with Logz.io’s platform from within the Azure Console. This integration importantly allows Azure developers to begin monitoring their workloads faster than ever before, using the open-source technologies that their teams love. Check out this video for a demonstration of how it works.

Three Key Takeaways from The State of Digital Operations Report 2021

2020 heralded a year of increased complexity and customer demands, which isn’t going away. In this new normal, organizations will still be tasked with keeping up this break-neck pace. So, what did digital operations look like in 2020 compared to 2019?

What Is AWS Monitoring? 8 Best Practices To Get You Started

Migrating to the cloud provides cost, scalability, performance, maintenance, and other engineering and IT benefits. Today, Amazon Web Services (AWS) stands out as the most popular cloud platform, offering an advanced public cloud with robust services that are easy to integrate with your existing workflows. While AWS strives to keep its tools simple to use, many users still require deep expertise to get their AWS environment set up properly and running smoothly.

Integrating Logz.io with Azure

Azure users can now deploy the Logz.io platform directly from the Azure Console with the click of a button. The seamless integration between Azure and Logz.io delivers visibility and monitoring for enterprise organizations developing applications on Azure, providing the specific information needed to streamline code development and achieve business agility.

Monitoring Kubernetes the Elastic way using Filebeat and Metricbeat

In my previous blog post, I demonstrated how to use Prometheus and Fluentd with the Elastic Stack to monitor Kubernetes. That’s a good option if you’re already using those open source-based monitoring tools in your organization. But, if you’re new to Kubernetes monitoring, or want to take full advantage of Elastic Observability, there is an easier and more comprehensive way. In this blog, we will explore how to monitor Kubernetes the Elastic way: using Filebeat and Metricbeat.

Grafana Labs joins the CNCF Governing Board as a Platinum member of the open source foundation

At Grafana Labs, we are proud to be one of the largest code contributors to Cloud Native Computing Foundation projects. We are currently the leading company contributor to Prometheus, and also make substantial contributions to Cortex, Thanos, Jaeger, and OpenTelemetry. Our own open source projects — Grafana, Grafana Loki, and Grafana Tempo — have also become fundamental parts of the cloud native ecosystem.

Introducing the New Rollbar Integration for GitHub Enterprise Server

We’re excited to launch our new integration with GitHub that supports GitHub Enterprise Server customers. This allows companies using GitHub Enterprise on their own domains to access key features in Rollbar that help developers fix errors faster. GitHub Enterprise offers a fully integrated development platform for organizations to accelerate software innovation and secure delivery. With Rollbar, GitHub Enterprise Server customers can now access.

Hear From Product Automation & AIOps Lightning Talk

Learn about what's new with PagerDuty Runbook Automation & AIOps from the Summit 2021 Launch. Our Product team shares how you can benefit from our latest updates and enhancements and enjoy demos that were recorded live from Summit 2021 featuring PagerDuty Runbook Actions, Customer Change Event Transformer, Change Correlation, and Outlier Incident.

Hear From Product Incident Response Lightning Talk

Learn about what's new with PagerDuty Incident Response from the Summit 2021 Launch. Our Product team shares how you can benefit from our latest updates and enhancements and enjoy demos that were recorded live from Summit 2021 featuring PagerDuty Incident Context in MS Teams, Slack Insights previews, Stakeholder Updates in ChatOps, Priority-based Business Service Subscription, Past Incidents on Mobile, Add Responder Notification Rules.

Quick Kubeflow Pipelines with KALE, ElasticSearch and Ceph

KALE allows you to annotate your Jupyter notebooks on Kubeflow and magically compile and run Kubeflow Pipelines. In this demo, Aymen Frikha from Canonical shows how to deploy and run Kubeflow alongside ElasticSearch and Ceph, and how to quickly run a pipeline directly from a Jupyter notebook, using KALE (Kubeflow Automated pipeLines Engine).

What is the MITRE ATT&CK Framework for Cloud? | 10 TTPs You should know of

In any case, by using the MITRE ATT&CK framework to model and implement your cloud IaaS security, you will have a head start on any compliance standard since it guides your cybersecurity and risk teams to follow the best security practices. As it does for all platforms and environments, MITRE came up with an IaaS Matrix to map the specific Tactics, Techniques, and Procedures (TTPs) that advanced threat actors could possibly use in their attacks on Cloud environments.

How to mitigate CVE-2021-33909 Sequoia with Falco - Linux filesystem privilege escalation vulnerability

CVE-2021-33909, named Sequoia, is a new privilege escalation vulnerability that affects Linux’s file system. It was disclosed in July 2021 but was introduced in 2014, and it affects many Linux distros, including Ubuntu (20.04, 20.10 and 21.04), Debian 11, Fedora 34 Workstation and some Red Hat products. The vulnerability is caused by an out-of-bounds write found in the Linux kernel’s seq_file in the filesystem layer.

How logging everything helps mitigate ransomware risks

Ransomware, the malicious code that attackers use to encrypt data or lock users out of their devices, has been rampant and is on the rise globally. The largest ransomware payout thus far in 2021 was made by an insurance company at $40 million. A more recent attack occurred in early July and was launched by a group called REvil. The immediate victim was a Florida company, Kaseya, that provides software to companies that manage technology for thousands of smaller firms.

JFrog and Vdoo: Better Together

JFrog customers will soon enjoy end-to-end, holistic security across their software lifecycle (from development to devices) as the technology of recently acquired Vdoo gets integrated into the JFrog DevOps Platform. That was the pledge made by JFrog and Vdoo leaders during their first joint webinar, in which they explained why JFrog acquired Vdoo, how the platform’s security and compliance capabilities will expand, and what the integration timeline is.

Does the Source of Your Application Metrics Matter?

Thanks for returning to our series Metrics for Optimization. Once again, please be sure to read our first two blog posts, What is Optimization? and Which Sources of Metrics Should You Use for Optimization? In my first post, I taught you what optimization means, how to figure out what you are trying to optimize, and if you are focused on business goals or technology goals when performing this optimization.

How to monitor Cassandra database clusters

Apache Cassandra is an open-source distributed NoSQL database management system that was released by Facebook almost 12 years ago. It’s designed to handle vast amounts of data, with high availability and no single point of failure. It is a wide-column store, meaning that it organizes related facts into columns. Columns are grouped into “column families.” The benefit is that you can manage data that just won’t fit on one computer.

The Quick and Easy Guide to Reformatting Code in IntelliJ

As a developer, you’re going to be making changes to a codebase. That’s why, as Harold Abelson put it, “Programs must be written for people to read.” If a codebase is not clearly formatted, debugging becomes more difficult than it should be. Though usually overlooked, little changes like reformatting and proper indentation of your code can obviously differentiate a professional developer’s code base from someone just learning.

3 steps to find new revenue opportunities from your customers' digital evolutions

John Pagliuca, CEO of N-able, has taken issue in the press multiple times with the term digital transformation, preferring the term digital evolution. I agree that evolution is a better term. Digital transformation implies a one-time event; digital evolution acknowledges the ongoing nature of these changes. In short, the market will continue to change. How you adapt dictates whether you come out far ahead or remain with the status quo.

Freshservice Service Management Benchmark Report 2021

The Freshservice Service Management Benchmark Report 2021 (FBR 2021) is a benchmark index for key performance indicators (KPIs) for IT Service Management. With anonymously aggregated data from over 47 million unique service desk tickets, the FBR 2021 draws insights across industry averages for agent productivity, service desk efficiency, and scalability of service management solutions.

Securing XML implementations across the web

In December 2020, we blogged about security issues in Go’s encoding/xml with critical impact on several Go-based SAML implementations. Coordinating the disclosure around those issues was no small feat; we spent months emailing the Go security team, reviewing code, testing and retesting exploits, coming up with workarounds, implementing a validation library, and finally reaching out to SAML library maintainers and 20 different companies downstream.

Malware alert: The RedXOR and Mamba attacks and how to defend against them

Picture this: It’s a normal day of working from home as usual since the COVID-19 outbreak. After that satisfying cup of coffee, you log in. But something is wrong. No matter how many times you click, your files don’t open. Your screen is frozen and refuses to budge. And then, you see one of the worst nightmares any IT admin can imagine: “Oops, your files have been encrypted. But don’t worry, we haven’t deleted them yet.

Test internal applications with Datadog's testing tunnel and private locations

As part of your monitoring and testing strategy, you may run tests on different types of applications that are not publicly available—from local versions of production-level websites to internal applications that directly support your employees. Testing each one requires leveraging tools that allow you to verify functionality across a wide range of devices, browsers, and workflows while maintaining a secure environment.

Monitor your CI pipelines and tests with Datadog CI Visibility

Datadog CI Visibility, now available in beta, provides critical visibility into your organization’s CI/CD workflows. CI Visibility complements Datadog’s turn-key CI provider integrations and the integration of synthetic tests in CI pipelines to give you deep insight into key pipeline metrics and help you identify issues with your builds and testing.

Proactive VPN Monitoring for the Hybrid Workforce

A VPN allows remote employees to create a secure traffic connection to the corporate network. These connections essentially tunnel from a computer or mobile device through a VPN server, often through the public Internet. VPN technology has been around since the mid-1990s, but its usage is now going mainstream due to Covid. As Covid accelerates, it brings new monitoring challenges for IT amid high VPN adoption.

Podcast: Break Things on Purpose | Paul Marsicovetere, Senior Cloud Infrastructure Engineer at Formidable

Break Things on Purpose is a podcast for all-things Chaos Engineering. In this episode of the Break Things on Purpose podcast, we speak with Paul Marsicovetere, Senior Cloud Infrastructure Engineer at Formidable.

How Converting to YAML Build Pipelines Can Help Engineering Teams Be More Efficient

Engineering teams can only be as efficient as the processes they employ during development. The need for increased efficiency is why software development has shifted from the “waterfall” approach to a more responsive, agile methodology. In an agile development environment, quality software can be delivered consistently to suit the ever-changing needs of stakeholders and end users.

7 Ways Your Status Page Can Save You

Having a Status Page is like having a dog. A dog alerts you to an incident; sudden noise, approaching neighbor, squirrel… A dog sounds the alarm on an intruder. A dog even alerts you to maintenance by barking at every handyman, garbage truck, and gardener within sight. As a dog fetches the same stick over and over, so does a status page fetch the attention of your users – especially during a live incident – with each browser refresh they wait for the status to change.

Reliability Matters. Blameless is Growing with Series B $30M Funding

When Blameless started in 2018, the team set out on a mission to help all engineers achieve reliability with less toil and risk. Three years in, that mission has become more important than ever. What has changed is the rate of SRE adoption, now the fastest growing team and practice inside engineering. This represents a clear recognition of the many upsides that an SRE practice brings with its combination of continuous learning, velocity, and resilience.

Meet Thundra Foresight: Your CI Observability Tool!

Over the past three years, we have served thousands of developers with our two major products, Thundra APM and Thundra Sidekick – and it still feels like we’re just getting started. We would like to thank all of our users and supporters who gave us the strength to build our one-of-a-kind products. And we are very excited to announce our latest innovation: Thundra Foresight!

Scout APM Announces Python Application Support for Error Monitoring Tool

Traditionally an APM tool, Scout has expanded its service offerings to now include error monitoring of Python web applications for more cohesive and actionable observability insights within a single platform. This new feature supports an overall better user experience by eliminating the need for multiple web-application monitoring services; Scout APM with Scout Error Monitoring offers performance and error insight and alerting within a single, integrated dashboard.

Zero Trust Network Access: Accelerating Zero Trust Maturity with nZTA

Covid made the hypothetical necessity of IT risk planning a reality. Many organizations responded to the immediate need for remote workforces by adding more VPN licenses. But while adding more VPN capacity solved the problem of resource access, it also led to network bottlenecks and application latencies.

5 Ways to Get Valuable Insight From Your AWS Bill

Did you know that Virtana Optimize’s Bill Analysis tool shows you not just the services currently monitored by Virtana but all services to deliver an overall view of your AWS cost? And if you’ve set up and configured consolidated billing to link multiple AWS accounts, you can include data from all those accounts in that view. You can even add multiple billing orgs to the same Virtana Optimize account.

Defending the Internet of Things from hackers and viruses

The 2010 Stuxnet malicious software attack on a uranium enrichment plant in Iran had all the twists and turns of a spy thriller. The plant was air gapped (not connected to the internet) so it couldn’t be targeted directly by an outsider. Instead, the attackers infected five of the plant’s partner organizations, hoping that an engineer from one of them would unknowingly introduce the malware to the network via a thumb drive.

Why automation is key if you're looking to scale your business

IT is not a technology cost, it’s a human resource cost! This is a fundamental concept MSPs need to keep in mind when they are looking at their businesses, but one many smaller MSPs tend to overlook. Think about it; for every business you’re supporting as an MSP, you’re doing so to ensure their IT infrastructure has stability of operation and is optimized to maximize staff productivity.

How to Monitor Microsoft 365 OneDrive Application

Exoprise supports Microsoft 365 OneDrive monitoring similar to SharePoint with OAuth credentials as well as full experience monitoring via a headless browser. The sensor emulates a real user signing into OneDrive to collect end-to-end performance metrics such as health score, server latency, login times, etc. across the infrastructure and measure optimal availability.

Server Performance Guide: Key Metrics and How to Optimize

Everybody hates it when they have to wait for an application to load—or when an application doesn’t load at all. And if this happens with your application, you’re not just losing business but also losing brand value. Most applications today are online. So servers play a crucial role in keeping applications up and running. Application performance is directly proportional to server performance. Hence, it’s very important to monitor and improve server performance.

Learn how to use the Jira, ServiceNow, GitHub, and GitLab plugins for Grafana for better visibility into software development

GitHub, GitLab, Jira, and ServiceNow are some of the most popular software development tools out there, and Grafana has powerful integrations with each of them. Join us for a live webinar on July 29 at 9:30 PT / 12:30 ET / 16:30 UTC for a demo of these data source plugins and best practices for creating a single pane of glass for viewing your software operations metrics. You can register here.

10 Best Server Performance Monitoring Tools & Software in 2021

Setting up and administering multiple servers for business and application purposes has become easier thanks to advancements in cloud technology. Today, enterprises are choosing to operate large numbers of servers both in the cloud and in their data centers to meet the ever-increasing demand. As a result of these changes, monitoring technologies have become crucial. In this post, we’ll explore the best server monitoring tools and software currently on the market.

What's New: Introducing Next-Gen ChatOps With PagerDuty and Slack

In this new world of digital everything, new application versions usually mean that you’re going to get bigger and better features, more capabilities, and an uplifted user experience, right? When I talk to customers, many can’t wait to upgrade the PagerDuty integrations that they depend on to test new features. If you’re a PagerDuty for Slack user, the next-generation version of our Slack integration will certainly be an exciting development.

Collecting and operationalizing threat data from the Mozi botnet

Detecting and preventing malicious activity such as botnet attacks is a critical area of focus for threat intel analysts, security operators, and threat hunters. Taking up the Mozi botnet as a case study, this blog post demonstrates how to use open source tools, analytical processes, and the Elastic Stack to perform analysis and enrichment of collected data irrespective of the campaign.

Instrumenting Our Frontend Test Suite (...and fixing what we found)

Here at Sentry, we like to dogfood our product as much as possible. Sometimes, it results in unusual applications of our product and sometimes these unusual applications pay off in a meaningful way. In this blog post, we’ll examine one such case where we use the Sentry JavaScript SDK to instrument Jest (which runs our frontend test suite) and how we addressed the issues that we found.

What's new in Sysdig - July 2021

Welcome to another monthly update on what’s new from Sysdig! Happy 4th of July to our American audience, and bonne Bastille to our French friends. It’s been heating up in the northern hemisphere, so we hope you’ve all been managing to stay cool and safe. Our team continues to work hard to bring great new features to all of our customers, automatically and for free! The big news this month is our intent to acquire Apolicy, which has everyone full of excitement.

Developer's Dilemma: When Is the Right Time to Invest in Log Management

Development cycles are complicated. If you’re on a development team, whether you’re building out a custom application, maintaining and iterating on a growing microservice, or breaking ground on a new platform for a startup, you have your hands full. Log management, though seldom celebrated outside hardcore DevOps and IT circles, is still a well-known instrument among seasoned developers. It is insight into the internal workings of your processes as they are used.

Observability at Microsoft: Blue Screen of Death to OpenTelemetry

Ted Young discusses OpenTelemetry at Microsoft with Reiley Yang. Reiley is a Principal Software Engineering Manager at Microsoft and a core contributor to OpenTelemetry. Lightstep’s observability platform is the easiest way for developers and SREs to monitor health and respond to changes in cloud-native applications. Powered by cutting-edge distributed tracing and a groundbreaking metrics database, and built by the team that launched observability at Google, Lightstep’s Change Intelligence provides actionable insights to help teams answer the question “What caused that change?”

What's New in Rundeck 3.4

In this session we will give a live walkthrough covering new capabilities released in Rundeck 3.4. Learn about security and compliance improvements we’ve made including the ability to organize secrets management by project — so now each Runbook can access a different set of passwords and keys for its access control list (ACL). We also have a new plug-in for Thycotic users to manage secrets. Rundeck 3.4 now allows for queueing of jobs when those jobs must be run serially. Finally, we’ll discuss our vision for the future of Rundeck, and our primary development themes for the next year.

OverOps training for new users

Chapters:
00:00 - Intro
01:00 - Agenda
02:21 - Tour of the Automated Root Cause (ARC) screen
10:17 - Search feature on ARC screen
11:11 - Log view
13:02 - Environment view
14:01 - How to create a Jira ticket from the ARC screen
14:32 - Hide button
15:03 - Resolve button and resurfaced issues
16:13 - Labels and notes
16:30 - 3rd-party utility codes
17:01 - Data dashboard
20:00 - Review of new and critical events in data dashboard

Splunk SOAR Feature Video: Custom Functions

Splunk SOAR’s custom functions allow shareable custom code across playbooks and the introduction of complex data objects into the playbook execution path. These aren’t just out-of-the-box playbooks, but out-of-the-box custom blocks that save you time and effort. This allows for centralized code management and version control of custom functions, providing the building blocks for scaling your automation, even to those without coding capabilities.

Splunk SOAR Feature Video: Contextual Action Launch

Splunk SOAR apps have a parameter for action inputs and outputs called "contains". These are used to enable contextual actions in the Splunk SOAR user interface. A common example is the contains type "ip". This is a powerful feature that the platform provides, as it allows the user to chain the output of one action as input to another.

Splunk SOAR Feature Video: Configure Third Party Tools

To get started in Splunk SOAR, you will need to configure an asset. Assets are the security and infrastructure assets that you integrate with the Splunk SOAR platform, like firewalls and endpoint products. Splunk SOAR connects to these assets through apps. Apps extend the platform by integrating third-party security products and tools.

Detecting SeriousSAM CVE-2021-36934 With Splunk

SeriousSAM, or CVE-2021-36934, is a privilege escalation vulnerability in which overly permissive Access Control Lists (ACLs) give low-privileged users read access to privileged system files, including the Security Accounts Manager (SAM) database. The SAM database stores users' encrypted passwords in a Windows system. According to the Microsoft advisory, this issue affects Windows 10 1809 and above as well as certain versions of Server 2019.

Adopting and maturing to service ownership with PagerDuty and Rundeck

Among the common goals of today’s engineering and operations teams is to adopt a culture of Service Ownership: “You build it, you own it.” As with many ancillary objectives to driving DevOps across an organization, this is easier said than done. Sometimes this is in small part due to the technology stack/architecture of a given company. But more often than not, this is because teams lack the human-to-technology mechanisms that allow for a culture of service ownership.

Adventures in operational automation

In this session, David Morse from Parsons will join Arturo Suarez Martin from Rundeck by PagerDuty to discuss his experiences using Rundeck to automate all-the-things. Learn new ideas for automation use cases. See how Runbook automation doesn’t just save staff time, but also improves quality of operations. Hear tips and tricks from experienced users of Rundeck for creating self-service operations, automating incident response, and supporting transformation to service ownership.

Getting Started with kapp

In this video, Tiffany Jernigan (twitter.com/tiffanyfayj) talks about kapp, a tool in the open source Carvel suite. kapp is a lightweight application-centric tool for deploying resources on Kubernetes. Being both explicit and application-centric, it provides an easier way to deploy and view all resources created together regardless of what namespace they’re in. Being dependency-aware, it is able to wait for resources to be created, updated, or deleted, and provides a live status on the progress of the actions. Continue on to see how to get started with kapp.

Tanzu Tuesdays 62 - Monitoring Avail. w/Error Budget Burn Rate on Tanzu Observability w/Amber Salome

Starting in April of 2020, my team was tasked with managing Tanzu Application Service on multiple foundations for a client. Early on, it was a priority to establish a strong SRE practice around managing the platform. This talk discusses how we defined key metrics for monitoring availability, custom solutions for populating availability data into an observability platform (Tanzu Observability by Wavefront), dashboard creation, and alerting practices. We discuss in depth the benefits of using a burn rate when monitoring availability error budget consumption, and how this strategy allows for more sensitive alerting and limiting error budget consumption.
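
The talk's own formulas are not given here, so the following is only a minimal sketch of the commonly used definition of burn rate: the observed error ratio divided by the error budget that the availability target allows. The function names, the 99.9% target, and the 30-day period are assumptions chosen for illustration, not values taken from the talk.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how fast the error budget is being spent.

    A value of 1.0 means the budget would last exactly the whole SLO period;
    a value of 10.0 means it would be gone in a tenth of the period.
    Both function names here are hypothetical, for illustration only.
    """
    error_budget = 1.0 - slo_target  # allowed fraction of failed requests
    if error_budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / error_budget


def budget_consumed(rate: float, window_hours: float, period_hours: float) -> float:
    """Return the fraction of the period's error budget spent in one window."""
    return rate * window_hours / period_hours


if __name__ == "__main__":
    # Assumed example: 99.9% availability target over a 30-day (720 hour) period,
    # with 1.44% of requests failing during the last hour.
    rate = burn_rate(error_ratio=0.0144, slo_target=0.999)
    spent = budget_consumed(rate, window_hours=1, period_hours=720)
    print(f"burn rate: {rate:.1f}, budget spent in one hour: {spent:.1%}")
```

Alerting on the burn rate rather than on the raw error ratio ties the alert threshold to how much of the budget is at risk, which is one way to get the more sensitive alerting described above.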

Listening to the Hype: OpsRamp featured in eight Gartner Hype Cycles

July is Hype Cycle season, the time of year when Gartner livens up the summer doldrums by updating its eagerly awaited Hype Cycle series of reports. This year’s Hype Cycles demonstrated OpsRamp’s growing brand recognition as we were listed as a representative vendor in eight different Gartner Hype Cycles.

Turn slow, ad hoc response into real-time incident resolution with PagerDuty

Presented by Sean Noble at PagerDuty Summit 2021. Learn how your first responders can take action in real time when an incident occurs by delegating the power to run diagnostics and remediation. Hear about a new capability that curates a palette of automation to responders for impacted systems in an incident. Now, instead of immediately escalating to subject matter experts and developers, responders can quickly diagnose and document incidents, and even run common corrective actions such as triggering fail-over or remediations. In this session, we’ll demo this new capability, and reveal our roadmap.

Do you really need a service mesh?

The challenges involved in deploying and managing microservices have led to the creation of the service mesh, a tool for adding observability, security, and traffic management capabilities at the application layer. While a service mesh is intended to help developers and SREs with a number of use cases related to service-to-service communication within Kubernetes clusters, a service mesh also adds operational complexity and introduces an additional control plane for security teams to manage.

JavaScript Logging Basic Tips

In the past few years, JavaScript has evolved in several ways and has come a long way. With the evolving technology, machines are becoming more powerful, and browsers are getting more robust and compatible. In addition, with the recent development of Node.js for running JavaScript on servers, JavaScript has become more popular than ever before.

How OverOps integrates into CI/CD

OverOps CEO Rod Squires discusses how OverOps integrates into CI/CD. OverOps root cause analysis at runtime instantly pinpoints why critical issues break backend Java and .NET applications. No detective work is required, such as searching logs. OverOps provides the precise line of code and associated variables at the moment the error occurred. For pre-prod, critical issues are identified and resolved before being promoted to production, and before impacting the customer.

How OverOps Helps the Retail and eCommerce Industries

OverOps CEO Rod Squires discusses how OverOps supports businesses in the retail and eCommerce industries. OverOps root cause analysis at runtime instantly pinpoints why critical issues break backend Java and .NET applications. No detective work is required, such as searching logs. OverOps provides the precise line of code and associated variables at the moment the error occurred. For pre-prod, critical issues are identified and resolved before being promoted to production, and before impacting the customer.

How OverOps Helps the Telecommunications Industry

OverOps CEO Rod Squires discusses how OverOps supports businesses in the telecommunications industry. OverOps root cause analysis at runtime instantly pinpoints why critical issues break backend Java and .NET applications. No detective work is required, such as searching logs. OverOps provides the precise line of code and associated variables at the moment the error occurred. For pre-prod, critical issues are identified and resolved before being promoted to production, and before impacting the customer.

How OverOps Helps the Financial Services Industry

OverOps CEO Rod Squires discusses how OverOps supports businesses in the FinServ industry. OverOps root cause analysis at runtime instantly pinpoints why critical issues break backend Java and .NET applications. No detective work is required, such as searching logs. OverOps provides the precise line of code and associated variables at the moment the error occurred. For pre-prod, critical issues are identified and resolved before being promoted to production, and before impacting the customer.

How to Ensure Patch Compliance

Patch compliance indicates the number of compliant devices in your network. This means the number of computers that have been patched or remediated against security threats effectively. The distribution and deployment of patches accomplish nothing if your devices are not compliant. So to establish a good patch management strategy, it is important to pay attention to the effectiveness and reach of your patch deployment activities.

Monitoring Kubernetes with Epsagon

Kubernetes is an open-source orchestration platform that allows you to manage and scale your containerized workloads. You can run Kubernetes anywhere—on-premises or in a public or hybrid cloud. Kubernetes helps you build scalable services by providing functionalities like declarative configuration, immutable infrastructure, horizontal scaling, load balancing, service discovery, and self-healing systems.

Getting over on-call anxiety

You've joined a company, or worked there a little while, and you've just now realised that you'll have to do on-call. You feel like you don't know much about how everything fits together, how are you supposed to fix it at 2am when you get paged? So you're a little nervous. Understandable. Here are a few tips to help you become less nervous.

BizTalk Migrator: What is new and what is coming (June 2021 Edition)

The BizTalk Migrator tool is one of Microsoft's latest releases, and it helps you migrate your BizTalk solutions to Azure in a much simpler and more automated way. To keep you informed about the recent enhancements to the tool, the Azure Logic Apps team held a live remote session exclusively on that topic. Without any further delay, let us jump in, as there are tons of updates waiting.

AWS EC2 Service Discovery with HAProxy

AWS Auto Scaling groups are a powerful tool for creating scaling plans for your application. They let you dynamically create a group of EC2 instances that will maintain a consistent and predictable level of service. HAProxy’s Data Plane API adds a cloud-native method known as Service Discovery to add or remove these instances within a backend in your proxy as scaling events occur. In this article, we’ll take a look at the steps used to integrate this functionality into your workflow.

Collecting Actionable Bug Reports with Jira Service Management

This video takes a look at leveraging Jira Service Management linked with Jira Software to receive, route, and escalate bug reports for the appropriate response. Jira Service Management empowers everyone within an organization to easily report software bugs, and enables agents to easily create associated Jira issues. Most importantly, this capability does not require an individual Jira Service Management license.

Get comprehensive monitoring for your Apache Kafka ecosystem instances quickly with Grafana Cloud

We are happy to announce that the Kafka integration is available for Grafana Cloud, our composable observability platform bringing together metrics, logs, and traces with Grafana. Apache Kafka is an open source distributed event streaming platform that provides high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

How digital workflows ensure secure access at Copenhagen Airport

Access to any airport is tightly governed. When you’re one of Europe’s busiest airports and a high-profile piece of national infrastructure, you cannot have unauthorized people wandering about. At Copenhagen Airport, we have more than 20,000 people cleared to access the airport. These might be security staff, baggage handlers, IT service providers, catering teams, and others. As the airport has grown, so has the number of on-site workers.

5 Key Considerations When Choosing a Log Management Solution

Purchase decisions often begin with a price check. Log management is no different. Evaluate your budget and narrow down the options that fit to choose the tool that gives you the most for what you pay. As always, cheaper is better as long as the platform doesn’t cut any corners. But with log management, there is a catch – not all tools are transparent with their pricing model.

Experiencing Turbulence? Hypercare Helps Travel and Hospitality Firms Manage Sky-High Demand

Many sectors suffered during the COVID-19 pandemic, but the travel and hospitality industry was struck particularly hard as the world went into lockdown and governments urged us to stay home. According to the International Air Transport Association, global air passenger demand in 2020 was down a record 65.9% from the previous year, and the tourism industry saw an estimated loss of 100.8 million jobs worldwide.

How Ivanti is Helping the Federal Government Make Zero Trust a Reality

As you may have heard, Ivanti was selected by the National Institute of Standards and Technology’s (NIST’s) National Cybersecurity Center of Excellence (NCCoE) to participate as a collaborator in its Implementing a Zero Trust Architecture project.

Back to the (Monitoring) Basics | An IT Journey to Monitoring Glory: Session 1

Let’s start by going back to the basics of monitoring. Whether you’re new to monitoring or looking for a refresher, there’ll be something for everyone. The presenters will bring their experience monitoring environments from small to large to hybrid and beyond. Join the live chat to ask questions of the presenters and offer other attendees your own tips for ramping up on monitoring.

Supercharge your SIEM Capabilities with Modern Log Management

Cybersecurity is front and center today for every business regardless of size or industry. Major ransomware attacks and data breaches seem to make headlines just about every day. Sophisticated attackers and cybercriminals are always finding new ways to extort businesses, steal confidential data, and wreak havoc. A quick read of the CrowdStrike 2021 Global Threat Report will surely give you cause for concern.

Using AI/ML to Increase Gaming Monetization

Gamers are not shy about reaching into their wallets for premium content and features. They also won’t hesitate to tap the uninstall button at the first sign of trouble. It’s not uncommon for a gamer to boot up a hotly anticipated new game or revisit an old favorite only to put it down days or weeks later. The culprit is often gaming monetization issues that get in the way of what would otherwise be a long-term rewarding gaming experience.

The 15 Best CFO Tools For Managing Software Financials

As businesses evolve thanks to technology advances, so must the role of the Chief Financial Officer (CFO). With an ever-growing list of financials to manage, such as payroll, operating expenses, IT and software expenses, and more, CFOs have to find the right processes and choose the right tools to get the cost data they need to make informed decisions and ensure profitability for their company.

The Benefits of Centralized Log Management and Analysis

Log centralization is kind of like brushing your teeth: everyone tells you to do it. But until you step back and think about it, you might not appreciate why doing it is so important. If you’ve ever wondered why, exactly, teams benefit from centralized logging and analysis, keep reading. This article walks through five key advantages of log centralization for IT teams and the businesses they support.

Onboarding Data in Splunk Security Analytics for AWS

Splunk Security Analytics for AWS’ new data onboarding wizard quickly takes you from subscribing to the service to visualizing your AWS environment. We’ll walk through the wizard in this video, and you’ll see how the new process can save you hours, days or even weeks when compared to traditional data onboarding processes.

Clojure microservices for JavaScript developers part 2

This series was co-written by Musa Barighzaai and Tyler Sullberg. In the previous post, we explored high-level differences between thinking in Clojure compared to thinking in JavaScript. We are now ready to start building our first Clojure microservice. The microservice we are going to build will be very simple. It will be an HTTP server that uses a Redis data store to count how many times a given IP address has pinged the /counter endpoint.
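
The service described above (an HTTP server that counts pings to /counter per IP address) can be sketched in a few lines. This is only an illustration in Python, not the Clojure code from the series: the names `HitStore`, `make_handler`, and `make_server` are hypothetical, and an in-memory counter stands in for the Redis data store so that the sketch runs without external dependencies.

```python
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer
from threading import Lock


class HitStore:
    """In-memory stand-in for the Redis store (hypothetical name)."""

    def __init__(self):
        self._hits = Counter()
        self._lock = Lock()

    def increment(self, ip):
        # Plays the role of an atomic Redis INCR on a per-IP key:
        # bump the count and return the new value.
        with self._lock:
            self._hits[ip] += 1
            return self._hits[ip]


def make_handler(store):
    """Build a request handler class bound to the given store."""

    class CounterHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/counter":
                self.send_error(404)
                return
            # client_address is an (ip, port) tuple; count by IP only.
            count = store.increment(self.client_address[0])
            body = str(count).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return CounterHandler


def make_server(port=8080, store=None):
    """Create (but do not start) the HTTP server on localhost."""
    if store is None:
        store = HitStore()
    return HTTPServer(("127.0.0.1", port), make_handler(store))
```

Calling `make_server().serve_forever()` would start the server; each GET to /counter then returns the running count for the caller's IP address. Swapping `HitStore` for a real Redis client would let the counts survive restarts.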

Clojure microservices for JavaScript developers part 3

This series was co-written by Tyler Sullberg and Musa Barighzaai. This is the third and final post in a series of posts for JavaScript developers about how to set up Clojure microservices. The previous posts were: Those previous posts are useful context, but you can clone the repo and jump into this post without reading them.

How to Monitor Microsoft 365 Azure Active Directory (AAD)

Monitor Microsoft 365 Azure Active Directory (AAD) with Exoprise CloudReady. AAD is an enterprise multi-tenant cloud directory and identity management service. In your business, employees working remotely or in the office rely on Azure AD to log in to multiple Office 365 services and access them through a single set of cloud credentials.

How To - Push Device Configuration Changes

Fastest time-to-value and lowest TCO (total cost of ownership) are among the top 10 reasons that customers choose, love and continue using Netreo. Turning time-consuming administrative projects into simple tasks is one way Netreo consistently delivers superior value. But like all software solutions in use today, many Netreo features go unused or misunderstood by too many customers and would-be users.

10x development speed with local serverless debugging

In this article you’ll find out how to 10x your development speed with local serverless debugging. Questions such as “what happens when you scale your application to millions of requests?”, “what should you expect when going serverless?”, “what does it look like?”, and “what is it like to build applications on serverless and work locally?” will be addressed.

"Accelerate" your team with Sleuth

The larger your team grows and the faster your teams move, the harder it is for engineering leaders to find trust-but-verify moments, the moments where you should dig in and make sure your team's health is improving. Imagine a world where all your engineering tools are working together such that accurate and insightful trust-but-verify moments come to you. Imagine a world where you have the finest Sleuth in the world, working just for you.

How to Reduce Alert Fatigue: Preventing Noisy Alerts and Error Messages

Monitoring solutions are a vital component in managing an application’s environment. From the systems layer all the way up to the end user’s connection to the app, you want to find out how the platform is performing. Indicators like CPU, memory, the number of connections, and overall health help teams make informed decisions for guaranteeing uptime. Teams monitor metrics (short-term information) and logs (long-term information) mainly from a reactive perspective.

How to Notify Your Team of Errors: Email vs. Slack vs. PagerDuty

Site Reliability Engineering (SRE) and Operations (Ops) teams heavily rely on notifications. We use them to know what’s going on with application workloads and how applications are performing. Notifications are critical to ensuring SREs and Ops teams can resolve errors and reduce downtime. They’re also crucial when monitoring environments — not only when running in production but also during the dev-test or staging phase.

How Much Does a Digital Experience Leader Make in IT?

Have you ever tried to search for a leadership position in IT that’s dedicated exclusively to employee experience, sometimes listed as end user experience or Digital Employee Experience (DEX)? I’m not talking about a CXO (Chief Experience Officer) role outside of IT—that position is usually advertised for customer experience or employee communications and human resources. I’m talking strictly enterprise IT.

Supporting Azure Shared Image Gallery with Elastigroup

Images are one of the most basic, common attributes of your virtual machines (VMs), and contain the operating system, which may be customized with specific installations and features. It is necessary to keep VM images organized and structured so that they are easily maintained, managed, and accessed. Azure introduced its Shared Image Gallery to help solve this, giving users a way to manage, share, and distribute custom images.

Cleaning House - How One IT Team Saved $1.8M on SaaS Licensing

I recently spoke to the IT Director and Head of End User Computing at a leading healthcare company who implemented Salesforce globally across their entire employee user base 9 months ago (before later becoming a Nexthink customer). She told me their Salesforce licensing model was similar to others you’ll see in the market: a set of base licenses and then selected add-ons based on employee roles – with some at no charge and others priced à la carte. Her problem? License metering.

Accelerating Machine Learning with MLOps and FuseML: Part One

Building successful machine learning (ML) production systems requires a specialized re-interpretation of the traditional DevOps culture and methodologies. MLOps, short for machine learning operations, is a relatively new engineering discipline and a set of practices meant to improve the collaboration and communication between the various roles and teams that together manage the end-to-end lifecycle of machine learning projects.

Microservices Are 'Easy', Dependencies Are Hard - Itiel Shwartz (at Yalla DevOps 2021)

Yalla! DevOps 2021 -- The first, in-person DevOps conference of the year! Driven by the DevOps community. All about the DevOps community. Microservices Are ‘Easy’, Dependencies Are Hard: The Right Way to Build a Cloud-Native CI/CD Microservices are more agile, easier to test, and simpler to maintain. If you don’t know, now you know. Thanks to k8s, it’s so easy! In fact, it is so easy, we’re gradually scaling down to smaller and smaller services. Sounds like there’s no downside at all. Or is there? In this talk, Itiel describes the many pitfalls of microservices, and how to avoid them.

AI in Construction and Architecture Industry: [With Real World Use Cases]

Globally, the impact of AI is growing year on year in every industry sector. Opening new scope in the construction and architecture sector, the global AI market is estimated to grow at a CAGR of around 29.4% from 2019 to 2026 and is expected to reach around US$2.1 billion by 2026.
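
To make the growth figures quoted above concrete, here is a minimal sketch of how a compound annual growth rate (CAGR) relates a starting value to a final one. The function names are hypothetical, and the seven annual compounding periods between 2019 and 2026 are an assumption; the excerpt does not state the 2019 base value.

```python
def project(value, rate, years):
    """Value after compounding `rate` once per year for `years` years."""
    return value * (1 + rate) ** years


def implied_start(final_value, rate, years):
    """Starting value that would grow to `final_value` at the given CAGR."""
    return final_value / (1 + rate) ** years


# Figures quoted in the excerpt: roughly 29.4% CAGR, roughly US$2.1 billion
# in 2026. Assumption: seven compounding periods between 2019 and 2026.
start_2019 = implied_start(2.1, 0.294, 2026 - 2019)
```

Under that assumption the implied 2019 market size comes out at about US$0.35 billion; a different reading of the forecast window would shift that number.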

Accelerating Code Quality with DORA Metrics

What do Google’s DevOps Research and Assessment (DORA) and Rollbar have to do with each other? DORA identified four key metrics to measure DevOps performance and identified four levels of DevOps performance from Low to Elite. One way for a team to become an Elite DevOps performer is by focusing on Continuous Code Improvement.

Why implementing Zero Trust is more important than ever before

Five worthy reads is a regular column on five noteworthy items we’ve discovered while researching trending and timeless topics. This week, we explore why organizations should implement Zero Trust in 2021. In 2010, John Kindervag introduced the concept of “Zero Trust” which has become a touchstone for cyber resilience and persistent security. Zero Trust is not a security product, architecture, or technology.

Making ServiceNow better with CloudFabrix RDA

The arrival of ServiceNow has relieved the IT Services workforce. With CloudFabrix RDA added to it, we made it even better. Let’s face it: many IT service transformation implementations take longer because of a lack of automation around migration and production. The efficiency of ITSM is further compromised by the absence of data automation and enrichment. ServiceNow with Robotic Data Automation has a positive impact on three critical areas of data operations for ITSM teams.

Comparison N-Able vs Kaseya vs Pandora FMS: Fight !!!

Lemons, oranges, grapefruits, limes… We know that they are not the same, but if necessary, you can make juice with all of them. And yes, we can and we will. We are in summer and it makes you want to make a good cocktail, doesn’t it? Today, in PFMS blog, we are going to analyze the commonalities of N-Able (Solarwinds MSP), Kaseya and Pandora FMS. Also their -remarkable- differences of course.

Gaining a real competitive edge in managed services

These are interesting – and challenging – times to be a Managed Service Provider. When it first published its Managed Services Market Size Forecast, Mordor Intelligence valued the market at US$152 billion in 2020, and predicted it to reach US$274 billion by 2026, a compound annual growth rate of 11.2%. Over a year later, following a pandemic which has changed the way most of us work and which will probably see permanent changes going forward, Mordor is sticking by its prediction.

Kubernetes Monitoring Explained: 10+ Tools To Get You Started

Kubernetes is the platform of choice for orchestrating containerized applications. It’s ideal for large applications running on distributed instances. But you likely already knew that because if you are weighing Kubernetes monitoring tools, you know what Kubernetes (K8s) is and why it is useful. That means you also have an inkling of how challenging Kubernetes can be to manage. This is where Kubernetes monitoring tools come in handy.

SAN Performance Monitoring

A Storage Area Network (SAN) is a specialized, high-speed network that provides block-level network access to storage. SANs are typically composed of hosts, switches, storage elements, and storage devices that are interconnected using a variety of technologies, topologies, and protocols. Each computer on the network can access storage on the SAN as if they are local disks connected directly to the computer.

How Grafana helps organizations manage SLOs across multiple monitoring data sources

“SLO is a favorite word of SREs,” Grafana Labs Principal Software Engineer Björn “Beorn” Rabenstein said during his talk at KubeCon + CloudNativeCon NA 2019. “Of course, it’s also great for design decisions, to set the right goals, and to set alerting in the right way. It’s everything that is good.” So what happens when things go bad?

Top 5 Web Application Monitoring Tools You Should Know

Web application monitoring tools can keep your business afloat. Period. Imagine this. You’re about to run a crucial end-of-season sale on your website. You’ve sent your emails, run social media campaigns, paid for advertisements, and stocked up your inventory; you are all set to let the cash register ring. However, on D-day, your website goes down. It’s unable to handle the incoming traffic or is simply down because of technical glitches.

A Step By Step Guide to Tomcat Performance Monitoring

Application server monitoring metrics and runtime characteristics are essential for the applications running on each server. Additionally, monitoring prevents or resolves potential issues in a timely manner. As far as Java applications go, Apache Tomcat is one of the most commonly used servers. Tomcat performance monitoring can be done with JMX beans or a monitoring tool such as MoSKito or JavaMelody.

Workflow Management Tools (Potential Benefits)

Workflow management tools can have a tremendous impact on team performance. Is your team failing to live up to expectations? Do you have a clear plan to drive productivity and improve customer satisfaction? If you’re struggling to get more from your team, we would recommend implementing StartingPoint into your operations for a streamlined workflow. When it comes to finding a comprehensive solution for workflow automation, StartingPoint ticks all the boxes.

Get Started with Splunk for Security: Splunk Security Essentials

Continuing to ride the waves of Summer of Security and the launch of Splunk Security Cloud, Splunk Security Essentials is now part of the Splunk security portfolio and fully supported with an active Splunk Cloud or Splunk Enterprise license. No matter how you choose to deploy Splunk, you can apply prescriptive guidance and deploy pre-built detections from Splunk Security Essentials to Splunk Enterprise, Splunk Cloud Platform, Splunk SIEM and Splunk SOAR solutions.

With Splunk Synthetic Monitoring, proactively find and fix your user experience issues

Trend, visualize, and improve performance of all your page resources and third party dependencies. Detect and resolve issues faster across your critical user flows, business transactions and API endpoints using Splunk Synthetic Monitoring.

Troubleshooting End-User Issues With a DEM Tool

In the last decade, businesses have made massive investments in the digital economy with the goal of increasing operational efficiency and improving their customer or end-user experience. However, it isn’t rare for businesses to incur losses due to poor page load speed, failed transactions, or website errors. This is why businesses need to track end-user experience in real time and resolve issues quickly.

Clojure microservices for JavaScript developers

This series was co-written by Tyler Sullberg and Musa Barighzaai. CircleCI is growing, which is wonderful. However, one of the growth challenges we have is that our backend is primarily written in Clojure, and few developers know Clojure. Many CircleCI engineers, including myself, have learned Clojure on the job. Before joining CircleCI, I was a JavaScript developer. As the lingua franca of software engineers, JavaScript is a relatively straightforward language to learn.

Powerful Time-Based Automation Rules to Move You From Reactive to Proactive

Unattended incidents won’t clean up after themselves and will come back to haunt you—whether as rising MTTR metrics, a cluttered Incident Index, fruitless back-and-forth communication, or a declining CSAT score. Powerful automation conditions can drive productivity, save you manual work, and speed up your incident lifecycle management —and keep your employees happy.

What Is Optimization And Why Is It Important?

We will be doing a four-part blog series about looking at the sources of metrics for optimization. To kick off the series, I will teach you what optimization actually means, what you are trying to optimize, and if you are focused on business goals or technology goals when performing this optimization. In our next blog, we will look at APM derived metrics and metric domains you can leverage, as they are the two primary sources people rely on for optimization today.

Essential guide to CI/CD, ITIL and tools to bridge the two

Continuous integration and continuous deployment (CI/CD) drive software development and release in DevOps. Companies based in traditional ITIL practices often want to reap some of CI/CD’s benefits but aren’t sure how to combine the two. In this article, learn the technology stack options for building a strong CI/CD pipeline, why companies rooted in ITIL are running CI/CD alongside, and best practices for a hybrid ITIL-CI/CD approach.

Analyzing Office 365 GCC Data With Sumo Logic

Many of our customers today leverage Office 365 GCC High, including organizations looking to meet evolving requirements for working with the United States Department of Defense. Sumo Logic enables customers to leverage our out-of-the-box monitoring and analytics capabilities to analyze Office 365 GCC High data to offer security engineers and security analysts stronger situational awareness of internal employee data.

High-availability connectivity for Kubernetes with dual ToR

Dual ToR (top of rack) peering provides a redundant path for customers with cluster applications that cannot tolerate service downtime or failure and require a high-availability solution. While Calico ToR connectivity has existed for some time, Calico Enterprise now supports connectivity with dual ToR switches.

Featured Post

The New Normal for Hybrid IT Solutions

The more things change, the more things stay the same. An idiom that's oddly comforting in its assurance that everything will remain balanced and undisrupted, and the winds of change, however ferocious, are somehow futile against the staunchness of the status quo. That said, I would suggest the creator of this idiom hadn't experienced a year like 2020 (and now 2021).

Troubleshoot faster with process-level app and network data

When responding to an incident, you need to quickly find the scope of the issue so you know which teams to notify and which parts of your system to investigate next—before your end users are affected. But as multiple processes use resources on each of your hosts, and interact in unexpected ways, it can be difficult to know exactly what is causing an issue—especially if those processes are running off-the-shelf software.

Citrix Tips for Troubleshooting

I recently saw a user asking on EUC Slack “is there a Domain controller response time in ?”. Unfortunately for him, his choice of monitoring product doesn’t include such metrics. However, it did make me wonder if Citrix admins are aware of the importance of getting metrics about Domain Controllers, simply because many EUC monitoring tools fail to monitor them.

PD Summit21: Transforming Infrastructure Teams Through Observability

What is this "observability" thing that everyone is talking about? Observability allows you to navigate the dark unknowns with echolocation while others attempt to fly blindly without it. Are your dashboards all green, but you still have an issue brewing? Do you need instant feedback based on the Core Analysis loop? Are your engineers tired of waking up at 3 AM for the expected issues? Is there a lack of time for experimentation? Generate your own answers and create a meaningful course of action with observability.

PD Summit21: The Netflix Reliability Story: A Brief History of How We Evolved Resilience to Failure

In Netflix engineering, we’re driven by ensuring Netflix is there when you need it to be. We strive to provide a service that people love and can enjoy anytime, anywhere. An important foundation for bringing our customers joy is a strong focus on reliability that ensures Netflix will be available when they need it. In this talk, I’ll tell the story of how we've grown our reliability practices over time to meet the changing demands of microservices and distributed computing.

PD Summit21: Adopting and Maturing to Service Ownership with PagerDuty and Rundeck

Among the common goals of today's engineering and operations teams is to adopt a culture of service ownership: "You build it, you own it." As with many ancillary objectives to driving DevOps across an organization, this is easier said than done. Sometimes this is in small part due to the technology stack/architecture of a given company. But more often than not, this is because teams lack the human-to-technology mechanisms that allow for a culture of service ownership.

eG Enterprise, the virtual assistant that every Citrix Admin needs

eG Enterprise is the virtual assistant, who’ll make your life a whole lot easier. Just like Siri and Alexa, eG will proactively monitor your IT & applications. Wouldn’t you want to know what these extra sets of hands can deliver? Watch this short video to know how automatic root-cause diagnosis tech, Citrix service topology views, synthetic & real user monitoring capabilities, and machine learning and auto-baselining tech enable you to be the IT hero among your peers, colleagues, and the management.

How to encourage DBAs to embrace DevOps, rather than fear change

How do we help Database Administrators (DBAs) embrace DevOps in a way that can be really productive and part of a rich DevOps team that delivers value to customers quickly and continuously? That’s an important question to ask right now because there’s a common view among DBAs that DevOps isn’t for them. They’re responsible for documentation and maintenance and deployments, they have internal customers, and they serve internal requests.

Introduction to Custom Metrics in Python with the Logz.io RemoteWrite SDK

We just announced the creation of a new RemoteWrite SDK to support custom metrics from applications using several different languages. This tutorial will give a quick rundown of how to use the Python SDK. Using these integrations, Prometheus users can send metrics directly to Logz.io using the RemoteWrite protocol without sending them to Prometheus first. Each SDK, though written for a different language, can work with frameworks like Thanos, Cortex, and of course M3DB.

Announcing the RemoteWrite SDK for Custom Metrics in Python, Go & More

We’re proud to announce the creation of a new RemoteWrite SDK to support custom metrics from applications using Golang (Go), Python, and Java, with many more on the way. Each SDK will have automatic, continuous deployment of updates. Using these integrations, Prometheus users can send metrics directly to Logz.io using the RemoteWrite protocol without sending them to Prometheus first.

How Vanguard used Observability to Accelerate and De-risk their Cloud Migration

Rich Anakor, chief solutions architect at Vanguard, is on a small team with a big goal: Give Vanguard customers a better experience by enabling internal engineering teams to better understand their massively complex production environment—and to do that quickly across the entire organization, in the notoriously slow-moving financial services industry. They also had a big problem: The production environment itself.

5 things you can do to improve your customer support (part 2)

From my previous blog, I’m going to continue the list of five things you can do to improve your technical service delivery to your customers (if you didn’t read the last post, you can catch up on what you missed here). In the following three points, I focus on the role automation can play.

Creating Envoy WebAssembly Extensions

In the CNCF ecosystem, Envoy, an open source service proxy developed by Lyft, is a very common choice in service mesh networking. In a previous post we discussed that both Consul and Istio leverage Envoy. Were you aware that you can extend Envoy’s capabilities with WebAssembly? What is WebAssembly? WebAssembly, or Wasm as it is often abbreviated, is not so much a programming language as a specification for a binary instruction format that can be run in sandboxed virtual machines.

Understanding IIS Log Files: Operating Instructions

Commonly, your website or app functions perfectly until you release it. During testing, you might seem to have control over everything. But, sooner or later, you will face some challenges. In fact, it is totally normal for something to go wrong. The most important thing is how you resolve these problems. In most cases, issues raised by availability alerts and users’ complaints can be addressed by means of IIS logs. IIS logging will provide you with the necessary data to deal with a breakdown.

4 benefits of combining ITSM and ITOM

IT management can be costly and time-consuming without streamlined processes and systems to support your business goals. With the quickened pace of business requiring faster scale, leaders and decision-makers must find ways to adapt and optimize their processes. Combining IT Service Management (ITSM) and IT Operations Management (ITOM) can help you prioritize operations efficiency while delivering the best service to your employees.

Deep Learning Toolkit 3.6 - Automated Machine Learning, Random Cut Forests, Time Series Decomposition, and Sentiment Analysis

We’re excited to share that the Deep Learning Toolkit App for Splunk (DLTK) is now available in version 3.6 for Splunk Enterprise and Splunk Cloud. The latest release includes: Let’s get started with the new operational overview dashboard which was built using Splunk’s brand new dashboard studio functionality which I highly recommend checking out. You can learn more about it in this recent tech talk which you can watch on demand.

What's New: Updates to Event Intelligence, Integrations, and More!

If you thought that the product announcements from PagerDuty’s largest event of the year, PagerDuty Summit 2021, were all we had in store for you, think again! We’re excited to announce that the July Release comes with a new set of updates and enhancements to the PagerDuty platform! You can learn about our latest capabilities via the Q1 PagerDuty Pulse or read below for the highlights.

The Confident Commit ep. 7 | Design Your Org Structure for Fast Flow of Change with Matthew Skelton

Rob interviews Matthew Skelton, co-author of Team Topologies and founder of Conflux, on how to structure your team for a fast flow of change. Discover the signs, symptoms, and proper metrics that indicate your organization's structure may need a redesign.

PD Summit21: Migrating to L1 Support to PagerDuty

Learn how Maersk transitioned from operating with an L1 support team to using PagerDuty to drive an efficient operational support model. In this talk you will learn how implementing PagerDuty within the platform SRE team was part of a major re-org with the goal of driving a new operations model for a highly available (99.999%) platform that led to outstanding results. At Maersk, we saw increased efficiencies and reduced TTR along with other significant advantages of using PagerDuty from both on-call and management perspectives.
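
For a sense of what a 99.999% availability target implies, the arithmetic can be sketched as below. The function name is hypothetical, and the 365-day measurement window is an assumption; the talk summary does not say how availability is measured.

```python
def allowed_downtime_minutes(availability_pct, period_days=365):
    """Downtime budget, in minutes, for a target availability percentage."""
    total_minutes = period_days * 24 * 60
    # The budget is the fraction of the period that may be unavailable.
    return total_minutes * (1 - availability_pct / 100)


# "Five nines" over an assumed 365-day year.
five_nines_budget = allowed_downtime_minutes(99.999)
```

Over a 365-day year this works out to roughly 5.26 minutes of total downtime, which helps explain why fast, well-routed incident response matters at that level.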

PD Summit21: AWS and PagerDuty: Better Together -- A Digital Transformation Journey

PagerDuty’s platform for real-time operations helps teams manage a complex transition from siloed and centralized approaches to multiple, distributed teams supporting a hybrid cloud infrastructure. To make this journey successful, one thing is clear: your people, technology, and operational processes need to be aligned in real time. That’s why we’re continuing to invest in our partnership with AWS. The integrations we’re bringing to market have always been centered on unlocking AWS’s unprecedented scale and agility for our joint customers.