March 2021

Collect Amazon CloudWatch metrics faster with Datadog using CloudWatch Metric Streams

Having quick access to metrics and health signals from your AWS environment is paramount to identifying issues expediently and monitoring the effects of any deployed fixes. Datadog is proud to partner with AWS for the launch of CloudWatch Metric Streams, a new feature that allows AWS users to forward metrics from key AWS services to different endpoints, including Datadog, via Amazon Kinesis Data Firehose with low latency.

ITOM vs ITSM

Is it important to your customer to understand the distinction between IT Operations Management (ITOM) and IT Service Management (ITSM)? Most likely not. Your customers are only concerned with how quickly you solve their problems. It makes no difference to them which application or infrastructure you use. So why should you, as an organization, be concerned about the meaning of these concepts, how they align, how they differ, and why they matter?

Exposing More Public Endpoints for Sending Metrics and Errors to AppSignal

Today, we launch a new feature: sending metrics and errors to AppSignal over our “Public Endpoint” API. AppSignal has many web frameworks, databases and background job frameworks automatically instrumented when you want to monitor a Node.js, Ruby or Elixir app. If you have code running on serverless architecture such as AWS Lambda, you can’t run our agent, so our standard integration with all of the out-of-the-box magic won’t work.

Resource check profile - Monitor Windows event logs and Linux syslogs

Monitor the internal resources in your server such as event logs and syslogs to monitor specific events across all Windows and Linux servers. Internet-facing systems constantly confront the risk of security hacks and data theft. While you're monitoring key performance metrics of your servers, keeping an eye out for security incidents is also necessary. This can be achieved through event log monitoring for Windows servers, and syslog monitoring for Linux servers.

Monitoring critical Windows services and processes

Along with server performance metrics, such as CPU, disk, and memory usage, it is important to monitor the performance of each service and process running on the server to completely analyze the load on the system resources. This video shows how Site24x7 helps you achieve that. Say you're monitoring a Windows server with Site24x7. Along with tracking the performance metrics of the server, you can also track the performance of critical services like MySQL, Apache, and PostgreSQL, and processes like redis-server.exe.

Monitor dependency and alert suppression

A network outage triggers multiple redundant alerts and burns out your alert balance. Site24x7's monitor dependency configuration helps you effectively handle the alert flood during an outage. Let's say you're monitoring your server with Site24x7, along with a few plugins hosted in it. Any downtime faced by the server will also affect the plugins, resulting in a flood of alerts for the server and the individual plugins.
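The dependency idea in the excerpt above can be sketched in a few lines of Python. This is a minimal illustration, not Site24x7's implementation: the `Monitor` class, its `depends_on` field, and the `alerts_to_send` function are hypothetical names invented for this example. It assumes an alert is suppressed whenever a monitor it depends on is also down.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Monitor:
    """Hypothetical monitor record; not a Site24x7 data structure."""
    name: str
    is_down: bool = False
    depends_on: List["Monitor"] = field(default_factory=list)


def alerts_to_send(monitors):
    """Return the names of down monitors whose alerts are not suppressed."""
    alerts = []
    for monitor in monitors:
        if not monitor.is_down:
            continue
        # Suppress the alert when any monitor this one depends on is down too.
        if any(parent.is_down for parent in monitor.depends_on):
            continue
        alerts.append(monitor.name)
    return alerts


server = Monitor("server", is_down=True)
plugins = [Monitor(f"plugin-{i}", is_down=True, depends_on=[server]) for i in range(3)]
print(alerts_to_send([server] + plugins))  # ['server']
```

With the server down, only the server alert is raised; the three plugin alerts are suppressed because their parent is already known to be down.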

Comparing Real User Monitoring and Synthetic Transactions

Written by Nick Cavalancia, Microsoft Cloud & Datacenter MVP. The need for visibility into service availability and delivery quality has led to the rise in interest in monitoring Microsoft’s Office 365 services from the user perspective. With two different approaches available, what value do they each bring?

How to Improve Core Web Vital Scores

From May 2021, Google is using ‘Core Web Vitals’ as a brand new ranking signal. Google states that business owners should monitor and improve their scores to avoid damaging their organic SEO. In this blog, we will explain how to improve Core Web Vitals scores. To discover the specific issues affecting your users’ experience, we strongly advise having a Core Web Vitals audit.

Intro to exemplars, which enable Grafana Tempo's distributed tracing at massive scale

Exemplars are a hot topic in observability recently, and for good reason. Similarly to how Prometheus disrupted the cost structure of storing metrics at scale beginning in 2012 and for real in 2015, and how Grafana Loki disrupted the cost structure of storing logs at scale in 2018, exemplars are doing the same to traces. To understand why, let’s look at both the history of observability in the cloud native ecosystem, and what optimizations exemplars enable.

Getting started with Dashboard Server

SquaredUp Dashboard Server lets you and your team create beautiful dashboards, for any tool or data, that you can share with everyone in your organization. Here’s a quick introduction to the product where we show you exactly what you can achieve, in no time at all. Let’s start with the three tabs on the top left of the screen: Getting Started, Next Steps, and Sample Dashboards. We’ll run through them one by one.

Integrating Logging into CI/CD

In my experience, pipeline monitoring and management is traditionally either left for the last developer who deployed, or unmonitored entirely. This lack of centralized monitoring and production-level resiliency can lead to significant development delays or even bring pipeline and train deliveries to a grinding halt. But we can do better.

SQL Server Problems? Try These Tips First

When SQL Server is having performance issues, giving you error messages, or just running slowly, you need to figure out the root cause of the problem before making any changes. Don’t jump to conclusions—make sure you get a big-picture view of the issue. Start with these five troubleshooting tips to optimize your SQL Server performance.

9 Things We Learned From the Digital Workspace Survey

The Digital Workspace Survey conducted by eG Innovations and xenappblog is one of the largest surveys of its kind, generating over 1,050 responses from IT professionals around the world. 62% of the respondents worked for organizations using digital workspaces for their employees. The remainder came from various types of organizations providing support for digital workspaces. Below are just a few highlights from the report which is packed with insights.

Monitoring custom metrics with Netreo

All too frequently system administrators and developers need to keep an eye on critical signals generated by their production systems. However, if these signals are highly specific in nature, tracking them can be tough. Most monitoring systems stick to tracking standard metrics emitted by servers hosting applications. But what happens if you need to monitor custom metrics, such as the number of customers that have signed up in the last hour?

How To Instantly Boost the ROI of Your Hybrid Cloud

Advertising tycoon David Ogilvy famously remarked, “Half the money I spend on advertising is wasted; the trouble is I don’t know which half.” Replace the word “half” with “one-third” and “advertising” with “public cloud” and you’d describe what enterprises are grappling with right now. They know that not all of the cloud resources they’re paying for are being used, but they don’t know which ones those are.

Sumo Logic joins AWS to accelerate Amazon CloudWatch Metrics collection

We are excited to join AWS for the launch of Amazon CloudWatch Metric Streams, a fully managed, scalable, and low latency service that streams Amazon CloudWatch metrics to partners via Amazon Kinesis Data Firehose. AWS and Sumo Logic customers can now leverage AWS Kinesis Firehose for Metrics Source for streaming CloudWatch metrics into their Sumo Logic accounts, to help simplify the monitoring and troubleshooting of AWS infrastructure, services, and applications.

The new Google algorithm update and what you need to do before May

In May 2021, Google will be putting their new algorithm live, which will have a direct impact on your page rankings in their search engine. Unfortunately, no one can hide from this new algorithm change; it will affect everyone who owns a website, especially those that don’t adhere to the new changes that are coming. Google’s update is to further improve user experience on the web.

On-boarding Remote Workers with Service Watch

It’s been more than half a year since I joined Exoprise. When Covid-19 struck last year, it became clear that companies would expand their hiring requirements beyond local regions and find suitable candidates (just like me!). Remote work and the requirement for on-boarding remote workers were no longer a luxury. This is reflected in job portals and HR sites such as Indeed and Glassdoor where they began to insert a new tag “Remote WFH Option Available”.

Upgrades to SCOM Made Easy with Easy Tune - Deep Dive

It can be hard to move tuning between Management Groups when implementing a side-by-side upgrade, especially when you have an old chaotic Management Group where overrides aren't always stored in logical places. In this video, we show you how Easy Tune Enterprise can help with this migration process by capturing effective overrides from wherever they are stored (as CSV files) and then using Easy Tune to apply these to your new SCOM Management Group.

Upgrades to SCOM Made Easy with Easy Tune - 7 Minute Guide

It can be hard to move tuning between Management Groups when implementing a side-by-side upgrade, especially when you have an old chaotic Management Group where overrides aren't always stored in logical places. In this video, we show you how Easy Tune Enterprise can help with this migration process by capturing effective overrides from wherever they are stored (as CSV files) and then using Easy Tune to apply these to your new SCOM Management Group.

The Untapped Power of Key Marketing Metrics

Marketing and Site Reliability teams rarely meet in most organizations. It’s especially rare outside the context of product marketing sessions or content creation. With observability now pivotal to success, we should be looking to bring the two together for technical and commercial gains. In this piece, we’re going to explore the meaning of observability and its relevance to marketing metrics.

Unlocking Hidden Business Observability with Holistic Data Collection

Why do organizations invest in observability? Because it adds value. Sometimes we forget this when we’re building our observability solutions. We get so excited about what we’re tracking that we can lose sight of why we’re tracking it. Technical metrics reveal how systems react to change. What they don’t give is a picture of how change impacts the broader business goals. The importance of qualitative data in business observability is often overlooked.

Explore Prometheus Metrics with Logz.io Infrastructure Monitoring

Metrics Explore is the Logz.io feature for deep dives into Prometheus metrics. Similar to Kibana Discover, it allows for easy querying, pull-down list selections, and other ways to navigate your data. Best yet, you can explore important metadata for detailed metric analysis. There are a few ways to move around the metrics in your system. Get started by finding the Explore icon on the left-hand menu.

Dashboard Server Product Preview

How would your IT team be transformed if you could dashboard anything, for free? Find out on 25th March at 2pm as we introduce you to SquaredUp Dashboard Server. Combining the best of SquaredUp with a powerful PowerShell integration, Dashboard Server allows you to dashboard virtually any data, and is made for IT pros looking for a tool that’s quick to implement, easy to set up, and effortless to maintain.

Want to visualize software development insights with Grafana? With our new Jira Enterprise plugin, you can!

A very fun part of my job as a Solutions Engineer at Grafana Labs is getting to learn the ins and outs of a new feature or play with a plugin while it is still in development. So, when I heard murmurs that our latest Enterprise plugin would be an integration with Jira, I felt the forsaken call of the agile sirens luring me back to my days when I worked as a technical writer on a product team.

Network monitoring with Hosted Graphite

Network monitoring is the process of looking after your network with the help of various tools and techniques. These tools, often called network monitoring systems, constantly track various aspects of your network, such as bandwidth usage and traffic. This tracking is important in case of outages, as these systems notify the network administrator immediately. Moreover, network monitoring systems are essential for status updates so that you can improve your system’s efficiency.

Cisco Secure Application: from the engineers who built it

Today, protecting your digital business starts with the application. Discover why runtime application self-protection (RASP) solutions, like Cisco Secure Application, are a game changer for application and security teams. Learn how you can simplify vulnerability management, block attacks in real-time, and save time.

A Sanity Listicle for Mobile Developers

Just like that, Mobile March Madness 2021 is almost in our rearview. Before we look to April, let’s recap some of our most notable mobile updates from this past month with a few tips on how to solve what matters faster and a sneak peek of what’s coming next. That’s right. We’re constantly improving our mobile monitoring solution independent of our alliteration-based marketing campaigns.

Leveraging APM to Conquer Custom App Management Troubles

Though you might use custom-made apps for many reasons, there’s only one reason you deploy them: off-the-shelf solutions just can’t get the job done. Whenever operations are too complex for a commercial application already in place or whenever a new kind of digital business is launched, a custom app is necessary. This is true no matter what the business model is.

What Is Topology?

Topology is a multilayered map showing how everything in the IT environment is related. It's similar to Google Maps, which gives you a bird's eye view into an area and how everything is interconnected. Also, in Google Maps, you can see how traffic is flowing and which intersections may be causing bottlenecks. A view into topology allows similar visibility. You can see how components of an IT system are laid out to interact with each other.

Leading with Observability: Key Considerations for Technology Leaders

By 2022, Gartner estimates that more than 3 out of 4 global organizations will be running containerized applications in production. With this comes a new set of monitoring challenges — ephemeral, short-lived infrastructure, complex service interdependencies and on-call developers who now need access to data for fast troubleshooting, just to name a few.

Monitor Fastly performance with Datadog

Fastly is an edge cloud platform that includes a content delivery network (CDN), as well as services for image optimization, video streaming, cloud security, and load balancing. These services are supported by a network of caches in different locations, which enables enterprise-scale companies to deliver applications to users as quickly as possible, even in times of peak traffic.

How to Measure Core Web Vitals

Core Web Vitals are a new set of performance metrics that will become part of Google’s ranking algorithm from May 2021. In this blog, we explain how to measure Core Web Vitals scores. There are many ways website owners can find out their Core Web Vitals scores. This includes: PageSpeed Insights, Search Console, Lighthouse, Chrome DevTools, Chrome UX Report, and the Web Vitals Extension.
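To make "scores" concrete, here is a minimal Python sketch that rates a measured value against the thresholds Google has published for the three Core Web Vitals (Largest Contentful Paint, First Input Delay, and Cumulative Layout Shift). The `THRESHOLDS` table and the `rate` function are illustrative names, not part of any of the tools listed above, and the sample values are made up. Google assesses these metrics at the 75th percentile of page loads, which this sketch does not model.

```python
# (upper bound for "good", upper bound for "needs improvement")
THRESHOLDS = {
    "LCP": (2.5, 4.0),    # Largest Contentful Paint, seconds
    "FID": (100, 300),    # First Input Delay, milliseconds
    "CLS": (0.1, 0.25),   # Cumulative Layout Shift, unitless
}


def rate(metric, value):
    """Classify a single measurement as good, needs improvement, or poor."""
    good, needs_improvement = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= needs_improvement:
        return "needs improvement"
    return "poor"


# Hypothetical sample measurements for one page.
sample = {"LCP": 3.1, "FID": 80, "CLS": 0.3}
for metric, value in sample.items():
    print(metric, value, rate(metric, value))
```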

Why Real-Time Monitoring is So Important

No one can deny the importance of a proper monitoring system for the effective management of IT infrastructure. You need the most efficient solutions and monitoring tools to optimize performance, make the most out of your resources, and be able to deal with errors and failure conditions. While the traditional way of IT monitoring involves the use of reports, it does have certain limitations.

How we're graduating Grafana Agent experiments into the official Prometheus project

We’ve been experimenting with new ways to use and operate Prometheus over the past year. Every successful Grafana Agent experiment turns into an upstream contribution for the whole Prometheus community to benefit from. In this blog post, I go over the history of the Agent’s successful — and not so successful — experiments.

Build an Application-Aware Infrastructure with Cisco Intersight Workload Optimizer

For many organizations, applications are the business. But not all companies are aware just how critical an optimal infrastructure is to application performance. AppDynamics + Cisco Intersight Workload Optimizer supports intelligent, highly collaborative problem-solving between AppOps and InfraOps teams to build a truly application-supportive infrastructure for the modern enterprise.

How Eliminating Network Choke Points Can Help the DoD Plan for the Next Wave of the Pandemic

At the start of the COVID-19 pandemic, military IT leaders raced to expand network capacity and upgrade infrastructure to ensure it could meet their mission-critical workloads and support telework. Their successful efforts have proved their ability to adapt and scale their networks with speed and agility. As more DoD personnel telework and government and home networks are pushed to the max, here are three things they must consider as they plan for the impact on IT systems.

Improve Your DevOps Strategy Through Platform Ops

Organizations looking to scale DevOps implementations, improve their DevOps strategy, and deliver production code fast and reliably should take note of Platform Ops. Platform Ops will reshape the way we deliver value to the customer by offering an internal marketplace of self-service capabilities to many different internal business consumers. Platform Ops is an implementation of broader DevOps strategy, philosophies, and principles.

10 Mistakes to Avoid When Sizing Cloud Resources

One of the most common concerns when moving to the cloud is cost. Given that cloud allows you to turn IT costs from CAPEX (long-term investments, e.g., in hardware equipment and software licenses) into OPEX (day-to-day operating expenses), it’s crucial to choose the right service and estimate it properly. In this article, we’ll look at the common pitfalls and discuss how you can avoid them to truly benefit from the cloud’s elasticity.

How to Optimize Server Performance

Optimizing server performance is important in supporting end-user requirements. Web server monitoring and optimization help you troubleshoot bottlenecks as they emerge and optimize server performance. In this post, we will discuss how to optimize performance and why it is important.

How Tanzu Observability Continuous Improvement Makes You More Successful

Did you know that the VMware Tanzu Observability by Wavefront engineering team not only listens to input from customers, but acts on that input to improve the overall Tanzu Observability experience? In fact, as a direct result of customer surveys, the Wavefront engineering team recently completed 30 days of improvement focused on query quality. I will quickly run through those improvements in this post.

What Are Microservice Architectures?

In this article, we are going to look at Microservice architectures, their benefits, what makes them different from traditional monolithic architectures, and how to go about setting up monitoring and alerting for them. MetricFire is a Hosted Graphite, Grafana, and Prometheus service, where we help you set up and manage these open-source tools. If you would like to follow the steps in this blog, make sure to sign up for MetricFire's free trial and even book a demo session.

IT Spring Cleaning: Making the best of the current situation

Spring is just around the corner. And since we're at home a lot right now due to the coronavirus pandemic, it's all the more worthwhile to take some time for spring cleaning. But it's not just in our own four walls that the winter grumpiness should disappear; the IT landscape is also in need of a digital spring cleaning.

Runbooks: What They Are and Why You Need One Yesterday

Let’s talk about The Legend of Zelda: A Link to the Past, and how it relates to DevOps. The game tasks our hero with finding three pendants, which unlock a Master Sword he can use to travel to an alternate realm and ultimately take down the bad guy. The US version of this SNES masterpiece came packaged with a fairly detailed instruction manual that contained an optional guide at the end to help locate the three pendants.

How to Increase Network Visibility with a Second Monitoring Session | Obkio

In today's tutorial, you will learn how to add a 2nd Public Monitoring Agent in Obkio’s app and how to create a 2nd Monitoring Session. The results of a Network Session tell us what the performance is between 2 points on the network. For example, let’s say you have a 1st Agent installed in your network and a 1st Public Monitoring Agent installed in Azure and you want to add a 2nd remote Public Monitoring Agent

Auvik and TruMethods: Operational Excellence: Learn How to Be a World Class IT Service Provider

As your MSP evolves, it is critical to ensure profitability. Join Auvik’s Partner Success Manager Chrissy Little and TruMethods President Gary Pica as they discuss a simple framework to help. Topics covered in this session include an exercise for uncovering unrealized profitability, five metrics you can track for accountability, three changes to make to your service delivery, and more! If you want to save time managing networks and get deeper visibility into the key metrics, you’ll want to try Auvik’s network monitoring and management software.

Best 7 Automated Network Diagram Software + Guide

For your business’s network to thrive, you have to really know its details and intricacies. One of the best ways to understand your network is to make a network diagram, which is a visual representation of your network’s devices and how they connect to one another. You can manually create your own network diagram, which can help visualize complex networks. However, it can be difficult and time-consuming to build a coherent, accurate network diagram on your own.

5 Key Capabilities That Your VMware Horizon Monitoring Solution Must Have

The success of any digital workspace technology depends on the user experience and VMware Horizon is no exception. There are many aspects of user experience. When a user first connects, the logon must happen quickly. When the user launches a client application (e.g., Microsoft Outlook, Google Chrome, etc.), the application must be available for user interactions within seconds.

MSPs Evolve with AIOps

AIOps is fast changing from a technology that was viewed with skepticism to an industry-changing innovation responding to the challenges of managing multifaceted, hybrid IT environments. Recently, our partner Pinnacle Technology Partners (PTP) hosted a panel discussion entitled: “Improving IT Management & Automation with AIOps,” led by Gary Derheim, VP of Managed Services & Marketing at PTP who interviewed executives and technical experts from PTP and OpsRamp.

What Is a Network Diagram?

Network diagrams are not only handy to have, but provide a vital look at the network topology for your team, your company, and your peace of mind. Let’s look at what a network diagram is, and why it’s so important. A network diagram is, simply put, a schematic or map of your existing network that illustrates the nodes and their connections. Network diagrams are very useful for mapping out your elements and device interactions, as well as illustrating different network topology types.

What is Application Discovery and Dependency Mapping

Modern applications depend on many different devices and servers. If you are not familiar with your software structure, you will encounter problems when you make changes within your application. To prevent this, you need to perform application discovery and dependency mapping (ADDM).

Resource Roundup: Getting Started with InfluxDB Cloud on Google Cloud

Are you looking to get started with InfluxDB on Google Cloud? We’ve pulled together our top resources to help you get the most out of your time series data whether that’s coming from your Google Cloud infrastructure or your own application. Read how customers like Wayfair and Vera C. Rubin Observatory use InfluxDB on Google Cloud to solve their time series data collection and processing challenges to power their multifaceted, complex real-world use cases.

Featured Post

The End of Automation Anxiety

If there's one thing 2020 has highlighted, it's that the world is constantly changing. For most industries, IT included, one constant spans the ages: a lingering concern about automation. The very thing that helped pave the foundations for the industrial revolution and the world we know today has been a concern ever since, with the worry that machines will take our jobs in one way or another. In 2021, we're likely to see the IT industry address this anxiety, embracing automation software solutions in a smarter way and freeing up tech professionals to prioritize the most important jobs.

What is ITOM and why is it important to ITSM?

Businesses of today are faced with an extensive list of technical solutions, both on-prem and in the cloud, with which to manage their business services and systems. As the complexity of IT infrastructures increases, so does the difficulty of maintaining service levels. If new systems and processes are not aligned and synchronized across a business, it can result in a lack of visibility, which can negatively impact critical services.

How to Optimize Server and VM Performance with VirtualMetric

The key to maximizing revenues and ensuring foolproof security is to actively monitor infrastructure. With virtualization on the rise, virtual machine (VM) performance monitoring is one of the most crucial parts of infrastructure monitoring. This is because detecting problems in a VM can be even more challenging than in a physical server or machine.

How To Monitor Network Performance for Microsoft Teams, Office & Azure

VoIP and Unified Communication apps, such as Microsoft Teams, are more sensitive to network performance than other applications. So whether you’re working from an office, or your own home, you want to make sure your VoIP and UC apps are performing as they should be. Whether you’re an IT pro or not, you can easily monitor Microsoft network performance and quickly find network problems affecting Microsoft apps.

Dashboard Server Learning Path

Hey everyone! It’s been a while since we've gone on a learning journey together, after the Azure Monitor Learning Path. I’m thankful for the love that series received, and since then, I've been looking for another opportunity to take on something new and share my learning experience with you all. Finally, the time has come!

Coralogix - On Demand Webinar - 2021 Troubleshooting Best Practices

When it comes to troubleshooting, the majority of time spent is usually on finding the issue rather than fixing it. To change this, it’s not enough to store a few metrics - you need to also store context. In this on-demand webinar, we’ll explain the techniques for creating a powerful observability stack, that will not only tell you what is broken, but why it has broken.

Grafana 7.5 released: Loki alerting and label browser for logs, next-generation pie chart, and more!

Grafana v7.5 has been released! This is the last stable release before we launch Grafana 8.0 at GrafanaCONline in June. Register for free now, so you won’t miss the great sessions we’re planning around all things Grafana. And if you’re doing something special with Grafana that you’d like to share with the community, the CFP for GrafanaCONline is open until 06:59 UTC on April 10! Now, back to 7.5.

Using Sentry Performance To Make Sentry Performant

Like many companies, Sentry uses feature flags to determine when certain users see certain features. Recently, we decided to switch our feature flag software to an open-source system called Flagr. And while implementing Flagr, we used our new Performance tools to find — and ultimately fix — a serious issue in how we were fetching our flags.

SaaS vs. PaaS vs. IaaS: What's the Difference?

Every service you acquire to use temporarily and let go of when you’re done can use the “as a service” suffix. A cab is a vehicle as a service. Rather than buying a car, you merely pay to have another person’s car move you from one point to the next. IT services embody the same model. Browser-based text editors alleviate the need to install fully-fledged word processors on computers.

How to Achieve Unified IT Observability Amidst the Global Pandemic

The concept of unified IT observability has gained newfound importance in today’s world of remote work, hybrid infrastructures, and technological convergence. In this video, Christina Kosmowski, President at LogicMonitor, shares five tips on how to power your business forward and achieve unified IT observability during uncertain times.

How to: Pingdom super powered status page

Pingdom is one of the most used website monitoring tools, with almost 15 years in the business. It excels at providing simple and reliable synthetics as well as real user monitoring. This monitoring tool provides a simple public status page, but as you might have noticed it’s quite limited. It only serves as a display of your uptime and response time history, not much more than that.

Debugging with Dashbird: Lambda Configuration Error

It shouldn’t be a surprise that Lambda configuration error is one of the most common error messages, and we all know AWS error messages aren’t known for being especially detailed. Oftentimes you will come across other vague error messages like “encoding not enabled,” or “stream is failing,” and depending on the context, this could mean your services could be completely down.

Stop Onboarding Like It's 2019

First impressions matter. The first day of a new job can be daunting, especially in a post-2020 world. Back in 2019, you might have asked yourself questions like: is this office going to be open-concept or a bunch of cubicles? Will I get to sit near a window with some natural light or are they going to stick me next to the bathroom? Now new hires ponder questions like: what will my digital work environment look like? Will IT send me a new laptop, camera, and speakers?

IT Support Fatigue is Real (Here's How to Solve It)

Have you ever seen one of those old war movies where some grizzled veteran is whispered to have the fabled ‘1,000-yard stare’? Well, you need to keep an eye out for it among IT teams too. IT Support Fatigue was already a very real problem before the pandemic, and has only become more severe in the months since. It’s an issue we explored in the Nexthink Pulse Report.

Which Event Log Events Should You Worry About?

When you are configuring your event log monitor settings, you need to decide which event log events you need to worry about. Event logs are generated for a wide array of processes, applications, and events. Logs will record both successes and failures. As such, you need to decide what data is most vital and needs your immediate attention.

Observability vs. Monitoring: Analysis of the Divide

One common view of the relationship between observability and monitoring is that they complement each other in an inseparable way. While it is true that you can only monitor a system that is observable, the line dividing observability and monitoring grows narrower with every deployment you make, making these two practices less of a pairing and more a single entity.

Why We Chose the M3DB Data Store for Logz.io Prometheus-as-a-Service

Logz.io is focused on creating the best observability service to manage the scale of monitoring, add value on top of AI/ML technologies, and enhance enterprise security. Metrics is one of the pillars of Logz.io, and our Prometheus-as-a-Service offering. It has been a crucial part of our platform goals, but if we turn the clocks back a year, our service only used the open-source Elasticsearch database (ES).

Finding the Bug in the Haystack: Hunting Down Exceptions in Production

Software companies are in a constant pursuit to optimize their delivery flow and increase release velocity. But as they get better at CI/CD in the spirit of “move fast and break things,” they are also being forced to have a very sobering conversation about “how do we fix all those things we’ve been breaking so fast?” As a result, today’s cloud-native world is fraught with production errors, and in dire need of observability.

2021: The year of Cortex for IoT?

My Grafana Labs colleague RichiH recently talked about why IoT and time series databases work so well together. It just so happens that we have a highly scalable time series database on hand. Let’s talk about that. My name is Goutham, and I am a maintainer for Cortex. I have been working on it for nearly three years out of the four-and-a-half years the project has existed. Cortex is built to serve as a scalable, long-term store for Prometheus.

Container deployment showdown: Docker or Kubernetes?

Monitoring the current state and performance of applications is critical for IT Ops and DevOps teams alike. Understanding the health of an application is one of the most effective ways of anticipating potential bottlenecks or slowdowns, yet it’s one of the largest challenges faced by many organizations that build and deploy software. This is largely due to applications’ distributed and diversified nature.

Looking Back as we Move Forward: A Pandemic Journey - Part 2

Over a year after COVID-19 was declared a global pandemic, the hope of speaking about it in the past tense is something we all still hold on to. Not only are we still being challenged by it in the present, but it has changed the way we think and do many things. However, just because something has become normalized over time (out of necessity) doesn’t mean that everyone has adjusted without incident.

DIY Guide to Building a SCOM Connector Webinar

Find out how to use PowerShell and REST APIs to integrate SCOM alerts into ServiceNow. In this webinar, we'll take you through building your own SCOM connector for ServiceNow using PowerShell to send SCOM alerts to the Incident table. This DIY guide illustrates how to get alerts out of SCOM to your cloud platforms, so you can apply these principles to any similar system.

Creating Custom Event Views in SQL Sentry

If you’re using SQL Sentry regularly, there’s a great event management feature that provides a lot of value for our advanced users. I often find the SQL Sentry Event Calendar isn’t being used as often as it once was. The Event Calendar lets you view historical and future events, drill down into event failures, and reschedule jobs using drag and drop all from within the SQL Sentry desktop client. In addition, you can create custom views of events you need to reference frequently.

How to monitor containerized Kafka with Elastic Observability

Kafka is a distributed, highly available event streaming platform which can be run on bare metal, virtualized, containerized, or as a managed service. At its heart, Kafka is a publish/subscribe (or pub/sub) system, which provides a "broker" to dole out events. Publishers post events to topics, and consumers subscribe to topics. When a new event is sent to a topic, consumers that subscribe to the topic will receive a new event notification.
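
The publish/subscribe flow described above can be sketched in a few lines of Python. This is a toy, in-memory illustration only: `ToyBroker`, its methods, and the topic names are hypothetical and are not part of Kafka's client API. It also pushes events straight to handlers, whereas real Kafka consumers poll a persisted log.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class ToyBroker:
    """Hypothetical in-memory broker; a stand-in for illustration, not Kafka."""

    def __init__(self) -> None:
        # Maps each topic name to the handlers subscribed to it.
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        """Register a consumer callback for a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: str) -> int:
        """Deliver an event to every subscriber of the topic; return the delivery count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


received: List[str] = []
broker = ToyBroker()

# Two consumers subscribe to the same topic.
broker.subscribe("orders", received.append)
broker.subscribe("orders", lambda event: received.append(event.upper()))

delivered = broker.publish("orders", "order-created")    # both consumers get it
ignored = broker.publish("payments", "payment-received")  # nobody subscribed
```

Publishers and consumers never reference each other here, only the topic name, which is the decoupling that makes the pattern useful.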

Coffee Break Webinar Series: Intelligent Observability for IT Ops

IT Operations teams are often the bedrock of the digital business, ensuring that processes and services continue humming smoothly as developers continue to evolve and increase customer value. But increasingly complex systems can flood them with alerts that get in the way of operators doing their best work and paving the way for new, innovative services.

New in Telegraf 1.18.0: Beat, Directory, NFS, XML, Sensu, SignalFX and More!

Last week we released Telegraf 1.18 with a range of new plugins including Elastic Beats, directory monitoring, NFS, XML parsing and some aggregators and processors to help with your data ingestion. All of these packages were written in Go 1.16.2. This was one of our largest releases in a while and couldn’t have been done without the 70+ Telegraf community members who contributed to writing plugins, fixing bugs, reviewing code, and everything else to improve Telegraf!

Core Web Vitals: Google's Upcoming Ranking Changes

From 1st May 2021, Google will start judging your website based on a new set of performance metrics called Core Web Vitals. This initiative is focused around user experience and will become part of Google’s ranking algorithm. Failing them is bad; getting them right is a huge opportunity. Only 23% of businesses currently pass these tests consistently, so organic SEO will be negatively impacted for the large majority of companies.

10 Best Tools for Monitoring Wireless Access Points

Monitoring wireless access points is business-critical, allowing teams to facilitate day-to-day workflows. These wireless access points (APs or WAPs) provide strong Wi-Fi signals and transmission ranges for sending and receiving valued information. But wireless access points are susceptible to cyberattacks that jeopardize your revenue and reputation.

How to Monitor Network Traffic: Best Tips for IT Pros

What if I told you that monitoring network traffic is a lot less daunting than it actually seems? I think I have your attention. As a network has various moving parts, understanding it all can be a serious challenge. If something breaks or a component stops working correctly, implementing a quick troubleshooting process is essential. If you can't fix the issue right away, this could be detrimental for your end users, leading to more severe complications.

Cisco Network Monitoring: 6 Best Practices

It's often said that your network is the "backbone" of your IT infrastructure, underlying every other part of your enterprise IT. If your Cisco network infrastructure goes down or is experiencing performance issues, it's crucial that you have a real-time solution to identify and resolve the problem as soon as possible. But what does such a solution look like when it comes to Cisco networks?

How I fell in love with logs thanks to Grafana Loki

As part of my job as a Senior Solutions Engineer here at Grafana Labs, I tend to pretty easily find ways out of technical troubles. However, I was recently having some Wi-Fi issues at home and needed to do some troubleshooting. My experience changed my whole opinion on logs, and I wanted to share my story in hopes that I could open up some other people’s eyes as well. (I originally posted a version of this story on my personal blog in January.) First, some background info.

High throughput VM logging and metrics agent now in Preview

Running and troubleshooting production services requires deep visibility into your applications and infrastructure. Virtual machines running on Google Compute Engine (GCE) provide some system logs and metrics without any configuration required, but capturing application and advanced system data has required the installation of both a metrics agent and a logging agent.

Announcing the Rollbar Terraform Provider For Managing Rollbar Automatically

It can be really exciting when your development team is growing fast! But then you soon realize that managing all the developer tools to constantly create new projects or add users is becoming a full-time job. Well, not anymore. At least, not for Rollbar. We’re releasing our HashiCorp Terraform Verified Provider for Rollbar today, built in partnership with HashiCorp.

How to Prove User Behavior is Root Cause of Citrix Latency Using ICA/HDX Channels

In my recent article Understanding Citrix Latency Metrics to Troubleshoot Remote Worker Issues, I explained some causes of “Citrix is slow” complaints and delved into the latency metrics that are available to assist with troubleshooting.

CDN Monitoring Is More Than Just Performance, It's About Hard Dollars

On one of our monthly check-in calls, the Director of Infrastructure and Operations at one of our largest customers was telling me how the one big problem he was still trying to solve is optimizing their OPEX or at least making it more predictable. He cited a popular Content Delivery Network (CDN) that they spend millions with as an example. The company had launched a new service and he spoke about how the CDN cost had quadrupled.

BMC Remedyforce Plugin for Pandora FMS

There are more than eight hundred pages of documentation for Pandora FMS. The science – and art, I think – of monitoring is very extensive. The needs of a large company are different from those of a medium or small organization. But even two large companies are not the same and their needs may be totally different.

Uniting Tracing and Logs With OpenTelemetry Span Events

The current landscape of what our customers are dealing with in monitoring and observability can be a bit of a mess. For one thing, there are varying expectations and implementations when it comes to observability data. For another, most customers have to lean on a hodgepodge of tools that might blend open source and proprietary, require extensive onboarding as team members have to learn which tools are used for what, and have a steep learning curve in general.

Elastic 7.12 released: General availability of schema on read, technical preview of the frozen tier, and support for autoscaling

We are pleased to announce the general availability (GA) of Elastic 7.12. This release brings a broad set of new capabilities to our Elastic Enterprise Search, Observability, and Security solutions, which are built into the Elastic Stack — Elasticsearch and Kibana.

Elastic APM PHP Agent 1.0 released

We are proud to announce the 1.0 release of the Elastic APM PHP Agent! If you are interested in this work, please try the agent and let us know how it works for you and what features you miss! The best way to give feedback and ask questions is in our discussion forum, or if you find an issue or you would like to submit a pull request, jump to our GitHub repository. The agent is Apache-licensed, and we are more than happy to receive contributions from the community!

Anodot vs. Datadog: The Breakdown

We are often asked what’s the difference between Anodot and Datadog. Since both platforms monitor data at scale, using machine learning to detect anomalies and incidents, the differentiation might be unclear. So we’re using the real estate here to quickly clarify what each platform is built for, and why – despite some overlaps in features – these are two fundamentally different creatures.

A Day in the Life: Intelligent Observability at Work with a Super SRE

After we’d fixed Aparna’s network issue, James came to see me at my desk. Masks on, socially distanced and all that, but it was nice to have some face-to-face time. James is cool – that dry British humor and not your classic IT Ops dude. He’s been here forever and mentored me when the CIO, Charlie, hired me as the first SRE here a year or so ago. I lucked out really.

AWS CloudWatch alerts vs. Dashbird alerts

In the 21st century, it’s quite easy to manipulate machines and computers. Our worries are no longer if something is doable, but if something can be perfected. Therefore, we mostly search for new ideas and ways to make our work impeccable. For example, if you’re using a particular software and you realize that the software is excellent, but it could be better in some ways that would allow you to work even faster, you’ll explore the alternatives.

What Is the OSI Model?

As an IT professional, chances are you’ve come across the phrase Please Do Not Throw Sausage Pizza Away while hearing about protocols, network design, and implementation issues. Or how about Please Do Not Touch Steve’s Pet Alligator? If these ring a bell, then you’re on the right track: these are smart memory aids linked to the seven layers of the Open System Interconnection (OSI) model.
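
As a small illustration of how such a mnemonic lines up with the seven layers, here is a minimal Python sketch. The function name `mnemonic_to_layers` is hypothetical; the layer names, ordered from layer 1 to layer 7, are the standard OSI ones.

```python
from typing import List, Tuple

# The seven OSI layers, ordered from layer 1 (bottom) to layer 7 (top).
OSI_LAYERS = [
    "Physical",
    "Data Link",
    "Network",
    "Transport",
    "Session",
    "Presentation",
    "Application",
]


def mnemonic_to_layers(phrase: str) -> List[Tuple[int, str, str]]:
    """Pair each word of a mnemonic with the OSI layer sharing its initial."""
    words = phrase.split()
    if len(words) != len(OSI_LAYERS):
        raise ValueError("a mnemonic needs exactly seven words")
    pairs = []
    for number, (word, layer) in enumerate(zip(words, OSI_LAYERS), start=1):
        # Each word must start with the same letter as its layer.
        if word[0].upper() != layer[0]:
            raise ValueError(f"{word!r} does not match layer {layer!r}")
        pairs.append((number, layer, word))
    return pairs


result = mnemonic_to_layers("Please Do Not Throw Sausage Pizza Away")
for number, layer, word in result:
    print(f"Layer {number}: {layer:<12} <- {word}")
```

Both phrases quoted above pass the same check, since each spells out P, D, N, T, S, P, A from the bottom layer up.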

Introduction to Giraffe

Giraffe is InfluxData’s graphing library, built to use and graph the data coming from InfluxData’s time series database, InfluxDB. Yes, there are other graphing libraries available; but ours is the only one purpose-built to graph line protocol without having to convert it. Plus, we have lots of great features, like legends and colorization, without much configuration. So, how to get started?

Monitor Microsoft 365 with RapDev's integration in the Datadog Marketplace

Microsoft 365, formerly known as Office 365, is a suite of cloud-based productivity and communication services that is used by more than one million companies worldwide. The applications included in the suite are critical to the daily workflows of subscribers and therefore require careful monitoring in order to minimize the effects of downtime and ensure optimal usage.

Introducing Atatus Log Monitoring

Log Monitoring is a crucial step in ensuring you know what’s happening in all your servers from a single location. Did you know Log Monitoring tools are implemented by the strategy called “defense-in-depth”? Boom!!! That’s where the log monitoring concept developed, and now we have many log monitoring tools in the market. Issues that users face in the log monitoring tool: We considered all the above points while we designed our tool.

What's new in Grafana Cloud for March 2021: improvements to alerting, synthetic monitoring, and more

As the product manager for Grafana Cloud, I am constantly following the progress of all the new features that our engineering teams are working on, from early ideation to release. We’re always excited to share updates with the community, so you can all try them out and let us know what you think. So each month, I’ll be rounding up the latest Grafana Cloud features and improvements on the blog.

Tutorial | How to Set Up LogDNA Ingestion Source

Centralize your logs from any source in LogDNA so that you can monitor and troubleshoot your systems and applications in a single UI. In this video, I’ll show you how to add an ingestion source. We support multiple ingestion sources, which you can learn about in our documentation portal below. In this video, we’ll show you how to ingest Kubernetes logs using the LogDNA Agent.

Tutorial | How to Custom Parsing with LogDNA

LogDNA automatically parses common log types so that you can easily view and search through them. If you have logs that aren't in a format we automatically parse, you can create a custom parsing template so they'll be parsed as well, allowing you to use them in views, alerts, boards, and graphs. In this video, we will show you how to use Custom Parsing templates for a log that we don't automatically parse, such as one from an internal application.

Tutorial | How to use LogDNA Screens

Use LogDNA Screens to display daily log activity from all of your systems or select systems. Use time-shifted graphs to aggregate data from the previous week to compare activity levels in your current week. Our screens let you create an easy-to-read dashboard containing widgets that convey metrics from your logs. These include graphs, gauges, tables, and time-shifted graphs. In this video, we'll create a screen with widgets that provide different views of your webserver's 404s.

Aggregating Application Logs From EKS on Fargate

Today we’re going to talk about logging with Kubernetes on AWS using CloudWatch and SolarWinds® Papertrail™. We’ll cover setting up Papertrail, installing and configuring the rKubeLog package, viewing the logs in the Papertrail event viewer, and cross-checking those logs with the ones we see with kubectl. From there, we’ll set up a few different alerts.

What Does Server Monitoring Mean in 2021? A Look at Modern Server Options and How to Monitor Them

A few years ago, monitoring was simple. We had all our servers somewhere in the data center; we just had to install a monitoring tool and gather all the data from every server. Things changed when we started moving to the cloud and then to containers. Today, we often need to monitor a variety of different sources. A “server” can be a physical machine, virtual machine, container, or even a serverless application. Therefore, our approach for monitoring needs to change.

2021 Hybrid Cloud Predictions: CEO Perspective

COVID-19 certainly accelerated some trends. For example, in my view, COVID-19 accelerated the pace and progress of digital transformation by five to 10 years, as companies faced the need to adapt to a post-COVID-19 world, involving permanently higher adoption of remote work, remote education, and digital touch-points. Technologies requiring less human touch and more digital touch-points will be adopted faster in the aftermath of COVID-19.

How to handle monitoring as a solo founder

When you're a single person running a system with thousands of users or more, it can be pretty daunting to think about going on holiday, or even relaxing for a weekend. "What if it goes down, and I'm not there to fix it?!" you ask yourself. While you can never really guarantee that nothing will go wrong, you can take some steps to minimise your risk of things going wrong.

SOS - Don't Let a Microsoft Outage Drag You Down

You are at your desk, when all of a sudden there seems to be a hum that is growing louder, your heart starts to pound and you quickly realize that you might be in the midst of another Microsoft outage. You know that right now, Microsoft is your organization‘s core workforce engine and the backbone to ensure productivity - any outage or decrease in service quality can cause widespread productivity declines.

Now is the time for Sumo!

Sumo Logic transforms an overwhelming volume of data generated from digital services into valuable insights. With Sumo Logic, customers improve how they monitor and troubleshoot applications and infrastructure, manage audit and compliance requirements, detect and resolve security threats, and extract critical, key business indicators to gain insights into customer behavior and engagement.  

Top VMware Horizon Performance Challenges for VM Administrators

Over the last year, most organizations have deployed or expanded digital workspaces to support employees working from home. One of the most popular technologies for digital workspaces is VMware Horizon. In this blog, we discuss the top performance challenges that administrators of VMware Horizon deployments have to tackle.

How To - Improve Your Secure Web Gateway Rollout With Endpoints

The principles of network security didn’t change overnight, but our abrupt transition to a remote workforce dismantled traditional concepts of how we secure our network. Offices were designed for people to access their resources on-site or through a few well-defined locations; security took the form of inline firewalls, web gateways, and VPNs that were routed through a datacenter or other resource hub.

7 Best Network Infrastructure Mapping Tools

Network infrastructure mapping tools that help you maintain up-to-date maps can be the difference between quickly diagnosing a problem or wasting hours of your time. Similarly, network maps can help everyone in your organization better understand and optimize a network you inherit. In many cases, however, network maps contain stale information or don’t exist at all.

Performance Monitoring for Android Applications

Android is arguably the most ubiquitous operating system in the world. Whether it’s a tablet, phone, folding phone, computer, TV, or IoT device, chances are you’ve interacted with Android OS. And to help developers get full visibility into how their customers experience Android’s myriad applications, we’re extending Performance to Android.

Visualize your DevOps data for free

We recently launched the public preview of a new tool that lets anyone dashboard anything – for free. Meet Dashboard Server! You can download it free because we wanted to get our incredible dashboards in the hands of more people. The SCOM community know us well but our dashboards are perfect for so many use cases beyond System Center monitoring. And if you love them as much as we think you will, all we ask in return is that you share that love by spreading the word.

Logz.io Infrastructure Monitoring: Building Visualizations in Dashboards

In a previous post I explained how to send metrics to Logz.io Infrastructure monitoring with Prometheus—now let’s analyze them by building Prometheus dashboards and visualizations in our metrics UI! Once you’ve started to send metric data to Logz.io, how do you visualize and interpret that data so that it’s useful for you? Logz.io Infrastructure Monitoring provides powerful querying and visualization of your data.

VirtualMetric Webinar Cloud Native Applications on VMware & Kubernetes

In this webinar, Yusuf Ozturk hosted Lino Telera to speak about Cloud-Native Applications on VMware & Kubernetes. You will learn about Kubernetes and the ways to deploy it, get a CI/CD pipeline example, and even see a live demo. Speaker: Lino Telera, Cloud Architect at InfoCert S.p.A and Blogger at blog.linoproject.net, Cloud-Native Coach, 7 x vExpert and VMUG leader. What does it mean to build and deploy a Cloud-Native Application today? Introduction to Kubernetes.

Monitoring Windows Event Logs - Getting Started

Windows event logs are important for security, troubleshooting, and compliance. When you analyze your logs, you can monitor and report on file access, network connections, unauthorized activity, error messages, and unusual network and system behavior. However, Windows servers produce tens of thousands of log entries every day.

Secure by Design | Securing the Software Development Build Environment

The recent SUNBURST cyberattack on the SolarWinds software build environment is a concerning new reality for the software industry, representing the increasingly sophisticated actions by outside nation-states on the supply chains and infrastructure on which we all rely. SolarWinds is committed to sharing our learnings about this attack broadly given the common development practices in the industry and our belief that transparency and cooperation are our best tools to help prevent and protect against future attacks.

The Relationship Between Observability vs. Monitoring

Monitoring has always been a crucial operation in a software development cycle. This is mainly because of the complexity of industry-level IT and consumer-facing product development. Additionally, there is an ever-growing demand for rapid upgrades in products. To meet these requirements, streamlined performance and stability have become more important than ever; and without effective monitoring practices, they appear difficult to achieve.

Business Service Monitoring (BSM) with GroundWork Monitor

We all want our monitoring systems to alert us when things go wrong. While it’s important to get alerts in the event of a failure or latency problem on something specific such as a SQL database, it’s actually just as important to not receive alerts from too many specific sources in the same alerting channel. If our monitoring system starts to fatigue us, we will ignore alerts until the phone calls and emails from end users start letting us know a service is impaired or unavailable.

How to Understand Log Levels

More than once, I’ve heard experienced software developers say that there are only two reasons to log: either you log Information or you log an Error. The implication here is that either you want to record something that happened or you want to be able to react to something that went wrong. In this article, we’ll take a closer look at logging and explore the fact that log levels are more than just black or red rows in your main logging system.
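
To make the point about levels concrete, here is a minimal sketch using Python's standard `logging` module, chosen here only for illustration. The helper names `ListHandler` and `collect`, the logger names, and the messages are all hypothetical. The same five calls yield different output depending on the threshold configured on the logger.

```python
import logging
from typing import List, Tuple


class ListHandler(logging.Handler):
    """Collects (level name, message) pairs in memory instead of printing them."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[Tuple[str, str]] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append((record.levelname, record.getMessage()))


def collect(threshold: int) -> List[Tuple[str, str]]:
    """Emit one message per standard level and return those at or above threshold."""
    logger = logging.getLogger(f"demo.{threshold}")
    logger.setLevel(threshold)
    logger.propagate = False  # keep the demo output out of the root logger
    handler = ListHandler()
    logger.addHandler(handler)

    logger.debug("cache miss for key user:42")
    logger.info("request handled")
    logger.warning("retrying upstream call")
    logger.error("upstream call failed")
    logger.critical("service cannot start")

    logger.removeHandler(handler)
    return handler.records


production_view = collect(logging.WARNING)  # only WARNING and above survive
debugging_view = collect(logging.DEBUG)     # all five levels survive
```

Raising or lowering the threshold changes how much detail is recorded without touching the calls themselves, which is what sets levels apart from a plain two-way split between information and errors.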

Effective Technical Writing Is Essential for Your Organization's Success

"No one likes documentation", chirped the little blue birds. The bird quotes in the image above are all taken from real tweets and are listed below for accessibility: I can't argue with "liking to write documentation" being a rare skill, since so many people are vocal about disliking it. That tweet might prove to be true, in which case, technical writers should simply be more appreciated for the rare gems that they are. In this blog post, however, I’m going to explain.

7 Reasons Why You Should Consider a Data Lake

With the volume, velocity, and variety of today’s data, we have all started to acknowledge that there is no one-size-fits-all database for all data needs. Instead, many companies shifted towards choosing the right data store for a specific use case or project. The distribution of data across different data stores brought the challenge of consolidating data for analytics.

Microservices vs. Serverless Architecture

Microservices and serverless are both important topics in the world of cloud-native computing. Yet, although serverless functions and microservices architectures often go hand-in-hand, they’re distinct technologies that fill different roles in modern software environments. Here’s an overview of what microservices and serverless are, how they relate to each other, how they are different, and why you may or may not wish to deploy a serverless microservice.

Top Java Software Errors: 50 Common Java Errors and How to Avoid Them

Imagine you are developing Java software and you suddenly encounter an error. Where could you have possibly gone wrong? There are many types of errors that you will encounter while developing Java software, but most are avoidable. Some errors are minor lapses when writing code, and those are easy to mend. If you have an error monitoring tool such as Stackify Retrace, you can write code with ease.

Globally used DNS provider Network Solutions suffers an outage

On Wednesday 17th March, globally used DNS provider Network Solutions experienced the dreaded downtime we all hope to avoid. Starting at 4am Eastern time and continuing through to Thursday 18th with still no resolution, it’s affected thousands of people across the world. Network Solutions are the 4th biggest domain registrar in the world, with nearly 7 million users worldwide.

How to Configure PA Server Monitor to Monitor Your Event Logs

Did you know that you could configure PA Server Monitor’s Event Log Monitor feature to monitor one or more of your event logs? The event logs can include standard application, security, and system logs, as well as any custom event logs you want to monitor. With our server monitoring software, you have complete control and flexibility over the types of events you want to monitor.

Healthwatch for VMware Tanzu 2.1 Offers Breakthrough Platform Monitoring

Keeping your distributed systems running smoothly has never been easy. To that end, Healthwatch for VMware Tanzu created an “out of the box” option for tracking the health of your app platform. The module proved to be a big upgrade from homegrown monitoring toolchains. Platform teams have since come to rely on Healthwatch’s curated indicators, alerts, and visualizations.

Monitor .NET runtime metrics with Datadog

If you are a .NET developer, monitoring runtime metrics can help you troubleshoot bugs and detect resource inefficiencies in your applications. With Datadog, you can easily collect, visualize, and alert on key .NET runtime metrics, including exceptions, garbage collection statistics, thread count, and more. We have fully integrated .NET runtime metrics into Datadog APM so that you can easily view them alongside your distributed traces, logs, and other telemetry.

Announcing the All-new Kamon APM Service Map

Kamon APM wakes up today with a new home page for all your observability data: the all-new Kamon APM Service Map! The Service Map shows a real-time representation of all your microservices and the dependencies between them, combined with health status and easy access to the most important bits of data from your entire infrastructure.

The Easiest Way to Monitor Ruby: Automatic Instrumentation

Setting up a proper monitoring overview over your application’s performance is a complex task. Normally, you’d first need to figure out what you need to monitor, then instrument your code, and finally make sense of all the data that has been emitted. However, with a few things set in place, and an APM that natively supports Ruby, it’s easier than ever to take this step. In this post, we’ll show you how you can do it too.

Infrastructure Monitoring Tutorial: Getting Started Sending Prometheus Metrics

This Logz.io Infrastructure Monitoring tutorial will cover how to get started with our latest product, a new Prometheus-as-a-Service metrics solution. Engineers monitor metrics to understand CPU and memory utilization for infrastructure, duration and serverless execution, or for network traffic. For more advanced metrics monitoring operations, teams can send custom metrics to monitor signals like the number of active users.

Why implementing Grafana Enterprise was a bright idea for U.K. energy supplier Utilita

Energy efficiency is a term often used to describe appliances or light bulbs. But when it comes to business, making sure your sales team — and customers — are being efficient with their personal energy is a big key to success. That was a major lesson learned by Utilita, the U.K.’s first and only specialist Smart Pay As You Go Energy supplier.

How to Build a Monitoring Application in Less Than 10 Minutes

This talk shows how to use Tasks, Flux, dashboards and monitoring and alerting in InfluxDB Cloud to create an external service or website monitor. This video is a simple example for everyone to use as a template for their own custom monitoring applications built on top of InfluxDB Cloud.

DevSecOps is a Practice. Make it visible.

While DevSecOps feels like just another industry term, engineering teams everywhere are feeling greater and greater accountability for the security and stability of applications they build. DevSecOps is a practice, not a product. The practice consists of three primary use cases. To implement DevSecOps practices successfully, enterprises need to focus on visibility, consistent communication, and data-driven incident response.

Database Cloud Migration Done Right - SolarWinds Lab Episode #95

The global pandemic has accelerated corporate planning for cloud computing and digital transformation by 2-4 years on average. But database migration, the process of moving a database from one place to another, is no walk in the park. Obstacles abound. Setbacks are common.

ITOps In 2 Minutes | AIOps vs MLOps | Jay Menon

The OpsRamp IT operations management (ITOM) platform allows you to see everything in your hybrid IT environment, take the right action faster with integrated event and incident management and automate with confidence with AIOps. Learn more about our service-centric AIOps platform. With OpsRamp, you can detect and resolve incidents faster, understand resource dependencies and avoid costly performance issues that result in lost revenue and productivity.

Manually Add a Metadata Source in SentryOne Document

SentryOne Document supports multiple metadata sources, and we plan to expand available providers in the future. But what if you want to add a metadata source that isn’t currently supported? This is where SentryOne Document’s Custom Metadata Import, available in both the Software and Cloud edition, can help. Leveraging this provider, you can manually insert any metadata source—DB2, REST API, or others.

Revoke certificate of an Icinga endpoint

A Certificate Revocation List (CRL) is a list of certificates that have been revoked by the issuing Certificate Authority (CA) before their scheduled expiration date. Those certificates should no longer be trusted. A client application such as an Icinga Agent can use a CRL to verify that the certificate of the server is valid and trusted.
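
The core idea of a revocation check can be sketched as a simple set lookup. This is a toy model under stated assumptions: the `Certificate` and `RevocationList` types, the `is_trusted` function, and the serial numbers are hypothetical. It is not how Icinga implements it; a real CRL is a signed X.509 structure, and full validation also verifies signatures and the certificate chain.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet


@dataclass(frozen=True)
class Certificate:
    """Hypothetical, simplified certificate: just a serial and an expiry."""
    serial: int
    not_after: datetime


@dataclass(frozen=True)
class RevocationList:
    """Hypothetical, simplified CRL: the serials the CA has revoked."""
    revoked_serials: FrozenSet[int]


def is_trusted(cert: Certificate, crl: RevocationList, now: datetime) -> bool:
    """Reject certificates that have expired or that appear on the CRL."""
    if now > cert.not_after:
        return False  # past its scheduled expiration date
    return cert.serial not in crl.revoked_serials  # revoked before expiry


now = datetime(2021, 3, 20, tzinfo=timezone.utc)
next_year = datetime(2022, 1, 1, tzinfo=timezone.utc)
crl = RevocationList(frozenset({0x1A2B}))

good = Certificate(serial=0x0F00, not_after=next_year)
revoked = Certificate(serial=0x1A2B, not_after=next_year)
expired = Certificate(serial=0x0F01, not_after=datetime(2021, 1, 1, tzinfo=timezone.utc))
```

The `revoked` certificate is still inside its validity period, which is exactly the case a CRL exists to catch.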

Deploying Elastic to further strengthen IT security at TierPoint

TierPoint is a leading provider of secure, connected data center and cloud solutions at the edge of the Internet with thousands of customers. At TierPoint, I’m responsible for maintenance and development of the information security program, which includes threat analytics, incident response, and digital forensics. We’re constantly looking for new and even more effective ways to aggregate, process, and make decisions from massive amounts of data streaming in from diverse sources.

Hyperautomation in IT Operations

It made it into our IT Operations Glossary for 2021 blog last month but got rejected by Wikipedia. Gartner included it in its Top 10 Strategic Technology Trends for 2020 list and business process management software vendor Appian published a book about it. We’re talking about hyperautomation. Wikipedia’s reservations aside, hyperautomation is a technology trend that’s moving beyond the business process management (BPM) world and into IT operations.

How to Marie Kondo Your Incident Response with Case Management & Foundational Security Procedures

Marie Kondo, a Japanese organizational consultant, helps people declutter their homes in order to live happier, better lives. She once said: Similarly, in security, operational teams are constantly bogged down by a “visible mess” that inhibits their ability to effectively secure their organization.

5 Ways to Get Valuable Insight From Your AWS Bill

Did you know that CloudWisdom’s Bill Analysis tool shows you not just the services currently monitored by CloudWisdom but all services to deliver an overall view of your AWS cost? And if you’ve set up and configured consolidated billing to link multiple AWS accounts, you can include data from all those accounts in that view. You can even add multiple billing orgs to the same CloudWisdom account.

What is LDAP and how does it work?

As corporations grow, the need to organize user data and assets into a hierarchical structure becomes critical to simplifying storage of and access to those assets. LDAP enables organizations to store, manage, and secure information about the organization, its users, and assets. In this guide, we’ll explain what LDAP is, its uses, and how it works.

Tanzu Observability Named Fast-Moving Leader in GigaOm Cloud Observability Report

We are excited to share that technology research and analysis provider GigaOm has named VMware Tanzu Observability as a fast-moving leader in its forward-looking assessment of the cloud observability vendor space in 2021. Its cloud observability report considered solution connections; data integration and processing; performance management; root cause analysis; and full-stack observability.

Reports You Have To Check To Ensure The Health Of Your Infrastructure

Managing IT infrastructure is impossible without proper monitoring solutions and tools. Monitoring requires regular checks on the status, and the best way to gather details would be in the form of reports. Advanced IT monitoring solutions provide an automatic diagnosis of performance and availability issues across your IT network; manual interventions help optimize the process. Manual interventions make sure you don’t miss out on any warning signs before reaching the critical points.

Architecture and Monitoring Apache ActiveMQ with Grafana

In this article, we are going to look at the architecture of Apache ActiveMQ and how to monitor critical metrics of ActiveMQ using Hosted Prometheus and Hosted Grafana. If you would like to follow the steps in this blog, make sure to sign up for the MetricFire free trial. You can use Graphite and Grafana directly from our platform. MetricFire is a Hosted Graphite, Grafana and Prometheus service, where we do the setup and management of these open-source tools so you don’t have to.

Logz.io's Prometheus-as-a-Service is Generally Available

Today, Logz.io is thrilled to announce that Prometheus-as-a-service is now generally available for anyone to try themselves! I’d like to thank the Logz.io village for executing a huge milestone on our quest to unify the best open source monitoring tools on Logz.io’s scalable cloud platform.

Webinar: Understanding Serverless Observability with AWS and Lumigo

Serverless experts from AWS and Lumigo go over how to add monitoring, logging, and distributed tracing to your serverless applications. Learn how to track serverless health metrics by getting visibility and alerts on specific serverless issues. Then troubleshoot using visual serverless maps, correlated AWS services, and logs to understand what service requires attention to keep high levels of application reliability.

Making Your Log Data More Useful With LM Logs

To prevent failure and minimize downtime, it’s important to make sure your infrastructure and applications are observable. But, just getting to the point of observability isn’t enough. You need to be able to use the data that comes with observability — ideally in a way that helps your team troubleshoot more quickly and minimize or prevent downtime.

Working with the WebAPI tile - tips & tricks

Regardless of the SquaredUp product you use, the WebAPI tile is very useful when it comes to connecting to external data sources and showing them in your dashboards. It brings you closer to that single pane of glass dashboarding dream that we all have, which is why it is also one of our most used tiles!

5 Technical Metrics You Need for Observability in Marketing

Metrics measuring user engagement on your website are crucial for observability in marketing. Metrics will help marketing departments understand which of your web pages do not provide value for your business. Once known, developers can look at the web page’s technical metrics and determine if updates are required. Typically user engagement statistics, like the average time required to load your page, are stored separately from technical site logs.

PromQL Tutorial: 5 Tricks to Become a Prometheus God

For the seasoned user, PromQL confers the ability to analyze metrics and achieve high levels of observability. Unfortunately, PromQL has a reputation among novices for being a tough nut to crack. Fear not! This PromQL tutorial will show you five paths to Prometheus godhood. Using these tricks will allow you to use Prometheus with the throttle wide open.

What to Consider When Monitoring Hybrid Cloud Architecture

Hybrid cloud architectures provide the flexibility to utilize both public and private cloud environments in the same infrastructure. This enables scalability and power that is easy and cost-effective to leverage. However, an ecosystem containing components with dependencies layered across multiple clouds has its own unique challenges. Adopting a hybrid monitoring strategy doesn’t mean you need to start from scratch, but it does require a shift in focus and some additional considerations.

Infographic: The State of Software Code

We surveyed nearly 1,000 developers across the U.S. to uncover key development trends and insights. Today’s businesses are software businesses. If there was any positive in 2020, it’s the power software has to allow us to continue in some “normal” sense. Learn how this survey uncovers how too many companies and their development teams still have a major blind spot when it comes to errors in their code.

How Does Page Load Time Affect Your Conversion Rate?

As a consumer, you know how you respond to slow sites. You hit the back button or close the browser. As a business owner, you’re well aware of this. You’re a consumer, too, after all. So, you know page load time and conversion rate are related. But how exactly? How does it affect the bottom line? In this article, we’ll analyze how page load time affects conversion rates and by how much.

How Endpoint Monitoring Saved My Life (Or At Least My Sanity)

Catchpoint makes parenting a saner sport! OK, that’s not the usual use case. But much as I know it sounds crazy, I used Endpoint Monitoring to explain to my 11-year-old gamer that I’m totally innocent. The reason she is having gaming lag and lost connections is not because of her parents’ poor home Wi-Fi or her old PC – and she doesn’t just have to take my word for it. Here, let me explain.

Do you already know what Synthetic Transaction Monitoring is?

Most of us who work in IT understand the concept of basic monitoring. Far from being irrelevant, it is essential for any infrastructure, whether done manually, by reading logs and checking machines and services; semi-automatically, using custom-made scripts; or, the safest and most recommended option, with a dedicated tool such as Pandora FMS to centralize your monitoring. But what is synthetic monitoring? We will see that throughout this article.

Event Latency: What It Is and Why You Should Care

Recently, we added a new derived column function to Honeycomb, INGEST_TIMESTAMP(), which can help customers debug event latency and/or inaccurate timestamps. A meaningful minority of the events sent to Honeycomb are already old when they arrive, and a very special few claim to have been sent from the future. Has this happened to you? Let’s do an experiment.
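As a rough illustration of the idea, event latency can be thought of as the gap between the time an event claims to have occurred and the time it was ingested. The sketch below is plain Python, not Honeycomb's derived column syntax; the function name and the sample timestamps are hypothetical.

```python
from datetime import datetime, timezone


def event_latency_seconds(event_time: datetime, ingest_time: datetime) -> float:
    """Seconds between when an event says it happened and when it arrived.

    Positive: the event was already old on arrival.
    Negative: the event claims to come from the future (for example, clock skew).
    """
    return (ingest_time - event_time).total_seconds()


# Hypothetical sample timestamps, for illustration only.
ingested = datetime(2021, 3, 15, 12, 0, 0, tzinfo=timezone.utc)
old_event = datetime(2021, 3, 15, 11, 58, 30, tzinfo=timezone.utc)
future_event = datetime(2021, 3, 15, 12, 0, 5, tzinfo=timezone.utc)

print(event_latency_seconds(old_event, ingested))     # 90.0
print(event_latency_seconds(future_event, ingested))  # -5.0
```

A negative result is the "sent from the future" case: the sender's clock is ahead of the clock at ingestion.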

Elastic named a Leader in the 2021 GigaOm Radar on Cloud Observability

We are thrilled that Elastic has been recognized as a Leader and a Fast Mover in the 2021 GigaOm Radar Report for Cloud Observability. GigaOm Radar reports offer a forward-looking view of the market and are designed to help IT decision makers evaluate solutions with an eye to the future. Their analysts consider not just where the solutions are today, but also how the market is evolving and where the solutions are headed relative to that trend.

How Microsoft Used Splunk's Ethlogger to Turn Blockchain Data Into Supply Chain Insight

The way we ‘data’ is about to change, and Splunk’s Connect for Ethereum (aka EthLogger) is helping organizations to adapt. Splunk Connect for Ethereum enables organizations of all sizes to investigate, monitor, analyze and act upon their rapidly growing blockchain data sets across multiple chains.

Getting Started with OpenTelemetry .NET and OpenTelemetry Java v1.0.0

Recently we announced in our blog post, "The OpenTelemetry Tracing Specification Reaches 1.0.0!," that OpenTelemetry tracing specifications reached v1.0.0 — offering long-term stability guarantees for the tracing portion of the OpenTelemetry clients. Today we’re excited to share that the first of the language-specific APIs and SDKs have reached v1.0.0 starting with OpenTelemetry Java and OpenTelemetry .NET.

Network Throughput vs Bandwidth and How to Measure It

Have you ever used the term bandwidth? Probably. Have you ever used the term network throughput? Maybe. Have you used them interchangeably? Most likely. So let’s see where we get these mixed up. Namely, let’s look at the key differences between network throughput and bandwidth. And while we’re at it, let’s cover the fundamentals of throughput and how to measure it, so you can keep your network flowing efficiently and cleanly.

Painless Kubernetes monitoring and alerting

Kubernetes is hard, but let’s make monitoring and alerting for Kubernetes simple! At iLert we are creating architectures composed of microservices and serverless functions that scale massively and seamlessly to guarantee our customers uninterrupted access to our services. Like many others in the industry, we rely on Kubernetes when it comes to the orchestration of our services.

What's new in Grafana Enterprise Metrics for scaling Prometheus: enhanced access control and a compactor that supports 650 million active series and beyond

I’m a fresh starter here at Grafana Labs, leading one of our teams working on the Grafana Enterprise Stack. As a longtime user of Grafana, I couldn’t wait to see what’s new in versions 1.1 and 1.2 of Grafana Enterprise Metrics (GEM), our scalable, self-hosted Prometheus service. I tried out the shiny features and wanted to share some of the cool things I found.

Azure Management Talk: 8 easy steps to improve your security posture in Azure

You've deployed your application on Azure. Instantly hackers are targeting your public IP and the brute forcing of passwords and ports starts. What now? Should I deploy Azure Sentinel, or just enable Azure Security Center as a start? Join MVP and Microsoft RD Maarten Goet as he takes you through the 8 easy steps to improving your security posture on Azure. This is a demo-heavy session no cloud engineer or developer should miss!

Tweaking Your Monitoring Strategy for a Seamless End-User Experience

Technology and end users have an almost dichotomous relationship; as technology gets more complex, end users expect a more seamless experience. At the same time, end users are also looking for maximized application availability and performance. How can a federal IT team meet these increasing demands? The answer is monitoring. Most monitoring instances have been implemented as an afterthought or as a way to solve a single, specific problem.

Application-Centric and Database-Centric Monitoring: Why You Need Both

How would you like it if your application shut down or started running slow without any warning? Let’s say, for instance, users who like your application try to access it, but aren’t able to. How would you know whether low traffic to your site is because users weren’t satisfied or because they simply weren’t able to access it? Users definitely won’t be satisfied when your application starts crawling with long response times.

AWS Machine Learning Tools (2021 edition)

When you want to stay ahead and on top of things in a fast-moving industry, machine learning (ML) is surely one of the trending solutions. Today, innovative companies already have leading Machine Learning tools well-integrated into their processes. In comparison, your start could seem dreadfully slow. Or maybe you just don’t have the time or resources to invest in running your own Machine Learning training infrastructure.

If one public cloud is good, are multiple public clouds better?

Virtana recently published the results of a new State of Hybrid Cloud survey. One of the findings is that 81% of companies in the study who have started their migration to the public cloud have engaged multiple providers. This result tallies with a recent Gartner survey of public cloud users, in which 81% of those respondents said they are working with two or more providers.

Two Major Industry Awards Confirm ChaosSearch's Growing Role in Enterprise Cybersecurity

On Friday, March 12th, ChaosSearch announced that its ChaosSearch Data Platform won Gold for two product categories in the 2021 Cybersecurity Excellence Awards: Best Security Analytics Solution and Best Security Log Analysis solution. Of course, we are thrilled to be recognized for our leadership and innovation in security analytics, but beyond that, these awards help to highlight ChaosSearch’s advantages in an area of growing importance to Security Operations (SecOps) teams.

Python Optimization: 3 Easy Steps

Python is one of the best programming resources available for designing machine learning systems. With a variety of technical abilities and potentially time-saving loops and processes, it can be an invaluable tool. However, it’s these capabilities that also make Python difficult to use. In many cases, Python may seem sluggish as it tries to navigate intricate, complicated strings of code.

Help Wanted: New Remote Work Roles Blur Lines b/n IT & HR

Fair warning, I tried to avoid using the obligatory good news, bad news opener for this article, but I couldn’t help myself. It’s just that we sort of are living in the age of good news, bad news. Especially when it comes to IT support and remote work. So here it goes. First, the good news: If you work in IT support, your colleagues need your expertise now more than ever before.

Getting Started with Time Series Data Science

Are you interested in performing time series forecasting or anomaly detection, but you don’t know where to start? If so, you’re not alone. There is an overwhelming variety of libraries, algorithms, and workflow recommendations for these tasks. As a Developer Advocate at InfluxDB, the leading time series database, I’ve researched time series data science methodologies and best practices for forecasting and anomaly detection.

Guidelines for picking where to send monitoring alerts

If you've ever had to be on the receiving end of a monitoring system that uses email for alerts, you know how noisy things can get, particularly if you're working in an agency or freelance-like environment with dozens of client sites to maintain. You get so many emails that you start looking into integrations with third-party services like Zapier, and coming up with more and more complex rules to try to reduce the noise.

What if You Could Autonomously Monitor Across Your Databases?

When DevOps teams talk about monitoring a database, the primary motivation is to ensure that the database won’t suffer a performance hiccup. Long queries, timeouts and table scans are among the most popular causes behind lousy customer experience. However, in recent years, more data has been shifted to cloud databases.

Logback Configuration Example: Tutorial on How to Use It for Logging in Java

Troubleshooting issues in your applications can be a complicated task requiring visibility into various components. In the worst-case scenario, to understand what is happening and why it is happening you will need metrics, logs, and traces combined together. Having that information will give you the possibility to slice and dice the data and get to the root cause efficiently. In this article, we will focus on logs and how to configure logging for your Java applications.

Looking Back as We Move Forward: A Pandemic Journey

Until recently, work has been thought of as a place you go, rather than a thing you do. While cloud technology has made a ‘digital workplace’ possible, for many businesses, this has been a long-term ambition. But what happens when this changes overnight? Suddenly more people are at home than in the office, and the typical 9-5 model doesn’t apply anymore. This is the reality that businesses the world over had to face head-on a year ago when the pandemic hit.

Skylight 5: Now with Source Locations!

This week we released Skylight version 5.0, which represents a major undertaking that has involved every person at Tilde and every part of our ever-growing stack. In addition to major internal refactors, this release also modernizes our native Rust code, and introduces Skylight's newest feature, Source Locations.

The Big SCOM Survey: Results and expert analysis

What’s the future of SCOM? How do others set up their SCOM landscape? What tools are SCOM Managers using to help streamline processes? The answers are all here. The Big SCOM Survey 2021 is the first of what we hope to be an annual survey where we measure the pulse of the SCOM community, get insights and share best practices. This year, we asked 27 questions and had 118 respondents – and we’d like to share our findings with you.

Auvik and Cherwell Present: Processes for Managing Your Network

In this webinar, Auvik’s Technology Advocate Steve Petryschuk and Cherwell’s Principal Solution Consultant Ryan Counts walk you through a few critical steps you can take to make network operations more manageable. You’ll learn which two process factors will help you go from reactive to proactive, how to assess your current network management methods using a simple framework, and when to incorporate tools to automate processes.

SLF4J Tutorial: Example of How to Configure It for Logging Java Applications

Logging is a crucial part of the observability of your Java applications. Combined with metrics and traces, logs give full observability into the application's behavior and are invaluable when troubleshooting. Logs combined with metrics shorten the time needed to find the root cause and allow for quick and efficient resolution of problems.

Guaranteeing Microsoft 365 Service Delivery: When the SLA Becomes an XLA

Microsoft 365 has become the new ‘virtual office’ and organizations are looking to ensure a consistent user experience, especially for employees working remotely. Watch this webinar to find out new approaches to manage your Microsoft 365 service delivery.

5 Things to Know About the Orion Platform Database Today

1. It’s the Back End for 14 Products and Connectors. The full list is published here. The key is, a lot of products all leverage this single database, though not every customer runs every product. Over 20 years, the Orion Platform has evolved into, well, a pickup truck. It’s a utility vehicle, doing the best job it can for everyone who needs a ride. As such, this pickup truck works great most of the time.

Using the Icinga Web API

Unfortunately, there is little to no documentation for using the Icinga Web API to perform monitoring actions such as scheduling downtimes. But it’s a simple thing and I’ll give you a quick example of how to do it. Using the Icinga Web API instead of the Icinga API gives you the advantages of the permission and restriction system, various authentication methods and auditing.

Splunking AWS ECS And Fargate Part 3: Sending Fargate Logs To Splunk

Welcome to part 3 of the blog series where we go through how to forward container logs from Amazon ECS and Fargate to Splunk. In part 1, Splunking AWS ECS Part 1: Setting Up AWS And Splunk, we focused on understanding what ECS and Fargate are, along with how to get AWS and Splunk ready for log routing to Splunk’s Data-to-Everything Platform.

How to monitor website availability

“100% website availability.” Which webmaster would not want to see this availability report? Every website owner would like their website to be available to users at least 99.9% of the time. Without a website that is accessible and running smoothly at any time of day, all web-related investments will go to waste. That is why website availability monitoring is so important.
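To put an availability percentage in concrete terms, it helps to convert it into the downtime it allows. The following is a minimal sketch of that arithmetic; the function name and the 30-day period are assumptions chosen for illustration, not figures from the article.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted over a period at a given availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


# 99.9% availability over an assumed 30-day month.
print(round(allowed_downtime_minutes(99.9), 1))   # 43.2
# 100% availability leaves no room for downtime at all.
print(round(allowed_downtime_minutes(100.0), 1))  # 0.0
```

Seen this way, the gap between 99.9% and 100% is about 43 minutes of outage per month.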

Monitor VoltDB with Datadog

VoltDB is an ACID-compliant, in-memory relational database designed to support real-time analytics. VoltDB’s in-memory storage, stored procedures, and shared-nothing architecture make it specifically optimized for quickly processing massive streams of data. This means VoltDB is tailored for use cases like online gaming, telecommunication, and financial applications, which require fast data processing.

Log4j 2 Configuration Example: Tutorial on How to Use It for Efficient Java Logging

When it comes to troubleshooting application performance, the more information you have the better. Logs combined with metrics and traces give you full visibility into your Java applications. Logging in your Java applications can be achieved in multiple ways – for example, you can just write data to a file, but there are far better ways on how to do that, as we explained in our Java logging tutorial.

Getting started with PromQL - Includes Cheatsheet!

Getting started with PromQL can be challenging when you first arrive in the fascinating world of Prometheus. Since Prometheus stores data in a time-series data model, queries in a Prometheus server are radically different from good old SQL. Understanding how data is managed in Prometheus is key to learning how to write good, performant PromQL queries. This article will introduce you to the PromQL basics and provide a cheat sheet you can download to dig deeper into Prometheus and PromQL.

Grafana Loki 2.2 released: Multi-line logs, crash resiliency, and performance improvements

I imagine everyone is long since tired and bored with their Loki 2.1 end of year/holiday gift, so I’m here today to bring some really exciting news. Loki 2.2 is released!!! New to Loki? Want a refresher? Owen Diehl and I did a webinar not long ago. Check out the on-demand video for a good overview of what Loki is capable of in 2021! Lots of new features are in this release, but worth celebrating in particular is that the single most requested feature for Loki has been added!

Performance Monitoring Support for React Native

March Mobile Madness continues with Performance support for React Native. Our friend Jenn Meung shares how Performance supports his mobile application. In addition to working with Sentry, I also contribute to Tour, a travel app which helps people plan trips with a drag-and-drop interface. Because Tour is built with React Native, we’ve always had issues accurately gauging how people use our app and its performance.

The Top Query Languages You Should Know for Monitoring (and a couple more)

Sifting data can be fun for some people. Connecting the dots and finding correlations where they weren’t obvious before. It’s the crux of what drives people’s motivation in data science. It’s no different in any other field, especially in one involving systems observability, telemetry, or monitoring. And the best way to do that is to develop a fluency with query languages for different database structures and open source tools.

Top ITSM Trends of 2021

ITSM is a hot topic for people who manage IT infrastructure. And due to the immense changes we have experienced this past year, there are new developments that show where the industry is headed. ‘Automation’ has been all the rage for the past few years, but it is now more relevant than ever. Then there is ‘digital transformation’, the buzzword of 2021, which is a result of the rapid levels of digitization that all organizations are experiencing.

How to monitor your key process with Bleemeo?

This article explains how to monitor a service process with Bleemeo, using MySQL as an example. This feature is what we call "Key Process Monitoring" and is described in our documentation. Currently, you can monitor any service process automatically discovered by Bleemeo. All metrics are detailed in our documentation; in short, metrics related to the memory, CPU, network, and I/O used by the process are gathered by the agent.

Elastic Security 101

Elastic Security empowers analysts to collect data from multiple data source integrations, perform traditional SIEM functions, and take advantage of machine learning-based malware protection on the endpoint. Analysts can filter, group, and visualize data in real-time while performing automated threat detection across various security events and information. In this video, you’ll learn about the components that make up Elastic Security and what those components do to help you protect your data.

How to configure your Endpoint Integration policy in Elastic Security

Elastic Security offers the ability to open and track security issues using cases. Cases created directly in Elastic Security can be sent to external systems like Atlassian’s Jira, including Jira Service Desk, Jira Core, and Jira Software. In this video, you’ll learn how to connect Elastic Security to the Jira Service Desk.

Pingdom Powered Status Page

Pingdom is one of the most used website monitoring tools, with almost 15 years in the business. It excels at providing simple and reliable synthetics as well as real user monitoring. This monitoring tool provides a simple public status page, but as you might have noticed it’s quite limited. It only serves as a display of your uptime and response time history, not much more than that.

Improve Business KPIs with Splunk APM Business Workflows

One of the biggest challenges that DevOps teams face is how to connect their efforts with the priorities of business leaders. In conversations we’ve had, developers and SREs discussed how they need to show business and engineering leaders that they are investing their time solving the right problems, and how solving these problems leads to better overall business outcomes.

Sumo Logic extends its APM to browser

Over a year ago we decided to invest heavily in Application Observability, understanding the modern observability platform must unite logs, metrics, and traces in one analytics layer to better serve reliability use cases. We have also advocated a modern trend to acquire tracing data via open source industry standards like OpenTelemetry without vendor lock-in.

Enough! 4 Work From Home Solutions to Heal IT's Pain

Let’s face it: when it comes to managing a work-from-home setup, IT has a lot of problems they don’t know how to solve. It’s not for lack of effort – there just don’t seem to be many practical solutions out there that can alleviate their new remote work-induced headaches. It was tough enough getting everyone up and running in home offices. Now, IT is all but drowning in tickets (a majority of tech leaders have reported ticketing increases up to 50%).

Website downtime: 4 more major websites that have gone down in the last month

Website down. Two words that cause panic in website owners far and wide, especially if they have thousands, if not millions of people, using their website on a daily basis. But it’s not as rare as most people think, even the biggest of companies find themselves in the deep depths of website downtime hell. The impact? Loss of revenue for starters, followed up by your competitors who are online getting your customers’ attention, bad SEO rankings, and lack of customer satisfaction.

What You Need to Know About Server Security in 2021

How often do you check your event log monitor for potential security breaches? Did you know that many potential security breaches, events, and other problems are logged to event logs? Unfortunately, even the most skilled IT professionals have a hard time making sense of what to watch for that could indicate security issues or even a potential breach until it is too late. Event logs contain a ton of information that can be useful.

How to Prevent Website Defacement

We have identified what website defacement is. We can all agree that it has the potential to have long-lasting effects on your brand image, if not prevented. Your website can be left inaccessible, and a security breach can make you lose trust among customers who entrusted you with their data. It can also impact search engine rankings and traffic.

How we're gradually evolving to a new Uptrends

You might’ve noticed that things are looking a bit different in your account lately. We’ve been gradually rolling out updates to your dashboards and menus over the last few months, and we will continue to make improvements in the coming period. With these updates, we are upgrading the way you interact with Uptrends, so you can find stuff faster and visualize your data more easily.

Monitoring Locations Update

Since June 2020, when we launched the new Monitive, we have had the same monitoring network of 8 locations. It’s time to expand. The current locations are VPS servers from DigitalOcean and Linode. I like to mix it up a bit, so that we don’t rely on a single provider, but not too much, since the administrative overhead of managing dozens of providers is another lesson learned in 10 years of uptime monitoring.

How to Reduce Bandwidth Consumption for Your Network?

Do you know who interacts with whom, when, for how long, and how frequently in your network? Network administrators need clear visibility into bandwidth utilization, using a robust bandwidth monitoring tool, to find slow-loading yet crucial connections, plan network capacity properly, and control Quality of Service.

Powerful Caching with Redis for Node.js Applications

Regardless of the tech stack used, many developers have already used Redis or, at least, heard of it. Redis is specifically known for providing distributed caching mechanisms for cluster-based applications. While this is true, it’s not its only purpose. Redis is a powerful and versatile in-memory database. Powerful because it is incredibly fast. Versatile because it can handle caching, database-like features, session management, real-time analytics, event streaming, etc.

The basics of IoT, and why Prometheus works so well with it

Before we start, please take a moment to appreciate what day it is. IoT, or Internet of Things, has been a buzzword for longer than usual. Buzzwords usually have two common properties, and then their paths fork. I like thinking about buzzwords and about the useful aspects of what they mean. The most recent public example focuses on another buzzword currently in its hype phase: observability.

Four Unique LogicMonitor Dashboards To Inspire You

LogicMonitor dashboards are truly customizable — customizable enough to allow users to visualize virtually anything. Dashboards provide our users with a wide array of capabilities, from capacity planning and service availability notifications to root cause analysis and IT spend forecasting capabilities. We’ve seen LogicMonitor users get radically innovative when it comes to creating unique dashboards that add value to their lives both in and outside of work.

Top 7 Datadog Competitors to Know in 2021

Datadog was founded in 2010 by Olivier Pomel and Alexis Le-Quoc. They developed Datadog to reduce the friction they experienced between system administration teams and developers. Together they raised over $648 million, bringing the company’s valuation to $7.8 billion by the end of 2019. Let’s dig into Datadog and get to know its competitors.

New10: Monitor hybrid cloud environments, troubleshoot serverless production workloads with Datadog

Pavel Kruhlei, Quality Engineer lead of New10, talks about how Datadog allowed them to resolve performance issues in their serverless applications and increase visibility into their hybrid environment.

Webinar: How Serverless is Changing the Cost Paradigm

One of the key characteristics of serverless components is the pay-per-use pricing model. For example, with AWS Lambda, you don’t pay for the uptime of the underlying infrastructure but just the number of invocations and how long your code actually runs. This removes the need for many micro-optimizations. As a result, many applications would run at a fraction of the cost if they were moved to serverless.
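The pay-per-use model described above can be sketched as a small calculation. This is only an illustration: the function name and all rates are hypothetical placeholders, not actual AWS prices, and it assumes compute is billed per GB-second of configured memory.

```python
def pay_per_use_cost(invocations, avg_duration_s, memory_gb,
                     price_per_request, price_per_gb_second):
    """Estimate a pay-per-use bill: a request charge plus a compute charge.

    All prices are caller-supplied placeholders (hypothetical), not real rates.
    """
    request_cost = invocations * price_per_request
    # Compute is charged only while the code runs, scaled by configured memory.
    compute_cost = invocations * avg_duration_s * memory_gb * price_per_gb_second
    return request_cost + compute_cost


# Hypothetical workload: no invocations means no cost, unlike an always-on server.
idle_month = pay_per_use_cost(0, 0.2, 0.5, 1.0, 0.1)
busy_month = pay_per_use_cost(10, 2.0, 0.5, 1.0, 0.1)
print(idle_month, busy_month)
```

With no invocations, the estimate is zero, which is the point of the model: cost follows usage rather than uptime.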

Using observability to scale AWS Lambda [Live session]

How to utilize observability to optimize your Lambdas for scale and maintain their performance over time - from development to production to scalability. How do you spot potentially slow-running Lambda functions, and how do you power-tune them in development? How do you do load testing, and why do you need a good observability tool when you do it? How do you use observability to make crucial data available in production and at scale? Observability best practices and common mistakes.

5 Universal Steps to Cloud Database Migration

Chances are, if you’re reading this, you are moving (or are thinking about moving) to the cloud. While cloud migration isn’t a new term and many people are migrating, there are a few things you should know before taking the leap. Migrating your database to the cloud is worth the hours of planning, sorting through data and running tests, because if it’s not done right, it may do more harm than good.

Are NoSQL Databases Relevant For Data Engineering?

SQL is great, but sometimes you may need something else. By and large, the prevalent type of data that data engineers deal with on a regular basis is relational. Tables in a data warehouse, transactional data in Online Transactional Processing (OLTP) databases — they can all be queried and accessed using SQL. But does it mean that NoSQL is irrelevant for data engineering?

10 Ways to Get Ahead with OpsRamp's AIOps

IT operations departments in larger enterprises often use 10-15 monitoring tools across different teams to track the health and availability of their core business services. Rather than helping ITOps teams gain a comprehensive view of their infrastructure, an overload of monitoring tools tends to only compound organizational silos and limit insights for incident troubleshooting. Yes, there is too much of a good thing.

Why Observability Is the Key Ingredient to Success

Digital transformation is accelerating at a staggering pace. Consider these statistics. In December 2019, Splunk partner Zoom had 10 million monthly active users. By the end of last year, that number was estimated to be closer to 300 million. It was part of an explosion of technological growth replicated across many industries and businesses in 2020. As Splunk CEO Doug Merritt said.

The Déjà Vu Evolution of Cloud Computing

I believe that the evolution to hybrid cloud is inevitable. Not because it’s grabbing headlines, but because it mirrors the industry’s history of new technology adoption. Take the evolution of virtualization, for example. Going back 20 years give or take, virtual machines popularized by VMware, KVM, and Hyper-V started to gain traction.

Efficiently Monitor the State of Redis Database Clusters

Monitoring Redis, the open source in-memory data platform, is complicated enough when you are hosting your Redis instance on just a single server. It gets even more complex when you build a Redis cluster that consists of multiple nodes and distribute your data across them. But as long as you know which metrics to prioritize and how to collect them, Redis monitoring is feasible enough. This article offers an overview of how to monitor the state of Redis database clusters.

TL;DR InfluxDB Tech Tips: Debugging and Monitoring Tasks with InfluxDB

With InfluxDB you can use Tasks to process data on a schedule. You can also use tasks to write custom alerts. However, sometimes your task will fail. In this TLDR, we’ll learn how to debug your task with the InfluxDB UI and the InfluxDB CLI.

VPN and Firewall Log Management

The hybrid workforce is here to stay. With that in mind, you should start putting more robust cybersecurity controls in place to mitigate risk. Virtual private networks (VPNs) help secure data, but they are also challenging to bring into your log monitoring and management strategy. VPN and firewall log management gives real-time visibility into security risks. Many VPN and firewall log monitoring problems are similar to log management in general.

Trace AWS event-driven serverless applications with Datadog APM

Last year, we released native tracing for AWS Lambda through Datadog APM to provide deep visibility into serverless functions and surface performance issues such as cold starts and errors, without any added latency. But Lambda functions are only one piece of the puzzle in a rapidly growing serverless ecosystem, which includes message queues, data streams, notification services, and more.

Quick Test Feature

A feature that’s not available in the Monitive service, but has proven to be a useful helper, is the ability to quickly check a website from several locations around the world. Just head over to the homepage and type in a website, with or without https://. Press Test Availability and you instantly get an overview of how your website is performing from several locations around the world.

Observability & AIOps, the perfect combination for dynamic environments

IT teams live in dynamic environments, and continuous integration/continuous delivery has been in high demand. In these environments, DevOps and underlying technologies such as containers and microservices continue to grow more dynamic and complex. Now, just like DevOps, observability has become a part of the software development life cycle.

More Changes Mean More Challenges for Troubleshooting

The widespread adoption of Agile methodologies in recent years has allowed organizations to significantly increase their ability to push out more high-quality software. Previous development practices revolved heavily around centralized applications and infrequent updates that were shipped maybe once a quarter or even once a year.

Why Your Mean Time to Repair (MTTR) Is Higher Than It Should Be

Mean time to repair (MTTR) is an essential metric that represents the average time it takes to repair and restore a component or system to functionality. It is a primary measurement of the maintainability of an organization’s systems, equipment, applications and infrastructure, as well as its efficiency in fixing that equipment when an IT incident occurs. Key challenges with MTTR arise from just trying to figure out that there is actually a problem.
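As a rough illustration of the definition above, MTTR can be computed as the total repair time divided by the number of incidents. The incident timestamps and the function name below are made up for the example.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(incidents):
    """Average the time between detection and restoration.

    `incidents` is a list of (detected_at, restored_at) datetime pairs.
    """
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((restored - detected for detected, restored in incidents),
                timedelta())
    return total / len(incidents)


# Hypothetical incident log: repairs of 30, 90 and 60 minutes.
incidents = [
    (datetime(2021, 3, 1, 9, 0), datetime(2021, 3, 1, 9, 30)),
    (datetime(2021, 3, 5, 14, 0), datetime(2021, 3, 5, 15, 30)),
    (datetime(2021, 3, 9, 22, 0), datetime(2021, 3, 9, 23, 0)),
]
print(mean_time_to_repair(incidents))  # 1:00:00
```

Note that this sketch starts the clock at detection; any time an incident goes unnoticed before that is not captured, which is exactly the challenge the excerpt points to.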

Splunk SOAR Playbooks: Crowdstrike Malware Triage

The combination of Crowdstrike and Splunk Phantom allows for a smoother operational flow from detecting endpoint security alerts to operationalizing threat intelligence and automatically taking the first few response steps – all in a matter of seconds. In this video, distinguished Phantom engineer Philip Royer will walk you through an out-of-the-box playbook that you can set up in Phantom to triage malware detections from Crowdstrike and automate a variety of responses based on an informed decision by an analyst.

Defining A Cloud Monitoring Strategy: Best Practices

When you are running cloud-based services as part of your overall business operations, it becomes necessary to monitor your cloud operations for evaluating the usage and efficiency of the cloud services, applications, and infrastructure. Cloud monitoring also lets you watch for threats and be mindful of cyber-attacks. Here is a brief rundown on how best to monitor cloud services and some tips to make it more efficient and useful.

What is Grafana?

Today, almost every application stack consists of a number of different applications, each performing a specific role and working together towards a common goal. This is the case whether it be that of a Fortune 500 company or a computer science student trying to complete a tech project. As such, the stability and reliability of your infrastructure greatly depend on the performance of each application within that infrastructure.

How to Monitor Cloud Server Performance with Graphite

Dive into the article to learn how to monitor cloud server performance with Graphite and get started on your monitoring needs! Application Performance Monitoring (APM) is a crucial part of the technological era. It refers to a methodological approach towards maintaining and sustaining a system’s health. It is extremely important to monitor an application’s health and performance upon launch, and then regularly afterwards.

Sample Approaches of Hybrid Cloud Monitoring Models

While cloud is seen as the go-to environment for modernizing IT strategies, many security-conscious businesses are still hesitant to adopt a full cloud strategy. A logical middle ground has now emerged: the hybrid cloud. Hybrid cloud promotes the coupling of on-premises infrastructures with one or more public cloud services to meet both cost optimization and security compliance goals. Despite its benefits, hybrid cloud computing can present technical and management challenges.

Use Grafana to Monitor Flask apps with Prometheus

Flask is a widely used Python framework for building simple web applications, as well as being popular for REST APIs. If your Flask application has a lot of requests and is sensitive to request delays, it is essential to keep track of its metrics. Prometheus is the perfect tool for this. When used in combination with Grafana, it can help make your app’s metrics easier to understand.

8 Emerging Web Development Trends in 2021 and Beyond

What’s trending in web design and development? Like humans, technology continues to evolve as we find new innovations, doing things faster and making them work better than they did before. The increasing attention to responsive web design is partly because of the increased usability and performance it brings to users. Each year, developers discover new technologies that can keep them in the competition.

Issue Detail Performance Improvements

One of Sentry’s most-trafficked pages is our issue details page, as it helps our customers understand the root cause of an error. For those of you who are new to Sentry, we define an issue as a group of similar events. To render the issue details, a significant amount of data needs to be fetched — counts, charts, event details, and other metadata. The two main components on this page are the latest event and summary statistics.

A closer look at the admin API and plugin for centralized tenant administration and control in Grafana Enterprise Logs

To follow up on our introduction of Grafana Enterprise Logs, the latest addition to the Grafana Enterprise Stack, let’s dig into one of the key features: the admin API and admin plugin. Grafana Loki, Grafana Labs’ log aggregation project, provides the underpinnings of Grafana Enterprise Logs (GEL).

ITOps In 2 Minutes | What is Product-Led Growth? | Michael Fisher

The OpsRamp IT operations management (ITOM) platform allows you to see everything in your hybrid IT environment, take the right action faster with integrated event and incident management and automate with confidence with AIOps. Learn more about our service-centric AIOps platform. With OpsRamp, you can detect and resolve incidents faster, understand resource dependencies and avoid costly performance issues that result in lost revenue and productivity.

Pinpointing a Memory Leak For an Application Running on DigitalOcean

It can be difficult to track down memory leaks, especially when deploying apps on cloud-based application services. Often teams are left with restarting their apps as the only—and less than ideal—recourse to get them back up and running. So what do you do when your cloud-based app springs a memory leak, and how do you pinpoint the cause? In this article, we’ll create a Java app designed to purposely leak memory and deploy it to DigitalOcean.

Page Load Time in E-Commerce: How to Track and Improve Key Metrics

When you’re in e-commerce, page load time is crucial. Any change you make to your site can affect it, as you’ve probably found out the hard way on several occasions. So you want to make sure your site doesn’t slow down again. Or at least that when it happens, it doesn’t go unnoticed and unaddressed for any longer than it has to. Relax—you’re in the right place. In this article, we’ll examine which metrics to track and improve to keep your site in tip-top shape.

SQL Sentry Tips and Tricks: Monitoring Targets Across Multiple Domains

A frequently asked question when I’m speaking with SQL Sentry customers is, “Can I monitor targets across multiple domains?” The answer to this question is yes. Although there might be specific scenarios in which you’ll want to have multiple SQL Sentry installs, it’s possible to monitor targets across multiple domains through one install (i.e., one centralized SQL Sentry database). There are a couple of different options available to do so.

What Is Root Cause Analysis (RCA) and Why Do You Need It?

Imagine you have a hole in your car's tire. To fix it quickly and get on your way, you apply a patch. Then it happens again. You apply another patch. Before you know it, you're driving on the highway and you blow a tire. The risk was always there. You were simply hiding it because you didn't solve the problem. We see this often when it comes to IT issues. Teams take a band-aid approach to fixing problems without addressing the underlying causes.

Shy but interesting look at the history of monitoring

Close your eyes and breathe slowly, can you already feel the coolness on the tips of your boots? On the tense phalanges of your hands? The first step is right in front of you. It is a spiral staircase armed with worn ashlars under old voussoirs. The dim light of a chandelier accompanies you. What are you waiting for? Go up! The forbidden book awaits you in the last of the stays, where you will finally find out something about the history of monitoring.

Forrester TEI study: Sumo Logic's Cloud SIEM delivers 166 percent ROI over 3 years and a payback of less than 3 months

We are seeing a renewed focus on accelerating digital transformation projects across business ecosystems and workflows within our customer base. These projects are enabling key business outcomes and this organizational transformation has given security and IT leaders the catalyst and opportunity to modernize security operations while eliminating on-premises debt.

Sumo Logic Continues to expand Public Sector Footprint

In a recent press release entitled ‘Sumo Logic Achieves FedRAMP Moderate Authorization’, dated Feb 2, 2021, the pioneer in continuous intelligence announced its Continuous Intelligence Platform™ has achieved Federal Risk and Authorization Management Program (FedRAMP™) Authorization at a Moderate impact level enabling the company to help public sector organizations get real-time insights into their complex on-premises and cloud environments.

TL;DR InfluxDB Tech Tips - Time Series Forecasting with Telegraf

If you’re familiar with Telegraf, you know that you can easily configure this lightweight collection agent with a single TOML configuration file to gather metrics from over 180 inputs and write data to a wide variety of different outputs and/or platforms. You might also know that Telegraf can act as a processor, aggregator, parser, and serializer.

Sponsored Post

Gain more visibility into code performance with Raygun APM for Node.js

Raygun has been busy building our best-in-class APM so you can provide flawless digital customer experiences. By adding Raygun Application Performance Monitoring to your monitoring suite, your team will gain more visibility on code and server performance, achieve a faster time to resolution with finer granularity, and reduce infrastructure costs by optimizing existing services. Raygun is a developer-friendly product that surfaces more diagnostic details than other APM solutions. Combined with our usage-based pricing, this lets us provide companies like Olo, which serves millions of customers, with cost-efficient and powerful APM.

Key network monitoring challenges faced by MSPs and how to overcome them

Networks are growing more complex by the day and to monitor them effectively, organizations need to invest in technical know-how, expensive tools, and costly operations. To avoid the overheads of cost, effort, and time, and to ensure constant availability of networks, more and more organizations are turning to managed service providers (MSPs) to manage their network. In fact, the global market for MSPs has been steadily growing and is expected to cross $300 million in the next few years.

New Monitive website sneak peek

Last year, while we were coming up to the date of launching the new Monitive, I wanted a new homepage to go in harmony with the new service. As we were focusing 100% on finalizing the service, there were few resources for the actual website. I only knew the following: So after about a week of research and brainstorming, we decided to go with Hugo static site generator, the Jumpstart theme by MediumRare and just a single landing homepage and the legal pages (privacy policy and terms of use).

Logging in Ruby with Logger and Lograge

Logging is tricky. You want logs to include enough detail to be useful, but not so much that you're drowning in noise - or violating regulations like GDPR. In this article, Diogo Souza introduces us to Ruby's logging system and the Lograge gem. He shows us how to create custom logs, output the logs in formats like JSON, and reduce the verbosity of default Rails logs.

How to build a ServiceNow Incident Connector for SCOM

Connecting Microsoft SCOM and ServiceNow is a no-brainer! If you want to synchronize your alerts with your incidents, view issues in real-time, and generally make your life easier then why wouldn’t you! But, we know not everyone is tech-savvy enough to develop these solutions themselves, so we’ve written this blog to give you a helping hand. So, if you want to give it a go and build your own integration tool, then here’s how to get started.

How to Monitor Microsoft Teams Audio Video Performance

In this video, we will demonstrate how to deploy an Exoprise Teams AV Sensor to proactively monitor Teams Audio Video performance from any vantage point. You will see how to monitor Teams AV performance issues from throughout the world and remote locations. Track real-time statistics and call quality metrics such as Packet Loss, Jitter, RTT all seen from the perspective of the Teams Client / WebRTC protocol.

Flowmon Detects Windows DNS SIGRed Exploitation

The vulnerability called SIGRed (CVE-2020-1350) has been around for 17 years, during which time it was present in Windows Server operating systems from version 2003 through 2019 and received a maximum severity rating of 10. It was finally patched in July 2020. As the vulnerability allows an attacker to perform remote code execution on Windows Server via DNS, it poses an extremely serious danger and can propagate over the network without user interaction.

How I built a monitoring system for my avocado plant with Arduino and Grafana Cloud

A couple months ago, during our Grafana hack days, I created my first monitoring solution: my sourdough monitoring system. It was a lot of fun to build it, and I enjoyed it a lot! So when the next Grafana hack day was approaching, I started to wonder what my next monitoring system could be. What would I like to learn more about? What would I like to get better at doing? To be honest, I didn’t have to think hard.

Avoiding Ad Blockers with Forwarding Domains

Large tech companies are monetizing and exploiting customer data in increasingly unpalatable ways. It’s no surprise that users are fighting back. It’s estimated between 25% and 50% of users are employing ad blockers. Unfortunately, some overzealous ad blocking tools have added TrackJS domains to their block lists. We believe the blocks are unwarranted (more below). We don’t sell or monetize our user data. Ever.

MSP software: 10 tools MSP companies should try in 2021

2021 is sure to bring a number of challenges for your MSP—here are some of the best tools on the market today that can help your company overcome these challenges. Managed services providers (MSPs) and other businesses that deal with sensitive data on behalf of their customers and their own organization face a range of challenges.

Monitoring PHP Application Performance: A Step-by-Step Guide

Today we’re here to show you the ropes on PHP monitoring. You’ll learn how to monitor the performance of your PHP applications. But why is doing this valuable in the first place? It used to be common for application performance to be considered a non-functional requirement. Things have changed, though. Nowadays, more and more software professionals have come to think of performance not as a nice-to-have, but as the most important must-have.

Coffee Break Webinar Series: Intelligent Observability for DevOps

Amidst the nonstop pace of work to constantly evolve today’s digital business, we can forget to take a moment out to think about how it is that we’re doing that work. A new series of ‘coffee break’ webinars aims to provide that opportunity by pausing to look at the ways humans can best work with observability data. In particular, Coffee Break with Helen Beal looks at improving the work done by different types of software engineers that leverage artificial intelligence.

How to Draw Network Diagrams

A good network diagram isn’t hard to make, but it can be distressingly rare. Even network engineers with years of experience often draw network diagrams that are jumbled and hard to understand. As a network administrator responsible for the network, it’s vitally important you have a detailed understanding of your network topology. Without this information, even basic troubleshooting can be unnecessarily difficult.

Web Applications Monitoring Guide

Companies are juggling multiple servers, applications, transactions, and web services in the modern digital world. This brings out the difficulty of keeping track of it all. Here’s where web applications monitoring steps in and gives an in-depth overview of all applications’ performance. Let’s take a step back and dig into the basics of web application monitoring.

Betting on complexity and change: Q&A with Paddy Power Betfair's Teodor Olteanu

Over the past year, the pandemic has created a number of new IT challenges, spurred on by the sudden shift to remote work. On the bright side: in the wake of these developments, companies across the globe have taken huge leaps in terms of their IT strategies. One such company is Paddy Power Betfair. When the Irish betting company Paddy Power merged with Betfair in 2016, they created one of the most powerful and unique brands in the world of online gambling.

Laravel Monolog Handler for Logflare

For our API, we’ve been happily using NewRelic’s monolog enricher for a while. It sends our application logs to NewRelic at the end of each request, which keeps it light and fast so our system isn’t bothered by it. That is, until it stopped working with the upgrade to Composer 2; they knew about it for several months and still didn’t do a single thing to fix it. So I decided to move to Logflare. Logflare is a fast, light, scalable, and powerful logging aggregator.

Best practices for monitoring Microsoft Azure platform logs

Microsoft Azure provides a suite of cloud computing services that allow organizations across every industry to deploy, manage, and monitor full-scale web applications. As you expand your Azure-based applications, securing the full scope of your cloud resources becomes an increasingly complex task. Azure platform logs record the who, what, when, and where of all user-performed and service account activity within your Azure environment.

Document. Don't create.

Reading through the Traffic Secrets book by Russell Brunson, I found out about a very interesting tip from Gary Vaynerchuk: Document. Don’t create. For years, the fact that we could not be transparent about what we learn as we move along and grow was a constant thorn in my side. Even when we decided to do a marketing (a.k.a. content) push with our blog, it did not bring that much value since I, as the founder of Monitive, didn’t find the time to write.

UptimeRobot March 2021 Update: New Integrations, Heartbeat & SSL Improvements

While the world is getting more optimistic with the wave of vaccine options, we’re looking forward to introducing you to our latest news. We’re happy to say we managed to fulfill some of our last year’s promises and are still working on our plans and improving UptimeRobot. We introduced a completely redesigned mobile app and new status pages, with many new features. We entered the new year enthusiastically and kept on being busy. Let’s take a look at what’s new!

The Digital Workspace Monitoring Journey

One of the best parts of our job is enabling you to become the best #ITPerformanceHero that you can be. But behind the scenes, eG Innovations is filled with talented team members who wear their own IT Hero capes. We want to take some time to introduce you to a few of them, who make eG Innovations the leading performance monitoring company in the market. We’re pleased to introduce Babu Sundaram, Head of Product Engineering at eG Innovations.

How To - Monitor Multi-Cloud with Catchpoint

Many companies have adopted a multi-cloud environment for their services and many more plan to expand cloud usage in the future. As more VMs and workloads transition to public and private clouds, it is becoming clear that multi-cloud is becoming a standard or benchmark rather than something optional that businesses want to ‘try out’.

Using Thola for monitoring your network devices

Once upon a time there was a small company in the south west of Germany that used an old check plugin for monitoring its network devices. But as their network got bigger and bigger over time, the plugin (written in Perl) became more greedy than ever before and swallowed all of the available resources. The CPUs were melting and the RAM was about to collapse. So a small team of creative software developers decided to take the fate of their company into their own hands.

What Is NullReferenceException? Object reference not set to an instance of an object

“Object Reference Not Set to an instance of an object.” Let those who never struggled with this error message as beginner C#/.NET programmers cast the first stone. This infamous and dreaded error message appears when you get a NullReferenceException. This exception is thrown when you try to access a member—for instance, a method or a property—on a variable that currently holds a null reference. But what is a null reference?

JSON to InfluxDB with Telegraf and Starlark

Data platforms — or databases with sets of APIs for flexibly working with data — are quintessential backbones for those who rely heavily on being able to change how they obtain data and work with their data over time. A good data platform will provide you the necessary tools to glean the insights you need to solve tangible problems. That platform should also hopefully make it so you don’t have a bad time doing it!

Desktop Central featured in 2021 Gartner Peer Insights Customers' Choice for UEM

At ManageEngine, customer satisfaction is not just a promise, but also a driving force behind everything we do. From resolving bugs to delivering a seamless experience, we always look forward to hearing what our users think about our solutions. That’s why we’re delighted to announce that ManageEngine Desktop Central has been recognized as a Gartner Peer Insights Customers’ Choice for Unified Endpoint Management Tools. To all of our customers who reviewed us, we want to say thank you!

Monitor debugging data with NerdVision's integration in the Datadog Marketplace

NerdVision is a live debugging platform that enables users to take snapshots of their application’s state at runtime. NerdVision is compatible with .NET, Java, Node.js, Python, and ColdFusion applications—no matter where they are hosted—and doesn’t require any changes to the source code.

How to build a CSS-only responsive website navigation

In the new light of website performance that I’m pursuing, I have learned to avoid JavaScript at all costs. Here’s a nice JavaScript-less desktop and mobile navigation update that we’ve added to our website. Inspired by Dirk Olbrich’s Hugo Starter Theme with Tailwind CSS, this works by displaying a regular navigation bar on landscape tablets and desktop resolutions, but changes into a nice dropdown on mobile resolutions.

How to use Glouton as Nagios NRPE Daemon

When using Nagios, the NRPE daemon has been the traditional solution to implement local checks (load, number of users, custom scripts, etc.). All other checks are performed remotely from the Nagios server. The NRPE daemon has been a bit challenging, as you need to keep it in sync with your Nagios server, and sometimes backporting this daemon can be painful. As Glouton has been implemented in Go, when you need a Nagios NRPE daemon, you can just use the binary on any compatible system and voilà.

Kubernetes Logging Simplified - Pt 1: Applications

If you’re running a fleet of containerized applications on Kubernetes, aggregating and analyzing your logs can be a bit daunting if you’re not equipped with the proper knowledge and tools. Thankfully, there’s plenty of useful documentation to help you get started; observIQ provides the tools you need to gather and analyze your application logs with ease.

Cloud Native Goes Native with Charity Majors and David McKay

Cloud-native and serverless technologies are gaining traction as organizations increasingly recognize the value of containers and Kubernetes in application development environments. As a result, the cloud-native ecosystem is growing at a healthy pace. In this topic spotlight, we take a look at the cloud-native landscape and discuss its impact on DevOps, application security and more. Some of the issues discussed during the webinar include.

Best Practices for Monitoring Your End-User Experience - SolarWinds Lab Episode #94

Knowing if a server has high CPU is helpful, but does it really matter if your end users can still access their apps without performance issues? If you're only monitoring the server side, then you don't have a complete picture of your environment. End-user monitoring can be an extremely valuable tool, if you know how to use it. Join Product Manager Katie Cole and Head Geek Patrick Hubbard as they dive deep into best practices for Pingdom®, the SaaS-based, end-user experience monitoring tool from SolarWinds.

Why we're partnering with Elastic to build the Elasticsearch plugin for Grafana

As I’ve often talked about before, we have a “big tent” philosophy at Grafana Labs. We believe our users should determine their own observability strategy and choose their own tools; Grafana allows them to bring together and understand all their data, no matter where it lives. In practice, that means that we want to support data sources that our users are passionate about.

Microservices vs APIs: One Doesn't Always Imply the Other

When it comes to conversations around application architecture or working with integrations between applications, you’ve likely heard a couple of terms pop up a few times: microservices and APIs. You might also have run across the common misconception that microservices are just a way to implement APIs so they can communicate with each other. As you’ll see in this article, there are alternative ways to architect our microservice applications.

The future of SCOM

'SCOM is here to stay'. Please read the following extract from an article written by Richard Benwell, CEO at SquaredUp. Gartner predicts that 30% of Enterprise IT spending will be on cloud and outsourcing by 2023. That’s 70% spending on non-cloud and outsourcing – the in-house infrastructure and software we have today. And if you’re running traditional Windows and Linux workloads at scale, SCOM remains the best tool on the market.

Debugging Development Logs with Papertrail and rKubeLog

It’s important to ensure the logging and monitoring of a service is as consistent across environments as the code itself. However, it can be expensive and cumbersome to test the logging functionality with the usual required log exporters, database infrastructure, and processing requirements of normal production-grade solutions.

SQL Sentry Events Log Updates Provide a Centralized View of Events

The SQL Sentry Environment Health Overview (EHO), which is part of the dashboard shown on the Start page, enables you to see all the conditions that have fired alongside the overall health of your database environment. We understand how useful it is to be able to quickly review the health information without having to dig deep into performance data, and we’re excited to announce a few enhancements to the EHO, Events Log, and Actions Log available in the SQL Sentry 2021.1 release.

Your Performance Data, Your Way With Custom Charts in SentryOne Portal

Our product and engineering teams have spent a significant amount of time over the past year working on a new dashboard experience in SentryOne® Portal to give you the upper hand in monitoring your servers and diagnosing performance issues. Providing control over the way data is displayed is one of our most requested features, and we’re excited to satisfy this request with custom charts.

New Pandora FMS features and improvements

Today we are here to make a small compilation of the new Pandora FMS features launched throughout this last year 2020, a general review of all the small advances that we incorporated and that will be useful when you are at the controls of our software. Mainly we are going to deal with new features but also great improvements in quality of use that we added to Pandora FMS throughout 2020.

A Day in the Life: Intelligent Observability at Work with an ITOps Hero

This is the second in a series of blog posts exploring the role that intelligent observability plays in the day-to-day life of smart teams. In this post, meet our clever ITOps engineer, James, as he reduces noise and distraction using intelligent observability.

Observability and Monitoring for Modern Applications

I drive a 2005 Ford diesel pickup truck. Most of the time my truck runs great. But occasionally an orange light on the dashboard will flicker on to alert me that something is wrong. Unfortunately, there’s no information about what is wrong and why. My truck has a monitoring solution, but not an observability solution. In many cases, IT has the same problem as my truck.

Observability vs. Monitoring: What's the Difference?

One of the more delicate debates in the DevOps world is what observability has to do with monitoring. Is observability just a trendy buzzword that means the same thing as monitoring? Is observability an improved version of monitoring? Are monitoring and observability different types of processes that solve different problems? The answer to those questions depends in part on your perspective.

Service Map & Dashboards Provide Insight into Health and Dependencies of Microservice Architecture

In almost every blog you read about monitoring, troubleshooting, or more recently, the observability of modern application stacks, you’ve probably seen a statement saying that complexity is growing as demand for more elasticity increases, which makes management of these applications increasingly difficult. This blog will be no exception, but there’s a good reason for that: we just enabled the first Sumo Logic customers with powerful new tools to tackle these exact challenges.

Your guide to SSL certificates as an online customer

We’re all familiar with the internet, especially since we use it to do almost all of our daily activities. Since the days of that familiar buzzing noise of AOL dial-up as it connected to somewhere out there in the stratosphere, we’ve been hooked on the internet and its vast space that holds endless amounts of information, ready for us to tap into right at our fingertips.

Centralized Log Management for Cloud Streamlines Root Cause Analysis

Cloud services make the daily tasks of business easier. They enable remote workforce collaboration, streamline administrative tasks, and reduce capital costs. However, these “pros” come with a few “cons.” The IT stack’s increased complexity means staff work across divergent log management tools when something breaks. Centralized log management for the cloud makes root cause analysis easier by aggregating all event log data in a single location.

Control Your Logging Spend With Usage Quotas

We built LogDNA around the idea that developers are more productive when they have access to all of the logs they need, when they need them. However, we also know that log management can get expensive fast. And, for anyone who owns the budget for developer tools, logs can be an unpredictable line item as you try to determine your monthly, quarterly or even annual spend.

10 Server Maintenance Tips for Efficient Server Maintenance

Just like your commercial vehicles or HVAC systems, servers require regular maintenance to ensure they are operating effectively and optimally. So, we decided to compile a list of server maintenance tips you should be doing. Keep in mind, these server maintenance tips are meant to be used as a guide to help you develop your server maintenance checklist and schedule.

A CIO's View of Observability: The Key to Balancing Strategic and Operational Needs

IT executives are being invited to play critical, strategic roles in the enterprise. The combination of disruptive threats, transformational momentum, and the pandemic that accelerated both has thrust you into the limelight. But these same drivers have also made your job exponentially more challenging. The need for technology to play a strategic role in every nook and cranny of the enterprise has resulted in a far-flung, ever-more-complex, and dynamic technology stack that you must operate flawlessly to deliver competitive advantage.

Script Writing: A Crash Course on User Journey Selectors

User Journeys are a powerful tool for ensuring key processes across your site are working correctly. They follow a scripted set of instructions to interact with your pages like a human visitor does – and to identify issues as they come up. We offer a “Managed Service” for looking after your User Journey scripts, or you might prefer to use our script builder to write your own.

With Flutter and Sentry, You Can Put All Your Eggs in One Repo

This month we’re updating several of our mobile SDKs. You might think it’s madness… Mobile March Madness. First up is Flutter. It’s fair to say that all of us have had a bad mobile experience which frustrated us enough to warrant switching apps. Getting the experience right requires a lot of work due to the variety of OSes, screen sizes and orientations.

Users Monitoring - Monitor Remote Users and Analyze Efficiency

What the digitalization trend did not manage to do, the global pandemic did: it triggered a global switch to remote work, remote learning and remote collaboration. Many companies and industries thought that allowing their staff to work remotely was not possible and would do more harm than good. But when the pandemic started, many organizations needed to make a fast transition to remote work and to organizing work remotely.

Write Prometheus queries faster with our new PromQL Explorer

We are announcing the new PromQL Explorer for Sysdig Monitor that will help you easily understand your monitoring data. The new PromQL Explorer allows you to write PromQL queries faster by automatically identifying the common labels among different metrics. It also allows you to interactively modify the PromQL results by using the visual label filtering.

Start Kubernetes monitoring in 5 minutes with Netdata

While Kubernetes (k8s) might simplify the way you deploy, scale, and load-balance your applications, not all clusters come with "batteries included" when it comes to monitoring. Doubly so for a monitoring stack that helps you actively troubleshoot issues with your cluster. You need robust Kubernetes monitoring, but you don’t want to spend a week setting it up, much less a single valuable day.

Seven KPIs for AIOps

Leaders looking to measure the benefits of AIOps and build key performance indicators (KPIs) for both IT and business audiences should focus on key factors such as uptime, incident response, remediation time and predictive maintenance, so that potential outages affecting employees and customers can be prevented. Business KPIs connected to AIOps include employee productivity, customer satisfaction and web site metrics such as conversion rate or lead generation.

Analyze your tracing data any way you want with Sumo search query language

It’s been almost a year since I shared some thoughts about distributed tracing adoption strategies on this blog. We have discussed how log vendors and application performance management (APM) vendors take different approaches in the market and how important it is to allow users to analyze the data, including custom telemetry, the way they want.

The Cost of Racing Toward Success

LogDNA recently celebrated 5 years since our launch in Y Combinator and during this half-a-decade we’ve learned several lessons about balancing cost and scalability. As a founder, here are the top 3 things I wish someone had told me as we were racing towards success. The appeal of building a cloud-native application for a startup is a no brainer—it’s agile, scalable, and can be managed by a distributed team. Not to mention, it’s the cheapest way to get off the ground.

ManageEngine makes the cut again for unified endpoint management

There’s no doubt in my mind that the Gartner Midmarket Context: Magic Quadrant report is the most important of all Magic Quadrants up to this point. With COVID-19 forcing a large part of the workforce worldwide to move from their offices to work-from-home environments, unified endpoint management and security have been essential in enabling businesses to continue to operate securely. At ManageEngine, we are constantly evolving our solutions to meet these dynamic market needs.

How to View Office 365 Service Health in CloudReady?

In today’s modern IT world, enterprises are looking to not only streamline their global monitoring operations but also facilitate access to reporting capabilities that provide insight into usage, uptime, and availability of SaaS services. Covid disrupted the work culture and our daily lives. With so many of us working from home, IT leaders and executives are now more than ever interested in ensuring that the cloud services their team relies on are available.

Monitor Juniper network devices with Datadog

Juniper Networks provides a range of IT network and security devices, including routers, switches, access points, and firewalls. As you scale your on-prem infrastructure with potentially thousands of devices distributed across multiple locations, getting visibility into your entire network can easily become a pain point.

Accelerate your logs investigations with Watchdog Insights

If you’re investigating an incident, every minute means degraded performance or even downtime for customers. The causes of an issue often come from parts of your systems and applications that you would not think to check, and the sooner you can bring these to light, the better.

How Grafana and Prometheus work together

Let us look at how Grafana and Prometheus work together for monitoring metrics. Application monitoring is a crucial feature for any successful software offering. Application monitoring in its simplest form refers to collecting metrics on an application and using those metrics to gain insight that improves the performance and efficiency of the application. Think of it as a cycle. Grafana and Prometheus are probably the most prominent tools in the application monitoring and analytics space.

How hard is it to run your own Graphite?

How hard is it to run your own Graphite? Here’s one story of what someone had to go through! In this article, I want to share my story about what I went through to run my own Graphite and give some advice on how much you should be prepared both mentally and technically... ESPECIALLY if you are not a Graphite expert.

How to monitor AWS Lambda

How do we get started on monitoring AWS Lambda? Let me first introduce you to the term serverless computing. Whether you have been in the tech industry only a few months or you started writing code when Pascal was still considered cutting edge, you have probably heard the term serverless computing thrown around in recent times. But what exactly is serverless computing?

Icinga for Windows: Management Console Preview (Experimental Feature)

Today we are very excited to share with you our new experimental feature for Icinga for Windows: the Management Console. Our goal with this feature is to make the entire configuration and management of the Icinga Agent, as well as the installation, distribution and automation, as easy as possible for all Icinga for Windows components. Let us know what you think about this feature!

Correlate Your Metrics, Logs & Traces with the curated OSS observability stack from Grafana Labs

Correlation between metrics, logs, and traces should be as effortless as possible. This helps you make better decisions and take better actions. The Grafana Labs open-source observability stack enables powerful correlations between your metrics, logs, and traces. The key here is to have consistent metadata across the three pillars of observability. Let me show you how this works in this video.

New Feature: Searching Through Samples in AppSignal

Your wishes are being granted. Search is now available for all AppSignal customers. 🎉 You can now quickly find specific samples inside of AppSignal. This is especially useful when searching for an error/slow request for a particular customer in a specific revision or request ID. You can access the search in AppSignal from any screen in an application. It’s located in the dark top bar. Let’s take a look at the new search in action.

Announcing Alert Grouping for the AIOps Early Warning System

Available for Enterprise and Enterprise MSP customers, the new Header Graph (Beta) feature is being rolled out in the v148 release. This time-series graph allows for easy alert grouping to cut down troubleshooting time and quickly identify the resources that are causing an alert storm.

New in Grafana 7.4: Export usage data to Loki to help manage dashboard sprawl and troubleshoot faster

We first released the usage insights Enterprise feature in Grafana 7.0 based on feedback from customers that they would like to better understand how their users are interacting with Grafana, including the dashboards they visit, the information they query, and where they run into issues. What we learned was that dashboard sprawl is a real issue: Administrators estimate that almost 60% of dashboards might not be used at all.

All together now: Bringing your GKE logs to the Cloud Console

Troubleshooting an application running on Google Kubernetes Engine (GKE) often means poking around various tools to find the key bit of information in your logs that leads to the root cause. With Cloud Operations, our integrated management suite, we’re working hard to provide the information that you need right where and when you need it. Today, we’re bringing GKE logs closer to where you are—in the Cloud Console—with a new logs tab in your GKE resource details pages.

Metricbeat Deep Dive: Hands-On Metricbeat Configuration Practice

Metricbeat, an Elastic Beat based on the libbeat framework from Elastic, is a lightweight shipper that you can install on your servers to periodically collect metrics from the operating system and from services running on the server. Everything from CPU to memory, Redis to NGINX, etc… Metricbeat takes the metrics and statistics that it collects and ships them to the output that you specify, such as Elasticsearch or Logstash.

Doubling Down: What It's Like Contributing to Open Source at Logz.io

Logz.io has always prided itself as a company pushing the use of open source tech. As we have moved to expand our reach with metrics and traces over the past year and a half, we have doubled down on our own contributions to the community. With (distributed) traces in particular, we have been able to forge ahead. Our relationship with the teams at Jaeger and OpenTelemetry has really blossomed (and we are kind of proud to have supported the latter in the run-up to the OpenTelemetry v1.0 release).

How to Monitor Microsoft SharePoint Online Performance

In this video, we’ll cover the basics of getting started with Exoprise CloudReady and how to set up your first sensor to monitor Microsoft 365 SharePoint from your own locations or behind the firewall. You will learn how to quickly install the management client, add a private site, deploy a SharePoint sensor and visualize the data in the CloudReady platform all in under 5 minutes. CloudReady supports deploying private sensors behind your firewall or public sensors in the cloud for synthetic transaction monitoring. Service watch, on the other hand, can be deployed for real user monitoring of remote user issues.

Most Essential Tools Help Track Website Performance

Websites and web applications are the modern equivalents to storefronts, business cards, road show booths, newspapers, markets, bulletin boards, software installed on the client’s machine, and much more. Being a business-critical component, and sometimes the business itself in the case of SaaS applications, a website or app experiencing any downtime or disruption can have serious financial implications (aka clients and prospects leaving).

5 Reasons You Don't Need a Management Title to Be a Leader

Leaders can be categorized, somewhat, by their motivation. I recall an offsite meeting a few years ago that began with everyone contemplating the question, "Why do you lead?" There were about 15 people in the room, and each attendee could be grouped in one of the following categories: Nothing is wrong with any of these categories necessarily. They’re all concepts that can drive either positive or negative outcomes.

What is Observability

As IT environments become more complex, enterprises running business-critical workloads in dynamic environments need to ensure the performance and reliability of their applications. This is where observability comes in. Observability is the ability of the internal states of a system to be inferred from external outputs. Without it, your team’s productivity could be greatly diminished.

SRE Survey 2021: Where do we go from here

What a difference a year makes. In a matter of 365 days, the entire planet stared down at uncertainty, and while most of the world is far from recovered, we are starting to see a time where some level of normalcy will return. But what will this look like? How will the past year transform our social interactions, our time out of the house, and how we conduct business?

Refine Your Observability Experience at Scale

Today, we announced that Refinery is now generally available. With Refinery, it’s now easy to highlight the critical debugging data you need and to stop paying for the rest. Refinery is a sampling solution that lets you control resource costs at scale without sacrificing data fidelity. Support for Refinery is now also included in Honeycomb Enterprise plans.

Grouping AWS Lambda functions with Dashbird Project View

One of the serverless best practices is one-purpose functions. You should keep your Lambda functions small and solve exactly one use-case. This way, you can optimize them better and keep potential security problems contained. But creating many small functions can get overwhelming quickly. Even small projects can end up with more than 20 Lambda functions.

Enriching Splunk Contact Center Analytics with uberAgent Endpoint Monitoring

Like many other industries, contact centers are increasingly relying on employees working from home. The WFH trend poses new challenges, but it also surfaces issues that were largely ignored before. This article explains how holistic monitoring with Splunk Contact Center Analytics and uberAgent helps drive exceptional customer service.

Exploring the Value of your Google Cloud Logs and Metrics

With our ability to ingest GCP logs and metrics into Splunk and Splunk Infrastructure Monitoring, there’s never been a better time to start driving value out of your GCP data. We’ve already started to explore this with the great blog from Matt here: Getting to Know Google Cloud Audit Logs. Expanding on this, there’s now a pre-built set of dashboards available in a Splunkbase App: GCP Application Template for Splunk!

How to Set Up Cisco AnyConnect for Your VPN

Because the world continues to work from home this year, I’ve had to configure Cisco AnyConnect VPNs on ASA firewalls for clients a few times. Unfortunately, the documentation from Cisco is extremely confusing, and I’ve seen a lot of organizations that do it wrong (by which I mean insecurely). The process itself is quite simple, though, so let’s go through the steps you’ll need to configure Cisco AnyConnect for your VPN.

Analyze JMX to Better Assess The Health Of Your Java Applications

Java Management Extensions, or JMX, was first added to J2EE, and it has been part of J2SE since the 5.0 release. The JMX API aims to provide a standard for monitoring and managing Java-enabled applications and services. In this article, we will explain the JMX architecture and show you how to pull the metrics that it generates into your Sumo Logic account in order to gain unique insights and a more thorough understanding of the health of your application and services.

Microsoft Explores the 'Future of Mixed Reality' at IT Conference

Early this month, Microsoft will host the second part of its Ignite conference aimed at IT professionals and developers. The livestreamed event, which begins on March 2, will feature a number of exciting trends in technology innovation, but the industry has its sights on one element in particular: a presentation on mixed reality.

Key metrics for monitoring AWS Fargate

AWS Fargate provides a way to use AWS container orchestration services—Amazon Elastic Container Service (ECS) and Amazon Elastic Kubernetes Service (EKS)—without needing to provision and maintain the infrastructure that runs your containers. Fargate is similar to serverless container platforms from Google (Cloud Run) and Microsoft (AKS virtual nodes).

How to collect metrics and logs from AWS Fargate workloads

In Part 1 of this series, we showed you the key metrics you can monitor to understand the health of your Amazon ECS and Amazon EKS clusters running on AWS Fargate. In this post, we’ll show you how you can collect those metrics, along with logs, from your Fargate workloads. You can use Amazon CloudWatch and related AWS services to gain visibility into your ECS clusters and the Fargate infrastructure that runs them.

AWS Fargate monitoring with Datadog

In Part 1 of this series, we looked at the important metrics to monitor when you’re running ECS or EKS on AWS Fargate. In Part 2 we showed you how to use Amazon CloudWatch and other tools to collect those metrics plus logs from your application containers. Fargate’s serverless container platform helps users deploy and manage ECS and EKS applications, but the dynamic nature of containers makes them challenging to monitor.

Sponsored Post

Microsoft Teams Optimization for a Remote Workforce

Microsoft Teams is everywhere. Not surprisingly, during the pandemic, the number of daily active users for Teams increased to 75 million in 2020. More and more people are working from home, and companies are becoming virtual. In-person meetings are fading, and Teams is poised to become the next best collaboration tool. According to a Riverbed study, 64% of US employees are now working from home because of the Covid pandemic. In turn, Microsoft Teams optimization has become a critical topic for Operations and Network personnel.

Announcing AppSignal for Ruby Gem 3.0!

We’re very happy to present you with version 3.0 of AppSignal for Ruby - a new major release for the Ruby gem. 🎉 We have changed the way we instrument apps and gems to provide better compatibility with other instrumentation gems. Support for Ruby version 1.9 has been removed and deprecated classes, modules, methods, and instrumentations have also been removed. Read our upgrade guide! In the rest of the post, we’ll explain what the new version of our gem brings to you and your apps.

Azure Management Talk: Application Observability in a Distributed world

In this session, Chris Reddington will provide an overview of Application Insights and how it slots into the wider Azure Monitoring ecosystem. We will explore Alerts, Metrics, Queries, Dashboards, Workbooks and more, and how Application Insights can bring clarity to a distributed cloud deployment.

LogicTalks: Why ATSG's Approach to Monitoring Matters

In this episode of LogicTalks, Michael Tarbet, VP of Sales at LogicMonitor, is joined by Scott Mayers, Sr. VP of Cloud and Managed Solutions at ATSG. The pair connect to discuss why LogicMonitor is invaluable to ATSG’s daily operations as a managed services provider. From keeping tabs on tens of thousands of endpoints, to consolidating a plethora of monitoring tools into one platform for greater visibility and ease of use, to leveraging AI-powered alerting and forecasting, LogicMonitor provides ATSG with the enterprise-grade SaaS monitoring solution it needs to support its customers 24/7, worldwide.

Dashboard anything for free: SquaredUp Dashboard Server Public preview available now

How would your IT team be transformed if you could dashboard anything, for free? If you’re an IT pro and want an enterprise dashboarding tool that’s quick to implement, easy to set up, and effortless to maintain, you need SquaredUp Dashboard Server! The public preview of SquaredUp Dashboard Server just went live! Dashboard Server functions independently of SCOM and Azure and introduces a new PowerShell tile to take dashboarding a big step further. Now you can dashboard virtually any data.

Logging Errors in Web Workers

Release 3.8.0 of the TrackJS browser agent added support for Web Workers, which adds some awesome new observability to the background tasks of your web applications. Many development teams have adopted Web Workers in their web applications to add offline support, caching, or to process heavy tasks. Workers allow web apps to feel faster by removing work from the user interface thread.

Expert Guide to Redis Monitoring

Redis is an open source, in-memory data structure store with blazing performance that’s used as a database, cache, and message broker. Redis is licensed under BSD (Berkeley Software Distribution), which means it can be used for free with some minimum use restrictions. It supports a good number of abstract data structures, such as strings, maps, lists, and so on. Used as a database to store data, Redis is fast.

Gain Visibility into Performance Across Your Data Estate with SQL Sentry Premium Edition

Earlier this year, we announced our plan to release new SQL Sentry editions that would help data professionals not only get started easily with SQL Sentry but also gain visibility across their data estate. We ended up making some tweaks to our SQL Sentry editions following that announcement, and we are excited to introduce both a brand-new edition and an updated edition of our flagship database performance monitoring solution.