
HAProxyConf 2019 - Fully-Automated Deployment of Anycasted Load Balancers with HAProxy and Python

Keeping your service configuration aligned over hundreds of hosts is never a simple task. This talk will illustrate how the University of Paderborn automated the integration of HAProxy into our infrastructure. As our current generation of load balancer appliances approached the end of life and we thought about improving how we managed our services, our goal was clear: we needed a scalable, consistent, active-active setup of load balancers that could be easily automated with open-source tools. We achieved scalability with Anycast but needed to make sure the configurations could keep up with application changes.

Maintaining consistency in codebases with Go Vet

Maintaining success in a large open source project is one of the key objectives of Mattermost. We have hundreds of contributors, and we want to create a project that can serve as a model in the Go community. To that end, following idiomatic Go principles is what we care about most in keeping our code consistent. For this task we used go vet, and in this blog post I would like to explain how we pushed the limits of the tool by extending it.

How to Efficiently Detect Domain Generation Algorithms (DGA) in Kubernetes with Calico Enterprise

2020 is predicted to be an exciting year with more organizations adopting Kubernetes than ever before. As critical workloads with sensitive data migrate to the cloud, we can expect to encounter various Advanced Persistent Threats (APTs) targeting that environment.

Serverless CI/CD: How we added a staging step

Unit tests and integration tests are vitally important, but sometimes even those aren’t sufficient to ensure that critical services in your application will function smoothly in production. In those cases, adding a staging step to our CI/CD process allows us to test a feature with real data in a less supervised environment. For example, here at Lumigo we decided to use it for our Node.js tracer.

How We Use PagerDuty for Emergency Response

PagerDuty is known as the platform for driving real-time work, and with the current global spread of COVID-19, many of our customers have been asking how we leverage PagerDuty internally to intelligently coordinate a response to emergency situations (such as this) as they arise. PagerDuty customers primarily leverage our platform for coordinating an incident response process when technical issues happen, such as a bad deployment, network degradation or failed hardware.

How to get started with Elasticsearch Service on AWS GovCloud

We’re happy to announce the beta availability of our new government region, AWS GovCloud (US East), for the Elasticsearch Service on Elastic Cloud. This new region is our first step in simplifying operations for Elastic users who handle government data as we work toward gaining a Moderate authorization for the Federal Risk and Authorization Management Program (FedRAMP).

Jaeger data analytics with Jupyter notebooks

In the previous blog post, "Data analytics with Jaeger aka traces tell us more!", we introduced our data science initiative and platform. The ultimate goal is to develop new functionality within the Jaeger project based on AI/ML that will provide new insights into our applications. This type of functionality is also referred to as AI operations (AIOps). Jupyter notebooks provide a simple user interface for experimenting with data.

How Auvik Can Help Keep Networks Steady During the COVID-19 Pandemic

Your phone is ringing off the hook. Another client is calling, trying to figure out how to ensure their team can continue to be productive in the wake of the COVID-19 pandemic. Even though these are extraordinary times, your role as the IT professional hasn’t really changed. Your job is to make sure your clients are productive and they can still access the resources, tools, and applications they need to do their jobs.

Announcing Ticketing

Incidents come up quickly, and it can be challenging to track the critical tasks that must be done in the moment and after an incident is resolved: who did what during the incident, and which tasks still need to be completed. In an effort to continue simplifying your incident response process, today we are happy to announce an overhaul of ticketing and task tracking on FireHydrant, along with a major overhaul of our JIRA integration.

The Top 9 Best Practices for Monitoring Your Server

Has your phone gone off in the middle of the night because your boss is calling to say the server is down? Maybe you have woken up to a ton of text messages saying that something is wrong with the server? If you have encountered this, then you know the importance of monitoring your server so that you are not the last to know when there is a problem. Part of server monitoring is putting best practices in place to ensure you are prepared for the unexpected.