Operations | Monitoring | ITSM | DevOps | Cloud

Latest Blogs

Reliability Monitoring For Improved Digital Experience

Monitoring methodologies evaluate application reachability, availability, performance, and reliability to measure digital experience accurately. Measuring only one of these offers a skewed view of the end-user experience. For example, high availability is not the sole indicator of a good end-user experience. At the same time, reliability is a critical performance indicator for service providers.

SaaS vs onPremise: Pros, Cons and Cost Analysis

We’re not saying that you are on cloud nine, only that you are most likely using the cloud. If you use Google mail or the Microsoft Office 365 suite, or you take a photo with your cell phone and it is automatically uploaded to iCloud or something similar, you are using the cloud.

ProblemChild: Detecting living-off-the-land attacks using the Elastic Stack

When it comes to malware attacks, one of the more common techniques is “living off the land” (LOtL). Utilizing standard tools or features that already exist in the target environment allows these attacks to blend into the environment and avoid detection. While these techniques can appear normal in isolation, they start looking suspicious when observed in the parent-child context. This is where the ProblemChild framework can help.

Careers at a Crossroad: Staying Technical vs. Heading into Management

There’s a point in every IT professional’s career where they inevitably ask themselves, “do I want to stay technical, or get into management?” Sometimes this point occurs when they find themselves already in management, either by design or, as I like to say, “by accident.”

Elixir SDK for ConfigCat

One of the great things about SaaS applications is that users on the platform automatically have access to any available software updates. Yet having a beta program requires a separate environment, creating a potential challenge for users and development teams. In this context, having a tool where you can control features and flag certain users is important, because sometimes features are too early or not relevant for all users.

Security Log Management Done Right: Collect the Right Data

Nearly all security experts agree that event log data gives you visibility into, and documentation of, the threats facing your environment. Even knowing this, many security professionals don’t have the time to collect, manage, and correlate log data because they don’t have the right solution. The key to security log management is to collect the correct data so your security team can get better alerts to detect, investigate, and respond to threats faster.

Announcing the LogDNA and Sysdig Alert Integration

LogDNA Alerts are an important vehicle for relaying critical real-time pieces of log data within developer and SRE workflows. From Slack to PagerDuty, these Alert integrations help users understand if something unexpected is happening or simply if their logs need attention. This allows for shorter MTTD (mean time to detection) and improved productivity.

SRE vs. DevOps [Understanding Differences & Similarities]

Site Reliability Engineering (SRE) and DevOps share a goal of building a bridge between development and operations. We'll explore and compare both approaches. Wondering which is better for your company, SRE or DevOps? Neither SRE nor DevOps is “better,” exactly, since they’re similar yet different in a few key ways. SRE is a methodology developed by Google engineer Ben Treynor Sloss in 2003.

Make your Onboarding Experience Better with a Murder Mystery Game

Onboarding a new tool can be boring. Or stressful. Or both. When onboarding an incident response tool, it can be difficult to make sure that your team is getting the most from the experience. Do you opt for a run-of-the-mill meeting, or try to learn while in an incident? Neither option is ideal. That’s why Petal’s DevOps Engineer Michael Cole found a new way to get his team using Blameless for their incident response process.