Amazon FSx for Windows File Server is a fully managed file storage service built on Windows Server. Migrating on-premises Windows file systems to a managed service like FSx enables organizations to reduce operational overhead and take advantage of the flexibility and scalability of the cloud. However, visibility into file access activity across the environment is key to meeting security and compliance requirements, particularly in sectors such as financial services and healthcare.
So you finally launched your service worldwide. Great! Next, you’ll see thousands of people flooding into your amazing website from all corners of the world, expecting the same experience regardless of their location. This is where things get tricky. Building infrastructure that supports your service’s expansion across the globe without sacrificing user experience is going to be really tough, because distance introduces latency.
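The sketch below puts rough numbers on the distance problem. It estimates the minimum round-trip time that physics alone imposes. It assumes light in optical fiber travels at roughly 200,000 km/s, about two-thirds of its speed in a vacuum. The 5,570 km figure is an approximate New York to London great-circle distance, chosen for illustration. Real routes are longer, and routing, queuing, and server processing add further delay on top.

```python
# Approximate speed of light in optical fiber (about 2/3 of c), in km per second.
# This is a rounded assumption; the real value depends on the fiber's refractive index.
FIBER_SPEED_KM_PER_S = 200_000


def min_rtt_ms(distance_km: float) -> float:
    """Physical lower bound on round-trip time over fiber for a one-way distance."""
    one_way_s = distance_km / FIBER_SPEED_KM_PER_S
    return 2 * one_way_s * 1000  # there and back, converted to milliseconds


if __name__ == "__main__":
    # Roughly New York to London (illustrative great-circle distance).
    print(f"{min_rtt_ms(5570):.1f} ms")  # 55.7 ms
```

A user that far from a single-region deployment pays about 56 ms on every round trip before any server work happens. A page load typically needs several round trips, for DNS, the TCP handshake, and the TLS handshake. That is why serving users from nearby regions or edge locations matters.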
Running any application in production assumes that reliable monitoring is in place, and serverless applications are no exception. As modern cloud applications become more distributed and complex, monitoring availability, performance, and cost gets increasingly difficult. Unfortunately, cloud providers don’t offer much for this out of the box.
Incident management is the process of identifying and resolving problems that occur in IT services. Incident management metrics are also used to measure the health of the IT service desk. Let’s discuss what incident management is, why it matters to your business, and how you can apply it to your organization.
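One widely used incident-management health metric is mean time to resolve (MTTR), the average time from when an incident is detected to when it is resolved. The sketch below computes MTTR from a list of incidents. The `Incident` record and its field names are hypothetical. Teams also define the start and end points differently, for example detection versus report, or mitigation versus full resolution.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record: when the incident was detected and when it was resolved.
    detected_at: datetime
    resolved_at: datetime


def mean_time_to_resolve(incidents: list[Incident]) -> timedelta:
    """Average of (resolved_at - detected_at) across all incidents."""
    if not incidents:
        raise ValueError("MTTR is undefined for an empty list of incidents")
    total = sum((i.resolved_at - i.detected_at for i in incidents), timedelta())
    return total / len(incidents)


if __name__ == "__main__":
    t0 = datetime(2021, 7, 1, 9, 0)
    incidents = [
        Incident(t0, t0 + timedelta(minutes=30)),
        Incident(t0, t0 + timedelta(minutes=90)),
    ]
    print(mean_time_to_resolve(incidents))  # 1:00:00
```

Tracking this value over time, rather than as a single number, is what shows whether process changes are actually shortening resolution.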
The world of work has changed dramatically over the last year and, at Redgate, we’ve been reflecting on what this might mean for our business, and the ways that we work together in the future. From 2022 onwards, we’re aiming to offer Redgate staff a flexible-hybrid working model across our global organization.
Another day, another drama! This one, though, is very much of my own making. I have been wanting to try my hand at a bit of chaos engineering for some time now, but C&Js just hasn’t been ready. Sarah’s been up for it too, though, at Animapanions. And now that our CIO, Charlie, has seen MTTR drop across every single technology team, thanks to the rollout of Moogsoft and the new incident management system (kudos to James), it’s pilot day.
The “shared responsibility” model of the cloud puts most of the control with Microsoft, despite the MSP being responsible. As shown below, Microsoft puts very little responsibility into the hands of the customer (or, in your case, the MSP), which is why most MSPs stick to the basic onboarding-related tasks as their Microsoft 365 offering. But the customer’s reliance on Microsoft 365 has changed in the last 14 months… and so have their expectations.
With Kubernetes emerging as a strong choice for container orchestration at many organizations, monitoring Kubernetes environments is essential to application performance. In the era of cloud computing and as-a-service delivery models, the impact of poor application or infrastructure performance is more significant than ever. How many of us today have more than two rideshare apps or more than three food delivery apps?
The Splunk Threat Research team has researched two of the current payloads, Conti and REvil, involved in these heinous campaigns against healthcare and first responder organizations. In the first blog, we explored the REvil ransomware group, and in this blog, we will explore Conti.
Although the fundamental concepts of site reliability engineering are the same in any environment, SREs must adapt practices to different technologies, like microservices.
As we indicated in our previous blog, AIOps (Artificial Intelligence for IT Operations) refers to the application of machine learning analytics technology that enhances IT operations analytics.
Establishing a Cloud Center of Excellence (CCOE) is an important milestone in every company’s cloud computing journey. While it is usually not the first milestone — which is focused on delivering value to customers — this milestone is reached as the company grows and scales. Turning ad-hoc cloud initiatives into a company-wide strategy, the CCOE helps the business to manage the cloud process and adopt best practices.
If using ML to optimize your SaaS application is not part of your cloud journey, you are in for a BIG surprise: wasted money, declining top- and bottom-line growth, and a team exhausted by the vicious loop of triaging SLO violations every time a new deployment hits production.
Shifting security left means that you, the developer, catch and fix vulnerabilities and license violations early in the SDLC. That’s why Xray scans binaries pushed to Artifactory by your builds and alerts you when there are issues with your dependencies. But catching issues even earlier, before checking in code, can be important for developers shifting left.
I have been involved in startups my whole career, and I want to know whether the democratization of IT spending is still true. Five or six years ago, when I was a consultant in the then-infant DevOps space, it seemed a transformation was taking place in the way enterprises acquire technology.
Today we’re announcing a new feature of VMware Tanzu SQL called Data Management for VMware Tanzu. It offers a convenient user interface that simplifies the operation, automation, and scalability of Tanzu SQL databases (Postgres and MySQL); this same convenience is also available from a comprehensive set of APIs.
Continuous integration seems like a smart choice, right? Why would anyone think that integrating your code into the product as soon as possible is a bad idea? Let me take you back to August 2000, when a fresh-faced young engineer was starting her first engineering role. She was given a desk, a computer, and a detailed project plan that included a release date three months in the future.
Rust is a powerful language built on the promise of performance and reliability. With no runtime or garbage collector, it runs easily in almost any environment and can be integrated into existing languages and frameworks. With the advent of WebAssembly, Rust has become even more valued in the web development space. Rust’s seamless pairing with Node.js to build highly performant functionality has made it a delight for web developers.
Last year, GitHub announced the release of their new CLI tool. The new gh CLI wraps around the standard git CLI and offers a suite of additional GitHub.com-specific commands, including the ability to create a new pull request or a release directly from your terminal. We on the CircleCI Community and Partner Engineering team use the gh pr checkout command all the time to safely test pull requests from the community (you!) on our various orbs.
DX NetOps 21.2 network monitoring software continues to innovate and improve the scale, speed, and simplicity of network operations with a focused set of high-value features and capabilities. This latest release of Broadcom’s network monitoring software delivers expanded capabilities to monitor and assure your software-defined networking (SDN) and network functions virtualization (NFV) deployments.
Happy System Administrator Appreciation Day! This special day was created in the 1990s to acknowledge and celebrate the tech gurus who ensure that our computers, printers, and servers are working and in good condition. So, after thanking your hard-working system admin team, check out the latest in AIOps, ITOps, and IT infrastructure monitoring.
More personal and proprietary data is available online than ever before, and many malicious actors want to get hold of this valuable information. Using an intrusion detection system (IDS) is essential to protecting your network and on-premises devices. An IDS is designed to identify suspicious and malicious activity in network traffic, enabling you to discover whether your network is being attacked.
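The sketch below is a simplified illustration of how detection logic can flag suspicious traffic. It applies one classic heuristic: a single source address probing many distinct destination ports in a short window often indicates a port scan. The event format, the 60-second window, and the 20-port threshold are hypothetical values chosen for illustration. Production IDSs combine many signatures and anomaly models and tune thresholds to their own network.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ConnectionEvent:
    # Hypothetical flow record: timestamp in seconds, source IP, destination port.
    timestamp: float
    src_ip: str
    dst_port: int


def detect_port_scans(events, window_s=60.0, port_threshold=20):
    """Return source IPs that touched at least `port_threshold` distinct
    destination ports within any `window_s`-second sliding window."""
    by_source = defaultdict(list)
    for event in sorted(events, key=lambda ev: ev.timestamp):
        by_source[event.src_ip].append(event)

    suspicious = set()
    for src, evts in by_source.items():
        start = 0
        for end in range(len(evts)):
            # Advance the window start until the window spans at most window_s seconds.
            while evts[end].timestamp - evts[start].timestamp > window_s:
                start += 1
            ports = {ev.dst_port for ev in evts[start:end + 1]}
            if len(ports) >= port_threshold:
                suspicious.add(src)
                break  # one hit is enough to flag this source
    return suspicious
```

This heuristic flags a fast sweep but misses a slow scan that spreads probes over minutes. That gap is one reason real systems layer several detection techniques.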
It’s been an exciting year here at Coralogix. We welcomed our 2,000th customer (more than doubling our customer base) and almost tripled our revenue. We also announced our Series B Funding and started to scale our R&D teams and go-to-market strategy. Most exciting, though, was last September when we launched Streamaⓒ – our stateful streaming analytics pipeline. And the excitement continues!
Edge computing has gained huge momentum lately, and with the onset of 5G, the opportunities are endless. Edge computing brings computation and data storage closer to where the data is generated, which enables better control, reduces costs, provides faster, actionable insights, and supports continuous operations. In fact, Gartner predicts that by 2025, 75% of enterprise data will be processed at the edge, compared to only 10% today.
Hiya! Our Elasticsearch team is continually improving our Index Lifecycle Management (ILM) feature. When I first joined Elastic Support, I quickly got up to speed via our Automate rollover with ILM tutorial. After helping multiple users set up ILM, I noticed that escalations mainly stem from a handful of configuration issues. In the following sections, I’d like to cover frequent tickets, diagnostic flow, and common error recoveries. All commands shown can be run via Kibana’s Dev Tools.
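For readers new to ILM, here is a minimal policy that rolls over the write index and later deletes old indices. The sketch builds the request body in Python and prints it in Kibana Dev Tools console format. The policy name `my-logs-policy` and the 50gb, 30d, and 90d values are hypothetical examples, not recommendations.

```python
import json


def build_ilm_policy(max_size="50gb", max_age="30d", delete_after="90d"):
    """ILM policy body: roll over in the hot phase, delete after a retention period."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Roll over to a new index when either condition is met.
                        "rollover": {"max_size": max_size, "max_age": max_age}
                    }
                },
                "delete": {
                    # For rolled-over indices, min_age counts from the rollover time.
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    # Paste the printed request into Kibana Dev Tools to create the policy.
    print("PUT _ilm/policy/my-logs-policy")
    print(json.dumps(build_ilm_policy(), indent=2))
```

The policy only takes effect once an index template attaches it through `index.lifecycle.name`. For alias-based rollover, `index.lifecycle.rollover_alias` must also be set. Missing either setting is a common source of rollover errors.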
As we’ve shown in a previous blog, search-based detection rules and Elastic’s machine learning-based anomaly detection can be a powerful way to identify rare and unusual activity in cloud API logs. Now, as of Elastic Security 7.13, we’ve introduced a new set of unsupervised machine learning jobs for network data, and accompanying alert rules, several of which look for geographic anomalies.
Jelastic MariaDB-as-a-Service is the result of our many years of experience with MariaDB hosting and our analysis of best practices on the platform. It is an automation that does all the "hard" work behind the scenes, providing you with a ready-to-work solution in a matter of minutes. MariaDB-as-a-Service powered by Jelastic PaaS offers numerous benefits, among them.
Cloud Operations Sandbox serves as a simulation tool for budding SREs to learn best practices from Google and apply them to real cloud services. In this blog, we have compiled a list of FAQs about Google's Cloud Operations Sandbox. The Google SRE sandbox provides an easy way to get started with the core skills you need to become an SRE.
In our guide on the best Grafana dashboards examples, we wanted to show you some of the best ways you can use Grafana for a variety of different use cases across your organisation. Whether you are a software architect or a lead DevOps engineer, Grafana is used to make analysis and data visualisation far easier to conduct for busy engineering and technical teams throughout the world.
The SIEM is a central point where data is collected and correlated, and as we move to consume more cloud services and data sets the SIEM itself must also change in architecture. Architecture change is hard to make for existing products. Calling a product a ‘cloud solution’ is not the same as taking an on-premises product and hosting it for customers. It means building a new SIEM for a new world. There are a lot of reasons users seek new SIEMs.
This release brings 56 enhancements, an increase from 50 in Kubernetes 1.21 and 43 in Kubernetes 1.20. Of those 56 enhancements, 13 are graduating to Stable, a whopping 24 are existing features that keep improving, and 16 are completely new. It’s great to see so many new features focusing on security, like the replacement for the Pod Security Policies, a rootless mode, and enabling Seccomp by default. Also, watch out for all the deprecations and removals in this version!
In one of my previous blogs, I explained how important it is for a modern observability platform to give “the observers” full, flexible access to all raw telemetry. Observability’s promise of finding unknown unknowns relies directly on fast, powerful, multidimensional, high-cardinality analysis of raw data to uncover previously unknown patterns that have not yet been visualized as a metric, dashboard panel, alert, or anomaly event.
All code and no logging makes your application a black box system. Similarly, all logging and no monitoring makes analyzing performance complicated and inconvenient. The goal is to achieve better visibility into the operations of your application, its status, performance, and overall health. Making this information easily accessible presents more context about the critical incidents and surfaces actionable insights for optimizing performance.
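One low-effort way to make logs easy to search and correlate in a monitoring tool is to emit them as structured JSON. The sketch below uses Python's standard `logging` module with a small custom formatter. The logger name `checkout`, the `order_id` and `latency_ms` fields, and the convention of passing fields via `extra={"fields": ...}` are hypothetical choices for illustration.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge structured fields passed as extra={"fields": {...}}, if any.
        payload.update(getattr(record, "fields", {}))
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def make_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Logger that writes JSON lines to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # avoid duplicate handlers on repeated calls
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = make_logger("checkout")
    log.info("payment authorized", extra={"fields": {"order_id": "A-1001", "latency_ms": 182}})
```

Because every line is a self-describing JSON object, a log pipeline can index fields like `order_id` directly instead of parsing free text.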
Back in March, we announced that Grafana Labs was partnering with Elastic to build an official Elasticsearch plugin for Grafana. As our CEO Raj Dutt wrote at the time, our “big tent” philosophy “means that we want to support data sources that our users are passionate about. Elasticsearch is one of the most popular data platforms that can be visualized in Grafana.”
For five years running, Rust has taken the top spot in Stack Overflow’s survey of most loved programming languages. Seen by many as the next step after C/C++, the language is fast being embraced by embedded device developers and as a robust choice for IoT. At JFrog, we took notice and are eager to welcome Rust developers to robust binary management and the benefits it brings to continuous integration.
Software package repositories are becoming a popular target for supply chain attacks. Recently, there has been news about malware attacks on popular repositories like npm, PyPI, and RubyGems. Developers are blindly trusting repositories and installing packages from these sources, assuming they are secure.
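One concrete defense against tampered packages is to pin the expected cryptographic digest of each artifact and refuse anything that does not match. pip's hash-checking mode (`--require-hashes`) applies the same idea to requirements files. The sketch below verifies a downloaded file against a pinned SHA-256 digest using only the standard library; the path and digest you pass in are placeholders. Pinning protects only against changes made after you recorded the hash. It does not help if the package was already malicious when you pinned it.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> bool:
    """Return True only if the file's SHA-256 matches the pinned hex digest."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

A build step would call `verify_artifact` before installing or unpacking anything and fail the build on a mismatch.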
We’re excited to announce that our Configuration API and Terraform Provider are now generally available for all LogDNA customers. We received tremendous feedback from our public beta release and, based on that feedback, we are enabling several new features with the GA release that allow for more programmatic workflows with LogDNA. First, we are enabling Preset Alerts as a new resource that can be configured with the configuration API as well as within Terraform.
If there’s one thing folks working in internet services love saying, it’s: "Yeah, sure, but that won’t scale." It’s an easy complaint to make, but in this post, we’ll walk through building a service using an approach that doesn’t scale in order to learn more about the problem. (In the process, we discover that it actually scaled much further than one would expect.)
If you’re working with Kubernetes and the thought of searching for each new term you come across seems exhausting, you’ve come to the right place! This glossary is a comprehensive list of Kubernetes terminology in alphabetical order.
Software development is an amazing career in a growing field, with strong job security and market demand. But a highly skilled job also demands a deep understanding of the tools and languages needed to develop software efficiently.
“Frodo, We Aren’t in the Shire Anymore”: The Importance of a Customer Journey & How to Avoid Wrecking It
Fans of Lord of the Rings, otherwise known as “Ringers,” never grow weary of reading or watching Frodo and his fellow Hobbits journey through Middle Earth on an epic quest to Mordor (where rumor has it there now exists a very stylish Starbucks at the base of Mount Doom). Well, customers who visit a website are on an important journey as well.
SRE’s Golden Signals are four key metrics used to monitor the health of your service and underlying systems. We will explain what they are, and how they can help you improve service performance.
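The four golden signals described in Google's SRE book are latency, traffic, errors, and saturation. The sketch below computes all four for a single time window from hypothetical request records. It makes several simplifying assumptions: a nearest-rank percentile, counting only 5xx responses as errors, and measuring saturation as busy workers over total workers. Real services define errors and saturation per workload.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request record observed during one time window.
    duration_ms: float
    status: int


def percentile(sorted_values, p):
    """Nearest-rank percentile (p in 0-100) of a sorted, non-empty list."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def golden_signals(requests, window_s, busy_workers, total_workers):
    if not requests:
        raise ValueError("no requests in window")
    durations = sorted(r.duration_ms for r in requests)
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        # Latency: how long requests take, at the median and in the tail.
        "latency_p50_ms": percentile(durations, 50),
        "latency_p99_ms": percentile(durations, 99),
        # Traffic: demand on the service, here requests per second.
        "traffic_rps": len(requests) / window_s,
        # Errors: fraction of requests that failed (server errors only here).
        "error_rate": errors / len(requests),
        # Saturation: how "full" the service is, here worker utilization.
        "saturation": busy_workers / total_workers,
    }
```

Reporting both p50 and p99 latency matters because averages hide the slow tail that users actually notice.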
Cloud computing shapes the ability of enterprises to transform themselves and compete in the 2020s. By renting elastic cloud resources, enterprises can support new customer platforms, distributed workforces, and back-office operations. The cross-functional discipline of CloudOps helps enterprises realize the promise of cloud computing by optimizing applications and infrastructure on cloud platforms.
Continuous Packaging (CP) is a term that we use a lot at Cloudsmith, and it is one that we think will become a cornerstone in a secure software development process.
As software has grown more complex, more and more projects rely on API integrations to run. Some of the most common API use cases involve pulling in external data that’s crucial to your application’s function, such as weather data, financial data, or data synced from another service your customer wants to share with. However, the risk in API development lies in debugging interactions with code you didn’t write and usually cannot see.
July 30 is System Administrator Appreciation Day. We honor all 33,000 ServiceNow system administrators who help make the world of work better for people. From overseeing instance security updates and critical configurations to proactively maintaining instance health, you are critical to the success of ServiceNow projects. Thank you. To show our appreciation, we want to ensure you know how to prepare for every stage of the system administrator journey.
Recently, we released our new “Calico Certified Operator: AWS Expert” course. You can read more about why we created this course and how it can benefit your organization in the introductory blog post. This blog post is different; it’s an opportunity for you, the potential learner, to get a glimpse of just a few interesting parts of the course. You won’t learn all the answers here, but you’ll learn some of the questions!
Here at RapidSpike, we have an ever-growing list of integrations to help manage incidents raised from all facets of our system. The latest addition to the roster is Splunk On-Call (formerly known as VictorOps).
Technological advances and emerging networking concepts are constantly shaping our IT infrastructure. Networks are no longer limited by traditional constraints such as their static nature; they continually evolve to improve efficiency, spanning wired, wireless, virtual, and hybrid IT environments. This IT evolution drives organizations to advance digitally and support the computational requirements of their business objectives.
When developing large, customer-facing applications, it’s paramount to have visibility into real user behavior in order to optimize your UX. Without a direct view into what users are actually doing when navigating your app, it can be difficult to reproduce bugs and understand how aspects of your frontend design are causing user frustration and churn. With Datadog RUM’s Session Replay feature, currently available in beta, you can watch individual user sessions using a video-like interface.
The NiCE Log File Monitor Management Pack 2.0 is a FREE solution supporting the SCOM community in next-level log file analysis. It helps IT performance and security data analysts identify errors that cause transactions and queries to take too long or not run at all. Software bugs, security issues, and erroneous configurations that impact website or application performance can be identified quickly using improved templates for alert rules, performance rules, and monitors.
It’s no secret that the past year and a half has been challenging for all departments across organizations. One unsung hero? IT. IT is quite literally responsible for keeping the lights on in a work environment that has shifted primarily online.
Industry 4.0 is taking every industry by storm with unprecedented advances in technology. Its key technologies, such as automation, AI, ML, data analytics, and IoT, enable industries to drive business operations with automated, data-driven intelligence. By integrating physical and digital systems, the manufacturing industry is increasingly adopting intelligent manufacturing in this Industry 4.0 era.
Cox Automotive is a global company with over 40,000 auto dealer clients across five continents. The company, which houses Kelley Blue Book, Autotrader, and 25 other brands, was built through acquisitions. Its IT Operations team is tasked with bringing them together under the Cox Automotive umbrella and ensuring “a good, consistent experience” for its customers worldwide.
In the IT team at Redox, I wear two hats: Software Development Manager and DBA. I’m the only DBA in the team, so if anything goes wrong it’s my job to identify and fix it. As you might imagine, this can be challenging when being a DBA isn’t your full-time role. Based in Sydney, Redox is one of the leading chemical and ingredients distributors in the world, with over 350 employees across our locations in Australia, New Zealand, Malaysia, and the US.
Today we’re announcing the general availability of Icinga Web v2.7.6, v2.8.4 and v2.9.2. All are standard bugfix releases and include fixes found by the community since the latest releases. You can find all issues related to this release on our Roadmap. Please make sure to also check the respective upgrading section in the documentation. This release is accompanied by the minor releases v2.7.6 and v2.8.4 which include the fix for the flattened custom variables.
Welcome back to my series about metrics for optimization. In the last blog, I discussed the meaning of optimization, determining what you are trying to optimize, and learning how to discover if your enterprise’s optimization goal is a business or technology goal. If you did not get the chance to read my last blog, I suggest you do so as you need to understand what optimization is to progress to our next topic, which is the sources of metrics you should use.
Logz.io is proud to launch a new partnership with Microsoft that enables Azure customers to directly integrate with Logz.io’s platform from within the Azure Console. This integration importantly allows Azure developers to begin monitoring their workloads faster than ever before, using the open-source technologies that their teams love. Check out this video for a demonstration of how it works.
2020 heralded a year of increased complexity and customer demands, which isn’t going away. In this new normal, organizations will still be tasked with keeping up this break-neck pace. So, what did digital operations look like in 2020 compared to 2019?
Migrating to the cloud provides cost, scalability, performance, maintenance, and other engineering and IT benefits. Today, Amazon Web Services (AWS) stands out as the most popular cloud platform, offering an advanced public cloud with robust services that are easy to integrate with your existing workflows. While AWS strives to keep its tools simple to use, many users still require deep expertise to get their AWS environment set up properly and running smoothly.
In my previous blog post, I demonstrated how to use Prometheus and Fluentd with the Elastic Stack to monitor Kubernetes. That’s a good option if you’re already using those open source-based monitoring tools in your organization. But, if you’re new to Kubernetes monitoring, or want to take full advantage of Elastic Observability, there is an easier and more comprehensive way. In this blog, we will explore how to monitor Kubernetes the Elastic way: using Filebeat and Metricbeat.
At Grafana Labs, we are proud to be one of the largest code contributors to Cloud Native Computing Foundation projects. We are currently the leading company contributor to Prometheus, and also make substantial contributions to Cortex, Thanos, Jaeger, and OpenTelemetry. Our own open source projects — Grafana, Grafana Loki, and Grafana Tempo — have also become fundamental parts of the cloud native ecosystem.
We’re excited to launch our new integration with GitHub that supports GitHub Enterprise Server customers. This allows companies using GitHub Enterprise on their own domains to access key features in Rollbar that help developers fix errors faster. GitHub Enterprise offers a fully integrated development platform for organizations to accelerate software innovation and secure delivery. With Rollbar, GitHub Enterprise Server customers can now access.
This week in the Civo DevOps bootcamp, we covered what DevOps is and why it matters. There are no prerequisites for this bootcamp; all we require is your passion to learn and explore. The session was led by mentors highly experienced in their respective fields. Read on to learn more about it!
SSL is an acronym for Secure Sockets Layer. It is a standard security technology: an Internet protocol that establishes an encrypted link between a browser and a web server. SSL is used to secure credit card transactions, data transfers, and logins, preventing attackers from accessing private data.
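Today, what is commonly called SSL is in practice its successor protocol, TLS; the original SSL versions are deprecated. The sketch below uses Python's standard `ssl` module to open a certificate-verified TLS connection to a web server and report the negotiated protocol version. Requiring TLS 1.2 or later is an assumption reflecting common current practice, and `example.com` is only a placeholder host.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client context that verifies the server certificate and hostname."""
    context = ssl.create_default_context()  # loads the system's trusted CA certificates
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return context


def negotiated_tls_version(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect, complete the TLS handshake, and return the protocol version in use."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        # server_hostname enables SNI and checks the certificate against the host name.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.version()  # e.g. "TLSv1.3"


if __name__ == "__main__":
    print(negotiated_tls_version("example.com"))
```

The default context already requires a valid certificate. Turning that check off to "make errors go away" removes the protection against impersonation that the encrypted link is supposed to provide.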
Kubeflow is the open-source machine learning toolkit on top of Kubernetes. Kubeflow translates steps in your data science workflow into Kubernetes jobs, providing the cloud-native interface for your ML libraries, frameworks, pipelines and notebooks.
A year ago, in July of 2020, I started my SysAdmin Day post with the words, Here we are, 12 months later, and a lot has changed, but life (and tech) continue to be extraordinarily not-normal. The challenges we face as IT pros in general and SysAdmins in particular push us to our limits daily, and there’s no hiding or sugar-coating it.
In any case, by using the MITRE ATT&CK framework to model and implement your cloud IaaS security, you will have a head start on any compliance standard since it guides your cybersecurity and risk teams to follow the best security practices. As it does for all platforms and environments, MITRE came up with an IaaS Matrix to map the specific Tactics, Techniques, and Procedures (TTPs) that advanced threat actors could possibly use in their attacks on Cloud environments.
CVE-2021-33909, named Sequoia, is a new privilege escalation vulnerability that affects the Linux kernel’s filesystem layer. It was disclosed in July 2021, but the flaw was introduced in 2014 and affects many Linux distributions, including Ubuntu (20.04, 20.10, and 21.04), Debian 11, Fedora 34 Workstation, and some Red Hat products. The vulnerability is caused by an out-of-bounds write in the kernel’s seq_file code.
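The public advisory for Sequoia traces the out-of-bounds write to a size_t-to-int conversion. A buffer size that fits in a 64-bit unsigned size_t becomes negative once it is squeezed into a 32-bit signed int, and the kernel then uses that negative value when computing where to write. The sketch below demonstrates only that conversion arithmetic, using Python's `ctypes`, which silently wraps out-of-range values the way C does on the two's-complement platforms Linux runs on. It illustrates the bug class; it is not exploit code or a model of the actual kernel functions.

```python
import ctypes

GIB = 1 << 30  # 1 GiB in bytes


def size_t_to_int(value: int) -> int:
    """Simulate converting an unsigned size_t to a 32-bit signed int (wraps silently)."""
    return ctypes.c_int32(value).value


if __name__ == "__main__":
    for size in (1 * GIB, 2 * GIB - 1, 2 * GIB):
        print(f"{size:>10} -> {size_t_to_int(size)}")
    # 1073741824 -> 1073741824
    # 2147483647 -> 2147483647
    # 2147483648 -> -2147483648
```

A 2 GiB length wraps to the most negative 32-bit value, so any pointer arithmetic done with it lands far outside the intended buffer.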
Ransomware, the malicious code attackers use to encrypt data or lock users out of their devices, has been rampant and is on the rise globally. The largest ransomware payout so far in 2021 was made by an insurance company: $40 million. A more recent attack occurred in early July and was launched by a group called REvil. The immediate victim was a Florida company, Kaseya, that provides software to companies that manage technology for thousands of smaller firms.
JFrog customers will soon enjoy end-to-end, holistic security across their software lifecycle, from development to devices, as the technology of recently acquired Vdoo is integrated into the JFrog DevOps Platform. That was the pledge JFrog and Vdoo leaders made during their first joint webinar, in which they explained why JFrog acquired Vdoo, how the platform’s security and compliance capabilities will expand, and what the integration timeline is.
Thanks for returning to our series Metrics for Optimization. Once again, please be sure to read our first two blog posts, What is Optimization? and Which Sources of Metrics Should You Use for Optimization? In my first post, I taught you what optimization means, how to figure out what you are trying to optimize, and if you are focused on business goals or technology goals when performing this optimization.
Apache Cassandra is an open-source distributed NoSQL database management system that was released by Facebook almost 12 years ago. It’s designed to handle vast amounts of data, with high availability and no single point of failure. It is a wide-column store, meaning that it organizes related facts into columns. Columns are grouped into “column families.” The benefit is that you can manage data that just won’t fit on one computer.
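To make the wide-column idea concrete, the toy model below stores a column family as a map from row key to that row's own columns, so different rows can hold different columns. It also assigns each row to a node by hashing its key. This is a deliberately simplified, in-memory sketch. Real Cassandra uses partition and clustering keys, a Murmur3-based token ring with replication, and CQL rather than Python dictionaries. The class name, node count, and sample data are hypothetical.

```python
import hashlib
from collections import defaultdict


class ToyColumnFamily:
    """Sparse rows: each row key maps only to the columns that row actually has."""

    def __init__(self, num_nodes: int = 3):
        self.num_nodes = num_nodes
        # One store per node: row key -> {column name -> value}
        self.nodes = [defaultdict(dict) for _ in range(num_nodes)]

    def node_for(self, row_key: str) -> int:
        # Simplified placement: stable hash modulo node count (not Cassandra's token ring).
        h = int.from_bytes(hashlib.sha256(row_key.encode()).digest()[:8], "big")
        return h % self.num_nodes

    def put(self, row_key: str, column: str, value) -> None:
        self.nodes[self.node_for(row_key)][row_key][column] = value

    def get(self, row_key: str, column: str):
        row = self.nodes[self.node_for(row_key)].get(row_key, {})
        return row.get(column)  # None if this row never had the column

    def row(self, row_key: str) -> dict:
        return dict(self.nodes[self.node_for(row_key)].get(row_key, {}))


if __name__ == "__main__":
    users = ToyColumnFamily()
    users.put("alice", "email", "alice@example.com")
    users.put("alice", "city", "Lisbon")
    users.put("bob", "email", "bob@example.com")  # bob simply has no "city" column
    print(users.row("alice"), users.get("bob", "city"))
```

Because each row lives on exactly one node chosen from its key, the data set can grow past a single machine. Missing columns also cost nothing to store.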
As a developer, you’re going to be making changes to a codebase. That’s why, as Harold Abelson put it, “Programs must be written for people to read.” If a codebase is not clearly formatted, debugging becomes more difficult than it should be. Though often overlooked, small changes like consistent formatting and proper indentation can clearly distinguish a professional developer’s codebase from that of someone just learning.
John Pagliuca, CEO of N-able, has taken issue in the press multiple times with the term digital transformation, preferring the term digital evolution. I agree that evolution is a better term. Digital transformation implies a one-time event; digital evolution acknowledges the ongoing nature of these changes. In short, the market will continue to change. How you adapt dictates whether you come out far ahead or remain with the status quo.
In December 2020, we blogged about security issues in Go’s encoding/xml with critical impact on several Go-based SAML implementations. Coordinating the disclosure around those issues was no small feat; we spent months emailing the Go security team, reviewing code, testing and retesting exploits, coming up with workarounds, implementing a validation library, and finally reaching out to SAML library maintainers and 20 different companies downstream.
Picture this: It’s a normal day of working from home as usual since the COVID-19 outbreak. After that satisfying cup of coffee, you log in. But something is wrong. No matter how many times you click, your files don’t open. Your screen is frozen and refuses to budge. And then, you see one of the worst nightmares any IT admin can imagine: “Oops, your files have been encrypted. But don’t worry, we haven’t deleted them yet.”
As part of your monitoring and testing strategy, you may run tests on different types of applications that are not publicly available—from local versions of production-level websites to internal applications that directly support your employees. Testing each one requires leveraging tools that allow you to verify functionality across a wide range of devices, browsers, and workflows while maintaining a secure environment.
Datadog CI Visibility, now available in beta, provides critical visibility into your organization’s CI/CD workflows. CI Visibility complements Datadog’s turn-key CI provider integrations and the integration of synthetic tests in CI pipelines to give you deep insight into key pipeline metrics and help you identify issues with your builds and testing.
A VPN allows remote employees to create a secure connection to the corporate network. These connections essentially tunnel from a computer or mobile device to a VPN server, often over the public Internet. VPN technology has been around since the mid-1990s, but Covid has pushed its usage into the mainstream. As VPN adoption grows, IT faces new monitoring challenges.
Break Things on Purpose is a podcast for all-things Chaos Engineering. In this episode of the Break Things on Purpose podcast, we speak with Paul Marsicovetere, Senior Cloud Infrastructure Engineer at Formidable.
Engineering teams can only be as efficient as the processes they employ during development. The need for increased efficiency is why software development has shifted from the “waterfall” approach to a more responsive, agile methodology. In an agile development environment, quality software can be delivered consistently to suit the ever-changing needs of stakeholders and end users.
Having a Status Page is like having a dog. A dog alerts you to an incident; sudden noise, approaching neighbor, squirrel… A dog sounds the alarm on an intruder. A dog even alerts you to maintenance by barking at every handyman, garbage truck, and gardener within sight. As a dog fetches the same stick over and over, so does a status page fetch the attention of your users – especially during a live incident – with each browser refresh they wait for the status to change.
When Blameless started in 2018, the team set out on a mission to help all engineers achieve reliability with less toil and risk. Three years in, that mission has become more important than ever. What has changed is the rate of SRE adoption: SRE is now the fastest-growing team and practice inside engineering. This represents a clear recognition of the many upsides that an SRE practice brings with its combination of continuous learning, velocity, and resilience.
Over the past three years, we have served thousands of developers with our two major products, Thundra APM and Thundra Sidekick – and it still feels like we’re just getting started. We would like to thank all of our users and supporters who gave us the strength to build our one-of-a-kind products. And we are very excited to announce our latest innovation: Thundra Foresight!
Choosing the right APM tool is critical. How do you know which one is right for you? Here are the top 13 open source application performance monitoring (APM) tools that can meet your monitoring needs.
I’ve recently been talking to some of the users of our eG Enterprise monitoring solution and its AIOps-powered root cause analysis platform. Multiple users have mentioned that the solution has become considerably easier to use thanks to feature-rich mobile apps for the iOS and Android platforms.
Traditionally an APM tool, Scout has expanded its service offerings to now include error monitoring of Python web applications for more cohesive and actionable observability insights within a single platform. This new feature supports an overall better user experience by eliminating the need for multiple web-application monitoring services; Scout APM with Scout Error Monitoring offers performance and error insight and alerting within a single, integrated dashboard.
Covid made the hypothetical necessity of IT risk planning a reality. Many organizations responded to the immediate need for remote workforces by adding more VPN licenses. But while adding more VPN capacity solved the problem of resource access, it also led to network bottlenecks and application latencies.
The 2010 Stuxnet malicious software attack on a uranium enrichment plant in Iran had all the twists and turns of a spy thriller. The plant was air gapped (not connected to the internet) so it couldn’t be targeted directly by an outsider. Instead, the attackers infected five of the plant’s partner organizations, hoping that an engineer from one of them would unknowingly introduce the malware to the network via a thumb drive.
IT is not a technology cost, it’s a human resource cost! This is a fundamental concept MSPs need to keep in mind when they look at their businesses, but one that many smaller MSPs tend to overlook. Think about it: for every business you support as an MSP, your job is to keep its IT infrastructure stable and optimized to maximize staff productivity.
Software behaves differently in modern, highly dynamic, distributed systems, and grasping this presents various challenges for today’s customers. They must understand how their software behaves in the field and how it responds to diverse situations and changes, and they need to keep track of progress across thousands of products.
Everybody hates waiting for an application to load, or finding that it doesn’t load at all. If this happens with your application, you’re not just losing business but also brand value. Most applications today are online, so servers play a crucial role in keeping them up and running, and application performance depends heavily on server performance. Hence, it’s very important to monitor and improve server performance.
GitHub, GitLab, Jira, and ServiceNow are some of the most popular software development tools out there, and Grafana has powerful integrations with each of them. Join us for a live webinar on July 29 at 9:30 PT / 12:30 ET / 16:30 UTC for a demo of these data source plugins and best practices for creating a single pane of glass for viewing your software operations metrics. You can register here.
Setting up and administering multiple servers for business and application purposes has become easier thanks to advancements in cloud technology. Today, enterprises are choosing to operate large numbers of servers both in the cloud and in their data centers to meet the ever-increasing demand. As a result of these changes, monitoring technologies have become crucial. In this post, we’ll explore the best server monitoring tools and software currently on the market.
In this new world of digital everything, new application versions usually mean that you’re going to get bigger and better features, more capabilities, and an uplifted user experience, right? When I talk to customers, many can’t wait to upgrade the PagerDuty integrations that they depend on to test new features. If you’re a PagerDuty for Slack user, the next-generation version of our Slack integration will certainly be an exciting development.
Detecting and preventing malicious activity such as botnet attacks is a critical area of focus for threat intel analysts, security operators, and threat hunters. Taking up the Mozi botnet as a case study, this blog post demonstrates how to use open source tools, analytical processes, and the Elastic Stack to perform analysis and enrichment of collected data irrespective of the campaign.
Welcome to another monthly update on what’s new from Sysdig! Happy 4th of July to our American audience, and bonne Bastille to our French friends. It’s been heating up in the northern hemisphere, so we hope you’ve all been managing to stay cool and safe. Our team continues to work hard to bring great new features to all of our customers, automatically and for free! The big news this month is our intent to acquire Apolicy, which has everyone full of excitement.
Development cycles are complicated. If you’re on a development team, whether you’re building out a custom application, maintaining and iterating on a growing microservice, or breaking ground on a new platform for a startup, you have your hands full. Log management, though seldom celebrated outside hardcore DevOps and IT circles, is still a well-known instrument among seasoned developers. It provides insight into the internal workings of your processes as they run.
00:00 - Intro
01:00 - Agenda
02:21 - Tour of the Automated Root Cause (ARC) screen
10:17 - Search feature on ARC screen
11:11 - Log view
13:02 - Environment view
14:01 - How to create a Jira ticket from the ARC screen
14:32 - Hide button
15:03 - Resolve button and resurfaced issues
16:13 - Labels and notes
16:30 - 3rd-party utility codes
17:01 - Data dashboard
20:00 - Review of new and critical events in data dashboard
SeriousSAM (CVE-2021-36934) is a privilege escalation vulnerability caused by overly permissive Access Control Lists (ACLs) that give low-privileged users read access to sensitive system files, including the Security Accounts Manager (SAM) database. The SAM database stores users' password hashes on a Windows system. According to the Microsoft advisory, this issue affects Windows 10 1809 and above as well as certain versions of Server 2019.
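To make the ACL problem concrete: a commonly cited check for this issue is to run `icacls %windir%\system32\config\sam` and look for an entry that grants the `BUILTIN\Users` group read access, such as `BUILTIN\Users:(I)(RX)`. The Python sketch below parses that output. The function name `users_can_read_sam` and the set of rights treated as "read" are our own assumptions for illustration, not part of any Microsoft tool. Because it only inspects text, it runs on any OS.

```python
import re

# Assumption: icacls rights that imply read access to the file.
# Simple rights: F (full), M (modify), RX (read/execute), R (read-only).
# Specific/generic rights: GA (generic all), GR (generic read), RD (read data).
READ_TOKENS = {"F", "M", "RX", "R", "GA", "GR", "RD"}

# Matches an ACE such as `BUILTIN\Users:(I)(RX)` and captures the
# parenthesised groups after the colon, e.g. `(I)(RX)`.
ACE_PATTERN = re.compile(r"BUILTIN\\Users:((?:\([A-Z,]+\))+)", re.IGNORECASE)
GROUP_PATTERN = re.compile(r"\(([A-Z,]+)\)", re.IGNORECASE)


def users_can_read_sam(icacls_output: str) -> bool:
    """Hypothetical helper: True if icacls output shows BUILTIN\\Users can read the file."""
    for ace in ACE_PATTERN.finditer(icacls_output):
        groups = [g.upper() for g in GROUP_PATTERN.findall(ace.group(1))]
        # Deny entries remove access rather than grant it, so skip them.
        if "DENY" in groups:
            continue
        for group in groups:
            # Specific rights are comma-separated inside one group, e.g. (RD,RA).
            tokens = {t.strip() for t in group.split(",")}
            # Inheritance flags such as I, OI and CI never match READ_TOKENS.
            if tokens & READ_TOKENS:
                return True
    return False


vulnerable = (
    "C:\\Windows\\system32\\config\\sam BUILTIN\\Administrators:(I)(F)\n"
    "                               NT AUTHORITY\\SYSTEM:(I)(F)\n"
    "                               BUILTIN\\Users:(I)(RX)\n"
)
print(users_can_read_sam(vulnerable))  # True
```

On a Windows host you might feed it the output of `subprocess.run(["icacls", r"C:\Windows\System32\config\SAM"], capture_output=True, text=True).stdout`. A `True` result only means the ACL is too permissive. As reported at the time, practical exploitation also relied on Volume Shadow Copies of the hive being present.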
Let’s take a look at the features and improvements in the new Pandora FMS release, Pandora FMS 756.
July is Hype Cycle season, the time of year when Gartner livens up the summer doldrums by updating its eagerly awaited Hype Cycle series of reports. This year’s Hype Cycles demonstrated OpsRamp’s growing brand recognition as we were listed as a representative vendor in eight different Gartner Hype Cycles.
The challenges involved in deploying and managing microservices have led to the creation of the service mesh, a tool for adding observability, security, and traffic management capabilities at the application layer. While a service mesh is intended to help developers and SREs with a number of use cases related to service-to-service communication within Kubernetes clusters, a service mesh also adds operational complexity and introduces an additional control plane for security teams to manage.
Patch compliance indicates the number of compliant devices in your network, that is, the number of computers that have been effectively patched or remediated against security threats. Distributing and deploying patches accomplishes nothing if your devices do not end up compliant. So, to establish a good patch management strategy, pay attention to the effectiveness and reach of your patch deployment activities.
Kubernetes is an open-source orchestration platform that allows you to manage and scale your containerized workloads. You can run Kubernetes anywhere—on-premises or in a public or hybrid cloud. Kubernetes helps you build scalable services by providing functionalities like declarative configuration, immutable infrastructure, horizontal scaling, load balancing, service discovery, and self-healing systems.
Lambda is an excellent option for deploying lower-traffic web services when you don't want to maintain another server and you want easy access to all of AWS's other services. In this article, Godwin Ekuma shows us step-by-step how to deploy our Rails apps to AWS Lambda.
You've joined a company, or worked there a little while, and you've just now realised that you'll have to do on-call. You feel like you don't know much about how everything fits together. How are you supposed to fix it at 2am when you get paged? So you're a little nervous. Understandable. Here are a few tips to help you become less nervous.
The BizTalk Migrator tool is one of Microsoft’s latest releases, and it helps migrate your BizTalk solutions to Azure in a much simpler, automated way. To keep you informed about the recent enhancements to the tool, the Azure Logic Apps team held a live remote session dedicated to that topic. Without further delay, let’s jump in, as there are tons of updates waiting.
AWS Auto Scaling groups are a powerful tool for creating scaling plans for your application. They let you dynamically create a group of EC2 instances that will maintain a consistent and predictable level of service. HAProxy’s Data Plane API adds a cloud-native method known as Service Discovery to add or remove these instances within a backend in your proxy as scaling events occur. In this article, we’ll take a look at the steps used to integrate this functionality into your workflow.
We are happy to announce that the Kafka integration is available for Grafana Cloud, our composable observability platform bringing together metrics, logs, and traces with Grafana. Apache Kafka is an open source distributed event streaming platform that provides high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
Access to any airport is tightly governed. When you’re one of Europe’s busiest airports and a high-profile piece of national infrastructure, you cannot have unauthorized people wandering about. At Copenhagen Airport, we have more than 20,000 people cleared to access the airport. These might be security staff, baggage handlers, IT service providers, catering teams, and others. As the airport has grown, so has the number of on-site workers.
Purchase decisions often begin with a price check, and log management is no different. Evaluate your budget, narrow down the options that fit it, and choose the tool that gives you the most for your money. As always, cheaper is better as long as the platform doesn’t cut any corners. But with log management, there is a catch: not all tools are transparent about their pricing model.
Many sectors suffered during the COVID-19 pandemic, but the travel and hospitality industry was struck particularly hard as the world went into lockdown and governments urged us to stay home. According to the International Air Transport Association, global air passenger demand in 2020 was down a record 65.9% from the previous year, and the tourism industry saw an estimated loss of 100.8 million jobs worldwide.
As you may have heard, Ivanti was selected by the National Institute of Standards and Technology’s (NIST’s) National Cybersecurity Center of Excellence (NCCoE) to participate as a collaborator in its Implementing a Zero Trust Architecture project.
Cybersecurity is front and center today for every business regardless of size or industry. Major ransomware attacks and data breaches seem to make headlines just about every day. Sophisticated attackers and cybercriminals are always finding new ways to extort businesses, steal confidential data, and wreak havoc. A quick read of the CrowdStrike 2021 Global Threat Report will surely give you cause for concern.
Gamers are not shy about reaching into their wallets for premium content and features. They also won’t hesitate to tap the uninstall button at the first sign of trouble. It’s not uncommon for a gamer to boot up a hotly anticipated new game or revisit an old favorite only to put it down days or weeks later. The culprit is often gaming monetization issues that get in the way of what would otherwise be a long-term rewarding gaming experience.
As businesses evolve thanks to technology advances, so must the role of the Chief Financial Officer (CFO). With an ever-growing list of financials to manage, such as payroll, operating expenses, IT and software expenses, and more, CFOs have to find the right processes and choose the right tools to get the cost data they need to make informed decisions and ensure profitability for their company.
Log centralization is kind of like brushing your teeth: everyone tells you to do it. But until you step back and think about it, you might not appreciate why doing it is so important. If you’ve ever wondered why, exactly, teams benefit from centralized logging and analysis, keep reading. This article walks through five key advantages of log centralization for IT teams and the businesses they support.
Fastest time-to-value and lowest TCO (total cost of ownership) are among the top 10 reasons that customers choose, love and continue using Netreo. Turning time-consuming administrative projects into simple tasks is one way Netreo consistently delivers superior value. But like all software solutions in use today, many Netreo features go unused or misunderstood by too many customers and would-be users.
In this article you’ll find out how to 10x your development speed with local serverless debugging. It addresses questions such as “What happens when you scale your application to millions of requests?”, “What should you expect when going serverless?”, “What does it look like?”, and “What is it like to build applications on serverless and work locally?”
The larger your team grows and the faster your teams move, the harder it is for engineering leaders to find trust-but-verify moments: the moments when you should dig in and make sure your team's health is improving. Imagine a world where all your engineering tools work together so that accurate and insightful trust-but-verify moments come to you. Imagine a world where you have the finest Sleuth in the world, working just for you.
Monitoring solutions are a vital component in managing an application’s environment. From the systems layer all the way up to the end user’s connection to the app, you want to find out how the platform is performing. Indicators like CPU, memory, the number of connections, and overall health help teams make informed decisions for guaranteeing uptime. Teams monitor metrics (short-term information) and logs (long-term information) mainly from a reactive perspective.
Site Reliability Engineering (SRE) and Operations (Ops) teams heavily rely on notifications. We use them to know what’s going on with application workloads and how applications are performing. Notifications are critical to ensuring SREs and Ops teams can resolve errors and reduce downtime. They’re also crucial when monitoring environments — not only when running in production but also during the dev-test or staging phase.
Have you ever tried to search for a leadership position in IT that’s dedicated exclusively to employee experience, sometimes listed as end user experience or Digital Employee Experience (DEX)? I’m not talking about a CXO (Chief Experience Officer) role outside of IT—that position is usually advertised for customer experience or employee communications and human resources. I’m talking strictly enterprise IT.
Images are one of the most basic, common attributes of your virtual machines (VMs). They contain the operating system, which may be customized with specific installations and features. VM images need to be kept organized and structured so that they are easy to maintain, manage, and access. Azure introduced its Shared Image Gallery to help solve this, giving users a way to manage, share, and distribute custom images.
I recently spoke to the IT Director and Head of End User Computing at a leading healthcare company that implemented Salesforce globally across its entire employee user base 9 months ago (before later becoming a Nexthink customer). She told me their Salesforce licensing model was similar to others you’ll see on the market: a set of base licenses plus selected add-ons based on employee roles, some at no charge and others priced à la carte. Her problem? License metering.
Building successful machine learning (ML) production systems requires a specialized re-interpretation of the traditional DevOps culture and methodologies. MLOps, short for machine learning operations, is a relatively new engineering discipline and a set of practices meant to improve the collaboration and communication between the various roles and teams that together manage the end-to-end lifecycle of machine learning projects.
Globally, the impact of AI is growing year on year in every industry sector. AI is opening up new possibilities in the construction and architecture sector, where the global AI market is estimated to grow at a CAGR of around 29.4% from 2019 to 2026 and to reach around US$2.1 billion by 2026.
What do Google’s DevOps Research and Assessment (DORA) and Rollbar have to do with each other? DORA identified four key metrics to measure DevOps performance and identified four levels of DevOps performance from Low to Elite. One way for a team to become an Elite DevOps performer is by focusing on Continuous Code Improvement.
Like everything good in life, serverless comes with its downsides. One of them is the infamous “cold start”. In this article, we’ll cover what cold starts are, what influences serverless startup latency, and how to mitigate its impact on our applications.
Five worthy reads is a regular column on five noteworthy items we’ve discovered while researching trending and timeless topics. This week, we explore why organizations should implement Zero Trust in 2021. In 2010, John Kindervag introduced the concept of “Zero Trust” which has become a touchstone for cyber resilience and persistent security. Zero Trust is not a security product, architecture, or technology.
The arrival of ServiceNow has brought relief to the IT services workforce, and with CloudFabrix RDA added to it, we made it even better. Let’s face it: many IT service transformation implementations take longer because of a lack of automation around migration and production. ITSM efficiency is further compromised by the absence of data automation and enrichment. ServiceNow with Robotic Data Automation has a positive impact on three critical areas of data operations for ITSM teams.
Lemons, oranges, grapefruits, limes… We know that they are not the same, but if necessary, you can make juice with all of them. And yes, we can and we will. It’s summer, and it makes you want to make a good cocktail, doesn’t it? Today, on the PFMS blog, we are going to analyze what N-Able (SolarWinds MSP), Kaseya, and Pandora FMS have in common, and, of course, their remarkable differences.
These are interesting – and challenging – times to be a Managed Service Provider. When it first published its Managed Services Market Size Forecast, Mordor Intelligence valued the market at US$152 billion in 2020, and predicted it to reach US$274 billion by 2026, a compound annual growth rate of 11.2%. Over a year later, following a pandemic which has changed the way most of us work and which will probably see permanent changes going forward, Mordor is sticking by its prediction.
Kubernetes is the platform of choice for orchestrating containerized applications. It’s ideal for large applications running on distributed instances. But you likely already knew that because if you are weighing Kubernetes monitoring tools, you know what Kubernetes (K8s) is and why it is useful. That means you also have an inkling of how challenging Kubernetes can be to manage. This is where Kubernetes monitoring tools come in handy.
In my time as a systems engineer in the insurance industry, I learned a lot about compliance programs and the challenges that organizations face to ensure infrastructure is compliant. Audits were time-consuming and handoffs between security, compliance, and infrastructure operations teams were always challenging.
With PodSecurityPolicy (PSP) in Kubernetes nearing deprecation, this article by Squadcast shows you a different approach to governing your Kubernetes cluster security using Kyverno, a policy management tool.
Digital experience monitoring refers to the measurement of customers’ and employees’ experience using an application, service, or device. It helps IT teams ensure that applications, services, or devices are available to users and functioning optimally.
A Storage Area Network (SAN) is a specialized, high-speed network that provides block-level network access to storage. SANs are typically composed of hosts, switches, storage elements, and storage devices that are interconnected using a variety of technologies, topologies, and protocols. Each computer on the network can access storage on the SAN as if it were a local disk connected directly to the computer.
“SLO is a favorite word of SREs,” Grafana Labs Principal Software Engineer Björn “Beorn” Rabenstein said during his talk at KubeCon + CloudNativeCon NA 2019. “Of course, it’s also great for design decisions, to set the right goals, and to set alerting in the right way. It’s everything that is good.” So what happens when things go bad?
Web application monitoring tools can keep your business afloat. Period. Imagine this. You’re about to run a crucial end-of-season sale on your website. You’ve sent your emails, run social media campaigns, paid for advertisements, and stocked up your inventory; you are all set to let the cash register ring. However, on D-day, your website goes down. It’s unable to handle the incoming traffic or is simply down because of technical glitches.
Monitoring application server metrics and runtime characteristics is essential for the applications running on each server, and it helps prevent or resolve potential issues in a timely manner. As far as Java applications go, Apache Tomcat is one of the most commonly used servers. Tomcat performance monitoring can be done with JMX beans or a monitoring tool such as MoSKito or JavaMelody.
Workflow management tools can have a tremendous impact on team performance. Is your team failing to live up to expectations? Do you have a clear plan to drive productivity and improve customer satisfaction? If you’re struggling to get more from your team, we would recommend implementing StartingPoint into your operations for a streamlined workflow. When it comes to finding a comprehensive solution for workflow automation, StartingPoint ticks all the boxes.
Continuing to ride the waves of Summer of Security and the launch of Splunk Security Cloud, Splunk Security Essentials is now part of the Splunk security portfolio and fully supported with an active Splunk Cloud or Splunk Enterprise license. No matter how you choose to deploy Splunk, you can apply prescriptive guidance and deploy pre-built detections from Splunk Security Essentials to Splunk Enterprise, Splunk Cloud Platform, Splunk SIEM and Splunk SOAR solutions.
In the last decade, businesses have made massive investments in the digital economy with the goal of increasing operational efficiency and improving their customer or end-user experience. However, it isn’t rare for businesses to incur losses due to poor page load speed, failed transactions, or website errors. This is why businesses need to track end-user experience in real time and resolve issues quickly.
Everything is running smoothly at observIQ this week. This update comes with some valuable quality of life improvements to the platform, a new Dashboard plugin, and improvements to setup, Alerts, and Live Tail.
Unattended incidents won’t clean up after themselves and will come back to haunt you, whether as rising MTTR metrics, a cluttered Incident Index, fruitless back-and-forth communication, or a declining CSAT score. Powerful automation conditions can drive productivity, save you manual work, speed up your incident lifecycle management, and keep your employees happy.
We will be doing a four-part blog series on the sources of metrics for optimization. To kick off the series, I will explain what optimization actually means, what you are trying to optimize, and whether you are focused on business goals or technology goals when performing this optimization. In our next blog, we will look at APM-derived metrics and metric domains you can leverage, as they are the two primary sources people rely on for optimization today.
Continuous integration and continuous deployment (CI/CD) drive software development and release in DevOps. Companies based in traditional ITIL practices often want to reap some of CI/CD’s benefits but aren’t sure how to combine the two. In this article, learn the technology stack options for building a strong CI/CD pipeline, why companies rooted in ITIL are running CI/CD alongside it, and best practices for a hybrid ITIL-CI/CD approach.
Many of our customers today leverage Office 365 GCC High, including organizations looking to meet evolving requirements for working with the United States Department of Defense. Sumo Logic enables customers to leverage our out-of-the-box monitoring and analytics capabilities to analyze Office 365 GCC High data to offer security engineers and security analysts stronger situational awareness of internal employee data.
Dual ToR (top of rack) peering provides a redundant path for customers with cluster applications that cannot tolerate service downtime or failure and require a high-availability solution. While Calico ToR connectivity has existed for some time, Calico Enterprise now supports connectivity with dual ToR switches.
When responding to an incident, you need to quickly find the scope of the issue so you know which teams to notify and which parts of your system to investigate next—before your end users are affected. But as multiple processes use resources on each of your hosts, and interact in unexpected ways, it can be difficult to know exactly what is causing an issue—especially if those processes are running off-the-shelf software.
I recently saw a user asking on EUC Slack “is there a Domain controller response time in
How do we help Database Administrators (DBAs) embrace DevOps in a way that can be really productive and part of a rich DevOps team that delivers value to customers quickly and continuously? That’s an important question to ask right now because there’s a common view among DBAs that DevOps isn’t for them. They’re responsible for documentation and maintenance and deployments, they have internal customers, and they serve internal requests.
We just announced the creation of a new RemoteWrite SDK to support custom metrics from applications using several different languages. This tutorial will give a quick rundown of how to use the Python SDK. Using these integrations, Prometheus users can send metrics directly to Logz.io using the RemoteWrite protocol without sending them to Prometheus first. Each SDK, though written for a different language, can work with frameworks like Thanos, Cortex, and of course M3DB.
We’re proud to announce the creation of a new RemoteWrite SDK to support custom metrics from applications using Golang (Go), Python, and Java, with many more on the way. Each SDK will have automatic, continuous deployment of updates. Using these integrations, Prometheus users can send metrics directly to Logz.io using the RemoteWrite protocol without sending them to Prometheus first.
Rich Anakor, chief solutions architect at Vanguard, is on a small team with a big goal: Give Vanguard customers a better experience by enabling internal engineering teams to better understand their massively complex production environment—and to do that quickly across the entire organization, in the notoriously slow-moving financial services industry. They also had a big problem: The production environment itself.
Continuing from my previous blog, here are more items from my list of five things you can do to improve your technical service delivery to your customers (if you didn’t read the last post, it’s worth catching up on it first). In the following three points, I focus on the role automation can play.
In the CNCF ecosystem, Envoy, an open source service proxy developed by Lyft, is a very common choice in service mesh networking. In a previous post we discussed that both Consul and Istio leverage Envoy. Were you aware that you can extend Envoy’s capabilities with WebAssembly? What is WebAssembly? WebAssembly, or Wasm as it is often abbreviated, is not so much of a programming language as it is a specification for a binary instruction format that can be run in sandboxed virtual machines.
Commonly, your website or app functions perfectly until you release it. During testing, it might seem that you have control over everything, but sooner or later you will face some challenges. In fact, it is totally normal for something to go wrong; what matters most is how you resolve these problems. In most cases, availability alerts and user complaints can be addressed by means of IIS logs. IIS logging provides the data you need to deal with a breakdown.
IT management can be costly and time-consuming without streamlined processes and systems to support your business goals. With the quickened pace of business requiring faster scale, leaders and decision-makers must find ways to adapt and optimize their processes. Combining IT Service Management (ITSM) and IT Operations Management (ITOM) can help you prioritize operations efficiency while delivering the best service to your employees.
We’re excited to share that the Deep Learning Toolkit App for Splunk (DLTK) is now available in version 3.6 for Splunk Enterprise and Splunk Cloud. The latest release includes: Let’s get started with the new operational overview dashboard which was built using Splunk’s brand new dashboard studio functionality which I highly recommend checking out. You can learn more about it in this recent tech talk which you can watch on demand.
If you thought that the product announcements from PagerDuty’s largest event of the year, PagerDuty Summit 2021, were all we had in store for you, think again! We’re excited to announce that the July Release comes with a new set of updates and enhancements to the PagerDuty platform! You can learn about our latest capabilities via the Q1 PagerDuty Pulse or read below for the highlights.