DevOps

The latest News and Information on DevOps, CI/CD, Automation and related technologies.

SRE Redefines IT Operations as Architect of Sustainable Systems

Site Reliability Engineering (SRE) is a term that’s getting attention and gaining momentum – and for good reason. SRE takes practices from software engineering and applies them to problems in infrastructure and operations. Organizations build SRE teams with a couple of goals in mind, including increasing scalability and developing solid software systems.

Kubernetes Community Days Munich Recap

A couple of weeks ago I had the absolute joy of attending KCD Munich for the first time, with my friend and colleague Guy Menahem (whom some of you know simply as The Good Guy on Twitter and YouTube). Besides rooting for Guy and his co-speaker, Arsh Sharma of Okteto, during their session on Backstage.io and IDPs, I enjoyed being untethered from ‘booth duty’ and free to engage with all the beautiful human beings that gathered together for this Kubetastic event!

10 Best Git GUIs for Public Sector Developers

For developers working in the public sector, leveraging secure version control systems like Git is essential to manage code and web content efficiently and safely. Git simplifies collaborative projects between developers working in fields like government, healthcare, banking, and education, but hey, let’s face it – mastering Git via the command line can be like solving a Rubik’s Cube blindfolded. That’s where a Git GUI comes in handy.

Cloud connectivity and interoperability

The post-pandemic world has transformed our work habits and the landscape of conducting business. Organizations now take the hybrid approach to work, wherein employees may work from an office, while travelling, or from a remote location. This fundamental shift has accelerated the pace of cloud adoption, as the cloud makes data access possible from anyplace, anytime. But the cloud brings with it a set of complexities that must be managed.

What is Scalability?

The number of simultaneous requests that an application can successfully support is a measure of its scalability. The point at which an application can no longer successfully handle more requests is its scalability limit. This limit is reached when a key hardware resource is exhausted and different or additional machines are needed. Scaling these resources can involve any combination of CPU and physical memory (different or more computers), hard disk (larger hard drives, less "live" data, solid-state drives), and network bandwidth (multiple network interface controllers, larger NICs, fibre, and so on).

Using Grafana and Graphite to monitor server load

Server outages can lead to lost customers, a damaged reputation, and other problems, so it is important to get timely information on the status of your servers. MetricFire's Hosted Grafana and Graphite can help you monitor server load in a timely and efficient manner. Servers generate a large number of metrics, and it is essential not only to track their values but also to observe how they change over time. It is also possible to correlate application statistics with server load metrics.
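
As a rough illustration of how a load metric might reach Graphite, the sketch below builds a line in Graphite's plaintext protocol (`<path> <value> <timestamp>`) and sends it over TCP. The metric path `servers.web01.load.1min`, the host `localhost`, and port 2003 (Carbon's usual plaintext port) are assumptions chosen for the example; a hosted service may require a different endpoint or a prefix on the metric name.

```python
import os
import socket
import time


def format_metric(path: str, value: float, timestamp: int) -> str:
    # One line of Graphite's plaintext protocol: "<path> <value> <timestamp>" plus newline.
    return f"{path} {value} {timestamp}\n"


def send_metric(line: str, host: str = "localhost", port: int = 2003) -> None:
    # Host and port are assumptions; point them at your own Carbon listener.
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


def current_load_line(path: str = "servers.web01.load.1min") -> str:
    # os.getloadavg() is available on Unix-like systems only.
    load_1m, _load_5m, _load_15m = os.getloadavg()
    return format_metric(path, load_1m, int(time.time()))
```

Calling `send_metric(current_load_line())` on a schedule (for example, once a minute) would produce a time series that Grafana can chart alongside application metrics.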

Leveraging AWS EventBridge to stay ahead of spot instance interruptions

Amazon EC2 Spot Instances can help you save significantly on your compute costs. However, you should also be aware that Amazon can take them back with a two-minute notice if the demand for the instance type goes up. Fortunately, AWS EventBridge, along with Spot by NetApp, can help you automate the process of detecting and reacting faster to these interruptions.
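
To make the detection step concrete, here is a minimal sketch of an EventBridge rule pattern and a handler that picks out Spot interruption warnings. The `source` and `detail-type` values and the `instance-id` / `instance-action` fields follow the event AWS emits for these warnings; the function names `parse_spot_interruption` and `handler` are hypothetical, and the reaction itself (draining a node, launching a replacement, or handing off to Spot by NetApp) is left as a placeholder rather than shown.

```python
from typing import Optional

# Event pattern an EventBridge rule could use to match Spot interruption warnings.
SPOT_WARNING_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Spot Instance Interruption Warning"],
}


def parse_spot_interruption(event: dict) -> Optional[dict]:
    # Return the fields of interest, or None if this is not an interruption warning.
    if event.get("source") != "aws.ec2":
        return None
    if event.get("detail-type") != "EC2 Spot Instance Interruption Warning":
        return None
    detail = event.get("detail", {})
    return {
        "instance_id": detail.get("instance-id"),
        "action": detail.get("instance-action"),
        "time": event.get("time"),
    }


def handler(event: dict, context=None) -> dict:
    # Hypothetical Lambda-style entry point; the actual reaction is an assumption.
    info = parse_spot_interruption(event)
    if info is None:
        return {"handled": False}
    # Placeholder: drain workloads or start a replacement within the two-minute window.
    return {"handled": True, **info}
```

Keeping the parsing separate from the reaction makes it easy to test the matching logic locally with a sample event before wiring the handler to a real rule.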