Most Apache Spark users overlook the choice of an S3 committer (the protocol Spark uses to commit output files to S3), because it is quite complex and documentation about it is scarce. Yet this choice has a major impact on performance whenever you write data to S3: a large portion of a typical Spark job's runtime is spent writing results to S3, so choosing the right S3 committer matters for Spark users on AWS.
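As a sketch, enabling the S3A "magic" committer (one of the zero-rename committers shipped in the hadoop-aws module; exact property names and availability depend on your Spark and Hadoop versions, and the spark-hadoop-cloud classes must be on the classpath) typically looks like this in spark-defaults.conf:

```
# Select the S3A "magic" committer instead of the default rename-based one
spark.hadoop.fs.s3a.committer.name              magic
spark.hadoop.fs.s3a.committer.magic.enabled     true

# Route Spark SQL / Parquet writes through the cloud-aware commit protocol
spark.sql.sources.commitProtocolClass           org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
spark.sql.parquet.output.committer.class        org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
```

The same properties can be passed as `--conf` flags to spark-submit; verify the settings against the "Integration with Cloud Infrastructures" page for your Spark release before relying on them.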
As companies move more applications into the cloud and package them into containers, their environments become more complex while visibility shrinks. Infrastructure is abstracted away, with much of it delivered by the hyperscalers, and this abstraction creates an opaqueness that makes it hard to control costs and understand resource utilization. As a result, many companies face high cloud bills and significant cloud waste.
Moving large amounts of data to the cloud can be arduous and time-consuming; a cloud migration could take years if engineers manually moved data through the assessment, mobilization, and migration phases. An effective cloud migration also requires adequate data encryption, fast data transfer speeds, and constant monitoring. Migrating workloads to AWS likewise requires monitoring costs in real time to avoid overspending.
Thomas Stringer has a couple of great blog posts on how to understand your Azure monitoring costs and how to reduce them; see "Azure Monitor Log Analytics too Expensive? Part 2 – Save Some Money" on trstringer.com. In the past I've blogged on how to calculate the Azure Monitor and Log Analytics costs associated with AVD (not an easy task!).
In the last few years, many of the organizations I've worked with have significantly increased their cloud footprint. I've also seen a large percentage of newly launched companies rely almost exclusively on cloud services, limiting their on-premises infrastructure to what cannot be done in the cloud, such as WiFi access points in offices or point-of-sale (POS) hardware for physical stores.
Spot worker nodes on EKS (Elastic Kubernetes Service) are a great way to save costs by letting customers take advantage of AWS's unused capacity. At Sumo Logic, we have experimented with and adopted spot worker nodes for some of our EKS clusters to see whether we could pass the same savings along. Here we share some of the learnings, challenges, and caveats of using spot instances, along with our monitoring setup.
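As a minimal sketch of what adopting spot capacity can look like (assuming eksctl and a hypothetical cluster name; field names follow eksctl's ClusterConfig schema, so check them against your eksctl version), a managed node group backed by spot instances can be declared like this:

```yaml
# cluster.yaml -- hypothetical eksctl config
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: demo-cluster          # hypothetical cluster name
  region: us-east-1

managedNodeGroups:
  - name: spot-workers
    spot: true                # request spot capacity instead of on-demand
    instanceTypes:            # listing several types improves the odds of getting capacity
      - m5.large
      - m5a.large
      - m4.large
    minSize: 1
    maxSize: 10
```

Because spot instances can be reclaimed by AWS with only a short interruption notice, workloads that cannot tolerate disruption are usually kept on on-demand node groups, steered there with node selectors or taints.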
Toward the end of each year, tens of thousands make the journey to Las Vegas to participate in AWS re:Invent. AWS re:Invent has been the seminal conference for cloud-focused engineers since 2012, offering a space where the global cloud computing community can share and learn the latest insights and solutions.