Discover How We Managed to Reduce Our Cloud Costs by 30 percent

Learn how we sliced our cloud costs without impacting our application's performance and without any real infrastructure changes.

Alexandre Olive
skeepers


An image of coins in a glass with a tree growing from it
Photo by micheile henderson on Unsplash

Cloud computing is fantastic. You can spawn a new server for free in seconds and play with it like never before. You can host whole websites for free, with the frontend on buckets and the backend on serverless functions, also free up to a certain number of calls.

But once your traffic starts to ramp up and your app starts scaling, you need to be careful. As free as the beginning is, things can get nasty pretty quickly if they're not properly managed.

There are plenty of horror stories on the web about companies or solo developers whose cloud bills exploded overnight, from a Lambda function that called itself ($4.5k) to a forgotten database ($60k) and beyond.

For us, it wasn't an overnight issue; it was more that the company was doing great, so nobody was paying much attention to cloud costs. When I joined, we realized that our cloud bill was much higher than it should be and that we needed to bring it down.

In today's article, I want to discuss the simple changes we made to our infrastructure to reduce our costs by 30 percent, and what we plan to do in the future to reduce them even more.

How things were looking

21,433.21€. That's what we paid for the month of October 2022 for the staging and production environments combined. Here's the breakdown:

  • 13,743.83€ for the Kubernetes cluster (with 2,333.84€ in discounts)
  • 6,124.75€ for the Cloud storage (buckets)
  • 3,237.22€ for the SQL databases
  • There are other smaller costs, but I will ignore them for this article.
Screenshot from the Google Cloud billing page showing the costs of our staging and production environments.

While the costs were not out of control, the Kubernetes Cluster and Storage were costing us a lot.

Let’s look at the breakdown of the costs of the Kubernetes cluster itself on the production environment alone.

Screenshot from the Google Cloud billing page for the Compute Engine costs

Those labels might need an explanation for the uninitiated:

  • N1 predefined instance core/RAM is the cost of the nodes our Kubernetes cluster runs on daily; N1 is the machine type.
  • Spot preemptible instance core/RAM covers the nodes we spawn to run asynchronous tasks. Preemptible means we pay less for those nodes but have no guarantee of availability.
  • Network Egress Inter Zone is the data moving through the network between availability zones (AZs). Let's say you have an API hosted on a node in zone A making queries to another API on a node in zone B; you'll pay for the data going from zone B back to zone A.

One final breakdown I want to examine is the difference between staging and production.

Breakdown of the costs between production and staging

That’s a lot of money going out in the clouds each month; let’s see how we can reduce it without impacting users.

Production environment

We have multiple angles for reducing the cost of the production environment. Let's start with the biggest item in the cost breakdown: N1 predefined instance core/RAM.

N1 predefined instance core/ram

We are currently using n1-standard-8 nodes that autoscale based on usage. Before jumping into the fix, we must understand how Kubernetes scales new nodes.

When you define your pods, you tell Kubernetes how much RAM and CPU you think that pod needs to work correctly; that’s what Kubernetes calls the Request.

So let's say pod A for my API requests 400 MB of RAM and 0.2 vCPU to work appropriately, and the Kubernetes node's capacity is 30 GB of RAM and eight vCPUs. You could fit 40 pods on that node: eight vCPUs divided by 0.2 vCPU gives 40 pods, while 30,000 MB divided by 400 MB gives 75 pods, so CPU is the limiting factor.
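To make the arithmetic concrete, here is a minimal sketch of how those requests are declared in a pod spec. The pod name, image, and namespace are hypothetical; only the values mirror the example above.

```yaml
# Minimal sketch of the hypothetical "pod A" above.
# Kubernetes schedules on these *requested* values,
# not on what the container actually consumes at runtime.
apiVersion: v1
kind: Pod
metadata:
  name: pod-a                      # hypothetical name
spec:
  containers:
    - name: api
      image: example.com/my-api:latest   # hypothetical image
      resources:
        requests:
          memory: "400Mi"          # roughly 400 MB
          cpu: "200m"              # 0.2 vCPU
```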

Notice that we are talking about requested resources, not used ones, and that's where our first issue came from.

Schema representing the requested CPU vs the actual capacity of the node

If you have 40 pods requesting 0.2 vCPU in an eight vCPU node, Kubernetes will consider that node full and launch a new one even if the pods use only four vCPUs.

Now that we understand that we pay for the number of nodes and that the nodes scale on the requested resources, we can dive into the Kubernetes YAML configuration files and see what can be changed.

Photo by Cookie the Pom on Unsplash

The first thing we did was compare the YAML definition of our pods with their actual resource usage in Grafana, which makes it easy to plot the request against real consumption, as you can see below.

Grafana's request vs. usage of RAM for one API with a single pod running (screenshot taken while writing this article, so after the configuration update).

In this graph, we can see that the average RAM usage is around 0.1 GB while the requested amount is 0.4 GB.

And if we compare that to our YAML configuration for this API, we find something like this.

Screenshot from a change in our IaC repository: a deployment.yaml file showing a reduction of the requested RAM and CPU

We significantly reduced the requested memory and CPU values to be closer to the API's real usage, while still staying conservative; we could reduce them much more. We repeated this process for all of our services.
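As an illustration (the real values are in the screenshot above and differ per service), a change of this kind in a deployment.yaml looks roughly like the following sketch, with made-up numbers:

```yaml
# Illustrative fragment of a deployment.yaml resources block (made-up values).
# The goal is to bring requests closer to what Grafana shows the pods really
# use, while keeping a safety margin.
resources:
  requests:
    memory: "128Mi"   # was 400Mi, actual usage around 100Mi
    cpu: "100m"       # was 200m
  limits:
    memory: "256Mi"
    cpu: "250m"
```

Limits are a separate topic; lowering over-sized requests is what actually frees node capacity, because the scheduler packs pods by their requests.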

The second thing we noticed was that we ran too many pods by default for a single service.

For example, for this deployment, we were running three pods minimum, which means that even if the API was receiving almost no calls during the night, it was still running three pods.

So, we went through every service and reduced the minimum number of pod replicas to one (where it was safe to do so).
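For services that autoscale, the minimum replica count lives in the HorizontalPodAutoscaler. A minimal sketch, with a hypothetical service name and thresholds:

```yaml
# Hypothetical HPA: let quiet services drop to a single pod at night
# while still scaling up under load.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-api                # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-api
  minReplicas: 1              # was 3
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

For deployments without an HPA, the same effect comes from setting spec.replicas to 1 directly in the deployment.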

With those two changes, our N1 predefined instance core/ram went from 4,235.43€ to 1,973.28€. 🎉

Network Egress Inter Zone

This one is trickier; as I explained before, we are paying for the data going through the network between availability zones.

2,712.34€ divided by 0.01€ per GB equals 271,234 GB going through the network… something is really wrong here.

We have more than forty microservices, but the good thing is that they are pretty well done, and there is almost no direct communication between them — except for a few webhooks here and there.

But we have a GraphQL gateway in front of all those services. And this gateway receives A LOT of calls.

We decided to tackle this issue two ways:

  • The first was to track down which calls returned the most data, using Flow Logs.
  • The second way, and the one we were the most confident about, was to check which API received the most calls and which endpoint in this API would be an excellent fit for caching. We were already using Redis as a pub/sub solution, so we could easily use it as a cache as well.

We decided to do the cache first since we needed it anyway, and everything was already set up. We just had to code.

In hindsight, this was a mistake (not a terrible one, but still a mistake) because we just moved the issue into the cache system without knowing whether the data was too big because of an issue in the code or because of the number of calls we were receiving. It seems obvious now that I'm writing this article, but you live and learn, right?

Using Kiali, we could see which API received the most calls.

Based on this information and on the functional side of the application, we decided which endpoints should be cached. The GraphQL gateway now serves those responses from the cache and no longer has to make HTTP calls to the services to get the data, reducing the egress costs.

Just by implementing the caching, we successfully reduced the egress costs from 2,712.34€ to 1,095.19€.

Sadly, we have yet to implement the Flow Logs analysis to track down the remaining HTTP calls that transfer vast amounts of data.

Cloud Storage

Most of our cloud storage costs come from the volume of data stored rather than from data transferred out of the buckets.

Our application receives videos from users; we normalize them and store both the original and the normalized versions. Those videos are then edited and validated (or refused) by clients.

We discovered that we never deleted anything: we had stored unused user videos for years and had over 100 TB of data. I'm not sure the GDPR would be happy about that.

Photo by Lia Trevarthen on Unsplash

So, in agreement with our legal and product teams, we set up a CRON job that queries the database on dynamic parameters like "video refused at" and archives those files. To prevent any unrecoverable error, we first switch the storage class of the files to "archive" in GCP (the equivalent of Glacier in S3) and use a lifecycle rule to delete them entirely after six months.
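As a sketch of the second half of that setup, here is what a delete-after-six-months lifecycle rule could look like, assuming the bucket is managed with GCP's Config Connector (the same rule can be set through gsutil or Terraform); the bucket name is hypothetical:

```yaml
# Hypothetical bucket definition: objects that the CRON job has moved to the
# ARCHIVE storage class are deleted roughly six months later.
apiVersion: storage.cnrm.cloud.google.com/v1beta1
kind: StorageBucket
metadata:
  name: user-videos                # hypothetical bucket name
spec:
  lifecycleRule:
    - action:
        type: Delete
      condition:
        age: 180                   # days since object creation
        matchesStorageClass:
          - ARCHIVE                # only touch already-archived files
```

Note that `age` counts from object creation; if the grace period must start at archiving time instead, the CRON job can stamp a customTime on each object and the rule can use daysSinceCustomTime.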

October 2022 storage cost for production environment
September 2023 storage cost for production environment

It is still an ongoing process, as we have to be careful about what we decide to delete, but we successfully reduced our storage cost from 4,754.85€ to 3,029.93€ in production.

Staging environment

In October 2022, we paid 4,059.25€ for the staging environment, which is insane.

We took four simple actions to reduce the costs:

  • We implemented a shutdown of the environment during out-of-office hours (see the sketch after this list). Sadly, since we are an international team, there is not much downtime during weeknights, but it still adds up to two full days over the weekend.
  • We scaled down our Kubernetes cluster, set the minimum pod for all deployments to one, and prevented autoscaling past one pod — except for a few critical services.
  • We added lifecycle rules to our cloud storage buckets to delete everything six months or older.
  • We cleaned up our database and deleted all the data from previous developers or QA that we no longer used.
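Regarding the first point, one simple way to shut a staging cluster down over the weekend is a pair of in-cluster CronJobs that scale deployments to zero and back. A minimal sketch with hypothetical names and schedules (not necessarily how our own shutdown is implemented):

```yaml
# Hypothetical CronJob that scales all staging deployments to zero
# on Friday evening (UTC).
apiVersion: batch/v1
kind: CronJob
metadata:
  name: staging-shutdown
  namespace: staging
spec:
  schedule: "0 19 * * 5"            # Friday 19:00 UTC
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler   # needs RBAC rights to scale deployments
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - kubectl scale deployment --all --replicas=0 -n staging
```

A mirror CronJob scheduled for early Monday morning scales everything back up before the team starts the day.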

With those four easy steps, we divided the cost of the environment by 2.5!

Our plan for the future

We already have a few ideas on how we plan to reduce our costs even more, and most of them are just a continuation of what we talked about in the article.

  • Fine-tune Kubernetes requests to be even more precise about what our microservices need to run.
  • Track down the remaining egress costs, as we first planned to do.
  • Keep adding new rules to our archive CRON job to delete unnecessary files from buckets.
  • Switch our video processing from CPU to GPU (this would need a whole article to discuss, but our first few tests are going great; it's much faster and costs less).
  • Clean up our production SQL database, which stores terabytes of event data that could be archived.

Comparing October 2022 to October 2023, we now save 6,369.75€ each month, almost 30%, thanks to all those changes. And I'm sure we can save even more.

I was not expecting work on tracking down and optimizing cloud costs to be this interesting, probably because I'm not the one paying and I'm actually getting paid to do it.

Playing with the cloud’s free tier on personal projects and using the cloud for a production environment with millions of daily calls are two completely different beasts.

My takeaway from this process is that you should think about the application's whole life from the start and put safeguards in place to prevent your cloud costs from going overboard, which is easier said than done, I admit.

If only all the money we saved could be added to my salary…

Thank you for reading this article until the end. If you liked it, please don't hesitate to follow me on X (Twitter) or add me on LinkedIn.
