Patching a memory leak

Soumendra's Blog
4 min read · Jun 18, 2024

Identifying and fixing a memory leak in a Python Flask web server using tracemalloc and code analysis

Problem

At Freshworks, we were alerted to a recurring pod restart on a Python Flask web server. At the first alert, I could not find any issue in the logs. When the same alert fired for the same server the following month, I was fairly sure something was wrong.

I checked the logs and the infra-monitoring dashboard for the last hour and found nothing suspicious. To widen the scope, I expanded the time window for the pods' memory usage from one hour to one day, and then from one day to one week.

Figure 1: Memory usage of a server pod over three days, rising from ~600 MB to ~700 MB.

The server's memory usage was steadily increasing. It often exceeded the pod's maximum allocated memory, which caused the pod to restart, and these restarts occasionally dropped API requests. Because the server exposes many APIs, the next challenge was to pinpoint the exact location of the leak.

Leak Detection

This was a slow memory leak that accumulated over a month, and leaks like this are generally hard to detect. I started by narrowing down the circumstances under which the leak occurred.
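To show how tracemalloc can surface a slowly growing allocation, here is a minimal sketch that compares two snapshots and lists the source lines whose memory grew the most. It is not the code from our server: `handle_request`, `_leaky_cache`, `top_growth`, and `run_demo` are hypothetical names, and the "leak" is simulated with a module-level list that is never cleared.

```python
import tracemalloc

# Hypothetical stand-in for a leaky code path: a module-level
# cache that only ever grows (assumption, not the real server code).
_leaky_cache = []


def handle_request(payload_size=1024):
    """Simulate a request handler that retains memory on every call."""
    _leaky_cache.append(bytearray(payload_size))


def top_growth(before, after, limit=5):
    """Return the allocation sites that grew between two snapshots."""
    # compare_to() sorts by the absolute size difference, largest first.
    stats = after.compare_to(before, "lineno")
    return [stat for stat in stats if stat.size_diff > 0][:limit]


def run_demo(requests=1000):
    """Take a snapshot, simulate traffic, and report what grew."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for _ in range(requests):
        handle_request()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    return top_growth(before, after)


if __name__ == "__main__":
    for stat in run_demo():
        # Each entry prints the file, line number, and size difference.
        print(stat)
```

In a real service, the two snapshots would be taken some time apart under production-like traffic rather than around a loop. The line that keeps appearing at the top with a positive size difference is the first place to look.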

Scoping down the leak


Written by Soumendra's Blog

AI Architect and Sustain Lead at PepsiCo