Memory Leak (and Growth) Flame Graphs

Your application memory usage is steadily growing, and you are racing against time to fix it. This could either be memory growth due to a misconfig, or a memory leak due to a software bug. For some applications, performance can begin to degrade as garbage collection works harder, consuming CPU. If an application grows too large, performance can drop off a cliff due to paging (swapping), or the application may be killed by the system (OOM killer). You want to take a quick look before either occurs, in case it’s an easy fix. But how?

Debugging growth issues involves checking the application config and memory usage, either from application or system tools. Memory leaks are much harder, but there are many tools to help. Some take a core dump and then post-process that, like mdb (modular debugger) and ::findleaks. They usually pause the application while a core dump is taken, or require the application to be terminated, so that free() routines are called. Some launch the application with malloc() calls interposed with instrumentation, like Valgrind memcheck, which can also simulate a CPU so that all memory accesses can be checked. This can cause the application to run 20-30 times slower, or more.

While core dump or interposing techniques provide invaluable detail for diagnosing memory leaks, neither of them can easily be used on a growing application that has caught your attention.

I’ll summarize four dynamic and static tracing approaches I use for analyzing memory growth and leaks on an already running application, for both virtual and physical memory. These focus on the code paths responsible for memory usage, examined as stack traces. I’ll also use flame graphs to visualize these code paths and the magnitude of memory usage.

The four approaches are illustrated in the following figure, as the events in green text:

All of these have shortcomings, which I’ll explain.

1. Allocator Tracing: malloc(), free(), …

This is where the memory allocator functions, malloc(), free(), etc, are traced. Imagine you could run Valgrind memcheck with “-p PID” on a process, and gather memory leak statistics for 60 seconds or so. Not a complete picture, but hopefully enough to catch egregious leaks. The overheads should also be lower (only the allocator functions are traced), and you only pay this cost if and when you need to.

These allocator functions operate on virtual memory, not physical (resident) memory, which is usually the target of leak detection. Fortunately, they usually have a strong correlation at this level.

Some of us have tried this approach before. Sanjeev Bagewadi wrote memleak.d using DTrace in 2005, which dumps memory address details on every malloc(), realloc(), calloc(), and free(); these are then processed by a separate Perl program to detect leaks (example output here).

Yichun Zhang (agentzh) has recently written leaks.stp using SystemTap, which is a more efficient approach: processing memory addresses while in kernel context before emitting the summary of detected leaks and stack traces. He has also created flame graphs from this, example here, which look great. I’ve since added a new color palette to flame graphs so that we can differentiate CPU flame graphs (hot colors) from memory ones (green colors).

I’ve tried allocator tracing, but wasn’t happy about the overheads it caused, which in one case with DTrace reduced application throughput by 25% (still, that’s much better than 20-30x), followed by the overhead of post-processing the verbose trace output. 25% is unusually high for DTrace, and is because allocator functions like malloc() and free() can be very frequent, so the low per-probe overhead begins to add up. Even with 25% overhead or so, being able to solve issues this way can make it worth it. If you want to try this approach yourself, remember to trace all allocator functions: malloc(), realloc(), calloc(), etc.
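As a rough sketch of what this can look like with DTrace’s pid provider (this one-liner and the “-p PID” usage are my own illustration, not from the tools above), the following sums requested bytes by user stack for malloc() only, so it shows allocation code paths rather than leaks; realloc() and calloc() would need their own probes:

# dtrace -n 'pid$target::malloc:entry { @[ustack()] = sum(arg0); }' -p PID

The aggregated stacks can then be collapsed and rendered with the same stackcollapse.pl and flamegraph.pl steps shown in the later examples.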

I’ve also developed indirect approaches as described in the following sections on brk(), mmap(), and page faults. This is a tradeoff: for leak detection they aren’t nearly as effective as tracing allocator functions directly, but they do incur much less overhead.

2. brk() syscall

Many applications grow using brk(). This syscall sets the program break point: the end of the heap segment (aka the process data segment). brk() isn’t called by the application directly, but rather by the user-level allocator that provides the malloc()/free() interface. Such allocators typically don’t give memory back to the OS, keeping freed memory as a cache for future allocations. And so, brk() is typically for growth only (not shrinks). We’ll assume that’s the case, which simplifies how it is traced.

In DTrace, tracing brk() can be done as a one-liner, which can also show the user-level stack that led to it. In this example, tracing “mysqld” processes only (MySQL server):

# dtrace -n 'syscall::brk:entry /execname == "mysqld"/ { @[ustack()] = count(); }'
^C
[...]

              libc.so.1`_brk_unlocked+0xa
              libc.so.1`sbrk+0x3b
              libmtmalloc.so.1`morecore+0x29
              libmtmalloc.so.1`malloc_internal+0xf3
              libmtmalloc.so.1`malloc+0x3b
              mysqld`my_malloc+0x32
              mysqld`init_alloc_root+0x73
              mysqld`_Z14init_sql_allocP11st_mem_rootjj+0x15
              mysqld`_ZN18Prepared_statementC1EP3THD+0xaf
              mysqld`_Z19mysqld_stmt_prepareP3THDPKcj+0x4a
              mysqld`_Z16dispatch_command19enum_server_commandP3THDPcj+0xefb
              mysqld`_Z24do_handle_one_connectionP3THD+0x13f
              mysqld`handle_one_connection+0x47
              mysqld`pfs_spawn_thread+0x16f
              libc.so.1`_thrp_setup+0x8a
              libc.so.1`_lwp_start
               32

The output includes multiple stack traces that led to brk(), along with their occurrence counts. Here, only the last stack and count are shown. The full output isn’t too long, as brk() is usually infrequent: it only occurs when the allocator has requests that spill over its current heap size. This also means that the overhead is very low, and should be negligible (near zero).

Your OS may or may not have brk() as a kernel-provided syscall, and therefore may not have a DTrace syscall::brk:entry probe. In that case, see the next section on mmap(). Another difference to watch out for is the sbrk() syscall: if a sbrk probe exists, it needs to be traced as well. The above stack shows that sbrk() is provided by libc on this OS, which calls into brk().
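If you’re not sure what your system provides, listing the matching probes is a quick check (a sketch; the wildcard should match both brk and sbrk entry probes if they exist):

# dtrace -ln 'syscall::*brk:entry'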

Going further with brk(): the following DTrace script, brkbytes.d, records the stacks along with the size, in bytes, of the heap expansion. Like the one-liner, this matches any process named “mysqld”:

#!/usr/sbin/dtrace -s

inline string target = "mysqld";	/* process name to match */
uint brk[int];				/* last known brk address, keyed by PID */

/* on entry, stash the requested new break address */
syscall::brk:entry /execname == target/ { self->p = arg0; }

/* on success, sum the expansion bytes by user stack (needs a prior brk address) */
syscall::brk:return /arg0 == 0 && self->p && brk[pid]/ {
	@[ustack()] = sum(self->p - brk[pid]);
}

/* remember the new break address for this PID, then clear the thread-local */
syscall::brk:return /arg0 == 0 && self->p/ { brk[pid] = self->p; }
syscall::brk:return /self->p/ { self->p = 0; }

This was executed, and the output rendered as a heap expansion flame graph, using:

# ./brkbytes.d -n 'tick-60s { exit(0); }' > out.mysqld_brkbytes01
# ./stackcollapse.pl out.mysqld_brkbytes01 | ./flamegraph.pl --countname=bytes \
    --title="Heap Expansion Flame Graph" --colors=mem > mysqld_brkbytes.svg

Click for the interactive SVG:

This example MySQL server doesn’t have a leak (that I know of), but rather shows memory growth from brk() calls. About 12 Mbytes were from processing a query, seen above dispatch_command() (mangled C++ signatures are shown here).

The total amount captured, 21 Mbytes, equals the virtual memory expansion as observed from other tools (prstat/top).
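If you want to cross-check that total yourself, the folded output from stackcollapse.pl ends each line with the byte count, so a quick sum with awk works (a sketch using the output file from the example above):

# ./stackcollapse.pl out.mysqld_brkbytes01 | awk '{ sum += $NF } END { print sum }'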

What brk() tracing can tell us is the code paths that lead to heap expansion. This could be one of:

  • A memory growth code path
  • A memory leak code path
  • An innocent application code path that happened to spill over the current heap size
  • An asynchronous allocator code path that grew the application in response to diminishing free space

It requires some sleuthing to tell them apart. If you are searching for leaks in particular, sometimes you’ll be lucky and it’ll be an unusual code path that is easily found in a bug database as a known leak.

If you’re unlucky, the code path isn’t visible in the stack trace. This can happen with virtual machine languages, where the external profiler shows the stack frames for the VM but not the program it’s running (that problem is fixed in DTrace with ustack helpers).

While brk() tracing shows what led to expansion, page fault tracing, covered later, shows what then consumed that memory.

3. mmap() syscall

The mmap() syscall may be used explicitly by the application for loading data files or creating working segments, especially during initialization and application startup. In this context, we’re interested in creeping application growth, which may occur via mmap() if the allocator uses it instead of brk(). glibc does this for larger allocations, which can be returned to the system using munmap().

You can trace and display mmap() in a similar manner to brk(), showing the bytes of the new mappings. In DTrace this can be a one-liner, eg, for “mysqld” processes:

# dtrace -n 'syscall::mmap:entry /execname == "mysqld"/ { @[ustack()] = sum(arg1); }'

Unlike brk(), mmap() calls don’t necessarily mean growth, as they may be freed shortly after using munmap(). And so tracing mmap() using this one-liner may show many new mappings, but most or all of them are neither growth nor leaks.
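One quick way to see whether mappings are net growth or mostly churn is to compare the total bytes mapped and unmapped over an interval. A rough sketch, again matching “mysqld” and tracing for 60 seconds:

# dtrace -n 'syscall::mmap:entry /execname == "mysqld"/ { @["mmap bytes"] = sum(arg1); }
    syscall::munmap:entry /execname == "mysqld"/ { @["munmap bytes"] = sum(arg1); }
    tick-60s { exit(0); }'

If the two totals are close, most mappings are short-lived; if mmap bytes keep outpacing munmap bytes, the difference is growth worth explaining.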

As with malloc()/free() tracing, mapping addresses can be inspected and associated so that those that were not freed can be identified. I’ll leave that as an exercise for the reader. :-)

As with brk() tracing, once you can identify those mappings that have grown (and not been munmap()ed), they will show one of:

  • A memory growth code path
  • A mapping memory leak code path
  • An asynchronous allocator code path that grew the application in response to diminishing free space

This should also be a low overhead approach for analyzing growth, due to the low rate of mmap() and munmap() calls. If these are called frequently, say, over ten thousand times per second, then the overhead can become significant. That would also be a sign of a poorly designed allocator or application.

4. Page Faults

The brk() and mmap() syscalls show virtual memory expansion. Physical memory is consumed later, when the memory is written to. This causes page faults, and virtual to physical mappings to be initialized. Page faults can happen in a different code path, and one that may (or may not) be more illuminating than the malloc(), brk(), or mmap() path.

The following one-liner traces page faults using a static probe that this OS provides (vminfo:::as_fault), and only matches those caused by mysqld. The output is then turned into a flame graph.

# dtrace -x ustackframes=100 -n 'vminfo:::as_fault /execname == "mysqld"/ {
    @[ustack()] = count(); } tick-60s { exit(0); }' > out.mysqld_fault01
# ./stackcollapse.pl out.mysqld_fault01 | ./flamegraph.pl --countname=pages \
    --title="Page Fault Flame Graph" --colors=mem > mysqld_fault.svg

Click for the interactive SVG:

The total amount captured, about 120 Mbytes (30,826 x assumed 4 Kbyte page size), equals the resident memory expansion as observed from other tools (prstat/top).

The earlier techniques show the initial allocation code paths. Page fault tracing shows different code paths: those that are actually eating physical memory. This will be either:

  • A memory growth code path
  • A mapping memory leak code path

Again, some sleuthing is required to tell them apart. If you are hunting leaks and have a similar application that isn’t growing, then taking page fault flame graphs from each and looking for the extra code paths can be a quick way to identify the difference. If you are developing applications, then collecting baselines each day should let you identify not only that you have an extra growing or leaking code path, but also the day it appeared, helping you track down the change.
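A simple way to compare a baseline capture against a current one is to diff the folded stacks (a sketch; the file names are hypothetical, and counts will vary run to run, so focus on stacks that only appear in the newer capture):

# ./stackcollapse.pl out.fault_baseline > baseline.folded
# ./stackcollapse.pl out.fault_today > today.folded
# diff baseline.folded today.folded | grep '^>'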

The overhead of page fault tracing is likely a bit higher than brk() or mmap() tracing, but not by much: page faults should still be relatively infrequent, making this tracing approach near negligible.

Getting Started

If you want to try these approaches yourself, you’ll need a tracing tool that provides coverage with static probes or dynamic tracing. User-level tracing is needed for the allocator functions, and kernel tracing for the page faults. Syscalls (brk(), mmap()) can usually be traced at either the user or kernel level. There are several tracing tools that meet these requirements; DTrace and SystemTap are programmatic, and so can reduce overheads further with in-kernel filtering and aggregations.

To use this for solving real issues, live and in production, another requirement is that the tool must be safe to use in production, which DTrace has been designed to be.

You’ll also want the tracing tool to be able to show readable stack traces. My examples used MySQL, which is C and C++. Languages like node.js are trickier, since user stack traces show the VM internals and not the node.js program. In those cases, you need a tracer that meets the earlier requirements, and can also show the program stack. DTrace does this with language ustack helpers, for example, the one Dave Pacheco developed for node.js.
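As a sketch of what that can look like (assuming a node binary built with ustack helper support; jstack() asks DTrace to use the registered helper to translate VM frames):

# dtrace -x ustackframes=100 -n 'vminfo:::as_fault /execname == "node"/ {
    @[jstack()] = count(); }'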

The Flame Graph software is on github.

Conclusion

In this post I described four techniques for analyzing memory growth using dynamic tracing: allocator function tracing, brk() and mmap() syscall tracing, and page fault tracing. These can identify growth of either virtual or physical memory, and include all reasons for growth, including leaks. The brk(), mmap(), and page fault approaches can’t separate out leaks directly; that requires further analysis.

Their advantage is that they should be low overhead, so they can be used on a live production application without restarting it.

I also included a heap expansion flame graph to explain virtual memory growth (brk()), and a page fault flame graph for physical memory growth. For these mysqld examples, the size of each growth matched what was observed by other tools: eg, top(1)’s VIRT and RES columns. These flame graphs can be of enormous help when these columns are increasing and you’d like to know why.


Thanks to those who provided comments on this post, including Deirdré Straughan, Dave Pacheco, and Bryan Cantrill.

