Andy Green 67931757f8 alloc: compressed backtrace instrumentation support
This adds apis that enable usage of compressed backtraces in heap
instrumentation.

A decompressor tool is also provided that emits a textual
call stack suitable for use with addr2line.
2022-03-15 10:28:09 +00:00

lws_backtrace and lws_alloc_metadata

| Area | Definition |
|---|---|
| Cmake | LWS_WITH_COMPRESSED_BACKTRACES off by default |
| API | ./include/libwebsockets/lws-backtrace.h |
| README | ./READMEs/README.lws_backtrace.md |

lws_backtrace

The lws_backtrace apis provide a way to collect backtrace addresses into a struct, and an efficient domain-specific compressor to reduce the number of bytes needed to express the backtrace stack.

This information is particularly useful in RTOS type systems to understand heap usage. The information would typically be sent off the embedded device, in logs or in its own stream, then decompressed and processed off the device and converted to source information via addr2line or similar.

It only works with gcc and probably clang at the moment (patches welcome).

lws_alloc_metadata apis

This provides helpers on top of lws_backtrace that are suitable for adapting your heap allocator to

  • create compressed metadata, such as the call stack, at allocation time

  • optionally report allocation and free events with this information synchronously to a user supplied callback

  • optionally conceal the additional metadata behind allocations transparently

The extra metadata contains information on allocation size, and the backtrace of the code path that originally performed the allocation. Live allocations are also listed on one or more lws_dll2_owner_t that can be walked to dump active allocations along with the responsible code path.

Tuning the call stack

Entries at both ends of the call stack may be invariant and therefore just bloat to store. At the top end of the call stack, the backtrace will show the path through lws_backtrace apis and perhaps other apis. At the bottom end, depending on your system, the backtrace may detail call sequences from the loader that started your application.

For those reasons, the cmake variables LWS_COMPRESSED_BACKTRACES_SNIP_PRE and LWS_COMPRESSED_BACKTRACES_SNIP_POST (defaulting to 2 and 1 respectively) may be set to remove invariant, uninteresting call stack information from the top and bottom of the call stack.
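As a minimal sketch of what snipping means, assuming the captured call stack is an array of addresses with the most recent frame at index 0: the `snip_backtrace()` helper and the `SNIP_PRE` / `SNIP_POST` macros below are hypothetical names used for illustration, not the lws implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the cmake defaults named above */
#define SNIP_PRE  2
#define SNIP_POST 1

/*
 * Drops `pre` entries from the top (index 0 onwards) and `post` entries
 * from the bottom of a captured call stack, in place.
 * Returns the remaining depth, or 0 if nothing would be left.
 */
static size_t snip_backtrace(uintptr_t *st, size_t depth, size_t pre,
			     size_t post)
{
	assert(st != NULL);
	if (depth <= pre + post)
		return 0;
	depth -= pre + post;
	memmove(st, st + pre, depth * sizeof(*st));
	return depth;
}
```

With the defaults, a 7-level capture would keep its middle 4 entries, which are the ones that differ between call sites.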

LWS_WITH_ALLOC_METADATA_LWS

An optional, off-by-default implementation is provided for the lws_*alloc() apis, using the alloc_metadata apis to instrument all allocations via lws_*alloc(). This is not as useful as instrumenting the system allocator with the alloc_metadata apis, since it only shows lws allocations, but it is a complete example of how to do it.

Allocator instrumentation and thread-safety

Unless your application is strictly single-threaded, take care when instrumenting a real allocator that the

  • _lws_alloc_metadata_adjust()
  • _lws_alloc_metadata_trim()
  • _lws_alloc_metadata_dump()

apis, which deal with the hidden overallocation and the list of live allocations, are only called from inside a locked critical section that disallows reentry. Use an existing lock that the allocator already takes, or add a new mutex.
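The locking pattern can be sketched as follows, assuming a platform with pthreads. Every name here (`heap_lock`, `track_alloc()`, `track_free()`, `instrumented_malloc()`, `instrumented_free()`) is hypothetical; the two `track_*` helpers only stand in for the places where the metadata apis listed above would be called, and a counter stands in for the shared list of live allocations.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;
static size_t live_count;	/* stands in for the shared live list */

/* Hypothetical hooks: the metadata calls would go where these are */
static void track_alloc(void *p) { (void)p; live_count++; }
static void track_free(void *p)
{
	(void)p;
	assert(live_count > 0);
	live_count--;
}

static void *instrumented_malloc(size_t size)
{
	void *p;

	pthread_mutex_lock(&heap_lock);	/* no reentry while tracking */
	p = malloc(size);
	if (p)
		track_alloc(p);
	pthread_mutex_unlock(&heap_lock);
	return p;
}

static void instrumented_free(void *p)
{
	if (!p)
		return;
	pthread_mutex_lock(&heap_lock);
	track_free(p);
	free(p);
	pthread_mutex_unlock(&heap_lock);
}
```

If the allocator already has its own lock, doing the tracking inside that existing critical section avoids taking a second mutex on every allocation.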

Dumping entire active instrumented heap allocations

Calling _lws_alloc_metadata_dump() allows you to walk the current list of allocations from a heap and dump the backtrace responsible for each allocation. You can define your own iterator callback, or use the provided helper callback, lws_alloc_metadata_dump_stdout, which emits the heap metadata in the lws convention base64 format described below.

Convention for emission of compressed backtraces

To simplify triggering dumps, a convention is defined with a 3-character lead-in identifying lines as dumps or backtraces. This kind of approach makes it easy to emit the metadata into logs and fish them out with grep or similar.

| lead-in | signifies | Example |
|---|---|---|
| ~m# | Compressed allocator backtrace, eg, emitted into logs | ~m#IF0BmagugNDWgCnkhdAYpQa6wAAV |
| ~b# | Decoded, uncompressed backtrace line suitable for addr2line | ~b#size: 7520, 0x406651 0x406852 0x406c1b 0x406294 |

Both examples are complete representations of the same 4-level, 64-bit compressed backtrace.
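As an illustration of how little is needed to pick these lines out of a log programmatically, here is a small sketch that does the equivalent of the grep approach. The `classify_line()` function and the `line_kind` enum are hypothetical names, not part of lws.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum line_kind { LINE_OTHER, LINE_COMPRESSED, LINE_DECODED };

/*
 * Looks for a 3-character lead-in anywhere in a log line. On a match,
 * *payload points at the text following the lead-in; otherwise it is
 * set to NULL.
 */
static enum line_kind classify_line(const char *line, const char **payload)
{
	const char *p;

	assert(line && payload);
	if ((p = strstr(line, "~m#")) != NULL) {
		*payload = p + 3;
		return LINE_COMPRESSED;
	}
	if ((p = strstr(line, "~b#")) != NULL) {
		*payload = p + 3;
		return LINE_DECODED;
	}
	*payload = NULL;
	return LINE_OTHER;
}
```

Searching anywhere in the line, instead of only at its start, lets the lead-in survive whatever timestamp or severity prefix the logging system adds.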

Compressed backtrace decode tool

The lws-api-test-backtrace example (requires LWS_WITH_COMPRESSED_BACKTRACES to build) decodes the base64 representations, with or without the 3-character lead-in, to textual output suitable for addr2line. For example:

$ echo -n "~m#IF0BmagugNDWgCnkhdAYpQa6wAAV" | lws-api-test-backtrace
~b#size: 7520, 0x406651 0x406852 0x406c1b 0x406294

You can use it with addr2line in this kind of way (you probably want to give -f -p to addr2line as well)

addr2line -e myapplication `echo -n "~m#IF0BmUQugNCkgCnkhdAYpQa6wAAV" | ./bin/lws-api-test-backtrace 2>/dev/null | grep '~b#' | cut -d',' -f2-`
/projects/libwebsockets/lib/core/alloc.c:124
/projects/libwebsockets/lib/core/alloc.c:213
/projects/libwebsockets/lib/core/context.c:600
/projects/myapplication/main.c:55

There is a shell script ./contrib/heapmap.sh which takes a screenscrape of a dump's ~m# log lines and processes them into an allocation size, backtrace, and function names (especially in RELEASE mode, either a function name hint or the source coordinates are available).

lws_backtrace compression

The compressed blob has an outer structure designed for prepending, where the information available at recovery is a pointer to the end of it.

Outer compressed blob layout in memory

This sits immediately before the address reported to the caller: the actual allocation is enlarged to make room for it, and the caller is given a pointer to the end of this metadata, so it sees only what it asked for. At free() time, for example, we are handed the address just past the end of the metadata and work backwards to find the start of the compressed blob, which is then aligned backwards to a pointer boundary to recover the true allocation start.

| data | bits | meaning |
|---|---|---|
| compressed blob | variable, padded to byte boundary | Backtrace and extra info |
| compressed length | fixed, 16 | MSB-first 16-bit byte count of compressed blob, including the 16-bit length itself |
| lws_dll2_t | fixed, 3 x pointers | linked-list for tracking |
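The prepend-and-recover idea can be sketched in a few lines. This is an illustration under stated assumptions, not the lws code: `struct links` is a hypothetical three-pointer stand-in for lws_dll2_t, the `wrapped_*()` names are invented here, and the underlying allocator is assumed to return memory aligned to at least a pointer boundary.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for lws_dll2_t: three pointers */
struct links { void *prev, *next, *owner; };

#define PTR_MASK (sizeof(void *) - 1)

/* Returns `size` usable bytes with a copy of blob hidden in front of them */
static void *wrapped_alloc(size_t size, const uint8_t *blob, size_t blob_len)
{
	size_t total = blob_len + 2;	/* the length field counts itself */
	size_t overhead = (total + sizeof(struct links) + PTR_MASK) & ~PTR_MASK;
	uint8_t *start, *p;

	assert(total <= 0xffff);
	start = malloc(overhead + size);
	if (!start)
		return NULL;
	p = start + overhead - sizeof(struct links);	/* list node here */
	memset(p, 0, sizeof(struct links));
	p[-2] = (uint8_t)(total >> 8);			/* MSB first */
	p[-1] = (uint8_t)total;
	memcpy(p - total, blob, blob_len);
	return start + overhead;
}

/* Works backwards from the caller's pointer to the blob and true start */
static void *wrapped_start(void *user, const uint8_t **blob, size_t *blob_len)
{
	uint8_t *p = (uint8_t *)user - sizeof(struct links);
	size_t total = ((size_t)p[-2] << 8) | p[-1];

	*blob = p - total;
	*blob_len = total - 2;
	/* any padding sits before the blob, so align down to find malloc's ptr */
	return (void *)((uintptr_t)(p - total) & ~(uintptr_t)PTR_MASK);
}

static void wrapped_free(void *user)
{
	const uint8_t *blob;
	size_t len;

	if (user)
		free(wrapped_start(user, &blob, &len));
}
```

Putting the fixed-size fields nearest the caller's pointer is what makes recovery possible: the reader always knows where the 16-bit length is, and the length then locates the variable-size blob.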

Bitwise structure inside the compressed blob

| data | bits | meaning |
|---|---|---|
| stack depth | 5 | Number of backtrace callstack levels present |
| Call stack items, one per stack depth | variable | Compressed Instruction Pointer value |
| alloc size bits | 6 | Number of bits in alloc size |
| alloc size literal | variable | Allocation size |

Call stack item domain-specific compression

The goal is to compress 32- or 64-bit backtraces efficiently.

The Call stack items are compressed one of two ways and start with a bit indicating which method was used for this Call stack item.

  • 0 = literal value, 1 = delta against a previous reference value

The literals issue a bit count and then the significant bits

  • a 6-bit bit count
  • the significant bits of the literal

The delta from a previous Call stack item looks like this:

  • a 3-bit index (from -1 to -8) that says how far back in the call stack, relative to the current stack item, the reference value is found
  • a 1-bit sign where 0 == add the delta and 1 == subtract the delta
  • a 6-bit bit count for the delta
  • the significant bits of the delta

The delta is decoded, and added or subtracted from the earlier reference result to arrive at the correct reconstruction.

The first Call stack item is always a literal.
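The scheme described above can be sketched as a round-trip encoder and decoder. This is a self-contained illustration and is not claimed to be bit-compatible with the lws implementation: the MSB-first packing of every field, the mapping of index values 0..7 onto -1..-8, the rule for choosing between literal and delta, and all function names are assumptions made for the sketch. Values needing all 64 bits are rejected, since they would not fit a 6-bit count as interpreted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bits { uint8_t *buf; size_t pos; };	/* pos counts bits */

/* Appends the low n bits of v, MSB-first; buf must start zeroed */
static void put(struct bits *b, uint64_t v, unsigned n)
{
	while (n--) {
		if ((v >> n) & 1)
			b->buf[b->pos >> 3] |= (uint8_t)(0x80 >> (b->pos & 7));
		b->pos++;
	}
}

static uint64_t get(struct bits *b, unsigned n)
{
	uint64_t v = 0;
	while (n--) {
		v = (v << 1) | ((b->buf[b->pos >> 3] >> (7 - (b->pos & 7))) & 1);
		b->pos++;
	}
	return v;
}

/* Number of significant bits in v (0 when v is 0) */
static unsigned sigbits(uint64_t v)
{
	unsigned n = 0;
	for (; v; v >>= 1)
		n++;
	return n;
}

/* A 6-bit bit count followed by that many significant bits */
static void put_counted(struct bits *b, uint64_t v)
{
	assert(sigbits(v) < 64);	/* must fit the 6-bit count */
	put(b, sigbits(v), 6);
	put(b, v, sigbits(v));
}

static uint64_t get_counted(struct bits *b)
{
	return get(b, (unsigned)get(b, 6));
}

/* Returns the bytes used in out, which must start zeroed */
static size_t bt_encode(uint8_t *out, const uint64_t *st, unsigned depth,
			uint64_t size)
{
	struct bits b = { out, 0 };
	unsigned i, j;

	assert(depth < 32);	/* must fit the 5-bit depth */
	put(&b, depth, 5);
	for (i = 0; i < depth; i++) {
		unsigned cost = sigbits(st[i]), back = 0;
		uint64_t val = st[i];
		for (j = 1; j <= 8 && j <= i; j++) {
			uint64_t d = st[i] < st[i - j] ? st[i - j] - st[i] :
							 st[i] - st[i - j];
			if (sigbits(d) + 4 >= cost)	/* delta has 4 extra bits */
				continue;
			cost = sigbits(d) + 4;
			back = j;
			val = d;
		}
		put(&b, back != 0, 1);	/* 0 = literal, 1 = delta */
		if (back) {
			put(&b, back - 1, 3);	/* 0..7 means -1..-8 */
			put(&b, st[i] < st[i - back], 1); /* 1 = subtract */
		}
		put_counted(&b, val);
	}
	put_counted(&b, size);
	return (b.pos + 7) >> 3;
}

/* Returns the depth, or 0 if a delta points before the first item */
static unsigned bt_decode(const uint8_t *in, uint64_t *st, uint64_t *size)
{
	struct bits b = { (uint8_t *)in, 0 };
	unsigned depth = (unsigned)get(&b, 5), i;
	for (i = 0; i < depth; i++) {
		unsigned back = get(&b, 1) ? (unsigned)get(&b, 3) + 1 : 0;
		unsigned neg = back ? (unsigned)get(&b, 1) : 0;
		uint64_t v = get_counted(&b);
		if (back > i)
			return 0;
		if (back)
			v = neg ? st[i - back] - v : st[i - back] + v;
		st[i] = v;
	}
	*size = get_counted(&b);
	return depth;
}
```

In this sketch, the four addresses from the decoded example earlier (0x406651 0x406852 0x406c1b 0x406294) with size 7520 pack into 15 bytes, against 32 bytes for four raw 64-bit pointers. The saving comes from return addresses in one program tending to lie close together, so most items need only a short delta.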

Note for esp-idf

Backtrace generation in esp-idf requires CONFIG_COMPILER_CXX_EXCEPTIONS set in sdkconfig.