mirror of
https://libwebsockets.org/repo/libwebsockets
synced 2024-12-04 13:57:15 +00:00
67931757f8
This adds apis that enable usage of compressed backtraces in heap instrumentation. A decompressor tool is also provided that emits a textual call stack suitable for use with addr2line.
182 lines
7.3 KiB
Markdown
182 lines
7.3 KiB
Markdown
# lws_backtrace and lws_alloc_metadata
|
|
|
|
|Area|Definition|
|
|
|---|---|
|
|
|Cmake|`LWS_WITH_COMPRESSED_BACKTRACES` on by default|
|
|
|API|`./include/libwebsockets/lws-backtrace.h`|
|
|
|README|./READMEs/README.lws_backtrace.md|
|
|
|
|
## lws_backtrace
|
|
|
|
The `lws_backtrace` apis provide a way to collect backtrace addresses into a
|
|
struct, and an efficient domain-specific compressor to reduce the number of
|
|
bytes needed to express the backtrace stack.
|
|
|
|
This information is particularly useful in RTOS type systems to understand heap
|
|
usage. The information would typically be sent off the embedded device, in logs
|
|
or it into own stream, and decompressed and processed off the embedded device,
|
|
converted to source information via addr2line or similar.
|
|
|
|
It only works with gcc and probably clang at the moment (patches welcome).
|
|
|
|
## lws_alloc_metadata apis
|
|
|
|
This provides helpers on top of `lws_backtrace` that are suitable for adapting
|
|
your heap allocator to create compressed metadata such as the call stack at
|
|
allocation time
|
|
|
|
- optionally report allocation and free events with this information
|
|
synchronously to a user supplied callback
|
|
|
|
- optionally conceal the additional metadata behind allocations transparently
|
|
|
|
The extra metadata contains information on allocation size, and the backtrace of
|
|
the code path that originally performed the allocation. Live allocations are
|
|
also listed on one or more lws_dll2_owner_t that can be walked to dump active
|
|
allocations along with the responsible code path.
|
|
|
|
## Tuning the call stack
|
|
|
|
Entries at both ends of the call stack may be invariant and therefore just
|
|
bloat to store. At the top end of the call stack, the backtrace will show the
|
|
path through lws_backtrace apis and perhaps other apis. At the bottom end,
|
|
depending on your system, the backtrace may detail call sequences from the
|
|
loader that started your application.
|
|
|
|
For those reasons, the cmake variables `LWS_COMPRESSED_BACKTRACES_SNIP_PRE`
|
|
and `LWS_COMPRESSED_BACKTRACES_SNIP_POST` (defaulting to 2 and 1 respectively)
|
|
may be set to remove invariant, uninteresting call stack information from the
|
|
top and bottom of the call stack.
|
|
|
|
## LWS_WITH_ALLOC_METADATA_LWS
|
|
|
|
An optional, off-by-default implementation is provided for the lws_*alloc()
|
|
apis, using the alloc_metadata apis to instrument all allocations via
|
|
lws_*alloc(). This is not so useful as instrumenting the system allocator with
|
|
alloc_metadata apis, since it only shows lws allocations, but it is a complete
|
|
example to show how to do it.
|
|
|
|
## Allocator instrumentation and thread-safety
|
|
|
|
Unless your application is totally singlethreaded, when instrumenting a real
|
|
allocator, care must be taken with
|
|
|
|
- `_lws_alloc_metadata_adjust()`
|
|
- `_lws_alloc_metadata_trim()`
|
|
- `_lws_alloc_metadata_dump()`
|
|
|
|
apis which deal with the hidden overallocation and listing allocations, that
|
|
they are called from a locked critical section that disallows reentry, either
|
|
an existing one that the allocator already uses, or add a new mutex.
|
|
|
|
## Dumping entire active instrumented heap allocations
|
|
|
|
Calling `_lws_alloc_metadata_dump()` allows you to walk the current list of
|
|
allocations from a heap and dump the backtrace responsible for its allocation.
|
|
You can define your own iterator callback, or use a helper callback that is
|
|
provided, `lws_alloc_metadata_dump_stdout`, which issues the heap metadata in
|
|
the lws convention base64 format described below.
|
|
|
|
## Convention for emission of compressed backtraces
|
|
|
|
To simplify triggering dumps, a convention is defined with a 3-character
|
|
lead-in identifying lines as dumps or backtraces. This kind of approach makes
|
|
it easy to emit the metadata into logs and fish them out with grep or similar.
|
|
|
|
|lead-in|signifies|Example|
|
|
|---|---|---|
|
|
|~m#|Compressed allocator backtrace, eg, emitted into logs|~m#IF0BmagugNDWgCnkhdAYpQa6wAAV|
|
|
|~b#|Decoded, uncompressed backtrace line suitable for `addr2line`|~b#size: 7520, 0x406651 0x406852 0x406c1b 0x406294|
|
|
|
|
Both examples are complete representations of the same 4-level, 64-bit compressed
|
|
backtrace.
|
|
|
|
## Compressed backtrace decode tool
|
|
|
|
The `lws-api-test-backtrace` example (requires `LWS_WITH_COMPRESSED_BACKTRACES`
|
|
to build) decodes the base64 representations with or without the 3-character
|
|
lead-in, to textual output suitable for `addr2line`. Eg
|
|
|
|
```
|
|
$ echo -n "~m#IF0BmagugNDWgCnkhdAYpQa6wAAV" | lws-api-test-backtrace
|
|
~b#size: 7520, 0x406651 0x406852 0x406c1b 0x406294
|
|
```
|
|
|
|
You can use it with `addr2line` in this kind of way (you probably want to give
|
|
`-f -p` to `addr2line` as well)
|
|
|
|
```
|
|
addr2line -e myapplication `echo -n "~m#IF0BmUQugNCkgCnkhdAYpQa6wAAV" | ./bin/lws-api-test-backtrace 2>/dev/null | grep '~b#' | cut -d',' -f2-`
|
|
/projects/libwebsockets/lib/core/alloc.c:124
|
|
/projects/libwebsockets/lib/core/alloc.c:213
|
|
/projects/libwebsockets/lib/core/context.c:600
|
|
/projects/myapplication/main.c:55
|
|
```
|
|
|
|
There is a shell script `./contrib/heapmap.sh` which takes a screenscrape of
|
|
a dump's `~m#` log lines and processes them into an allocation size, backtrace,
|
|
and function names (especially in RELEASE mode, either a function name hint or
|
|
the source coordinates are available).
|
|
|
|
## lws_backtrace compression
|
|
|
|
The compressed blob has an outer structure designed for prepending, where the
|
|
information available at recovery is a pointer to the end of it.
|
|
|
|
![overview](../doc-assets/backtrace.png)
|
|
|
|
### Outer compressed blob layout in memory
|
|
|
|
This goes behind the reported allocation, the actual allocation is increased
|
|
to allow for it and we report what the caller asked for by pointing at the end
|
|
of this. It means eg at free() time, we are told the address just past the end
|
|
of this and work backwards to find the start of the compressed blob (which is
|
|
further aligned backwards to ptr boundary to recover the true allocation start).
|
|
|
|
|data|bits|meaning|
|
|
|---|---|---|
|
|
|compressed blob|variable, padded to byte boundary|Backtrace and extra info|
|
|
|compressed length|fixed, 16|MSB-first 16-bit byte count of compressed blob, including the 16-bit length itself|
|
|
|lws_dll2_t|fixed, 3 x pointers|linked-list for tracking|
|
|
|
|
### Bitwise structure inside the compressed blob
|
|
|
|
|data|bits|meaning|
|
|
|---|---|---|
|
|
|stack depth|5|Number of backtrace callstack levels present|
|
|
|Call stack items, one per stack depth|variable|Compressed Instruction Pointer value|
|
|
|alloc size bits|6|Number of bits in alloc size|
|
|
|alloc size literal|variable|Allocation size|
|
|
|
|
### Call stack item domain-specific compression
|
|
|
|
The goal is to compress 32- or 64-bit backtraces efficiently.
|
|
|
|
The Call stack items are compressed one of two ways and start with a bit
|
|
indicating which method was used for this Call stack item.
|
|
|
|
- 0 = literal value, 1 = delta against a previous reference value
|
|
|
|
The literals issue a bit count and then the significant bits
|
|
|
|
- a 6-bit bit count
|
|
- the significant bits of the literal
|
|
|
|
The delta from a previous Call stack item looks like this:
|
|
|
|
- a 3-bit index (from -1 to -8) says how far back from the
|
|
current stack item the reference value can be found from the call stack
|
|
- a 1-bit sign where 0 == add the delta and 1 == subtract the delta
|
|
- a 6-bit bit count for the delta
|
|
- the significant bits of the delta
|
|
|
|
The delta is decoded, and added or subtracted from the earlier reference result
|
|
to arrive at the correct reconstruction.
|
|
|
|
The first Call stack item is always a literal.
|
|
|
|
## Note for esp-idf
|
|
|
|
Backtrace generation in esp-idf requires `CONFIG_COMPILER_CXX_EXCEPTIONS` set
|
|
in sdkconfig.
|