This patch teaches pbtombstone to use llvm-symbolizer to symbolize
stack traces and augment the protobuf tombstones with the symbol
information, before printing tombstones with the symbolized stack
traces included.
The main advantage of adding this information to the tombstone
as opposed to having developers use the stack tool is that stack
does not print all of the information in the original tombstone,
which means that both reports may be required to understand a crash.
Furthermore, stack traces printed by stack are not correlated with
the stack traces in the tombstone, making the report harder to read,
especially with GWP-ASan and MTE which may produce multiple stack
traces for the crashing thread.
Although we could teach stack to print more information, this would
continue to be fragile because stack relies on parsing textual
tombstones. Switching stack to read proto tombstones would be
tantamount to a full rewrite and would require duplicating the C++
proto-to-text logic that we already have in Python. It seems better
to reuse the C++ code for the proto-based symbolization tool.
llvm-symbolizer will look up the symbol files by build ID using a
.build-id directory following the standard here:
https://fedoraproject.org/wiki/RolandMcGrath/BuildID
It will look for .build-id directories under paths specified with
--debug-file-directory, which pbtombstone will pass through to
llvm-symbolizer using its own --debug-file-directory flag. The
intent is that tools for platform developers will pass the flag
--debug-file-directory $ANDROID_PRODUCT_OUT/symbols to pbtombstone.
Soong will start creating .build-id under symbols after a corresponding
Soong CL lands.
Bug: 328531087
Change-Id: Ia4676821cf980c69487cf11aefa2a02dc0c1626f
This is preparation for the next patch, which adds host-side
symbolization capabilities to pbtombstone.
Bug: 328531087
Change-Id: Id5813ae6b121af784643b1ed76084e49fdca118b
We will change the symbolizer to use this information to output
something like:
Potentially referenced stack object:
0 bytes inside a stack variable "variableName" in stack frame of function functionName
at source.cc:1234
Bug: 309446520
Change-Id: I1163ac81ac6b5e184387eb9e058d97a7227e3671
Guest thread information will print out follow host thread.
Revert submission 3081452-revert-3062926-CJGHTRPCBP
Reason for revert: Will make the change base on the original CLs for a reland.
Bug: b/321799516
Test: riscv64, checked tombstone file has wanted block.
https://paste.googleplex.com/6282302317658112
Added arm64 support and tested arm64 unwinding in internal repo.
https://paste.googleplex.com/6545612887818240
Change-Id: Ie54ad6f359d60283442adfcd9ee95f5a116e4b72
Revert submission 3081452-revert-3062926-CJGHTRPCBP
Reason for revert: Will make the change base on the original CLs for a reland.
(Original CL commit message)
This CL is to get guest registers information.
Bug: b/321799516
Test: m
Testing for TLS Slot:
Manual testing by: 1. crash the jni tests to produce tombstones file 2.
get the signature field of guest state header 3. verified it is the same
value as NATIVE_BRIDGE_GUEST_STATE_SIGNATURE
Manual test the arm64 by: 1. flash build to pixel phone and verify
retrieving TLS_SLOT_THREAD_ID's tid field is the same as current thread
id.
Testing for register values:
Test and print out registers values for riscv64, looks make sense that
has null zero value slots.
Change-Id: Ieebf845bff517380ee07fac77f24b48efeb53521
Revert submission 3062926
Reason for revert: We want guest state to be present in all threads - revert to be able to fix the proto field type.
Reverted changes: /q/submissionid:3062926
Change-Id: I32b745cca95a619b78bdce0a7d948ac479d42f21
Revert submission 3062926
Reason for revert: We want guest state to be present in all threads - revert to be able to fix the proto field type.
Reverted changes: /q/submissionid:3062926
Change-Id: I87b282a0d9caebe4eae2e7d8eca8ec8ebaa3eca6
This CL is to get guest registers information.
Bug: b/321799516
Test: m
Testing for TLS Slot:
Manual testing by: 1. crash the jni tests to produce tombstones file 2.
get the signature field of guest state header 3. verified it is the same
value as NATIVE_BRIDGE_GUEST_STATE_SIGNATURE
Manual test the arm64 by: 1. flash build to pixel phone and verify
retrieving TLS_SLOT_THREAD_ID's tid field is the same as current thread
id.
Testing for register values:
Test and print out registers values for riscv64, looks make sense that
has null zero value slots.
Change-Id: Iff44ac5c2b202e44f3fb4e6909fbea141e54ae6b
The malloc_not_svelte variable name is confusing and makes the
low memory config the default. Change this so that the default is
the regular allocator, and that Malloc_low_memory is used to enable
the low memory allocator.
Update blueprint rules so that scudo is the default action.
Test: Verified scudo config is used by default.
Test: Verfified Android GO config uses the jemalloc low memory config.
Change-Id: Ie7b4b005a6377e2a031bbae979d66b50c8b3bcdb
This is done so that we could depend on it elsewhere without needing all the unrelated methods.
Needed for ag/24553347
Bug: 296207744
Test: refactoring build
Change-Id: I7c6733208f3ae63ba9559753a24cffcb8e1b9d1e
This is a no-op but will be used in upcoming scudo changes that allow to
change the depot size at process startup time, and as such we will no
longer be able to call __scudo_get_stack_depot_size in debuggerd.
We already did the equivalent change for the ring buffer size in
https://r.android.com/q/topic:%22scudo_ring_buffer_size%22
Bug: 309446692
Change-Id: I761a7602c54a1f8f2d0575c5e011820d8dbaab63
This is a no-op but will be used in upcoming scudo changes that allow to
change the buffer size at process startup time, and as such we will no
longer be able to call __scudo_get_ring_buffer_size in debuggerd.
Bug: 263287052
Change-Id: I350421d1fcdf22ce3b8b73780b88c1e10fa8a074
r.android.com/2108505 was intended to fix a crash in Scudo in
the case where the stack depot, region info or ring buffer were
unreadable. However, it also ended up introducing a number of bugs into
the code. It failed to call __scudo_get_error_info if the page at the
fault address was unreadable. This can happen in legitimate crash cases
if a primary allocation was close to the boundary of a mapped region,
or if the allocation was a secondary allocation with guard pages. It
also used long as the type for tags, whereas Scudo expects it to be
char. In combination this ended up causing most of the MTE tests to
fail. Therefore, mostly revert that change.
Fix the original crash by null checking the pointers returned by
AllocAndReadFully before proceeding with the rest of the function.
Bug: 233720136
Change-Id: I04d70d2abffaa35fe315d15d9224f9b412a9825d
The code doesn't properly check if data is not read properly, so
make it fail if reads fail. Also, change the algorithm so that
first try and read the faulting page then 16 pages before and 16
pages after. Rather than trying to read every one of these pages,
stop as soon as one is unreadable. This means that the total memory
passed to the scudo error function is all valid data, rather than
potentially being some uninitialized memory.
Added new unit tests to cover scudo address processing.
Bug: 233720136
Test: All unit tests pass.
Test: atest CtsIncidentHostTestCases
Change-Id: I18a97bdee9a0c44075c1c31ccd1b546d10895be9
This simplifies most of the calls to avoid doing any Android
specific code.
Bug: 120606663
Test: All unit tests pass.
Change-Id: I511e637b9459a1f052a01e501b134e31d65b5fbe
Hard to get otherwise if you're trying to debug PAC issues.
Bug: http://b/214314197
Test: treehugger
Change-Id: I2e5502809f84579bf287364e59d6e7ff67770919
The frame data no longer contains map_XXX fields which represent
the map data. Now there is only a shared pointer to the MapInfo
object with which this frame is associated.
Bug: 120606663
Test: Unit tests pass.
Change-Id: I89282963f742f6fcc07e48533da4108dc16bdce9
Proto tombstones were missing tagged fault addresses, tagged_addr_ctrl,
tags in memory dumps and Scudo and GWP-ASan error reports. Since text
tombstones now go via protos, all of these features broke when we
switched to text tombstones generated from protos by default. Fix
the features by adding support for them to the proto format,
tombstone_proto and tombstone_proto_to_text.
Bug: 135772972
Bug: 182489365
Change-Id: I3ca854546c38755b1f6410a1f6198a44d25ed1c5
With this change we can report memory errors involving secondary
allocations. Update the existing crasher tests to also test
UAF/overflow/underflow on allocations with sizes sufficient to trigger
the secondary allocator.
Bug: 135772972
Change-Id: Ic8925c1f18621a8f272e26d5630e5d11d6d34d38
Revert "Let crash_dump read /proc/$PID."
Revert submission 1556807-tombstone_proto
Reason for revert: b/178455196, Broken test: android.seccomp.cts.SeccompHostJUnit4DeviceTest#testAppZygoteSyscalls on git_master on cf_x86_64_phone-userdebug
Reverted Changes:
Ide6811297:tombstoned: switch from goto to RAII.
I8d285c4b4:tombstoned: make it easier to add more types of ou...
Id0f0fa285:tombstoned: support for protobuf fds.
I6be6082ab:Let crash_dump read /proc/$PID.
Id812ca390:Make protobuf vendor_ramdisk_available.
Ieeece6e6d:libdebuggerd: add protobuf implementation.
Change-Id: Ia0a1ee57e7630e01c495dc166218f665340aad7f
* changes:
libdebuggerd: add protobuf implementation.
tombstoned: support for protobuf fds.
tombstoned: make it easier to add more types of outputs.
tombstoned: switch from goto to RAII.
Currently, all MTE failures end up displaying 'Fault address falls at
0x<addr> after any mapped regions'. Clearly when scanning, we should use
the untagged address to figure out which ranges it's in.
I've taken the liberty of removing all si_addr parsing and moving it
into the common ProcessInfo, as well as making it really explicit
whether you want the (possibly tagged) original si_addr, or whether you
want the untagged variant (for scanning /proc/maps or whatever).
This is not particularly easily testable, as ReadCrashInfo isn't easily
injectable and `dump_all_maps` should already be passed the untagged
pointer to scan for. I've tested this locally on FVP under SYNC MTE with
a simple UaF binary and noted the problem is fixed. Given that this is
making the code more clear, I'm hoping the owners see no need for a
regression test :).
Bug: 135772972
Test: On FVP, run 'adb shell MEMTAG_OPTIONS=sync sanitizer-status' and
check that the use-after-free test ends up with the /proc/maps
desription in the right place.
Change-Id: I220e4200c75a72474a95a67e5bbc36173a438dd2
This commit implements protobuf output for tombstones, along with a
translator that should emit bytewise identical output to the existing
tombstone dumping code, except for ancillary data from GWP-ASan and
Scudo, which haven't been implemented yet.
Test: setprop debug.debuggerd.translate.translate_proto_to_text 1 &&
/data/nativetest64/debuggerd_test/debuggerd_test
Test: for TOMBSTONE in /data/tombstones/tombstone_??; do
pbtombstone $TOMBSTONE.pb | diff $TOMBSTONE -
done
Change-Id: Ieeece6e6d1c26eb608b00ec24e2e725e161c8c92
The discussion on LKML is converging on v16 of the fault address tag
bits patch [1]. In this version of the patch the presence of the tag
bits in si_addr is controlled by a sa_flags bit, and a protocol is
introduced to allow userspace to detect kernel support for sa_flags
bits. Update the tombstone signal handler to use this API to read
the tag bits, update the interceptors in libsigchain to implement
the flag support detection protocol and hide the tag bits in si_addr
from chained signal handlers that did not request them to match the
kernel behavior.
[1] https://lore.kernel.org/linux-arm-kernel/cover.1605235762.git.pcc@google.com/
Change-Id: I57f24c07c01ceb3e5b81cfc15edf559ef7dfc740
This value indicates whether memory tagging is enabled on a thread,
the mode (sync or async) and the set of excluded tags. This information
can sometimes be important for understanding an MTE related crash,
so include it in the per-thread tombstone output.
Bug: 135772972
Change-Id: I25a16e10ac7fbb2b1ab2a961a5279f787039000b
An OEM asks for sub-second granularity, and that's most easily done if
we only have one timestamp generator. I'm not convinced sub-second
granularity is particularly useful myself, and I definitely don't think
that nanosecond resolution is meaningful but I do like this cleanup, and
if I'm going to use sub-second precision I may as well use the maximum
precision available to me.
Also reduce some duplication of code reading cmdline/comm.
Bug: https://issuetracker.google.com/161860597
Test: head /data/tombstones/*
Change-Id: I035ecfd4a3338ccd84dae0ef973a998a7c7c5056
Teach debuggerd to use the new scudo APIs proposed in
https://reviews.llvm.org/D77283 for extracing MTE error reports from crashed
processes, and include those reports in tombstones if possible.
Bug: 135772972
Change-Id: I082dfd0ac9d781cfed2b8c34cc73562614bb0dbb
On aarch64, the top 8 bits of the address (i.e. the tag bits) of
the fault address in si_addr are always clear. This isn't ideal for
MTE which will require these bits in order to correctly diagnose
tag mismatches.
A proposed kernel patch [1] exposes the full fault address including
the tag bits as part of the ucontext. Change debuggerd to read this
fault address if available.
[1] https://patchwork.kernel.org/patch/11435077/
Bug: 135772972
Change-Id: Ia05be574113860f4e9ecc36a310c4b740e0c4afb
We're now passing around a couple of addresses for GWP-ASan in addition
to abort_msg_address and fdsan_table_address, and I'm going to need to add
more of them for MTE. Move them into a data structure in order to simplify
various function signatures.
Bug: 135772972
Change-Id: Ie01e1bd93a9ab64f21865f56574696825a6a125f
GWP-ASan can provide information about a crash that it caused. Grab the
GWP-ASan regions from the globals shared by the linker for crash-handler
purpopses, pull the information from GWP-ASan, and display it.
This adds two regions:
1. Causality tracking by GWP-ASan. We now print a cause header about
the crash, like `Cause: [GWP-ASan]: Use After Free on a 1-byte
allocation at 0x7365bb3ff8`
2. Allocation and deallocation stack traces.
Bug: 135634846
Test: atest debuggerd_test
Change-Id: Id28d5400c9a9a053fcde83a4788f971e677d4643
GWP-ASan's crash information retrieval services requires a Printf()
function (declared by the system/implementing allocator). In this
instance, because _LOG is called with additional arguments (the log_t),
this function must be wrapped to conform to printf_t defined by
GWP-ASan.
We can easily wrap the variadic version.
Bug: 135634846
Test: atest debuggerd_test
Change-Id: I17209cd2b7455ce889e2f8194969f606cac329eb
This is for Android Telemetry to be able to categorise the processes
that produce tombstones.
Test: atest debugerd_test:TombstoneTest
Change-Id: Ie635347c9839eb58bfd27739050bd68cbdbf98da
Modify the unwinder library to indicate that at least one of the stack
frames contains an elf file that is unreadable.
Modify debuggerd to display a note about the unreadable frame and a possible
way to fix it.
Bug: 129769339
Test: New unit tests pass.
Test: Ran an app that crashes and has an unreadable file and verified the
Test: message is displayed. Then setenforce 0 and verify the message is
Test: not displayed.
Change-Id: Ibc4fe1d117e9b5840290454e90914ddc698d3cc2
Somehow the code was still including this include from libbacktrace.
I think the libbacktrace include directory was coming from some
transitive includes. I verified that nothing in debuggerd is using
the libbacktace.so shared library.
Bug: 120606663
Test: Builds, unit tests pass.
Change-Id: I85c2837c5a539ccefc5a7140949988058d21697a
Small modifications to the dump_stack method and added unit tests to
verify the output.
Bug: 120606663
Test: Unit tests pass, debuggerd run on processes on target.
Change-Id: Id385a915b751abda3dd6baebed6c3ce498c3bf6e
This commit only prints the raw value of the owner tag, pretty-printing
will come in a follow-up commit.
Test: debuggerd `pidof adbd`
Test: static_crasher fdsan_file + manual inspection of tombstone
Change-Id: Idb7375a12e410d5b51e6fcb6885d4beb20bccd0e
Suicide doesn't change:
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
But homicide now looks like this (this is `sleep 666` killed by
`kill -SEGV` as root:
signal 11 (SIGSEGV), code 0 (SI_USER from pid 4446, uid 0), fault addr --------
Bug: http://b/78594105
Test: manual
Change-Id: I8c2feafba8cc5a3db85e8250004d428a464c5d9e
In order to support the offline unwinding properly, get rid of the
usage of non-fixed type uintptr_t from all API calls.
In addition, completely remove the old local and remote unwinding code
that used libunwind.
The next step will be to move the offline unwinding to the new unwinder.
Bug: 65682279
Test: Ran unit tests for libbacktrace/debuggerd.
Test: Ran debuggerd -b on a few arm and arm64 processes.
Test: Ran crasher and crasher64 and verified tombstones look correct.
Change-Id: Ib0c6cee3ad6785a102b74908a3d8e5e93e5c6b33
Reduce the amount of time that a process remains paused by pausing its
threads, fetching their registers, and then performing unwinding on a
copy of its address space. This also works around a kernel change
that's in 4.9 that prevents ptrace from reading memory of processes
that we don't have immediate permissions to ptrace (even if we
previously ptraced them).
Bug: http://b/62112103
Bug: http://b/63989615
Test: treehugger
Change-Id: I7b9cc5dd8f54a354bc61f1bda0d2b7a8a55733c4