This uses an std::string, which causes a heap allocation, which is not
async-safe.
Test: atest --no-bazel-mode permissive_mte_test
Change-Id: I4bd53d42d9a6a659abe62a964f14c81d9ec059d0
This is not meant to be enabled long-term, but can be used to assess system
stability with MTE before enabling it.
Bug: 202037138
Change-Id: I9fb9b63ff94da2de0a814fd7150f51559d3af079
When running debuggerd from the command line, it's possible that
the signal will happen on a side thread. The original intercept
in tombstoned is set to only handle crashes from the main thread
pid, so in this case, the intercept doesn't occur. To fix this,
modify the code so that running debuggerd always sends the signal
to the main pid. In addition, modify the signal handler is entered
due to the BIONIC_SIGNAL_DEBUGGER signal, then the crashing tid is
set to the main thread pid instead of the current thread.
Add unit test to cover this case.
Bug: 194346289
Test: All unit tests pass.
Test: Verify the new unit test is getting the signal on the non-main
Test: thread and still properly handling the intercept.
Test: Modify the debuggerd code to send the signal to the non main pid
Test: and verify the dump still occurs correctly.
Change-Id: I2dd1bd11fc8ef4a6fe87f05ecc67ae349a101c82
All but three files are Apache-2.0 already.
Bug: http://b/191499510
Test: /google/src/files/head/depot/google3/wireless/android/busytown/ayeaye/analyzers/copyright/tools/scan_android_project.sh ~/aosp/system/core/debuggerd/ | grep -v APACHE
Change-Id: I430c3382dd160e398f02470d7053ecea39c98f41
Talk of "gdb" when we currently mean "gdb or lldb" and will soon mean
"lldb" is starting to confuse people. Let's use the more neutral
"debugger" in places where it really doesn't matter.
The switch from gdbclient.py to lldbclient.py is a change for another
day...
Test: treehugger
Change-Id: If39ca7e1cdf4c8bb9475f1791cdaf201fbea50e0
Revert "Let crash_dump read /proc/$PID."
Revert submission 1556807-tombstone_proto
Reason for revert: b/178455196, Broken test: android.seccomp.cts.SeccompHostJUnit4DeviceTest#testAppZygoteSyscalls on git_master on cf_x86_64_phone-userdebug
Reverted Changes:
Ide6811297:tombstoned: switch from goto to RAII.
I8d285c4b4:tombstoned: make it easier to add more types of ou...
Id0f0fa285:tombstoned: support for protobuf fds.
I6be6082ab:Let crash_dump read /proc/$PID.
Id812ca390:Make protobuf vendor_ramdisk_available.
Ieeece6e6d:libdebuggerd: add protobuf implementation.
Change-Id: Ia0a1ee57e7630e01c495dc166218f665340aad7f
* changes:
libdebuggerd: add protobuf implementation.
tombstoned: support for protobuf fds.
tombstoned: make it easier to add more types of outputs.
tombstoned: switch from goto to RAII.
This commit implements protobuf output for tombstones, along with a
translator that should emit bytewise identical output to the existing
tombstone dumping code, except for ancillary data from GWP-ASan and
Scudo, which haven't been implemented yet.
Test: setprop debug.debuggerd.translate.translate_proto_to_text 1 &&
/data/nativetest64/debuggerd_test/debuggerd_test
Test: for TOMBSTONE in /data/tombstones/tombstone_??; do
pbtombstone $TOMBSTONE.pb | diff $TOMBSTONE -
done
Change-Id: Ieeece6e6d1c26eb608b00ec24e2e725e161c8c92
Sadly, it looks like we do still really use libcutils for some of the
socket functions.
Test: treehugger
Change-Id: Ic71f97507c89b10d2f3b7a2971064a9e6b1d349d
The discussion on LKML is converging on v16 of the fault address tag
bits patch [1]. In this version of the patch the presence of the tag
bits in si_addr is controlled by a sa_flags bit, and a protocol is
introduced to allow userspace to detect kernel support for sa_flags
bits. Update the tombstone signal handler to use this API to read
the tag bits, update the interceptors in libsigchain to implement
the flag support detection protocol and hide the tag bits in si_addr
from chained signal handlers that did not request them to match the
kernel behavior.
[1] https://lore.kernel.org/linux-arm-kernel/cover.1605235762.git.pcc@google.com/
Change-Id: I57f24c07c01ceb3e5b81cfc15edf559ef7dfc740
If crash_dump dies before it gets a chance to write to the pipe we use
to let the debugged-process know that it successfully started, we
weren't cleaning up the child we fork to start it, leaving a zombie
child.
Bug: http://b/152119184
Test: debuggerd_test
Change-Id: Id01cc05f693995e9998941774f74ab8e3d8b4d8a
On aarch64, the top 8 bits of the address (i.e. the tag bits) of
the fault address in si_addr are always clear. This isn't ideal for
MTE which will require these bits in order to correctly diagnose
tag mismatches.
A proposed kernel patch [1] exposes the full fault address including
the tag bits as part of the ucontext. Change debuggerd to read this
fault address if available.
[1] https://patchwork.kernel.org/patch/11435077/
Bug: 135772972
Change-Id: Ia05be574113860f4e9ecc36a310c4b740e0c4afb
Similar to r.android.com/1247247 I'll be adding more of them for MTE.
Also, change the protocol between the crasher and crash_dump to make
it easier to add new fields and change the referenced data structures
without needing to worry about versioning. The version number for
static executables is now always 1 (where the protocol will never
change), while the version number for dynamic executables is always
4 (where the protocol can change, because the linker and crash_dump
are version locked).
Bug: 135772972
Change-Id: Ib4696d0544d7c87cb429aaaa15f18c3640059e16
A future change will introduce a version lock between linker and
crash_dump. Move crash_dump into the runtime APEX alongside linker in order to
ensure that they will be the same version even if the runtime APEX is updated.
Bug: 135772972
Change-Id: Ic2eae31b6927eb0e8a62315ac141f50933c00bcc
Merged-In: Ic2eae31b6927eb0e8a62315ac141f50933c00bcc
GWP-ASan can provide information about a crash that it caused. Grab the
GWP-ASan regions from the globals shared by the linker for crash-handler
purpopses, pull the information from GWP-ASan, and display it.
This adds two regions:
1. Causality tracking by GWP-ASan. We now print a cause header about
the crash, like `Cause: [GWP-ASan]: Use After Free on a 1-byte
allocation at 0x7365bb3ff8`
2. Allocation and deallocation stack traces.
Bug: 135634846
Test: atest debuggerd_test
Change-Id: Id28d5400c9a9a053fcde83a4788f971e677d4643
1 page isn't enough to log on AArch64, and clean pages are free, so
increase the stack size to 8 pages.
Bug: http://b/144887737
Test: treehugger
Change-Id: I731b3bc27ab37f4b830a9478a04cd34d4f7648d3
C++20 wants members to be ordered unlike C99.
Bug: 139945549
Test: mm
Change-Id: I3cbca589511c1e0bbc10c691949e18de77e16031
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
There appears to be a kernel bug that causes SIGHUP and SIGCONT to be
sent to the parent process group we spawn from if the process group
contains stopped jobs (e.g. the parent itself, because of wait_for_gdb).
Call setsid in all of our children to prevent this from happening.
Bug: http://b/31124563
Test: adb shell 'setprop debug.debuggerd.wait_for_gdb 1; killall -ABRT surfaceflinger'
Change-Id: I1a48d70886880a5bfbe2deb80d48deece55faf09
If a process is ptraced already, we might not be able to exec crash_dump
due to selinux. Since we can be called for non-fatal events, we
shouldn't abort in that case.
Bug: http://b/128054996
Test: treehugger
Change-Id: I1442041caa7af908df2ab87b9e010c44082e7587
Make it possible for code such as fdsan that generates debugging
tombstones via raise(DEBUGGER_SIGNAL) to pass an abort message as well.
Bug: http://b/112770187
Test: debuggerd_test
Change-Id: Idc34263241c18033573e466da3a45aa6f716ddb3
Pass the address of the fdsan table down to crash_dump so that we can
dump the fdsan table along with the open file descriptor list.
Test: debuggerd_test
Test: manually ran an old static_crasher
Change-Id: Icbac5487109f2db1e1061c4d46de11b016b299e3
Suicide doesn't change:
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
But homicide now looks like this (this is `sleep 666` killed by
`kill -SEGV` as root:
signal 11 (SIGSEGV), code 0 (SI_USER from pid 4446, uid 0), fault addr --------
Bug: http://b/78594105
Test: manual
Change-Id: I8c2feafba8cc5a3db85e8250004d428a464c5d9e
Set and restore PR_SET_PTRACER when performing a dump, so that when
Android is running on a kernel that has the Yama LSM enabled (and the
value of ptrace_scope is > 0), crash_dump can attach to processes and
print nice, symbolized stack traces.
Bug: 70992745
Test: kill -6 `pidof surfaceflinger` && logcat -d -b crash
# in both sailfish and Chrome OS
Change-Id: If4646442c6000fdcc69cf4ab95fdc71ae74baaaf
When a process crashes, both ActivityManager and init will try to kill
its process group when they notice. The recent change to minimize the
amount of time a process is paused results in crash dumps being killed
before they finish as a result of this. Since anything that needs to be
low-latency is probably not going to be too happy if it crashes, just
wait for completion whenever we're processing a real crash.
Bug: http://b/70343110
Test: debuggerd_test
Change-Id: I894bb06efd264b1ba005df06f7326a72f4b767bb
Reduce the amount of time that a process remains paused by pausing its
threads, fetching their registers, and then performing unwinding on a
copy of its address space. This also works around a kernel change
that's in 4.9 that prevents ptrace from reading memory of processes
that we don't have immediate permissions to ptrace (even if we
previously ptraced them).
Bug: http://b/62112103
Bug: http://b/63989615
Test: treehugger
Change-Id: I7b9cc5dd8f54a354bc61f1bda0d2b7a8a55733c4
Always check to see if the fallback handler has been called and is
not trying to dump a specific thread.
Bug: 69110957
Test: Verified on a system where the prctl value changes, that before the
Test: change it dumps multiple tombstones, and after the change it
Test: works as expected.
Test: Ran debuggerd unit tests.
Test: Dumped process using debuggerd -b <PID> and debuggerd <PID>.
Change-Id: Id98bbe96cced9335f7c3e17088bb4ab2ad2e7a64
All intercept requests and crash dump requests must now specify a
dump_type, which can be one of kDebuggerdNativeBacktrace,
kDebuggerdTombstone or kDebuggerdJavaBacktrace. Each process can have
only one outstanding intercept registered at a time.
There's only one non-trivial change in this changeset; and that is
to crash_dump. We now pass the type of dump via a command line
argument instead of inferring it from the (resent) signal, this allows
us to connect to tombstoned before we wait for the signal as the
protocol requires.
Test: debuggerd_test
Change-Id: I189b215acfecd08ac52ab29117e3465da00e3a37
bionic's cached values for getpid/gettid can be invalid if the crashing
process manually invoked clone to create a thread or process, which
will lead the crash_dump refusing to do anything, because it sees the
actual values.
Use the getpid/gettid syscalls directly to ensure correct values on
this end.
Bug: http://b/37769298
Test: debuggerd_test
Change-Id: I0b1e652beb1a66e564a48b88ed7fa971d61c6ff9
Move the name of the "private/libc_logging.h" header to <async_safe/log.h>.
For use of libc_malloc_debug_backtrace, remove the libc_logging library.
The library now includes the async safe log functions.
Remove the references to libc_logging.cpp in liblog, it isn't needed because
the code is already protected by a check of the __ANDROID__ define.
Test: Compiled and boot bullhead device.
Test: Run debuggerd unit tests.
Test: Run liblog unit tests on target and host.
Test: Run libmemunreachable unit tests (these tests are flaky though).
Change-Id: Ie79d7274febc31f210b610a2c4da958b5304e402
Applications can set abort messages via android_set_abort_message
without actually aborting. This leads to following non-fatal dumps
printing their output to logcat in the same format as a regular crash.
Bug: http://b/37754992
Test: debuggerd_test
Change-Id: I9c5e942984dfda36448860202b0ff1c2950bdd07
This just means we were asked to dump, not that something necessarily went
wrong.
Bug: http://b/36191903
Test: builds
Change-Id: I5638b38f3a13081b1e971512f43238010febb59c
`1 << 32` overflows, resulting in bogus PR_CAP_AMBIENT_RAISE attempts,
and breaking dumping for processes with capabilities in the top 32 bits.
Bug: http://b/35241370
Test: debuggerd -b `pidof com.android.bluetooth`
Change-Id: I29c45a8bd36bdeb3492c9f74599993c139821088
Do an in-process unwind for processes that have PR_SET_NO_NEW_PRIVS
enabled.
Bug: http://b/34684590
Test: debuggerd_test, killall -ABRT media.codec
Change-Id: I62562ec2c419d6643970100ab1cc0288982a1eed
snprintf isn't safe to call in the linker after initialization, because
it uses MB_CUR_MAX which is implemented via pthread_getspecific, which
uses TLS slots shared with libc. If the TLS slots are assigned in a
different order between libc.so and the linker, MB_CUR_MAX will
evaluate to an incorrect value, and lead to snprintf doing bad things.
Switch to __libc_format_buffer.
Bug: http://b/35367169
Test: debuggerd -b `pidof zygote`
Change-Id: I9d315cf63e5f3fd2f4545d6e3f707cdbe94ec606
Set and restore PR_SET_DUMPABLE when performing a dump, so that
processes that have it implicitly cleared (e.g. services that acquire
filesystem capabilities) still get crash dumps.
Bug: http://b/35174939
Test: debuggerd -b `pidof surfaceflinger`
Change-Id: Ife933c10086e546726dec12a7efa3f9cedfeea60
Raise CapInh and CapAmb after forking to exec crash_dump, so that it
can ptrace us.
Bug: http://b/35174939
Test: debuggerd -b `pidof surfaceflinger`
Change-Id: I32567010a3603cfa494aae9dc0e3ce73fb86b590
waitpid(..., __WCLONE) fails with ECHILD when passed an explicit PID to
wait for. __WALL and __WCLONE don't seem to be necessary when waiting
for a specific pid, so just pass 0 in the flags instead.
Bug: http://b/35327712
Test: /data/nativetest/debuggerd_test/debuggerd_test32 --gtest_filter="*zombie*"
Change-Id: I3dd7a1bdf7ff35fdfbf631429c089ef4e3172855
Fixed this when I tested on internal, but failed to copy the fix over
when submitting to AOSP.
Bug: http://b/35070339
Test: `adb bugreport` on angler
Change-Id: Ib84d212e5f890958cd21f5c018fbc6f368138d1e
* changes:
debuggerd_handler: don't use clone(..., SIGCHLD, ...)
crash_dump: drop capabilities after we ptrace attach.
crash_dump: use /proc/<pid> fd to check tid process membership.
debuggerd_handler: raise ambient capset before execing.
Revert "Give crash_dump CAP_SYS_PTRACE."