This is a race between the init process and the bionic libc
initialization of snapuserd.
    init->fork() ----------------> SecondStageMain() -> PropertyInit()
            |
            |
            v
        execveat ---> __libc_init_common() -> __system_properties_init()
       (snapuserd)
When the init process calls PropertyInit(), the /dev/__properties__
directory is created. If bionic libc in the snapuserd daemon invokes
__system_properties_init() _after_ init has called PropertyInit(), libc
will try to initialize properties by reading
/system/etc/selinux/plat_property_contexts. Since any read on /system
has to be served by snapuserd, this specific read from libc cannot be
serviced, leading to deadlock.
Reproduce the race by inducing a sleep of 1500ms just before execveat()
so that the init process calls PropertyInit() before bionic libc
initialization. This leads to deadlock immediately, and additional
kernel instrumentation with debug logs confirms the failure:
======================================================
init: Relaunched snapuserd with pid: 428
ext4_file_open: SNAPUSERD: path /system/etc/selinux/plat_property_contexts - Pid: 428 comm 8
ext4_file_read_iter: SNAPUSERD for path: /system/etc/selinux/plat_property_contexts pid: 428 comm 8
[ 25.418043][ T428] ext4_file_read_iter+0x3dc/0x3e0
[ 25.423000][ T428] vfs_read+0x2e0/0x354
[ 25.426986][ T428] ksys_read+0x7c/0xec
[ 25.430894][ T428] __arm64_sys_read+0x20/0x30
[ 25.435419][ T428] el0_svc_common.llvm.17612735770287389485+0xd0/0x1e0
[ 25.442095][ T428] do_el0_svc+0x28/0xa0
[ 25.446100][ T428] el0_svc+0x14/0x24
[ 25.449825][ T428] el0_sync_handler+0x88/0xec
[ 25.454343][ T428] el0_sync+0x1c0/0x200
=====================================================
Fix:
Before starting init second stage, we wait for the snapuserd daemon to
be up and running. We do a simple probe by reading the system partition.
This read will eventually be serviced by the daemon, confirming that the
daemon is up and running. Furthermore, we are still in the kernel domain
and sepolicy has not been enforced yet. Thus, access to these device
mapper block devices is ok even though we may see audit logs.
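A minimal sketch of what such a read probe could look like. The helper
name ProbeRead and its parameters are hypothetical and are not taken
from the init sources; the caller is assumed to pass the path of a
block device that is backed by the daemon.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <string>
#include <vector>

// Hypothetical helper: returns true once `size` bytes starting at offset 0
// of `path` have been read. If `path` is a device whose I/O is served by a
// userspace daemon, a successful return implies the daemon answered.
bool ProbeRead(const std::string& path, size_t size) {
    assert(size > 0);
    int fd = open(path.c_str(), O_RDONLY | O_CLOEXEC);
    if (fd < 0) return false;

    std::vector<char> buf(size);
    size_t done = 0;
    while (done < size) {
        ssize_t n = pread(fd, buf.data() + done, size - done,
                          static_cast<off_t>(done));
        if (n < 0 && errno == EINTR) continue;  // retry interrupted reads
        if (n <= 0) break;                      // error or unexpected EOF
        done += static_cast<size_t>(n);
    }
    close(fd);
    return done == size;
}
```

Note that this sketch has no timeout of its own: if nothing ever
services the read, the call blocks, so a caller would need to bound it
separately.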
Note that the daemon will re-run __system_properties_init() as part of
the WaitForSocket() call. This is subtle but important: since the
bionic libc initialization had failed silently, it is important that
this re-initialization is done.
Bug: 207298357
Test: Induce the failure by explicitly delaying the call of execveat().
With fix, no issues observed.
Tested incremental OTA on pixel ~15 times.
Signed-off-by: Akilesh Kailash <akailash@google.com>
Change-Id: I86c2de977de052bfe9dcdc002dcbd9026601d0f3
* changes:
init.rc: Set permissions to cgroup.procs files
libprocessgroup: Add fd caching support for SetProcessProfiles
libprocessgroup: Move fd caching code into FdCacheHelper
Set permissions to cgroup.procs files in cgroup hierarchies similar to
permissions for tasks files so that SetProcessProfiles can access them.
Bug: 215557553
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Id0c82288392146c8d536d273790a0252580c4203
Process profiles operating on paths that do not depend on pid or uid of
the process can cache the fd of the file they are operating on. Add
support for fd caching similar to how SetTaskProfiles caches the fd
of the file it needs to write to.
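The idea can be sketched roughly as follows. This is an illustration
only, not the libprocessgroup code: the class name CachedFdWriter, the
"<pid>"/"<uid>" placeholder syntax and the open_count() accessor are
assumptions made for the example.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cassert>
#include <string>
#include <utility>

// Hypothetical sketch: cache the fd when the path is the same for every
// process, otherwise resolve the path and open it on every write.
class CachedFdWriter {
  public:
    explicit CachedFdWriter(std::string path_template)
        : template_(std::move(path_template)) {
        assert(!template_.empty());
    }
    ~CachedFdWriter() {
        if (cached_fd_ >= 0) close(cached_fd_);
    }
    CachedFdWriter(const CachedFdWriter&) = delete;
    CachedFdWriter& operator=(const CachedFdWriter&) = delete;

    // Cacheable only if the path has no per-process placeholder.
    bool IsCacheable() const {
        return template_.find("<pid>") == std::string::npos &&
               template_.find("<uid>") == std::string::npos;
    }

    // Number of open() calls so far; exposed to make the caching visible.
    int open_count() const { return open_count_; }

    bool Write(uid_t uid, pid_t pid, const std::string& value) {
        if (IsCacheable()) {
            if (cached_fd_ < 0) cached_fd_ = Open(template_);
            return WriteFd(cached_fd_, value);
        }
        std::string path = Replace(template_, "<uid>", std::to_string(uid));
        path = Replace(path, "<pid>", std::to_string(pid));
        int fd = Open(path);
        bool ok = WriteFd(fd, value);
        if (fd >= 0) close(fd);
        return ok;
    }

  private:
    static std::string Replace(std::string s, const std::string& key,
                               const std::string& val) {
        size_t pos = s.find(key);
        if (pos != std::string::npos) s.replace(pos, key.size(), val);
        return s;
    }
    int Open(const std::string& path) {
        ++open_count_;
        return open(path.c_str(), O_WRONLY | O_CLOEXEC);
    }
    static bool WriteFd(int fd, const std::string& value) {
        if (fd < 0) return false;
        ssize_t n = write(fd, value.data(), value.size());
        return n == static_cast<ssize_t>(value.size());
    }

    std::string template_;
    int cached_fd_ = -1;
    int open_count_ = 0;
};
```

With a fixed path, repeated writes reuse one descriptor; with a
per-process path, each write pays for its own open() and close().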
Bug: 215557553
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ie73ebcbbf1919d90409f40c1f6b08743f4edf97c
Refactor file descriptor caching code and move it into FdCacheHelper
because later on when we introduce fd caching for SetProcessProfiles
the children of CachedFdProfileAction become different enough that
sharing the same parent becomes a hindrance.
Bug: 215557553
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: If3812a0090c81a29e25f0888b0511cfaf48edea3
Adds a check for a DSU mode boot in storageproxyd. Changes path handling
so that storageproxyd will not allow opening a file in the root data
path in DSU mode. Instead, storageproxyd creates an "alternate/"
directory in the data directory and the TA must use this directory to
store its backing file.
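A rough sketch of the path rule described above. The function name
ResolveDataPath is hypothetical, and the sketch assumes that "a file in
the root data path" means a requested name with no directory component;
the actual storageproxyd check may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of the DSU path rule; not the storageproxyd code.
// On success, stores the full path in *out and returns true.
bool ResolveDataPath(const std::string& data_root, const std::string& requested,
                     bool dsu_mode, std::string* out) {
    assert(out != nullptr);
    // Assumption: a name without '/' refers to a file directly in the root.
    bool in_root = requested.find('/') == std::string::npos;
    if (dsu_mode && in_root) {
        return false;  // a caller would log "Cannot open root data file"
    }
    *out = data_root + "/" + requested;
    return true;
}
```

In this sketch, a request such as "alternate/0" is accepted in DSU mode
while a bare "0" is rejected; outside DSU mode both are accepted.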
Re-landing reverted change: Iad68872dc6915f64eaf26cd3c92c04d9071ef169
Test: Boot into DSU and inspect logs for "Cannot open root data file"
Test: Test that TD writes in DSU mode don't corrupt host image storage
when using a compatible storage TA that supports alternate data mode.
Bug: 203719297
Change-Id: I1d07e7c3d15dc1beba2d340181d1b11a7988f869
This patch attempts to diagnose snapuserd hangs by performing reads
immediately after entering second-stage init. This is done by spawning
two threads: one to perform the reads, and another to wait for the read
thread to finish. If any aspect of the read fails, or the read thread
does not complete in 10 seconds, then a list of snapuserd's open file
descriptors is logged.
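The reader-plus-waiter pattern can be sketched as below. This is a
simplified illustration, not the code in this patch: the names
RunWithWatchdog and ReadCheck are hypothetical, and the calling thread
plays the role of the waiter instead of a second spawned thread.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <thread>
#include <utility>

enum class ReadCheck { kOk, kFailed, kTimedOut };

// Hypothetical sketch: run `do_reads` on its own thread and wait for it
// with a deadline, so that a hung read is reported instead of blocking.
ReadCheck RunWithWatchdog(std::function<bool()> do_reads,
                          std::chrono::milliseconds timeout) {
    assert(do_reads != nullptr);
    std::packaged_task<bool()> task(std::move(do_reads));
    std::future<bool> result = task.get_future();

    // Reader thread: detached so a read that never returns cannot
    // prevent the waiter from giving up.
    std::thread(std::move(task)).detach();

    if (result.wait_for(timeout) != std::future_status::ready) {
        return ReadCheck::kTimedOut;  // caller would dump diagnostics here
    }
    return result.get() ? ReadCheck::kOk : ReadCheck::kFailed;
}
```

A caller would log its diagnostics (here, the open file descriptors)
for both kFailed and kTimedOut.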
Bug: 207298357
Test: apply working OTA, check logcat for success
apply broken OTA, check logcat for fd map
Change-Id: I549e07b7d576fcdaca9b2d6ff33e0924c3812c07
This reverts commit 7c5658b5fd.
Reason for revert: selinux test errors in some branches
Bug: 215630608
Change-Id: I2a9c9d914b6c1d1248b4f11bd69484ae6b0ba8d1
Add a malformed download command test, and add a sparse file test with
a very large block size.
Bug: 215236564
Test: bootloader fastboot
Change-Id: I1072ba189ac15b2e1eb8f13ffd754f93c967e2d5
The BSD license used by some of the files in the project was lacking a
license_text file.
Bug: 191508821
Test: m fastboot
Change-Id: I3bdfdea3de69ceaa28528b72a09d02d2a9535e85
Bug: http://b/194128476
Bug: http://b/210012154
This reverts commit e59f0f66fc.
Coverage metrics dropped for ~10 of the 40 modules. There are also
regressions in mainline when running tests on older platform builds.
Test: presubmit
Change-Id: I50a011f68dcdc25883a68701c51e7e2aabc5a7dc
The library is required to log the atoms from virt apex. See bug for more details.
Test: m succeeded. The whole topic is tested with statsd_testdrive 409
as mentioned in go/westworld-create-atom#step-3-test-your-atom
Change-Id: If8b13c9d1878265bfcb8e09fc1bd8e78e968f71f
Sparse files can come from an untrusted source. More checking is
needed to ensure that a file is not malformed and would not cause any
OOB read access.
Update the fuzz test for decoding as well.
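The kind of bounds checking involved can be sketched as follows. The
ChunkHeader struct, the kRaw/kDontCare constants and ValidateChunks are
simplified stand-ins invented for this example; they are not the real
sparse format definitions or the checks added by this change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified chunk header for illustration only.
struct ChunkHeader {
    uint16_t type;      // kRaw or kDontCare in this sketch
    uint16_t reserved;
    uint32_t blocks;    // chunk size in blocks
    uint32_t total_sz;  // header plus payload, in bytes
};
constexpr uint16_t kRaw = 1;
constexpr uint16_t kDontCare = 2;

// Returns true only if every chunk header and payload lies inside `buf`
// and the chunks cover the buffer exactly.
bool ValidateChunks(const std::vector<uint8_t>& buf, uint32_t block_size,
                    uint32_t chunk_count) {
    size_t off = 0;
    for (uint32_t i = 0; i < chunk_count; ++i) {
        assert(off <= buf.size());  // keeps the subtractions below safe
        size_t remaining = buf.size() - off;
        if (remaining < sizeof(ChunkHeader)) return false;  // header OOB

        ChunkHeader h;
        std::memcpy(&h, buf.data() + off, sizeof(h));  // no unaligned access
        if (h.total_sz < sizeof(ChunkHeader)) return false;
        if (h.total_sz > remaining) return false;  // payload OOB

        // 64-bit math so blocks * block_size cannot wrap around.
        uint64_t payload = h.total_sz - sizeof(ChunkHeader);
        uint64_t expected = 0;
        if (h.type == kRaw) {
            expected = uint64_t{h.blocks} * block_size;
        } else if (h.type != kDontCare) {
            return false;  // unknown chunk type
        }
        if (payload != expected) return false;
        off += h.total_sz;
    }
    return off == buf.size();
}
```

Every size taken from the input is compared against the bytes that
actually remain before it is used as an offset or length.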
Test: adb reboot fastboot
fuzzy_fastboot --gtest_filter=Fuzz.Sparse*
fuzzy_fastboot --gtest_filter=Conformance.Sparse*
sparse_fuzzer
Bug: 212705418
Change-Id: I7622df307bb00e59faaba8bb2c67cb474cffed8e
* changes:
Add a new property to disable io_uring and run vts and snapuserd_test
snapuserd: Async I/O for block verification
snapuserd: Use io_uring api's for snapshot merge
Hard to get otherwise if you're trying to debug PAC issues.
Bug: http://b/214314197
Test: treehugger
Change-Id: I2e5502809f84579bf287364e59d6e7ff67770919
musl libc defines NULL as nullptr, which is explicitly allowed by
C++11. nullptr_t cannot be implicitly converted to an integral type.
Use 0 instead.
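A small illustration of the language rule involved. The function name
InitialCookie is hypothetical and is not the code touched by this
change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// std::nullptr_t has no implicit conversion to integral types, so
// `uint64_t v = NULL;` fails to compile when NULL is defined as nullptr.
static_assert(!std::is_convertible<std::nullptr_t, uint64_t>::value,
              "nullptr_t must not convert implicitly to an integer");

// Hypothetical example: an integral value initialized with 0, not NULL.
uint64_t InitialCookie() {
    return 0;  // compiles regardless of how the C library defines NULL
}
```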
Bug: 190084016
Test: m USE_HOST_MUSL=true host-native
Change-Id: I0c3b6c94cd69262f574414bf52494333f2f2645a
vts and snapuserd_test
This should be run on cuttlefish
Bug: 202784286
Test: vts_libnspahost_test, snapuserd_test
Signed-off-by: Akilesh Kailash <akailash@google.com>
Change-Id: I2c28e98f04beca770b8a6efa9474d602fe26f514
Use io_uring READ/WRITE opcodes for snapshot merge.
Specifically, they are used only in the readahead and ordered ops
code path.
Snapshot merge perf:
===========================================================
Incremental OTA of 300M between two git_master branches on Pixel 6:
===========================================================
On Android S (with dm-snapshot): ~15 minutes:
update_engine: [INFO:cleanup_previous_update_action.cc(330)] Merge finished with state MergeCompleted.
update_engine: [INFO:cleanup_previous_update_action.cc(130)] Stopping/suspending/completing CleanupPreviousUpdateAction
update_engine: [INFO:cleanup_previous_update_action.cc(501)] Reporting merge stats: MergeCompleted in 926508ms (resumed 0 times), using 0 bytes of COW image.
===========================================================
On Android T (with io_uring): ~38 seconds:
update_engine: [INFO:cleanup_previous_update_action.cc(330)] Merge finished with state MergeCompleted.
update_engine: [INFO:cleanup_previous_update_action.cc(130)] Stopping/suspending/completing CleanupPreviousUpdateAction
update_engine: [INFO:cleanup_previous_update_action.cc(501)] Reporting merge stats: MergeCompleted in 38868ms (resumed 0 times), using 0 bytes of COW image.
===========================================================
Bug: 202784286
Test: Full/Incremental OTA
Signed-off-by: Akilesh Kailash <akailash@google.com>
Change-Id: I24ed3ae16730d0a18be0350c162dc67e1a7b74e1
This change provides a specialization of android::base::OkOrFail for
status_t. As a result, a statement whose type is status_t can be used
with OR_RETURN.
The specialization also provides conversion operators to Result<T,
StatusT>, where StatusT is a wrapper type for status_t. This allows the
OR_RETURN macro to be used in newer functions that return Result<T,
StatusT>.
Example usage:
    #include <utils/ErrorsMacros.h>

    status_t legacy_inner();

    status_t legacy_outer() {
        OR_RETURN(legacy_inner());
        return OK;
    }

    Result<T, StatusT> new_outer() {
        OR_RETURN(legacy_inner()); // the same macro
        return T{...};
    }
Bug: 209929099
Test: atest libutils_test
Change-Id: I0def0e84ce3f0c4ff6d508c202bd51902dfc9618