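The Year 2038 Problem
A customer reported that the system time could not be set to 2038. I happened to have some free time, so I looked into it.
I checked and found that most Android phones on the market cannot select the year 2038 at all. Even iPhones, which do let you pick 2038, only go as far as January 1.
Why is that?
It is essentially the same kind of problem as the Y2K bug (the "millennium bug") that many people remember from Windows systems.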
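In the 1960s, storage was extremely scarce, and programmers had to budget memory carefully for every line of code they wrote.
To save memory, Grace Murray Hopper used a six-digit form to store dates. For example, July 23, 1965 was stored as 65/07/23, dropping the leading "19".
This storage format quickly spread across the computing world. It did save some storage cost, but it also planted two hidden problems:
- Because only the last two digits of the year are kept, the stored year becomes "00" when the year 2000 arrives, and the computer cannot tell whether that means 1900 or 2000.
- 1900 was not a leap year, but 2000 is. If the first two digits of the year are assumed to be "19", February 29, 2000 is treated as March 1 and stored as 00/03/01.
Both problems only surface once the year 2000 arrives, which is why this is called the "millennium bug", abbreviated Y2K.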
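The two pitfalls above can be sketched in a few lines. This is only an illustration: the helper names `expand_with_19_prefix` and `legacy_accepts_feb29` are made up for this post and do not come from any real legacy system.

```cpp
#include <cassert>

// Gregorian leap-year rule: divisible by 4, except centuries
// that are not divisible by 400.
bool is_leap_year(int year) {
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

// Hypothetical legacy behaviour: the century is always assumed to be "19".
int expand_with_19_prefix(int two_digit_year) {
    assert(two_digit_year >= 0 && two_digit_year <= 99);
    return 1900 + two_digit_year;
}

// Would a system using the "19" assumption accept Feb 29 for this stored year?
bool legacy_accepts_feb29(int two_digit_year) {
    return is_leap_year(expand_with_19_prefix(two_digit_year));
}
```

With these helpers, a stored year of "00" expands to 1900, so `legacy_accepts_feb29(0)` is false even though February 29, 2000 is a real date.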
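The 2038 problem is similar: it is also caused by a limit on the number of bits used to store a value.
On 32-bit systems, time_t is a signed 32-bit count of seconds since 1970-01-01 00:00:00 UTC, so the largest value it can hold is 0x7fffffff, which corresponds to 2038-01-19 03:14:07 UTC. One second later, time_t overflows and becomes negative, and the system time jumps back to 1901 (1901-12-13 20:45:52 UTC). At that point the operating system and the software on top of it start to malfunction.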
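The wraparound can be reproduced on a desktop machine. The sketch below assumes a POSIX host with a 64-bit time_t (so both dates can be printed) and emulates the 32-bit storage explicitly. The names `format_utc` and `store_in_32bit_time_t` are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>
#include <string>
#include <time.h>  // gmtime_r (POSIX)

// Assumption: the host's own time_t is 64-bit, so dates before 1970 and
// after 2038 are both representable when we format them.
static_assert(sizeof(std::time_t) >= 8, "needs a platform with 64-bit time_t");

// Format a seconds-since-epoch value as "YYYY-MM-DD HH:MM:SS" in UTC.
std::string format_utc(std::int64_t seconds_since_epoch) {
    const std::time_t t = static_cast<std::time_t>(seconds_since_epoch);
    std::tm tm_utc{};
    const std::tm* ok = gmtime_r(&t, &tm_utc);
    assert(ok != nullptr);
    (void)ok;
    char buf[32];
    std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S", &tm_utc);
    return buf;
}

// Emulate squeezing a second count into a signed 32-bit time_t.
// The unsigned-to-signed conversion wraps modulo 2^32 (guaranteed since
// C++20, and what GCC and Clang do in earlier standards as well).
std::int32_t store_in_32bit_time_t(std::int64_t seconds_since_epoch) {
    return static_cast<std::int32_t>(
        static_cast<std::uint32_t>(seconds_since_epoch));
}
```

Calling `format_utc(INT32_MAX)` gives the last representable second, and `format_utc(store_in_32bit_time_t(std::int64_t{INT32_MAX} + 1))` shows where the clock lands after the overflow.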
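For details, see https://blog.csdn.net/linyt/article/details/52728910
Then I ran into another problem. After using the date command to set the time past 2038 and rebooting, the device booted normally to the home screen, but the status bar and navigation bar were missing and the notification shade could not be pulled down. Apps could still be launched and used. My first guess was that SystemUI had died, and logcat showed SystemUI repeatedly reporting ANRs.
Exporting the ANR trace for analysis: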
Cmd line: com.android.systemui
"main" prio=5 tid=1 Native
  | group="main" sCount=1 ucsCount=0 flags=1 obj=0x7273ffa0 self=0xb4000075bb139be0
  | sysTid=2634 nice=0 cgrp=foreground sched=0/0 handle=0x777c0f34f8
  | state=S schedstat=( 418199019 514085264 1167 ) utm=25 stm=16 core=2 HZ=100
  | stack=0x7fc0793000-0x7fc0795000 stackSize=8188KB
  | held mutexes=
  native: #00 pc 00000000000a5b8c /apex/com.android.runtime/lib64/bionic/libc.so (__ioctl+12) (BuildId: 2938f6235116cbc48464ee0f7622625e)
  native: #01 pc 000000000005c9e0 /apex/com.android.runtime/lib64/bionic/libc.so (ioctl+160) (BuildId: 2938f6235116cbc48464ee0f7622625e)
  native: #02 pc 000000000005c2a0 /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+284) (BuildId: af2c6edea56d998ea36c11c3b4123011)
  native: #03 pc 000000000005d4fc /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+76) (BuildId: af2c6edea56d998ea36c11c3b4123011)
  native: #04 pc 000000000005d238 /system/lib64/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+224) (BuildId: af2c6edea56d998ea36c11c3b4123011)
  native: #05 pc 0000000000054a44 /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+192) (BuildId: af2c6edea56d998ea36c11c3b4123011)
  native: #06 pc 0000000000175abc /system/lib64/libandroid_runtime.so (android_os_BinderProxy_transact(_JNIEnv*, _jobject*, int, _jobject*, _jobject*, int)+156) (BuildId: 9e147c02bd5e272a2066aa88f77eee99)
  at android.os.BinderProxy.transactNative(Native method)
  at android.os.BinderProxy.transact(BinderProxy.java:584)
  at android.hardware.ICameraService$Stub$Proxy.getConcurrentCameraIds(ICameraService.java:684)
  at android.hardware.camera2.CameraManager$CameraManagerGlobal.connectCameraServiceLocked(CameraManager.java:1639)
  at android.hardware.camera2.CameraManager$CameraManagerGlobal.getCameraIdList(CameraManager.java:1843)
  - locked <0x0cc4426a> (a java.lang.Object)
  at android.hardware.camera2.CameraManager.getCameraIdList(CameraManager.java:220)
  at com.android.systemui.statusbar.policy.FlashlightControllerImpl.getCameraId(FlashlightControllerImpl.java:160)
  at com.android.systemui.statusbar.policy.FlashlightControllerImpl.tryInitCamera(FlashlightControllerImpl.java:86)
  at com.android.systemui.statusbar.policy.FlashlightControllerImpl.<init>(FlashlightControllerImpl.java:81)
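talkWithDriver is binder IPC: the main thread is blocked waiting for the remote end of the binder call to return. To find that remote end, you can either use the binder info (binderinfo) or simply search the trace for the getConcurrentCameraIds method.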
Cmd line: /system/bin/cameraserver
"binder:1403_5" sysTid=3306
  #00 pc 000000000004ea70 /apex/com.android.runtime/lib64/bionic/libc.so (syscall+32) (BuildId: 2938f6235116cbc48464ee0f7622625e)
  #01 pc 0000000000053458 /apex/com.android.runtime/lib64/bionic/libc.so (__futex_wait_ex(void volatile*, bool, int, bool, timespec const*)+148) (BuildId: 2938f6235116cbc48464ee0f7622625e)
  #02 pc 00000000000bb8f4 /apex/com.android.runtime/lib64/bionic/libc.so (NonPI::MutexLockWithTimeout(pthread_mutex_internal_t*, bool, timespec const*)+352) (BuildId: 2938f6235116cbc48464ee0f7622625e)
  #03 pc 00000000000bb5f8 /apex/com.android.runtime/lib64/bionic/libc.so (pthread_mutex_lock+224) (BuildId: 2938f6235116cbc48464ee0f7622625e)
  #04 pc 0000000000098eb0 /system/lib64/libc++.so (std::__1::mutex::lock()+12) (BuildId: 1f426797e505c9b841f55cc49d32b3f4)
  #05 pc 0000000000133c50 /system/lib64/libcameraservice.so (android::CameraProviderManager::getConcurrentCameraIds() const+76) (BuildId: f1e3fe60ec02e1fc7ca587b5276b6158)
  #06 pc 00000000000dbb6c /system/lib64/libcameraservice.so (android::CameraService::getConcurrentCameraIds(std::__1::vector<android::hardware::camera2::utils::ConcurrentCameraIdCombination, std::__1::allocator<android::hardware::camera2::utils::ConcurrentCameraIdCombination> >*)+128) (BuildId: f1e3fe60ec02e1fc7ca587b5276b6158)
  #07 pc 0000000000035ba0 /system/lib64/libcamera_client.so (android::hardware::BnCameraService::onTransact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+2560) (BuildId: 5663b119c402c14301307f0c748f69b2)
  #08 pc 00000000000df258 /system/lib64/libcameraservice.so (android::CameraService::onTransact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+436) (BuildId: f1e3fe60ec02e1fc7ca587b5276b6158)
  #09 pc 0000000000051a1c /system/lib64/libbinder.so (android::BBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+240) (BuildId: af2c6edea56d998ea36c11c3b4123011)
  #10 pc 000000000005caf4 /system/lib64/libbinder.so (android::IPCThreadState::executeCommand(int)+1036) (BuildId: af2c6edea56d998ea36c11c3b4123011)
  #11 pc 000000000005c61c /system/lib64/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+164) (BuildId: af2c6edea56d998ea36c11c3b4123011)
  #12 pc 000000000005cee0 /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+72) (BuildId: af2c6edea56d998ea36c11c3b4123011)
  #13 pc 000000000008cde8 /system/lib64/libbinder.so (android::PoolThread::threadLoop()+28) (BuildId: af2c6edea56d998ea36c11c3b4123011)
  #14 pc 0000000000013414 /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+424) (BuildId: ff20dc59f94e7e2a3984a431c354f4cd)
  #15 pc 00000000000ba598 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208) (BuildId: 2938f6235116cbc48464ee0f7622625e)
  #16 pc 0000000000053f3c /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68) (BuildId: 2938f6235116cbc48464ee0f7622625e)
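getConcurrentCameraIds does show up in cameraserver, and the call stack matches what we would expect on the remote end of the binder call. So what is cameraserver's binder:1403_5 thread doing, and why has it not returned? Looking at the thread state, it calls mutex::lock and then enters pthread_mutex_lock, so it is waiting for a lock. Let's look at the source: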
qssi13/frameworks/av/services/camera/libcameraservice/common/CameraProviderManager.cpp
std::vector<std::unordered_set<std::string>>
CameraProviderManager::getConcurrentCameraIds() const {
    std::vector<std::unordered_set<std::string>> deviceIdCombinations;
    // Blocks here until mInterfaceMutex becomes available
    std::lock_guard<std::mutex> lock(mInterfaceMutex);
    for (auto &provider : mProviders) {
        for (auto &combinations : provider->getConcurrentCameraIdCombinations()) {
            deviceIdCombinations.push_back(combinations);
        }
    }
    return deviceIdCombinations;
}
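This code waits on the mInterfaceMutex lock, so some other thread must currently be holding it. Searching this file for lock(mInterfaceMutex) turns up many hits. A lock like this is normally taken by other threads in the same process that need to synchronize with each other, so let's go back and check the state of the other threads in cameraserver to see whether any of them is also blocked.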
Cmd line: /system/bin/cameraserver
sysTid=1403 binder_thread_read
sysTid=1641 binder_thread_read
sysTid=1642 futex_wait_queue_me
sysTid=2033 binder_thread_read
sysTid=2039 binder_thread_read
sysTid=2500 binder_thread_read
sysTid=2501 binder_thread_read
sysTid=2502 binder_thread_read
sysTid=2949 binder_thread_read
sysTid=3306 futex_wait_queue_me
3306 is the thread we have been looking at, and 1642 is also in the futex_wait_queue_me waiting state.
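When a process has many more threads than this, picking out the blocked ones by eye gets tedious. A small filter along these lines can help. It is only a sketch that assumes each line has the `sysTid=<id> <state>` shape shown in the listing above, and `threads_in_state` is a name invented for this post.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Return the sysTid of every thread whose wait state equals `state`.
// Lines that do not contain "sysTid=<number> <state>" are skipped.
std::vector<int> threads_in_state(const std::string& listing,
                                  const std::string& state) {
    const std::string key = "sysTid=";
    std::vector<int> tids;
    std::istringstream in(listing);
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type pos = line.find(key);
        if (pos == std::string::npos) continue;
        // Parse "<tid> <state>" from the text that follows "sysTid=".
        std::istringstream fields(line.substr(pos + key.size()));
        int tid = 0;
        std::string thread_state;
        if (fields >> tid >> thread_state && thread_state == state) {
            tids.push_back(tid);
        }
    }
    return tids;
}
```

Feeding it the cameraserver listing with the state `futex_wait_queue_me` leaves just the two threads of interest, 1642 and 3306.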
Here is the stack of 1642:
"HwBinder:1403_2" sysTid=1642
  #00 pc 000000000004ea70 /apex/com.android.runtime/lib64/bionic/libc.so (syscall+32) (BuildId: 2938f6235116cbc48464ee0f7622625e)
  #01 pc 0000000000053458 /apex/com.android.runtime/lib64/bionic/libc.so (__futex_wait_ex(void volatile*, bool, int, bool, timespec const*)+148) (BuildId: 2938f6235116cbc48464ee0f7622625e)
  #02 pc 00000000000b9934 /apex/com.android.runtime/lib64/bionic/libc.so (pthread_cond_timedwait+140) (BuildId: 2938f6235116cbc48464ee0f7622625e)
  #03 pc 000000000005831c /system/lib64/libc++.so (std::__1::condition_variable::__do_timed_wait(std::__1::unique_lock<std::__1::mutex>&, std::__1::chrono::time_point<std::__1::chrono::system_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)+112) (BuildId: 1f426797e505c9b841f55cc49d32b3f4)
  #04 pc 000000000004d2cc /system/lib64/libhidlbase.so (android::hardware::details::Waiter::wait(bool)+164) (BuildId: 9de09f701ac5095168fdbd1789aa3da1)
  #05 pc 000000000004dcfc /system/lib64/libhidlbase.so (android::hardware::details::getRawServiceInternal(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, bool)+1312) (BuildId: 9de09f701ac5095168fdbd1789aa3da1)
  #06 pc 000000000006e920 /system/lib64/android.hardware.media.c2@1.0.so (android::sp<android::hardware::media::c2::V1_0::IComponentStore> android::hardware::details::getServiceInternal<android::hardware::media::c2::V1_0::BpHwComponentStore, android::hardware::media::c2::V1_0::IComponentStore, void, void>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, bool)+252) (BuildId: f03dfef35aa37b5c0f9672376f0d53fb)
  #07 pc 000000000001b1c0 /system/lib64/libcodec2_client.so (android::Codec2Client::_CreateFromIndex(unsigned long) (.cfi)+120) (BuildId: cdf1945c61d262e5bc8a61423c6c6f34)
  #08 pc 000000000001a7d8 /system/lib64/libcodec2_client.so (android::Codec2Client::Cache::getClient()+76) (BuildId: cdf1945c61d262e5bc8a61423c6c6f34)
  #09 pc 000000000001a598 /system/lib64/libcodec2_client.so (android::Codec2Client::Cache::getTraits()::'lambda'()::operator()() const+112) (BuildId: cdf1945c61d262e5bc8a61423c6c6f34)
  ...
  #26 pc 000000000012a5a4 /system/lib64/libcameraservice.so (android::CameraProviderManager::ProviderInfo::initializeProviderInfoCommon(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&)+244) (BuildId: f1e3fe60ec02e1fc7ca587b5276b6158)
  #27 pc 00000000001372e8 /system/lib64/libcameraservice.so (android::HidlProviderInfo::initializeHidlProvider(android::sp<android::hardware::camera::provider::V2_4::ICameraProvider>&, long)+3372) (BuildId: f1e3fe60ec02e1fc7ca587b5276b6158)
  #28 pc 000000000011dcd4 /system/lib64/libcameraservice.so (android::CameraProviderManager::tryToInitializeHidlProviderLocked(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, android::sp<android::CameraProviderManager::ProviderInfo> const&)+256) (BuildId: f1e3fe60ec02e1fc7ca587b5276b6158)
  #29 pc 000000000011d7ac /system/lib64/libcameraservice.so (android::CameraProviderManager::addHidlProviderLocked(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool)+1048) (BuildId: f1e3fe60ec02e1fc7ca587b5276b6158)
  #30 pc 000000000011c0d4 /system/lib64/libcameraservice.so (android::CameraProviderManager::onRegistration(android::hardware::hidl_string const&, android::hardware::hidl_string const&, bool)+128) (BuildId: f1e3fe60ec02e1fc7ca587b5276b6158)
  #31 pc 000000000006b694 /system/lib64/libhidlbase.so (android::hidl::manager::V1_0::BnHwServiceNotification::_hidl_onRegistration(android::hidl::base::V1_0::BnHwBase*, android::hardware::Parcel const&, android::hardware::Parcel*, std::__1::function<void (android::hardware::Parcel&)>)+280) (BuildId: 9de09f701ac5095168fdbd1789aa3da1)
  #32 pc 000000000006b8c0 /system/lib64/libhidlbase.so (android::hidl::manager::V1_0::BnHwServiceNotification::onTransact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+192) (BuildId: 9de09f701ac5095168fdbd1789aa3da1)
Comparing the stack against the source shows that onRegistration takes the lock:
hardware::Return<void> CameraProviderManager::onRegistration(
        const hardware::hidl_string& /*fqName*/,
        const hardware::hidl_string& name,
        bool preexisting) {
    status_t res = OK;
    std::lock_guard<std::mutex> providerLock(mProviderLifecycleLock);
    {
        // mInterfaceMutex stays held for the whole addHidlProviderLocked call
        std::lock_guard<std::mutex> lock(mInterfaceMutex);
        res = addHidlProviderLocked(name, preexisting);
    }
    // (rest of the function is not shown in this excerpt)
}
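Following the stack further down from addHidlProviderLocked, execution eventually reaches getServiceInternal in android.hardware.media.c2@1.0.so and then sits in a wait as well. We did not have the source for libhidlbase.so at hand, so going by the stack alone, my guess is that it is trying to obtain the low-level media service.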
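The shape of the hang is the classic "lock held across a blocking wait". The sketch below is not the cameraserver code. It is a minimal stand-alone model in which `interface_mutex` plays the role of mInterfaceMutex and `dependency_ready` stands in for the HAL service that never shows up. All names are invented for illustration, and a std::timed_mutex is used only so that the second caller can give up instead of hanging forever.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Returns true if a second caller manages to take the lock within `patience`
// while the first thread holds it and is stuck waiting on a dependency.
bool second_caller_gets_lock(std::chrono::milliseconds patience) {
    std::timed_mutex interface_mutex;     // stands in for mInterfaceMutex
    std::promise<void> lock_taken;        // holder signals once it owns the lock
    std::promise<void> dependency_ready;  // stands in for the service appearing
    std::future<void> lock_taken_f = lock_taken.get_future();
    std::future<void> dependency_f = dependency_ready.get_future();

    // Plays the role of onRegistration: take the lock, then block on the
    // dependency while still holding it.
    std::thread holder([&] {
        std::lock_guard<std::timed_mutex> lock(interface_mutex);
        lock_taken.set_value();
        dependency_f.wait();
    });

    // Plays the role of getConcurrentCameraIds: needs the same lock.
    lock_taken_f.wait();
    const bool got = interface_mutex.try_lock_for(patience);
    if (got) interface_mutex.unlock();

    // Unblock the holder so the sketch can finish cleanly.
    dependency_ready.set_value();
    holder.join();
    return got;
}
```

In this model `second_caller_gets_lock(std::chrono::milliseconds(50))` returns false. In the real trace the second caller uses a plain mutex::lock with no timeout, so the binder thread, and with it the SystemUI main thread, waits for as long as the holder does.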
Combined with the log:
12-02 11:37:22.905 4611 6822 W HidlServiceManagement: Waited one second for android.hardware.media.c2@1.0::IComponentStore/default
12-02 11:37:22.905 608 608 I hwservicemanager: Since android.hardware.media.c2@1.0::IComponentStore/default is not registered, trying to start it as a lazy HAL.
12-02 11:37:22.906 4611 6822 I HidlServiceManagement: getService: Trying again for android.hardware.media.c2@1.0::IComponentStore/default...
12-02 11:37:22.907 608 7047 W libc : Unable to set property "ctl.interface_start" to "android.hardware.media.c2@1.0::IComponentStore/default": error code: 0x20
12-02 11:37:22.907 608 7047 I hwservicemanager: Tried to start android.hardware.media.c2@1.0::IComponentStore/default as a lazy service, but was unable to. Usually this happens when a service is not installed, but if the service is intended to be used as a lazy service, then it may be configured incorrectly.
Sure enough, the underlying service cannot start. The underlying service in this case is mediaserver:
12-02 11:36:14.831 1992 1992 F DEBUG : Timestamp: 1911-10-27 04:31:22.712130414-0636
12-02 11:36:14.831 1992 1992 F DEBUG : Process uptime: 4s
12-02 11:36:14.831 1992 1992 F DEBUG : Cmdline: /system/bin/mediaserver
12-02 11:36:14.831 1992 1992 F DEBUG : pid: 1423, tid: 1766, name: binder:1423_1 >>> /system/bin/mediaserver <<<
12-02 11:36:14.831 1992 1992 F DEBUG : uid: 1013
12-02 11:36:14.831 1992 1992 F DEBUG : signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
12-02 11:36:14.832 1992 1992 F DEBUG : Abort message: 'terminating with uncaught exception of type NSt3__112system_errorE: condition_variable timed_wait failed: Invalid argument'
12-02 11:36:14.832 1992 1992 F DEBUG : r0 00000000 r1 000006e6 r2 00000006 r3 e8bbf758
12-02 11:36:14.832 1992 1992 F DEBUG : r4 e8bbf768 r5 e8bbf750 r6 0000058f r7 0000016b
12-02 11:36:14.832 1992 1992 F DEBUG : r8 00000000 r9 ffffffff r10 e8bbf758 r11 ed388fd0
12-02 11:36:14.832 1992 1992 F DEBUG : ip 000006e6 sp e8bbf738 lr ec1fbb97 pc ec1fbbaa
12-02 11:36:14.832 1992 1992 F DEBUG : backtrace:
12-02 11:36:14.832 1992 1992 F DEBUG : #00 pc 00039baa /apex/com.android.runtime/lib/bionic/libc.so (abort+138) (BuildId: 3d675773e58a959e4952cc3f328513e6)
12-02 11:36:14.832 1992 1992 F DEBUG : #01 pc 000336e1 /system/lib/libc++.so (abort_message+92) (BuildId: addaf40eb9e51c378faed2c17f92f150)
12-02 11:36:14.832 1992 1992 F DEBUG : #02 pc 00033837 /system/lib/libc++.so (demangling_terminate_handler()+138) (BuildId: addaf40eb9e51c378faed2c17f92f150)
12-02 11:36:14.832 1992 1992 F DEBUG : #03 pc 000341db /system/lib/libc++.so (std::__terminate(void (*)())+2) (BuildId: addaf40eb9e51c378faed2c17f92f150)
12-02 11:36:14.832 1992 1992 F DEBUG : #04 pc 00034193 /system/lib/libc++.so (std::terminate()+46) (BuildId: addaf40eb9e51c378faed2c17f92f150)
12-02 11:36:14.832 1992 1992 F DEBUG : #05 pc 00034151 /system/lib/libc++.so (__clang_call_terminate+4) (BuildId: addaf40eb9e51c378faed2c17f92f150)
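Two timestamps here clearly do not agree: the logcat time 12-02 11:36:14.831 and the crash dump's own Timestamp of 1911-10-27 04:31:22.
The abort message, condition_variable timed_wait failed: Invalid argument, means that timed_wait was called with an invalid argument (EINVAL).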
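I have not traced the exact path inside libc++ and bionic that produced this error in mediaserver, so what follows is only an illustration of the general hazard, not a reconstruction of the crash. A timed wait is typically given an absolute deadline, and an absolute deadline after January 2038 cannot be expressed in a timespec whose tv_sec is 32 bits wide. `Timespec32` and `to_timespec32` are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical model of a timespec on a 32-bit ABI with a 32-bit tv_sec.
struct Timespec32 {
    std::int32_t tv_sec;
    std::int32_t tv_nsec;
};

// Convert an absolute deadline (nanoseconds since the Unix epoch) into a
// Timespec32. Returns false when the deadline cannot be represented, which
// is the situation a 32-bit process is in once the clock passes 2038.
bool to_timespec32(std::int64_t deadline_ns, Timespec32* out) {
    const std::int64_t kNsPerSec = 1000000000;
    if (deadline_ns < 0) return false;
    const std::int64_t sec = deadline_ns / kNsPerSec;
    if (sec > std::numeric_limits<std::int32_t>::max()) return false;
    out->tv_sec = static_cast<std::int32_t>(sec);
    out->tv_nsec = static_cast<std::int32_t>(deadline_ns % kNsPerSec);
    return true;
}
```

A deadline in the last representable second still converts, while anything one second later does not. Code that skips such a check and truncates instead ends up handing the lower layers a nonsensical time value.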
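This is really the most important consequence of the 2038 problem. Android is a very large system. Although the kernel on today's 64-bit devices no longer has the 2038 problem, some key low-level libraries and processes still carry code handed down from long ago and have limits in many places. mediaserver here is one example: in the crash dump above it is running as a 32-bit process (registers r0 to r11, libraries under /system/lib).
Fortunately, the Linux community is already working on the 2038 problem. With about 15 years still to go, I trust a proper solution will be in place by then.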