I've been reading the other posts on tracking down the reasons for getting a SIGSEGV
in an Android app. I plan to scour my app for possible NullPointers related to Canvas use, but my SIGSEGV
barfs up a different memory address each time. Plus I've seen code=1
and code=2
. If the memory address was 0x00000000
, I'd have a clue it is a NullPointer.
The last one I got was a code=2
:
A/libc(4969): Fatal signal 11 (SIGSEGV) at 0x42a637d9 (code=2)
Any suggestions on how to track this down?
I have a suspect, but I'm not keen on experimenting with it yet. My app uses the OSMDroid API for offline mapping. The OverlayItem class represents markers/nodes on the map. I have a Service that collects data via the network to populate the OverlayItem which are then displayed on the map. In an effort to simplify my design, I extended OverlayItem into my own NodeOverlayItem class, which includes some addition attributes I use in the UI Activity and in the Service. This gave me a single point of Item information for the UI and Service. I used Intents to broadcast to the Activity to refresh the UI map when something changed. The Activity binds to the Service and there's a Service method to get the list of NodeOverlayItem's. I think it might be the OSMDroid API's use of OverlayItem, and my Service updating node information at the same time. (a concurrency issue)
As I write this I think that's really the problem. The headache isn't splitting out the Node and OverlayItem from NodeOverlayItem, it's that the Activity will need some data from the Node, that the Service holds. Plus when the Activity is created (onResume, etc...) the OverlayItem objects will need to be re-created from the Node data that the Service has been maintaining while the Activity was away. e.g. You start the app, the Service collects data, the UI displays it, you go to Home, then back to the app, the Activity will need to pull and re-create the OverlayItem's from the latest Service node data.
I know this isn't a great or clear questions. It's like all my SO questions are niche or obscure. If anyone has a suggestion on how to interpret those SIGSEGV
errors, it would be greatly appreciated!
UPDATE Here's the latest crash captured during a debug session. I have 3 of these devices being used for testing and they don't all crash reliably when I'm developing and testing. I included a bit extra just so the GC logging could be noted. You can see the problem is probably not related to memory exhaustion.
03-03 02:02:38.328: I/CommService(7477): Received packet from: 192.168.1.102
03-03 02:02:38.328: I/CommService(7477): Already processed this packet. It's a re-broadcast from another node, or from myself. It's not a repeat broadcast though.
03-03 02:02:38.406: D/CommService(7477): Checking OLSRd info...
03-03 02:02:38.460: D/CommService(7477): Monitoring nodes...
03-03 02:02:38.515: D/dalvikvm(7477): GC_CONCURRENT freed 2050K, 16% free 17151K/20359K, paused 3ms+6ms
03-03 02:02:38.515: I/CommService(7477): Received packet from: 192.168.1.102
03-03 02:02:38.515: D/CommService(7477): Forwarding packet (4f68802cf10684a83ac4936ebb3c934d) along to other nodes.
03-03 02:02:38.609: I/CommService(7477): Received packet from: 192.168.1.100
03-03 02:02:38.609: D/CommService(7477): Forwarding packet (e4bc81e91ec92d06f83e03068f52ab4) along to other nodes.
03-03 02:02:38.609: D/CommService(7477): Already processed this packet: 4204a5b27745ffe5e4f8458e227044bf
03-03 02:02:38.609: A/libc(7477): Fatal signal 11 (SIGSEGV) at 0x68f52abc (code=1)
03-03 02:02:38.914: I/DEBUG(4008): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
03-03 02:02:38.914: I/DEBUG(4008): Build fingerprint: 'Lenovo/IdeaTab_A1107/A1107:4.0.4/MR1/eng.user.20120719.150703:user/release-keys'
03-03 02:02:38.914: I/DEBUG(4008): pid: 7477, tid: 7712 >>> com.test.testm <<<
03-03 02:02:38.914: I/DEBUG(4008): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 68f52abc
03-03 02:02:38.914: I/DEBUG(4008): r0 68f52ab4 r1 412ef268 r2 4d9c3bf4 r3 412ef268
03-03 02:02:38.914: I/DEBUG(4008): r4 001ad8f8 r5 4d9c3bf4 r6 412ef268 r7 4c479df8
03-03 02:02:38.914: I/DEBUG(4008): r8 4d9c3c0c r9 4c479dec 10 46cf260a fp 4d9c3c24
03-03 02:02:38.914: I/DEBUG(4008): ip 40262a04 sp 4d9c3bc8 lr 402d01dd pc 402d0182 cpsr 00000030
03-03 02:02:38.914: I/DEBUG(4008): d0 00000001000c0102 d1 3a22364574614c7d
03-03 02:02:38.914: I/DEBUG(4008): d2 403fc0000000007d d3 363737343433350a
03-03 02:02:38.914: I/DEBUG(4008): d4 49544341223a2273 d5 6f6567222c224556
03-03 02:02:38.914: I/DEBUG(4008): d6 3a223645676e6f4c d7 000000013835372d
03-03 02:02:38.914: I/DEBUG(4008): d8 0000000000000000 d9 4040000000000000
03-03 02:02:38.914: I/DEBUG(4008): d10 0000000000000000 d11 4040000000000000
03-03 02:02:38.914: I/DEBUG(4008): d12 4040000000000000 d13 0000000000000021
03-03 02:02:38.914: I/DEBUG(4008): d14 0000000000000000 d15 0000000000000000
03-03 02:02:38.914: I/DEBUG(4008): d16 3fe62e42fefa39ef d17 3ff0000000000000
03-03 02:02:38.914: I/DEBUG(4008): d18 3fe62e42fee00000 d19 0000000000000000
03-03 02:02:38.914: I/DEBUG(4008): d20 0000000000000000 d21 3ff0000000000000
03-03 02:02:38.914: I/DEBUG(4008): d22 4028000000000000 d23 3ff0000000000000
03-03 02:02:38.914: I/DEBUG(4008): d24 0000000000000000 d25 3ff0000000000000
03-03 02:02:38.914: I/DEBUG(4008): d26 0000000000000000 d27 c028000000000000
03-03 02:02:38.914: I/DEBUG(4008): d28 0000000000000000 d29 3ff0000000000000
03-03 02:02:38.914: I/DEBUG(4008): d30 3ff0000000000000 d31 3fecccccb5c28f6e
03-03 02:02:38.914: I/DEBUG(4008): scr 60000013
03-03 02:02:39.046: I/DEBUG(4008): #00 pc 0006b182 /system/lib/libcrypto.so (EVP_DigestFinal_ex)
03-03 02:02:39.046: I/DEBUG(4008): #01 pc 0006b1d8 /system/lib/libcrypto.so (EVP_DigestFinal)
03-03 02:02:39.054: I/DEBUG(4008): #02 pc 0001f814 /system/lib/libnativehelper.so
03-03 02:02:39.054: I/DEBUG(4008): #03 pc 0001ec30 /system/lib/libdvm.so (dvmPlatformInvoke)
03-03 02:02:39.054: I/DEBUG(4008): #04 pc 00058c70 /system/lib/libdvm.so (_Z16dvmCallJNIMethodPKjP6JValuePK6MethodP6Thread)
03-03 02:02:39.054: I/DEBUG(4008): code around pc:
03-03 02:02:39.054: I/DEBUG(4008): 402d0160 0003151e 4604b570 f7ff460d 4620ff81 ....p..F.F.... F
03-03 02:02:39.054: I/DEBUG(4008): 402d0170 f7ff4629 bd70ff93 4604b570 460e6800 )F....p.p..F.h.F
03-03 02:02:39.054: I/DEBUG(4008): 402d0180 68834615 dd062b40 21fa4810 44784a10 .F.h@+...H.!.JxD
03-03 02:02:39.054: I/DEBUG(4008): 402d0190 f7c8447a 6821f80f 698a4620 47904631 zD....!h F.i1F.G
03-03 02:02:39.054: I/DEBUG(4008): 402d01a0 b1154606 68836820 6822602b b12b6a13 .F.. h.h+`"h.j+.
03-03 02:02:39.054: I/DEBUG(4008): code around lr:
03-03 02:02:39.054: I/DEBUG(4008): 402d01bc 68e06821 21006c4a ea0af7c4 bd704630 !h.hJl.!....0Fp.
03-03 02:02:39.054: I/DEBUG(4008): 402d01cc 00031492 000314b5 4604b570 ffcef7ff ........p..F....
03-03 02:02:39.054: I/DEBUG(4008): 402d01dc 46204605 ff12f7ff bd704628 4604b573 .F F....(Fp.s..F
03-03 02:02:39.054: I/DEBUG(4008): 402d01ec 2102460d fb36f002 42ab6823 b123d020 .F.!..6.#h.B .#.
03-03 02:02:39.054: I/DEBUG(4008): 402d01fc b1136c5b f7c868e0 68a0fccf 05c26025 [l...h.....h%`..
03-03 02:02:39.054: I/DEBUG(4008): memory map around addr 68f52abc:
03-03 02:02:39.054: I/DEBUG(4008): 4d8c5000-4d9c4000
03-03 02:02:39.054: I/DEBUG(4008): (no map for address)
03-03 02:02:39.054: I/DEBUG(4008): b0001000-b0009000 /system/bin/linker
03-03 02:02:39.054: I/DEBUG(4008): stack:
03-03 02:02:39.054: I/DEBUG(4008): 4d9c3b88 408d1f90 /system/lib/libdvm.so
03-03 02:02:39.054: I/DEBUG(4008): 4d9c3b8c 412ef258 /dev/ashmem/dalvik-heap (deleted)
03-03 02:02:39.054: I/DEBUG(4008): 4d9c3b90 00000001
03-03 02:02:39.054: I/DEBUG(4008): 4d9c3b94 408d6c58 /system/lib/libdvm.so
03-03 02:02:39.054: I/DEBUG(4008): 4d9c3b98 408d6fa8 /system/lib/libdvm.so
03-03 02:02:39.054: I/DEBUG(4008): 4d9c3b9c 4c479dec
03-03 02:02:39.054: I/DEBUG(4008): 4d9c3ba0 46cf260a /system/framework/core.odex
03-03 02:02:39.054: I/DEBUG(4008): 4d9c3ba4 408735e7 /system/lib/libdvm.so
03-03 02:02:39.054: I/DEBUG(4008): 4d9c3ba8 412ef258 /dev/ashmem/dalvik-heap (deleted)
03-03 02:02:39.054: I/DEBUG(4008): 4d9c3bac 002bf070 [heap]
03-03 02:02:39.054: I/DEBUG(4008): 4d9c3bb0 412ef258 /dev/ashmem/dalvik-heap (deleted)
03-03 02:02:39.054: I/DEBUG(4008): 4d9c3bb4 00000000
03-03 02:02:39.054: I/DEBUG(4008): 4d9c3bb8 412ef268 /dev/ashmem/dalvik-heap (deleted)
03-03 02:02:39.054: I/DEBUG(4008): 4d9c3bbc 00000000
03-03 02:02:39.054: I/DEBUG(4008): 4d9c3bc0 df0027ad
03-03 02:02:39.054: I/DEBUG(4008): 4d9c3bc4 00000000
03-03 02:02:39.054: I/DEBUG(4008): #00 4d9c3bc8 001ad8f8 [heap]
03-03 02:02:39.054: I/DEBUG(4008): 4d9c3bcc 002ae0b8 [heap]
03-03 02:02:39.054: I/DEBUG(4008): 4d9c3bd0 00000004
03-03 02:02:39.054: I/DEBUG(4008): 4d9c3bd4 402d01dd /system/lib/libcrypto.so
03-03 02:02:39.054: I/DEBUG(4008): #01 4d9c3bd8 001ad8f8 [heap]
03-03 02:02:39.054: I/DEBUG(4008): 4d9c3bdc 002ae0b8 [heap]
03-03 02:02:39.054: I/DEBUG(4008): 4d9c3be0 00000004
03-03 02:02:39.054: I/DEBUG(4008): 4d9c3be4 4024e817 /system/lib/libnativehelper.so
03-03 02:02:39.406: D/CommService(7477): Checking OLSRd info...
03-03 02:02:39.500: D/CommService(7477): Monitoring nodes...
03-03 02:02:39.500: D/dalvikvm(7477): GC_FOR_ALLOC freed 2073K, 16% free 17118K/20359K, paused 51ms
03-03 02:02:39.632: D/dalvikvm(7477): GC_CONCURRENT freed 1998K, 16% free 17162K/20359K, paused 2ms+4ms
03-03 02:02:40.406: D/CommService(7477): Checking OLSRd info...
03-03 02:02:40.445: D/CommService(7477): Monitoring nodes...
03-03 02:02:40.562: D/dalvikvm(7477): GC_CONCURRENT freed 2045K, 16% free 17158K/20359K, paused 3ms+4ms
03-03 02:02:41.406: D/CommService(7477): Checking OLSRd info...
03-03 02:02:41.445: D/CommService(7477): Monitoring nodes...
03-03 02:02:41.531: D/dalvikvm(7477): GC_CONCURRENT freed 2045K, 16% free 17154K/20359K, paused 3ms+12ms
03-03 02:02:42.406: D/CommService(7477): Checking OLSRd info...
03-03 02:02:42.445: D/CommService(7477): Monitoring nodes...
03-03 02:02:42.507: D/dalvikvm(7477): GC_CONCURRENT freed 2068K, 16% free 17128K/20359K, paused 3ms+4ms
03-03 02:02:42.679: D/dalvikvm(7477): GC_CONCURRENT freed 2006K, 16% free 17161K/20359K, paused 2ms+12ms
03-03 02:02:43.140: I/BootReceiver(1236): Copying /data/tombstones/tombstone_05 to DropBox (SYSTEM_TOMBSTONE)
03-03 02:02:43.210: D/dalvikvm(1236): GC_FOR_ALLOC freed 912K, 17% free 10207K/12295K, paused 62ms
03-03 02:02:43.265: D/dalvikvm(1236): GC_FOR_ALLOC freed 243K, 16% free 10374K/12295K, paused 49ms
03-03 02:02:43.265: I/dalvikvm-heap(1236): Grow heap (frag case) to 10.507MB for 196628-byte allocation
I found the problem. I don't think this will help a lot of others trying to track down their personal SIGSEGV, but mine (and it was very hard) was entirely related to this:
https://code.google.com/p/android/issues/detail?id=8709
The libcrypto.so in my dump kind of clued me in. I do a MD5 hash of packet data when trying to determine if I've already seen the packet, and skipping it if I had. I thought at one point this was an ugly threading issue related to tracking those hashes, but it turned out it was the java.security.MessageDigest class! It's not thread safe!
I swapped it out with a UID I was stuffing in every packet based on the device UUID and a timestamp. No problems since.
I guess the lesson I can impart to those that were in my situation is, even if you're a 100% Java application, pay attention to the native library and symbol noted in the crash dump for clues. Googling for SIGSEGV + the lib .so name will go a lot farther than the useless code=1, etc... Next think about where your Java app could touch native code, even if it's nothing you're doing. I made the mistake of assuming it was a Service + UI threading issue where the Canvas was drawing something that was null, (the most common case I Googled on SIGSEGV) and ignored the possibility it could have been completely related to code I wrote that was related to the lib .so in the crash dump. Naturally java.security would use a native component in libcrypto.so for speed, so once I clued in, I Googled for Android + SIGSEGV + libcrypto.so and found the documented issue.