androidandroid-ndkandroid-servicesegmentation-fault

Android Fatal signal 11 (SIGSEGV) at 0x636f7d89 (code=1). How can it be tracked down?


I've been reading the other posts on tracking down the reasons for getting a SIGSEGV in an Android app. I plan to scour my app for possible NullPointers related to Canvas use, but my SIGSEGV barfs up a different memory address each time. Plus I've seen code=1 and code=2. If the memory address was 0x00000000, I'd have a clue it is a NullPointer.

The last one I got was a code=2:

A/libc(4969): Fatal signal 11 (SIGSEGV) at 0x42a637d9 (code=2)

Any suggestions on how to track this down?

I have a suspect, but I'm not keen on experimenting with it yet. My app uses the OSMDroid API for offline mapping. The OverlayItem class represents markers/nodes on the map. I have a Service that collects data via the network to populate the OverlayItem which are then displayed on the map. In an effort to simplify my design, I extended OverlayItem into my own NodeOverlayItem class, which includes some addition attributes I use in the UI Activity and in the Service. This gave me a single point of Item information for the UI and Service. I used Intents to broadcast to the Activity to refresh the UI map when something changed. The Activity binds to the Service and there's a Service method to get the list of NodeOverlayItem's. I think it might be the OSMDroid API's use of OverlayItem, and my Service updating node information at the same time. (a concurrency issue)

As I write this I think that's really the problem. The headache isn't splitting out the Node and OverlayItem from NodeOverlayItem, it's that the Activity will need some data from the Node, that the Service holds. Plus when the Activity is created (onResume, etc...) the OverlayItem objects will need to be re-created from the Node data that the Service has been maintaining while the Activity was away. e.g. You start the app, the Service collects data, the UI displays it, you go to Home, then back to the app, the Activity will need to pull and re-create the OverlayItem's from the latest Service node data.

I know this isn't a great or clear questions. It's like all my SO questions are niche or obscure. If anyone has a suggestion on how to interpret those SIGSEGV errors, it would be greatly appreciated!

UPDATE Here's the latest crash captured during a debug session. I have 3 of these devices being used for testing and they don't all crash reliably when I'm developing and testing. I included a bit extra just so the GC logging could be noted. You can see the problem is probably not related to memory exhaustion.

03-03 02:02:38.328: I/CommService(7477): Received packet from: 192.168.1.102
03-03 02:02:38.328: I/CommService(7477): Already processed this packet. It's a re-broadcast from another node, or from myself. It's not a repeat broadcast though.
03-03 02:02:38.406: D/CommService(7477): Checking OLSRd info...
03-03 02:02:38.460: D/CommService(7477): Monitoring nodes...
03-03 02:02:38.515: D/dalvikvm(7477): GC_CONCURRENT freed 2050K, 16% free 17151K/20359K, paused 3ms+6ms
03-03 02:02:38.515: I/CommService(7477): Received packet from: 192.168.1.102
03-03 02:02:38.515: D/CommService(7477): Forwarding packet (4f68802cf10684a83ac4936ebb3c934d) along to other nodes.
03-03 02:02:38.609: I/CommService(7477): Received packet from: 192.168.1.100
03-03 02:02:38.609: D/CommService(7477): Forwarding packet (e4bc81e91ec92d06f83e03068f52ab4) along to other nodes.
03-03 02:02:38.609: D/CommService(7477): Already processed this packet: 4204a5b27745ffe5e4f8458e227044bf
03-03 02:02:38.609: A/libc(7477): Fatal signal 11 (SIGSEGV) at 0x68f52abc (code=1)
03-03 02:02:38.914: I/DEBUG(4008): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
03-03 02:02:38.914: I/DEBUG(4008): Build fingerprint: 'Lenovo/IdeaTab_A1107/A1107:4.0.4/MR1/eng.user.20120719.150703:user/release-keys'
03-03 02:02:38.914: I/DEBUG(4008): pid: 7477, tid: 7712  >>> com.test.testm <<<
03-03 02:02:38.914: I/DEBUG(4008): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 68f52abc
03-03 02:02:38.914: I/DEBUG(4008):  r0 68f52ab4  r1 412ef268  r2 4d9c3bf4  r3 412ef268
03-03 02:02:38.914: I/DEBUG(4008):  r4 001ad8f8  r5 4d9c3bf4  r6 412ef268  r7 4c479df8
03-03 02:02:38.914: I/DEBUG(4008):  r8 4d9c3c0c  r9 4c479dec  10 46cf260a  fp 4d9c3c24
03-03 02:02:38.914: I/DEBUG(4008):  ip 40262a04  sp 4d9c3bc8  lr 402d01dd  pc 402d0182  cpsr 00000030
03-03 02:02:38.914: I/DEBUG(4008):  d0  00000001000c0102  d1  3a22364574614c7d
03-03 02:02:38.914: I/DEBUG(4008):  d2  403fc0000000007d  d3  363737343433350a
03-03 02:02:38.914: I/DEBUG(4008):  d4  49544341223a2273  d5  6f6567222c224556
03-03 02:02:38.914: I/DEBUG(4008):  d6  3a223645676e6f4c  d7  000000013835372d
03-03 02:02:38.914: I/DEBUG(4008):  d8  0000000000000000  d9  4040000000000000
03-03 02:02:38.914: I/DEBUG(4008):  d10 0000000000000000  d11 4040000000000000
03-03 02:02:38.914: I/DEBUG(4008):  d12 4040000000000000  d13 0000000000000021
03-03 02:02:38.914: I/DEBUG(4008):  d14 0000000000000000  d15 0000000000000000
03-03 02:02:38.914: I/DEBUG(4008):  d16 3fe62e42fefa39ef  d17 3ff0000000000000
03-03 02:02:38.914: I/DEBUG(4008):  d18 3fe62e42fee00000  d19 0000000000000000
03-03 02:02:38.914: I/DEBUG(4008):  d20 0000000000000000  d21 3ff0000000000000
03-03 02:02:38.914: I/DEBUG(4008):  d22 4028000000000000  d23 3ff0000000000000
03-03 02:02:38.914: I/DEBUG(4008):  d24 0000000000000000  d25 3ff0000000000000
03-03 02:02:38.914: I/DEBUG(4008):  d26 0000000000000000  d27 c028000000000000
03-03 02:02:38.914: I/DEBUG(4008):  d28 0000000000000000  d29 3ff0000000000000
03-03 02:02:38.914: I/DEBUG(4008):  d30 3ff0000000000000  d31 3fecccccb5c28f6e
03-03 02:02:38.914: I/DEBUG(4008):  scr 60000013
03-03 02:02:39.046: I/DEBUG(4008):          #00  pc 0006b182  /system/lib/libcrypto.so (EVP_DigestFinal_ex)
03-03 02:02:39.046: I/DEBUG(4008):          #01  pc 0006b1d8  /system/lib/libcrypto.so (EVP_DigestFinal)
03-03 02:02:39.054: I/DEBUG(4008):          #02  pc 0001f814  /system/lib/libnativehelper.so
03-03 02:02:39.054: I/DEBUG(4008):          #03  pc 0001ec30  /system/lib/libdvm.so (dvmPlatformInvoke)
03-03 02:02:39.054: I/DEBUG(4008):          #04  pc 00058c70  /system/lib/libdvm.so (_Z16dvmCallJNIMethodPKjP6JValuePK6MethodP6Thread)
03-03 02:02:39.054: I/DEBUG(4008): code around pc:
03-03 02:02:39.054: I/DEBUG(4008): 402d0160 0003151e 4604b570 f7ff460d 4620ff81  ....p..F.F.... F
03-03 02:02:39.054: I/DEBUG(4008): 402d0170 f7ff4629 bd70ff93 4604b570 460e6800  )F....p.p..F.h.F
03-03 02:02:39.054: I/DEBUG(4008): 402d0180 68834615 dd062b40 21fa4810 44784a10  .F.h@+...H.!.JxD
03-03 02:02:39.054: I/DEBUG(4008): 402d0190 f7c8447a 6821f80f 698a4620 47904631  zD....!h F.i1F.G
03-03 02:02:39.054: I/DEBUG(4008): 402d01a0 b1154606 68836820 6822602b b12b6a13  .F.. h.h+`"h.j+.
03-03 02:02:39.054: I/DEBUG(4008): code around lr:
03-03 02:02:39.054: I/DEBUG(4008): 402d01bc 68e06821 21006c4a ea0af7c4 bd704630  !h.hJl.!....0Fp.
03-03 02:02:39.054: I/DEBUG(4008): 402d01cc 00031492 000314b5 4604b570 ffcef7ff  ........p..F....
03-03 02:02:39.054: I/DEBUG(4008): 402d01dc 46204605 ff12f7ff bd704628 4604b573  .F F....(Fp.s..F
03-03 02:02:39.054: I/DEBUG(4008): 402d01ec 2102460d fb36f002 42ab6823 b123d020  .F.!..6.#h.B .#.
03-03 02:02:39.054: I/DEBUG(4008): 402d01fc b1136c5b f7c868e0 68a0fccf 05c26025  [l...h.....h%`..
03-03 02:02:39.054: I/DEBUG(4008): memory map around addr 68f52abc:
03-03 02:02:39.054: I/DEBUG(4008): 4d8c5000-4d9c4000 
03-03 02:02:39.054: I/DEBUG(4008): (no map for address)
03-03 02:02:39.054: I/DEBUG(4008): b0001000-b0009000 /system/bin/linker
03-03 02:02:39.054: I/DEBUG(4008): stack:
03-03 02:02:39.054: I/DEBUG(4008):     4d9c3b88  408d1f90  /system/lib/libdvm.so
03-03 02:02:39.054: I/DEBUG(4008):     4d9c3b8c  412ef258  /dev/ashmem/dalvik-heap (deleted)
03-03 02:02:39.054: I/DEBUG(4008):     4d9c3b90  00000001  
03-03 02:02:39.054: I/DEBUG(4008):     4d9c3b94  408d6c58  /system/lib/libdvm.so
03-03 02:02:39.054: I/DEBUG(4008):     4d9c3b98  408d6fa8  /system/lib/libdvm.so
03-03 02:02:39.054: I/DEBUG(4008):     4d9c3b9c  4c479dec  
03-03 02:02:39.054: I/DEBUG(4008):     4d9c3ba0  46cf260a  /system/framework/core.odex
03-03 02:02:39.054: I/DEBUG(4008):     4d9c3ba4  408735e7  /system/lib/libdvm.so
03-03 02:02:39.054: I/DEBUG(4008):     4d9c3ba8  412ef258  /dev/ashmem/dalvik-heap (deleted)
03-03 02:02:39.054: I/DEBUG(4008):     4d9c3bac  002bf070  [heap]
03-03 02:02:39.054: I/DEBUG(4008):     4d9c3bb0  412ef258  /dev/ashmem/dalvik-heap (deleted)
03-03 02:02:39.054: I/DEBUG(4008):     4d9c3bb4  00000000  
03-03 02:02:39.054: I/DEBUG(4008):     4d9c3bb8  412ef268  /dev/ashmem/dalvik-heap (deleted)
03-03 02:02:39.054: I/DEBUG(4008):     4d9c3bbc  00000000  
03-03 02:02:39.054: I/DEBUG(4008):     4d9c3bc0  df0027ad  
03-03 02:02:39.054: I/DEBUG(4008):     4d9c3bc4  00000000  
03-03 02:02:39.054: I/DEBUG(4008): #00 4d9c3bc8  001ad8f8  [heap]
03-03 02:02:39.054: I/DEBUG(4008):     4d9c3bcc  002ae0b8  [heap]
03-03 02:02:39.054: I/DEBUG(4008):     4d9c3bd0  00000004  
03-03 02:02:39.054: I/DEBUG(4008):     4d9c3bd4  402d01dd  /system/lib/libcrypto.so
03-03 02:02:39.054: I/DEBUG(4008): #01 4d9c3bd8  001ad8f8  [heap]
03-03 02:02:39.054: I/DEBUG(4008):     4d9c3bdc  002ae0b8  [heap]
03-03 02:02:39.054: I/DEBUG(4008):     4d9c3be0  00000004  
03-03 02:02:39.054: I/DEBUG(4008):     4d9c3be4  4024e817  /system/lib/libnativehelper.so
03-03 02:02:39.406: D/CommService(7477): Checking OLSRd info...
03-03 02:02:39.500: D/CommService(7477): Monitoring nodes...
03-03 02:02:39.500: D/dalvikvm(7477): GC_FOR_ALLOC freed 2073K, 16% free 17118K/20359K, paused 51ms
03-03 02:02:39.632: D/dalvikvm(7477): GC_CONCURRENT freed 1998K, 16% free 17162K/20359K, paused 2ms+4ms
03-03 02:02:40.406: D/CommService(7477): Checking OLSRd info...
03-03 02:02:40.445: D/CommService(7477): Monitoring nodes...
03-03 02:02:40.562: D/dalvikvm(7477): GC_CONCURRENT freed 2045K, 16% free 17158K/20359K, paused 3ms+4ms
03-03 02:02:41.406: D/CommService(7477): Checking OLSRd info...
03-03 02:02:41.445: D/CommService(7477): Monitoring nodes...
03-03 02:02:41.531: D/dalvikvm(7477): GC_CONCURRENT freed 2045K, 16% free 17154K/20359K, paused 3ms+12ms
03-03 02:02:42.406: D/CommService(7477): Checking OLSRd info...
03-03 02:02:42.445: D/CommService(7477): Monitoring nodes...
03-03 02:02:42.507: D/dalvikvm(7477): GC_CONCURRENT freed 2068K, 16% free 17128K/20359K, paused 3ms+4ms
03-03 02:02:42.679: D/dalvikvm(7477): GC_CONCURRENT freed 2006K, 16% free 17161K/20359K, paused 2ms+12ms
03-03 02:02:43.140: I/BootReceiver(1236): Copying /data/tombstones/tombstone_05 to DropBox (SYSTEM_TOMBSTONE)
03-03 02:02:43.210: D/dalvikvm(1236): GC_FOR_ALLOC freed 912K, 17% free 10207K/12295K, paused 62ms
03-03 02:02:43.265: D/dalvikvm(1236): GC_FOR_ALLOC freed 243K, 16% free 10374K/12295K, paused 49ms
03-03 02:02:43.265: I/dalvikvm-heap(1236): Grow heap (frag case) to 10.507MB for 196628-byte allocation

Solution

  • I found the problem. I don't think this will help a lot of others trying to track down their personal SIGSEGV, but mine (and it was very hard) was entirely related to this:

    https://code.google.com/p/android/issues/detail?id=8709

    The libcrypto.so in my dump kind of clued me in. I do a MD5 hash of packet data when trying to determine if I've already seen the packet, and skipping it if I had. I thought at one point this was an ugly threading issue related to tracking those hashes, but it turned out it was the java.security.MessageDigest class! It's not thread safe!

    I swapped it out with a UID I was stuffing in every packet based on the device UUID and a timestamp. No problems since.

    I guess the lesson I can impart to those that were in my situation is, even if you're a 100% Java application, pay attention to the native library and symbol noted in the crash dump for clues. Googling for SIGSEGV + the lib .so name will go a lot farther than the useless code=1, etc... Next think about where your Java app could touch native code, even if it's nothing you're doing. I made the mistake of assuming it was a Service + UI threading issue where the Canvas was drawing something that was null, (the most common case I Googled on SIGSEGV) and ignored the possibility it could have been completely related to code I wrote that was related to the lib .so in the crash dump. Naturally java.security would use a native component in libcrypto.so for speed, so once I clued in, I Googled for Android + SIGSEGV + libcrypto.so and found the documented issue.