android format nfc mifare contactless-smartcard

Reading Mifare Classic returns strange characters

When I read a MIFARE Classic card on Android and convert the data to UTF-8, I get strange characters like �. I'm building an application that reads a type of ID card we use. The problem is that I get odd characters between words, and some words are split across blocks, so how can I reliably extract the word I'm looking for? For instance, my readings look something like this:

43224��19032019�� at block 2 sektor 2 bindex :8

and here is a value that is split, where the rest of the number starting with 19 continues in the next block:

�me Name��M��19

at block 1 sektor 1 bindex :4

930402��NO934951

at block 2 sektor 1 bindex :4

c5 42 4e 49 44 00 07 4f 4f 4f 4f 4f 4f 00 4b 42   "Åbnid" "OOOOOO" "KB"
44 44 44 20 44 44 44 44 44 00 82 4d 00 c9 31 39   "DDD DDDDD" "M" "19"
39 34 34 33 34 32 00 d0 4e 4f 39 36 36 36 35 31   "944342" "NO966651"
00 00 00 00 00 00 70 f7 88 00 00 00 00 00 00 00
30 32 32 20 20 41 53 00 d3 54 4f 54 41 4c 20 4b   "022" "AS" "Total k"
4f 4e 54 52 4f 4c 4c 20 41 53 20 00 c9 30 32 38   "ONTROLL AS" "028"
37 30 34 33 33 00 c9 32 30 32 31 30 32 31 31 00   "70433" "20210211"
00 00 00 00 00 00 70 f7 88 00 00 00 00 00 00 00

This is how I read from the card:

Tag tagFromIntent = intent.getParcelableExtra(NfcAdapter.EXTRA_TAG);
MifareClassic mfc = MifareClassic.get(tagFromIntent);

This is the code I use for reading, inside a for loop:

 data = mfc.readBlock(bIndex + block);

and to convert the data to UTF-8 for printing I use:

public String convertByteArrayToUTF8(byte[] bytes) {
    String encoded = null;
    try {
        encoded = new String(bytes, StandardCharsets.UTF_8);
    } catch (Exception e) {
        encoded = new String(bytes, Charset.defaultCharset());
    }
    return encoded;
}

I've also tried ASCII, UTF-16, etc., with no luck.

Solution

So the data on your tag (excluding the sector trailers) looks something like this:

C5 42 4E 49 44 00 07 4F 4F 4F 4F 4F 4F 00 4B 42        ÅBNID..OOOOOO.KB
44 44 44 20 44 44 44 44 44 00 82 4D 00 C9 31 39        DDD DDDDD.‚M.É19
39 34 34 33 34 32 00 D0 4E 4F 39 36 36 36 35 31        944342.ÐNO966651
30 32 32 20 20 41 53 00 D3 54 4F 54 41 4C 20 4B        022  AS.ÓTOTAL K
4F 4E 54 52 4F 4C 4C 20 41 53 20 00 C9 30 32 38        ONTROLL AS .É028
37 30 34 33 33 00 C9 32 30 32 31 30 32 31 31 00        70433.É20210211.

This seems to be some form of structured data. Simply converting the whole binary blob into a UTF-8 (or ASCII) encoded string doesn't make much sense. Instead, you will need to reverse engineer the way that the data is structured (or, even better, you try to obtain the specification from the system manufacturer).
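Because values can run across block boundaries, it helps to first join the data blocks into one contiguous byte array and leave out the sector trailers, which is how the dump above was obtained. The following is only a minimal sketch: `BlockJoiner` and `joinDataBlocks` are made-up names, the blocks are assumed to be the 16-byte arrays returned by `mfc.readBlock(...)` in card order starting at the first block of a sector, and every sector is assumed to have 4 blocks (true for MIFARE Classic 1K, not for the upper sectors of a 4K card).

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

/** Hypothetical helper: joins data blocks and leaves out the sector trailers. */
class BlockJoiner {
    // Assumption: 4 blocks per sector, the last one being the sector trailer.
    static final int BLOCKS_PER_SECTOR = 4;

    /**
     * @param blocks blocks in card order, beginning with the first block of a sector
     * @return the content of all data blocks, concatenated, without sector trailers
     */
    static byte[] joinDataBlocks(List<byte[]> blocks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < blocks.size(); i++) {
            boolean isTrailer = (i % BLOCKS_PER_SECTOR) == BLOCKS_PER_SECTOR - 1;
            if (isTrailer) {
                continue; // keys and access bits, not payload
            }
            byte[] block = blocks.get(i);
            out.write(block, 0, block.length);
        }
        return out.toByteArray();
    }
}
```

The resulting array can then be analyzed as a whole, so a value that starts in one block and ends in the next is no longer cut in two.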

From what I can see, it looks as if the data consists of multiple null-terminated strings embedded in some compact (Tag-)Length-Value format. The first byte seems to encode the tag(?) and the length, so we have

C5    Length = 5
    42 4E 49 44 00                                               "BNID"
07    Length = 7
    4F 4F 4F 4F 4F 4F 00                                         "OOOOOO"
4B    Length = 11
    42 44 44 44 20 44 44 44 44 44 00                             "BDDD DDDDD"
82    Length = 2
    4D 00                                                        "M"
C9    Length = 9
    31 39 39 34 34 33 34 32 00                                   "19944342"
D0    Length = 16
    4E 4F 39 36 36 36 35 31 30 32 32 20 20 41 53 00              "NO966651022  AS"
D3    Length = 19
    54 4F 54 41 4C 20 4B 4F 4E 54 52 4F 4C 4C 20 41 53 20 00     "TOTAL KONTROLL AS "
C9    Length = 9
    30 32 38 37 30 34 33 33 00                                   "02870433"
C9    Length = 9
    32 30 32 31 30 32 31 31 00                                   "20210211"

The first byte could, for instance, be split into tag and length like this: TTTL LLLL (upper 3 bits encode the tag, lower 5 bits encode the length of the following value). This would give the following tags:

0x6 for "BNID", "19944342", "NO966651022  AS", "TOTAL KONTROLL AS ", "02870433", and "20210211"
0x0 for "OOOOOO"
0x2 for "BDDD DDDDD"
0x4 for "M"

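All of these tag values are even, i.e. the lowest of the three tag bits is always 0 in this sample, so that bit may actually belong to the length. Hence, the split between tag and length might also be TTLL LLLL (upper 2 bits encode the tag, lower 6 bits encode the length of the following value). The lengths stay the same, and the tags become 0x3, 0x0, 0x1, and 0x2.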
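To compare the two candidate splits on real header bytes, a few bit operations are enough. This is just an illustrative sketch; the class and method names (`HeaderByte`, `tag3`, `length5`, `tag2`, `length6`) are made up, and both layouts are guesses derived from a single card.

```java
/** Hypothetical helper for trying out both guessed header layouts. */
class HeaderByte {
    // Guess 1: TTTL LLLL (3-bit tag, 5-bit length)
    static int tag3(int header)    { return (header & 0xFF) >>> 5; }
    static int length5(int header) { return header & 0x1F; }

    // Guess 2: TTLL LLLL (2-bit tag, 6-bit length)
    static int tag2(int header)    { return (header & 0xFF) >>> 6; }
    static int length6(int header) { return header & 0x3F; }

    public static void main(String[] args) {
        // Header bytes taken from the dump above.
        int[] headers = {0xC5, 0x07, 0x4B, 0x82, 0xC9, 0xD0, 0xD3};
        for (int h : headers) {
            System.out.printf("%02X  3/5: tag=%d len=%d   2/6: tag=%d len=%d%n",
                    h, tag3(h), length5(h), tag2(h), length6(h));
        }
    }
}
```

Masking with `0xFF` first means the methods also work when a (signed) Java `byte` is passed in. Only a card with a value longer than 31 bytes, or with a tag that has the third bit set, would tell the two layouts apart.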

Unfortunately, the format doesn't resemble any of the popular formats that I'm aware of. So you could just continue your reverse engineering by comparing multiple different cards and by deriving meaning from the values.

So far, in order to decode the above, you would start by reading the first byte, extract the length from that byte, take that many of the following bytes, and convert them into a string (based on the sample that you provided, ASCII encoding should do). You can then continue with the next byte, extract the length information from it, and so on.
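The decoding loop described above could be sketched roughly as follows. This is not a definitive parser: it assumes the guessed TTTL LLLL layout (3-bit tag, 5-bit length), assumes the input is the joined data without sector trailers, treats a zero length as padding, and uses made-up names (`TlvSketch`, `Field`, `parse`, `fromHex`).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical decoder for the guessed tag/length/value layout. */
class TlvSketch {
    /** One decoded record: the guessed tag and the string value. */
    static final class Field {
        final int tag;
        final String value;
        Field(int tag, String value) { this.tag = tag; this.value = value; }
    }

    static List<Field> parse(byte[] data) {
        List<Field> fields = new ArrayList<>();
        int pos = 0;
        while (pos < data.length) {
            int header = data[pos++] & 0xFF;
            int tag = header >>> 5;      // assumption: upper 3 bits are the tag
            int length = header & 0x1F;  // assumption: lower 5 bits are the length
            if (length == 0 || pos + length > data.length) {
                break; // padding or truncated record: stop instead of guessing
            }
            int end = pos + length;
            // The values in the sample end with a NUL byte; leave it out of the string.
            int textEnd = (data[end - 1] == 0) ? end - 1 : end;
            fields.add(new Field(tag,
                    new String(data, pos, textEnd - pos, StandardCharsets.US_ASCII)));
            pos = end;
        }
        return fields;
    }

    /** Converts a hex dump such as "C5 42 4E" into bytes (whitespace is ignored). */
    static byte[] fromHex(String hex) {
        String s = hex.replaceAll("\\s+", "");
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        // First two records of the dump above.
        byte[] data = fromHex("C5 42 4E 49 44 00 07 4F 4F 4F 4F 4F 4F 00");
        for (Field f : parse(data)) {
            System.out.println("tag " + f.tag + ": \"" + f.value + "\"");
        }
    }
}
```

Once you know what the individual tags mean, you can look up the value you need by tag instead of searching the text for it.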