When I read a MIFARE card with Android and convert the data to UTF-8, I get strange characters like �. I'm trying to build an application that can read the ID cards we use. The problem is that I get weird characters between words, and some words are split across blocks, so how can I reliably extract the word I'm looking for? For instance, my readings look something like this:
43224���19032019�� at block 2 sektor 2 bindex :8
And here is a split value, where the rest of the number starting with 19 is in the next block:
�me Name���M���19
at block 1 sektor 1 bindex :4
930402���NO934951
at block 2 sektor 1 bindex :4
c5 42 4e 49 44 00 07 4f 4f 4f 4f 4f 4f 00 4b 42 "Åbnid" "OOOOOO" "KB"
44 44 44 20 44 44 44 44 44 00 82 4d 00 c9 31 39 "DDD DDDDD" "M" "19"
39 34 34 33 34 32 00 d0 4e 4f 39 36 36 36 35 31 "944342" "NO966651"
00 00 00 00 00 00 70 f7 88 00 00 00 00 00 00 00
30 32 32 20 20 41 53 00 d3 54 4f 54 41 4c 20 4b "022" "AS" "Total k"
4f 4e 54 52 4f 4c 4c 20 41 53 20 00 c9 30 32 38 "ONTROLL AS" "028"
37 30 34 33 33 00 c9 32 30 32 31 30 32 31 31 00 "70433" "20210211"
00 00 00 00 00 00 70 f7 88 00 00 00 00 00 00 00
This is how I read from the card:
Tag tagFromIntent = intent.getParcelableExtra(NfcAdapter.EXTRA_TAG);
MifareClassic mfc = MifareClassic.get(tagFromIntent);
This is the code I use for reading, inside a for loop:
data = mfc.readBlock(bIndex + block);
Then, to convert the data to UTF-8 for printing, I use:
public String convertByteArrayToUTF8(byte[] bytes) {
    String encoded = null;
    try {
        encoded = new String(bytes, StandardCharsets.UTF_8);
    } catch (Exception e) {
        encoded = new String(bytes, Charset.defaultCharset());
    }
    return encoded;
}
I've also tried ASCII, UTF-16, etc., with no luck.
So the data on your tag (excluding the sector trailers) looks somewhat like this:
C5 42 4E 49 44 00 07 4F 4F 4F 4F 4F 4F 00 4B 42  ÅBNID..OOOOOO.KB
44 44 44 20 44 44 44 44 44 00 82 4D 00 C9 31 39  DDD DDDDD.‚M.É19
39 34 34 33 34 32 00 D0 4E 4F 39 36 36 36 35 31  944342.ÐNO966651
30 32 32 20 20 41 53 00 D3 54 4F 54 41 4C 20 4B  022  AS.ÓTOTAL K
4F 4E 54 52 4F 4C 4C 20 41 53 20 00 C9 30 32 38  ONTROLL AS .É028
37 30 34 33 33 00 C9 32 30 32 31 30 32 31 31 00  70433.É20210211.
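To get such a contiguous view, the sector trailers have to be dropped before any decoding. A minimal sketch of that step, assuming a raw dump that starts at a sector boundary and uses the MIFARE Classic 1K layout (4 blocks of 16 bytes per sector, the last block of each sector being the trailer). The class and method names are made up for illustration; in the Android read loop you could equally skip the last block of each sector instead of reading it.

```java
import java.util.Arrays;

// Sketch only: assumes whole sectors laid out like MIFARE Classic 1K
// (4 blocks of 16 bytes per sector, last block = sector trailer).
class TrailerStripSketch {
    static final int BLOCK_SIZE = 16;
    static final int BLOCKS_PER_SECTOR = 4; // assumption: 1K layout

    // Returns a copy of the dump with every sector trailer block removed.
    static byte[] stripSectorTrailers(byte[] dump) {
        int blocks = dump.length / BLOCK_SIZE;
        byte[] out = new byte[dump.length];
        int n = 0;
        for (int b = 0; b < blocks; b++) {
            boolean isTrailer = (b % BLOCKS_PER_SECTOR) == BLOCKS_PER_SECTOR - 1;
            if (isTrailer) {
                continue; // keys and access bits, not payload
            }
            System.arraycopy(dump, b * BLOCK_SIZE, out, n, BLOCK_SIZE);
            n += BLOCK_SIZE;
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        // Two fake sectors; every block is filled with its own block index.
        byte[] dump = new byte[8 * BLOCK_SIZE];
        for (int i = 0; i < dump.length; i++) {
            dump[i] = (byte) (i / BLOCK_SIZE);
        }
        byte[] data = stripSectorTrailers(dump);
        System.out.println(data.length); // prints 96 (6 data blocks of 16 bytes)
    }
}
```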
This seems to be some form of structured data. Simply converting the whole binary blob into a UTF-8 (or ASCII) encoded string doesn't make much sense. Instead, you will need to reverse engineer the way that the data is structured (or, even better, you try to obtain the specification from the system manufacturer).
From what I can see, it looks as if that data consisted of multiple null-terminated strings embedded into some compact (Tag)-Length-Value format. The first byte seems to be the tag(?) + length, so we have
C5  Length = 5
    42 4E 49 44 00  "BNID"
07  Length = 7
    4F 4F 4F 4F 4F 4F 00  "OOOOOO"
4B  Length = 11
    42 44 44 44 20 44 44 44 44 44 00  "BDDD DDDDD"
82  Length = 2
    4D 00  "M"
C9  Length = 9
    31 39 39 34 34 33 34 32 00  "19944342"
D0  Length = 16
    4E 4F 39 36 36 36 35 31 30 32 32 20 20 41 53 00  "NO966651022  AS"
D3  Length = 19
    54 4F 54 41 4C 20 4B 4F 4E 54 52 4F 4C 4C 20 41 53 20 00  "TOTAL KONTROLL AS "
C9  Length = 9
    30 32 38 37 30 34 33 33 00  "02870433"
C9  Length = 9
    32 30 32 31 30 32 31 31 00  "20210211"
The first byte could, for instance, be split into tag and length like this: TTTL LLLL (upper 3 bits encode the tag, lower 5 bits encode the length of the following value). This would give the following tags:
0x6 for "BNID", "19944342", "NO966651022  AS", "TOTAL KONTROLL AS ", "02870433", and "20210211"
0x0 for "OOOOOO"
0x2 for "BDDD DDDDD"
0x4 for "M"
All of these tag values are even, so the lowest tag bit is never set. Hence, the split between tag and length might also be TTLL LLLL (upper 2 bits encode the tag, lower 6 bits encode the length of the following value).
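To compare the two candidate layouts, you can split each header byte from the sample both ways. This is only a sketch of the two hypotheses above, not a known specification, and the class and method names are made up for illustration.

```java
// Sketch only: both bit layouts are guesses derived from a single sample card.
class HeaderSplitSketch {
    // Splits a header byte into {tag, length}.
    // lengthBits = 5 models TTTL LLLL, lengthBits = 6 models TTLL LLLL.
    static int[] split(int header, int lengthBits) {
        int h = header & 0xFF;                    // treat the byte as unsigned
        int tag = h >>> lengthBits;               // upper bits
        int length = h & ((1 << lengthBits) - 1); // lower bits
        return new int[] { tag, length };
    }

    public static void main(String[] args) {
        // Header bytes taken from the sample data above.
        int[] headers = { 0xC5, 0x07, 0x4B, 0x82, 0xC9, 0xD0, 0xD3 };
        for (int h : headers) {
            int[] a = split(h, 5);
            int[] b = split(h, 6);
            System.out.printf("0x%02X  TTTL LLLL: tag=%d len=%d   TTLL LLLL: tag=%d len=%d%n",
                    h, a[0], a[1], b[0], b[1]);
        }
    }
}
```

For this sample both layouts produce the same lengths, and the 3-bit tags 6, 0, 2, 4 simply become the 2-bit tags 3, 0, 1, 2. Only a card with a value longer than 31 bytes, or with an odd 3-bit tag, would tell the two layouts apart.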
Unfortunately, the format doesn't resemble any of the popular formats that I'm aware of. So you could just continue your reverse engineering by comparing multiple different cards and by deriving meaning from the values.
So far, in order to decode the above, you would start by reading the first byte, extract the length from that byte, take that many of the following bytes, and convert them into a string (based on the sample that you provided, ASCII encoding should do). You can then continue with the next byte, extract the length information from it, ...
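The loop described above could be sketched roughly as follows. It assumes the TTTL LLLL guess (length in the lower 5 bits, counting the terminating null byte), treats a zero length as the end of the data, and expects the data blocks to be concatenated first with the sector trailers left out. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch only: the record layout is reverse engineered from one sample, not a spec.
class TlvSketch {
    // Parses concatenated data blocks (sector trailers already removed).
    static List<String> parse(byte[] data) {
        List<String> values = new ArrayList<>();
        int pos = 0;
        while (pos < data.length) {
            int length = data[pos] & 0x1F; // assumption: lower 5 bits = length
            pos++;
            if (length == 0 || pos + length > data.length) {
                break; // assumption: zero length means padding; also stop on truncation
            }
            int textLength = length;
            if (data[pos + length - 1] == 0) {
                textLength--; // drop the null terminator
            }
            values.add(new String(data, pos, textLength, StandardCharsets.US_ASCII));
            pos += length;
        }
        return values;
    }

    public static void main(String[] args) {
        // First, second, and fourth record of the sample data.
        byte[] data = {
            (byte) 0xC5, 0x42, 0x4E, 0x49, 0x44, 0x00,
            0x07, 0x4F, 0x4F, 0x4F, 0x4F, 0x4F, 0x4F, 0x00,
            (byte) 0x82, 0x4D, 0x00
        };
        System.out.println(parse(data)); // prints [BNID, OOOOOO, M]
    }
}
```

Because the parser works on the concatenated bytes rather than on one block at a time, a value that starts in one block and ends in the next (like the number starting with 19) comes out as a single string.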