Extract GTIN, LoT, SN and EXP from GS1 DataMatrix barcode

I'm building software that lets pharmacies verify medicines in NMVS. The workflow is: scan the drug's code with a handheld scanner, click "Verify", and the program connects to NMVS. Most of the work is done, but to verify a drug correctly I need to extract the GTIN (PC), batch number (LoT), serial number (SN) and expiry date (EXP) from the scanned code.

Here are the scan results for the test drugs:

    01059099913808231003ZP082117230831210XXFAE5AWA6RF8
    0105909990054152101123926172207012162RB6FBN09
    010590999109968821100322567773831721093010100013978
    01059099907954202190EPCNT32ZH5581004032217250331
    010590999032841321YCK3EB53CNZXD1725083110C48700
    0105909990071029211165895472021010MU465417241031

I know the format is GS1 DataMatrix, where each field is introduced by an Application Identifier (AI) prefix: GTIN has the prefix 01 (followed by exactly 14 digits), LoT has the prefix 10 (followed by 1 to 20 alphanumeric characters), SN has the prefix 21 (followed by 1 to 20 alphanumeric characters), and the expiry date has the prefix 17 (followed by exactly 6 digits, YYMMDD).
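For reference, here is a minimal sketch of how these four AIs can be parsed when the field separator is present. In GS1 data, a variable-length field such as 10 or 21 that is not the last one is terminated by FNC1, which scanners transmit as the ASCII Group Separator (GS, character code 29); 01 and 17 are on GS1's list of predefined fixed-length AIs, so they never need one. The `AIS` table and `parseGs1` are hypothetical names, and the sketch assumes the scanner actually passes GS through to the application, which has not been verified for the DS2208.

```javascript
// ASCII Group Separator, used as the GS1 field separator (FNC1) in scanner output.
const GS = '\u001D';

// Only the four AIs from the question: 01 and 17 are fixed length,
// 10 and 21 are variable length (up to 20 characters).
const AIS = {
  '01': { key: 'gtin', length: 14 },
  '17': { key: 'exp', length: 6 },
  '10': { key: 'lot', maxLength: 20 },
  '21': { key: 'sn', maxLength: 20 },
};

function parseGs1(data) {
  const result = {};
  let pos = 0;
  while (pos < data.length) {
    // Skip separators (including a stray leading or trailing one).
    if (data[pos] === GS) { pos++; continue; }

    const ai = data.slice(pos, pos + 2);
    const def = AIS[ai];
    if (!def) throw new Error(`Unsupported AI "${ai}" at position ${pos}`);
    pos += 2;

    let end;
    if (def.length) {
      // Fixed length: take exactly def.length characters.
      end = pos + def.length;
      if (end > data.length) throw new Error(`AI ${ai} value is truncated`);
    } else {
      // Variable length: the value runs to the next GS or to the end of the data.
      const gs = data.indexOf(GS, pos);
      end = gs === -1 ? data.length : gs;
      if (end - pos > def.maxLength) throw new Error(`AI ${ai} value is too long`);
    }
    result[def.key] = data.slice(pos, end);
    pos = end;
  }
  return result;
}
```

With the separator after the LoT, the second scan is unambiguous: `parseGs1('0105909990054152101123926\u001D172207012162RB6FBN09')` returns gtin `05909990054152`, lot `1123926`, exp `220701` and sn `62RB6FBN09`.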

For the examples above, I expect:

    [
        {
            "gtin": "05909991380823",
            "lot": "03ZP08",
            "sn": "0XXFAE5AWA6RF8",
            "exp": "230831"
        },
        {
            "gtin": "05909990054152",
            "lot": "1123926",
            "sn": "62RB6FBN09",
            "exp": "220701"
        },
        {
            "gtin": "05909991099688",
            "lot": "100013978",
            "sn": "10032256777383",
            "exp": "210930"
        },
        {
            "gtin": "05909990795420",
            "lot": "040322",
            "sn": "90EPCNT32ZH558",
            "exp": "250331"
        },
        {
            "gtin": "05909990328413",
            "lot": "C48700",
            "sn": "YCK3EB53CNZXD",
            "exp": "250831"
        },
        {
            "gtin": "05909990071029",
            "lot": "10MU4654",
            "sn": "116589547202",
            "exp": "241031"
        }
    ]

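The problem is that these fields can appear in any order, and LoT and SN vary in length; only GTIN and EXP have a fixed length. The scanner output has no visible separator between fields, so I can't tell where a LoT or SN value ends.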
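To make the ambiguity concrete, here is a brute-force sketch that enumerates every way a separator-less string can be split into the four AIs, each used exactly once. `RULES` and `allParses` are hypothetical names, and the sketch assumes LoT and SN are plain alphanumeric, as stated in the question.

```javascript
// Hypothetical rule table for the four AIs in the question.
const RULES = {
  '01': { key: 'gtin', min: 14, max: 14, re: /^\d+$/ },
  '17': { key: 'exp', min: 6, max: 6, re: /^\d+$/ },
  '10': { key: 'lot', min: 1, max: 20, re: /^[A-Za-z0-9]+$/ },
  '21': { key: 'sn', min: 1, max: 20, re: /^[A-Za-z0-9]+$/ },
};

// Returns every way to read `s` as a sequence of the four AIs, each used once.
function allParses(s, used = {}) {
  if (s === '') {
    // Only readings that found all four fields count.
    return Object.keys(used).length === 4 ? [used] : [];
  }
  const rule = RULES[s.slice(0, 2)];
  if (!rule || rule.key in used) return [];

  const results = [];
  // Try every allowed value length that still fits in the remaining string.
  for (let len = rule.min; len <= rule.max && 2 + len <= s.length; len++) {
    const value = s.slice(2, 2 + len);
    if (rule.re.test(value)) {
      results.push(...allParses(s.slice(2 + len), { ...used, [rule.key]: value }));
    }
  }
  return results;
}

// Sixth test scan from the question.
const parses = allParses('0105909990071029211165895472021010MU465417241031');
for (const p of parses) console.log(p);
// Among the results: lot 10MU4654 / sn 116589547202
// and lot MU4654 / sn 11658954720210.
```

Both readings satisfy the format rules, so the string alone does not determine which one is right.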

I wrote this regex to extract the fields: `^(?=.*01(\d{14}))(?=.*10([a-zA-Z0-9]{1,20}))(?=.*17(\d{6}))(?=.*21([a-zA-Z0-9]{1,20})).*$`, but it doesn't work properly; for example, the LoT group runs on into the fields that follow it. The client is written in JavaScript (not TypeScript, but AngularJS; yes, it's a legacy project, and I'm trying to persuade the company to upgrade it), and the server is written in Java.
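To see the failure concretely, here is a sketch that runs the regex above (with the stray space in `(? =` removed) against the second scan. Each greedy `.*` makes its lookahead jump to the last position where the AI digits happen to fit, and `{1,20}` keeps consuming characters from the following fields:

```javascript
// The question's regex with the stray space in "(? =" removed.
const pattern = /^(?=.*01(\d{14}))(?=.*10([a-zA-Z0-9]{1,20}))(?=.*17(\d{6}))(?=.*21([a-zA-Z0-9]{1,20})).*$/;

// Second test scan: 01 GTIN, 10 LoT, 17 EXP, 21 SN, with no separators.
const scan = '0105909990054152101123926172207012162RB6FBN09';

const [, gtin, lot, exp, sn] = scan.match(pattern);

console.log(gtin); // 12392617220701 (wrong: a later "01" straddling the LoT field wins)
console.log(lot);  // 1123926172207012162R (wrong: runs into EXP and SN)
console.log(exp);  // 220701 (correct here)
console.log(sn);   // 62RB6FBN09 (correct here)
```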

I'm open to any solution, whether a regex, a library (JavaScript or Java) or an external API; personally, I'm running out of ideas.

The handheld scanner I'm using is a Zebra DS2208.
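One thing worth checking on the scanner side: the GS1 field separator is the ASCII Group Separator (character code 29), which is invisible when the scanned string is printed. A minimal sketch of a helper (the name `showControlChars` is hypothetical) that makes control characters visible when inspecting raw input; whether GS reaches the page at all depends on the scanner's configuration and on how the input is read, which is an assumption here, not something verified.

```javascript
// Render control characters (codes below 32) as <code> so an invisible
// GS separator (code 29) shows up when the scanned string is logged.
function showControlChars(text) {
  return Array.from(text, (ch) => {
    const code = ch.charCodeAt(0);
    return code < 32 ? `<${code}>` : ch;
  }).join('');
}

console.log(showControlChars('0105909990054152101123926\u001D172207012162RB6FBN09'));
// 0105909990054152101123926<29>172207012162RB6FBN09
```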

I would appreciate any help on this topic.

EDIT:

I tried reading the scanner output character by character, but I don't see a pattern. This is what I got:

Solution

I did it! I noticed that GTIN and EXP are always extracted correctly, so I tried something like this:

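    // Workaround for scans that contain no field separators:
    // extract the fixed-length fields first, then split the rest into LOT and SN.
    const extractDataMatrix = (code) => {
        const response = {gtin: '', lot: '', sn: '', exp: ''};
        let responseCode = code;

        // Fixed-length AIs. indexOf returns the FIRST occurrence of the prefix,
        // so this assumes 01 and 17 are not found inside another field first.
        const prefixes = [
            {prefix: '01', key: 'gtin', length: 14},
            {prefix: '17', key: 'exp', length: 6}
        ];

        prefixes.forEach(({prefix, key, length}) => {
            const position = responseCode.indexOf(prefix);

            if (position !== -1) {
                const start = position + prefix.length;
                const end = start + length;

                response[key] = responseCode.substring(start, end);
                // Cut the prefix and its value out, leaving only the LOT and SN parts.
                responseCode = responseCode.slice(0, position) + responseCode.slice(end);
            }
        });

        const lotAndSn = extractLotAndSn(responseCode);
        response.lot = lotAndSn.lot;
        response.sn = lotAndSn.sn;

        return response;
    };

    // Splits the remainder into LOT (AI 10) and SN (AI 21), in either order.
    const extractLotAndSn = (responseCode) => {
        // Alternative 1 matches 10<lot>21<sn>, alternative 2 matches 21<sn>10<lot>.
        // The lazy .+? makes the first field as short as the rest of the pattern allows.
        const pattern = /^(10.+?)(?=10|21)(21.+?)$|^(21.+?)(?=10|21)(10.+?)$/;
        const matches = responseCode.match(pattern);
        if (!matches) return {lot: '', sn: ''};

        // Only one alternative matches, so two of these four groups are undefined.
        const [lot1, sn1, sn2, lot2] = matches.slice(1);
        // Strip the two-character AI prefix from each captured field.
        const lot = (lot1 || lot2 || '').substring(2);
        const sn = (sn1 || sn2 || '').substring(2);
        return checkLotAndSn(lot, sn, responseCode);
    };

    // Heuristic fix-up for values that themselves start with AI digits ("1010", "2121").
    const checkLotAndSn = (lot, sn, responseCode) => {
        if (responseCode.includes("1010") && !responseCode.includes("10100")) {
            const isLotStart = lot.startsWith("10");
            if (isLotStart) {
                // Move the leading "10" from the lot to the end of the serial number.
                lot = lot.slice(2);
                sn += "10";
            }
        } else if (responseCode.includes("2121") && responseCode.includes("21210")) {
            const isSnStart = sn.startsWith("21");
            if (isSnStart) {
                // Move the leading "21" from the serial number to the end of the lot.
                sn = sn.slice(2);
                lot += "21";
            }
        }

        return {lot, sn};
    };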

I think it can be optimized, but for now I don't care ;).

What is going on?

1. In `extractDataMatrix` I extract the fields with a fixed length (GTIN and EXP).
2. After extracting each one, I remove it from `responseCode`.
3. I pass the remaining `responseCode` to `extractLotAndSn()`.
4. In that function, a regex splits it into lot and sn.
5. Sometimes a prefix is actually part of the previous value (e.g. 1010 and 2121), so `checkLotAndSn()` removes the prefix from the start of one value and appends it to the other.
6. I don't know why, but when `responseCode` contains 10100 or 21210, everything is split correctly, so I excluded those cases from the swap.
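Because this heuristic relies on the GTIN always being extracted correctly, one cheap sanity check is the GTIN's own check digit. A minimal sketch of the standard GS1 mod-10 calculation for a 14-digit GTIN (the name `isValidGtin14` is hypothetical); for example, it accepts `05909990054152` from the second scan.

```javascript
// GS1 mod-10 check digit for a 14-digit GTIN: the first 13 digits are
// weighted 3,1,3,1,... from the left, and the 14th digit must bring the
// weighted sum up to a multiple of 10.
function isValidGtin14(gtin) {
  if (!/^\d{14}$/.test(gtin)) return false;
  let sum = 0;
  for (let i = 0; i < 13; i++) {
    sum += Number(gtin[i]) * (i % 2 === 0 ? 3 : 1);
  }
  return (10 - (sum % 10)) % 10 === Number(gtin[13]);
}

console.log(isValidGtin14('05909990054152')); // true
console.log(isValidGtin14('12392617220701')); // false
```

This won't catch every wrong split, since a random 14-digit window still passes roughly one time in ten, but it rejects most of them at almost no cost.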