javascript java gs1-datamatrix

Extract GTIN, LoT, SN and EXP from GS1 DataMatrix barcode


I'm building software for pharmacies to validate drugs in NMVS. The workflow is: I scan the drug's code with a handheld scanner, click "Verify", and the program connects to NMVS. Most of the work is done, but to verify a drug correctly I need to extract the GTIN (PC), batch number (LoT), serial number (SN) and expiry date (EXP) from the scanned code.

Here are the scan results for the test drugs:

01059099913808231003ZP082117230831210XXFAE5AWA6RF8
0105909990054152101123926172207012162RB6FBN09
010590999109968821100322567773831721093010100013978
01059099907954202190EPCNT32ZH5581004032217250331
010590999032841321YCK3EB53CNZXD1725083110C48700
0105909990071029211165895472021010MU465417241031

I know that it's the GS1 DataMatrix format: GTIN is prefixed with 01 (the following 14 digits are the GTIN), LoT with 10 (the following 1-20 alphanumeric characters are the LoT), SN with 21 (the following 1-20 alphanumeric characters are the SN) and the expiry date with 17 (the following 6 digits are the EXP).

For the given examples, I should have e.g.:

[
    {
        "gtin": "05909991380823",
        "lot": "03ZP08",
        "sn": "0XXFAE5AWA6RF8",
        "exp": "230831"
    },
    {
        "gtin": "05909990054152",
        "lot": "1123926",
        "sn": "62RB6FBN09",
        "exp": "220701"
    },
    {
        "gtin": "05909991099688",
        "lot": "100013978",
        "sn": "10032256777383",
        "exp": "210930"
    },
    {
        "gtin": "05909990795420",
        "lot": "040322",
        "sn": "90EPCNT32ZH558",
        "exp": "250331"
    },
    {
        "gtin": "05909990328413",
        "lot": "C48700",
        "sn": "YCK3EB53CNZXD",
        "exp": "250831"
    },
    {
        "gtin": "05909990071029",
        "lot": "10MU4654",
        "sn": "116589547202",
        "exp": "241031"
    }
]

The problem is that these sections can be in any order and of varying lengths. Only GTIN and EXP have a fixed length.

I created a regex to extract these sections: ^(?=.*01(\d{14}))(?=.*10([a-zA-Z0-9]{1,20}))(?=.*17(\d{6}))(?=.*21([a-zA-Z0-9]{1,20})).*$ but unfortunately it doesn't work properly. The client is written in JavaScript (not TS; AngularJS, to be exact. Yes, it's a legacy project, and I'm trying to persuade the company to update it) and the server in Java.
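To make "doesn't work properly" concrete, here is a minimal sketch that runs the lookahead regex against the first test scan. The variable names are only for illustration. Each lookahead begins with a greedy `.*`, so it settles on the last occurrence of its prefix in the string, and a "10" or "21" that happens to sit inside another field gets picked up as a prefix.

```javascript
// The lookahead regex from the question.
const lookaheadPattern = /^(?=.*01(\d{14}))(?=.*10([a-zA-Z0-9]{1,20}))(?=.*17(\d{6}))(?=.*21([a-zA-Z0-9]{1,20})).*$/;

// First test scan from the question.
const firstScan = '01059099913808231003ZP082117230831210XXFAE5AWA6RF8';

// Groups come back in the order the lookaheads are written: 01, 10, 17, 21.
const [, rxGtin, rxLot, rxExp, rxSn] = firstScan.match(lookaheadPattern);

// The last "10" in this scan is the one inside "...210XXFAE5...", i.e. inside
// the serial number, so the lot is taken from the wrong place.
console.log({gtin: rxGtin, lot: rxLot, exp: rxExp, sn: rxSn});
// lot is "XXFAE5AWA6RF8" here, not the expected "03ZP08"
```

GTIN, EXP and SN happen to come out right for this scan, but only because their prefixes do not reappear later in the string.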

I'm open to any solution for this problem, whether it's a regex, a library (JavaScript or Java) or an external API. Personally, I'm running out of ideas...

Also, I'll add that the handheld scanner I'm using is the Zebra DS2208.

I would appreciate any help on this topic.

EDIT:

I tried reading the barcode scanner output character by character, but I don't see a pattern. This is what I got:

Output with special characters
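For this kind of character-by-character inspection, the thing to look for is the group separator (GS, character code 29). GS1 uses it to terminate variable-length fields such as LoT and SN. The sketch below is an illustration under assumptions, not a drop-in fix: `dumpCharCodes`, `AI_TABLE` and `parseWithSeparators` are hypothetical names, the sample string has GS inserted by hand, and whether the DS2208 actually transmits GS depends on its configuration, which I have not verified.

```javascript
// Hypothetical helper: pair each character with its code so that
// non-printable characters such as GS (code 29) become visible.
const dumpCharCodes = (raw) =>
    Array.from(raw, (ch) => ({ch: ch, code: ch.charCodeAt(0)}));

const GS = String.fromCharCode(29); // ASCII group separator

// The four application identifiers from the question. A null length means
// "variable length: ends at GS or at the end of the data".
const AI_TABLE = {
    '01': {key: 'gtin', length: 14},
    '17': {key: 'exp', length: 6},
    '10': {key: 'lot', length: null},
    '21': {key: 'sn', length: null}
};

// Hypothetical parser: only usable if the scanner really sends GS.
const parseWithSeparators = (raw) => {
    const result = {gtin: '', lot: '', sn: '', exp: ''};
    let pos = 0;
    while (pos < raw.length) {
        if (raw[pos] === GS) { pos += 1; continue; } // skip separators
        const ai = AI_TABLE[raw.slice(pos, pos + 2)];
        if (!ai) throw new Error('Unknown AI at position ' + pos);
        const start = pos + 2;
        let end;
        if (ai.length !== null) {
            end = start + ai.length;                 // fixed-length field
        } else {
            const gsAt = raw.indexOf(GS, start);     // variable-length field
            end = gsAt === -1 ? raw.length : gsAt;
        }
        result[ai.key] = raw.slice(start, end);
        pos = end;
    }
    return result;
};

// Last test scan from the question, with GS added by hand after SN and LoT.
const sampleWithGs = '0105909990071029' + '21116589547202' + GS +
    '1010MU4654' + GS + '17241031';
console.log(dumpCharCodes(sampleWithGs).filter((c) => c.code < 32));
console.log(parseWithSeparators(sampleWithGs));
```

With separators present, the order of the sections no longer matters and no guessing is needed for LoT and SN. If the dump shows no character with code 29, this approach does not apply as written.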


Solution

  • I did it! I noticed that GTIN and EXP are always extracted correctly, so I tried something like this:

        const extractDataMatrix = (code) => {
            const response = {gtin: '', lot: '', sn: '', exp: ''};
            let responseCode = code;
    
            const prefixes = [
                {prefix: '01', key: 'gtin', length: 14},
                {prefix: '17', key: 'exp', length: 6}
            ];
    
            prefixes.forEach(({prefix, key, length}) => {
                const position = responseCode.indexOf(prefix);
    
                if (position !== -1) {
                    const start = position + prefix.length;
                    const end = start + length;
    
                    response[key] = responseCode.substring(start, end);
                    responseCode = responseCode.slice(0, position) + responseCode.slice(end);
                }
            });
    
            const lotAndSn = extractLotAndSn(responseCode);
            response.lot = lotAndSn.lot;
            response.sn = lotAndSn.sn;
    
            return response;
        };
    
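        // Splits what is left (GTIN and EXP removed) into lot (10) and sn (21).
        const extractLotAndSn = (responseCode) => {
            // First alternative: 10<lot>21<sn>; second alternative: 21<sn>10<lot>.
            const pattern = /^(10.+?)(?=10|21)(21.+?)$|^(21.+?)(?=10|21)(10.+?)$/;
            const matches = responseCode.match(pattern);
            if (!matches) return {lot: '', sn: ''};
    
            // Only one alternative matches, so one pair of groups is undefined.
            const [lot1, sn1, sn2, lot2] = matches.slice(1);
            // substring(2) drops the two-character prefix.
            const lot = (lot1 || lot2 || '').substring(2);
            const sn = (sn1 || sn2 || '').substring(2);
            return checkLotAndSn(lot, sn, responseCode);
        };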
    
        const checkLotAndSn = (lot, sn, responseCode) => {
            if (responseCode.includes("1010") && !responseCode.includes("10100")) {
                const isLotStart = lot.startsWith("10");
                if (isLotStart) {
                    lot = lot.slice(2);
                    sn += "10";
                }
            } else if (responseCode.includes("2121") && responseCode.includes("21210")) {
                const isSnStart = sn.startsWith("21");
                if (isSnStart) {
                    sn = sn.slice(2);
                    lot += "21";
                }
            }
    
            return {lot, sn};
        };
    

    I think it can be optimized anyway, but for now I don't care ;).

    What is going on?

    1. In extractDataMatrix() I look for the prefixes of the fixed-length sections (GTIN and EXP).
    2. Inside the forEach, each section that is found is stored and then removed from responseCode.
    3. I pass the remaining responseCode to the extractLotAndSn() function.
    4. In this function I use a regex to get lot and sn.
    5. Sometimes I get a code where a prefix is actually part of the previous sequence (e.g. 1010 and 2121), so the checkLotAndSn() function removes the prefix from the start of the sequence and appends it to the previous sequence.
    6. I don't know why, but when 10100 is part of responseCode everything is already split correctly, so I excluded it from the swap; 21210 is likewise handled as a special case in checkLotAndSn().
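
Step 5 is needed because, without separators, the remainder can often be split in more than one way that is valid by length. A minimal sketch to show this on the last test scan; `candidateSplits` is a hypothetical helper for illustration and is not part of the solution above.

```javascript
// Hypothetical helper: enumerate every way to split a separator-less
// remainder (GTIN and EXP already removed) into a lot and a serial number.
const candidateSplits = (rest) => {
    const layouts = [
        {first: '10', firstKey: 'lot', second: '21', secondKey: 'sn'},
        {first: '21', firstKey: 'sn', second: '10', secondKey: 'lot'}
    ];
    const candidates = [];
    layouts.forEach(({first, firstKey, second, secondKey}) => {
        if (!rest.startsWith(first)) return;
        // Try each occurrence of the second prefix as the boundary,
        // leaving at least one character for the first value.
        let at = rest.indexOf(second, 3);
        while (at !== -1) {
            const a = rest.slice(2, at);
            const b = rest.slice(at + 2);
            // Both values must be 1-20 characters long.
            if (a.length <= 20 && b.length >= 1 && b.length <= 20) {
                candidates.push({[firstKey]: a, [secondKey]: b});
            }
            at = rest.indexOf(second, at + 1);
        }
    });
    return candidates;
};

// Remainder of the last test scan once GTIN and EXP are stripped.
const lastRemainder = '211165895472021010MU4654';
console.log(candidateSplits(lastRemainder));
```

For this input it lists two candidates: sn 116589547202 with lot 10MU4654, and sn 11658954720210 with lot MU4654. Nothing in the string itself says which one is right, which is why a heuristic like checkLotAndSn() is needed when the scan carries no separators.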