typescript checksum

Is it possible to create a TS Type that can be used for CAS registry numbers?


Is it possible to create a custom TypeScript type for CAS numbers? A CAS number is a string of three integer sections separated by hyphens. There are restrictions on the valid values of all three sections: the first two are processed (details below) and the result is compared with the checksum (the third section).

I created a simple RegExp pattern to check for the format, but there is some logic to the values that are allowed in the CAS sections that can't be checked for in regex. That logic is:

  1. The whole thing can be no longer than 10 characters (not including the hyphen delimiters).
  2. The first section can be 2 to 7 characters in length and cannot start with a zero (or it's just ignored).
  3. The second section is always 2 characters in length.
  4. The second section can start with 0 or even be 00 and still be valid (e.g. 134842-07-2 and 50-00-0 are both valid).
  5. The third section is the single-digit checksum character.
  6. The checksum is calculated by concatenating the first two sections, reversing the result, multiplying each digit by its position, adding these values up, and then taking the sum mod 10.

Valid CAS example: 151-21-3, using the steps above:

151-21 (first and second sections)
-> 15121 (concat)
-> 12151 (reversed)
-> (1*5) + (5*4) + (1*3) + (2*2) + (1*1) = 33
-> 33 % 10 = 3

Or, written as a single expression:

((5*1) + (4*5) + (3*1) + (2*2) + (1*1)) mod 10 = 33 mod 10 = 3
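
For comparison with a type-level approach, the checksum rule above can be sketched as a plain runtime function. This is only a minimal sketch: `casCheckDigit` is a hypothetical name that does not appear in the post, and it assumes the input contains nothing but digits and hyphens.

```typescript
// Hypothetical helper, not from the original post.
// Assumes the input holds only digits and hyphens, e.g. '151-21'.
function casCheckDigit(firstTwoSections: string): number {
  const reversedDigits = firstTwoSections
    .replace(/-/g, '') // concatenate the two sections
    .split('')
    .reverse() // the rightmost digit gets position 1
    .map(Number);
  const weightedSum = reversedDigits.reduce(
    (sum, digit, index) => sum + digit * (index + 1),
    0
  );
  return weightedSum % 10;
}

console.log(casCheckDigit('151-21')); // 3
console.log(casCheckDigit('7732-18')); // 5
```

A CAS number is then consistent when this value equals its third section.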

I have tried making a type for this myself using some examples online (like this SO post), but I'm not sure how to add the character length validation or checksum validation. But this is the progress I've made thus far:

type PrependNextNum<A extends Array<unknown>> = A['length'] extends infer T ? ((t: T, ...a: A) => void) extends ((...x: infer X) => void) ? X : never : never;
type EnumerateInternal<A extends Array<unknown>, N extends number> = { 0: A, 1: EnumerateInternal<PrependNextNum<A>, N> }[N extends A['length'] ? 0 : 1];
export type Enumerate<N extends number> = EnumerateInternal<[], N> extends (infer E)[] ? E : never;
export type Range<FROM extends number, TO extends number> = Exclude<Enumerate<TO>, Enumerate<FROM>>;

type SEG_A = Range<0, 9999999>;
type SEG_B = Range<0, 99>; // How to ensure that this is two chars in length?
type SEG_CHECKSUM = Range<0, 10>;
type CAS = `${SEG_A}-${SEG_B}-${SEG_CHECKSUM}`

let cas: CAS

// Valid CAS numbers that don't throw an error
cas = '6123-1-1'
cas = '7664-93-9'
cas = '7732-18-5'
cas = '100-00-5'
cas = '50-00-0'
cas = '7647-01-0'

// Invalid CAS numbers that do throw an error (correctly)
cas = '123232-a-14' // Second segment isn't even a number
cas = 'abcd-ef-g' // All alpha
cas = '612311' // Numbers are correct, but no hyphens are present
cas = '6123-01-11' // Too many checksum digits

// Invalid CAS numbers that don't throw an error (but should)
cas = '600000012-999-1' // too many chars in first two segments, incorrect checksum
cas = '0000000-00-0' // too many chars in first two segments, incorrect checksum
cas = '7647-1-0' // Second segment is only one char (7647-01-0 IS valid)
cas = '7647-01-1' // Checksum is incorrect

Here's the TS playground with the above code. Also, both SEG_A and SEG_B show an error which I do not see locally:

Type instantiation is excessively deep and possibly infinite.(2589)

Solution

  • TLDR

    TS playground with solution.

    This was a very fun TypeScript challenge. I might write an article with a more in-depth explanation of this problem and post it in a comment.

    I solved it by breaking the problem down into 4 steps:

    Step 1: character restrictions

    We need types that can validate the length of a string. The following types validate a string by resolving to the string itself if its length is valid and to never if it is not. The limitation is that the supported lengths are bounded by the INDEX_HIGHER tuple: to check for a length greater than 10, you would have to extend that tuple.

    // string length utility types (up to 10, depends on the `INDEX_HIGHER` tuple)
    
    type INDEX_HIGHER = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12];
    
    type L_MAX<T extends string, L extends number, C extends number = 0> = C extends L
      ? T
      : T extends `${infer _}${infer R}`
        ? L_MAX<R, L, INDEX_HIGHER[C]>
        : never;
    
    type MAX_LEN<T extends string, L extends number> =
      L_MAX<T, INDEX_HIGHER[L], 0> extends never ? T : never;
    // min is just the inverted version of MAX_LEN
    type MIN_LEN<T extends string, L extends number> = L_MAX<T, L, 0> extends never ? never : T;
    
    type MINMAX_LEN<T extends string, MIN extends number, MAX extends number> =
      MIN_LEN<T, MIN> extends never ? never : MAX_LEN<T, MAX> extends never ? never : T;
    
    type EXACT_LEN<T extends string, L extends number> =
      MIN_LEN<T, L> extends never ? never : MAX_LEN<T, L> extends never ? never : T;
    
    // test string length functions
    
    function max_strlen<T extends string, L extends number>(s: MAX_LEN<T, L>, len: L) {}
    
    max_strlen('12345', 5);
    max_strlen('123456', 5); // error
    
    function min_strlen<T extends string, L extends number>(s: MIN_LEN<T, L>, len: L) {}
    
    min_strlen('12345', 5);
    min_strlen('1234', 5); // error
    
    function min_max_strlen<T extends string, MIN extends number, MAX extends number>(
      s: MINMAX_LEN<T, MIN, MAX>,
      min: MIN,
      max: MAX
    ) {}
    
    min_max_strlen('1', 2, 7); // error
    min_max_strlen('12', 2, 7);
    min_max_strlen('1234', 2, 7);
    min_max_strlen('1234567', 2, 7);
    min_max_strlen('12345678', 2, 7); // error
    
    function strlen<T extends string, L extends number>(s: EXACT_LEN<T, L>, len: L) {}
    
    strlen('1234', 5); // error
    strlen('12345', 5);
    strlen('123456', 5); // error
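
To see what the helper at the core of these types does, the sketch below evaluates `L_MAX` on a few literals. `INDEX_HIGHER` and `L_MAX` are copied from the code above so that the snippet compiles on its own; the constant and alias names are only illustrative.

```typescript
// Copied from the answer so that this snippet is self-contained.
type INDEX_HIGHER = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12];

type L_MAX<T extends string, L extends number, C extends number = 0> = C extends L
  ? T
  : T extends `${infer _}${infer R}`
    ? L_MAX<R, L, INDEX_HIGHER[C]>
    : never;

// L_MAX strips L characters from the front and resolves to the remainder.
// Each annotation below only accepts the literal on the right-hand side.
const rest: L_MAX<'12345', 3> = '45'; // three characters dropped
const none: L_MAX<'123', 3> = ''; // exactly three characters: empty remainder

// When the string runs out before the counter reaches L, the result is never.
type TooShort = L_MAX<'12', 3>;
```

So "resolves to never" means "shorter than L characters", which is the signal that MIN_LEN and MAX_LEN build on.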
    

    Step 2: CAS format

    Now that we have the types that can restrict strings by their length we can make a type for a CAS number.

    I couldn't figure out how to make the type not generic. I don't think it's possible to make a type like this and use it as const cas_number: CAS = '...'.

    So instead I made it generic, CAS<T>, which you can use as the type of a function argument.

    // 1. check if all sections are numbers
    type CAS<T extends string> = T extends `${number}-${number}-${number}`
      ? // 2. get the 3 sections as types
        T extends `${infer SEG_A}-${infer SEG_B}-${infer SEG_C}`
        ? // 3. validate the length of the first 2 sections and the checksum
          T extends `${MINMAX_LEN<SEG_A, 2, 7>}-${EXACT_LEN<SEG_B, 2>}-${EXACT_LEN<SEG_C, 1>}`
          ? T
          : never
        : never
      : never;
    
    function cas<T extends string>(s: CAS<T>) {}
    
    cas('151-21-3'); // no error
    // these all error now
    cas('123232-a-14'); // Second segment isn't even a number
    cas('abcd-ef-g'); // All alpha
    cas('612311'); // Numbers are correct, but no hyphens are present
    cas('6123-01-11'); // Too many checksum digits
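
If a non-generic type is needed for values that are only known at runtime, one common alternative is a branded string type combined with a runtime type guard. The sketch below is not part of the answer's playground: `CasNumber`, `isCasNumber` and `parseCas` are hypothetical names, and the guard is assumed to apply the same rules as the type-level check (2 to 7 digits, 2 digits, 1 check digit).

```typescript
// Hypothetical branded type: a string that has passed the runtime check.
type CasNumber = string & { readonly __brand: 'CasNumber' };

// Type guard: format check first, then the mod-10 checksum.
function isCasNumber(value: string): value is CasNumber {
  const match = /^(\d{2,7})-(\d{2})-(\d)$/.exec(value);
  if (match === null) return false;
  const [, first = '', second = '', check = ''] = match;
  const sum = (first + second)
    .split('')
    .reverse()
    .reduce((acc, digit, index) => acc + Number(digit) * (index + 1), 0);
  return sum % 10 === Number(check);
}

// Returns the branded value, or undefined when validation fails.
function parseCas(value: string): CasNumber | undefined {
  return isCasNumber(value) ? value : undefined;
}

const water: CasNumber | undefined = parseCas('7732-18-5');
const broken: CasNumber | undefined = parseCas('7647-01-1'); // wrong check digit
```

The trade-off is that the check runs at runtime rather than at compile time, so string literals in source code are not rejected by the compiler.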
    

    Step 3: the checksum

    This was the hardest part. At the end of the checksum calculation we do a mod 10 operation, which is the same as taking the last digit of any number: 564 % 10 = 4.

    Instead of applying the modulus once after the entire sum, we can apply it each time we add or multiply. Since we're only interested in the last digit of the sum, this yields the same result:

    151-21 (first and second sections)
    -> 15121 (concat)
    -> 12151 (reversed)
    -> (0 + (1*5 % 10)) % 10 = (0 + 5) % 10 = 5
    -> (5 + (5*4 % 10)) % 10 = (5 + 0) % 10 = 5
    -> (5 + (1*3 % 10)) % 10 = (5 + 3) % 10 = 8
    -> (8 + (2*2 % 10)) % 10 = (8 + 4) % 10 = 2
    -> (2 + (1*1 % 10)) % 10 = (2 + 1) % 10 = 3
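
As a quick runtime sanity check of this claim, the sketch below compares taking mod 10 once at the end with taking it after every multiplication and addition. Both function names are hypothetical and only serve the illustration.

```typescript
// Hypothetical helpers. The input is the reversed digit list,
// so the digit at index i has position i + 1.
function checksumModAtEnd(reversedDigits: number[]): number {
  return reversedDigits.reduce((sum, d, i) => sum + d * (i + 1), 0) % 10;
}

function checksumModEachStep(reversedDigits: number[]): number {
  return reversedDigits.reduce(
    (acc, d, i) => (acc + ((d * (i + 1)) % 10)) % 10,
    0
  );
}

const reversedExample = [1, 2, 1, 5, 1]; // 151-21 -> 15121 -> 12151
console.log(checksumModAtEnd(reversedExample)); // 3
console.log(checksumModEachStep(reversedExample)); // 3
```

The two agree because (a + b) mod 10 depends only on a mod 10 and b mod 10, and the same holds for multiplication.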
    

    The inputs of each operation are two numbers from 0 to 9 and the output is a single number from 0 to 9, so we can create two lookup tables (10x10 2D arrays) holding the result of each operation. One lookup table for addition and one for multiplication:

    type ADDITION_MAP = [
      [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
      [1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
      [2, 3, 4, 5, 6, 7, 8, 9, 0, 1],
      [3, 4, 5, 6, 7, 8, 9, 0, 1, 2],
      [4, 5, 6, 7, 8, 9, 0, 1, 2, 3],
      [5, 6, 7, 8, 9, 0, 1, 2, 3, 4],
      [6, 7, 8, 9, 0, 1, 2, 3, 4, 5],
      [7, 8, 9, 0, 1, 2, 3, 4, 5, 6],
      [8, 9, 0, 1, 2, 3, 4, 5, 6, 7],
      [9, 0, 1, 2, 3, 4, 5, 6, 7, 8]
    ];
    
    type MULTIPLY_MAP = [
      [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
      [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
      [0, 2, 4, 6, 8, 0, 2, 4, 6, 8],
      [0, 3, 6, 9, 2, 5, 8, 1, 4, 7],
      [0, 4, 8, 2, 6, 0, 4, 8, 2, 6],
      [0, 5, 0, 5, 0, 5, 0, 5, 0, 5],
      [0, 6, 2, 8, 4, 0, 6, 2, 8, 4],
      [0, 7, 4, 1, 8, 5, 2, 9, 6, 3],
      [0, 8, 6, 4, 2, 0, 8, 6, 4, 2],
      [0, 9, 8, 7, 6, 5, 4, 3, 2, 1]
    ];
    

    How it works: to compute 7 * 8, we look up MULTIPLY_MAP[7][8], which gives us 6, because (7 * 8) % 10 = 6.

    Likewise, to compute 8 + 4, we look up ADDITION_MAP[8][4], which gives us 2, because (8 + 4) % 10 = 2.
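
Hand-typed tables like these are easy to get wrong, so it may help to generate the same values at runtime and compare them row by row. A minimal sketch, with hypothetical names (`DIGITS`, `additionRows`, `multiplyRows`):

```typescript
// Hypothetical runtime counterparts of ADDITION_MAP and MULTIPLY_MAP.
const DIGITS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];

// Row a, column b holds the last digit of a + b (or of a * b).
const additionRows = DIGITS.map((a) => DIGITS.map((b) => (a + b) % 10));
const multiplyRows = DIGITS.map((a) => DIGITS.map((b) => (a * b) % 10));

console.log(JSON.stringify(additionRows[8])); // [8,9,0,1,2,3,4,5,6,7]
console.log(JSON.stringify(multiplyRows[7])); // [0,7,4,1,8,5,2,9,6,3]
```

Each printed row can be compared with the corresponding row of the type-level tuples.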

    Step 4: putting it all together

    With the lookup tables we can write a type for the checksum and use it in the CAS type:

    // lookup table for stringified numbers
    type NUMBERS = {
      '0': 0;
      '1': 1;
      '2': 2;
      '3': 3;
      '4': 4;
      '5': 5;
      '6': 6;
      '7': 7;
      '8': 8;
      '9': 9;
    };
    
    // 1. "loop" over first character
    type CHECKSUM<
      T extends string,
      I extends number = 1,
      C extends number = 0
    > = T extends `${infer F}${infer R}`
      ? // 2. check that first character is a digit
        F extends keyof NUMBERS
        ? // 3. do current = current + (index * number)
          CHECKSUM<R, INDEX_HIGHER[I], ADDITION_MAP[C][MULTIPLY_MAP[I][NUMBERS[F]]]>
        : never
      : // 4. no characters left, so C holds the checksum
        C;
    
    // reverse a string
    type REVERSE<T extends string> = T extends `${infer First}${infer Rest}`
      ? `${REVERSE<Rest>}${First}`
      : '';
    
    // 1. check if all sections are numbers
    type CAS<T extends string> = T extends `${number}-${number}-${number}`
      ? // 2. get the 3 sections as types
        T extends `${infer SEG_A}-${infer SEG_B}-${number}`
        ? // 3. validate the length of the first 2 sections and the checksum
          T extends `${MINMAX_LEN<SEG_A, 2, 7>}-${EXACT_LEN<SEG_B, 2>}-${CHECKSUM<REVERSE<`${SEG_A}${SEG_B}`>>}`
          ? T
          : never
        : never
      : never;
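
To check the type-level arithmetic in isolation, the sketch below evaluates `REVERSE` and `CHECKSUM` on the worked example 151-21. All type definitions are copied from the answer so that the snippet is self-contained; only the two constants at the end are new, and their names are illustrative.

```typescript
// Copied from the answer so that this snippet compiles on its own.
type INDEX_HIGHER = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12];
type ADDITION_MAP = [
  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
  [1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
  [2, 3, 4, 5, 6, 7, 8, 9, 0, 1],
  [3, 4, 5, 6, 7, 8, 9, 0, 1, 2],
  [4, 5, 6, 7, 8, 9, 0, 1, 2, 3],
  [5, 6, 7, 8, 9, 0, 1, 2, 3, 4],
  [6, 7, 8, 9, 0, 1, 2, 3, 4, 5],
  [7, 8, 9, 0, 1, 2, 3, 4, 5, 6],
  [8, 9, 0, 1, 2, 3, 4, 5, 6, 7],
  [9, 0, 1, 2, 3, 4, 5, 6, 7, 8]
];
type MULTIPLY_MAP = [
  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
  [0, 2, 4, 6, 8, 0, 2, 4, 6, 8],
  [0, 3, 6, 9, 2, 5, 8, 1, 4, 7],
  [0, 4, 8, 2, 6, 0, 4, 8, 2, 6],
  [0, 5, 0, 5, 0, 5, 0, 5, 0, 5],
  [0, 6, 2, 8, 4, 0, 6, 2, 8, 4],
  [0, 7, 4, 1, 8, 5, 2, 9, 6, 3],
  [0, 8, 6, 4, 2, 0, 8, 6, 4, 2],
  [0, 9, 8, 7, 6, 5, 4, 3, 2, 1]
];
type NUMBERS = {
  '0': 0; '1': 1; '2': 2; '3': 3; '4': 4;
  '5': 5; '6': 6; '7': 7; '8': 8; '9': 9;
};
type CHECKSUM<T extends string, I extends number = 1, C extends number = 0> =
  T extends `${infer F}${infer R}`
    ? F extends keyof NUMBERS
      ? CHECKSUM<R, INDEX_HIGHER[I], ADDITION_MAP[C][MULTIPLY_MAP[I][NUMBERS[F]]]>
      : never
    : C;
type REVERSE<T extends string> = T extends `${infer First}${infer Rest}`
  ? `${REVERSE<Rest>}${First}`
  : '';

// Each annotation only accepts the literal that the type resolves to.
const reversedDigits: REVERSE<'15121'> = '12151';
const checkDigit: CHECKSUM<REVERSE<'15121'>> = 3;
```

Changing the `3` to any other digit should produce a compile error, which is the behaviour the CAS type relies on for its third section.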
    

    Results

    I believe some of your provided test cases were invalid:

    
    // Valid CAS numbers that don't throw an error
    cas('6123-1-1'); // errors now: the second section has only one digit, so I believe this CAS number is invalid anyway
    cas('6123-01-9'); // this one is valid though
    cas('7664-93-9');
    cas('7732-18-5');
    cas('100-00-5');
    cas('50-00-0');
    cas('7647-01-0');
    
    // Invalid CAS numbers that do throw an error (correctly)
    cas('123232-a-14'); // Second segment isn't even a number
    cas('abcd-ef-g'); // All alpha
    cas('612311'); // Numbers are correct, but no hyphens are present
    cas('6123-01-11'); // Too many checksum digits
    
    // Invalid CAS numbers that didn't throw an error before (all but the second one do now)
    cas('600000012-999-1'); // too many chars in first two segments, incorrect checksum
    cas('0000000-00-0'); // no error: this one in your example should be correct, right?
    cas('7647-1-0'); // Second segment is only one char (7647-01-0 IS valid)
    cas('7647-01-1'); // Checksum is incorrect