Is it possible to create a custom TypeScript type for CAS numbers, which are strings made up of three integers separated by dashes? There are restrictions on the valid values of all three sections: the first two are processed (details below) and the result is compared with the third section, the checksum.
I created a simple RegExp pattern to check for the format, but there is some logic to the values that are allowed in the CAS sections that can't be checked for in regex. That logic is:
(for example, 134842-07-2 and 50-00-0 are both valid)
Valid CAS example: 151-21-3, using the steps above:
151-21 (first and second sections)
-> 15121 (concat)
-> 12151 (reversed)
-> (1*5) + (5*4) + (1*3) + (2*2) + (1*1) = 33
-> 33 % 10 = 3
Or to make it look more scientific:
((5*1) + (4*5) + (3*1) + (2*2) + (1*1)) mod 10 = 33 mod 10 = 3
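For reference, the same steps can be sketched as a runtime check. This is only an illustration: the function name `isValidCas` is made up here, and the regex assumes segments of 2-7 digits, exactly 2 digits, and a single check digit.

```typescript
// Hypothetical helper: validates the format and the check digit at runtime.
// Assumption: segments are 2-7 digits, exactly 2 digits, and 1 check digit.
function isValidCas(cas: string): boolean {
  const match = /^(\d{2,7})-(\d{2})-(\d)$/.exec(cas);
  if (match === null) return false;

  // Concatenate the first two segments and reverse the digits.
  const reversed = (match[1] + match[2]).split('').reverse();

  // Weight each digit by its 1-based position and sum the products.
  const sum = reversed.reduce((acc, digit, i) => acc + Number(digit) * (i + 1), 0);

  // The last digit of the sum must equal the third segment.
  return sum % 10 === Number(match[3]);
}

console.log(isValidCas('151-21-3')); // true
console.log(isValidCas('7647-01-1')); // false
```

This only helps for strings seen at runtime; it says nothing to the compiler, which is what the type below is trying to achieve.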
I have tried making a type for this myself using some examples online (like this SO post), but I'm not sure how to add the character length validation or the checksum validation. This is the progress I've made so far:
type PrependNextNum<A extends Array<unknown>> = A['length'] extends infer T ? ((t: T, ...a: A) => void) extends ((...x: infer X) => void) ? X : never : never;
type EnumerateInternal<A extends Array<unknown>, N extends number> = { 0: A, 1: EnumerateInternal<PrependNextNum<A>, N> }[N extends A['length'] ? 0 : 1];
export type Enumerate<N extends number> = EnumerateInternal<[], N> extends (infer E)[] ? E : never;
export type Range<FROM extends number, TO extends number> = Exclude<Enumerate<TO>, Enumerate<FROM>>;
type SEG_A = Range<0, 9999999>;
type SEG_B = Range<0, 99>; // How to ensure that this is two chars in length?
type SEG_CHECKSUM = Range<0, 10>;
type CAS = `${SEG_A}-${SEG_B}-${SEG_CHECKSUM}`
let cas: CAS
// Valid CAS numbers that don't throw an error
cas = '6123-1-1'
cas = '7664-93-9'
cas = '7732-18-5'
cas = '100-00-5'
cas = '50-00-0'
cas = '7647-01-0'
// Invalid CAS numbers that do throw an error (correctly)
cas = '123232-a-14' // Second segment isn't even a number
cas = 'abcd-ef-g' // All alpha
cas = '612311' // Numbers are correct, but no hyphens are present
cas = '6123-01-11' // Too many checksum digits
// Invalid CAS numbers that don't throw an error (but should)
cas = '600000012-999-1' // too many chars in first two segments, incorrect checksum
cas = '0000000-00-0' // too many chars in first two segments, incorrect checksum
cas = '7647-1-0' // Second segment is only one char (7647-01-0 IS valid)
cas = '7647-01-1' // Checksum is incorrect
Here's the TS playground with the above code. Also, both SEG_A and SEG_B show an error which I do not see locally:
Type instantiation is excessively deep and possibly infinite.(2589)
TS playground with solution.
This was a very fun TypeScript challenge. I might write an article with a more in-depth explanation of this problem and post it in a comment.
I solved it by breaking the problem down into 3 steps:
We need to make types that can validate the length of a string. The following code validates a string type by returning the string itself if it's valid and never if it's not. The limitation is that it won't work for lengths greater than 10; to support longer strings you would have to extend the INDEX_HIGHER tuple.
// string length utility types (up to 10, depends on the `INDEX_HIGHER` tuple)
type INDEX_HIGHER = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12];
type L_MAX<T extends string, L extends number, C extends number = 0> = C extends L
? T
: T extends `${infer _}${infer R}`
? L_MAX<R, L, INDEX_HIGHER[C]>
: never;
type MAX_LEN<T extends string, L extends number> =
L_MAX<T, INDEX_HIGHER[L], 0> extends never ? T : never;
// min is just the inverted version of MAX_LEN
type MIN_LEN<T extends string, L extends number> = L_MAX<T, L, 0> extends never ? never : T;
type MINMAX_LEN<T extends string, MIN extends number, MAX extends number> =
MIN_LEN<T, MIN> extends never ? never : MAX_LEN<T, MAX> extends never ? never : T;
type EXACT_LEN<T extends string, L extends number> =
MIN_LEN<T, L> extends never ? never : MAX_LEN<T, L> extends never ? never : T;
// test string length functions
function max_strlen<T extends string, L extends number>(s: MAX_LEN<T, L>, len: L) {}
max_strlen('12345', 5);
max_strlen('123456', 5); // error
function min_strlen<T extends string, L extends number>(s: MIN_LEN<T, L>, len: L) {}
min_strlen('12345', 5);
min_strlen('1234', 5); // error
function min_max_strlen<T extends string, MIN extends number, MAX extends number>(
s: MINMAX_LEN<T, MIN, MAX>,
min: MIN,
max: MAX
) {}
min_max_strlen('1', 2, 7); // error
min_max_strlen('12', 2, 7);
min_max_strlen('1234', 2, 7);
min_max_strlen('1234567', 2, 7);
min_max_strlen('12345678', 2, 7); // error
function strlen<T extends string, L extends number>(S: EXACT_LEN<T, L>, len: L) {}
strlen('1234', 5); // error
strlen('12345', 5);
strlen('123456', 5); // error
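The length limit comes from using a hand-written successor table (`INDEX_HIGHER`) as the counter. A commonly used alternative, which is not what the code above does, is to count characters by growing a tuple and reading its `length`. The sketch below only illustrates that idea; the names `STR_LEN` and `Acc` are made up for this example, and it assumes a TypeScript version that supports template literal types and variadic tuples.

```typescript
// Hypothetical alternative: count characters by growing a tuple.
// Each recursion step strips one character and appends one tuple element.
type STR_LEN<T extends string, Acc extends unknown[] = []> =
  T extends `${infer _}${infer R}` ? STR_LEN<R, [...Acc, unknown]> : Acc['length'];

// These annotations only compile if the computed lengths match the values.
const five: STR_LEN<'12345'> = 5;
const twelve: STR_LEN<'123456789012'> = 12;
const empty: STR_LEN<''> = 0;
```

With a length type like this, the min/max checks could in principle be expressed as comparisons on numeric literals, at the cost of one tuple element per character.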
Now that we have types that can restrict strings by their length, we can make a type for a CAS number.
I couldn't figure out how to make the type non-generic. I don't think it's possible to make a type like this and use it as const cas_number: CAS = '...'.
So instead I made it generic, CAS<T>, which you can use as the type of a function argument.
// 1. check if all sections are numbers
type CAS<T extends string> = T extends `${number}-${number}-${number}`
? // 2. get the 3 sections as types
T extends `${infer SEG_A}-${infer SEG_B}-${infer SEG_C}`
? // 3. validate the length of the first 2 sections and the checksum
T extends `${MINMAX_LEN<SEG_A, 2, 7>}-${EXACT_LEN<SEG_B, 2>}-${EXACT_LEN<SEG_C, 1>}`
? T
: never
: never
: never;
function cas<T extends string>(s: CAS<T>) {}
cas('151-21-3'); // no error
// these all error now
cas('123232-a-14'); // Second segment isn't even a number
cas('abcd-ef-g'); // All alpha
cas('612311'); // Numbers are correct, but no hyphens are present
cas('6123-01-11'); // Too many checksum digits
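If a non-generic type is needed, part of the format can still be expressed without a type parameter. The sketch below is an alternative under stated assumptions, not the approach used in this answer: `DIGIT` and `CAS_FORMAT` are invented names, and the type only pins the last two segments to exactly two digits and one digit. It does not limit the length of the first segment or verify the checksum, because enumerating every 2-7 digit prefix would need a union with millions of members.

```typescript
type DIGIT = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';

// 10 * 10 * 10 = 1000 union members, which is small enough for the compiler.
type CAS_FORMAT = `${number}-${DIGIT}${DIGIT}-${DIGIT}`;

const ok: CAS_FORMAT = '7647-01-0';

// @ts-expect-error the second segment has only one digit
const tooShort: CAS_FORMAT = '7647-1-0';

// Not caught by this type: the format is fine, only the checksum is wrong.
const wrongChecksum: CAS_FORMAT = '7647-01-1';
```

This is a trade-off: a plain annotation works for constants, but the checksum still needs the generic type (or a runtime check).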
This was the hardest part.
At the end of the checksum calculation we do a mod 10 operation, which is the same as taking the last digit of a number: 564 % 10 = 4.
Instead of applying the modulus after the entire sum, we can apply it each time we add or multiply, since we're only interested in the last digit of the sum. This yields the same result:
151-21 (first and second sections)
-> 15121 (concat)
-> 12151 (reversed)
-> 0 + (1*5 % 10) = 0 + 5 = 5 % 10 = 5
-> 5 + (5*4 % 10) = 5 + 0 = 5 % 10 = 5
-> 5 + (1*3 % 10) = 5 + 3 = 8 % 10 = 8
-> 8 + (2*2 % 10) = 8 + 4 = 12 % 10 = 2
-> 2 + (1*1 % 10) = 2 + 1 = 3 % 10 = 3
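A quick runtime sanity check of that claim, as a sketch: the variable names are made up, and the digits are paired with their 1-based position in the reversed string, which gives the same products as the walkthrough above in a different order.

```typescript
// Digits of 151-21 after concatenating and reversing: 12151.
const digits = [1, 2, 1, 5, 1];

// Reduce once at the end: plain weighted sum, then mod 10.
const reduceOnce = digits.reduce((sum, d, i) => sum + d * (i + 1), 0) % 10;

// Reduce at every step: the running value never leaves the range 0-9.
const reduceEachStep = digits.reduce(
  (acc, d, i) => (acc + ((d * (i + 1)) % 10)) % 10,
  0
);

console.log(reduceOnce, reduceEachStep); // 3 3
```

Because every intermediate value stays a single digit, each step has only 10 x 10 possible inputs, which is what makes a table-based approach feasible at the type level.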
The inputs of each operation are two numbers between 0 and 9 and the output is a single number between 0 and 9, so we can create two lookup tables (10x10 2D arrays) holding the result of each operation: one for addition and one for multiplication:
type ADDITION_MAP = [
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
[2, 3, 4, 5, 6, 7, 8, 9, 0, 1],
[3, 4, 5, 6, 7, 8, 9, 0, 1, 2],
[4, 5, 6, 7, 8, 9, 0, 1, 2, 3],
[5, 6, 7, 8, 9, 0, 1, 2, 3, 4],
[6, 7, 8, 9, 0, 1, 2, 3, 4, 5],
[7, 8, 9, 0, 1, 2, 3, 4, 5, 6],
[8, 9, 0, 1, 2, 3, 4, 5, 6, 7],
[9, 0, 1, 2, 3, 4, 5, 6, 7, 8]
];
type MULTIPLY_MAP = [
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 2, 4, 6, 8, 0, 2, 4, 6, 8],
[0, 3, 6, 9, 2, 5, 8, 1, 4, 7],
[0, 4, 8, 2, 6, 0, 4, 8, 2, 6],
[0, 5, 0, 5, 0, 5, 0, 5, 0, 5],
[0, 6, 2, 8, 4, 0, 6, 2, 8, 4],
[0, 7, 4, 1, 8, 5, 2, 9, 6, 3],
[0, 8, 6, 4, 2, 0, 8, 6, 4, 2],
[0, 9, 8, 7, 6, 5, 4, 3, 2, 1]
];
How it works: if we want to do 7 * 8, we look up MULTIPLY_MAP[7][8], which gives us 6, because (7 * 8) % 10 = 6.
Likewise, if we want to do 8 + 4, we look up ADDITION_MAP[8][4], which gives us 2, because (8 + 4) % 10 = 2.
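The tables don't have to be typed out by hand. As a sketch, a small runtime script can generate the rows for pasting into the type definitions; `lastDigitTable`, `additionTable` and `multiplyTable` are hypothetical names used only here.

```typescript
// Builds a 10x10 table where table[a][b] is the last digit of op(a, b).
function lastDigitTable(op: (a: number, b: number) => number): number[][] {
  return Array.from({ length: 10 }, (_row, a) =>
    Array.from({ length: 10 }, (_col, b) => op(a, b) % 10)
  );
}

const additionTable = lastDigitTable((a, b) => a + b);
const multiplyTable = lastDigitTable((a, b) => a * b);

console.log(multiplyTable[7][8]); // 6
console.log(additionTable[8][4]); // 2
console.log(JSON.stringify(multiplyTable[9])); // [0,9,8,7,6,5,4,3,2,1]
```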
With the lookup tables we can write a type for the checksum and use it in the CAS type:
// lookup table for stringified numbers
type NUMBERS = {
'0': 0;
'1': 1;
'2': 2;
'3': 3;
'4': 4;
'5': 5;
'6': 6;
'7': 7;
'8': 8;
'9': 9;
};
// 1. "loop" over first character
type CHECKSUM<
T extends string,
I extends number = 1,
C extends number = 0
> = T extends `${infer F}${infer R}`
? // 2. check that first character is a digit
F extends keyof NUMBERS
? // 3. do current = current + (index * number)
CHECKSUM<R, INDEX_HIGHER[I], ADDITION_MAP[C][MULTIPLY_MAP[I][NUMBERS[F]]]>
: never
: // 4. no characters left, so C holds the final checksum digit
C;
// reverse a string
type REVERSE<T extends string> = T extends `${infer First}${infer Rest}`
? `${REVERSE<Rest>}${First}`
: '';
// 1. check if all sections are numbers
type CAS<T extends string> = T extends `${number}-${number}-${number}`
? // 2. get the 3 sections as types
T extends `${infer SEG_A}-${infer SEG_B}-${number}`
? // 3. validate the length of the first 2 sections and the checksum
T extends `${MINMAX_LEN<SEG_A, 2, 7>}-${EXACT_LEN<SEG_B, 2>}-${CHECKSUM<REVERSE<`${SEG_A}${SEG_B}`>>}`
? T
: never
: never
: never;
I believe some of your provided test cases were invalid:
// Valid CAS numbers that don't throw an error
cas('6123-1-1'); // I believe this CAS number is invalid anyway (second segment has one digit)
cas('6123-01-9'); // this one is valid though
cas('7664-93-9');
cas('7732-18-5');
cas('100-00-5');
cas('50-00-0');
cas('7647-01-0');
// Invalid CAS numbers that do throw an error (correctly)
cas('123232-a-14'); // Second segment isn't even a number
cas('abcd-ef-g'); // All alpha
cas('612311'); // Numbers are correct, but no hyphens are present
cas('6123-01-11'); // Too many checksum digits
// Invalid CAS numbers that previously didn't throw an error (but should)
cas('600000012-999-1'); // too many chars in first two segments, incorrect checksum
cas('0000000-00-0'); // this one in your example should be correct, right?
cas('7647-1-0'); // Second segment is only one char (7647-01-0 IS valid)
cas('7647-01-1'); // Checksum is incorrect