kdb+

Propagating the x1 Column's Data Type to Subsequent Columns in Grouped q Tables


Data:

q) t:([] sym1: 1 2 3 4 5 6 7; 
sym2: 1.1 2.2 3.3 4.4 5.5 6.6 7.7; 
sym3: 2.1 3.2 4.3 5.4 6.5 7.6 8.7; 
sym4: 3.1 4.2 5.3 6.4 7.5 8.6 9.7; 
sym5: 4.1 5.2 6.3 7.4 8.5 9.6 10.7; 
sym6: 5.1 6.2 7.3 8.4 9.5 10.6 11.7; 
sym7: 6.1 7.2 8.3 9.4 10.5 11.6 12.7; 
age1: (`x1;`x2;`x3;`x4;`x5;`x6;`x7); 
age2: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); 
age3: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); 
age4: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); 
age5: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); 
age6: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); 
age7: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); 
time1: (`x1;`x2;`x3;`x4;`x5;`x6;`x7); 
time2: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); 
time3: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); 
time4: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); 
time5: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); 
time6: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); 
time7: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"))

Statements:

q)strCols:string cols t
q)strCols
"sym1"
"sym2"
"sym3"
"sym4"
"sym5"
"sym6"
"sym7"
"age1"
"age2"
"age3"
"age4"
"age5"
"age6"
"age7"
"time1"
"time2"
"time3"
"time4"
"time5"
"time6"
"time7"
q)cols1:{(-1_x),"*"}each strCols[til[floor count[strCols]%7]*7]
q)cols1
"sym*"
"age*"
"time*"
q)ind:{where strCols like x}each cols1
q)ind
0  1  2  3  4  5  6 
7  8  9  10 11 12 13
14 15 16 17 18 19 20
q)indCols:{1_`$strCols[x]}each ind
q)indCols
sym2  sym3  sym4  sym5  sym6  sym7 
age2  age3  age4  age5  age6  age7 
time2 time3 time4 time5 time6 time7
q)typeCols:{type t[x]}each `${(-1_x),"1"}each cols1
q)typeCols
11 7 7h

Function:

castFunc:{{
[col1;col2;col3;col4;col5;col6;typ]
![t;();0b;(`col1`col2`col3`col4`col5`col6)!((mmu;typ;col1);(mmu;typ;col2);(mmu;typ;col3);(mmu;typ;col4);(mmu;typ;col5);(mmu;typ;col6))]
}[x[0];x[1];x[2];x[3];x[4];x[5];y]
}'[indCols;typeCols]

I'm trying to make a function that assigns the data type of the x1 column for the rest of the columns from x2 to x7 for each column group e.g. sym,age & time.

I've tried using a functional update to loop through the columns and datatypes but running into issues and I'm looking for the best way to approach this.


Solution

  • t:([] sym1: 1 2 3 4 5 6 7; sym2: 1.1 2.2 3.3 4.4 5.5 6.6 7.7; sym3: 2.1 3.2 4.3 5.4 6.5 7.6 8.7; sym4: 3.1 4.2 5.3 6.4 7.5 8.6 9.7; sym5: 4.1 5.2 6.3 7.4 8.5 9.6 10.7; sym6: 5.1 6.2 7.3 8.4 9.5 10.6 11.7; sym7: 6.1 7.2 8.3 9.4 10.5 11.6 12.7; age1: (`x1;`x2;`x3;`x4;`x5;`x6;`x7); age2: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); age3: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); age4: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); age5: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); age6: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); age7: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); time1: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); time2: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); time3: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); time4: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); time5: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); time6: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"); time7: (`x1;"x2";"x3";"x4";"x5";"x6";"x7"))
    
    //Get the cols of the tab
    c:cols t
    
    //Remove the numbers to get the distinct prefixes
    u:distinct except[;.Q.n] each string c
    
    //Find matching coulmns for each prefix
    uu:{x where x like y,"*"}[c] each u
    
    //Sort the columns based on the numeric suffix
    uu:{y iasc "J"$count[x] _/: string y}'[u;uu]
    
    //Get the type of the first cell for the first column of each prefix 
    tu:abs{type first t first x} each uu
    
    //Because there are mixed datatype columns we need a special casting function
    d:7 11h!(7h$;{{$[-11h~type x;x;`$x]}each x})
    
    //Map each column to intended datatype
    //We remove the first columns which are used as the prototype 1_/:uu as they do not need casting/parsing
    dc:(!). ungroup[([]c:1_/:uu;t:tu)]`c`t
    
    //Run update on table
    t:![t;();0b;c!{((`d;(`dc;enlist x));x)}each c]
    
    //Result
    meta t
    c    | t f a
    -----| -----
    sym1 | j
    sym2 | j
    sym3 | j
    sym4 | j
    sym5 | j
    sym6 | j
    sym7 | j
    age1 | s
    age2 | s
    age3 | s
    age4 | s
    age5 | s
    age6 | s
    age7 | s
    time1| s
    time2| s
    time3| s
    time4| s
    time5| s
    time6| s
    time7| s
    

    If the time/age columns were not of mixed type the answer is a little different

    //Define t - the age and time columns in this example do not have mixed datatypes
    t:([] sym1: 1 2 3 4 5 6 7; sym2: 1.1 2.2 3.3 4.4 5.5 6.6 7.7; sym3: 2.1 3.2 4.3 5.4 6.5 7.6 8.7; sym4: 3.1 4.2 5.3 6.4 7.5 8.6 9.7; sym5: 4.1 5.2 6.3 7.4 8.5 9.6 10.7; sym6: 5.1 6.2 7.3 8.4 9.5 10.6 11.7; sym7: 6.1 7.2 8.3 9.4 10.5 11.6 12.7; age1: `x1`x2`x3`x4`x5`x6`x7; age2: ("x1";"x2";"x3";"x4";"x5";"x6";"x7"); age3: ("x1";"x2";"x3";"x4";"x5";"x6";"x7"); age4: ("x1";"x2";"x3";"x4";"x5";"x6";"x7"); age5: ("x1";"x2";"x3";"x4";"x5";"x6";"x7"); age6: ("x1";"x2";"x3";"x4";"x5";"x6";"x7"); age7: ("x1";"x2";"x3";"x4";"x5";"x6";"x7"); time1: `x1`x2`x3`x4`x5`x6`x7; time2: ("x1";"x2";"x3";"x4";"x5";"x6";"x7"); time3: ("x1";"x2";"x3";"x4";"x5";"x6";"x7"); time4: ("x1";"x2";"x3";"x4";"x5";"x6";"x7"); time5: ("x1";"x2";"x3";"x4";"x5";"x6";"x7"); time6: ("x1";"x2";"x3";"x4";"x5";"x6";"x7"); time7: ("x1";"x2";"x3";"x4";"x5";"x6";"x7"))
    

    Different cast/parsing dictionary needed only:

    // Float to Long is a cast but string to symbol is a parse 
    d:7 11h!(7h$;-11h$)