I am trying to create uniqueid
column for each child (as they appear more than once in the table) so that I can identify between children with same name (no child has the same dadname
, mumname
and childdob
)
q)group select dadname, mumname, childdob from peopletb
dadname mumname childdob |
----------------------------------------------| -----------------
simon jessica 2007.03.27| 0 1658 2292 18295
mike rachel 2007.02.06| 1 4293 10976 33439 53751 63077 81844 90244 98782 ..
james lucy 2005.04.09| 2 12756 21354
francis anne 2007.04.12| 3 4151 8252 159075 172735 175531 177199 206953 40..
francis anne 2007.04.13| 7 3255 7292 128021 143595 157839 168120 186578 21..
mike rachel 2007.02.05| 10 4292 10946 33339 56751 67077 82844 96244 99381 ..
and so on
The issue is, is that there are some slight discrepancies in the childdob
where it has been incremented by one day or the opposite it has gone back one day.
when grouping the data i want it to group childdob
+1 or -1 from each other so that the discrepancy is eliminated.
e.g.
mike rachel 2007.02.05| 10 4292 10946 33339 56751 67077 82844 96244 99381 ..
&
mike rachel 2007.02.06| 1 4293 10976 33439 53751 63077 81844 90244 98782 ..
would have the same uniqueid
as would
francis anne 2007.04.12| 3 4151 8252 159075 172735 175531 177199 206953 40..
&
francis anne 2007.04.13| 7 3255 7292 128021 143595 157839 168120 186578 21..
There's a few tricky edge cases when you don't know the true dob, and if you have datapoints both above and below, but something like this might be a starting point:
peopletb:([]dadname:100?`simon`mike`james;mumname:100?`jessica`rachel`lucy;childdob:100?2007.01.03 2007.01.02 2007.01.04 2007.06.01 2007.08.10 2007.08.09 2007.10.23 2007.10.24)
/default grouping
q)select idx:i by dadname,mumname,childdob from `childdob xasc peopletb
dadname mumname childdob | idx
--------------------------| --------------
james jessica 2007.01.02| ,9
james jessica 2007.01.04| ,26
james jessica 2007.08.09| 56 58 59
james jessica 2007.08.10| 75 77
james jessica 2007.10.24| ,96
james lucy 2007.01.02| ,3
james lucy 2007.01.03| 13 21 24
...
/step 1 - fudge the date
q)fudge:update childdob:{(y;x)2>y-x}\[childdob]by dadname,mumname from `childdob xasc peopletb;
/step 2 - group the fudged dates
q)select idx:i by dadname,mumname,childdob from fudge
dadname mumname childdob | idx
--------------------------| --------------------------
james jessica 2007.01.02| ,9
james jessica 2007.01.04| ,26
james jessica 2007.08.09| 56 58 59 75 77
james jessica 2007.10.24| ,96
james lucy 2007.01.02| 3 13 21 24
james lucy 2007.01.04| ,34
james lucy 2007.08.09| 50 51 55 60 65 66 70 76 78
...
Notice how james/jessica combined 2007.08.09/10 into 09 but didn't combine 2007.01.02/04 (ambiguous, doesn't satisfy 2>y-x). However you could increase the fudge to 3>y-x
for more leniency