I am studying APL and I am facing this problem: in real life cases, it is common to deal with non-rectangular datasets (imagine importing data from a SQL database where the "count(*) ... group by ..." gives different counts for different groups). I have setup an example where I import a time series of currencies exchange rates (both with ⎕CSV
and ⎕JSON
) and I end up with a table with lot of rows and three columns, the ISO code, the reference date and the rate. I find this table is useless in many cases, for instance if one wants to calculate daily return rates. I have manage to transform this table into a two entries table with Iso Codes in rows and dates in columns. I'd like to know if there are better ways to achieve this (or better) result.
(d h)←⎕csv 'dayliesjan.csv' '' (0 1 0 4 0 1) 1 ⍝ remove header row from data
This creates table d with rows that look like this:
├───┼──────────┼──────────┤
│FJD│2.451 │2024-01-10│
├───┼──────────┼──────────┤
│FJD│2.4514 │2024-01-29│
├───┼──────────┼──────────┤
│FJD│2.4523 │2024-01-12│
├───┼──────────┼──────────┤
Then
dates←∪d[;3] ⍝ create a vector of unique dates
isos←∪d[;1] ⍝ create a vector of unique isos
hdates←(⊂'---'),dates ⍝ prepare a header for the two entries table
PRint the two entries table
hdates⍪isos,(≢isos) (≢dates) ⍴ d[{⊃⍵}⌸d[;1 3];2]
┌───┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬
│---│2024-01-02│2024-01-03│2024-01-04│2024-01-05│2024-01-08│2024-01-09│2024-01-10│
├───┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼
│AFN│76.7668 │76.7351 │77.1061 │76.8808 │77.2931 │77.1918 │77.3973 │
├───┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼
│ALL│103.17 │103.2 │103.88 │104.3 │104.47 │104.29 │104.28 │
├───┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼
│DZD│147.147 │146.9074 │147.3182 │146.7768 │147.0278 │146.9367 │147.0292 │
├───┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼
│AOA│917.42 │914.384 │917.186 │914.427 │916.706 │916.444 │917.425 │
├───┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼
│XCD│2.9581 │2.9481 │2.9573 │2.9487 │2.9554 │2.9538 │2.9554 │
├───┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼
│ARS│886.1662 │885.2383 │888.5621 │886.6029 │889.3809 │890.4455 │891.496 │
├───┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼
│AMD│443.13 │441.54 │443.1 │441.75 │442.66 │442.4 │443.14 │
├───┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼
│AWG│1.9611 │1.9545 │1.9606 │1.9549 │1.9593 │1.9583 │1.9593 │
├───┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼
On such a table I could easily (so I think) calculate daily return rates.
Does this make sense? Is there a better way?
Really, all you want is the exchange rate for all combinations of isos and dates. This calls for an outer product:
isos∘.{⍺ ⍵}dates
┌────────────────┬────────────────┬────────────────┬────────────────┬─────────────
│┌───┬──────────┐│┌───┬──────────┐│┌───┬──────────┐│┌───┬──────────┐│┌───┬────────
││AFN│2024-01-02│││AFN│2024-01-03│││AFN│2024-01-04│││AFN│2024-01-05│││AFN│2024-01-
│└───┴──────────┘│└───┴──────────┘│└───┴──────────┘│└───┴──────────┘│└───┴────────
├────────────────┼────────────────┼────────────────┼────────────────┼─────────────
│┌───┬──────────┐│┌───┬──────────┐│┌───┬──────────┐│┌───┬──────────┐│┌───┬────────
││ALL│2024-01-02│││ALL│2024-01-03│││ALL│2024-01-04│││ALL│2024-01-05│││ALL│2024-01-
│└───┴──────────┘│└───┴──────────┘│└───┴──────────┘│└───┴──────────┘│└───┴────────
├────────────────┼────────────────┼────────────────┼────────────────┼─────────────
│┌───┬──────────┐│┌───┬──────────┐│┌───┬──────────┐│┌───┬──────────┐│┌───┬────────
││DZD│2024-01-02│││DZD│2024-01-03│││DZD│2024-01-04│││DZD│2024-01-05│││DZD│2024-01-
│└───┴──────────┘│└───┴──────────┘│└───┴──────────┘│└───┴──────────┘│└───┴────────
├────────────────┼────────────────┼────────────────┼────────────────┼─────────────
For each combination, we want the row number in d
:
isos∘.{d[;1 3]⍳⍺ ⍵}dates
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31 32 33 34 35
36 37 38 39 40 41 42
43 44 45 46 47 48 49
50 51 52 53 54 55 56
Now it is trivial to get the rate:
isos∘.{d[d[;1 3]⍳⍺ ⍵;2]}dates
76.7668 76.7351 77.1061 76.8808 77.2931 77.1918 77.3973
103.17 103.2 103.88 104.3 104.47 104.29 104.28
147.147 146.9074 147.3182 146.7768 147.0278 146.9367 147.0292
917.42 914.384 917.186 914.427 916.706 916.444 917.425
2.9581 2.9481 2.9573 2.9487 2.9554 2.9538 2.9554
886.1662 885.2383 888.5621 886.6029 889.3809 890.4455 891.496
443.13 441.54 443.1 441.75 442.66 442.4 443.14
1.9611 1.9545 1.9606 1.9549 1.9593 1.9583 1.9593
From here, you can add the headers if you want, but that'll just make usage of the table harder.
You can maybe make the code easier to follow through naming:
isos_dates←d[;1 3]
rates←d[;2]
isos∘.{rates[isos_dates⍳⍺ ⍵]}dates
76.7668 76.7351 77.1061 76.8808 77.2931 77.1918 77.3973
Or even creating a custom lookup function:
RateOn←{rates[isos_dates⍳⍺ ⍵]}
isos ∘.RateOn dates