I am using miller , to work with .csv files.
There is an issue though that I dont manage how to get around.
So I have 2 .csv files.
product-feeder.csv
name-manufacturer,id-product,name-product,id-manufacturer,id-supplier,image-posted
MANUFACTURER ONE,0,PRODUCT A,0,0,0
MANUFACTURER TWO,0,PRODUCT B,0,0,0
MANUFACTURER ONE,0,UNMATCHED PRODUCT ONE,0,0,0
UNKOWN MANUFACTURER,0,UNMATCHED PRODUCT TWO,0,0,0
and
manufacturer-feeder.csv
id-manufacturer,name-manufacturer,link-rewrite,image-posted
114,MANUFACTURER ONE,manu-one,1
116,MANUFACTURER TWO,manu-two,1
I just want to change the values on the column of product-feeder.csv , ${id-manufacturer} , and place them according to what it matches on manufacturer-feeder.csv , the ${name-manufacturer} of product-feeder.csv that ${id-manufacturer} having as a result
name-manufacturer,id-product,name-product,id-manufacturer,id-supplier,image-posted
MANUFACTURER ONE,0,PRODUCT A,114,0,0
MANUFACTURER TWO,0,PRODUCT B,116,0,0
MANUFACTURER ONE,0,UNMATCHED PRODUCT ONE,114,0,0
UNKOWN MANUFACTURER,0,UNMATCHED PRODUCT TWO,0,0,0
I think in the Spreadsheet world this function would be called VLOOKUP
Do you know how to tackle this with miller ?
A solution is to use the join verb
Running
mlr --csv join --ul \
-j name-manufacturer \
--rp "r_" \
-f product-feeder.csv \
then unsparsify \
then put '${id-manufacturer}=${r_id-manufacturer}' \
then cut -x -r -f "^r_.+" \
manufacturer-feeder.csv
+---------------------+------------+-----------------------+-----------------+-------------+--------------+
| name-manufacturer | id-product | name-product | id-manufacturer | id-supplier | image-posted |
+---------------------+------------+-----------------------+-----------------+-------------+--------------+
| MANUFACTURER ONE | 0 | PRODUCT A | 114 | 0 | 0 |
| MANUFACTURER ONE | 0 | UNMATCHED PRODUCT ONE | 114 | 0 | 0 |
| MANUFACTURER TWO | 0 | PRODUCT B | 116 | 0 | 0 |
| UNKOWN MANUFACTURER | 0 | UNMATCHED PRODUCT TWO | | 0 | 0 |
+---------------------+------------+-----------------------+-----------------+-------------+--------------+
Explanation:
--ul) is performed on name-manufacturer, keeping allmanufacturer-feeder.csv.product-feeder.csv are prefixed with r_ to avoidunsparsify normalizes records after the join.r_id-manufacturer is copied into id-manufacturer.r_ fields are removed at the end.