
How to adjust text inside a specific column of a CSV text file with a shell script?


I have a CSV source file like this one.

id;kat;kat_de;kat_en;kat_sub_de;kat_sub_en;name;strasse;plz;gemeinde;landkreis;stellplaetze;emob;linien;website;phone;betreiber;oeffnungszeiten;vbn;x_wgs84;y_wgs84;haltestelle
VW105044690;5;Kundencenter;Costumer Service Centres;retail,office,information,ticket;retail,office,information,ticket;Kunden-Servicecenter Vegesack BSAG Kundencenter;Vegesacker Bahnhofsplatz;28757;Bremen;Bremen;0;f;;https://bsag.de/de/tickets/kundencenter-servicestellen/bsag-kundencenter-vegesack.html https://bsag.de/de/tickets/kundencenter-servicestellen/bsag-kundencenter-vegesack.html;"";BSAG;"Mo-Fr 07:00-18:30; Sa 09:00-15:00; Su,PH off";f;8.628101750000006;53.16975189909493;

The 18th column contains the value of the OSM tag opening_hours. That value is typically in English.
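
A quick check of why "the 18th column" needs a CSV-aware tool: the value is wrapped in double quotes and itself contains semicolons, so a naive split on every ";", which is what cut does, returns only a fragment of it. A minimal sketch using the data row above:

```shell
#!/bin/sh
# The data row from the question (header omitted).
row='VW105044690;5;Kundencenter;Costumer Service Centres;retail,office,information,ticket;retail,office,information,ticket;Kunden-Servicecenter Vegesack BSAG Kundencenter;Vegesacker Bahnhofsplatz;28757;Bremen;Bremen;0;f;;https://bsag.de/de/tickets/kundencenter-servicestellen/bsag-kundencenter-vegesack.html https://bsag.de/de/tickets/kundencenter-servicestellen/bsag-kundencenter-vegesack.html;"";BSAG;"Mo-Fr 07:00-18:30; Sa 09:00-15:00; Su,PH off";f;8.628101750000006;53.16975189909493;'

# cut splits on every ";", including the ones inside the quoted value,
# so "field 18" comes back truncated after the first semicolon.
printf '%s\n' "$row" | cut -d';' -f18
# prints: "Mo-Fr 07:00-18:30
```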

I would like to translate the day-of-week abbreviations, plus the public-holiday and "off" keywords, from English to German:

English  German       Meaning
Mo       Mo           Monday
Tu       Di           Tuesday
We       Mi           Wednesday
Th       Do           Thursday
Fr       Fr           Friday
Sa       Sa           Saturday
Su       So           Sunday
PH       Feiertag     Public Holiday
off      geschlossen  closed
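
The mapping can be sketched as a GNU sed filter that works on a single opening_hours value. The function name hours_to_german is hypothetical, and \b (word boundary) is a GNU sed extension, so this assumes GNU sed. The word boundaries restrict matches to whole tokens: "off" is replaced, but a longer word such as "office" is left alone. It is not meant to be run on whole lines of the file, since it would also rewrite matching tokens in fields other than oeffnungszeiten.

```shell
#!/bin/sh
# Hypothetical helper: English -> German opening_hours tokens.
# Assumes GNU sed (\b is a GNU extension). Apply it to one value only,
# not to whole CSV lines.
hours_to_german() {
  sed -E \
    -e 's/\bTu\b/Di/g' \
    -e 's/\bWe\b/Mi/g' \
    -e 's/\bTh\b/Do/g' \
    -e 's/\bSu\b/So/g' \
    -e 's/\bPH\b/Feiertag/g' \
    -e 's/\boff\b/geschlossen/g'
}

printf '%s\n' 'Mo-Fr 07:00-18:30; Sa 09:00-15:00; Su,PH off' | hours_to_german
# prints: Mo-Fr 07:00-18:30; Sa 09:00-15:00; So,Feiertag geschlossen
```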

The target file would then look like this:

id;kat;kat_de;kat_en;kat_sub_de;kat_sub_en;name;strasse;plz;gemeinde;landkreis;stellplaetze;emob;linien;website;phone;betreiber;oeffnungszeiten;vbn;x_wgs84;y_wgs84;haltestelle
VW105044690;5;Kundencenter;Costumer Service Centres;retail,office,information,ticket;retail,office,information,ticket;Kunden-Servicecenter Vegesack BSAG Kundencenter;Vegesacker Bahnhofsplatz;28757;Bremen;Bremen;0;f;;https://bsag.de/de/tickets/kundencenter-servicestellen/bsag-kundencenter-vegesack.html https://bsag.de/de/tickets/kundencenter-servicestellen/bsag-kundencenter-vegesack.html;"";BSAG;"Mo-Fr 07:00-18:30; Sa 09:00-15:00; So,Feiertag geschlossen";f;8.628101750000006;53.16975189909493;

Any idea how to approach this with a GNU/Linux shell script?


Solution

  • You'll most likely need a tool that parses semicolon-separated (SCSV) files correctly, because the oeffnungszeiten value is quoted and itself contains semicolons. Here's an example with Miller (available for several OSs); the code isn't very subtle, but at least it applies the replacements only to the oeffnungszeiten field:

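    mlr --csv --fs ';' put '
        # Plain-string (non-regex) replacements, applied only to the
        # oeffnungszeiten field (all other fields pass through untouched).
        $oeffnungszeiten = gssub(gssub(gssub(gssub(gssub(gssub($oeffnungszeiten,
            "Tu", "Di"),
            "We", "Mi"),
            "Th", "Do"),
            "Su", "So"),
            "PH", "Feiertag"),
            "off", "geschlossen"
        );
    ' input_file.scsv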
    

    output:

    id;kat;kat_de;kat_en;kat_sub_de;kat_sub_en;name;strasse;plz;gemeinde;landkreis;stellplaetze;emob;linien;website;phone;betreiber;oeffnungszeiten;vbn;x_wgs84;y_wgs84;haltestelle
    VW105044690;5;Kundencenter;Costumer Service Centres;retail,office,information,ticket;retail,office,information,ticket;Kunden-Servicecenter Vegesack BSAG Kundencenter;Vegesacker Bahnhofsplatz;28757;Bremen;Bremen;0;f;;https://bsag.de/de/tickets/kundencenter-servicestellen/bsag-kundencenter-vegesack.html https://bsag.de/de/tickets/kundencenter-servicestellen/bsag-kundencenter-vegesack.html;;BSAG;"Mo-Fr 07:00-18:30; Sa 09:00-15:00; So,Feiertag geschlossen";f;8.628101750000006;53.16975189909493;
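
If Miller is not installed, the same approach can be sketched in plain POSIX awk (no GNU extensions): split each record on semicolons that sit outside double quotes, rewrite only field 18, and join the fields back with ";". This is a minimal sketch, not a full CSV parser. translate_hours is a hypothetical name, and the sketch assumes the first line is a header, no record spans several lines, and quoting follows the usual "..." convention. Because untouched fields are written back character for character, the empty phone value stays "" (Miller, as the output above shows, writes it as an empty field).

```shell
#!/bin/sh
# Hypothetical helper: translate the opening-hours column (field 18) of a
# semicolon-separated file using POSIX awk only.
# Assumptions: first line is a header, no record spans several lines,
# quoted fields use "..." (with "" for an embedded quote).
# Usage: translate_hours input_file.scsv > output_file.scsv
translate_hours() {
  awk -v col=18 '
    # Split a record on ";" but not on semicolons inside double quotes.
    # Quote characters are kept, so untouched fields are written back as-is.
    function split_scsv(line, f,    n, i, c, inq, cur) {
      split("", f)              # clear fields left over from the last record
      n = 0; cur = ""; inq = 0
      for (i = 1; i <= length(line); i++) {
        c = substr(line, i, 1)
        if (c == "\"") { inq = !inq; cur = cur c }
        else if (c == ";" && !inq) { f[++n] = cur; cur = "" }
        else cur = cur c
      }
      f[++n] = cur
      return n
    }
    FNR == 1 { print; next }    # pass the header row through unchanged
    {
      n = split_scsv($0, f)
      # Same plain substring replacements as the mlr version, on one field only.
      gsub(/Tu/, "Di", f[col]); gsub(/We/, "Mi", f[col])
      gsub(/Th/, "Do", f[col]); gsub(/Su/, "So", f[col])
      gsub(/PH/, "Feiertag", f[col]); gsub(/off/, "geschlossen", f[col])
      out = f[1]
      for (i = 2; i <= n; i++) out = out ";" f[i]
      print out
    }
  ' "$@"
}
```

Splitting by hand keeps the quotes inside each field, which is what lets a record be reassembled unchanged; a proper CSV library would instead unquote and re-quote values. Note that field 5 ("retail,office,...") contains "off" too, which is why the replacements must be limited to the one column rather than applied to whole lines.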