sqldatabasecommand-lineshortcut

Run SQL query on CSV file contents from command-line?


I have a (syntactically-valid, not-huge) CSV file. I want to run an arbitrary SQL SELECT query on it; let's assume that it can be a complex query, with some inner queries, with self-joins, maybe with window functions etc.

Now, obviously, I could install a DBMS, run the server process, create a new DB, maybe create an appropriate table, load that CSV data into a table, then finally use a client for the DB to send the query and get the result.

But - that seems like quite a hassle, and I don't need the loaded DB table again for another query.

So, is there an easier way to do this, from a command-line/shell session, hopefully without so much overhead and waste of space?

Notes:


Solution

  • You can use sqlite for this:

    sqlite3 :memory: -cmd '.import -csv taxi.csv taxi' \
        'SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count'
    

    There are many wrappers that try to simplify this (e.g. csvsql --query, dsq or q), but you may find they have limitations of their own (e.g. q doesn't support CTEs, though sqlite obiously does).

    duckdb, as an analytic-focused alternative to sqlite, makes this use-case very simple and fast out of the box, while providing some flexibility too:

    duckdb -c "SELECT * FROM 'data_source.csv.gz'"