csv, lua, element, named

Reading a delimited text file into a Lua table


I have a delimited text file (CSV or TSV), something like this:

"col1","col2","col3"
"1","ab","6.5"
"4","df","9.7"

I would like to read it into a Lua table with named fields (like Python's list of dictionaries), something like this:

{
  {col1 = '1', col2 = 'ab', col3 = '6.5'},
  {col1 = '4', col2 = 'df', col3 = '9.7'}
}

In the end I should be able to access the data in the following way:

t[1].col1
t[1].col2
or
t[2].col3

I cannot get my head around Lua's syntax for this. Unfortunately, the various CSV modules are not usable in my case because the application will not support them, so it has to be a plain Lua solution. Here is my attempt, which does not work:

    local function split(str, sep)
       local t={}
       for s in string.gmatch(str, "([^"..sep.."]+)") do
          table.insert(t,s)
       end
       return t
    end

    local t_line = {}
    local t_row = {}
    local t_header = {}
    local t_target = {}
    local i = 1
    for line in io.lines('some_file.txt') do
        t_line = split(line, ',')
        for j=1, #t_line do
            if i == 1 then -- first line of the file
                -- read header into table
                table.insert(t_header, t_line[j])
            else
                -- this is supposed to say that t_header[j] is the key
                -- and t_line[j] is its value
                table.insert(t_row, [t_header[j]] = t_line[j])
            end
        end
        -- here I add table as a row into main table
        if i > 1 then
            table.insert(t_target, t_row)
        end
        i = i + 1
    end

This part does not work:

table.insert(t_row, [t_header[j]] = t_line[j])

Execution of the script throws an error:

lua: ./example_10.lua:28: unexpected symbol near '['

I tried removing the square brackets, but then it complains about the equals sign. This version of the line does work:

table.insert(t_row, t_line[j])

but then I can only access an element by index, not by key name. How do I add a value to the table under a key named after the column?


Solution

  • table.insert needs a value to insert into the table, but [t_header[j]] = t_line[j] is not a value; this looks like a misapplication of table constructor syntax. In any case it is a syntax error: field syntax of this kind is only valid inside a table constructor. It also resembles an assignment, which might confuse someone coming from C, where an assignment is an expression that evaluates to the assigned value; in Lua, assignment is a statement and cannot appear as a function argument.

    The OP's code can most simply be fixed by changing:

    table.insert(t_row, [t_header[j]] = t_line[j])
    

    to:

    t_row[t_header[j]] = t_line[j]
    

    This sets the fields of t_row to the right values before the row is inserted into the t_target table. There is no need for table.insert here. To complete the fix, the line local t_row = {} would also need to be moved inside the outer loop so that a new row table is created for each line; otherwise every entry of t_target would refer to the same table.
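
    The difference between the two forms can be seen in a minimal sketch. The values are illustrative only and the variable names simply mirror the question's; by_index is a hypothetical name used for the comparison:

    ```lua
    -- Illustrative values only; the names mirror the question's variables.
    local t_header = { "col1", "col2", "col3" }
    local t_line = { "1", "ab", "6.5" }

    -- table.insert appends to the array part, so the values get integer keys.
    local by_index = {}
    for j = 1, #t_line do
       table.insert(by_index, t_line[j])
    end
    print(by_index[2], by_index.col2)   --> ab      nil

    -- Indexed assignment stores each value under the key named by the header.
    local t_row = {}
    for j = 1, #t_line do
       t_row[t_header[j]] = t_line[j]
    end
    print(t_row.col2, t_row[2])         --> ab      nil
    ```

    Since t_header[j] is a string, t_row[t_header[j]] and t_row.col2 refer to the same field once the header is "col2".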

    Still, the design of the posted code seems a little more complicated than it needs to be. Here is an adjusted version that uses the split function posted by the OP. The parse_csv function parses an input file into a table. The iterator returned by io.lines is saved in lines and called once to get the headers, then called in a loop to get the remaining lines of the file.

    I have added a simple csv_dump function to print the contents of the parsed csv table.

    function parse_csv(f)
       local parsed_csv = {}
       local lines = io.lines(f)
       local labels = split(lines(), ",")
       for line in lines do
          local t_line = split(line, ",")
          local t_row = {}
          for k, v in ipairs(t_line) do
             t_row[labels[k]] = v
          end
          table.insert(parsed_csv, t_row)
       end
       return parsed_csv
    end
    
    function csv_dump(t)
       for i, row in ipairs(t) do
          io.write(i)
          io.write(" { ")
          for k, v in pairs(row) do
             io.write(k, "=", v, " ")
          end
          print("}")
       end
    end
    

    Sample usage:

    > parsed = parse_csv('csv_test.txt')
    > csv_dump(parsed)
    1 { "col3"="6.5" "col2"="ab" "col1"="1" }
    2 { "col3"="9.7" "col2"="df" "col1"="4" }
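
    In the sample output the keys and values still carry their double quotes, because split only cuts on the separator; with those keys, t[1].col1 from the question would be nil. Below is a minimal sketch of one way to handle that, assuming no field contains an embedded comma or an escaped quote and that the input has at least a header line. The names strip_quotes and parse_csv_lines are hypothetical helpers, not part of the code above, and the sketch reads from an in-memory string rather than a file so that it runs on its own:

    ```lua
    -- split as posted in the question
    local function split(str, sep)
       local t = {}
       for s in string.gmatch(str, "([^" .. sep .. "]+)") do
          table.insert(t, s)
       end
       return t
    end

    -- Hypothetical helper: drop one pair of surrounding double quotes, if present.
    local function strip_quotes(s)
       return (s:gsub('^"(.*)"$', "%1"))
    end

    -- Hypothetical variant of parse_csv that takes any line iterator
    -- (for a file, pass io.lines(filename)).
    local function parse_csv_lines(lines)
       local parsed = {}
       local labels = split(lines(), ",")
       for k, label in ipairs(labels) do
          labels[k] = strip_quotes(label)
       end
       for line in lines do
          local row = {}
          for k, v in ipairs(split(line, ",")) do
             row[labels[k]] = strip_quotes(v)
          end
          table.insert(parsed, row)
       end
       return parsed
    end

    local data = '"col1","col2","col3"\n"1","ab","6.5"\n"4","df","9.7"\n'
    local t = parse_csv_lines(data:gmatch("[^\n]+"))
    print(t[1].col1, t[1].col2, t[2].col3)   --> 1       ab      9.7
    ```

    The values remain strings; tonumber can be applied where numbers are needed. Note also that the pattern used by split matches runs of one or more non-separator characters, so an empty field (as in a,,c) is skipped and the following values shift to the wrong column.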