lua lpeg

Can I create a gmatch pattern that returns a variadic number of values?


I need to iterate over some pairs of strings in a program that I am writing. Instead of putting the string pairs in a big table-of-tables, I am putting them all in a single string, because I think the end result is easier to read:

function two_column_data(data)
  return data:gmatch('%s*([^%s]+)%s+([^%s]+)%s*\n')
end

for a, b in two_column_data [[
  Hello  world
  Olá    hugomg
]] do
  print( a .. ", " .. b .. "!")
end

The output is what you would expect:

Hello, world!
Olá, hugomg!

However, as the name indicates, the two_column_data function only works if there are exactly two columns of data. How can I make it work with any number of columns?

for x in any_column_data [[
  qwe
  asd
]] do
  print(x)
end

for x,y,z in any_column_data [[
  qwe rty uio
  asd dfg hjk
]] do
  print(x,y,z)
end

I'm OK with using lpeg for this task if it's necessary.


Solution

  • Here is a version using lpeg's re module:

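    local re = require 're'
    local unpack = table.unpack or unpack  -- Lua 5.1 and 5.2+ compatibility

    function re_column_data(subj)
        -- Parse the whole subject into a list of records (each a list of fields).
        local records = re.compile([[
              record <- {| ({| [ %t]* field ([ %t]+ field)* |} (%nl / !.))* |}
              field <- escaped / nonescaped
              nonescaped <- { [^ %t"%nl]+ }
              escaped <- '"' {~ ([^"] / '""' -> '"')* ~} '"']], { t = '\t' }):match(subj)
        local i = 0
        -- Iterator: return the fields of the next record, or nothing when done.
        return function()
            i = i + 1
            local rec = records[i]
            if rec then
                return unpack(rec)
            end
        end
    end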
    

    It is basically a rework of the CSV sample from the re documentation, and it supports quoted fields for some nice use cases: values containing spaces, empty values (""), multi-line values, and so on.

    for a, b, c in re_column_data([[
        Hello  world "test
    test"
        Olá    "hug omg"
    ""]].."\tempty a") do
        print( a .. ", " .. b .. "! " .. (c or ''))
    end
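
If quoted fields are not needed, the same iterator shape can probably be had in plain Lua without lpeg. The following is only a minimal sketch under some assumptions: fields are separated by whitespace, each record sits on its own line, and blank lines are skipped. The name any_column_data is taken from the question; the helper variables are illustrative.

```lua
local unpack = table.unpack or unpack  -- Lua 5.1 and 5.2+ compatibility

-- Sketch: walk the input line by line and return each line's
-- whitespace-separated fields as multiple values.
-- Assumes no quoting; lines without any field are skipped.
function any_column_data(data)
  local lines = data:gmatch('[^\n]+')  -- stateful iterator over non-empty lines
  return function()
    for line in lines do
      local fields = {}
      for field in line:gmatch('%S+') do
        fields[#fields + 1] = field
      end
      if #fields > 0 then
        return unpack(fields)
      end
    end
    -- Falling off the end returns nothing, which stops the caller's loop.
  end
end

for x, y, z in any_column_data [[
  qwe rty uio
  asd dfg hjk
]] do
  print(x, y, z)
end
```

Because the first field of every record is a non-empty string, the generic for loop only stops once the input is exhausted. Unlike the re version, this sketch cannot represent empty values or values containing spaces.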