csv f# type-providers query-expressions

Assign indexes to read CSV rows in F# query


What is the easiest way to add indexes to the rows of a CSV/TSV file that is read with CsvProvider and a query expression?

I have a tab-separated file containing thousands of orders that regularly needs to be read. The relevant orders are the most recent ones that have not yet been written to a certain database. The orders are not indexed and have no timestamps, so I have to cross-reference them to see which ones are missing from the database. The file is written to sequentially by a third party, so the newest orders are the lines furthest down in the file. I would like to index the rows so I can find the newest order not written to the DB and then select that row and every row after it, but so far I don't see a very simple way to do this in a single query expression.

let data = new CsvProvider<fileLocation>()
let allOrders = query {
    for row in data.Rows do
    select row (*perhaps something like a "select (index, row)" here?*)
    (*how do I increment the index in the expression?*)
}

How would I attach an index to each row like this?


Solution

  • You can use Seq.indexed to transform the sequence data.Rows into a sequence of tuples, where the first element is the zero-based index and the second element is the row:

    let allOrders = query {
        for index, row in Seq.indexed data.Rows do
        where (index < threshold)
        select row
    }
    

    For illustration of how Seq.indexed works:

    > let xs = ["a"; "b"; "c"; "d"]
    > Seq.indexed xs
    val it : seq<int * string> = seq [(0, "a"); (1, "b"); (2, "c"); (3, "d")]
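
To connect this to the use case in the question, here is a minimal sketch of selecting a row and everything after it. It runs on in-memory stand-ins rather than a real file: `rows`, `alreadyWritten`, and the order names are hypothetical placeholders for data.Rows and the database lookup, and it assumes the starting point is the first row that is missing from the database.

```fsharp
// Hypothetical stand-ins: in real code, `rows` would be data.Rows and
// `alreadyWritten` would come from the database lookup.
let rows = [ "order-1"; "order-2"; "order-3"; "order-4" ]
let alreadyWritten = set [ "order-1"; "order-2" ]

// Pair each row with its zero-based position in the file.
// Seq.cache guards against a source that can only be enumerated once.
let indexedRows = Seq.indexed rows |> Seq.cache

// Index of the first row that is not yet in the database, if any.
let firstMissing =
    indexedRows
    |> Seq.tryFind (fun (_, row) -> not (alreadyWritten.Contains row))
    |> Option.map fst

// Keep that row and every row after it.
let pending =
    match firstMissing with
    | Some start ->
        query {
            for index, row in indexedRows do
            where (index >= start)
            select row
        }
        |> Seq.toList
    | None -> []
```

With the sample values above, `pending` contains the last two orders. The lookup for the starting index is done outside the query expression so that the query itself stays a simple filter on the index.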