Reading first row from a csv file with pipes-csv


I am reading a CSV file with the pipes-csv library. I want to read the first line now and the rest later. Unfortunately, after Pipes.Prelude.head returns, the pipe seems to be closed somehow. Is there a way to read the header row of the CSV first and the remaining rows later?

import qualified Data.Vector as V
import Pipes
import qualified Pipes.Prelude as P
import qualified System.IO as IO
import qualified Pipes.ByteString as PB
import qualified Data.Text as Text
import qualified Pipes.Csv as PCsv
import Control.Monad (forever)

showPipe :: Proxy () (Either String (V.Vector Text.Text)) () String IO b
showPipe = forever $ do
    x <- await  -- element type is already fixed by showPipe's signature
    yield $ show x


main :: IO ()
main = do
  IO.withFile "./test.csv"
              IO.ReadMode
              (\handle -> do
                  let producer = (PCsv.decode PCsv.NoHeader (PB.fromHandle handle))
                  headers <- P.head producer
                  putStrLn "Header"
                  putStrLn $ show headers
                  putStrLn $ "Rows"
                  runEffect ( producer>->
                              (showPipe) >->
                              P.stdoutLn)
               )

If we do not read the header first, we can read whole csv without any problem:

main :: IO ()
main = do
  IO.withFile "./test.csv"
              IO.ReadMode
              (\handle -> do
                  let producer = (PCsv.decode PCsv.NoHeader (PB.fromHandle handle))
                  putStrLn $ "Rows"
                  runEffect ( producer>->
                              (showPipe) >->
                              P.stdoutLn)
               )

Solution

  • Pipes.Csv has its own support for headers, but I think this question is really about a more sophisticated use of Pipes.await or Pipes.next. First, next:

    >>> :t Pipes.next 
    Pipes.next :: Monad m => Producer a m r -> m (Either r (a, Producer a m r))
    

    next is the basic way of inspecting a producer. It is a bit like pattern matching on a list. With a list the two possibilities are [] and x:xs; here they are Left () and Right (headers, rows). The latter pair is what you are looking for. Of course, an action (here in IO) must be run to get one's hands on it:

    main :: IO ()
    main = do
      handle <- IO.openFile  "./test.csv" IO.ReadMode
      let producer :: Producer (V.Vector Text.Text) IO ()
          producer = PCsv.decode PCsv.NoHeader (PB.fromHandle handle)  >-> P.concat
      e <- next producer
      case e of
        Left () -> putStrLn "No lines!"
        Right (headers, rows) -> do
          putStrLn "Header"
          print headers
          putStrLn $ "Rows"
          runEffect ( rows >-> P.print)
      IO.hClose handle
    

    Since the Either values are a distraction here, I eliminate the Left values (the lines that don't parse) with P.concat.
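
    The same two steps can be tried without a CSV file. The following is only a sketch: decoded is a hypothetical in-memory stand-in for the output of PCsv.decode, and splitHeader is a helper name introduced here for illustration, not part of pipes or pipes-csv.

    ```haskell
    import Pipes
    import qualified Pipes.Prelude as P

    -- Hypothetical stand-in for decoded CSV rows:
    -- Right is a parsed row, Left is a line that failed to parse.
    decoded :: Producer (Either String [String]) IO ()
    decoded = each
      [ Right ["name", "grade"]
      , Left "parse error"
      , Right ["alice", "90"]
      ]

    -- Drop the Left values with P.concat, then use next to split off
    -- the first remaining row. Nothing means there were no rows at all.
    splitHeader :: Monad m
                => Producer (Either String a) m ()
                -> m (Maybe (a, Producer a m ()))
    splitHeader p = do
      e <- next (p >-> P.concat)
      return $ case e of
        Left ()           -> Nothing
        Right (hdr, rest) -> Just (hdr, rest)

    main :: IO ()
    main = do
      r <- splitHeader decoded
      case r of
        Nothing -> putStrLn "No lines!"
        Just (hdr, rest) -> do
          print hdr                 -- the header row
          rows <- P.toListM rest    -- drain the remaining producer
          print rows
    ```

    Here rest is an ordinary Producer, so it can be drained with P.toListM as shown or connected to further pipes with >->.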

    next does not act inside a pipeline, but directly on the Producer, which it treats as a sort of "effectful list" with a final return value at the end. The particular effect we got above can of course be achieved with await, which acts inside a pipeline. I can use it to intercept the first item that comes along in a pipeline, do some IO based on it, and then forward the remaining elements:

    main :: IO ()
    main = do
      handle <- IO.openFile "./test.csv" IO.ReadMode
      let producer :: Producer (V.Vector Text.Text) IO ()
          producer = PCsv.decode PCsv.NoHeader (PB.fromHandle handle)  >-> P.concat
          handleHeader :: Pipe (V.Vector Text.Text) (V.Vector Text.Text) IO ()
          handleHeader = do
            headers <- await  -- intercept first value
            liftIO $ do       -- use it for IO
              putStrLn "Header"
              print headers
              putStrLn $ "Rows"
            cat               -- pass along all later values
      runEffect (producer >-> handleHeader >-> P.print)
      IO.hClose handle
    

    The only difference is that if the producer is empty, this version cannot report it, as the previous program does with "No lines!": await never receives a value, so the pipeline simply ends without running the header code.
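
    To see that behaviour in isolation, here is a small sketch that runs the same await-then-cat pattern over plain lists. The names withFirst and runWithFirst are made up for this example, and the IORef is only there so the result can be inspected.

    ```haskell
    import Pipes
    import qualified Pipes.Prelude as P
    import Data.IORef (newIORef, readIORef, writeIORef)

    -- Intercept the first element, hand it to an action,
    -- then forward every later element unchanged.
    withFirst :: Monad m => (a -> m ()) -> Pipe a a m r
    withFirst act = do
      x <- await
      lift (act x)
      cat

    -- Run the pipe over a list and return the intercepted first element
    -- (if the action ever ran) together with the forwarded elements.
    runWithFirst :: [Int] -> IO (Maybe Int, [Int])
    runWithFirst xs = do
      ref  <- newIORef Nothing
      rest <- P.toListM (each xs >-> withFirst (writeIORef ref . Just))
      first <- readIORef ref
      return (first, rest)

    main :: IO ()
    main = do
      runWithFirst [1, 2, 3] >>= print   -- (Just 1,[2,3])
      runWithFirst []        >>= print   -- (Nothing,[])
    ```

    With the empty list, the action passed to withFirst is never called, which is why a "No lines!" message has to come from next rather than from inside the pipe.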

    Note, by the way, that showPipe can be defined as P.map show, or simply as P.show (with the specialized type signature you gave it).
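
    As a quick check of that remark, the sketch below defines the pipe both ways and compares their output on a couple of sample values. The sample rows are invented, and lists of String stand in for the Vector of Text from the question.

    ```haskell
    import Pipes
    import qualified Pipes.Prelude as P

    -- showPipe written with the ready-made pipe from Pipes.Prelude.
    showPipe :: Show a => Pipe a String IO r
    showPipe = P.show

    -- The same pipe written by mapping show over each element.
    showPipe' :: Show a => Pipe a String IO r
    showPipe' = P.map show

    main :: IO ()
    main = do
      let rows = [Right ["a", "b"], Left "bad row"] :: [Either String [String]]
      xs <- P.toListM (each rows >-> showPipe)
      ys <- P.toListM (each rows >-> showPipe')
      mapM_ putStrLn xs
      print (xs == ys)   -- both definitions produce the same strings
    ```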