Based on the provided example, we can get the length of each line:
import Conduit
import Data.Text (Text, pack)
import Text.Regex.TDFA ((=~), getAllTextMatches)
import Control.Monad.IO.Class (liftIO)
wc :: IO ()
wc = runResourceT
   $ runConduit
   $ sourceFile "input.txt"
  .| decodeUtf8C
  .| peekForeverE (lineC lengthCE >>= liftIO . print)
However, how would I get all matches of a regex and, in the end, write them to a file?
regex :: IO ()
regex = runResourceT
      $ runConduit
      $ sourceFile "input.txt"
     .| decodeUtf8C
     .| do
          line <- mapCE (\l -> getAllTextMatches (l =~ "^foo") :: [Text])
          liftIO $ print $ line
Update: I figured out there's a built-in lines function, but is there a way to print a line and pass it along without consuming it?
grep :: IO ()
grep = runResourceT
     $ runConduit
     $ yield "foo\ndoo"
    .| decodeUtf8C
    .| Data.Conduit.Text.lines
    .| mapC (\a -> a =~ ("[fd]oo" :: Text))
    .| mapM_C (liftIO . (print :: Text -> IO ()))
    .| encodeUtf8C
    .| stdoutC
The above does print each line, but stdoutC never receives any input:
ghci> grep
"foo"
"doo"
Update 2: Figured out how to print in a pipeline
grep :: IO ()
grep = runResourceT
     $ runConduit
     $ yieldMany ["foo\ndoo", "\nduh"]
    .| decodeUtf8C
    .| Data.Conduit.Text.lines
    .| mapC (\a -> a =~ ("[fd]oo" :: Text) :: Text)
    .| log1
    .| unlinesC
    .| encodeUtf8C
    .| stdoutC
But why does the order of await matter?
log1 :: ConduitT Text Text (ResourceT IO) ()
log1 = do
  Just l <- await -- <- has to be first
  liftIO $ print l
  yield l
It's not that clear from your question what you're trying to do, but if you are trying to copy all matching lines from "input.txt" to "output.txt", kind of like the grep command-line utility, then you probably want a conduit that looks something like this:
sourceFile "input.txt" .| decodeUtf8C .| linesUnboundedC
  .| filterC (=~ ("[fd]oo" :: Text))
  .| unlinesC .| encodeUtf8C .| sinkFile "output.txt"
Note that linesUnboundedC is a function in the "conduit" package that's equivalent to the deprecated lines function from "conduit-extra". Also, using filterC here is probably more natural than your mapC for filtering matching lines, rather than generating empty matches.
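To make the difference between the two combinators concrete, here is a small pure sketch. The names pat, sampleLines, kept, and mapped are made up for illustration, and it assumes a regex-tdfa version that provides the Text instances (as the code in the question already relies on).

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit
import Data.Text (Text)
import Text.Regex.TDFA ((=~))

-- Hypothetical names, used only for this illustration.
pat :: Text
pat = "[fd]oo"

sampleLines :: [Text]
sampleLines = ["foo", "bar", "doo"]

-- filterC uses (=~) at result type Bool, so non-matching lines are dropped.
kept :: [Text]
kept = runConduitPure $ yieldMany sampleLines .| filterC (=~ pat) .| sinkList
-- ["foo","doo"]

-- mapC uses (=~) at result type Text, which yields the matched text,
-- or the empty string when a line does not match.
mapped :: [Text]
mapped = runConduitPure
       $ yieldMany sampleLines
      .| mapC (\l -> l =~ pat :: Text)
      .| sinkList
-- ["foo","","doo"]
```

The empty string in mapped is what shows up as a blank line in the output when mapC is used for filtering.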
Operating on the text file:
A famous linguist once said
that of all the phrases in the English language,
of all the endless combinations of words in all of history, that
"cellar door"
is the most beautiful.
That's some food for thought.
this conduit will copy the two matching lines to the output:
"cellar door"
That's some food for thought.
If you want to write the matching lines to both standard output and output.txt simultaneously, the conduit-friendly method is probably to end your conduit with a sequenceSinks component. (The void call here is needed to get the return type right.)
import Control.Monad (void)
... .| void (sequenceSinks [stdoutC, sinkFile "output.txt"])
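As a rough illustration of why void is needed, the following pure sketch feeds one stream to two sinks. The names stats and discarded are hypothetical; sumC and lengthC merely stand in for stdoutC and sinkFile so the example runs without any I/O.

```haskell
import Conduit
import Control.Monad (void)

-- sequenceSinks feeds every input to each sink and returns the list of
-- their results, so the pipeline result is a list rather than ().
stats :: [Int]
stats = runConduitPure
      $ yieldMany [1 .. 5 :: Int]
     .| sequenceSinks [sumC, lengthC]
-- [15,5]

-- Wrapping the combined sink in void discards that list, which gives the
-- () result that a pipeline with a () return type expects.
discarded :: ()
discarded = runConduitPure
          $ yieldMany [1 .. 5 :: Int]
         .| void (sequenceSinks [sumC, lengthC :: ConduitT Int Void Identity Int])
```

With stdoutC and sinkFile the results are both (), so without void the pipeline would return [(), ()] instead of ().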
If you prefer a log conduit that you can insert in the middle to write a copy to stdout, then the following ought to work:
log1 :: (MonadIO m) => ConduitT Text Text m ()
log1 = passthroughSink (unlinesC .| encodeUtf8C .| stdoutC) pure
or, if you're okay with having the Haskell quoted representations printed (i.e., surrounded by quotation marks with character escaping), then:
log2 :: (MonadIO m, Show a) => ConduitT a a m ()
log2 = passthroughSink printC pure
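To see what passthroughSink does with the stream, here is a minimal sketch that captures the copy in an IORef instead of printing it. The name tee and the use of an IORef are assumptions made for this example, not part of the conduits above.

```haskell
import Conduit
import Data.IORef (newIORef, readIORef, writeIORef)

-- Hypothetical helper: runs a pipeline with passthroughSink in the middle
-- and returns both the copy seen by the inner sink and the final output.
tee :: IO ([Int], [Int])
tee = do
  ref <- newIORef []
  downstream <- runConduit
     $ yieldMany [1 .. 5 :: Int]
    -- every element goes to the inner sinkList and is also passed along;
    -- when the inner sink finishes, its result is handed to writeIORef
    .| passthroughSink sinkList (writeIORef ref)
    .| mapC (* 10)
    .| sinkList
  copy <- readIORef ref
  pure (copy, downstream)
-- tee returns ([1,2,3,4,5],[10,20,30,40,50])
```

In log1 and log2 the inner sink returns (), which is why pure is enough as the second argument there.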
Some code to play around with:
{-# LANGUAGE OverloadedStrings #-}
import Conduit
import Control.Monad (void)
import Data.ByteString (ByteString)
import Data.Text (Text)
import Text.Regex.TDFA

c1, c2, c3, c4 :: ConduitT () Void (ResourceT IO) ()

-- copy matching lines from input.txt to output.txt
c1 = sourceFile "input.txt" .| decodeUtf8C .| linesUnboundedC
  .| filterC (=~ ("[fd]oo" :: Text))
  .| unlinesC .| encodeUtf8C .| sinkFile "output.txt"

-- copy final stream to both output.txt and stdout
c2 = sourceFile "input.txt" .| decodeUtf8C .| linesUnboundedC
  .| filterC (=~ ("[fd]oo" :: Text))
  .| unlinesC .| encodeUtf8C
  .| void (sequenceSinks [stdoutC, sinkFile "output.txt"])

-- log Text to stdout in the middle of a conduit
c3 = sourceFile "input.txt" .| decodeUtf8C .| linesUnboundedC
  .| filterC (=~ ("[fd]oo" :: Text)) .| log1
  .| unlinesC .| encodeUtf8C .| sinkFile "output.txt"
  where log1 :: (MonadIO m) => ConduitT Text Text m ()
        log1 = passthroughSink (unlinesC .| encodeUtf8C .| stdoutC) pure

-- log Haskell representations of stream in middle of a conduit
c4 = sourceFile "input.txt" .| decodeUtf8C .| linesUnboundedC
  .| filterC (=~ ("[fd]oo" :: Text)) .| log2
  .| unlinesC .| encodeUtf8C .| sinkFile "output.txt"
  where log2 :: (MonadIO m, Show a) => ConduitT a a m ()
        log2 = passthroughSink printC pure

main :: IO ()
main = runResourceT $ runConduit $ c4 -- pick your conduit here