haskellvectorinputintegerconstruction

Haskell fast construction of vector from input of space separated ints


If I expect a string with space separated ints from user input (number of expected int's known), I can transfer it in haskell Data.Vector like this:

main = do
        n <- readLn -- this is number of ints        
        l' <- getLine
        let a = (read :: String -> Int) <$> words l' 
        return $ V.fromList a

But I want to do this as fast as possible, w/o intermediate list, what is a fastest way to construct Vector in this case?

ps. I am think about looping getChar + replicateM but not shure will it be faster or not.


Solution

  • The following should be pretty fast.

    {-# LANGUAGE BangPatterns #-}
    
    import Data.Char
    import qualified Data.Vector.Unboxed as V
    import qualified Data.ByteString.Lazy as BL
    
    numbers :: Int -> BL.ByteString -> V.Vector Int
    numbers n = V.unfoldrExactN n $ go 0
      where go !acc bs = case BL.uncons bs of
              Just (c, bs') | c' >= zero && c' <= nine -> go (acc*10+c'-zero) bs'
                            | otherwise -> (acc, bs')
                where c' = fromIntegral c
                      zero = ord '0'
                      nine = ord '9'
              Nothing -> (acc, BL.empty)
    
    main = do
      n <- readLn
      inp <- BL.getContents
      print $ V.sum (numbers n inp)
    

    Note that this version does no input validation. It just parses runs of digits separated by single non-digit delimiters, and it'll misread repeated delimiters as additional zero values.

    Compiled with GHC 8.10.4 on my machine, it processes about half a gigabyte per second and outperforms a similar GCC-compiled C version by about 10%, which is a little suspect, but I don't see anything obviously wrong with my C version:

    #include <stdio.h>
    
    long v[100000000];
    
    int
    main()
    {
            long n;
            scanf("%ld\n", &n);
            for (int i = 0; i < n; ++i) {
                    long acc = 0;
                    int c;
                    while ((c=getchar())>='0')
                            acc = acc*10+c-'0';
                    v[i] = acc;
            }
            long total = 0;
            for (int i = 0; i < n; ++i) {
                    total += v[i];
            }
            printf("%ld\n",total);
            return 0;
    }