
Reading a file of lists of integers in Fortran


I would like to read a data file with a Fortran program, where each line is a list of integers.

Each line has a variable number of integers, separated by a given character (space, comma...).

Sample input:

1,7,3,2
2,8
12,44,13,11

I have a solution to split lines, which I find rather convoluted:

module split
    implicit none
contains
    function string_to_integers(str, sep) result(a)
        integer, allocatable :: a(:)
        integer :: i, j, k, n, m, p, r
        character(*) :: str
        character :: sep, c
        character(:), allocatable :: tmp

        !First pass: find number of items (m), and maximum length of an item (r)
        n = len_trim(str)
        m = 1
        j = 0
        r = 0
        do i = 1, n
            if(str(i:i) == sep) then
                m = m + 1
                r = max(r, j)
                j = 0
            else
                j = j + 1
            end if
        end do
        r = max(r, j)

        allocate(a(m))
        allocate(character(r) :: tmp)

        !Second pass: copy each item into temporary string (tmp),
        !read an integer from tmp, and write this integer in the output array (a)
        tmp(1:r) = " "
        j = 0
        k = 0
        do i = 1, n
            c = str(i:i)
            if(c == sep) then
                k = k + 1
                read(tmp, *) p
                a(k) = p
                tmp(1:r) = " "
                j = 0
            else
                j = j + 1
                tmp(j:j) = c
            end if
        end do
        k = k + 1
        read(tmp, *) p
        a(k) = p
        deallocate(tmp)
    end function
end module

My question:

Here is the current program:

program read_data
    use split
    implicit none
    integer :: q
    integer, allocatable :: a(:)
    character(80) :: line
    open(unit=10, file="input.txt", action="read", status="old", form="formatted")
    do
        read(10, "(A80)", iostat=q) line
        if(q /= 0) exit
        if(line(1:1) /= "#") then
            a = string_to_integers(line, ",")
            print *, ubound(a), a
        end if
    end do
    close(10)
end program

A comment about the question: usually I would do this in Python, where converting a line is as simple as a = [int(x) for x in line.split(",")] and reading a file is likewise almost trivial. I would then do the "real" computing in a Fortran DLL. However, I'd like to improve my Fortran skills on file I/O.


Solution

  • I don't claim the code below is the shortest possible, but it is much shorter than yours, and once you have it, you can reuse it. I don't completely agree with the claims that Fortran is bad at string processing: I do tokenization, recursive descent parsing and similar things just fine in Fortran, although it is easier in some other languages with richer libraries. Sometimes you can also use libraries written in other languages (especially C and C++) from Fortran.

    If the separator is always a comma, you can drop the replacement by a comma and shorten it even more.

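    function string_to_integers(str, sep) result(a)
        integer, allocatable :: a(:)
        character(*), intent(inout) :: str   ! modified in place: separators become commas
        character, intent(in) :: sep
        integer :: i, n_sep

        ! Count the separators and turn each one into a comma,
        ! which list-directed input accepts as a value separator.
        n_sep = 0
        do i = 1, len_trim(str)
            if (str(i:i) == sep) then
                n_sep = n_sep + 1
                str(i:i) = ','
            end if
        end do
        ! n_sep separators delimit n_sep + 1 integers.
        allocate(a(n_sep+1))
        read(str, *) a
    end function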
    

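    Potential for shortening: view str as an array of single characters using transfer (equivalence is not an option here, because a dummy argument cannot appear in an equivalence statement) and use count() inside the allocate statement to get the size of a.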
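
The shortening mentioned above could look roughly like the following sketch. The module name split_count and the function name string_to_integers_count are made up for this illustration. Unlike the function above, this version works on local copies so that the caller's string is left untouched; that is a choice of this sketch, not something the approach requires. It still assumes exactly one separator between numbers and a non-empty line.

```fortran
module split_count
    implicit none
contains
    ! Hypothetical variant: sizes the result with count() instead of a counting loop.
    function string_to_integers_count(str, sep) result(a)
        character(*), intent(in) :: str
        character, intent(in) :: sep
        integer, allocatable :: a(:)
        character :: chars(len_trim(str))     ! the string seen as single characters
        character(len_trim(str)) :: buf       ! scratch string for the internal read

        ! View the trimmed string as an array of single characters.
        chars = transfer(str(:len_trim(str)), chars)
        ! n separators delimit n + 1 integers.
        allocate(a(count(chars == sep) + 1))
        ! Turn every separator into a comma, then go back to a scalar string.
        where (chars == sep) chars = ','
        buf = transfer(chars, buf)
        read(buf, *) a
    end function
end module
```

In the main program, this would be called exactly like the original function, for example a = string_to_integers_count(line, ",").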

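    The code assumes that there is just one separator between each number and that there is no separator before the first one. If multiple separators are allowed between two numbers, you have to check whether the preceding character is also a separator, so that each run of separators is counted only once. The loop below runs backwards, so the preceding character has not been overwritten yet when it is tested, and it turns every separator into a blank, because list-directed input treats a run of blanks as a single separator:

        do i = len_trim(str), 2, -1
          if (str(i:i)==sep) then
            ! Count only the first separator of each run.
            if (str(i-1:i-1)/=sep) n_sep = n_sep + 1
            str(i:i) = ' '
          end if
        end do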
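
For input that may also start or end with a separator, a more defensive variant is to count the items themselves rather than the separators. The following is only a sketch under that assumption; the names split_tolerant and string_to_integers_tolerant are hypothetical. It treats blanks as separators too, since list-directed input would split on them anyway, and it returns a zero-size array for a line without numbers.

```fortran
module split_tolerant
    implicit none
contains
    ! Hypothetical variant: tolerates repeated, leading and trailing separators.
    function string_to_integers_tolerant(str, sep) result(a)
        character(*), intent(in) :: str
        character, intent(in) :: sep
        integer, allocatable :: a(:)
        character(len(str)) :: buf   ! local copy, so the caller's string is untouched
        integer :: i, n_items
        logical :: in_item

        buf = str
        n_items = 0
        in_item = .false.
        do i = 1, len_trim(str)
            if (str(i:i) == sep .or. str(i:i) == ' ') then
                ! Blank out the separator; list-directed input skips runs of blanks.
                buf(i:i) = ' '
                in_item = .false.
            else
                ! A non-separator at the start or right after a separator opens a new item.
                if (.not. in_item) n_items = n_items + 1
                in_item = .true.
            end if
        end do
        allocate(a(n_items))
        if (n_items > 0) read(buf, *) a
    end function
end module
```

Counting items instead of separators avoids the off-by-one cases that appear when a line begins or ends with a separator.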