c++ c matlab csv file-io

C++ program for reading a CSV file of unknown size (filled only with floats) with a constant (but unknown) number of columns into an array


I was wondering if someone could give me a hand. I'm trying to build a program that reads in a big block of floats of unknown size from a CSV file. I already wrote this in MATLAB, but I want to compile and distribute it, so I'm moving to C++.

I'm just learning, and to start I'm trying to read in this

7,5,1989
2,4,2312

from a text file.

My code so far:

// Read in CSV
//
// Alex Byasse

#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <sstream>
#include <stdio.h>
#include <stdlib.h>

int main() {

    // First pass: count the lines and the columns of the first line.
    unsigned int number_of_lines = 0;
    FILE *infile = fopen("textread.csv", "r");
    if (!infile) {
        std::cerr << "Failed to open File\n";
        return 1;
    }
    int ch;
    int c = 0;
    int X = 0;          // number of columns, taken from the first line
    bool tmp = true;    // true while still on the first line
    while (EOF != (ch = getc(infile))) {
        if (',' == ch) {
            ++c;
        }
        if ('\n' == ch) {
            if (tmp) {
                X = c + 1;  // n commas separate n + 1 values
                tmp = false;
            }
            ++number_of_lines;
        }
    }
    fclose(infile);

    // Second pass: parse the values.
    std::ifstream file("textread.csv");

    if (!file) {
        std::cerr << "Failed to open File\n";
        return 1;
    }

    const int ROWS = number_of_lines;
    const int COLS = X;
    // A vector instead of int array[ROWS][COLS]: the sizes are only known at run time.
    std::vector<std::vector<int> > array(ROWS, std::vector<int>(COLS));
    std::string line;
    int col = 0;
    int row = 0;
    while (std::getline(file, line)) {
        if (row == ROWS) {
            std::cerr << "Went over length of ROWS " << ROWS << "\n";
            break;
        }
        std::istringstream iss(line);
        std::string result;
        while (std::getline(iss, result, ',')) {
            if (col == COLS) {
                std::cerr << "Went over number of columns " << COLS << "\n";
                break;
            }
            array[row][col] = atoi(result.c_str());
            std::cout << result << std::endl;
            std::cout << "column " << col << std::endl;
            std::cout << "row " << row << std::endl;
            col = col + 1;
        }
        row = row + 1;
        col = 0;
    }
    return 0;
}

The MATLAB code I use is:

fid = fopen(twoDM,'r');

s = textscan(fid,'%s','Delimiter','\n');
s = s{1};
s_e3t = s(strncmp('E3T',s,3));
s_e4q = s(strncmp('E4Q',s,3));
s_nd = s(strncmp('ND',s,2));

[~,cell_num_t,node1_t,node2_t,node3_t,mat] = strread([s_e3t{:}],'%s %u %u %u %u %u');
node4_t = node1_t;
e3t = [node1_t,node2_t,node3_t,node4_t];
[~,cell_num_q,node1_q,node2_q,node3_q,node_4_q,~] = strread([s_e4q{:}],'%s %u %u %u %u %u %u');
e4q = [node1_q,node2_q,node3_q,node_4_q];
[~,~,node_X,node_Y,~] = strread([s_nd{:}],'%s %u %f %f %f');

cell_id = [cell_num_t;cell_num_q];
[~,i] = sort(cell_id,1,'ascend');

cell_node = [e3t;e4q];
cell_node = cell_node(i,:);

Any help appreciated. Alex


Solution

  • I would, obviously, just use IOStreams. Reading a homogeneous array or arrays from a CSV file without having to bother with any quoting is fairly trivial:

    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>
    
    std::istream& comma(std::istream& in)
    {
        if ((in >> std::ws).peek() != std::char_traits<char>::to_int_type(',')) {
            in.setstate(std::ios_base::failbit);
        }
        return in.ignore();
    }
    
    int main()
    {
        std::vector<std::vector<double>> values;
        std::istringstream in;
        for (std::string line; std::getline(std::cin, line); )
        {
            in.clear();
            in.str(line);
            std::vector<double> tmp;
            for (double value; in >> value; in >> comma) {
                tmp.push_back(value);
            }
            values.push_back(tmp);
        }
    
        for (auto const& vec: values) {
            for (auto val: vec) {
                std::cout << val << ", ";
            }
            std::cout << "\n";
        }
    }
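
To see what the comma manipulator does on its own, here is a minimal sketch that wraps the inner loop in a helper. The name parse_row is hypothetical and not part of the answer above, and the sketch assumes unquoted numeric fields. Parsing simply stops at the first field that is not a number or the first separator that is not a comma.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Same manipulator as in the listing above: skip whitespace, require a
// comma, and consume it. If anything else follows, failbit is set.
std::istream& comma(std::istream& in)
{
    if ((in >> std::ws).peek() != std::char_traits<char>::to_int_type(',')) {
        in.setstate(std::ios_base::failbit);
    }
    return in.ignore();
}

// Hypothetical helper: parse one line of comma-separated numbers.
// Returns the values read before the first parse failure (or end of line).
std::vector<double> parse_row(const std::string& line)
{
    std::istringstream in(line);
    std::vector<double> row;
    for (double value; in >> value; in >> comma) {
        row.push_back(value);
    }
    return row;
}
```

For example, parse_row("7,5,1989") yields three values, while parse_row("1;2") yields only the first one, because the semicolon puts the stream into a failed state.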
    

    Given the simple structure of the file, the logic can actually be simplified: instead of reading the values individually, each line can be viewed as a sequence of values if the separators are skipped automatically. Since a comma won't be skipped automatically, the commas are replaced by spaces before the string stream for each line is created. The corresponding code becomes

    #include <algorithm>
    #include <fstream>
    #include <iostream>
    #include <iterator>
    #include <sstream>
    #include <string>
    #include <vector>
    
    int main()
    {
        std::vector<std::vector<double> > values;
        std::ifstream fin("textread.csv");
        for (std::string line; std::getline(fin, line); )
        {
            std::replace(line.begin(), line.end(), ',', ' ');
            std::istringstream in(line);
            values.push_back(
                std::vector<double>(std::istream_iterator<double>(in),
                                    std::istream_iterator<double>()));
        }
    
        for (std::vector<std::vector<double> >::const_iterator
                 it(values.begin()), end(values.end()); it != end; ++it) {
            std::copy(it->begin(), it->end(),
                      std::ostream_iterator<double>(std::cout, ", "));
            std::cout << "\n";
        }
    }
    

    Here is what happens:

    1. The destination values is defined as a vector of vectors of double. There isn't anything guaranteeing that the different rows are the same size but this is trivial to check once the file is read.
    2. A std::ifstream is defined and initialized with the file. It may be worth checking the file after construction to see if it could be opened for reading (if (!fin) { std::cout << "failed to open...\n"; }).
    3. The file is processed one line at a time. The lines are simply read using std::getline() into a std::string. When std::getline() fails, there is no further line to read and the conversion ends.
    4. Once the line is read, all commas are replaced by spaces.
    5. From the thus modified line a string stream for reading the line is constructed. The first version reused a std::istringstream that was declared outside the loop to avoid the cost of constructing a stream for every line. Since the stream goes into a failed state once a line is exhausted, it first needed to be in.clear()ed before its content was set with in.str(line).
    6. The individual values are iterated using a std::istream_iterator<double>, which just reads a value from the stream it is constructed with each time it advances. The iterator constructed from in is the start of the sequence and the default-constructed iterator is the end of the sequence.
    7. The sequence of values produced by the iterators is used to immediately construct a temporary std::vector<double> representing a row.
    8. The temporary vector is pushed to the end of the target array.

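    Everything after that simply prints the content of the produced matrix. The first listing does so with C++11 features (range-based for and variables with automatically deduced type); the second listing uses explicit iterators with std::copy() and std::ostream_iterator instead.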
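
Step 1 notes that checking for equal row sizes is trivial once the file is read. A minimal sketch of that check could look like the following. The names Matrix, read_csv and is_rectangular are hypothetical and not part of the answer; read_csv just wraps steps 3 to 8 so they can run on any input stream.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;

// Steps 3 to 8 from the list above, reading from an arbitrary stream.
Matrix read_csv(std::istream& in)
{
    Matrix values;
    for (std::string line; std::getline(in, line); ) {
        std::replace(line.begin(), line.end(), ',', ' ');
        std::istringstream row(line);
        values.push_back(
            std::vector<double>(std::istream_iterator<double>(row),
                                std::istream_iterator<double>()));
    }
    return values;
}

// True if every row has as many values as the first row.
// An empty matrix counts as rectangular.
bool is_rectangular(const Matrix& values)
{
    for (std::size_t i = 1; i < values.size(); ++i) {
        if (values[i].size() != values[0].size()) {
            return false;
        }
    }
    return true;
}
```

With a file, this would be used by constructing std::ifstream fin("textread.csv"), testing if (!fin) as suggested in step 2, and then passing fin to read_csv() and the result to is_rectangular(). A row that contains a non-numeric field ends up shorter than the others, so the same check also flags that case.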