c++ regex split

C++: Parsing a log with split when one entry can span several lines


I have started learning C++, and my current project should extend my knowledge of working with files, splitting strings, and finally applying a regexp to a varchar string.

The problem:

I have a logfile which contains data like

<date> <time> <username> (<ip:port>) <uuid> - #<id> "<varchar text>"

e.g.:

10.03.2016 07:40:38: blacksheep (127.0.0.1:54444) #865 "(this can have text
over several lines 
without ending marker"
10.03.2016 07:40:38: blacksheep (127.0.0.1:54444) #865 "A new line, just one without \n"

So I started with the following, but I am now stuck on how to get the lines containing \n into the string. How can this be solved properly, without unnecessary steps such as splitting several times, and how can I define where a complete entry (even one with some \n inside it) stops?

With fin.ignore(80, '\n');, the \ns are ignored, but this implies that I end up with only one line... A short text before the # and a very large string after it :-|

#include <iostream>
#include <fstream>
#include <string>
#include <vector>

std::vector<std::string> split(std::string str, char separator) {
   std::vector<std::string> result;
   std::string::size_type token_offset = 0;
   std::string::size_type separator_offset = 0;
   while (separator_offset != std::string::npos) {
      separator_offset = str.find(separator, separator_offset);
      std::string::size_type token_length;
      if(separator_offset == std::string::npos) {
         token_length = separator_offset;
      } else {
         token_length = separator_offset - token_offset;
         separator_offset++;
      }
      std::string token = str.substr(token_offset, token_length);
      if (!token.empty()) {
         result.push_back(token);
      }
      token_offset = separator_offset;
   }
   return result;
}

int main(int argc, char **argv) {
   std::fstream fin("input.dat");
   std::string line;
   // Loop on the read itself: testing eof() before reading would
   // process one bogus record after the last successful read.
   while (std::getline(fin, line, ';')) {
      fin.ignore(80, '\n');
      std::vector<std::string> strs = split(line, ',');
      for (std::size_t i = 0; i < strs.size(); ++i) {
         std::cout << strs[i] << std::endl;
      }
   }
   fin.close();
   return 0;
}

Regards


Solution

  • There is no canned C++ library function for swallowing input like that. std::getline reads the next line of text, up to the next newline character (by default), and nothing more: it does not examine the input any further than that.

    I will suggest the following approach for you.

    Some obvious optimizations are possible here, of course, but this should be a good start.