I have just started learning C++, and my current project is meant to extend my knowledge of working with files, splitting strings, and finally running a regexp on a varchar string.
The problem:
I have a logfile which contains data like
<date> <time> <username> (<ip:port>) <uuid> - #<id> "<varchar text>"
e.g.:
10.03.2016 07:40:38: blacksheep (127.0.0.1:54444) #865 "(this can have text
over several lines
without ending marker"
10.03.2016 07:40:38: blacksheep (127.0.0.1:54444) #865 "A new line, just one without \n"
So I am starting with the following, but I am stuck on how to get the lines containing \n into the string. How can this be solved the right way, without unnecessary steps like splitting several times, and how can I define where a complete line (even if it has some \n within it) stops?
With fin.ignore(80, '\n'); the \n characters are ignored, but this implies that I will end up with only one line: a short text before # and a very large string after it :-|
#include <iostream>
#include <fstream>
#include <string>
#include <vector>

std::vector<std::string> split(std::string str, char separator) {
    std::vector<std::string> result;
    std::string::size_type token_offset = 0;
    std::string::size_type separator_offset = 0;
    while (separator_offset != std::string::npos) {
        separator_offset = str.find(separator, separator_offset);
        std::string::size_type token_length;
        if (separator_offset == std::string::npos) {
            token_length = separator_offset;
        } else {
            token_length = separator_offset - token_offset;
            separator_offset++;
        }
        std::string token = str.substr(token_offset, token_length);
        if (!token.empty()) {
            result.push_back(token);
        }
        token_offset = separator_offset;
    }
    return result;
}

int main(int argc, char **argv) {
    std::fstream fin("input.dat");
    while (!fin.eof()) {
        std::string line;
        getline(fin, line, ';');
        fin.ignore(80, '\n');
        std::vector<std::string> strs = split(line, ',');
        for (int i = 0; i < strs.size(); ++i) {
            std::cout << strs[i] << std::endl;
        }
    }
    fin.close();
    return 0;
}
Regards
There is no canned C++ library function for swallowing input like that. std::getline reads the next line of text, up until the next newline character (by default). That's it. std::getline does not do any further examination of the input beyond that.
I will suggest the following approach for you.
1. Initialize a buffer representing the entire logical line just read.
2. Read the next line of input, using std::getline(), and append the line to the input buffer.
3. Count the number of quote characters in the buffer.
4. If the number of quotes is even, stop. If the quote character count is odd, append a newline to the buffer, then go back to step 2 and read another line of input.
Some obvious optimizations are possible here, of course, but this should be a good start.