I constructed a class for dealing with a certain file format, and its constructor goes through the file and searches for the key information I need. The idea is that characters are written on multiple lines, and I want to read the first character of every line, then the second character of every line, and so on.
I've got the class definition and constructor below (possibly horrible - this is my first time writing anything serious in C++):
class AlignmentStream{
private:
    const char* FileName;
    std::ifstream FileStream;
    std::vector<int> NamesStart;
    std::vector<int> SequencesStart;
    std::vector<int> SequenceLengths;
    int CurrentPosition;
    int SequenceNum;
public:
    AlignmentStream(const char* Filename);
    std::vector<int> tellSeqBegins();
    std::vector<int> tellNamesStart();
    std::vector<int> tellSequenceLengths();
    int getSequenceNum();
    AlignedPosition get();
};
AlignmentStream::AlignmentStream(const char* Filename)
{
    FileName = Filename;
    FileStream.open(FileName);
    std::cout << "Filestream is open: " << FileStream.is_open() << std::endl;
    std::cout << "Profiling the alignment file..." << std::endl;
    if (FileStream.is_open() == false)
        throw StreamClosed(); // Make sure the stream is indeed open else throw an exception.
    if (FileStream.eof())
        throw FileEnd();
    char currentchar;
    // Let's check that the file starts out in the correct fasta format.
    currentchar = FileStream.get();
    if (FileStream.eof())
        throw FileEnd();
    if (currentchar != '>')
        throw FormatError();
    NamesStart.push_back(FileStream.tellg());
    bool inName = true;
    bool inSeq = false;
    int currentLength = 0;
    while(!FileStream.eof()){
        while (!FileStream.eof() && inName == true) {
            if (currentchar == '\n') {
                inName = false;
                inSeq = true;
                SequencesStart.push_back(FileStream.tellg());
            } else {
                currentchar = FileStream.get();
            }
        }
        while (!FileStream.eof() && inSeq == true) {
            if (currentchar == '>') {
                inName = true;
                inSeq = false;
                NamesStart.push_back(FileStream.tellg());
            } else {
                if (currentchar != '\n') {
                    currentLength++;
                }
                currentchar = FileStream.get();
            }
        }
        SequenceLengths.push_back(currentLength); // Sequence lengths is built up here - (answer to comment)
        currentLength = 0;
    }
    SequenceNum = (int)SequencesStart.size();
    // Now let's make sure all the sequence lengths are the same.
    std::sort(SequenceLengths.begin(), SequenceLengths.end());
    //Establish an iterator.
    std::vector<int>::iterator it;
    //Use unique algorithm to get the unique values.
    it = std::unique(SequenceLengths.begin(), SequenceLengths.end());
    SequenceLengths.resize(std::distance(SequenceLengths.begin(),it));
    if (SequenceLengths.size() > 1) {
        throw FormatError();
    }
    std::cout << "All sequences are of the same length - good!" << std::endl;
    CurrentPosition = 1;
    FileStream.close();
}
Apologies for it being quite the chunk. Anyway, the constructor goes through the file char by char and records the starting point of each line to be read. The get function (shown below) then seeks to the start of each line plus the offset needed to reach the right character, which is given by the member variable CurrentPosition. It then constructs another custom object of mine called AlignedPosition and returns it.
AlignedPosition AlignmentStream::get()
{
    std::vector<char> bases;
    for (std::vector<int>::iterator i = SequencesStart.begin(); i != SequencesStart.end(); i++) {
        // cout messages are for debugging purposes.
        std::cout << "The current filestream position is " << FileStream.tellg() << std::endl;
        std::cout << "The start of the sequence is " << *i << std::endl;
        std::cout << "The position is " << CurrentPosition << std::endl;
        FileStream.seekg((int)(*i) + (CurrentPosition - 1) );
        std::cout << "The Filestream has been moved to " << FileStream.tellg() << std::endl;
        bases.push_back(FileStream.get());
    }
    CurrentPosition++;
    //this for loop is just to print the chars read in for debugging purposes.
    for (std::vector<char>::iterator i = bases.begin(); i != bases.end(); i++) {
        std::cout << *i << std::endl;
    }
    return AlignedPosition(CurrentPosition, bases);
}
As you can see, the first loop takes the start position of each line, adds the offset from CurrentPosition, gets the char at that position and pushes it onto a vector. This vector is passed to my AlignedPosition constructor; everything else is debugging output. However, upon execution I see this:
eduroam-180-37:libHybRIDS wardb$ ./a.out
Filestream is open: 1
Profiling the alignment file...
All sequences are of the same length - good!
SeqNum: 3
Let's try getting an aligned position
The current filestream position is -1
The start of the sequence is 6
The position is 1
The Filestream has been moved to -1
The current filestream position is -1
The start of the sequence is 398521
The position is 1
The Filestream has been moved to -1
The current filestream position is -1
The start of the sequence is 797036
The position is 1
The Filestream has been moved to -1
?
?
?
Error, an invalid character was present
Couldn't get the base, caught a format error!
In short, what I see is that the file stream position is -1 and does not change when seekg is used, which leads to invalid characters and an exception being thrown in my AlignedPosition constructor. Is this something to do with already having navigated through the file to the end in my constructor? Why does my position in the input stream remain at -1 all the time?
Thanks, Ben.
If you get an end of file on a stream, seekg may not clear it. You need to call clear() on the stream first. Since you read until EOF, you probably need to call clear(). (Ref: en.wikipedia.org/wiki/Seekg )
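To make the clear() point concrete, here is a minimal sketch rather than a drop-in fix. The helper names charAt and demoSeekAfterEof and the tiny test file are made up for illustration and are not part of the code in the question. It reads a file to the end, checks that tellg() then reports -1, and shows that calling clear() before seekg() lets the seek go through. It also assumes the stream is still open: the constructor in the question calls FileStream.close() at the end, and a closed stream cannot seek whatever its state flags are, so the stream would probably need to stay open (or be reopened) for get() to work.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper (not from the original post): return the character at
// byte offset `pos`, or -1 if the seek or the read fails.
int charAt(std::ifstream& in, std::streamoff pos)
{
    in.clear();                    // drop eofbit/failbit left over from reading to EOF
    in.seekg(pos, std::ios::beg);  // the seek is only attempted on a non-failed stream
    if (!in)
        return -1;                 // e.g. the stream is closed
    return in.get();
}

// Hypothetical demo: write a tiny FASTA-like file, read it to the end much as
// the constructor's scan does, then fetch one character by offset.
int demoSeekAfterEof(const std::string& path)
{
    {
        std::ofstream out(path.c_str(), std::ios::binary);
        out << ">a\nACGT\n";       // 'A' sits at byte offset 3
    }
    std::ifstream in(path.c_str(), std::ios::binary);
    while (in.get() != std::ifstream::traits_type::eof()) {
        // consume everything; the final get() sets eofbit and failbit
    }
    // While the stream is in a failed state, tellg() reports -1.
    assert(in.tellg() == std::streampos(-1));
    return charAt(in, 3);          // clear() inside charAt makes the seek work
}
```

In the question's get(), the equivalent change would be a FileStream.clear() before the seekg call, on a stream that has not been closed.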