I'm trying to parse a large TXT file line by line (6 million lines, 200 MB) using if statements with the String.contains(String) method. At the moment it is very slow. Is there a way to improve the speed?
I know there's also String.firstIndexOf but that seems to be slower. Regex is probably slower too.
Importing the TXT and splitting lines:
let content = try String(contentsOfFile:path, encoding: String.Encoding.ascii)
print("LOADED 0");
return content.components(separatedBy: "\n")
Parsing:
if line.contains("<TAG1>") {
    var thisline = line
    thisline = thisline.replacingOccurrences(of: "<TAG1>", with: "")
    thisline = thisline.replacingOccurrences(of: "</TAG1>", with: "")
    text = "\(text)\n\(thisline): "
} else if line.contains("<TAG2>") {
    var thisline = line
    thisline = thisline.replacingOccurrences(of: "<TAG2>", with: "")
    thisline = thisline.replacingOccurrences(of: "</TAG2>", with: "")
    text = "\(text) - \(thisline) "
}
There will probably be more if statements (which will probably slow down the parsing even more).
It would be awesome if the speed could be improved; it currently takes approx. 5-10 minutes on my MacBook (depending on the file size).
Edit: It seems like string + " \n " + string2 is faster than "\(string) \n \(string2)", but it doesn't help much.
Edit 2: I've added a progress bar to the application, and it seems to start fast and get slower toward the end.
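Building up your final text variable the way you are causes an ever-growing string to be copied (with a small addition) for every line and then reassigned back to text. The amount of copying therefore grows with every line processed, so the total work is roughly quadratic in the number of lines, which would also explain why the parsing starts fast and slows down toward the end.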
// Slow
text = "\(text)\n\(thisline): "
Appending just the addition to the original variable will be much quicker:
// Fast(er)
text.append("\n\(thisline): ")
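Put together, the whole loop might look roughly like the sketch below. The tag names and separators are taken from the question; the function name buildText(from:) and the sample input are made up for illustration, so treat this as one possible shape rather than a drop-in replacement.

```swift
import Foundation

// Hypothetical helper (name and signature are assumptions, not from the question).
// It appends only the new piece to `text` instead of rebuilding the whole string.
func buildText(from lines: [String]) -> String {
    var text = ""
    for line in lines {
        if line.contains("<TAG1>") {
            // Strip the opening and closing tag, keep the content.
            let stripped = line
                .replacingOccurrences(of: "<TAG1>", with: "")
                .replacingOccurrences(of: "</TAG1>", with: "")
            text.append("\n\(stripped): ")
        } else if line.contains("<TAG2>") {
            let stripped = line
                .replacingOccurrences(of: "<TAG2>", with: "")
                .replacingOccurrences(of: "</TAG2>", with: "")
            text.append(" - \(stripped) ")
        }
        // Lines without a known tag are skipped, as in the original code.
    }
    return text
}

// Illustrative input only; real data would come from the loaded file.
let sample = ["<TAG1>Title</TAG1>", "ignored", "<TAG2>Detail</TAG2>"]
let result = buildText(from: sample)
```

Because append mutates the existing string in place, each iteration only pays for the few characters being added rather than for everything accumulated so far.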
Depending on the required level of sophistication (and whether this is a one-time transformation or something that will happen frequently), you may want to look into @rmaddy's suggestion of using a proper parser.