I'm reading a file in using a basic FileReader wrapped in a BufferedReader, and sorting the results into different HashSets based on whether they have a period in them or not.
Later in my program I compare strings to the HashSet using the contains() method.
The non-period set works fine, but the set with a period in it is screwed up. I believe I've narrowed it down to the add method, although it's also quite possible it has something to do with the way the file is read.
{
    FileReader file;
    BufferedReader br;
    try {
        file = new FileReader(new File("./support/effective_tld_names.txt"));
        br = new BufferedReader(file);
        String temp;
        while ((temp = br.readLine()) != null) {
            if (!(temp.startsWith("//") || temp.isEmpty())) {
                int dotCount = temp.length() - temp.replace(".", "").length();
                if (dotCount == 0) {
                    singleTLDSet.add(temp);
                } else if (dotCount == 1) {
                    System.out.println(StringEscapeUtils.escapeJava(temp));
                    doubleTLDSet.add(StringEscapeUtils.escapeJava(temp));
                } else {
                }
            }
        }
        file.close();
        br.close();
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
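The loading step can be separated from the file so that it can be tried on a few lines held in memory. Below is a minimal sketch, assuming a hypothetical TldLoader class that reuses the question's two field names. It uses try-with-resources so the reader is closed even if readLine() throws, and it stores each line as-is, without escaping.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

// Hypothetical loader; only the field names come from the question.
public class TldLoader {
    final Set<String> singleTLDSet = new HashSet<>();
    final Set<String> doubleTLDSet = new HashSet<>();

    // Reads one entry per line, skipping blank lines and "//" comments.
    void load(Reader source) throws IOException {
        // try-with-resources closes the BufferedReader (and the wrapped Reader).
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.isEmpty() || line.startsWith("//")) {
                    continue;
                }
                // Number of periods = length lost when all periods are removed.
                int dotCount = line.length() - line.replace(".", "").length();
                if (dotCount == 0) {
                    singleTLDSet.add(line);
                } else if (dotCount == 1) {
                    doubleTLDSet.add(line); // stored unescaped
                }
                // Entries with two or more periods are ignored, as in the question.
            }
        }
    }

    public static void main(String[] args) throws IOException {
        TldLoader loader = new TldLoader();
        loader.load(new StringReader("// comment\ncom\n\nco.pl\nherokuapp.com\na.b.c\n"));
        System.out.println(loader.singleTLDSet); // [com]
        System.out.println(loader.doubleTLDSet.contains("co.pl")); // true
    }
}
```

To read the real file, the loader could be given something like `Files.newBufferedReader(Paths.get("./support/effective_tld_names.txt"), StandardCharsets.UTF_8)`, assuming the file is UTF-8 encoded. FileReader uses the platform default charset (UTF-8 by default only since Java 18), which matters for non-ASCII entries such as بھارت.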
Later in my program:
Iterator i = ValidTLDS.getDoubleTLDSSet().iterator();
while (i.hasNext()) {
    String next = (String) i.next();
    System.out.println(next);
}
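The loop above can also be written as a typed for-each loop, which removes the raw Iterator and the (String) cast. The sketch below (class and method names are hypothetical) adds a quick sanity check as well: every entry of a set meant to hold one-period names should contain a period, so any entry without one shows the set holds something other than what the loader put into doubleTLDSet.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helpers for inspecting a Set<String>.
public class SetCheck {
    // Typed for-each loop: no raw Iterator and no cast needed.
    static void printAll(Set<String> set) {
        for (String next : set) {
            System.out.println(next);
        }
    }

    // Collects entries that contain no period at all.
    static List<String> withoutPeriod(Set<String> set) {
        List<String> suspicious = new ArrayList<>();
        for (String entry : set) {
            if (!entry.contains(".")) {
                suspicious.add(entry);
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        Set<String> sample = new LinkedHashSet<>(); // keeps insertion order
        sample.add("co.pl");
        sample.add("eurovision");
        sample.add("iki.fi");
        printAll(sample);
        System.out.println(withoutPeriod(sample)); // [eurovision]
    }
}
```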
The weird part is that when I iterate through the HashSet, the values are different from what I thought I was putting in there.
A sample of the results from the println right before they're added to doubleTLDSet:
codespot.com
googleapis.com
googlecode.com
pagespeedmobilizer.com
withgoogle.com
herokuapp.com
herokussl.com
iki.fi
biz.at
info.at
co.pl
azurewebsites.net
A sample of the results from iterating through it:
eurovision
ventures
ads
ninja
claims
pharmacy
exchange
trust
بھارت
epson
It looks like some of the TLDs are getting truncated before the period, and some just aren't showing up in the HashSet at all.
Does anyone have any ideas what I'm doing wrong here? Is there some special rule or edge case about HashSets with strings, or about reading from files? Or am I just being a noob with a basic typo?
Either there is a third Set<String> that is returned by getDoubleTLDSSet (note the doubled S), or the getter
Set<String> getDoubleTLDSet() { // getDoubleTLD-S-Set ??
    return singleTLDSet;
}
returns the singleTLDSet instead of the doubleTLDSet.
Otherwise the code is fine.
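If the getter is the problem, the fix is to return the matching field and to call the getter by one consistent name. Below is a minimal sketch, assuming ValidTLDS holds the two sets as static fields; only the field names singleTLDSet and doubleTLDSet come from the question, the rest of the class is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical holder modeled on the question's ValidTLDS class.
public class ValidTLDS {
    private static final Set<String> singleTLDSet = new HashSet<>();
    private static final Set<String> doubleTLDSet = new HashSet<>();

    static Set<String> getSingleTLDSet() {
        return singleTLDSet;
    }

    // Returns doubleTLDSet; returning singleTLDSet here would make callers
    // see one-label entries such as "eurovision" instead of "co.pl".
    static Set<String> getDoubleTLDSet() {
        return doubleTLDSet;
    }

    public static void main(String[] args) {
        singleTLDSet.add("eurovision");
        doubleTLDSet.add("co.pl");
        System.out.println(getDoubleTLDSet()); // [co.pl]
    }
}
```

With this version, the iteration code would call ValidTLDS.getDoubleTLDSet() rather than getDoubleTLDSSet.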
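(What's the point of calling StringEscapeUtils.escapeJava? I wouldn't do that just for storing the strings. It leaves plain domain names like co.pl unchanged, but it turns non-ASCII characters into \uXXXX escape sequences, so such entries would no longer match the unescaped strings you later pass to contains().)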