I am trying to write a program that counts all the words in a text file. I put every word that matches the patterns into a TreeMap.
The text file is passed in as args[0].
For example, the text file contains this text: The Project Gutenberg EBook of The Complete Works of William Shakespeare
The condition that checks whether the TreeMap already contains the word returns false for the second appearance of the word The, but returns true for the second appearance of the word of.
I don't understand why...
This is my code:
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCount
{
    public static void main(String[] args)
    {
        // Charset charset = Charset.forName("UTF-8");
        // Locale locale = new Locale("en", "US");
        Path p0 = Paths.get(args[0]); // input text file
        Path p1 = Paths.get(args[1]);
        Path p2 = Paths.get(args[2]);
        Pattern pattern1 = Pattern.compile("[a-zA-Z]"); // word contains at least one letter
        Matcher matcher;
        Pattern pattern2 = Pattern.compile("'.");       // apostrophe followed by any character
        Map<String, Integer> alphabetical = new TreeMap<String, Integer>();
        try (BufferedReader reader = Files.newBufferedReader(p0))
        {
            String line = null;
            while ((line = reader.readLine()) != null)
            {
                // System.out.println(line);
                for (String word : line.split("\\s"))
                {
                    matcher = pattern1.matcher(word);
                    if (matcher.find())
                    {
                        String key = word.toLowerCase();
                        // This is the check that returns false for the second "The"
                        boolean check = alphabetical.containsKey(key);
                        if (!check)
                            alphabetical.put(key, 1);
                        else
                            alphabetical.put(key, alphabetical.get(key) + 1);
                    }
                    else
                    {
                        matcher = pattern2.matcher(word);
                        if (matcher.find())
                        {
                            // Drop the first character and use the same
                            // lower-cased key for the lookup and the update
                            String key = word.substring(1).toLowerCase();
                            if (!alphabetical.containsKey(key))
                                alphabetical.put(key, 1);
                            else
                                alphabetical.put(key, alphabetical.get(key) + 1);
                        }
                    }
                }
            }
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }
}
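I've tested your code and it is OK. I think you need to check your file's encoding.
It is most likely "UTF-8" with a byte order mark (BOM). The BOM is read as an invisible character attached to the first word, so the first The is stored under a different key than the second one, while both occurrences of of match. Save the file as "UTF-8 without BOM", and you'll be OK!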
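To see the effect in isolation, here is a minimal sketch that feeds the example line to a simplified counter. It assumes the file starts with a BOM, which Java's UTF-8 decoder passes through as the character U+FEFF. The class name `BomDemo` and the `count` helper are made up for this illustration, and the regex filtering from the question is left out for brevity.

```java
import java.util.Map;
import java.util.TreeMap;

public class BomDemo {
    // Hypothetical helper: splits on whitespace and counts lower-cased words,
    // roughly like the loop in the question (without the pattern checks).
    static Map<String, Integer> count(String line) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : line.split("\\s")) {
            counts.merge(word.toLowerCase(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // U+FEFF in front of the first word simulates a decoded UTF-8 BOM.
        String line = "\uFEFFThe Project Gutenberg EBook of The Complete Works of William Shakespeare";
        Map<String, Integer> counts = count(line);

        System.out.println(counts.get("the"));       // 1: only the second "The"
        System.out.println(counts.get("\uFEFFthe")); // 1: the first "The", BOM included
        System.out.println(counts.get("of"));        // 2: both occurrences share one key
    }
}
```

The first word ends up under a key that looks like "the" when printed but is one character longer, which is why `containsKey` reports false for the second The.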
Edit: If you can't change the encoding, you can strip the BOM manually. See this link: http://www.rgagnon.com/javadetails/java-handle-utf8-file-with-bom.html
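One possible way to do this by hand (a sketch, not necessarily what the linked page does) is to drop a leading U+FEFF from the first line before splitting it into words. The class `BomStrip` and the method `stripBom` are names invented for this example, and a `StringReader` stands in for `Files.newBufferedReader(p0)` so that the snippet runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class BomStrip {
    // The BOM as it appears after the bytes have been decoded to characters.
    static final char BOM = '\uFEFF';

    // Hypothetical helper: returns the line without a leading BOM, if there is one.
    static String stripBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == BOM) {
            return line.substring(1);
        }
        return line;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real file; the first line starts with a BOM.
        String content = "\uFEFFThe Project Gutenberg EBook\nof The Complete Works";
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            String line;
            boolean firstLine = true;
            while ((line = reader.readLine()) != null) {
                if (firstLine) {
                    // A BOM can only occur at the very start of the file.
                    line = stripBom(line);
                    firstLine = false;
                }
                System.out.println(line.split("\\s")[0].length());
            }
        }
    }
}
```

With the BOM removed, the first word has length 3 and lower-cases to "the", so it shares a key with the later occurrences.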
Regards