java treemap containskey

TreeMap containsKey method returns false even though the key is already in the Map


I am trying to write a program that counts all the words in a text file. I put every word that matches the patterns into a TreeMap.

The path to the text file is passed in args[0].

For example, the text file contains this text: The Project Gutenberg EBook of The Complete Works of William Shakespeare

The condition that checks whether the TreeMap already contains the word returns false for the second appearance of the word The, but returns true for the second appearance of the word of.

I don't understand why...
This is my code:

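import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCount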
{
    public static void main(String[] args)
    {
        // Charset charset = Charset.forName("UTF-8");
        // Locale locale = new Locale("en", "US");

        Path p0 = Paths.get(args[0]);
        Path p1 = Paths.get(args[1]);
        Path p2 = Paths.get(args[2]);

        Pattern pattern1 = Pattern.compile("[a-zA-Z]");
        Matcher matcher;
        Pattern pattern2 = Pattern.compile("'.");

        Map<String, Integer> alphabetical = new TreeMap<String, Integer>();

        try (BufferedReader reader = Files.newBufferedReader(p0))
        {
            String line = null;

            while ((line = reader.readLine()) != null)
            {
                // System.out.println(line);
                for (String word : line.split("\\s"))
                {
                    boolean found = false;

                    matcher = pattern1.matcher(word);
                    while (matcher.find())
                    {
                        found = true;
                    }
                    if (found)
                    {
                        boolean check = alphabetical.containsKey(word.toLowerCase());
                        if (!alphabetical.containsKey(word.toLowerCase()))
                            alphabetical.put(word.toLowerCase(), 1);
                        else
                            alphabetical.put(word.toLowerCase(), alphabetical.get(word.toLowerCase()).intValue() + 1);
                    }
                    else
                    {
                        matcher = pattern2.matcher(word);
                        while (matcher.find())
                        {
                            found = true;
                        }
                        if (found)
                        {
                            // Use the same normalized key for the lookup and the update
                            String key = word.substring(1).toLowerCase();
                            if (!alphabetical.containsKey(key))
                                alphabetical.put(key, 1);
                            else
                                alphabetical.put(key, alphabetical.get(key).intValue() + 1);
                        }
                    }
                }
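            }
        }
        catch (IOException e)
        {
            // Reading the input file failed
            System.err.println("Could not read " + p0 + ": " + e.getMessage());
        }
    }
}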

Solution

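  • I've tested your code and it works. I think you need to check your file's encoding.

    It is most likely saved as "UTF-8" with a BOM (byte order mark). The BOM is decoded as the invisible character U+FEFF at the very start of the file, so the first The is stored under a key that begins with that character, and the second The does not match it. The word of is unaffected because its first occurrence is not at the start of the file. Save the file as "UTF-8 without BOM" and you'll be OK!

    Edit: If you can't change the encoding, you can strip the BOM manually. See this link: http://www.rgagnon.com/javadetails/java-handle-utf8-file-with-bom.html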

    Regards
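
To illustrate the point about the BOM, here is a minimal sketch, not the asker's program. It assumes the first line returned by readLine() starts with U+FEFF, which is how a UTF-8 BOM appears once decoded. The names BomDemo, stripBom and count are hypothetical helpers invented for this example, and the counting is simplified to splitting on whitespace.

```java
import java.util.Map;
import java.util.TreeMap;

public class BomDemo
{
    // Hypothetical helper: drops a leading U+FEFF (the decoded BOM) if present.
    static String stripBom(String line)
    {
        if (line != null && !line.isEmpty() && line.charAt(0) == '\uFEFF')
            return line.substring(1);
        return line;
    }

    // Hypothetical helper: counts lower-cased, whitespace-separated words in one line.
    static Map<String, Integer> count(String line)
    {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String word : line.split("\\s+"))
        {
            if (word.isEmpty())
                continue;
            // Insert 1 for a new key, otherwise add 1 to the existing count
            counts.merge(word.toLowerCase(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args)
    {
        // Assumption: simulates the first line of a file saved as UTF-8 with BOM
        String firstLine = "\uFEFFThe Project Gutenberg EBook of The Complete Works of William Shakespeare";

        // Without stripping, the first "The" is stored under a key that starts with U+FEFF
        Map<String, Integer> raw = count(firstLine);
        System.out.println("the (raw): " + raw.get("the"));
        System.out.println("of (raw): " + raw.get("of"));

        // After stripping, both occurrences of "The" share the key "the"
        Map<String, Integer> clean = count(stripBom(firstLine));
        System.out.println("the (clean): " + clean.get("the"));
    }
}
```

In the original program, the equivalent change would be to apply something like stripBom to the first line read from the file before splitting it into words.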