javadictionarytrieprefix-tree

Java in-memory map with string key part reuse


I have tens of thousands of records in a map. Map keys are string like s3://mybucket/some/path/2021/03/03/file.txt, s3://mybucket/some/path/2021/03/04/file.txt, value is 0 or 1. So far I use HashMap but the memory usage is too high, I want to reduce it.

I am looking for a something that is key-value and utilizes key part reusability. What comes to mind naturally is the use some tree structure to store the prefixes.

Could somebody point to an appropriate implementation, preferably lightweight?


Solution

  • A Prefix Tree is a possible solution that will be lightweight in terms of space complexity. Here is a Java implementation of a Prefix Tree using Trie. I used a HashMap for the child as I'm not sure of the alphabet size. You can definitely use a fixed length for the child TrieNode if you are certain about the alphabet size in your dictionary. Each TrieNode will store the value of the key only if the TrieNode marks the end of the string. When you find the TrieNode which stores the last character of your key, you can then access the value;

    class TrieNode {
        HashMap<Character, TrieNode> child;
        boolean isEnd;
        int value;
    
        TrieNode() {
            this.child = new HashMap<>();
        }
    
    }
    
    public class PrefixTree {
        private TrieNode root;
    
        public PrefixTree() {
            this.root = new TrieNode();
        }
    
        public void put(String key, int value) {
            char[] data = key.toCharArray();
            TrieNode tempNode = root;
    
            for (char ch : data) {
                if (tempNode.child.get(ch) == null) {
                    tempNode.child.put(ch, new TrieNode());
                }
                tempNode = tempNode.child.get(ch);
    
            }
            tempNode.isEnd = true;
            tempNode.value = value;
        }
    
    
        public TrieNode get(String key) {
            char[] data = key.toCharArray();
            TrieNode tempNode = root;
            for (char ch : data) {
                if (tempNode.child.get(ch) == null) {
                    return null;
                }
                tempNode = tempNode.child.get(ch);
            }
            if (tempNode != null && tempNode.isEnd == false)
                return null;
    
            return tempNode;
        }
    
        public static void main(String[] args) {
            String[] input = {
                    "s3://mybucket/some/path/2021/03/03/file.txt",
                    "s3://mybucket/some/path/2021/03/04/file.txt",
                    "s3://mybucket/some/path/2021/03/05/file.txt",
                    "s3://mybucket/some/path/2021/03/06/file.txt",
                    "s3://mybucket/some/path/2021/03/07/file.txt",
            };
            PrefixTree prefixTree = new PrefixTree();
            Random random = new Random();
    
            for(String key: input){
                prefixTree.put(key, random.nextInt(2)); // inserts 0-1 randomly
            }
    
            System.out.println(prefixTree.get(input[1]).value);
    
            // Update functionality demo
            prefixTree.put(input[1], prefixTree.get(input[1]).value+3);
    
            // value will be updated
            System.out.println(prefixTree.get(input[1]).value);
    
        }
    }