optimization, android-room, trie, custom-keyboard, android-custom-keyboard

Android custom keyboard suggestions


I am building a custom keyboard for Android, one that at least supports autocomplete suggestions. To achieve this, I am storing every word that the user types (not password fields) in a Room database table with a simple model: the word and its frequency. For showing the suggestions, I am using a Trie which is populated by words from this database table. My query basically orders the table by word frequency and limits the results to 5K (I do not want to overpopulate the Trie; these 5K words can be considered the user's favourite words that he uses often and needs suggestions for). My actual problem is the ORDER BY clause: this is a rapidly growing data set, and sorting, let's say, 0.1M words just to get 5K of them seems like overkill. How can I rework this approach to improve the efficiency of this entire suggestions logic?


Solution

  • If not already implemented, add an index on the frequency column (@ColumnInfo(index = true)).
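With that index in place, SQLite can serve ORDER BY frequency DESC LIMIT 5000 by walking the index backwards and stopping after 5000 rows, rather than sorting the entire table. Conceptually (a plain-Java model of the index walk, not Room code; the class and method names are invented for illustration, and distinct frequencies are assumed for brevity):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class IndexScanSketch {
    /* The index behaves like a sorted structure keyed on frequency;
       fetching the top k is a partial reverse walk that touches only
       k entries, not a sort of every row in the table. */
    static List<String> topKByFrequency(NavigableMap<Long, String> index, int k) {
        List<String> out = new ArrayList<>();
        for (String word : index.descendingMap().values()) {
            if (out.size() == k) break; // stop early once k words are collected
            out.add(word);
        }
        return out;
    }
}
```

The point is that the cost becomes proportional to the limit (5K), not to the total number of stored words.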

    Another option is to add a table that maintains the highest 5k, supported by yet another table (the support table) that has one row, with columns for: the highest frequency (not really required), the lowest frequency currently in the 5k, and the number of rows currently held. After adding or updating a word you can then determine whether it should be placed in the 5k table (perhaps with a 4th column holding the primary key of the lowest-frequency word, to facilitate efficient deletion).

    So

    1. If the number currently held is less than 5k, insert into or update the 5k table and increment the number currently held in the support table.
    2. Otherwise, if the new word's frequency is lower than the lowest, skip it.
    3. Otherwise, update the row if the word already exists in the 5k table.
    4. Otherwise, delete the lowest, insert the replacement, and then update the lowest accordingly in the support table.

    Note that the 5K table would probably only need to store the rowid as a pointer/reference/map to the core table.
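The four steps above can be modelled in plain Java with a min-heap keyed on frequency, where the root is always the current lowest (an illustrative in-memory sketch of the table logic, not Room code; the class name, return codes and the limit of 3 standing in for 5k are invented for the demo, and the duplicate-update case of step 3 is omitted for brevity):

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class TopKSketch {
    static final int MAX = 3; // stands in for the 5k limit

    /* min-heap ordered by frequency, so peek() is the entry with the
       lowest frequency, mirroring the support table's "lowest" columns */
    final PriorityQueue<long[]> subset =
            new PriorityQueue<>(Comparator.comparingLong(e -> e[1]));

    /* each entry is a {wordId, frequency} pair; the return codes
       correspond to the decision steps above */
    int addWord(long wordId, long frequency) {
        if (subset.size() < MAX) {           // step 1: room left, just insert
            subset.add(new long[]{wordId, frequency});
            return 1;
        }
        if (frequency <= subset.peek()[1]) { // step 2: not above the lowest, skip
            return 4;
        }
        subset.poll();                       // step 4: evict the current lowest ...
        subset.add(new long[]{wordId, frequency}); // ... and insert the replacement
        return 3;
    }
}
```

Each insertion is O(log k) against the subset rather than a sort of the whole word table, which is the same saving the 5k table buys on the database side.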

    Demonstration

    The following is a demonstration albeit just using a basic Word table as the source.

    First the 2 tables (@Entity annotated classes)

    Word

    @Entity (
            indices = {@Index(value = {"word"},unique = true)}
    )
    class Word {
        @PrimaryKey Long wordId=null;
        @NonNull
        String word;
        @ColumnInfo(index = true)
        long frequency;
    
        Word(){}
    
        @Ignore
        Word(String word, long frequency) {
            this.word = word;
            this.frequency = frequency;
        }
    }
    

    WordSubset, aka the table holding the 5000 highest-occurring frequencies; it simply has a reference/map/link to the underlying/actual word :-

    @Entity(
            foreignKeys = {
                    @ForeignKey(
                            entity = Word.class,
                            parentColumns = {"wordId"},
                            childColumns = {"wordIdMap"},
                            onDelete = ForeignKey.CASCADE,
                            onUpdate = ForeignKey.CASCADE
                    )
            }
    )
    class WordSubset {
        public static final long SUBSET_MAX_SIZE = 5000;
        @PrimaryKey
        long wordIdMap;
    
        WordSubset(){}
    
        @Ignore
        WordSubset(long wordIdMap) {
            this.wordIdMap = wordIdMap;
        }
    }
    

    WordSubsetSupport, a single-row table that contains the highest and lowest frequencies (the highest is not really needed), the number of rows in the WordSubset table, and a reference/map to the word with the lowest frequency.

    @Entity(
            foreignKeys = {
                    @ForeignKey(
                            entity = Word.class,
                            parentColumns = {"wordId"},
                            childColumns = {"lowestWordIdMap"}
                    )
            }
    )
    class WordSubsetSupport {
        @PrimaryKey
        Long wordSubsetSupportId=null;
        long highestFrequency;
        long lowestFrequency;
        long countOfRowsInSubsetTable;
        @ColumnInfo(index = true)
        long lowestWordIdMap;
    
        WordSubsetSupport(){}
        @Ignore
        WordSubsetSupport(long highestFrequency, long lowestFrequency, long countOfRowsInSubsetTable, long lowestWordIdMap) {
            this.highestFrequency = highestFrequency;
            this.lowestFrequency = lowestFrequency;
            this.countOfRowsInSubsetTable = countOfRowsInSubsetTable;
            this.lowestWordIdMap = lowestWordIdMap;
            this.wordSubsetSupportId = 1L;
        }
    }
    

    For access, an abstract class is used rather than an interface, as in Java this allows methods/functions with a body (a Kotlin interface allows these directly). CombinedDao :-

    @Dao
    abstract class CombinedDao {
        @Insert(onConflict = OnConflictStrategy.IGNORE)
        abstract long insert(Word word);
        @Insert(onConflict = OnConflictStrategy.IGNORE)
        abstract long insert(WordSubset wordSubset);
        @Insert(onConflict = OnConflictStrategy.IGNORE)
        abstract long insert(WordSubsetSupport wordSubsetSupport);
    
        @Query("SELECT * FROM wordsubsetsupport LIMIT 1")
        abstract WordSubsetSupport getWordSubsetSupport();
        @Query("SELECT count() FROM wordsubsetsupport")
        abstract long getWordSubsetSupportCount();
        @Query("SELECT countOfRowsInSubsetTable FROM wordsubsetsupport")
        abstract long getCountOfRowsInSubsetTable();
        @Query("UPDATE wordsubsetsupport SET countOfRowsInSubsetTable=:updatedCount")
        abstract void updateCountOfRowsInSubsetTable(long updatedCount);
        @Query("UPDATE wordsubsetsupport " +
                "SET countOfRowsInSubsetTable = (SELECT count(*) FROM wordsubset), " +
                "lowestWordIdMap = (SELECT word.wordId FROM wordsubset JOIN word ON wordsubset.wordIdMap = word.wordId ORDER BY frequency ASC LIMIT 1)," +
                "lowestFrequency = (SELECT coalesce(min(frequency),0) FROM wordsubset JOIN word ON wordsubset.wordIdMap = word.wordId)," +
                "highestFrequency = (SELECT coalesce(max(frequency),0) FROM wordsubset JOIN word ON wordsubset.wordIdMap = word.wordId)")
        abstract void autoUpdateWordSupportTable();
        @Query("DELETE FROM wordsubset WHERE wordIdMap= (SELECT wordsubset.wordIdMap FROM wordsubset JOIN word ON wordsubset.wordIdMap = word.wordId ORDER BY frequency ASC LIMIT 1)")
        abstract void deleteLowestFrequency();
    
        @Transaction
        @Query("")
        int addWord(Word word) {
            /* try to add the add word, setting the wordId value according to the result.
                The result will be the wordId generated (1 or greater) or if the word already exists -1
            */
            word.wordId = insert(word);
            /* If the word was added and not rejected as a duplicate, then it may need to be added to the WordSubset table */
            if (word.wordId > 0) {
                /* Are there any rows in the support table? if not then add the very first entry/row */
                if (getWordSubsetSupportCount() < 1) {
                    /* Need to add the word to the subset */
                    insert(new WordSubset(word.wordId));
                    /* Can now add the first (and only) row to the support table */
                    insert(new WordSubsetSupport(word.frequency,word.frequency,1,word.wordId));
                    autoUpdateWordSupportTable();
                    return 1;
                }
                /* If there are less than the maximum number of rows in the subset table then
                    1) insert the new subset row, and
                    2) update the support table accordingly
                */
                if (getCountOfRowsInSubsetTable() < WordSubset.SUBSET_MAX_SIZE) {
                    insert(new WordSubset(word.wordId));
                    autoUpdateWordSupportTable();
                    return 2;
                }
                /*
                    Last case is that the subset table is at the maximum number of rows and
                    the frequency of the added word is greater than the lowest frequency in the
                    subset, so
                    1) the row with the lowest frequency is removed from the subset table and
                    2) the added word is added to the subset
                    3) the support table is updated accordingly
                */
                if (getCountOfRowsInSubsetTable() >= WordSubset.SUBSET_MAX_SIZE) {
                    WordSubsetSupport currentWordSubsetSupport = getWordSubsetSupport();
                    if (word.frequency > currentWordSubsetSupport.lowestFrequency) {
                        deleteLowestFrequency();
                        insert(new WordSubset(word.wordId));
                        autoUpdateWordSupportTable();
                        return 3;
                    }
                }
                return 4; /* indicates word added but does not qualify for addition to the subset */
            }
            return -1;
        }
    }
    

    TheDatabase is a pretty standard @Database annotated class, other than that it allows use of the main thread, for the sake of convenience and brevity of the demo :-

    @Database( entities = {Word.class,WordSubset.class,WordSubsetSupport.class}, version = TheDatabase.DATABASE_VERSION, exportSchema = false)
    abstract class TheDatabase extends RoomDatabase {
        abstract CombinedDao getCombinedDao();
    
        private static volatile TheDatabase instance = null;
        public static TheDatabase getInstance(Context context) {
            if (instance == null) {
                instance = Room.databaseBuilder(context,TheDatabase.class,DATABASE_NAME)
                        .addCallback(cb)
                        .allowMainThreadQueries()
                        .build();
            }
            return instance;
        }
    
        private static Callback cb = new Callback() {
            @Override
            public void onCreate(@NonNull SupportSQLiteDatabase db) {
                super.onCreate(db);
            }
    
            @Override
            public void onOpen(@NonNull SupportSQLiteDatabase db) {
                super.onOpen(db);
            }
        };
    
    
    
        public static final String DATABASE_NAME = "the_database.db";
        public static final int DATABASE_VERSION = 1;
    }
    

    Finally, activity code that randomly generates and adds 10,000 words (or thereabouts, as some could be duplicates), each word having a randomly generated frequency between 1 and 10,000 :-

    public class MainActivity extends AppCompatActivity {
        
        TheDatabase db;
        CombinedDao dao;
    
        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            setContentView(R.layout.activity_main);
            
            db = TheDatabase.getInstance(this);
            dao = db.getCombinedDao();
            
            for (int i=0; i < 10000; i++) {
                Word currentWord = generateRandomWord();
                Log.d("ADDINGWORD","Adding word " + currentWord.word + " frequency is " + currentWord.frequency);
            dao.addWord(currentWord); /* add the word that was logged, not a fresh random one */
            }
        }
        public static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz";
    private Word generateRandomWord() {
        Random r = new Random();
        int wordLength = r.nextInt(24) + 1;   /* 1-24 letters */
        int frequency = r.nextInt(10000) + 1; /* 1-10000 as described */
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < wordLength; i++) {
            sb.append(ALPHABET.charAt(r.nextInt(ALPHABET.length())));
        }
        return new Word(sb.toString(), frequency);
    }
    }
    

    Obviously the results will differ per run; also, the demo is only really designed to be run once (although it could be run again).

    After running, using AppInspection, then

    The support table (in this instance) is :-

    [screenshot of the support table]

    The subset table on its own means little, as it just contains a map to the actual word, so it is more informative to use a query to look at what it contains.

    e.g.

    [screenshot of the query results]

    As can be seen, the query shows that the word whose wordId is 7412 is the one with the lowest frequency, 4690 (as expected, according to the support table).

    Going to the last page shows:-

    [screenshot of the last page of the query results]