Tags: android, sqlite, android-sqlite, android-room, fts4

Android Room - Unknown tokenizer - FtsOptions.TOKENIZER_UNICODE61


I'm using FTS4 with a tokenizer to remove diacritics. The database is abstracted with Room.

Something like:

Content

@Entity(
        tableName = "test"
)
public class Test {

    @PrimaryKey(autoGenerate = true)
    @ColumnInfo(name = "id")
    private Long mId;

    @ColumnInfo(name = "name")
    String mName;

    public Test(String name) {
        mName = name;
    }

    public Long getId() {
        return mId;
    }

    public void setId(Long id) {
        mId = id;
    }

    public String getName() {
        return mName;
    }

    public void setName(String name) {
        mName = name;
    }
}

FTS

@Entity(tableName = "search_fts")
@Fts4(
        contentEntity = Test.class,
        tokenizer = FtsOptions.TOKENIZER_UNICODE61,
        tokenizerArgs = {
                "remove_diacritics=2"
        }
)
public class SearchFts {

    @PrimaryKey
    @ColumnInfo(name = "rowid")
    int mRowId;

    @ColumnInfo(name = "name")
    String mName;

    public SearchFts(int rowId, String name) {
        mRowId = rowId;
        mName = name;
    }

    public int getRowId() {
        return mRowId;
    }

    public void setRowId(int rowId) {
        mRowId = rowId;
    }

    public String getName() {
        return mName;
    }

    public void setName(String name) {
        mName = name;
    }
}

This results in a SQLite query:

CREATE VIRTUAL TABLE IF NOT EXISTS search_fts
USING FTS4(
     'name' TEXT,
     tokenize=unicode61 'remove_diacritics=2', content='test'
);

Everything works on newer Android versions (e.g. API 30), but on older versions (e.g. API 25 or 27) I get the following error:

android.database.sqlite.SQLiteException: unknown tokenizer (code 1)

The official docs:

The "unicode61" tokenizer is available beginning with SQLite version 3.7.13 (2012-06-11). Unicode61 works very much like "simple" except that it does simple unicode case folding according to rules in Unicode Version 6.1 and it recognizes unicode space and punctuation characters and uses those to separate tokens.

API and database version:

(screenshot of the API level and SQLite version, not reproduced here)

Logcat:

2022-07-26 11:36:56.532 16176-16229/hr.laserline.osis E/SQLiteLog: (1) statement aborts at 28: [CREATE VIRTUAL TABLE IF NOT EXISTS search_fts USING FTS4(name TEXT, tokenize=unicode61 remove_diacritics=2, content=test)] unknown tokenizer

....
2022-07-26 11:36:56.539 16176-16229/hr.laserline.osis E/hr.laserline.osis.service.sync.SyncServicePresenter: Sync error! Db transaction rollback ...
android.database.sqlite.SQLiteException: unknown tokenizer (code 1)
    at android.database.sqlite.SQLiteConnection.nativeExecuteForChangedRowCount(Native Method)
    at android.database.sqlite.SQLiteConnection.executeForChangedRowCount(SQLiteConnection.java:735)
    at android.database.sqlite.SQLiteSession.executeForChangedRowCount(SQLiteSession.java:754)
    at android.database.sqlite.SQLiteStatement.executeUpdateDelete(SQLiteStatement.java:64)
    at android.database.sqlite.SQLiteDatabase.executeSql(SQLiteDatabase.java:1754)
    at android.database.sqlite.SQLiteDatabase.execSQL(SQLiteDatabase.java:1682)
    at androidx.sqlite.db.framework.FrameworkSQLiteDatabase.execSQL(FrameworkSQLiteDatabase.java:265)
    at hr.laserline.osis.data.db.LlamaRoomDatabase_Impl$1.createAllTables(LlamaRoomDatabase_Impl.java:148)
    at androidx.room.RoomOpenHelper.onCreate(RoomOpenHelper.java:74)
    at androidx.sqlite.db.framework.FrameworkSQLiteOpenHelper$OpenHelper.onCreate(FrameworkSQLiteOpenHelper.java:177)
    at android.database.sqlite.SQLiteOpenHelper.getDatabaseLocked(SQLiteOpenHelper.java:333)
    at android.database.sqlite.SQLiteOpenHelper.getWritableDatabase(SQLiteOpenHelper.java:238)
    at androidx.sqlite.db.framework.FrameworkSQLiteOpenHelper$OpenHelper.getWritableSupportDatabase(FrameworkSQLiteOpenHelper.java:151)
    at androidx.sqlite.db.framework.FrameworkSQLiteOpenHelper.getWritableDatabase(FrameworkSQLiteOpenHelper.java:112)
    at androidx.room.RoomDatabase.inTransaction(RoomDatabase.java:706)
    at androidx.room.RoomDatabase.assertNotSuspendingTransaction(RoomDatabase.java:483)
    at hr.laserline.osis.data.db.dao.ParameterDao_Impl.getSyncNumPerChunk(ParameterDao_Impl.java:610)
    at hr.laserline.osis.data.repositories.ParameterRepository.getSyncNumPerChunk(ParameterRepository.java:72)
    at hr.laserline.osis.service.sync.SyncServiceInteractor.setNumPerChunk(SyncServiceInteractor.java:62)
    at hr.laserline.osis.service.sync.SyncServicePresenter.lambda$startSyncWithErp$0$hr-laserline-osis-service-sync-SyncServicePresenter(SyncServicePresenter.java:98)
    at hr.laserline.osis.service.sync.SyncServicePresenter$$ExternalSyntheticLambda2.get(Unknown Source:10)
    at io.reactivex.rxjava3.internal.operators.observable.ObservableDefer.subscribeActual(ObservableDefer.java:33)
    at io.reactivex.rxjava3.core.Observable.subscribe(Observable.java:13176)
    at io.reactivex.rxjava3.internal.operators.observable.ObservableSubscribeOn$SubscribeTask.run(ObservableSubscribeOn.java:96)
    at io.reactivex.rxjava3.internal.schedulers.ScheduledDirectTask.call(ScheduledDirectTask.java:38)
    at io.reactivex.rxjava3.internal.schedulers.ScheduledDirectTask.call(ScheduledDirectTask.java:25)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1162)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:636)
    at java.lang.Thread.run(Thread.java:764)
    at hr.laserline.osis.utils.SingleTaskExecutor$1.run(SingleTaskExecutor.java:118)

Any suggestions?


Solution

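  • This isn't specific to Room; it is a general limitation of the unicode61 tokenizer in FTS3/FTS4 on older SQLite builds.

    Simply put, the unicode61 tokenizer itself is available on those devices, but the remove_diacritics=2 argument is not. The SQLite 3.27.0 release notes have this line:

    Added the remove_diacritics=2 option to FTS3 and FTS5.

    Android only bundles a SQLite version that recent from API 30 onwards, so on older API levels the argument is not recognized and table creation fails with "unknown tokenizer".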

    See my blog for full details.
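As a rough illustration of the version cut-off described above, here is a minimal sketch of a helper that picks a tokenizer argument from a SQLite version string. The class and method names (DiacriticsSupport, supportsRemoveDiacritics2, tokenizerArg) are hypothetical and not part of Room or Android. It assumes the version string comes from something like SELECT sqlite_version(), and that falling back to remove_diacritics=1 (the documented default of unicode61) is acceptable for your data. Note that values in Room's @Fts4 annotation must be compile-time constants, so a runtime check like this is mainly useful for diagnostics or for an FTS table you create by hand.

```java
class DiacriticsSupport {

    // SQLite release that added remove_diacritics=2, per the release note quoted above.
    private static final int[] MIN_VERSION = {3, 27, 0};

    // Parses a version such as "3.19.4" into {3, 19, 4}; missing components count as 0.
    static int[] parse(String version) {
        String[] parts = version.trim().split("\\.");
        int[] out = new int[3];
        for (int i = 0; i < out.length && i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    // True if the given SQLite version is 3.27.0 or newer.
    static boolean supportsRemoveDiacritics2(String sqliteVersion) {
        int[] v = parse(sqliteVersion);
        for (int i = 0; i < MIN_VERSION.length; i++) {
            if (v[i] != MIN_VERSION[i]) {
                return v[i] > MIN_VERSION[i];
            }
        }
        return true; // exactly 3.27.0
    }

    // Assumption: remove_diacritics=1 is an acceptable fallback on older SQLite builds.
    static String tokenizerArg(String sqliteVersion) {
        return supportsRemoveDiacritics2(sqliteVersion)
                ? "remove_diacritics=2"
                : "remove_diacritics=1";
    }

    public static void main(String[] args) {
        System.out.println(tokenizerArg("3.19.4")); // prints remove_diacritics=1
        System.out.println(tokenizerArg("3.28.0")); // prints remove_diacritics=2
    }
}
```

If the same schema has to work on every API level through Room, the simplest options are probably to declare remove_diacritics=1 in the annotation, or to ship a newer SQLite build with the app instead of relying on the platform one.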