disk rocksdb open-policy-agent lsm-tree badgerdb

Understanding the Open Policy Agent (OPA) Disk-Storage implementation's use of .sst and .vlog files (BadgerDB)

I'm working through some OPA examples like this one that leverage disk storage. I've removed the temporary directory in favor of a permanent one (like we'd have in a production system) and I'm noticing some strange behavior. If I first write the example record

    "authz": {
        "tenants": {
            "acmecorp.openpolicyagent.org": {
                "tier": "gold"
            },
            "globex.openpolicyagent.org": {
                "tier": "silver"
            }
        }
    }

then the directory is populated with 000001.sst, 000001.vlog, DISCARD, KEYREGISTRY, and MANIFEST files. However, on every subsequent read, a new .sst file and a new .vlog file are added with an incremented number, such as 000002.sst and 000002.vlog. It seems really inefficient to keep writing new files on writes and especially reads, why is this the case?

Also, is the expectation that I do my own garbage collection on another thread or is this something that comes built in with OPA or Badger?

Solution

It seems really inefficient to keep writing new files on writes and especially reads, why is this the case?

From the perspective of using OPA, this should be considered an implementation detail. I can't comment on the necessity of those files, other than to say that this is how Badger does it. Badger itself is far from simple: it's a multi-layer system with its own caches, among other things, and it's too complex (for me!) to judge its on-disk behaviour in any way.

Also, is the expectation that I do my own garbage collection on another thread or is this something that comes built in with OPA or Badger?

You are not expected to do any such thing. In fact, OPA runs a goroutine that periodically calls the GC routine Badger advises; here's the code.
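For a rough idea of what such a goroutine looks like, here is a minimal sketch. It is not OPA's actual code. It assumes only that the database handle exposes Badger's RunValueLogGC(discardRatio float64) error method, which returns an error once there is nothing left to rewrite. The valueLogGCer interface, the fakeDB stand-in, the function names, the tick interval, and the 0.5 discard ratio are all illustrative choices of mine, so treat them as placeholders.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// valueLogGCer is a hypothetical interface covering the single Badger method
// used here; a *badger.DB would satisfy it through RunValueLogGC.
type valueLogGCer interface {
	RunValueLogGC(discardRatio float64) error
}

// drainValueLog calls RunValueLogGC until it returns an error (Badger reports
// an error once no value log file qualifies for a rewrite) and returns the
// number of successful passes.
func drainValueLog(db valueLogGCer, discardRatio float64) int {
	n := 0
	for db.RunValueLogGC(discardRatio) == nil {
		n++
	}
	return n
}

// runPeriodicGC drains the value log on every tick until ctx is cancelled.
func runPeriodicGC(ctx context.Context, db valueLogGCer, every time.Duration, discardRatio float64) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			drainValueLog(db, discardRatio)
		}
	}
}

// fakeDB stands in for a real Badger handle so the sketch runs without Badger.
type fakeDB struct{ pending int }

// errNoRewrite is a stand-in for the error Badger returns when GC has no work.
var errNoRewrite = errors.New("nothing to rewrite")

func (f *fakeDB) RunValueLogGC(float64) error {
	if f.pending == 0 {
		return errNoRewrite
	}
	f.pending--
	return nil
}

func main() {
	db := &fakeDB{pending: 3}
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	runPeriodicGC(ctx, db, 10*time.Millisecond, 0.5)
	fmt.Println("value log files still pending:", db.pending)
}
```

Calling the GC method in a loop until it errors matters because each successful call reclaims at most one value log file, so a single call per tick could fall behind a busy store.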

If you need to dig into this further, the Badger community might be another good venue; see this Dgraph Discourse category. (And we can of course discuss this on the OPA Slack, too.)