I'm working through some OPA examples like this one that leverage disk storage. I've removed the temporary directory in favor of a permanent one (like we'd have in a production system) and I'm noticing some strange behavior. If I first write the example record
"authz": {
  "tenants": {
    "acmecorp.openpolicyagent.org": {
      "tier": "gold"
    },
    "globex.openpolicyagent.org": {
      "tier": "silver"
    }
  }
}
then the directory is populated with 000001.sst, 000001.vlog, DISCARD, KEYREGISTRY, and MANIFEST files. However, on every subsequent read a new .sst and .vlog file is added with an incremented number, such as 000002.sst. It seems really inefficient to keep writing new files on writes and especially on reads. Why is this the case?
Also, is the expectation that I do my own garbage collection on another thread or is this something that comes built in with OPA or Badger?
It seems really inefficient to keep writing new files on writes and especially reads, why is this the case?
From the perspective of using OPA, this should be considered an implementation detail. I can't comment on the necessity of those files, other than to say that this is how Badger does it. Badger itself is far from simple: it's a multi-layer system with its own caches and more, and it's too complex (for me!) to judge its on-disk behaviour in any way.
Also, is the expectation that I do my own garbage collection on another thread or is this something that comes built in with OPA or Badger?
You are not expected to do any such thing. In fact, OPA runs a goroutine that periodically calls the GC routine Badger advises; here's the code.
If you need to dig into this further, the Badger community might be another good venue; see this Dgraph discourse category. (And we can of course discuss this on the OPA Slack, too.)