Consistently coupling brightway database to existing brightway datapackage

In Brightway 2.5, I am having difficulty understanding the relationship between datapackages and databases. Here I see examples where either a bw database or a bw datapackage is created. This is the only example I can find where both are used, but there the bw database is created first, and how the two are coupled is unclear.

What about the other way around: creating a datapackage first and then coupling a database to it?

I can successfully create a datapackage with large A and B matrices, save it to a filesystem, and then reload it:

import bw_processing as bwp
from fs.osfs import OSFS

myfs = OSFS("some/filesystem/path")  # the path must be passed as a string

dp = bwp.create_datapackage(fs=myfs)
dp.add_persistent_vector(...)  # add indices, data, and flip arrays for the A and B matrices
dp.finalize_serialization()  # writes the datapackage to disk
...
dp = bwp.load_datapackage(myfs)  # reloads the data if needed

Doing dp.data gives the matrix coordinates, values, and flip. I can do calculations, but I cannot see which node corresponds to a specific index, nor the information associated with it (name, id, geography, unit, type, etc.). I only know that there is an exchange (edge) between activity (node) number "201" and number "205", for example. At this point I want to add all the contextual (metadata) information on the exchanges so that I can do meaningful analysis and use the normal bw functions (search, etc.).

So, what is the recommended and scalable approach (works for large product systems with thousands of exchanges) to create a bw database of nodes and edges linked to an already existing bw datapackage? Any example is appreciated.

Solution

So, what is the recommended and scalable approach (works for large product systems with thousands of exchanges) to create a bw database of nodes and edges linked to an already existing bw datapackage?

Short answer: Don't do this.

Long answer: The datapackage is generated from the database, and is a reduced representation of its data (it only includes numeric data without any metadata).
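
To make "reduced representation" concrete, here is a minimal conceptual sketch in plain Python. It is not Brightway code: the Node and Edge classes, the process function, and the example names and amounts are all hypothetical, and only the ids 201 and 205 come from the question. It shows how reducing edges to indices, values, and flip flags throws away everything needed to label them again.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical stand-in for a database node; the id is assumed
    # to be assigned by the database, not by the user.
    id: int
    name: str
    location: str
    unit: str


@dataclass
class Edge:
    # Hypothetical stand-in for a database edge (exchange).
    input_id: int         # id of the supplying node
    output_id: int        # id of the consuming node
    amount: float
    is_consumption: bool  # illustrative flag that becomes the flip value


def process(edges):
    """Reduce edges to parallel lists of (row, col) indices, values, and flip flags."""
    indices = [(e.input_id, e.output_id) for e in edges]
    data = [e.amount for e in edges]
    flip = [e.is_consumption for e in edges]
    return indices, data, flip


# Made-up example data, keyed by integer id.
nodes = {
    201: Node(201, "steel production", "DE", "kilogram"),
    205: Node(205, "car assembly", "DE", "unit"),
}
edges = [Edge(201, 205, 850.0, True)]

indices, data, flip = process(edges)

# The reduced form holds only numbers; names, locations, and units can
# only be recovered by going back to the node store with the integer ids.
row, col = indices[0]
supplier_name = nodes[row].name
```

The reduction only goes one way: nothing in indices, data, or flip can regenerate the nodes dictionary, which is why building the database from the datapackage is the wrong direction.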

There are specific cases where you wouldn't want to store edge data in the database. For example, in IO tables we skip storing edges in SQLite because we only have a single numeric value without uncertainty, properties, etc., so we don't lose anything (and gain some performance) by dropping straight to a datapackage. See the IOTable backend code to see what's required in this case to make the datapackage act like a normal Brightway Database. Note that we still store Node data in the database, and we therefore get the datapackage integer ids from the database.
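
The division of labour described above (nodes in the database, edges only as numeric arrays) can be sketched with the standard-library sqlite3 module. This is an illustration of the pattern only, not the IOTable backend: the node table, its columns, the add_node and describe helpers, and the example data are all assumptions made for this sketch.

```python
import sqlite3

# In-memory database with a hypothetical node table. The INTEGER PRIMARY KEY
# column lets SQLite assign the integer ids.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT, location TEXT, unit TEXT)"
)


def add_node(name, location, unit):
    """Store node metadata and return the id the database assigned."""
    cur = conn.execute(
        "INSERT INTO node (name, location, unit) VALUES (?, ?, ?)",
        (name, location, unit),
    )
    return cur.lastrowid


def describe(node_id):
    """Look up the metadata for an integer id found in the edge arrays."""
    return conn.execute(
        "SELECT name, location, unit FROM node WHERE id = ?", (node_id,)
    ).fetchone()


# Step 1: create the nodes first, so the ids come from the database.
steel = add_node("steel production", "DE", "kilogram")
car = add_node("car assembly", "DE", "unit")

# Step 2: keep edges only as numeric arrays, indexed by those ids.
edge_indices = [(steel, car)]
edge_values = [850.0]

# Step 3: any matrix coordinate can be translated back into metadata.
row, col = edge_indices[0]
labelled = (describe(row)[0], describe(col)[0], edge_values[0])
```

Because the ids originate in the node table, every index in the edge arrays is guaranteed to resolve to a node. Doing it the other way around means maintaining that guarantee by hand.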

Trying to keep track of the integer ids outside the database will inevitably lead you to reinvent the idea of a database, with lots of pain and frustration, and no net benefit.