sqlmetadatardbmsrelational-databaserdbms-agnostic

Key-value pairs for metadata/tagging in RDBMS: efficient storage


I know that key-value pairs aren't good database design, aren't normalized etc however in this case I believe they are the most appropriate solution.

My excuse for this, and some background: A large set of items is being pushed into a set of tables, and each item can be tagged with arbitrary metadata that the user can choose. The user can choose the metadata because they are specifying how they wish to categorize, report on and view the items later. For this specific business problem it is not our place (as systems designers) to say what these dimensions are. There's not a consistent set of keys used across items and in some cases the presence of a certain key will be used as a filter condition.

Another bit of background info, the entries will be INSERTed, but not UPDATEd. Eventually they will be DELETEd (sequentially, in the same order they were inserted).

The question, "Efficient storage": by this I am referring to query (read) performance. The following types of queries will be used:

Basically, which is the best choice given these options?:

OPTION 1

Items table:
item_id (integer, pk)
... item fields ...

ItemFacts table:
item_id (integer, fk)
key_name (nvarchar(64))
key_value (nvarchar(128))

OPTION 2

Items table:
item_id (integer, pk)
... item fields ...

Facts table:
fact_id (integer, pk)
key_name (nvarchar(64))
key_value (nvarchar(128))

ItemFacts table:
item_id (integer, fk)
fact_id (integer, fk)

(There could be a third option where the key names are pulled out to a separate table again to reduce redundancy since there may be a whole load of used/possible values for a given key name, might also be worth considering)

roughly speaking, there will be a great deal of duplicated key/value matches. As such there should be a storage efficiency increase. I realise this is a bit of an open-ended question, but what about read performance? How about if I introduce this query too:?

If I can provide any more clarification, please let me know.


Solution

  • You don't need an excuse to make a bad design. Your design is your choice. But to ask what's the best way to screw up my design is not a question with a lot of answers and no good ones. The real question is what other storage technology should I use INSTEAD of an RDBMS.

    There are systems designed to store key-value data like Cassandra. Search for NoSQL... find a technology that fits.