I have a table that has a text column Fruit. In that column only one of four values ever appears: Pear, Apple, Banana, and Strawberry. I have a million rows.
Instead of repeating that data (on average) a quarter million times each, if I extract it into a another table that has a Fruit column and just those four rows and make the original column a foreign key, does it save space?
I expect that the four fruit names are stored only once, and that the million rows now have pointers or indexes or some kind of reference into the second table.
If my row values are longer than short fruit names I expect the savings/optimization is even larger.
The data types of the fields on both sides of a foreign key relationship have to be identical.
If the parent table's key field is (say) varchar(20)
, then the foreign key fields in the dependent table will also have to be varchar(20)
. Which means, yes, you'd have to have X million rows of 'Apple' and 'Pear' and 'Banana' repeating in each table which has a foreign key pointing back at the fruit table.
Generally it's more efficient to use numeric fields as keys (int, bigint), as those can have comparisons done with very few CPU instructions (generally a direct one cpu instruction comparison is possible). Strings, on the other hand, require loops and comparatively expensive setups. So yes, you'd be better off to store the fruit names in a table somewhere, and use their associated numeric ID fields as the foreign key.
Of course, you should benchmark both setups. These are just general rules of thumbs, and your specific requirements/setup may actually work faster with the strings-as-key version.