
Database Design: Composite key vs one column primary key


The database of our web application includes two tables:

States (idStates, State, Lat, Long)

idStates is an auto-incrementing primary key.

Cities (idAreaCode, idStates, City, Lat, Long)

idAreaCode is a primary key consisting of country code + area code (e.g. 91422, where 91 is the country code for India and 422 is the area code of a city in India). idStates is a foreign key referencing States, associating each city in Cities with its corresponding state.

We figured that country code + area code would be unique for each city, and thus could safely be used as a primary key. Everything was working. But a location in India exposed a flaw in the DB design. India, like the US, is a federal democracy and is geographically divided into many states and union territories. Data for both states and union territories is stored in States. There is, however, one location, Chandigarh, which belongs to TWO states (Haryana and Punjab) and is also a union territory by itself.

The current db design doesn't allow us to store more than one record for the city Chandigarh.

One solution suggested is to create a primary key combining columns idAreaCode and idStates.
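
To make the suggestion concrete, here is a minimal sketch of the composite key. It uses Python's built-in sqlite3 module as a stand-in for MySQL/InnoDB so that it runs anywhere. The Lat and Long columns are left out for brevity, and 91172 and the idStates values are illustrative placeholders, not real data from our tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only; InnoDB enforces FKs by default

conn.executescript("""
CREATE TABLE States (
    idStates INTEGER PRIMARY KEY,
    State    TEXT NOT NULL
);
CREATE TABLE Cities (
    idAreaCode INTEGER NOT NULL,
    idStates   INTEGER NOT NULL,
    City       TEXT NOT NULL,
    PRIMARY KEY (idAreaCode, idStates),          -- composite key
    FOREIGN KEY (idStates) REFERENCES States (idStates)
);
""")

# Placeholder ids for the two states and the union territory.
conn.executemany(
    "INSERT INTO States VALUES (?, ?)",
    [(1, "Punjab"), (2, "Haryana"), (3, "Chandigarh")],
)

# The same area code can now appear once per state.
conn.executemany(
    "INSERT INTO Cities VALUES (?, ?, ?)",
    [(91172, 1, "Chandigarh"), (91172, 2, "Chandigarh"), (91172, 3, "Chandigarh")],
)

row_count = conn.execute(
    "SELECT COUNT(*) FROM Cities WHERE idAreaCode = ?", (91172,)
).fetchone()[0]

# The exact same (idAreaCode, idStates) pair is still rejected.
try:
    conn.execute("INSERT INTO Cities VALUES (?, ?, ?)", (91172, 1, "Chandigarh"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Note that the city name (and, in the real table, Lat and Long) is repeated in each of the three rows, which is the kind of redundancy normalization is meant to avoid.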

What would be a good solution?

We are using MySQL with the InnoDB engine.

The database stores meteorological information for each city. Thus, the state and city are the starting point of each query.

Database normalization is important to us.

The database is updated daily / hourly using a CSV file (which is generated by another app). Each record in the CSV file is identified by the idStates and idAreaCode columns.

Hence we would prefer that the primary key used in Cities, rather than being auto-incremented, stays the same for each city even if the table is deleted and refreshed again. Zip codes (or pin codes) and area codes (or STD codes) meet the criteria of being unique and static (they don't change often), and ready lists of these are easily available. (We decided on area codes for now because India is in the process of updating its pin codes to a new format.)

PS: We decided to handle this at the application level instead of making changes to the database design. In the database we will only be storing one record for Chandigarh. In the application we created a flag for any search for Chandigarh Punjab or Chandigarh Haryana to redirect to this record. It's an acceptable compromise since this is the ONLY exception we've come across.
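
For anyone curious, the application-level redirect can be sketched roughly as below. This is not our actual code: the names `STATE_REDIRECTS` and `canonical_search_key` are hypothetical, and the sketch assumes the single Chandigarh record is stored under the union territory "Chandigarh".

```python
# Hypothetical redirect table: (city, state) searches that should resolve
# to the one stored record. Assumes that record lives under "chandigarh".
STATE_REDIRECTS = {
    ("chandigarh", "punjab"): "chandigarh",
    ("chandigarh", "haryana"): "chandigarh",
}


def canonical_search_key(city, state):
    """Normalize a (city, state) search and apply the redirect if one exists."""
    city_key = city.strip().lower()
    state_key = state.strip().lower()
    # Fall back to the state as given when there is no redirect entry.
    state_key = STATE_REDIRECTS.get((city_key, state_key), state_key)
    return (city_key, state_key)


redirected = canonical_search_key("Chandigarh", "Punjab")
unchanged = canonical_search_key("Amritsar", "Punjab")
```

Any further exceptions would just be extra entries in the redirect table, with no schema change.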


Solution

  • It sounds like you are gathering data for a telephone directory. Are you? Why are states important to you? The answer to this question will probably determine which database design will work best for you.

    You may think that it's obvious what a city is. It's not. It depends on what you are going to do with the data. In the US, there is a unit called the MSA (Metropolitan Statistical Area). The Kansas City MSA spans both Kansas City, Kansas and Kansas City, Missouri. Whether the MSA unit makes sense or not depends on the intended use of the data. If you used area codes in the US to determine cities, you'd end up with a very different grouping than MSAs.

    In general, whenever hierarchical patterns of political subdivisions break down, the most general solution is to treat the relationship as many-to-many. You solve this problem the same way you solve other many-to-many problems: by creating a new table with two foreign keys. In this case the foreign keys are idAreaCode and idStates.

    Now you can have one area code in many states and one state spanning many area codes. It seems a shame to accept this extra overhead to cover just one exception. Do you know whether the exception you have uncovered is just the tip of the iceberg, and whether there are many more such exceptions?
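
The many-to-many design described above might look roughly like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL/InnoDB so that it is runnable as is. The junction table name `CityStates`, the area code 91172, and the idStates values are illustrative assumptions, and the Lat/Long columns are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only; InnoDB enforces FKs by default

conn.executescript("""
CREATE TABLE States (
    idStates INTEGER PRIMARY KEY,
    State    TEXT NOT NULL
);
CREATE TABLE Cities (
    idAreaCode INTEGER PRIMARY KEY,   -- stays a single-column key
    City       TEXT NOT NULL
);
-- Junction table (hypothetical name): one row per city/state pairing.
CREATE TABLE CityStates (
    idAreaCode INTEGER NOT NULL,
    idStates   INTEGER NOT NULL,
    PRIMARY KEY (idAreaCode, idStates),
    FOREIGN KEY (idAreaCode) REFERENCES Cities (idAreaCode),
    FOREIGN KEY (idStates)   REFERENCES States (idStates)
);
""")

conn.executemany(
    "INSERT INTO States VALUES (?, ?)",
    [(1, "Punjab"), (2, "Haryana"), (3, "Chandigarh")],
)
# The city itself is stored exactly once.
conn.execute("INSERT INTO Cities VALUES (?, ?)", (91172, "Chandigarh"))
# Its three state associations live in the junction table.
conn.executemany(
    "INSERT INTO CityStates VALUES (?, ?)",
    [(91172, 1), (91172, 2), (91172, 3)],
)

# All states (and union territories) a given area code belongs to.
states_for_city = [
    row[0]
    for row in conn.execute(
        """
        SELECT s.State
        FROM CityStates AS cs
        JOIN States AS s ON s.idStates = cs.idStates
        WHERE cs.idAreaCode = ?
        ORDER BY s.State
        """,
        (91172,),
    )
]

city_rows = conn.execute("SELECT COUNT(*) FROM Cities").fetchone()[0]
```

The trade-off is one extra join on every state-to-city lookup, in exchange for storing each city's own attributes only once.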