I am designing a database to store survey responses that are collected through APIs from our survey platform - Wufoo. Each survey has different number of questions. Each survey has an Id and each response has an Id as well. I want to store the data as efficient, yet flexible enough to retrieve multiple questions at the same time via querying.
One option would be to store it as a table with columns - Survey_Id, Response_Id, Question, and Response. i.e. each row of response from the API will be many rows in a single table.
Another option would be to have two tables called questions and responses. Each table can have about 50 columns named - Q1, Q2, Q3, Q4, etc. I can then store all the questions (once) in the question table, and for each response, store the values in the response table in the same sequence of Q1, Q2, Q3, etc. To retrieve I can use the values in Question table as an alias of column name when reading the response table, most likely using stored procedures.
Which of the above option would you choose, and perhaps forsee and potential problems, if you were going to use the data for reporting? Or are there other ways to store survey responses?
Normalization is the best approach for database design.
Your second option is good but will improve it further. Maintain 2 tables questions
and responses
.
questions table schema
survey_id
question_id
question
responses table schema
response_id
survey_id (references to survey_id in questions table)
question_id (references to question_id in questions table)
response
Let's say Survey 1 has 20 questions. So, questions
table will have 20 entries. For each question there will be answers in response
table.
One more hint, if you are expecting huge replies or large data set, avoid auto increment primary keys as this will get exhausted. Due to this i have maintained 2 references survey_id
and question_id
in responses
table.
Hope this helps!