mysqldatabasedatabase-design

Improve database design for online exam


I am working on a multiple choice online test project here i have designed database to store result but looking for more optimized way.

Requirements:

  1. Every question have four options.
  2. Only one option can be selected and that needs to be stored in database.

My design:

tables:

students
stud_id, name, email

tests
test_id, testname, duration

questions
que_id, question, opt1, opt2, opt3, opt4, answer, test_id

answers
stud_id, que_id, answer

By this way answers can be stored but it increase the number of records as for every question solved by student new record will be added in answers table.

e.g. One test consists 100 questions and 1000 students take that test, for every student there will be 100 records for each question and for 1000 students 100k records.

Is there any better way to do this where number of records will be less.


Solution

  • Initial Response

    Understanding the Data

    You have done good work. As far as the data is concerned, the design is correct, but incomplete. There are two errors:

    1. opt1…opt4 is a repeating group, that breaks 2NF. It must be placed in a separate table.
    1. Your answers table has no integrity. As it stands, answers can be recorded against a question that the student was not asked, or for a test that the student did not sit. Prevention of that type of error is ordinary fare in a Relational Database, and it is not possible in a Record Filing System.
    1. The third item is not an error, because you did not give it as a requirement. However, it seems to me that a question can be used in more than one test. The way you have set it up, such questions will be duplicated (the whole point of a database is to Normalise it, such that there is no duplication).

    Questions

    By this way answers can be stored but it increase the number of records as for every question solved by student new record will be added in answers table.

    Yes. That is normal for a database.

    Is there any better way to do this where number of records will be less.

    For a Record Filing System, yes. For a database, no. Since you have tagged your question as database-design, I will assume that that is what you want.

    A database is a collection of facts, not of records with related fields. The facts are about the real world, limited to the scope of the database and app.

    It is important to determine the discrete facts that we need, because subordinate facts depend on higher-order facts. That is database design. And we Normalise the data, as we progress, as part of one and the same exercise. Normalisation has the purpose of eliminating duplication, otherwise you have Update Anomalies. And we determine Relational Keys, as we progress, again as part of one and the same exercise. Relational Keys provide the logical structure of a Relational database, ie. the logical integrity.

    e.g. One test consists 100 questions and 1000 students take that test, for every student there will be 100 records for each question and for 1000 students 100k records.

    Yes. But that is expressed in ISAM record-processing terms. In database terms, you cannot get around the fact that the database stores:

    You need to get your head around two things: the large number of discrete facts; and the use of compound Keys. Both are essential to Relational databases. If either of those are missing, or you implement them with reluctance, you will not have the integrity, power, or speed of a Relational database, you will have a pre-1970's ISAM Record Filing System.

    Further, the SQL platforms, and to some degree the non-SQL platforms such as MySQL, are heavily optimised for processing sets of data (not record-by-record); heavy I/O and caching; etc. If you implement the structures required for high concurrency, you will obtain even more performance.

    Implementation

    As far as the implementation is concerned, and particularly since you are concerned about performance, there are errors. A restatement would be, the implementation should not be attempted until the data is understood and modelled correctly.

    The problem across the board, is that you have added a surrogate (there is no such thing as "surrogate key", it is simply a surrogate, a physical record id). It is far to early in the modelling exercise; it hasn't progressed enough; the model is not stable, to add surrogates.

    Solution

    Rather than going back and forth, let me provide the proposal, and you can discuss it.

    Your answers table has no integrity. As it stands, answers can be recorded against a question that the student was not asked, or for a test that the student did not sit. Prevention of that type of error is ordinary fare in a Relational Database, and it is not possible in a Record Filing System.

    Please comment/discuss.

    Response to Comments

    I will add additional table subject (subject_id, subject_name) and add that subject_id in question table as FK is this okay?

    Yes, by all means. But that has consequences. Some advice to make sure we do that properly, across the board:

    1. As explained, do not use surrogates (Record IDs) unless you absolutely have to. Short Codes are much better for Identifiers, for both users and developers.
    1. Subject is important. It is the context in which (a) a question exists, and (b) a test exists. They did exist as independent items (page 1 of the DM), but now they are subordinate to subject. The addition substantially improves data integrity.

    2. The fact of a student registration and the fact of a student sitting for a test, are discrete and separate facts.

    3. Gratefully, that eliminated two surrogates question_id and test_id. Short codes such as CHAR(2) are easier and more meaningful.

    4. Note the improvement in the table names, improved clarity.

    5. I have updated the Student Test Data Model (Page 2 only, for those following the progression).

    6. However, that exposes something (that is why we model data, paper is cheap, many drafts are normal). If we evaluate the Predicates (readily visible in the Data Model, as detailed in the IDEF1X Notation document):

         each subject_test was taken by 0-to-n student_tests
         each student_test is [a taking of] 1 subject_test
         each student took 0-to-n student_tests
         each student_test is taken by 1 student
      

    those Predicates are not accurate. A student can sit for a test in any subject. Given the new subject table, I would think that we want students to be registered for subjects, and therefore student_test to be constrained to subjects that the student is registered for.

    1. I have updated the Student Test Data Model (Page 3). Now we have even more integrity, such that student_test is constrained to subjects that the student is registered for. The relevant Predicates are:

        each student registered for 0-to-n student_subjects
        each student_subject is a registration of 1 student
        each subject attracted 0-to-n student_subjects
        each student_subject is an attraction of 1 subject
      
        each subject_test was taken by 0-to-n student_tests
        each student_test is [a taking of] 1 subject_test
        each student_subject took 0-to-n student_tests
        each student_test is taken by 1 student_subjects
      
    2. Now the data model appears to be complete.