sql database-design relational-database entity-relationship

How to avoid duplicate data in one-to-many relationship


Table Action has fields id, start_time, end_time and group_id.

Table ActionGroup has fields id, start_time and end_time of a group of multiple sequential actions.

ActionGroup has a 1-to-many relationship with Action.

Example:

If there are four Action records,

[image: table of the four Action records, not reproduced]

the relevant ActionGroup record is

[image: the corresponding ActionGroup record, not reproduced]

Action already includes the data that ActionGroup needs, and ActionGroup is repeating it.

How do I query start time and end time of the ActionGroup in a simple and performant way, without separately logging the start_time and end_time fields in ActionGroup?

What are problems with duplicating the data as above when designing a table schema?


Solution

  • To start with your last question: Data duplication is bad in that it introduces the risk of inconsistency, e.g. if your ActionGroup 1 is registered with an end_time of (say) 8 while having an Action with end_time 13. If your data is inconsistent, you can't trust any query results, since an inconsistent logical system can produce any result at all.

    In your example, ActionGroup can be derived from Action using an aggregate query (relational algebra pseudocode):

    Action group by { group_id } add {
      Min(start_time) start_time,
      Max(end_time) end_time
    }
    rename { group_id id }
    

    So you don't need those attributes in the ActionGroup relvar (table). This is simple; whether it's fast enough depends on your requirements.
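
In SQL, the aggregate above could look roughly like the following. This is only a minimal sketch, assuming SQLite driven from Python's sqlite3 module. The integer times, the view name ActionGroupTimes, the index and the sample rows are all made up for illustration and do not come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Action (
        id         INTEGER PRIMARY KEY,
        start_time INTEGER NOT NULL,
        end_time   INTEGER NOT NULL,
        group_id   INTEGER NOT NULL
    );

    -- Hypothetical index; it may help the GROUP BY and per-group lookups.
    CREATE INDEX action_group_idx ON Action (group_id, start_time, end_time);

    -- The group's times are derived from Action, never stored separately.
    CREATE VIEW ActionGroupTimes AS
    SELECT group_id        AS id,
           MIN(start_time) AS start_time,
           MAX(end_time)   AS end_time
    FROM Action
    GROUP BY group_id;
""")

# Made-up sample data: three actions in group 1, one in group 2.
conn.executemany(
    "INSERT INTO Action (id, start_time, end_time, group_id) VALUES (?, ?, ?, ?)",
    [(1, 1, 3, 1), (2, 3, 6, 1), (3, 6, 13, 1), (4, 20, 25, 2)],
)

rows = conn.execute(
    "SELECT id, start_time, end_time FROM ActionGroupTimes ORDER BY id"
).fetchall()
print(rows)  # [(1, 1, 13), (2, 20, 25)]
```

Querying the view reads like querying a table, so callers do not have to repeat the aggregate. Whether the index is worth having depends on your data volume and DBMS.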

    But note that if you do keep redundant start_time and end_time attributes in ActionGroup, you need to control the redundancy (preferably with a constraint, alternatively with triggered actions, or, in the worst case, application code) in order to avoid inconsistency. This will also have performance implications, but they will apply to writes instead of reads.
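
If you do keep the redundant columns, the "triggered actions" option might be sketched as below. This again assumes SQLite via Python's sqlite3 module. The trigger names, integer times and sample rows are hypothetical, and other DBMSs use different trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE ActionGroup (
        id         INTEGER PRIMARY KEY,
        start_time INTEGER,  -- redundant copy of MIN(Action.start_time)
        end_time   INTEGER   -- redundant copy of MAX(Action.end_time)
    );
    CREATE TABLE Action (
        id         INTEGER PRIMARY KEY,
        start_time INTEGER NOT NULL,
        end_time   INTEGER NOT NULL,
        group_id   INTEGER NOT NULL REFERENCES ActionGroup (id)
    );

    -- Each trigger recomputes the redundant columns of the affected group(s).
    CREATE TRIGGER action_after_insert AFTER INSERT ON Action
    BEGIN
        UPDATE ActionGroup
        SET start_time = (SELECT MIN(a.start_time) FROM Action AS a
                          WHERE a.group_id = ActionGroup.id),
            end_time   = (SELECT MAX(a.end_time) FROM Action AS a
                          WHERE a.group_id = ActionGroup.id)
        WHERE id = NEW.group_id;
    END;

    CREATE TRIGGER action_after_update AFTER UPDATE ON Action
    BEGIN
        UPDATE ActionGroup
        SET start_time = (SELECT MIN(a.start_time) FROM Action AS a
                          WHERE a.group_id = ActionGroup.id),
            end_time   = (SELECT MAX(a.end_time) FROM Action AS a
                          WHERE a.group_id = ActionGroup.id)
        WHERE id IN (OLD.group_id, NEW.group_id);
    END;

    CREATE TRIGGER action_after_delete AFTER DELETE ON Action
    BEGIN
        UPDATE ActionGroup
        SET start_time = (SELECT MIN(a.start_time) FROM Action AS a
                          WHERE a.group_id = ActionGroup.id),
            end_time   = (SELECT MAX(a.end_time) FROM Action AS a
                          WHERE a.group_id = ActionGroup.id)
        WHERE id = OLD.group_id;
    END;
""")

def group_row(group_id):
    """Return (id, start_time, end_time) as currently stored in ActionGroup."""
    return conn.execute(
        "SELECT id, start_time, end_time FROM ActionGroup WHERE id = ?",
        (group_id,),
    ).fetchone()

# Made-up sample data.
conn.execute("INSERT INTO ActionGroup (id) VALUES (1)")
conn.executemany(
    "INSERT INTO Action (id, start_time, end_time, group_id) VALUES (?, ?, ?, ?)",
    [(1, 1, 3, 1), (2, 3, 6, 1), (3, 6, 13, 1)],
)
after_insert = group_row(1)

conn.execute("UPDATE Action SET end_time = 10 WHERE id = 3")
after_update = group_row(1)

conn.execute("DELETE FROM Action WHERE id = 1")
after_delete = group_row(1)

print(after_insert, after_update, after_delete)  # (1, 1, 13) (1, 1, 10) (1, 3, 10)
```

Every write to Action now pays for two extra aggregate lookups, which is the write-side cost mentioned above. This sketch does not stop someone from updating ActionGroup directly, so it controls the redundancy only partially.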