sql sql-server database database-design schema

structure data with common and arbitrary attributes for querying all events and then querying for specific event information

I have a table that holds information about events across a number of different devices and platforms.

Event:
  int Id
  datetime Occured_On
  uniqueidentifier UserId --> Users table
  int Category (the type of event) --> Categories table
  text Summary (textual summary of the event)
  text EventSource (specifies whether this is a DesktopEvent or MobileEvent)

Each device has its own set of data for each event:

DesktopEvent
  int Id
  int EventId --> Event
  text Hostname
  text LoggedInUsername

MobileEvent
  int Id
  int EventId --> Event
  int PlatformId 
  text ESN

...

I want to query Event for the last X most recent events. I have a column specifying the source of the event, so that I can make a second query to the appropriate table.

results = SELECT * FROM Event DESCENDING Occured_On LIMIT 5. 

foreach (result in results)
{
    if (result.EventSource == "DesktopEvent")
       data = SELECT * FROM DestkopEvent WHERE EventId == result.EventId

    ...
}

This seems inefficient.

There could be many sources of events, not just two (DesktopEvent and MobileEvent). So joining across 10+ tables checking for which one is not NULL seems worse than the method above.

How should I structure this data with common and arbitrary attributes for querying all events and then querying for specific event information?

Solution

In data modeling, you sometimes find things that are not utterly different, but also aren't exactly alike. They have some attributes in common, so they're not utterly different. But they each have some unique attributes, so they're not exactly alike.

Attributes that are common to all those things belong together in one table. That table is usually called a supertype.

Each unique set of attributes belongs in separate tables; those tables are called subtypes. In data modeling, "supertype" and "subtype" don't have anything to do with the object-oriented programming concepts that use the same words. Same words, vastly different meanings. Be careful with that.

It sounds like the table "events" is your supertype. "Desktop events" and "Mobile events" are two of several subtypes.

If that's the case, you're probably on the right track. Inefficiency is only apparent, not real. The real truth is that desktop events and mobile events are different things to you, and so they must be stored in different tables. And storing them in different tables makes integrity constraints infinitely simpler to implement. After all, that's what SQL databases are designed to do. Queries can take advantage of indexes.

Normally, a supertype and 'n' subtypes map to 'n' + 1 tables and 'n' views. You have one table for each of the 'n' subtypes, plus one table for the supertype. You'd also build one view for each subtype; each view joins one subtype to the supertype. You'd normally use the views, and not use the subtype tables directly. But you might use the supertype table directly--to get the last five events, for example.

Search SO for the terms supertype and subtype. SO user PerformaceDBA and I have both written about them more than once. (Apologies to others who have weighed in on relational supertypes and subtypes; I have a lot of trouble remembering names.)