mysqldatabase-designinnodbmyisamdatabase-engine

Keeping video viewing statistics breakdown by video time in a database


I need to keep a number of statistics about the videos being watched, and one of them is what parts of the video are being watched most. The design I came up with is to split the video into 256 intervals and keep the floating-point number of views for each of them. I receive the data as a number of intervals the user watched continuously. The problem is how to store them. There are two solutions I see.

Row per every video segment

Let's have a database table like this:

CREATE TABLE `video_heatmap` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `video_id` int(11) NOT NULL,
 `position` tinyint(3) unsigned NOT NULL,
 `views` float NOT NULL,
 PRIMARY KEY (`id`),
 UNIQUE KEY `idx_lookup` (`video_id`,`position`)
) ENGINE=MyISAM

Then, whenever we have to process a number of views, make sure there are the respective database rows and add appropriate values to the views column. I found out it's a lot faster if the existence of rows is taken care of first (SELECT COUNT(*) of rows for a given video and INSERT IGNORE if they are lacking), and then a number of update queries is used like this:

UPDATE video_heatmap
SET views = views + ?
WHERE video_id = ? AND position >= ? AND position < ?

This seems, however, a little bloated. The other solution I came up with is

Row per video, update in transactions

A table will look (sort of) like this:

CREATE TABLE video (
 id INT NOT NULL AUTO_INCREMENT,
 heatmap BINARY (4 * 256) NOT NULL,
 ...
) ENGINE=InnoDB

Then, upon every time a view needs to be stored, it will be done in a transaction with consistent snapshot, in a sequence like this:

  1. If the video doesn't exist in the database, it is created.
  2. A row is retrieved, heatmap, an array of floats stored in the binary form, is converted into a form more friendly for processing (in PHP).
  3. Values in the array are increased appropriately and the array is converted back.
  4. Row is changed via UPDATE query.

So far the advantages can be summed up like this:

First approach

Second approach

So, what should I do? If it wasn't for the rest of our system using MyISAM consistently, I'd go with the second approach, but currently I'm leaning to the first one. But maybe there are some reasons to favour one approach or another?


Solution

  • Second approach looks tempting at first sight, but it makes queries like "how many views for segment x of video y" unable to use an index on video.heatmap. Not sure if this is a real-life concern for you though. Also, you would have to parse back and forth the entire array every time you need data for one segment only.

    But first and foremost, your second solution is hackish (but interesting nonetheless). I wouldn't recommend denormalising your database until you face an acutal performance issue.

    Also, try populating the video_headmap table in advance with wiews = 0 as soon as a video is inserted (a trigger can help).

    If space is really a concern, remove your surrogate key video_headmap.id and instead make (video_id, position) the primary key (then get rid of the superfluous UNIQUE constraint). But this shouldn't come into the equation. 256 x 12 bytes per video (rough row length with 3 numeric columns, okay add some for the index) is only an extra 3kb per video!

    Finally, nothing prevents you from switching your current table to InnoDB and leverage its row-level locking capability.

    Please note I fail to undestand why views cannot be an UNSIGNED INT. I would recommend changing this type.