Keeping track of changes to a database must be a big concern for lots of people, but it seems that the big names have software for that.
My question is for a small SQL database with 10 tables, <10 columns each, using joins to create a "master" junction table: is there a downside to updating a few times per year by adding rows (with a lot of duplicate information) and then taking the MAX id (PK) to generate and post on a website the most recent data in tabular form (excerpted from the "master")? This versus updating the records, in which I'll lose information on the values at a particular moment.
A typical row for teacher contact information would have fName, lName, schoolName, [address & phone info]; for repertoire or audition information: year, instrument, piece, composer, publisher/edition.
Others have asked about tracking db changes, but only one recently, and not with a lot of votes/details: How to track data changes in a database table Keeping history of data revisions - best practice? How to track data changes in a database table
This lightweight solution seems promising, but I don't know if it didn't get votes because it's not helpful, or because folks just weren't interested. How to keep track of changes to data in a table?
more background if needed: I'm a music teacher (i.e. amateur programmer) maintaining a Joomla website for our organization. I'm using a Joomla plugin called Sourcerer to create dynamic content (PHP/SQL to the Joomla database) to make it easier to communicate changes (dates, personnel, rules, repertoire, etc.) For years, this was done with static pages (and paper handbooks) that took days to update.
I also, however, want to be able to look back and see the database state at a particular time: who taught where, what audition piece was listed, etc., as we could with paper versions. NOTE: I'm not tracking HTML changes, only that information fed from the database.
The code I'm using now to generate the "master junction table." I would modify this to "INSERT into" for my new rows and query from it via Sourcerer to post the information online.
CREATE TABLE 011people_to_schools_junction
AS (
SELECT *
FROM (
SELECT a.peopleID, a.districtID, a.firstName, a.lastName, a.statusID, c.schoolName
FROM 01People a
INNER JOIN (
SELECT districtID, MAX(peopleID) peopleID
FROM 01People
GROUP BY districtID
) b
ON a.districtID = b.districtID
AND a.peopleID = b.peopleID
INNER JOIN (
SELECT schoolID, MAX(peopleID) peopleID
FROM 01people_to_schools_junction ab
GROUP BY schoolID
) z
ON z.peopleID = a.peopleID
LEFT JOIN 01Schools c
ON c.schoolID = z.schoolID
WHERE z.schoolID IS NOT NULL
OR z.peopleID IS NOT NULL
ORDER BY c.schoolName
) t1
);
#Add a primary key as the first column
ALTER TABLE 011people_to_schools_junction
ADD COLUMN 011people_to_schoolsID INT NOT NULL AUTO_INCREMENT FIRST,
ADD PRIMARY KEY (011people_to_schoolsID);
To answer your questions in order:
Is there a downside?
Of course, and it's performance - related. If you add a million records each year, it will hurt performances; and occupy space on disk.
Where the suggestions in the linked question bad or just not popular?
The question and answers are good; but the right answer depends on your specific use case: are you doing it for legal reasons, how fast you want to be able to access the data, how much data and updates you have, how much you want your history functionality to last without changes... only if it met your use case you would vote.
As a rule of thumb, history should go to a different table, this would provide several advantages:
In order to choose whether to have a single history table or several (one per backed up table) depends on how you plan to retrieve the data and what you want to do with it:
if you mirror each of your tables adding a timestamp and the user id, your code would need little modifications; but you'd end up with twice as many tables, and any structure change would then need to be replicated on the history table as well;
if you build a single history table with the timestamp, the user id, the table name and a json representation of the record, you will have an easier life building it, while for retrieving it you should access the data using an Object per row i.e. using Joomla's dbo getObjectList(), then the objects will be the same format you store in the history table and the changes there will be fairly easy. But querying for changes across specific tables/fields will be much harder.
Keep in mind that having data is useless if you can't retrieve it properly.
Since you mention pushing to the website a few times a year, the overhead of the queries should not be an issue (if you update monthly, waiting 5 minutes may not be a problem).
You should seek the best solution based on the other uses of this data: for it to be useful to anyone, you will have to implement a system to retrieve historical data. If phpmyadmin is enough, well look no further.
I hope this scared you. Either way it's a lot of hard work.
If you just want to be able to look up old data, you may instead store a copy of the markup/output you generate from time to time, and save it to different folders on the webserver. This will take minutes to set up, and be extremely reliable.
Sure, it's more fun to code it. But are you really sure you need it? And you can keep the database dumps just in case one day you change your mind.