mysqlsql-serverdatabasedatabase-administrationdata-exchange

best approach to exchanging data dumps between organizations


I am working a project where I will receive student data dumps once a month. The data will be imported into my system. The initial import will be around 7k records. After that, I don't anticipate more than a few hundred a month. However, there will also be existing records that will be updated as the student changes grades, etc.

I am trying to determine the best way to keep track of what has been received, imported, and updated over time.

I was thinking of setting up a hosted MySQL database with a script that imports the SFTP dump into a table that includes a creation_date and a modification_date field. My thought was, the person performing the extraction, could connect to the MySQL db and run a query on the imported table each month to get the differences before the next extraction.

Another thought I had, was to create a new received table every month for each data dump. Then I would perform the query on the differences.

Note: The importing system is legacy and will accept imports using a utility and unique csv type files. So that probably rules out options like XML.

Thank you in advance for any advice.


Solution

  • I'm going to assume you're tracking students' grades in a course over time.

    I would recommend a two table approach:

    Table 1: transaction level data. Add-only. New information is simply appended on. Sammy got a 75 on this week's quiz, Beth did 5 points extra credit, etc. Each row is a single transaction. Presumably it has the student's name/id, the value being added, maybe the max possible value or some weighting factor, and of course the timestamp added. All of this just keeps adding to a never-ending (in theory) table.

    Table 2: summary table, rebuilt at some interval. This table does a simple aggregation on the first table, processing the transactional scores into a global one. Maybe it's a simple sum, maybe it's a weighted average, maybe you have something more complex in mind. This table has one row per student (per course?). You want this to be rebuilt nightly. If you're lazy, you just DROP/CREATE/INSERT. If you're worried about data-loss, you just INSERT and add a timestamp so you can have snapshots going back.