I am working on a timesheet application for a client. The User model in django is used for storing user data. But I am using a timesheet table to store User information related to the company (like role/position, client, project name, hours spent, feedback rating). I want to store the hours worked each day in a separate table with a foreign key relationship(to the timesheet data table), and create a separate table for each user to store their hours worked data separately.
Is this possible in Django? Or storing the all the worked hours data in a single table seems easy, but over time it will get accumulated and the retrieval would be slow? Since it will be accumulating 150K records every year (600 employees, 250 work days in a year, using sqlite db).
Is this possible in Django?
Possible yes, but often not a good idea. Usually the retrieval will not be slowed down by the number of records.
Most databases work with indexes [SQLite-doc]: these are data structures that make it easy to access records that have a FOREIGN KEY
to a specific record for example.
Several databases automatically set an index on a column that has a FOREIGN KEY
constraint, exactly because it is an almost trivial use-case to retrieve the records beloning to a certain related table record. SQLite does not, but that is not much of an issue, because Django automatically adds one for you.
Such index thus allows very fast retrieval of the location of the records related, and then the database can fetch the actual records without having to go through the individual records.
This means that search does not scale with the total size of the table, but with the size of the records to retrieve.
Since it will be accumulating 150K records every year
Even if that is true, and even if the database would not search in the most optimal way, that is still not very much. Databases can often work with millions of records.
What you describe is sharding [wiki] and sharding can sometimes help, but it is often quite complex, and painful to get it exactly right, especially if eventually a user needs access to a different shard. Unless the performance is terrible, and all else tried fails, I think it is not a good idea to do sharding, since you have more tables. This means that querying is more complex, it is harder to make aggregates: if a manager is supposed to easily determine for all the employees how much they worked last month, it is no longer a single JOIN, but results in a complex query where you JOIN on the table of each of the related users.