I'm designing a system where multiple users will upload large amounts of data. My initial example is 100 users uploading 100 MB each every day.
I need to receive the data, insert it into a database, process it in the database (ETL), and then use the "polished" data for analysis.
The uploaded files will be received in chunks of 65 KB (initial design).
To avoid bottlenecks, I'm thinking of building this around MSMQ: I put the data onto the queue and pass it on to different programs/tools that process it, and these in turn signal the ETL tool (again via MSMQ) to start doing its thing.
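The decoupling this buys can be illustrated with in-process queues standing in for MSMQ. This is only a sketch of the message flow, not MSMQ code: the queue names, message shape, and the `is_last` completion flag are all made up for illustration.

```python
import queue
import threading

# Stand-ins for two MSMQ queues: one feeding the chunk processors,
# one signalling the ETL stage. Names are illustrative, not MSMQ APIs.
chunk_queue = queue.Queue()
etl_queue = queue.Queue()

def chunk_worker():
    """Persists incoming chunks; when an upload completes, signals ETL."""
    while True:
        msg = chunk_queue.get()
        if msg is None:          # shutdown sentinel
            break
        user, chunk, is_last = msg
        # ... write the chunk to staging storage / SQL here ...
        if is_last:
            etl_queue.put(user)  # tell the ETL stage this upload is ready
        chunk_queue.task_done()

def etl_worker(done_users):
    """Runs the ETL step for completed uploads, independently of new uploads."""
    while True:
        user = etl_queue.get()
        if user is None:
            break
        # ... transform staged data into the "polished" tables here ...
        done_users.append(user)
        etl_queue.task_done()

done = []
threading.Thread(target=chunk_worker, daemon=True).start()
threading.Thread(target=etl_worker, args=(done,), daemon=True).start()

# Simulate one user's upload arriving in three 65 KB chunks.
for i in range(3):
    chunk_queue.put(("user-1", b"\x00" * 65536, i == 2))

chunk_queue.join()
etl_queue.join()
print(done)  # ['user-1']
```

The point is that the receiver only enqueues chunks and returns immediately; the ETL stage drains its own queue at its own pace, so a slow ETL run never blocks new uploads.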
Alternatively, I'm considering a "linear" approach:
--> receive data
--> save data to sql
--> wait for the upload to finish (repeat the two steps above until no more chunks arrive)
--> signal the ETL to do its thing
--> when the ETL is done, report "Done" to the caller
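For contrast, the linear steps above collapse into a single synchronous call chain, which is where the blocking comes from. A minimal sketch, with made-up function names and the SQL/ETL steps stubbed out:

```python
def run_etl(staged_chunks):
    """Stub for the ETL step: transform staged rows into polished tables."""
    return sum(len(c) for c in staged_chunks)  # pretend result: bytes processed

def receive_upload(chunks):
    """Runs the whole pipeline synchronously; the caller blocks until ETL is done."""
    staged = []
    for chunk in chunks:      # receive data
        staged.append(chunk)  # save data to SQL (stubbed as a list)
    # upload has finished once the chunk stream is exhausted
    run_etl(staged)           # signal the ETL to do its thing
    return "Done"             # report "Done" to the caller

# Simulate an upload of three 65 KB chunks; the client waits for the result.
result = receive_upload(b"\x00" * 65536 for _ in range(3))
print(result)  # Done
```

Every step, including the ETL run, happens inside the one call the client is waiting on, so upload throughput is capped by ETL speed.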
Which approach seems to be the better one? Are there any alternatives to look into? The ambition is to have several thousand users... As far as I can see, the linear approach blocks the client/uploader.
I prefer the first approach. The advantage over the second is that you can send and process the MSMQ messages asynchronously and make them transactionally secure with very little effort.
Not that the second approach wouldn't work, but the first looks like much less effort to me.
I also suggest you look at some of the frameworks that sit on top of MSMQ. As a C# programmer I can recommend NServiceBus, but I don't know what stack you might be using.