I have a task to complete.
There are two types of csv files 4000+
both related to each other.
2 types are:
1. Country2.csv
2. Security_Name.csv
Contents of Country2.csv
:
Company Name;Security Name;;;;Final NOS;Final FFR
Contents of Security_Name.csv
:
Date;Close Price;Volume
There are multiple countries and for each country multiple security files
Now I need to READ them do some CALCULATION and then WRITE the output in another files
READ
Read both the file Country 2.csv and Security.csv and extract all the data from them.
For example :
Read France 2.csv, extract Security_Name, Final NOS, Final FFR
Then Read Security.csv(which matches the Security_Name
) and extract
Date, Close Price, Volume
Calculation
Calculations are basically finding Median of the values extracted which is quite simple.
For Example:
Monthly Median Traded Values
Daily Traded Value of a Security ... and so on
Write
Based on the month I need to sort the output in two different file with following formats:
If Month % 3 = 0
Save It as MONTH_NAME.csv in following format:
Security name; 12-month indicator; 3-month indicator; FOT
Else
Save It as MONTH_NAME.csv in following format:
Security Name; Monthly Median Traded Value Ratio; Number of days Volume > 0
My question is how do I design my application in such a way that it is maintainable and the flow of data throughout the execution is seamless?
So first thing. Based on the kind of data you are looking to generate, I would probably be looking at moving this data to a SQL db if at all possible. This is "one SQL query" kind of stuff. And far more maintainable than C++ that generates CSV files from CSV files.
Barring that, I would probably look at using datamash
and/or perl
. On a Windows platform, you could do this through Cygwin or WSL. Probably less maintainable, but so much easier it's not too much of an issue.
That said, if you're looking for something moderately maintainable, C++ could work. The first thing I would do is design my input classes. Data-centric, but it can work. It sounds like you could have a Country
class, a Security
class, and a SecurityClose
class...or something along those lines. You can think about whether a Security class should contain a collection of SecurityClose
es (data), or whether the data should just be "loose" and reference the Security
it belongs to. Same with the Country->Security relationship.
Once you've decided how all that's going to look, you want something (likely a function) that can tokenize
a CSV line. So "1,2,3"
gets turned into a vector<string>
with the contents "1"
"2"
"3"
. Then, each of your input classes should have a constructor or initializer that takes a vector<string>
and populates itself. You might need to pass higher level data along too. Like the filename if you want the security data to know which security it belongs to..
That's basically most of the battle there. Once you've pulled your data into sensibly organized classes, the rest should come more easily. And if you run into bumps, hopefully you can ask specific design or implementation questions from there.