I am analyzing data with phone numbers which I need to match to the country and the operator. I have received the country and destination (city/operator) mappings of the phone numbers prefixes in the following form:
Country, Destination, Country Code, Destination Code, Remarks
AAA, Some Mobile, 111, "12, 23, 34, 46",Some remarks
AAA, Some city A, 111, "55, 56, 57, 51", Some more remarks
BBB, Some city B, 222, "234, 345, 456", Other remarks
The data here is dummy data but the real data is of the same form. There are a lot of values in the "Destination Code" column. So I want to convert this file into a form suitable for using inside a database.
What I've thought of is converting it into the form shown below:
Country, Destination, Combined Code, Remarks
AAA, Some Mobile, 11112, Some remarks
AAA, Some Mobile, 11123, Some remarks
AAA, Some Mobile, 11134, Some remarks
AAA, Some Mobile, 11146, Some remarks
etc..
This would enable me to create a simpler mapping table. What is the best way to deal with data like this? How would I write a code in bash shell script or python for this conversion?
>>> data = [['Country', 'Destination', 'Country Code', 'Destination Code', 'Remarks'],
... ['AAA', 'Some Mobile', '111', '12, 23, 34, 46','Some remarks'],
... ['AAA', 'Some city A', '111', '55, 56, 57, 51', 'Some more remarks'],
... ['BBB', 'Some city B', '222', '234, 345, 456', 'Other remarks']]
>>>
>>> op=[data[0]]
>>> for i in data[1:]:
... for j in i.pop(3).split(','):
... op.append([k+j.strip() if i.index(k)==2 else k for k in i])
...
>>> for i in op:
... print i
...
['Country', 'Destination', 'Country Code', 'Destination Code', 'Remarks']
['AAA', 'Some Mobile', '11112', 'Some remarks']
['AAA', 'Some Mobile', '11123', 'Some remarks']
['AAA', 'Some Mobile', '11134', 'Some remarks']
['AAA', 'Some Mobile', '11146', 'Some remarks']
['AAA', 'Some city A', '11155', 'Some more remarks']
['AAA', 'Some city A', '11156', 'Some more remarks']
['AAA', 'Some city A', '11157', 'Some more remarks']
['AAA', 'Some city A', '11151', 'Some more remarks']
['BBB', 'Some city B', '222234', 'Other remarks']
['BBB', 'Some city B', '222345', 'Other remarks']
['BBB', 'Some city B', '222456', 'Other remarks']
Solution for your updated problem:
>>> data = [['Country', 'Destination', 'Country Code', 'Destination Code', 'Remarks'],
... ['AAA', 'Some Mobile', '111', '12, 23, 34, 46','Some remarks'],
... ['AAA', 'Some city A', '111', '55, 56, 57, 51', 'Some more remarks'],
... ['BBB', 'Some city B', '222', '234, 345, 456', 'Other remarks']]
>>>
>>> op=[data[0]]
>>> for i in data[1:]:
... for id,j in enumerate(i.pop(3).split(',')):
... k=i[:]
... k.insert(3,i[2]+j.strip())
... op.append(k)
...
>>> for i in op:
... print i
...
['Country', 'Destination', 'Country Code', 'Destination Code', 'Remarks']
['AAA', 'Some Mobile', '111', '11112', 'Some remarks']
['AAA', 'Some Mobile', '111', '11123', 'Some remarks']
['AAA', 'Some Mobile', '111', '11134', 'Some remarks']
['AAA', 'Some Mobile', '111', '11146', 'Some remarks']
['AAA', 'Some city A', '111', '11155', 'Some more remarks']
['AAA', 'Some city A', '111', '11156', 'Some more remarks']
['AAA', 'Some city A', '111', '11157', 'Some more remarks']
['AAA', 'Some city A', '111', '11151', 'Some more remarks']
['BBB', 'Some city B', '222', '222234', 'Other remarks']
['BBB', 'Some city B', '222', '222345', 'Other remarks']
['BBB', 'Some city B', '222', '222456', 'Other remarks']