
Splitting data into alternating groups in Python 2.7


          day         city  temperature  windspeed   event

        2017-01-01  new york           32          6    Rain
        2017-01-02  new york           36          7   Sunny
        2017-01-03  new york           28         12    Snow
        2017-01-04  new york           33          7   Sunny
        2017-01-05  new york           31          7    Rain
        2017-01-06  new york           33          5   Sunny
        2017-01-07  new york           27         12    Rain
        2017-01-08  new york           23          7    Rain
        2017-01-01    mumbai           90          5   Sunny
        2017-01-02    mumbai           85         12     Fog
        2017-01-03    mumbai           87         15     Fog
        2017-01-04    mumbai           92          5    Rain
        2017-01-05    mumbai           89          7   Sunny
        2017-01-06    mumbai           80         10     Fog
        2017-01-07    mumbai           85          9   Sunny
        2017-01-08    mumbai           89          8    Rain
        2017-01-01     paris           45         20   Sunny
        2017-01-02     paris           50         13  Cloudy
        2017-01-03     paris           54          8  Cloudy
        2017-01-04     paris           42         10  Cloudy
        2017-01-05     paris           43         20   Sunny
        2017-01-06     paris           48          4  Cloudy
        2017-01-07     paris           40         14    Rain
        2017-01-08     paris           42         15  Cloudy
        2017-01-09     paris           53          8   Sunny

The above shows the contents of my .txt file.

My goal is to create 4 groups that are as evenly sized as possible, with every group containing all of the cities: 'new york', 'mumbai' and 'paris'.

Since there are 25 rows of data, 3 groups will have 6 lines and 1 group will have 7 lines.

What I have in mind is this: since the data are already sorted by city, I can read the text file line by line and append each line to one of 4 groups (G1-G4) in an alternating pattern. That is, the 1st line goes to G1, the 2nd to G2, the 3rd to G3, the 4th to G4, the 5th back to G1, the 6th to G2, and so on. This should ensure that every group contains all 3 cities.

Is it possible to code in this way?

Expected result:

G1: Row/Line 1 , Row 5, Row 9,

G2: Row 2, Row 6, Row 10,

G3: Row 3, Row 7, Row 11,

G4: Row 4, Row 8, Row 12, and so on.
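
The alternating pattern described above could be sketched roughly as follows. This is only a minimal illustration: `rows` is a hypothetical stand-in list of labels ("Row 1" to "Row 25") rather than the real file contents, and `num_groups` is a name chosen for the example.

```python
# Hypothetical stand-in for the 25 data lines: labels "Row 1" .. "Row 25".
rows = ["Row %d" % n for n in range(1, 26)]

num_groups = 4
groups = [[] for _ in range(num_groups)]

# Deal the rows out in turn: index 0 -> G1, 1 -> G2, 2 -> G3, 3 -> G4, 4 -> G1 again.
for index, row in enumerate(rows):
    groups[index % num_groups].append(row)

for number, group in enumerate(groups, 1):
    print("G%d: %s" % (number, ", ".join(group)))
```

With 25 rows, the first group ends up with 7 entries and the other three with 6 each.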


Solution

  • Since your input is already sorted by city, you can split the string into a list of lines and then slice that list with a step of 4, once for each starting offset from 0 to 3:

    data = '''        2017-01-01  new york           32          6    Rain
            2017-01-02  new york           36          7   Sunny
            2017-01-03  new york           28         12    Snow
            2017-01-04  new york           33          7   Sunny
            2017-01-05  new york           31          7    Rain
            2017-01-06  new york           33          5   Sunny
            2017-01-07  new york           27         12    Rain
            2017-01-08  new york           23          7  Rain
            2017-01-01    mumbai           90          5   Sunny
            2017-01-02    mumbai           85         12     Fog
            2017-01-03    mumbai           87         15     Fog
            2017-01-04    mumbai           92          5    Rain
            2017-01-05    mumbai           89          7   Sunny
            2017-01-06    mumbai           80         10     Fog
            2017-01-07    mumbai           85         9     Sunny
            2017-01-08    mumbai           89          8    Rain
            2017-01-01     paris           45         20   Sunny
            2017-01-02     paris           50         13  Cloudy
            2017-01-03     paris           54          8  Cloudy
            2017-01-04     paris           42         10  Cloudy
            2017-01-05     paris           43         20   Sunny
            2017-01-06     paris           48         4  Cloudy
            2017-01-07     paris           40          14  Rain
            2017-01-08     paris           42         15  Cloudy
            2017-01-09     paris           53         8  Sunny'''
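    # Split the text into a list of lines, one per row.
    lines = data.splitlines()
    # lines[i::4] takes every 4th line starting at index i,
    # so the rows are dealt out to the 4 groups in turn.
    groups = [lines[i::4] for i in range(4)]
    for g in groups:
        print(g)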
    

    This outputs:

    ['        2017-01-01  new york           32          6    Rain', '        2017-01-05  new york           31          7    Rain', '        2017-01-01    mumbai           90          5   Sunny', '        2017-01-05    mumbai           89          7   Sunny', '        2017-01-01     paris           45         20   Sunny', '        2017-01-05     paris           43         20   Sunny', '        2017-01-09     paris           53         8  Sunny']
    ['        2017-01-02  new york           36          7   Sunny', '        2017-01-06  new york           33          5   Sunny', '        2017-01-02    mumbai           85         12     Fog', '        2017-01-06    mumbai           80         10     Fog', '        2017-01-02     paris           50         13  Cloudy', '        2017-01-06     paris           48         4  Cloudy']
    ['        2017-01-03  new york           28         12    Snow', '        2017-01-07  new york           27         12    Rain', '        2017-01-03    mumbai           87         15     Fog', '        2017-01-07    mumbai           85         9     Sunny', '        2017-01-03     paris           54          8  Cloudy', '        2017-01-07     paris           40          14  Rain']
    ['        2017-01-04  new york           33          7   Sunny', '        2017-01-08  new york           23          7  Rain', '        2017-01-04    mumbai           92          5    Rain', '        2017-01-08    mumbai           89          8    Rain', '        2017-01-04     paris           42         10  Cloudy', '        2017-01-08     paris           42         15  Cloudy']
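
Since the question reads the rows from a .txt file rather than a string literal, the same slicing idea might be applied to a file as sketched below. The function name `split_into_groups` and its parameters are made up for this example. It assumes the file starts with a single header line and that blank lines should be ignored. Every group is only guaranteed to contain every city if each city has at least `num_groups` consecutive rows, which holds for the data shown.

```python
import io


def split_into_groups(path, num_groups=4, header_lines=1):
    """Read a text file and deal its data rows into num_groups lists.

    Hypothetical helper: assumes one header line and ignores blank lines.
    """
    with io.open(path, encoding="utf-8") as handle:
        # Drop blank lines so they do not take up a slot in a group.
        lines = [line.rstrip("\n") for line in handle if line.strip()]
    rows = lines[header_lines:]  # skip the column header
    # rows[i::num_groups] takes every num_groups-th row starting at offset i.
    return [rows[i::num_groups] for i in range(num_groups)]
```

It could then be called as `groups = split_into_groups("weather.txt")`, where `weather.txt` is a placeholder for the actual file name.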