python pandas dataframe

Python DataFrame: extract column data based on cell value


I take the unique values in the "Region" column of the dataframe and want to extract the matching "Name" values for each one. However, when a Region contains only one Name, the name is spelled out character by character (l-i-n-d-a instead of linda).

    import pandas as pd
    data = {
        'Region': ['South', 'West', 'North', 'West', 'North', 'South', 'West', 'North', 'East'],
        'Name': ['Tom', 'nick', 'krish', 'jack', 'peter', 'sam', 'jon', 'megan', 'linda']
    }

    df = pd.DataFrame(data)
    list_region = df['Region'].unique().tolist()
    for region in list_region:
        list_person = df.set_index('Region').loc[region, 'Name']
        for person in list_person:
            print(region + ' >> ' + person)

Partial output is shown below; "linda" is spelled out one character at a time:

North >> megan
East >> l
East >> i
East >> n
East >> d
East >> a

Solution

  • You could call DataFrame.value_counts(), which counts each distinct (Region, Name) row, and then keep only the index of the result:

    
    import pandas as pd
    data = {
        'Region': [
            'South', 'West', 'North', 'West', 'North',
            'South', 'West', 'North', 'East'
        ],
        'Name': [
            'Tom', 'nick', 'krish', 'jack', 'peter', 
            'sam', 'jon', 'megan', 'linda'
        ]
    }
      
    df = pd.DataFrame(data)
    
    combinations = df.value_counts().index.to_list()
    

    which yields:

    [('East', 'linda'), ('North', 'krish'), ('North', 'megan'),
     ('North', 'peter'), ('South', 'Tom'), ('South', 'sam'),
     ('West', 'jack'), ('West', 'jon'), ('West', 'nick')]
    

    and then, to format each pair:

    for item in combinations:
        print(item[0] + ' >> ' + item[1])
    

    which outputs:

    East >> linda
    North >> krish
    North >> megan
    North >> peter
    South >> Tom
    South >> sam
    West >> jack
    West >> jon
    West >> nick
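
As for why the original loop spells out "linda": when a label occurs only once in the index, `.loc[region, 'Name']` returns a plain string rather than a Series, so the inner `for` loop iterates over its characters. The sketch below is one possible alternative, not the only fix. It checks the returned types and then keeps the nested-loop shape by using `groupby`, which yields a Series per region even when that region has a single name. The variable names `indexed`, `east`, `north`, `east_series` and `lines` are introduced here purely for illustration.

```python
import pandas as pd

data = {
    'Region': ['South', 'West', 'North', 'West', 'North',
               'South', 'West', 'North', 'East'],
    'Name': ['Tom', 'nick', 'krish', 'jack', 'peter',
             'sam', 'jon', 'megan', 'linda'],
}
df = pd.DataFrame(data)
indexed = df.set_index('Region')

# 'East' occurs once, so .loc returns a plain str; 'North' occurs
# three times, so .loc returns a Series of names.
east = indexed.loc['East', 'Name']
north = indexed.loc['North', 'Name']
print(type(east).__name__, type(north).__name__)

# Wrapping the label in a list always returns a Series,
# even when there is only one match.
east_series = indexed.loc[['East'], 'Name']

# groupby yields one (region, Series of names) pair per region;
# sort=False keeps regions in order of first appearance.
lines = []
for region, names in df.groupby('Region', sort=False)['Name']:
    for person in names:
        lines.append(region + ' >> ' + person)

print('\n'.join(lines))
```

If the original loop should stay as it is, changing `.loc[region, 'Name']` to `.loc[[region], 'Name']` should be enough to avoid the spelled-out name.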