pythonpandassalesforcesoqlordereddictionary

parent-child relationship query in simple_salesforce python, extracting from ordered dicts


I'm trying to query information from salesforce using the simple_salesforce package in python.

The problem is that it's nesting fields that are a part of a parent-child relationship into an ordered dict within an ordered dict

I want.. from the Opportunity object, to find the id, and the accountid associated with that record.

The SOQL query may look like..

query = "select id, account.id from opportunity where closedate = last_n_days:5"

in SOQL (salesforce object query language), a dot denotes a parent child relationship in the database. So I'm trying to get the id from the opportunity object, and then the related id from the account object on that record.

for some reason the Id comes in fine, but the account.id is nested in an ordered dict within an ordered dict:

q = sf.query_all(query)

this pulls back an ordered dictionary..

OrderedDict([('totalSize', 455),
             ('done', True),
             ('records',
              [OrderedDict([('attributes',
                             OrderedDict([('type', 'Opportunity'),
                                          ('url',

I would pull the records piece of the ordereddict to create a df

df = pd.DataFrame(q['records'])

This gives me 3 columns, an ordered dict called 'attributes', Id and another ordered dict called 'Account'. I'm looking for a way to extract the ('BillingCountry', 'United States') piece out of the nested ordered dict 'Account'

[OrderedDict([('attributes',
               OrderedDict([('type', 'Opportunity'),
                            ('url',
                             '/services/data/v34.0/sobjects/Opportunity/0061B003451RhZgiHHF')])),
              ('Id', '0061B003451RhZgiHHF'),
              ('Account',
               OrderedDict([('attributes',
                             OrderedDict([('type', 'Account'),
                                          ('url',
                                           '/services/data/v34.0/sobjects/Account/001304300MviPPF3Z')])),
                            ('BillingCountry', 'United States')]))])

Edit: clarifying what I'm looking for.

I want to end with a dataframe with a column for each of the queried fields.

When I put the 'records' piece into a DataFrame using df = pd.DataFrame(sf.query_all(query)['records']) it gives me:

attributes  Id  Account
OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B003451RhZgiHHF')])    0061B003451RhZgiHHF OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0013000000MvkRQQAZ')])), ('BillingCountry', 'United States')])
OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B00001Pa52QQAR')]) 0061B00001Pa52QQAR  OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0011300001vQPxqAAG')])), ('BillingCountry', 'United States')])
OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B00001TRu5mQAD')]) 0061B00001TRu5mQAD  OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0011300001rfRTrAAE')])), ('BillingCountry', 'United States')])

after I remove the 'attributes' column I want the output to be

Id BillingCountry
0061B003451RhZgiHHF 'United States'
0061B00001Pa52QQAR 'United States'
0061B00001TRu5mQAD 'United States'

Solution

  • Pandas is an amazing tool for tabular data. But while it can contain Python objects, that is not its sweet spot. I suggest you extract your data from the query prior to inserting them into a pandas.Dataframe:

    Extract records:

    To extract the desired fields as a list of dictionaries is as easy as:

    records = [dict(id=rec['Id'], country=rec['Account']['BillingCountry'])
               for rec in data['records']]
    

    Insert records into a dataframe:

    With a list of dicts, a dataframe is as easy as:

    df = pd.DataFrame(records)
    

    Test Code:

    import pandas as pd
    from collections import OrderedDict
    
    data = OrderedDict([
        ('totalSize', 455),
        ('done', True),
        ('records', [
            OrderedDict([
                ('attributes', OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B003451RhZgiHHF')])),
                ('Id', '0061B003451RhZgiHHF'),
                ('Account', OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0013000000MvkRQQAZ')])),
                                         ('BillingCountry', 'United States')])),
            ]),
            OrderedDict([
                ('attributes', OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B00001Pa52QQAR')])),
                ('Id', '0061B00001Pa52QQAR'),
                ('Account', OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0011300001vQPxqAAG')])),
                                         ('BillingCountry', 'United States')])),
            ]),
            OrderedDict([
                ('attributes', OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B00001TRu5mQAD')])),
                ('Id', '0061B00001TRu5mQAD'),
                ('Account', OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0011300001rfRTrAAE')])),
                                         ('BillingCountry', 'United States')])),
            ]),
        ])
    ])
    
    records = [dict(id=rec['Id'], country=rec['Account']['BillingCountry'])
               for rec in data['records']]
    for r in records:
        print(r)
    
    print(pd.DataFrame(records))
    

    Test Results:

    {'country': 'United States', 'id': '0061B003451RhZgiHHF'}
    {'country': 'United States', 'id': '0061B00001Pa52QQAR'}
    {'country': 'United States', 'id': '0061B00001TRu5mQAD'}
    
             country                   id
    0  United States  0061B003451RhZgiHHF
    1  United States   0061B00001Pa52QQAR
    2  United States   0061B00001TRu5mQAD