I'm trying to query information from salesforce using the simple_salesforce
package in python.
The problem is that it's nesting fields that are a part of a parent-child relationship into an ordered dict within an ordered dict
I want.. from the Opportunity object, to find the id, and the accountid associated with that record.
The SOQL query may look like..
query = "select id, account.id from opportunity where closedate = last_n_days:5"
in SOQL (salesforce object query language), a dot denotes a parent child relationship in the database. So I'm trying to get the id from the opportunity object, and then the related id from the account object on that record.
for some reason the Id comes in fine, but the account.id is nested in an ordered dict within an ordered dict:
q = sf.query_all(query)
this pulls back an ordered dictionary..
OrderedDict([('totalSize', 455),
('done', True),
('records',
[OrderedDict([('attributes',
OrderedDict([('type', 'Opportunity'),
('url',
I would pull the records
piece of the ordereddict
to create a df
df = pd.DataFrame(q['records'])
This gives me 3 columns, an ordered dict called 'attributes'
, Id
and another ordered dict called 'Account'
. I'm looking for a way to extract the ('BillingCountry', 'United States')
piece out of the nested ordered dict 'Account'
[OrderedDict([('attributes',
OrderedDict([('type', 'Opportunity'),
('url',
'/services/data/v34.0/sobjects/Opportunity/0061B003451RhZgiHHF')])),
('Id', '0061B003451RhZgiHHF'),
('Account',
OrderedDict([('attributes',
OrderedDict([('type', 'Account'),
('url',
'/services/data/v34.0/sobjects/Account/001304300MviPPF3Z')])),
('BillingCountry', 'United States')]))])
Edit: clarifying what I'm looking for.
I want to end with a dataframe with a column for each of the queried fields.
When I put the 'records'
piece into a DataFrame using df = pd.DataFrame(sf.query_all(query)['records'])
it gives me:
attributes Id Account
OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B003451RhZgiHHF')]) 0061B003451RhZgiHHF OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0013000000MvkRQQAZ')])), ('BillingCountry', 'United States')])
OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B00001Pa52QQAR')]) 0061B00001Pa52QQAR OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0011300001vQPxqAAG')])), ('BillingCountry', 'United States')])
OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B00001TRu5mQAD')]) 0061B00001TRu5mQAD OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0011300001rfRTrAAE')])), ('BillingCountry', 'United States')])
after I remove the 'attributes'
column I want the output to be
Id BillingCountry
0061B003451RhZgiHHF 'United States'
0061B00001Pa52QQAR 'United States'
0061B00001TRu5mQAD 'United States'
Pandas is an amazing tool for tabular data. But while it can contain Python objects, that is not its sweet spot. I suggest you extract your data from the query prior to inserting them into a pandas.Dataframe
:
To extract the desired fields as a list of dictionaries is as easy as:
records = [dict(id=rec['Id'], country=rec['Account']['BillingCountry'])
for rec in data['records']]
With a list of dicts, a dataframe is as easy as:
df = pd.DataFrame(records)
import pandas as pd
from collections import OrderedDict
data = OrderedDict([
('totalSize', 455),
('done', True),
('records', [
OrderedDict([
('attributes', OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B003451RhZgiHHF')])),
('Id', '0061B003451RhZgiHHF'),
('Account', OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0013000000MvkRQQAZ')])),
('BillingCountry', 'United States')])),
]),
OrderedDict([
('attributes', OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B00001Pa52QQAR')])),
('Id', '0061B00001Pa52QQAR'),
('Account', OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0011300001vQPxqAAG')])),
('BillingCountry', 'United States')])),
]),
OrderedDict([
('attributes', OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B00001TRu5mQAD')])),
('Id', '0061B00001TRu5mQAD'),
('Account', OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0011300001rfRTrAAE')])),
('BillingCountry', 'United States')])),
]),
])
])
records = [dict(id=rec['Id'], country=rec['Account']['BillingCountry'])
for rec in data['records']]
for r in records:
print(r)
print(pd.DataFrame(records))
{'country': 'United States', 'id': '0061B003451RhZgiHHF'}
{'country': 'United States', 'id': '0061B00001Pa52QQAR'}
{'country': 'United States', 'id': '0061B00001TRu5mQAD'}
country id
0 United States 0061B003451RhZgiHHF
1 United States 0061B00001Pa52QQAR
2 United States 0061B00001TRu5mQAD