Below is the sample data:
({'age': 61,
'name': ['Emiko', 'Oliver'],
'occupation': 'Medical Student',
'telephone': '166.814.5565',
'address': {'address': '645 Drumm Line', 'city': 'Kennewick'},
'credit-card': {'number': '3792 459318 98518', 'expiration-date': '12/23'}},
{'age': 54,
'name': ['Wendolyn', 'Ortega'],
'occupation': 'Tractor Driver',
'telephone': '1-975-090-1672',
'address': {'address': '1274 Harbor Court', 'city': 'Mustang'},
'credit-card': {'number': '4600 5899 6829 6887',
'expiration-date': '11/25'}})
We can apply a filter on the top-level elements of a dask bag as below:
b.filter(lambda record: record['age'] > 30).take(2)  # Select only people over 30
However, I need to access a nested element, i.e. credit-card.expiration-date. Any help will be appreciated.
You can simply do this:
import dask.bag as db
data = ({'age': 61,
'name': ['Emiko', 'Oliver'],
'occupation': 'Medical Student',
'telephone': '166.814.5565',
'address': {'address': '645 Drumm Line', 'city': 'Kennewick'},
'credit-card': {'number': '3792 459318 98518', 'expiration-date': '12/23'}},
{'age': 54,
'name': ['Wendolyn', 'Ortega'],
'occupation': 'Tractor Driver',
'telephone': '1-975-090-1672',
'address': {'address': '1274 Harbor Court', 'city': 'Mustang'},
'credit-card': {'number': '4600 5899 6829 6887',
'expiration-date': '11/25'}})
bag = db.from_sequence(data)
result = bag.map(lambda record: record['credit-card']['expiration-date']).compute()
print(result)
which returns
['12/23', '11/25']
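Since the question is about filtering, the same nested lookup can also go inside `filter`. The following is only a minimal sketch: it assumes every record has a `credit-card` dict whose `expiration-date` is an `MM/YY` string, the records are trimmed versions of the sample data, and the helper name `expires_after` and the cutoff year are made up for illustration.

```python
import dask.bag as db

# Trimmed records with the same nested layout as the sample data.
data = [
    {'age': 61,
     'credit-card': {'number': '3792 459318 98518', 'expiration-date': '12/23'}},
    {'age': 54,
     'credit-card': {'number': '4600 5899 6829 6887', 'expiration-date': '11/25'}},
]


def expires_after(record, year):
    """Hypothetical helper: True if the two-digit expiry year is later than `year`.

    Assumes the date is always formatted as 'MM/YY'.
    """
    _month, yy = record['credit-card']['expiration-date'].split('/')
    return int(yy) > year


bag = db.from_sequence(data)

# Index into the nested dict inside the predicate, just like a top-level key.
late_expiry = bag.filter(lambda record: expires_after(record, 23))

# scheduler='synchronous' keeps this tiny example in a single process.
result = late_expiry.compute(scheduler='synchronous')
print(result)
```

With the two records above, only the second one (expiring in 11/25) should pass the filter. If some records might lack the `credit-card` key, using `record.get('credit-card', {})` inside the predicate would be one way to avoid a `KeyError`.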
If some individuals have several cards (a list of dicts instead of a single dict), you can handle both cases like this:
import dask.bag as db
data = ({
'age': 61,
'name': ['Emiko', 'Oliver'],
'occupation': 'Medical Student',
'telephone': '166.814.5565',
'address': {'address': '645 Drumm Line', 'city': 'Kennewick'},
'credit-card': {'number': '3792 459318 98518', 'expiration-date': '12/23'}
},
{
'age': 54,
'name': ['Wendolyn', 'Ortega'],
'occupation': 'Tractor Driver',
'telephone': '1-975-090-1672',
'address': {'address': '1274 Harbor Court', 'city': 'Mustang'},
'credit-card': [
{'number': '4600 5899 6829 6887', 'expiration-date': '11/25'},
{'number': '4610 5899 6829 6887', 'expiration-date': '11/26'},
]
})
bag = db.from_sequence(data)
result = bag.map(
    lambda record: record['credit-card']['expiration-date']
    if isinstance(record['credit-card'], dict)
    else [card['expiration-date'] for card in record['credit-card']]
).compute()
print(result)
which will return
['12/23', ['11/25', '11/26']]
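If a flat list of dates is easier to work with than the mixed result above, one option is to normalize every record to a list of cards and then call `flatten` on the bag. This is a sketch rather than the only way to do it; the helper name `expiration_dates` is hypothetical and the records are trimmed versions of the sample data.

```python
import dask.bag as db

# Trimmed records: one person with a single card (dict), one with several (list).
data = [
    {'age': 61,
     'credit-card': {'number': '3792 459318 98518', 'expiration-date': '12/23'}},
    {'age': 54,
     'credit-card': [
         {'number': '4600 5899 6829 6887', 'expiration-date': '11/25'},
         {'number': '4610 5899 6829 6887', 'expiration-date': '11/26'},
     ]},
]


def expiration_dates(record):
    """Hypothetical helper: always return a list of expiration dates."""
    cards = record['credit-card']
    if isinstance(cards, dict):
        cards = [cards]  # wrap the single-card case so both shapes look alike
    return [card['expiration-date'] for card in cards]


bag = db.from_sequence(data)

# map() yields one list per record; flatten() merges them into a single bag.
flat = bag.map(expiration_dates).flatten().compute(scheduler='synchronous')
print(flat)
```

Because every record now produces a list, downstream steps such as `filter` or `frequencies` can treat each date as an individual element.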