I have 2 pandas dfs (df1 & df2) as seen here:
df1 | col1 | col2 | col3 | col4 | col5 |
---|---|---|---|---|---|
row1 | Dog | Cat | Bird | Tree | Lion |
row2 | Cat | Dragon | Bird | Dog | Tree |
row3 | Cat | Dog | Bird | Tree | Hippo |
row4 | Cat | Tree | Bird | Ant | Fish |
row5 | Cat | Tree | Monkey | Dragon | Ant |
df2 | col1 | col2 | col3 | col4 | col5 |
---|---|---|---|---|---|
row1 | 3.219843 | 1.3631996 | 1.0051135 | 0.89303696 | 0.4313375 |
row2 | 2.8661892 | 1.4396228 | 0.7863044 | 0.539315 | 0.48167187 |
row3 | 2.5679462 | 1.3657334 | 0.9470184 | 0.79186934 | 0.48637152 |
row4 | 3.631389 | 0.94815284 | 0.7561722 | 0.6743943 | 0.5441728 |
row5 | 2.4727197 | 1.5941181 | 1.4069512 | 1.064051 | 0.48297918 |
The string elements of the df1 correspond to the values of df2. For both dataframes the condition exists that an element (or a value) does not repeat on the same row. But can be repeated on different rows.
For example Dog of row1 = 3.219843, bird of row3 = 0.9470184, bird of row4 = 0.7561722 etc.
I would like to extract the values for all unique elements of the 1st df into different arrays. Like:
Dog = [3.219843, 0.539315, 1.3657334]
Cat = [1.3631996, 2.8661892, 2.5679462, 3.631389, 2.4727197]
etc...
Any ideas?
Many thanks!
Assuming that your first columns df1
and df2
are the index of their respective df
, we can extract the values for each unique animal in df1
by using the first df
as a mask to extract all wanted values from the second one (the result is a new df
with NaN
in irrelevant cells, which can be turned into a 1-dimensional array with .stack().values
).
First of, create some test data. Please provide it in a form like this in future posts. That's what @mozway was talking about in the comments. It is greatly appreciated.
(It's not always the case that somebody is willing to do all the copy-and-pasting necessary to get dataframes up and running for testing.)
import pandas as pd
import numpy as np
index = ['row1', 'row2', 'row3', 'row4', 'row5']
data1 = {'col1': ['Dog', 'Cat', 'Cat', 'Cat', 'Cat'],
'col2': ['Cat', 'Dragon', 'Dog', 'Tree', 'Tree'],
'col3': ['Bird', 'Bird', 'Bird', 'Bird', 'Monkey'],
'col4': ['Tree', 'Dog', 'Tree', 'Ant', 'Dragon'],
'col5': ['Lion', 'Tree', 'Hippo', 'Fish', 'Ant']}
data2 = {'col1': [3.219843, 2.8661892, 2.5679462, 3.631389, 2.4727197],
'col2': [1.3631996, 1.4396228, 1.3657334, 0.94815284, 1.5941181],
'col3': [1.0051135, 0.7863044, 0.9470184, 0.7561722, 1.4069512],
'col4': [0.89303696, 0.539315, 0.79186934, 0.6743943, 1.064051],
'col5': [0.4313375, 0.48167187, 0.48637152, 0.5441728, 0.48297918]}
df1 = pd.DataFrame(data1, index=index)
df2 = pd.DataFrame(data2, index=index)
Since you didn't specify what data structure you need, this is the strategy outlined above in a dict
comprehension:
{animal: df2[df1.eq(animal)].stack().values for animal in np.unique(df1)}
The result looks like this:
{'Ant': array([0.6743943 , 0.48297918]),
'Bird': array([1.0051135, 0.7863044, 0.9470184, 0.7561722]),
'Cat': array([1.3631996, 2.8661892, 2.5679462, 3.631389 , 2.4727197]),
'Dog': array([3.219843 , 0.539315 , 1.3657334]),
'Dragon': array([1.4396228, 1.064051 ]),
'Fish': array([0.5441728]),
'Hippo': array([0.48637152]),
'Lion': array([0.4313375]),
'Monkey': array([1.4069512]),
'Tree': array([0.89303696, 0.48167187, 0.79186934, 0.94815284, 1.5941181 ])}