pythonpandascount

Looping over Pandas columns while counting rows that contain two specified values


I am trying to count each row in Pandas dataframe that contain two specified values, but the problem is that the values can be in any column. How can I loop the part with;

for col in con.columns:
    counts += len(con[(con[col]==name1)])

...so that I have two conditions, that require name1 and name2 to exist on the same row? I would use the cross tabulation, but the dataset is way too large for that.

con = pd.read_csv("keyword connections.txt", sep="\t")
key = pd.read_csv("keyword short list.txt", sep="\t")    
text_file = open("Output.txt", "w")
    
i = 0
j = 0
k = 0
for i in range(0,key.size):
    name1 = key.iloc[i]["Keyword"]
    for j in range(0,key.size):
        name2 = key.iloc[j]["Keyword"]
        counts = 0
        for col in con.columns:
            counts += len(con[(con[col]==name1)])
        text_file.write(name1+"\t"+name2+"\t"+str(counts)+"\n")
        j += 1
        k += 1
    i += 1
text_file.close()
print(k)

Solution

  • Assuming this example:

        col1   col2   col3
    0  name1  other  name2
    1  other  name2  other
    2  name2  name1  other
    3  other  other  other
    

    You could use several masks built with eq and any and combine them with & to perform boolean indexing:

    m1 = df.eq('name1').any(axis=1)
    m2 = df.eq('name2').any(axis=1)
    
    out = df[m1 & m2]
    

    Or, from a list of values:

    names = ['name1', 'name2']
    
    out = df[np.logical_and.reduce([df.eq(n).any(axis=1)
                                    for n in names])]
    

    Alternatively, you could use a set of the wanted values and agg:

    S = {'name1', 'name2'}
    
    out = df[df.agg(S.issubset, axis=1)]
    

    Output:

        col1   col2   col3
    0  name1  other  name2
    2  name2  name1  other
    

    Intermediates:

        col1   col2   col3     m1     m2  m1&m2                 as_set  S.issubset
    0  name1  other  name2   True   True   True  {name1, other, name2}        True
    1  other  name2  other  False   True  False         {other, name2}       False
    2  name2  name1  other   True   True   True  {name1, other, name2}        True
    3  other  other  other  False  False  False                {other}       False