python pandas dataframe indexing attributeerror

AttributeError: 'DataFrame' object has no attribute 'str'


I am trying to filter a DataFrame down to the rows for a given product. However, whenever I run the code I get the error 'DataFrame' object has no attribute 'str'.

Here is the line of code:

include_clique = log_df.loc[log_df['Product'].str.contains("Product A")]

The Product column has the object dtype. Here is the full code:

import pandas as pd
import numpy as np

data = pd.read_csv("FILE.csv", header = None)

headerName = ["DRID", "Product", "M24", "M23", "M22", "M21"] 
data.columns = [headerName]

log_df = np.log(1 + data[["M24", "M23", "M22", "M21"]])
copy = data[["DRID", "Product"]].copy()
log_df = copy.join(log_df)

include_clique = log_df.loc[log_df['Product'].str.contains("Product A")]

Here is the head:

       ID  PRODUCT       M24       M23       M22  M21
0  123421        A  0.000000  0.000000  1.098612  0.0   
1  141840        A  0.693147  1.098612  0.000000  0.0   
2  212006        A  0.693147  0.000000  0.000000  0.0   
3  216097        A  1.098612  0.000000  0.000000  0.0   
4  219517        A  1.098612  0.693147  1.098612  0.0

Solution

  • Short answer: change data.columns = [headerName] to data.columns = headerName.

    Explanation: when you set data.columns = [headerName], the columns become a MultiIndex object. As a result, log_df['Product'] is a DataFrame, and a DataFrame has no str attribute.

    When you set data.columns = headerName, log_df['Product'] is a single column (a Series), so the str accessor is available.

    If for some reason you need to keep the columns as a MultiIndex, there is another solution: first convert log_df['Product'] into a Series. After that, the str accessor is available.

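    # Flatten the one-column DataFrame into a Series, keeping the row index
    products = pd.Series(log_df['Product'].values.flatten(), index=log_df.index)
    # Use the boolean mask to select the matching rows of log_df
    include_clique = log_df.loc[products.str.contains("Product A")]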
    

    However, I suspect the first solution is what you're looking for.
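
To make the difference concrete, here is a minimal sketch of the behaviour described in this answer. The small in-memory frame is a made-up stand-in for FILE.csv, and the names raw, bad, good and flattened are hypothetical, chosen only for illustration. The last part shows one possible way to flatten an existing one-level MultiIndex with get_level_values, in case the columns were already assigned as a nested list.

```python
import pandas as pd

# Hypothetical stand-in for FILE.csv (read with header=None); values are made up.
raw = pd.DataFrame([
    [123421, "Product A", 0, 2],
    [141840, "Product B", 1, 0],
    [212006, "Product A", 1, 0],
])
header_name = ["DRID", "Product", "M24", "M23"]

# Wrapping the list in another list produces a one-level MultiIndex.
bad = raw.copy()
bad.columns = [header_name]
print(type(bad.columns).__name__)
print(type(bad["Product"]).__name__)  # the type that lacks the .str accessor

# Assigning the flat list gives a regular Index, so the column is a Series.
good = raw.copy()
good.columns = header_name
print(type(good["Product"]).__name__)

# The original filter now works as intended.
include_clique = good.loc[good["Product"].str.contains("Product A")]
print(include_clique)

# Alternative: flatten an existing one-level MultiIndex after the fact.
flattened = bad.copy()
flattened.columns = bad.columns.get_level_values(0)
print(type(flattened["Product"]).__name__)
```

Printing the types rather than relying on the error message makes it easy to check which situation your own frame is in before applying either fix.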