python pandas dataframe truncated

Truncating the last 4 characters of a column


I would like to:

  1. Extract from df a movies DataFrame, containing only movies (movie content).

  2. Create a new variable duree in movies, which contains the values of the duration variable with the last 4 characters removed.

  3. Change the type of the variable duration to int.

1. movies = df[df['type'] == 'Movie']
2. movies['duration'] = movies['duration'].astype(str).str[:-4]
3. movies['duration'].astype(int)

However, I can't create my new variable duree, which should contain the values of the duration variable with the last 4 characters removed.


Solution

  • The warning you are seeing (typically SettingWithCopyWarning) occurs because movies is a slice of your original df, so pandas cannot tell whether an assignment to movies is also meant to modify df. This situation is called chained assignment. As your code is currently structured, pandas will modify the movies DataFrame without modifying df, but this could lead to unintended behavior with more complex operations.

    For your purposes, you can avoid chained assignment by setting movies to be a copy so it is not connected to df: movies = df[df['type'] == 'Movie'].copy()

    If you are interested in a more in-depth discussion of chained assignment and why this warning occurs, there is already a great Stack Overflow answer here.
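
Putting the three steps together with the .copy() fix might look like the sketch below. The sample df is hypothetical (the real data is not shown in the question), and it assumes the durations look like "90 min", so that the last 4 characters are " min". It also assumes step 3 is meant to convert the new truncated column, since the untruncated strings cannot be cast to int. Note that the result is assigned to duree rather than back to duration, and that astype returns a new Series, so its result has to be assigned too.

```python
import pandas as pd

# Hypothetical sample data standing in for the real df.
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie"],
    "title": ["A", "B", "C"],
    "duration": ["90 min", "2 Seasons", "125 min"],
})

# Step 1: keep only movies; .copy() makes movies independent of df,
# so later assignments to movies are unambiguous.
movies = df[df["type"] == "Movie"].copy()

# Step 2: create the new column duree from duration,
# dropping the last 4 characters (" min" in this sample).
movies["duree"] = movies["duration"].astype(str).str[:-4]

# Step 3: astype does not work in place, so assign the result back.
movies["duree"] = movies["duree"].astype(int)

print(movies)
```

Because movies is a copy, df keeps its original columns and values; only movies gains the duree column.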