I am working on the Titanic dataset as my first project. To impute the missing values of the 'Age' variable, I ran a linear regression model. I now have two dataframes, as follows:
train_data.tail()
     Survived  Pclass     Sex   Age  SibSp  Parch   Fare Embarked
886         0       2    male  27.0      0      0  13.00        S
887         1       1  female  19.0      0      0  30.00        S
888         0       3  female   NaN      1      2  23.45        S
889         1       1    male  26.0      0      0  30.00        C
890         0       3    male  32.0      0      0   7.75        Q
imp_age.head()
      Age
859  27.0
863  -8.0
868  27.0
878  27.0
888  23.0
The second dataframe holds the age values that I want to impute in place of the 'NaN' values in the first dataframe. In both dataframes this data is under the column name 'Age'.
I tried running the following code to get the merged df -
merged_df = train_data.merge(imp_age,how='outer',left_index=True,right_index=True)
But the output creates an additional 'Age_y' column instead of merging it with the old column -
     Survived  Pclass     Sex  Age_x  SibSp  Parch   Fare Embarked  Age_y
886         0       2    male   27.0      0      0  13.00        S    NaN
887         1       1  female   19.0      0      0  30.00        S    NaN
888         0       3  female    NaN      1      2  23.45        S   23.0
889         1       1    male   26.0      0      0  30.00        C    NaN
890         0       3    male   32.0      0      0   7.75        Q    NaN
Can someone please help me get the desired output below? I have gone back and forth on this a lot, but since I am new to Python, I am struggling a little:
     Survived  Pclass     Sex   Age  SibSp  Parch   Fare Embarked
886         0       2    male  27.0      0      0  13.00        S
887         1       1  female  19.0      0      0  30.00        S
888         0       3  female  23.0      1      2  23.45        S
889         1       1    male  26.0      0      0  30.00        C
890         0       3    male  32.0      0      0   7.75        Q
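Try fillna. When it is given a Series, it aligns on the index, so only the NaN ages are replaced by the matching values from imp_age: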
train_data['Age'] = train_data['Age'].fillna(imp_age['Age'])
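To see the index alignment in action, here is a minimal sketch on toy data. The two small frames below are stand-ins I built from the values shown in the question (only the last five rows of train_data and the single matching row of imp_age), so treat them as an assumption rather than your real data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the tail of train_data (values copied from the question).
train_data = pd.DataFrame(
    {
        "Survived": [0, 1, 0, 1, 0],
        "Age": [27.0, 19.0, np.nan, 26.0, 32.0],
    },
    index=[886, 887, 888, 889, 890],
)

# Toy stand-in for imp_age: only the row whose label appears above.
imp_age = pd.DataFrame({"Age": [23.0]}, index=[888])

# fillna with a Series matches on index labels: a NaN at label 888 is
# filled from imp_age.loc[888]; non-missing ages are left untouched.
train_data["Age"] = train_data["Age"].fillna(imp_age["Age"])

print(train_data)
```

No merge is needed, so no 'Age_x' / 'Age_y' columns are created. Any NaN whose index label is missing from imp_age would simply stay NaN.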