Tags: python, pandas, dataframe, append, concatenation

How can the pandas concat function duplicate the behavior of the append method?


I've just inherited some code that uses pandas' append method. This code causes Pandas to issue the following warning:

The frame.append method is deprecated and will be removed from pandas 
in a future version. Use pandas.concat instead.

So I want to use pandas.concat without changing the behavior the append method gave. However, I haven't been able to.

Below I've recreated code that illustrates my problem. It creates an empty DataFrame with 31 columns and shape (0,31). When a new, empty row is appended to this DataFrame, the result has shape (1,31). In the code below, I've tried several ways to use concat and get the same behavior as append.

import pandas as pd

# Create Empty Dataframe With Column Headings
obs = pd.DataFrame(columns=['basedatetime_before', 'lat_before', 'lon_before', 
                            'sog_before', 
                            'cog_before', 
                            'heading_before', 
                            'vesselname_before', 'imo_before', 
                            'callsign_before', 
                            'vesseltype_before', 'status_before', 
                            'length_before', 'width_before', 
                            'draft_before',
                            'cargo_before', 
                            'basedatetime_after', 'lat_after', 
                            'lon_after', 
                            'sog_after', 
                            'cog_after', 'heading_after', 
                            'vesselname_after', 'imo_after', 
                            'callsign_after', 
                            'vesseltype_after', 'status_after', 
                            'length_after', 'width_after', 
                            'draft_after', 
                            'cargo_after'])

# Put initial values in DataFrame
desired = pd.Timestamp('2016-03-20 00:05:00+0000', tz='UTC')
obs['point'] = desired
obs['basedatetime_before'] = pd.to_datetime(obs['basedatetime_before'])
obs['basedatetime_after'] = pd.to_datetime(obs['basedatetime_after'])
obs.rename(lambda s: s.lower(), axis = 1, inplace = True)

# Create new 'dummy' row
new_obs = pd.Series([desired], index=['point'])

# Get initial Shape Information
print("Orig obs.shape", obs.shape)
print("New_obs.shape", new_obs.shape)
print("--------------------------------------")

# Append new dummy row to Data Frame
obs1 = obs.append(new_obs, ignore_index=True)

# Attempt to duplicate effect of append with concat
obs2 = pd.concat([obs, new_obs])
obs3 = pd.concat([obs, new_obs], ignore_index=True)
obs4 = pd.concat([obs, new_obs.T])
obs5 = pd.concat([obs, new_obs.T], ignore_index=True)
obs6 = pd.concat([new_obs, obs])
obs7 = pd.concat([new_obs, obs], ignore_index=True)
obs8 = pd.concat([new_obs.T, obs])
obs9 = pd.concat([new_obs.T, obs], ignore_index=True)

# Verify original DataFrame hasn't changed and append still works 
obs10 = obs.append(new_obs, ignore_index=True)

# Print results
print("----> obs1.shape",obs1.shape)
print("obs2.shape",obs2.shape)
print("obs3.shape",obs3.shape)
print("obs4.shape",obs4.shape)
print("obs5.shape",obs5.shape)
print("obs6.shape",obs6.shape)
print("obs7.shape",obs7.shape)
print("obs8.shape",obs8.shape)
print("obs9.shape",obs9.shape)
print("----> obs10.shape",obs10.shape)

However, every way I've tried to use concat to add a new row to the DataFrame results in a new DataFrame with shape (1,32). This can be seen in the results shown below:

    Orig obs.shape (0, 31)
    New_obs.shape (1,)
    --------------------------------------
    ----> obs1.shape (1, 31)
    obs2.shape (1, 32)
    obs3.shape (1, 32)
    obs4.shape (1, 32)
    obs5.shape (1, 32)
    obs6.shape (1, 32)
    obs7.shape (1, 32)
    obs8.shape (1, 32)
    obs9.shape (1, 32)
    ----> obs10.shape (1, 31) 

How can I use concat to add new_obs to the obs DataFrame and get a DataFrame with shape (1, 31) instead of (1, 32)?


Solution

  • new_obs = pd.Series([desired], index=['point'])
    new_obs = pd.DataFrame(new_obs)  # one row labelled 'point', one column labelled 0
    new_obs.columns = ['point']      # rename column 0 to 'point'
    

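    A Series has no column name. When your original code passes it to concat, it is treated as a single column with the default name 0, so the result gains an extra, 32nd column. Transposing does not help, because .T on a Series returns the Series unchanged. Convert the Series to a DataFrame and name its column 'point' before concatenating.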

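The same idea can be sketched with `Series.to_frame()` followed by a transpose, which turns the Series into a one-row DataFrame whose column label is 'point'. This is a minimal sketch rather than the only way to do it. The three-column `obs` below is a hypothetical, shortened stand-in for the 31-column frame in the question, and the names `extra_col`, `row`, and `same_cols` are made up for illustration.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the 31-column frame in the question.
desired = pd.Timestamp("2016-03-20 00:05:00", tz="UTC")
obs = pd.DataFrame(columns=["lat_before", "lon_before"])
obs["point"] = desired  # adds an empty 'point' column; shape is (0, 3)

new_obs = pd.Series([desired], index=["point"])

# Transposing a Series returns the Series itself, so .T changes nothing.
assert new_obs.T.shape == (1,)

# Passing the Series directly: it becomes one column with the default name 0.
extra_col = pd.concat([obs, new_obs], ignore_index=True)
print(extra_col.shape)  # (1, 4): one column more than obs

# Converting to a one-row DataFrame first lines 'point' up with the
# existing column, so no extra column is created.
row = new_obs.to_frame().T
same_cols = pd.concat([obs, row], ignore_index=True)
print(same_cols.shape)  # (1, 3): same columns as obs
```

With the full frame from the question, the same call pattern would be `pd.concat([obs, new_obs.to_frame().T], ignore_index=True)`. Recent pandas versions may emit a FutureWarning about concatenating empty frames, which does not affect the resulting shape.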