python pandas chained-assignment

Pandas: Subindexing dataframes: Copies vs views


Say I have a dataframe

import pandas as pd
import numpy as np
foo = pd.DataFrame(np.random.random((10,5)))

and I create another dataframe from a subset of my data:

bar = foo.iloc[3:5,1:4]

Does bar hold a copy of those elements from foo? Is there any way to create a view of that data instead? If so, what happens if I try to modify data through this view? Does pandas provide any sort of copy-on-write mechanism?


Solution

  • The pandas docs address this in the section "Returning a view versus a copy":

    Whenever an array of labels or a boolean vector are involved in the indexing operation, the result will be a copy. With single label / scalar indexing and slicing, e.g. df.ix[3:6] or df.ix[:, 'A'], a view will be returned.

    In your example, bar is a view onto a slice of foo, so modifying bar also modifies foo. If you want an independent copy, call the copy method on the result. In the pandas version used below, there does not appear to be a copy-on-write mechanism.
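    As a minimal sketch of the copy route mentioned above: calling copy() on the sliced result gives bar its own data, so writing to bar leaves foo alone. The fixed np.arange values are an assumption made here (the question uses random data) so the numbers are easy to follow.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random frame in the question (assumption).
foo = pd.DataFrame(np.arange(50, dtype=float).reshape(10, 5))

# .copy() makes bar independent of foo.
bar = foo.iloc[3:5, 1:4].copy()

# Remember the value at row label 3, column label 1 before writing.
before = foo.loc[3, 1]

# Write through the copy only.
bar.loc[3, 1] = -1.0

print("foo unchanged:", foo.loc[3, 1] == before)
print("bar changed:  ", bar.loc[3, 1] == -1.0)
```

    Because bar was created with copy(), this behaves the same whether or not the underlying slice would have been a view.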

    See my code example below to illustrate:

    In [1]: import pandas as pd
       ...: import numpy as np
       ...: foo = pd.DataFrame(np.random.random((10,5)))
       ...: 
    
    In [2]: pd.__version__
    Out[2]: '0.12.0.dev-35312e4'
    
    In [3]: np.__version__
    Out[3]: '1.7.1'
    
    In [4]: # DataFrame has copy method
       ...: foo_copy = foo.copy()
    
    In [5]: bar = foo.iloc[3:5,1:4]
    
    In [6]: (bar == foo.iloc[3:5,1:4]) & (bar == foo_copy.iloc[3:5,1:4])
    Out[6]: 
          1     2     3
    3  True  True  True
    4  True  True  True
    
    In [7]: # Changing the view (label-based assignment)
       ...: bar.loc[3,1] = 5
    
    In [8]: # View and DataFrame still equal
       ...: bar == foo.iloc[3:5,1:4]
    Out[8]: 
          1     2     3
    3  True  True  True
    4  True  True  True
    
    In [9]: # It is now different from a copy of original
       ...: bar == foo_copy.iloc[3:5,1:4]
    Out[9]: 
           1     2     3
    3  False  True  True
    4   True  True  True
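    If you would rather check than guess, one rough approach is to ask NumPy whether the underlying arrays overlap. The sketch below assumes a pandas recent enough to have DataFrame.to_numpy and a single-dtype frame; the names sliced and masked are made up for illustration. The result for the slice can depend on the pandas version and on the dtypes involved, so no particular value is claimed for it here, while a boolean selection is expected to produce new data.

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame(np.random.random((10, 5)))

# Plain slices on both axes, as in the question.
sliced = foo.iloc[3:5, 1:4]

# Boolean vector selecting every other row.
masked = foo[foo.index % 2 == 0]

# np.shares_memory reports whether two arrays overlap in memory.
slice_shares = np.shares_memory(foo.to_numpy(), sliced.to_numpy())
mask_shares = np.shares_memory(foo.to_numpy(), masked.to_numpy())

print("slice shares memory with foo:", slice_shares)
print("boolean selection shares memory with foo:", mask_shares)
```

    Sharing memory only tells you the data has not been copied yet; whether a write to the subset actually reaches foo still depends on the pandas version, so use copy() whenever you need a guaranteed independent object.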