python pandas dataframe linearmodels

Python linearmodels: How do I tell Python which columns are the ID columns that identify a group?


I would like to run a panel regression (fixed-effect model) on a group of individuals, which are uniquely identified by province and city, across time t.

Code to create the dataframe and run the regression:

import numpy as np
import pandas as pd
from linearmodels import PanelOLS
data = {'y':[1,2,3,1,0,3],
        'x1': [0,1,2,3,0,2],
        'x2':[1,1,3,2,1,0],
        't':  ['2020-02-18', '2020-02-18', '2020-02-17', '2020-02-18', '2020-02-18', '2020-02-17'],
        'province': ['A', 'A','A','B','B','B'],
        'city': ['a','b','a','a','c','a']}
dataframe = pd.DataFrame (data, columns = ['y','x1', 'x2', 't', 'province', 'city'])

dataframe=dataframe.set_index(['t','province','city'], append=True)
mod = PanelOLS(dataframe.y, dataframe[['x1','x2']], entity_effects=True)

But I get an error saying "DataFrame input must have a MultiIndex with 2 levels":

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-33-eb5264bfefc9> in <module>
      1 dataframe=dataframe.set_index(['t','province','city'], append=True)
----> 2 mod = PanelOLS(dataframe.y, dataframe[['x1','x2']], entity_effects=True)

C:\ProgramData\Anaconda3\lib\site-packages\linearmodels\panel\model.py in __init__(self, dependent, exog, weights, entity_effects, time_effects, other_effects, singletons, drop_absorbed)
   1038         drop_absorbed: bool = False,
   1039     ) -> None:
-> 1040         super(PanelOLS, self).__init__(dependent, exog, weights=weights)
   1041 
   1042         self._entity_effects = entity_effects

C:\ProgramData\Anaconda3\lib\site-packages\linearmodels\panel\model.py in __init__(self, dependent, exog, weights)
    224         weights: Optional[PanelDataLike] = None,
    225     ) -> None:
--> 226         self.dependent = PanelData(dependent, "Dep")
    227         self.exog = PanelData(exog, "Exog")
    228         self._original_shape = self.dependent.shape

C:\ProgramData\Anaconda3\lib\site-packages\linearmodels\panel\data.py in __init__(self, x, var_name, convert_dummies, drop_first, copy)
    198                 if len(x.index.levels) != 2:
    199                     raise ValueError(
--> 200                         "DataFrame input must have a " "MultiIndex with 2 levels"
    201                     )
    202                 if isinstance(self._original, (DataFrame, PanelData, Series)):

ValueError: DataFrame input must have a MultiIndex with 2 levels

As a workaround, instead of doing

dataframe=dataframe.set_index(['t','province','city'], append=True)

I do this

dataframe=dataframe.set_index(['t'], append=True)

This lets the model run, but I do not know why. In this case, I am using two columns (province and city) to identify the group. What if I need three columns to identify my groups? How does Python differentiate between the ID columns and the x variables?


Solution
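
Why the one-column version runs can be checked by counting index levels. The sketch below uses only pandas and the data from the question (the names four_levels and two_levels are illustrative, not from the original post). It relies on the fact that append=True keeps the existing default row index as an extra first level.

```python
import pandas as pd

df = pd.DataFrame({
    "y": [1, 2, 3, 1, 0, 3],
    "t": ["2020-02-18", "2020-02-18", "2020-02-17",
          "2020-02-18", "2020-02-18", "2020-02-17"],
    "province": ["A", "A", "A", "B", "B", "B"],
    "city": ["a", "b", "a", "a", "c", "a"],
})

# append=True keeps the default row-number index as the first level,
# then adds one level per listed column.
four_levels = df.set_index(["t", "province", "city"], append=True)
two_levels = df.set_index(["t"], append=True)

print(four_levels.index.nlevels)  # 4: row number, t, province, city
print(two_levels.index.nlevels)   # 2: row number, t

# With the row number as the first level, every row is its own "entity".
print(two_levels.index.get_level_values(0).nunique())  # 6
```

So appending only t happens to produce exactly two levels and passes the check shown in the traceback, but the first level is then the row number rather than anything built from province and city. Since linearmodels reads the first index level as the entity and the second as time, each row would be treated as a separate entity, which is probably not what was intended.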

  • According to the author of linearmodels, the index must have exactly two levels, entity and time, so province and city have to be combined into a single entity column:

    import numpy as np
    import pandas as pd
    from linearmodels import PanelOLS
    data = {'y':[1,2,3,1,0,3],
            'x1': [0,1,2,3,0,2],
            'x2':[1,1,3,2,1,0],
            't': pd.to_datetime(['2020-02-18', '2020-02-18', '2020-02-17', '2020-02-18', '2020-02-18', '2020-02-17']),
            'province': ['A', 'A','A','B','B','B'],
            'city': ['a','b','a','a','c','a']}
    dataframe = pd.DataFrame (data, columns = ['y','x1', 'x2', 't', 'province', 'city'])
    # Combine city and province into one tuple-valued entity column
    dataframe["city-province"] = [(c,p) for c,p in zip(dataframe.city, dataframe.province)]
    # Entity first, time second: the two index levels PanelOLS expects
    dataframe = dataframe.set_index(["city-province","t"])
    print(dataframe)

                              y  x1  x2 province city
    city-province t
    (a, A)        2020-02-18  1   0   1        A    a
    (b, A)        2020-02-18  2   1   1        A    b
    (a, A)        2020-02-17  3   2   3        A    a
    (a, B)        2020-02-18  1   3   2        B    a
    (c, B)        2020-02-18  0   0   1        B    c
    (a, B)        2020-02-17  3   2   0        B    a
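
For the case where three or more columns identify a group, the same idea generalizes: collapse all identifying columns into one key. The following is a minimal sketch using pandas only, as an alternative to the tuple column above rather than the library author's own method. The column name "entity" and the list id_cols are hypothetical names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "y": [1, 2, 3, 1, 0, 3],
    "x1": [0, 1, 2, 3, 0, 2],
    "x2": [1, 1, 3, 2, 1, 0],
    "t": pd.to_datetime(["2020-02-18", "2020-02-18", "2020-02-17",
                         "2020-02-18", "2020-02-18", "2020-02-17"]),
    "province": ["A", "A", "A", "B", "B", "B"],
    "city": ["a", "b", "a", "a", "c", "a"],
})

# Any number of identifying columns can be listed here.
id_cols = ["province", "city"]

# ngroup() assigns one integer per distinct combination of id_cols.
df["entity"] = df.groupby(id_cols).ngroup()

# Entity as the first index level, time as the second.
panel = df.set_index(["entity", "t"]).sort_index()

print(panel.index.nlevels)                               # 2
print(panel.index.get_level_values("entity").nunique())  # 4
```

The resulting frame has the two-level index the error message asks for, so panel.y and panel[['x1','x2']] can be passed to PanelOLS in the same way as in the question. Columns left outside the index are ordinary data columns, and only those handed to the model are used as x variables.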