Based on the work of Kuo et al (Kuo, H.-I., Chen, C.-C., Tseng, W.-C., Ju, L.-F., Huang, B.-W. (2007). Assessing impacts of SARS and Avian Flu on international tourism demand to Asia. Tourism Management. Retrieved from: https://www.sciencedirect.com/science/article/abs/pii/S0261517707002191?via%3Dihub), I am measuring the effect of COVID-19 on tourism demand.
My panel data can be found here: https://www.dropbox.com/s/t0pkwrj59zn22gg/tourism_covid_data-total.csv?dl=0
I would like to use a first-difference transformation model(GMMDIFF) and treat the lags of the dependent variable (tourism demand) as instruments for the lagged dependent variable. The dynamic and first difference version of the tourism demand model: Δyit = η2Δ yit-1 + η3 ΔSit + Δuit
where, y is tourism demand, i refers to COVID-19 infected countries, t is time, S is the number of SARS cases, and u is the fixed effects decomposition of the error term.
Up to now, using python I managed to get some results using the Panel OLS:
import pandas as pd
import numpy as np
from linearmodels import PanelOLS
import statsmodels.api as sm
tourism_covid_data=pd.read_csv('../Data/Data - Dec2021/tourism_covid_data-total.csv, header=0, parse_dates=['month_year']
tourism_covid_data['l.tourism_demand']=tourism_covid_data['tourism_demand'].shift(1)
tourism_covid_data=tourism_covid_data.dropna()
exog = sm.add_constant(tourism_covid_data[['l.tourism_demand','monthly cases']])
mod = PanelOLS(tourism_covid_data['tourism_demand'], exog, entity_effects=True)
fe_res = mod.fit()
fe_res
I am trying to find the solution and use GMM for my data, however, it seems that GMM is not widely used in python and not other similar questions are available on stack. Any ideas on how I can work here?
I just tried your data. I don't think your data fits diff GMM or system GMM because it is a T(=48) >>N(=4) long panel. Anyway, pydynpd still produces results. In both cases, I had to collapse instrument matrix to reduce the issue with too many instruments.
Model 1: diff GMM; treating "monthly cases" as predetermined variable
import pandas as pd
from pydynpd import regression
df = pd.read_csv("tourism_covid_data-total.csv") #, index_col=False)
df['monthly_cases']=df['monthly cases']
command_str='tourism_demand L1.tourism_demand monthly_cases | gmm(tourism_demand, 2 6) gmm(monthly_cases, 1 2)| nolevel collapse '
mydpd = regression.abond(command_str, df, ['Country', 'month_year'])
The output:
Python 3.9.7 (default, Sep 10 2021, 14:59:43)
[GCC 11.2.0] on linux
Warning: system and difference GMMs do not work well on long (T>=N) panel data
Dynamic panel-data estimation, two-step difference GMM
Group variable: Country Number of obs = 184
Time variable: month_year Number of groups = 4
Number of instruments = 7
+-------------------+-----------------+---------------------+------------+-----------+
| tourism_demand | coef. | Corrected Std. Err. | z | P>|z| |
+-------------------+-----------------+---------------------+------------+-----------+
| L1.tourism_demand | 0.7657082 | 0.0266379 | 28.7450196 | 0.0000000 |
| monthly_cases | -182173.5644815 | 171518.4068348 | -1.0621225 | 0.2881801 |
+-------------------+-----------------+---------------------+------------+-----------+
Hansen test of overid. restrictions: chi(5) = 3.940 Prob > Chi2 = 0.558
Arellano-Bond test for AR(1) in first differences: z = -1.04 Pr > z =0.299
Arellano-Bond test for AR(2) in first differences: z = 1.00 Pr > z =0.319
Model 2: diff GMM; treating the lag of "monthly cases" as exogenous variable
command_str='tourism_demand L1.tourism_demand L1.monthly_cases | gmm(tourism_demand, 2 6) iv(L1.monthly_cases)| nolevel collapse '
mydpd = regression.abond(command_str, df, ['Country', 'month_year'])
Output:
Warning: system and difference GMMs do not work well on long (T>=N) panel data
Dynamic panel-data estimation, two-step difference GMM
Group variable: Country Number of obs = 184
Time variable: month_year Number of groups = 4
Number of instruments = 6
+-------------------+-----------------+---------------------+------------+-----------+
| tourism_demand | coef. | Corrected Std. Err. | z | P>|z| |
+-------------------+-----------------+---------------------+------------+-----------+
| L1.tourism_demand | 0.7413765 | 0.0236962 | 31.2866594 | 0.0000000 |
| L1.monthly_cases | -190277.2987977 | 164169.7711072 | -1.1590276 | 0.2464449 |
+-------------------+-----------------+---------------------+------------+-----------+
Hansen test of overid. restrictions: chi(4) = 1.837 Prob > Chi2 = 0.766
Arellano-Bond test for AR(1) in first differences: z = -1.05 Pr > z =0.294
Arellano-Bond test for AR(2) in first differences: z = 1.00 Pr > z =0.318
Model 3: similar to Model 2, but a system GMM.
command_str='tourism_demand L1.tourism_demand L1.monthly_cases | gmm(tourism_demand, 2 6) iv(L1.monthly_cases)| collapse '
mydpd = regression.abond(command_str, df, ['Country', 'month_year'])
Output:
Warning: system and difference GMMs do not work well on long (T>=N) panel data
Dynamic panel-data estimation, two-step system GMM
Group variable: Country Number of obs = 188
Time variable: month_year Number of groups = 4
Number of instruments = 8
+-------------------+-----------------+---------------------+------------+-----------+
| tourism_demand | coef. | Corrected Std. Err. | z | P>|z| |
+-------------------+-----------------+---------------------+------------+-----------+
| L1.tourism_demand | 0.5364657 | 0.0267678 | 20.0414904 | 0.0000000 |
| L1.monthly_cases | -216615.8306112 | 177416.0961037 | -1.2209480 | 0.2221057 |
| _con | -10168.9640333 | 8328.7444649 | -1.2209480 | 0.2221057 |
+-------------------+-----------------+---------------------+------------+-----------+
Hansen test of overid. restrictions: chi(5) = 1.876 Prob > Chi2 = 0.866
Arellano-Bond test for AR(1) in first differences: z = -1.06 Pr > z =0.288
Arellano-Bond test for AR(2) in first differences: z = 0.99 Pr > z =0.322