I have a data.frame ordered by company ("gvkey") and calendar quarter ("datafqtr"), with an additional variable (for example, "dvy"):
gvkey datafqtr dvy
1 1001 1983Q1 0.50
2 1001 1983Q2 1.50
3 1001 1983Q3 2.00
4 1001 1983Q4 4.50
5 1002 1984Q1 0.00
6 1002 1984Q2 0.00
7 1002 1984Q3 0.10
8 1002 1984Q4 0.45
What would be the best way to do the following in R (not necessarily using zoo)?
For this data.frame, for each company and each year, subtract the third quarter from the fourth quarter, then the second quarter from the third quarter, and finally the first quarter from the second quarter.
The output should look like this:
gvkey datafqtr dvy
1 1001 1983Q1 0.50
2 1001 1983Q2 1.00
3 1001 1983Q3 0.50
4 1001 1983Q4 2.50
5 1002 1984Q1 0.00
6 1002 1984Q2 0.00
7 1002 1984Q3 0.10
8 1002 1984Q4 0.35
I have thought to do it the following way:
Use the zoo package and lag each variable once, while having the company code ("gvkey") and calendar quarter (datafqtr) as factors, and then subtract the observation.
data<-zoo(data)
data<-data[order(data[,'gvkey'],data[,'datafqtr']),]
data$lagfqtr<-lag(data$datafqtr,-1,na.pad=TRUE)
data$laggvkey<-lag(data$gvkey,-1,na.pad=TRUE)
data$lagdvy<-lag(data$dvy,-1,na.pad=TRUE)
I then convert the zoo to a regular data.frame and subtract the lagged variable from the non-lagged variable:
data<-as.data.frame(data)
data[,] <- lapply(data[,], function(x) as.numeric(as.character(x)))
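# NOTE: indice is not defined in this snippet; it is assumed to be a logical vector created earlier
indice <- indice*1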
data$divq <- data$dvy - (data$lagdvy * indice)
I am not sure whether this does what I want.
Thank you, I appreciate all your help.
Using the dplyr library, you can do something like this (subtract lagged values from current values):
library(dplyr)
df %>%
  group_by(gvkey) %>%
  mutate(dvy = dvy - lag(dvy, default = 0))
The output is as follows and matches the desired result:
Source: local data frame [8 x 3]
Groups: gvkey [2]
gvkey datafqtr dvy
<int> <chr> <dbl>
1 1001 1983Q1 0.50
2 1001 1983Q2 1.00
3 1001 1983Q3 0.50
4 1001 1983Q4 2.50
5 1002 1984Q1 0.00
6 1002 1984Q2 0.00
7 1002 1984Q3 0.10
8 1002 1984Q4 0.35
Note: this works only if the data is already ordered by gvkey and datafqtr. Otherwise, add an arrange() call to the pipe before mutate() and everything should fall into place.
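As a minimal sketch of that note, the pipe below sorts the rows with arrange() before taking differences. It also groups by year, since the question asks for differences within each company and each year. That extra grouping is an assumption added here and is not part of the answer above. The helper column year and the names shuffled and result are made up for illustration, and the code assumes datafqtr always has the form YYYYQn.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  gvkey    = c(1001, 1001, 1001, 1001, 1002, 1002, 1002, 1002),
  datafqtr = c("1983Q1", "1983Q2", "1983Q3", "1983Q4",
               "1984Q1", "1984Q2", "1984Q3", "1984Q4"),
  dvy      = c(0.50, 1.50, 2.00, 4.50, 0.00, 0.00, 0.10, 0.45),
  stringsAsFactors = FALSE
)

# Shuffle the rows to mimic unordered input
shuffled <- df[c(4, 2, 7, 1, 8, 3, 6, 5), ]

result <- shuffled %>%
  mutate(year = substr(datafqtr, 1, 4)) %>%    # assumed format: YYYYQn
  arrange(gvkey, datafqtr) %>%                 # restore company/quarter order
  group_by(gvkey, year) %>%                    # differences restart each year
  mutate(dvy = dvy - lag(dvy, default = 0)) %>%
  ungroup() %>%
  select(-year)

print(result)
```

With the sample data this should give the same values as the output above. Grouping by year only changes the result when a company has quarters from more than one year, in which case the first quarter of each year is left as is instead of having the previous fourth quarter subtracted from it.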