I'm new to R, so I apologize if this question has already been answered. Here is an example of my dataset:
idnumber SIC(1-digit) Year Ebit
198 A 2019 2344
196 A 2019 6383
374 A 2019 5628
281 A 2019 2672
274 A 2018 2792
196 A 2018 3802
374 A 2018 3892
468 B 2019 6372
389 B 2019 3829
493 C 2019 2718
928 C 2019 2628
278 C 2019 3672
I want to compute the standard deviation of "Ebit" for each industrial sector ("SIC(1-digit)"). I would like to use it as a measure of the volatility of operating income (EBIT) by industry.
Thanks in advance for your help.
Let's load your data to reproduce your example:
dat <- data.frame(
idnumber = c(198, 196, 374, 281, 274, 196, 374, 468, 389, 493, 928, 278),
`SIC(1-digit)` = c('A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'),
Year = c(2019, 2019, 2019, 2019, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019),
Ebit = c(2344, 6383, 5628, 2672, 2792, 3802, 3892, 6372, 3829, 2718, 2628, 3672),
check.names = FALSE
)
Note that SIC(1-digit) is wrapped in back-ticks and that check.names = FALSE is passed to data.frame(). Both are needed because your column name contains the special characters "(", "-" and ")"; you can read more about this here and here.
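To show why check.names = FALSE matters, here is a small sketch of what happens with the default check.names = TRUE: data.frame() runs the names through make.names(), which replaces each non-syntactic character with a dot. The object name dat_default is only for illustration.

```r
# make.names() turns "(", "-" and ")" into dots
make.names("SIC(1-digit)")
# [1] "SIC.1.digit."

# The same column created with the default check.names = TRUE
dat_default <- data.frame(`SIC(1-digit)` = c("A", "B"))
names(dat_default)
# [1] "SIC.1.digit."
```

With the default, you would have to refer to the column as SIC.1.digit. instead of `SIC(1-digit)`.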
Once your data is loaded, you can use dplyr:
library(dplyr)
dat %>%
group_by(`SIC(1-digit)`) %>%
summarise(standard_deviation = sd(Ebit))
# A tibble: 3 x 2
`SIC(1-digit)` standard_deviation
* <chr> <dbl>
1 A 1544.
2 B 1798.
3 C 579.
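If you prefer not to install any package, base R's tapply() can also split Ebit by sector and apply sd() to each group. This is a minimal sketch that rebuilds the example data so it runs on its own; the name sd_by_sector is just illustrative.

```r
dat <- data.frame(
  idnumber = c(198, 196, 374, 281, 274, 196, 374, 468, 389, 493, 928, 278),
  `SIC(1-digit)` = c('A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'),
  Year = c(2019, 2019, 2019, 2019, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019),
  Ebit = c(2344, 6383, 5628, 2672, 2792, 3802, 3892, 6372, 3829, 2718, 2628, 3672),
  check.names = FALSE
)

# tapply() groups Ebit by sector and applies sd() to each group,
# returning a named numeric vector (one element per sector)
sd_by_sector <- tapply(dat$Ebit, dat$`SIC(1-digit)`, sd)
print(sd_by_sector)
```

The values are the same as the dplyr ones; only the output type differs (a named vector rather than a tibble).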
Or data.table:
library(data.table)
setDT(dat)
dat[, .(standard_deviation = sd(Ebit)), by = `SIC(1-digit)`]
SIC(1-digit) standard_deviation
1: A 1544.4116
2: B 1798.1725
3: C 578.5257
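To see exactly what sd() computes in the results above, here is the sector C value worked out by hand. sd() returns the sample standard deviation, which divides the sum of squared deviations by n - 1 rather than n. The variable names are illustrative only.

```r
# Ebit values for sector C from the example data
ebit_c <- c(2718, 2628, 3672)

# Sample standard deviation: squared deviations from the mean, divided by n - 1
n <- length(ebit_c)
manual_sd <- sqrt(sum((ebit_c - mean(ebit_c))^2) / (n - 1))

manual_sd
# [1] 578.5257
sd(ebit_c)
# [1] 578.5257
```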