The house price prediction dataset has 81 variables and 1459 observations.
To understand the data better, I have separated out the variables of type character.
char_variables = sapply(property_train, is.character)
char_names = names(property_train[,char_variables])
char_names
There are 42 character variables.
I want to find the number of observations in each category of each variable.
For a single variable, the code is simple:
table(property_train$Zoning_Class)
Commer    FVR    RHD    RLD    RMD
    10     65     16   1150    218
But repeating this for all 42 variables would be tedious.
The for loops I've tried to print all the tables throw an error.
for (val in char_names) {
  print(table(property_train[[val]]))
}
Abnorml AdjLand  Alloca  Family  Normal Partial
    101       4      12      20    1197     125
Is there a way to iterate over char_names and print all 42 tables from the data frame?
str(property_train)
'data.frame': 1459 obs. of 81 variables:
$ Id : int 1 2 3 4 5 6 7 8 9 10 ...
$ Building_Class : int 60 20 60 70 60 50 20 60 50 190 ...
$ Zoning_Class : chr "RLD" "RLD" "RLD" "RLD" ...
$ Lot_Extent : int 65 80 68 60 84 85 75 NA 51 50 ...
$ Lot_Size : int 8450 9600 11250 9550 14260 14115 10084 10382..
$ Road_Type : chr "Paved" "Paved" "Paved" "Paved" ...
$ Lane_Type : chr NA NA NA NA ...
$ Property_Shape : chr "Reg" "Reg" "IR1" "IR1" ...
$ Land_Outline : chr "Lvl" "Lvl" "Lvl" "Lvl" ...
Actually, your code does not give an error for me (make sure to run all lines of the for loop together):
property_train <- data.frame(a = 1:10,
                             b = rep(c("A","B"),5),
                             c = LETTERS[1:10])
char_variables = sapply(property_train, is.character)
char_names = names(property_train[,char_variables])
char_names
table(property_train$b)
for (val in char_names) {
  print(table(property_train[val]))
}
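If the loop still misbehaves on your side, a base R alternative is to let lapply build all the tables at once. This is only a minimal sketch on toy data: the freq_tables name and the NA I put into column c are my own assumptions, not part of your dataset. Since columns such as Lane_Type contain NA, I also pass useNA = "ifany", because table() silently drops missing values by default.

```r
# Toy data standing in for property_train (hypothetical values)
property_train <- data.frame(a = 1:10,
                             b = rep(c("A", "B"), 5),
                             c = c(LETTERS[1:9], NA),
                             stringsAsFactors = FALSE)

# Index names() directly, so that a single character column
# does not collapse to a vector and lose its name
char_names <- names(property_train)[sapply(property_train, is.character)]

# Named list with one frequency table per character column;
# useNA = "ifany" adds an NA count where values are missing
freq_tables <- lapply(property_train[char_names], table, useNA = "ifany")

freq_tables$b        # look at a single table by column name
names(freq_tables)   # the columns that were tabulated
```

Printing freq_tables shows every table with its column name as a header, and keeping the list around lets you pull out individual tables later without recomputing them.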
You can also get this result in a slightly more user-friendly form with dplyr and tidyr, by pivoting all the character columns into a long format and counting each column-value combination:
library(dplyr)
library(tidyr)
property_train %>%
  select(where(is.character)) %>%
  pivot_longer(cols = everything(), names_to = "column") %>%
  group_by(column, value) %>%
  summarise(freq = n())
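If you would rather not load extra packages, roughly the same long table can be put together in base R. Treat this as a sketch under assumptions: freq_long is a name I made up, the data is the toy example again, and the output is a plain data.frame rather than a tibble.

```r
# Toy data standing in for property_train (hypothetical values)
property_train <- data.frame(a = 1:10,
                             b = rep(c("A", "B"), 5),
                             c = LETTERS[1:10],
                             stringsAsFactors = FALSE)

char_names <- names(property_train)[sapply(property_train, is.character)]

# Build one small data frame of counts per column, then stack them
freq_long <- do.call(rbind, lapply(char_names, function(col) {
  counts <- table(property_train[[col]], useNA = "ifany")
  data.frame(column = col,
             value  = names(counts),
             freq   = as.vector(counts),
             stringsAsFactors = FALSE)
}))

freq_long
```

Each row is one column-value pair with its count, so the result can be filtered or sorted like any other data frame, for example freq_long[freq_long$column == "b", ].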