I'm trying to understand regression in R. I'm working on an exercise that uses a dataset of 100 randomly sampled men and women, like this:
sex sbp bmi
male 130 40.0
female 126 29.0
female 115 25.0
male 120 33.0
female 128 34.0
...
I want to (0) get a numerical summary, (1) plot the relation between sbp and bmi, (2) estimate the beta1, beta2 and sigma parameters together with R^2, (3) check the goodness of fit of the model and (4) get the confidence intervals.
I think that sex is a categorical variable, so here is my code:
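# Check the class of each column (sapply() works column by column,
# whereas apply() first coerces the data frame to a character matrix)
sapply(framingham, class)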
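# 0: numerical summary of every variable
library(compareGroups)
framingham$sex <- factor(framingham$sex)
# Rename the levels; the new names must be in the same order as the existing levels
levels(framingham$sex) <- c("female", "male")
resultadoNumerico <- compareGroups(~ ., data = framingham)
resumenNumerico <- createTable(resultadoNumerico)
resumenNumerico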
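# 1: pairwise scatter plots and correlations
# (data.matrix() turns the sex factor into its integer codes 1 and 2)
framinghamMatrix <- data.matrix(framingham)
pairs(framinghamMatrix)
cor(framinghamMatrix)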
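# 2: regress sbp on bmi and sex
regre <- lm(sbp ~ bmi + sex, data = framingham)
regreSum <- summary(regre)
regreSum
# Sigma (residual standard error)
regreSum$sigma
# Betas (estimates, standard errors, t values and p values)
regreSum$coefficients
# R^2
regreSum$r.squared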
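# 3
plot(framingham$bmi, framingham$sbp, xlab = "BMI", ylab = "SBP")
# abline() uses only the first two coefficients (intercept and bmi slope),
# i.e. the fitted line for the reference level (female)
abline(regre)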
But I don't think I'm doing this correctly. Could you help me? Thanks in advance.
To check the relation between the variables, try the pairs.panels() function from the psych package. It shows the distributions, the pairwise scatter plots and the correlation coefficients.
library(psych)
pairs.panels(framingham)
The sex variable is categorical, so convert it into a factor before using it in the linear regression model. By default factor() sorts the levels alphabetically and the first level becomes the reference level, so the model summary only shows coefficients for the other levels. Here female is the reference level, and the sexmale coefficient is the estimated difference in mean sbp between men and women with the same bmi.
framingham$sex<-as.factor(framingham$sex)
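To see the reference-level behaviour directly, here is a small sketch that uses only the five example rows shown in the question. The name `framingham_demo` is hypothetical, chosen so the real data is not overwritten. It also shows `relevel()`, which lets you pick a different reference level.

```r
# The five example rows from the question, used only to illustrate factor coding
framingham_demo <- data.frame(
  sex = c("male", "female", "female", "male", "female"),
  sbp = c(130, 126, 115, 120, 128),
  bmi = c(40.0, 29.0, 25.0, 33.0, 34.0)
)

# factor() sorts character levels alphabetically, so "female" is the reference level
framingham_demo$sex <- factor(framingham_demo$sex)
print(levels(framingham_demo$sex))   # "female" "male"

# With female as reference, the model has a single sex coefficient named "sexmale"
fit_female_ref <- lm(sbp ~ bmi + sex, data = framingham_demo)
print(names(coef(fit_female_ref)))   # "(Intercept)" "bmi" "sexmale"

# relevel() makes "male" the reference instead; the coefficient becomes "sexfemale"
framingham_demo$sex <- relevel(framingham_demo$sex, ref = "male")
fit_male_ref <- lm(sbp ~ bmi + sex, data = framingham_demo)
print(names(coef(fit_male_ref)))     # "(Intercept)" "bmi" "sexfemale"
```

Changing the reference level does not change the fit: the bmi slope stays the same and the sex coefficient only flips sign.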
Now create your linear model.
model <- lm(sbp ~ bmi+sex, data = framingham)
model
summary(model)
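The summary gives the intercept and coefficient estimates, their standard errors, t values and p values (which indicate whether each variable is significant), the residual standard error (sigma), the Multiple R-squared (goodness of fit) and the Adjusted R-squared (goodness of fit adjusted for the number of predictors). The standard errors are not confidence intervals; use confint(model) to get the 95% confidence intervals.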
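For steps (1) to (4) of the exercise, the individual quantities can also be extracted from the fitted model directly. This is a minimal sketch that again uses only the five example rows from the question as stand-in data; `demo_data` and `demo_model` are hypothetical names, so substitute `framingham` and `model`. With so few rows the intervals are very wide; the code is only meant to show the calls.

```r
# Stand-in data: the five example rows from the question (hypothetical names)
demo_data <- data.frame(
  sex = factor(c("male", "female", "female", "male", "female")),
  sbp = c(130, 126, 115, 120, 128),
  bmi = c(40.0, 29.0, 25.0, 33.0, 34.0)
)
demo_model <- lm(sbp ~ bmi + sex, data = demo_data)
demo_sum <- summary(demo_model)

# (2) Coefficients, residual standard error (sigma) and R^2
betas     <- coef(demo_model)        # beta0 (intercept), beta1 (bmi), beta2 (sexmale)
sigma_hat <- demo_sum$sigma          # residual standard error
r2        <- demo_sum$r.squared
adj_r2    <- demo_sum$adj.r.squared

# (4) 95% confidence intervals for each coefficient
ci <- confint(demo_model, level = 0.95)
print(ci)

# (3) Residual diagnostics: residuals vs fitted, normal Q-Q,
#     scale-location and residuals vs leverage
op <- par(mfrow = c(2, 2))
plot(demo_model)
par(op)

# (1) Scatter plot with one fitted line per sex:
#     same bmi slope, intercepts differ by the sexmale coefficient
plot(demo_data$bmi, demo_data$sbp, xlab = "BMI", ylab = "SBP", pch = 19,
     col = ifelse(demo_data$sex == "male", "blue", "red"))
abline(a = betas[1], b = betas["bmi"], col = "red")                     # female (reference)
abline(a = betas[1] + betas["sexmale"], b = betas["bmi"], col = "blue")  # male
legend("topleft", legend = levels(demo_data$sex), col = c("red", "blue"), lty = 1)
```

The two fitted lines are parallel because the model has no bmi:sex interaction; fitting sbp ~ bmi * sex instead would allow a different slope for each sex.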