Context:
- I'm using R on RStudio to make a basic ggplot() out of a .csv file (meu_primeiro_csv);
- The .csv file was imported using read_csv(), and I manually entered the type of each column using col_types = list(). The types are, thus, correct;
- One column matters most for the ggplot() I'm making: TP_DEPENDENCIA, since it is used not only in geom_point() as color and shape, but also in facet_wrap();
- TP_DEPENDENCIA is of col_factor() type and has four possible values: 1, 2, 3, or 4. This is a code in which each value stands for a different type of school: 1 = Federal, 2 = Estadual, 3 = Municipal, or 4 = Privada. There are no other types of school, and there are no NAs, so this col_factor() is tidy;
- I managed to convert the col_factor() "numbers" to their true "string" meanings in facet_wrap() by using the labeller parameter, as suggested by tamtam's answer to "Changing facet labels in face_wrap() ggplot2" on Oct 29, 2020;
- There is no labeller parameter on geom_point(), so the legend to the right of the graph shows "4, 2, 3, 1" instead of the names that these codes represent.

Questions:
- How do I convert the <fct> type TP_DEPENDENCIA column's number-coded values to their true textual meanings in the graph's legends?
- Can this be done, as with facet_wrap(), without changing the values stored in the .csv file?

The problematic graph:
The code that generated the above graph:
ggplot(
  data = meu_primeiro_csv,
  mapping = aes(y = QT_SALAS_UTILIZADAS, x = QT_MAT_BAS)) +
  geom_point(mapping = aes(color = TP_DEPENDENCIA, shape = TP_DEPENDENCIA)) +
  facet_wrap(
    ~TP_DEPENDENCIA,
    labeller = labeller(TP_DEPENDENCIA = c(
      `1` = "Federal", `2` = "Estadual", `3` = "Municipal", `4` = "Privada"
    ))
  ) +
  labs(
    title = "Educação básica: total de alunos × total de salas",
    subtitle = "Totais por tipo de escola: municipal, estadual, federal, ou privada",
    y = "Salas utilizadas pela escola",
    x = "Matrículas na educação básica",
    color = "Tipo de escola",
    shape = "Tipo de escola"
  ) +
  geom_smooth(method = "lm") +
  scale_color_colorblind()
CSV for reproducible example:
Use case_when to create a new variable in your dataset with the labels for TP_DEPENDENCIA. Use the new variable instead of TP_DEPENDENCIA, and you'll get the labels in the legend.
library(dplyr)     # mutate(), case_when(), %>%
library(ggplot2)
library(ggthemes)  # scale_color_colorblind()

meu_primeiro_csv %>%
  # Add a text column with the school type; the .csv itself is not modified
  mutate(tipo_de_escola = case_when(
    TP_DEPENDENCIA == 1 ~ "Federal",
    TP_DEPENDENCIA == 2 ~ "Estadual",
    TP_DEPENDENCIA == 3 ~ "Municipal",
    TP_DEPENDENCIA == 4 ~ "Privada"
  )) %>%
  ggplot(
    mapping = aes(y = QT_SALAS_UTILIZADAS, x = QT_MAT_BAS)) +
  # Map the new column, not the numeric codes, so the legend shows the names
  geom_point(mapping = aes(color = tipo_de_escola, shape = tipo_de_escola)) +
  facet_wrap(~tipo_de_escola) +
  labs(
    title = "Educação básica: total de alunos × total de salas",
    subtitle = "Totais por tipo de escola: municipal, estadual, federal, ou privada",
    y = "Salas utilizadas pela escola",
    x = "Matrículas na educação básica",
    color = "Tipo de escola",
    shape = "Tipo de escola"
  ) +
  geom_smooth(method = "lm") +
  scale_color_colorblind()
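As a variation on the case_when approach, here is a minimal sketch using base R's factor() with explicit levels and labels. It keeps the new column a factor, so facets and legend entries should follow the code order (1 to 4) rather than alphabetical order. The data frame escolas and its values are made up for illustration; only the column name TP_DEPENDENCIA and the four labels come from the question.

```r
# Toy stand-in for meu_primeiro_csv (hypothetical values, for illustration only)
escolas <- data.frame(
  TP_DEPENDENCIA = factor(c("4", "2", "3", "1", "2"))
)

# Relabel the codes in a new column; the original column stays untouched
escolas$tipo_de_escola <- factor(
  escolas$TP_DEPENDENCIA,
  levels = c("1", "2", "3", "4"),
  labels = c("Federal", "Estadual", "Municipal", "Privada")
)

# Inspect the result: levels are in code order, values carry the names
levels(escolas$tipo_de_escola)
as.character(escolas$tipo_de_escola)
```

Mapping color, shape, and facet_wrap() to tipo_de_escola instead of TP_DEPENDENCIA would then work the same way as in the code above.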
However, you should consider showing the variable TP_DEPENDENCIA with only one aesthetic instead of three. Try making the plot with TP_DEPENDENCIA mapped to only one of facet_wrap, colour, or shape. You'll have the same amount of information, and your graph will be simpler. Choose the one you think works best.
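A minimal sketch of that simpler version, with facets as the only encoding of school type. The data frame escolas and its numbers are invented for illustration; the column names and labels are taken from the question, and tipo_de_escola is assumed to already hold the school names.

```r
library(ggplot2)

# Hypothetical toy data standing in for meu_primeiro_csv
escolas <- data.frame(
  QT_MAT_BAS = c(120, 300, 450, 80, 220, 510, 150, 390, 260, 95, 340, 480),
  QT_SALAS_UTILIZADAS = c(6, 12, 18, 4, 10, 20, 7, 15, 11, 5, 14, 19),
  tipo_de_escola = factor(
    rep(c("Federal", "Estadual", "Municipal", "Privada"), times = 3),
    levels = c("Federal", "Estadual", "Municipal", "Privada")
  )
)

# School type appears only in the facet strips, so no legend is needed
p <- ggplot(escolas, aes(x = QT_MAT_BAS, y = QT_SALAS_UTILIZADAS)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~tipo_de_escola) +
  labs(
    x = "Matrículas na educação básica",
    y = "Salas utilizadas pela escola"
  )

print(p)
```

Each panel title already names the school type, so colour and shape would only repeat what the facets say.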