I'm struggling to modify the colour, shape, etc. of the points depending on whether a value is missing or not.
library(ggplot2)
library(naniar)
ggplot(data = airquality,
       aes(x = Ozone,
           y = Solar.R)) +
  geom_miss_point()
airquality_no_na <- airquality[!(is.na(airquality$Ozone) | is.na(airquality$Solar.R)), ]
airquality_na <- airquality[is.na(airquality$Ozone) | is.na(airquality$Solar.R), ]
ggplot() +
  geom_point(data = airquality_no_na,
             aes(x = Ozone,
                 y = Solar.R, colour = "NoMissing")) +
  geom_miss_point(data = airquality_na,
                  aes(x = Ozone,
                      y = Solar.R, colour = "Missing")) +
  scale_colour_manual(name = "Legende",
                      values = c("NoMissing" = "green",
                                 "Missing" = "blue"))
I don't know how to make the missing values green and the non-missing values blue without splitting the data into two data frames.
EDIT:
My issue is a bit more complex. I want to be able to choose the colours for the first data set (missing in blue, not missing in green) and for the second data set (missing in red, not missing in yellow).
# Create data frames
df1 <- as.data.frame(matrix(data = runif(n = 200, 0, 1), ncol = 2))
df2 <- as.data.frame(matrix(data = runif(n = 100, 0, 1), ncol = 2))
# Add missing values
df1[rbinom(n = 100, size = 1, prob = 0.1) == 1, 1] <- NA
df1[rbinom(n = 100, size = 1, prob = 0.1) == 1, 2] <- NA
df2[rbinom(n = 50, size = 1, prob = 0.1) == 1, 1] <- NA
df2[rbinom(n = 50, size = 1, prob = 0.1) == 1, 2] <- NA
# This doesn't work: it only prints blue (missing) and green (not missing)
ggplot() +
  geom_miss_point(data = df1,
                  aes(x = V1,
                      y = V2)) +
  geom_miss_point(data = df2,
                  aes(x = V1,
                      y = V2)) +
  scale_colour_manual(values = c("blue", "green", "yellow", "red"))
I am not sure this is a good idea, but for the sake of showing how to do this in theory: from a quick look into the naniar package, the color aesthetic appears to be mapped to ..missing.. by default. You would need to dig quite deep into the actual geom to change that behaviour, but there is a simple workaround.
Create a second color scale with ggnewscale.
You will not get around keeping your data in separate subsets, one per color scale, but this is not a bad thing. Don't be afraid to subset your data; that's a very normal thing to do.
library(tidyverse)
library(naniar)
library(ggnewscale)
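ggplot() +
  # First layer and its own colour scale: missing in blue, not missing in green
  geom_miss_point(data = df1, aes(V1, V2)) +
  scale_colour_manual(name = "df1", values = c("blue", "green")) +
  # Start a fresh colour scale for every layer added after this point
  new_scale_color() +
  # Second layer: missing in red, not missing in yellow, as asked in the question
  geom_miss_point(data = df2, aes(V1, V2)) +
  scale_colour_manual(name = "df2", values = c("red", "yellow"))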
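For the single data frame case from the first part of the question, no splitting should be needed at all, since geom_miss_point() already maps colour to missingness. Below is a minimal sketch of that idea. It assumes the legend labels produced by geom_miss_point() are "Missing" and "Not Missing" (check the legend of the default plot if your naniar version differs); the name miss_colours is only an illustrative helper.

```r
library(ggplot2)
library(naniar)

# Assumption: geom_miss_point() labels points "Missing" / "Not Missing".
# Naming the values ties each colour to a label instead of relying on order.
miss_colours <- c("Missing" = "blue", "Not Missing" = "green")

p <- ggplot(airquality, aes(x = Ozone, y = Solar.R)) +
  geom_miss_point() +
  scale_colour_manual(name = "Legende", values = miss_colours)

print(p)
```

Naming the values also makes it harder to swap the two colours by accident, which is easy to do with a positional vector.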