I'm very new to R and trying to complete an assignment.
One of the questions I need to solve is this:
#Q3 How many observations have no books purchased (including Florence)? Delete those observations before proceeding to the next step
I believe I'm supposed to solve this with a function from the dplyr package, but I'm very confused about which one to use.
Sample Data:
Gender | M | R | F | FirstPunch | ChildBks | YouthBks | Florence |
---|---|---|---|---|---|---|---|
0 | 138 | 28 | 3 | 40 | 0 | 1 | 0 |
1 | 240 | 14 | 1 | 14 | 1 | 0 | 1 |
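For anyone who wants to try the answers without the assignment's Excel file, the two sample rows can be typed in as a data frame. This is only a stand-in that assumes the eight columns shown in the table; the real file may have more columns and rows.

```r
# The two sample rows from the table above, typed in so the code
# runs without the original Excel file (stand-in only)
doc <- data.frame(
  Gender     = c(0, 1),
  M          = c(138, 240),
  R          = c(28, 14),
  F          = c(3, 1),
  FirstPunch = c(40, 14),
  ChildBks   = c(0, 1),
  YouthBks   = c(1, 0),
  Florence   = c(0, 1)
)

# Show the structure: column names and types
str(doc)
```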
```{r echo=TRUE}
library(dplyr)
library(tidyr)
library(ggplot2)
library(readxl)
doc <- as.data.frame(read_excel("path/doc.xls"))
# Keep columns 3 onward (`cbc` was undefined; the data frame is `doc`)
doc <- doc[, 3:ncol(doc)]
str(doc)
summary(doc)
```
```
Count the unique values of each variable.
Q1 How many observations are in the dataset?
Q2 List the variables and show their type
```
```{r}
doc_counts <- doc %>% summarise_all(~(n_distinct(.)))
doc_counts
```
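Q1, Q2 and the unique-value counts can be checked on the sample rows with a few base R and dplyr calls. This is a sketch on a stand-in data frame built from the two sample rows; on the real file the types may differ (for example, `read_excel()` can return character columns). `summarise_all()` still works but is superseded in dplyr 1.0.0 and later, where `across()` is the suggested replacement.

```r
library(dplyr)

# Stand-in for `doc`: the two sample rows from the question
doc <- data.frame(
  Gender = c(0, 1), M = c(138, 240), R = c(28, 14), F = c(3, 1),
  FirstPunch = c(40, 14), ChildBks = c(0, 1), YouthBks = c(1, 0), Florence = c(0, 1)
)

# Q1: number of observations (rows)
n_obs <- nrow(doc)

# Q2: each variable name paired with its class
var_types <- sapply(doc, class)

# Number of distinct values per variable, written with across()
doc_counts <- doc %>% summarise(across(everything(), n_distinct))

n_obs
var_types
doc_counts
```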
```
#Q3 How many observations have no books purchased (including Florence)? Delete those observations before proceeding to the next step
```
```{r}
# answer to go here
```
I think I know roughly how to delete the observations with the `filter()` function. What confuses me is how to figure out how many observations have no books purchased.
Any help is very much appreciated.
I've tried using `summarise_all()`, `summarise_if()`, and other things I searched up but no longer remember, because I couldn't get any of them to work.
I'm not entirely sure what I should be expecting to see. My instructions in class are very vague.
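Since `summarise()` came up: it can return the count in one step, because `sum()` over a logical vector counts the `TRUE` values. The data frame below keeps only the three book columns from the sample and adds one made-up row with no purchases (hypothetical, so the count isn't zero); the real `doc` would be used instead.

```r
library(dplyr)

# Sample book columns plus one hypothetical all-zero row, for illustration only
doc <- data.frame(
  ChildBks = c(0, 1, 0),
  YouthBks = c(1, 0, 0),
  Florence = c(0, 1, 0)
)

# TRUE counts as 1 and FALSE as 0, so the sum is the number of rows
# where every book column is zero
no_books_summary <- doc %>%
  summarise(no_books = sum(ChildBks == 0 & YouthBks == 0 & Florence == 0))

no_books_summary
```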
Using the `dplyr` package:

1. Identify the rows where all book-related columns (`ChildBks`, `YouthBks`, `Florence`) are zero.
2. Use the `filter()` function to select those rows.
3. Use `nrow()` to count how many rows meet this condition. (`n()` only works inside dplyr verbs such as `summarise()` or `mutate()`, so it can't be piped onto a data frame by itself.)
4. Use `filter()` again with the negated condition to delete those rows.

```r
# Count the number of observations with no books purchased
no_books_count <- doc %>%
  filter(ChildBks == 0 & YouthBks == 0 & Florence == 0) %>%
  nrow()

# Print the number of observations with no books purchased
print(no_books_count)

# Remove those observations from the dataset
doc_filtered <- doc %>%
  filter(!(ChildBks == 0 & YouthBks == 0 & Florence == 0))
```
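If the real file has more book columns than the three in the sample, writing out every `== 0` by hand gets long. A sketch using `if_all()` and `if_any()` (available in dplyr 1.0.4 and later) over a vector of column names; `book_cols` is a hypothetical name and should hold whatever book columns the dataset actually contains. The toy data adds one made-up all-zero row for illustration.

```r
library(dplyr)

# Hypothetical toy data: the two sample rows plus one made-up row with no purchases
doc <- data.frame(
  Gender   = c(0, 1, 1),
  ChildBks = c(0, 1, 0),
  YouthBks = c(1, 0, 0),
  Florence = c(0, 1, 0)
)

# Columns treated as book purchases; extend this to match the real data
book_cols <- c("ChildBks", "YouthBks", "Florence")

# Rows where every listed book column is zero, and how many there are
no_books <- doc %>% filter(if_all(all_of(book_cols), ~ .x == 0))
no_books_count <- nrow(no_books)

# Keep rows where at least one listed book column is non-zero
doc_filtered <- doc %>% filter(if_any(all_of(book_cols), ~ .x != 0))

no_books_count
```

Selecting columns by a name vector means adding a new book column only changes `book_cols`, not the filter conditions.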
I hope this helps!