I am working with the R programming language.
I came across the following math puzzle recently:
I wrote some R code to represent the size of the pond over 1000 days:
set.seed(123)
n_days <- 1000
pond_population <- rep(0, n_days)
pond_population[1] <- 100
for (i in 2:n_days) {
prob <- runif(1)
if (prob <= 0.05) {
pond_population[i] <- pond_population[i-1] + round(pond_population[i-1] * 0.05)
} else if (prob > 0.05 && prob <= 0.10) {
pond_population[i] <- pond_population[i-1] - round(pond_population[i-1] * 0.05)
} else {
pond_population[i] <- pond_population[i-1]
}
}
plot(pond_population, type = "l", main = "Pond Population Over Time", xlab = "Day", ylab = "Population")
My Question: I am curious about the following modification to this problem - suppose each day you catch 10 distinct fish from this pond, tag these fish, and then put them back into the pond. Naturally, it is possible that some days you will catch fish that you previously caught in the past - and it is also possible that some of the fish you caught in the past will die. After you have finished fishing on the 1000th day - what percent of the current fish pond population will be known to you?
I am interested in learning how to write a simulation procedure to answer this question - something that can be added to code I already wrote that keeps track of the size of fish pond population each day as well as the individual fish within the pond that you have already seen (e.g. imagine if each fish is assigned a unique ID).
I am not sure how to begin this problem - I think I might have to use "stacks" or "queues" for this problem, but I am not familiar with these concepts and how they would be used here.
Can someone please show me how to do this?
Thanks!
Here's one approach
library(tidyverse)
set.seed(123)
fish_pond_sim <- function(pop=100, days=1000, fished_per_day=10) {
fish <- tibble(
id = 1:pop,
seen = FALSE,
dead = FALSE
)
results <- tibble(
population = pop,
day = 1,
seen = 0,
dead = 0,
seen_and_alive = 0
)
living <- pop
for (i in 2:days) {
prob <- runif(1)
five_percent <- round(length(living) * 0.05)
if (prob <= 0.05) {
five_pct_sample <- sample(living, five_percent, replace = FALSE)
fish[fish$id %in% five_pct_sample,]$dead <- TRUE
} else if (prob > 0.05 && prob <= 0.10) {
fish <- fish %>% add_row(
id = max(fish$id):(max(fish$id) + five_percent),
seen = FALSE,
dead = FALSE
)
}
fished_sample <- sample(living, fished_per_day, replace = FALSE)
fish[fish$id %in% fished_sample,]$seen <- TRUE
living <- fish[!fish$dead,]$id
results <- results %>% add_row(
population = length(living),
day = i,
seen = sum(fish$seen),
dead = sum(fish$dead),
seen_and_alive = sum(fish$seen & !fish$dead)
)
}
return(results)
}
result <- fish_pond_sim(1000, 1000)
result %>%
ggplot(aes(x = day)) +
geom_line(aes(y = population, color = "Population")) +
geom_line(aes(y = seen_and_alive, color = "Seen and Alive")) +
theme_bw()
To get the percentage which will be known to you, you can do something like this:
result %>%
slice_tail() %>%
pull(seen_and_alive_percentage) # 84.00424