Tags: r, regression, curve-fitting, modeling, best-fit-curve

How can I plot and curve-fit multiple data sets within one data frame in R?


I have this data frame:

   tree sdepth shallow_avg ddepth deep_avg swdepth sw_avg
  <dbl>  <dbl>       <dbl>  <dbl>    <dbl>   <dbl>  <dbl>
1     3      2      0.0857    3.5  0.0454      3.7      0
2     4      2      0.142     3.5  0.0991      4.1      0
3     5      2      0.0119    3.5  0.00498     5.7      0
4     7      2      0.0217    3.5  0.0169      5.1      0 

I am trying to plot each separate tree and fit a curve to it. The (x, y) points are (0, 0), (sdepth, shallow_avg), (ddepth, deep_avg), and (swdepth, sw_avg).

My base code is:

sample_data <- data.frame(x = c(0, 2, 3.5, 4.7), y = c(0, 0.0679, 0.0367, 0))

# fit a polynomial regression model of degree 5
linear_model5 <- lm(y~poly(x,5,raw=TRUE), data=sample_data)

# create a basic scatterplot 
plot(sample_data$x, sample_data$y)

# define x-axis values
x_axis <- seq(1, 10, length=10)

# add the fitted curve to the plot

lines(x_axis, predict(linear_model5, data.frame(x=x_axis)), col='orange')

but that only handles one tree, and I have to do it by hand.


Solution

  • This type of problem usually comes down to reshaping the data. ggplot2 expects the data in long format, but this data is in wide format. See this post on how to reshape data from wide to long format.

    Reshape the data so that all x values end up in one column and all y values in another, side by side. bind_rows then adds the origin (0, 0) to each tree's group. Finally, ggplot separates the points and the fitted curves by tree through the color aesthetic.

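    It doesn't make sense to fit a 5th-degree polynomial to 4 data points. Such a polynomial has 6 coefficients, so 2 of them cannot be estimated. I have fitted 3rd-degree polynomials instead. With 4 points per tree, a cubic passes exactly through every point. To use a different degree, change the value of polydeg.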

    sample_data <- read.table(text = "
    tree sdepth shallow_avg ddepth deep_avg swdepth sw_avg
    1     3      2      0.0857    3.5  0.0454      3.7      0
    2     4      2      0.142     3.5  0.0991      4.1      0
    3     5      2      0.0119    3.5  0.00498     5.7      0
    4     7      2      0.0217    3.5  0.0169      5.1      0", header = TRUE)
    
    suppressPackageStartupMessages({
      library(dplyr)
      library(tidyr)
      library(ggplot2)
    })
    
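    polydeg <- 3L  # degree of the polynomial fitted to each tree
    
    # pair each x column with its y column:
    # (sdepth, shallow_avg), (ddepth, deep_avg), (swdepth, sw_avg)
    df1 <- bind_cols(
      # x values: one row per tree and depth column
      sample_data %>%
        select(tree, sdepth, ddepth, swdepth) %>%
        pivot_longer(
          cols = -tree,
          names_to = "xcol",
          values_to = "x"
        ),
      # y values, in the same row order as the x values
      sample_data %>%
        select(tree, shallow_avg, deep_avg, sw_avg) %>%
        pivot_longer(
          cols = -tree,
          names_to = "ycol",
          values_to = "y"
        ) %>%
        select(-tree)
    ) %>%
      select(-xcol, -ycol) %>%
      mutate(tree = factor(tree))
    
    # prepend the origin (0, 0) to every tree, then plot
    bind_rows(
      df1 %>%
        group_by(tree) %>%
        summarise(tree = first(tree), x = 0, y = 0),
      df1
    ) %>%
      ggplot(aes(x, y, color = tree)) +
      geom_point(size = 2) +
      # color groups the data, so geom_smooth fits one polynomial per tree
      geom_smooth(
        formula = y ~ poly(x, polydeg, raw = TRUE),
        method = lm,
        se = FALSE
      ) +
      theme_bw()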
    

    Created on 2023-03-29 with reprex v2.0.2
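The two pivot_longer() calls joined with bind_cols() depend on both halves coming out in the same row order. A possibly tidier variant is sketched below. It assumes renaming the columns is acceptable. Each (x, y) pair is renamed to share a suffix (the names x_shallow, y_shallow, etc. and the layer column are made up for this sketch), and a single pivot_longer() call then splits the names using tidyr's special ".value" token. The data are typed in from the question.

```r
suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
})

# Wide data typed in from the question
sample_data <- data.frame(
  tree        = c(3, 4, 5, 7),
  sdepth      = c(2, 2, 2, 2),
  shallow_avg = c(0.0857, 0.142, 0.0119, 0.0217),
  ddepth      = c(3.5, 3.5, 3.5, 3.5),
  deep_avg    = c(0.0454, 0.0991, 0.00498, 0.0169),
  swdepth     = c(3.7, 4.1, 5.7, 5.1),
  sw_avg      = c(0, 0, 0, 0)
)

long_points <- sample_data %>%
  # Hypothetical names: give each (x, y) pair a shared suffix
  rename(
    x_shallow = sdepth,  y_shallow = shallow_avg,
    x_deep    = ddepth,  y_deep    = deep_avg,
    x_sw      = swdepth, y_sw      = sw_avg
  ) %>%
  # ".value" turns the "x"/"y" prefix into column names; the suffix goes to `layer`
  pivot_longer(
    cols = -tree,
    names_to = c(".value", "layer"),
    names_sep = "_"
  ) %>%
  # Add the origin (0, 0) once per tree
  bind_rows(tibble(tree = unique(sample_data$tree), layer = "origin", x = 0, y = 0)) %>%
  arrange(tree, x) %>%
  mutate(tree = factor(tree))
```

long_points has one row per tree and point, with the columns tree, layer, x and y. It should therefore work with the same ggplot(aes(x, y, color = tree)) + geom_point() + geom_smooth(...) chain used in the answer.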
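If you would rather stay with base graphics, as in the question's own code, a loop over the trees can do the same job without typing each tree by hand. This is a minimal sketch that is not part of the original answer. The helper tree_points() and the objects points_by_tree, fits and curves are hypothetical names. Each curve is drawn only over its own tree's x range (0 to swdepth) rather than the question's seq(1, 10), which avoids extrapolating the polynomial.

```r
# Wide data typed in from the question
sample_data <- data.frame(
  tree        = c(3, 4, 5, 7),
  sdepth      = c(2, 2, 2, 2),
  shallow_avg = c(0.0857, 0.142, 0.0119, 0.0217),
  ddepth      = c(3.5, 3.5, 3.5, 3.5),
  deep_avg    = c(0.0454, 0.0991, 0.00498, 0.0169),
  swdepth     = c(3.7, 4.1, 5.7, 5.1),
  sw_avg      = c(0, 0, 0, 0)
)

polydeg <- 3L  # degree of the polynomial fitted to each tree

# Hypothetical helper: the four (x, y) points of one tree, origin included
tree_points <- function(row) {
  data.frame(
    x = c(0, row$sdepth, row$ddepth, row$swdepth),
    y = c(0, row$shallow_avg, row$deep_avg, row$sw_avg)
  )
}

# One data frame of points per tree, named by tree id ("3", "4", "5", "7")
points_by_tree <- lapply(split(sample_data, sample_data$tree), tree_points)

# One polynomial regression per tree
fits <- lapply(points_by_tree, function(d) {
  lm(y ~ poly(x, polydeg, raw = TRUE), data = d)
})

# Predicted curve for each tree over its own x range (no extrapolation)
curves <- lapply(names(fits), function(id) {
  x_grid <- seq(0, max(points_by_tree[[id]]$x), length.out = 100)
  data.frame(x = x_grid, y = predict(fits[[id]], newdata = data.frame(x = x_grid)))
})
names(curves) <- names(fits)

# Axis limits that cover both the points and the fitted curves
x_lim <- range(sapply(points_by_tree, function(d) range(d$x)))
y_lim <- range(sapply(points_by_tree, function(d) range(d$y)),
               sapply(curves, function(d) range(d$y)))

# Empty plot, then the points and the curve of each tree in its own colour
cols <- seq_along(fits) + 1  # palette colours 2 to 5
plot(NULL, xlim = x_lim, ylim = y_lim, xlab = "x", ylab = "y")
for (i in seq_along(fits)) {
  points(points_by_tree[[i]]$x, points_by_tree[[i]]$y, col = cols[i], pch = 19)
  lines(curves[[i]]$x, curves[[i]]$y, col = cols[i])
}
legend("topright", legend = names(fits), col = cols, pch = 19, lty = 1,
       title = "tree")
```

fits keeps one lm object per tree, so coef(fits[["3"]]), for example, gives the fitted coefficients for tree 3 as well as the plotted curve.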
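You can check the reason for lowering the degree directly on the question's four sample points. A model with poly(x, 5, raw = TRUE) has six coefficients (the intercept plus five powers of x), but four points with distinct x values can determine at most four. lm() therefore reports two of the coefficients as NA. A cubic uses all four points exactly and leaves zero residual degrees of freedom. The object names fit5 and fit3 are only for illustration.

```r
# The four sample points from the question's base code
sample_data <- data.frame(x = c(0, 2, 3.5, 4.7), y = c(0, 0.0679, 0.0367, 0))

# Degree 5: 6 coefficients but only 4 points, so the fit is rank-deficient
fit5 <- lm(y ~ poly(x, 5, raw = TRUE), data = sample_data)

# Degree 3: 4 coefficients for 4 points, so the curve interpolates them
fit3 <- lm(y ~ poly(x, 3, raw = TRUE), data = sample_data)

coef(fit5)          # two of the six coefficients are NA (not estimable)
fit5$rank           # rank of the model matrix
df.residual(fit3)   # no residual degrees of freedom are left
residuals(fit3)     # essentially zero
```

Because a cubic through four points is an interpolation rather than a smoothing fit, a lower polydeg (for example 2) would give a genuine least-squares curve instead.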