rjointibble

Join 2 tibbles without the same length and add "NA" for empty cells


I tried several things posted in the forum but had no success...

If hope I was clear. If not, please tell meI've 2 tibbles named traitement_RPE and traitement_VOL which contain 3 columns named "Athletes", "Type", and "Valeur".

enter image description here enter image description here

The 1st picture represents traitement_VOL and the 2nd traitement_RPE. I would like to join them by "Athletes" key. The issue is that some athletes in traitement_RPE are not present in traitement_VOL. In that case, I want "NA" cells:

enter image description here

In this example, Balfet had values in traitement_VOL but not in traitement_RPE. So when the name doesn't match between the tibbles I want NA cells or "0" cells it doesn't matter.

I tried left_join(), full_join() etc. without success see this picture:

enter image description here

There is: "Detected an unexpected many-to-many relationship between x and y." It duplicates the Type.x column for each Type.y argument. I just want to have 1 argument by 1 as shown in my 3rd picture.

The best result I had was with merge() but I obtained rows join instead of columns join and it's complicated to add rows to athletes that don't have values in "Valeur" of traitement_RPE because it's aleatory (survey for athletes, sometimes they don't answer...).

If someone could help me with this issue, thanks!

Below is the full code :) sorry, some parts are in French but the code is a common language :)

setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
library(tidyverse)
# install.packages("xlsx")
# library(xlsx)
library(data.table)


library(readxl)
RPE_CDN_2324 <- read_excel("Suivi RPE 2223.xlsx",
                             na = "Non réalisé")

VOLUME_RCN_2324 <- read_excel("CDE_RCN_2324.xlsx", 
                           sheet = "Vol_test", skip = 1)

traitement_RPE <- tibble(RPE_CDN_2324)

traitement_VOL <- tibble(VOLUME_RCN_2324)

## CODE SUR VOLUME_RCN_2324 ----

#PIVOT LONGER DES COLONNES DE RPE_CDN_2324

traitement_VOL <- traitement_VOL |> 
  pivot_longer(cols = starts_with("J2"), names_to = "VOL", values_to = "temps")

# SUPPRESSION DES NA DE RPE_CDN_2324

traitement_VOL <- traitement_VOL |> 
  filter(!is.na(temps))

# TRI DU TIBBLE PAR ORDRE ALPHABETIQUE DE RPE_CDN_2324

traitement_VOL <- traitement_VOL |>
  rename(Athletes = "...1") |> 
  rename(Type = "VOL") |> 
  rename(Valeur = "temps") |>
  arrange(Athletes)

#RECOMPOSITION DU TIBBLE DE RPE_CDN_2324

traitement_VOL <- traitement_VOL |>
  select(Athletes, Type, Valeur) |> 
  filter(Valeur > 0.1)

### CODE SUR RPE_CDN_2324 ----

#PIVOT LONGER DES COLONNES DE RPE_CDN_2324

traitement_RPE <- traitement_RPE |> 
  pivot_longer(cols = starts_with("RPE"), names_to = "RPE", values_to = "valeur")

# SUPPRESSION DES NA DE RPE_CDN_2324

traitement_RPE <- traitement_RPE |> 
  filter(!is.na(valeur)) 

# TRI DU TIBBLE PAR ORDRE ALPHABETIQUE DE RPE_CDN_2324

traitement_RPE <- traitement_RPE |> 
  rename(Athletes = "Votre nom de famille") |> 
  rename(Type = "RPE") |> 
  rename(Valeur = "valeur") |> 
  arrange(Athletes)

#RECOMPOSITION DU TIBBLE DE RPE_CDN_2324

traitement_RPE <- traitement_RPE |>
  select(Athletes ,Type , Valeur)

#### TRYING TO JOIN TIBBLES  ----


# Bilann <- left_join(traitement_VOL, traitement_RPE, by = "Athletes")

# rename and set NAs to 0
# names(Bilann) <- c("Athletes", "Valeur", "Valeur")
# Bilann[is.na(Bilann)] <- 0

# BILAN <- merge(traitement_VOL, traitement_RPE, by = c("Athletes"))

# write.xlsx(x = traitement_RPE, file = "Traitemennt_RPE-testRCN.xls")    

Solution

  • You won't return your example result from your example dfs as the data are different. For instance, "COLLECTIF" and "TERRAIN" are not common to both tables. But here is a solution that assumes you want to join data with common attributes e.g. "RPE [ACTIVATION]...21" is the same as "J2 Vol [ACTIVATION]".

    library(dplyr)
    library(stringr)
    
    # Sample dat based on your example images
    set.seed(1)
    traitement_VOL <- data.frame(Athletes = c(rep(LETTERS[1:4], each = 4), LETTERS[4]),
                                 type = c(rep(c("J2 Vol [ACTIVATION]",
                                                "J2 Vol [SEPARE]",
                                                "J2 Vol [TERRAIN]",
                                                "J2 Vol [MUSCU]...17"), 4),
                                          "J2 Vol [DEV INDIV]"),
                                 valeur = c(rep(c(15, 54, 67, 35), 4), 25))
    
    traitement_RPE <- data.frame(Athletes = c(rep(LETTERS[1:4], each = 4), LETTERS[4]),
                                 type = c(rep(c("RPE [MUSCU]...14",
                                                "RPE [COLLECTIF]...20",
                                                "RPE [ACTIVATION]...21",
                                                "RPE [SEPARE]...22"), 4), 
                                          "RPE[DEV INDIV]...24"),
                                 valeur = sample(4:8, 17, replace =  TRUE))
    
    # Add new 'common' field to each df 
    traitement_VOL <- traitement_VOL %>%
      mutate(x = str_extract(type, pattern = "(?<=\\[).*(?=\\])"))
    
    traitement_RPE <- traitement_RPE %>%
      mutate(x = str_extract(type, pattern = "(?<=\\[).*(?=\\])"))
    
    # Join dfs using two columns
    traitement_VOL %>%
      left_join(traitement_RPE, by = join_by(Athletes, x))
    
       Athletes              type.x valeur.x          x                type.y valeur.y
    1         A J2 Vol [ACTIVATION]       15 ACTIVATION RPE [ACTIVATION]...21        4
    2         A     J2 Vol [SEPARE]       54     SEPARE     RPE [SEPARE]...22        5
    3         A    J2 Vol [TERRAIN]       67    TERRAIN                  <NA>       NA
    4         A J2 Vol [MUSCU]...17       35      MUSCU      RPE [MUSCU]...14        4
    5         B J2 Vol [ACTIVATION]       15 ACTIVATION RPE [ACTIVATION]...21        5
    6         B     J2 Vol [SEPARE]       54     SEPARE     RPE [SEPARE]...22        6
    7         B    J2 Vol [TERRAIN]       67    TERRAIN                  <NA>       NA
    8         B J2 Vol [MUSCU]...17       35      MUSCU      RPE [MUSCU]...14        8
    9         C J2 Vol [ACTIVATION]       15 ACTIVATION RPE [ACTIVATION]...21        8
    10        C     J2 Vol [SEPARE]       54     SEPARE     RPE [SEPARE]...22        8
    11        C    J2 Vol [TERRAIN]       67    TERRAIN                  <NA>       NA
    12        C J2 Vol [MUSCU]...17       35      MUSCU      RPE [MUSCU]...14        6
    13        D J2 Vol [ACTIVATION]       15 ACTIVATION RPE [ACTIVATION]...21        4
    14        D     J2 Vol [SEPARE]       54     SEPARE     RPE [SEPARE]...22        8
    15        D    J2 Vol [TERRAIN]       67    TERRAIN                  <NA>       NA
    16        D J2 Vol [MUSCU]...17       35      MUSCU      RPE [MUSCU]...14        5
    17        D  J2 Vol [DEV INDIV]       25  DEV INDIV   RPE[DEV INDIV]...24        8
    
    traitement_VOL %>%
      full_join(traitement_RPE, by = join_by(Athletes, x))
    
       Athletes              type.x valeur.x          x                type.y valeur.y
    1         A J2 Vol [ACTIVATION]       15 ACTIVATION RPE [ACTIVATION]...21        4
    2         A     J2 Vol [SEPARE]       54     SEPARE     RPE [SEPARE]...22        5
    3         A    J2 Vol [TERRAIN]       67    TERRAIN                  <NA>       NA
    4         A J2 Vol [MUSCU]...17       35      MUSCU      RPE [MUSCU]...14        4
    5         B J2 Vol [ACTIVATION]       15 ACTIVATION RPE [ACTIVATION]...21        5
    6         B     J2 Vol [SEPARE]       54     SEPARE     RPE [SEPARE]...22        6
    7         B    J2 Vol [TERRAIN]       67    TERRAIN                  <NA>       NA
    8         B J2 Vol [MUSCU]...17       35      MUSCU      RPE [MUSCU]...14        8
    9         C J2 Vol [ACTIVATION]       15 ACTIVATION RPE [ACTIVATION]...21        8
    10        C     J2 Vol [SEPARE]       54     SEPARE     RPE [SEPARE]...22        8
    11        C    J2 Vol [TERRAIN]       67    TERRAIN                  <NA>       NA
    12        C J2 Vol [MUSCU]...17       35      MUSCU      RPE [MUSCU]...14        6
    13        D J2 Vol [ACTIVATION]       15 ACTIVATION RPE [ACTIVATION]...21        4
    14        D     J2 Vol [SEPARE]       54     SEPARE     RPE [SEPARE]...22        8
    15        D    J2 Vol [TERRAIN]       67    TERRAIN                  <NA>       NA
    16        D J2 Vol [MUSCU]...17       35      MUSCU      RPE [MUSCU]...14        5
    17        D  J2 Vol [DEV INDIV]       25  DEV INDIV   RPE[DEV INDIV]...24        8
    18        A                <NA>       NA  COLLECTIF  RPE [COLLECTIF]...20        7
    19        B                <NA>       NA  COLLECTIF  RPE [COLLECTIF]...20        6
    20        C                <NA>       NA  COLLECTIF  RPE [COLLECTIF]...20        4
    21        D                <NA>       NA  COLLECTIF  RPE [COLLECTIF]...20        5