I'm using R to connect to a database I have stored in PGAdmin so that I can add data into the database through R. Before adding the data into the database, it has to be normalized.
The dataset consists of employee data. If an employee is a manager, the managing
column lists the employee ids of the employees that they manage.
I want to create a new dataframe in R that would represent my Managers table that has two columns: employee_id and manager_id (where manager_id is just the employee_id of the manager) that are both pulled from the original dataframe
How would I do this in R? Here's reproducible data:
> dput(test)
structure(list(first_name = c("Carrol", "Scott", "Michael", "Mary",
"Jane", "Alex"), last_name = c("Dhin", "Peters", "Scott", "Smith",
"Johnson", "Barter"), employee_id = c(412153L, 534253L, 643645L,
765453L, 627234L, 174543L), email = c("carrol.dhin@company.com",
"scott.peters@company.com", "michael.scott@company.com", "mary.smith@company.com",
"jane.johnson@company.com", "alex.barter@company.com"), managing = c("174543",
"", "", "", "534253, 643645", ""), department = c("Accounting",
"Sales", "Sales", "Marketing", "Sales", "Accounting"), department_budget = c(500000L,
1100000L, 1100000L, 750000L, 1100000L, 500000L), serial_number = c("KX6234",
"FS5235", "LP5242", "GK6246", "KX6902", "BN7451"), manufacturer = c("Lenovo",
"Lenovo", "Lenovo", "Lenovo", "Lenovo", "Lenovo"), model = c("X1 Gen 10",
"T14s", "P1", "X1 Gen 10", "T15", "T14s"), date_assigned = c(44576L,
44471L, 44341L, 44681L, 44606L, 44378L), installed_software = c("MS Office, Adobe Acrobat, Slack",
"MS Office", "Mathcad, Adobe Acrobat", "", "MS Office, Slack",
"Google Chrome")), class = "data.frame", row.names = c(NA, -6L
))
Given your sample data as df
, you could use select()
and filter()
from dplyr
to create a subset of managers.
library(dplyr)
managers <- df |> select("manager_id" =employee_id,
"employee_id" =managing) %>%
filter(.$employee_id!= "")
Result:
> managers
manager_id employee_id
1 412153 174543
2 627234 534253, 643645