r dplyr tidyr janitor

Identifying specific words across two columns (grepl)


I have a dataset of online job postings in which each row represents a single job posting by a given company. I am trying to explore whether there are gender preferences for certain jobs such as nursing, teaching, physical education, etc.

To investigate this, I wrote the code below, which searches for specific words that may signal a preference for hiring women or men. I search two columns, "jobtitle" and "description". The code works, but it is inefficient: I repeat the same steps for each column when I should be able to handle both at once. Specifically, every time I identify a new word that signals a gender preference, I have to add it manually for both columns.

# Load packages:

library(stringr)
library(dplyr)
library(tidyr)
library(lubridate)
# install.packages("janitor")  # run once if janitor is not installed
library(janitor)

First, here is a data example:

# Print data example with specific columns
dput(job_posts[1:4,c(1,3,4)])

output:

structure(list(id = 1:4, description = c("Job Description *\r\n\r\nJob Title:\r\n\r\nPE Teacher\r\n\r\nSchool / Location:\r\n\r\nISG Jubail\r\n\r\nSchool / Department:\r\n\r\nMiddle/High\r\n\r\nSubjects:\r\n\r\nPhysical Education\r\n\r\nJob Status:\r\n\r\nFull-time\r\n\r\nSalary Code:\r\n\r\nTeacher\r\n\r\nWork Days / Hours:\r\n\r\n191 days @ 8 hours per day\r\n\r\nEligible Applicants:\r\n\r\nSaudi National, Dependent\r\n\r\nPosition Start Date:\r\n\r\n15 August 2023\r\n\r\nApplication Deadline:\r\n\r\nUntil filled\r\n\r\nReports to:\r\n\r\nBrent Wingers, Principal\r\n\r\nRequisition Number:\r\n\r\n05-2324-050\r\n\r\nSummary\r\n\r\nISG Jubail seeks a dynamic Physical Education (PE) teacher with a passion for\r\ninvolving students in fitness, sport, and health. The ideal candidate will\r\nhave experience inspiring students to live active and healthy lives. This\r\nperson would demonstrate a commitment to a well-rounded and flexible PE\r\nprogram that encompasses multiple age, grade, and experience levels. ISG\r\nJubail has a strong sense of community. The successful candidate would be\r\ncommitted to contributing to the whole school community beyond the confines of\r\nthe PE program. 
The ideal candidate would be eager to engage in collaboration\r\naround our mission: We inspire innovation and compassionate action.\r\n\r\nJob Duties\r\n\r\nThe successful candidate will be able to effectively:\r\n\r\n* Demonstrate commitment to the safety and security of children and young people (child protection)\r\n* Collaborate around meeting our mission: We inspire innovation and compassionate action\r\n* Communicate and collaborate in a professional manner\r\n* Commit to continual professional learning\r\n* Plan meaningful assessment opportunities\r\n* Drive learning through assessment and feedback\r\n* Reflect on assessment goals and targets\r\n* Plan and deliver effective and meaningful learning opportunities\r\n* Know and respond to the needs of all students\r\n* Promote agency and innovation in the learning process\r\n* Foster compassion and build relationships amongst all the ISG community\r\n* Exhibit effective and positive classroom management\r\n* Contribute to student growth and collaboration\r\n* Contribute to the after-school activities program and to the wider ISG Jubail community\r\n* Effectively assess and report on student academic achievement and learning dispositions\r\n* Espouse creativity and passion in and out of the classroom\r\n\r\nSpecifically with regards to PE, the successful candidate will be able to\r\neffectively:\r\n\r\n* Inspire students to live healthy, active lifestyles\r\n* Build and lead a balanced PE program that allows for agency in individual and team settings\r\n* Contribute to the creation and revision of the PE curriculum\r\n* Coordinate / coach formal and informal athletic and after-school opportunities\r\n\r\nQualifications and Knowledge\r\n\r\nThe candidate must have:\r\n\r\n* Minimum of Bachelor's degree in a related field of study\r\n\r\nThe ideal candidate will have:\r\n\r\n* Current certification/licensure to teach PE\r\n* An advanced degree in a related field of study\r\n* Knowledge of the SHAPE Standards\r\n* 
Knowledge of Understanding by Design (UbD) principles\r\n\r\nExperience and Skills\r\n\r\nThe ideal candidate will have:\r\n\r\nExperience\r\n\r\n* Teaching PE at the elementary school level in an international context\r\n* Leading and inspiring elementary students to be active and healthy\r\n* Organizing events and competitions\r\n* Documenting curriculum according to Understanding by Design (UbD) principles\r\n* Designing programs based on the SHAPE Standards\r\n* Collaborating in a professional setting with large and small groups of colleagues\r\n* Leading groups of students and faculty\r\n\r\nSkills & Dispositions\r\n\r\n* Positive and compassionate outlook\r\n* Open-mindedness and flexibility\r\n* A proactive approach and ability to take initiative\r\n* Desire to engage in meaningful collaboration", 
"مطلوب للتعيين فوراً محاسب فرع \r\nومشرف فروع مالي واداري لمجموعة طبية بالرياض\r\n\r\nخبرة جيدة بالحسابات وان يتمتع بشخصية قيادية تؤهله لقيادة فريق\r\n\r\nيفضل ان يكون خبرة بالمجمعات او المراكز الطبية\r\n\r\nخبرة لا تقل عن 5 سنوات\r\n\r\nالرجاء ارسال السيرة الذاتية على البريد الالكتروني بعنوان الوظيفة المتقدم لها \r\nفضلاً يكون الجاهزية لحضور المقابلة خلال الأيام القليلة القادمة \r\njob4all90@gmail.com\r\n\r\nJob Type: Full-time\r\n\r\nSalary: ﷼5,000.00 - ﷼8,500.00 per month\r\n\r\nAbility to commute/relocate:\r\n\r\n* الرياض: Reliably commute or planning to relocate before starting work (Required)", 
"RISAL\r\n\r\nWestern Province, Saudi Arabia\r\n\r\nPosted a day ago\r\n\r\nExpires in a month\r\n\r\nJob Description\r\n\r\nResponsible for supervising the Dewatering project; Supervising the\r\ndewatering system to ensure water levels remain as required in a project\r\nenvironment. perform all aspects of the position in a safe, cost-effective,\r\nand productive manner while continuously aligning daily functions to reflect\r\nDewatering Logistics values.\r\n\r\n* Supervising all maintenance activities of the dewatering infrastructure including well points, header pipes, dewatering and surface pumps, piping,\r\n* General understanding of construction operations, as well as a good understanding of Dewatering and maintenance operations.\r\n* Support in investigating, analyzing, the documentation of accidents, injuries, and equipment damage and recommend prevention strategies as necessary.\r\n* Work with the project manager to schedule upcoming projects. Assist the Technical Services Department with determining the specifications of the equipment.\r\n* Works closely with the manager to schedule equipment repairs and replacements to maximize equipment utilization and minimize downtime.\r\n* Must be able to design a dewatering network based on soil permeability report.\r\n\r\nSkills\r\n\r\n* Good mechanical background.\r\n* Must be a leader\r\n* Ability to work with others in a team environment.\r\n* Good verbal, written, analytical, and persuasive skills\r\n* Good administrative, organizational, and technical writing skills\r\n* Good knowledge in utilizing word processing, spreadsheet, database, and presentation software.\r\n\r\nEducation\r\n\r\nHydrology or Mechanical\r\n\r\nJob Details\r\n\r\nJob Location\r\n\r\nWestern Province, Saudi Arabia\r\n\r\nCompany Industry\r\n\r\nOther Business Support Services\r\n\r\nCompany Type\r\n\r\nEmployer (Private Sector)\r\n\r\nJob Role\r\n\r\nOther\r\n\r\nJoining Date\r\n\r\n2023-04-02\r\n\r\nEmployment Status\r\n\r\nFull 
time\r\n\r\nNumber of Vacancies\r\n\r\n1\r\n\r\nPreferred Candidate\r\n\r\nCareer Level\r\n\r\nMid Career\r\n\r\nYears of Experience\r\n\r\nMin: 2 Max: 20\r\n\r\nDegree\r\n\r\nDiploma\r\n\r\nJob Details\r\n\r\nJob Location\r\n\r\nWestern Province, Saudi Arabia\r\n\r\nCompany Industry\r\n\r\nOther Business Support Services\r\n\r\nCompany Type\r\n\r\nEmployer (Private Sector)\r\n\r\nJob Role\r\n\r\nOther\r\n\r\nJoining Date\r\n\r\n2023-04-02\r\n\r\nEmployment Status\r\n\r\nFull time\r\n\r\nNumber of Vacancies\r\n\r\n1", 
"The ideal candidate will lead the account development and penetration strategy\r\nfor assigned customers or regions. They should be skilled at building and\r\nmaintaining relationships with clients and work to provide exceptional\r\ncustomer service to clients.\r\n\r\nResponsibilities\r\n\r\nManage a portfolio of accounts\r\n\r\nDevelop positive relationship with clients\r\n\r\nManage a number of accounts for long term success.\r\n\r\nCommunicate and deal with customer needs quickly.\r\n\r\nGenerate new business using existing and potential customer networks.\r\n\r\nSolve problems and provide solutions to customers in a timely manner.\r\n\r\nSubmit periodic reports on the status of accounts and transactions.\r\n\r\nSelect, set and track sales account targets in line with organization goals.\r\n\r\nSuggest actions to improve sales performance and identify growth\r\nopportunities.\r\n\r\nBuilding strong relationships with government agencies, ministries and private\r\ncompanies\r\n\r\nQualifications\r\n\r\n* Bachelor's degree or equivalent experience\r\n* Experience as a Sales Manager\r\n* 2 years experience in IT companies\r\n\r\nJob Type: Full-time\r\n\r\nSalary: ﷼15,000.00 - ﷼20,000.00 per month\r\n\r\nAbility to commute/relocate:\r\n\r\n* Riyadh: Reliably commute or planning to relocate before starting work (Required)"
), jobtitle = c("PE Teacher", "مطلوب محاسب & مشرف فروع مالي واداري لمجموعة طبية بالرياض", 
"Dewatering Supervisor", "sales account manager")), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))

Here is my code, which looks for specific words that indicate a gender preference:

First, gender preferences in the "description" column:

job_posts$gender_preferences = NA #creating empty column

# terms I am interested in, collected in a vector
gender = c("Hostess", "female nurse", "female teacher", "Waitress", "Female", "Female Coordinator") 

# for each term in gender, if the description contains that term,
# store the term's position in the vector (1 to 6) in gender_preferences
for(i in gender){
  value = grepl(i, job_posts$description, ignore.case=TRUE)
  job_posts$gender_preferences[which(value)] = (1:6)[gender==i]}

I then repeat the same code for the job title column:

# gender_preferences already exists; re-creating it here would erase the matches found in description

#place possible levels  I am interested in into the vector
gender = c("Hostess", "female nurse", "female teacher", "Waitress", "Female", "Female Coordinator") 

# for each term in gender, if the jobtitle contains that term,
# store the term's position in the vector (1 to 6) in gender_preferences
for(i in gender){
  value = grepl(i, job_posts$jobtitle, ignore.case=TRUE)
  job_posts$gender_preferences[which(value)] = (1:6)[gender==i]}

I then do this to distinguish between job postings that show a preference towards hiring females, versus otherwise.

#Relabel all cases where a female preference is mentioned

job_posts <-
  job_posts %>%
  # any of the six term codes means a female preference; everything else stays NA
  mutate(gender_preferences = case_when(gender_preferences %in% 1:6 ~ "female"))

Lastly, I replace NA with "indifferent" for postings that don't mention any gender preference:

job_posts <- 
job_posts %>%
    mutate(gender_preferences=replace_na(gender_preferences, "indifferent"))

Solution

  • Based on your description of the issue, it appears that you do not want to distinguish between whether the words in gender are mentioned in jobtitle or description (or both).

    If this is the case, you can simply create a new column by combining jobtitle and description like so:

    library(dplyr)
    library(magrittr)
    job_posts %<>%
        mutate( job_posting_text = paste( jobtitle, description ) )
    

    and then perform the grepl step on just the new column job_posting_text.

    If your data set is large, I would avoid the for loop: in my experience, loops like this are slow on large data frames. It is also unnecessary to run grepl once for every string in gender. Because grepl takes a regular expression, you can join the strings you're interested in with "|" (regex alternation) and match them all in a single call. Note that "female" already matches "female nurse", "female teacher" and "Female Coordinator", so those longer terms are redundant. I would do something like this instead:

    job_posts %<>% 
      # If the job title and/or description contains one or more of the strings
      # "hostess", "waitress" or "female", set gender_preferences equal to "female".
      # If not, set gender_preferences = "indifferent":
      mutate( gender_preferences = ifelse( 
        grepl("hostess|waitress|female", job_posting_text, ignore.case = TRUE)
        , yes = "female", no = "indifferent"
      ) )
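
If you would rather keep the terms in the gender vector and not paste the two columns together, the same idea can be sketched with paste(collapse = "|") and dplyr's if_any(). This is only a minimal sketch: toy_posts is a made-up stand-in for job_posts, if_any() needs dplyr 1.0.4 or later, and the terms are treated as regular expressions, so any term containing special characters would need escaping.

```r
library(dplyr)

# Hypothetical toy data standing in for job_posts (not the real data set)
toy_posts <- data.frame(
  id = 1:3,
  jobtitle = c("PE Teacher", "Hostess", "Sales account manager"),
  description = c("Seeking a dynamic PE teacher.",
                  "Restaurant staff needed.",
                  "Female candidates preferred."),
  stringsAsFactors = FALSE
)

# Terms of interest, as in the question
gender <- c("Hostess", "female nurse", "female teacher",
            "Waitress", "Female", "Female Coordinator")

# Collapse the terms into one regex alternation: "Hostess|female nurse|..."
pattern <- paste(gender, collapse = "|")

toy_posts <- toy_posts %>%
  mutate(
    # TRUE for a row if the pattern matches in jobtitle OR in description
    gender_preferences = if_else(
      if_any(c(jobtitle, description),
             ~ grepl(pattern, .x, ignore.case = TRUE)),
      "female", "indifferent"
    )
  )

print(toy_posts[, c("id", "gender_preferences")])
```

Adding a new term then only means adding it to gender; the pattern and both column checks pick it up automatically.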