I have a training task in AdventureWorks2017 DB. The task is the following: I need a list of the Jobtitle in which the fewest and most women work in proportions. Only consider those in which at least 4 people work.
So far my code looks like this:
select a.JobTitle, AVG(ertek) as avg_women, COUNT(*) as sum_women
from
(select BusinessEntityID, JobTitle, Gender, CASE WHEN Gender = 'F' THEN 1.0 ELSE 0.0
END as ertek
from HumanResources.Employee) a
group by a.JobTitle
This gives me back all the JobTitles, the avarage of women in those titles and the sum of the women in the jobtitles. The problem is that I cant use MAX and MIN in the query (Maybe I need to create another subquery?). And I need to get the last part of the task "Only consider those in which at least 4 people work." Thanks for the help!
ER diagram: https://homel.vsb.cz/~dan11/ddj/AdventureWorks2008_db_diagram.pdf
Take these kind of requirements one at a time and build your solution gradually.
Run your query after each additional step to confirm the expected result!
Steps
group by e.JobTitle having count(1) >= 4
count(1) as JobTitleCount
andcount(case when e.Gender = 'F' then 1 end) as FemaleCount
count(case when e.Gender = 'F' then 1 end)*100.0/count(1) as FemalePercentage
order by
clause:dense_rank() over(order by
<percentage from step 2 comes here> ) as RankLeast
anddense_rank() over(order by
<percentage from step 2 comes here> desc) as RankMost
where jfpr.RankLeast = 1 or jfpr.RankMost = 1
Intermediate result
Steps 1. to 3.
select e.JobTitle,
count(1) as JobTitleCount,
count(case when e.Gender = 'F' then 1 end) as FemaleCount,
count(case when e.Gender = 'F' then 1 end)*100.0/count(1) as FemalePercentage,
dense_rank() over(order by count(case when e.Gender = 'F' then 1 end)*100.0/count(1) ) as FemalePercentageRankLeast,
dense_rank() over(order by count(case when e.Gender = 'F' then 1 end)*100.0/count(1) desc) as FemalePercentageRankMost
from HumanResources.Employee e
group by e.JobTitle
having count(1) >= 4
order by FemalePercentage;
Full solution
Moving the entire previous query in a subquery jfpr
and leaving out the columns we no longer need for the final result (JobTitleCount
and FemaleCount
). The order by
moves to outside the subquery.
select jfpr.JobTitle,
jfpr.FemalePercentage
from ( select e.JobTitle,
count(case when e.Gender = 'F' then 1 end)*100.0/count(1) as FemalePercentage,
dense_rank() over(order by count(case when e.Gender = 'F' then 1 end)*100.0/count(1) ) as FemalePercentageRankLeast,
dense_rank() over(order by count(case when e.Gender = 'F' then 1 end)*100.0/count(1) desc) as FemalePercentageRankMost
from HumanResources.Employee e
group by e.JobTitle
having count(1) >= 4 ) jfpr -- job female percentage rank
where jfpr.FemalePercentageRankLeast = 1
or jfpr.FemalePercentageRankMost = 1
order by jfpr.FemalePercentage;
Result
On my copy of AdventureWorks (no idea what version I have right now) this produces:
JobTitle FemalePercentage
---------------------------- ----------------
Quality Assurance Technician 0.000000000000
Scheduling Assistant 0.000000000000
Janitor 50.000000000000
Application Specialist 50.000000000000