I am querying the very popular AdventureWorks DB in SSMS. My objective to find the number of males and females under each job title from HumanResources.Employee.
For this my original query was,
SELECT JobTitle,
COUNT(CASE WHEN Gender='M' THEN 1
ELSE 0
END) AS MALE_COUNT,
COUNT(CASE WHEN Gender='F' THEN 1
ELSE 0
END) AS FEMALE_COUNT,Gender
FROM HumanResources.Employee
GROUP BY JobTitle,Gender
ORDER BY JobTitle
GO
However, I am getting incorrect answer with the above query.So by modifying it as below ,I am getting the desired result:
SELECT JobTitle,
COUNT(CASE WHEN Gender='M' THEN 1
END) AS MALE_COUNT,
COUNT(CASE WHEN Gender='F' THEN 1
END) AS FEMALE_COUNT,Gender
FROM HumanResources.Employee
GROUP BY JobTitle,Gender
ORDER BY JobTitle
GO
As can be easily seen, I am just removing the 'ELSE 0' condition for both the CASE statements, but am I unable to figure out as to how '0' is affecting the values returned in the result.
Can someone explain to me the difference between these two? Also I would like to know how the COUNT function is taking multiple values, when normally(say SELECT COUNT(3,3)) it doesn't work.
Documentations say:
COUNT(ALL expression) evaluates expression for each row in a group, and returns the number of nonnull values.
Read more here: https://learn.microsoft.com/en-us/sql/t-sql/functions/count-transact-sql?view=sql-server-ver15
So we can rewrite your query as:
SELECT
JobTitle,
COUNT(CASE WHEN Gender='M' THEN 1 ELSE NULL END) AS MALE_COUNT,
COUNT(CASE WHEN Gender='F' THEN 1 ELSE NULL END) AS FEMALE_COUNT
FROM HumanResources.Employee
GROUP BY JobTitle
ORDER BY JobTitle
About your queries:
First query:
inside Count function you have a CASE
expression which will change the 'M' to 1 and 'F' to 0. Then COUNT
function will do the count operations over them. because neither 1 nor 0 are NULL so Count
will return total number of records, regardless of 'M' or 'F'
This procedure is same for second CASE
too.
Second query:
Inside Count
function you have a CASE
expression which will change 'M' to 1, but you did not mentioned what to do with other values, so NULL will be returned for non-M values. After that Count
function will do the count operations on these records and will return the number of M's.(Second query is equal to the query I have posted, and they both will have the same output. However because of readability I prefer my query over yours :-) )
This procedure is same for second CASE
too.
Documentations for CASE
expression says:
ELSE else_result_expression Is the expression returned if no comparison operation evaluates to TRUE. If this argument is omitted and no comparison operation evaluates to TRUE, CASE returns NULL. else_result_expression is any valid expression. The data types of else_result_expression and any result_expression must be the same or must be an implicit conversion.
Read more here: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/case-transact-sql?view=sql-server-ver15
SELECT Count(3,3)
This is syntactically wrong and will give you an error like this, which is pretty self explanatory:
Msg 174, Level 15, State 1, Line 1 The Count function requires 1 argument(s)
The syntax for Count
function based on the aforementioned documentations is like this:
-- Aggregation Function Syntax
COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )
-- Analytic Function Syntax
COUNT ( [ ALL ] { expression | * } ) OVER ( [ <partition_by_clause> ] )