I am having an issue in Google data studio. I am creating a case statements that looks for keywords in a few different dimensions and categorises them accordingly. This all works fine until I add a custom dimensions. The see statement is valid as far as Google is concerned but it returns nothing.
CASE
WHEN CONTAINS_TEXT(PAGE, "Heart") THEN "Cardiology"
WHEN PAGE = "Diabetes.website.com" THEN "Diabetes"
End
The above works fine
CASE
WHEN CONTAINS_TEXT(PAGE, "Heart") THEN "Cardiology"
WHEN PAGE = "Diabetes.website.com" THEN "Diabetes"
WHEN WEB_SUBJECT = "Hypertension" THEN "Cardiology"
END
The above is valid but is blank when used. Web_subject is a custom dimensions defined in Google analytics.
Any help greatly appreciate. This one has me stumped.
UPDATE: So changing the format of the custom dimension expression to be REGEX worked in returning results for that particular condition but it now does not show results for any others. I have a feeling it may because of the data sitting behind it all.
A little more context, the data source in this case is a roll up google analytics account, So it contains analytics from 10+ domains. What I am attempting to do is group together page views and users from different domains into subjects using various different dimensions. For example if a USER selects the filter 'Diabetes' I want to return all pageviews for: Domain A where page title contains x Domain B where customDimesnion 1 = Y etc
This is my current mess of a CASE statement:
case
when contains_text(Page, 'biosimilars-confidence') OR REGEXP_MATCH(WEBSITE subject, '.*(?i)(Oncologie).*') OR CONTAINS_TEXT(Page, 'oncology') OR contains_text(Page, 'rrmm-challenges') OR contains_text(Page, 'oncologyhighlights2020') OR CONTAINS_TEXT(Page, 'ASCO') OR CONTAINS_TEXT(Page Title, 'cancer') OR contains_text(Page Title, 'Oncology') OR contains_text(Page Title, 'ASCO') THEN "Oncology"
when CONTAINS_TEXT(Page, 'DOMAINB') OR contains_text(Page, 'eprint') THEN "Eprint"
when CONTAINS_TEXT(Page, 'diabetes') OR REGEXP_MATCH(WEBSITE subject, '.*(?i)(Diabetes).*') OR contains_text(Page, 'hypoglycemia') OR contains_text(Page, 'glp1ras') OR contains_text(Page Title, 'Diabetes') OR contains_text(Page Title, 'hyperglycemia') OR contains_text(Page Title, 'diabetes') OR contains_text(Page Title, 'ADA') OR contains_text(Page Title, 'GLP-1') then "Diabetes/Endocrinology"
when CONTAINS_TEXT(Page, 'cardiology') OR REGEXP_MATCH(WEBSITE subject, '.*(?i)(Cardiologie).*') OR contains_text(Page Title, 'Cardiology') OR contains_text(Page Title, 'Heart failure') then'Cardiology'
when contains_text(Page, 'gastro') OR REGEXP_MATCH(WEBSITE subject, '.*(?i)(Gastro-enterologie).*') OR contains_text(Page Title, 'Gastroenterology') then "Gastroenterology"
when contains_text(Page, 'rheumatology') OR REGEXP_MATCH(WEBSITE subject, '.*(?i)(Reumatologie).*') OR contains_text(Page Title, 'Rheumatology') OR contains_text(Page Title, 'adalimumab') OR CONTAINS_TEXT(Page Title, 'arthritis') OR CONTAINS_TEXT(Page Title, 'EULAR') OR contains_text(Page Title, "joint") then "Rheumatology"
when CONTAINS_TEXT(Page, 'migraine') OR REGEXP_MATCH(WEBSITE subject, '.*(?i)(Neurologie).*') OR contains_text(Page Title, 'Neurology') OR contains_text(Page Title, "Ataxia") OR contains_text(Page Title, "EAN 2020") then "Neurology"
when Page = 'DOMAINA.com' OR contains_text(Page, 'training') OR Page = 'ime.DOMAINA.com' then "Corporate site traffic"
when contains_text(Page Title, 'Gynaecology ') then "Gynaecology "
when CONTAINS_TEXT(Page Title, 'Dermatology') OR REGEXP_MATCH(WEBSITE subject, '.*(?i)(Dermatologie).*') OR contains_text(Page Title, 'eczema') OR contains_text(Page Title, 'Laser hair removal') then "Dermatology"
when CONTAINS_TEXT(Page Title, 'COVID-19') OR REGEXP_MATCH(WEBSITE subject, '.*(?i)(Coronavirus (COVID-19)).*') then "COVID-19"
WHEN CONTAINS_TEXT(Page Title, 'General practice') then "General practice"
when contains_text(Page Title, 'Haematology') OR REGEXP_MATCH(WEBSITE subject, '.*(?i)(Hematologie).*') OR contains_text(Page Title, 'EHA25') then "Haematology"
when contains_text(Page Title, 'Mental health') OR CONTAINS_TEXT(Page Title, 'Psychology') then 'Mental health'
when contains_text(Page Title, 'penile length') OR REGEXP_MATCH(WEBSITE subject, '.*(?i)(Urologie).*') then "Urology"
when contains_text(Page Title, 'ERS 2019') OR REGEXP_MATCH(WEBSITE subject, '.*(?i)(Longziekten).*') THEN "Lung diseases"
when contains_text(Page, 'noonan')Then "Genetics"
when REGEXP_MATCH(WEBSITE subject, '.*(?i)(Farmacie).*') then 'Pharmacy'
when REGEXP_MATCH(WEBSITE subject, '.*(?i)(Heelkunde).*') then 'Surgery'
when REGEXP_MATCH(WEBSITE subject, '.*(?i)(Huisartsgeneeskunde).*') then 'Family medicine'
when REGEXP_MATCH(WEBSITE subject, '.*(?i)(Infectieziekten).*') then 'Infectious Diseases'
when REGEXP_MATCH(WEBSITE subject, '.*(?i)(Overig).*') then 'Other/Uncategorized'
when REGEXP_MATCH(WEBSITE subject, '.*(?i)(Voeding).*') then 'Nutrition'
else 'Other/Uncategorized'
end
I have anonymised some bits so if it looks strange that is why. It may well be I am going about this in completely the wrong way. One other thing to note, the domain names are stored in the PAGE dimension.
It may be due to:
contains_text(Page Title, 'ADA')
Adding a Word Boundary around ADA
may do the trick, ensuring that words that simply contain ADA
are not captured into the respective WHEN
clause
REGEXP_MATCH(Page Title, ".*(?i)(\\bADA\\b).*")
An alternative is to follow the two step process below:
Create the following Data Source-level Calculated Field to CONCAT
the fields into one single consolidated field:
CONCAT(Page, ", ", WEBSITE subject, ", ", Page Title)
Create the CASE
statement below:
Field_CONCAT
represents the field created above;|
Pipe operators |
serve as the RegEx version of OR
\\
is used to escape special RegEx characters such as (
, |
, .
, etc;ADA
(as well as ASCO
) to ensure that text that simply contain ada
are not part of the Diabetes/Endocrinology
WHEN clause:CASE
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(biosimilars-confidence|Oncologie|oncology|rrmm-challenges|oncologyhighlights2020|\\bASCO\\b|cancer).*") THEN "Oncology"
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(DOMAINB|eprint).*") THEN "Eprint"
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(diabetes|hypoglycemia|glp1ras|hyperglycemia|\\bADA\\b|GLP-1).*") THEN "Diabetes/Endocrinology"
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(cardiology|Cardiologie|Heart failure).*") THEN 'Cardiology'
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(gastro|Gastro-enterologie|Gastroenterology).*") THEN "Gastroenterology"
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(rheumatology|Reumatologie|adalimumab|arthritis|EULAR|joint).*") THEN "Rheumatology"
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(migraine|Neurologie|Neurology|Ataxia|EAN 2020).*") THEN "Neurology"
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(DOMAINA\\.com|training|ime\\.DOMAINA\\.com).*") THEN "Corporate site traffic"
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Gynaecology).*") THEN "Gynaecology "
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Dermatology|Dermatologie|eczema|Laser hair removal).*") THEN "Dermatology"
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(COVID-19|Coronavirus \\(COVID-19\\)).*") THEN "COVID-19"
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(General practice).*") THEN "General practice"
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Haematology|Hematologie|EHA25).*") THEN "Haematology"
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Mental health|Psychology).*") THEN 'Mental health'
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(penile length|Urologie).*") THEN "Urology"
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(ERS 2019|Longziekten).*") THEN "Lung diseases"
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(noonan).*") THEN "Genetics"
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Farmacie).*") THEN 'Pharmacy'
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Heelkunde).*") THEN 'Surgery'
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Huisartsgeneeskunde).*") THEN 'Family medicine'
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Infectieziekten).*") THEN 'Infectious Diseases'
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Overig).*") THEN 'Other/Uncategorized'
WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Voeding).*") THEN 'Nutrition'
ELSE 'Other/Uncategorized'
END
Added a New Page to the Google Data Studio Report and a GIF to demonstrate:
ELSE NULL
Note that the CASE
statements in the question and this suggestion have not explicitly stated the ELSE
clause, thus by default, ELSE
values are treated as NULL.
Tried out the CASE
statement and it works as expected! One tweak to the initial CASE statement is the inclusion of a Logical Operator, OR
:
CASE
WHEN CONTAINS_TEXT(PAGE, "Heart") OR WEB_SUBJECT = "Hypertension" THEN "Cardiology"
WHEN PAGE = "Diabetes.website.com" THEN "Diabetes"
END
In addition, you could also have a look to see whether the CASE
statement below resolves the issue; it uses the REGEXP_MATCH
function and ensures that fields Contain .*
the respective values as well as adding a Case Insensitive Flag (?i)
thus matching Hypertension
, hypertension
, HYPerTENsion
, etc:
CASE
WHEN REGEXP_MATCH(PAGE, ".*(?i)(Heart).*") OR REGEXP_MATCH(WEB_SUBJECT, ".*(?i)(Hypertension).*") THEN "Cardiology"
WHEN REGEXP_MATCH(PAGE, ".*(?i)(Diabetes\\.website\\.com).*") THEN "Diabetes"
END
If the above doesn't resolve the issue, could you elaborate with a screenshot(s) and provide some sample data (removing or replacing any sensitive information); feel free to add data to This Editable Google Sheet (also used in the Report and GIF below).
Google Data Studio Report and a GIF to elaborate on the above: