I am currently trying to recreate an advanced segment from UA into bigquery and running into issues getting the numbers to match.
I need to exclude all sessions from users who have ever completed an event with a category of 'booking' and all sessions from outside the US. My current query gets me close to the number in our looker dashboard but its less than it should be.
This is my current code:
WITH Excluded_Users AS ( SELECT visitId AS exclusion_id FROM my_data.ga_sessions_*, UNNEST(hits) AS hits WHERE hits.eventInfo.eventCategory = 'Booking' )
SELECT CONCAT( FORMAT("%04d", EXTRACT(YEAR FROM PARSE_DATE("%Y%m%d", date))), '-', FORMAT("%02d", EXTRACT(MONTH FROM PARSE_DATE("%Y%m%d", date))) ) AS year_month,COUNT(DISTINCT IF(totals.visits=1, CONCAT(fullVisitorId, CAST(visitId AS STRING)), NULL) ) as sessions,
FROM my_data.ga_sessions_* s
LEFT JOIN Excluded_Users e ON s.visitId = e.exclusion_id
WHERE e.exclusion_id IS NULL AND s.totals.visits = 1 AND geoNetwork.Country = 'United States'
GROUP BY 1
A couple of things:
visitId
is just a timestamp of the session (ignoring the midnight split unlike visitStartTime
)fullvisitorid
+visitstarttime
to calculate sessionshits
if necessary, save some resources by using subqueriesSELECT
CONCAT( FORMAT("%04d", EXTRACT(YEAR FROM PARSE_DATE("%Y%m%d", date))), '-', FORMAT("%02d", EXTRACT(MONTH FROM PARSE_DATE("%Y%m%d", date))) ) AS year_month,
COUNT(1) as sessions,
-- no need to count(distinct fullvisitorid||visitstarttime) since every row is a session
FROM `my_data.ga_sessions_*` s
WHERE
NOT exists(select 1 from unnest(hits) where eventInfo.eventCategory = 'Booking')
AND s.totals.visits = 1
AND geoNetwork.Country = 'United States'
GROUP BY 1