We have configured a linking between the GA 4 property and GoogleBigQuery
via the GA interface (without any additional code). It works fine, we see a migrated data in GBQ tables, but however, we face an issue with how this data is written in those tables.
If we look at any table we could see that events from different users can be recorded in one session (and there can be different clientIDs
(and even usedIDs, which we pass when authorizing a user)) See an example
This is a result of executing following query:
SELECT
event_name,
user_pseudo_id,
user_id,
device.category,
device.mobile_brand_name,
device.mobile_model_name,
device.operating_system_version,
geo.region,
geo.city,
params.key,
params.value.int_value
FROM `%project_name%.analytics_256374149.events_20210331`, unnest(event_params) AS params
WHERE event_name="page_view"
AND params.value.int_value=1617218965
ORDER BY event_timestamp
As a result, you can see that within one session different users from different regions, with different devices and identifiers are combined. It is, of course, impossible to use such data for reporting purposes. Once again, it is a default GA4 → BigQuery setup in the GA4 interface (no add-ons).
We do not understand what the error is (in import, in requests, or somewhere else) and would like to get advice on this issue.
Thanks.
You should look at the combination of user_pseudo_id
and the event_param ga_session_id
. This combination is unique and used for measuring unique sessions across a property.
For example, this query counts the number of unique event names in each session:
SELECT
user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
COUNT(DISTINCT event_name) AS unique_event_name_count
FROM `<project>.<dataset>.events_*`
GROUP BY user_pseudo_id, ga_session_id