I have the following sample table in Looker using a GA4 data source (page title omitted for privacy):
The total sessions aggregation shows a number around 0.5 million. However, when the table is exported to Excel and the rows are summed, the result is slightly over 1 million. Is Google Analytics 4 doing some special calculation to deduplicate page paths that were visited by the same session? I would like to use the GA4 API to export sessions by page path, but because of this inconsistency, when I aggregate the exported data in the dashboard I see the incorrect 1 million value.
Any pointers towards documentation or otherwise would be greatly appreciated. When I attempt the same process with a Universal Analytics data source, there is no discrepancy. This seems to be a GA4-specific issue and not an issue with any one GA property.
Thank you!
A short summary: GA4 and Looker on one side and the API on the other qualify and validate data too differently for their numbers to be checked against each other.
When your data arrives at your GA4 container, it is processed before it is accessible. After it is processed, it can be queried. Looker has its own processing rules (I believe it additionally qualifies the data it gets from GA4). Generally, Looker will display numbers similar to GA's.
If you use the GA4 Data API, you're not viewing data under the same processing scrutiny. You may see as much as a 3x increase in sessions when using the API, because the API does not apply any qualification based on how long the sessions were. The same applies to other default or Google-maintained dimensions and metrics, each of which is affected in its own way. The API is great when you want to build an intricate insight whose bounds you fully understand and the Looker or GA interface is too cumbersome or uncooperative to get you there.
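Separately from any processing differences, a per-page breakdown can sum to more than the total simply because a session that views several pages appears in each of those pages' rows, while the total counts it once. The sketch below illustrates that arithmetic with made-up data. The `page_views` list and both helper functions are hypothetical, and this is not a description of how GA4 or Looker computes its totals internally.

```python
from collections import defaultdict

# Hypothetical hit-level data: one (session_id, page_path) pair per page view.
page_views = [
    ("s1", "/home"),
    ("s1", "/pricing"),
    ("s1", "/home"),      # repeat view in the same session
    ("s2", "/home"),
    ("s3", "/pricing"),
    ("s3", "/docs"),
]


def sessions_by_page(views):
    """Count distinct sessions per page path (one table row per path)."""
    sessions = defaultdict(set)
    for session_id, page_path in views:
        sessions[page_path].add(session_id)
    return {path: len(ids) for path, ids in sessions.items()}


def total_sessions(views):
    """Count distinct sessions across all page paths (the table total)."""
    return len({session_id for session_id, _ in views})


per_page = sessions_by_page(page_views)
row_sum = sum(per_page.values())   # what summing the exported rows gives
total = total_sessions(page_views) # what a deduplicated total gives

print(per_page)
print("sum of rows:", row_sum)
print("distinct sessions:", total)
```

Here the rows sum to 5 while there are only 3 distinct sessions, because `s1` and `s3` each appear under two page paths. If overlap of this kind is what drives the gap, the exported rows should not be summed; a total would have to be requested or computed without the page path dimension.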