google-analytics real-time adobe-analytics event-tracking clickstream

How do I fork clickstream data from Adobe or Google Analytics tracking to another destination?


Context: I wish to collect real-time, hit-level clickstream data from a website, which will ideally be pushed into an AWS Kinesis data stream (or elsewhere). This is to build machine learning software for a client that already has Adobe Analytics and Google Analytics implemented on their website.

Question: Instead of building our own tracking code that collects the clickstream data and pushes it to our own AWS Kinesis data stream or some other storage under our control, the goal is to piggyback on the tracking code already implemented for Adobe and Google Analytics, so that a duplicate of the tracked data is sent directly into an AWS Kinesis stream. I understand that there are ways to export data at a certain granularity from Google and Adobe Analytics (once it has already arrived in these platforms), but these export options don't really satisfy the requirement of raw, unprocessed, hit-level, real-time clickstream data.

Is it possible to modify the tracking code so that a duplicate of the tracked data is sent to a custom route, ideally AWS Kinesis? As I understand it, the analytics tracking code is essentially JavaScript code (a tag) embedded in the website that loads a script from a URL, and that script does the event tracking and the uploading. If I could duplicate the data at this point in the already implemented analytics setup, then I could get the real-time raw data that I need.

I haven't figured out a way to create and redirect a duplicate of the tracked data by modifying this tag. I doubt the loaded script is customisable, given that it comes from a URL generated automatically by Adobe or Google.

Any detailed answers or even links to information would be helpful. Thanks.


Solution

  • It is technically possible. GA has callback parameters you can assign a function to: hitCallback, event_callback, or eventCallback, depending on which version of GA you are using (ga.js/analytics.js, gtag.js, the GTM dataLayer, etc.).

    Meanwhile, AA has a similar method, registerPostTrackCallback, that you can register a callback function with.
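As a rough illustration of the AA route, here is a minimal sketch that registers a post-track callback, parses the request URL it receives, and forwards a copy elsewhere. It assumes the AppMeasurement tracker object is the usual global `s`; the endpoint `COLLECTOR_URL` and the helpers `parseHitUrl`, `buildRecord`, `forwardHit`, and `defaultSend` are hypothetical names made up for this example. The endpoint stands in for something you control (for instance an HTTP API that writes into Kinesis). Browsers should not hold AWS credentials, so the sketch does not call Kinesis directly.

```javascript
// Hypothetical endpoint under your control (e.g. an API that writes to Kinesis).
const COLLECTOR_URL = "https://collector.example.com/hits";

// Turn an analytics request URL into a plain object of its query parameters.
// If a key repeats, the last value wins.
function parseHitUrl(requestUrl) {
  const url = new URL(requestUrl);
  const params = {};
  for (const [key, value] of url.searchParams) {
    params[key] = value;
  }
  return { host: url.host, path: url.pathname, params };
}

// Wrap the parsed hit in the record that gets forwarded.
function buildRecord(requestUrl, now = Date.now()) {
  return { source: "adobe-analytics", capturedAt: now, hit: parseHitUrl(requestUrl) };
}

// Default transport: fire-and-forget beacon, only where the browser supports it.
function defaultSend(body) {
  if (typeof navigator !== "undefined" && typeof navigator.sendBeacon === "function") {
    navigator.sendBeacon(COLLECTOR_URL, body);
  }
}

// Forward one hit. Errors are swallowed so a failure here never breaks tracking.
function forwardHit(requestUrl, send = defaultSend) {
  try {
    send(JSON.stringify(buildRecord(requestUrl)));
  } catch (err) {
    // Intentionally ignored: the duplicate stream is best-effort.
  }
}

// Register with AppMeasurement only if the tracker object is actually present.
if (typeof s !== "undefined" && typeof s.registerPostTrackCallback === "function") {
  s.registerPostTrackCallback(function (requestUrl) {
    forwardHit(requestUrl);
  });
}
```

Passing `send` as a parameter keeps the transport swappable (beacon, `fetch`, a batching queue) and makes the parsing easy to exercise outside a browser.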

    But a few things:

    1. GA in particular will be a bit tricky to work with. It doesn't really pass any info about the request to the callback, except for the account ID itself. So getting the payload that was sent in the request will involve taking that account ID and looking through the GA object for whatever version you are using. Meanwhile, AA does pass the full request URL to the callback, so that's a lot easier.

    2. Not sure what your overall goal/context is, but to clarify, piggybacking off of the tools will get you the raw data sent to the collection servers. So if you were looking to get aggregated data solely from piggybacking (e.g. how many page views a certain page has, etc.) you won't really get that sort of thing, unless you are doing the aggregation on your end. If you are looking to get that sort of thing, you should instead look into exporting data from the tools themselves. Google and Adobe both have API endpoints for requesting/receiving datasets within timeframes, etc.

    3. Point #2 aside, I would still recommend against piggybacking off of the tools directly. Point #1 in particular demonstrates why: as mentioned, it's technically possible, but it gets real messy real quick, and it will be an ongoing mess to maintain as the respective tools release new versions that may or may not break things.

    The better practice is to implement a generic data layer to broadcast events/data to, and then subscribe to that for GA, AA, your AWS Kinesis thing, etc. This way you don't need to worry about the other points above.
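A minimal sketch of that data layer idea, assuming a simple publish/subscribe object: the site pushes events to one place, and each destination subscribes independently. `createDataLayer`, `STREAM_ENDPOINT`, and the event shape (`name`, `data`, `timestamp`) are hypothetical choices for this example, not part of any vendor API. The GA and AA subscribers use the standard `gtag("event", ...)` and `s.tl(...)` calls, guarded so they do nothing when those libraries are not loaded.

```javascript
// Hypothetical endpoint under your control that feeds the Kinesis stream.
const STREAM_ENDPOINT = "https://collector.example.com/events";

// A tiny generic data layer: events are pushed once and fanned out to subscribers.
function createDataLayer() {
  const events = [];
  const subscribers = [];

  // Call a subscriber without letting its errors affect the others.
  function deliver(fn, event) {
    try {
      fn(event);
    } catch (err) {
      // One broken destination must not block the rest.
    }
  }

  return {
    // Record an event and broadcast it to every current subscriber.
    push(event) {
      const record = { ...event, timestamp: event.timestamp ?? Date.now() };
      events.push(record);
      for (const fn of [...subscribers]) deliver(fn, record);
      return record;
    },
    // Add a subscriber; earlier events are replayed so late-loading tags miss nothing.
    subscribe(fn) {
      for (const record of events) deliver(fn, record);
      subscribers.push(fn);
      return function unsubscribe() {
        const i = subscribers.indexOf(fn);
        if (i !== -1) subscribers.splice(i, 1);
      };
    },
  };
}

const dataLayer = createDataLayer();

// Google Analytics subscriber (only active when gtag.js is loaded).
dataLayer.subscribe((e) => {
  if (typeof gtag === "function") gtag("event", e.name, e.data);
});

// Adobe Analytics subscriber (only active when AppMeasurement's `s` exists).
dataLayer.subscribe((e) => {
  if (typeof s !== "undefined" && typeof s.tl === "function") s.tl(true, "o", e.name);
});

// Raw-stream subscriber: ships every event as JSON to your own endpoint.
dataLayer.subscribe((e) => {
  if (typeof navigator !== "undefined" && typeof navigator.sendBeacon === "function") {
    navigator.sendBeacon(STREAM_ENDPOINT, JSON.stringify(e));
  }
});

// The site only ever talks to the data layer.
dataLayer.push({ name: "page_view", data: { path: "/pricing" } });
```

With this shape, adding or removing a destination is one `subscribe` call, and none of the destinations depends on the internals of GA or AA.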