For context: I am setting up a PubSub Emitter for Snowplow. (For other readers: PubSub is a simple message queue on Google Cloud Platform. Each message is passed in as an array, like this:)
['data' => 'Name', 'attributes' => 'key pair values of whatever data you are sending']
The above only matters because it means I must write a custom Emitter class: Google Cloud PubSub uses a different connector than the usual HTTP requests, sockets, and other transports that Snowplow provides.
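For anyone building a similar emitter, the conversion step might look roughly like the sketch below. The function name `buildPubSubMessage`, the attribute keys, and the placeholder schema are all hypothetical. The only assumption is that a PubSub message carries a string `data` field and a map of string `attributes`.

```php
<?php
// Hypothetical helper: not part of the Snowplow tracker or the PubSub client.
// It turns a batch envelope into the message array shape described above.
function buildPubSubMessage(array $envelope, array $attributes = array()) {
    // PubSub message data is an opaque string, so serialise the batch as JSON.
    $data = json_encode($envelope);

    // Attribute values must be strings as well; array_map keeps the keys.
    $stringAttributes = array_map('strval', $attributes);

    return array('data' => $data, 'attributes' => $stringAttributes);
}

// Example call with placeholder values.
$message = buildPubSubMessage(
    array('schema' => 'iglu:com.example/placeholder/jsonschema/1-0-0', 'data' => array()),
    array('source' => 'php-tracker', 'events' => 0)
);
```

The resulting `$message` array is what a custom emitter would hand to the PubSub client's publish call.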
Actual problem:
I want to set a specific schema for each event I send. How do you associate a schema with each payload?
The PHP Tracker's SyncEmitter (the most standard Emitter that Snowplow provides) doesn't allow any custom setting for the schema, as shown below:
    private function getPostRequest($buffer) {
        $data = array("schema" => self::POST_REQ_SCEHMA, "data" => $buffer);
        return $data;
    }
It is hardcoded into every event tracked.
So I investigated and read up on the Snowplow trackers a bit more. I am still baffled. I know I can extend the Payload class and pass in my own schema as a variable, but why doesn't it work this way already? I am asking because I assume the open-source developers did it right and I am the one misunderstanding it.
I figured it out.
The Tracker class contains trackUnstructEvent:
    /**
     * Tracks an unstructured event with the aforementioned metrics
     *
     * @param array $event_json - The properties of the event. Has two fields:
     *                           - A "data" field containing the event properties and
     *                           - A "schema" field identifying the schema against which the data is validated
     * @param array|null $context - Event Context
     * @param int|null $tstamp - Event Timestamp
     */
    public function trackUnstructEvent($event_json, $context = NULL, $tstamp = NULL) {
        $envelope = array("schema" => self::UNSTRUCT_EVENT_SCHEMA, "data" => $event_json);
        $ep = new Payload($tstamp);
        $ep->add("e", "ue");
        $ep->addJson($envelope, $this->encode_base64, "ue_px", "ue_pr");
        $this->track($ep, $context);
    }
This method accepts the schema as input, in the "schema" field of $event_json. Snowplow wants you to use the Tracker's built-in tracking functions, and provides the above as the solution to my issue.
But the result still has another schema wrapped around the data (and that data in turn contains the input schema)... So my own answer raises more questions.
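As far as I can tell, the schemas sit at different levels of nesting. The sketch below reproduces the layering with plain arrays instead of the tracker classes. The event schema `iglu:com.example/my_event/jsonschema/1-0-0` and its data are made up, and the two Snowplow URIs are my assumption of what UNSTRUCT_EVENT_SCHEMA and POST_REQ_SCEHMA resolve to (the version numbers in particular may differ), so check the tracker's constants for the real values.

```php
<?php
// Assumed values of the tracker constants; verify against your tracker version.
$unstructEventSchema = 'iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0';
$postRequestSchema   = 'iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-0';

// 1. What you pass to trackUnstructEvent: your own schema plus your own data.
$event_json = array(
    'schema' => 'iglu:com.example/my_event/jsonschema/1-0-0', // hypothetical schema
    'data'   => array('name' => 'signup', 'plan' => 'free'),  // hypothetical data
);

// 2. What trackUnstructEvent builds around it: the unstructured-event envelope.
$envelope = array('schema' => $unstructEventSchema, 'data' => $event_json);

// 3. The envelope becomes one field of a flat event payload
//    ("ue_pr" is the key used when base64 encoding is off).
$payload = array('e' => 'ue', 'ue_pr' => json_encode($envelope));

// 4. The emitter wraps the whole buffer of payloads in the batch envelope,
//    which is what getPostRequest does with its hardcoded schema.
$post = array('schema' => $postRequestSchema, 'data' => array($payload));

echo json_encode($post, JSON_PRETTY_PRINT), "\n";
```

Read this way, the hardcoded schema in getPostRequest describes the batch wrapper, the one in trackUnstructEvent describes the unstructured-event envelope, and the schema you pass in is the only one that describes your own data.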