google-cloud-platform google-cloud-dataflow google-cloud-iot gcp-ai-platform-training

How do I update IoT device config (in Cloud IoT Core), using Dataflow?


I am using Google Cloud Platform to collect IoT data. The data will then be analyzed, probably in AI Platform, and I want to send some of the results back to the IoT devices as a config setting. I have seen several flow charts (see below) showing data flowing from AI Platform via Dataflow to IoT Core as device config, but how do I actually do this? (So far I have only sent device config updates via Cloud Functions.)

[Image: IoT dataflow]

I am new to Dataflow and AI Platform, but have started to look at adding some Python code to an Apache Beam pipeline in Dataflow to update the device config. Does this seem like the way forward?


Solution

  • You can certainly do it that way, although I find Beam a bit difficult to work with, and putting everything in one place means a change to one part can disrupt the whole pipeline. For example, if you wanted to change how the IoT device receives or reacts to the incoming data, you'd have to update the entire Dataflow pipeline to make that change. That's poor isolation.
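For reference, the "do it inside the pipeline" option could look roughly like the sketch below. It assumes the `google-cloud-iot` Python client (`iot_v1.DeviceManagerClient`) and a hypothetical element shape of `{"device_id": ..., "config": {...}}`; the project, region, and registry values are placeholders. It has not been run on Dataflow, so treat it as a starting point only.

```python
import json


def build_config_request(element, project_id, region, registry_id):
    """Turn one pipeline element into a config-update request dict.

    `element` is assumed (hypothetically) to look like
    {"device_id": "sensor-1", "config": {"sample_rate_s": 30}}.
    """
    device_path = (
        f"projects/{project_id}/locations/{region}"
        f"/registries/{registry_id}/devices/{element['device_id']}"
    )
    return {
        "name": device_path,
        "binary_data": json.dumps(element["config"]).encode("utf-8"),
        # 0 skips the version check and updates the current config.
        "version_to_update": 0,
    }


def make_update_dofn(project_id, region, registry_id):
    """Build the DoFn lazily so this module imports without Beam installed."""
    import apache_beam as beam
    from google.cloud import iot_v1

    class UpdateDeviceConfig(beam.DoFn):
        def __init__(self, project_id, region, registry_id):
            self.target = (project_id, region, registry_id)

        def setup(self):
            # Create one client per DoFn instance instead of one per element.
            self.client = iot_v1.DeviceManagerClient()

        def process(self, element):
            request = build_config_request(element, *self.target)
            self.client.modify_cloud_to_device_config(request=request)
            yield element["device_id"]

    return UpdateDeviceConfig(project_id, region, registry_id)
```

In a pipeline this would be applied with something like `beam.ParDo(make_update_dofn("my-project", "us-central1", "my-registry"))` as the last step. Note how the device-facing logic now lives inside the Dataflow job, which is exactly the coupling described above.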

    It also depends on how often the incoming data changes the configuration on your devices. Is it once a day? A thousand times a day? At the higher end, calling the IoT Admin SDK directly from Dataflow is probably the best bet, since other solutions start to add a lot of cost. If it's only a handful of times a day or less, I'd recommend having Dataflow write to a separate Pub/Sub topic that a GCF (Google Cloud Function) listens to, and updating the device configs from the GCF. That isolates the two steps: if you change how the data is processed but the output stays the same, the device config and GCF components don't need to be touched. Conversely, if you only want to change how the devices handle the data and the Dataflow output doesn't change, you don't have to touch the Dataflow job.
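A minimal sketch of the Cloud Function side of that Pub/Sub pattern follows. It assumes a background (Pub/Sub-triggered) function, the `google-cloud-iot` client, and a hypothetical message schema of `{"device_id": ..., "config": {...}}` published by the Dataflow job. The default project, region, and registry names are placeholders, and the `client` parameter exists only so the logic can be exercised without the real SDK.

```python
import base64
import json


def parse_config_message(event):
    """Extract (device_id, config_bytes) from a Pub/Sub-triggered event.

    Pub/Sub delivers the message body base64-encoded in event["data"].
    The JSON schema used here is an assumption, not a fixed format.
    """
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    config_bytes = json.dumps(payload["config"]).encode("utf-8")
    return payload["device_id"], config_bytes


def update_device_config(event, context, client=None,
                         project_id="my-project", region="us-central1",
                         registry_id="my-registry"):
    """Entry point for a background Cloud Function subscribed to the topic."""
    device_id, config_bytes = parse_config_message(event)
    if client is None:
        # Imported lazily so the parsing logic can be tested without the SDK.
        from google.cloud import iot_v1
        client = iot_v1.DeviceManagerClient()
    name = (
        f"projects/{project_id}/locations/{region}"
        f"/registries/{registry_id}/devices/{device_id}"
    )
    return client.modify_cloud_to_device_config(
        request={"name": name, "binary_data": config_bytes}
    )
```

On the Dataflow side, the pipeline would then only need to serialize each result to JSON bytes and end with a Pub/Sub sink such as `beam.io.WriteToPubSub(topic=...)`. Neither side needs to know how the other is implemented, as long as the message schema stays stable.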

    GCF does add some cost, so you may not want that route if you're changing config constantly, but if updates are relatively infrequent, you'd likely stay within the GCF free tier. The free tier for GCF is (currently):

    • 2 million invocations per month (includes both background and HTTP invocations)
    • 400,000 GB-seconds and 200,000 GHz-seconds of compute time
    • 5 GB network egress per month
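To get a feel for what those limits mean in config updates per day, here is a rough back-of-the-envelope calculation. The function size (256 MB with a 400 MHz CPU allocation) and the 200 ms run time per update are assumptions for illustration, and billing granularity and network egress are ignored.

```python
# Free-tier limits as quoted above.
FREE_INVOCATIONS = 2_000_000
FREE_GB_SECONDS = 400_000
FREE_GHZ_SECONDS = 200_000


def max_free_invocations(memory_gb, cpu_ghz, seconds_per_call):
    """Return how many calls per month fit inside all three limits."""
    by_memory = FREE_GB_SECONDS / (memory_gb * seconds_per_call)
    by_cpu = FREE_GHZ_SECONDS / (cpu_ghz * seconds_per_call)
    return int(min(FREE_INVOCATIONS, by_memory, by_cpu))


# Assumed function profile: 256 MB, 400 MHz, 0.2 s per config update.
monthly = max_free_invocations(memory_gb=0.25, cpu_ghz=0.4, seconds_per_call=0.2)
per_day = monthly // 30

print(f"{monthly:,} updates/month, about {per_day:,} per day")
```

Under those assumptions the invocation count is the limit that binds first, which works out to tens of thousands of free config updates per day. A larger or slower function would hit the compute limits earlier.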