TL;DR: How do you get a program on an AWS server to continuously listen for data packets?
I want to build an iPhone app that collects sensor data and sends that data to a server. When the server has enough sensor data, it constructs a classifier from the data and sends that classifier to all of the contributing iPhone apps. I'm trying to host the server on AWS.
I've spent hours and hours reading about data streams, TCP, Amazon EC2, Amazon EMR, Apache Spark, Spark Streaming, Amazon S3, RESTful interfaces, cron jobs, Amazon VPC, etc., but I just can't put the pieces together. I just don't understand how the iPhone and the AWS server communicate. Let me walk you through how I think the app should work. Please correct any errors in my thought process and let me know how I should actually go about doing these things.
1) The iPhone app collects some sensor data.
2) The iPhone app sends the data to the AWS server using HTTP or TCP. How do I do this? Do I need to supply the IP address of my server?
3) The server picks up the sensor data from the iPhone. This is where I'm really confused. How does this happen? Can I have a Python program hosted on AWS running in an infinite loop checking for data packets? Do I need to run a CRON job on AWS? Do I need to download a web server on an EC2 node? Can I use a third party streaming tool like Spark Streaming or Amazon Kinesis? Basically, how do I get a server-side program to continuously listen for data packets?
4) The server constructs the classifier when it has enough data.
5) The server sends the classifier to the iPhone app using HTTP or TCP.
I feel like I'm missing something incredibly basic. My main problem is that I don't understand how a program on a server (specifically an AWS EC2 node or an AWS EMR cluster) is supposed to listen for data packets.
There are any number of ways you could accomplish this. You could run web servers behind an Elastic Load Balancer and have the iPhone app POST to the load balancer. Or you could write some other type of service that runs on EC2 servers and listens on a TCP port, and still put it behind an Elastic Load Balancer.
Personally, I would set up an API Gateway endpoint that takes all the data posted to it and adds it to a Kinesis stream. You can read about doing that here. Then you could have a service running on EC2 instances, or a Lambda function, that processes the streaming data.
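To make the Lambda option a bit more concrete, here is a rough sketch in Python of a function that Lambda would invoke with a batch of Kinesis records. Kinesis hands each record's payload to Lambda base64-encoded under `record["kinesis"]["data"]`. The function name `handler` and the assumption that the app posts JSON are mine, not something your setup requires.

```python
import base64
import json


def handler(event, context):
    """Entry point that Lambda calls with a batch of Kinesis records."""
    readings = []
    for record in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded inside the event.
        payload = base64.b64decode(record["kinesis"]["data"])
        # Assumption: the iPhone app sent each reading as a JSON document.
        readings.append(json.loads(payload))
    # A real function would store the readings somewhere durable here
    # (for example S3 or a database) so the classifier can be trained later.
    return {"received": len(readings), "readings": readings}
```

Notice that there is no loop waiting for data in this code. Lambda polls the stream and calls the function whenever new records show up.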
Your general question about a server listening for data packets is just basic server-side programming. You have a service running on the server that is bound to a certain TCP port. Then the service runs the code you configure it to run when it receives data on that port.
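As a rough illustration of "bound to a port, runs your code when data arrives", here is a minimal sketch using Python's standard `socketserver` module. The port number 9000, the newline-delimited message format, and the names `SensorHandler`, `make_server`, and `received_lines` are all made up for the example.

```python
import socketserver
import threading

# In-memory store, for illustration only; a real service would persist this.
received_lines = []


class SensorHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Called once per client connection. Read newline-delimited
        # messages until the client closes the connection.
        for line in self.rfile:
            received_lines.append(line.strip().decode("utf-8"))
            self.wfile.write(b"ok\n")  # acknowledge each message


def make_server(host="0.0.0.0", port=9000):
    # Binds the socket immediately; each connection gets its own thread.
    return socketserver.ThreadingTCPServer((host, port), SensorHandler)
```

Calling `make_server().serve_forever()` blocks and waits for connections, which is the "infinite loop" you were asking about. The library and the operating system do the waiting, so you don't write a busy loop or a cron job yourself. On EC2 the instance's security group would also have to allow inbound traffic on that port.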
If you want to consume a Kinesis stream, you would write code using the Kinesis Client Library. Or you could write a REST API that runs on one or more web servers. Or you could write code that binds to a specific port on a server and listens for TCP connections, but I wouldn't recommend working at that low a level. You could also have API Gateway send the data directly to a Lambda function if you want.
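For the REST API route, a bare-bones sketch with only the Python standard library might look like the following. In practice you would more likely use a web framework behind a production web server; the `/readings` path, port 8080, and the names `ReadingsHandler`, `make_api`, and `stored_readings` are assumptions for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# In-memory store, for illustration only.
stored_readings = []


class ReadingsHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/readings":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            reading = json.loads(self.rfile.read(length))
        except json.JSONDecodeError:
            self.send_error(400, "body must be JSON")
            return
        stored_readings.append(reading)
        body = json.dumps({"stored": len(stored_readings)}).encode()
        self.send_response(201)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet; remove to get request logs


def make_api(host="0.0.0.0", port=8080):
    return ThreadingHTTPServer((host, port), ReadingsHandler)
```

The iPhone app would then send an HTTP POST with a JSON body to `/readings` on your server's hostname or load balancer address, and `make_api().serve_forever()` keeps the server listening.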