Spark structured streaming - Kinesis stream


Does Spark support Structured Streaming with a Kinesis stream as the data source? The Databricks version appears to support it: https://docs.databricks.com/structured-streaming/kinesis-best-practices.html. But does Spark outside of Databricks support this feature?


Solution

  • Yes, outside Databricks you can use the following open-source connector: https://github.com/roncemer/spark-sql-kinesis

    Example:

    // Stream data from the "spark-streaming-example" stream
    // Note: if running on AWS EC2, you can omit the access and secret keys and rely on the IAM role attached to the EC2 instance instead
    
    val kinesis = spark
        .readStream
        .format("kinesis")
        .option("streamName", "spark-streaming-example")
        .option("endpointUrl", "https://kinesis.us-east-1.amazonaws.com")
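        .option("awsAccessKeyId", "[ACCESS_KEY]")   // replace the bracketed placeholder with your access key
        .option("awsSecretKey", "[SECRET_KEY]")     // replace the bracketed placeholder with your secret key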
        .option("startingposition", "TRIM_HORIZON")
        .load()
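
To check that records are actually arriving, the stream can be decoded and printed to the console. The following is only a minimal sketch, not taken from the connector's documentation: it assumes the connector JAR is already on the classpath, that credentials come from the IAM role attached to the EC2 instance (so the key options are omitted, as the note in the example allows), and that the connector exposes the record payload as a binary column named `data`. The application name and the `payload` alias are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell a session named `spark` already exists; getOrCreate() reuses it.
val spark = SparkSession.builder
  .appName("kinesis-console-sketch") // hypothetical application name
  .getOrCreate()

// Same source options as the example above, minus the explicit keys.
// Assumption: credentials are resolved from the instance's IAM role.
val kinesis = spark.readStream
  .format("kinesis")
  .option("streamName", "spark-streaming-example")
  .option("endpointUrl", "https://kinesis.us-east-1.amazonaws.com")
  .option("startingposition", "TRIM_HORIZON")
  .load()

// Assumption: the payload arrives in a binary column called `data`.
// Cast it to a string and print each micro-batch to the console.
val query = kinesis
  .selectExpr("CAST(data AS STRING) AS payload")
  .writeStream
  .format("console")
  .outputMode("append")
  .start()

// Block until the streaming query is stopped or fails.
query.awaitTermination()
```

The console sink is meant for debugging only; for a real job you would swap it for a durable sink and set a checkpoint location.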