pyspark, apache-kafka, google-colaboratory, spark-structured-streaming

Not able to read streaming data from Kafka using PySpark in Google Colab


I am running PySpark on Google Colab. I have set up Kafka and added a CSV file to a topic. If I don't use Structured Streaming to read from Kafka, I am able to read the data and print it.

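For reference, the batch read is roughly along these lines (the broker address and topic name below are placeholders for my actual setup):

    df = (spark.read
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
          .option("subscribe", "my_topic")                      # placeholder topic
          .option("startingOffsets", "earliest")
          .load())

    # Kafka keys/values arrive as bytes, so cast them to strings before printing
    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").show(truncate=False)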

However, when I try to read the same data using Spark Structured Streaming, the loop just keeps running without anything getting printed to the output.

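The streaming version I am trying is roughly this (same placeholder broker and topic):

    streaming_df = (spark.readStream
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
                    .option("subscribe", "my_topic")                      # placeholder topic
                    .option("startingOffsets", "earliest")
                    .load())

    # console sink: the cell keeps running but nothing shows up in Colab
    query = streaming_df.writeStream.format("console").start()
    query.awaitTermination()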

How do I print the data in this case? Any help will be much appreciated. Thanks!


Solution

  • Printing to the console doesn't work well in notebook environments like Colab or Databricks. What you can do instead is use the memory sink:

    # write the stream to an in-memory table named "streaming_df"
    query = streaming_df.writeStream.format("memory").queryName("streaming_df").start()
    

    Then, you can query your in-memory output using:

    spark.sql("SELECT * FROM streaming_df").show()
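
    The memory table fills up as micro-batches arrive, so you may need to give the stream a moment (or re-run the SELECT) before any rows appear, and stop the query when you are done. A minimal sketch, reusing the query object from above:

    import time

    time.sleep(10)  # wait for the first micro-batch to be processed
    spark.sql("SELECT * FROM streaming_df").show(truncate=False)

    query.stop()  # stop the streaming query once you have what you need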