I have a homework assignment like this:
I have a problem reading different files from one topic: two folders contain JSON files, but one holds song data and the other holds log data. How can I get each kind of data back out of just one topic?
It's unclear why you cannot just use two topics, one for each file, especially if they don't have matching schemas, which will be important for SparkSQL.
How can I get their own data from just 1 topics ?
It begins at step 2.
Write the data to your single topic in a format like so (content used for example purposes only)
{"type": "song", "content": "..."}
or
{"type": "log", "content": "..."}
Then, in SparkSQL, you can do something like this
df = spark.readStream.format("kafka")...  # TODO: apply a schema to the data to get a "type" column
song_data = df.where(df["type"] == "song").select("content")
log_data = df.where(df["type"] == "log").select("content")
You could also do the same filtering in Python-Kafka and not need dataframes or a Spark environment.
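As a rough sketch of that Spark-free route, the function below groups message values by their "type" field using only the standard library. The name `split_by_type` and the sample byte strings are hypothetical; in practice the values would come from your consumer loop in whichever Kafka client you use.

```python
import json


def split_by_type(raw_values):
    """Group the 'content' of each JSON message under its 'type' field."""
    grouped = {"song": [], "log": []}
    for raw in raw_values:
        record = json.loads(raw)
        kind = record.get("type")
        # Messages with a missing or unknown type are skipped in this sketch.
        if kind in grouped:
            grouped[kind].append(record["content"])
    return grouped


# Stand-in for message values read from the single topic.
sample_values = [
    b'{"type": "song", "content": "song A"}',
    b'{"type": "log", "content": "log 1"}',
    b'{"type": "song", "content": "song B"}',
]
grouped = split_by_type(sample_values)
```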