apache-spark scalapb

Using ScalaPB with Spark Structured Streaming and Java-generated protobuf classes


The project I'm working on is in the planning/prototyping phase, and we would like to stream our data into Spark 3 using protobuf-encoded messages in Kafka and Structured Streaming. We've prototyped with Spark Streaming (rather than Structured Streaming), where we can specify the serde classes that Kafka uses, but Structured Streaming has no equivalent setting: the Kafka source delivers each message value as raw bytes.

From what I've read so far, the cleanest approach seems to be ScalaPB. Its documentation is quite straightforward (thanks!); however, I can't determine whether it would work with existing generated Java protobuf classes or whether I'd have to generate Scala versions from those same .proto files as well.

Using the example from the docs at https://scalapb.github.io/docs/sparksql, could the Person class be a Java class rather than a Scala one?


Solution

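  • SparkSQL-ScalaPB works only with ScalaPB-generated code, so you would need to generate Scala classes from the same .proto files; the Java-generated classes cannot be used with it directly.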
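
To make this concrete, here is a minimal sketch of how a ScalaPB-generated class might be used with Structured Streaming. It is not taken from the question or the ScalaPB docs verbatim: the package `com.example.protos.person`, the topic name `people`, and the broker address are assumptions for illustration, and `Person` is assumed to be the class ScalaPB generates from the same .proto file that already produces the Java class.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import scalapb.spark.Implicits._          // ScalaPB encoders, used in place of spark.implicits._
import com.example.protos.person.Person   // hypothetical ScalaPB-generated class

object PersonStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("person-stream").getOrCreate()

    // The Kafka source exposes each message payload as a binary "value" column.
    val raw: Dataset[Array[Byte]] = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker address
      .option("subscribe", "people")                       // assumed topic name
      .load()
      .select("value")
      .as[Array[Byte]]

    // Decode each payload with the parser on the ScalaPB companion object.
    val people: Dataset[Person] = raw.map(Person.parseFrom(_))

    // Print decoded messages to the console, for prototyping only.
    people.writeStream.format("console").start().awaitTermination()
  }
}
```

The existing Java classes can stay in the build for other consumers. ScalaPB's code generator can run on the same .proto files, and it also has a `javaConversions` option that generates converters between the Scala and Java representations if both are needed.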