json, cloud, google-cloud-dataproc, google-data-api

JSON library for Cloud Dataproc


I need to find a JSON library for Google Cloud Dataproc, but I'm not sure where to find a list of supported JSON libraries. Alternatively, if I write my own, which dependencies can I bring into Dataproc?

Any information on this topic would be highly appreciated.

Best Regards, Oleg


Solution

  • If you mean reading and parsing JSON objects, then you can use the Gson library, which is part of the Hadoop distribution on Dataproc.

    You can also use any JSON library of your choice, along with any other dependencies, but you should build an uber jar for your job and include all of these libraries and dependencies in it.

    If you mean the Google JSON API Client libraries, then Dataproc deploys version 1.20.0 by default as part of the GCS and BigQuery connectors. You can still use a newer version of the JSON API Client library if you relocate it inside your job's uber jar to avoid conflicts with the version deployed on Dataproc.

    See the more detailed answer on managing conflicting dependencies in Dataproc here.
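To illustrate the first option, here is a minimal sketch of parsing a JSON string with Gson from a Scala job. The object name `GsonSketch`, the helper `readName`, and the sample record are hypothetical and only for illustration. The sketch assumes Gson is on the job's classpath and sticks to `Gson.fromJson`, which has been part of the library for a long time, rather than newer convenience methods that an older bundled version might lack.

```scala
import com.google.gson.{Gson, JsonObject}

// Hypothetical helper object; names and the sample record are illustrative only.
object GsonSketch {
  private val gson = new Gson()

  // Parse a JSON string into Gson's generic tree type and read one string field.
  // Assumes the "name" field is present; a missing field would yield null here.
  def readName(json: String): String = {
    val obj = gson.fromJson(json, classOf[JsonObject])
    obj.get("name").getAsString
  }

  def main(args: Array[String]): Unit = {
    val record = """{"name": "example", "count": 3}"""
    println(readName(record))
  }
}
```

If the job relies on the Gson copy that ships with the cluster, it may be safest to mark the dependency as provided in the build so that the uber jar does not carry a second, possibly conflicting, copy.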
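The relocation mentioned above could look roughly like the following `build.sbt` fragment. It is a sketch that assumes the job is built with sbt and the sbt-assembly plugin (sbt 1.x slash syntax); the `repackaged` prefix is an arbitrary choice, and the package pattern should be adjusted to whichever library actually conflicts.

```scala
// build.sbt fragment; assumes sbt-assembly is enabled in project/plugins.sbt.
assembly / assemblyShadeRules := Seq(
  // Rename the Google API client packages inside the uber jar so that the job
  // uses its own bundled copy instead of the version deployed on the cluster.
  // The "repackaged" prefix is an arbitrary, illustrative choice.
  ShadeRule.rename("com.google.api.client.**" -> "repackaged.com.google.api.client.@1").inAll
)
```

Running `sbt assembly` with a rule like this produces an uber jar in which references to the renamed packages are rewritten, so the newer library version can coexist with the one on the Dataproc classpath. Maven users would typically get the same effect with the relocation feature of the Maven Shade plugin.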