Java 11 and Spark SQL 3.3.2 (the Scala 2.13 build, spark-sql_2.13:3.3.2) here. Please note: I'm using and interested in the Java API and would appreciate Java answers, but I can decipher Scala/Python-based answers and do the necessary Scala/Python-to-Java conversion if need be. Java would be appreciated, though!
I understand how to create a new Dataset<Row> with a specified schema:
// Start from an empty DataFrame so there is an empty RDD<Row> to build on
Dataset<Row> dataFrame = sparkSession.emptyDataFrame();
List<StructField> structFields = getSomehow();
// Assemble the StructFields into a StructType (the schema)
StructType schema = DataTypes.createStructType(structFields.toArray(StructField[]::new));
// Create an empty Dataset<Row> that carries the desired schema
Dataset<Row> ds = sparkSession.createDataFrame(dataFrame.rdd(), schema);
What I'm trying to understand is: how do I do the reverse? How do I turn a Dataset<Row> back into a List<StructField> (its schema/columns)? I see the ds.schema() method, which returns a StructType, but I'm not sure how to deconstruct that back into a list of individual columns/StructFields. Any ideas?
You were close; the StructType you get from ds.schema() already carries the fields, you just need to pull them out as a list:
ds.schema().toList()
Note that toList() gives you a Scala immutable list, not a java.util.List:
scala.collection.immutable.List<StructField> schemaList = ds.schema().toList();
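If you'd rather stay entirely in Java collections, StructType also exposes fields(), which returns a plain StructField[] that you can wrap with Arrays.asList, with no Scala conversion needed. A minimal sketch (the two example field names are made up for illustration; building the StructType directly means this doesn't even need a SparkSession):

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class SchemaFields {
    public static void main(String[] args) {
        // Build a schema the same way the question does
        StructType schema = DataTypes.createStructType(new StructField[] {
            DataTypes.createStructField("id", DataTypes.LongType, false),
            DataTypes.createStructField("name", DataTypes.StringType, true)
        });

        // StructType.fields() returns StructField[], so Arrays.asList
        // yields a java.util.List<StructField> directly
        List<StructField> fields = Arrays.asList(schema.fields());

        for (StructField f : fields) {
            System.out.println(f.name() + ": " + f.dataType());
        }
    }
}
```

For your case you'd call ds.schema().fields() instead of building the StructType by hand; the result is the same java.util.List<StructField> you started from.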