I need to get the schema of a SparkR DataFrame as a StructType, and also as a list of its fields, e.g.:
schema <- schema(output_count)
str(schema)
#List of 2
# $ jobj :Class 'jobj' <environment: 0x563114ff5900>
# $ fields:function ()
# - attr(*, "class")= chr "structType"
fields <- schema$fields()
fields
#[[1]]
#StructField(name = "word", type = "StringType", nullable = TRUE)
#[[2]]
#StructField(name = "count", type = "StringType", nullable = TRUE)
I found that the SparkR API exposes a method for this (https://spark.apache.org/docs/2.0.0/api/R/), but as a beginner in SparkR I am not sure how to use it.
My attempt:
schema <- schema(output_count)
str(schema)
#List of 2
# $ jobj :Class 'jobj' <environment: 0x563114ff5900>
# $ fields:function ()
# - attr(*, "class")= chr "structType"
I am trying to get it as a StructType.
If I understood correctly, the code below produces at least the type of output you described in the question.
df <- SparkR::createDataFrame(iris)
lapply(SparkR::dtypes(df), function(x) SparkR::structField(x[1], x[2]))
The output is:
[[1]]
StructField(name = "Sepal_Length", type = "DoubleType", nullable = TRUE)
[[2]]
StructField(name = "Sepal_Width", type = "DoubleType", nullable = TRUE)
[[3]]
StructField(name = "Petal_Length", type = "DoubleType", nullable = TRUE)
[[4]]
StructField(name = "Petal_Width", type = "DoubleType", nullable = TRUE)
[[5]]
StructField(name = "Species", type = "StringType", nullable = TRUE)
If you then pass that list of fields to SparkR::structType using do.call,
do.call(SparkR::structType, lapply(SparkR::dtypes(df), function(x) SparkR::structField(x[1], x[2])))
the output looks like this:
StructType
|-name = "Sepal_Length", type = "DoubleType", nullable = TRUE
|-name = "Sepal_Width", type = "DoubleType", nullable = TRUE
|-name = "Petal_Length", type = "DoubleType", nullable = TRUE
|-name = "Petal_Width", type = "DoubleType", nullable = TRUE
|-name = "Species", type = "StringType", nullable = TRUE
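If you also need the field details as ordinary R values, here is a minimal sketch that reads them back out of the schema. It assumes a working local Spark installation and relies on the accessor functions that SparkR attaches to each structField object (name, dataType.toString, nullable). The variable names s, fields and info are only illustrative.

```r
library(SparkR)
sparkR.session()  # assumption: Spark is installed and can start locally

df <- createDataFrame(iris)

s <- schema(df)        # a structType object
fields <- s$fields()   # a list of structField objects

# Each structField carries accessor functions; collect them into a data.frame
info <- do.call(rbind, lapply(fields, function(f) {
  data.frame(
    name = f$name(),
    type = f$dataType.toString(),
    nullable = f$nullable(),
    stringsAsFactors = FALSE
  )
}))

print(info)
```

This gives one row per column, which can be easier to inspect or filter than the printed StructType.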