
What is the difference between sort and orderBy functions in Spark


What is the difference between sort and orderBy on a Spark DataFrame?

scala> zips.printSchema
root
 |-- _id: string (nullable = true)
 |-- city: string (nullable = true)
 |-- loc: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- pop: long (nullable = true)
 |-- state: string (nullable = true)

The commands below produce the same result:

zips.sort(desc("pop")).show
zips.orderBy(desc("pop")).show

Solution

  • orderBy is just an alias for the sort function, so the two calls in the question are equivalent.

    From the Spark Scaladoc for Dataset.orderBy, shown with its definition:

      /**
       * Returns a new Dataset sorted by the given expressions.
       * This is an alias of the `sort` function.
       *
       * @group typedrel
       * @since 2.0.0
       */
      @scala.annotation.varargs
      def orderBy(sortCol: String, sortCols: String*): Dataset[T] = sort(sortCol, sortCols : _*)
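
To check the alias behaviour yourself, here is a minimal sketch that sorts the same DataFrame both ways and compares the collected rows. The object name `SortVsOrderBy`, the helper `sortedBothWays`, and the three sample rows are made up for illustration and are not the `zips` data from the question. It assumes a local Spark 2.x or later on the classpath.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.desc

object SortVsOrderBy {

  // Hypothetical helper: returns the rows produced by sort and by orderBy.
  def sortedBothWays(spark: SparkSession): (Seq[Row], Seq[Row]) = {
    import spark.implicits._

    // Made-up sample rows standing in for the zips DataFrame.
    val zips = Seq(
      ("Alpha", "AA", 300L),
      ("Beta", "BB", 100L),
      ("Gamma", "CC", 200L)
    ).toDF("city", "state", "pop")

    val viaSort = zips.sort(desc("pop")).collect().toSeq
    val viaOrderBy = zips.orderBy(desc("pop")).collect().toSeq
    (viaSort, viaOrderBy)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("sort-vs-orderBy")
      .getOrCreate()

    val (viaSort, viaOrderBy) = sortedBothWays(spark)

    // Both calls return the same rows in the same order.
    println(viaSort == viaOrderBy)
    // City names in descending order of pop.
    println(viaSort.map(_.getString(0)).mkString(", "))

    spark.stop()
  }
}
```

Calling `explain()` on both results should also show the same physical plan, since orderBy simply forwards its arguments to sort.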