python, apache-spark, pyspark, apache-spark-sql, window-functions

Spark SQL Row_number() PartitionBy Sort Desc


I've successfully created a row_number() with partitionBy() in Spark using Window, but I would like to sort it in descending order instead of the default ascending. Here is my working code:

from pyspark.sql import HiveContext
from pyspark.sql.types import *
from pyspark.sql import Row, functions as F
from pyspark.sql.window import Window

(
    data_cooccur
    .select(
        "driver",
        "also_item",
        "unit_count",
        F.rowNumber().over(
            Window
            .partitionBy("driver")
            .orderBy("unit_count")
        ).alias("rowNum")
    )
    .show()
)

That gives me this result:

+------+---------+----------+------+
|driver|also_item|unit_count|rowNum|
+------+---------+----------+------+
|   s10|      s11|         1|     1|
|   s10|      s13|         1|     2|
|   s10|      s17|         1|     3|
+------+---------+----------+------+

And here I add desc() to sort in descending order:

(
    data_cooccur
    .select(
        "driver",
        "also_item",
        "unit_count",
        F.rowNumber().over(
            Window
            .partitionBy("driver")
            .orderBy("unit_count")
            .desc()
        ).alias("rowNum")
    )
    .show()
)

And get this error:

> AttributeError: 'WindowSpec' object has no attribute 'desc'

What am I doing wrong here?


Solution

  • desc should be applied to a column, not to a window definition. (Note also that the function was renamed from rowNumber to row_number in Spark 1.6.) You can use either the desc method on a column:

    from pyspark.sql.functions import col, row_number
    from pyspark.sql.window import Window
    
    row_number().over(
        Window.partitionBy("driver").orderBy(col("unit_count").desc())
    )
    

    or a standalone function:

    from pyspark.sql.functions import desc, row_number
    from pyspark.sql.window import Window
    
    row_number().over(
        Window.partitionBy("driver").orderBy(desc("unit_count"))
    )