[SOLVED] In PySpark, how do I get word frequency in a column when a row can contain multiple words?

Assume a two column PySpark DataFrame with 3 rows:

["Number"]     [ "Keywords"}

1              Mary had a little lamb

2              A little lamb is white

3              Mary is little

Desired output:

little 3

Mary   2

lamb   2

is     2

a      2

had    1

white  1

Tried "explode" and "split", but could not get the syntax right.

You can try below code -

from pyspark.sql import functions as F
from pyspark.sql.functions import explode, split


df = df.withColumn("Keyword", explode(split(F.col("Keywords"), " ")))

keyword_counts = df.withColumn("Keyword", F.lower(F.col("Keyword"))).groupBy("Keyword").count()

keyword_counts = keyword_counts.orderBy(F.col("count").desc())