I am getting different output while converting dataframe api to spark SQL. Point me where i missed the sql logic.
I have table called demo, have two columns value and marker. the value column is seq number start from 1 to 100. marker value is A [1 to 10] , B [11-20] and so on.
sample tables
+-----+------+
|value|marker|
+-----+------+
| 1| A|
| 2| A|
| 3| A|
| 4| A|
| 5| A|
| 6| A|
| 7| A|
| 8| A|
| 9| A|
| 10| A|
| 11| B|
| 12| B|
DataFrame API and output.
df_2.groupBy('marker').pivot('marker').agg(F.count('*')).fillna(0).show()
output:
+------+---+---+---+---+---+---+---+---+---+---+
|marker| A| B| C| D| E| F| G| H| I| J|
+------+---+---+---+---+---+---+---+---+---+---+
| B| 0| 10| 0| 0| 0| 0| 0| 0| 0| 0|
| A| 10| 0| 0| 0| 0| 0| 0| 0| 0| 0|
| C| 0| 0| 10| 0| 0| 0| 0| 0| 0| 0|
| D| 0| 0| 0| 10| 0| 0| 0| 0| 0| 0|
| E| 0| 0| 0| 0| 10| 0| 0| 0| 0| 0|
| F| 0| 0| 0| 0| 0| 10| 0| 0| 0| 0|
| G| 0| 0| 0| 0| 0| 0| 10| 0| 0| 0|
| H| 0| 0| 0| 0| 0| 0| 0| 10| 0| 0|
| I| 0| 0| 0| 0| 0| 0| 0| 0| 10| 0|
| J| 0| 0| 0| 0| 0| 0| 0| 0| 0| 9|
+------+---+---+---+---+---+---+---+---+---+---+
I tried to convert API to SQL based using PIVOT like below.
spark.sql("""
select
*
from
(
select marker from demo
)
PIVOT
(
count(marker)
for marker in ('A','B','C','D','E','F','G','H','I','J')
)
""").show()
and output is
+---+---+---+---+---+---+---+---+---+---+
| A| B| C| D| E| F| G| H| I| J|
+---+---+---+---+---+---+---+---+---+---+
| 10| 10| 10| 10| 10| 10| 10| 10| 10| 9|
+---+---+---+---+---+---+---+---+---+---+
How can i get Dataframe API output using Spark SQL.
Note - We can get expected output using sum with case when clause for individual marker, but i am expecting to use pivot.
you are almost there. you'd need to create a column with the same values but a new name.
here's an example
select * from (
select col, col as col1 from data_tbl
)
pivot (
count(col) for col in ('A', 'B', 'C', 'D')
)
order by col1
this'd result in
+----+----+----+----+----+
|col1| A| B| C| D|
+----+----+----+----+----+
| A| 10|null|null|null|
| B|null| 10|null|null|
| C|null|null| 10|null|
| D|null|null|null| 10|
+----+----+----+----+----+