I am using Spark with Java (not Scala or Python).
I have to change my code so that my Spark query selects all columns rather than a specific set of columns (like using select *). Before, when I had a specific set of columns, it was easy for me to know the exact position/index of each column, because they were in the order of my select. However, since I am now selecting all columns, I no longer know the exact order.
I need the position/index of particular columns so that I can use .isNullAt(), which requires a position/index rather than the string column name.
I am wondering: does dataframe.columns()
give me an array whose indexes/positions are exactly the ones I can use with the DataFrame methods that require an index/position? Then I could search that array with my string column name to get back the correct index.
From your question I'm guessing you're trying to get the index of a field in a row so you can check nullity.
Indeed, you could use ds.columns()
, since it returns the columns in order, and then use the index you find there.
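For example, as a minimal sketch (assuming a Dataset<Row> named df and a hypothetical column "my_column_name"), you could resolve the index once from the columns array and reuse it:

import java.util.Arrays;

// df.columns() returns the column names in positional order,
// so indexOf gives the index usable with Row.isNullAt()
int idx = Arrays.asList(df.columns()).indexOf("my_column_name");
boolean isNull = df.first().isNullAt(idx);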
Nevertheless, I would advise using another method, as it keeps the logic inside the row processing and is more robust: you can use .fieldIndex(String fieldName)
row.isNullAt(row.fieldIndex("my_column_name"))
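As a minimal sketch (again assuming a Dataset<Row> named df and the hypothetical column "my_column_name"), inside row processing this could look like:

import org.apache.spark.api.java.function.ForeachFunction;
import org.apache.spark.sql.Row;

df.foreach((ForeachFunction<Row>) row -> {
    // fieldIndex resolves the position from the row's own schema,
    // so it stays correct regardless of the select order
    if (row.isNullAt(row.fieldIndex("my_column_name"))) {
        // handle the null value here
    }
});

This way you don't need to compute the index outside the row processing; each row resolves it from its own schema.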