Tags: apache-spark, pyspark, alias

Get name / alias of column in PySpark


I am defining a column object like this:

from pyspark.sql import functions as F

column = F.col('foo').alias('bar')

I know I can get the full expression using str(column), but how can I get the column's alias only?

In the example, I'm looking for a function get_column_name where get_column_name(column) returns the string bar.


Solution

  • One way is through regular expressions:

    from pyspark.sql.functions import col

    column = col('foo').alias('bar')
    print(column)
    # Column<foo AS `bar`>

    import re
    print(re.findall(r"(?<=AS `)\w+(?=`>$)", str(column))[0])
    # 'bar'
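
    Wrapped into the get_column_name helper the question asks for (a minimal sketch, assuming the Column<foo AS `bar`> repr shown above; newer Spark releases format the repr differently, e.g. Column<'foo AS bar'>, so the pattern would need adjusting there):

    import re
    from pyspark.sql import functions as F

    def get_column_name(column):
        # Extract the alias from the column's string representation,
        # e.g. Column<foo AS `bar`> -> 'bar'.
        # Returns None if no alias is found in that format.
        match = re.search(r"(?<=AS `)\w+(?=`>$)", str(column))
        return match.group(0) if match else None

    column = F.col('foo').alias('bar')
    print(get_column_name(column))
    # 'bar'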