
Format float to currency using PySpark and Babel


I'd like to format a float column as a currency string using Babel and PySpark.

Sample data:

amount       currency
2129.9       RON
1700         EUR
1268         GBP
741.2        USD
142.08091153 EUR
4.7E7        USD
0            GBP

I tried:

df = df.withColumn(F.col('amount'), format_currency(F.col('amount'), F.col('currency'),locale='be_BE'))

or

df = df.withColumn(F.col('amount'), format_currency(F.col('amount'), 'EUR',locale='be_BE'))

Both give me an error.


Solution

  • Babel's format_currency is a plain Python function that expects Python values, not Spark Column objects. To apply it to a Spark DataFrame column, wrap it in a UDF:

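    from babel.numbers import format_currency
    import pyspark.sql.functions as F
    
    # F.udf returns strings by default; with no locale argument,
    # format_currency falls back to Babel's default locale
    format_currency_udf = F.udf(lambda a, c: format_currency(a, c))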
    
    df2 = df.withColumn(
        'amount',
        format_currency_udf('amount', 'currency')
    )
    
    df2.show()
    +----------------+--------+
    |          amount|currency|
    +----------------+--------+
    |     RON2,129.90|     RON|
    |       €1,700.00|     EUR|
    |       £1,268.00|     GBP|
    |       US$741.20|     USD|
    |         €142.08|     EUR|
    |US$47,000,000.00|     USD|
    +----------------+--------+
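
The output above depends on the default locale of the machine running the UDF. If you want the same formatting everywhere, one option is to pass the locale explicitly and skip nulls. Below is a minimal sketch of that idea, not a definitive implementation: the helper names `format_amount` and `with_formatted_amount` and the `en_US` default are assumptions made for illustration, and the column names `amount` and `currency` are taken from the question.

```python
from babel.numbers import format_currency


def format_amount(amount, currency, locale="en_US"):
    """Format a single value with Babel; return None if either input is missing."""
    if amount is None or currency is None:
        return None
    return format_currency(amount, currency, locale=locale)


def with_formatted_amount(df, locale="en_US"):
    """Return df with 'amount' replaced by its formatted string.

    pyspark is imported here so that format_amount can be used
    (and tested) on its own without a Spark installation.
    """
    import pyspark.sql.functions as F
    from pyspark.sql.types import StringType

    # The lambda closes over `locale`, so every row uses the same locale.
    fmt_udf = F.udf(lambda a, c: format_amount(a, c, locale), StringType())
    return df.withColumn("amount", fmt_udf("amount", "currency"))
```

With the DataFrame from the question, this could be called as `with_formatted_amount(df, locale="en_US").show()`. Keeping the Babel call in a plain function also lets you check the formatting for a given locale in a Python shell before running it on the cluster.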