Tags: dataframe, pyspark, snowflake-cloud-data-platform

Snowpark DataFrame: Why so many synonyms for the same class methods?


I suspect it must be for some backward compatibility, and I am simply trying to find out the reason behind it. The Snowpark DataFrame API was inspired by the Apache Spark DataFrame API.

But why do so many similar DataFrame class methods, with the same signatures and functionality, appear under two different names?

Just a few examples (but there are many more):

Also, is there a preferred notation today, or a best practice to follow for these calls?


Solution

  • Snowpark/dataframe.py

     # Add aliases for user code migration
    createOrReplaceTempView = create_or_replace_temp_view
    createOrReplaceView = create_or_replace_view
    crossJoin = cross_join
    dropDuplicates = drop_duplicates
    groupBy = group_by
    minus = subtract = except_
    toDF = to_df
    toPandas = to_pandas
    unionAll = union_all
    unionAllByName = union_all_by_name
    unionByName = union_by_name
    withColumn = with_column
    withColumnRenamed = with_column_renamed
    toLocalIterator = to_local_iterator
    randomSplit = random_split
    order_by = sort
    orderBy = order_by
    printSchema = print_schema
    

    And associated issue: Add pythonic snake_case APIs for camelCase APIs #196

    1. Every function and variable will be in snake_case.

    ...

    1. While supporting a smooth code migration from PySpark to Snowpark, snake_case is recommended for users' new code. So all sample code will use snake_case APIs.

    Also PEP 8: Function and Variable Names:

    Function and Variable Names

    Function names should be lowercase, with words separated by underscores as necessary to improve readability.

    Variable names follow the same convention as function names.

    mixedCase is allowed only in contexts where that’s already the prevailing style (e.g. threading.py), to retain backwards compatibility.
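    The aliasing shown in dataframe.py works through plain attribute assignment in the class body: binding an existing method to a second name makes both names refer to the exact same function object, with zero call overhead. A minimal sketch with a toy class (the names here are illustrative, not Snowpark's actual implementation):

    ```python
    class DataFrame:
        """Toy class demonstrating the camelCase-alias pattern."""

        def group_by(self, *cols):
            # Stand-in for the real grouping logic.
            return f"grouped by {cols}"

        # Alias created at class-definition time, exactly like the
        # `groupBy = group_by` lines in Snowpark's dataframe.py.
        groupBy = group_by


    df = DataFrame()

    # Both names resolve to the same underlying function object...
    assert DataFrame.groupBy is DataFrame.group_by

    # ...so calls through either name behave identically.
    assert df.group_by("a", "b") == df.groupBy("a", "b")
    ```

    This is why the two spellings are guaranteed to stay in sync: there is only one implementation, and the camelCase name is just a second reference to it kept for PySpark migration.
    
    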