The Snowpark DataFrame API was inspired by the Apache Spark DataFrame API. But why do so many DataFrame class methods, with the same signatures and functionality, appear under two different names? I suspect it must be for backward compatibility, and I am simply trying to find out the reason behind it.
Just a few examples (there are many more):

```python
# Add aliases for user code migration
createOrReplaceTempView = create_or_replace_temp_view
createOrReplaceView = create_or_replace_view
crossJoin = cross_join
dropDuplicates = drop_duplicates
groupBy = group_by
minus = subtract = except_
toDF = to_df
toPandas = to_pandas
unionAll = union_all
unionAllByName = union_all_by_name
unionByName = union_by_name
withColumn = with_column
withColumnRenamed = with_column_renamed
toLocalIterator = to_local_iterator
randomSplit = random_split
order_by = sort
orderBy = order_by
printSchema = print_schema
```

Also, is there a preferred notation today? Is there a best practice to follow for these calls?
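To make the duplication concrete, here is a minimal sketch showing that both spellings are accepted today and behave identically; the `session` object, the `MY_TABLE` table and its `STATUS` column are assumptions made purely for illustration:

```python
from snowflake.snowpark import Session

# `session` is assumed to already exist, e.g. built earlier via
# Session.builder.configs(...).create()
def demo(session: Session):
    df = session.table("MY_TABLE")            # hypothetical table name

    snake = df.group_by("STATUS").count()     # snake_case spelling
    camel = df.groupBy("STATUS").count()      # camelCase alias, same method underneath

    # Both calls produce the same result
    print(snake.collect())
    print(camel.collect())
```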
And the associated issue: Add pythonic snake_case APIs for camelCase APIs #196
- Every function and variable will be in snake_case.
...
- While supporting a smooth code migration from pyspark to snowpark, using snake_case is recommended for users' new code. So all sample code will use snake_case APIs.
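For context, this duplication comes from plain class-level alias assignments like the ones quoted above. The toy class below is not the real Snowpark source; it only sketches the pattern, using hypothetical group_by and with_column methods:

```python
class ToyDataFrame:
    """Toy illustration of the class-level aliasing pattern (not the actual Snowpark DataFrame)."""

    def group_by(self, *cols):
        print(f"group by {cols}")
        return self

    def with_column(self, name, value):
        print(f"add column {name} = {value}")
        return self

    # Aliases for user code migration: both names refer to the same function object
    groupBy = group_by
    withColumn = with_column


df = ToyDataFrame()
assert ToyDataFrame.groupBy is ToyDataFrame.group_by   # identical methods, two names
df.groupBy("STATUS").with_column("flag", 1)             # mixing spellings works; snake_case is preferred
```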
Also, from PEP 8, Function and Variable Names:
Function names should be lowercase, with words separated by underscores as necessary to improve readability.
Variable names follow the same convention as function names.
mixedCase is allowed only in contexts where that’s already the prevailing style (e.g. threading.py), to retain backwards compatibility.