
Why can't I import Pandas in Python Worksheet?


I get this error when running import pandas as pd:

Traceback (most recent call last):
  File "_udf_code.py", line 10, in main
ModuleNotFoundError: No module named 'pyarrow' in function PYTHON_WORKSHEET with handler main

Is it not allowed to import pandas and do pandas manipulation in Snowflake?


Solution

  • Third-party packages are available via the Snowflake Anaconda channel, and pandas is listed there:

    https://repo.anaconda.com/pkgs/snowflake/


    Using Third-Party Packages

    Before you start using the packages provided by Anaconda inside Snowflake, you must acknowledge the Snowflake Third Party Terms.

    You must be the organization administrator (use the ORGADMIN role) to accept the terms. You only need to accept the terms once for your Snowflake account. Refer to Enabling the ORGADMIN Role for an Account.

    1. Sign in to Snowsight.
    2. Select Admin » Billing & Terms.
    3. In the Anaconda section, select Enable.
    4. In the Anaconda Packages dialog, click the link to review the Snowflake Third Party Terms page.
    5. If you agree to the terms, select Acknowledge & Continue.

    Python Worksheet:

    [Two images, not reproduced here.]
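Since the traceback in the question names pyarrow rather than pandas itself, it can help to check which packages the worksheet environment can actually import. The following is a minimal sketch, not an official Snowflake snippet: `missing_packages` is a hypothetical helper, the package list is only an example, and it assumes the worksheet's return type is set to String with `main` as the handler.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


def main(session=None):
    # The worksheet passes a Snowpark session to the handler; it is not used here.
    # The package list is an assumption based on the error in the question.
    missing = missing_packages(["pandas", "pyarrow"])
    if missing:
        return "Missing packages: " + ", ".join(missing)
    return "All required packages are available"
```

If the result lists a package as missing, adding it to the worksheet's package selection and re-running should make the import succeed, provided the Anaconda terms above have been accepted.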


    pandas on Snowflake:

    pandas on Snowflake lets you run your pandas code in a distributed manner directly on your data in Snowflake. Just by changing the import statement and a few lines of code, you can get the familiar pandas experience you know and love with the scalability and security benefits of Snowflake.

    With pandas on Snowflake, you can work with much larger datasets and avoid the time and expense of porting your pandas pipelines to other big data frameworks or provisioning large and expensive machines. It runs workloads natively in Snowflake through transpilation to SQL, enabling it to take advantage of parallelization and the data governance and security benefits of Snowflake.

    ...

    Once pandas on Snowflake is installed, instead of importing pandas as import pandas as pd, use the following two lines:

    import modin.pandas as pd
    import snowflake.snowpark.modin.plugin