amazon-athenaprestotrinounnestpypika

alias a table returning function (UNNEST) using pypika


How can I write a query with pypika to generate the following SQL?

SELECT "x" FROM unnest(ARRAY[1,2,3]) as t("x")

The closest I can get to is this:

from pypika import PostgreSQLQuery, Array, CustomFunction
   
unnest = CustomFunction('unnest', ['array'])
PostgreSQLQuery.from_(unnest(Array(1,2,3))).select('x')

Which outputs:

SELECT "x" FROM unnest(ARRAY[1,2,3])

But is obviously missing the required alias to allow me to use "x" in a following sub-query.

I am using Athena/Presto as the backend, but Postgres supports the same SQL


Solution

  • You can try using Unnest class from this comment @github which hacks AliasedQuery:

    from typing import Any
    from pypika.queries import AliasedQuery, Selectable
    from pypika.utils import format_quotes
    
    class Unnest(AliasedQuery):
    
        def __init__(self, *selects: Selectable) -> None:
            super().__init__('t')
            self._selects = selects
    
        def as_(self, alias: str, *terms: str) -> "Selectable":
            self.alias = alias
            self.name = alias
            self._alias_terms = terms
            return self
    
        def get_sql(self, **kwargs: Any) -> str:
            unnest = "UNNEST({select})".format(
                select=", ".join(term.get_sql(**kwargs) for term in self._selects)
            )
            sql = unnest + self._as_select_sql(**kwargs)
            return sql
    
        def _as_select_sql(self, **kwargs: Any) -> str:
            quote_char = kwargs.get("quote_char")
            terms = ", ".join(format_quotes(term, quote_char)
                              for term in self._alias_terms)
            return " AS {alias}({terms})".format(
                alias=self.alias,
                terms=terms,
            )
    
        def __eq__(self, other: "Unnest") -> bool:
            return isinstance(other, Unnest) and self.alias == other.alias  # TODO
    
        def __hash__(self) -> int:
            return hash(str(self.alias))  # TODO
    

    And your case can look like:

    unnest = Unnest(Array(1,2,3)).as_('t', 'column')
    PostgreSQLQuery.from_(unnest).select('column')
    

    Which generates:

    SELECT "t"."column" FROM UNNEST(ARRAY[1,2,3]) AS t("column")
    

    Which is executed fine by Presto/Trino. But I would argue that this a workaround at best.