Given an arbitrary pyspark.sql.column.Column object (or, similarly, a pyspark.sql.connect.column.Column object), is there a way to get its data type back -- either as a DDL string or as a pyspark.sql.types.DataType subclass?
Note: no DataFrame is available in this context, ONLY a Column object. The check needs to be eager, not lazy, because downstream DataFrame evaluation may depend on its result.
I do have a specific use case that prompted this question (for Databricks Spatial SQL, preprocess a column with dbf.st_geomfromwkb if it is a BinaryType() column and dbf.st_geomfromwkt if it is a StringType() column), but this class of issue has come up before and I expect it will come up again, so I'm NOT looking for a "get it done this time" solution; I'm looking for a general-purpose answer I can reuse.
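To make the shape of the ask concrete, here is a sketch of the kind of reusable helper I have in mind. column_dtype is hypothetical (it is exactly the missing piece this question is about), and dbf stands for the Databricks Spatial SQL functions module referenced above (its import is omitted here):

```python
from pyspark.sql import types as T
from pyspark.sql.column import Column

def column_dtype(col: Column) -> T.DataType:
    """Hypothetical: eagerly return the data type a Column would produce."""
    raise NotImplementedError  # <-- the capability I'm asking about

def to_geometry(col: Column) -> Column:
    # Dispatch on the input type; dbf is the Databricks Spatial SQL
    # functions module from the use case above (import not shown).
    dtype = column_dtype(col)
    if isinstance(dtype, T.BinaryType):
        return dbf.st_geomfromwkb(col)   # WKB input
    if isinstance(dtype, T.StringType):
        return dbf.st_geomfromwkt(col)   # WKT input
    raise TypeError(f"expected binary (WKB) or string (WKT), got {dtype}")
```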
samkart's comment should be the answer; if you post it as one, I'll happily delete this one.
A Column is an unbound computation of a projection over a dataset; it has no material form until it is evaluated. Internally, not even the functions it references are resolved.
When you take an action on an actual dataset, the columns are resolved: functions and named attributes are looked up against the dataset. Only after that resolution is type checking available.
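A minimal sketch of that, assuming a running Spark session: the Column by itself carries no type, but once it is selected from an actual DataFrame, requesting the resolved schema (which triggers analysis, not a job) gives back the type.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# An unbound expression: by itself it says nothing about its result type.
expr = F.length(F.col("payload"))

# Bind it to an actual dataset; requesting the schema triggers analysis
# (name/function resolution) without running a job, and the type appears.
df = spark.createDataFrame([(bytearray(b"\x01"),)], "payload binary")
dtype = df.select(expr).schema[0].dataType

print(dtype.simpleString())  # int
```

The same pattern applies to pyspark.sql.connect columns: the Connect DataFrame's .schema is likewise an analysis request to the server rather than an execution.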