
Cannot read my Glue catalog table from a Glue notebook with Spark DataFrames

Hello, I have built an Apache Iceberg database in S3 and added it to the Glue catalog so that I can query it from Athena.

Now I am trying to perform some ETL from Glue notebooks, but it keeps returning the following error:

AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table my_table. StorageDescriptor#InputFormat cannot be null for table: my_table (Service: null; Status Code: 0; ErrorCode: null; Request ID: null; Proxy: null)

I have tried two ways of doing this, but they both throw the same error.

Script 1:

%connections my-glue-connector
%glue_version 3.0

from pyspark.context import SparkContext
from awsglue.context import GlueContext
from pyspark.sql import SparkSession
from pyspark.conf import SparkConf

conf = SparkConf()

sc = SparkContext.getOrCreate(conf=conf)
glueContext = GlueContext(sc)
spark = glueContext.spark_session

Script 2:

I can run magic commands to create tables, like this:

CREATE TABLE AwsDataCatalog.mydatabase.mytable\
USING iceberg \
AS SELECT col1, col2(\
AS t (col1,col2)

But I cannot even retrieve that table from the notebook, even though I can query it in Athena, so it was indeed created.

SELECT * FROM mytable

does not work, and neither does

SELECT * FROM my_catalog.mydatabase.mytable

I have used this link as a guide.


  • The problem is the my_catalog keyword in the Spark initialization config. In AWS, the default catalog where all tables exist is glue_catalog. Replace my_catalog in the config with the actual Glue catalog name, glue_catalog, for it to work.


    To query the table, simply run:

    SELECT * FROM glue_catalog.mydatabase.mytable
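
For reference, here is a minimal sketch of what the corrected initialization might look like in the notebook. It assumes the standard Iceberg Spark catalog properties and that the Iceberg connector or jars are already attached to the session. The warehouse path s3://my-bucket/warehouse/ and the helper names iceberg_glue_conf, build_spark_conf and qualified_name are hypothetical, not taken from the question.

```python
def iceberg_glue_conf(catalog_name="glue_catalog",
                      warehouse="s3://my-bucket/warehouse/"):
    """Return Spark settings that register an Iceberg catalog backed by AWS Glue.

    The warehouse default is a placeholder; point it at your own bucket.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg's SQL extensions in Spark.
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers catalog_name as an Iceberg catalog.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Backs that catalog with the Glue Data Catalog.
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        # Reads and writes data files through S3.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": warehouse,
    }


def build_spark_conf(settings):
    """Copy the settings into a SparkConf (pyspark is imported lazily)."""
    from pyspark.conf import SparkConf
    conf = SparkConf()
    for key, value in settings.items():
        conf.set(key, value)
    return conf


def qualified_name(catalog, database, table):
    """Build the three-part name that Spark SQL resolves through the catalog."""
    return f"{catalog}.{database}.{table}"


# In the notebook this would replace the empty SparkConf from Script 1:
#   conf = build_spark_conf(iceberg_glue_conf())
#   sc = SparkContext.getOrCreate(conf=conf)
#   spark = GlueContext(sc).spark_session
#   spark.sql(f"SELECT * FROM {qualified_name('glue_catalog', 'mydatabase', 'mytable')}")
```

The catalog name used in the query has to match the one used in the spark.sql.catalog.* keys, which is why both come from the same catalog_name argument here.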