I have created a new Databricks Premium instance in Azure, and when I attempt to create a Delta table with the following PySpark code:
if chkDir == 'False' or chkTbl == False:
    ent.setupDeltaTable(stageName, regName)
    # register the data frame
    deltadf = DeltaTable.forName(spark, f"{stageName}{regName}")
else:
    # register the data frame
    deltadf = DeltaTable.forName(spark, f"{stageName}{regName}")
I get the error:
AnalysisException: `BASEsqlArea2`.`Country` is not a Delta table.
To be honest, I think the issue has to do with the f"{stageName}{regName}" part; for some reason Databricks is separating the database name from the table name with a period.
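For clarity, here is what the two f-string variants evaluate to with the names above (just a sketch of the resulting strings, nothing Databricks-specific):

stageName = "BASEsqlArea2"
regName = "Country"

# Concatenation with no separator produces a single identifier:
print(f"{stageName}{regName}")   # BASEsqlArea2Country

# An explicit dot produces the <database>.<table> form that
# DeltaTable.forName parses as a two-part name:
print(f"{stageName}.{regName}")  # BASEsqlArea2.Country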
I should point out that I have no such issue with Databricks Community Edition, which leads me to think the behaviour is down to the DBR version, which is 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12).
Also, I don't know if this is related, but having recently created a new Azure Databricks instance, Databricks no longer creates the database in the hive_metastore by default (see image). The last time I created a new Databricks instance (about 4 months ago) and created a Delta table, Databricks automatically created the database and tables in the hive_metastore. Now it appears to create the database and tables under the name of the Databricks instance itself. Has Databricks made some changes to where databases and tables are created?
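A quick way to check which catalog and database new tables are going to (a minimal sketch; it assumes the new workspace is Unity Catalog-enabled, which would explain the changed default) is:

# Show the catalog and database that new tables currently default to
spark.sql("SELECT current_catalog(), current_database()").show()

# Explicitly switch back to the legacy Hive metastore, if that is where
# the databases and tables should be created (Unity Catalog workspaces
# support USE CATALOG)
spark.sql("USE CATALOG hive_metastore")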
I have just executed the code again, and in addition to the above error, I'm now getting this additional error:
[RequestId=64a0cd67-33fe-4c5f-9e99-xxxxxx ErrorClass=INVALID_PARAMETER_VALUE] GenerateTemporaryPathCredential uri /mnt/lake/BASE/CombinedClass/sqlArea2/data/Country/1 is not a valid URI. Error message: INVALID_PARAMETER_VALUE: Missing cloud file system scheme.
I have not seen such an error before. Again, I don't get this error at all when executing the code in Databricks Community Edition.
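The error text itself complains that the path has no scheme; for reference, the same mount location written with an explicit scheme would look like this (illustrative only, the container, account and exact scheme depend on how the storage is mounted):

# The bare path from the error message:
bad_path = "/mnt/lake/BASE/CombinedClass/sqlArea2/data/Country/1"

# The same location with an explicit DBFS scheme (illustrative):
dbfs_path = "dbfs:/mnt/lake/BASE/CombinedClass/sqlArea2/data/Country/1"

# Or the underlying ADLS Gen2 location directly, bypassing the mount
# (container and storage account names are placeholders):
abfss_path = "abfss://<container>@<account>.dfs.core.windows.net/BASE/CombinedClass/sqlArea2/data/Country/1"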
I tried the following suggested fix:

stageName = "BASEsqlArea2"
regName = "Country"
chkDir = 'False'
chkTbl = False

if chkDir == 'False' or chkTbl == False:
    ent.setupDeltaTable(stageName, regName)
    deltadf = DeltaTable.forPath(spark, f"/FileStore/tables/delta/{stageName}/{regName}")
    deltadf.toDF().show()
However, I get the following error:
BASEsqlArea2 - Database created
Warning: Unmapped Spark type 'ByteType()' encountered, defaulting to 'STRING'
Table creation failed
AnalysisException: `/FileStore/tables/delta/BASEsqlArea2/Country` is not a Delta table.
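To rule out the obvious, one way to check whether that location actually contains Delta data is the following (a sketch; dbutils and display are available in Databricks notebooks):

from delta.tables import DeltaTable

path = "/FileStore/tables/delta/BASEsqlArea2/Country"

# A directory is only a Delta table if it contains a _delta_log folder
display(dbutils.fs.ls(path))

# Or ask Delta directly; this returns False when no transaction log is found
print(DeltaTable.isDeltaTable(spark, path))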
I should mention that my Catalog looks like the following:
When I tried to register the Delta table using:
deltadf = DeltaTable.forName(spark, f"{stageName}.{regName}")
I get the error:
AnalysisException: `BASEsqlArea2`.`Country_users` is not a Delta table.
When I tried to create the Delta table using setupDeltaTable(stageName, regName), I get the error:
AnalysisException: [RequestId=7df20209-dfa7-4e40-ba92-ed3162ba9dca ErrorClass=INVALID_PARAMETER_VALUE] GenerateTemporaryPathCredential uri /FileStore/tables/delta/BASEsqlArea2/Country_users is not a valid URI.
Error message: INVALID_PARAMETER_VALUE: Missing cloud file system scheme.
I have also tried the approach below, writing to the hive_metastore:
from pyspark.sql import SparkSession
from delta.tables import *
databaseName = "hive_metastore.BASEsqlArea2"
spark.sql(f"CREATE DATABASE IF NOT EXISTS {databaseName}")
spark.sql(f"USE {databaseName}")
data = [("James", "Smith"), ("Anna", "Rose")]
columns = ["firstname", "lastname"]
df = spark.createDataFrame(data, columns)
def setupDeltaTable(stageName, regName):
    fullTableName = f"{stageName}.{regName}"
    if not DeltaTable.isDeltaTable(spark, f"delta.`{fullTableName}`"):
        df.write.format("delta").mode("overwrite").saveAsTable(fullTableName)
        print(f"Delta table created as {fullTableName}.")
    else:
        print(f"Delta table {fullTableName} already exists.")
stageName = "hive_metastore.BASEsqlArea2"
regName = "Country"
setupDeltaTable(stageName, regName)
deltadf = spark.read.table(f"{stageName}.{regName}")
display(deltadf)
Results:
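One thing I'm unsure about in the snippet above is the DeltaTable.isDeltaTable check, since that function expects a storage path rather than a catalog identifier; a catalog-based existence check would look more like this (a sketch, not something I have verified in this workspace):

# Sketch of a name-based existence check; assumes a two-part
# <database>.<table> identifier such as "BASEsqlArea2.Country"
exists = spark.catalog.tableExists("BASEsqlArea2.Country")
print(exists)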