I am trying to use the COPY INTO
statement in Databricks to ingest CSV files from Cloud Storage.
The problem is that the folder name has a space in it /AP Posted/
and when I try to refer to the path the code execution raises the below error:
Error in SQL statement: URISyntaxException: Illegal character in path at index 70: abfss://gpdi-files@hgbsprodgbsflastorage01.dfs.core.windows.net/RAW/AP Posted/
I googled the error and found articles advising to replace the space with "%20". This solution is not effective.
So, does someone knows how to solve it? Or the only solution is indeed to prevent spaces in naming folders.
This is my current Databricks SQL Code:
COPY INTO prod_gbs_gpdi.bronze_data.my_table
FROM 'abfss://gpdi-files@hgbsprodgbsflastorage01.dfs.core.windows.net/RAW/AP Posted/'
FILEFORMAT = CSV
VALIDATE 500 ROWS
PATTERN = 'AP_SAPEX_KPI_001 - Posted Invoices in 2021_3.CSV'
FORMAT_OPTIONS(
'header'='true',
'delimiter'=';',
'skipRows'='8',
'mergeSchema'='true', --Whether to infer the schema across multiple files and to merge the schema of each file
'encoding'='UTF-8',
'enforceSchema'='true', --Whether to forcibly apply the specified or inferred schema to the CSV files
'ignoreLeadingWhiteSpace'='true',
'ignoreTrailingWhiteSpace'='true',
'mode'='PERMISSIVE' --Parser mode around handling malformed records
)
COPY_OPTIONS (
'force' = 'true', --If set to true, idempotency is disabled and files are loaded regardless of whether they’ve been loaded before.
'mergeSchema'= 'true' --If set to true, the schema can be evolved according to the incoming data.
)
Trying to use the path where one of the folders has space, gave the same error.
PATTERN
parameter as follows:%sql
COPY INTO table1
FROM '/mnt/repro/op/'
FILEFORMAT = csv
PATTERN='has space/sample1.csv'
FORMAT_OPTIONS ('mergeSchema' = 'true','header'='true')
COPY_OPTIONS ('mergeSchema' = 'true');
path/has?space/
also works. But if there are multiple folders like has space, hasAspace, hasBspace
etc., then this would not work as expected.%sql
COPY INTO table2
FROM '/mnt/repro/op/has?space/'
FILEFORMAT = csv
PATTERN='sample1.csv'
FORMAT_OPTIONS ('mergeSchema' = 'true','header'='true')
COPY_OPTIONS ('mergeSchema' = 'true');
dbutils.fs.cp()
and then use dbfs path to use COPY INTO
.dbutils.fs.cp('/mnt/repro/op/has space/sample1.csv','/FileStore/tables/mycsv.csv')
%sql
COPY INTO table3
FROM '/FileStore/tables/'
FILEFORMAT = csv
PATTERN='mycsv.csv'
FORMAT_OPTIONS ('mergeSchema' = 'true','header'='true')
COPY_OPTIONS ('mergeSchema' = 'true');