I have a CSV file, emp_data.csv, stored at: dbfs:/raw/data/externalTables/emp_data_folder/emp_data.csv
Here is a sample of the data in the file:
Alice,25,50000,North
Bob,30,60000,South
Charlie,35,70000,East
David,40,80000,West
Eve,29,58000,North
Frank,50,90000,South
Grace,28,54000,East
Hannah,32,62000,West
Ian,45,72000,North
Jack,27,56000,South
Using this .csv file, I created an external table in Spark using the following SQL command:
%sql
CREATE TABLE IF NOT EXISTS tablesDbDef.emp_data_f (
Name STRING,
Age INTEGER,
Salary INT,
Region STRING
)
USING CSV
LOCATION '/raw/data/externalTables/emp_data_folder/'
The table is created successfully, and I can query it without any issues.
Next, I inserted a new record into the table using the following command:
%sql
INSERT INTO tablesDbDef.emp_data_f VALUES ('Mark', 20, 50000, 'South')
The record is inserted successfully, and I can see it in SQL queries. My understanding is that when new data is inserted, Spark creates new files (.csv files in this case) for it.
However, when I check the emp_data_folder directory, I don't see any new files for the inserted record. The only files present are the original emp_data.csv and a newly generated _SUCCESS file.
My question is: where is the newly inserted data stored, if not in files? I can see the new record in SQL queries, but no file seems to have been created for it.
When you create an external table with USING CSV LOCATION '/path', the metastore (e.g., Hive Metastore) records only the table's schema and location. It never holds row data, so the inserted record cannot be stored there.
When you run INSERT INTO on such a table, Spark is expected to append one or more new part files (typically named part-*.csv) under the table's LOCATION. It does not append to or modify the existing emp_data.csv.
The _SUCCESS marker you see is written when a write job commits to a directory, which suggests the insert did target emp_data_folder. If the listing still shows no part file, it is worth re-listing the directory and checking which file each row is read from.
Dropping an external table removes only the metastore entry, so the files at that location, including any written by INSERT INTO, stay in place.
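One way to check where the rows actually live is to list the table folder and see which file each row comes from. The sketch below uses only the Python standard library; the helper name rows_by_file is hypothetical, and it assumes the folder is reachable as a local path (on Databricks this is commonly the /dbfs/... mount of the dbfs:/ location, which may not be available on every cluster type).

```python
import csv
from pathlib import Path


def rows_by_file(folder):
    """Map each data file in `folder` to the rows it contains.

    Files whose names start with '_' or '.' (for example _SUCCESS) are
    skipped, because Spark treats them as markers rather than data.
    """
    result = {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.name.startswith(("_", ".")):
            continue
        with path.open(newline="") as handle:
            # Keep non-empty rows only; every value stays a string.
            result[path.name] = [row for row in csv.reader(handle) if row]
    return result
```

Calling rows_by_file("/dbfs/raw/data/externalTables/emp_data_folder") (an assumed local path for the folder in the question) and looking for the 'Mark' row shows which file holds the inserted record. Inside Spark SQL, selecting input_file_name() alongside the table's columns gives a similar per-row answer.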