I have an AWS Glue crawler whose S3 target path I switch in order to change the underlying table source. The problem is that tables are being created from both targets.
Configuration:
aws glue get-crawler --name sand-main
{
    "Crawler": {
        "Name": "sand-main",
        "Role": "Crawler-sand",
        "Targets": {
            "S3Targets": [
                {
                    "Path": "s3://sand-main-green/main",
                    "Exclusions": [
                        "checkpoints/**",
                        "IsActive.txt",
                        "isactive.txt"
                    ]
                }
            ],
            "JdbcTargets": [],
            "MongoDBTargets": [],
            "DynamoDBTargets": [],
            "CatalogTargets": []
        },
        "DatabaseName": "sand_main",
        "Description": "",
        "Classifiers": [],
        "RecrawlPolicy": {
            "RecrawlBehavior": "CRAWL_EVERYTHING"
        },
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "DELETE_FROM_DATABASE"
        },
        "LineageConfiguration": {
            "CrawlerLineageSettings": "DISABLE"
        },
        "State": "READY",
        "CrawlElapsedTime": 0,
        "CreationTime": "2020-09-30T14:07:25-06:00",
        "LastUpdated": "2021-01-28T11:32:15-07:00",
        "LastCrawl": {
            "Status": "SUCCEEDED",
            "LogGroup": "/aws-glue/crawlers",
            "LogStream": "sand-main",
            "MessagePrefix": "5bb1907d-2847-46ef-8712-3a50deb2b7a0",
            "StartTime": "2021-01-28T11:32:35-07:00"
        },
        "Version": 24,
        "Configuration": "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"}},\"Grouping\":{\"TableGroupingPolicy\":\"CombineCompatibleSchemas\"}}"
    }
}
I have a Lambda that switches the path from:
"Path": "s3://sand-main-green/main"
To:
"Path": "s3://sand-main-blue/main"
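For reference, the switch performed by the Lambda can be sketched roughly as below. This is a minimal sketch, not the actual Lambda code: the helper names build_s3_targets and switch_crawler_path are made up for illustration, and the function assumes it is handed a boto3 Glue client. The only AWS call used is update_crawler, which replaces the crawler's Targets with the ones passed in.

```python
def build_s3_targets(path, exclusions=()):
    """Build the Targets structure for a crawler with a single S3 target."""
    return {"S3Targets": [{"Path": path, "Exclusions": list(exclusions)}]}


def switch_crawler_path(glue_client, crawler_name, new_path, exclusions=()):
    """Point the crawler at new_path (hypothetical helper, for illustration).

    glue_client is assumed to be a boto3 Glue client, e.g. boto3.client("glue").
    """
    targets = build_s3_targets(new_path, exclusions)
    # update_crawler overwrites the crawler's Targets with the new value.
    glue_client.update_crawler(Name=crawler_name, Targets=targets)
    return targets


# Example call (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# switch_crawler_path(
#     boto3.client("glue"),
#     "sand-main",
#     "s3://sand-main-blue/main",
#     ["checkpoints/**", "IsActive.txt", "isactive.txt"],
# )
```

Note that this only changes where the crawler looks on its next run; it does not touch tables that already exist in the database.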
But I end up with tables:
Name -> Location
test -> s3://sand-main-blue/main/test
test_2398l50df -> s3://sand-main-green/main/test
I have DeleteBehavior set to DELETE_FROM_DATABASE, so I would expect the tables for the old S3 path to be deleted. It feels like the crawler retains the history of its S3 targets. I do not want this behavior.
A crawler usually names a table after the last part of the file path (in your example, "test"). If a table with that name is already present in the database, it creates a new table with random characters appended as a suffix (in your example, test_2398l50df).
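To see which tables still point at the old path, you can compare each table's location with the crawler's current target. The following is a rough sketch under stated assumptions: tables_outside_target is a hypothetical helper, and it expects table dictionaries shaped like the entries in the TableList returned by the Glue get_tables call (a "Name" plus a "StorageDescriptor" with a "Location").

```python
def tables_outside_target(tables, target_path):
    """Return names of tables whose location is not under target_path.

    tables: list of dicts shaped like entries of Glue's get_tables TableList.
    """
    # Normalise with a trailing slash so "main" does not match "main-old".
    prefix = target_path.rstrip("/") + "/"
    stale = []
    for table in tables:
        location = table.get("StorageDescriptor", {}).get("Location", "")
        if not (location.rstrip("/") + "/").startswith(prefix):
            stale.append(table["Name"])
    return stale


# Sample data mirroring the two tables from the question.
sample_tables = [
    {"Name": "test",
     "StorageDescriptor": {"Location": "s3://sand-main-blue/main/test"}},
    {"Name": "test_2398l50df",
     "StorageDescriptor": {"Location": "s3://sand-main-green/main/test"}},
]

print(tables_outside_target(sample_tables, "s3://sand-main-blue/main"))
```

With a real catalog, the list would come from something like boto3.client("glue").get_tables(DatabaseName="sand_main")["TableList"]; large databases may need pagination.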
If you want the table "test" to point to the new path, follow the steps below in order: