I have an AWS CloudSearch instance that I am still developing.
At times, such as when I modify the format of a field, I find myself wanting to wipe out all of the data and regenerate it.
Is there any way to clear out all of the data using the console, or do I have to go about it by programmatic means?
If I do have to use programmatic means (i.e. generate and POST a bunch of "delete" SDF files), is there any good way to query for all documents in a CloudSearch instance?
I guess I could just delete and re-create the instance, but that takes a while and loses all of the indexes, rank expressions, text options, etc.
Using aws and jq from the command line (tested with bash on mac):
export CS_DOMAIN=https://yoursearchdomain.yourregion.cloudsearch.amazonaws.com
# Get ids of all existing documents, reformat as
# [{ type: "delete", id: "ID" }, ...] using jq
aws cloudsearchdomain search \
--endpoint-url=$CS_DOMAIN \
--size=10000 \
--query-parser=structured \
--search-query="matchall" \
| jq '[.hits.hit[] | {type: "delete", id: .id}]' \
> delete-all.json
# Delete the documents
aws cloudsearchdomain upload-documents \
--endpoint-url=$CS_DOMAIN \
--content-type='application/json' \
--documents=delete-all.json
For more info on jq, see "Reshaping JSON with jq".
Update Feb 22, 2017
Added --size to get the maximum number of documents (10,000) at a time. You may need to repeat this script multiple times. Also, --search-query can take something more specific, if you want to be selective about the documents getting deleted.
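If you'd rather not re-run the script by hand, the repetition can be wrapped in a loop. The following is only a rough sketch, not something I've run against a large domain: it reuses the same two aws calls and the same jq filter as above, and stops once a search comes back with no hits. The function name delete_all_documents and the PAUSE_SECONDS and MAX_PASSES variables are made up for this example. The pause is there on the assumption that deletes may take a little while to show up in search results, so adjust it to taste.

```shell
#!/usr/bin/env bash
# Sketch: repeat the search-and-delete cycle until a search returns no hits.
# Assumes CS_DOMAIN is exported as above and that aws and jq are on PATH.
# delete_all_documents, PAUSE_SECONDS and MAX_PASSES are hypothetical names.

delete_all_documents() {
  local batch_file hits count pass=0
  batch_file=$(mktemp) || return 1

  while [ "$pass" -lt "${MAX_PASSES:-100}" ]; do
    pass=$((pass + 1))

    # Fetch up to 10,000 documents (the per-request maximum noted above).
    hits=$(aws cloudsearchdomain search \
      --endpoint-url="$CS_DOMAIN" \
      --size=10000 \
      --query-parser=structured \
      --search-query="matchall") || break

    # Reshape the hits into [{ type: "delete", id: "ID" }, ...].
    printf '%s' "$hits" \
      | jq '[.hits.hit[] | {type: "delete", id: .id}]' > "$batch_file" || break

    # An empty batch means nothing is left to delete.
    count=$(jq 'length' "$batch_file") || break
    [ "$count" -gt 0 ] || { rm -f "$batch_file"; return 0; }

    echo "pass $pass: deleting $count documents" >&2
    aws cloudsearchdomain upload-documents \
      --endpoint-url="$CS_DOMAIN" \
      --content-type='application/json' \
      --documents="$batch_file" >/dev/null || break

    # Give the domain a moment before searching again (assumed delay).
    sleep "${PAUSE_SECONDS:-10}"
  done

  # Reached only on an error or after MAX_PASSES passes.
  rm -f "$batch_file"
  return 1
}
```

Run it with `delete_all_documents` after exporting CS_DOMAIN. It returns non-zero if an aws or jq call fails, or if documents are still being returned after MAX_PASSES passes, so you can tell a clean wipe from an interrupted one.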