How to clear all data from AWS CloudSearch?


I have an AWS CloudSearch instance that I am still developing.

At times, such as when I modify the format of a field, I find myself wanting to wipe out all of the data and regenerate it.

Is there any way to clear out all of the data using the console, or do I have to do it programmatically?

If I do have to do it programmatically (i.e. generate and POST a bunch of "delete" SDF files), is there any good way to query for all documents in a CloudSearch instance?

I guess I could just delete and re-create the instance, but that takes a while and loses all of the indexes, rank expressions, text options, etc.


Solution

  • Using the aws CLI and jq from the command line (tested with bash on macOS):

    export CS_DOMAIN=https://yoursearchdomain.yourregion.cloudsearch.amazonaws.com
    
    # Get ids of all existing documents, reformat as
    # [{ type: "delete", id: "ID" }, ...] using jq
    aws cloudsearchdomain search \
      --endpoint-url=$CS_DOMAIN \
      --size=10000 \
      --query-parser=structured \
      --search-query="matchall" \
      | jq '[.hits.hit[] | {type: "delete", id: .id}]' \
      > delete-all.json
    
    # Delete the documents
    aws cloudsearchdomain upload-documents \
      --endpoint-url=$CS_DOMAIN \
      --content-type='application/json' \
      --documents=delete-all.json
    

    For more info on jq, see "Reshaping JSON with jq".

    Update Feb 22, 2017

    Added --size to fetch the maximum number of documents (10,000) per request. If the domain holds more documents than that, rerun the script until the search returns no hits. --search-query can also take a more specific expression if you want to delete only some of the documents.
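
Since a single pass removes at most 10,000 documents, the repetition can be wrapped in a loop. The following is only a rough sketch under a few assumptions: the function name `delete_all_documents` and its `max_rounds` and `pause_seconds` arguments are made up for illustration, and the AWS CLI's built-in `--query` (JMESPath) option is used in place of jq to build the delete batch. Deletes may take a little while to show up in search results, so the sketch pauses between rounds and gives up after a fixed number of them; resubmitting a delete for an id that is already gone should be harmless.

```shell
#!/usr/bin/env bash
# Sketch: repeat the search-and-delete cycle until the search returns no hits.
# delete_all_documents, max_rounds and pause_seconds are illustrative names,
# not part of the AWS CLI.
delete_all_documents() {
  local endpoint="$1"            # e.g. the value of $CS_DOMAIN above
  local max_rounds="${2:-50}"    # safety cap on the number of passes
  local pause_seconds="${3:-10}" # wait for deletes to be applied
  local round=1
  local batch
  local batch_file
  batch_file="$(mktemp)" || return 1

  while [ "$round" -le "$max_rounds" ]; do
    # Fetch up to 10,000 ids, already shaped as a delete batch by --query.
    batch="$(aws cloudsearchdomain search \
      --endpoint-url "$endpoint" \
      --size 10000 \
      --query-parser structured \
      --search-query "matchall" \
      --query "hits.hit[].{type: 'delete', id: id}" \
      --output json)" || { rm -f "$batch_file"; return 1; }

    # An empty list (or null / no output) means nothing is left to delete.
    case "$(printf '%s' "$batch" | tr -d '[:space:]')" in
      '' | '[]' | 'null')
        rm -f "$batch_file"
        return 0
        ;;
    esac

    # Upload the delete batch from a temporary file.
    printf '%s\n' "$batch" > "$batch_file"
    aws cloudsearchdomain upload-documents \
      --endpoint-url "$endpoint" \
      --content-type application/json \
      --documents "$batch_file" || { rm -f "$batch_file"; return 1; }

    sleep "$pause_seconds"
    round=$((round + 1))
  done

  rm -f "$batch_file"
  echo "Stopped after $max_rounds rounds; documents may remain." >&2
  return 1
}
```

It could then be called as `delete_all_documents "$CS_DOMAIN"`, optionally followed by a round limit and a pause in seconds.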