[SOLVED] S3-Select Pricing on JSON

I am confused about how S3 Select pricing treats data scanned versus data returned. If I want to access something at a specific path in a JSON file, does S3 Select still scan the entire file, so that the data-scanned charge is based on the full file size? Suppose I run the following query against this example file:

select * from S3Object[*].place1[*].Houses[*]

{
    "place1": [
        {
            "Houses": [
                {
                    "date": "1777-06-30",
                    "price": "445000.0"
                },
                {
                    "date": "2014-10-31",
                    "price": "495000.0"
                }
            ],
            "Apartments": [
                {
                    "date": "1777-06-30",
                    "price": "445000.0"
                },
                {
                    "date": "2014-10-31",
                    "price": "495000.0"
                }
            ]
        }
    ]
}

Would I be charged for data scanned across the entire file, or would the charge be reduced because I am accessing the Houses array directly?

Solution

JSON data would need to be scanned in its entirety to provide the output. This is because there is no concept of an index or a block range on a JSON file. (An index points to where data is stored, and a block range tracks the min/max value of data in a storage block.)
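One way to check this for your own file is to read the Stats event that S3 Select sends at the end of its response stream. The following is a minimal sketch using boto3's `select_object_content`; the helper names `summarize_select_events` and `run_select` are made up for illustration, and the bucket, key, and `Type: DOCUMENT` setting are assumptions about your setup.

```python
from typing import Any, Dict, Iterable, Tuple


def summarize_select_events(
    events: Iterable[Dict[str, Any]],
) -> Tuple[bytes, Dict[str, int]]:
    """Collect returned record bytes and the Stats details from an event stream."""
    records = b""
    stats: Dict[str, int] = {}
    for event in events:
        if "Records" in event:
            # Each Records event carries a chunk of the query output.
            records += event["Records"]["Payload"]
        elif "Stats" in event:
            # Details holds BytesScanned, BytesProcessed and BytesReturned.
            stats = dict(event["Stats"]["Details"])
    return records, stats


def run_select(bucket: str, key: str, expression: str) -> Tuple[bytes, Dict[str, int]]:
    """Run an S3 Select query (hypothetical wrapper; needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    s3 = boto3.client("s3")
    response = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        # Assumption: the object is a single JSON document, not JSON Lines.
        InputSerialization={"JSON": {"Type": "DOCUMENT"}},
        OutputSerialization={"JSON": {}},
    )
    return summarize_select_events(response["Payload"])
```

Calling `run_select("my-bucket", "houses.json", "select * from S3Object[*].place1[*].Houses[*]")` (bucket and key are placeholders) and comparing `stats["BytesScanned"]` with the object's size should show whether the whole file was scanned.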

JSON is fine for data interchange, but is not designed for efficient storage.

You could, however, compress the file to reduce the storage cost. It is possible that this would also reduce the scan cost (as is the case for Amazon Athena), but I could not find any information to confirm this for S3 Select.
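If you want to try the compression route, S3 Select accepts GZIP-compressed JSON when `CompressionType` is set in `InputSerialization`. The sketch below only prepares the compressed bytes and the matching setting; `gzip_json` and `GZIP_JSON_INPUT` are illustrative names, not part of any AWS API. Whether the scan charge follows the compressed size is still unconfirmed here, so compare `BytesScanned` in the Stats event of the response before relying on it.

```python
import gzip
import json


def gzip_json(document: dict) -> bytes:
    """Serialize a JSON document and gzip it, ready to upload as e.g. file.json.gz."""
    raw = json.dumps(document).encode("utf-8")
    return gzip.compress(raw)


# Pass this as InputSerialization to select_object_content for a gzipped object.
# Assumption: the object holds a single JSON document rather than JSON Lines.
GZIP_JSON_INPUT = {"JSON": {"Type": "DOCUMENT"}, "CompressionType": "GZIP"}
```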