socratasoda

SODA between operator does not work with some datasets


The between operator works for some Socrata SODA datasets and not for others. When used on certain datasets the query yields no results, but on other datasets it works as expected. What am I doing wrong?

Instead of using between I tried using >= which did yield results. Also, I tried using between on a different dataset which worked as expected.

For example, using between on the :updated_at field in this query yields no results.

curl "https://data.cityofnewyork.us/resource/3h2n-5cm9.json?%24select=%3A*%2C*&%24where=%3Aupdated_at+between+%272019-01-10T10%3A00%3A00.000%27+and+%272019-09-11T10%3A00%3A00.000%27&%24limit=10"
[]

However, there is data because :updated_at >= yields results.

curl "https://data.cityofnewyork.us/resource/3h2n-5cm9.json?%24select=%3A*%2C*&%24where=%3Aupdated_at+%3E%3D+%272019-01-10T10%3A00%3A00.000%27&%24limit=10"
[{":created_at":"2019-09-11T22:23:43.719Z",":id":"row-qu6f_f9ar_rewb",":updated_at":"2019-09-11T22:23:43.719Z"
...]

Here's an example of between working for a different dataset.

curl "https://data.cityofnewyork.us/resource/rvhx-8trz.json?%24select=%3A*%2C*&%24where=%3Aupdated_at+between+%272019-01-10T10%3A00%3A00.000%27+and+%272019-09-11T10%3A00%3A00.000%27&%24limit=10"
[{":created_at":"2018-05-25T21:44:12.712Z",":id":"row-a34u.9rem~mnnx",":updated_at":"2019-01-10T22:23:58.425Z",":version":"rv-xahg~4nmg.r4c2"
...]

The between operator used to work on the https://data.cityofnewyork.us/resource/3h2n-5cm9.json dataset but does not as of 11 Sep 2019 10 pm.


Solution

  • Inspecting the following query returns results, but none of the results appear to be prior to 2019-09-11T10:00:00

    https://data.cityofnewyork.us/resource/3h2n-5cm9.json?$select=:*,*&$where=:updated_at >= '2019-01-10T10:00:00'
    

    Since a couple days have passed, the following query does appear to return all results (I've adjusted the latter date to the current date, September 13th, 2019:

    https://data.cityofnewyork.us/resource/3h2n-5cm9.json?$select=:*,*&$where=:updated_at between '2019-01-10T10:00:00' AND '2019-09-13T10:00:00'
    

    When looking at this file, I noticed all of the :updated_at timestamps are the same (2019-09-11T22:23:43.719Z at the moment). The info box in this section helps explain the field a bit. My guess is NYC is doing a complete replace so the :updated_at is completely changed each day because all records are rewritten.