hash ckan filehash

How can I find out what sort of hash is being returned in a CKAN resource record?


Example record:

"resources": [
      {
        "cache_last_updated": null,
        "cache_url": null,
        "mimetype_inner": "",
        "hash": "9d599bcf3b8db2b5c6aea528bc37d728c856b09c",
        "description": "CSV file extracted and cleaned from source excel.",
        "format": "CSV",
        "url": "https://raw.github.com/datasets/gold-prices/master/data/data.csv",
        "created": "2017-07-18T13:16:40.728715",
        "state": "active",
        "package_id": "9cbdb9a8-b78d-449e-8342-46fb581a1e17",
        "last_modified": "2012-05-04T12:40:59.181686",
        "mimetype": "text/plain",
        "url_type": null,
        "position": 0,
        "revision_id": "007398e3-a1fc-4a31-821e-a77b9057f796",
        "size": "14502",
        "datastore_active": true,
        "id": "b9aae52b-b082-4159-b46f-7bb9c158d013",
        "resource_type": "file",
        "name": "CSV "
      }
    ],

The API docs say:

key   example Notes
hash  null    Hash of the data e.g. SHA1

"e.g. SHA1" doesn't get me very far. I can't check a hash if I don't know what algorithm was used to compute it.

Looking at the source also doesn't enlighten me. It seems to be a free-text field, so I guess uploaders can set it to whatever they want. But presumably it is designed to be consumed by someone, so the algorithm must be communicated somehow.
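
Absent any documentation, one rough way to narrow things down is the length of the hex digest: the value in the record above is 40 hex characters, which matches SHA-1. The sketch below is only a heuristic under that assumption. The helper names `candidate_algorithms` and `matching_algorithm` are made up for illustration and are not part of CKAN, and a matching length does not prove which algorithm was used (other algorithms, such as SHA3-256, share digest lengths with those listed).

```python
import hashlib

# Hex digest lengths of some common algorithms. Length alone cannot prove
# which algorithm produced a value; it only narrows the candidates.
HEX_LENGTH_TO_ALGORITHMS = {
    32: ["md5"],
    40: ["sha1"],
    56: ["sha224"],
    64: ["sha256"],
    96: ["sha384"],
    128: ["sha512"],
}


def candidate_algorithms(hash_value):
    """Return hashlib algorithm names whose hex digest length matches."""
    value = hash_value.strip().lower()
    # Empty strings and non-hex values (free text) yield no candidates.
    if not value or any(c not in "0123456789abcdef" for c in value):
        return []
    return HEX_LENGTH_TO_ALGORITHMS.get(len(value), [])


def matching_algorithm(hash_value, data):
    """Return the first candidate whose digest of `data` equals the hash."""
    value = hash_value.strip().lower()
    for name in candidate_algorithms(value):
        if hashlib.new(name, data).hexdigest() == value:
            return name
    return None


print(candidate_algorithms("9d599bcf3b8db2b5c6aea528bc37d728c856b09c"))
```

To confirm a guess, download the bytes at the resource `url` and pass them to `matching_algorithm`; a `None` result means either the algorithm is not in the table or the file has changed since the hash was recorded.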

Here is an example where it is an empty string.


Solution

  • It is provided with the intention of it being used by datapusher, ckanext-xloader, ckanext-archiver or whatever is installed that inspects the data at the resource URL. They choose their own hash function. They generally use it to work out if the data has been updated.

    In that sense the hash field is for internal use only. But I guess a user might want to do the same sort of thing, and I think it would be reasonable to include the name of the hash function in the value of this field. If you'd like to describe the use case and write a PR for one of these extensions, you'd be most welcome.

    The example you give is a ZIP file on data.gov.au. I believe that site is running datapusher, which downloads data in XLS and CSV format and loads it into the Datastore database to provide data previewing and an API for the data. ZIP files are not handled by datapusher, so it ignores them, which is why you would not expect a hash for this resource.
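
The "has the data been updated" check described above can be sketched roughly as follows. This is not code from datapusher, ckanext-xloader or ckanext-archiver; the function names `file_hash` and `needs_reload` are hypothetical, and MD5 is assumed purely for illustration since each extension chooses its own hash function.

```python
import hashlib


def file_hash(data, algorithm="md5"):
    """Hex digest of the downloaded bytes (algorithm is an assumption)."""
    return hashlib.new(algorithm, data).hexdigest()


def needs_reload(resource, data, algorithm="md5"):
    """True if the freshly fetched data does not match the stored hash."""
    # An empty or missing hash never equals a real digest, so the data
    # is treated as new and gets (re)loaded.
    return resource.get("hash") != file_hash(data, algorithm)


resource = {"hash": ""}
data = b"date,price\n1950-01,34.73\n"

if needs_reload(resource, data):
    # ... load the data, then remember its hash for the next run
    resource["hash"] = file_hash(data)
```

Because the consumer that wrote the hash is also the one that reads it, nothing in this loop needs the algorithm name to be stored alongside the value, which is consistent with the field being undocumented for outside users.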