json, freebase, mql, google-refine

Google Refine: creating a column by fetching URLs from Freebase is not working for a large data set


I have a Google Refine project with 36k rows of data. I would like to add another column by fetching JSON data from a Freebase URL. I was able to get this working on a small data set, but when I ran it on this project it took a few hours to process, and most of the results came back blank, although some rows did get data. Is there a way to limit the number of rows the data is fetched for, or a better way of getting the data from the URL?

Thank You!


Solution

  • If you're adding data from Freebase, you'd probably be better off using "Add column from Freebase" rather than "Add column by fetching URL."

    Facets are one of the most powerful Google Refine features and they can be used to control all kinds of things. In this case, you could use a facet to select a subset of your data and only do the fetch on that subset (and then repeat with a different subset).

    The next version of Refine will include better error reporting on the results of URL fetches to help debug problems like this. In the meantime, make sure that you're respecting all the limits of the remote site, such as the total number of requests and the number of requests per second.
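
If you end up fetching outside of Refine, the same two ideas (work on one subset at a time, and pause between requests) can be sketched in Python. This is only an illustration and not how Refine does it internally: the names `batches` and `fetch_in_batches`, the batch size, and the one-second delay are all assumptions, and the `fetch` callable is something you would supply yourself. Failures are recorded per URL instead of being left as silent blanks.

```python
import time
from typing import Callable, Dict, Iterator, List, Tuple


def batches(rows: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `rows`, each at most `size` items long."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def fetch_in_batches(
    urls: List[str],
    fetch: Callable[[str], str],
    batch_size: int = 500,        # assumed subset size, tune to taste
    delay_seconds: float = 1.0,   # assumed pause, check the remote site's limits
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Fetch each URL one subset at a time, pausing after every request.

    Returns (results, errors): successful responses keyed by URL, and a
    description of the exception for every URL that failed.
    """
    results: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for batch in batches(urls, batch_size):
        for url in batch:
            try:
                results[url] = fetch(url)
            except Exception as exc:
                # Keep the reason so a failed row can be told apart from an empty one.
                errors[url] = repr(exc)
            sleep(delay_seconds)
    return results, errors
```

To use it, pass your own `fetch` function (for example, one built on `urllib.request.urlopen`) and retry only the URLs that ended up in `errors`.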