I have a CSV of ~2000 URLs that respond with a 301 or 302 redirect when queried. I'm trying to figure out whether OpenRefine can write the destination URL (the one it actually retrieves the HTML from when I fetch a URL) to a new column, or whether there is some other way to get it.
e.g.
https://www-istp.gsfc.nasa.gov/stargaze/Ssolsys.htm
redirects to
https://pwg.gsfc.nasa.gov/stargaze/Ssolsys.htm
I know that from clicking the link in my browser of choice. I've found a few answers suggesting that this can be done in various programming languages, but nothing so far on how to do it in OpenRefine, even though I'm about 80% sure that it's possible.
Does anyone out there know what I might be able to do to make this happen?
In OpenRefine you can write expressions in GREL, Jython (a Java implementation of Python 2) and Clojure. As far as I know, GREL has no function for finding the target of a redirect, so I would use Python for that.
In your OpenRefine project, go to the column containing the URLs and use "Edit column" > "Add column based on this column..."
In the corresponding dialog window (see screenshot below), change the expression language to "Python / Jython" and use the following code snippet to retrieve the "real" URL of the request.
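import urllib2
# urlopen follows 301/302 redirects automatically
response = urllib2.urlopen(value)
# geturl() returns the URL that was finally retrieved
return response.geturl()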
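If you would rather run the same check outside OpenRefine (the "some other way" from the question), here is a minimal sketch in plain Python 3. It assumes the standard-library `urllib.request` module, which is the Python 3 counterpart of `urllib2`. The helper name `final_url` and the 10-second default timeout are my own choices for illustration, not anything OpenRefine provides. Unlike the snippet above, it returns None instead of raising when a URL is malformed or unreachable, so one bad row does not stop a batch.

```python
import urllib.request


def final_url(url, timeout=10):
    """Return the URL reached after following redirects, or None on failure.

    Hypothetical helper for illustration; the name and timeout are assumptions.
    """
    try:
        # urlopen follows 301/302 redirects by default
        response = urllib.request.urlopen(url, timeout=timeout)
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError and timeouts;
        # ValueError is raised for strings that are not valid URLs
        return None
    try:
        # geturl() reports the URL of the resource that was finally retrieved
        return response.geturl()
    finally:
        response.close()
```

You could call `final_url(row_url)` for each URL in the CSV and write the result to a new column. Note that this sends a GET request and only sees HTTP-level redirects; redirects done in JavaScript or via a meta refresh tag will not be detected.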