java curl groovy metadata doi

Why do I get different results when running a curl command from the terminal and from Java?


I have read many suggestions on how to run curl from Java or its derivatives, for example "curl command in Java" and "using curl command in Java".

I have also figured out how to fetch the metadata of a given resource from its DOI. Following those instructions, I would like to run this curl command from a small Java snippet and process the result.

Let's take an example. The URL is http://dx.doi.org/10.1016/j.immuni.2015.09.001.

Running curl command from a terminal

curl -LH "Accept: application/x-bibtex" http://dx.doi.org/10.1016/j.immuni.2015.09.001

The output looks like

@article{Biswas_2015,
    doi = {10.1016/j.immuni.2015.09.001},
    url = {https://doi.org/10.1016%2Fj.immuni.2015.09.001},
    year = 2015,
    month = {sep},
    publisher = {Elsevier {BV}},
    volume = {43},
    number = {3},
    pages = {435--449},
    author = {Subhra~K. Biswas},
    title = {Metabolic Reprogramming of Immune Cells in Cancer Progression},
    journal = {Immunity}
}

Running this curl command in Groovy

Reusing some code shared on this site, I wrote the following.

Map result = [:]
String command = "curl -LH 'Accept: application/x-bibtex' http://dx.doi.org/10.1016/j.immuni.2015.09.001"
Process process = Runtime.getRuntime().exec(command)
InputStream stream = process.getInputStream()
result.put("data", stream.text)
process.destroy()

What I get is a whole HTML page rather than the BibTeX entry I expected.

The question is: what am I doing wrong here? Has anyone else run into this issue?


Solution

  • exec does not go through a shell, so you cannot (and do not need to) quote arguments for a shell that is not there. Furthermore, exec(String) by default uses a string tokenizer, which basically splits at whitespace, and that makes it useless for anything but the simplest commands.

    You are almost always better off using the version that accepts a string array for the command (+ args).

    What you were effectively calling looked like this (note that the command gets split at whitespace, so I used \' to make my shell treat the quotes as literal characters):

    # curl -LH \'Accept: application/x-bibtex\' http://dx.doi.org/10.1016/j.immuni.2015.09.001
    curl: (6) Could not resolve host: application
    ... HTML ...
    

    The shortest route using groovy looks like this (note that exec also has a version for passing in an array of strings):

    groovy:000> ["curl", "-LH", "Accept: application/x-bibtex", "http://dx.doi.org/10.1016/j.immuni.2015.09.001"].execute().text
    ===> @article{Biswas_2015,
        doi = {10.1016/j.immuni.2015.09.001},
        url = {https://doi.org/10.1016%2Fj.immuni.2015.09.001},
        year = 2015,
        month = {sep},
        publisher = {Elsevier {BV}},
        volume = {43},
        number = {3},
        pages = {435--449},
        author = {Subhra~K. Biswas},
        title = {Metabolic Reprogramming of Immune Cells in Cancer Progression},
        journal = {Immunity}
    }
    

    If you need "shell-isms", then use ["sh", "-c", command] instead.
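
Since the question also asks about plain Java, here is a rough sketch of the same idea there. The class name CurlFromJava and the helpers tokenize and run are made up for illustration, and the code assumes Java 9 or later (for List.of and readAllBytes) and a curl binary on the PATH. tokenize mimics the whitespace splitting that exec(String) performs, so you can see where the quoted header breaks apart; run passes each argument as its own list element through ProcessBuilder, so no quoting is involved.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Illustrative sketch; class and method names are hypothetical. Assumes Java 9+.
final class CurlFromJava {

    // Splits a command on whitespace only, as Runtime.exec(String) does.
    // Shell quotes get no special treatment and stay in the tokens.
    static List<String> tokenize(String command) {
        StringTokenizer tokenizer = new StringTokenizer(command);
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    // Runs the argument list without a shell and returns its standard output.
    static String run(List<String> argv) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(argv)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // pass the child's stderr through
                .start();
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command failed with exit code " + exitCode);
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        String url = "http://dx.doi.org/10.1016/j.immuni.2015.09.001";

        // The quoted header is torn into two tokens: 'Accept: and application/x-bibtex'
        for (String token : tokenize("curl -LH 'Accept: application/x-bibtex' " + url)) {
            System.out.println("[" + token + "]");
        }

        // One list element per argument, so the header stays in one piece.
        System.out.println(run(List.of("curl", "-LH", "Accept: application/x-bibtex", url)));

        // Shell variant: here the single quotes are interpreted by sh.
        System.out.println(run(List.of("sh", "-c", "curl -LH 'Accept: application/x-bibtex' " + url)));
    }
}
```

Reading all of stdout before calling waitFor keeps the child from blocking on a full pipe. For large outputs or long-running commands you would probably want to stream the output instead of collecting it into one string.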