curl, apache-spark, databricks

Using curl within a Databricks+Spark notebook


I'm running a Spark cluster using Databricks. I'd like to transfer data from a server using curl. For example,

curl -H "Content-Type: application/json" -H "auth:xxxx" -X GET "https://websites.net/Automation/Offline?startTimeInclusive=201609240100&endTimeExclusive=201609240200&dataFormat=json" -k > automation.json

How does one do this within a Databricks notebook (preferably in python, but Scala is also okay)?


Solution

  • In Scala, you can shell out to curl with sys.process. The command is passed through
    /bin/bash -c so that the > redirection is handled by the shell:

    import sys.process._

    // Full curl command, including the shell redirect to a local file on the driver node
    val command = """curl -H "Content-Type: application/json" -H "auth:xxxx" -X GET "http://google.com" -k > /home/user/automation.json"""

    // Run via bash -c; .!! returns stdout and throws an exception on a non-zero exit code
    Seq("/bin/bash", "-c", command).!!