[SOLVED] How to integrate Palantir Foundry with Amazon S3 or HDFS

Within the Palantir Foundry platform, I am working on data integration. I need some help, as I am new to Palantir software. Are there any documents, white papers, links or tutorials on this topic?

How do I integrate data from another source, for example Amazon S3 or HDFS?

Solution

To integrate data from another platform, you'll need a source and a sync in Data Connection. You'll need platform permissions to create these; not all users have them, because creating them can involve the organisation's data governance policies.

Assuming you don't already have a source with a valid S3 configuration, you'll need to create one. In Data Connection, click "Sources" and then "New Source". You can then do this in two ways:

Use the prebuilt S3 source: click File System in the New Source dropdown and follow the wizard steps.
Use a custom connector such as magritte-rest: click Custom in the same dropdown.

For magritte-rest:

Select one of the available agents, or Cloud ingest, depending on your preferences.
Give the source a name and save it into a folder.
Add a configuration like:

type: magritte-rest
url: 'https://foobar.organization.s3.amazonaws.com'
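The `url` above is a virtual-hosted-style S3 endpoint, where the bucket name is the first part of the hostname. As a minimal sketch (the helpers `s3_base_url` and `render_source_config` are hypothetical, not part of Foundry), the value for `url` can be built from a bucket name and an optional AWS region, and rendered into the two-line configuration shown above:

```python
from typing import Optional


def s3_base_url(bucket: str, region: Optional[str] = None) -> str:
    """Build a virtual-hosted-style S3 base URL for the magritte-rest `url` field.

    Hypothetical helper, not a Foundry API. Without a region it uses the
    global endpoint form, as in the example configuration.
    """
    if not bucket:
        raise ValueError("bucket name must not be empty")
    if region:
        # Regional endpoint: https://<bucket>.s3.<region>.amazonaws.com
        return f"https://{bucket}.s3.{region}.amazonaws.com"
    # Global endpoint: https://<bucket>.s3.amazonaws.com
    return f"https://{bucket}.s3.amazonaws.com"


def render_source_config(url: str) -> str:
    """Render the source configuration text shown above for a given URL."""
    return f"type: magritte-rest\nurl: '{url}'\n"
```

For example, `render_source_config(s3_base_url("my-bucket", "eu-west-1"))` yields a config pointing at `https://my-bucket.s3.eu-west-1.amazonaws.com`. Bucket names containing dots, such as the placeholder `foobar.organization`, may fail TLS certificate validation with virtual-hosted-style HTTPS URLs, because the S3 wildcard certificate only covers a single hostname label; a dot-free bucket name avoids this.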

Now, to create the sync, use a configuration similar to this:

type: rest-source-adapter
method: GET
path: the/path/in/s3/yourdata
outputFileType: csv
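Conceptually, this sync sends a GET request to the source `url` with the sync's `path` appended, and stores the response as a file of the given `outputFileType`. The sketch below reproduces that request outside Foundry with Python's standard library, which can help confirm that the URL and path resolve before you configure the sync. It assumes the object is publicly readable: a private bucket requires AWS Signature Version 4 signed requests, which this sketch does not implement. The function names are hypothetical.

```python
import urllib.request
from pathlib import Path
from urllib.parse import quote


def build_object_url(base_url: str, path: str) -> str:
    """Join the source `url` and the sync `path` with exactly one slash."""
    # quote() percent-encodes characters such as spaces but keeps "/" as-is.
    return base_url.rstrip("/") + "/" + quote(path.lstrip("/"))


def output_filename(path: str, output_file_type: str) -> str:
    """Name the local file after the last path segment plus the output type."""
    stem = path.rstrip("/").rsplit("/", 1)[-1] or "output"
    return f"{stem}.{output_file_type}"


def fetch_object(base_url: str, path: str, output_file_type: str,
                 dest_dir: str = ".") -> Path:
    """GET the object (unsigned, so public objects only) and save it to disk."""
    url = build_object_url(base_url, path)
    dest = Path(dest_dir) / output_filename(path, output_file_type)
    with urllib.request.urlopen(url, timeout=30) as response:
        dest.write_bytes(response.read())
    return dest
```

With the placeholder values above, `fetch_object("https://foobar.organization.s3.amazonaws.com", "the/path/in/s3/yourdata", "csv")` would request `https://foobar.organization.s3.amazonaws.com/the/path/in/s3/yourdata` and save the response as `yourdata.csv`. An HTTP 403 from this call usually means the object is not public and the request would need signing.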

Other output file types are also supported (json, zip, ...).
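If you download a synced file to check it locally, how you read it depends on the output type. Here is a minimal standard-library sketch covering the csv, json and zip types mentioned above; the `read_output` name and the choice of return shapes are assumptions for illustration, not Foundry behaviour.

```python
import csv
import io
import json
import zipfile
from typing import Any


def read_output(data: bytes, output_file_type: str) -> Any:
    """Parse downloaded bytes according to the sync's outputFileType.

    - csv:  list of rows, each a dict keyed by the header row
    - json: the decoded JSON value
    - zip:  dict mapping each archive member name to its raw bytes
    """
    kind = output_file_type.lower()
    if kind == "csv":
        # newline="" lets the csv module handle line endings itself.
        text = data.decode("utf-8")
        return list(csv.DictReader(io.StringIO(text, newline="")))
    if kind == "json":
        return json.loads(data)
    if kind == "zip":
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return {name: archive.read(name) for name in archive.namelist()}
    raise ValueError(f"unsupported output file type: {output_file_type}")
```

Returning plain Python structures keeps the check independent of any third-party library; for large CSV files you would stream rows instead of building a full list.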