hadoop hdfs parquet greenplum

Greenplum PXF - select from external table - invalid configuration


I have a Greenplum database up and running, and Parquet files stored in HDFS at /user/hadoopuser/raw/. I installed and started PXF and created an external table with:

create external table requests(id bigint, full_name text, req_date timestamp)
location('pxf://user/hadoopuser/raw?PROFILE=hdfs:parquet') format 'CUSTOM' (formatter='pxfwritable_import')

But when I try to access data with select * from requests I get the following error:

[08000] ERROR: PXF server error : invalid configuration for server 'default' (seg0 slice1 10.0.2.20:6000 pid=18636) Hint: Configure a valid value for 'pxf.fs.basePath' property for server 'default' to access the filesystem.

pxf-service.log only contains

java.io.IOException: org.greenplum.pxf.api.error.PxfRuntimeException: invalid configuration for server 'default'

What is the valid value for pxf.fs.basePath, where do I set it and why is this error happening?


Solution

  • PXF stores the configuration for each external data source (which it calls a "server") in either $PXF_HOME/servers/ (the default) or $PXF_BASE/servers/. Unless you have relocated $PXF_BASE (see Relocating $PXF_BASE in the docs), the configuration lives under $PXF_HOME, which is /usr/local/pxf-gp<GPDB-major-version>.

    In the $PXF_HOME/servers directory there should be one directory per external data source; typically there is one called default/. To access HDFS, this directory should contain:

    1. a copy of hdfs-site.xml
    2. a copy of core-site.xml
    3. a copy of pxf-site.xml (see $PXF_HOME/templates)
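
    As a rough sketch of the three steps above, the helper below copies the files into a server directory. The function name stage_pxf_server_conf is made up for illustration, and the example paths (/etc/hadoop/conf for the Hadoop client configuration, pxf-site.xml sitting directly under $PXF_HOME/templates) are assumptions about a typical install, so adjust them to yours.

```shell
#!/usr/bin/env bash
# Sketch only: stage the three files listed above for one PXF server.
# stage_pxf_server_conf is a hypothetical helper, not part of PXF.
stage_pxf_server_conf() {
    local hadoop_conf_dir="$1"   # e.g. /etc/hadoop/conf (assumed location)
    local templates_dir="$2"     # e.g. $PXF_HOME/templates
    local server_dir="$3"        # e.g. $PXF_HOME/servers/default

    mkdir -p "$server_dir" || return 1

    # Hadoop client configuration files, copied as-is from the cluster.
    local f
    for f in core-site.xml hdfs-site.xml; do
        if [ ! -f "$hadoop_conf_dir/$f" ]; then
            echo "missing $hadoop_conf_dir/$f" >&2
            return 1
        fi
        cp "$hadoop_conf_dir/$f" "$server_dir/$f" || return 1
    done

    # PXF's per-server settings; an existing copy is left untouched.
    if [ ! -f "$server_dir/pxf-site.xml" ]; then
        cp "$templates_dir/pxf-site.xml" "$server_dir/pxf-site.xml" || return 1
    fi
}

# Example call (all paths are assumptions about your environment):
# stage_pxf_server_conf /etc/hadoop/conf "$PXF_HOME/templates" "$PXF_HOME/servers/default"
# pxf cluster sync   # copy the configuration to the other Greenplum hosts
```

    After changing the server configuration you will likely also need to restart PXF before the external table picks it up.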