Configuration File Format
The gpfdist configuration file uses the YAML 1.1 document format and implements a schema for defining the transformation parameters. The configuration file must be a valid YAML document.
The gpfdist program processes the document in order and uses indentation (spaces) to determine the document hierarchy and the relationships of the sections to one another. White space is therefore significant: do not use it for decorative formatting, and do not use tabs.
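Because tabs are not allowed, it can help to scan a configuration file for them before handing it to gpfdist. The following is a minimal sketch and not part of gpfdist; the function name check_no_tabs is hypothetical.

```shell
# Hypothetical helper: fail if the given file contains a tab character.
check_no_tabs() {
    tab=$(printf '\t')
    # grep -n prints each offending line with its line number.
    if grep -n "$tab" "$1"; then
        echo "error: $1 contains tab characters" >&2
        return 1
    fi
}
```

It could be called as check_no_tabs config.yaml, where config.yaml stands for whatever name the configuration file has.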
The following is the basic structure of a configuration file.
---
VERSION: 1.0.0.1
TRANSFORMATIONS:
  transformation_name1:
    TYPE: input | output
    COMMAND: command
    CONTENT: data | paths
    SAFE: posix-regex
    STDERR: server | console
  transformation_name2:
    TYPE: input | output
    COMMAND: command
...
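To make the structure concrete, the following sketch writes a small configuration file with one input and one output transformation. The file name config.yaml, the transformation names prices_input and prices_output, and the script names are all hypothetical.

```shell
# Write a sample configuration file. The quoted 'EOF' keeps the shell
# from expanding anything inside the document; indentation uses spaces only.
cat > config.yaml <<'EOF'
---
VERSION: 1.0.0.1
TRANSFORMATIONS:
  prices_input:
    TYPE: input
    COMMAND: /bin/bash input_transform.sh %filename%
    CONTENT: data
    STDERR: server
  prices_output:
    TYPE: output
    COMMAND: /bin/bash output_transform.sh %filename%
EOF

# Assumption: gpfdist is then started with this file through its
# configuration option, for example: gpfdist -c config.yaml
# Check the gpfdist reference for your version before relying on this.
```

Both entries sit two spaces under TRANSFORMATIONS, and their settings sit two spaces further in, so the hierarchy is expressed by indentation alone.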
VERSION
Required. The version of the gpfdist configuration file schema. The current version is 1.0.0.1.
TRANSFORMATIONS
Required. Begins the transformation specification section. A configuration file must have at least one transformation. When gpfdist receives a transformation request, it looks in this section for an entry with the matching transformation name.
TYPE
Required. Specifies the direction of transformation. Values are input or output.
- input: gpfdist treats the standard output of the transformation process as a stream of records to load into HAWQ.
- output: gpfdist treats the standard input of the transformation process as a stream of records from HAWQ to transform and write to the appropriate output.
COMMAND
Required. Specifies the command gpfdist will execute to perform the transformation.
For input transformations, gpfdist invokes this command as directed by the CONTENT setting. The command is expected to open the underlying file(s) as appropriate and produce one line of TEXT for each row to load into HAWQ. The input transform determines whether the entire content should be converted to one row or to multiple rows.
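As an illustration of an input transformation, the following sketch shows what the body of a script such as input_transform.sh might look like. The source layout (blocks of lines separated by blank lines), the pipe delimiter, and the function name are assumptions made for the example.

```shell
# Hypothetical input transform: $1 is the path that gpfdist substitutes
# for %filename% (CONTENT: data). Each blank-line-separated block in the
# file is written to standard output as one pipe-delimited line.
input_transform() {
    # RS = "" puts awk in paragraph mode; assigning $1 = $1 rebuilds
    # the record with OFS between the fields.
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = "|" }
         { $1 = $1; print }' "$1"
}
```

A script built on this would end by calling input_transform "$1". Here the transform emits multiple rows per file; emitting the whole file as a single row would be an equally valid choice for a different data set.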
For output transformations, gpfdist invokes this command as directed by the CONTENT setting. The output command is expected to open and write to the underlying file(s) as appropriate. The output transformation determines the final placement of the converted output.
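An output transformation works in the opposite direction: it reads rows from standard input and writes the result itself. The sketch below assumes that rows arrive as pipe-delimited text, one row per line; that format, the target layout, and the function name are hypothetical.

```shell
# Hypothetical output transform: $1 is the path that gpfdist substitutes
# for %filename% (CONTENT: data). Rows arrive on standard input; each row
# is written to the target file as a block of lines followed by a blank line.
output_transform() {
    awk 'BEGIN { FS = "|" }
         { for (i = 1; i <= NF; i++) print $i; print "" }' > "$1"
}
```

Because the command opens the target file itself, it decides where the converted output ends up, which is the "final placement" responsibility described above.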
CONTENT
Optional. The values are data and paths. The default value is data.
- When CONTENT specifies data, the text %filename% in the COMMAND section is replaced by the path to the file to read or write.
- When CONTENT specifies paths, the text %filename% in the COMMAND section is replaced by the path to the temporary file that contains the list of files to read or write.
The following is an example of a COMMAND section showing the text %filename% that is replaced.
COMMAND: /bin/bash input_transform.sh %filename%
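With CONTENT set to paths, the command receives a list file rather than a data file, so it has to walk the list itself. The following is a minimal sketch under the assumption that the temporary file holds one path per line; the source does not state the list format, and the function name is hypothetical.

```shell
# Hypothetical handler for CONTENT: paths. $1 is the temporary list file
# that gpfdist substitutes for %filename%.
# Assumption: the list file holds one path per line.
transform_listed_files() {
    while IFS= read -r path; do
        # Skip blank lines in the list.
        if [ -n "$path" ]; then
            # A real transform would convert the file; cat passes it through.
            cat -- "$path"
        fi
    done < "$1"
}
```

Reading with IFS= and read -r keeps leading spaces and backslashes in each path intact.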
SAFE
Optional. A POSIX regular expression that the paths must match to be passed to the transformation. Specify SAFE when there is a concern about injection or improper interpretation of paths passed to the command. The default is no restriction on paths.
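A candidate SAFE expression can be tried against sample paths before it goes into the configuration file. The sketch below uses grep -E for this and assumes an extended regular expression anchored with ^ and $; whether gpfdist applies the same regular expression flavor and matching rules is an assumption to verify. The pattern and function name are hypothetical.

```shell
# Hypothetical pre-check: succeed only if the path matches the pattern.
is_safe_path() {
    printf '%s\n' "$1" | grep -Eq -- "$2"
}

# Example pattern (an assumption, not taken from the source): .xml files
# under /data whose names use only letters, digits, _, / and -.
# Dots are allowed only in the final .xml, which rules out ".." segments.
SAFE_PATTERN='^/data/[A-Za-z0-9_/-]+\.xml$'
```

For example, is_safe_path /data/in/prices.xml "$SAFE_PATTERN" succeeds, while a path containing spaces, semicolons, or ".." fails.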
STDERR
Optional. The values are server and console.
This setting specifies how to handle standard error output from the transformation. The default, server, specifies that gpfdist will capture the standard error output from the transformation in a temporary file and send the first 8k of that file to HAWQ as an error message. The error message will appear as a SQL error. The console value specifies that gpfdist does not redirect or transmit the standard error output from the transformation.
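The capture-and-truncate behavior described for server can be imitated locally to see how much of a transformation's error output would survive. This is only an illustration, not gpfdist's implementation; it assumes "8k" means 8192 bytes and relies on head -c, which is widely available but not guaranteed everywhere. The function name is hypothetical.

```shell
# Hypothetical illustration of the "server" behavior: run a command,
# capture its standard error in a temporary file, and print at most the
# first 8192 bytes of it. The command's standard output is discarded to
# keep the sketch short.
first_8k_of_stderr() {
    err_file=$(mktemp)
    status=0
    "$@" >/dev/null 2>"$err_file" || status=$?
    head -c 8192 "$err_file"
    rm -f "$err_file"
    # Pass the command's exit status back to the caller.
    return "$status"
}
```

Running a transformation command through this function shows that anything written after the first 8192 bytes of standard error is cut off, so the most useful diagnostics should be written first.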