Write the gpfdist Configuration
The gpfdist configuration is a YAML 1.1 document. It specifies the rules that gpfdist uses to select a transform to apply when loading or extracting data.
This example gpfdist configuration contains the following items:
- the config.yaml file defining TRANSFORMATIONS
- the input_transform.sh wrapper script, referenced in the config.yaml file
- the input_transform.stx Joost transformation, called from input_transform.sh
Aside from the ordinary YAML rules, such as starting the document with three dashes (---), a gpfdist configuration must conform to the following restrictions:
- a VERSION setting must be present with the value 1.0.0.1.
- a TRANSFORMATIONS setting must be present and contain one or more mappings.
- Each mapping in TRANSFORMATIONS must contain:
  - a TYPE with the value 'input' or 'output'
  - a COMMAND indicating how the transform is run.
- Each mapping in TRANSFORMATIONS can contain optional CONTENT, SAFE, and STDERR settings.
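The restrictions above can be approximated with a quick pre-flight check before starting gpfdist. The following is a minimal sketch only: check_gpfdist_config is a hypothetical helper, not part of gpfdist, and it uses line-based pattern matching rather than a real YAML parser, so it can miss structural problems that gpfdist itself would reject.

```shell
#!/bin/bash
# check_gpfdist_config: hypothetical helper, not shipped with gpfdist.
# Rough line-based checks only; this does not parse YAML.
check_gpfdist_config() {
    local config_file=$1

    # The document should start with three dashes.
    [ "$(head -n 1 "$config_file")" = "---" ] \
        || { echo "missing --- header" >&2; return 1; }

    # VERSION must be present with the value 1.0.0.1.
    grep -Eq '^VERSION:[[:space:]]*1\.0\.0\.1[[:space:]]*$' "$config_file" \
        || { echo "missing or wrong VERSION" >&2; return 1; }

    # TRANSFORMATIONS must be present.
    grep -Eq '^TRANSFORMATIONS:[[:space:]]*$' "$config_file" \
        || { echo "missing TRANSFORMATIONS" >&2; return 1; }

    # At least one indented TYPE of input or output.
    grep -Eq '^[[:space:]]+TYPE:[[:space:]]*(input|output)[[:space:]]*$' "$config_file" \
        || { echo "no TYPE of input or output" >&2; return 1; }

    # At least one indented, non-empty COMMAND.
    grep -Eq '^[[:space:]]+COMMAND:[[:space:]]*[^[:space:]]' "$config_file" \
        || { echo "no COMMAND found" >&2; return 1; }

    echo "ok"
}

# Demo against a temporary copy of the prices configuration.
sample_config=$(mktemp)
cat > "$sample_config" <<'EOF'
---
VERSION: 1.0.0.1
TRANSFORMATIONS:
  prices_input:
    TYPE: input
    COMMAND: /bin/bash input_transform.sh %filename%
EOF
check_gpfdist_config "$sample_config"
rm -f "$sample_config"
```

Because the check only looks for at least one matching TYPE and COMMAND line, it does not confirm that every mapping carries both settings.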
The following gpfdist configuration, called config.yaml, applies to the prices example. The initial indentation on each line is significant and reflects the hierarchical nature of the specification. The name prices_input in the following example will be referenced later when creating the table in SQL.
---
VERSION: 1.0.0.1
TRANSFORMATIONS:
  prices_input:
    TYPE:     input
    COMMAND:  /bin/bash input_transform.sh %filename%
The COMMAND setting uses a wrapper script called input_transform.sh with a %filename% placeholder. When gpfdist runs the prices_input transform, it invokes input_transform.sh with /bin/bash and replaces the %filename% placeholder with the path to the input file to transform. The wrapper script called input_transform.sh contains the logic to invoke the STX transformation and return the output.
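The placeholder substitution described above can be illustrated in plain bash. This is a sketch of the effect only, not gpfdist's own implementation; the input path /var/load/prices.xml is a hypothetical example.

```shell
#!/bin/bash
# Illustration only: mimics the %filename% substitution; not gpfdist code.
command_template='/bin/bash input_transform.sh %filename%'
input_file='/var/load/prices.xml'   # hypothetical input path

# Replace every occurrence of the placeholder with the input path.
# Quoting the pattern makes bash treat it as a literal string.
placeholder='%filename%'
expanded_command=${command_template//"$placeholder"/$input_file}

echo "$expanded_command"
# prints: /bin/bash input_transform.sh /var/load/prices.xml
```

Inside the wrapper script, the substituted path arrives as the first positional argument, $1.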
If Joost is used, the Joost STX engine must be installed.
#!/bin/bash
# input_transform.sh - sample input transformation,
# demonstrating use of Java and Joost STX to convert XML into
# text to load into HAWQ.
# java arguments:
#   -jar joost.jar         joost STX engine
#   -nodecl                don't generate a <?xml?> declaration
#   $1                     filename to process
#   input_transform.stx    the STX transformation
#
# the AWK step eliminates a blank line joost emits at the end
java \
    -jar joost.jar \
    -nodecl \
    "$1" \
    input_transform.stx \
 | awk 'NF>0'
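The effect of the final awk stage can be seen in isolation. In the sketch below, emit_sample is a hypothetical stand-in for the Joost output, and the two records are made-up sample data; the point is only that the NF>0 pattern keeps lines that contain at least one field and drops empty ones.

```shell
#!/bin/bash
# emit_sample: hypothetical stand-in for Joost output, two records
# followed by a trailing blank line. The records are made-up data.
emit_sample() {
    printf 'ACME|12.50\nGLOBEX|7.25\n\n'
}

# Count lines before and after the same filter used in input_transform.sh.
lines_before=$(emit_sample | awk 'END { print NR }')
lines_after=$(emit_sample | awk 'NF>0' | awk 'END { print NR }')

echo "before: $lines_before, after: $lines_after"
# prints: before: 3, after: 2
```

Note that NF>0 removes every blank or whitespace-only line, not just the last one, which is harmless as long as the load data contains no intentionally empty rows.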
The input_transform.sh file uses the Joost STX engine with the AWK interpreter. The following diagram shows the process flow as gpfdist runs the transformation.
