About gpfdist Setup and Performance

Consider the following scenarios for optimizing your ETL network performance.

  • Allow network traffic to use all ETL host Network Interface Cards (NICs) simultaneously. Run one instance of gpfdist on the ETL host, then declare the host name of each NIC in the LOCATION clause of your external table definition (see Creating External Tables - Examples).

Figure: External Table Using Single gpfdist Instance with Multiple NICs

  • Divide external table data equally among multiple gpfdist instances on the ETL host. For example, on an ETL system with two NICs, run two gpfdist instances (one on each NIC) and divide the external table data files evenly between the two instances to optimize data load performance.

Figure: External Tables Using Multiple gpfdist Instances with Multiple NICs
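
The two scenarios above might be sketched as the following external table definitions. This is only an illustration under stated assumptions: the host names etlhost-1 and etlhost-2 (standing for the two NICs of one ETL host), the ports, the directories, and the table and column names are all hypothetical. The gpfdist commands in the comments use only the -d (directory) and -p (port) options.

```sql
-- Hypothetical names throughout: etlhost-1 and etlhost-2 are assumed to
-- resolve to the two NICs of a single ETL host.

-- Scenario 1: one gpfdist instance, reached through both NICs.
-- Assumed to be started on the ETL host with something like:
--   gpfdist -d /var/load_files -p 8081 &
CREATE EXTERNAL TABLE ext_expenses_single (
    name         text,
    expense_date date,
    amount       float4,
    category     text
)
LOCATION (
    'gpfdist://etlhost-1:8081/*.txt',
    'gpfdist://etlhost-2:8081/*.txt'
)
FORMAT 'TEXT' (DELIMITER '|');

-- Scenario 2: two gpfdist instances, each serving half of the data files.
-- Assumed to be started on the ETL host with something like:
--   gpfdist -d /var/load_files1 -p 8081 &
--   gpfdist -d /var/load_files2 -p 8082 &
CREATE EXTERNAL TABLE ext_expenses_split (
    name         text,
    expense_date date,
    amount       float4,
    category     text
)
LOCATION (
    'gpfdist://etlhost-1:8081/*.txt',
    'gpfdist://etlhost-2:8082/*.txt'
)
FORMAT 'TEXT' (DELIMITER '|');
```

In the second definition the files are assumed to have been split evenly between the two directories beforehand, so that each instance serves a similar share of the load.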

Note: Use pipes (|) to separate formatted text when you submit files to gpfdist. HAWQ encloses comma-separated text strings in single or double quotes, and gpfdist has to remove the quotes to parse the strings. Using pipes as the delimiter avoids this extra step and improves performance.
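
As a rough sketch of the difference the note describes, the two definitions below read the same hypothetical data in pipe-delimited and in quoted comma-separated form. The host name, port, file names, table names, columns, and sample data lines are assumptions made for illustration only.

```sql
-- Pipe-delimited text. A data line is assumed to look like:
--   Alice|2024-01-15|42.50|travel
CREATE EXTERNAL TABLE ext_expenses_pipe (
    name         text,
    expense_date date,
    amount       float4,
    category     text
)
LOCATION ('gpfdist://etlhost-1:8081/expenses.txt')
FORMAT 'TEXT' (DELIMITER '|');

-- Quoted comma-separated text. A data line is assumed to look like:
--   "Alice","2024-01-15","42.50","travel"
-- The quotes around each string have to be removed during parsing.
CREATE EXTERNAL TABLE ext_expenses_csv (
    name         text,
    expense_date date,
    amount       float4,
    category     text
)
LOCATION ('gpfdist://etlhost-1:8081/expenses.csv')
FORMAT 'CSV';
```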