gpfdist
Serves data files to or writes data files out from HAWQ segments.
Synopsis
gpfdist [-d <directory>] [-p <http_port>] [-l <log_file>] [-t <timeout>]
[-S] [-w <time>] [-v | -V] [-s] [-m <max_length>] [--ssl <certificate_path>]
gpfdist -? | --help
gpfdist --version
Description
gpfdist is the HAWQ parallel file distribution program. It is used by readable external tables and hawq load to serve external table files to all HAWQ segments in parallel. It is used by writable external tables to accept output streams from HAWQ segments in parallel and write them out to a file.
For an external table to use gpfdist, the LOCATION clause of the external table definition must specify the external table data using the gpfdist:// protocol (see the HAWQ command CREATE EXTERNAL TABLE).
Note: If the --ssl option is specified to enable SSL security, create the external table with the gpfdists:// protocol.
The benefit of using gpfdist is that it provides maximum parallelism while reading from or writing to external tables, which offers the best performance as well as easier administration of external tables.
For readable external tables, gpfdist parses and serves data files evenly to all the segment instances in the HAWQ system when users SELECT from the external table. For writable external tables, gpfdist accepts parallel output streams from the segments when users INSERT into the external table, and writes to an output file.
For readable external tables, if load files are compressed using gzip or bzip2 (have a .gz or .bz2 file extension), gpfdist uncompresses the files automatically before loading, provided that gunzip or bunzip2 is in your path.
Note: Currently, readable external tables do not support compression on Windows platforms, and writable external tables do not support compression on any platform.
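The dependency on gunzip and bunzip2 can be checked before a load starts. The following is a minimal sketch, not part of HAWQ: required_decompressor and check_load_files are hypothetical helper names, and the mapping from file extension to tool is taken from the paragraph above.

```shell
# Hypothetical helpers, not part of HAWQ.

# Print the tool that must be on the PATH for a given load file,
# or print nothing if the file is not compressed.
required_decompressor() {
    case $1 in
        *.gz)  echo gunzip ;;
        *.bz2) echo bunzip2 ;;
    esac
}

# Report every load file whose decompression tool is missing from the PATH.
# Returns 0 if all required tools were found, 1 otherwise.
check_load_files() {
    status=0
    for f in "$@"; do
        tool=$(required_decompressor "$f")
        if [ -n "$tool" ] && ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing $tool for $f" >&2
            status=1
        fi
    done
    return $status
}

# Example (the directory is taken from the Examples section):
#   check_load_files /var/load_files/*
```

Running the check on the machine that hosts gpfdist matters, because that is where the files are uncompressed.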
For information about running gpfdist on your ETL machines, refer to Client-Based HAWQ Load Tools.
Note: When using IPv6, always enclose the numeric IP address in brackets.
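The protocol and IPv6 rules described above can be combined when composing the URL for a LOCATION clause. This is a minimal sketch under stated assumptions: gpfdist_location is a hypothetical helper (not part of HAWQ), and the host names, address, and file name in the sample calls are placeholders.

```shell
# Hypothetical helper, not part of HAWQ.
# Usage: gpfdist_location HOST PORT PATH [ssl]
gpfdist_location() {
    host=$1
    port=$2
    path=$3
    scheme=gpfdist
    # Use the gpfdists:// protocol when gpfdist was started with --ssl.
    if [ "${4:-}" = "ssl" ]; then
        scheme=gpfdists
    fi
    # A numeric IPv6 address contains colons and must be enclosed in brackets.
    case $host in
        \[*\]) ;;                    # already bracketed, leave unchanged
        *:*)   host="[$host]" ;;
    esac
    # Strip one leading slash so the path is not doubled after the port.
    printf '%s://%s:%s/%s\n' "$scheme" "$host" "$port" "${path#/}"
}

gpfdist_location etl1 8081 sales.txt
gpfdist_location 2001:db8::10 8081 sales.txt
gpfdist_location etl1 8081 sales.txt ssl
```

The three calls print gpfdist://etl1:8081/sales.txt, gpfdist://[2001:db8::10]:8081/sales.txt, and gpfdists://etl1:8081/sales.txt respectively.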
You can also run gpfdist as a Windows Service. See Running gpfdist as a Windows Service for more details.
Options
-d <directory>
The directory from which gpfdist will serve files for readable external tables or create output files for writable external tables. If not specified, defaults to the current directory.
-p <http_port>
The HTTP port on which gpfdist will serve files. Defaults to 8080.
-t <timeout>
Sets the time allowed for HAWQ to establish a connection to a gpfdist process. Default is 5 seconds. Allowed values are 2 to 600 seconds. May need to be increased on systems with a lot of network traffic.
-m <max_length>
Sets the maximum allowed data row length. Use this option when user data includes very wide rows (or when the line too long error message occurs). Do not use it otherwise, as it increases resource allocation. Valid range is 32K to 256MB. (The upper limit is 1MB on Windows systems.)
-s
Enables simplified logging. When this option is specified, only messages with WARN level and higher are written to the gpfdist log file. INFO level messages are not written to the log file. If this option is not specified, all gpfdist messages are written to the log file.
You can specify this option to reduce the information written to the log file.
-S
Opens the file for synchronous I/O with the O_SYNC flag. Any writes to the resulting file descriptor block gpfdist until the data is physically written to the underlying hardware.
-w <time>
Sets the number of seconds that HAWQ waits before closing a target file. For a HAWQ system with multiple segments, there might be a delay between segments when writing data from different segments to the file. You can specify a time to wait before HAWQ closes the file to ensure all the data is written to the file.
--ssl <certificate_path>
Adds SSL encryption to data transferred with gpfdist. After executing gpfdist with the --ssl <certificate_path> option, the only way to load data from this file server is with the gpfdists:// protocol.
The location specified in <certificate_path> must contain the following files:
- The server certificate file, server.crt
- The server private key file, server.key
- The trusted certificate authorities, root.crt
The root directory (/) cannot be specified as <certificate_path>.
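Some of the limits stated in this section can be checked before gpfdist is started. The functions below are a minimal sketch, not part of HAWQ: valid_timeout, valid_max_length, and valid_cert_dir are hypothetical names, and the sketch assumes that the -m value is given as a byte count on a non-Windows system.

```shell
# Hypothetical pre-flight checks for gpfdist options; not part of HAWQ.

# -t: allowed values are 2 to 600 seconds.
valid_timeout() {
    [ "$1" -ge 2 ] 2>/dev/null && [ "$1" -le 600 ]
}

# -m: valid range is 32K to 256MB (assumed here to be expressed in bytes).
valid_max_length() {
    [ "$1" -ge 32768 ] 2>/dev/null && [ "$1" -le 268435456 ]
}

# --ssl: the directory must contain server.crt, server.key and root.crt,
# and the root directory (/) is not allowed.
valid_cert_dir() {
    dir=$1
    [ -n "$dir" ] || return 1
    [ "$dir" != "/" ] || return 1
    for f in server.crt server.key root.crt; do
        [ -f "$dir/$f" ] || return 1
    done
}

# Example (the certificate path is a placeholder):
#   valid_timeout 30 && valid_cert_dir /home/gpadmin/certs \
#       && echo "options look valid"
```

Each function returns 0 when the value is acceptable and a nonzero status otherwise, so the checks can be chained with && in a start-up script.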
Running gpfdist as a Windows Service
HAWQ Loaders allow gpfdist to run as a Windows Service.
Follow the instructions below to download, register, and activate gpfdist as a service:
Update your HAWQ Loaders for Windows package to the latest version. See HAWQ Loader Tools for Windows for install and configuration information.
Register gpfdist as a Windows service:
- Open a Windows command window.
- Run the following command:
sc create gpfdist binpath= "<loader_install_dir>\bin\gpfdist.exe -p 8081 -d \"<external_load_files_path>\" -l \"<log_file_path>\""
You can create multiple instances of gpfdist by running the same command again, with a unique name and port number for each instance:
sc create gpfdistN binpath= "<loader_install_dir>\bin\gpfdist.exe -p 8082 -d \"<external_load_files_path>\" -l \"<log_file_path>\""
Activate the gpfdist service:
- Open the Windows Control Panel and select Administrative Tools > Services.
- Highlight, then right-click, the gpfdist service in the list of services. Select Properties from the right-click menu. The Service Properties window opens.
Note that you can also stop this service from the Service Properties window.
- Optional: Change the Startup Type to Automatic (after a system restart, this service will be running), then under Service status, click Start.
- Click OK.
Repeat the above steps for each instance of gpfdist that you created.
Examples
To serve files from a specified directory using port 8081 (and start gpfdist in the background):
$ gpfdist -d /var/load_files -p 8081 &
To start gpfdist in the background and redirect output and errors to a log file:
$ gpfdist -d /var/load_files -p 8081 -l /home/gpadmin/log &
To stop gpfdist when it is running in the background:
- First find its process ID:
$ ps ax | grep gpfdist
- Then kill the process, for example:
$ kill 3456
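As an alternative to searching the process list, the process ID can be saved when gpfdist is started and read back when it is stopped. This is a minimal sketch rather than a HAWQ feature: start_recorded and stop_recorded are hypothetical helpers, and the PID file path in the usage comment is a placeholder.

```shell
# Hypothetical helpers, not part of HAWQ.

# Run a command in the background and save its process ID to a file.
# Usage: start_recorded PIDFILE COMMAND [ARGS...]
start_recorded() {
    pidfile=$1
    shift
    "$@" &
    echo $! > "$pidfile"
}

# Stop the process whose ID was saved in the file, then remove the file.
# Usage: stop_recorded PIDFILE
stop_recorded() {
    pidfile=$1
    kill "$(cat "$pidfile")" && rm -f "$pidfile"
}

# Example (directory and port are taken from the examples above):
#   start_recorded /tmp/gpfdist.pid gpfdist -d /var/load_files -p 8081
#   stop_recorded /tmp/gpfdist.pid
```

Saving the ID avoids a common pitfall of the ps and grep approach, where the grep command itself appears in the output.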