Loading Data with hawq load
The HAWQ hawq load
utility loads data using readable external tables and the HAWQ parallel file server ( gpfdist
or gpfdists
). It handles parallel file-based external table setup and allows users to configure their data format, external table definition, and gpfdist
or gpfdists
setup in a single configuration file.
To use hawq load
- Ensure that your environment is set up to run
hawq load
. Some dependent files from your HAWQ /> installation are required, such asgpfdist
and Python, as well as network access to the HAWQ segment hosts. Create your load control file. This is a YAML-formatted file that specifies the HAWQ connection information,
gpfdist
configuration information, external table options, and data format.For example:
--- VERSION: 1.0.0.1 DATABASE: ops USER: gpadmin HOST: mdw-1 PORT: 5432 GPLOAD: INPUT: - SOURCE: LOCAL_HOSTNAME: - etl1-1 - etl1-2 - etl1-3 - etl1-4 PORT: 8081 FILE: - /var/load/data/* - COLUMNS: - name: text - amount: float4 - category: text - description: text - date: date - FORMAT: text - DELIMITER: '|' - ERROR_LIMIT: 25 - ERROR_TABLE: payables.err_expenses OUTPUT: - TABLE: payables.expenses - MODE: INSERT SQL: - BEFORE: "INSERT INTO audit VALUES('start', current_timestamp)" - AFTER: "INSERT INTO audit VALUES('end', current_timestamp)"
Run
hawq load
, passing in the load control file. For example:$ hawq load -f my_load.yml