Handling Load Errors
Readable external tables are most commonly used to select data to load into regular database tables. You use the CREATE TABLE AS SELECT
or INSERT INTO
commands to query the external table data. By default, if the data contains an error, the entire command fails and the data is not loaded into the target database table.
The SEGMENT REJECT LIMIT
clause allows you to isolate format errors in external table data and to continue loading correctly formatted rows. Use SEGMENT REJECT LIMIT
to set an error threshold, specifying the reject limit count
as number of ROWS
(the default) or as a PERCENT
of total rows (1-100).
The entire external table operation is aborted, and no rows are processed, if the number of error rows reaches the SEGMENT REJECT LIMIT
. The limit of error rows is per-segment, not per entire operation. The operation processes all good rows, and it discards and optionally logs formatting errors for erroneous rows, if the number of error rows does not reach the SEGMENT REJECT LIMIT
.
The LOG ERRORS
clause allows you to keep error rows for further examination. For information about the LOG ERRORS
clause, see the CREATE EXTERNAL TABLE
command.
When you set SEGMENT REJECT LIMIT
, HAWQ scans the external data in single row error isolation mode. Single row error isolation mode applies to external data rows with format errors such as extra or missing attributes, attributes of a wrong data type, or invalid client encoding sequences. HAWQ does not check constraint errors, but you can filter constraint errors by limiting the SELECT
from an external table at runtime. For example, to eliminate duplicate key errors:
=# INSERT INTO table_with_pkeys
SELECT DISTINCT * FROM external_table;