Creating External Tables - Examples
- Example 1 - Single gpfdist instance on single-NIC machine
- Example 2 - Multiple gpfdist instances
- Example 3 - Multiple gpfdists instances
- Example 4 - Single gpfdist instance with error logging
- Example 5 - Readable Web External Table with Script
- Example 6 - Writable External Table with gpfdist
- Example 7 - Writable External Web Table with Script
- Example 8 - Readable and Writable External Tables with XML Transformations
The following examples show how to define external data with different protocols. Each CREATE EXTERNAL TABLE
command can contain only one protocol.
Note: When using IPv6, always enclose the numeric IP addresses in square brackets.
Start gpfdist
before you create external tables with the gpfdist
protocol. The following code starts the gpfdist
file server program in the background on port 8081 serving files from directory /var/data/staging
. The logs are saved in /home/gpadmin/log
.
$ gpfdist -p 8081 -d /var/data/staging -l /home/gpadmin/log &
Example 1 - Single gpfdist instance on single-NIC machine
Creates a readable external table, ext_expenses
, using the gpfdist
protocol. The files are formatted with a pipe (|) as the column delimiter.
=# CREATE EXTERNAL TABLE ext_expenses
( name text, date date, amount float4, category text, desc1 text )
LOCATION ('gpfdist://etlhost-1:8081/*')
FORMAT 'TEXT' (DELIMITER '|');
Example 2 - Multiple gpfdist instances
Creates a readable external table, ext_expenses, using the gpfdist
protocol from all files with the txt extension. The column delimiter is a pipe ( | ) and NULL is a space (’ ’).
=# CREATE EXTERNAL TABLE ext_expenses
( name text, date date, amount float4, category text, desc1 text )
LOCATION ('gpfdist://etlhost-1:8081/*.txt', 'gpfdist://etlhost-2:8081/*.txt')
FORMAT 'TEXT' ( DELIMITER '|' NULL ' ') ;
Example 3 - Multiple gpfdists instances
Creates a readable external table, ext_expenses, from all files with the txt extension using the gpfdists
protocol. The column delimiter is a pipe ( | ) and NULL is a space (’ ’). For information about the location of security certificates, see gpfdists Protocol.
- Run
gpfdist
with the--ssl
option. Run the following command.
=# CREATE EXTERNAL TABLE ext_expenses ( name text, date date, amount float4, category text, desc1 text ) LOCATION ('gpfdists://etlhost-1:8081/*.txt', 'gpfdists://etlhost-2:8082/*.txt') FORMAT 'TEXT' ( DELIMITER '|' NULL ' ') ;
Example 4 - Single gpfdist instance with error logging
Uses the gpfdist protocol to create a readable external table, ext_expenses,
from all files with the txt extension. The column delimiter is a pipe ( | ) and NULL (’ ’) is a space.
Access to the external table is single row error isolation mode. Input data formatting errors can be captured so that you can view the errors, fix the issues, and then reload the rejected data. If the error count on a segment is greater than five (the SEGMENT REJECT LIMIT
value), the entire external table operation fails and no rows are processed.
=# CREATE EXTERNAL TABLE ext_expenses
( name text, date date, amount float4, category text, desc1 text )
LOCATION ('gpfdist://etlhost-1:8081/*.txt', 'gpfdist://etlhost-2:8082/*.txt')
FORMAT 'TEXT' ( DELIMITER '|' NULL ' ')
LOG ERRORS INTO expenses_errs SEGMENT REJECT LIMIT 5;
To create the readable ext_expenses
table from CSV-formatted text files:
=# CREATE EXTERNAL TABLE ext_expenses
( name text, date date, amount float4, category text, desc1 text )
LOCATION ('gpfdist://etlhost-1:8081/*.txt', 'gpfdist://etlhost-2:8082/*.txt')
FORMAT 'CSV' ( DELIMITER ',' )
LOG ERRORS INTO expenses_errs SEGMENT REJECT LIMIT 5;
Example 5 - Readable Web External Table with Script
Creates a readable web external table that executes a script once on five virtual segments:
=# CREATE EXTERNAL WEB TABLE log_output (linenum int, message text)
EXECUTE '/var/load_scripts/get_log_data.sh' ON 5
FORMAT 'TEXT' (DELIMITER '|');
Example 6 - Writable External Table with gpfdist
Creates a writable external table, sales_out, that uses gpfdist
to write output data to the file sales.out. The column delimiter is a pipe ( | ) and NULL is a space (’ ’). The file will be created in the directory specified when you started the gpfdist file server.
=# CREATE WRITABLE EXTERNAL TABLE sales_out (LIKE sales)
LOCATION ('gpfdist://etl1:8081/sales.out')
FORMAT 'TEXT' ( DELIMITER '|' NULL ' ')
DISTRIBUTED BY (txn_id);
Example 7 - Writable External Web Table with Script
Creates a writable external web table, campaign_out
, that pipes output data recieved by the segments to an executable script, to_adreport_etl.sh
:
=# CREATE WRITABLE EXTERNAL WEB TABLE campaign_out
(LIKE campaign)
EXECUTE '/var/unload_scripts/to_adreport_etl.sh' ON 6
FORMAT 'TEXT' (DELIMITER '|');
Example 8 - Readable and Writable External Tables with XML Transformations
HAWQ can read and write XML data to and from external tables with gpfdist. For information about setting up an XML transform, see Transforming XML Data.