This project has retired. For details please refer to its Attic page.
Apache HAWQ (Incubating) Docs

Doc Index
  • Apache HAWQ (incubating)
  • System Requirements
  • HAWQ System Overview
    • What is HAWQ?
    • HAWQ Architecture
    • Table Distribution and Storage
    • Elastic Query Execution Runtime
    • Resource Management
    • HDFS Catalog Cache
    • Management Tools
    • High Availability, Redundancy and Fault Tolerance
  • Getting Started with HAWQ Tutorial
    • Lesson 1 - Runtime Environment
    • Lesson 2 - Cluster Administration
    • Lesson 3 - Database Administration
    • Lesson 4 - Sample Data Set and HAWQ Schemas
    • Lesson 5 - HAWQ Tables
    • Lesson 6 - HAWQ Extension Framework (PXF)
  • Running a HAWQ Cluster
    • Overview
    • Introducing the HAWQ Operating Environment
    • Managing HAWQ Using Ambari
      • Using the Ambari REST API
    • Starting and Stopping HAWQ
    • Expanding a Cluster
    • Removing a Node
    • Backing Up and Restoring HAWQ
    • High Availability in HAWQ
    • Master Mirroring
    • HAWQ Filespaces and High Availability Enabled HDFS
    • Understanding the Fault Tolerance Service
    • Recommended Monitoring and Maintenance Tasks
    • Routine System Maintenance Tasks
    • Monitoring a HAWQ System
    • HAWQ Administrative Log Files
  • Managing Resources
    • How HAWQ Manages Resources
    • Best Practices for Configuring Resource Management
    • Configuring Resource Management
    • Integrating YARN with HAWQ
    • Working with Hierarchical Resource Queues
    • Analyzing Resource Manager Status
  • Managing Client Access
    • Configuring Client Authentication
    • Using LDAP Authentication with TLS/SSL
    • Using Kerberos Authentication
    • Disabling Kerberos Security
    • Overview of HAWQ Authorization
    • Using HAWQ Native Authorization
    • Using Ranger for Authorization
      • Overview of Ranger Policy Management
      • Configuring HAWQ to use Ranger Policy Management
      • Creating HAWQ Authorization Policies in Ranger
        • HAWQ Resources and Permissions
        • SQL Command Permissions Summary
        • Using MADlib with Ranger Authorization
      • Auditing Authorization Events
    • Establishing a Database Session
    • Supported Client Applications
    • HAWQ Client Applications
    • Connecting with psql
    • HAWQ Database Drivers and APIs
    • Troubleshooting Connection Problems
  • Defining Database Objects
    • Overview
    • Creating and Managing Databases
    • Creating and Managing Tablespaces
    • Creating and Managing Schemas
    • Creating and Managing Tables
    • Identifying HAWQ Table HDFS Files
    • Choosing the Table Storage Model
    • Partitioning Large Tables
    • Creating and Managing Views
  • Using Procedural Languages
    • Using Languages in HAWQ
    • Using HAWQ Built-In Languages
    • Using PL/Java
    • Using PL/pgSQL
    • Using PL/Python
    • Using PL/R
  • Managing Data with HAWQ
    • Basic Data Operations
    • About Database Statistics
    • Concurrency Control
    • Working with Transactions
    • Loading and Unloading Data
      • Working with File-Based External Tables
        • Accessing File-Based External Tables
        • gpfdist Protocol
        • gpfdists Protocol
        • Handling Errors in External Table Data
      • Using the HAWQ File Server (gpfdist)
        • About gpfdist Setup and Performance
        • Controlling Segment Parallelism
        • Installing gpfdist
        • Starting and Stopping gpfdist
        • Troubleshooting gpfdist
      • Creating and Using Web External Tables
        • Command-based Web External Tables
        • URL-based Web External Tables
      • Loading Data Using an External Table
      • Loading and Writing Non-HDFS Custom Data
        • Using a Custom Format
          • Importing and Exporting Fixed Width Data
          • Examples - Read Fixed-Width Data
      • Creating External Tables - Examples
      • Handling Load Errors
        • Define an External Table with Single Row Error Isolation
        • Capture Row Formatting Errors and Declare a Reject Limit
        • Identifying Invalid CSV Files in Error Table Data
        • Moving Data between Tables
      • Registering Files into HAWQ Internal Tables
      • Loading Data with hawq load
      • Loading Data with COPY
      • Running COPY in Single Row Error Isolation Mode
      • Optimizing Data Load and Query Performance
      • Unloading Data from HAWQ
        • Defining a File-Based Writable External Table
          • Example - HAWQ file server (gpfdist)
        • Defining a Command-Based Writable External Web Table
          • Disabling EXECUTE for Web or Writable External Tables
        • Unloading Data Using a Writable External Table
        • Unloading Data Using COPY
      • Transforming XML Data
        • Determine the Transformation Schema
        • Write a Transform
        • Write the gpfdist Configuration
        • Load the Data
        • Transfer and Store the Data
          • Transforming with GPLOAD
          • Transforming with INSERT INTO SELECT FROM
          • Configuration File Format
        • XML Transformation Examples
          • Command-based Web External Tables
          • Example using IRS MeF XML Files (In demo Directory)
          • Example using WITSML™ Files (In demo Directory)
      • Formatting Data Files
        • Formatting Rows
        • Formatting Columns
        • Representing NULL Values
        • Escaping
          • Escaping in Text Formatted Files
          • Escaping in CSV Formatted Files
        • Character Encoding
    • HAWQ InputFormat for MapReduce
  • Using PXF with Unmanaged Data
    • Installing PXF Plugins
    • Configuring PXF
    • Accessing HDFS File Data
    • Accessing Hive Data
    • Accessing HBase Data
    • Accessing JSON Data
    • Writing Data to HDFS
    • Using Profiles to Read and Write Data
    • PXF External Tables and API
    • Troubleshooting PXF
  • Querying Data
    • About HAWQ Query Processing
    • About GPORCA
      • Overview of GPORCA
      • GPORCA Features and Enhancements
      • Enabling GPORCA
      • Considerations when Using GPORCA
      • Determining The Query Optimizer In Use
      • Changed Behavior with GPORCA
      • GPORCA Limitations
    • Defining Queries
    • Using Functions and Operators
    • Query Performance
    • Query Profiling
  • Best Practices
    • Configuring HAWQ
    • Operating HAWQ
    • Securing HAWQ
    • Managing Resources
    • Managing Data
    • Querying Data
  • Troubleshooting
    • Query Performance Issues
    • Rejection of Query Resource Requests
    • Queries Cancelled Due to High VMEM Usage
    • Segments Do Not Appear in gp_segment_configuration
    • Handling Segment Resource Fragmentation
  • HAWQ Reference
    • SQL Commands
      • ABORT
      • ALTER AGGREGATE
      • ALTER CONVERSION
      • ALTER DATABASE
      • ALTER FUNCTION
      • ALTER OPERATOR
      • ALTER OPERATOR CLASS
      • ALTER RESOURCE QUEUE
      • ALTER ROLE
      • ALTER SEQUENCE
      • ALTER TABLE
      • ALTER TABLESPACE
      • ALTER TYPE
      • ALTER USER
      • ANALYZE
      • BEGIN
      • CHECKPOINT
      • CLOSE
      • COMMIT
      • COPY
      • CREATE AGGREGATE
      • CREATE CAST
      • CREATE CONVERSION
      • CREATE DATABASE
      • CREATE EXTERNAL TABLE
      • CREATE FUNCTION
      • CREATE GROUP
      • CREATE LANGUAGE
      • CREATE OPERATOR
      • CREATE OPERATOR CLASS
      • CREATE RESOURCE QUEUE
      • CREATE ROLE
      • CREATE SCHEMA
      • CREATE SEQUENCE
      • CREATE TABLE
      • CREATE TABLE AS
      • CREATE TABLESPACE
      • CREATE TYPE
      • CREATE USER
      • CREATE VIEW
      • DEALLOCATE
      • DECLARE
      • DROP AGGREGATE
      • DROP CAST
      • DROP CONVERSION
      • DROP DATABASE
      • DROP EXTERNAL TABLE
      • DROP FILESPACE
      • DROP FUNCTION
      • DROP GROUP
      • DROP LANGUAGE
      • DROP OPERATOR
      • DROP OPERATOR CLASS
      • DROP OWNED
      • DROP RESOURCE QUEUE
      • DROP ROLE
      • DROP SCHEMA
      • DROP SEQUENCE
      • DROP TABLE
      • DROP TABLESPACE
      • DROP TYPE
      • DROP USER
      • DROP VIEW
      • END
      • EXECUTE
      • EXPLAIN
      • FETCH
      • GRANT
      • INSERT
      • PREPARE
      • REASSIGN OWNED
      • RELEASE SAVEPOINT
      • RESET
      • REVOKE
      • ROLLBACK
      • ROLLBACK TO SAVEPOINT
      • SAVEPOINT
      • SELECT
      • SELECT INTO
      • SET
      • SET ROLE
      • SET SESSION AUTHORIZATION
      • SHOW
      • TRUNCATE
      • VACUUM
    • Server Configuration Parameter Reference
      • About Server Configuration Parameters
      • Configuration Parameter Categories
        • Append-Only Table Parameters
        • Client Connection Default Parameters
        • Connection and Authentication Parameters
        • Database and Tablespace/Filespace Parameters
        • Error Reporting and Logging Parameters
        • External Table Parameters
        • GPORCA Parameters
        • HAWQ Array Configuration Parameters
        • HAWQ Extension Framework (PXF) Parameters
        • HAWQ PL/Java Extension Parameters
        • HAWQ Resource Management Parameters
        • Lock Management Parameters
        • Past PostgreSQL Version Compatibility Parameters
        • Query Tuning Parameters
        • Ranger Configuration Parameters
        • Statistics Collection Parameters
        • System Resource Consumption Parameters
      • Configuration Parameters
        • add_missing_from
        • application_name
        • array_nulls
        • authentication_timeout
        • backslash_quote
        • block_size
        • bonjour_name
        • check_function_bodies
        • client_encoding
        • client_min_messages
        • cpu_index_tuple_cost
        • cpu_operator_cost
        • cpu_tuple_cost
        • cursor_tuple_fraction
        • custom_variable_classes
        • DateStyle
        • db_user_namespace
        • deadlock_timeout
        • debug_assertions
        • debug_pretty_print
        • debug_print_parse
        • debug_print_plan
        • debug_print_prelim_plan
        • debug_print_rewritten
        • debug_print_slice_table
        • default_hash_table_bucket_number
        • default_statistics_target
        • default_tablespace
        • default_transaction_isolation
        • default_transaction_read_only
        • dynamic_library_path
        • effective_cache_size
        • enable_bitmapscan
        • enable_groupagg
        • enable_hashagg
        • enable_hashjoin
        • enable_indexscan
        • enable_mergejoin
        • enable_nestloop
        • enable_seqscan
        • enable_sort
        • enable_tidscan
        • escape_string_warning
        • explain_pretty_print
        • extra_float_digits
        • from_collapse_limit
        • gp_adjust_selectivity_for_outerjoins
        • gp_analyze_relative_error
        • gp_autostats_mode
        • gp_autostats_on_change_threshhold
        • gp_backup_directIO
        • gp_backup_directIO_read_chunk_mb
        • gp_cached_segworkers_threshold
        • gp_command_count
        • gp_connections_per_thread
        • gp_debug_linger
        • gp_dynamic_partition_pruning
        • gp_enable_agg_distinct
        • gp_enable_agg_distinct_pruning
        • gp_enable_direct_dispatch
        • gp_enable_fallback_plan
        • gp_enable_fast_sri
        • gp_enable_groupext_distinct_gather
        • gp_enable_groupext_distinct_pruning
        • gp_enable_multiphase_agg
        • gp_enable_predicate_propagation
        • gp_enable_preunique
        • gp_enable_sequential_window_plans
        • gp_enable_sort_distinct
        • gp_enable_sort_limit
        • gp_external_enable_exec
        • gp_external_grant_privileges
        • gp_external_max_segs
        • gp_filerep_tcp_keepalives_count
        • gp_filerep_tcp_keepalives_idle
        • gp_filerep_tcp_keepalives_interval
        • gp_hashjoin_tuples_per_bucket
        • gp_idf_deduplicate
        • gp_interconnect_cache_future_packets
        • gp_interconnect_default_rtt
        • gp_interconnect_fc_method
        • gp_interconnect_hash_multiplier
        • gp_interconnect_min_retries_before_timeout
        • gp_interconnect_min_rto
        • gp_interconnect_queue_depth
        • gp_interconnect_setup_timeout
        • gp_interconnect_snd_queue_depth
        • gp_interconnect_timer_checking_period
        • gp_interconnect_timer_period
        • gp_interconnect_type
        • gp_log_format
        • gp_max_csv_line_length
        • gp_max_databases
        • gp_max_filespaces
        • gp_max_packet_size
        • gp_max_plan_size
        • gp_max_tablespaces
        • gp_motion_cost_per_row
        • gp_reject_percent_threshold
        • gp_reraise_signal
        • gp_role
        • gp_safefswritesize
        • gp_segment_connect_timeout
        • gp_segments_for_planner
        • gp_session_id
        • gp_set_proc_affinity
        • gp_set_read_only
        • gp_statistics_pullup_from_child_partition
        • gp_statistics_use_fkeys
        • gp_vmem_idle_resource_timeout
        • gp_vmem_protect_segworker_cache_limit
        • gp_workfile_checksumming
        • gp_workfile_compress_algorithm
        • gp_workfile_limit_files_per_query
        • gp_workfile_limit_per_query
        • gp_workfile_limit_per_segment
        • hawq_acl_type
        • hawq_dfs_url
        • hawq_global_rm_type
        • hawq_master_address_host
        • hawq_master_address_port
        • hawq_master_directory
        • hawq_master_temp_directory
        • hawq_re_memory_overcommit_max
        • hawq_rm_cluster_report_period
        • hawq_rm_force_alterqueue_cancel_queued_request
        • hawq_rm_master_port
        • hawq_rm_memory_limit_perseg
        • hawq_rm_min_resource_perseg
        • hawq_rm_nresqueue_limit
        • hawq_rm_nslice_perseg_limit
        • hawq_rm_nvcore_limit_perseg
        • hawq_rm_nvseg_perquery_limit
        • hawq_rm_nvseg_perquery_perseg_limit
        • hawq_rm_nvseg_variance_amon_seg_limit
        • hawq_rm_rejectrequest_nseg_limit
        • hawq_rm_resource_idle_timeout
        • hawq_rm_return_percent_on_overcommit
        • hawq_rm_segment_heartbeat_interval
        • hawq_rm_segment_port
        • hawq_rm_stmt_nvseg
        • hawq_rm_stmt_vseg_memory
        • hawq_rm_tolerate_nseg_limit
        • hawq_rm_yarn_address
        • hawq_rm_yarn_app_name
        • hawq_rm_yarn_queue_name
        • hawq_rm_yarn_scheduler_address
        • hawq_rps_address_port
        • hawq_segment_address_port
        • hawq_segment_directory
        • hawq_segment_temp_directory
        • integer_datetimes
        • IntervalStyle
        • join_collapse_limit
        • krb_caseins_users
        • krb_server_keyfile
        • krb_srvname
        • lc_collate
        • lc_ctype
        • lc_messages
        • lc_monetary
        • lc_numeric
        • lc_time
        • listen_addresses
        • local_preload_libraries
        • log_autostats
        • log_connections
        • log_disconnections
        • log_dispatch_stats
        • log_duration
        • log_error_verbosity
        • log_executor_stats
        • log_hostname
        • log_min_duration_statement
        • log_min_error_statement
        • log_min_messages
        • log_parser_stats
        • log_planner_stats
        • log_rotation_age
        • log_rotation_size
        • log_statement
        • log_statement_stats
        • log_timezone
        • log_truncate_on_rotation
        • max_appendonly_tables
        • max_connections
        • max_files_per_process
        • max_fsm_pages
        • max_fsm_relations
        • max_function_args
        • max_identifier_length
        • max_index_keys
        • max_locks_per_transaction
        • max_prepared_transactions
        • max_stack_depth
        • optimizer
        • optimizer_analyze_root_partition
        • optimizer_minidump
        • optimizer_parts_to_force_sort_on_insert
        • optimizer_prefer_scalar_dqa_multistage_agg
        • password_encryption
        • pgstat_track_activity_query_size
        • pljava_classpath
        • pljava_statement_cache_size
        • pljava_release_lingering_savepoints
        • pljava_vmoptions
        • port
        • pxf_enable_filter_pushdown
        • pxf_enable_stat_collection
        • pxf_remote_service_login
        • pxf_remote_service_secret
        • pxf_service_address
        • pxf_service_port
        • pxf_stat_max_fragments
        • random_page_cost
        • regex_flavor
        • runaway_detector_activation_percent
        • search_path
        • seg_max_connections
        • seq_page_cost
        • server_encoding
        • server_version
        • server_version_num
        • shared_buffers
        • shared_preload_libraries
        • ssl
        • ssl_ciphers
        • standard_conforming_strings
        • statement_timeout
        • superuser_reserved_connections
        • tcp_keepalives_count
        • tcp_keepalives_idle
        • tcp_keepalives_interval
        • temp_buffers
        • TimeZone
        • timezone_abbreviations
        • track_activities
        • track_counts
        • transaction_isolation
        • transaction_read_only
        • transform_null_equals
        • unix_socket_directory
        • unix_socket_group
        • unix_socket_permissions
        • update_process_title
        • vacuum_cost_delay
        • vacuum_cost_limit
        • vacuum_cost_page_dirty
        • vacuum_cost_page_miss
        • vacuum_freeze_min_age
        • xid_stop_limit
      • Sample hawq-site.xml Configuration File
    • HDFS Configuration Reference
    • Environment Variables
    • Character Set Support Reference
    • Data Types
    • System Catalog Reference
      • System Tables
      • System Views
      • System Catalogs Definitions
        • gp_configuration_history
        • gp_distribution_policy
        • gp_global_sequence
        • gp_master_mirroring
        • gp_persistent_database_node
        • gp_persistent_filespace_node
        • gp_persistent_relation_node
        • gp_persistent_relfile_node
        • gp_persistent_tablespace_node
        • gp_relfile_node
        • gp_segment_configuration
        • gp_version_at_initdb
        • pg_aggregate
        • pg_am
        • pg_amop
        • pg_amproc
        • pg_appendonly
        • pg_attrdef
        • pg_attribute
        • pg_attribute_encoding
        • pg_auth_members
        • pg_authid
        • pg_cast
        • pg_class
        • pg_compression
        • pg_constraint
        • pg_conversion
        • pg_database
        • pg_depend
        • pg_description
        • pg_exttable
        • pg_filespace
        • pg_filespace_entry
        • pg_index
        • pg_inherits
        • pg_language
        • pg_largeobject
        • pg_listener
        • pg_locks
        • pg_namespace
        • pg_opclass
        • pg_operator
        • pg_partition
        • pg_partition_columns
        • pg_partition_encoding
        • pg_partition_rule
        • pg_partition_templates
        • pg_partitions
        • pg_pltemplate
        • pg_proc
        • pg_resqueue
        • pg_resqueue_status
        • pg_rewrite
        • pg_roles
        • pg_shdepend
        • pg_shdescription
        • pg_stat_activity
        • pg_stat_last_operation
        • pg_stat_last_shoperation
        • pg_stat_operations
        • pg_stat_partition_operations
        • pg_statistic
        • pg_stats
        • pg_tablespace
        • pg_trigger
        • pg_type
        • pg_type_encoding
        • pg_window
    • The hawq_toolkit Administrative Schema
      • Checking for Tables that Need Routine Maintenance
      • Viewing HAWQ Server Log Files
      • Checking Database Object Sizes and Disk Space
    • HAWQ Management Tools Reference
      • analyzedb
      • createdb
      • createuser
      • dropdb
      • dropuser
      • gpfdist
      • gplogfilter
      • hawq activate
      • hawq check
      • hawq checkperf
      • hawq config
      • hawq extract
      • hawq filespace
      • hawq init
      • hawq load
      • hawq register
      • hawq restart
      • hawq scp
      • hawq ssh
      • hawq ssh-exkeys
      • hawq start
      • hawq state
      • hawq stop
      • pg_dump
      • pg_dumpall
      • pg_restore
      • psql
      • vacuumdb