Enabling GPORCA

Precompiled versions of HAWQ that include the GPORCA query optimizer enable it by default, no additional configuration is required. To use the GPORCA query optimizer in a HAWQ built from source, your build must include GPORCA. You must also enable specific HAWQ server configuration parameters at or after install time:

Important: If you intend to execute queries on partitioned tables with GPORCA enabled, you must collect statistics on the partitioned table root partition with the ANALYZE ROOTPARTITION command. The command ANALYZE ROOTPARTITION collects statistics on the root partition of a partitioned table without collecting statistics on the leaf partitions. If you specify a list of column names for a partitioned table, the statistics for the columns and the root partition are collected. For information on the ANALYZE command, see ANALYZE.

You can also use the HAWQ utility analyzedb to update table statistics. The HAWQ utility analyzedb can update statistics for multiple tables in parallel. The utility can also check table statistics and update statistics only if the statistics are not current or do not exist. For information about the analyzedb utility, see analyzedb.

As part of routine database maintenance, you should refresh statistics on the root partition when there are significant changes to child leaf partition data.

Setting the optimizer_analyze_root_partition Parameter

When the configuration parameter optimizer_analyze_root_partition is set to on, root partition statistics will be collected when ANALYZE is run on a partitioned table. Root partition statistics are required by GPORCA.

You will perform different procedures to set optimizer configuration parameters for your whole HAWQ cluster depending upon whether you manage your cluster from the command line or use Ambari. If you use Ambari to manage your HAWQ cluster, you must ensure that you update server configuration parameters only via the Ambari Web UI. If you manage your HAWQ cluster from the command line, you will use the hawq config command line utility to set optimizer server configuration parameters.

If you use Ambari to manage your HAWQ cluster:

  1. Set the optimizer_analyze_root_partition configuration property to on via the HAWQ service Configs > Advanced > Custom hawq-site drop down.
  2. Select Service Actions > Restart All to load the updated configuration.

If you manage your HAWQ cluster from the command line:

  1. Log in to the HAWQ master host as a HAWQ administrator and source the file /usr/local/hawq/greenplum_path.sh.

    $ source /usr/local/hawq/greenplum_path.sh
    
  2. Use the hawq config utility to set optimizer_analyze_root_partition:

    $ hawq config -c optimizer_analyze_root_partition -v on
    
  3. Reload the HAWQ configuration:

    $ hawq stop cluster -u
    

Enabling GPORCA for a System

Set the server configuration parameter optimizer for the HAWQ system.

If you use Ambari to manage your HAWQ cluster:

  1. Set the optimizer configuration property to on via the HAWQ service Configs > Advanced > Custom hawq-site drop down.
  2. Select Service Actions > Restart All to load the updated configuration.

If you manage your HAWQ cluster from the command line:

  1. Log in to the HAWQ master host as a HAWQ administrator and source the file /usr/local/hawq/greenplum_path.sh.

    $ source /usr/local/hawq/greenplum_path.sh
    
  2. Use the hawq config utility to set optimizer:

    $ hawq config -c optimizer -v on
    
  3. Reload the HAWQ configuration:

    $ hawq stop cluster -u
    

Enabling GPORCA for a Database

Set the server configuration parameter optimizer for individual HAWQ databases with the ALTER DATABASE command. For example, this command enables GPORCA for the database test_db.

=> ALTER DATABASE test_db SET optimizer = ON ;

Enabling GPORCA for a Session or a Query

You can use the SET command to set optimizer server configuration parameter for a session. For example, after you use the psql utility to connect to HAWQ, this SET command enables GPORCA:

=> SET optimizer = on ;

To set the parameter for a specific query, include the SET command prior to running the query.