Query Profiling

Reading EXPLAIN Output
- EXPLAIN Example
Reading EXPLAIN ANALYZE Output
- EXPLAIN ANALYZE Example
Examining Query Plans to Solve Problems
Generating EXPLAIN Plan from a PL/pgSQL Function

Examine the query plans of poorly performing queries to identify possible performance tuning opportunities.

HAWQ devises a query plan for each query. Choosing the right query plan to match the query and data structure is necessary for good performance. A query plan defines how HAWQ will run the query in the parallel execution environment.

The query optimizer uses data statistics maintained by the database to choose a query plan with the lowest possible cost. Cost is measured in disk I/O, shown as units of disk page fetches. The goal is to minimize the total execution cost for the plan.

View the plan for a given query with the EXPLAIN command. EXPLAIN shows the query optimizer’s estimated cost for the query plan. For example:

EXPLAIN SELECT * FROM names WHERE id=22;

EXPLAIN ANALYZE runs the statement in addition to displaying its plan. This is useful for determining how close the optimizer’s estimates are to reality. For example:

EXPLAIN ANALYZE SELECT * FROM names WHERE id=22;

Note: The legacy and GPORCA query optimizers coexist in HAWQ. GPORCA is the default HAWQ optimizer. HAWQ uses GPORCA to generate an execution plan for a query when possible. The EXPLAIN output generated by GPORCA is different than the output generated by the legacy query optimizer.

When the EXPLAIN ANALYZE command uses GPORCA, the EXPLAIN plan shows only the number of partitions that are being eliminated. The scanned partitions are not shown. To show name of the scanned partitions in the segment logs set the server configuration parameter gp_log_dynamic_partition_pruning to on. This example SET command enables the parameter.

SET gp_log_dynamic_partition_pruning = on;

For information about GPORCA, see Querying Data.

Reading EXPLAIN Output

A query plan is a tree of nodes. Each node in the plan represents a single operation, such as a table scan, join, aggregation, or sort.

Read plans from the bottom to the top: each node feeds rows into the node directly above it. The bottom nodes of a plan are usually table scan operations. If the query requires joins, aggregations, sorts, or other operations on the rows, there are additional nodes above the scan nodes to perform these operations. The topmost plan nodes are usually HAWQ motion nodes: redistribute, broadcast, or gather motions. These operations move rows between segment instances during query processing.

The output of EXPLAIN has one line for each node in the plan tree and shows the basic node type and the following execution cost estimates for that plan node:

cost —Measured in units of disk page fetches. 1.0 equals one sequential disk page read. The first estimate is the start-up cost of getting the first row and the second is the total cost of cost of getting all rows. The total cost assumes all rows will be retrieved, which is not always true; for example, if the query uses LIMIT, not all rows are retrieved.
rows —The total number of rows output by this plan node. This number is usually less than the number of rows processed or scanned by the plan node, reflecting the estimated selectivity of any WHERE clause conditions. Ideally, the estimate for the topmost node approximates the number of rows that the query actually returns.
width —The total bytes of all the rows that this plan node outputs.

Note the following:

The cost of a node includes the cost of its child nodes. The topmost plan node has the estimated total execution cost for the plan. This is the number the optimizer intends to minimize.
The cost reflects only the aspects of plan execution that the query optimizer takes into consideration. For example, the cost does not reflect time spent transmitting result rows to the client.

EXPLAIN Example

The following example describes how to read an EXPLAIN query plan for a query:

EXPLAIN SELECT * FROM names WHERE name = 'Joelle';

                                 QUERY PLAN
-----------------------------------------------------------------------------
 Gather Motion 2:1  (slice1; segments: 2)  (cost=0.00..1.01 rows=1 width=11)
   ->  Append-only Scan on names  (cost=0.00..1.01 rows=1 width=11)
         Filter: name::text = 'Joelle'::text
(3 rows)

Read the plan from the bottom to the top. To start, the query optimizer sequentially scans the names table. Notice the WHERE clause is applied as a filter condition. This means the scan operation checks the condition for each row it scans and outputs only the rows that satisfy the condition.

The results of the scan operation are passed to a gather motion operation. In HAWQ, a gather motion is when segments send rows to the master. In this example, we have two segment instances that send to one master instance. This operation is working on slice1 of the parallel query execution plan. A query plan is divided into slices so the segments can work on portions of the query plan in parallel.

The estimated startup cost for this plan is 00.00 (no cost) and a total cost of 1.01 disk page fetches. The optimizer estimates this query will return one row.

Reading EXPLAIN ANALYZE Output

EXPLAIN ANALYZE plans and runs the statement. The EXPLAIN ANALYZE plan shows the actual execution cost along with the optimizer’s estimates. This allows you to see if the optimizer’s estimates are close to reality. EXPLAIN ANALYZE also shows the following:

The total runtime (in milliseconds) in which the query executed.
The memory used by each slice of the query plan, as well as the memory reserved for the whole query statement.
Statistics for the query dispatcher, including the number of executors used for the current query (total number/number of executors cached by previous queries/number of executors newly connected), dispatcher time (total dispatch time/connection establish time/dispatch data to executor time); and some time(max/min/avg) details for dispatching data, consuming executor data, and freeing executor.
Statistics about data locality. See Data Locality Statistics for details about these statistics.
The number of workers (segments) involved in a plan node operation. Only segments that return rows are counted.
The Max/Last statistics are for the segment that output the maximum number of rows and the segment with the longest <time> to end.
The segment id of the segment that produced the most rows for an operation.
For relevant operations, the amount of memory (work_mem) used by the operation. If the work_mem was insufficient to perform the operation in memory, the plan shows the amount of data spilled to disk for the lowest-performing segment. For example:
```
Work_mem used: 64K bytes avg, 64K bytes max (seg0).
Work_mem wanted: 90K bytes avg, 90K byes max (seg0) to lessen
workfile I/O affecting 2 workers.
```
Warning: The work_mem server configuration parameter is deprecated and will be removed in a future HAWQ release. Use resource queues to manage memory use. For more information on resource queues, see Configuring Resource Management and Working with Hierarchical Resource Queues.
The time (in milliseconds) in which the segment that produced the most rows retrieved the first row, and the time taken for that segment to retrieve all rows. The result may omit <time> to first row if it is the same as the <time> to end.

EXPLAIN ANALYZE Example

This example describes how to read an EXPLAIN ANALYZE query plan using the same query. The bold parts of the plan show actual timing and rows returned for each plan node, as well as memory and time statistics for the whole query.

EXPLAIN ANALYZE SELECT * FROM names WHERE name = 'Joelle';

                                 QUERY PLAN
------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..1.01 rows=1 width=7)
   Rows out:  Avg 1.0 rows x 1 workers at destination.  Max/Last(seg0:ip-10-0-1-16/seg0:ip-10-0-1-16) 1/1 rows with 8.713/8.713 ms to first row, 8.714/8.714 ms to end, start offset by 0.708/0.708 ms.
   ->  Append-only Scan on names  (cost=0.00..1.01 rows=1 width=7)
         Filter: name = 'Joelle'::text
         Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(seg0:ip-10-0-1-16/seg0:ip-10-0-1-16) 1/1 rows with 7.053/7.053 ms to first row, 7.089/7.089 ms to end, start offset by 2.162/2.162 ms.
 Slice statistics:
   (slice0)    Executor memory: 159K bytes.
   (slice1)    Executor memory: 247K bytes (seg0:ip-10-0-1-16).
 Statement statistics:
   Memory used: 262144K bytes
 Dispatcher statistics:
   executors used(total/cached/new connection): (1/1/0); dispatcher time(total/connection/dispatch data): (0.217 ms/0.000 ms/0.037 ms).
   dispatch data time(max/min/avg): (0.037 ms/0.037 ms/0.037 ms); consume executor data time(max/min/avg): (0.015 ms/0.015 ms/0.015 ms); free executor time(max/min/avg): (0.000 ms/0.000 ms/0.000 ms).
 Data locality statistics:
   data locality ratio: 1.000; virtual segment number: 1; different host number: 1; virtual segment number per host(avg/min/max): (1/1/1); segment size(avg/min/max): (48.000 B/48 B/48 B); segment size with penalty(avg/min/max): (48.000 B/48 B/48 B); continuity(avg/min/max): (1.000/1.000/1.000); DFS metadatacache: 9.343 ms; resource allocation: 0.638 ms; datalocality calculation: 0.144 ms.
 Total runtime: 19.690 ms
(16 rows)

Read the plan from the bottom to the top. The total elapsed time to run this query was 19.690 milliseconds.

The Append-only scan operation had only one segment (seg0) that returned rows, and it returned just 1 row. The Max/Last statistics are identical in this example because only one segment returned rows. It took 7.053 milliseconds to find the first row and 7.089 milliseconds to scan all rows. This result is close to the optimizer’s estimate: the query optimizer estimated it would return one row for this query. The gather motion (segments sending data to the master) received 1 row. The total elapsed time for this operation was 19.690 milliseconds.

Examining Query Plans to Solve Problems

If a query performs poorly, examine its query plan and ask the following questions:

Do operations in the plan take an exceptionally long time? Look for an operation that consumes the majority of query processing time. For example, if a scan on a hash table takes longer than expected, the data locality may be low; reloading the data can increase the data locality and speed up the query. Or, adjust enable_<operator> parameters to see if you can force the legacy query optimizer (planner) to choose a different plan by disabling a particular query plan operator for that query.
Are the optimizer’s estimates close to reality? Run EXPLAIN ANALYZE and see if the number of rows the optimizer estimates is close to the number of rows the query operation actually returns. If there is a large discrepancy, collect more statistics on the relevant columns.
Are selective predicates applied early in the plan? Apply the most selective filters early in the plan so fewer rows move up the plan tree. If the query plan does not correctly estimate query predicate selectivity, collect more statistics on the relevant columns. You can also try reordering the WHERE clause of your SQL statement.
Does the optimizer choose the best join order? When you have a query that joins multiple tables, make sure that the optimizer chooses the most selective join order. Joins that eliminate the largest number of rows should be done earlier in the plan so fewer rows move up the plan tree.

If the plan is not choosing the optimal join order, set join_collapse_limit=1 and use explicit JOIN syntax in your SQL statement to force the legacy query optimizer (planner) to the specified join order. You can also collect more statistics on the relevant join columns.
Does the optimizer selectively scan partitioned tables? If you use table partitioning, is the optimizer selectively scanning only the child tables required to satisfy the query predicates? Scans of the parent tables should return 0 rows since the parent tables do not contain any data. See Verifying Your Partition Strategy for an example of a query plan that shows a selective partition scan.
Does the optimizer choose hash aggregate and hash join operations where applicable? Hash operations are typically much faster than other types of joins or aggregations. Row comparison and sorting is done in memory rather than reading/writing from disk. To enable the query optimizer to choose hash operations, there must be sufficient memory available to hold the estimated number of rows. Try increasing work memory to improve performance for a query. If possible, run an EXPLAIN ANALYZE for the query to show which plan operations spilled to disk, how much work memory they used, and how much memory was required to avoid spilling to disk. For example:

Work_mem used: 23430K bytes avg, 23430K bytes max (seg0). Work_mem wanted: 33649K bytes avg, 33649K bytes max (seg0) to lessen workfile I/O affecting 2 workers.

The “bytes wanted” message from EXPLAIN ANALYZE is based on the amount of data written to work files and is not exact. The minimum work_mem needed can differ from the suggested value.

Generating EXPLAIN Plan from a PL/pgSQL Function

User-defined PL/pgSQL functions often include dynamically created queries. You may find it useful to generate the EXPLAIN plan for such queries for query performance optimization and tuning.

Perform the following steps to create and run a user-defined PL/pgSQL function. This function displays the EXPLAIN plan for a simple query on a test database.

Log in to the HAWQ master node as user gpadmin and set up the HAWQ environment:
```
$ ssh gpadmin@hawq_master
$ . /usr/local/hawq/greenplum_path.sh
```
Create a test database named testdb:
```
$ createdb testdb
```
Start the PostgreSQL interactive utility, connecting to testdb:
```
$ psql -d testdb
```
Create the table test_tbl with a single column named id of type integer:
```
testdb=# CREATE TABLE test_tbl (id int);
```
Add some data to the test_tbl table:
```
testdb=# INSERT INTO test_tbl SELECT generate_series(1,100);
```
This INSERT command adds 100 rows to test_tbl, incrementing the id for each row.

Create a PL/pgSQL function named explain_plan_func() by copying and pasting the following text at the psql prompt:

CREATE OR REPLACE FUNCTION explain_plan_func() RETURNS varchar as $$
declare

 a varchar;
 b varchar;

 begin
   a = '';
   for b in execute 'explain select count(*) from test_tbl group by id' loop
     a = a || E'\n' || b;
   end loop;
   return a;
 end;
$$
LANGUAGE plpgsql
VOLATILE;

Verify the explain_plan_func() user-defined function was created successfully:
```
testdb=# \df+
```
The \df+ command lists all user-defined functions.

Perform a query using the user-defined function you just created:

testdb=# SELECT explain_plan_func();

The EXPLAIN plan results for the query are displayed:

                                         explain_plan_func                               
---------------------------------------------------------------------------------------------------------                                                                                             
Gather Motion 1:1  (slice2; segments: 1)  (cost=0.00..431.04 rows=100 width=8)                          
->  Result  (cost=0.00..431.03 rows=100 width=8)                         
     ->  HashAggregate  (cost=0.00..431.03 rows=100 width=8)                
           Group By: id                                                 
           ->  Redistribute Motion 1:1  (slice1; segments: 1)  (cost=0.00..431.02 rows=100 width=12) 
                 Hash Key: id                                              
                 ->  Result  (cost=0.00..431.01 rows=100 width=12)      
                       ->  HashAggregate  (cost=0.00..431.01 rows=100 width=12)                      
                             Group By: id                               
                             ->  Table Scan on test_tbl  (cost=0.00..431.00 rows=100 width=4) 
Settings:  default_hash_table_bucket_number=6                              
Optimizer status: PQO version 1.627
(1 row)