analyzedb
A utility that performs ANALYZE
operations on tables incrementally and concurrently.
Synopsis
analyzedb -d <dbname> -s <schema>
[ --full ]
[ -l | --list ]
[ -p <parallel-level> ]
[ -v | --verbose ]
[ -a ]
analyzedb -d <dbname> -t <schema>.<table>
[ -i col1[, col2, ...] | -x col1[, col2, ...] ]
[ --full ]
[ -l | --list ]
[ -p <parallel-level> ]
[ -v | --verbose ]
[ -a ]
analyzedb -d <dbname> -f <config-file> | --file <config-file>
[ --full ]
[ -l | --list ]
[ -p <parallel-level> ]
[ -v | --verbose ]
[ -a ]
analyzedb -d <dbname> --clean_last | --clean_all
analyzedb --version
analyzedb -? | -h | --help
Description
The analyzedb
utility updates statistics on table data for the specified tables in a HAWQ database incrementally and concurrently.
While performing ANALYZE
operations, analyzedb
creates a snapshot of the table metadata and stores it on disk on the master host. An ANALYZE
operation is performed only if the table has been modified. If a table or partition has not been modified since the last time it was analyzed, analyzedb
automatically skips the table or partition because it already contains up-to-date statistics.
For a partitioned table analyzedb
analyzes only those partitions that have no statistics, or that have stale statistics. analyzedb
also refreshes the statistics on the root partition.
By default, analyzedb
creates a maximum of 5 concurrent sessions to analyze tables in parallel. For each session, analyzedb
issues an ANALYZE
command to the database and specifies different table names. The -p
option controls the maximum number of concurrent sessions.
Notes
The utility determines if a table has been modified by comparing catalog metadata of tables with the snapshot of metadata taken during a previous analyzedb
operation. The snapshots of table metadata are stored as state files in the directory db_analyze
in the HAWQ master data directory. You can specify the --clean_last
or --clean_all
option to remove state files generated by analyzedb
.
If you do not specify a table, set of tables, or schema, the analyzedb
utility collects the statistics as needed on all system catalog tables and user-defined tables in the database.
External tables are not affected by analyzedb
.
Table names that contain spaces are not supported.
Arguments
PGDATABASE
. If PGDATABASE
is not set, the user name specified for the connection is used.Only one of the options can be used to specify the files to be analyzed: -f
or --file
, -t
, or -s
.
-f
option to specify multiple tables in a file or the -s
option to specify all the tables in a schema.
Only one of these options can be used to specify the files to be analyzed: -f
or --file
, -t
, or -s
.
The file lists one table per line. Table names must be qualified with a schema name. Optionally, a list of columns can be specified using the -i
or -x
. No other options are allowed in the file. Other options such as --full
must be specified on the command line.
Only one of the options can be used to specify the files to be analyzed: -f
or --file
, -t
, or -s
.
When performing ANALYZE
operations on multiple tables, analyzedb
creates concurrent sessions to analyze tables in parallel. The -p
option controls the maximum number of concurrent sessions.
In the following example, the first line performs an ANALYZE
operation on the table public.nation
, the second line performs an ANALYZE
operation only on the columns l_shipdate
and l_receiptdate
in the table public.lineitem
.
public.nation
public.lineitem -i l_shipdate, l_receiptdate
Options
-t
option. For the table specified with the -t
option, exclude statistics collection for the specified columns. Statistics are collected only on the columns that are not listed.
Only -i
, or -x
can be specified. Both options cannot be specified.
-t
option. For the table specified with the -t
option, collect statistics only for the specified columns.
Only -i
, or -x
can be specified. Both options cannot be specified.
ANALYZE
operation on all the specified tables. The operation is performed even if the statistics are up to date.ANALYZE
operations are not performed.analyzedb
operation. All other options except -d
are ignored.analyzedb
. All other options except-d
are ignored.Examples
An example that collects statistics only on a set of table columns. In the database mytest
, collect statistics on the columns shipdate
and receiptdate
in the table public.orders
:
$ analyzedb -d mytest -t public.orders -i shipdate, receiptdate
An example that collects statistics on a table and exclude a set of columns. In the database mytest
, collect statistics on the table public.foo
, and do not collect statistics on the columns bar
and test2
.
$ analyzedb -d mytest -t public.foo -x bar, test2
An example that specifies a file that contains a list of tables. This command collect statistics on the tables listed in the file analyze-tables
in the database named mytest
.
$ analyzedb -d mytest -f analyze-tables
If you do not specify a table, set of tables, or schema, the analyzedb
utility collects the statistics as needed on all catalog tables and user-defined tables in the specified database. This command refreshes table statistics on the system catalog tables and user-defined tables in the database mytest
.
$ analyzedb -d mytest