Table Storage Model and Distribution Policy

HAWQ supports several storage models and a mix of storage models. When you create a table, you choose how to store its data. This topic explains the options for table storage and how to choose the best storage model for your workload.

Note: To simplify the creation of database tables, you can specify the default values for some table storage options with the HAWQ server configuration parameter gp_default_storage_options.

Row-Oriented Storage

HAWQ provides storage orientation models of either row-oriented or Parquet tables. Evaluate performance using your own data and query workloads to determine the best alternatives.

  • Row-oriented storage: good for OLTP types of workloads with many iterative transactions and many columns of a single row needed all at once, so retrieving is efficient.

    Note: Column-oriented storage is no longer available. Parquet storage should be used, instead.

Row-oriented storage provides the best options for the following situations:

  • Frequent INSERTs. Where rows are frequently inserted into the table
  • Number of columns requested in queries. Where you typically request all or the majority of columns in the SELECT list or WHERE clause of your queries, choose a row-oriented model.
  • Number of columns in the table. Row-oriented storage is most efficient when many columns are required at the same time, or when the row-size of a table is relatively small.

Altering a Table

The ALTER TABLEcommand changes the definition of a table. Use ALTER TABLE to change table attributes such as column definitions, distribution policy, storage model, and partition structure (see also Maintaining Partitioned Tables). For example, to add a not-null constraint to a table column:

=> ALTER TABLE address ALTER COLUMN street SET NOT NULL;

Altering Table Distribution

ALTER TABLE provides options to change a table’s distribution policy . When the table distribution options change, the table data is redistributed on disk, which can be resource intensive. You can also redistribute table data using the existing distribution policy.

Changing the Distribution Policy

For partitioned tables, changes to the distribution policy apply recursively to the child partitions. This operation preserves the ownership and all other attributes of the table. For example, the following command redistributes the table sales across all segments using the customer_id column as the distribution key:

ALTER TABLE sales SET DISTRIBUTED BY (customer_id);

When you change the hash distribution of a table, table data is automatically redistributed. Changing the distribution policy to a random distribution does not cause the data to be redistributed. For example:

ALTER TABLE sales SET DISTRIBUTED RANDOMLY;

Redistributing Table Data

To redistribute table data for tables with a random distribution policy (or when the hash distribution policy has not changed) use REORGANIZE=TRUE. Reorganizing data may be necessary to correct a data skew problem, or when segment resources are added to the system. For example, the following command redistributes table data across all segments using the current distribution policy, including random distribution.

ALTER TABLE sales SET WITH (REORGANIZE=TRUE);

Dropping a Table

TheDROP TABLEcommand removes tables from the database. For example:

DROP TABLE mytable;

DROP TABLE always removes any indexes, rules, triggers, and constraints that exist for the target table. Specify CASCADEto drop a table that is referenced by a view. CASCADE removes dependent views.

To empty a table of rows without removing the table definition, use TRUNCATE. For example:

TRUNCATE mytable;