COMMAND NAME: hawq register

Usage1: Register parquet files generated by other systems into the corresponding table in HAWQ.
Usage2: Register a parquet/AO table from a yaml configuration file.

*****************************************************
SYNOPSIS
*****************************************************
Usage1: hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-f filepath] [-e eof] [-l log_directory] <tablename>
Usage2: hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-c config] [-F, --force] [-l log_directory] <tablename>

hawq register help
hawq register -?

hawq register --version

*****************************************************
DESCRIPTION
*****************************************************
Use Case1:
"hawq register" is a utility that registers file(s) on HDFS into
a table in HAWQ. It moves the file at the path (if the path
refers to a file) or the files under the path (if the path refers to a
directory) into the table directory corresponding to the table,
and then updates the table metadata to include the files.

To use "hawq register", HAWQ must have been started.

Currently "hawq register" supports parquet tables only in this case.
Users must make sure that the metadata of the parquet file(s)
and the table are consistent.
The target table must not be hash distributed, that is, it must not
have been created with a "distributed by" clause.
The file(s) to be registered and the table in HAWQ must be in the
same HDFS cluster.

Use Case2:
hawq register can register both AO and parquet format tables. The files to be registered are listed in the .yml configuration file.
This configuration file can be generated by hawq extract. Registering through a .yml configuration file does not require the table
to already exist, since the .yml file already contains the table schema.
hawq register behaves differently depending on the table state and the options given:
 * If the table does not exist, hawq register creates the table and registers the files.
 * If the table already exists, hawq register appends the files to the existing table.
 * If the --force option is specified, hawq register erases the table's existing data in the
   catalog table pg_aoseg.pg_aoseg_$relid/pg_aoseg.pg_paqseg_$relid and re-registers the
   files according to the .yml configuration file.
Note. Without --force, if any file specified in the .yml configuration file already lies under
      the table directory, hawq register reports an error.
Note. With --force, if there are files under the table directory that are not specified in the
      .yml configuration file, hawq register reports an error.
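
The two file checks described in the notes above can be sketched in Python as
follows. This is an illustration of the stated rules only, not the hawq register
implementation; the function name check_register_files and its arguments are
hypothetical.

```python
def check_register_files(yaml_files, table_dir_files, force=False):
    """Apply the documented file checks before registering.

    yaml_files: file paths listed in the .yml configuration file.
    table_dir_files: file paths currently under the table directory.
    Raises ValueError when a documented error condition is met.
    """
    listed = set(yaml_files)
    existing = set(table_dir_files)
    if force:
        # With --force, every file under the table directory must be listed.
        unlisted = sorted(existing - listed)
        if unlisted:
            raise ValueError("files not in .yml configuration: %s" % unlisted)
    else:
        # Without --force, no listed file may already be under the table directory.
        already_there = sorted(listed & existing)
        if already_there:
            raise ValueError("files already under table directory: %s" % already_there)


# Appending two new files to a table that already holds file 1 passes the check.
check_register_files(["/tmp/a.paq", "/tmp/b.paq"], ["/hawq/t/1"])
```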
Note. In usage2, if the table is hash distributed, hawq register only checks that the number of files
      to be registered is a multiple of the table's bucket number, and that the distribution key specified
      in the .yml configuration file is the same as that of the table. It does not check whether the files
      are actually distributed by that key.
Note. To register a hash distributed table through a yaml file, make sure the order of the files in the
      yaml file preserves the hash distribution.
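
As a rough illustration of the two checks described for hash distributed tables,
the sketch below tests the file count against the bucket number and compares the
distribution keys. The function name and parameters are hypothetical and are not
taken from hawq register itself. Like the utility, the sketch does not inspect
how rows are distributed inside the files.

```python
def check_hash_distribution(num_files, bucket_number, yaml_keys, table_keys):
    """Raise ValueError if the documented hash-distribution checks fail."""
    if bucket_number <= 0:
        raise ValueError("bucket number must be positive")
    # The number of files must be a multiple of the table's bucket number.
    if num_files % bucket_number != 0:
        raise ValueError(
            "%d files is not a multiple of bucket number %d" % (num_files, bucket_number)
        )
    # The distribution key in the .yml file must match that of the table.
    if list(yaml_keys) != list(table_keys):
        raise ValueError("distribution key in .yml differs from the table's key")


# 12 files into a table with 6 buckets, both distributed by column "a": passes.
check_hash_distribution(12, 6, ["a"], ["a"])
```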

To use "hawq register", HAWQ must have been started.
Currently "hawq register" supports both AO and Parquet formats in this case.
Partitioned tables are not supported in this version; support is planned for a later version.

*****************************************************
ARGUMENTS
*****************************************************
<tablename>

Name of the table to be registered into.

*****************************************************
OPTIONS
*****************************************************
-? (help)

Displays the online help.

--version

Displays the version of this utility.

-l log_directory

Specifies the directory where hawq register log files are stored.

*****************************************************
CONNECTION OPTIONS
*****************************************************
-h hostname

  Specifies the host name of the machine on which the HAWQ master
  database server is running. If not specified, reads from the
  environment variable $PGHOST which defaults to localhost.

-p port

  Specifies the TCP port on which the HAWQ master database server
  is listening for connections. If not specified, reads from the
  environment variable $PGPORT which defaults to 5432.

-U username

  The database role name to connect as. If not specified, reads
  from the environment variable $PGUSER which defaults to the current
  system user name.
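
The fallback order described for the connection options can be sketched in Python
as follows. This is a minimal sketch of the stated defaults, not code from the
utility; resolve_connection and its parameters are hypothetical names.

```python
import getpass
import os


def resolve_connection(host=None, port=None, user=None, env=None):
    """Resolve connection settings: explicit option, then environment, then default."""
    env = os.environ if env is None else env
    return {
        # -h, else $PGHOST, else localhost
        "host": host or env.get("PGHOST", "localhost"),
        # -p, else $PGPORT, else 5432
        "port": int(port or env.get("PGPORT", 5432)),
        # -U, else $PGUSER, else the current system user name
        "user": user or env.get("PGUSER") or getpass.getuser(),
    }


print(resolve_connection(user="gpadmin", env={}))
# {'host': 'localhost', 'port': 5432, 'user': 'gpadmin'}
```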

*****************************************************
EXAMPLE FOR USAGE1
*****************************************************
Run "hawq register" to register a parquet file generated by Hive, located in HDFS at
'hdfs://localhost:8020/temp/hive.paq', into the table 'parquet_table' in HAWQ,
which is in the database named 'postgres'.

Assume the location of the database is 'hdfs://localhost:8020/hawq_default', the
tablespace id is '16385', the database id is '16387', the table filenode id is '77160',
and the last file under the filenode is numbered '7'.

$ hawq register -d postgres -f hdfs://localhost:8020/temp/hive.paq parquet_table

This moves the file 'hdfs://localhost:8020/temp/hive.paq' to its new location
'hdfs://localhost:8020/hawq_default/16385/16387/77160/8' in HDFS, then updates
the metadata of the table 'parquet_table', which is stored in the catalog
table 'pg_aoseg.pg_paqseg_77160'.
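
The destination path in this example follows the pattern
<database location>/<tablespace id>/<database id>/<filenode id>/<file number>,
where the file number is one more than the last existing file. A small Python
sketch of that arithmetic, using a hypothetical helper name and assuming the
layout is exactly as in the example above:

```python
def next_segment_path(db_location, tablespace_id, database_id, filenode, last_file_no):
    """Build the HDFS path a newly registered file would be moved to."""
    base = db_location.rstrip("/")
    # The new file takes the next number after the last file under the filenode.
    return "{}/{}/{}/{}/{}".format(
        base, tablespace_id, database_id, filenode, last_file_no + 1
    )


print(next_segment_path("hdfs://localhost:8020/hawq_default", 16385, 16387, 77160, 7))
# hdfs://localhost:8020/hawq_default/16385/16387/77160/8
```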

*****************************************************
EXAMPLE FOR USAGE2
*****************************************************
This example shows how hawq register works with a yml configuration file.
Usually the yml configuration file is generated by hawq extract, so the
example walks through the full cycle of hawq extract followed by hawq register.

Firstly, create a table and insert some data into it:
$ psql -c "create table paq1(a int, b varchar(10))with(appendonly=true, orientation=parquet);"
$ psql -c "insert into paq1 values(generate_series(1,1000), 'abcde');"

Secondly, extract the table metadata information out:
$ hawq extract -o paq1.yml paq1

Thirdly, register the files into a new table paq2 using the yml file:
$ hawq register --config paq1.yml paq2

Finally, query the new table to check that the content has been registered:
$ psql -c "select count(*) from paq2;"

In the above example, the query should return 1000.

*****************************************************
DATA TYPES
*****************************************************
The data types used in HAWQ and in the parquet format are not the same, so there is a
mapping between them, summarized as follows:

Data types in HAWQ              Data types in parquet
bool                            boolean
int2                            int32
int4                            int32
date                            int32
int8                            int64
time                            int64
timestamptz                     int64
timestamp                       int64
money                           int64
float4                          float
float8                          double
bit                             byte_array
varbit                          byte_array
bytea                           byte_array
numeric                         byte_array
name                            byte_array
char                            byte_array
bpchar                          byte_array
varchar                         byte_array
text                            byte_array
xml                             byte_array
timetz                          byte_array
interval                        byte_array
macaddr                         byte_array
inet                            byte_array
cidr                            byte_array
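
Because Use Case1 leaves it to the user to keep the parquet file metadata
consistent with the table, a lookup like the one below can help compare column
types before registering. It is a sketch only: the dictionary holds a subset of
the mapping above, and mismatched_columns and its inputs are hypothetical, not
part of hawq register.

```python
# Subset of the mapping table above: HAWQ type name -> parquet type.
HAWQ_TO_PARQUET = {
    "bool": "boolean",
    "int2": "int32",
    "int4": "int32",
    "date": "int32",
    "int8": "int64",
    "time": "int64",
    "timestamptz": "int64",
    "timestamp": "int64",
    "money": "int64",
    "float4": "float",
    "float8": "double",
    "numeric": "byte_array",
    "varchar": "byte_array",
    "text": "byte_array",
}


def mismatched_columns(table_columns, file_columns):
    """Return (name, hawq_type, expected, actual) for columns whose types differ.

    table_columns: list of (column name, HAWQ type) pairs.
    file_columns: dict of column name -> parquet type found in the file.
    """
    problems = []
    for name, hawq_type in table_columns:
        expected = HAWQ_TO_PARQUET.get(hawq_type)
        actual = file_columns.get(name)
        if expected != actual:
            problems.append((name, hawq_type, expected, actual))
    return problems


table = [("a", "int4"), ("b", "varchar"), ("c", "int8")]
parquet_file = {"a": "int32", "b": "byte_array", "c": "int32"}
print(mismatched_columns(table, parquet_file))
# [('c', 'int8', 'int64', 'int32')]
```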
