Doc Index
Apache HAWQ (incubating)
System Requirements
HAWQ System Overview
What is HAWQ?
HAWQ Architecture
Table Distribution and Storage
Elastic Query Execution Runtime
Resource Management
HDFS Catalog Cache
Management Tools
High Availability, Redundancy and Fault Tolerance
Getting Started with HAWQ Tutorial
Lesson 1 - Runtime Environment
Lesson 2 - Cluster Administration
Lesson 3 - Database Administration
Lesson 4 - Sample Data Set and HAWQ Schemas
Lesson 5 - HAWQ Tables
Lesson 6 - HAWQ Extension Framework (PXF)
Running a HAWQ Cluster
Overview
Introducing the HAWQ Operating Environment
Managing HAWQ Using Ambari
Using the Ambari REST API
Starting and Stopping HAWQ
Expanding a Cluster
Removing a Node
Backing Up and Restoring HAWQ
High Availability in HAWQ
Master Mirroring
HAWQ Filespaces and High Availability Enabled HDFS
Understanding the Fault Tolerance Service
Recommended Monitoring and Maintenance Tasks
Routine System Maintenance Tasks
Monitoring a HAWQ System
HAWQ Administrative Log Files
Managing Resources
How HAWQ Manages Resources
Best Practices for Configuring Resource Management
Configuring Resource Management
Integrating YARN with HAWQ
Working with Hierarchical Resource Queues
Analyzing Resource Manager Status
Managing Client Access
Configuring Client Authentication
Using LDAP Authentication with TLS/SSL
Using Kerberos Authentication
Disabling Kerberos Security
Overview of HAWQ Authorization
Using HAWQ Native Authorization
Using Ranger for Authorization
Overview of Ranger Policy Management
Configuring HAWQ to use Ranger Policy Management
Creating HAWQ Authorization Policies in Ranger
HAWQ Resources and Permissions
SQL Command Permissions Summary
Using MADlib with Ranger Authorization
Auditing Authorization Events
Establishing a Database Session
Supported Client Applications
HAWQ Client Applications
Connecting with psql
HAWQ Database Drivers and APIs
Troubleshooting Connection Problems
Defining Database Objects
Overview
Creating and Managing Databases
Creating and Managing Tablespaces
Creating and Managing Schemas
Creating and Managing Tables
Identifying HAWQ Table HDFS Files
Choosing the Table Storage Model
Partitioning Large Tables
Creating and Managing Views
Using Procedural Languages
Using Languages in HAWQ
Using HAWQ Built-In Languages
Using PL/Java
Using PL/pgSQL
Using PL/Python
Using PL/R
Managing Data with HAWQ
Basic Data Operations
About Database Statistics
Concurrency Control
Working with Transactions
Loading and Unloading Data
Working with File-Based External Tables
Accessing File-Based External Tables
gpfdist Protocol
gpfdists Protocol
Handling Errors in External Table Data
Using the HAWQ File Server (gpfdist)
About gpfdist Setup and Performance
Controlling Segment Parallelism
Installing gpfdist
Starting and Stopping gpfdist
Troubleshooting gpfdist
Creating and Using Web External Tables
Command-based Web External Tables
URL-based Web External Tables
Loading Data Using an External Table
Loading and Writing Non-HDFS Custom Data
Using a Custom Format
Importing and Exporting Fixed Width Data
Examples - Read Fixed-Width Data
Creating External Tables - Examples
Handling Load Errors
Define an External Table with Single Row Error Isolation
Capture Row Formatting Errors and Declare a Reject Limit
Identifying Invalid CSV Files in Error Table Data
Moving Data between Tables
Registering Files into HAWQ Internal Tables
Loading Data with hawq load
Loading Data with COPY
Running COPY in Single Row Error Isolation Mode
Optimizing Data Load and Query Performance
Unloading Data from HAWQ
Defining a File-Based Writable External Table
Example - HAWQ file server (gpfdist)
Defining a Command-Based Writable External Web Table
Disabling EXECUTE for Web or Writable External Tables
Unloading Data Using a Writable External Table
Unloading Data Using COPY
Transforming XML Data
Determine the Transformation Schema
Write a Transform
Write the gpfdist Configuration
Load the Data
Transfer and Store the Data
Transforming with GPLOAD
Transforming with INSERT INTO SELECT FROM
Configuration File Format
XML Transformation Examples
Command-based Web External Tables
Example using IRS MeF XML Files (In demo Directory)
Example using WITSML™ Files (In demo Directory)
Formatting Data Files
Formatting Rows
Formatting Columns
Representing NULL Values
Escaping
Escaping in Text Formatted Files
Escaping in CSV Formatted Files
Character Encoding
HAWQ InputFormat for MapReduce
Using PXF with Unmanaged Data
Installing PXF Plugins
Configuring PXF
Accessing HDFS File Data
Accessing Hive Data
Accessing HBase Data
Accessing JSON Data
Writing Data to HDFS
Using Profiles to Read and Write Data
PXF External Tables and API
Troubleshooting PXF
Querying Data
About HAWQ Query Processing
About GPORCA
Overview of GPORCA
GPORCA Features and Enhancements
Enabling GPORCA
Considerations when Using GPORCA
Determining the Query Optimizer in Use
Changed Behavior with GPORCA
GPORCA Limitations
Defining Queries
Using Functions and Operators
Query Performance
Query Profiling
Best Practices
Configuring HAWQ
Operating HAWQ
Securing HAWQ
Managing Resources
Managing Data
Querying Data
Troubleshooting
Query Performance Issues
Rejection of Query Resource Requests
Queries Cancelled Due to High VMEM Usage
Segments Do Not Appear in gp_segment_configuration
Handling Segment Resource Fragmentation
HAWQ Reference
SQL Commands
ABORT
ALTER AGGREGATE
ALTER CONVERSION
ALTER DATABASE
ALTER FUNCTION
ALTER OPERATOR
ALTER OPERATOR CLASS
ALTER RESOURCE QUEUE
ALTER ROLE
ALTER SEQUENCE
ALTER TABLE
ALTER TABLESPACE
ALTER TYPE
ALTER USER
ANALYZE
BEGIN
CHECKPOINT
CLOSE
COMMIT
COPY
CREATE AGGREGATE
CREATE CAST
CREATE CONVERSION
CREATE DATABASE
CREATE EXTERNAL TABLE
CREATE FUNCTION
CREATE GROUP
CREATE LANGUAGE
CREATE OPERATOR
CREATE OPERATOR CLASS
CREATE RESOURCE QUEUE
CREATE ROLE
CREATE SCHEMA
CREATE SEQUENCE
CREATE TABLE
CREATE TABLE AS
CREATE TABLESPACE
CREATE TYPE
CREATE USER
CREATE VIEW
DEALLOCATE
DECLARE
DROP AGGREGATE
DROP CAST
DROP CONVERSION
DROP DATABASE
DROP EXTERNAL TABLE
DROP FILESPACE
DROP FUNCTION
DROP GROUP
DROP LANGUAGE
DROP OPERATOR
DROP OPERATOR CLASS
DROP OWNED
DROP RESOURCE QUEUE
DROP ROLE
DROP SCHEMA
DROP SEQUENCE
DROP TABLE
DROP TABLESPACE
DROP TYPE
DROP USER
DROP VIEW
END
EXECUTE
EXPLAIN
FETCH
GRANT
INSERT
PREPARE
REASSIGN OWNED
RELEASE SAVEPOINT
RESET
REVOKE
ROLLBACK
ROLLBACK TO SAVEPOINT
SAVEPOINT
SELECT
SELECT INTO
SET
SET ROLE
SET SESSION AUTHORIZATION
SHOW
TRUNCATE
VACUUM
Server Configuration Parameter Reference
About Server Configuration Parameters
Configuration Parameter Categories
Append-Only Table Parameters
Client Connection Default Parameters
Connection and Authentication Parameters
Database and Tablespace/Filespace Parameters
Error Reporting and Logging Parameters
External Table Parameters
GPORCA Parameters
HAWQ Array Configuration Parameters
HAWQ Extension Framework (PXF) Parameters
HAWQ PL/Java Extension Parameters
HAWQ Resource Management Parameters
Lock Management Parameters
Past PostgreSQL Version Compatibility Parameters
Query Tuning Parameters
Ranger Configuration Parameters
Statistics Collection Parameters
System Resource Consumption Parameters
Configuration Parameters
add_missing_from
application_name
array_nulls
authentication_timeout
backslash_quote
block_size
bonjour_name
check_function_bodies
client_encoding
client_min_messages
cpu_index_tuple_cost
cpu_operator_cost
cpu_tuple_cost
cursor_tuple_fraction
custom_variable_classes
DateStyle
db_user_namespace
deadlock_timeout
debug_assertions
debug_pretty_print
debug_print_parse
debug_print_plan
debug_print_prelim_plan
debug_print_rewritten
debug_print_slice_table
default_hash_table_bucket_number
default_statistics_target
default_tablespace
default_transaction_isolation
default_transaction_read_only
dynamic_library_path
effective_cache_size
enable_bitmapscan
enable_groupagg
enable_hashagg
enable_hashjoin
enable_indexscan
enable_mergejoin
enable_nestloop
enable_seqscan
enable_sort
enable_tidscan
escape_string_warning
explain_pretty_print
extra_float_digits
from_collapse_limit
gp_adjust_selectivity_for_outerjoins
gp_analyze_relative_error
gp_autostats_mode
gp_autostats_on_change_threshold
gp_backup_directIO
gp_backup_directIO_read_chunk_mb
gp_cached_segworkers_threshold
gp_command_count
gp_connections_per_thread
gp_debug_linger
gp_dynamic_partition_pruning
gp_enable_agg_distinct
gp_enable_agg_distinct_pruning
gp_enable_direct_dispatch
gp_enable_fallback_plan
gp_enable_fast_sri
gp_enable_groupext_distinct_gather
gp_enable_groupext_distinct_pruning
gp_enable_multiphase_agg
gp_enable_predicate_propagation
gp_enable_preunique
gp_enable_sequential_window_plans
gp_enable_sort_distinct
gp_enable_sort_limit
gp_external_enable_exec
gp_external_grant_privileges
gp_external_max_segs
gp_filerep_tcp_keepalives_count
gp_filerep_tcp_keepalives_idle
gp_filerep_tcp_keepalives_interval
gp_hashjoin_tuples_per_bucket
gp_idf_deduplicate
gp_interconnect_cache_future_packets
gp_interconnect_default_rtt
gp_interconnect_fc_method
gp_interconnect_hash_multiplier
gp_interconnect_min_retries_before_timeout
gp_interconnect_min_rto
gp_interconnect_queue_depth
gp_interconnect_setup_timeout
gp_interconnect_snd_queue_depth
gp_interconnect_timer_checking_period
gp_interconnect_timer_period
gp_interconnect_type
gp_log_format
gp_max_csv_line_length
gp_max_databases
gp_max_filespaces
gp_max_packet_size
gp_max_plan_size
gp_max_tablespaces
gp_motion_cost_per_row
gp_reject_percent_threshold
gp_reraise_signal
gp_role
gp_safefswritesize
gp_segment_connect_timeout
gp_segments_for_planner
gp_session_id
gp_set_proc_affinity
gp_set_read_only
gp_statistics_pullup_from_child_partition
gp_statistics_use_fkeys
gp_vmem_idle_resource_timeout
gp_vmem_protect_segworker_cache_limit
gp_workfile_checksumming
gp_workfile_compress_algorithm
gp_workfile_limit_files_per_query
gp_workfile_limit_per_query
gp_workfile_limit_per_segment
hawq_acl_type
hawq_dfs_url
hawq_global_rm_type
hawq_master_address_host
hawq_master_address_port
hawq_master_directory
hawq_master_temp_directory
hawq_re_memory_overcommit_max
hawq_rm_cluster_report_period
hawq_rm_force_alterqueue_cancel_queued_request
hawq_rm_master_port
hawq_rm_memory_limit_perseg
hawq_rm_min_resource_perseg
hawq_rm_nresqueue_limit
hawq_rm_nslice_perseg_limit
hawq_rm_nvcore_limit_perseg
hawq_rm_nvseg_perquery_limit
hawq_rm_nvseg_perquery_perseg_limit
hawq_rm_nvseg_variance_amon_seg_limit
hawq_rm_rejectrequest_nseg_limit
hawq_rm_resource_idle_timeout
hawq_rm_return_percent_on_overcommit
hawq_rm_segment_heartbeat_interval
hawq_rm_segment_port
hawq_rm_stmt_nvseg
hawq_rm_stmt_vseg_memory
hawq_rm_tolerate_nseg_limit
hawq_rm_yarn_address
hawq_rm_yarn_app_name
hawq_rm_yarn_queue_name
hawq_rm_yarn_scheduler_address
hawq_rps_address_port
hawq_segment_address_port
hawq_segment_directory
hawq_segment_temp_directory
integer_datetimes
IntervalStyle
join_collapse_limit
krb_caseins_users
krb_server_keyfile
krb_srvname
lc_collate
lc_ctype
lc_messages
lc_monetary
lc_numeric
lc_time
listen_addresses
local_preload_libraries
log_autostats
log_connections
log_disconnections
log_dispatch_stats
log_duration
log_error_verbosity
log_executor_stats
log_hostname
log_min_duration_statement
log_min_error_statement
log_min_messages
log_parser_stats
log_planner_stats
log_rotation_age
log_rotation_size
log_statement
log_statement_stats
log_timezone
log_truncate_on_rotation
max_appendonly_tables
max_connections
max_files_per_process
max_fsm_pages
max_fsm_relations
max_function_args
max_identifier_length
max_index_keys
max_locks_per_transaction
max_prepared_transactions
max_stack_depth
optimizer
optimizer_analyze_root_partition
optimizer_minidump
optimizer_parts_to_force_sort_on_insert
optimizer_prefer_scalar_dqa_multistage_agg
password_encryption
pgstat_track_activity_query_size
pljava_classpath
pljava_statement_cache_size
pljava_release_lingering_savepoints
pljava_vmoptions
port
pxf_enable_filter_pushdown
pxf_enable_stat_collection
pxf_remote_service_login
pxf_remote_service_secret
pxf_service_address
pxf_service_port
pxf_stat_max_fragments
random_page_cost
regex_flavor
runaway_detector_activation_percent
search_path
seg_max_connections
seq_page_cost
server_encoding
server_version
server_version_num
shared_buffers
shared_preload_libraries
ssl
ssl_ciphers
standard_conforming_strings
statement_timeout
superuser_reserved_connections
tcp_keepalives_count
tcp_keepalives_idle
tcp_keepalives_interval
temp_buffers
TimeZone
timezone_abbreviations
track_activities
track_counts
transaction_isolation
transaction_read_only
transform_null_equals
unix_socket_directory
unix_socket_group
unix_socket_permissions
update_process_title
vacuum_cost_delay
vacuum_cost_limit
vacuum_cost_page_dirty
vacuum_cost_page_miss
vacuum_freeze_min_age
xid_stop_limit
Sample hawq-site.xml Configuration File
HDFS Configuration Reference
Environment Variables
Character Set Support Reference
Data Types
System Catalog Reference
System Tables
System Views
System Catalog Definitions
gp_configuration_history
gp_distribution_policy
gp_global_sequence
gp_master_mirroring
gp_persistent_database_node
gp_persistent_filespace_node
gp_persistent_relation_node
gp_persistent_relfile_node
gp_persistent_tablespace_node
gp_relfile_node
gp_segment_configuration
gp_version_at_initdb
pg_aggregate
pg_am
pg_amop
pg_amproc
pg_appendonly
pg_attrdef
pg_attribute
pg_attribute_encoding
pg_auth_members
pg_authid
pg_cast
pg_class
pg_compression
pg_constraint
pg_conversion
pg_database
pg_depend
pg_description
pg_exttable
pg_filespace
pg_filespace_entry
pg_index
pg_inherits
pg_language
pg_largeobject
pg_listener
pg_locks
pg_namespace
pg_opclass
pg_operator
pg_partition
pg_partition_columns
pg_partition_encoding
pg_partition_rule
pg_partition_templates
pg_partitions
pg_pltemplate
pg_proc
pg_resqueue
pg_resqueue_status
pg_rewrite
pg_roles
pg_shdepend
pg_shdescription
pg_stat_activity
pg_stat_last_operation
pg_stat_last_shoperation
pg_stat_operations
pg_stat_partition_operations
pg_statistic
pg_stats
pg_tablespace
pg_trigger
pg_type
pg_type_encoding
pg_window
The hawq_toolkit Administrative Schema
Checking for Tables that Need Routine Maintenance
Viewing HAWQ Server Log Files
Checking Database Object Sizes and Disk Space
HAWQ Management Tools Reference
analyzedb
createdb
createuser
dropdb
dropuser
gpfdist
gplogfilter
hawq activate
hawq check
hawq checkperf
hawq config
hawq extract
hawq filespace
hawq init
hawq load
hawq register
hawq restart
hawq scp
hawq ssh
hawq ssh-exkeys
hawq start
hawq state
hawq stop
pg_dump
pg_dumpall
pg_restore
psql
vacuumdb