To create a partitioned external table, the database must be at 12.2.0 compatibility level or higher.

分区的外部表和存储在数据库中的分区表有点像,但不同的是外部表分区可以存储在文件系统上,如  Apache Hive storage ,或 Hadoop Distributed File System (HDFS) 所有外部表上的限制在分区的外部表上都有。 分区表上的所有限制,在分区的外部表上也都有。 Oracle 数据库不能保证分区的外部文件包含满足分区定义的数据。



For large amounts of data, partitioning for external tables provides fast query performance and enhanced data maintenance.

  • About Partitioning External Tables
    Partitioning data in external tables is similar to partitioning tables stored in the database, but there are some differences. The files for the partitioned external table can be stored on a file system, in Apache Hive storage, or in a Hadoop Distributed File System (HDFS).

  • Restrictions for Partitioned External Tables
    Some restrictions apply to partitioned external tables.

  • Creating a Partitioned External Table
    You create a non-composite partitioned external table by issuing a  CREATE TABLE  statement with the  ORGANIZATION EXTERNAL  clause and the  PARTITION BY  clause. To create a composite partitioned external table, the  SUBPARTITION BY  clause must also be included.

  • Altering a Partitioned External Table
    You can use the  ALTER TABLE  statement to modify table-level external parameters, but not the partition-level and subpartition-level parameters, of a partitioned external table.

20.14.6.1  About Partitioning External Tables

Partitioning data in external tables is similar to partitioning tables stored in the database, but there are some differences. The files for the partitioned external table can be stored on a file system, in Apache Hive storage, or in a Hadoop Distributed File System (HDFS).

Before attempting to partition external tables, you should understand the concepts related to partitioning in   Oracle Database VLDB and Partitioning Guide .

The main reason to partition external tables is to take advantage of the same performance improvements provided by partitioning tables stored in the database. Specifically, partition pruning and partition-wise joins can improve query performance. Partition pruning means that queries can focus on a subset of the data in an external table instead of all of the data because the query can apply to only one partition. Partition-wise joins can be applied when two tables are being joined and both tables are partitioned on the join key, or when a reference partitioned table is joined with its parent table. Partition-wise joins break a large join into smaller joins that occur between each of the partitions, completing the overall join in less time.

Most of the partitioning strategies that are supported for tables in the database are supported for external tables. External tables can be partitioned by range or list, and composite partitioning is supported. However, hash partitioning is not supported for external tables.

For a partitioned table that is stored in the database, storage for each partition is specified with a tablespace. For a partitioned external table, storage for each partition is specified by indicating the directory and files for each partition.

Clauses for Creating Partitioned External Tables

The clauses for creating a non-partitioned external table are the following:

  • TYPE  - Specifies the access driver for the type of external table ( ORACLE_LOADER ORACLE_DATAPUMP ORACLE_HIVE , and  ORACLE_HDFS ).

  • DEFAULT DIRECTORY  - Specifies with a directory object the default directory to use for all input and output files that do not explicitly name a directory object. 

  • ACCESS PARAMETERS  - Describe the external data source.

  • LOCATION  - Specifies the files for the external table.

  • REJECT LIMIT  - Specifies the number of errors that can occur during a query of the external data.

When you create a partitioned external table, you must include a  PARTITION  clause that defines each partition. The following table describes the clauses allowed at each level during external table creation.

Table 20-6 External Table Clauses and Partitioning

ClauseTable LevelPartition LevelSubpartition Level

TYPE

Allowed

Not Allowed

Not Allowed

DEFAULT DIRECTORY

Allowed

Allowed

Allowed

ACCESS PARAMETERS

Allowed

Not Allowed

Not Allowed

LOCATION

Not allowed

Allowed

Allowed

REJECT LIMIT

Allowed

Not allowed

Not allowed

For a non-composite partitioned table, files for a partition must be specified in the  LOCATION  clause for the partition. For a composite partitioned table, files for a subpartition must be specified in the  LOCATION  clause for the subpartition. When a partition has subpartitions, the  LOCATION  clause can be specified for subpartitions but not for the partition. If the  LOCATION  clause is omitted for a partition or subpartition, then an empty partition or subpartition is created.

In the  LOCATION  clause, the files are named in the form  directory:file , and one clause can specify multiple files. The  directory  portion is optional. The following rules apply for the directory used by a partition or subpartition:

  • When a directory is specified in the  LOCATION  clause for a partition or subpartition, then it applies to that location only.

  • In the  LOCATION  clause for a specific partition, for each file that does not have a directory specification, use the directory specified in the  DEFAULT DIRECTORY  clause for the partition or table level, in order.

    For example, when the  ORGANIZATION EXTERNAL  clause of a  CREATE TABLE  statement includes a  DEFAULT DIRECTORY  clause, and a  PARTITION  clause in the statement does not specify a directory for a file in its  LOCATION  clause, the file uses the directory specified in the  DEFAULT DIRECTORY  clause for the table.

  • In the  LOCATION  clause for a specific subpartition, for each file that does not have a directory specification, use the directory specified in the  DEFAULT DIRECTORY  clause for the subpartition, partition, or table level, in order.

    For example, when a  PARTITION  clause includes a  DEFAULT DIRECTORY  clause, and a  SUBPARITION  clause in the partition does not specify a directory for a file in its  LOCATION  clause, the file uses the directory specified in the  DEFAULT DIRECTORY  clause for the partition.

  • The default directory for a partition or subpartition cannot be specified in a  LOCATION  clause. It can only be specified in a  DEFAULT DIRECTORY  clause.

See Also:

Example 20-23  illustrates the directory rules

Using the ORACLE_HIVE Access Driver

Apache Hive has its own partitioning. To create partitioned external tables, use the  CREATE_EXTDDL_FOR_HIVE  procedure in the  DBMS_HADOOP  package. This procedure generates data definition language (DDL) statements that you can use to create a partitioned external table that corresponds with the partitioning in the Apache Hive storage.

The  DBMS_HADOOP  package also includes the  SYNC_PARTITIONS_FOR_HIVE  procedure. This procedure automatically synchronizes the partitioning of the partitioned external table in the Apache Hive storage with the partitioning metadata of the same table stored in the Oracle Database.

Related Topics

20.14.6.2  Restrictions for Partitioned External Tables

Some restrictions apply to partitioned external tables.

The following are restrictions for partitioned external tables:

  • All restrictions that apply to non-partitioned external tables also apply to partitioned external tables.

  • Partitioning restrictions that apply to tables stored in the database also apply to partitioned external tables, such as the maximum number of partitions.

  • Oracle Database cannot guarantee that the external files for partitions contain data that satisfies partitioning definitions.

  • Only the  DEFAULT DIRECTORY  and  LOCATION  clauses can be specified in a  PARTITION  or  SUBPARTITION  clause.

  • When altering a partitioned external table with the  ALTER TABLE  statement, the following clauses are not supported:  MODIFY PARTITION EXCHANGE PARTITION MOVE PARTITION MERGE PARTITIONS SPLIT PARTITION COALESCE PARTITION , and  TRUNCATE PARTITION .

  • Reference partitioning, automatic list partitioning, and interval partitioning are not supported.

  • Subpartition templates are not supported.

  • The  ORACLE_DATAPUMP  access driver cannot populate external files for partitions using a  CREATE TABLE AS SELECT statement.

  • Incremental statistics are not gathered for partitioned external tables.

  • In addition to restrictions on partitioning methods that can be used for the other drivers, range and composite partitioning are not supported for the  ORACLE_HIVE  access driver.

  • SELECT  statement with the  EXTERNAL MODIFY  clause cannot override partition-level or subpartition-level clauses. Only external clauses supported at the table level can be overridden with the  EXTERNAL MODIFY  clause. Because the  LOCATION clause is not allowed at the table level for a partitioned external table, it cannot be overridden with the  EXTERNAL MODIFY clause.

See Also:

20.14.6.3  Creating a Partitioned External Table

You create a non-composite partitioned external table by issuing a  CREATE TABLE  statement with the  ORGANIZATION EXTERNAL  clause and the  PARTITION BY  clause. To create a composite partitioned external table, the  SUBPARTITION BY clause must also be included.

The  PARTITION BY  clause and the  SUBPARTITION BY  clause specify the locations of the external files for each partition and subpartition.

To create a partitioned external table, the database must be at 12.2.0 compatibility level or higher.

  1. Connect to the database as a user with the privileges required to create the external table.

    See   Oracle Database SQL Language Reference  for information about the required privileges.

  2. Issue a  CREATE TABLE  statement with the  ORGANIZATION EXTERNAL  clause and the  PARTITION BY  clause. For a composite partitioned table, include the  SUBPARTITION BY  clause also.

Example 20-22 Creating a Partitioned External Table with Access Parameters Common to All Partitions

This example creates an external table named  orders_external_range  that is partitioned by the date data in the  order_date  column. The  ACCESS PARAMETERS  clause is specified at the table level for the  ORACLE_LOADER  access driver. The  data_dir1  directory object is the default directory object used for the partitions  month1 month2 , and  month3 . The  pmax partition specifies the  data_dir2  directory object in the  DEFAULT DIRECTORY  clause, so the  data_dir2  directory object is used for the  pmax  partition.

-- Set up directories and grant access to oe
CREATE OR REPLACE DIRECTORY data_dir1
    AS '/flatfiles/data1'; 
CREATE OR REPLACE DIRECTORY data_dir2 
    AS '/flatfiles/data2'; 
CREATE OR REPLACE DIRECTORY bad_dir 
    AS '/flatfiles/bad'; 
CREATE OR REPLACE DIRECTORY log_dir 
    AS '/flatfiles/log'; 
GRANT READ ON DIRECTORY data_dir1 TO oe; 
GRANT READ ON DIRECTORY data_dir2 TO oe; 
GRANT WRITE ON DIRECTORY bad_dir TO oe; 
GRANT WRITE ON DIRECTORY log_dir TO oe; 
-- oe connects. Provide the user password (oe) when prompted.
CONNECT oe
-- create the partitioned external table
CREATE TABLE orders_external_range(
    order_id           NUMBER(12),
    order_date         DATE NOT NULL,
    customer_id        NUMBER(6) NOT NULL,
    order_status       NUMBER(2),
    order_total        NUMBER(8,2),
    sales_rep_id       NUMBER(6))
ORGANIZATION EXTERNAL(
   TYPE ORACLE_LOADER
   DEFAULT DIRECTORY data_dir1
   ACCESS PARAMETERS(
      RECORDS DELIMITED BY NEWLINE
      BADFILE bad_dir: 'sh%a_%p.bad'
      LOGFILE log_dir: 'sh%a_%p.log'
      FIELDS TERMINATED BY '|'
      MISSING FIELD VALUES ARE NULL))
PARALLEL
REJECT LIMIT UNLIMITED
PARTITION BY RANGE (order_date)
  (PARTITION month1 VALUES LESS THAN (TO_DATE('31-12-2014', 'DD-MM-YYYY'))
        LOCATION ('sales_1.csv'),
    PARTITION month2 VALUES LESS THAN (TO_DATE('31-01-2015', 'DD-MM-YYYY'))
        LOCATION ('sales_2.csv'),
    PARTITION month3 VALUES LESS THAN (TO_DATE('28-02-2015', 'DD-MM-YYYY'))
        LOCATION ('sales_3.csv'),
    PARTITION pmax VALUES LESS THAN (MAXVALUE)
        DEFAULT DIRECTORY data_dir2 LOCATION('sales_4.csv'));

 

In the previous example, the default directory  data_dir2  is specified for the  pmax  partition. You can also specify the directory for a specific location in this partition in the  LOCATION  clause in the following way:

PARTITION pmax VALUES LESS THAN (MAXVALUE)
   LOCATION ('data_dir2:sales_4.csv')

 

Note that, in this case, the directory  data_dir2  is specified for the location  sales_4.csv , but the  data_dir2  directory is not the default directory for the partition. Therefore, the default directory for the  pmax  partition is the same as the default directory for the table, which is  data_dir1 .

Example 20-23 Creating a Composite List-Range Partitioned External Table

This example creates an external table named  accounts  that is partitioned by the data in the  region  column. This partition is subpartitioned using range on the data in the  balance  column. The  ACCESS PARAMETERS  clause is specified at the table level for the  ORACLE_LOADER  access driver. A  LOCATION  clause is specified for each subpartition.

There is a table-level  DEFAULT DIRECTORY  clause set to the  data_dir1  directory object, and this directory object is used for all of the subpartitions, except for the following:

  • There is a partition-level  DEFAULT DIRECTORY  clause set to the  data_dir2  directory object for partition  p_southcentral . In that partition, the following subpartitions use this default directory:  p_sc_low p_sc_high , and  p_sc_extraordinary .

  • In partition  p_southcentral , the subpartition  p_sc_average  has a subpartition-level  DEFAULT DIRECTORY  clause set to the  data_dir3  directory object, and this subpartition uses the  data_dir3  directory object.

  • As previously stated, the default directory for the  p_sc_high  subpartition is  data_dir2 . The  p_sc_high  subpartition does not have a  DEFAULT DIRECTORY  clause, and the default directory  data_dir2  is inherited from the  DEFAULT DIRECTORY  specified in the  PARTITION BY  clause for the partition  p_southcentral . The files in the  p_sc_high subpartition use the following directories:

    • The  psch1.csv  file uses  data_dir2 , the default directory for the subpartition.

    • The  psch2.csv  file uses the  data_dir4  directory because the  data_dir4  directory is specified for that location.

-- Set up the directories and grant access to oe
CREATE OR REPLACE DIRECTORY data_dir1
    AS '/stage/data1_dir';
CREATE OR REPLACE DIRECTORY data_dir2
    AS '/stage/data2_dir';
CREATE OR REPLACE DIRECTORY data_dir3
    AS '/stage/data3_dir';
CREATE OR REPLACE DIRECTORY data_dir4
    AS '/stage/data4_dir';
CREATE OR REPLACE DIRECTORY bad_dir 
    AS '/stage/bad_dir'; 
CREATE OR REPLACE DIRECTORY log_dir 
    AS '/stage/log_dir';
GRANT READ ON DIRECTORY data_dir1 TO oe;
GRANT READ ON DIRECTORY data_dir2 TO oe;
GRANT READ ON DIRECTORY data_dir3 TO oe;
GRANT READ ON DIRECTORY data_dir4 TO oe;
GRANT WRITE ON DIRECTORY bad_dir TO oe; 
GRANT WRITE ON DIRECTORY log_dir TO oe; 
-- oe connects. Provide the user password (oe) when prompted.
CONNECT oe
-- create the partitioned external table
CREATE TABLE accounts
( id             NUMBER,
  account_number NUMBER,
  customer_id    NUMBER,
  balance        NUMBER,
  branch_id      NUMBER,
  region         VARCHAR(2),
  status         VARCHAR2(1)
)
ORGANIZATION EXTERNAL(
   TYPE ORACLE_LOADER
   DEFAULT DIRECTORY data_dir1
   ACCESS PARAMETERS(
      RECORDS DELIMITED BY NEWLINE
      BADFILE bad_dir: 'sh%a_%p.bad'
      LOGFILE log_dir: 'sh%a_%p.log'
      FIELDS TERMINATED BY '|'
      MISSING FIELD VALUES ARE NULL))
PARALLEL
REJECT LIMIT UNLIMITED
PARTITION BY LIST (region)
SUBPARTITION BY RANGE (balance)
( PARTITION p_northwest VALUES ('OR', 'WA') 
  ( SUBPARTITION p_nw_low VALUES LESS THAN (1000) LOCATION ('pnwl.csv'),
    SUBPARTITION p_nw_average VALUES LESS THAN (10000) LOCATION ('pnwa.csv'),
    SUBPARTITION p_nw_high VALUES LESS THAN (100000) LOCATION ('pnwh.csv'),
    SUBPARTITION p_nw_extraordinary VALUES LESS THAN (MAXVALUE) LOCATION ('pnwe.csv')
  ),
  PARTITION p_southwest VALUES ('AZ', 'UT', 'NM')
  ( SUBPARTITION p_sw_low VALUES LESS THAN (1000) LOCATION ('pswl.csv'),
    SUBPARTITION p_sw_average VALUES LESS THAN (10000) LOCATION ('pswa.csv'),
    SUBPARTITION p_sw_high VALUES LESS THAN (100000) LOCATION ('pswh.csv'),
    SUBPARTITION p_sw_extraordinary VALUES LESS THAN (MAXVALUE) LOCATION ('pswe.csv')
  ),
  PARTITION p_northeast VALUES ('NY', 'VM', 'NJ')
  ( SUBPARTITION p_ne_low VALUES LESS THAN (1000) LOCATION ('pnel.csv'),
    SUBPARTITION p_ne_average VALUES LESS THAN (10000) LOCATION ('pnea.csv'),
    SUBPARTITION p_ne_high VALUES LESS THAN (100000) LOCATION ('pneh.csv'),
    SUBPARTITION p_ne_extraordinary VALUES LESS THAN (MAXVALUE) LOCATION ('pnee.csv')
  ),
  PARTITION p_southeast VALUES ('FL', 'GA')
  ( SUBPARTITION p_se_low VALUES LESS THAN (1000) LOCATION ('psel.csv'),
    SUBPARTITION p_se_average VALUES LESS THAN (10000) LOCATION ('psea.csv'),
    SUBPARTITION p_se_high VALUES LESS THAN (100000) LOCATION ('pseh.csv'),
    SUBPARTITION p_se_extraordinary VALUES LESS THAN (MAXVALUE) LOCATION ('psee.csv')
  ),
  PARTITION p_northcentral VALUES ('SD', 'WI')
  ( SUBPARTITION p_nc_low VALUES LESS THAN (1000) LOCATION ('pncl.csv'),
    SUBPARTITION p_nc_average VALUES LESS THAN (10000) LOCATION ('pnca.csv'),
    SUBPARTITION p_nc_high VALUES LESS THAN (100000) LOCATION ('pnch.csv'),
    SUBPARTITION p_nc_extraordinary VALUES LESS THAN (MAXVALUE) LOCATION ('pnce.csv')
  ),
  PARTITION p_southcentral VALUES ('OK', 'TX') DEFAULT DIRECTORY data_dir2
  ( SUBPARTITION p_sc_low VALUES LESS THAN (1000) LOCATION ('pscl.csv'),
    SUBPARTITION p_sc_average VALUES LESS THAN (10000) 
       DEFAULT DIRECTORY data_dir3 LOCATION ('psca.csv'),
    SUBPARTITION p_sc_high VALUES LESS THAN (100000) 
       LOCATION ('psch1.csv','data_dir4:psch2.csv'),
    SUBPARTITION p_sc_extraordinary VALUES LESS THAN (MAXVALUE) 
       LOCATION ('psce.csv')
  )
);

 

See Also:

 Oracle Database VLDB and Partitioning Guide

20.14.6.4  Altering a Partitioned External Table

You can use the  ALTER TABLE  statement to modify table-level external parameters, but not the partition-level and subpartition-level parameters, of a partitioned external table.

The locations of external files are specified in the  PARTITION BY  and  SUBPARTITION BY  clauses. External files for a partition are specified in the partition’s  PARTITION BY  clause. External files for a subpartition are specified in the subpartition's  SUBPARTITION BY  clause.

The only exception is that the  LOCATION  clause cannot be specified at the table level during the creation of a partitioned external table. Therefore, the  LOCATION  clause cannot be added at the table level in an  ALTER TABLE  statement that modifies a partitioned external table.

At the partition level, only  ADD DROP , and  RENAME  operations are supported. An  ALTER TABLE  statement cannot modify the attributes of existing partitions and subpartitions. However, you can include the  DEFAULT DIRECTORY  and  LOCATION  clauses in a  PARTITION  clause or  SUBPARTITION  clause when you add a new partition or subpartition.

  1. Connect to the database as a user with the privileges required to alter the external table.

  2. Issue an  ALTER TABLE  statement.

Example 20-24 Renaming a Partition of a Partitioned External Table

This example renames a partition of the partitioned external table named  orders_external_range .

ALTER TABLE orders_external_range RENAME PARTITION pmax TO other_months;