DBMS_CLOUD Package Format Options for Avro, ORC, or Parquet

The format argument in DBMS_CLOUD specifies the format of source files.

The two ways to specify the format argument are:

format => '{"format_option" : “format_value” }'  

And:

format => json_object('format_option' value 'format_value'))

Examples:

format => json_object('type' VALUE 'CSV')

To specify multiple format options, separate the values with a ",".

For example:

format => json_object('ignoremissingcolumns' value 'true', 'removequotes' value 'true', 'dateformat' value 'YYYY-MM-DD-HH24-MI-SS', 'blankasnull' value 'true')
Format Option Description Syntax

maxvarchar

Controls the default maximum varchar2() length for character columns when creating external tables on Avro, ORC, or Parquet format files. The options are:

  • standard: Specifies that the length limit for varchar2() columns is 4000 bytes. This is equivalent to auto on systems with MAX_STRING_SIZE = standard.

    On systems with MAX_STRING_SIZE = extended, when you create an external table on Avro, ORC, or Parquet and you specify maxvarchar as standard, this shortens character columns to varchar2(4000).

  • extended: Specifies that the length limit for varchar2() columns is 32767 bytes. This is equivalent to auto on systems with MAX_STRING_SIZE = extended. This value raises an error on systems with MAX_STRING_SIZE = standard.

  • auto: The system automatically picks the varchar2() maximum byte size, based on the MAX_STRING_SIZE value. The value auto is the default value for maxvarchar.

maxvarchar: standard | extended | auto

Default value:auto

regexuri

When the value of regexuri is set to TRUE, you can use wildcards as well as regular expressions in the file names in Cloud source file URIs.

The characters "*" and "?" are considered wildcard characters when the regexuri parameter is set to FALSE. When the regexuri parameter is set to TRUE the characters "*" and "?" are part of the specified regular expression pattern.

Regular expression patterns are only supported for the file name or subfolder path in your URIs and the pattern matching is identical to that performed by the REGEXP_LIKE function. Regular expression patterns are not supported for directory names.

For external tables, this option is only supported with the tables that are created on a file in the Object Storage.

For example:

format => JSON_OBJECT('regexuri' value TRUE)

See REGEXP_LIKE Condition for more information on REGEXP_LIKE condition.

regexuri: True

Default value:False

type

Specifies the file type.

type : avro | orc | parquet

schema

When schema is set to first or all, the external table columns and data types are automatically derived from the Avro, ORC, or Parquet file metadata.

The column names will match those found in Avro, ORC, or Parquet. The data types are converted from Avro, ORC, or Parquet data types to Oracle data types. All columns are added to the table.

The value first specifies to use the metadata from the first Avro, ORC, or Parquet file in the file_uri_list to auto generate the columns and their data types. Use first if all of the files have the same schema.

The value all specifies to use the metadata from all Avro, ORC, or Parquet files in the file_uri_list to auto generate the columns and their data types. Use all (slower) if the files may have different schemas.

Default: If column_list is specified, then the schema value, if specified is ignored. If column_list is not specified then the schema default value is first.

Note: For Avro, ORC, or Parquet format files the schema format option is not available and the column_list parameter must be specified for partitioned external tables using the DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE procedure.

schema : first | all