CREATE TABLE with Hive format

Article
07/05/2024

Applies to: Databricks Runtime

Defines a table using the Hive format.

Syntax

CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier
    [ ( col_name1[:] col_type1 [ COMMENT col_comment1 ], ... ) ]
    [ COMMENT table_comment ]
    [ PARTITIONED BY ( col_name2[:] col_type2 [ COMMENT col_comment2 ], ... )
        | ( col_name1, col_name2, ... ) ]
    [ ROW FORMAT row_format ]
    [ STORED AS file_format ]
    [ LOCATION path ]
    [ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ]
    [ AS select_statement ]

row_format:
    : SERDE serde_class [ WITH SERDEPROPERTIES (k1=v1, k2=v2, ... ) ]
    | DELIMITED [ FIELDS TERMINATED BY fields_terminated_char [ ESCAPED BY escaped_char ] ]
        [ COLLECTION ITEMS TERMINATED BY collection_items_terminated_char ]
        [ MAP KEYS TERMINATED BY map_key_terminated_char ]
        [ LINES TERMINATED BY row_terminated_char ]
        [ NULL DEFINED AS null_char ]

The clauses between the column definition clause and the AS SELECT clause can appear in any order. For example, you can write COMMENT table_comment after TBLPROPERTIES.

Note

You must specify either the STORED AS or the ROW FORMAT clause. Otherwise, the SQL parser uses the CREATE TABLE [USING] syntax to parse the statement and creates a Delta table by default.
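
As a minimal sketch of the behavior the note describes, the two statements below differ only in the STORED AS clause. The table names `events_hive` and `events_default` are hypothetical and chosen for illustration.

```sql
-- STORED AS is present, so the statement is parsed with the Hive format syntax.
CREATE TABLE events_hive (id INT, payload STRING)
    STORED AS PARQUET;

-- Neither STORED AS nor ROW FORMAT is present, so the statement is parsed
-- with the CREATE TABLE [USING] syntax, which creates a Delta table by default.
CREATE TABLE events_default (id INT, payload STRING);
```

Running DESCRIBE TABLE EXTENDED on each table afterwards should show which provider was actually used.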

Parameters

table_identifier

A table name, optionally qualified with a schema name.

Syntax: [schema_name.] table_name
EXTERNAL

Defines the table using the path provided in LOCATION.
PARTITIONED BY

Partitions the table by the specified columns.
ROW FORMAT

Use the SERDE clause to specify a custom SerDe for a table. Otherwise, use the DELIMITED clause to use the native SerDe and specify the delimiter, escape character, null character, and so on.
SERDE

Specifies a custom SerDe for a table.
serde_class

Specifies the fully qualified class name of a custom SerDe.
SERDEPROPERTIES

A list of key-value pairs used to tag the SerDe definition.
DELIMITED

The DELIMITED clause can be used to specify the native SerDe and state the delimiter, escape character, null character, and so on.
FIELDS TERMINATED BY

Defines the column separator.
COLLECTION ITEMS TERMINATED BY

Defines the separator between items in a collection.
MAP KEYS TERMINATED BY

Defines the separator between a map key and its value.
LINES TERMINATED BY

Defines the row separator.
NULL DEFINED AS

Defines the specific value that represents NULL.
ESCAPED BY

Defines the escape mechanism.
STORED AS

The file format for the table. Available formats include TEXTFILE, SEQUENCEFILE, RCFILE, ORC, PARQUET, and AVRO. Alternatively, you can specify your own input and output formats through INPUTFORMAT and OUTPUTFORMAT. Only the TEXTFILE, SEQUENCEFILE, and RCFILE formats can be used with ROW FORMAT SERDE, and only TEXTFILE can be used with ROW FORMAT DELIMITED.
LOCATION

Path to the directory where the table data is stored, which can be a path on distributed storage.
COMMENT

A string literal that describes the table.
TBLPROPERTIES

A list of key-value pairs used to tag the table definition.
AS select_statement

Populates the table using the data from the select statement.
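
The two ROW FORMAT variants described above can be compared side by side. The sketch below is illustrative only: the table names are hypothetical, and the SerDe class `org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe` and its `field.delim` property come from Apache Hive and are assumed to be available on the cluster.

```sql
-- Variant 1: name the SerDe class explicitly and pass its settings
-- through WITH SERDEPROPERTIES. TEXTFILE is one of the formats that
-- can be combined with ROW FORMAT SERDE.
CREATE TABLE IF NOT EXISTS orders_serde (id INT, customer STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    WITH SERDEPROPERTIES ('field.delim' = ',')
    STORED AS TEXTFILE;

-- Variant 2: roughly the same comma-separated layout, expressed with the
-- DELIMITED clause, which can be used only with TEXTFILE.
CREATE TABLE IF NOT EXISTS orders_delimited (id INT, customer STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;
```

The DELIMITED form is usually the shorter choice for plain delimited text, while the SERDE form is the one to use when a custom class has to be named.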

Examples

--Use hive format
CREATE TABLE student (id INT, name STRING, age INT) STORED AS ORC;

--Use data from another table
CREATE TABLE student_copy STORED AS ORC
    AS SELECT * FROM student;

--Specify table comment and properties
CREATE TABLE student (id INT, name STRING, age INT)
    COMMENT 'this is a comment'
    STORED AS ORC
    TBLPROPERTIES ('foo'='bar');

--Specify table comment and properties with different clauses order
CREATE TABLE student (id INT, name STRING, age INT)
    STORED AS ORC
    TBLPROPERTIES ('foo'='bar')
    COMMENT 'this is a comment';

--Create partitioned table
CREATE TABLE student (id INT, name STRING)
    PARTITIONED BY (age INT)
    STORED AS ORC;

--Create partitioned table with different clauses order
CREATE TABLE student (id INT, name STRING)
    STORED AS ORC
    PARTITIONED BY (age INT);

--Use Row Format and file format
CREATE TABLE student (id INT, name STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

--Use complex datatype
CREATE EXTERNAL TABLE family(
        name STRING,
        friends ARRAY<STRING>,
        children MAP<STRING, INT>,
        address STRUCT<street: STRING, city: STRING>
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '\\'
    COLLECTION ITEMS TERMINATED BY '_'
    MAP KEYS TERMINATED BY ':'
    LINES TERMINATED BY '\n'
    NULL DEFINED AS 'foonull'
    STORED AS TEXTFILE
    LOCATION '/tmp/family/';

--Use predefined custom SerDe
CREATE TABLE avroExample
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
        OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    TBLPROPERTIES ('avro.schema.literal'='{ "namespace": "org.apache.hive",
        "name": "first_schema",
        "type": "record",
        "fields": [
                { "name":"string1", "type":"string" },
                { "name":"string2", "type":"string" }
            ] }');

--Use your own custom SerDe (you may need to run `ADD JAR xxx.jar` first so that the serde_class can be found;
--otherwise you may run into a `CLASSNOTFOUND` exception)
ADD JAR /tmp/hive_serde_example.jar;

CREATE EXTERNAL TABLE family (id INT, name STRING)
    ROW FORMAT SERDE 'com.ly.spark.serde.SerDeExample'
    STORED AS INPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleInputFormat'
        OUTPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleOutputFormat'
    LOCATION '/tmp/family/';
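
The syntax section also lists a name-only form of PARTITIONED BY, `( col_name1, col_name2, ... )`, which none of the examples above exercises. The following is a sketch under assumptions: it uses that form in a CREATE TABLE ... AS SELECT statement, reuses the `student` table from the first example, and introduces the hypothetical table name `student_by_age`. Whether the runtime accepts it should be verified before relying on it.

```sql
--Create a partitioned table from a query, naming the partition column
--without a type (the type is taken from the select statement).
--The partition column is listed last in the SELECT.
CREATE TABLE IF NOT EXISTS student_by_age
    PARTITIONED BY (age)
    STORED AS ORC
    AS SELECT id, name, age FROM student;
```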
