INSERT
适用于: Databricks SQL Databricks Runtime
将新行插入表中,并可选择截断表或分区。 通过值表达式或查询的结果指定插入的行。
如果timestamp-millis
表架构中存在该类型,Databricks 不支持 INSERT
Hive Avro 表。
语法
INSERT { OVERWRITE | INTO } [ TABLE ] table_name
[ PARTITION clause ]
[ ( column_name [, ...] ) | BY NAME ]
query
INSERT INTO [ TABLE ] table_name
REPLACE WHERE predicate
query
注意
当你 INSERT INTO
Delta 表时,支持架构强制和演变。
如果列的数据类型不能安全地强制转换为 Delta 表的数据类型,则会引发运行时异常。
如果启用了架构演变,则新列可以作为架构的最后一列(或嵌套列)存在,以便架构得以演变。
参数
INTO
或OVERWRITE
如果指定
OVERWRITE
,则将应用以下操作:- 如果没有
partition_spec
,将在插入第一行之前截断表。 - 否则,在插入第一行之前,与
partition_spec
匹配的所有分区都将被截断。
如果指定
INTO
,则插入的所有行都将添加到现有行。- 如果没有
-
标识要插入到的表。 名称不得包含时态规范。 如果找不到表,Azure Databricks 会引发 TABLE_OR_VIEW_NOT_FOUND 错误。
table_name
不得为外表。适用于: Databricks SQL Databricks Runtime 16.0 及更高版本
table_name
可能包括选项规范。 -
一个可选参数,可指定插入的目标分区。 也可仅部分指定分区。
指定 静态 分区时(
column = value
)不能在插入列列表中重复此列。指定 动态 分区
INSERT OVERWRITE
时,table_name
该分区必须是 Delta Lake 表。 ( column_name [, …] )
表中列的可选列表。 insert 命令最多可以指定表中的任何特定列一次。
适用于: Databricks SQL Databricks Runtime 12.2 LTS 及更高版本
- 如果此命令省略某列,Databricks SQL 将改为分配相应的默认值。
- 如果目标表架构没有为插入的列定义任何默认值,则 Databricks SQL 会在该列可为空时分配
NULL
。 否则,Databricks SQL 会引发错误。
不提供列列表等同于按照表中定义的顺序指定所有列(在
PARTITION
子句中分配了值的列除外)。BY NAME
适用于: Databricks SQL Databricks Runtime 13.3 LTS 及更高版本
当使用此子句而不是显式列列表时,该命令使用
query
的公开列名按query
顺序生成列列表。 与显式列列表一样,每个列必须存在于目标表中,并且不得重复。 如果table_name
中存在的列不是隐含列列表的一部分,则会改用DEFAULT
值。BY NAME
还按名称匹配结构的属性。query
中的任何列都不能与PARTITION
子句中指定的列或生成的任何列匹配。REPLACE WHERE boolean_expression
适用于: Databricks SQL Databricks Runtime 12.2 LTS 及更高版本
如果
table_name
是 Delta Lake 表,请先删除与boolean_expression
匹配的行,然后再插入与query
中指定的boolean-expression
匹配的任何行。query
中不匹配boolean_expression
的行将被忽略。boolean_expression
可以是计算得出BOOLEAN
结果类型的任何表达式。-
生成要插入的行的查询。
必须将查询返回的列数与指定或隐含的插入列列表相匹配。
如果数据类型不能安全地转换为匹配的列数据类型,则会引发运行时异常。
适用于: Databricks SQL Databricks Runtime 11.3 LTS 及更高版本
- 如果
query
由 VALUES 子句组成,则expression
可以是DEFAULT
。 - 如果
query
由 SELECT 子句组成,则named_expression
可以是DEFAULT
。 DEFAULT
将在table_name
中插入相应列的明确定义的DEFAULT
表达式,如果没有定义任何表达式,则插入NULL
。
如果启用了架构演变,则新列可以作为架构的最后一列(或嵌套列)存在,以便架构得以演变。
- 如果
示例
本节内容:
INSERT INTO
INSERT 使用 VALUES
> CREATE TABLE students (name VARCHAR(64), address VARCHAR(64) DEFAULT 'unknown', student_id INT)
PARTITIONED BY (student_id);
-- Single row insert using a `VALUES` clause specifying all columns.
> INSERT INTO students VALUES
('Amy Smith', '123 Park Ave, San Jose', 111111);
-- Single row insert using an implicit default for address
> INSERT INTO students(name, student_id) VALUES('Grayson Miller', 222222);
-- Single row insert using an explicit DEFAULT keyword for address
> INSERT INTO students VALUES('Youna Kim', DEFAULT, 333333);
-- Multi-row insert using a `VALUES` clause
> INSERT INTO students VALUES
('Bob Brown', '456 Taylor St, Cupertino', 444444),
('Cathy Johnson', '789 Race Ave, Palo Alto', 555555);
-- Multi-row insert using a mix of DEFAULT and literals
> INSERT INTO students VALUES
('Gwyneth Zhao', '120 Main St, Rockport', 666666),
('Jackson Peterson', DEFAULT, 777777);
> SELECT * FROM students;
name address student_id
---------------- ------------------------ ----------
Amy Smith 123 Park Ave, San Jose 111111
Grayson Miller unknown 222222
Youna Kim unknown 333333
Bob Brown 456 Taylor St, Cupertino 444444
Cathy Johnson 789 Race Ave, Palo Alto 555555
Gwyneth Zhao 120 Main St, Rockport 666666
Jackson Peterson unknown 777777
使用子查询进行插入
-- Assuming the persons table has already been created and populated.
> SELECT * FROM persons;
name address ssn
------------- ------------------------- ---------
Dora Williams 134 Forest Ave, Melo Park 123456789
Eddie Davis 245 Market St, Milpitas 345678901
> INSERT INTO students PARTITION (student_id = 444444)
SELECT name, address FROM persons WHERE name = "Dora Williams";
> SELECT * FROM students;
name address student_id
------------- ------------------------- ----------
Amy Smith 123 Park Ave, San Jose 111111
Bob Brown 456 Taylor St, Cupertino 222222
Cathy Johnson 789 Race Ave, Palo Alto 333333
Dora Williams 134 Forest Ave, Melo Park 444444
-- Use an option specification to influence the write behavior
INSERT INTO t WITH ('write.split-size' = 10) SELECT * FROM s;
使用 TABLE
子句进行插入
-- Assuming the visiting_students table has already been created and populated.
> SELECT * FROM visiting_students;
name address student_id
------------- --------------------- ----------
Fleur Laurent 345 Copper St, London 777777
Gordon Martin 779 Lake Ave, Oxford 888888
> INSERT INTO students TABLE visiting_students;
> SELECT * FROM students;
name address student_id
------------- ------------------------- ----------
Amy Smith 123 Park Ave,San Jose 111111
Bob Brown 456 Taylor St, Cupertino 222222
Cathy Johnson 789 Race Ave, Palo Alto 333333
Dora Williams 134 Forest Ave, Melo Park 444444
Fleur Laurent 345 Copper St, London 777777
Gordon Martin 779 Lake Ave, Oxford 888888
插入到目录中
> CREATE TABLE students (name VARCHAR(64), address VARCHAR(64), student_id INT)
PARTITIONED BY (student_id)
LOCATION "/path/to/students_table";
> INSERT INTO delta.`/path/to/students_table` VALUES
('Amy Smith', '123 Park Ave, San Jose', 111111);
> SELECT * FROM students;
name address student_id
------------- ------------------------- ----------
Amy Smith 123 Park Ave, San Jose 111111
插入列列表
> INSERT INTO students (address, name, student_id) VALUES
('Hangzhou, China', 'Kent Yao', 11215016);
> SELECT * FROM students WHERE name = 'Kent Yao';
name address student_id
--------- ---------------------- ----------
Kent Yao Hangzhou, China 11215016
插入分区规范和列列表
> INSERT INTO students PARTITION (student_id = 11215017) (address, name) VALUES
('Hangzhou, China', 'Kent Yao Jr.');
> SELECT * FROM students WHERE student_id = 11215017;
name address student_id
------------ ---------------------- ----------
Kent Yao Jr. Hangzhou, China 11215017
使用 BY NAME 子句的 INSERT
> CREATE TABLE target(n INT, text STRING, s STRUCT<a INT, b INT>);
> INSERT INTO target BY NAME SELECT named_struct('b', 2, 'a', 1) AS s, 0 AS n, 'data' AS text;
> SELECT * FROM target;
0 data {"a":1,"b":2}
> CREATE OR REPLACE TABLE target(n INT, arr ARRAY<STRUCT<a INT, b INT>>);
> INSERT INTO target BY NAME SELECT array(named_struct('b', 2, 'a', 1)) AS arr, 0 AS n;
> INSERT INTO target BY NAME SELECT array(named_struct('b', 2, 'a', 1)) AS arr;
> SELECT * FROM target;
0 [{"a":1,"b":2}]
NULL [{"a":1,"b":2}]
> INSERT INTO target BY NAME SELECT array(named_struct('b', 2, 'a', 1)) AS arr, 0 AS badname;
Error
> INSERT INTO target BY NAME SELECT array(named_struct('b', 2, 'a', 1)) AS arr, 0 AS n, 1 AS n;
Error: INSERT_COLUMN_ARITY_MISMATCH.TOO_MANY_DATA_COLUMNS
REPLACE WHERE
> CREATE TABLE sales(tx_date DATE, amount INTEGER);
> INSERT INTO sales VALUES
(DATE'2022-10-01', 1234),
(DATE'2022-10-02', 2345),
(DATE'2022-10-03', 3456),
(DATE'2022-11-01', 3214);
-- Replace any rows with a transaction date in October 2022.
> INSERT INTO sales REPLACE WHERE tx_date BETWEEN '2022-10-01' AND '2022-10-31'
VALUES (DATE'2022-10-01', 1237),
(DATE'2022-10-02', 2378),
(DATE'2022-10-04', 2456),
(DATE'2022-10-05', 6328);
> SELECT * FROM sales ORDER BY tx_date;
tx_date amount
---------- ------
2022-10-01 1237
2022-10-02 2378
2022-10-04 2456
2022-10-05 6328
2022-11-01 3214
INSERT OVERWRITE
使用 VALUES
子句进行插入
-- Assuming the students table has already been created and populated.
> SELECT * FROM students;
name address student_id
------------- ------------------------- ----------
Amy Smith 123 Park Ave, San Jose 111111
Bob Brown 456 Taylor St, Cupertino 222222
Cathy Johnson 789 Race Ave, Palo Alto 333333
Dora Williams 134 Forest Ave, Melo Park 444444
Fleur Laurent 345 Copper St, London 777777
Gordon Martin 779 Lake Ave, Oxford 888888
Helen Davis 469 Mission St, San Diego 999999
Jason Wang 908 Bird St, Saratoga 121212
> INSERT OVERWRITE students VALUES
('Ashua Hill', '456 Erica Ct, Cupertino', 111111),
('Brian Reed', '723 Kern Ave, Palo Alto', 222222);
> SELECT * FROM students;
name address student_id
---------- ----------------------- ----------
Ashua Hill 456 Erica Ct, Cupertino 111111
Brian Reed 723 Kern Ave, Palo Alto 222222
使用子查询进行插入
-- Assuming the persons table has already been created and populated.
> SELECT * FROM persons;
name address ssn
------------- ------------------------- ---------
Dora Williams 134 Forest Ave, Melo Park 123456789
Eddie Davis 245 Market St,Milpitas 345678901
> INSERT OVERWRITE students PARTITION (student_id = 222222)
SELECT name, address FROM persons WHERE name = "Dora Williams";
> SELECT * FROM students;
name address student_id
------------- ------------------------- ----------
Ashua Hill 456 Erica Ct, Cupertino 111111
Dora Williams 134 Forest Ave, Melo Park 222222
使用 TABLE
子句进行插入
-- Assuming the visiting_students table has already been created and populated.
> SELECT * FROM visiting_students;
name address student_id
------------- --------------------- ----------
Fleur Laurent 345 Copper St, London 777777
Gordon Martin 779 Lake Ave, Oxford 888888
> INSERT OVERWRITE students TABLE visiting_students;
> SELECT * FROM students;
name address student_id
------------- --------------------- ----------
Fleur Laurent 345 Copper St, London 777777
Gordon Martin 779 Lake Ave, Oxford 888888
插入覆盖目录
> CREATE TABLE students (name VARCHAR(64), address VARCHAR(64), student_id INT)
PARTITIONED BY (student_id)
LOCATION "/path/to/students_table";
> INSERT OVERWRITE delta.`/path/to/students_table` VALUES
('Amy Smith', '123 Park Ave, San Jose', 111111);
> SELECT * FROM students;
name address student_id
------------- ------------------------- ----------
Amy Smith 123 Park Ave, San Jose 111111