Databricks SQL Driver for Node.js

11/16/2024

The Databricks SQL Driver for Node.js is a Node.js library that lets you use JavaScript code to run SQL commands on Azure Databricks compute resources.

Requirements

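A development machine running Node.js version 14 or later. To print the installed version of Node.js, run the command node -v. To install and use different versions of Node.js, you can use a tool such as Node Version Manager (nvm).
Node Package Manager (npm). Later versions of Node.js already include npm. To check whether npm is installed, run the command npm -v. To install npm if needed, you can follow instructions such as those in Download and install npm.
The @databricks/sql package from npm. To install the @databricks/sql package in your Node.js project as a dependency, use npm to run the following command from within the same directory as your project: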
```
npm i @databricks/sql
```
To install and use TypeScript in your Node.js project as devDependencies, use npm to run the following commands from within the same directory as your project:
```
npm i -D typescript
npm i -D @types/node
```
An existing cluster or SQL warehouse.
The Server Hostname and HTTP Path values for the existing cluster or SQL warehouse.
- Get these values for a cluster.
- Get these values for a SQL warehouse.
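
The Node.js version requirement above can also be checked at startup. The following is a minimal sketch; `meetsMinimumNodeVersion` is a hypothetical helper written for this illustration and is not part of the @databricks/sql package.

```javascript
// Hypothetical helper (not part of @databricks/sql): compares the major
// version of the running Node.js against a required minimum.
function meetsMinimumNodeVersion(minMajor, version = process.versions.node) {
  const major = Number.parseInt(version.split('.')[0], 10);
  return major >= minMajor;
}

// Fail fast if the runtime is older than the version listed in the requirements.
if (!meetsMinimumNodeVersion(14)) {
  throw new Error(`Node.js 14 or later is required; found ${process.versions.node}.`);
}
```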

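Authentication

The Databricks SQL Driver for Node.js supports the following Azure Databricks authentication types:

Databricks personal access token authentication
Microsoft Entra ID token authentication
OAuth machine-to-machine (M2M) authentication
OAuth user-to-machine (U2M) authentication

The Databricks SQL Driver for Node.js does not yet support the following Azure Databricks authentication types:

Note

As a security best practice, do not hard-code connection variable values into your code. Instead, retrieve these values from a secure location. For example, the code snippets and examples in this article use environment variables.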
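
The snippets in this article each repeat the same check for missing environment variables. As one possible way to centralize that check, here is a minimal sketch; `readRequiredEnv` is a hypothetical helper written for this illustration and is not part of the @databricks/sql package.

```javascript
// Hypothetical helper (not part of @databricks/sql): reads the named
// environment variables and fails fast, listing every variable that is
// missing or empty.
function readRequiredEnv(names, env = process.env) {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  // Return only the requested variables as a plain object.
  return Object.fromEntries(names.map((name) => [name, env[name]]));
}

// Usage (variable names as used throughout this article):
// const env = readRequiredEnv([
//   'DATABRICKS_SERVER_HOSTNAME', 'DATABRICKS_HTTP_PATH', 'DATABRICKS_TOKEN'
// ]);
```

Reporting all missing names at once avoids fixing one variable only to hit the next error on the following run.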

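Databricks personal access token authentication

To use the Databricks SQL Driver for Node.js with personal access token authentication, you must first create an Azure Databricks personal access token. For details on this step, see Azure Databricks personal access tokens for workspace users.

To authenticate the Databricks SQL Driver for Node.js, use the following code snippet. This snippet assumes that you have set the following environment variables:

DATABRICKS_SERVER_HOSTNAME, set to the Server Hostname value for your cluster or SQL warehouse.
DATABRICKS_HTTP_PATH, set to the HTTP Path value for your cluster or SQL warehouse.
DATABRICKS_TOKEN, set to the Azure Databricks personal access token.

To set environment variables, see your operating system's documentation.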

JavaScript

const { DBSQLClient } = require('@databricks/sql');

const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath       = process.env.DATABRICKS_HTTP_PATH;
const token          = process.env.DATABRICKS_TOKEN;

if (!token || !serverHostname || !httpPath) {
    throw new Error("Cannot find Server Hostname, HTTP Path, or " +
                    "personal access token. " +
                    "Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
                    "DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.");
}

const client = new DBSQLClient();
const connectOptions = {
  token: token,
  host:  serverHostname,
  path:  httpPath
};

client.connect(connectOptions)
// ...

TypeScript

import { DBSQLClient } from "@databricks/sql";

const serverHostname: string = process.env.DATABRICKS_SERVER_HOSTNAME || '';
const httpPath: string       = process.env.DATABRICKS_HTTP_PATH || '';
const token: string          = process.env.DATABRICKS_TOKEN || '';

if (token == '' || serverHostname == '' || httpPath == '') {
    throw new Error("Cannot find Server Hostname, HTTP Path, or personal access token. " +
                    "Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
                    "DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.");
}

const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
  token: token,
  host:  serverHostname,
  path:  httpPath
};

client.connect(connectOptions)
// ...

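OAuth user-to-machine (U2M) authentication

Databricks SQL Driver for Node.js versions 1.8.0 and above support OAuth user-to-machine (U2M) authentication.

To authenticate the Databricks SQL Driver for Node.js with OAuth U2M authentication, use the following code snippet. This snippet assumes that you have set the following environment variables:

DATABRICKS_SERVER_HOSTNAME, set to the Server Hostname value for your cluster or SQL warehouse.
DATABRICKS_HTTP_PATH, set to the HTTP Path value for your cluster or SQL warehouse.

To set environment variables, see your operating system's documentation.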

JavaScript

const { DBSQLClient } = require('@databricks/sql');

const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath       = process.env.DATABRICKS_HTTP_PATH;

if (!serverHostname || !httpPath) {
    throw new Error("Cannot find Server Hostname or HTTP Path. " +
                    "Check the environment variables DATABRICKS_SERVER_HOSTNAME " +
                    "and DATABRICKS_HTTP_PATH.");
}

const client = new DBSQLClient();
const connectOptions = {
  authType:                  "databricks-oauth",
  useDatabricksOAuthInAzure: true,
  host:                      serverHostname,
  path:                      httpPath
};

client.connect(connectOptions)
// ...

TypeScript

import { DBSQLClient } from "@databricks/sql";

const serverHostname: string = process.env.DATABRICKS_SERVER_HOSTNAME || '';
const httpPath: string       = process.env.DATABRICKS_HTTP_PATH || '';

if (serverHostname == '' || httpPath == '') {
    throw new Error("Cannot find Server Hostname or HTTP Path. " +
                    "Check the environment variables DATABRICKS_SERVER_HOSTNAME " +
                    "and DATABRICKS_HTTP_PATH.");
}

const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
  authType:                  "databricks-oauth",
  useDatabricksOAuthInAzure: true,
  host:                      serverHostname,
  path:                      httpPath
};

client.connect(connectOptions)
// ...

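OAuth machine-to-machine (M2M) authentication

Databricks SQL Driver for Node.js versions 1.8.0 and above support OAuth machine-to-machine (M2M) authentication.

To use the Databricks SQL Driver for Node.js with OAuth M2M authentication, you must do the following:

Create an Azure Databricks service principal in your Azure Databricks workspace, and create an OAuth secret for that service principal.

To create the service principal and its OAuth secret, see Authorize unattended access to Azure Databricks resources with a service principal using OAuth. Make a note of the service principal's UUID or Application ID value, and the Secret value for the service principal's OAuth secret.
Give the service principal access to your cluster or warehouse. See Compute permissions or Manage a SQL warehouse.

To authenticate the Databricks SQL Driver for Node.js, use the following code snippet. This snippet assumes that you have set the following environment variables:

DATABRICKS_SERVER_HOSTNAME, set to the Server Hostname value for your cluster or SQL warehouse.
DATABRICKS_HTTP_PATH, set to the HTTP Path value for your cluster or SQL warehouse.
DATABRICKS_CLIENT_ID, set to the service principal's UUID or Application ID value.
DATABRICKS_CLIENT_SECRET, set to the Secret value for the Azure Databricks service principal's OAuth secret.

To set environment variables, see your operating system's documentation.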

JavaScript

const { DBSQLClient } = require('@databricks/sql');

const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath       = process.env.DATABRICKS_HTTP_PATH;
const clientId       = process.env.DATABRICKS_CLIENT_ID;
const clientSecret   = process.env.DATABRICKS_CLIENT_SECRET;

if (!serverHostname || !httpPath || !clientId || !clientSecret) {
    throw new Error("Cannot find Server Hostname, HTTP Path, or " +
                    "service principal ID or secret. " +
                    "Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
                    "DATABRICKS_HTTP_PATH, DATABRICKS_CLIENT_ID, and " +
                    "DATABRICKS_CLIENT_SECRET.");
}

const client = new DBSQLClient();
const connectOptions = {
  authType:                  "databricks-oauth",
  useDatabricksOAuthInAzure: true,
  host:                      serverHostname,
  path:                      httpPath,
  oauthClientId:             clientId,
  oauthClientSecret:         clientSecret
};

client.connect(connectOptions)
// ...

TypeScript

import { DBSQLClient } from "@databricks/sql";

const serverHostname: string = process.env.DATABRICKS_SERVER_HOSTNAME || '';
const httpPath: string       = process.env.DATABRICKS_HTTP_PATH || '';
const clientId: string       = process.env.DATABRICKS_CLIENT_ID || '';
const clientSecret: string   = process.env.DATABRICKS_CLIENT_SECRET || '';

if (serverHostname == '' || httpPath == '' || clientId == '' || clientSecret == '') {
    throw new Error("Cannot find Server Hostname, HTTP Path, or " +
                    "service principal ID or secret. " +
                    "Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
                    "DATABRICKS_HTTP_PATH, DATABRICKS_CLIENT_ID, and " +
                    "DATABRICKS_CLIENT_SECRET.");
}

const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
  authType:                  "databricks-oauth",
  useDatabricksOAuthInAzure: true,
  host:                      serverHostname,
  path:                      httpPath,
  oauthClientId:             clientId,
  oauthClientSecret:         clientSecret
};

client.connect(connectOptions)
// ...

Microsoft Entra ID token authentication

To use the Databricks SQL Driver for Node.js with Microsoft Entra ID token authentication, you must provide the driver with a Microsoft Entra ID token. To create a Microsoft Entra ID access token, do the following:

For an Azure Databricks user, you can use the Azure CLI. See Get Microsoft Entra ID tokens for users by using the Azure CLI.
For a Microsoft Entra ID service principal, see Get a Microsoft Entra ID access token with the Azure CLI. To create a Microsoft Entra ID managed service principal, see Manage service principals.

Microsoft Entra ID tokens have a default lifetime of about 1 hour. To create a new Microsoft Entra ID token, repeat this process.

To authenticate the Databricks SQL Driver for Node.js, use the following code snippet. This snippet assumes that you have set the following environment variables:

DATABRICKS_SERVER_HOSTNAME, set to the Server Hostname value for your cluster or SQL warehouse.
DATABRICKS_HTTP_PATH, set to the HTTP Path value for your cluster or SQL warehouse.
DATABRICKS_TOKEN, set to the Microsoft Entra ID token.

To set environment variables, see your operating system's documentation.

JavaScript

const { DBSQLClient } = require('@databricks/sql');

const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath       = process.env.DATABRICKS_HTTP_PATH;
const token          = process.env.DATABRICKS_TOKEN;

if (!token || !serverHostname || !httpPath) {
    throw new Error("Cannot find Server Hostname, HTTP Path, or " +
                    "Microsoft Entra ID token. " +
                    "Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
                    "DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.");
}

const client = new DBSQLClient();
const connectOptions = {
  token: token,
  host:  serverHostname,
  path:  httpPath
};

client.connect(connectOptions)
// ...

TypeScript

import { DBSQLClient } from "@databricks/sql";

const serverHostname: string = process.env.DATABRICKS_SERVER_HOSTNAME || '';
const httpPath: string       = process.env.DATABRICKS_HTTP_PATH || '';
const token: string          = process.env.DATABRICKS_TOKEN || '';

if (token == '' || serverHostname == '' || httpPath == '') {
    throw new Error("Cannot find Server Hostname, HTTP Path, or " +
                    "Microsoft Entra ID token. " +
                    "Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
                    "DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.");
}

const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
  token: token,
  host:  serverHostname,
  path:  httpPath
};

client.connect(connectOptions)
// ...

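Query data

The following code example demonstrates how to call the Databricks SQL Driver for Node.js to run a basic SQL query on an Azure Databricks compute resource. This command returns the first two rows from the trips table in the samples catalog's nyctaxi schema.

Note

The following code example demonstrates how to use an Azure Databricks personal access token for authentication. To use other available Azure Databricks authentication types instead, see Authentication.

This code example retrieves the token, server_hostname, and http_path connection variable values from a set of Azure Databricks environment variables. These environment variables have the following names:

DATABRICKS_TOKEN, which represents your Azure Databricks personal access token from the requirements.
DATABRICKS_SERVER_HOSTNAME, which represents the Server Hostname value from the requirements.
DATABRICKS_HTTP_PATH, which represents the HTTP Path value from the requirements.

You can use other approaches to retrieve these connection variable values. Using environment variables is just one approach among many.

The following code example demonstrates how to call the Databricks SQL Driver for Node.js to run a basic SQL command on a cluster or SQL warehouse. This command returns the first two rows from the trips table.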

JavaScript

const { DBSQLClient } = require('@databricks/sql');

const token          = process.env.DATABRICKS_TOKEN;
const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath       = process.env.DATABRICKS_HTTP_PATH;

if (!token || !serverHostname || !httpPath) {
  throw new Error("Cannot find Server Hostname, HTTP Path, or personal access token. " +
                  "Check the environment variables DATABRICKS_TOKEN, " +
                  "DATABRICKS_SERVER_HOSTNAME, and DATABRICKS_HTTP_PATH.");
}

const client = new DBSQLClient();
const connectOptions = {
  token: token,
  host: serverHostname,
  path: httpPath
};

client.connect(connectOptions)
  .then(async client => {
    const session = await client.openSession();
    const queryOperation = await session.executeStatement(
      'SELECT * FROM samples.nyctaxi.trips LIMIT 2',
      {
        runAsync: true,
        maxRows:  10000 // This option enables the direct results feature.
      }
    );

    const result = await queryOperation.fetchAll();

    await queryOperation.close();

    console.table(result);

    await session.close();
    await client.close();
  })
  .catch((error) => {
    console.error(error);
  });

TypeScript

import { DBSQLClient } from '@databricks/sql';
import IDBSQLSession from '@databricks/sql/dist/contracts/IDBSQLSession';
import IOperation from '@databricks/sql/dist/contracts/IOperation';

const serverHostname: string = process.env.DATABRICKS_SERVER_HOSTNAME || '';
const httpPath: string       = process.env.DATABRICKS_HTTP_PATH || '';
const token: string          = process.env.DATABRICKS_TOKEN || '';

if (serverHostname == '' || httpPath == '' || token == '') {
  throw new Error("Cannot find Server Hostname, HTTP Path, or personal access token. " +
                  "Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
                  "DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.");
}

const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
  host: serverHostname,
  path: httpPath,
  token: token
};

client.connect(connectOptions)
  .then(async client => {
    const session: IDBSQLSession = await client.openSession();

    const queryOperation: IOperation = await session.executeStatement(
      'SELECT * FROM samples.nyctaxi.trips LIMIT 2',
      {
        runAsync: true,
        maxRows: 10000 // This option enables the direct results feature.
      }
    );

    const result = await queryOperation.fetchAll();

    await queryOperation.close();

    console.table(result);

    await session.close();
    await client.close();
  })
  .catch((error) => {
    console.error(error);
  });

Output:

┌─────────┬─────┬────────┬───────────┬───────┬─────────┬────────┬───────┬───────┬────────┬────────┬────────┐
│ (index) │ _c0 │ carat  │    cut    │ color │ clarity │ depth  │ table │ price │   x    │   y    │   z    │
├─────────┼─────┼────────┼───────────┼───────┼─────────┼────────┼───────┼───────┼────────┼────────┼────────┤
│    0    │ '1' │ '0.23' │  'Ideal'  │  'E'  │  'SI2'  │ '61.5' │ '55'  │ '326' │ '3.95' │ '3.98' │ '2.43' │
│    1    │ '2' │ '0.21' │ 'Premium' │  'E'  │  'SI1'  │ '59.8' │ '61'  │ '326' │ '3.89' │ '3.84' │ '2.31' │
└─────────┴─────┴────────┴───────────┴───────┴─────────┴────────┴───────┴───────┴────────┴────────┴────────┘

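Sessions

All IDBSQLSession methods that return IOperation objects have the following common parameters that affect their behavior:

Setting runAsync to true starts asynchronous mode. IDBSQLSession methods put operations into the queue and return as quickly as possible. The current state of the returned IOperation object might vary, and the client is responsible for checking its status before using the returned IOperation. See Operations. Setting runAsync to false means that IDBSQLSession methods wait for operations to complete. Databricks recommends always setting runAsync to true.
Setting maxRows to a non-null value enables direct results. With direct results, the server tries to wait for operations to complete and then fetches a portion of the data. Depending on how much work the server was able to complete within the defined time, IOperation objects return in some intermediate state instead of a pending state. Very often, all of the metadata and query results are returned within a single request to the server. The server uses maxRows to determine how many records it can return immediately. However, the actual chunk might be of a different size; see IOperation.fetchChunk. Direct results are enabled by default. Databricks recommends against disabling direct results.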
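
The two options above can be set explicitly on each call. The following is a minimal sketch rather than a definitive pattern; `runStatement` is a hypothetical helper name, and `session` is assumed to be an IDBSQLSession obtained from `client.openSession()`.

```javascript
// Hypothetical helper: runs one statement with the two common options set
// explicitly, and always closes the operation afterwards.
async function runStatement(session, sql) {
  const operation = await session.executeStatement(sql, {
    runAsync: true, // queue the operation and return as soon as possible
    maxRows: 10000  // a non-null value keeps direct results enabled
  });
  try {
    // fetchAll waits internally until the operation has completed.
    return await operation.fetchAll();
  } finally {
    await operation.close();
  }
}

// Usage, given an open session:
// const rows = await runStatement(session, 'SELECT * FROM samples.nyctaxi.trips LIMIT 2');
```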

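Operations

As described in Sessions, IOperation objects that are returned by IDBSQLSession methods are not fully populated. The related server operation might still be in progress, such as waiting for the Databricks SQL warehouse to start, running the query, or fetching the data. The IOperation class hides these details from users. For example, methods such as fetchAll, fetchChunk, and getSchema wait internally for operations to complete and then return results. You can use the IOperation.finished() method to explicitly wait for operations to complete. These methods take a callback that is called periodically while waiting for operations to complete. Set the progress option to true to request extra progress data from the server and pass it to that callback.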
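
Waiting explicitly with a status callback might look like the following sketch. The `progress` option is named in the text above; the option name `callback` is an assumption here and should be checked against the driver version you use. `waitUntilFinished` is a hypothetical helper name.

```javascript
// Hypothetical helper: waits for an operation to finish and forwards each
// periodic status update to onStatus.
async function waitUntilFinished(operation, onStatus) {
  await operation.finished({
    progress: true,    // ask the server for extra progress data
    callback: onStatus // assumed option name for the periodic callback
  });
}

// Usage, given an IOperation returned by a session method:
// await waitUntilFinished(queryOperation, (status) => console.log(status));
```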

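The close and cancel methods can be called at any time. When called, they immediately invalidate the IOperation object; all pending calls such as fetchAll, fetchChunk, and getSchema are immediately canceled and return an error. In some cases, the server operation might have already completed, and the cancel method affects only the client.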
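
Because cancel can be called while a fetch is still pending, one possible use is a client-side timeout. The following is a sketch under that assumption; `fetchAllWithTimeout` is a hypothetical helper, not a driver API.

```javascript
// Hypothetical helper: cancels the operation if fetchAll does not settle
// within timeoutMs, and closes the operation in every case.
async function fetchAllWithTimeout(operation, timeoutMs) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
  });
  try {
    return await Promise.race([operation.fetchAll(), timeout]);
  } catch (error) {
    // Cancelling invalidates the operation; the pending fetchAll then fails.
    await operation.cancel();
    throw error;
  } finally {
    clearTimeout(timer);
    await operation.close();
  }
}
```

The timeout here is enforced on the client only; as noted above, the server operation might already have completed by the time cancel is called.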

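The fetchAll method calls fetchChunk internally and collects all of the data into an array. While this is convenient, it might cause out-of-memory errors when used on large datasets. fetchAll options are typically passed through to fetchChunk.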

Fetch chunks of data

Fetching data in chunks uses the following code pattern:

do {
  const chunk = await operation.fetchChunk();
  // Process the data chunk.
} while (await operation.hasMoreRows());

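The fetchChunk method processes data in small portions to reduce memory consumption. fetchChunk first waits for the operation to complete if it has not already completed, calling a callback during the wait cycle, and then fetches the next data chunk.

You can use the maxRows option to specify the desired chunk size. However, the returned chunk might have a different size: smaller, or sometimes even larger. fetchChunk does not attempt to prefetch data internally in order to slice it into the requested portions. It sends the maxRows option to the server and returns whatever the server returns. Do not confuse this maxRows option with the one in IDBSQLSession. The maxRows passed to fetchChunk defines the size of each chunk and does nothing else.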
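
The do-while pattern above can be wrapped in an async generator so that callers process one chunk at a time. This is a minimal sketch; `iterateChunks` is a hypothetical helper name, and it relies only on the fetchChunk and hasMoreRows methods shown in this section.

```javascript
// Hypothetical helper: yields one chunk per iteration so that only a single
// chunk needs to be held in memory at a time.
async function* iterateChunks(operation, maxRows) {
  do {
    // maxRows is only a hint; the server may return fewer or more rows.
    yield await operation.fetchChunk({ maxRows });
  } while (await operation.hasMoreRows());
}

// Usage, given an IOperation returned by a session method:
// for await (const chunk of iterateChunks(queryOperation, 1000)) {
//   console.log(`Received ${chunk.length} rows`);
// }
```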

Manage files in Unity Catalog volumes

The Databricks SQL Driver enables you to write local files to Unity Catalog volumes, download files from volumes, and delete files from volumes, as shown in the following example:

JavaScript

const { DBSQLClient } = require('@databricks/sql');

const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath       = process.env.DATABRICKS_HTTP_PATH;
const token          = process.env.DATABRICKS_TOKEN;

if (!token || !serverHostname || !httpPath) {
    throw new Error("Cannot find Server Hostname, HTTP Path, or " +
                    "personal access token. " +
                    "Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
                    "DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.");
}

const client = new DBSQLClient();
const connectOptions = {
  token: token,
  host:  serverHostname,
  path:  httpPath
};

client.connect(connectOptions)
  .then(async client => {
    const session = await client.openSession();

    // Write a local file to a volume in the specified path.
    // For writing local files to volumes, you must first specify the path to the
    // local folder that contains the file to be written.
    // Specify OVERWRITE to overwrite any existing file in that path.
    await session.executeStatement(
      "PUT 'my-data.csv' INTO '/Volumes/main/default/my-volume/my-data.csv' OVERWRITE", {
        stagingAllowedLocalPath: ["/tmp/"]
      }
    );

    // Download a file from a volume in the specified path.
    // For downloading files in volumes, you must first specify the path to the
    // local folder that will contain the downloaded file.
    await session.executeStatement(
      "GET '/Volumes/main/default/my-volume/my-data.csv' TO 'my-downloaded-data.csv'", {
        stagingAllowedLocalPath: ["/Users/paul.cornell/samples/nodejs-sql-driver/"]
      }
    );

    // Delete a file in a volume from the specified path.
    // For deleting files from volumes, you must add stagingAllowedLocalPath,
    // but its value will be ignored. As such, in this example, an empty string is
    // specified.
    await session.executeStatement(
      "REMOVE '/Volumes/main/default/my-volume/my-data.csv'", {
        stagingAllowedLocalPath: [""]
      }
    );

    await session.close();
    await client.close();
  })
  .catch((error) => {
    console.error(error);
  });

TypeScript

import { DBSQLClient } from '@databricks/sql';

const serverHostname: string | undefined = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath: string | undefined = process.env.DATABRICKS_HTTP_PATH;
const token: string | undefined = process.env.DATABRICKS_TOKEN;

if (!token || !serverHostname || !httpPath) {
  throw new Error("Cannot find Server Hostname, HTTP Path, or " +
                  "personal access token. " +
                  "Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
                  "DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.");
}

const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
  token: token,
  host: serverHostname,
  path: httpPath
};

client.connect(connectOptions)
  .then(async client => {
    const session = await client.openSession();

    // Write a local file to a volume in the specified path.
    // For writing local files to volumes, you must first specify the path to the
    // local folder that contains the file to be written.
    // Specify OVERWRITE to overwrite any existing file in that path.
    await session.executeStatement(
      "PUT 'my-data.csv' INTO '/Volumes/main/default/my-volume/my-data.csv' OVERWRITE", {
        stagingAllowedLocalPath: ["/tmp/"]
      }
    );

    // Download a file from a volume in the specified path.
    // For downloading files in volumes, you must first specify the path to the
    // local folder that will contain the downloaded file.
    await session.executeStatement(
      "GET '/Volumes/main/default/my-volume/my-data.csv' TO 'my-downloaded-data.csv'", {
        stagingAllowedLocalPath: ["/Users/paul.cornell/samples/nodejs-sql-driver/"]
      }
    )

    // Delete a file in a volume from the specified path.
    // For deleting files from volumes, you must add stagingAllowedLocalPath,
    // but its value will be ignored. As such, in this example, an empty string is
    // specified.
    await session.executeStatement(
      "REMOVE '/Volumes/main/default/my-volume/my-data.csv'", {
        stagingAllowedLocalPath: [""]
      }
    )

    await session.close();
    await client.close();
  })
  .catch((error: any) => {
    console.error(error);
  });

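Configure logging

The logger provides information for debugging problems with the connector. All DBSQLClient objects are instantiated with a logger that prints to the console, but by passing in a custom logger, you can send this information to a file. The following example shows how to configure a logger and change its level.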

JavaScript

const { DBSQLLogger, LogLevel } = require('@databricks/sql');
const logger = new DBSQLLogger({
  filepath: 'log.txt',
  level: LogLevel.info
});

// Set logger to different level.
logger.setLevel(LogLevel.debug);

TypeScript

import { DBSQLLogger, LogLevel } from '@databricks/sql';
const logger = new DBSQLLogger({
  filepath: 'log.txt',
  level: LogLevel.info,
});

// Set logger to different level.
logger.setLevel(LogLevel.debug);

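For additional examples, see the examples folder in the databricks/databricks-sql-nodejs repository on GitHub.

Testing

To test your code, you can use JavaScript test frameworks such as Jest. To test your code under simulated conditions without calling Azure Databricks REST API endpoints or changing the state of your Azure Databricks accounts or workspaces, you can use Jest's built-in mocking frameworks.

For example, assume the following file named helpers.js. It contains a getDBSQLClientWithPAT function that uses an Azure Databricks personal access token to return a connection to an Azure Databricks workspace, a getAllColumnsFromTable function that uses the connection to get the specified number of data rows from the specified table (for example, the trips table in the samples catalog's nyctaxi schema), and a printResult function that prints the data rows' content: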

// helpers.js

const { DBSQLClient } = require('@databricks/sql');

async function getDBSQLClientWithPAT(token, serverHostname, httpPath) {
  const client = new DBSQLClient();
  const connectOptions = {
    token: token,
    host: serverHostname,
    path: httpPath
  };
  try {
    return await client.connect(connectOptions);
  } catch (error) {
    console.error(error);
    throw error;
  }
}

async function getAllColumnsFromTable(client, tableSpec, rowCount) {
  let session;
  let queryOperation;
  try {
    session = await client.openSession();
    queryOperation = await session.executeStatement(
      `SELECT * FROM ${tableSpec} LIMIT ${rowCount}`,
      {
        runAsync: true,
        maxRows: 10000 // This option enables the direct results feature.
      }
    );
  } catch (error) {
    console.error(error);
    throw error;
  }
  let result;
  try {
    result = await queryOperation.fetchAll();
  } catch (error) {
    console.error(error);
    throw error;
  } finally {
    if (queryOperation) {
      await queryOperation.close();
    }
    if (session) {
      await session.close();
    }
  }
  return result;
}

function printResult(result) {
  console.table(result);
}

module.exports = {
  getDBSQLClientWithPAT,
  getAllColumnsFromTable,
  printResult
};

And assume the following file named main.js, which calls the getDBSQLClientWithPAT, getAllColumnsFromTable, and printResult functions:

// main.js

const { getDBSQLClientWithPAT, getAllColumnsFromTable, printResult } = require('./helpers');

const token          = process.env.DATABRICKS_TOKEN;
const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath       = process.env.DATABRICKS_HTTP_PATH;
const tableSpec      = process.env.DATABRICKS_TABLE_SPEC;

if (!token || !serverHostname || !httpPath) {
  throw new Error("Cannot find Server Hostname, HTTP Path, or personal access token. " +
    "Check the environment variables DATABRICKS_TOKEN, " +
    "DATABRICKS_SERVER_HOSTNAME, and DATABRICKS_HTTP_PATH.");
}

if (!tableSpec) {
  throw new Error("Cannot find table spec in the format catalog.schema.table. " +
    "Check the environment variable DATABRICKS_TABLE_SPEC."
  )
}

getDBSQLClientWithPAT(token, serverHostname, httpPath)
  .then(async client => {
    const result = await getAllColumnsFromTable(client, tableSpec, 2);
    printResult(result);
    await client.close();
  })
  .catch((error) => {
    console.error(error);
  });

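The following file named helpers.test.js tests whether the getAllColumnsFromTable function returns the expected response. Rather than creating a real connection to the target workspace, this test mocks a DBSQLClient object. The test also mocks some data that conforms to the schema and values that are in the real data. The test returns the mocked data through the mocked connection and then checks whether one of the mocked data rows' values matches the expected value.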

// helpers.test.js

const { getDBSQLClientWithPAT, getAllColumnsFromTable, printResult} = require('./helpers')

jest.mock('@databricks/sql', () => {
  return {
    DBSQLClient: jest.fn().mockImplementation(() => {
      return {
        connect: jest.fn().mockResolvedValue({ mock: 'DBSQLClient'})
      };
    }),
  };
});

test('getDBSQLClientWithPAT returns mocked Promise<DBSQLClient> object', async () => {
  // Arguments are positional: token, serverHostname, httpPath.
  const result = await getDBSQLClientWithPAT(
    'my-token',
    'mock-server-hostname',
    'mock-http-path'
  );

  expect(result).toEqual({ mock: 'DBSQLClient' });
});

const data = [
  {
    tpep_pickup_datetime: new Date(2016, 1, 13, 15, 51, 12),
    tpep_dropoff_datetime: new Date(2016, 1, 13, 16, 15, 3),
    trip_distance: 4.94,
    fare_amount: 19.0,
    pickup_zip: 10282,
    dropoff_zip: 10171
  },
  {
    tpep_pickup_datetime: new Date(2016, 1, 3, 17, 43, 18),
    tpep_dropoff_datetime: new Date(2016, 1, 3, 17, 45),
    trip_distance: 0.28,
    fare_amount: 3.5,
    pickup_zip: 10110,
    dropoff_zip: 10110
  }
];

const mockDBSQLClientForSession = {
  openSession: jest.fn().mockResolvedValue({
    executeStatement: jest.fn().mockResolvedValue({
      fetchAll: jest.fn().mockResolvedValue(data),
      close: jest.fn().mockResolvedValue(null)
    }),
    close: jest.fn().mockResolvedValue(null)
  })
};

test('getAllColumnsFromTable returns the correct fare_amount for the second mocked data row', async () => {
  // Arguments are positional: client, tableSpec, rowCount.
  const result = await getAllColumnsFromTable(
    mockDBSQLClientForSession,
    'mock-table-spec',
    2);
  expect(result[1].fare_amount).toEqual(3.5);
});

global.console.table = jest.fn();

test('printResult mock prints the correct fare_amount for the second mocked data row', () => {
  printResult(data);
  expect(console.table).toHaveBeenCalledWith(data);
  expect(data[1].fare_amount).toBe(3.5);
});

For TypeScript, the preceding code looks similar. For Jest testing with TypeScript, use ts-jest.

Additional resources

The Databricks SQL Driver for Node.js repository on GitHub
Getting started with the Databricks SQL Driver for Node.js
Troubleshooting the Databricks SQL Driver for Node.js

API reference

Classes

`DBSQLClient` class

Main entry point for interacting with a database.

Methods

`connect` method

Opens a connection to the database.

Parameters

options

Type: ConnectionOptions

The set of options used to connect to the database. The `host`, `path`, and other required fields must be populated. See Authentication.

Example:

const client: DBSQLClient = new DBSQLClient();

client.connect(
  {
    host: serverHostname,
    path: httpPath,
    // ...
  }
)

Returns: Promise<IDBSQLClient>

`openSession` method

Opens a session between DBSQLClient and the database.

Parameters

request

Type: OpenSessionRequest

A set of optional parameters for specifying the initial schema and initial catalog.

Example:

const session = await client.openSession(
  {initialCatalog: 'catalog'}
);

Returns: Promise<IDBSQLSession>

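`getClient` method

Returns the internal Thrift TCLIService.Client object. Must be called after DBSQLClient has connected.

No parameters.

Returns: TCLIService.Client

`close` method

Closes the connection to the database and releases all associated resources on the server. Any additional calls to this client will throw an error.

No parameters.

No return value.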

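`DBSQLSession` class

DBSQLSessions are primarily used to execute statements against the database and to perform various metadata fetching operations.

Methods

`executeStatement` method

Executes a statement with the options provided.

Parameters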


statement

Type: str

The statement to be executed.

options

Type: ExecuteStatementOptions

A set of optional parameters for determining the query timeout, the maximum number of rows for direct results, and whether to run the query asynchronously. By default, maxRows is set to 10000. If maxRows is set to null, the operation runs with the direct results feature turned off.

Example:

const session = await client.openSession(
  {initialCatalog: 'catalog'}
);

const queryOperation = await session.executeStatement(
  'SELECT "Hello, World!"', { runAsync: true }
);

Returns: Promise<IOperation>

`close` method

Closes the session. Must be done after using the session.

No parameters.

No return value.

`getId` method

Returns the GUID of the session.

No parameters.

Returns: str

`getTypeInfo` method

Returns information about supported data types.

Parameters

request

Type: TypeInfoRequest

Request parameters.

Returns: Promise<IOperation>

`getCatalogs` method

Gets the list of catalogs.

Parameters

request

Type: CatalogsRequest

Request parameters.

Returns: Promise<IOperation>

`getSchemas` method

Gets the list of schemas.

Parameters

request

Type: SchemasRequest

Request parameters. The fields `catalogName` and `schemaName` can be used for filtering.

Returns: Promise<IOperation>

`getTables` method

Gets the list of tables.

Parameters

request

Type: TablesRequest

Request parameters. The fields `catalogName`, `schemaName`, and `tableName` can be used for filtering.

Returns: Promise<IOperation>
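
The metadata methods return an IOperation, so their results are fetched in the same way as query results. The following is a minimal sketch; `listTables` is a hypothetical helper name, and the shape of the returned rows is not specified here.

```javascript
// Hypothetical helper: lists tables filtered by catalog and schema, then
// releases the operation.
async function listTables(session, catalogName, schemaName) {
  const operation = await session.getTables({ catalogName, schemaName });
  try {
    // Rows are returned as-is; inspect them to see which columns are present.
    return await operation.fetchAll();
  } finally {
    await operation.close();
  }
}

// Usage, given an open session:
// const tables = await listTables(session, 'samples', 'nyctaxi');
```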

`getFunctions` method

Gets the list of functions.

Parameters

request

Type: FunctionsRequest

Request parameters. The field `functionName` is required.

Returns: Promise<IOperation>

`getPrimaryKeys` method

Gets the list of primary keys.

Parameters

request

Type: PrimaryKeysRequest

Request parameters. The fields `schemaName` and `tableName` are required.

Returns: Promise<IOperation>

`getCrossReference` method

Gets information about foreign keys between two tables.

Parameters

request

Type: CrossReferenceRequest

Request parameters. The schema, parent, and catalog names must be specified for both tables.

Returns: Promise<IOperation>

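`DBSQLOperation` class

DBSQLOperations are created by DBSQLSessions and can be used to fetch the results of statements and check their execution. Data is fetched through the fetchChunk and fetchAll functions.

Methods

`getId` method

Returns the GUID of the operation.

No parameters.

Returns: str

`fetchAll` method

Waits for the operation to complete, and then fetches all rows from the operation.

Parameters: None

Returns: Promise<Array<object>>

`fetchChunk` method

Waits for the operation to complete, and then fetches up to a specified number of rows from the operation.

Parameters

options

Type: FetchOptions

Options used to fetch. Currently, the only option is maxRows, which corresponds to the maximum number of data objects to be returned in any given array.

Returns: Promise<Array<object>>

`close` method

Closes the operation and releases all associated resources. Must be done when the operation is no longer in use.

No parameters.

No return value.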
