Databricks SQL Driver for Node.js

Άρθρο
11/14/2024

The Databricks SQL Driver for Node.js is a Node.js library that allows you to use JavaScript code to run SQL commands on Azure Databricks compute resources.

Requirements

A development machine running Node.js, version 14 or higher. To print the installed version of Node.js, run the command node -v. To install and use different versions of Node.js, you can use tools such as Node Version Manager (nvm).
Node Package Manager (npm). Later versions of Node.js already include npm. To check whether npm is installed, run the command npm -v. To install npm if needed, you can follow instructions such as the ones at Download and install npm.
The @databricks/sql package from npm. To install the @databricks/sql package in your Node.js project as a dependency, use npm to run the following command from within the same directory as your project:
```
npm i @databricks/sql
```
If you want to install and use TypeScript in your Node.js project as devDependencies, use npm to run the following commands from within the same directory as your project:
```
npm i -D typescript
npm i -D @types/node
```
An existing cluster or SQL warehouse.
The Server Hostname and HTTP Path value for the existing cluster or SQL warehouse.
- Get these values for a cluster.
- Get these values for a SQL warehouse.

Authentication

The Databricks SQL Driver for Node.js supports the following Azure Databricks authentication types:

Databricks personal access token authentication
Microsoft Entra ID token authentication
OAuth machine-to-machine (M2M) authentication
OAuth user-to-machine (U2M) authentication

The Databricks SQL Driver for Node.js does not yet support the following Azure Databricks authentication types:

Note

As a security best practice, you should not hard code connection variable values into your code. Instead, you should retrieve these connection variable values from a secure location. For example, the code snippets and examples in this article use environment variables.

Databricks personal access token authentication

To use the Databricks SQL Driver for Node.js with authentication, you must first create an Azure Databricks personal access token. For details on this step, see Azure Databricks personal access tokens for workspace users.

To authenticate the Databricks SQL Driver for Node.js, use the following code snippet. This snippet assumes that you have set the following environment variables:

DATABRICKS_SERVER_HOSTNAMEset to the Server Hostname value for your cluster or SQL warehouse.
DATABRICKS_HTTP_PATH, set to HTTP Path value for your cluster or SQL warehouse.
DATABRICKS_TOKEN, set to the Azure Databricks personal access token.

To set environment variables, see your operating system’s documentation.

JavaScript

const { DBSQLClient } = require('@databricks/sql');

const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath = process.env.DATABRICKS_HTTP_PATH;
const token = process.env.DATABRICKS_TOKEN;

if (!token || !serverHostname || !httpPath) {
  throw new Error(
    'Cannot find Server Hostname, HTTP Path, or ' +
      'personal access token. ' +
      'Check the environment variables DATABRICKS_SERVER_HOSTNAME, ' +
      'DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.',
  );
}

const client = new DBSQLClient();
const connectOptions = {
  token: token,
  host: serverHostname,
  path: httpPath,
};

client.connect(connectOptions);
// ...

TypeScript

import { DBSQLClient } from '@databricks/sql';

const serverHostname: string = process.env.DATABRICKS_SERVER_HOSTNAME || '';
const httpPath: string = process.env.DATABRICKS_HTTP_PATH || '';
const token: string = process.env.DATABRICKS_TOKEN || '';

if (token == '' || serverHostname == '' || httpPath == '') {
  throw new Error(
    'Cannot find Server Hostname, HTTP Path, or personal access token. ' +
      'Check the environment variables DATABRICKS_SERVER_HOSTNAME, ' +
      'DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.',
  );
}

const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
  token: token,
  host: serverHostname,
  path: httpPath,
};

client.connect(connectOptions);
// ...

OAuth user-to-machine (U2M) authentication

Databricks SQL Driver for Node.js versions 1.8.0 and above support OAuth user-to-machine (U2M) authentication.

To authenticate the Databricks SQL Driver for Node.js with OAuth U2M authentication, use the following code snippet. This snippet assumes that you have set the following environment variables:

DATABRICKS_SERVER_HOSTNAMEset to the Server Hostname value for your cluster or SQL warehouse.
DATABRICKS_HTTP_PATH, set to HTTP Path value for your cluster or SQL warehouse.

To set environment variables, see your operating system’s documentation.

JavaScript

const { DBSQLClient } = require('@databricks/sql');

const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath = process.env.DATABRICKS_HTTP_PATH;

if (!serverHostname || !httpPath) {
  throw new Error(
    'Cannot find Server Hostname or HTTP Path. ' +
      'Check the environment variables DATABRICKS_SERVER_HOSTNAME ' +
      'and DATABRICKS_HTTP_PATH.',
  );
}

const client = new DBSQLClient();
const connectOptions = {
  authType: 'databricks-oauth',
  useDatabricksOAuthInAzure: true,
  host: serverHostname,
  path: httpPath,
};

client.connect(connectOptions);
// ...

TypeScript

import { DBSQLClient } from '@databricks/sql';

const serverHostname: string = process.env.DATABRICKS_SERVER_HOSTNAME || '';
const httpPath: string = process.env.DATABRICKS_HTTP_PATH || '';

if (serverHostname == '' || httpPath == '') {
  throw new Error(
    'Cannot find Server Hostname or HTTP Path. ' +
      'Check the environment variables DATABRICKS_SERVER_HOSTNAME ' +
      'and DATABRICKS_HTTP_PATH.',
  );
}

const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
  authType: 'databricks-oauth',
  useDatabricksOAuthInAzure: true,
  host: serverHostname,
  path: httpPath,
};

client.connect(connectOptions);
// ...

OAuth machine-to-machine (M2M) authentication

Databricks SQL Driver for Node.js versions 1.8.0 and above support OAuth machine-to-machine (U2M) authentication.

To use the Databricks SQL Driver for Node.js with OAuth M2M authentication, you must do the following:

Create an Azure Databricks service principal in your Azure Databricks workspace, and create an OAuth secret for that service principal.

To create the service principal and its OAuth secret, see Authorize unattended access to Azure Databricks resources with a service principal using OAuth. Make a note of the service principal’s UUID or Application ID value, and the Secret value for the service principal’s OAuth secret.
Give the service principal access to your cluster or warehouse. See Compute permissions or Manage a SQL warehouse.

To authenticate the Databricks SQL Driver for Node.js, use the following code snippet. This snippet assumes that you have set the following environment variables:

DATABRICKS_SERVER_HOSTNAMEset to the Server Hostname value for your cluster or SQL warehouse.
DATABRICKS_HTTP_PATH, set to HTTP Path value for your cluster or SQL warehouse.
DATABRICKS_CLIENT_ID, set to the service principal’s UUID or Application ID value.
DATABRICKS_CLIENT_SECRET, set to the Secret value for the service principal’s OAuth secret.

To set environment variables, see your operating system’s documentation.

JavaScript

const { DBSQLClient } = require('@databricks/sql');

const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath = process.env.DATABRICKS_HTTP_PATH;
const clientId = process.env.DATABRICKS_CLIENT_ID;
const clientSecret = process.env.DATABRICKS_CLIENT_SECRET;

if (!serverHostname || !httpPath || !clientId || !clientSecret) {
  throw new Error(
    'Cannot find Server Hostname, HTTP Path, or ' +
      'service principal ID or secret. ' +
      'Check the environment variables DATABRICKS_SERVER_HOSTNAME, ' +
      'DATABRICKS_HTTP_PATH, DATABRICKS_CLIENT_ID, and ' +
      'DATABRICKS_CLIENT_SECRET.',
  );
}

const client = new DBSQLClient();
const connectOptions = {
  authType: 'databricks-oauth',
  useDatabricksOAuthInAzure: true,
  host: serverHostname,
  path: httpPath,
  oauthClientId: clientId,
  oauthClientSecret: clientSecret,
};

client.connect(connectOptions);
// ...

TypeScript

import { DBSQLClient } from '@databricks/sql';

const serverHostname: string = process.env.DATABRICKS_SERVER_HOSTNAME || '';
const httpPath: string = process.env.DATABRICKS_HTTP_PATH || '';
const clientId: string = process.env.DATABRICKS_CLIENT_ID || '';
const clientSecret: string = process.env.DATABRICKS_CLIENT_SECRET || '';

if (serverHostname == '' || httpPath == '' || clientId == '' || clientSecret == '') {
  throw new Error(
    'Cannot find Server Hostname, HTTP Path, or ' +
      'service principal ID or secret. ' +
      'Check the environment variables DATABRICKS_SERVER_HOSTNAME, ' +
      'DATABRICKS_HTTP_PATH, DATABRICKS_CLIENT_ID, and ' +
      'DATABRICKS_CLIENT_SECRET.',
  );
}

const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
  authType: 'databricks-oauth',
  useDatabricksOAuthInAzure: true,
  host: serverHostname,
  path: httpPath,
  oauthClientId: clientId,
  oauthClientSecret: clientSecret,
};

client.connect(connectOptions);
// ...

Microsoft Entra ID token authentication

To use the Databricks SQL Driver for Node.js with Microsoft Entra ID token authentication, you must provide the Databricks SQL Driver for Node.js with the Microsoft Entra ID token. To create a Microsoft Entra ID access token, do the following:

For an Azure Databricks user, you can use the Azure CLI. See Get Microsoft Entra ID tokens for users by using the Azure CLI.
For a Microsoft Entra ID service principal, see Get a Microsoft Entra ID access token with the Azure CLI. To create a Microsoft Entra ID managed service principal, see Manage service principals.

Microsoft Entra ID tokens have a default lifetime of about 1 hour. To create a new Microsoft Entra ID token, repeat this process.

To authenticate the Databricks SQL Driver for Node.js, use the following code snippet. This snippet assumes that you have set the following environment variables:

DATABRICKS_SERVER_HOSTNAMEset to the Server Hostname value for your cluster or SQL warehouse.
DATABRICKS_HTTP_PATH, set to HTTP Path value for your cluster or SQL warehouse.
DATABRICKS_TOKEN, set to the Microsoft Entra ID token.

To set environment variables, see your operating system’s documentation.

JavaScript

const { DBSQLClient } = require('@databricks/sql');

const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath = process.env.DATABRICKS_HTTP_PATH;
const token = process.env.DATABRICKS_TOKEN;

if (!token || !serverHostname || !httpPath) {
  throw new Error(
    'Cannot find Server Hostname, HTTP Path, or ' +
      '<ms-entra-id> token. ' +
      'Check the environment variables DATABRICKS_SERVER_HOSTNAME, ' +
      'DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.',
  );
}

const client = new DBSQLClient();
const connectOptions = {
  token: token,
  host: serverHostname,
  path: httpPath,
};

client.connect(connectOptions);
// ...

TypeScript

import { DBSQLClient } from '@databricks/sql';

const serverHostname: string = process.env.DATABRICKS_SERVER_HOSTNAME || '';
const httpPath: string = process.env.DATABRICKS_HTTP_PATH || '';
const token: string = process.env.DATABRICKS_TOKEN || '';

if (token == '' || serverHostname == '' || httpPath == '') {
  throw new Error(
    'Cannot find Server Hostname, HTTP Path, or ' +
      '<ms-entra-id> token. ' +
      'Check the environment variables DATABRICKS_SERVER_HOSTNAME, ' +
      'DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.',
  );
}

const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
  token: token,
  host: serverHostname,
  path: httpPath,
};

client.connect(connectOptions);
// ...

Query data

The following code example demonstrates how to call the Databricks SQL Driver for Node.js to run a basic SQL query on an Azure Databricks compute resource. This command returns the first two rows from the trips table in the samples catalog’s nyctaxi schema.

Note

The following code example demonstrates how to use an Azure Databricks personal access token for authentication. To use other available Azure Databricks authentication types instead, see Authentication.

This code example retrieves the token, server_hostname and http_path connection variable values from a set of Azure Databricks environment variables. These environment variables have the following environment variable names:

DATABRICKS_TOKEN, which represents your Azure Databricks personal access token from the requirements.
DATABRICKS_SERVER_HOSTNAME, which represents the Server Hostname value from the requirements.
DATABRICKS_HTTP_PATH, which represents the HTTP Path value from the requirements.

You can use other approaches to retrieving these connection variable values. Using environment variables is just one approach among many.

The following code example demonstrates how to call the Databricks SQL Connector for Node.js to run a basic SQL command on a cluster or SQL warehouse. This command returns the first two rows from the trips table.

JavaScript

const { DBSQLClient } = require('@databricks/sql');

const token = process.env.DATABRICKS_TOKEN;
const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath = process.env.DATABRICKS_HTTP_PATH;

if (!token || !serverHostname || !httpPath) {
  throw new Error(
    'Cannot find Server Hostname, HTTP Path, or personal access token. ' +
      'Check the environment variables DATABRICKS_TOKEN, ' +
      'DATABRICKS_SERVER_HOSTNAME, and DATABRICKS_HTTP_PATH.',
  );
}

const client = new DBSQLClient();
const connectOptions = {
  token: token,
  host: serverHostname,
  path: httpPath,
};

client
  .connect(connectOptions)
  .then(async (client) => {
    const session = await client.openSession();
    const queryOperation = await session.executeStatement('SELECT * FROM samples.nyctaxi.trips LIMIT 2', {
      runAsync: true,
      maxRows: 10000, // This option enables the direct results feature.
    });

    const result = await queryOperation.fetchAll();

    await queryOperation.close();

    console.table(result);

    await session.close();
    await client.close();
  })
  .catch((error) => {
    console.error(error);
  });

TypeScript

import { DBSQLClient } from '@databricks/sql';
import IDBSQLSession from '@databricks/sql/dist/contracts/IDBSQLSession';
import IOperation from '@databricks/sql/dist/contracts/IOperation';

const serverHostname: string = process.env.DATABRICKS_SERVER_HOSTNAME || '';
const httpPath: string = process.env.DATABRICKS_HTTP_PATH || '';
const token: string = process.env.DATABRICKS_TOKEN || '';

if (serverHostname == '' || httpPath == '' || token == '') {
  throw new Error(
    'Cannot find Server Hostname, HTTP Path, or personal access token. ' +
      'Check the environment variables DATABRICKS_SERVER_HOSTNAME, ' +
      'DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.',
  );
}

const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
  host: serverHostname,
  path: httpPath,
  token: token,
};

client
  .connect(connectOptions)
  .then(async (client) => {
    const session: IDBSQLSession = await client.openSession();

    const queryOperation: IOperation = await session.executeStatement('SELECT * FROM samples.nyctaxi.trips LIMIT 2', {
      runAsync: true,
      maxRows: 10000, // This option enables the direct results feature.
    });

    const result = await queryOperation.fetchAll();

    await queryOperation.close();

    console.table(result);

    await session.close();
    client.close();
  })
  .catch((error) => {
    console.error(error);
  });

Output:

┌─────────┬─────┬────────┬───────────┬───────┬─────────┬────────┬───────┬───────┬────────┬────────┬────────┐
│ (index) │ _c0 │ carat  │    cut    │ color │ clarity │ depth  │ table │ price │   x    │   y    │   z    │
├─────────┼─────┼────────┼───────────┼───────┼─────────┼────────┼───────┼───────┼────────┼────────┼────────┤
│    0    │ '1' │ '0.23' │  'Ideal'  │  'E'  │  'SI2'  │ '61.5' │ '55'  │ '326' │ '3.95' │ '3.98' │ '2.43' │
│    1    │ '2' │ '0.21' │ 'Premium' │  'E'  │  'SI1'  │ '59.8' │ '61'  │ '326' │ '3.89' │ '3.84' │ '2.31' │
└─────────┴─────┴────────┴───────────┴───────┴─────────┴────────┴───────┴───────┴────────┴────────┴────────┘

Sessions

All IDBSQLSession methods that return IOperation objects in the API Reference have the following common parameters that affect their behavior:

Setting runAsync to true starts asynchronous mode. IDBSQLSession methods put operations into the queue and return as quickly as possible. The current state of the returned IOperation object might vary, and the client is responsible for checking its status before using the returned IOperation. See Operations. Setting runAsync to false means that IDBSQLSession methods wait for operations to complete. Databricks recommends always setting runAsync to true.
Setting maxRows to a non-null value enables direct results. With direct results, the server tries to wait for operations to complete and then fetches a portion of the data. Depending on how much work the server was able to complete within the defined time, IOperation objects return in some intermediate state instead of in some pending state. Very often all the metadata and query results are returned within a single request to the server. The server uses maxRows to determine how many records it can return immediately. However, the actual chunk may be of a different size; see IDBSQLSession.fetchChunk. Direct results are enabled by default. Databricks recommends against disabling direct results.

Operations

As described in Sessions, IOperation objects that are returned by IDBSQLSession session methods in the API Reference are not fully populated. The related server operation might still be in progress, such as waiting for the Databricks SQL warehouse to start, running the query, or fetching the data. The IOperation class hides these details from users. For example, methods such fetchAll, fetchChunk, and getSchema wait internally for operations to complete and then return results. You can use the IOperation.finished() method to explicitly wait for operations to complete. These methods take a callback that is periodically called while waiting for operations to complete. Setting the progress option to true attempts to request extra progress data from the server and pass it to that callback.

The close and cancel methods can be called at any time. When called, they immediately invalidate the IOperation object; all pending calls such as fetchAll, fetchChunk, and getSchema are immediately canceled and an error is returned. In some cases, the server operation might have already completed and the cancel method affects only the client.

The fetchAll method calls fetchChunk internally and collects all of the data into an array. While this is convenient, it might cause out of memory errors when used on large datasets. fetchAll options are typically passed to fetchChunk.

Fetch chunks of data

Fetching data chunks uses the following code pattern:

do {
  const chunk = await operation.fetchChunk();
  // Process the data chunk.
} while (await operation.hasMoreRows());

The fetchChunk method in the API Reference processes data in small portions to reduce memory consumption. fetchChunk first waits for operations to complete if they have not already completed, then calls a callback during the wait cycle, and then fetches the next data chunk.

You can use the maxRows option to specify the desired chunk size. However, the returned chunk might have a different size, smaller or even sometimes larger. fetchChunk does not attempt to prefetch data internally, in order to slice it into the requested portions. It sends the maxRows option to then server and returns whatever the server returns. Do not confuse this maxRows option with the one in IDBSQLSession. maxRows passed to fetchChunk defines the size of each chunk and does not do anything else.

Manage files in Unity Catalog volumes

The Databricks SQL Driver enables you to write local files to Unity Catalog volumes, download files from volumes, and delete files from volumes, as shown in the following example:

JavaScript

const { DBSQLClient } = require('@databricks/sql');

const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath = process.env.DATABRICKS_HTTP_PATH;
const token = process.env.DATABRICKS_TOKEN;

if (!token || !serverHostname || !httpPath) {
  throw new Error(
    'Cannot find Server Hostname, HTTP Path, or ' +
      'personal access token. ' +
      'Check the environment variables DATABRICKS_SERVER_HOSTNAME, ' +
      'DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.',
  );
}

const client = new DBSQLClient();
const connectOptions = {
  token: token,
  host: serverHostname,
  path: httpPath,
};

client
  .connect(connectOptions)
  .then(async (client) => {
    const session = await client.openSession();

    // Write a local file to a volume in the specified path.
    // For writing local files to volumes, you must first specify the path to the
    // local folder that contains the file to be written.
    // Specify OVERWRITE to overwrite any existing file in that path.
    await session.executeStatement("PUT 'my-data.csv' INTO '/Volumes/main/default/my-volume/my-data.csv' OVERWRITE", {
      stagingAllowedLocalPath: ['/tmp/'],
    });

    // Download a file from a volume in the specified path.
    // For downloading files in volumes, you must first specify the path to the
    // local folder that will contain the downloaded file.
    await session.executeStatement("GET '/Volumes/main/default/my-volume/my-data.csv' TO 'my-downloaded-data.csv'", {
      stagingAllowedLocalPath: ['/Users/paul.cornell/samples/nodejs-sql-driver/'],
    });

    // Delete a file in a volume from the specified path.
    // For deleting files from volumes, you must add stagingAllowedLocalPath,
    // but its value will be ignored. As such, in this example, an empty string is
    // specified.
    await session.executeStatement("REMOVE '/Volumes/main/default/my-volume/my-data.csv'", {
      stagingAllowedLocalPath: [''],
    });

    await session.close();
    await client.close();
  })
  .catch((error) => {
    console.error(error);
  });

TypeScript

import { DBSQLClient } from '@databricks/sql';

const serverHostname: string | undefined = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath: string | undefined = process.env.DATABRICKS_HTTP_PATH;
const token: string | undefined = process.env.DATABRICKS_TOKEN;

if (!token || !serverHostname || !httpPath) {
  throw new Error(
    'Cannot find Server Hostname, HTTP Path, or ' +
      'personal access token. ' +
      'Check the environment variables DATABRICKS_SERVER_HOSTNAME, ' +
      'DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.',
  );
}

const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
  token: token,
  host: serverHostname,
  path: httpPath,
};

client
  .connect(connectOptions)
  .then(async (client) => {
    const session = await client.openSession();

    // Write a local file to a volume in the specified path.
    // For writing local files to volumes, you must first specify the path to the
    // local folder that contains the file to be written.
    // Specify OVERWRITE to overwrite any existing file in that path.
    await session.executeStatement("PUT 'my-data.csv' INTO '/Volumes/main/default/my-volume/my-data.csv' OVERWRITE", {
      stagingAllowedLocalPath: ['/tmp/'],
    });

    // Download a file from a volume in the specified path.
    // For downloading files in volumes, you must first specify the path to the
    // local folder that will contain the downloaded file.
    await session.executeStatement("GET '/Volumes/main/default/my-volume/my-data.csv' TO 'my-downloaded-data.csv'", {
      stagingAllowedLocalPath: ['/Users/paul.cornell/samples/nodejs-sql-driver/'],
    });

    // Delete a file in a volume from the specified path.
    // For deleting files from volumes, you must add stagingAllowedLocalPath,
    // but its value will be ignored. As such, in this example, an empty string is
    // specified.
    await session.executeStatement("REMOVE '/Volumes/main/default/my-volume/my-data.csv'", {
      stagingAllowedLocalPath: [''],
    });

    await session.close();
    await client.close();
  })
  .catch((error: any) => {
    console.error(error);
  });

Configure logging

The logger provides information for debugging problems with the connector. All DBSQLClient objects are instantiated with a logger that prints to the console, but by passing in a custom logger, you can send this information to a file. The following example shows how to configure a logger and change its level.

JavaScript

const { DBSQLLogger, LogLevel } = require('@databricks/sql');
const logger = new DBSQLLogger({
  filepath: 'log.txt',
  level: LogLevel.info,
});

// Set logger to different level.
logger.setLevel(LogLevel.debug);

TypeScript

import { DBSQLLogger, LogLevel } from '@databricks/sql';
const logger = new DBSQLLogger({
  filepath: 'log.txt',
  level: LogLevel.info,
});

// Set logger to different level.
logger.setLevel(LogLevel.debug);

For additional examples, see the examples folder in the databricks/databricks-sql-nodejs repository on GitHub.

Testing

To test your code, you can use JavaScript test frameworks such as Jest. To test your code under simulated conditions without calling Azure Databricks REST API endpoints or changing the state of your Azure Databricks accounts or workspaces, you can use Jest’s built-in mocking frameworks.

For example, given the following file named helpers.js containing a getDBSQLClientWithPAT function that uses an Azure Databricks personal access token to return a connection to an Azure Databricks workspace, a getAllColumnsFromTable function that uses the connection to get the specified number of data rows from the specified table (for example, the trips table in the samples catalog’s nyctaxi schema), and a printResults function to print the data rows’ content:

// helpers.js

const { DBSQLClient } = require('@databricks/sql');

async function getDBSQLClientWithPAT(token, serverHostname, httpPath) {
  const client = new DBSQLClient();
  const connectOptions = {
    token: token,
    host: serverHostname,
    path: httpPath,
  };
  try {
    return await client.connect(connectOptions);
  } catch (error) {
    console.error(error);
    throw error;
  }
}

async function getAllColumnsFromTable(client, tableSpec, rowCount) {
  let session;
  let queryOperation;
  try {
    session = await client.openSession();
    queryOperation = await session.executeStatement(`SELECT * FROM ${tableSpec} LIMIT ${rowCount}`, {
      runAsync: true,
      maxRows: 10000, // This option enables the direct results feature.
    });
  } catch (error) {
    console.error(error);
    throw error;
  }
  let result;
  try {
    result = await queryOperation.fetchAll();
  } catch (error) {
    console.error(error);
    throw error;
  } finally {
    if (queryOperation) {
      await queryOperation.close();
    }
    if (session) {
      await session.close();
    }
  }
  return result;
}

function printResult(result) {
  console.table(result);
}

module.exports = {
  getDBSQLClientWithPAT,
  getAllColumnsFromTable,
  printResult,
};

And given the following file named main.js that calls the getDBSQLClientWithPAT, getAllColumnsFromTable, and printResults functions:

// main.js

const { getDBSQLClientWithPAT, getAllColumnsFromTable, printResult } = require('./helpers');

const token = process.env.DATABRICKS_TOKEN;
const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath = process.env.DATABRICKS_HTTP_PATH;
const tableSpec = process.env.DATABRICKS_TABLE_SPEC;

if (!token || !serverHostname || !httpPath) {
  throw new Error(
    'Cannot find Server Hostname, HTTP Path, or personal access token. ' +
      'Check the environment variables DATABRICKS_TOKEN, ' +
      'DATABRICKS_SERVER_HOSTNAME, and DATABRICKS_HTTP_PATH.',
  );
}

if (!tableSpec) {
  throw new Error(
    'Cannot find table spec in the format catalog.schema.table. ' +
      'Check the environment variable DATABRICKS_TABLE_SPEC.',
  );
}

getDBSQLClientWithPAT(token, serverHostname, httpPath)
  .then(async (client) => {
    const result = await getAllColumnsFromTable(client, tableSpec, 2);
    printResult(result);
    await client.close();
  })
  .catch((error) => {
    console.error(error);
  });

The following file named helpers.test.js tests whether the getAllColumnsFromTable function returns the expected response. Rather than creating a real connection to the target workspace, this test mocks a DBSQLClient object. The test also mocks some data that conforms to the schema and values that are in the real data. The test returns the mocked data through the mocked connection and then checks whether one of the mocked data rows’ values matches the expected value.

// helpers.test.js

const { getDBSQLClientWithPAT, getAllColumnsFromTable, printResult } = require('./helpers');

jest.mock('@databricks/sql', () => {
  return {
    DBSQLClient: jest.fn().mockImplementation(() => {
      return {
        connect: jest.fn().mockResolvedValue({ mock: 'DBSQLClient' }),
      };
    }),
  };
});

test('getDBSQLClientWithPAT returns mocked Promise<DBSQLClient> object', async () => {
  const result = await getDBSQLClientWithPAT(
    (token = 'my-token'),
    (serverHostname = 'mock-server-hostname'),
    (httpPath = 'mock-http-path'),
  );

  expect(result).toEqual({ mock: 'DBSQLClient' });
});

const data = [
  {
    tpep_pickup_datetime: new Date(2016, 1, 13, 15, 51, 12),
    tpep_dropoff_datetime: new Date(2016, 1, 13, 16, 15, 3),
    trip_distance: 4.94,
    fare_amount: 19.0,
    pickup_zip: 10282,
    dropoff_zip: 10171,
  },
  {
    tpep_pickup_datetime: new Date(2016, 1, 3, 17, 43, 18),
    tpep_dropoff_datetime: new Date(2016, 1, 3, 17, 45),
    trip_distance: 0.28,
    fare_amount: 3.5,
    pickup_zip: 10110,
    dropoff_zip: 10110,
  },
];

const mockDBSQLClientForSession = {
  openSession: jest.fn().mockResolvedValue({
    executeStatement: jest.fn().mockResolvedValue({
      fetchAll: jest.fn().mockResolvedValue(data),
      close: jest.fn().mockResolvedValue(null),
    }),
    close: jest.fn().mockResolvedValue(null),
  }),
};

test('getAllColumnsFromTable returns the correct fare_amount for the second mocked data row', async () => {
  const result = await getAllColumnsFromTable(
    (client = mockDBSQLClientForSession),
    (tableSpec = 'mock-table-spec'),
    (rowCount = 2),
  );
  expect(result[1].fare_amount).toEqual(3.5);
});

global.console.table = jest.fn();

test('printResult mock prints the correct fare_amount for the second mocked data row', () => {
  printResult(data);
  expect(console.table).toHaveBeenCalledWith(data);
  expect(data[1].fare_amount).toBe(3.5);
});

For TypeScript, the preceding code looks similar. For Jest testing with TypeScript, use ts-jest.

Additional resources

The Databricks SQL Driver for Node.js repository on GitHub
Getting started with the Databricks SQL Driver for Node.js
Troubleshooting the Databricks SQL Driver for Node.js

API reference

Classes

Classes

`DBSQLClient` class

Main entry point for interacting with a database.

Methods

`connect` method

Opens a connection to the database.

Parameters
options Type: `ConnectionOptions` The set of options used to connect to the database. The `host`, `path`, and other required fields must be populated. See Authentication. Example: `const client: DBSQLClient = new DBSQLClient();` `client.connect(` `{` `host: serverHostname,` `path: httpPath,` `// ...` `}` `)`

Returns: Promise<IDBSQLClient>

`openSession` method

Opens session between DBSQLClient and database.

Parameters
request Type: `OpenSessionRequest` A set of optional parameters for specifying initial schema and initial catalog Example: `const session = await client.openSession(` `{initialCatalog: 'catalog'}` `);`

Returns: Promise<IDBSQLSession>

`getClient` method

Returns internal thrift TCLIService.Client object. Must be called after DBSQLClient has connected.

No parameters

Returns TCLIService.Client

`close` method

Closes the connection to the database and releases all associated resources on the server. Any additional calls to this client will throw error.

No parameters.

No return value.

`DBSQLSession` class

DBSQLSessions are primarily used for the execution of statements against the databbase as well as various metadata fetching operations.

Methods

`executeStatement` method

Executes a statement with the options provided.

Parameters

Parameters
statement Type: `str` The statement to be executed.
options Type: `ExecuteStatementOptions` A set of optional parameters for determining query timeout, max rows for direct results, and whether to run the query asynchronously. By default `maxRows` is set to 10000. If `maxRows` is set to null, the operation will run with the direct results feature off. Example: `const session = await client.openSession(` `{initialCatalog: 'catalog'}` `);` `queryOperation = await session.executeStatement(` `'SELECT "Hello, World!"', { runAsync: true }` `);`

statement

Type: str

The statement to be executed.

options

Type: ExecuteStatementOptions

A set of optional parameters for determining query timeout, max rows for direct results, and whether to run the query asynchronously. By default maxRows is set to 10000. If maxRows is set to null, the operation will run with the direct results feature off.

Example:

const session = await client.openSession(
{initialCatalog: 'catalog'}
);

queryOperation = await session.executeStatement(
'SELECT "Hello, World!"', { runAsync: true }
);

Returns: Promise<IOperation>

`close` method

Closes the session. Must be done after using session.

No parameters.

No return value.

`getId` method

Returns the GUID of the session.

No parameters.

Returns: str

`getTypeInfo` method

Returns information about supported data types.

Parameters
request Type: `TypeInfoRequest` Request parameters.

Returns: Promise<IOperation>

`getCatalogs` method

Gets list of catalogs.

Parameters
request Type: `CatalogsRequest` Request parameters.

Returns: Promise<IOperation>

`getSchemas` method

Gets list of schemas.

Parameters
request Type: `SchemasRequest` Request parameters. Fields `catalogName` and `schemaName` can be used for filtering purposes.

Returns: Promise<IOperation>

`getTables` method

Gets list of tables.

Parameters
request Type: `TablesRequest` Request parameters. Fields `catalogName` and `schemaName` and `tableName` can be used for filtering.

Returns: Promise<IOperation>

`getFunctions` method

Gets list of tables.

Parameters
request Type: `FunctionsRequest` Request parameters. Field `functionName` is required.

Returns: Promise<IOperation>

`getPrimaryKeys` method

Gets list of primary keys.

Parameters
request Type: `PrimaryKeysRequest` Request parameters. Fields `schemaName` and `tableName` are required.

Returns: Promise<IOperation>

`getCrossReference` method

Gets information about foreign keys between two tables.

Parameters
request Type: `CrossReferenceRequest` Request parameters. Schema, Parent, and Catalog name must be specified for both tables.

Returns: Promise<IOperation>

`DBSQLOperation` class

DBSQLOperations are created by DBSQLSessions and can be used to fetch the results of statements and check up on their execution. Data is fetched through functions fetchChunk and fetchAll.

Methods

`getId` method

Returns the GUID of the operation.

No parameters.

Returns: str

`fetchAll` method

Waits for operation completion, then fetches all rows from operation.

Parameters: None

Returns: Promise<Array<object>>

`fetchChunk` method

Waits for operation completion, then fetches up to a specified number of rows from an operation.

Parameters
options Type: `FetchOptions` Options used to fetch. Currently, the only option is maxRows, which corresponds to the max number of data objects to be returned in any given array.

Returns: Promise<Array<object>>

`close` method

Closes the operation and releases all associated resources. Must be done after no longer using operation.

No parameters.

No return value.

Κοινή χρήση μέσω

Databricks SQL Driver for Node.js

Requirements

Authentication

Databricks personal access token authentication

JavaScript

TypeScript

OAuth user-to-machine (U2M) authentication

JavaScript

TypeScript

OAuth machine-to-machine (M2M) authentication

JavaScript

TypeScript

Microsoft Entra ID token authentication

JavaScript

TypeScript

Query data

JavaScript

TypeScript

Sessions

Operations

Fetch chunks of data

Manage files in Unity Catalog volumes

JavaScript

TypeScript

Configure logging

JavaScript

TypeScript

Testing

Additional resources

API reference

Classes

DBSQLClient class

Methods

connect method

openSession method

getClient method

close method

DBSQLSession class

Methods

executeStatement method

close method

getId method

getTypeInfo method

getCatalogs method

getSchemas method

getTables method

getFunctions method

getPrimaryKeys method

getCrossReference method

DBSQLOperation class

Methods

getId method

fetchAll method

fetchChunk method

close method

Σχόλια

Πρόσθετοι πόροι

`DBSQLClient` class

`connect` method

`openSession` method

`getClient` method

`close` method

`DBSQLSession` class

`executeStatement` method

`close` method

`getId` method

`getTypeInfo` method

`getCatalogs` method

`getSchemas` method

`getTables` method

`getFunctions` method

`getPrimaryKeys` method

`getCrossReference` method

`DBSQLOperation` class

`getId` method

`fetchAll` method

`fetchChunk` method

`close` method