다음을 통해 공유


Enterprise-grade R package management made easy in SQL Server

As enterprise data scientists access SQL Server R Services to perform their in-database analytic work such as data mining and machine learning, the R language's full power cannot be materialized without custom yet secure R package installation. SQL Server has offered R package management functionality since 2016 (SQL R Services 9.0) which enables a team of data scientists to, for example, manage R packages on a per-database basis, keep R packages in sync with the database, share R packages and/or each have his/her own custom installation and automatically resolve R package dependencies during uninstallation. This also eliminates the need for the database administrator to understand R and its package dependencies and gives R packages enterprise-level authentication and security. 

Currently the rx function APIs are only for use in a SQL Server compute context. They are a convenient wrapper of the suite of package management functions like install.packages() in base R.

Custom and secure package management is governed by scopes

  • Shared scope - this scope allows allowed users to install & uninstall packages in per database shared location which in turn can be used by other users of the database on SQL server
  • Private scope - this scope allows allowed users to install & uninstall packages in per database per user private location which can only be used by the single user of the database on SQL server

and database roles

  • rpkgs-users - allows users to use shared packages installed by users belong to rpkgs-shared role
  • rpkgs-private - allows all permissions as rpkgs-users role and also allows users to install, remove and use private packages
  • rpkgs-shared - allows all permissions as rpkgs-private role and also allows users to install, remove shared packages
  • db_owner - allows all permissions as rpkgs-shared role and also allows users to install & remove shared and private packages for other users for management

rp

By default, any user can use the packages in the shared scope. The beauty is that, each rpkgs-private role user can also install his/her own version of this package in his/her private scope. And when this happens, the shared-scope package will be masked, meaning that when this user calls library(<package name>), only his/her installed private-scope version of this package will be loaded.

We will walk through a simple example to mimic a mock yet typical scenario where data scientists collaborate in a database.


Example:

 

First we create a database called RevoTestDB. Then we need to

  1. Set up SQL Server R Services
  2. Enable package management

Step 2 will create the additional database roles rpkgs-private, rpkgs-shared and rpkgs-users which are specific for package management as shown on the left. Then we run the T-SQL script below to create the following database users, each taking a specific database role, with their level of permissions in descending order:

data scientist lead db_owner
senior data scientist rpkgs-shared
junior data scientist rpkgs-private
other users rpkgs-users

The script also grants other roles and permissions needed for these users to use the database as required in the set up SQL Server R Services step.

To install the package 'scales' in shared scope for anyone to use, the senior data scientist can run the following script, and by default he/she installed the lastest version (0.4.1) of 'scales' from MRAN:

Then any user can use the package by either passing the package name to transformPackages in an rx fucntion or loading the package in an rxExec call. For example,

Now say the junior data scientist wants to install an older version of scales (0.4.0) for himself/herself's own use, he/she can, for example, download the .zip file to his local working directory and run the following script to install scales 0.4.0 from the downloaded .zip file by specifying repos = NULL in rxInstallPackages and verify that's actually the version that's loaded. Other users will not be affected by the junior data scientist's installation and will still be using scales 0.4.1.

The data scientist lead can install/uninstall any packages on behalf of any user. And when he/she uninstalls a package, the dependencies are automatically taken care of by specifying dependencies = TRUE in rxRemovePackages.

The output will be something like this:

rp2

To change a SQL Server instance's system package installation manually, refer to Install Additional R Packages on SQL Server.

Comments