Sdílet prostřednictvím


SQL R Services optimization for concurrent execution of sp_execute_external_script

Tips and Trick for SQL R Services

With SQL Server 2016, we have introduced in-database analytics by bringing R closer to the database. This allows the compute to happen closer to the data,  and also leverage the power of SQL Server including resource governance. For production scenario, our guideline includes embedding 'R' scripts inside sp_execute_external_script (SPEES), which internally spawns processes for R analytics.

To achieve better concurrency and throughput in parallel execution of SPEES, we recommend customer to test different process pool size to determine the optimum setting for their workload. By default 6 processes are enabled and it takes 100 ms to start a new process in Windows. For a large concurrent request, we should following entry in rlauncher.config to get preemptively increase the pool size.

PROCESS_POOL_SQLSATELLITE_GROWTH=20

If customer is running 10 concurrent connections to SPEES with MAXDOP of 4, we recommend setting the pool size to at least 10 X 4 i.e. 40. In general, the optimum configuration depends on the workload, so they should be tested for specific use case.

Comments

  • Anonymous
    September 21, 2016
    The comment has been removed
  • Anonymous
    February 15, 2017
    Great to see more information on R in SQL Server being published, please keep up the good work.One interesting thing though, on my machine I consistently see 5 R processes instantiated and not 6.
    • Anonymous
      April 21, 2017
      The comment has been removed
  • Anonymous
    April 11, 2017
    I am aware that when ScaleR was initially developed, the focus was on parallel processing for individual analyses — i.e. splitting a job across multiple processors.However, my focus is not on large analyses using large data sets, but on running a large number of analyses (jobs) each using a relatively small dataset.Can the ScaleR function, or another function be used to create queues for each of a the processors in a set — e.g. if we run a Server with 4 cores, is there functionality available when running either R Server, or R in-database (SQL Server 2016) — I’ve not yet been able to work this out from the documentation I have read, as I have not seen any reference to the Use Case I have.
    • Anonymous
      April 12, 2017
      The comment has been removed