SQL R Services optimization for concurrent execution of sp_execute_external_script
Tips and Trick for SQL R Services
With SQL Server 2016, we have introduced in-database analytics by bringing R closer to the database. This allows the compute to happen closer to the data, and also leverage the power of SQL Server including resource governance. For production scenario, our guideline includes embedding 'R' scripts inside sp_execute_external_script (SPEES), which internally spawns processes for R analytics.
To achieve better concurrency and throughput in parallel execution of SPEES, we recommend customer to test different process pool size to determine the optimum setting for their workload. By default 6 processes are enabled and it takes 100 ms to start a new process in Windows. For a large concurrent request, we should following entry in rlauncher.config to get preemptively increase the pool size.
PROCESS_POOL_SQLSATELLITE_GROWTH=20
If customer is running 10 concurrent connections to SPEES with MAXDOP of 4, we recommend setting the pool size to at least 10 X 4 i.e. 40. In general, the optimum configuration depends on the workload, so they should be tested for specific use case.
Comments
- Anonymous
September 21, 2016
The comment has been removed - Anonymous
February 15, 2017
Great to see more information on R in SQL Server being published, please keep up the good work.One interesting thing though, on my machine I consistently see 5 R processes instantiated and not 6.- Anonymous
April 21, 2017
The comment has been removed
- Anonymous
- Anonymous
April 11, 2017
I am aware that when ScaleR was initially developed, the focus was on parallel processing for individual analyses — i.e. splitting a job across multiple processors.However, my focus is not on large analyses using large data sets, but on running a large number of analyses (jobs) each using a relatively small dataset.Can the ScaleR function, or another function be used to create queues for each of a the processors in a set — e.g. if we run a Server with 4 cores, is there functionality available when running either R Server, or R in-database (SQL Server 2016) — I’ve not yet been able to work this out from the documentation I have read, as I have not seen any reference to the Use Case I have.- Anonymous
April 12, 2017
The comment has been removed - Anonymous
April 14, 2018
The comment has been removed
- Anonymous