SQL R Services optimization for concurrent execution of sp_execute_external_script

With SQL Server 2016, we introduced in-database analytics by bringing R closer to the database. This lets computation happen close to the data and also leverages the power of SQL Server, including resource governance. For production scenarios, our guidance is to embed R scripts inside sp_execute_external_script (SPEES), which internally spawns processes for R analytics.

To achieve better concurrency and throughput when executing SPEES in parallel, we recommend that customers test different process pool sizes to determine the optimal setting for their workload. By default, 6 processes are enabled, and starting a new process on Windows takes 100 ms. For a large number of concurrent requests, add the following entry to rlauncher.config to increase the pool size preemptively:

PROCESS_POOL_SQLSATELLITE_GROWTH=20
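As a minimal sketch, assuming rlauncher.config is a plain text file of KEY=VALUE lines like the entry above, the following Python function (the name `set_config_value` is hypothetical) sets the entry by replacing an existing line or appending a new one. It only transforms text. Reading and writing the actual file, and locating it on disk, are left out.

```python
def set_config_value(config_text: str, key: str, value: int) -> str:
    """Return config_text with KEY=value set.

    Assumes a simple KEY=VALUE line format (an assumption, not a documented
    parser). Any existing line for the key is replaced; otherwise a new
    line is appended at the end.
    """
    lines = config_text.splitlines()
    updated = False
    for i, line in enumerate(lines):
        name, sep, _ = line.partition("=")
        # Match only real KEY=VALUE lines whose key equals the one we want.
        if sep and name.strip() == key:
            lines[i] = f"{key}={value}"
            updated = True
    if not updated:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Usage with an illustrative, hypothetical existing setting:
existing = "SOME_OTHER_SETTING=1\n"
print(set_config_value(existing, "PROCESS_POOL_SQLSATELLITE_GROWTH", 20), end="")
```

Replacing in place rather than always appending keeps the file free of duplicate keys, so there is no question of which value wins.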

If a customer is running 10 concurrent connections to SPEES with a MAXDOP of 4, we recommend setting the pool size to at least 10 × 4, i.e. 40. In general, the optimal configuration depends on the workload, so test the setting for your specific use case.
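The sizing rule above is simple arithmetic. The Python helpers below (hypothetical names) compute it. Using the 6-process default and the 100 ms start-up figure, they also give a rough estimate of the extra start-up time when the pool is too small. That estimate assumes the missing processes start one after another, which is an assumption rather than documented launcher behavior, so treat it as a back-of-the-envelope bound only.

```python
DEFAULT_POOL_SIZE = 6    # default number of processes, per the text
PROCESS_START_MS = 100   # time to start one process on Windows, per the text


def min_pool_size(concurrent_connections: int, maxdop: int) -> int:
    """Rule of thumb: one process per connection per degree of parallelism."""
    if concurrent_connections < 1 or maxdop < 1:
        raise ValueError("connections and MAXDOP must be positive")
    return concurrent_connections * maxdop


def cold_start_overhead_ms(required: int,
                           pool_size: int = DEFAULT_POOL_SIZE,
                           startup_ms: int = PROCESS_START_MS) -> int:
    """Rough extra start-up time for processes the pool does not already have.

    Assumes (hypothetically) that missing processes start sequentially.
    """
    missing = max(0, required - pool_size)
    return missing * startup_ms


needed = min_pool_size(10, 4)
print(needed, cold_start_overhead_ms(needed))
```

For the example above, `min_pool_size(10, 4)` returns 40, and `cold_start_overhead_ms(40)` returns 3400: 34 processes beyond the default pool at 100 ms each.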


This is a repost from the Microsoft R Server Tiger Team (Prabhat_Tripathi_MSFT)