共用方式為


How an apparently innocent antivirus could kill your SSIS package performance?

Interesting story this one. I've been working with a customer who had a very simple package which consisted of a Sequence Container including an Execute SQL Task (which truncates the destination table) followed by a Data Flow Task. The Data Flow Task contained an OLE DB Source (a SQL Server 2000) and an OLE DB Destination (a SQL Server 2005). When the package was run from either of the two 64-bit boxes which were part of a cluster they had, it took over 80 minutes for the package to complete, while when it was run from other boxes in their infrastructure (no matter whether they were 32-bit or 64-bit boxes) it finished moving the same data across the two OLE DB endpoints in no more than 300 seconds.

It happened that an internal component of SSIS execution pipeline which is the Buffer Manager, allocates memory where it will later copy the input rows, but when it comes to storing BLOB (image, text or ntext) columns in those buffers, it normally ends up spooling those BLOBs to external temporary files, which then releases (i.e. close the file and automatically delete it since it was created with the FILE_FLAG_DELETE_ON_CLOSE flag) when the buffer containing the rows is also released.

The point here was that this customer had installed a well known antivirus software and he had enabled the On Access Scan feature (sort of real time scanning) which was introduce the huge delay that in the end summed up the big resulting difference.

The typical stack you would see in a case like this, every time you break DTEXEC.exe execution would look like this:

0:017> kL
Child-SP RetAddr Call Site
00000000`1e34fc68 00000000`77d6e314 ntdll!NtClose+0xa
00000000`1e34fc70 00000000`2359d623 kernel32!CloseHandle+0x5f
00000000`1e34fca0 00000000`2359bfe9 DTSPipeline!CInMemBuffer::ReleaseBLOBDescriptors+0xd3
00000000`1e34fcf0 00000000`23591357 DTSPipeline!CInMemBuffer::ReleaseMemory+0x69
00000000`1e34fd20 00000000`2359c2b1 DTSPipeline!CDTSBufferManager::RemoveBuffer+0xf7
00000000`1e34fdc0 00000000`235c37a4 DTSPipeline!CInMemBuffer::Release+0x31
00000000`1e34fdf0 00000000`235be7b0 DTSPipeline!CWorkThread::DoInputLoop+0x424
00000000`1e34fef0 00000000`781337a7 DTSPipeline!CWorkThread::ThreadProcedure+0x70
00000000`1e34ff20 00000000`78133864 msvcr80!_callthreadstartex+0x17
00000000`1e34ff50 00000000`77d6b6da msvcr80!_threadstartex+0x84
00000000`1e34ff80 00000000`00000000 kernel32!BaseThreadStart+0x3a

As you can see, it's stuck trying to close the handle to the temporary file used to spool the BLOB data for a given row.

So, we tested it after having disabled On Access Scan and it worked like a champ. Customer had heard this was a known issue with that particular build of the antivirus in x64. Meanwhile, before the AV manufacturer could provide them with a long term solution, they could exclude from the AV realtime analysis the directory where these files were created, which you can specify by using the BLOBTempStoragePath property.

That was it, after a long time not having showed around here.

Comments

  • Anonymous
    February 19, 2008
    I saw something like this once. What was your solution? :{> Andy andy doooooott leonard aaatt gmail doooooott com

  • Anonymous
    March 01, 2008
    Disabling the On-Scan Access feature of the AV resolved the performance problem. Now it's in the hands of the AV manufacturer to explain the reason for such big negative affect, which wasn't present in a previous build of that antivirus engine or in the exact same version targeting the x86 architecture.

  • Anonymous
    September 07, 2012
    Instances of AV vendors changing their behavior for one version to another is not uncommon. One way to programmatically/remotely manage common AV features (RTP on/off, Scan file, Scan File, Scan System, update etc.) for any AV vendor is implement the OESIS API from OPSWAT.