

Troubleshooting Netlib that Comes with MDAC2.8 SP1/NET1.0, SNAC, SQL Server 2005 and NET 2.0 with ETW Tracing

Ever had an issue with a GNE (general network error)? ETW tracing can help. For a description of the ETW tracing feature for data access components, please refer to https://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql90/html/data_access_tracing.asp. Note that all commands used in this blog have shipped with the OS by default since Windows 2000.

[Steps]

1. Set up the registry.

C:\temp>reg add HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\BidInterface\Loader /v ":Path" /t REG_SZ /d msdadiag.dll /f

 

2. Compose a ctrl.guid file to choose which components to trace.

(a) For the netlib driver that comes with MDAC, add the following line to the ctrl.guid file:

 

{BD568F20-FCCD-B948-054E-DB3421115D61} 0x00000000 0 DBNETLIB.1

 

MDAC needs to be version 2.8 to produce ETW traces.

 

(b) For the netlib driver that comes with SNAC, add the following line to the ctrl.guid file:

 

{BA798F36-2325-EC5B-ECF8-76958A2AF9B5} 0x0003F007 0 SQLNCLI.1

 

This turns on all traces, including function entry points and many others. In most cases it is enough to limit the trace points by replacing the control bit mask 0x0003F007 with 0x00020002, which collects the trace points that log error codes.

 

(c) For the netlib driver that comes with SqlClient in .NET 2.0, add the following line to your ctrl.guid file:

 

{C9996FA5-C06F-F20C-8A20-69B3BA392315} 0xFFFFFFFF 0 System.Data.SNI.1

 

You can modify the control bit mask the same way as for SNAC, for example use 0x00020002 for error tracing.

 

(d) For the netlib component of SQL Server 2005, add the following line to the ctrl.guid file:

 

{AB6D5EEB-0132-74AB-C5F5-B23E1644DADA} 0x0003F007 0 SQLSERVER.SNI.1

 

The control bit mask can be modified the same way as for SNAC.
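
As a convenience, the ctrl.guid lines above can be generated rather than typed by hand. The following is only a sketch in Python: the GUIDs, masks and provider names are copied from step 2, while the helper name ctrl_guid_lines and the errors_only switch are made up for illustration. It leaves the MDAC mask untouched, because the post only describes changing the mask for the other three providers.

```python
# Provider name -> (GUID, default control bit mask), copied from step 2.
PROVIDERS = {
    "DBNETLIB.1": ("{BD568F20-FCCD-B948-054E-DB3421115D61}", 0x00000000),
    "SQLNCLI.1": ("{BA798F36-2325-EC5B-ECF8-76958A2AF9B5}", 0x0003F007),
    "System.Data.SNI.1": ("{C9996FA5-C06F-F20C-8A20-69B3BA392315}", 0xFFFFFFFF),
    "SQLSERVER.SNI.1": ("{AB6D5EEB-0132-74AB-C5F5-B23E1644DADA}", 0x0003F007),
}

# Mask suggested in step 2 for collecting only error trace points.
ERROR_ONLY_MASK = 0x00020002


def ctrl_guid_lines(names, errors_only=False):
    """Return one ctrl.guid line per requested provider name (hypothetical helper)."""
    lines = []
    for name in names:
        guid, mask = PROVIDERS[name]
        # Assumption: the MDAC entry keeps its original mask, since the post
        # only mentions changing the mask for SNAC, .NET 2.0 and SQL Server.
        if errors_only and name != "DBNETLIB.1":
            mask = ERROR_ONLY_MASK
        lines.append(f"{guid} 0x{mask:08X} 0 {name}")
    return lines


if __name__ == "__main__":
    # Write a ctrl.guid that traces SNAC and SQL Server errors only.
    with open("ctrl.guid", "w", encoding="ascii") as f:
        f.write("\n".join(ctrl_guid_lines(["SQLNCLI.1", "SQLSERVER.SNI.1"],
                                          errors_only=True)) + "\n")
```

Run it from the same directory in which you start logman, so that the -pf ctrl.guid argument in step 3 finds the file.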

3. Start the trace.

C:\temp>Logman start MyTrace -pf ctrl.guid -ct perf -o Out.etl -ets

4. Restart your application process to repro the problem.

 

5. Stop the tracing

C:\temp>logman stop MyTrace -ets

 

6. Generate reports

C:\temp>mofcomp all_data.mof

C:\temp>TraceRPT /y Out.etl

This should generate summary.txt and dumpfile.csv. The dumpfile.csv contains all traced points.

 

7. Clean up registry

C:\temp>reg delete HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\BidInterface\Loader /v ":Path" /f

[Trace output examples]

(a) MDAC: The trace string is prefixed with DBNETLIB,

 

DBNETLIB, TextA, 0x000013C4, 127991531896572714, 45, 15, 3, "<Connect|ERR> socket 0x274c{WINERR}", 0, 0

(b) SNAC: The trace string is prefixed with SQLNCLI

SQLNCLI, TextA, 0x00001EF0, 127991236299678815, 45, 105, 2, "<SNI_Packet::SNI_Packet|SNI> 14#{SNI_Packet} created by 7#{SNI_Conn}", 0, 0

(c) NET2.0: The trace string is prefixed with System.Data.SNI,

 

System.Data.SNI, TextA, 0x00000510, 127989406145563747, 15, 45, 1, "enter_05 <SNI_Conn::InitObject|API|SNI> ppConn: 05A2ECF0{SNI_Conn**} fServer: 0{BOOL}", 0, 0

(d) SQL Server: The trace string is prefixed with SQLSERVER.SNI,

 

SQLSERVER.SNI, TextW, 0x00001644, 127991236297874490, 30, 30, 1, "enter_03 <SNI_Conn::InitObject|API|SNI> ppConn: 00A6FAFC{SNI_Conn**} fServer: 1{BOOL}", 0, 0

 

Note that the time stamp in the fourth column of a trace point is a Windows FILETIME. For example, “127991236299678815” is “Thu Aug 3 17:07:09.967 2006 (GMT-7)”
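
To make the time stamp readable, it can be converted by hand or with a few lines of script. The sketch below is in Python and relies only on the definition of FILETIME (a count of 100-nanosecond intervals since January 1, 1601 UTC); the function name filetime_to_datetime and its utc_offset_hours parameter are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals since 1601-01-01 00:00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime, utc_offset_hours=0):
    """Convert a FILETIME integer to an aware datetime (hypothetical helper)."""
    # Ten 100-ns ticks make one microsecond; the sub-microsecond part is dropped.
    utc_time = FILETIME_EPOCH + timedelta(microseconds=filetime // 10)
    return utc_time.astimezone(timezone(timedelta(hours=utc_offset_hours)))


if __name__ == "__main__":
    # The SNAC example above, shown in GMT-7.
    print(filetime_to_datetime(127991236299678815, utc_offset_hours=-7))
    # prints 2006-08-03 17:07:09.967881-07:00
```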

[Inspection]

In most cases, finding the root cause of an issue from the trace output requires extensive diagnosis. In some cases, however, the error code alone can pinpoint the issue. The places of interest are the error code immediately before “{WINERR}”, or more generally any error code near “ERR”. If the error can be correlated with the error event in the application, for example by time stamp, you can send or post us the trace snippet that contains the first error, along with a few surrounding trace points as context for diagnosis. In certain scenarios, we might ask for specific trace points by providing a search pattern. Advanced filtering in post-processing is out of the scope of this blog.
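
A first pass over dumpfile.csv for such error codes can be scripted. The sketch below is in Python and assumes that every row has the column layout of the examples above (provider name first, time stamp fourth, trace string eighth); the function name find_winerr is made up, and the lookup table holds only the one Winsock code that appears in the MDAC example.

```python
import csv
import re

# Matches the value immediately before {WINERR}, in hex (0x...) or decimal.
WINERR_PATTERN = re.compile(r"(0x[0-9A-Fa-f]+|\d+)\{WINERR\}")

# Hand-written for illustration; extend from the Winsock error code list.
KNOWN_CODES = {10060: "WSAETIMEDOUT (connection timed out)"}


def find_winerr(lines):
    """Return (provider, time stamp, error code) for rows carrying a {WINERR} value."""
    hits = []
    # skipinitialspace lets the quoted trace string follow ", " in each row.
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) < 8:
            continue  # not shaped like the trace points shown above
        match = WINERR_PATTERN.search(row[7])
        if match:
            text = match.group(1)
            code = int(text, 16) if text.lower().startswith("0x") else int(text)
            hits.append((row[0], row[3], code))
    return hits


if __name__ == "__main__":
    with open("dumpfile.csv", newline="") as f:
        for provider, stamp, code in find_winerr(f):
            print(provider, stamp, code, KNOWN_CODES.get(code, "unknown"))
```

For the MDAC example above this reports code 10060, which is the Winsock error WSAETIMEDOUT.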

Apart from recovering the error code, we very often use ETW traces to identify performance bottlenecks or to discover synchronization bugs in multithreaded applications.

 

Nan Tu

Software Design Engineer, SQL Server Protocols

Disclaimer: This posting is provided "AS IS" with no warranties, and confers no rights

Comments

  • Anonymous
    October 02, 2006
    You failed to mention how to interpret the error code. In your example: "<Connect|ERR> socket  0x274c{WINERR}", 0, 0 274c == 10060.  WSAETIMEDOUT (10060) Connection timed out kevmc@microsoft.com

  • Anonymous
    May 08, 2008
    Thank you for this information.  Our GNE issue is very disruptive and we're not getting anywhere with it.  We've tried all of the TCP offloading registry tweaks, we've tried the SynAttackProtect switch ... we're using netmon & perfmon: see some patterns there, but can't single out any piece or pieces of equipment yet (some users are totally unaffected and others are hardly able to work at times -- don't think the db server is the problem).  Anyway, I was hoping to see something new and hopefully helpful from this. At first I couldn't get it to work at all for me, but I found a related article that suggested registering a trace schema (using a .mof file).  Anyway, finally got something useful to log, but nothing from the registered DBNETLIB provider.  I even released and renewed my ip address (which kills my application), but see no evidence of that in the log.  Any tips?  From the application we get the "[DBNETLIB][ConnectionWrite (send()).]General network error. Check your network documentation" message.  Would I get anything more specific from this if I could get it working?