Monday, May 16, 2011

Blog Post: Win2008R2 RTM/SP1: STOP 0x44 (MULTIPLE_IRP_COMPLETE_REQUESTS)

Status: In Progress...

Update 110516: The private has now been applied to eight servers. Stay tuned for the results!

Update 110514: We have provided the customer with a private hotfix, which is being tested this weekend. An update will follow shortly. By the way, an RSS feed for (recently) released hotfixes for Win2008R2/SP1 is here.

Update 110512: We have a pretty good understanding what the root cause of this problem is, and we are working on a test hotfix to provide to the customer who reported this. If you think your machine might also be experiencing this, drop me an e-mail so we can discuss.

When opening the dump for this issue, I saw:

BugCheck 44, {fffff98014c94ee0, 1d7b, 0, 0}
Probably caused by : mrxsmb.sys ( mrxsmb!RxCeCompleteConnectRequest+367 )

The crashing stack:

16: kd> knL
  *** Stack trace for last set context - .thread/.cxr resets it
 # Child-SP          RetAddr           Call Site
00 fffff880`0e373028 fffff800`016788bc nt!KeBugCheckEx
01 fffff880`0e373030 fffff800`01b735f8 nt!IopFreeIrp+0xffffffff`fff956dc
02 fffff880`0e373070 fffff880`0173f7c7 nt!IovFreeIrpPrivate+0x68
03 fffff880`0e3730b0 fffff880`0174149e mrxsmb!RxCeCompleteConnectRequest+0x367
04 fffff880`0e373130 fffff880`0174137c mrxsmb!SmbWskAsynchronousConnectCompletionWorker+0x106
05 fffff880`0e3731d0 fffff800`01b765d6 mrxsmb!SmbWskAsynchronousConnectCompletion+0xb8
06 fffff880`0e373220 fffff800`016d4a91 nt!IovpLocalCompletionRoutine+0x166
07 fffff880`0e373280 fffff800`01b6e19f nt!IopfCompleteRequest+0x3b1
08 fffff880`0e373360 fffff880`0708a9a5 nt!IovCompleteRequest+0x19f
09 fffff880`0e373430 fffff880`018eca23 afd!WskProTLConnectComplete+0x105
0a fffff880`0e3734f0 fffff880`018e7c21 tcpip!TcpCreateAndConnectTcbComplete+0x233
0b fffff880`0e373600 fffff880`018d5d54 tcpip!TcpTcbCarefulDatagram+0x801
0c fffff880`0e3737b0 fffff880`018d45ea tcpip!TcpTcbReceive+0x724
0d fffff880`0e3739a0 fffff880`018d62ab tcpip!TcpMatchReceive+0x1fa
0e fffff880`0e373af0 fffff880`018aef0a tcpip!TcpPreValidatedReceive+0x36b
0f fffff880`0e373bc0 fffff880`018af7e4 tcpip!IppLoopbackTransmit+0x11a
10 fffff880`0e373c70 fffff880`018d41e7 tcpip!IppLoopbackEnqueue+0x1f5
11 fffff880`0e373d10 fffff880`018dade5 tcpip!IpNlpFastSendDatagram+0xb77
12 fffff880`0e3740c0 fffff880`018ebb0f tcpip!TcpTcbSend+0x495
13 fffff880`0e374340 fffff880`018eb177 tcpip!TcpCreateAndConnectTcbRateLimitComplete+0x39f
14 fffff880`0e374490 fffff880`018eb0c3 tcpip!TcpCreateAndConnectTcbInspectConnectComplete+0x47
15 fffff880`0e374530 fffff880`018ed37c tcpip!TcpContinueCreateAndConnect+0x373
16 fffff880`0e374630 fffff880`018ecd53 tcpip!TcpCreateAndConnectTcbInspectConnectRequestComplete+0x5c
17 fffff880`0e374660 fffff880`018ed296 tcpip!TcpCreateAndConnectTcbWorkQueueRoutine+0x273
18 fffff880`0e374730 fffff880`0708a50e tcpip!TcpCreateAndConnectTcb+0x4c6
19 fffff880`0e374820 fffff800`01b74c16 afd!WskProIRPConnect+0x14e
1a fffff880`0e3748d0 fffff880`01740f39 nt!IovCallDriver+0x566
1b fffff880`0e374930 fffff880`01739082 mrxsmb!SmbWskInitiateAsynchronousConnect+0x2f9
1c fffff880`0e374a80 fffff880`0173f3a9 mrxsmb!RxCeInitiateConnectRequest+0x52
1d fffff880`0e374ab0 fffff880`0173fba5 mrxsmb!RxCeBuildConnectionOverMultipleTransports+0x659
1e fffff880`0e374c40 fffff880`070011b1 mrxsmb!RxCeInitiateConnection+0x151
1f fffff880`0e374c80 fffff800`0196ecce rdbss!RxpWorkerThreadDispatcher+0x1a1
20 fffff880`0e374d40 fffff800`016c2fe6 nt!PspSystemThreadStartup+0x5a
21 fffff880`0e374d80 00000000`00000000 nt!KxStartSystemThread+0x16

STOP 0x44 issues are quite hard to troubleshoot without the help of advanced tools like Driver Verifier, so this was enabled, as was ETW tracing for the Redirector (Rdbss/MrxSmb). Options we used are:

16: kd> !verifier 1

Verify Level 419 ... enabled options are:             // these can be set using "verifier /flags 0x419 ..."
 Special pool
 All pool allocations checked on unload
 Io subsystem checking enabled
 IRP Logging
...
fffffa801b6f5e00 Loaded           00003d14       00000050    rdbss.sys
fffffa801b6f5db0 Loaded           01581370       000003d8    ndis.sys
fffffa801b6f5d60 Loaded           0000a0a8       00000c18    mrxsmb.sys
fffffa801b6f5d00 Loaded           00001010       00000ae0    mrxsmb10.sys
fffffa801b6f5ca0 Loaded           00000000       00000000    mrxsmb20.sys

The verifier IRP Log (!verifier 0x100) shows us:

======================================================================
IRP fffff98014c94ee0, Thread fffffa8021d84180, IRQL = 2, KernelApcDisable = 0, SpecialApcDisable = 0
fffff80001b66fee nt!IovCancelIrp+0xe
fffff80001652023 nt!IoCancelIrp+0xfffffffffffd40f3
fffff8800173f7b3 mrxsmb!RxCeCompleteConnectRequest+0x353
fffff8800174149e mrxsmb!SmbWskAsynchronousConnectCompletionWorker+0x106
fffff8800174137c mrxsmb!SmbWskAsynchronousConnectCompletion+0xb8
fffff80001b765d6 nt!IovpLocalCompletionRoutine+0x166
fffff800016d4a91 nt!IopfCompleteRequest+0x3b1
fffff80001b6e19f nt!IovCompleteRequest+0x19f
fffff8800708a9a5 afd!WskProTLConnectComplete+0x105
fffff880018eca23 tcpip!TcpCreateAndConnectTcbComplete+0x233
fffff880018e7c21 tcpip!TcpTcbCarefulDatagram+0x801
fffff880018d5d54 tcpip!TcpTcbReceive+0x724
fffff880018d45ea tcpip!TcpMatchReceive+0x1fa
======================================================================
IRP fffff98014c94ee0, Thread fffffa8021d81880, IRQL = 2, KernelApcDisable = 0, SpecialApcDisable = 0
fffff80001b6e02f nt!IovCompleteRequest+0x2f
fffff8800708a9a5 afd!WskProTLConnectComplete+0x105
fffff880018eca23 tcpip!TcpCreateAndConnectTcbComplete+0x233
fffff880018e7c21 tcpip!TcpTcbCarefulDatagram+0x801
fffff880018d5d54 tcpip!TcpTcbReceive+0x724
fffff880018d45ea tcpip!TcpMatchReceive+0x1fa
fffff880018d62ab tcpip!TcpPreValidatedReceive+0x36b
fffff880018aef0a tcpip!IppLoopbackTransmit+0x11a
fffff880018af7e4 tcpip!IppLoopbackEnqueue+0x1f5
fffff880018d41e7 tcpip!IpNlpFastSendDatagram+0xb77
fffff880018dade5 tcpip!TcpTcbSend+0x495
fffff880018ebb0f tcpip!TcpCreateAndConnectTcbRateLimitComplete+0x39f
fffff880018eb177 tcpip!TcpCreateAndConnectTcbInspectConnectComplete+0x47

Reading the ETW tracing (which requires private symbols) and further debugging eventually led us to what seems to be a race condition. In the dump I checked, CPU 23 was running this thread which is racing with the one on CPU 16, the crashing stack above:

16: kd> knL
 # Child-SP          RetAddr           Call Site
00 fffff880`0e364ab0 fffff800`017896f6 nt!RtlpLookupFunctionEntryForStackWalks+0xb7
01 fffff880`0e364b20 fffff800`0178aac3 nt!RtlpWalkFrameChain+0x2e6
02 fffff880`0e3651c0 fffff800`0178b94b nt!RtlWalkFrameChain+0x63
03 fffff880`0e3651f0 fffff800`01b6502c nt!RtlCaptureStackBackTrace+0x4b
04 fffff880`0e365220 fffff800`01b6709a nt!IovpLogStackCallout+0x1c
05 fffff880`0e365250 fffff800`01b6993a nt!ViPoolLogStackTrace+0x8a
06 fffff880`0e365280 fffff800`017fb7fa nt!VfFreePoolNotification+0x4a
07 fffff880`0e3652b0 fffff880`01741347 nt!ExFreePoolWithTag+0xffffffff`ffffea6a
08 fffff880`0e365360 fffff800`01b765d6 mrxsmb!SmbWskAsynchronousConnectCompletion+0x83
09 fffff880`0e3653b0 fffff800`016d4a91 nt!IovpLocalCompletionRoutine+0x166
0a fffff880`0e365410 fffff800`01b6e19f nt!IopfCompleteRequest+0x3b1
0b fffff880`0e3654f0 fffff880`0708a9a5 nt!IovCompleteRequest+0x19f
0c fffff880`0e3655c0 fffff880`018eca23 afd!WskProTLConnectComplete+0x105
0d fffff880`0e365680 fffff880`018e7c21 tcpip!TcpCreateAndConnectTcbComplete+0x233
0e fffff880`0e365790 fffff880`018d5d54 tcpip!TcpTcbCarefulDatagram+0x801
0f fffff880`0e365940 fffff880`018d45ea tcpip!TcpTcbReceive+0x724
10 fffff880`0e365b30 fffff880`018d62ab tcpip!TcpMatchReceive+0x1fa
11 fffff880`0e365c80 fffff880`018aef0a tcpip!TcpPreValidatedReceive+0x36b
12 fffff880`0e365d50 fffff880`018af7e4 tcpip!IppLoopbackTransmit+0x11a
13 fffff880`0e365e00 fffff880`018d41e7 tcpip!IppLoopbackEnqueue+0x1f5
14 fffff880`0e365ea0 fffff880`018dade5 tcpip!IpNlpFastSendDatagram+0xb77
15 fffff880`0e366250 fffff880`018ebb0f tcpip!TcpTcbSend+0x495
16 fffff880`0e3664d0 fffff880`018eb177 tcpip!TcpCreateAndConnectTcbRateLimitComplete+0x39f
17 fffff880`0e366620 fffff880`018eb0c3 tcpip!TcpCreateAndConnectTcbInspectConnectComplete+0x47
18 fffff880`0e3666c0 fffff880`018ed37c tcpip!TcpContinueCreateAndConnect+0x373
19 fffff880`0e3667c0 fffff880`018ecd53 tcpip!TcpCreateAndConnectTcbInspectConnectRequestComplete+0x5c
1a fffff880`0e3667f0 fffff880`018ed296 tcpip!TcpCreateAndConnectTcbWorkQueueRoutine+0x273
1b fffff880`0e3668c0 fffff880`0708a50e tcpip!TcpCreateAndConnectTcb+0x4c6
1c fffff880`0e3669b0 fffff800`01b74c16 afd!WskProIRPConnect+0x14e
1d fffff880`0e366a60 fffff880`01740f39 nt!IovCallDriver+0x566
1e fffff880`0e366ac0 fffff880`01739082 mrxsmb!SmbWskInitiateAsynchronousConnect+0x2f9
1f fffff880`0e366c10 fffff880`070068fd mrxsmb!RxCeInitiateConnectRequest+0x52
20 fffff880`0e366c40 fffff880`070011b1 rdbss!RxWorkItemDispatcher+0x59
21 fffff880`0e366c80 fffff800`0196ecce rdbss!RxpWorkerThreadDispatcher+0x1a1
22 fffff880`0e366d40 fffff800`016c2fe6 nt!PspSystemThreadStartup+0x5a
23 fffff880`0e366d80 00000000`00000000 nt!KxStartSystemThread+0x16

Note: On Win2008R2 RTM this STOP 0x44 shows as "mrxsmb!RxCeCompleteConnectRequest+363", instead of "mrxsmb!RxCeCompleteConnectRequest+367". Not all STOP 0x44 errors are caused by this problem.

Stay tuned for updated on this issue.

Ananda Lewis Kate Bosworth Tamala Jones Yamila Diaz Alicia Keys

No comments:

Post a Comment