Jaa


Don't believe everything the debugger is telling you!!! (aka Rootkit)

<sigh>

 

So this issue was incredibly interesting, but you know at the end it just makes you mad. This past week a customer was having a very clear cut, reproduciable issue: When attempting to attach an item to a message via OWA, the Information Store would crash with what appeared to be stack corruption. Boy, I didn’t know where this would take me and everyone else that got drawn into this issue. Below is our debug journey to understand what was happening. The conclusion of this was a collective effort of several Escalation Engineers, so I am not taking full credit for all of this investigation… The information below may make it look easy, but it had us going in an infinite loop for a couple of hours <grin>

First the stack looked like this after the crash (yes I had correct symbols in place):

0:058> kb

ChildEBP RetAddr Args to Child

WARNING: Frame IP not in any known module. Following frames may be wrong.

7c576b4e f8458d51 0875ff50 ffffb7e8 0fc085ff 0x77f87cac

So based upon my past experience with Stack corruption, I looked at raw dump of the stack which showed over and over an exception handler was attempting to be located, which chewed up the remnants of the original stack. Fortunately the customer allowed me to perform a remote debug, so I didn’t have to decipher the mess left on the stack or attempt to find the real .cxr (Exception Record), besides there where errors when this userdump was created related to not being able to read memory… so we really couldn’t trust everything in the userdump. In turns out there was a valid reason for this error...

So after manually walking through the functions that are called, I eventually found the crash was occurring in this code path:

ChildEBP RetAddr
25b0cf10 001f3cae ESE!ErrOSSLVFileOpen+0x61
25b0d68c 001edb85 ESE!ErrSLVSetColumnFromFile+0x5b
25b0dc84 001c4bc8 ESE!ErrSLVSetColumn+0x4d9
25b0dcc0 001c49a6 ESE!ErrIsamSetColumn+0x1ec
25b0dcd8 001c48f1 ESE!ErrESESetColumn+0x6f
25b0dd10 001c4885 ESE!JetSetColumnEx+0x4d
25b0dd54 004041d1 ESE!JetSetColumn+0x4c
25b0dd84 0040f710 store!JetSetColumnFn+0x4a
25b0ddc8 004a78ba store!JTAB_BASE::EcSetColumn+0x1aa
25b0de08 004a50ea store!ATTACH::EcFlushFileHandleProps+0x94
25b0de9c 004a45c9 store!ATTACH::EcFlushProps+0xac7
25b0df18 004a42d6 store!ATTACH::EcSaveChanges2+0x4a1
25b0df30 004a429a store!EcSaveOneAttach+0x19
25b0df58 213bb966 store!EcSaveChangesAttachOp2+0x81
25b0df6c 213bb7c2 EXOLEDB!CStoreLogon::ScSaveChangesAttach+0x42
25b0e030 213bb43d EXOLEDB!CWMMultiPartFormStream::ScCommitAttachment+0x1c6
25b0e06c 213c0233 EXOLEDB!CWMMultiPartFormStream::FWrite+0x9b
25b0efe0 2138ddb0 EXOLEDB!CRequestStream::ScProcessContinuation+0x8e
25b0f014 2138d9c9 EXOLEDB!ScDispatchRequest+0x1e1
25b0fec8 2138d88b EXOLEDB!CDavServer::ProcessItem+0x94

So in this code path we are taking in the information about the file and creating a file handle by calling NTCreateFile(). This is the call that causes the crash to occur. Simply stepping into this call instruction the store will crash. This means the problem is somewhere in NtCreateFile.

0:014>r
eax=20b9cb90 ebx=80000000 ecx=20b9cbc0 edx=20b9cba8 esi=00000000 edi=0000004b
eip=001f429f esp=20b9cb54 ebp=7c576b4e iopl=0 nv up ei pl zr na po cy
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000247
ESE!ErrOSSLVFileOpen+0x3b7:
001f429f ff159c253200 call dword ptr [ESE!pfnNtCreateFile (0032259c)]{ntdll!ZwCreateFile (77f87cac)} ds:0023:0032259c=77

OK, so I dumped the parameters about to be passed to NTCreateFile (not shown) and nothing looked out of the ordinary. Next was to unassemble NtCreateFile and see what is about to happen:

0:014> u eip
ntdll!ZwCreateFile:
77f87cac b820000000 mov eax,0x20
77f87cb1 8d542404 lea edx,[esp+0x4]
77f87cb5 cd2e int 2e
77f87cb7 c22c00 ret 0x2c

Still nothing out of the ordinary in comparison to a working system, so stepping through NTCreateFile actually gets me into ZwCreateFile (the backend function that is really called). Here are the register states as we sit poised at the first instruction of ZwCreateFile.

0:014> t
eax=20b9cb90 ebx=80000000 ecx=20b9cbc0 edx=20b9cba8 esi=00000000 edi=0000004b
eip=77f87cac esp=20b9cb50 ebp=7c576b4e iopl=0 nv up ei pl zr na po cy
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000247
ntdll!ZwCreateFile:
77f87cac b820000000 mov eax,0x20

So I executed the first assembly instruction of ZwCreateFile, but $#%* what happened? EIP didn’t advance to the next instruction as expected, there is a weird value for EIP now, and the values for all the other registers where exactly the same. We didn’t skip ahead and execute the int 2e instruction to cause a kernel mode transition so what the heck happened?

0:014> p
eax=20b9cb90 ebx=80000000 ecx=20b9cbc0 edx=20b9cba8 esi=00000000 edi=0000004b
eip=7ffa488d esp=20b9cb50 ebp=7c576b4e iopl=0 nv up ei pl zr na po cy
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000247
7ffa488d ?? ???

So what instruction(s) are these?

0:014> u eip
7ffa488d ?? ???
^ Memory access error in 'u eip'

Why couldn’t we read this memory? This is a live debug, it should be readable regardless if it is mapped in the process or not. It turns out this is the reason why the userdump could not be created successfully. So I reset the repro and stepped through ESE!ErrOSSLVFileOpen again to ensure we didn’t miss anything… and of course we found nothing. At this point the impossible was happening... a benign instruction was blowing up. So its time to get a different perspective on this, so I asked a few people to give me their thoughts... well, a few hours later there was 8 people huddled over my desk all scratching their heads.

Hmmm… time to breakout the kernel debugger J

OK, so once again we walked to the point in assembly of the instruction that failed and had the customer generate a kernel dump.

So now looking at the kernel dump lets first find our process (note output is truncated for brevity):

: kd> !process 0 0

**** NT ACTIVE PROCESS DUMP ****

PROCESS ff629400  SessionId: 0 Cid: 0f54 Peb: 7ffdf000 ParentCid: 0130

    DirBase: 041eb000 ObjectTable: 81432428 TableSize: 3056.

    Image: store.exe

PROCESS fd43e1e0 SessionId: 0 Cid: 0f5c Peb: 7ffdf000 ParentCid: 0df0

    DirBase: 046c7000 ObjectTable: 811a54c8 TableSize: 265.

    Image: cdb.exe

Set the context to the store.exe process:

0: kd> !process ff629400 1

PROCESS ff629400 SessionId: 0 Cid: 0f54 Peb: 7ffdf000 ParentCid: 0130

    DirBase: 041eb000 ObjectTable: 81432428 TableSize: 3056.

    Image: store.exe

    VadRoot fd434fe8 Clone 0 Private 6435. Modified 445462. Locked 67.

    DeviceMap 8208c128

    Token e2c4a470

    ElapsedTime 0:14:19.0750

    UserTime 0:00:07.0809

    KernelTime 0:00:04.0998

    QuotaPoolUsage[PagedPool] 325628

    QuotaPoolUsage[NonPagedPool] 207428

    Working Set Sizes (now,min,max) (58595, 50, 345) (234380KB, 200KB, 1380KB)

    PeakWorkingSetSize 61146

    VirtualSize 560 Mb

    PeakVirtualSize 561 Mb

    PageFaultCount 340776

    MemoryPriority BACKGROUND

    BasePriority 8

    CommitCharge 9763

    DebugPort e2602040

Now lets go and see the state of the memory range that could not be read from usermode (truncated output):

0: kd> !vad fd434fe8

VAD level start end commit

<snip>

fb8882e8 ( 7) 7ffa0 7ffa4 5 Private EXECUTE_READWRITE

So can we dump this memory? You bet…

0: kd> dd 7ffa0000

7ffa0000 000000e8 b62d5800 c300405d 3d2d2e5f

7ffa0010 6361485b 2072656b 65666544 7265646e

7ffa0020 2e2d3d5d 0000005f 00000000 00000800

7ffa0030 72656b00 336c656e 6c642e32 6553006c

7ffa0040 73614c74 72724574 4300726f 74616572

7ffa0050 69614d65 6f6c736c 47004174 614d7465

7ffa0060 6c736c69 6e49746f 57006f66 65746972

7ffa0070 656c6946 61655200 6c694664 6c430065

What is the state of this entry in the Page Tables?

0: kd> !pte 7ffa0000

7FFA0000 - PDE at C03007FC PTE at C01FFE80

          contains 0D634867 contains 035BC8C6

        pfn d634 --DA--UWV not valid

                               Transition: 35bc

                               Protect: 6

0: kd> !pte 7ffd4000

7FFD4000 - PDE at C03007FC PTE at C01FFF50

          contains 0D634867 contains 028070C2

        pfn d634 --DA--UWV not valid

                               PageFile 1

                               Offset 2807

                               Protect: 6

OK, so the value of ‘Protect’ explains why we can access this memory from usermode, but why? This condition isn’t normal. Hmmm… lets look at the memory again using dc command:

0: kd> dc 7ffa0000

7ffa0000 000000e8 b62d5800 c300405d 3d2d2e5f .....X-.]@.._.-=

7ffa0010 6361485b 2072656b 65666544 7265646e [Hacker Defender

7ffa0020 2e2d3d5d 0000005f 00000000 00000800 ]=-._...........

7ffa0030 72656b00 336c656e 6c642e32 6553006c .kernel32.dll.Se

7ffa0040 73614c74 72724574 4300726f 74616572 tLastError.Creat

7ffa0050 69614d65 6f6c736c 47004174 614d7465 eMailslotA.GetMa

7ffa0060 6c736c69 6e49746f 57006f66 65746972 ilslotInfo.Write

7ffa0070 656c6946 61655200 6c694664 6c430065 File.ReadFile.Cl

0: kd> dc

7ffa0080 4865736f 6c646e61 65470065 766e4574 oseHandle.GetEnv

7ffa0090 6e6f7269 746e656d 69726156 656c6261 ironmentVariable

7ffa00a0 65470057 646f4d74 46656c75 4e656c69 W.GetModuleFileN

7ffa00b0 41656d61 70754400 6163696c 61486574 ameA.DuplicateHa

7ffa00c0 656c646e 65724300 50657461 65636f72 ndle.CreateProce

7ffa00d0 00417373 74697845 65726854 43006461 ssA.ExitThread.C

7ffa00e0 74616572 72685465 00646165 61657243 reateThread.Crea

7ffa00f0 69506574 50006570 4e6b6565 64656d61 tePipe.PeekNamed

<lights and sirens go off>

Hacker Defender??? What the heck is that? (I’ll let you go dig around the web for that thing and remember to read the disclaimer on my blog before you do.)

So this is the thread waiting in the usermode debugger at the time the kernel dump was taken:

       THREAD 8477f020 Cid f54.fd8 Teb: 7fee1000 Win32Thread: a217aea8 WAIT: (WrLpcReply) KernelMode Non-Alertable
SuspendCount 1
8477f208 Semaphore Limit 0x1
Waiting for reply to LPC MessageId 00034466:
Pending LPC Reply Message:
e2602041: [40000000,01000000]
Not impersonating
Owning Process ff629400
Wait Start TickCount 244909 Elapsed Ticks: 1690
Context Switch Count 1200 LargeStack
UserTime 0:00:00.0004
KernelTime 0:00:00.0006
Start Address KERNEL32!BaseThreadStartThunk (0x7c574333)
Win32 Start Address MDBTASK!MdbTaskPoolThread (0x61bd13e2)
Stack Init bd412000 Current bd41169c Base bd412000 Limit bd40f000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 0 DecrementCount 0

 

        ChildEBP RetAddr Args to Child
bd4116b4 8042c413 8477f020 8477f1d8 8477f208 nt!KiSwapThread+0x1b1
bd4116dc 80434085 8477f208 00000011 00000000 nt!KeWaitForSingleObject+0x1a3
bd41170c 804c8113 e2602040 e2602040 bd411740 nt!LpcpRequestWaitReplyPort+0x52f
bd411720 80526c43 e2602040 bd4118c0 bd411740 nt!LpcRequestWaitReplyPort+0x13
bd4118a0 80526cf0 bd4118c0 e2602040 00000001 nt!DbgkpSendApiMessage+0x43
bd411938 8042f24a bd411d10 00000001 00000000 nt!DbgkForwardException+0x78
bd411cf4 80467735 bd411d10 00000000 bd411d64 nt!KiDispatchException+0x172
bd411d5c 804676d0 00000000 00000000 00000000 nt!CommonDispatchException+0x4d
bd411d64 00000000 00000000 00000000 00000000 nt!KiUnexpectedInterruptTail+0x1e1

 

bd411cf4 80467735 nt!KiDispatchException(
struct _EXCEPTION_RECORD * ExceptionRecord = 0xbd411d10,
struct _KTRAP_FRAME * ExceptionFrame = 0x00000000,
struct _KTRAP_FRAME * TrapFrame = 0xbd411d64,
char PreviousMode = 1 '',
unsigned char FirstChance = 0x01 '')+0x172 

We can see this thread is hooked:

 

0: kd> .trap bd411d64
ErrCode = 00000000
eax=0e84cb90 ebx=80000000 ecx=0e84cbc0 edx=0e84cba8 esi=00000000 edi=00000716
eip=77f87cac esp=0e84cb50 ebp=7c576b4e iopl=0 nv up ei pl zr na po cy
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000247
ntdll!ZwCreateFile:
001b:77f87cac e9dccb0108 jmp 7ffa488d

 

Well, this isn’t the same code that usermode was showing us… recall this from the usermode debug at exactly the same address:

0:014> t
eax=20b9cb90 ebx=80000000 ecx=20b9cbc0 edx=20b9cba8 esi=00000000 edi=0000004b
eip=77f87cac esp=20b9cb50 ebp=7c576b4e iopl=0 nv up ei pl zr na po cy
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000247
ntdll!ZwCreateFile:
77f87cac b820000000 mov eax,0x20

So, this means Hacker Defender was also intercepting calls made by the debugger during the live debug to read process memory and showing what would be expected at that address, not what was really happening… In kernel mode we can clearly see the instruction at 7ffa488d that attempting to be executed.. This address is in the range of 7ffa0000-7ffa4000 that we determined was unreadable from usermode.

The jmp that actually occurred placed the thread’s instruction pointer (EIP) pointing into the region of memory that could not be accessed from usermode prior to transitioning to kernel mode and this is why the information store was crashing. So it appears this rootkit has a bug …

TOO FUNNY!!!!

Comments

  • Anonymous
    July 19, 2004
    Doesn't seem so funny to me? How is a typical user EVER going to figure this out?

  • Anonymous
    July 19, 2004
    I was referring to the fact the Rootkit had a bug. That is what I thought was ironic and humorous. Of course, if it wasn't for the bug, the customer would never know it was there. I do NOT consider Rootkits funny for customers...They are serious matter that Microsoft is devoting a lot of resources to negating them.

  • Anonymous
    July 20, 2004
    The comment has been removed

  • Anonymous
    July 21, 2004
    David,
    The memory address is random and wouldn't have a direct correlation to this problem. Are you able to get a user.dmp file when the store crashes? If so, upload it to ftp://ftppss.microsoft.com/incoming/mail in a zip file with your contact information and I'll take a look at it.

    Out of curosity, is there antivirus in the picture? Does REMOVING, not disabling it, correct the problem?

  • Anonymous
    July 22, 2004
    No userdump that we can find, he swears it is exiting silently, but I have a tough time believing that the store can still exit without generating something of an error.

    No AV disabling doesn't fix it, but I don't think he's tried removing it. I'll suggest it though...

    Thanks...

    David

  • Anonymous
    July 22, 2004
    Also, it's not E2K3, it's E2K, with sp3 and the post SP3 rollup.

    David

  • Anonymous
    August 11, 2004
    Rootkits are being used in increasing and alarming numbers. I worked with Jeremy briefly on this case and it is one of hundreds of cases my team has worked on in the last 2 years involving Hacker Defender which is the 'de facto' standard in rootkit technology being used by the miscreants to hide stuff from even the most clueful of admins.

    Jeremy is a seriously hard-core debugging ninja here in PSS and as you can see he was able to figure out what was going on but this was a learning exercise for him . . . my point is, this isn't just a problem for our customers who don't know about this stuff, it's a problem for Microsoft PSS . . . we have thousands of engineers across many regions and most of them have never heard of a rootkit and even if they have, they probably have NO clue how widespread they are and on how many systems they are getting installed on every day.

    We are doing our best to increase awareness of rootkits (I gave an internal presentation to our entire site recently) but its slow going.

    Rest assured though - if you think you've been hacked and you tell your support professional that - they SHOULD know how to get your case to the PSS Security team and we can help you determine whether you've been hacked or not through the use of our automated IR toolkit you can run on a live system. Obviously our IR toolkit wouldn't be worth very much if it didn't have some very sophisticated tools to be able to detect rootkits . . . so it does. :)

  • Anonymous
    April 21, 2006
    It could be overheating or malware.

  • Anonymous
    April 21, 2006
    PingBack from http://blog.3deurope.com/index.php/2006/04/21/ninja-debugging/

  • Anonymous
    November 26, 2007
    PingBack from http://feeds.maxblog.eu/item_1201475.html

  • Anonymous
    June 08, 2009
    PingBack from http://quickdietsite.info/story.php?id=1915