Exercise 2 - Track User Mode Process Allocations
Heap allocations are made either directly through the Heap APIs (HeapAlloc, HeapReAlloc) or through C/C++ allocations (new, malloc, realloc, calloc), and are serviced using three types of heaps:
Mainline NT Heap – Services allocation requests of sizes less than 64 KB.
Low Fragmentation Heap – Composed of sub-segments that service allocation requests of fixed size blocks.
VirtualAlloc – Services allocation requests of sizes greater than 64 KB.
Large dynamic memory allocations are made directly through the VirtualAlloc API, typically for bitmaps or buffers. You can use VirtualAlloc to reserve a block of pages and then make additional calls to VirtualAlloc to commit individual pages from the reserved block. This enables a process to reserve a range of its virtual address space without consuming physical storage until it is needed.
There are two concepts to understand in this area:
Reserved memory: Reserves an address range for usage but does not acquire memory resources.
Committed memory: Ensures that either physical memory or page file space will be available if the addresses are referenced.
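The difference between the two states can be illustrated with a small bookkeeping model. The following Python sketch is not the Windows API: the `ReservedRegion` class, its method names, and the 4 KB page size are hypothetical, and the model only mimics the idea that reserving an address range costs no storage until individual pages are committed.

```python
PAGE_SIZE = 4096  # assumed page size for this illustration


class ReservedRegion:
    """Toy model of a reserved address range (not the real VirtualAlloc API)."""

    def __init__(self, page_count):
        # Reserving only records the size of the range; no storage is charged.
        self.page_count = page_count
        self.committed_pages = set()

    def commit(self, first_page, count=1):
        # Committing charges physical memory or page file space per page.
        if first_page < 0 or first_page + count > self.page_count:
            raise ValueError("commit outside the reserved range")
        self.committed_pages.update(range(first_page, first_page + count))

    def decommit(self, first_page, count=1):
        # Decommitting gives the storage back but keeps the range reserved.
        self.committed_pages.difference_update(range(first_page, first_page + count))

    @property
    def reserved_bytes(self):
        return self.page_count * PAGE_SIZE

    @property
    def committed_bytes(self):
        return len(self.committed_pages) * PAGE_SIZE


region = ReservedRegion(page_count=256)  # reserve 1 MB of address space
region.commit(first_page=0, count=4)     # commit only the first 16 KB
print(region.reserved_bytes, region.committed_bytes)  # → 1048576 16384
```

In this model, the reserved size stays constant while the committed size follows the commit and decommit calls, which is the quantity the VirtualAlloc analysis later in this exercise tracks.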
In this exercise, you will learn how to gather traces to investigate how a user mode process allocates memory.
The exercise focuses on a dummy test process called MemoryTestApp.exe that allocates memory through:
The VirtualAlloc API to commit large memory buffers.
The C++ new operator to instantiate small objects.
You can download MemoryTestApp.exe from here.
Step 1: Gather a VirtualAlloc/heap trace using WPR
Large memory allocations are usually the ones that impact the footprint of a process and are serviced by the VirtualAlloc API. This is where all investigations should begin, but a process can also misbehave with smaller allocations (for example, memory leaks through the C++ new operator). Heap tracing becomes useful in that situation.
Step 1.1: Prepare the system for heap tracing
Heap tracing should be considered optional and done when VirtualAlloc analysis does not provide any relevant explanation for a memory usage issue. Heap tracing tends to produce larger traces, and it is recommended to enable tracing only for the individual processes that you’re investigating.
Add the registry key for the process of interest (MemoryTestApp.exe in this case); heap tracing is then enabled each time that process is subsequently created.
reg add "HKLM\Software\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\MemoryTestApp.exe" /v TracingFlags /t REG_DWORD /d 1 /f
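When several processes need to be investigated, the command can be generated per image name. The sketch below only builds the argument lists for reg.exe; the helper name `heap_tracing_command` is hypothetical, and the delete variant mirrors the cleanup command used at the end of this exercise.

```python
IFEO_KEY = r"HKLM\Software\Microsoft\Windows NT\CurrentVersion\Image File Execution Options"


def heap_tracing_command(image_name, enable=True):
    """Return the reg.exe argument list that toggles heap tracing for one image."""
    key = IFEO_KEY + "\\" + image_name
    if enable:
        # Same value as the command above: TracingFlags = 1 (REG_DWORD).
        return ["reg", "add", key, "/v", "TracingFlags", "/t", "REG_DWORD", "/d", "1", "/f"]
    # Removing the value disables heap tracing for that image again.
    return ["reg", "delete", key, "/v", "TracingFlags", "/f"]


print(heap_tracing_command("MemoryTestApp.exe"))
print(heap_tracing_command("MemoryTestApp.exe", enable=False))
```

On the test system, each list could be passed to `subprocess.run(..., check=True)` from an elevated prompt.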
Step 1.2: Capture a trace using WPR
In this step, you’ll gather a trace using WPR that contains VirtualAlloc and Heap data.
Open WPR and modify the tracing configuration.
Select the VirtualAlloc and Heap providers.
Select general as the performance scenario.
Select general as the logging mode.
Click Start to start tracing.
Launch MemoryTestApp.exe, and wait for the process to terminate (it should take around 30 seconds).
Return to WPR, save the trace, and open it with Windows Performance Analyzer (WPA).
Open the Trace menu and select Configure symbols path.
- Specify the path of the symbol cache. For more information on symbols, see the Symbol Support page on MSDN.
Open the Trace menu and select Load symbols.
You now have a trace that contains all memory allocation patterns for the MemoryTestApp.exe process during its lifetime.
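The capture can also be scripted instead of driven through the WPR user interface. The sketch below assumes that the WPR command-line tool (wpr.exe) is on the path and that its built-in VirtualAllocation and Heap profiles correspond to the providers selected above; the helper name and the trace file name are hypothetical.

```python
def wpr_commands(trace_path="MemoryTestApp.etl"):
    """Build the wpr.exe invocations for a VirtualAlloc + heap capture."""
    # -filemode logs to a file rather than to an in-memory circular buffer.
    start = ["wpr", "-start", "VirtualAllocation", "-start", "Heap", "-filemode"]
    # -stop ends the recording and writes the trace to the given path.
    stop = ["wpr", "-stop", trace_path]
    return start, stop


start_cmd, stop_cmd = wpr_commands()
print(" ".join(start_cmd))
print(" ".join(stop_cmd))
```

The start command would run before launching MemoryTestApp.exe and the stop command after the process exits, both from an elevated prompt.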
Step 2: Review VirtualAlloc dynamic allocations
The detailed VirtualAlloc data is exposed via the ‘VirtualAlloc Commit Lifetimes’ graph in WPA. The key columns of interest are the following:
Column | Description |
---|---|
Process | The name of the process that performs memory allocations through VirtualAlloc. |
Commit Stack | The call stack that shows the code path leading to memory being allocated. |
Commit Time | The timestamp of when memory was allocated. |
Decommit Time | The timestamp of when memory was freed. |
Impacting Size | The size of outstanding allocations or the size difference between the start and end of the selected time interval. This size adjusts based on the selected view port. The Impacting Size value will be zero if all memory allocated by a process is freed by the end of the visualized interval in WPA. |
Size | The cumulative sum of all allocations during the selected time interval. |
Follow these steps to analyze MemoryTestApp.exe:
Find the VirtualAlloc Commit Lifetimes graph in the Memory category of the Graph Explorer.
Drag and drop the VirtualAlloc Commit Lifetimes onto the Analysis tab.
Organize the table to show these columns. Right-click on the column headers to add or remove columns.
Process
Impacting Type
Commit Stack
Commit Time and Decommit Time
Count
Impacting Size and Size
Find MemoryTestApp.exe in the process list.
Apply a filter to keep only MemoryTestApp.exe on the graph.
- Right-click, and select Filter to Selection.
Your analysis viewport should look similar to this:
In the preceding example, two values are of interest:
Size of 126 MB: This indicates that MemoryTestApp.exe allocated a total of 126 MB over the course of its lifespan. It represents the cumulative sum of all VirtualAlloc API calls made by the process and its dependencies.
Impacting Size of 0 MB: This indicates that all of the memory allocated by the process is freed by the end of the time interval being currently analyzed. The system didn’t suffer from an increase of its steady state memory usage.
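The relationship between these two columns can be reproduced with a few lines of arithmetic. The following Python sketch is a simplified model, not WPA's implementation: it only considers allocations committed inside the viewport, and the event list is hypothetical rather than taken from a real trace.

```python
from typing import NamedTuple, Optional


class Commit(NamedTuple):
    commit_time: float              # seconds from trace start
    decommit_time: Optional[float]  # None if never freed during the trace
    size_mb: int


def size_and_impacting_size(commits, start, end):
    """Return (Size, Impacting Size) in MB for the viewport [start, end]."""
    in_view = [c for c in commits if start <= c.commit_time <= end]
    # Size: cumulative sum of everything allocated in the viewport.
    size = sum(c.size_mb for c in in_view)
    # Impacting Size: what is still allocated when the viewport ends.
    impacting = sum(
        c.size_mb for c in in_view
        if c.decommit_time is None or c.decommit_time > end
    )
    return size, impacting


# Hypothetical events loosely modeled on the exercise.
commits = [
    Commit(1.0, 29.0, 10),   # baseline buffer, freed before exit
    Commit(15.0, 29.0, 10),  # buffer added halfway through
    Commit(5.0, 5.1, 10),    # short-lived buffer
    Commit(20.0, 20.1, 10),  # short-lived buffer
]

print(size_and_impacting_size(commits, 0.0, 30.0))   # → (40, 0)
print(size_and_impacting_size(commits, 14.0, 16.0))  # → (10, 10)
```

Over the whole trace everything is freed, so the Impacting Size is zero even though 40 MB were allocated in total. Narrowing the viewport to the two seconds around the second buffer leaves that buffer outstanding at the end of the interval, so it shows up as a 10 MB Impacting Size.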
Step 2.1: Analyze steady state memory usage
When investigating memory allocation, you should try to answer the question: “Why is the steady state memory usage growing for this scenario?” In the MemoryTestApp.exe example, you can see that it has about 10 MB of steady state memory allocated at the beginning, and then it increases to 20 MB halfway through.
To investigate this behavior, narrow the zoom to around the time interval when the sudden increase occurs in the middle of the trace.
Your viewport should look like this.
As you can see, the Impacting Size is now 10 MB. This means that, between the start and the end of the time interval being analyzed, there’s a 10 MB increase in steady state memory usage.
Sort by Impacting Size by clicking on the column header.
Expand the MemoryTestApp.exe row (in the Process column).
Expand the Impacting row (in the Impacting Type column).
Navigate through the process Commit Stack until you find the function that allocated 10 MB of memory.
In this example, the Main function of MemoryTestApp.exe allocates 10 MB of memory in the middle of the workload by directly calling VirtualAlloc. In the real world, the application developer should determine if the allocation is reasonable or if the code could be rearranged to minimize the steady state memory usage increase.
You can now unzoom the viewport in WPA.
Step 2.2: Analyze transient (or peak) memory usage
When investigating memory allocations, you should try to answer the question: “Why is there a transient peak in the memory usage for this part of the scenario?” Transient allocations cause spikes in memory usage, and can lead to fragmentation and push valuable content out of the system Standby cache when there’s memory pressure.
In the MemoryTestApp.exe example, you can see that there are 10 different spikes of memory usage (of 10 MB each) evenly scattered across the trace.
Narrow the zoom to the last four spikes, to focus on a smaller region of interest and reduce noise from non-relevant behaviors.
Your viewport should look like this:
Sort by Size by clicking on the column header.
Expand the MemoryTestApp.exe row (in the Process column).
Click on the Transient row (in the Impacting Type column).
- This should highlight in blue all the spikes of memory usage in the viewport.
Note the value of the different columns:
Count = 4: This indicates that four transient memory allocations were made during that time interval.
Impacting Size = 0 MB: This indicates that all four transient memory allocations were freed by the end of the time interval.
Size = 40 MB: This indicates that the sum of all four transient memory allocations amounts to 40 MB of memory.
Navigate through the process Commit Stack until you find the functions that allocated 40 MB of memory.
In this example, the Main function of MemoryTestApp.exe calls a function named Operation1, which in turn calls a function named ManipulateTemporaryBuffer. This ManipulateTemporaryBuffer function then directly calls VirtualAlloc four times, creating and freeing a 10 MB memory buffer every time. The buffers only last 100 ms each. The buffers' allocation and free times are represented by the Commit Time and Decommit Time columns.
In the real world, the application developer would determine if those short-lived transient temporary buffer allocations are necessary, or if they can be replaced by using a permanent memory buffer for the operation.
You can now unzoom the viewport in WPA.
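The Count, Size, and Impacting Size values noted for the Transient row can be reproduced by grouping allocations according to whether they are freed inside the viewport. This Python sketch uses a simplified two-way classification (Transient or Impacting) and four hypothetical 10 MB buffers that each live for 100 ms; it is not how WPA computes the rows internally.

```python
from collections import defaultdict


def summarize_by_impacting_type(commits, start, end):
    """Group allocations committed in [start, end] into Transient and Impacting rows."""
    rows = defaultdict(lambda: {"count": 0, "size_mb": 0, "impacting_mb": 0})
    for commit_time, decommit_time, size_mb in commits:
        if not (start <= commit_time <= end):
            continue
        # Transient: committed and freed again before the viewport ends.
        freed_in_view = decommit_time is not None and decommit_time <= end
        row = rows["Transient" if freed_in_view else "Impacting"]
        row["count"] += 1
        row["size_mb"] += size_mb
        if not freed_in_view:
            row["impacting_mb"] += size_mb
    return dict(rows)


# Four hypothetical 10 MB buffers, each committed for 100 ms.
spikes = [(t, t + 0.1, 10) for t in (18.0, 21.0, 24.0, 27.0)]
summary = summarize_by_impacting_type(spikes, 17.0, 28.0)
print(summary)
```

With this input, the only row is Transient, with a count of 4, a size of 40 MB, and an impacting size of 0 MB.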
Step 3: Review heap dynamic allocations
So far, the analysis has only focused on large memory allocations that are serviced by the VirtualAlloc API. The next step is to determine if there are issues with other small allocations made by the process, using the Heap data initially gathered.
The detailed Heap data is exposed via the “Heap Allocations” graph in WPA. The key columns of interest are the following:
Column | Description |
---|---|
Process | The name of the process that is performing memory allocation. |
Handle | The identifier of the Heap that is used to service the allocation. Heaps can be created, so there could be multiple heap handles for the process. |
Stack | The call stack that shows the code path that leads to memory being allocated. |
Alloc Time | The timestamp of when memory was allocated. |
Impacting Size | The size of outstanding allocations or the difference between the start and end of the selected viewport. This size adjusts based on the selected time interval. |
Size | The cumulative sum of all allocations/deallocations. |
Follow these steps to analyze MemoryTestApp.exe:
Find the Heap Allocations graph in the Memory category of the Graph Explorer.
Drag and drop the Heap Allocations onto the Analysis tab.
Organize the table to show these columns:
Process
Handle
Impacting Type
Stack
Alloc Time
Count
Impacting Size and Size
Find MemoryTestApp.exe in the process list.
Apply a filter to keep only MemoryTestApp.exe on the graph.
- Right-click and select Filter to Selection.
Your viewport should look like this:
In this example, you can see that one of the heaps is steadily increasing in size over time at a constant rate. There are 1200 memory allocations on that heap, accounting for 130 KB of used memory by the end of the interval.
Zoom in on a smaller interval (for example, 10 seconds) in the middle of the trace.
Expand the heap Handle that shows the largest amount of allocations (as shown in the Impacting Size column).
Expand the Impacting type.
Navigate through the process Stack until you find the function that is responsible for allocating all this memory.
In this example, the Main function of MemoryTestApp.exe calls a function named InnerLoopOperation. This InnerLoopOperation function then allocates 40 bytes of memory 319 times through the C++ new operator. This memory remains allocated until the process is terminated.
In the real world, the application developer should then determine if this behavior implies a possible memory leak and fix the issue.
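The aggregation that points to InnerLoopOperation can be sketched as a sum of outstanding bytes per call stack. The event format below (stack, address, size, operation) and the second stack name are hypothetical; only the 319 allocations of 40 bytes come from the example above.

```python
from collections import Counter


def outstanding_bytes_by_stack(events):
    """Sum bytes still allocated per call stack from (stack, address, size, op) events."""
    live = {}  # address -> (stack, size) for allocations not yet freed
    for stack, address, size, op in events:
        if op == "alloc":
            live[address] = (stack, size)
        elif op == "free":
            live.pop(address, None)
    totals = Counter()
    for stack, size in live.values():
        totals[stack] += size
    return totals


# 319 allocations of 40 bytes that are never freed, as in the example.
events = [
    ("Main > InnerLoopOperation", 0x1000 + 64 * i, 40, "alloc")
    for i in range(319)
]
# A hypothetical well-behaved allocation that is freed again.
events += [
    ("Main > Helper", 0x900000, 256, "alloc"),
    ("Main > Helper", 0x900000, 0, "free"),
]

totals = outstanding_bytes_by_stack(events)
print(totals.most_common(1))  # → [('Main > InnerLoopOperation', 12760)]
```

The 12,760 outstanding bytes are simply 319 × 40. A stack whose outstanding total keeps growing across successive viewports is the kind of pattern that suggests a leak.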
Step 4: Clean up the test system
Once the analysis is complete, clean up the registry to make sure that heap tracing is disabled for the process. Run this command from an elevated command prompt:
reg delete "HKLM\Software\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\MemoryTestApp.exe" /v TracingFlags /f