Your First Phoenix Program: A Walkthrough of StaticGlobalDump

In this introduction to building tools with Phoenix, I'm going to start with one of the simplest tools possible: one that dumps the global and static variables in an image. As it turns out, customers have actually asked me for a tool that does this. You can already do this with existing tools, but the nice thing here is that a single tool now covers both managed and unmanaged code.

 

I call the tool StaticGlobalDump. Let's walk through the code. The full source is given at the end of this blog entry, and it works with the November Phoenix RDK, which is available for download.

 

In terms of requirements, it must be simple to use. It simply takes as input a PE file (either a DLL or EXE) and writes out to stdout. No fancy processing, just a straightforward use of Phoenix.

 

The RDK exposes a purely managed API. This allows us to use C#, C++/CLI, VB, or any other .NET language. I'll do most of my code examples in this blog either in C# or C++.

Things Covered in this Article

· Initializing Phoenix for a simple PE read scenario.

· Reading in a PE file.

· Loading the module symbol table from the PE file.

· Walking the symbol table.

· Finding basic type information associated with a symbol.

 

The Main Function

Let’s start with a look at Main(), which is given below. The calls into StaticGlobalDump (InitializeTargets and PrintStaticGlobals) have user-written code behind them, whereas the remaining calls go directly into framework code (either the CRT, STL, CLR, or Phoenix).

 

The first thing we do (code point 1) is initialize the Phoenix targets. We'll look at this function in more depth right after Main, but in summary this call allows Phoenix to read and write x86 and MSIL binaries.

 

Code point 2 runs once the targets are initialized. Phx.Init is a static class used to initialize Phoenix. Its BeginInit method initializes key parts of Phoenix such as memory management, the global unit, the global unit's symbol table, the alias package, and so on.

 

EndInit lets Phoenix know that initialization is done. In this particular example there was no good reason to have separate BeginInit and EndInit calls, as I did no work in between, but there are cases where you will. I'll talk about those kinds of scenarios in a future post.

 

Code point 3: We open the file that was passed as the command-line argument to the program. This is done with a call to Phx.PEModuleUnit.Open, which creates a new object of type PEModuleUnit. The PEModuleUnit holds the representation of the PE file in an object. At this point the instructions have not been raised into Phoenix Low-level Intermediate Representation (LIR). For this tool, all we care about are the names of variables, not instructions, so we will not raise them at all. But that too will come in the future.

 

Code point 4: We instruct the PEModuleUnit to load the global symbols for the PE file. This reads the types and symbols into the PEModuleUnit, drawing on both the PDB and the metadata (if it's a managed image).

 

Code point 5: We call PrintStaticGlobals. This is the function that I've written which will take the symbol table for the module and print all of the static and global variables defined for this module.

 

So from the 50,000-foot view, that's all there is to it. We'll dive into the user-written functions now, but as you can tell, this is conceptually straightforward.

 

public static int Main(String[] argv) {
    if (argv.Length != 1) {
        Phx.Output.WriteLine(
            "Usage: StaticGlobalDump <input-image-name>\n");
        return 1;
    }

    // 1
    StaticGlobalDump.InitializeTargets();

    // 2
    Phx.Init.BeginInit();
    Phx.Init.EndInit("PHX|_PHX_", argv);

    // 3
    Phx.PEModuleUnit module = Phx.PEModuleUnit.Open(argv[0]);

    // 4
    module.LoadGlobalSyms();

    // 5
    StaticGlobalDump.PrintStaticGlobals(module.SymTable);

    return 0;
}

 

The InitializeTargets Function

InitializeTargets initializes four objects, broken into two categories. The two categories are Arch and Runtime. Arch specifies the processor architecture that we are initializing for Phoenix to operate on. In this case we have picked two architectures: MSIL and x86. We treat MSIL as an architecture, as it is a completely different instruction set architecture.

 

The next thing we configure is the runtime that Phoenix targets. Phoenix can target either the x86 runtime or the MSIL runtime. The difference between Arch and Runtime is that Arch focuses on characteristics of the ISA, i.e., differences in opcodes, registers, condition codes, and so on. The Runtime component focuses on runtime differences between the architectures, most notably exception handling.

This is largely boilerplate code that you'll simply cut and paste for a good number of applications of this sort. In fact that’s precisely what I did for this example (it’s from a sample in the RDK).

 

static void InitializeTargets() {
    Phx.Targets.Archs.Arch msilArch =
        Phx.Targets.Archs.MSIL.Arch.New();
    Phx.GlobalData.RegisterTargetArch(msilArch);

    Phx.Targets.Archs.Arch x86Arch =
        Phx.Targets.Archs.X86.Arch.New();
    Phx.GlobalData.RegisterTargetArch(x86Arch);

    Phx.Targets.Runtimes.Runtime msilRuntime =
        Phx.Targets.Runtimes.VCCRT.Win32.MSIL.Runtime.New(msilArch);
    Phx.GlobalData.RegisterTargetRuntime(msilRuntime);

    Phx.Targets.Runtimes.Runtime x86Runtime =
        Phx.Targets.Runtimes.VCCRT.Win32.X86.Runtime.New(x86Arch);
    Phx.GlobalData.RegisterTargetRuntime(x86Runtime);
}

 

The PrintStaticGlobals Function

PrintStaticGlobals is where the real action happens. Understanding this part is probably the most important part of this entry. This function takes a symbol table for the PEModuleUnit as its sole argument, and from that is able to print out the globals and statics. This is a user-leaf function in that it doesn't call any other code that is written by the user (although it calls some BCL and Phoenix routines).

Code point 6: This is where we create size and initialize it to 0. size is the variable that holds the number of globals and statics we’ve encountered thus far. We will use this variable to dump the total number encountered at the end of the function.

 

Code point 7: This is where we iterate over all of the symbols in the table. Later I’ll go into more detail as to how the symbols package works, but for now what you need to know is that each table has a set of maps, where each map maps from some characteristic to a symbol in the table. A characteristic can be the name, or a GUID, or the RVA of a symbol. In this case we use the LocalId map as it has all the symbols in the table in its map (a map can have a subset of the symbols in the table).
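To make the idea of several maps over one table concrete, here is a minimal sketch in plain C# that does not use Phoenix at all. ToySym, ToyTable and their members are hypothetical stand-ins invented for illustration, so the real Phx.Syms.Table will look different. The sketch only shows the shape of the idea: one map keyed by id that covers every symbol, and a second map keyed by name that covers a subset.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a symbol; not a Phoenix type.
public class ToySym {
    public int LocalId;
    public string Name;   // null for an anonymous symbol

    public ToySym(int localId, string name) {
        LocalId = localId;
        Name = name;
    }
}

// Hypothetical stand-in for a symbol table indexed by two maps.
public class ToyTable {
    // Every symbol has a local id, so this map covers the whole table.
    public readonly Dictionary<int, ToySym> LocalIdMap =
        new Dictionary<int, ToySym>();

    // Only named symbols are entered here, so this map is a subset.
    public readonly Dictionary<string, ToySym> NameMap =
        new Dictionary<string, ToySym>();

    public void Add(ToySym sym) {
        LocalIdMap[sym.LocalId] = sym;
        if (sym.Name != null)
            NameMap[sym.Name] = sym;
    }
}

public static class ToyTableDemo {
    // Builds a table with two named symbols and one anonymous symbol.
    public static ToyTable Build() {
        ToyTable table = new ToyTable();
        table.Add(new ToySym(1, "g_count"));
        table.Add(new ToySym(2, null));
        table.Add(new ToySym(3, "s_name"));
        return table;
    }

    public static void Main() {
        ToyTable table = Build();
        Console.WriteLine("By id: {0}, by name: {1}",
            table.LocalIdMap.Count, table.NameMap.Count);
    }
}
```

Walking the id map visits all three toy symbols, while the name map would miss the anonymous one. That is the reason for picking a map that is known to cover the whole table when the goal is to visit every symbol.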

Code point 8: Here we determine whether the symbol we are looking at is really a global or a static. There are a lot of “Is*” properties on symbols. The ones we care about tell us whether the symbol is a global variable (IsGlobalVarSym) or a static field (IsStaticFieldSym). In native code, static fields are represented as GlobalVarSyms, but in managed code static fields are represented as StaticFieldSyms.

You may notice that we also check that the symbol is not a reference to a symbol. The symbol table lists all definitions of and references to a symbol. In this case we only want to dump definitions, but for other tools dumping references might be what you want. In fact, you could use the symbol references to find out who references the global and static symbols by name, although that only gives you named references and not aliased references.

The last point on this line of code is that we use “!sym.IsRef”. You’re probably wondering why I didn’t use “sym.IsDef” instead. In theory that would work, but the current RDK has an issue where StaticFieldSyms don’t correctly set IsDef to true. It’s a pre-alpha SDK :)

Code point 9: The if-statement checks that the global or static variable name doesn’t begin with “__” as that is reserved for compiler use. If you look at all the global symbols in a typical native application you will see quite a few symbols that begin with “__”. These are either compiler reserved uses or uses in some standard header, but your code should not have global variables with such a name. This program assumes that you haven’t started your globals with “__”.

If it passes that “__” test then it simply writes out the name of the symbol. All symbols have a NameString property, which returns the name of the symbol. We print out the name and increment our size variable.

Code point 10: It would also be handy to print the type of each global and static variable. User-defined global and static variables, of course, have a type, but the PE reader may not be able to deduce types for other global symbols in the symbol table. For this reason we need to check that a given symbol has a type before attempting to generate the string corresponding to that type. Otherwise, calling a method on a type that doesn’t exist would throw a NullReferenceException. If there is no type for this symbol, we simply end the line and move on to the next symbol.

static void PrintStaticGlobals(Phx.Syms.Table symTable)
{
    // 6
    int size = 0;

    // 7
    foreach (Phx.Syms.Sym sym in symTable.LocalIdMap.InternalMap)
    {
        // 8
        if ((sym.IsStaticFieldSym || sym.IsGlobalVarSym)
            && !sym.IsRef)
        {
            // 9
            if (!sym.NameString.StartsWith("__"))
            {
                size++;
                Console.Write("{0}", sym.NameString);

                // 10
                if (sym.Type != null)
                    Console.WriteLine(" [{0}]", sym.Type.ToString());
                else
                    Console.WriteLine();
            }
        }
    }

    Console.WriteLine("Number of Globals: {0}", size);
}
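The filtering at code points 8 through 10 can be tried without the RDK using the sketch below. FakeSym is a hypothetical stand-in, not the Phoenix Sym class: its property names are copied from the ones used above, Type is simplified to a string called TypeName, and the sample symbols and type names are made up.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Phx.Syms.Sym; models only what the filter reads.
public class FakeSym {
    public string NameString;
    public bool IsGlobalVarSym;
    public bool IsStaticFieldSym;
    public bool IsRef;
    public string TypeName;   // null when no type is known

    public FakeSym(string name, bool isGlobal, bool isStaticField,
                   bool isRef, string typeName) {
        NameString = name;
        IsGlobalVarSym = isGlobal;
        IsStaticFieldSym = isStaticField;
        IsRef = isRef;
        TypeName = typeName;
    }
}

public static class FilterSketch {
    // The same three tests as PrintStaticGlobals: kind, definition, name prefix.
    public static bool ShouldReport(FakeSym sym) {
        return (sym.IsStaticFieldSym || sym.IsGlobalVarSym)
            && !sym.IsRef
            && !sym.NameString.StartsWith("__", StringComparison.Ordinal);
    }

    // Formats one output line; the type is appended only when present.
    public static string Describe(FakeSym sym) {
        if (sym.TypeName != null)
            return String.Format("{0} [{1}]", sym.NameString, sym.TypeName);
        return sym.NameString;
    }

    // Returns the lines that would be printed for the given symbols.
    public static List<string> Report(IEnumerable<FakeSym> syms) {
        List<string> lines = new List<string>();
        foreach (FakeSym sym in syms)
            if (ShouldReport(sym))
                lines.Add(Describe(sym));
        return lines;
    }

    // Made-up sample data covering each branch of the filter.
    public static FakeSym[] Samples() {
        return new FakeSym[] {
            new FakeSym("g_total", true, false, false, "i32"),
            new FakeSym("Counter.instances", false, true, false, "i32"),
            new FakeSym("g_total", true, false, true, "i32"),        // a reference
            new FakeSym("__security_cookie", true, false, false, null),
            new FakeSym("g_untyped", true, false, false, null),      // no type
            new FakeSym("main", false, false, false, null)           // not a variable
        };
    }

    public static void Main() {
        List<string> lines = Report(Samples());
        foreach (string line in lines)
            Console.WriteLine(line);
        Console.WriteLine("Number of Globals: {0}", lines.Count);
    }
}
```

Of the six sample symbols, three are reported: the native-style global, the managed-style static field, and the untyped global, which is printed without a bracketed type. The reference, the “__” symbol, and the non-variable symbol are filtered out.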

 

The only thing left is to make sure that you use the correct references for the application. The necessary references are: arch-msil.dll, arch-x86.dll, phx.dll, runtime-vccrt-win32-msil.dll, and runtime-vccrt-win32-x86.dll. One requirement of the RDK is that native code must be compiled with VC2005 using the /Zi switch and linked with the /PROFILE switch.

That’s it, we’re done! The program dumps all globals and statics. Now try it on your favorite managed or native application or DLL.
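If you don't have a managed binary handy, a tiny test input along these lines would do. The class and field names are made up for illustration. Compiled to an EXE, it contains two static fields and one instance field, so one would expect the tool to list the two statics and skip the instance field. The exact output has not been checked against the RDK.

```csharp
using System;

public class SampleInput {
    // Static fields: these are the candidates for StaticGlobalDump's output.
    public static int Total;
    private static string label = "total";

    // Instance field: stored per object, so it is not a static.
    private int value;

    public SampleInput(int value) {
        this.value = value;
        Total += value;
    }

    // Reads both static fields so neither is optimized into irrelevance.
    public static string Summary() {
        return String.Format("{0}={1}", label, Total);
    }

    public static void Main() {
        new SampleInput(2);
        new SampleInput(3);
        Console.WriteLine(Summary());
    }
}
```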

 

//------------------------------------------------------------------------------
//
// Copyright (C) Microsoft Corporation. All Rights Reserved.
//
// Description:
//
// StaticGlobalDump prints out the global and static variables in an image.
//
// Unmanaged input files must be compiled with -Zi and linked with /PROFILE.
//
// Usage:
// StaticGlobalDump <input-file1>
//
//------------------------------------------------------------------------------

 

using System;

public class StaticGlobalDump {
    public static int Main(String[] argv) {
        // Initialize the infrastructure.
        StaticGlobalDump.InitializeTargets();
        Phx.Init.BeginInit();

        // Simple usage check
        if (argv.Length != 1) {
            Phx.Output.WriteLine(
                "Usage: StaticGlobalDump <input-image-name>\n");
            return 1;
        }

        Phx.Init.EndInit("PHX|_PHX_", argv);

        // Open the module.
        Phx.PEModuleUnit module = Phx.PEModuleUnit.Open(argv[0]);

        // Read symbols in.
        module.LoadGlobalSyms();
        PrintStaticGlobals(module.SymTable);
        return 0;
    }

 

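    static void PrintStaticGlobals(Phx.Syms.Table symTable) {
        // Running count of the globals and statics printed so far.
        int size = 0;

        // The LocalId map contains every symbol in the table.
        foreach (Phx.Syms.Sym sym in symTable.LocalIdMap.InternalMap) {
            // Keep definitions of global variables and static fields.
            // !IsRef is used instead of IsDef because StaticFieldSyms
            // do not set IsDef correctly in the current RDK.
            if ((sym.IsStaticFieldSym || sym.IsGlobalVarSym)
                && !sym.IsRef) {
                // Names beginning with "__" are reserved for the compiler.
                if (!sym.NameString.StartsWith("__")) {
                    size++;
                    Console.Write("{0}", sym.NameString);

                    // Not every symbol has a type; print it only if present.
                    if (sym.Type != null)
                        Console.WriteLine(" [{0}]", sym.Type.ToString());
                    else
                        Console.WriteLine();
                }
            }
        }

        Console.WriteLine("Number of Globals: {0}", size);
    }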

 

    static void InitializeTargets() {
        // Register the two instruction set architectures.
        Phx.Targets.Archs.Arch msilArch =
            Phx.Targets.Archs.MSIL.Arch.New();
        Phx.GlobalData.RegisterTargetArch(msilArch);

        Phx.Targets.Archs.Arch x86Arch =
            Phx.Targets.Archs.X86.Arch.New();
        Phx.GlobalData.RegisterTargetArch(x86Arch);

        // Register the matching runtime for each architecture.
        Phx.Targets.Runtimes.Runtime msilRuntime =
            Phx.Targets.Runtimes.VCCRT.Win32.MSIL.Runtime.New(msilArch);
        Phx.GlobalData.RegisterTargetRuntime(msilRuntime);

        Phx.Targets.Runtimes.Runtime x86Runtime =
            Phx.Targets.Runtimes.VCCRT.Win32.X86.Runtime.New(x86Arch);
        Phx.GlobalData.RegisterTargetRuntime(x86Runtime);
    }
}

Comments

  • Anonymous
    January 07, 2006
    Great tutorials on Phoenix. I have been reading your posts on Phoenix.

    It appeared to me that Phoenix is basically designed as a code generation platform, with additional things like program analysis and transformation. My question is whether it is suitable for program synthesis as well?

    My idea is that we can have a combined specification of systems, a graphical specification of the dynamic behaviors of agents and static data variables declarations and local computation implementations in c#. We may then synthesize prototype from the combined specification. I believe making use of the Phoenix framework has advantages (easy cooperation with visual studio, the program analysis facility and etc).

    Please comment if such an approach would be appropriate in the Phoenix framework?

    yours,
    Sun Jun
    sunj@comp.nus.edu.sg
  • Anonymous
    January 12, 2006
    Could Phoenix framework work with Visual Studio 2005 Express Edition?

  • Anonymous
    January 13, 2006
    Hi Paul, it should work with the Express Edition. If something doesn't work with it, let me know.
  • Anonymous
    February 04, 2008
    I am unable to get the first Phoenix code started. It fails in the InitializeTargets() call in the Phx.Targets.Architectures.X86.Architecture.New() call with a FileIO exception. I am unable to figure this out - any pointers would be great