YOUnique

Exchange as a development platform is a popular and intriguing notion, and it has led several ISVs and enterprise development teams to leverage this capability to create some innovative and inspiring solutions. Many of these applications have been written to add functionality; others to restrict functionality; still others just to keep track of what's going on. A large portion of these applications share one problem that my team hears about all the time, so I deemed it an appropriate topic for blogging. That problem is uniquely identifying a message over time.

MAPI is not SQL Server.
MAPI is not a record management application.
MAPI is not a compliance application.
MAPI is not a change management application.
MAPI is not a paper file cabinet.

MAPI (Messaging Application Programming Interface) is a blueprint. It resembles a specification more than a programming interface, defining a messaging subsystem. Microsoft's implementation of MAPI is the means of interacting with data stored in the JET store, where all the message data for Exchange is kept. Data is read from and written to the store constantly. Exchange and Outlook are MAPI applications which manipulate the data in the MAPI store to suit their needs. Exchange stamps several properties on each message throughout the lifetime of the message, as does Outlook. Obviously, Outlook gives the end user the ability (and responsibility) to manipulate many of these properties in Outlook's UI. Microsoft exposes even more functionality and data through its various messaging APIs (CDO 1.21, Outlook Object Model, Exchange Web Services, and WebDAV, to name a few). You can also interact directly with MAPI using Extended MAPI from your unmanaged C++ application.

When developing an application using a database back-end such as Microsoft SQL Server, the best practice is to create a column called a Primary Key which will uniquely identify each row in a table, such as an EmployeeID or SSN. Sometimes requirements dictate that this Primary Key will actually be a collection of columns, such as FirstName plus LastName plus PhoneNumber; or OrderNumber plus OrderDate. Now imagine if there were several applications (not all of them developed by you) which had access to your data and could not only change the values of existing columns but also add or remove some of the columns that you put there. You can create some requirements and enforce some degree of regulation which could maintain a small degree of order in your database, but there's still the possibility that one application could put another in a precarious state.

To further deepen the complexity and risk, suppose your application was built on an assumption you made based on the previous behavior of the other applications in play. For example, ApplicationA used to never touch PropertyX, but now ApplicationA changes the value of PropertyX every time it touches that row; or ApplicationB used to create a property called PropertyY but now it doesn't, or it still does but the property looks completely different now. This is the state that many of our ISVs and partners find themselves in with regard to programming against MAPI.

Here is one of the more common real-life scenarios. You create a store event sink which detects new items in a folder, and you track them by writing the PR_ENTRYID into a database so that you can later go back into Exchange to retrieve the message. A month goes by and now your application wants to retrieve this particular message, except you get a MAPI_E_NOT_FOUND when you call IMAPISession::OpenEntry. After some research, you discover that the entry ID has changed! How could this be so? Isn't the PR_ENTRYID the unique identifier of the message? Yes, it is. It is guaranteed to be unique within the store; but unique doesn't mean unchanging. Assignment of the entry ID is the responsibility of the store provider. When a message is moved to a new store, a new entry ID is created. Is it still the same message even if it's in a different store? More commonly, meeting updates in the calendar will cause the old appointment to be deleted and replaced with the one described by the update. This causes a new entry ID to be assigned to the newly created appointment.
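
To make that failure mode concrete, here is a toy model in Python. It is not MAPI code: `ToyStore`, `open_entry`, and `move_to` are hypothetical names invented for this sketch, and the only behavior it assumes is the one described above, namely that each store assigns its own entry IDs.

```python
import itertools


class ToyStore:
    """Hypothetical stand-in for a message store; not a MAPI API."""

    def __init__(self, name):
        self.name = name
        self._counter = itertools.count(1)
        self._messages = {}  # entry ID -> message properties

    def add(self, props):
        # The store, not the caller, assigns the entry ID.
        entry_id = f"{self.name}-{next(self._counter)}"
        self._messages[entry_id] = dict(props)
        return entry_id

    def open_entry(self, entry_id):
        # An ID this store does not know about is reported as "not found".
        if entry_id not in self._messages:
            raise KeyError("MAPI_E_NOT_FOUND (simulated)")
        return self._messages[entry_id]

    def move_to(self, entry_id, other):
        # Moving removes the item here; the target store assigns a new ID.
        props = self._messages.pop(entry_id)
        return other.add(props)


store_a = ToyStore("A")
store_b = ToyStore("B")

saved_id = store_a.add({"subject": "Quarterly report"})  # what the app wrote to its database
new_id = store_a.move_to(saved_id, store_b)              # someone moves the message later

try:
    store_a.open_entry(saved_id)
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

The message still exists and its content is unchanged, but the ID the application saved no longer resolves to anything.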

So if I can't trust the PR_ENTRYID to always be the same, what can I use to uniquely identify the message? Again, "unique" is not the same as "unchanging." PR_ENTRYID is guaranteed to be unique in the store, but it is not necessarily unchanging. There are several other properties which developers have used to try to uniquely and permanently identify a message. After a backup and restore, however, properties which are usually unique can end up with the same value on two messages. How can you tell the difference? What invariably identifies a message is all of its properties taken together. If you want to tell two messages apart, compare their properties one by one until you find a difference. Obviously, start with the properties that are more likely to be unique (such as PR_ENTRYID or PR_MESSAGE_SUBMISSION_ID), then work your way through every remaining property until you find one that differs.
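
The comparison described above might be sketched like this in Python. Messages are modeled as plain dictionaries of property name to value, which is an assumption of the sketch and not how MAPI exposes properties; `first_difference` and the sample values are hypothetical.

```python
_MISSING = object()  # marks a property that is absent from a message


def first_difference(msg_a, msg_b,
                     priority=("PR_ENTRYID", "PR_MESSAGE_SUBMISSION_ID")):
    """Return the name of the first property that differs, or None if all match.

    Properties listed in `priority` are checked first because they are the
    most likely to be unique; the rest are checked in alphabetical order.
    """
    remaining = sorted((set(msg_a) | set(msg_b)) - set(priority))
    for name in list(priority) + remaining:
        if msg_a.get(name, _MISSING) != msg_b.get(name, _MISSING):
            return name
    return None


original = {"PR_ENTRYID": b"\x01", "PR_SUBJECT": "Status", "PR_IMPORTANCE": 1}
restored = dict(original)                                # identical copy
edited = dict(original, PR_SUBJECT="Status (updated)")   # user changed the subject
moved = dict(original, PR_ENTRYID=b"\x02")               # new entry ID after a move

same = first_difference(original, restored)
subject_changed = first_difference(original, edited)
id_changed = first_difference(original, moved)
```

Whether a difference in a given property means "a different message" is still a decision for the application; the function only reports where the two first diverge.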

There are several scenarios in which ISVs and developers each have their own opinion of what "should be." Imagine that I address a message to two recipients and send it. Now imagine that the two recipients are in mailboxes on different servers, or in different stores on the same server. Are the two copies the same message or are they separate messages? Before you answer, what if one recipient changes the subject of the message or removes an attachment or sets the priority flag or assigns a category? What if during delivery and content conversion, one of the messages ended up with more whitespace than the other in the body? What if I copied myself on the message? Are the message I receive and the message in my Sent Items folder different messages? What if I save a message to the Drafts folder and use it repeatedly as a template for sending other messages? Is each of those the same message? What if I right-click a message and copy and paste it into the same folder? What is "same" and what is "different?" It's up to your app to decide that. Compare properties until you can establish sameness or difference.

Comparing properties solves the problem of deciding whether a message is the same as or different from another message, but it leaves the problem of storing an identifier that can later be used to retrieve the message. Microsoft's answer to this has always been that if you need to uniquely identify a message and find it later, whether it gets moved or not, you should stamp it with your own property. MAPI gives application developers the power to create their own custom properties on a message. If other applications don't know about your property, they can't change it. Then you just search for your item by that property and you're golden.
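
As a rough illustration of the stamping idea, the Python sketch below models folders as lists of dictionaries. Everything in it is hypothetical (`TRACKING_PROP`, `stamp`, `find_by_stamp`, the entry ID strings); in real MAPI the stamp would be a custom property on the message and the lookup would be a search against the store, not a linear scan.

```python
import uuid

TRACKING_PROP = "x-myapp-tracking-id"  # hypothetical custom property name


def stamp(message):
    """Attach a tracking ID once; leave an existing stamp untouched."""
    if TRACKING_PROP not in message:
        message[TRACKING_PROP] = str(uuid.uuid4())
    return message[TRACKING_PROP]


def find_by_stamp(folders, tracking_id):
    """Scan every folder; return (folder name, message) or None if not found."""
    for folder_name, messages in folders.items():
        for message in messages:
            if message.get(TRACKING_PROP) == tracking_id:
                return folder_name, message
    return None


folders = {
    "Inbox": [{"entry_id": "A-1", "subject": "Invoice"}],
    "Archive": [],
}
tracking_id = stamp(folders["Inbox"][0])  # this is what the app stores in its database

# Simulate a move: the entry ID changes, the custom property travels with the message.
moved = folders["Inbox"].pop()
moved["entry_id"] = "B-7"
folders["Archive"].append(moved)

found = find_by_stamp(folders, tracking_id)
```

The stored `tracking_id` still locates the message after the move, even though the entry ID the application first saw is gone. The cost is that the lookup is a search and not a direct open, so how that search is performed matters for performance.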

Comments

  • Anonymous
    June 21, 2007
    MAPI is one of the most sophisticated sets of APIs. Read the full story here... http://blogs.msdn.com/pcreehan/archive/2007/06/19/wait-it-s-gotta-be-your-bull.asp

  • Anonymous
    November 15, 2007
    I haven't tested this in a while, but if I recall correctly, retrieving a message by a property is significantly slower than retrieving a message by store id/entry id, so much so that your proposed alternative of using properties to retrieve messages is not feasible, depending on your solution's performance requirements.

  • Anonymous
    December 07, 2007
    Jeff - I assume your concern is how to locate an item given a set of "unique" properties, correct? Well - for one, as much as your design allows, you should be storing the entry ID. You'd only resort to other mechanisms when you find the entry ID doesn't work. If we assume the entry ID can't work, this is where you'd want to have built a search folder in advance. The search folder would have in it columns that match the properties you track, and would already be ordered in a way that makes it efficient for you to do a binary search. It's hard to be more specific without details of what your definitions of "unique" and "same" are and without details of what your goals are (and I'm not particularly interested in getting those details), but the general algorithm shouldn't change. There are a few benefits of setting up a search folder: 1 - You can set the folder up to monitor multiple folders at once. 2 - Once you set your columns and sort order, the store maintains the folder. The amortized cost of maintaining such a folder is quite cheap. 3 - If you use FindRow with search criteria compatible with your sort order, the store will do the binary search for you. If you anticipate multiple candidate matches, they'll be contiguous (if you sorted right), making final determination a matter of examining just a few rows of data.

  • Anonymous
    January 04, 2008
    Yes, the "stamping" method seems to be a solution, but it has many drawbacks: you need to modify the message, you need to define the new property (forms?), etc. It introduces more problems than solutions...