Learning ILM 2007
The Identity Lifecycle Manager (ILM) Synchronization Engine - formerly known as Microsoft Identity Integration Server (MIIS) - is enterprise software that touches many parts of your organization’s infrastructure. Similarly, the Certificate Lifecycle Manager (CLM) component of ILM touches critical parts of the security infrastructure. The large variety of deployment scenarios and use cases makes learning it “by running setup and playing with it” a very risky proposition and one not to be contemplated in a production environment.
I wrote this article to share with you what I’ve learned about practical ways to get started without deleting all the data in your enterprise directory.
Introduction
I’ve learned many different things in my years on the Identity Management team. I’ve had the good fortune to learn about real problems in production deployments by working with the support team and hearing essential details about things that went wrong. As a student of information technology, I’ve learned that the more we prepare, test, and learn, the better we understand system behaviors. Implementing system behaviors that meet business requirements is why we’re here. Sometimes, in our need to meet those requirements as quickly as possible, we don’t have the opportunity to learn everything we will need to know later. This is an expensive tradeoff, and one I have seen many times. Over the last several years, I’ve accumulated a list of things I wish I had known before being hired, or before talking to a customer about what they should or should not do. I have been lucky enough to work with extremely talented people while documenting and learning about this technology.
We added a feature called a “deletion threshold” to MIIS 2003 in the first service pack, released in 2004. This feature stops a management agent run when it exports more than the specified number of object deletions to a connected data source. The feature was implemented because, immediately after the product was released in 2003, some developers, in their zeal to learn the product, installed it in production before confirming how it worked in a sandbox. As essentially a "version 1" product, it suffered from an unfortunate combination of a lack of well-documented best practices (always test thoroughly in a lab) and some unexpected behaviors in different versions of connected data sources (one management agent in particular depended on a database driver that reported end of data when the connection was lost). The result was an import run that deleted all accounts in several customers' enterprise directories. The lesson from that experience is that developing in production without a very thorough understanding of how the system works can be extremely dangerous! In every single case I saw, the behavior could have been discovered in a lab. The problem was compounded by the fact that the behavior appeared in the connected data source's database driver after we shipped, so we could not have caught it in our own testing before it reached customers. Once we recognized the pattern (a dropped connection on import, reported as end of rowset, leading to obsoletion of every object not seen in the import) we were able to reproduce it. We also realized there was no way the engine could distinguish the end of the import record set from a connection outage; we had no control over that piece. What could we do? We determined that restricting the number of object deletions at export time was the last possible point at which the catastrophic deletion scenario could be prevented. The deletion threshold was born.
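The deletion-threshold idea can be sketched in a few lines. This is an illustrative model only; the class and function names are mine, not the actual MIIS/ILM API, and the numbers are arbitrary:

```python
# Illustrative sketch only: models the deletion-threshold idea, which is
# to abort an export run when staged deletions exceed a configured limit.
# Names here are hypothetical, not the shipping product's API.

class DeletionThresholdExceeded(Exception):
    """Raised when an export would delete more objects than allowed."""

def run_export(pending_operations, deletion_threshold):
    """Apply staged operations, but stop before exporting anything if the
    number of staged deletions crosses the threshold."""
    deletions = [op for op in pending_operations if op["type"] == "delete"]
    if len(deletions) > deletion_threshold:
        raise DeletionThresholdExceeded(
            f"{len(deletions)} deletions staged, threshold is {deletion_threshold}"
        )
    return list(pending_operations)  # export proceeds

# A lost connection misreported as end-of-data stages every unseen
# object for deletion; the threshold stops the run before any harm.
staged = [{"type": "delete", "id": i} for i in range(5000)]
try:
    run_export(staged, deletion_threshold=50)
except DeletionThresholdExceeded as e:
    print("Export aborted:", e)
```

The key design point is that the check happens at export time, the last moment at which the damage can still be prevented.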
On one hand, we had already documented in the WMI reference that scripting MA runs would give users a chance to check what was about to be exported, and that if there were deletions, the export should NOT run. On the other hand, the product was new, and we knew we couldn’t yet count on the most critical documentation rising to the top of the heap. Now that a few years have passed and we’ve worked on many different types of documentation, here is a list of dos and don’ts that relies heavily on that documentation to help you learn what I believe to be a safe approach to implementing your solution.
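The documented pre-flight pattern can be sketched as follows. The helper and statistics shown are hypothetical stand-ins for the real WMI queries (which in the actual product go through the management agent WMI classes), not the shipping API:

```python
# Hedged sketch of the documented pattern: before running an export run
# profile, inspect the pending-export statistics and refuse to export if
# any deletions are staged. The function names and the dict of counts
# are hypothetical; in the real product these come from WMI queries.

def safe_to_export(pending):
    """Return True only when nothing would be deleted downstream."""
    return pending.get("delete", 0) == 0

def preflight(ma_name, pending):
    """Decide whether an MA's export run should proceed."""
    if safe_to_export(pending):
        print(f"{ma_name}: exporting {pending.get('add', 0)} adds, "
              f"{pending.get('update', 0)} updates")
        return True
    print(f"{ma_name}: {pending['delete']} deletions staged; "
          "export skipped pending review")
    return False

preflight("HR MA", {"add": 12, "update": 40, "delete": 0})   # proceeds
preflight("AD MA", {"add": 0, "update": 3, "delete": 812})   # skipped
```

The zero-deletions rule here is deliberately stricter than the deletion threshold: a script can simply refuse to export and page a human instead.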
Dos and Don’ts
Over time, I've learned many things by observing real customer cases. In this section you will find some of what I have learned from the errors of others.
Dos
- Do have a strong foundation in Public Key Infrastructure (PKI) certificate template management and Kerberos web application authentication if you will be using CLM. This still trips me up: when I am positive a symptom is a bug in the product, it almost always goes away when I turn on Basic authentication on the portal, which tells me it was a Kerberos configuration issue all along.
- Do familiarize yourself with the synchronization scenarios the product is intended to cover. Download and read the scenario documents (just read them – don’t be tempted to do the walkthroughs yet). Learn what your organization is using this product to accomplish. Is it one of the following, or something else?
- HR-driven data synchronization from a database to many data sources
- Aggregation from multiple data sources
- Account provisioning to Active Directory
- Group synchronization
- Password synchronization
- Do know where to find CLM documentation by browsing the CLM Technical Library.
- Do know where to find synchronization documentation by using the documentation roadmap.
- Do know what platforms you need to manage:
- Are the platforms in scope supported out of the box?
- Do you have to write a management agent?
- Do learn how the different platforms in your organization are expected to feed each other – or are they expected to feed each other at all? Build your system dataflow model first.
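A dataflow model does not need to be elaborate; even a toy script that records which system feeds which makes gaps visible. The system names below are examples only, not a recommended topology:

```python
# Toy dataflow model: for each connected system, list the systems it
# feeds. The names are illustrative examples. The check flags systems
# that neither send nor receive data in the model, i.e. systems whose
# role in the flow has not been decided yet.

feeds = {
    "HR database": ["Metaverse"],
    "Metaverse": ["Active Directory", "Telephone directory"],
    "Active Directory": [],
    "Telephone directory": [],
    "Legacy mainframe": [],   # nothing feeds it, it feeds nothing
}

def orphans(feeds):
    """Systems that neither receive nor send data in the model."""
    receives = {dst for dsts in feeds.values() for dst in dsts}
    return sorted(s for s in feeds if not feeds[s] and s not in receives)

print(orphans(feeds))  # ['Legacy mainframe']
```

If a system shows up as an orphan here, you have not yet answered the question in the bullet above: is it expected to feed anything at all?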
- Do read the MIIS 2003 Design and Planning Collection:
- Use the worksheets.
- Understand how the best practices and planning strategies already documented for the product apply to your case.
- Do consult with the community if you become stuck in your learning process.
- Do familiarize yourself with the MIIS 2003 Technical Library. The home page has a “what are you looking for” listing that provides links to detailed sections matching your needs.
- Do learn how different data stewardship agreements affect what operations you will be able to perform with your configuration – for example:
- Do you have the appropriate permissions in each of the connected directories for the operations you will need to perform?
- Have you fully tested all of the operations you expect to perform?
- Have you determined all of the cases in which you would want to delete data and figured out how to test that in a lab before attempting it in production?
Don'ts
- Don’t use your production environment to learn how the product works before you have tested it in an isolated development lab and a production-scale lab.
- Don’t ignore best practices.
- Don’t omit planning.
- Don’t experiment without an understanding of your goals. This will lead to frustration and a lack of a concrete plan for solution deployment.
- Don’t make assumptions about how the product works. Build your plan based on knowledge of the documented features, best practices and the feature set as you’ve observed it in your own testing.
- Don’t configure export run profiles in the first phases.
- Don’t run the export run profiles unless you’ve thoroughly sampled the outbound synchronization and provisioning statistics of the synchronization run profiles.
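The last two don’ts can be expressed as a simple gate. This is my own construction for illustration, not a product feature: exports stay disabled until several consecutive synchronization runs have staged zero deletions.

```python
# Illustrative gate (not a product feature): permit export only after the
# last N synchronization runs staged zero deletions, i.e. the outbound
# statistics have actually been sampled and look sane.

def export_allowed(run_history, required_clean_runs=3):
    """run_history: staged-deletion counts per sync run, oldest first."""
    recent = run_history[-required_clean_runs:]
    return (len(recent) == required_clean_runs
            and all(deletes == 0 for deletes in recent))

print(export_allowed([0, 0, 0]))    # True: three clean runs in a row
print(export_allowed([0, 17, 0]))   # False: a run staged deletions
print(export_allowed([0, 0]))       # False: not enough samples yet
```

The point of the "not enough samples" branch is that no export should run in the first phases at all, which is exactly the preceding don't.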
Learn by architecting
By architecting first, you will have a better understanding of how data will flow through your system. For example, if you have developed your architecture and you plan to run the Password Change Notification Service (PCNS), you will already know which systems need connectors before you can flow passwords, and setting the Service Principal Name (SPN) for the service account will be the least of your concerns. Based on discussions in the forum, it is evident that the high cost of password management tempts some organizations to deploy PCNS without understanding the foundation on which the feature is built. A solid understanding of synchronization is absolutely essential before attempting to install password filters on a domain controller.
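As one concrete illustration of that SPN step, registration is typically done with the `setspn` tool. The `PCNSCLNT` service class follows the PCNS deployment guidance, but the host and account names below are placeholders; verify the exact values against the documentation for your version before running anything.

```shell
# Register the SPN the PCNS client uses to authenticate to the
# synchronization service. Host and account names are placeholders;
# PCNSCLNT is the service class named in the PCNS deployment guidance.
setspn -A PCNSCLNT/ilmserver.contoso.com CONTOSO\ilmservice

# Confirm the registration on the service account:
setspn -L CONTOSO\ilmservice
```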
For CLM deployments, isolated testing cannot be skipped either. Production certificate authorities can be taken offline, and the entire PKI can be put at risk, by deploying CLM without an understanding of how it interacts with the CA infrastructure.
To build an effective synchronization solution architecture, you will want to know not only the information in the Design and Planning Collection referred to above, but also the Design Concepts. Once you have a solid foundation, look at the synchronization technical reference.
Learn by doing (once you understand the architecture)
In order to put your conceptual and architectural learning into practice, follow these guidelines:
- Build a small lab. A virtual PC is a fantastic development and learning environment: on a loopback network, it can’t harm production resources.
- Now that you’ve read the background information about how the technology works, start with the getting-started documents. At this point, you’re truly getting started. That background reading is a prerequisite to understanding what the getting-started documents are meant to convey.
- Walk through the scenarios referred to above.
Summary
The development of features in a product like ILM is driven by customer requirements. Features like the deletion threshold in MIIS 2003 exist to save you from the great deal of pain that can result from learning in production, which I have seen in real customer deployments. However, the deletion threshold is not a protection you ever want to be in the position to need: you should know how the system will behave before you put it in production. Also, make sure you understand how the solutions you’re deploying are going to work by understanding the platforms and features you are using. Suppose, for example, that you used a particular database for your HR system, and during an import the MA was told it had reached end of file when in fact the connection had dropped: any record that hadn’t arrived before the connection was lost would be treated as deleted. If this happened during a full import, every unseen record would cause deprovisioning throughout the entire system, and all of the connected objects would be deleted as well, assuming your deprovisioning rules were set that way. I surely don't have to tell you that would be bad. This article provides references to essential resources that can accelerate your learning without putting your infrastructure at risk in this fashion.
Additional Information
For more information, please see the Microsoft Identity Integration Server 2003 Technical Reference.
AhmadAW