Troubleshooting Large Scale Workflow Failures

Fabrikam has a policy that ensures all groups have unique aliases.
A database in an external system keeps track of all the aliases that have been assigned to groups in Fabrikam.
Whenever a group is created or its alias is modified, the alias is verified against the external system for uniqueness.
After the group is created, the new alias for the group is added to the external system.

In the middle of a typical weekday, the FIM Service instances that handle group management requests lose connectivity to the external system that tracks the group aliases.
This connectivity issue results in several outcomes:

  1. All authorization workflows attempting to validate a group alias encounter an unhandled exception and terminate.
    As a result, all group creation and alias modification requests currently in the authorizing state are marked as “denied”.
  2. All action workflows attempting to reserve a new alias in the external system encounter an unhandled exception and terminate.
    As a result, all group creation and alias modification requests currently in the “Post Processing” state are marked as “PostProcessingError”.

Ichiro, the administrator for the FIM Service, does not immediately become aware of the connectivity issue.
By the time the connectivity issue is resolved, Ichiro realizes there may be a large number of requests affected by the issue.
Ichiro first identifies the requests that were denied because of the connectivity issue.

  1. He submits a query in the FIM Portal to identify all users whose requests were denied because of a termination in the alias validation authorization workflow.
  2. Ichiro’s query looks as follows: /Request[CreatedTime >= 'X' and AuthorizationWorkflowInstance = /WorkflowInstance[WorkflowDefinition = 'Y' and WorkflowStatus = 'Terminated']]/Creator
    •  'X' is the approximate DateTime when the connectivity issue first appeared.
    •  'Y' is the ObjectID of the workflow definition corresponding to the alias validation authorization workflow.
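
The query above has two placeholders, so it can help to assemble it programmatically. The following is a minimal Python sketch that only builds the filter string; it does not talk to FIM. The function name, the sample timestamp, and the all-zero ObjectID are hypothetical, and the DateTime literal format (yyyy-MM-ddTHH:mm:ss.fff) is an assumption that should be checked against your deployment.

```python
from datetime import datetime


def build_request_filter(since: datetime, workflow_definition_id: str,
                         workflow_attribute: str, projection: str) -> str:
    """Build an XPath filter for requests whose workflow instance terminated."""
    # Assumption: DateTime literals are written as yyyy-MM-ddTHH:mm:ss.fff.
    millis = since.microsecond // 1000
    since_literal = since.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}"
    return (
        f"/Request[CreatedTime >= '{since_literal}' and "
        f"{workflow_attribute} = /WorkflowInstance["
        f"WorkflowDefinition = '{workflow_definition_id}' and "
        f"WorkflowStatus = 'Terminated']]/{projection}"
    )


# Hypothetical values standing in for 'X' and 'Y'.
outage_start = datetime(2010, 3, 17, 11, 30, 0)
definition_id = "00000000-0000-0000-0000-000000000000"

denied_creators_query = build_request_filter(
    outage_start, definition_id, "AuthorizationWorkflowInstance", "Creator")
print(denied_creators_query)
```

Passing "ActionWorkflowInstance" and "Target" instead yields the same shape of filter for requests whose action workflow terminated. Note that the quotes around the literals are plain ASCII single quotes.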

From his previous step, Ichiro discovers that a large number of users were affected by the connectivity issue.
Since there is no way for Ichiro to restart these failed authorization workflows, these users need to resubmit their requests.
Ichiro decides that he wants to notify the users that they may need to resubmit their requests, due to the temporary connectivity issue.
To notify the users, Ichiro must do one of two things. He can extract the list of users from the FIM Portal and paste them as the recipients of an email message in Outlook, or he can create a new set with the users as members and then create a new retroactive policy that sends an email notification to them.

  1. There is no way for Ichiro to create a dynamic set with the target users as members, because the XPath query needed to identify the users is not supported by sets in FIM.
  2. Ichiro must get a list of the ObjectIDs of the target users and add them in bulk as explicit members to a set.
    The FIM Portal does not support this ability, so Ichiro must rely on the PowerShell cmdlets for FIM to accomplish this.
    Once Ichiro has created the set containing all the affected users he wants to notify, he creates the email notification workflow he wants to apply to them, and a new MPR that runs the workflow.
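
Before adding ObjectIDs to a set in bulk, it can be useful to clean the list and split it into manageable batches. The sketch below is a hypothetical Python helper for that preparation step only; the actual membership update still has to be done with the PowerShell cmdlets for FIM. The "urn:uuid:" prefix handling, the batch size, and the sample IDs are all assumptions for illustration.

```python
from typing import Iterable, Iterator, List


def unique_object_ids(raw_ids: Iterable[str]) -> List[str]:
    """Normalize ObjectIDs and drop duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for raw in raw_ids:
        # Assumption: IDs may carry a "urn:uuid:" prefix and mixed case.
        normalized = raw.strip().lower()
        if normalized.startswith("urn:uuid:"):
            normalized = normalized[len("urn:uuid:"):]
        if normalized and normalized not in seen:
            seen.add(normalized)
            result.append(normalized)
    return result


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical, shortened IDs; real ObjectIDs are GUIDs.
collected = ["urn:uuid:A1", "a1", " b2 ", "urn:uuid:c3", "B2"]
member_ids = unique_object_ids(collected)
member_batches = list(batches(member_ids, 2))
print(member_batches)
```

A user who submitted several affected requests may appear more than once in a collected list, which is why duplicates are removed before the explicit members are added.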

Ichiro’s next step is to identify the requests whose alias reservation action workflow failed because of the connectivity issue.

  1. He submits a query in the FIM Portal to identify groups whose aliases were not reserved because of a termination in the alias reservation action workflow.
  2. Ichiro’s query looks as follows: /Request[CreatedTime >= 'X' and ActionWorkflowInstance = /WorkflowInstance[WorkflowDefinition = 'Y' and WorkflowStatus = 'Terminated']]/Target
    •  'X' is the approximate DateTime when the connectivity issue first appeared.
    •  'Y' is the ObjectID of the workflow definition corresponding to the alias reservation action workflow.

From his previous step, Ichiro discovers that a large number of groups have not had their alias reserved.
The alias reservation workflow does not need any information from the Request that triggered it, since it reads the alias to reserve from the group itself.
Ichiro uses the “run on policy update” feature to retroactively apply a policy that reserves the alias for all the groups identified in his previous step.

  1. Ichiro creates a new static set with all the affected groups as members.
    •  There is no way for Ichiro to create a dynamic set with the target groups as members, because the XPath query needed to identify the groups is not supported by sets in FIM.
    •  Ichiro must get a list of the ObjectIDs of the target groups and add them in bulk as explicit members to a set. The FIM Portal does not support this ability, so Ichiro must rely on the PowerShell cmdlets for FIM to accomplish this.
  2.  Ichiro creates a new MPR that applies the alias reservation action workflow to all members of the set he created.

To re-run the failed action workflows, Ichiro has developed the following script.

Note

To provide feedback about this article, create a post on the FIM TechNet Forum.