CA cluster and NDES

This is a brief description of a PKI solution intended to issue certificates to network devices. The initial requirement was to make the issuing layer fully centralized and highly available, which is somewhat at odds with the normal placement of the NDES feature. (The list of standard AD CS features is here https://technet.microsoft.com/en-us/library/cc731564.aspx)

The decision was to place the NDES feature on a separate box and point it at the issuing CA cluster. Neither component is complex when looked at on its own. Their deployment is well documented (here https://www.microsoft.com/download/en/confirmation.aspx?id=331 for the cluster and here https://www.microsoft.com/download/en/details.aspx?displaylang=en&id=1607 for NDES); most of the possible consequences are described in knowledge bases and are known to the community.

Taken together, however, this is quite a complex solution, and one that had not yet been tested by the developers. The usual supportability practice is that only tested configurations are supported. Nevertheless, this time the developers assured us that there are no known limitations in targeting SCEP at a clustered CA compared to a standalone CA server.

So the solution was accepted by the customer (we met all the requirements) and by the support service (we followed all the recommendations).

Unfortunately, the Microsoft SCEP facility itself could not be used as a clustered or load-balanced solution, mostly because of how the “enrollment password” for the initial certificate request is encrypted. In Windows, password encryption relies on DPAPI (https://msdn.microsoft.com/en-us/library/ms995355.aspx), which uses user profile data and system-dependent data. Synchronizing the password across machine boundaries proved impossible (and unsupported, of course). The customer therefore settled for two or more independent NDES servers with the same configuration, and handed the resolution of NDES availability issues over to network device operations.

What went wrong in the interaction: the SCEP service, which runs as a web application pool behind the scenes, sporadically lost connectivity to the clustered CA. Usually this happened after a CA failover, but later the connectivity might be restored “by itself”, and later still be lost again without warning.

A network sniffer revealed nothing, or almost nothing. We confirmed that there was indeed no connectivity to the RPC or DCOM service, but had no idea about the root cause. Fortunately, the web application pool settings in IIS 7 (and in earlier versions as well) allow the pool to be recycled, which avoids the impact of the “sporadically lost connectivity”. For what was done, see https://technet.microsoft.com/en-us/library/cc733120(WS.10).aspx and https://technet.microsoft.com/en-us/library/cc771956(WS.10).aspx .
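For readers who want a quick way to see the symptom without a sniffer, here is a minimal sketch of a reachability probe run from the NDES box. It is not what we deployed; it only checks whether a TCP connection to the RPC endpoint mapper (port 135) on the CA can be opened. The function name `can_connect` is an assumption made for illustration, and a successful TCP connect does not prove that a DCOM enrollment call would succeed.

```python
import socket

RPC_ENDPOINT_MAPPER_PORT = 135  # well-known TCP port of the RPC endpoint mapper


def can_connect(host: str, port: int = RPC_ENDPOINT_MAPPER_PORT,
                timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    `host` is whatever network name the NDES server uses for the clustered CA
    (hypothetical here). Only TCP reachability is tested, nothing at the
    RPC/DCOM level.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling `can_connect("ca-cluster.example.com")` (a placeholder name) before and after a CA failover would show whether the endpoint mapper is reachable at all, which helps separate a plain network problem from a stale state inside the application pool.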

I did not find any KB article describing the approach explained here. But this is, hopefully, a rare (that is, complex) situation.

Comments

  • Anonymous
    January 01, 2003
    Yes, you are right; scheduling a task was exactly what we first proposed to the customer. Unfortunately, they did not consider it a smart solution. To be honest, the word "crutch" was said aloud... It is a really difficult customer... In the end, straightforward IIS settings saved the day. Sometimes Windows internals are surprisingly helpful.