Getting Cross Domain Kerberos and Delegation working with SSIS Package

I started working on this issue started by way of a Twitter conversation between myself, Jorge Segarra (@sqlchicken) and Matt Masson (@mattmasson). I then found out that Jorge was working with a customer that had opened up a Support Case.  Also special thanks to Joey Seifert for letting me bounce some Active Directory questions off of him.

The issue we were troubleshooting was that when running an SSIS 2012/2014 Package from SQL Server A (Parent Domain) that connected to SQL Server B (Child Domain) it would fail with the dreaded “Login failed for user ‘NT AUTHORITY\ANONYMOUS’”

image
Illustration Credit: Svetlin Velinov

Whenever I see this error, I always start with configuration.  That is typically the cause.  Back in 2010 I wrote out my Kerberos Checklist for validating configuration.  Jorge was aware of this and had gone through it. He also mentioned that they had ran the Kerberos Configuration Manager and it didn’t find any issues. Although in this scenario, it wouldn’t have as it doesn’t yet support cross domain topologies.

I was able to reproduce the issue they were seeing in a local environment on my end.  Here is what it looked like.

image

So, I have two domains (battlestar.local & cylons.battlestar.local).  The SQL Server in the Parent Domain (battlestar.local) is using a Service account from the child domain (cylons.battelstar.local).  From a delegation standpoint, we are using full delegation.  I’ll touch on Constrained Delegation later on. To make sure that everyone understand what I mean by full delegation, with the CYLONS\sqlservice AD Object, I have the following setting:

image

How SSIS 2012 and later work

When you run a package that is hosted in the SSIS Catalog, it will cause a child process to get spawned from the SQL Service itself. This process is the ISServerExec.exe.

image

The other thing to note is that this process context is the context of the session that launched the package, not the SQL Server Process Context (service account).  Here you can see that the ISServerExec is running as BATTLESTAR\asaxton where as the SQL Service is running as CYLONS\sqlservice.

image

This is the case regardless of how you execute the package.  This could be through the SSMS GUI, via Stored Procedure or even by way of DTExec.  If you want it to run under the context of the SQL Service account, you can “fake it” by doing a runas like operation  on a process (Command Prompt, SSMS or SQL Agent Job security account).

I initially thought that this was the cause of the problem, however I later found that it is not.  While I haven’t dug fully into that, I believe this to be due to the way we are launching the child process. My guess is it has to do with handle inheritance in the child process.

The Single Hop

Before I even get into the SSIS package, I want to verify that the first leg of the journey is working.  To do this, I connected with Management Studio from a remote machine to my SQL Server in the Parent Domain.  I then used DMV’s to validated it I had connected via NTLM or Kerberos.  Here is the query I used. depending on where you are doing this from, you may want to include a predicate to exclude the session you are running this from, if you are trying to find out a different sessions.

select c.session_id, s.login_name, c.auth_scheme, c.net_transport, st.text
from sys.dm_exec_connections c
JOIN sys.dm_exec_sessions s ON c.session_id = s.session_id
JOIN sys.dm_exec_requests r ON c.session_id = r.session_id
cross apply sys.dm_exec_sql_text(r.sql_handle) as st

image

This showed NTLM which is not what we want. I had enabled Kerberos Event Logging, and I saw the following:

Log Name: System
Source: Microsoft-Windows-Security-Kerberos
Date: 6/26/2014 11:22:42 AM
Event ID: 3
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: DrBaltar.battlestar.local
Description:
A Kerberos error message was received:
on logon session
Client Time:
Server Time: 16:22:42.0000 6/26/2014 Z
Error Code: 0x7  KDC_ERR_S_PRINCIPAL_UNKNOWN
Extended Error:
Client Realm:
Client Name:
Server Realm: BATTLESTAR.LOCAL
Server Name: MSSQLSvc/captthrace.home:49375
Target Name: MSSQLSvc/captthrace.home:49375@BATTLESTAR.LOCAL

captthrace.home?  What is that?  This is because I’m working from home.  At the office I saw a different domain. If we do a Ping on Captthrace, we will see the same result. In this case, I have my machines multi homed.  This was picking up the external nic and not the internal which should have had an IP Address of 10.0.0.10.

image

We want this to resolve to captthrace.battlestar.local [10.0.0.10].  On my end I can do that multiple ways.  I could disable the external nic, add an entry in the HOSTS file.  In my case I decided to update the DNS Search suffix for the internal adapter.  Making sure that battlestar.local and cylons.battlestar.local were listed.  After doing that we get a properly result.

image

image

That looks better.  retrying the connection to my Parent SQL Server shows that we are connected with Kerberos now.

image

Now when I try a single hop to the Destination SQL Server, I was seeing Kerberos as well for a single hop.

The Double Hop

Now I wanted to run the package from the client machine off of the Parent SQL Server.  When I did that, I got the error.

image

Looking at the the Parent SQL Server I saw the same issue as we had on the client box.  So, I adjusted the DNS Suffixes on that machine as well. After that, the package connected successfully using Kerberos to the Destination server in the Child Domain.

image

image

Cross Domain SPN Lookups with Active Directory

One item I ran into the first time I was going through this was that I kept getting NTLM even though the name resolution was fine on the Parent SQL Server.  It was using an account from the Child Domain though which had the SPN for the server.  When Domains are within the same forest, the KDC should consult the GC (Global Catalog) and provide a referral if the account is in a different domain.  If the account is not in the same forest you would need to define Host Mapping for the account, unless you are using a forest trust.  Then you could define a Kerberos Forest Search Order.

What happened was that the Parent DC was not able to communicate with the Child DC.  I discovered this when I tried to force Domain Replication.  It errored out saying it couldn’t find the CYLONS domain.  This could also lead to a failure as the SPN may not be noticed from the GC Perspective if replication wasn’t working.  So, if you made a change in the Child Domain, the Parent Domain wouldn’t pick it up.

image

What about Constrained Delegation?

With the amount of work I do with SharePoint Integration, Constrained Delegation comes up a lot when we talk about Claims to Windows Tokens.  This may force your environment to use Constrained Delegation.  Before Windows 2012, this means that all Service Accounts and machines hosting the services all had to be in the same Domain.  You were really restricted to one domain.  Starting with Windows 2012, you can cross domain boundaries, but the configuration is different for Constrained Delegation from what it used to be.  It is modified via PowerShell commands.  If you want to read more about that, you can have a look at the following:

Kerberos Constrained Delegation Overview for Windows 2012
https://technet.microsoft.com/en-us/library/jj553400.aspx

How Windows Server 2012 Eases the Pain of Kerberos Constrained Delegation, Part 1
https://windowsitpro.com/security/how-windows-server-2012-eases-pain-kerberos-constrained-delegation-part-1

I did get this to work in my environment and will look to get some posts specific to how to get this to work in the future.

Takeaway – What’s in a name?

If you have verified your Kerberos configuration, be sure to validate your name resolution within your environment.  It may not be resolving to the proper names.  When we go to build out the SPN to use, we based on the DNS name that was resolved from the NETBIOS name. If DNS resolution isn’t working properly, then it can lead to all sorts of problems. 

I’ve learned the hard way, over time, that DNS and Active Directory really blend together.  If DNS has issues, then AD will more than likely have some issues. Hopefully this helps to at least show that it could be more than the normal Kerberos Configuration items that are causing an issue.  Be sure to check out DNS Forwarders, Network Configuration (including Networking Binding/Order and DNS Suffix if needed).

A Ping command should return the proper name, or an NSLookup.  If you have doubts, do an IPConfig /flushdns and try again.  Verify the DC’s can talk/replicate to each other.  As you can see from above, this should work for Full Delegation.  Constrained Delegation would work with some modifications.

Adam W. Saxton | Microsoft SQL Server Escalation Services
https://twitter.com/awsaxton

Comments

  • Anonymous
    November 02, 2014
    Awesome write-up, just the right level of details, brilliant!

  • Anonymous
    August 11, 2015
    Hey Adam, This post helped me figure out the unconstrained use case.  I'm struggling with the constrained use case.  Can you write a follow up? Thanks, Mark

  • Anonymous
    November 20, 2015
    I have an intermittent problem. I have a SQL SSIS job that move data from the SQL DB to a SharePoint farm. Kerberos is configured and is working well. We are moving small amounts of data every 15 minutes from the SQL db to SharePoint. On some occasions, maybe once every 2-3 days, we will get an error from the SSIS package that states: “The HTTP request is unauthorized with client authentication scheme ‘Ntlm’.  I would suggest it is a domain intermittent problem? Whats your take on it. thanks johan