Why You Need To Understand NAT When Setting Up Lync or OCS
Written by Joe Lefort, Senior Microsoft Premier Field Engineer.
Microsoft Lync and OCS are fairly easy to set up if all you want is internal IM/presence and conferencing. The moment you decide to open your SIP domain up to the outside world, things take a turn for the complex. Why, you might ask? Simply put, NAT (Network Address Translation) comes into play.
There are lots of things to think about on the edge, but one question I hear with way too much regularity (partly because our documentation is pretty weak in this area) is 'What is the big deal about NAT, and why should I care?' That is the topic of this article.
First off, if you need something to help you sleep, here are some definitions and their RFCs:
- NAT - Network Address Translation (RFC 2663)
- ICE - Interactive Connectivity Establishment (RFC 5245)
Second, I will answer the "why should you care?" question. It's actually pretty simple. Most likely your company does not have enough public IPv4 addresses to let every computer on the internal network reach the Internet without NAT. ICE provides a mechanism for NATed hosts on the inside to establish media connectivity with hosts on the Internet. Without ICE, your internal hosts might be able to 'see' those Internet-based hosts, but you would never know, because those hosts would not be able to send anything back to you. At the end of the day, the bits that matter to the end user are the media that get sent after the initial signaling.
What does NAT do that matters so much here?
Short form: A NAT does not parse the SDP content in a SIP INVITE, so the internal IP address of the inviting host is never rewritten to reflect the translation. Without ICE, that unchanged SDP means the far end sends its responses to an unroutable address, and they never make it back to the requestor.
Long form: SIP (Session Initiation Protocol, RFC 3261) is concerned with one thing: establishing a SIP dialog between two hosts. SIP doesn't really care about any media (voice, desktop sharing, application sharing, whiteboard, etc.) that might be required. That is 'just media' from SIP's perspective. SIP is, however, nice enough to allow the media setup information to be carried in the SIP message. That media setup information is SDP (Session Description Protocol, RFC 4566).
Among its other media setup duties, SDP defines the IP address and port (called a socket) that a particular host is willing to use for receiving media (e.g. 10.16.12.125:50016). The socket definition is buried deep in the bowels of the SDP text that is carried in the SIP payload.
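For the curious, SDP is plain text, so the socket is easy to find once you know where to look: the c= line carries the connection address and each m= line carries the media port. Here is a minimal Python sketch that pulls them out, assuming a hypothetical, stripped-down SDP offer (real Lync SDP carries many more lines, including ICE candidate attributes). The names `media_sockets` and `SAMPLE_SDP` are illustrative only, not part of any Lync API.

```python
# A minimal, illustrative SDP offer. The addresses, ports and codec line are
# hypothetical values chosen for this sketch, not captured from a real client.
SAMPLE_SDP = """v=0
o=- 0 0 IN IP4 10.16.12.125
s=session
c=IN IP4 10.16.12.125
t=0 0
m=audio 50016 RTP/AVP 0
a=rtpmap:0 PCMU/8000
"""


def media_sockets(sdp: str) -> list[tuple[str, str, int]]:
    """Return (media type, IP, port) for each m= line in an SDP body.

    The session-level c= address applies to every media section unless a
    c= line inside that media section overrides it (RFC 4566).
    """
    session_ip = None
    results = []
    current = None  # [media, ip, port] for the media section being parsed
    for line in sdp.splitlines():
        if line.startswith("c="):
            # Format: c=<nettype> <addrtype> <connection-address>
            ip = line[2:].split()[2]
            if current is None:
                session_ip = ip      # session-level connection address
            else:
                current[1] = ip      # media-level override
        elif line.startswith("m="):
            if current is not None:
                results.append(tuple(current))
            # Format: m=<media> <port> <proto> <fmt> ...
            media, port = line[2:].split()[:2]
            current = [media, session_ip, int(port)]
    if current is not None:
        results.append(tuple(current))
    return results


print(media_sockets(SAMPLE_SDP))  # [('audio', '10.16.12.125', 50016)]
```

Note that nothing in this text changes when the offer crosses a NAT, which is exactly the problem the next paragraphs describe.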
When there is no NAT device between SIP hosts, this is not a big deal. Each host can simply reach the media socket the other side advertised, and go ahead and start sending/receiving.
Throw a NAT in there and the SDP content becomes a big deal. Remember that a NAT device changes the socket from one side of the NAT device to the other. In other words, the socket identified as 10.16.12.125:50019 on the inside of the NAT might be 25.25.25.25:1234 on the outside. The NAT is 'trained' to look at the IP address and port in the packet headers (the IP header and the TCP or UDP header), but it cares not at all for any other references to that socket elsewhere in the packet. The headers get changed, but the payload contents (in this case, the SDP) do not. This means that the SDP arrives at the destination unchanged, advertising a private socket that the destination probably cannot reach. As you can guess, much unhappiness ensues.
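A toy model makes the header-versus-payload split concrete. This Python sketch assumes a hypothetical `SimpleNat` class that, like a real NAT, rewrites only the source socket in the packet header. The internal and public sockets come from the example above; the SDP media port and the remote address 203.0.113.10 (a documentation range) are made up for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: tuple[str, int]   # (IP, port) from the IP and TCP/UDP headers
    dst: tuple[str, int]
    payload: str           # application data, e.g. a SIP message carrying SDP


class SimpleNat:
    """Toy outbound NAT (hypothetical): maps private source sockets to public ones."""

    def __init__(self, public_ip: str, first_port: int = 1234):
        self.public_ip = public_ip
        self.next_port = first_port
        self.table: dict[tuple[str, int], tuple[str, int]] = {}

    def outbound(self, pkt: Packet) -> Packet:
        # Allocate a public socket the first time this private socket is seen.
        if pkt.src not in self.table:
            self.table[pkt.src] = (self.public_ip, self.next_port)
            self.next_port += 1
        # Only the header field changes; the payload is copied untouched.
        return replace(pkt, src=self.table[pkt.src])


nat = SimpleNat("25.25.25.25")
invite = Packet(src=("10.16.12.125", 50019), dst=("203.0.113.10", 5061),
                payload="c=IN IP4 10.16.12.125\r\nm=audio 50016 RTP/AVP 0\r\n")
out = nat.outbound(invite)
print(out.src)                        # ('25.25.25.25', 1234)
print("10.16.12.125" in out.payload)  # True: the SDP still advertises the private socket
```

The NAT did its job on the header, yet the far end will still read 10.16.12.125 out of the payload and try to send media there.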
So you might be asking, 'Why not have the NAT do deeper inspection than just the packet headers?' In theory, that sounds like a painless fix. Nope. Not so much. If this were employed, the NAT would have to examine EVERY packet that passes through it, payload included. Instead of examining only the first few bytes of a packet, the NAT would need to examine the whole thing, parsing specifically for the socket that is being changed. If you did this, you would watch your NAT fall to its knees under the massive increase in work expected of it.
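To see why, here is a toy version of what such a NAT would have to do for every SIP message (devices that attempt this are usually called SIP ALGs, for application layer gateways). A minimal Python sketch, assuming unencrypted signaling with CRLF line endings; the function name `alg_rewrite` and the sample message are hypothetical.

```python
import re


def alg_rewrite(sip_message: str, private_ip: str, public_ip: str) -> str:
    """Toy 'deep inspection' NAT step (roughly what a SIP ALG attempts).

    Assumes an unencrypted SIP message with CRLF line endings and an ASCII
    body. A NAT cannot do this at all for TLS-protected signaling, and a real
    implementation would also have to rewrite ports and open matching mappings.
    """
    head, sep, body = sip_message.partition("\r\n\r\n")
    # The whole body must be scanned, not just a fixed-size header.
    new_body = body.replace(private_ip, public_ip)
    # Rewriting the body changes its length, so a header must be edited too.
    new_head = re.sub(r"(?im)^Content-Length:\s*\d+",
                      f"Content-Length: {len(new_body.encode())}", head)
    return new_head + sep + new_body


body = "c=IN IP4 10.16.12.125\r\nm=audio 50016 RTP/AVP 0\r\n"
msg = ("INVITE sip:bob@example.com SIP/2.0\r\n"
       "Content-Type: application/sdp\r\n"
       f"Content-Length: {len(body)}\r\n\r\n" + body)
print(alg_rewrite(msg, "10.16.12.125", "25.25.25.25"))
```

Even this toy has to touch every byte of the body and patch a header as a side effect, for every message, which is the cost the paragraph above is warning about.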
So, how does ICE work around this NAT badness?
ICE provides a mechanism to offload the deep inspection (described above for the NAT) to the hosts interested in sending and receiving media. Here is what happens: during the initial SIP dialog setup, each host sends the other a list of IP addresses and ports on which it might be reached. This is called the ICE candidates list (the list is built up through an additional process I’ll save for another article).
When the other host receives the list of candidates, it attempts to connect to all of them. Some might work, some will fail. Among those that work, there is an order of preference. The most preferred working candidate wins, and its protocol, IP address and port combination is used for subsequent media.
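Gathering the candidates is out of scope here, but ranking them is standardized. This Python sketch uses the candidate priority formula and recommended type preferences from RFC 5245; Lync implements an earlier ICE draft with Microsoft extensions, so its actual values may differ. The candidate addresses below are hypothetical (198.51.100.7 is a documentation address standing in for an Edge relay).

```python
# Recommended type preferences from RFC 5245, section 4.1.2.2
# (host, peer reflexive, server reflexive, relayed).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535,
                       component_id: int = 1) -> int:
    """Candidate priority per RFC 5245, section 4.1.2.1:
    2^24 * type preference + 2^8 * local preference + (256 - component ID).
    """
    return ((TYPE_PREFERENCE[cand_type] << 24)
            + (local_pref << 8)
            + (256 - component_id))


# Hypothetical candidates for an internal client: its private host address,
# the public mapping its NAT assigned, and a relay address on an edge server.
candidates = [
    ("host", "10.16.12.125", 50016),
    ("srflx", "25.25.25.25", 1234),
    ("relay", "198.51.100.7", 50000),
]
for cand_type, ip, port in candidates:
    print(f"{cand_type:5} {ip}:{port} priority={candidate_priority(cand_type)}")
```

With these defaults the host candidate scores 2130706431, the server reflexive one 1694498815 and the relay 16777215, so direct paths are preferred and relays are the last resort.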
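Here is a simplified sketch of how the winner can be picked, using the RFC 5245 pair-priority rule. It assumes hypothetical candidate names, priorities and connectivity-check results. Real ICE also restricts pairs to the same media component, paces its checks, and has the controlling agent nominate the final pair, and Lync's own draft-based implementation may behave differently.

```python
from itertools import product
from typing import Optional

# Hypothetical candidate priorities (RFC 5245 formula, local preference 65535,
# component 1): host 2130706431, server reflexive 1694498815, relay 16777215.
LOCAL = {"L-host": 2130706431, "L-relay": 16777215}  # controlling agent
REMOTE = {"R-host": 2130706431, "R-srflx": 1694498815, "R-relay": 16777215}

# Pretend connectivity-check results: only these pairs got a response.
SUCCEEDED = {("L-host", "R-srflx"), ("L-host", "R-relay"), ("L-relay", "R-relay")}


def pair_priority(g: int, d: int) -> int:
    """RFC 5245, section 5.7.2. g: controlling agent's candidate priority,
    d: controlled agent's candidate priority."""
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def select_pair(local: dict[str, int], remote: dict[str, int],
                succeeded: set[tuple[str, str]]) -> Optional[tuple[str, str]]:
    """Rank every local/remote pair, then keep the best one that worked."""
    ranked = sorted(product(local, remote),
                    key=lambda p: pair_priority(local[p[0]], remote[p[1]]),
                    reverse=True)
    for pair in ranked:
        if pair in succeeded:
            return pair
    return None


print(select_pair(LOCAL, REMOTE, SUCCEEDED))  # ('L-host', 'R-srflx')
```

Because the 2^32 * MIN term dominates, a pair is only as good as its weaker candidate, so the host-to-server-reflexive pair beats anything involving a relay.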
In summary, ICE allows media traffic to traverse NAT devices, which allows you to get that audio/video session with your (fill in the significant other blank here). Without ICE, this simply would not happen.
In other articles I may talk about the ICE mechanisms (STUN and TURN) for those who really want to go deep. Post a comment and let me know. Better yet, ask a question. If I get enough questions (or if I think it is a cool enough topic to kill an evening...) I might just answer your questions instead of my customers'.
Cheers!
Comments
Anonymous
December 04, 2013
Excellent
Anonymous
December 05, 2013
Great post! I'm on my way to read your next article :]
Anonymous
May 29, 2014
Joe, this is an amazing article.
What many people would want to know is how the candidate pairs get selected and exchanged.
Say there are 6 candidates in the caller's SDP, and likewise for the callee. How is this check done, and how is the winner selected?
Anonymous
January 11, 2015
Great post. A response to Barath's question:
As you know, at the first INVITE (the initial SDP exchange) each Lync ICE endpoint includes all the discovered IP addresses and ports on which it can potentially establish media.
What happens is that media traffic is then sent out on all candidate pairs, and whichever responds is used to establish media. This is called early media. Then, about 5 seconds into the call, a candidate promotion happens and the ICE endpoints pick the optimal media path that also responded on one of the pairs. You will see this in Snooper 5 seconds into the call, with only the one candidate.