How we got to enforce DMARC for sub-domains of Microsoft's largest consumer email brands

I couldn't believe it. I had been blind for ages. Why had I not seen it before?

The month was August 2017, and none of Microsoft's largest consumer email brands - msn.com, live.com, hotmail.com, and outlook.com - had DMARC reject records in place. Not one. As a result, we were still seeing lots and lots of spoofing.

I had a rough plan in my head to close these down, although it required us to roll out a fix that stopped us entirely from modifying messages once they passed through the Exchange pipeline, because those modifications break DKIM signatures. We got that partially fixed this past year, but not entirely. Getting it entirely fixed is something I just didn't have time to drive; I'm far too consumed with antiphishing work, random antispam escalations, and running the service.

It always irked me that even though we managed to lock down @microsoft.com, the biggest brands were still open.

And that's when I had a revelation. I felt ridiculous; why had I never used this technique before?

The big reason why I couldn't move any of our consumer domains to DMARC reject or quarantine was false positives. While Microsoft Corporate IT might be willing to take the hit when it comes to mailing lists (lots of our employees are on mailing lists, and going to p=reject resulted in major disruption for them unfortunately [although we do have some mailing lists on our "do not enforce DMARC" list to avoid false positives when sending to Office 365]), it's not something I felt I could expect consumer users to understand. They'd have their mailflow working one day, and the next day experience disruption. We work hard to ensure a good email experience even for our free users, and I was unwilling to push the envelope on that side. So we (I) had to live with the fact that those domains were prone to spoofing [1].

I consoled myself with the fact that even though @yahoo.com and @aol.com had gone to p=reject three years earlier, at least @gmail.com hadn't. So worst case, we were even with Gmail.

And then one day I said to myself "Self, why are you trying so hard to block spoofing of all of our consumer domains? Can you not clamp down on the subdomains?"

I then said "Self... that's a great idea!" [2]

I had gotten into the rut of thinking that @hotmail.com was like @microsoft.com. @microsoft.com has many dozens of subdomains from which it sends email, so my goal was to prevent spoofing of @microsoft.com and @*.microsoft.com. But for our Consumer domains, there isn't that much legitimate email on our subdomains. That would be a much easier piece of the elephant to consume.

I hadn't previously considered separating the DMARC enforcement policy for subdomains from the one for the organizational domain. I knew that DMARC lets you specify a subdomain policy that differs from the organizational domain policy, but I had never intentionally tried to decouple them. It turned out to be not that difficult.
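To make the decoupling concrete, here is a minimal Python sketch of how a receiver picks the policy, assuming the standard DMARC rule that the sp tag governs subdomains and falls back to p when it is absent. The record string and the function names are made up for illustration; this is not our actual published record or anyone's production code.

```python
def parse_dmarc(record: str) -> dict:
    """Split a DMARC TXT record into a tag -> value dict."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def effective_policy(record: str, is_subdomain: bool) -> str:
    """Return the requested policy for the org domain or for its subdomains.

    Assumes the subdomain publishes no DMARC record of its own, so the
    organizational domain's record is the one being consulted.
    """
    tags = parse_dmarc(record)
    if tags.get("v") != "DMARC1" or "p" not in tags:
        raise ValueError("not a usable DMARC policy record")
    org_policy = tags["p"].lower()
    if is_subdomain:
        # sp overrides p for subdomains; without sp, subdomains inherit p.
        return tags.get("sp", org_policy).lower()
    return org_policy


# Hypothetical record: monitor the org domain, quarantine its subdomains.
EXAMPLE = "v=DMARC1; p=none; sp=quarantine; rua=mailto:reports@example.com"
```

With a record like EXAMPLE, `effective_policy(EXAMPLE, is_subdomain=True)` yields the quarantine policy while the organizational domain itself stays at none, which is exactly the split described above.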

I decided to start with msn.com. I had set up DKIM-signing for it a year ago, along with outlook.com and hotmail.com. Out of all of our consumer domains, for some reason msn.com was the least-spoofed one. I headed over to the Agari portal, which is where we point our DMARC reports, and looked at the trend for the organizational domain:

[Disclaimer: The y-axis on the left hand side is not necessarily the actual values; I may have modified them to prevent information disclosure. Not sure that matters... but I did it anyway. The point is to show the trend]

You can see that even though msn.com has SPF and DKIM set up, the majority of email from @msn.com fails DMARC. That means that there's a lot of spoofing of it.

But the organizational domain is not what I was targeting. Instead, I focused on subdomains. I knew that msn.com definitely was used to send email, and I knew that we never opened up any subdomains to allow users to sign up to receive email. But was there any other team at Microsoft using *.msn.com to send email? I checked out the subdomains:

Those all look super spammy to me except maybe web.store.msn.com. Indeed, there were a handful of legitimate-sounding domains like spaces.msn.com or server4.msn.com. I wasn't sure that they were legitimate, but none of them were sending from IPs that looked legitimate.

The puzzle was not yet complete; I didn't have all the data I wanted. And I wanted to move fast.

The Agari portal was one thing, but I wasn't sure I was getting the full data set of what I needed. Not every email receiver sends DMARC reports reliably, and some don't send them at all. So I decided to go straight to the source I have the best access to - our own logs we store internally for email scanned by Office 365 and Outlook.com.
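For readers who have never looked inside one of those reports, the sketch below tallies DMARC pass and fail counts per From: domain from a single aggregate report. It assumes the common un-namespaced aggregate XML layout (record, row, policy_evaluated, identifiers/header_from); the function name is my own, and this is not how the Agari portal or our internal tooling works.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def dmarc_results_by_domain(report_xml: str) -> dict:
    """Tally DMARC pass/fail message counts per header_from domain."""
    totals = defaultdict(lambda: {"pass": 0, "fail": 0})
    root = ET.fromstring(report_xml)
    for record in root.iter("record"):
        count = int(record.findtext("row/count", default="0"))
        dkim = record.findtext("row/policy_evaluated/dkim", default="fail")
        spf = record.findtext("row/policy_evaluated/spf", default="fail")
        domain = record.findtext("identifiers/header_from", default="").lower()
        # DMARC passes when either the aligned DKIM or the aligned SPF check passes.
        outcome = "pass" if "pass" in (dkim, spf) else "fail"
        totals[domain][outcome] += count
    return {domain: dict(counts) for domain, counts in totals.items()}
```

Because each receiver only reports what it saw, summing these across reports still gives a partial picture, which is why I went to our own logs next.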

We store all of our email logs in a big database that can require several hours to parse and run, depending on what you want to do. Some of it is optimized, some of it... not so much.

I discovered that the logs optimized by From: address could be queried fairly quickly, but did not carry much metadata (such as sending IP or DMARC status). I decided to use those, as they could be good enough.

I did a query on all traffic over the past 30 days sending to Outlook.com (all of our consumer domains that I wanted to enforce) and Office 365, looking only for subdomains. I then added a column for whether or not it was sent to a consumer domain or enterprise customer, and I also added a column for whether or not it was sent by a bulk email provider (we wrap all these up into a BCL value in Enterprise, or NCL value in Consumer). The idea was that any subdomain with a BCL was probably legitimate.
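The shape of that query can be sketched in a few lines of Python. This is only an illustration: our real logs live in an internal database, and the row keys used here (from_domain, recipient_type, is_bulk) are hypothetical stand-ins for whatever the log schema actually calls them.

```python
CONSUMER_ORG_DOMAINS = ("msn.com", "hotmail.com", "live.com", "outlook.com")


def subdomain_of(from_domain: str):
    """Return the org domain if from_domain is a strict subdomain of one, else None."""
    domain = from_domain.lower().rstrip(".")
    for org in CONSUMER_ORG_DOMAINS:
        if domain.endswith("." + org):
            return org
    return None


def summarize(rows):
    """Count subdomain traffic per From: domain.

    rows: iterable of dicts with hypothetical keys
      from_domain    - domain of the From: address
      recipient_type - "consumer" or "enterprise"
      is_bulk        - True if the message carried a bulk (BCL/NCL) rating
    """
    summary = {}
    for row in rows:
        domain = row["from_domain"].lower().rstrip(".")
        if subdomain_of(domain) is None:
            continue  # skip the org domains themselves and unrelated domains
        entry = summary.setdefault(
            domain, {"total": 0, "consumer": 0, "enterprise": 0, "bulk": 0}
        )
        entry["total"] += 1
        entry[row["recipient_type"]] += 1
        if row["is_bulk"]:
            entry["bulk"] += 1
    return summary
```

A subdomain with a high bulk count relative to its total is the kind of entry I treated as probably legitimate; one with plenty of traffic and no bulk rating at all deserved a closer look.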

I ran the query and got pretty much the same data I saw in the Agari portal, except I saw a lot more of what were clearly phishing attempts - a bunch of phishers sending from variations of accountprotection.msn.com. For months we've been battling phishers impersonating our Microsoft password/notification alias, and they were using msn.com to do it. I was surprised by how much was in there.

But I also found a handful of domains that kind of looked legitimate like email.msn.com, mail.msn.com, and so forth. I went through the list of domains, eyeballing the ones that looked legitimate. I stopped going down the list when the volume became too small. Any legitimate subdomain of msn.com was going to have a lot of traffic. I got it down to a list of 6 subdomains, and I created SPF records for them. If they were legitimate, then they were not passing SPF before but would be now. If they weren't legitimate, then they were not passing before and now would be failing.
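Here is a rough Python sketch of that triage: keep only the candidates with enough volume, then render an SPF TXT record for each. The cutoff, the include host (spf.example.net), and the hard-fail "-all" ending are all assumptions for illustration; they are not the records I actually published.

```python
def shortlist(volumes: dict, min_volume: int) -> list:
    """Keep candidate subdomains whose message count meets the cutoff, busiest first."""
    kept = [(name, count) for name, count in volumes.items() if count >= min_volume]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [name for name, _ in kept]


def spf_txt_record(subdomain: str, include_host: str) -> str:
    """Render a zone-file style TXT line carrying an SPF policy.

    include_host is a hypothetical name standing in for wherever the
    legitimate sending IPs are listed; "-all" (hard fail) is an assumption.
    """
    return f'{subdomain}. IN TXT "v=spf1 include:{include_host} -all"'


def render_records(volumes: dict, min_volume: int, include_host: str) -> list:
    """Build one SPF TXT line per shortlisted subdomain."""
    return [spf_txt_record(name, include_host) for name in shortlist(volumes, min_volume)]
```

Once a record like this exists, mail from the listed infrastructure passes SPF and anything else sent as that subdomain fails, which is the either-way outcome described above.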

I could live with that. Furthermore, every single team that tries to send from a subdomain of *.microsoft.com can't deliver unless they set up proper authentication. The teams that are motivated to have actual delivery come and find me, so I wasn't too worried about breaking anyone's legitimate email flow on a long-term basis. I know the infrastructure (more or less) of how most of Microsoft sends email, and I was pretty sure the default SPF records I set up would work.

I took the plunge and on Aug 31, 2017, updated the subdomain policy of msn.com to quarantine (the sp= tag in the DMARC record). The subdomains were now protected at every receiver that enforces DMARC.

I repeated the process for hotmail.com on Sept 5, 2017 (hotmail.com, believe it or not, was the easiest and had the least amount of subdomain spoofing). Next came live.com (which had by far the most subdomain phishing on it... a fact I never would have guessed), where I enforced subdomain protection on Sept 7, 2017.

And did spammers give up?

[The following chart may or may not obfuscate the actual numbers on the y-axis]

The answer on msn.com (gray line) is yes, they did. Sort of. They seem to come and go in little waves.

For hotmail.com, it had no impact on their behavior.

For live.com, it went away almost overnight.

For outlook.com, which just went live (Oct 21, 2017), I will have to pull some updated data. You can see the overall trend was downward for months, so perhaps this just firms up the protection.

So at this point, all four large consumer brands of Microsoft's - Outlook, MSN, Hotmail, and Live - have their subdomains protected by DMARC quarantine. And it's also a process I recommend to others: instead of trying to protect your organization all at once, consider splitting up the org domain policy from its subdomain policy and tackling that.

The next stop, of course, is going to enforcement even for the organizational domains. But that's a story for another blog post.


[1] We have other techniques besides DMARC to detect spoofs of our consumer domains. So they're not actually that vulnerable to spoofing.

[2] In this blog post, I frequently refer to myself in the first person "I", as if I did all this work by myself. That's a major oversimplification. There are many people behind-the-scenes who contribute to DMARC enforcement of subdomains and by no means did I do it all myself. In this blog post, I refer to myself in the first person pronoun for storytelling purposes.

Comments

  • Anonymous
    October 24, 2017
    Any plans to protect *.onmicrosoft.com subdomains ?
  • Anonymous
    November 01, 2017
    The comment has been removed
    • Anonymous
      November 03, 2017
      I thought about it, but we have so many subdomains that it wasn't worth the time or effort.
  • Anonymous
    December 17, 2017
    A couple observations that helped our firm get to enterprise reject: You should require all vendors sending on your behalf to use a subdomain and then require SPF, DKIM, and DMARC. For large, complex portfolios of domains and subdomains, I suggest enforcing p=quarantine on the subdomains first, then p=quarantine at the TLD, then go for TLD p=reject and then go back and remove the subpolicies from the subdomains so that they will inherit reject from the TLDs.