User Privacy
Mary Kirtland
Microsoft Corporation
February 14, 2001
In my last column, I discussed defining the vision for the Web Services guidance team's first sample project, the Favorites Service. I apologize for the long delay between columns; I've been out for the better part of a month with a nasty cold. I hope things are back on track now for regular weekly columns for the next couple of months.
To recap, the goal of the Favorites Service is to provide a way for applications to store end users' favorite links to Web sites in a safe, secure, central location, so that a user can access his or her favorites through these applications, regardless of which machine the user happens to be using. From a technical perspective, this seems like a pretty straightforward service to implement. It's basically just a specialized data store.
About the same time we started looking at the Favorites Service, there was a flurry of news articles about user privacy—specifically about the information third parties could collect through advertisements on Web pages. This got us thinking: The whole Web Services model is based on a Web page that uses third-party services, most likely without the knowledge of the end user. Were there privacy issues to worry about?
Even without a good definition of user privacy, we were able to come up with a few possible scenarios our proposed Favorites Service would enable that seemed questionable. Based on our initial research into the issues, we decided to implement the Favorites Service in phases, deferring the questionable scenarios to later phases. In this column, I'll discuss what we discovered during our initial research, the hard problems we've deferred, the privacy issues that remain in phase one of the project, and the impact of these on our design and implementation.
Privacy Defined
Let's start by looking at what we mean by user privacy. We will focus our discussion on user privacy and the Web. Whenever you use the Web, there are three kinds of information that might be exchanged between the application you are using (such as a Web browser) and the Web sites the application is connected to (such as the pages displayed in the browser):
- Information you create using some combination of applications, such as the e-mail you write, your vacation photos, your financial records, and so on.
- Information about you, such as your name, address, personal interests, and so on, collected by an application in order to provide services to you.
- Information about the machine and/or network connection you are using, such as an IP address, collected by an application in order to provide services to you.
The issue with user privacy is how this information is collected, used, and distributed. If you buy a book from an online bookstore, of course you will need to provide your name, address, and credit card number in order for the bookstore to complete the order. But what if the bookstore dumps this information in a database, along with records of the specific books you've purchased? On one hand, it could use the information to provide useful services, such as notifying you when new books by your favorite authors are published. On the other hand, it could sell your personal information, resulting in a flood of unwanted junk mail. What constitutes fair use of this information by the companies that provide the applications you use?
Unfortunately, there's no one-size-fits-all answer to this question. The right thing to do is difficult to determine, especially because public perception and government regulations are in flux (and may vary from one legal jurisdiction to another). Standard practice for Web sites today is to post a privacy policy that informs users of what information is collected and how it may be used and distributed. However, there's no standard regarding whether the user must read the privacy policy before information is collected or before the user can access the Web site.
The situation becomes even more uncertain with Web Services. The end user probably doesn't even know that any Web Services are being used. If a Web Service collects information that can be tied to a specific end user (known as personally identifiable information), how does the service provider inform the user about what information is collected and how it may be used or distributed? Do applications that distribute the personal information to Web Services need to disclose this to end users? Traditionally, businesses have not disclosed that they outsource particular aspects of their business processes. For example, a company might not disclose that order fulfillment or customer support are outsourced, although both the order fulfillment company and customer support organization have access to personal information about customers. But the rules may be different online. Only time will tell…
Fair Information Practices
Fair information practices keep customers informed and in control of their personal information. Such information is protected from unwanted use, access, or distribution, so that customers are confident and satisfied when using a company's products. Our first step toward understanding what user privacy meant to the Favorites Service was to read up on Microsoft's fair information practices. Microsoft's Corporate Privacy Group defines five elements of fair information practices:
- Notice. Your company should define a clear policy regarding the collection, use, and distribution of personal information. This policy should include primary and secondary use of data, distribution of data across business divisions within the company, data sharing with affiliate and non-affiliate businesses, and contract obligations with vendors who support business transactions. The company should establish guidelines for policy changes and the impact of changes on data collected prior to the change. You'll want to work with your legal advisors to make sure the policy is something you can enforce in your Web sites and Web Services. Make the policy available to customers and users through multiple distribution channels, including online and offline.
- Consent. You should provide flexible and accessible mechanisms for users to manage their preferences for data collection, use, and distribution. You'll need to categorize information into reasonable and meaningful groupings so that users can figure out what they are consenting to, and so that it doesn't take too long for the user to set up preferences. It's important to think about the default values for user preferences. Does the user need to explicitly enable a particular use of personal information (known as opting in) or explicitly disable the use (known as opting out)?
- Access. The user should be able to view and/or edit any personal information you store, to ensure that it is kept up to date and to manage usage preferences. You'll need to figure out which information the user can edit and which information can only be viewed. For example, the user might not be allowed to edit a unique user identifier, but could be allowed to edit a password. Ideally the tools to manage personal information would be available to both online and offline users.
- Security. You should implement appropriate security measures to protect users' personal information. This includes authentication and authorization mechanisms to protect access to stored data. It also may include mechanisms to protect data during transmission between machines. Security measures should be proportional to the sensitivity of the information. For example, you'll be a lot more concerned about security if you're working with a user's bank account or medical records than if you're working with a list of his favorite authors.
- Enforcement. It doesn't do any good to have a privacy policy if you don't follow it. Your company should define (and follow) procedures for monitoring your information systems for compliance with your privacy policies. Define dispute-resolution processes for all customer information services, and maintain safe harbor relationships with third-party certification organizations. Although enforcement is largely external to the Web site or Web Service itself, you should consider what kinds of auditing information should be kept in order to support enforcement processes. For example, you might want to track whether and when users have read the privacy policy, when and how a user modified user preferences, and so on.
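To make the consent and enforcement elements a little more concrete, here is a minimal Python sketch of a per-user consent record with opt-in and opt-out defaults plus a simple audit trail. The category names, the `ConsentRecord` class, and its methods are all hypothetical; they are not part of the Favorites Service or of any Microsoft guidance, and a real policy would define its own groupings.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical usage categories, invented for this sketch.
OPT_IN_USES = {"share_with_affiliates", "pooled_analysis"}  # off until the user enables them
OPT_OUT_USES = {"service_notifications"}                    # on until the user disables them


@dataclass
class ConsentRecord:
    user_id: str
    # Only explicit user choices are stored; defaults come from the sets above.
    choices: dict = field(default_factory=dict)
    # Audit entries: (timestamp, event, detail, value).
    audit_log: list = field(default_factory=list)

    def allows(self, use: str) -> bool:
        """Return True if this use of the user's data is currently permitted."""
        if use in self.choices:
            return self.choices[use]
        # Opt-out uses default to allowed; opt-in and unknown uses default to denied.
        return use in OPT_OUT_USES

    def set_choice(self, use: str, allowed: bool) -> None:
        """Record an explicit user choice and log it for later enforcement review."""
        if use not in OPT_IN_USES | OPT_OUT_USES:
            raise ValueError(f"unknown use: {use}")
        self.choices[use] = allowed
        self.audit_log.append((datetime.now(timezone.utc), "set_choice", use, allowed))

    def record_policy_read(self, policy_version: str) -> None:
        """Log that the user viewed a given version of the privacy policy."""
        self.audit_log.append(
            (datetime.now(timezone.utc), "policy_read", policy_version, None)
        )
```

Storing only explicit choices keeps the default policy in one place, so changing a default does not silently overwrite what a user has already decided.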
Fair Information Practices and Favorites
This all sounded reasonable in theory, but it still wasn't completely clear to us how this applied to our Web Service, or how you would implement all these elements for Web Services in general. So I spent a few hours discussing the issues with a member of the Corporate Privacy Group. We started with a list of scenarios the Favorites Service could potentially enable (based on our initial vision statement):
1. A user installs some add-in to Internet Explorer that provides a set of menu options such as Internet Explorer favorites, except that the favorites are actually stored at coldrooster.com. (If you read the last column, you know that we defined a business scenario around a consulting firm. We can now reveal that the name of this fictitious consulting firm is Cold Rooster Consulting, in honor of the rooster that has been hanging around our building on the Microsoft campus. Hence coldrooster.com.)
2. Coldrooster.com provides a Web application that lets users manage their favorites.
3. A Web site, say msdn.microsoft.com, provides a button on each of its pages that a user can click to add that page to the user's favorites stored at coldrooster.com.
4. Msdn.microsoft.com provides a Web page that displays a user's favorites, which were originally stored by msdn.microsoft.com on the user's behalf.
5. Msdn.microsoft.com provides a Web application that lets a user manage the favorites that were originally stored by msdn.microsoft.com on the user's behalf.
6. Cold Rooster Consulting periodically takes all the stored favorites, stripped of any information that links them back to a particular user, and dumps them into a separate database for analysis.
7. Msdn.microsoft.com provides a Web page that displays all of the favorites stored by a user, regardless of the Web site that originally stored the favorite on the user's behalf.
8. Msdn.microsoft.com provides a Web application that lets users manage all of their favorites.
9. Cold Rooster Consulting provides a separate Web Service that msdn.microsoft.com can license. This service lets licensees retrieve information such as "favorite favorites" or "people who saved this page also saved these pages," but only for the msdn.microsoft.com domain.
10. Cold Rooster Consulting provides the Web Service described in scenario 9, except that the recommendations returned to msdn.microsoft.com can include favorites from other domains.
Since we would need to link a user's favorites to their personal identification, such as an e-mail address or Microsoft Passport identifier, in order to make all the user's favorites available through any application and any machine, user favorites data definitely fell within the category of personally identifiable information. If we stuck with this definition of the Favorites Service, we would need to implement fair information practices through a combination of policy, procedures, and code.
At the time of our discussion, there weren't any laws that would require notifying the user before storing information on their behalf. So we could implement the notice element by posting a privacy policy on coldrooster.com. How would users know that they needed to read the policy? We came up with two options: Either users would need to sign up with coldrooster.com before they could store favorites via our service, or client applications would need to notify their users that the Cold Rooster Consulting Favorites Service was being used, with a pointer to our privacy policy.
From a security standpoint, user favorites don't fall into the same category as medical records, but a user will still want to have some control over who can access them. For example, by looking at the favorites I have stored on my home machine, you could find out which sports teams I support, what kinds of books I like to read, what kinds of music I like to listen to, and where I have my bank accounts—not information I want everyone in the world to have access to. And if anyone could modify my favorites, they could replace the links I've selected with other sites (possibly for nefarious purposes, such as intercepting confidential information) or add new links to my favorites. So we would definitely want to secure access to user favorites. And we'd probably want to let users specify which applications could read or write which favorites. For example, I might let MSDN modify my favorites for the msdn.microsoft.com domain, but I wouldn't want MSDN to even see the links for my favorite sports teams. Why should MSDN care about those?
To let users control which applications could read or write which favorites, we would need to implement the consent and access elements of fair information practices. We'd also probably want to implement auditing code to support the enforcement element.
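As a rough illustration of what "which applications could read or write which favorites" might look like in code, here is a small Python sketch of a per-user grant table keyed by application and domain. The `FavoritesAcl` class, the application names, and the permission strings are assumptions made up for this example; we have not designed this part of the service.

```python
from urllib.parse import urlparse


class FavoritesAcl:
    """Hypothetical per-user grants: (application, domain) -> set of permissions."""

    def __init__(self):
        self._grants = {}

    def grant(self, app: str, domain: str, *permissions: str) -> None:
        """Let `app` perform the given permissions on favorites in `domain`."""
        self._grants.setdefault((app, domain), set()).update(permissions)

    def is_allowed(self, app: str, url: str, permission: str) -> bool:
        """Deny by default: only an explicit grant for the URL's host allows access."""
        domain = urlparse(url).hostname or ""
        return permission in self._grants.get((app, domain), set())


# Usage: MSDN may read and write favorites in its own domain, and nothing else.
acl = FavoritesAcl()
acl.grant("msdn", "msdn.microsoft.com", "read", "write")
print(acl.is_allowed("msdn", "http://msdn.microsoft.com/library/", "write"))
print(acl.is_allowed("msdn", "http://sports.example.com/team", "read"))
```

The sketch matches host names exactly. Grouping applications and domains into zones would replace the `(application, domain)` key with a coarser one.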
Suddenly our simple little Web Service doesn't sound so simple! What level of control should we give users? Should we let them specify exactly which applications can read or write favorites from each domain? Or should we group applications and domains into zones to simplify configuration? And which of the scenarios listed above should be enabled by default?
Our privacy expert didn't have any concerns about scenarios 1 through 5. The typical privacy policy would cover these scenarios. However, for scenario 2, we would need to consider whether coldrooster.com should be able to manage all of a user's favorites, regardless of which application stored the favorites for the user, or just the favorites that Cold Rooster Consulting's applications added. We'd probably err on the side of caution and say that Cold Rooster Consulting's applications could only manage user favorites added through those apps, unless the user explicitly specified that the apps could be used to view or edit all favorites stored on the user's behalf.
Even scenario 6 isn't too much of a problem, as long as the privacy policy indicates that we may use the stored user favorites for further analysis. Again, we need to consider whether the data needs to be partitioned—either by domain or by the application that originally provided the data—before it is analyzed. And since many people are wary of data profiling, we might want to give users the ability to opt out of having their favorites included in the pooled data used for analysis.
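Here is a minimal Python sketch of the pooling step in scenario 6, assuming favorites arrive as `(user_id, url)` pairs and that opted-out users are tracked in a set. The function name and data shapes are hypothetical. Note that dropping the user identifier does not by itself guarantee anonymity, since a URL can embed identifying details.

```python
from collections import Counter


def pool_for_analysis(favorites, opted_out_users):
    """Aggregate favorites into url -> count, with no user identifiers in the output.

    favorites: iterable of (user_id, url) pairs (assumed shape).
    opted_out_users: set of user ids who declined pooled analysis.
    """
    seen = set()
    counts = Counter()
    for user_id, url in favorites:
        if user_id in opted_out_users:
            continue  # honor the opt-out before anything is copied
        if (user_id, url) in seen:
            continue  # count each user at most once per URL
        seen.add((user_id, url))
        counts[url] += 1
    return dict(counts)
```

Partitioning by domain or by originating application, as discussed above, would amount to running this once per partition instead of over the whole store.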
The remaining scenarios become increasingly dicey from a privacy perspective. That's not to say that they shouldn't be implemented, just that it would be harder to write an accurate yet understandable policy statement, and users might not be comfortable with the scenarios, so they probably should be disabled by default (the user must opt in).
Scenario 7 initially sounds pretty innocuous, but what it really means from a Web Service perspective is that an application can get a copy of all of a user's favorites from the Favorites Service. Once the application has a copy of the data, it can do whatever it wants with it. If we supply a Web Service that supports this scenario, we'd probably want to restrict access to the Web Service to known clients with privacy policies that meet some minimal criteria.
Scenario 8 is even more problematic. Once an application has the ability to modify a user's favorites, what's to prevent the application from adding random pages to the user's list or deleting a favorite that points to a competitor's site? In other words, how can the Web Service distinguish valid service requests made by an application on behalf of an end user from service requests made by an application that the end user is unaware of? The available security mechanisms that work with HTTP and XML don't really support this kind of client/server/service scenario directly—we'd need to implement some custom security solution. Even with the custom security mechanism, there would probably be additional work required to provide a way for users to specify which applications could edit which favorites.
Finally, scenarios 9 and 10 go even further into the realm of online profiling than scenario 6 does. The technical issues really aren't any different from those already mentioned, but the user discomfort level would be even higher.
Based on this analysis of the scenarios, we decided to step back and rethink the vision for the initial delivery of the Favorites Service. The new vision for phase one focuses on scenarios 3 through 5 above. Essentially each application has its own private store for user favorites. If I go to msdn.microsoft.com and store a link to this column, I can only view or edit that link through the user interface msdn.microsoft.com provides.
This approach eliminates several hard problems. In fact, it eliminates the entire user privacy issue as it relates to user favorites! Since each application that uses the Favorites Service effectively has a separate store of user favorites, there's no need for a global user identification scheme that the Favorites Service understands. Each application can use whatever kind of identifier it wants. The Favorites Service has no way of interpreting these identifiers or correlating information stored by different applications. Because the data can only be accessed by a single application (or, more precisely, a single licensee of the Favorites Service), we don't need to worry about providing a way for users to opt in or out of various scenarios. We've effectively delegated the issue of user privacy back to the calling application.
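The phase-one partitioning can be pictured as a store keyed by licensee plus whatever opaque user key that licensee chooses. The Python sketch below is only an illustration under that assumption; the class and method names are hypothetical and say nothing about how the actual service stores data.

```python
class PartitionedFavoritesStore:
    """Hypothetical store in which each licensee sees only its own partition."""

    def __init__(self):
        # (licensee_id, opaque_user_key) -> list of URLs
        self._data = {}

    def add(self, licensee_id: str, user_key: str, url: str) -> None:
        """Store a favorite under the calling licensee's partition."""
        urls = self._data.setdefault((licensee_id, user_key), [])
        if url not in urls:
            urls.append(url)

    def get_favorites(self, licensee_id: str, user_key: str) -> list:
        """Return a copy of the favorites stored by this licensee for this user key."""
        return list(self._data.get((licensee_id, user_key), []))


# Usage: the same user key under two licensees refers to two unrelated stores.
store = PartitionedFavoritesStore()
store.add("msdn", "u42", "http://msdn.microsoft.com/column")
store.add("other-site", "u42", "http://example.com/")
print(store.get_favorites("msdn", "u42"))
```

Because the service treats the user key as an opaque string scoped to one licensee, it has nothing to correlate across partitions, even when two licensees happen to pick the same key.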
That's not to say we don't care about solving the technical challenges raised in our analysis of the scenarios above. We do want to address these in a future phase of the Favorites Service. We just want to take a bit more time to think things through and come up with a solution we feel comfortable recommending to the developer community.
So what if you need to solve the problem today? I can't see any way around implementing a licensing mechanism for both users and applications. Users would need to sign up for an account with your service. That means you have a Web site where they can read your privacy policy, sign up for the account, and manage their preferences. Companies developing applications would also need to sign up for a license to use your Web Service. Your license agreement should specify how licensees notify their users about the use of your Web Service. You'll have to figure out whether you can trust the licensees to only use the Web Service appropriately. If so, you can probably get away with letting the Web site collect user credentials and pass them along to your Web Service. If not, you'll need to provide some code that licensees can use to provide a secure mechanism to retrieve user credentials and pass them along to the Web Service. Either way, there'll be a considerable amount of work involved.
The Remaining Privacy Issues
Although we don't need to worry about user privacy with respect to user favorites in phase one, there are still some privacy issues to think about. We decided to license access to the Favorites Service. This means that we'll need to maintain some contact information about licensees. That information falls into the category of personally identifiable information. So we have the standard privacy issues that any application that maintains account information faces.
We've addressed these issues using a combination of policy and code. The following diagram provides a high-level view of our system architecture:
Figure 1. Favorites Service architecture in phase one
Our service is implemented with a layered architecture and is deployed on two physical tiers, the Web farm and the data cluster. Licensee account information is stored in a database on the data cluster. Our Web Service and the Web site through which licensees manage their account information are deployed on the Web farm. There are several layers of protection for licensee information:
- The data cluster is not accessible from machines outside Cold Rooster Consulting.
- The Favorites Service does not need to access licensee contact information, so it uses a Logon component to authenticate licensees. The Logon component only retrieves the information it needs.
- On the other hand, the license management Web site does need to access licensee contact information. How else can it let the licensee edit the data? The Web site performs all data access through the Licensing component. Access controls on the Licensing component prevent anything other than the license management Web site from calling the component.
- Access controls on the licensee database prevent anything other than the Logon component and Licensing component from accessing the database.
- Confirmation e-mails are sent to addresses specified in contact information whenever contact information is modified.
Net effect: It should be very difficult for unauthorized users to access or modify the licensee contact information, unless a licensee's identifier and password are compromised. Even in that situation, if someone attempted to change contact information, the current contact would be informed.
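The division of labor between the Logon and Licensing components might be sketched in Python as follows. Everything here is an assumption for illustration: the class names, the caller-name check standing in for real component access controls, the salted PBKDF2 password hash, and the `send_mail` callback are not taken from our implementation.

```python
import hashlib
import hmac
import os


def _hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


class LicenseeDatabase:
    """Stand-in for the licensee database on the data cluster."""

    def __init__(self):
        self.rows = {}


def add_licensee(db, licensee_id, password, contact_email):
    salt = os.urandom(16)
    db.rows[licensee_id] = {
        "salt": salt,
        "password_hash": _hash_password(password, salt),
        "contact_email": contact_email,
    }


class LogonComponent:
    """Used by the Favorites Service; reads credentials only, never contact data."""

    def __init__(self, db):
        self._db = db

    def authenticate(self, licensee_id: str, password: str) -> bool:
        row = self._db.rows.get(licensee_id)
        if row is None:
            return False
        candidate = _hash_password(password, row["salt"])
        return hmac.compare_digest(candidate, row["password_hash"])


class LicensingComponent:
    """Used only by the license management Web site."""

    ALLOWED_CALLERS = {"license-management-site"}  # hypothetical caller identity

    def __init__(self, db, send_mail):
        self._db = db
        self._send_mail = send_mail  # callback: (address, subject) -> None

    def update_contact_email(self, caller: str, licensee_id: str, new_email: str) -> None:
        if caller not in self.ALLOWED_CALLERS:
            raise PermissionError(f"{caller} may not call the Licensing component")
        row = self._db.rows[licensee_id]
        old_email = row["contact_email"]
        row["contact_email"] = new_email
        # Notify the previous contact as well, so a hijacked account is noticed.
        for address in (old_email, new_email):
            self._send_mail(address, "Your contact information was changed")
```

Sending the confirmation to the old address as well as the new one is what lets the current contact find out about a change they did not make.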
In addition, we'll post a privacy policy on our Web site. We could also provide the privacy policy along with other documentation we give to new licensees, such as documentation on how to write applications that use the Favorites Service.
Conclusion
User privacy is a thorny issue for developers of Web Services and the applications that use them. Our analysis of the problem for the Favorites Service caused us to rethink the entire objective for the service. Even with the reduced scope, a significant number of requirements were added in order to ensure that user information was protected from inappropriate use. The most important requirement was the need to restrict access to licensed applications. Next week, we'll look at licensing in more detail: the business models we considered, the model we selected, and the impact of the model on our design and implementation.
If your Web Service needs to maintain personally identifiable information, you have a lot of work to do beyond implementing the core functionality of your service. You need to address all five elements of fair information practices: notice, consent, access, security, and enforcement. You'll need to determine when you must address these directly with users, and when you can defer the privacy issues to the applications using your Web Service. I highly recommend involving your legal advisors in discussions regarding these issues to ensure that you are up to date regarding user privacy laws wherever your users are located. The following resources provide additional information about user privacy: