Hybrid Cloud Infrastructure Design Considerations
Published: June 26, 2013
Version: 1.1
Abstract: The Hybrid Cloud Infrastructure Design Considerations guide provides the enterprise architect and designer with a collection of critical design considerations that need to be addressed before beginning the design decision process that will drive a hybrid cloud computing infrastructure implementation. This article can be used together with the Hybrid Cloud Solution for Enterprise IT reference implementation guidance set to create a core hybrid cloud infrastructure.
To provide feedback on this article, leave a comment at the bottom of the article or send e-mail to SolutionsFeedback@Microsoft.com. To easily save, edit, or print your own copy of this article, please read How to Save, Edit, and Print TechNet Articles. When the contents of this article are updated, the version is incremented and changes are entered into the change log. The online version is the current version. See the bottom of this article for a list of technologies discussed in this article.
1.0 Introduction
Most enterprise information technology (IT) organizations operate data centers with limited IT staff, space, hardware, and budgets. To avoid adding more of these resources, or to use the resources they already have more effectively, many organizations now use external IT services to augment their internal capabilities and services. Examples of such services are Microsoft Office 365 and Microsoft Dynamics CRM Online. Services that are provided by external providers typically exhibit the five essential characteristics of cloud computing (on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service) that are defined in The NIST Definition of Cloud Computing.
In the remainder of this document, the term “cloud services” refers to services that exhibit the United States National Institute of Standards and Technology (NIST) essential characteristics of cloud computing. Services that do not exhibit these characteristics are referred to simply as “services.” Services often don’t exhibit many, if any, of the essential characteristics. Another term that is used throughout this document is “technical capabilities.” Technical capabilities are the functionality that is provided by hardware or software; when they are used together in specific configurations, they provide a service, or even a cloud service.
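To make the distinction concrete, the five-characteristics test can be sketched as a simple membership check. This is a hypothetical illustration, not part of any Microsoft tooling; the service examples are invented.

```python
# Illustrative sketch: a service counts as a "cloud service" only when it
# exhibits all five NIST essential characteristics.
NIST_ESSENTIAL_CHARACTERISTICS = {
    "on-demand self-service",
    "broad network access",
    "resource pooling",
    "rapid elasticity",
    "measured service",
}

def is_cloud_service(exhibited: set) -> bool:
    """Return True only if every essential characteristic is exhibited."""
    return NIST_ESSENTIAL_CHARACTERISTICS <= exhibited

# A hosted offering with self-service provisioning and metered billing:
hosted_offering = set(NIST_ESSENTIAL_CHARACTERISTICS)
# A traditionally provisioned internal service:
plain_service = {"broad network access", "resource pooling"}
```

Under this model, `hosted_offering` qualifies as a cloud service and `plain_service` does not, even though the latter exhibits some of the characteristics.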
For example, to provide a messaging service in your environment, you’d use email server application, network, server, name resolution, storage, authentication, authorization, and directory technical capabilities, at a minimum. If you wanted to provide that same messaging service as a cloud service in your environment, you’d add capabilities such as a self-service portal, and probably an orchestration capability (to execute the tasks that support the essential cloud characteristic of on-demand self-service).
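The composition just described can be pictured as data. The capability names follow the text; the function and structure are purely illustrative, not a real API.

```python
# Hypothetical model: a service is a composition of technical capabilities;
# promoting it to a cloud service layers on self-service and orchestration.
BASE_MESSAGING_CAPABILITIES = [
    "email server application", "network", "servers", "name resolution",
    "storage", "authentication", "authorization", "directory",
]

CLOUD_SERVICE_ADDITIONS = ["self-service portal", "orchestration"]

def as_cloud_service(capabilities):
    """Extend a service's capability list with the cloud-enabling extras."""
    return capabilities + [c for c in CLOUD_SERVICE_ADDITIONS
                           if c not in capabilities]

messaging_cloud_service = as_cloud_service(BASE_MESSAGING_CAPABILITIES)
```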
Rather than having individual consumers obtain cloud services from external providers independently, a department within the IT organization typically establishes a relationship with an external provider at the organizational level. This department consumes the service at the organizational level, integrates the service with some of the organization’s own internal technical capabilities and/or services, and then provides the integrated hybrid service to consumers within the organization. These consumers are often unaware of whether the service is owned and managed by their own IT organization or by an external provider. And they don’t care who owns it, as long as the service meets their requirements.
An important consideration when dealing with a hybrid cloud infrastructure is that while the in-house IT department will be seen as a provider of cloud services to the corporate consumer of the hybrid cloud solution, it is also true that the IT organization itself is a consumer of cloud services. That means that there are multiple levels of consumers. The corporate consumer might be considered a second-level consumer of the public cloud services, while the IT organization might be considered a first-level consumer of the service. This has important implications when thinking about the architecture of the solution. This issue will be covered later in this document.
This document details the design considerations and configuration options for integrating Windows Azure Infrastructure Services (virtual machines (or “compute”), network, and storage cloud services) with the infrastructure capabilities and/or services that currently exist within typical organizations. This discussion will be driven by requirements and capabilities. Microsoft technologies are mentioned within the context of the requirements and capabilities and not vice versa. It is our expectation that this approach will resonate better with architects and designers who are interested in what problems must be solved and what approaches are available for solving these problems. Only then is the technology discussion relevant.
1.1 Audience
The primary audience for this document is the enterprise architect or designer who is interested in understanding the issues that need to be considered before engaging in a hybrid cloud project, and the options available that enable them to meet the requirements based on the key infrastructure issues. Others that might be interested in this document include IT implementers who are interested in the design considerations that went into the hybrid cloud infrastructure they are tasked to build.
1.2 Document Purpose
The purpose of this document is two-fold. The first purpose is to provide the enterprise architect or designer a collection of issues and questions that need to be answered for each of the issues for building a hybrid cloud infrastructure. The second purpose is to provide the enterprise architect or designer a collection of options that can be evaluated and chosen based on the answers to the questions. While the questions and options can be used with any public cloud service provider's solution, examples of available options will focus on Windows Azure.
In addition, this document includes:
- The relevant design requirements and environmental constraints that must be gathered in an environment before integrating Windows Azure Infrastructure Services into an environment.
- Conceptual design considerations for integrating infrastructure cloud services into an existing environment, regardless of who is the external provider of the cloud services.
- Physical design considerations to evaluate when integrating Windows Azure Infrastructure Services into an existing environment.
This document was conceived and written with the premise that enterprise IT should not simply replicate its current datacenter in the cloud. Instead, it is assumed that enterprise IT would like to base a new solution on new architectural principles specific to a hybrid cloud environment. This document focuses on the hybrid cloud infrastructure because core infrastructure issues need to be addressed before even considering creating a single virtual machine for production. Issues revolving around security, availability, performance, and scalability need to be considered in the areas of networking, storage, compute, and identity before embarking on a production environment. We recognize that there is a tendency to want to stand up applications as soon as the public cloud infrastructure service account is created, but we encourage you to resist that urge and read this document so that you can avoid unexpected complications that could put your hybrid cloud project at risk.
Note that the existing environment can be a private cloud or a traditional data center. The goal is to enable you to integrate your current environment with a public cloud provider of infrastructure services (an Infrastructure as a Service [IaaS] provider).
While this document does explain design considerations and the relevant Microsoft technology and configuration options for integrating Windows Azure Infrastructure Services with the existing infrastructure of technical capabilities and/or services in an environment, it does not provide any example designs for doing so. A future document set will address a specific design example. You can find more information about this on the Cloud and Datacenter Solutions Hub at http://technet.microsoft.com/en-US/cloud/dn142895.
If you’re also interested in guidance that includes lab-tested designs that integrate infrastructure cloud services into existing environments, it is available separately. For more information, see http://technet.microsoft.com/en-US/cloud/dn142895.
2.0 Hybrid Cloud Problem Definition
The following problems or challenges typically drive the need to integrate infrastructure cloud services from external providers into existing environments:
- Existing hardware, software, or staff resources cannot meet the demand for new technical capabilities and/or services within the environment.
- Periodic demand “spikes” require acquisition of hardware and software resources that sit idle during normal, non-spike usage periods.
- On-premises cloud services are usually not as cost effective as consuming the services from an external provider. While private cloud solutions make sense for maximizing flexibility and efficiency on-premises, and provide a path to integrating with or migrating to public cloud services, we should not forget that extreme economies of scale are only going to be realized via public cloud service offerings. The Economics of the Cloud whitepaper from Microsoft estimated a 10-fold reduction in cost when fully utilizing public cloud.
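A hedged back-of-envelope sketch of why economies of scale matter. All input numbers below are invented for illustration; only the roughly 10-fold claim comes from the whitepaper, and real comparisons involve many more factors.

```python
# Invented numbers: compare the effective cost of one hour of useful work
# when on-premises servers sit mostly idle versus a pooled public cloud.
def cost_per_utilized_server_hour(hourly_cost: float, utilization: float) -> float:
    """Effective cost of an hour of useful work at a given utilization level."""
    return hourly_cost / utilization

on_premises = cost_per_utilized_server_hour(hourly_cost=1.00, utilization=0.10)
public_cloud = cost_per_utilized_server_hour(hourly_cost=0.50, utilization=0.60)
advantage = on_premises / public_cloud  # how many times cheaper per useful hour
```

With these made-up inputs, low on-premises utilization alone produces an order-of-magnitude difference in cost per useful hour.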
Organizations with a large application portfolio will need to be able to determine hybrid cloud infrastructure requirements before starting new applications, or moving existing applications into a cloud environment. Different applications will have different demands in the areas of networking, storage, compute, identity, security, availability, and performance. You will need to determine if the public cloud infrastructure service provider you choose is able to deliver on the requirements you define in each of these areas. In addition, you will need to consider regulatory issues specific to your organization's geo-political alignment.
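Screening a provider against per-application requirements can be pictured as follows. The requirement categories echo the text, but the applications, fields, and values here are entirely invented.

```python
# Illustrative sketch: check each application's requirements against what a
# candidate provider offers. All names and values are hypothetical.
APP_REQUIREMENTS = {
    "payroll":   {"storage_encryption": True,  "data_residency": "EU",  "sla": 99.9},
    "team-wiki": {"storage_encryption": False, "data_residency": "any", "sla": 99.0},
}

PROVIDER_OFFERING = {"storage_encryption": True, "data_residency": "US", "sla": 99.95}

def provider_meets(requirements: dict, offering: dict) -> bool:
    """True when the offering satisfies every stated requirement."""
    if requirements["storage_encryption"] and not offering["storage_encryption"]:
        return False
    if requirements["data_residency"] not in ("any", offering["data_residency"]):
        return False
    return offering["sla"] >= requirements["sla"]

shortlist = {app: provider_meets(req, PROVIDER_OFFERING)
             for app, req in APP_REQUIREMENTS.items()}
```

In this made-up case the provider fails the payroll application on data residency, illustrating how regulatory constraints can rule out an otherwise capable provider.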
3.0 Envisioning the Hybrid Cloud Solution
After clearly defining the problem you’re trying to solve, you can begin to define a solution to the problem that satisfies your consumer’s requirements and fits the constraints of the environment in which you’ll implement your solution.
3.1 Solution Definition
To solve the problems previously identified, many organizations are beginning to integrate infrastructure cloud services from external providers into their environments. In many organizations today, a department within the organization owns and manages network, compute (virtual machine), and storage technical capabilities. The people in this department may provide these technical capabilities for use by people in other departments within the organization, and/or, with additional technical capabilities, provide these capabilities as services, or even cloud services within their environment.
The design considerations in this document are for a solution that enables an organization to:
- Set up an organization-level account and billing with an external provider of cloud infrastructure services, so that its consumers don’t do so at an individual level.
- Allow its consumers to provision new virtual machines with the external provider that have capabilities similar to the capabilities of virtual machines that are provided on premises.
- Allow its consumers to move existing applications that run on the organization’s on-premises network into a public cloud infrastructure as a service offering.
- Allow consumers of applications in the organization to resolve names and authenticate to resources that are running on the external provider’s infrastructure cloud services, just as they do with resources that are running on premises.
- Address core security, data access control, business continuity, disaster recovery, availability, and scalability requirements.
3.2 Solution Requirements
Before integrating infrastructure cloud services from an external provider with existing infrastructure technical capabilities and/or services to solve the problems that were previously listed, you must first define a number of requirements for doing so, as well as the constraints for integrating the services. Some of the requirements and constraints are defined by the consumers of the capabilities, while others are defined by your existing environment, in terms of existing technical capabilities, services, policies, and processes.
Determining the requirements, constraints, and design for integrating the services is an iterative process. Initial requirements, coupled with the constraints of your environment may drive an initial design that can’t meet all of the initial requirements, necessitating changes to the initial requirements and subsequent design. Multiple iterations through the requirements definition and the solution design are necessary before finalizing the requirements and the design. Therefore, do not expect that your first run through this document will be the last one, as you’ll find that decisions you make earlier will exclude more preferred options that you might want to select later.
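The iterative narrowing described above can be sketched as a loop. This is a purely illustrative model; real design work is negotiation and engineering, not a set intersection, and the requirement names are invented.

```python
# Hypothetical model of the iteration: requirements that the environment's
# constraints cannot satisfy are renegotiated until a feasible design remains.
def design_satisfying(requirements: set, constraints: set):
    """Stand-in for real design work: a design exists only when every
    requirement also fits within the environment's constraints."""
    feasible = requirements & constraints
    return feasible if feasible == requirements else None

def iterate(requirements: set, constraints: set, max_rounds: int = 10):
    for round_number in range(1, max_rounds + 1):
        design = design_satisfying(requirements, constraints)
        if design is not None:
            return design, round_number
        # Work with consumers to drop requirements the environment can't meet.
        requirements = requirements & constraints
    return None, max_rounds

initial = {"site-to-site VPN", "sub-10ms latency", "single sign-on"}
environment = {"site-to-site VPN", "single sign-on", "federated identity"}
final_design, rounds = iterate(initial, environment)
```

The point of the sketch is only that the requirements you finish with are rarely the requirements you started with.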
The answers to the questions in this section provide a comprehensive list of requirements for integrating infrastructure cloud services from an external provider with the existing infrastructure technical capabilities and/or services in your environment.
3.2.1 Service Delivery Requirements
Before integrating cloud infrastructure services from an external provider with existing infrastructure technical capabilities and/or services in your environment, you’ll need to work with the consumer(s) of these cloud services in your environment to answer the questions in the sections that follow. The questions are aligned to the Service Delivery processes that are defined in the Cloud Services Foundation Reference Model (CSFRM). The initial answers to these questions that you get from your consumer(s) are the initial Service Delivery requirements for your initial design.
After further understanding the constraints of your environment and the products and technologies that you will ultimately use to extend your existing infrastructure technical capabilities to an external provider, however, you will likely find that not all of the initial requirements can be met. As a result, you’ll need to work with your consumer to adjust the initial requirements and continue iterating until you have a final design that satisfies the requirements and the constraints of your environment.
The outcome of this process is a clear definition of the functionality that will be provided, the service level metrics it will adhere to, and the cost at which the functionality will be provided. The service design applies the outcomes of the following questions.
The following table contains questions that you’ll need to address in these areas.
| Service delivery requirements | Questions to ask |
| --- | --- |
| Demand and capacity management | |
| Availability and continuity management | |
| Information security management | |
| Regulatory and compliance management | |
| Financial management | |
3.2.2 Service Operations Requirements
You have a variety of operational processes that are applied to the delivery of all services and technical capabilities in your environment. As a result, you need to answer the questions in the following sections to determine how the hybrid cloud infrastructure you’re designing will apply to and comply with your operational processes. The questions are aligned to the service operations processes defined in the CSFRM. The answers to these questions become the service operations requirements for the design of your hybrid cloud infrastructure. Questions you need to ask to address these areas are included in the following table.
| Service operations requirements | Questions to ask |
| --- | --- |
| Request fulfillment | |
| Service asset and configuration management | |
| Change management | |
| Release and deployment management | |
| Access management | |
| Systems administration | |
| Knowledge management | |
| Incident and problem management | |
3.2.3 Management and Support Technical Capability Requirements
Every organization uses a variety of technical capabilities to manage and support services in their environment. As the provider of this service, you need to work with the people in your organization who provide these technical capabilities to determine the answers to the questions in this section. The questions are aligned to the Management and Support Technical Capabilities that are defined in the CSFRM. The answers to these questions become the Management and Support Technical Capability Requirements and constraints for the design of your hybrid cloud infrastructure.
While the introduction of this service may require unique changes to the existing capabilities in your environment, such changes start to de-standardize those capabilities and should be avoided whenever possible.
When thinking about management support and technical capabilities, you should ask the questions in the following table.
| Management and support technical capability | Questions to ask |
| --- | --- |
| Service reporting | |
| Service management | |
| Service monitoring | |
| Configuration management | |
| Fabric management | |
| Deployment and provisioning | |
| Data protection | |
| Network support | |
| Billing | |
| Self-service | |
| Authentication | |
| Authorization | |
| Directory | |
| Orchestration | |
3.2.4 Infrastructure Services Capabilities Requirements
Every organization uses a variety of infrastructure technical capabilities, or infrastructure services, or some combination of the two to host IT services. As the provider of this service, you need to work with the people in your organization who provide these technical capabilities to determine the answers to the questions in this section. The questions are aligned to the Infrastructure Technical Capabilities that are defined in the CSFRM. The answers to these questions become the Infrastructure Technical Capability Requirements and constraints for the design of your hybrid cloud infrastructure.
While the introduction of this service may require unique changes to the existing capabilities in your environment, such changes start to de-standardize those capabilities and should be avoided whenever possible.
When considering infrastructure capability requirements, you should start by asking the questions in the following table.
| Infrastructure services requirements | Questions to ask |
| --- | --- |
| Network | |
| Virtual machine | |
| Storage | |
3.2.5 Infrastructure Technical Capability Requirements
Every organization uses a variety of infrastructure services, or infrastructure technical capabilities, or some combination of the two to host IT services. As the provider of this service, you need to determine how best to use the existing infrastructure services in your environment. The infrastructure services that your environment uses may be provided by your own organization, by external organizations, or some combination of the two. If your environment uses existing internal or external infrastructure (and in almost all cases it will), then your organization’s technical capabilities will be driven by that infrastructure, which supports the infrastructure capabilities that are mentioned in the infrastructure component section.
One of the core tenets of cloud computing is that the infrastructure should be completely transparent to the user. So the users of the cloud service should never know (nor should they care) what the infrastructure services are that support the cloud infrastructure.
The questions in this section are aligned to the Infrastructure component in the CSFRM. The answers to these questions become the Infrastructure Requirements and constraints for the design of your hybrid cloud infrastructure. Note that the CSFRM does not assume that the infrastructure for your environment is provided by your own organization; it might be provided by your organization or by an external organization. However, in a hybrid cloud infrastructure, infrastructure is provided by both the company’s IT organization and the public cloud infrastructure provider.
While the introduction of public cloud infrastructure may require unique changes to the existing services in your environment, such changes start to de-standardize those services and should be avoided whenever possible.
The following table includes questions you should ask about infrastructure requirements.
| Infrastructure requirements | Questions to ask |
| --- | --- |
| Network | |
| Compute | |
| Virtualization | |
| Storage | |
3.2.6 Platform Requirements
Some organizations provide platform services that are consumed by application developers and the software services that they develop for the organization. As the provider of this service, you need to determine whether your organization currently provides its own platform services or uses platform services from external providers, and if so, how the service you’re designing will use those platform services. If your environment has its own existing platform services or uses external services, then your organization’s self-service technical capability will include a service catalog, which lists the available services in the environment and the service-level metrics they adhere to.
The questions in this section are aligned to the Platform Services that are defined in the CSFRM. The answers to these questions become the Platform Services Requirements and constraints for the design of your hybrid cloud infrastructure. Note that the CSFRM does not assume that the platform services for your environment are provided by your own organization. The platform services might be provided by your organization, or they might be provided by an external organization.
While the introduction of this service may require unique changes to the existing services in your environment, such changes start to de-standardize those services and should be avoided whenever possible.
The following table includes questions you should ask about platform service requirements.
| Platform service requirements | Questions to ask |
| --- | --- |
| Structured data | |
| Unstructured data | |
| Application server | |
| Middleware server | |
| Service bus | |
4.0 Conceptual Design Considerations
After determining the requirements and constraints for integrating cloud infrastructure services from a public cloud infrastructure provider into your environment, you can begin to design your solution. Before creating a physical design, it’s helpful to first define a conceptual model (commonly referred to as a “reference model”), and some principles that will work together as a foundation for further design.
4.1 Reference Model
A reference model is a vendor-agnostic depiction of the high level components of a solution. A reference model can provide common terminology when evaluating different vendors’ product capabilities. A reference model also helps to illustrate the relationship of the problem domain it was created for to other problem domains within your environment. As a starting point, we can use the previously mentioned Cloud Services Foundation Reference Model (CSFRM).
We won’t include a detailed explanation of the CSFRM in this document, but if you’re interested in understanding it further, you’re encouraged to read the Microsoft Cloud Services Foundation Reference Model document. It will be available as part of the Microsoft Cloud Services Foundation Reference Architecture guidance set. To stay abreast of the work in this area, please see http://aka.ms/Q6voj9
Although it is from Microsoft, this reference model is vendor-agnostic. It can serve as a foundation for hosting cloud services and can be extended, as appropriate, by anyone. If you decide to use it in your environment, you’re encouraged to adjust it appropriately for your own use. Figure 1 illustrates the CSFRM.
Figure 1: Microsoft Cloud Services Foundation Reference Model
Recall from the Solution Definition section of this document that the solution to the problems defined in the Problem Definition section is to host virtual machines with an external provider so that the consumers within the organization can provision new virtual machines in a manner similar to how they provision virtual machines that are hosted on premises today.
The solution also requires that the virtual machines that are hosted by an external provider have capabilities that are similar to the capabilities of the on-premises virtual machines. As mentioned previously, the components, or boxes, in the reference model either change the way existing technical capabilities and/or services are provided in an environment, or introduce new services into an environment.
The Physical Design Considerations section of this document will discuss the design considerations for all of the black-bordered boxes in Figure 1.
4.2 Hybrid Cloud Architectural Principles
After you’ve defined a reference model, you can establish some principles for integrating infrastructure cloud services from an external provider. Principles serve as “guidelines” for physical designs to adhere to. You can use the principles that follow as a starting point for defining your own. They are a combination of principles from the Cloud Services Foundation Reference Architecture (CSFRA) and principles unique to integrating infrastructure cloud services from an external provider.
The Microsoft Private Cloud Reference Architecture (PCRA) provides a number of vendor-agnostic principles, patterns, and concepts to consider before designing a private cloud. Although they were defined with private clouds in mind, they are in fact applicable to any cloud based solution. You are encouraged to read through the document, Private Cloud Principles, Concepts and Patterns, in full, as the information in it contains valuable insight for almost any type of cloud infrastructure planning, including the hybrid cloud infrastructure that is discussed in this document.
As mentioned previously, designing a cloud infrastructure may be different from how you’ve historically designed infrastructure. In the past, you often purchased and managed individual servers with specific hardware specifications to meet the needs of specific workloads. Because these workloads and servers were unique, automation was often difficult, if for no other reason than the sheer volume of variables within the environment. You may have also had different service level requirements for the different workloads you were planning infrastructure for, often causing you to plan for redundancy in every hardware component.
When designing a cloud infrastructure for a mixture of workload types with standardized cost structures and service levels, you need to consider a different type of design process. Consider the following differences between how you planned for and designed unique, independent infrastructures for specific workloads in the past, and how you might plan for and design a highly standardized infrastructure that supports a mixture of workloads for the future.
A hybrid cloud infrastructure introduces new variables, because even if you currently host a private cloud infrastructure on premises, you are not responsible for enabling the essential cloud characteristics in the public cloud infrastructure service provider’s side of the solution. And if you don’t have a private cloud on premises, you can still have a hybrid cloud infrastructure. In that case, you’re not at all responsible for providing any of the essential characteristics of cloud computing, because the only cloud you’re working with is the one on the public cloud infrastructure side.
The following table provides some perspective on some specific design aspects of a cloud based solution versus how you have done things in a traditional data center environment.
Design aspect | Non-cloud infrastructure | Cloud infrastructure |
---|---|---|
Hardware acquisition | Purchase individual servers and storage with unique requirements to support unique workload requirements. | Private Cloud: Purchase a collection (in a blade chassis or a rack) of servers, storage, and network connectivity devices pre-configured to act as one large single unit with standardized hardware specifications for supporting multiple types of workloads. These are referred to as scale units. Adding capacity to the data center by purchasing scale units, rather than individual servers, lowers the setup and configuration time and costs when acquiring new hardware, although it needs to be balanced with capacity needs, acquisition lead time, and the cost of the hardware.
Public Cloud: No hardware acquisition costs other than possible gateway devices that are required to connect the corporate network to the cloud infrastructure service provider’s network. |
Hardware management | Manage individual servers and storage resources, or potentially aggregations of hardware that collectively support an IT service. | Private Cloud: Manage an infrastructure fabric. To illustrate this simplistically, think about taking all of the servers, storage, and networking that support your cloud infrastructure and managing them like one computer. While most planning considerations for fabric management are not addressed in this guide, it does include considerations for homogenization of fabric hardware as a key enabler for managing it like a fabric.
Public Cloud: No need to manage new in-house servers or storage devices—host servers and storage infrastructure are managed by the public cloud infrastructure service provider. |
Hardware utilization | Acquire and manage separate hardware for every application and/or business unit in the organization. | Private Cloud: Consolidate hardware resources into resource pools to support multiple applications and/or business units as part of a general-purpose cloud infrastructure.
Public Cloud: Set up virtual machines and virtual networks for specific applications and business units. No hardware acquisition required. |
Infrastructure availability and resiliency | Purchase infrastructure with redundant components at many or all layers. In a non-cloud infrastructure, this was typically the default approach, as workloads were usually tightly coupled with the hardware they ran on, and having redundant components at many layers was generally the only way to meet service level guarantees. | Private Cloud: With a fabric that is designed to run a mixture of workloads that can move dynamically from physical server to physical server, and a clear separation between consumer and provider responsibilities, the fabric can be designed to be resilient, and doesn’t require redundant components at as many layers, which can decrease the cost of your infrastructure. This is referred to as designing for resiliency over redundancy. To illustrate, if a workload running in a virtual machine can be migrated from one physical server to another with little or no downtime, how necessary is it to have redundant NICs and/or redundant storage adapters in every server, as well as redundant switch ports to support them? To design for resiliency, you’ll first need to determine what the upgrade domain (portion of the fabric that will be upgraded at the same time) and physical fault domain (portion of the fabric that is most likely to fail at the same time) are for your environment. This will help you determine the reserve capacity necessary for you to meet the service levels you define for your cloud infrastructure. Public Cloud: Infrastructure resiliency is built into the public cloud infrastructure service provider’s offering. You don’t need to purchase additional equipment or add redundancy. |
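The reserve-capacity reasoning above can be sketched numerically. The snippet below is a minimal illustration, assuming a simplified model in which the fabric must be able to absorb the loss of its largest fault domain or upgrade domain; real capacity plans model overlapping failures and service levels in more detail, and all names and numbers here are hypothetical.

```python
def reserve_capacity(total_hosts, fault_domain_hosts, upgrade_domain_hosts):
    """Return (reserve, usable) host counts, holding enough hosts in
    reserve to absorb the loss of one fault domain or one upgrade
    domain, whichever is larger (a common simplification)."""
    reserve = max(fault_domain_hosts, upgrade_domain_hosts)
    usable = total_hosts - reserve
    return reserve, usable

# Example: 64 hosts, fault domain = one rack of 16, upgrade domain = 8 hosts
reserve, usable = reserve_capacity(64, 16, 8)  # (16, 48)
```

In this sketch, 16 of 64 hosts are held back, so service levels are planned against 48 usable hosts rather than the raw host count.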
A hybrid cloud infrastructure shares all of the principles of a private cloud infrastructure. Principles provide general rules and guidelines to support the evolution of a cloud infrastructure. They are enduring, seldom amended, and inform and support the way a cloud fulfills its mission and goals. They should also be compelling and aspirational in some respects because there needs to be a connection with business drivers for change. These principles are often interdependent, and together they form the basis on which a cloud infrastructure is planned, designed, and created.
After you’ve defined a reference model, you can then define principles for integrating infrastructure cloud services from a public provider with your on-premises services and technical capabilities. Principles serve as guidelines for physical designs to adhere to, and are often aspirational, as fully achieving them takes time and effort. The Microsoft Cloud Services Foundation Reference Architecture - Principles, Concepts, and Patterns article lists several principles that can be used as a starting point when defining principles for both private and hybrid cloud services. While all of the Microsoft Cloud Services Foundation Reference Architecture principles are relevant to designing hybrid cloud services, the principles listed below are the most relevant, and are applied specifically to hybrid cloud services:
4.2.1 Perception of Infinite Capacity
Statement:
From the consumer’s perspective, a cloud service should provide capacity on demand, only limited by the amount of capacity the consumer is willing to pay for.
Rationale:
The rationale for each of the following principles is the same as the rationale given in the Cloud Services Foundation Reference Architecture - Principles, Concepts, and Patterns article, so the rationales are not restated in this article.
Implications:
Combining capacity from a public cloud with your own existing private cloud capacity can typically help you achieve this principle more quickly, easily, and cost-effectively than by adding more capacity to your private cloud alone. Among other reasons, this is because you don't need to manage the physical acquisition process and its associated delays; that process is now the public provider's responsibility.
4.2.2 Perception of Continuous Service Availability
Statement:
From the consumer’s perspective, a cloud service should be available on demand from anywhere, on any device, and at any time.
Implications:
Designing for availability and continuity often requires some amount of normally unused resources. These resources are utilized only in the event of failures. Utilizing on-demand resources from a public provider in service availability and continuity designs can typically help you achieve this principle more cost-effectively than with private cloud resources alone. To illustrate this point, if your organization doesn't currently have its own physical disaster recovery site, and is evaluating whether to build one, consider the costs in real estate, additional servers, and software that a disaster recovery site would require. Compare that cost against utilizing a public provider for disaster recovery. In most cases, the cost savings of using a public provider for disaster recovery can be significant.
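As a rough illustration of the cost comparison described above, the sketch below computes the break-even point between building a dedicated disaster recovery site and paying a public provider. All figures are hypothetical, and a real evaluation would include many more cost factors (staffing, licensing, egress charges, and so on).

```python
def dr_breakeven_months(build_cost, site_monthly, public_monthly):
    """Months after which a dedicated DR site becomes cheaper than
    paying a public provider; None if it never does (i.e. the public
    provider's monthly charge is already at or below the site's
    monthly running cost)."""
    if public_monthly <= site_monthly:
        return None
    return build_cost / (public_monthly - site_monthly)

# Hypothetical figures: $1.2M to build, $10K/month to run,
# vs. $60K/month from a public provider.
months = dr_breakeven_months(1_200_000, 10_000, 60_000)  # 24.0
```

In this sketch the dedicated site only pays for itself after two years; shorter horizons or lower public-provider charges would favor the public option.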
4.2.3 Optimization of Resource Usage
Statement:
The cloud should automatically make efficient and effective use of infrastructure resources.
Implications:
Some service components may have requirements, such as specific security or regulatory requirements, that allow them to be hosted only within a private cloud. Other service components may have requirements that allow them to be hosted on public clouds. Individual service components may support several different services within an organization, and each may be hosted on a private or public cloud. According to Microsoft's The Economics of the Cloud whitepaper, hosting service components on a private cloud can cost up to ten times more than hosting them with a public cloud provider. As a result, augmenting private cloud resources with public cloud resources can help organizations optimize the usage of both.
4.2.4 Incentivize Desired Behavior
Statement:
Enterprise IT service providers must ensure that their consumers understand the cost of the IT resources that they consume so that the organization can optimize its resources and minimize its costs.
Implications:
While this principle is important in private cloud scenarios, it's often a challenge to adhere to if the actual costs of providing services are not fully understood by the IT organization, or if consumers of private cloud services are not actually charged for their consumption, but only shown it. When utilizing public cloud resources, however, consumption costs are clear, consumption is measured by the public provider, and the consumer is billed on a regular basis. As a result, the actual cost to an organization of consuming public cloud services may be much more tangible and measurable than the cost of consuming private cloud services. These clear consumption costs may make it easier to incentivize desired behavior from internal consumers.
4.2.5 Create a Seamless User Experience
Statement:
Within an organization, consumers should not need to know who provides their cloud services, and should have a similar experience with all services provided to them.
Implications:
Many organizations have spent several years integrating and standardizing their systems to provide seamless user experiences for their users, and don't want to go back to multiple authentication mechanisms and inconsistent user interfaces when integrating public cloud resources with their private cloud resources. The myriad application user interfaces and authentication mechanisms used across various applications has made achieving this principle very difficult. The user interfaces and authentication mechanisms utilized across multiple public cloud service providers can make achieving this principle even more difficult. It's important to define clear requirements to evaluate public cloud providers against. These requirements may include specific authentication mechanisms, user interfaces, and other requirements that public providers must adhere to before you incorporate their services into your hybrid service designs.
4.3 Hybrid Cloud Architectural Patterns
Patterns are specific, reusable ideas that have been proven solutions to commonly occurring problems. The Microsoft Cloud Services Foundation Reference Architecture - Principles, Concepts, and Patterns article lists and defines the patterns below. In this article, the definitions are not repeated, but considerations for applying the patterns specifically to hybrid infrastructure and service design are discussed for each pattern.
4.3.1 Resource Pooling
Problem: When dedicated infrastructure resources are used to support each service independently, their capacity is typically underutilized. This leads to higher costs for both the provider and the consumer.
Solution: When designing hybrid cloud services, you may have pools of resources on premises and may treat the resources at a public provider as a separate pool of resources. Further, you may separate public provider resources into separate resource partition pools for reasons such as service class, systems management, or capacity management, just as you might for your on-premises resources. For example, an organization may define two separate service class partition resource pools, one within its private cloud, which might host medium and high business impact information, and one in its public cloud, which might host only low business impact information.
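The service-class partitioning described above can be sketched as a simple placement rule. The mapping below is illustrative only (it is not an Azure feature or API): high and medium business impact workloads stay in the private cloud pool, while low impact workloads go to the public pool.

```python
from collections import defaultdict

# Hypothetical placement policy: business-impact class -> resource pool.
PLACEMENT = {"high": "private", "medium": "private", "low": "public"}

def partition(workloads):
    """Split (name, impact) workloads into per-cloud resource pools."""
    pools = defaultdict(list)
    for name, impact in workloads:
        pools[PLACEMENT[impact]].append(name)
    return dict(pools)

pools = partition([("payroll", "high"), ("intranet", "medium"), ("blog", "low")])
# {'private': ['payroll', 'intranet'], 'public': ['blog']}
```

A real placement policy would also account for regulatory scope, data residency, and dependencies between components, but the pool-per-class idea is the same.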
4.3.2 Scale Unit
Problem: Purchasing individual servers, storage arrays, network switches, and other cloud infrastructure resources requires procurement, installation, and configuration overhead for each individual resource.
Solution: When designing physical infrastructure, application of this pattern usually encompasses purchasing pre-configured collections of several physical servers and storage. While a public provider's scale unit definition strategy is essentially irrelevant to its consumers, you may still choose to define units of scale for the resources you utilize with a public cloud provider. With a public provider, since you typically pay for every resource consumed, and you have no wait time for new capacity like you do when adding capacity to your private cloud, you may decide that your compute scale unit, for example, is an individual virtual machine. As you near capacity thresholds, you can simply add and remove individual virtual machines, as necessary.
4.3.3 Capacity Plan
Problem: Eventually every cloud infrastructure runs out of physical capacity. This can cause performance degradation of services, the inability to introduce new services, or both.
Solution: The capacity plan in a hybrid solution design incorporates all the same elements as a capacity plan for an on-premises-only solution design. Service designers, however, will likely find it much less effort to add or remove capacity on demand when utilizing resources from a public provider, if for no other reason than that doing so doesn't require them to order and wait for the arrival of new hardware. Meeting spikes in capacity needs will also often prove more cost-effective when using public provider resources than when using only dedicated on-premises resources, since when the spike is over, you no longer pay for the extra capacity required to meet it. Some public providers also offer auto-scaling capabilities, where their systems automatically scale service component tiers based on user-defined thresholds.
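The threshold-driven scaling just described can be sketched as follows, with the scale unit set to a single virtual machine as suggested in the Scale Unit pattern above. The thresholds and minimum count are hypothetical user-defined values, not defaults of any provider.

```python
def scale_decision(utilization, vm_count, high=0.80, low=0.30, min_vms=2):
    """Return the new VM count for a service tier: add one VM (the
    scale unit) above the high-water mark, remove one below the
    low-water mark, never dropping below a minimum."""
    if utilization > high:
        return vm_count + 1
    if utilization < low and vm_count > min_vms:
        return vm_count - 1
    return vm_count

# A tier at 90% utilization grows; one at 20% shrinks toward the floor.
scale_decision(0.90, 4)  # 5
scale_decision(0.20, 4)  # 3
```

Production auto-scalers typically add hysteresis and cool-down periods so that a brief spike doesn't cause the tier to oscillate, but the threshold logic is the core of the pattern.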
4.3.4 Health Model
Problem: If any component used to provide a service fails, it can cause performance degradation or unavailability of services.
Solution: Initially, you might think that defining health models for hybrid services will be more difficult than defining health models for services that only include components hosted on your private cloud. Part of the reason for this may be a fear of the unknown: you understand your private cloud systems and can do deep troubleshooting on them if necessary, whereas with a public provider you have little understanding of the underlying hardware configuration and no troubleshooting capability. While this might initially be concerning, after your confidence in a public provider grows, you'll likely find that defining health models for service components hosted on a public cloud is even easier than when the components are hosted on your private cloud, since all of the hardware configuration and troubleshooting responsibility is now the public provider's, not yours. As a result, your health models will have significantly fewer failure or degradation conditions, which also means fewer conditions that your systems must monitor for and remediate. Some public cloud providers offer service level agreements (SLAs) that include an availability level that they commit to meet each month. As long as your service provider meets this SLA, you no longer need to be concerned with how the provider meets it, only that it did. While this is true when consuming infrastructure as a service functionality from a public provider, it's even more true when consuming platform as a service (PaaS) capabilities from public providers.
4.3.5 Application
Problem: Not all applications are optimized for cloud infrastructures, and some may not be able to be hosted on them at all.
Solution: Not all public cloud service providers support the same application patterns. For example, if you have an application that relies upon Microsoft Windows Server Failover Clustering as its high-availability mechanism, this application can be thought of as using the stateful application pattern. This application could be deployed with some public service providers, but not with others. Among other reasons, Windows Server Failover Clustering requires some form of shared storage, a capability that few public service providers currently support. It's important to understand which application patterns are used within the organization. It's also important to identify which application patterns a public provider supports. It's only possible to migrate applications that were designed with patterns supported by the public service provider.
4.3.6 Cost Model
Problem: Consumers tend to use more resources than they really need if there's no cost to them for doing so.
Solution: While a public provider will charge your organization based on consumption, you must decide what costs you'll show or charge your internal consumers for the resources. You will likely show or charge a higher cost to your internal consumers than you were charged by the public provider. This is largely due to the fact that you will probably integrate the public cloud provider's functionality with your private cloud functionality, and that integration most likely has a cost. For example, you probably currently show or charge your internal consumers when they use a virtual machine on your private cloud. You may provide some type of single sign-on capability to your internal consumers, and offer that capability with the virtual machines that are hosted on your private cloud. As a result, some portion of the cost that you show or charge internal consumers for that virtual machine is the cost to provide the single sign-on capability. A similar cost should be added to the virtual machines that are hosted with a public provider, if you also offer the same single sign-on capability for them. You may add further costs to support additional capabilities such as monitoring, backup, or other capabilities for public cloud virtual machines too.
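A minimal sketch of the chargeback arithmetic described above: the internal showback or chargeback price is the public provider's charge plus the cost of each value-added capability you layer on top. The capability names and figures are hypothetical.

```python
def internal_charge(provider_cost, addon_costs):
    """Monthly price shown or charged to an internal consumer: the
    public provider's charge plus per-VM costs of value-added
    capabilities (single sign-on, monitoring, backup, ...)."""
    return provider_cost + sum(addon_costs.values())

# Hypothetical: provider bills $90/VM/month; the IT organization adds
# its own costs for SSO, monitoring, and backup integration.
monthly = internal_charge(90.0, {"sso": 5.0, "monitoring": 8.0, "backup": 12.0})
# 115.0
```

The point is simply that the internal price and the provider's price diverge by the cost of integration, so the two should be tracked separately in the cost model.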
5.0 Physical Design Considerations
With an understanding of the requirements detailed in the Envisioning the Hybrid Cloud Solution section of this document, and the reference model and principles, you can select appropriate products and technologies to implement the hybrid cloud infrastructure design. The following table lists the hardware vendor-agnostic and Microsoft products, technologies, and services that can be used to implement various entities from the reference model that is defined in this document.
Reference model entity | Product/technology/external service
---|---
Network (support and services) |
Authentication (support and services) |
Directory (support and services) |
Compute (support and services) |
Storage (support and services) |
Network infrastructure |
Compute infrastructure |
Storage infrastructure |
After selecting the products, technologies, and services to implement the hybrid cloud infrastructure, you can continue the design of the hybrid cloud infrastructure solution. The sections that follow outline a logical design process for the service, but, as mentioned in the Envisioning the Hybrid Cloud Solution section of this document, the design and requirements definition process is iterative until it’s complete. As a result, after you make some design decisions in earlier sections of this document you may find that decisions you make in later sections require you to re-evaluate decisions you made in earlier sections.
The primary sub-sections of this section form the “functional” design for the service, and align to entities in the reference model. Lower-level sub-sections then address specific design considerations, which range from functional to service-level concerns.
The remainder of the document addresses design considerations and the products, technologies, and services listed in the preceding table. In cases where multiple Microsoft products, technologies, and services can be used to address different design considerations, the trade-offs between them are discussed. In addition to Microsoft products, technologies, and services, relevant vendor-agnostic hardware technologies are also discussed.
5.1 Overview
The physical design of the hybrid cloud infrastructure brings together the answers to the questions that were presented earlier in the document and the technology capabilities and options that are made available to you. The physical design that is discussed in this document uses a Microsoft–based, on-premises infrastructure and a Windows Azure Infrastructure Services–based public cloud infrastructure component. With that said, the design options and considerations can be applied to any on-premises and public cloud infrastructure provider combination.
When considering the hybrid cloud infrastructure from the physical perspective, the primary issues that you need to address include:
- Public cloud infrastructure service account acquisition and billing considerations
- Public cloud infrastructure service provider authentication and authorization considerations
- Network design considerations
- Storage design considerations
- Compute design considerations
- Application authentication and authorization considerations
- Management and support design considerations
We will cover each of these topics in detail, along with the advantages and disadvantages of each option. In many cases, you will find that there is only a single option. When this is true, we will discuss its capabilities and possible limitations, and how you can work with or around those limitations.
5.2 Service Account Acquisition and Billing Considerations for Public Cloud Infrastructure
When designing a hybrid cloud infrastructure, the first issue you need to address is how to obtain and provision accounts with the public cloud infrastructure service provider. In addition, if the public cloud infrastructure service provider supports multiple payment options, you will need to determine which payment option best fits your needs now, and whether, in the future, you might want to reconsider the payment options that you’ve selected.
For example, Windows Azure offers several payment plans:
- Pay as you go—no up-front time commitment and you can cancel at any time
- 6-months—pay monthly for six months
- 6-months—pay for six months up front
- 12-months—pay monthly for twelve months
- 12-months—pay for twelve months up front
Pay as you go is the most expensive. Discounts are offered for each of the other four plans. You also have the choice to have the service billed to your credit card or your organization can be invoiced.
For more information on Windows Azure pricing plans, see Windows Azure Purchase Options.
You also need to consider whether you want to have the same person who owns the account (and therefore is responsible for paying for the service) to also have administrative control over the services that are running the public side of your hybrid cloud infrastructure. In most cases, the payment duties and the administrative duties will be separate. Determine whether your cloud service provider enables this type of role-based access control.
For example, Windows Azure has the notions of accounts and subscriptions. The Windows Azure subscription has two aspects:
- The Windows Azure account, through which resource usage is reported and services are billed.
- The subscription itself, which governs access to and use of the Windows Azure services that are subscribed to. The subscription holder manages services (for example, Windows Azure, SQL Azure, Storage) through the Windows Azure Platform Management Portal.
A single Windows Azure account can host multiple subscriptions, which can be used by multiple teams responsible for the hybrid cloud infrastructure if you need additional partitioning of your services.
It’s important to be aware that using a single subscription for multiple projects can be challenging from an organizational and billing perspective. The Windows Azure management portal provides no method of viewing only the resources used by a single project, and there is no way to automatically break out billing on a per-project basis. While you can somewhat alleviate organizational issues by giving similar names to all services and resources that are associated with a project (for example, HRHostedSvc, HRDatabase, HRStorage), this does not help with billing.
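The naming-convention workaround described above can at least help you organize resources programmatically. The sketch below groups resource names by a project prefix; it operates on plain strings, calls no Windows Azure API, and reuses the illustrative names from the text.

```python
from collections import defaultdict

def group_by_project(resources, projects):
    """Group resource names by project prefix, per the naming
    convention in the text (e.g. all 'HR*' resources belong to the
    HR project). Names with no known prefix are simply skipped."""
    groups = defaultdict(list)
    for name in resources:
        for project in projects:
            if name.startswith(project):
                groups[project].append(name)
                break
    return dict(groups)

groups = group_by_project(
    ["HRHostedSvc", "HRDatabase", "HRStorage", "FinStorage"], ["HR", "Fin"]
)
# {'HR': ['HRHostedSvc', 'HRDatabase', 'HRStorage'], 'Fin': ['FinStorage']}
```

As the text notes, this helps with organization only; it does not split the bill, which is why separate subscriptions per project remain the cleaner option for billing.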
Due to the challenges with granularity of access, organization of resources, and project billing, you may want to create multiple subscriptions and associate each subscription with a different project. Another reason to create multiple subscriptions is to separate the development and production environments. A development subscription can allow administrative access by developers while the production subscription allows administrative access only to operations personnel.
Separate subscriptions provide greater clarity in billing, greater organizational clarity when managing resources, and greater control over who has administrative access to a project. However this approach can be more costly than using a single subscription for all of your projects. You should carefully consider your requirements against the cost of multiple subscriptions.
For more information on Windows Azure accounts and subscriptions, see What is an Azure Subscription.
For more information on account acquisition and subscriptions, see Provisioning Windows Azure for Web Applications
You need to determine how the public cloud service provider partitions the services for which you will be billed. For example:
- Does the public cloud infrastructure service provider surface all cloud infrastructure services that are part of a single offering?
- Does the public cloud infrastructure service provider require that you purchase each infrastructure service separately?
- Does the public cloud infrastructure provider provide their entire range of cloud infrastructure services as a single entity, but also make available some value-added services that you can purchase separately?
Note that in all three of these cases, the public cloud service provider would bill based on usage, because measured service is an essential characteristic of cloud computing.
For example, Windows Azure Infrastructure Services is a collection of unique service offerings within the entire portfolio of Azure service offerings. Specifically, Azure Infrastructure Services includes Azure Virtual Machines and Azure Virtual Networks. In addition, Azure Virtual Networks takes advantage of some of the PaaS components of the system to enable the site-to-site and point-to-site VPN gateway. However, when you obtain an Azure account and set up a subscription, all of the Windows Azure services are available to you with the exception of some additional value-added services that you can purchase separately.
5.3 Network Design Considerations
In most cases, a hybrid cloud infrastructure requires you to extend your corporate network to the cloud infrastructure service provider’s network so that communications are possible between the on-premises and off-premises components. There are several primary issues that you need to consider when designing the networking component to support the hybrid cloud infrastructure. These include:
- On-premises physical network design
- Inbound connectivity to the public infrastructure service network
- Load balancing inbound connections to public infrastructure service virtual machines
- Name resolution for the public infrastructure service network
This section expands each of these issues.
5.3.1 On-Premises Physical Network Design
You need to consider the following issues when deciding what changes you might need to make to the current physical network:
- How will you connect the on-premises network to the public infrastructure services network?
- What path should the on-premises users take to access resources in the public cloud infrastructure provider’s network?
- What network access controls will you use to control access between on-premises and off-premises resources?
5.3.1.1 Network Connection Between On-Premises and Off-Premises Resources
There are typically three options available to you to connect on-premises and off-premises resources:
- Site-to-site VPN connection
- Dedicated WAN link
- Point-to-site connection
Site-to-Site VPN
A site-to-site VPN connection enables you to connect entire networks together. Each side of the connection hosts at least one VPN gateway, which essentially acts as a router between the on-premises and off-premises networks. The routing infrastructure on the corporate network is configured to use the IP address of the local VPN gateway to access the network ID(s) that are located on the public cloud provider’s network that hosts the virtual machines that are part of the hybrid cloud solution.
For more information about site-to-site VPNs, see What is VPN?
Windows Azure Virtual Networks
Windows Azure enables you to create a virtual network that is contained within the Windows Azure infrastructure and to place virtual machines into it. When virtual machines are placed into an Azure Virtual Network, they are automatically assigned IP addresses by Windows Azure, so all virtual machines must be configured as DHCP clients. However, even though the virtual machines are configured as DHCP clients, they keep their IP addressing information for the lifetime of the virtual machine.
Note:
The only time when a virtual machine will not keep an IP address for the life of the virtual machine on an Azure Virtual Network is when a virtual machine might need to be moved as a consequence of “service healing.” If a virtual machine is created in the Windows Azure portal, and it then experiences service healing, that virtual machine is assigned a new IP address. You can avoid this by creating the virtual machine by using PowerShell instead of creating it in the Windows Azure portal. For more information on service healing, please see Troubleshooting Deployment Problems Using the Deployment Properties.
Virtual machines on the same Azure Virtual Network will be able to communicate with one another only if those virtual machines are part of the same cloud service. If the virtual machines are on the same virtual network and are not part of the same cloud service, those virtual machines will not be able to communicate with one another directly over the Azure Virtual Network connection.
You can use an IPsec site-to-site VPN connection to connect your corporate network to one or more Azure Virtual Networks. Windows Azure supports several VPN gateway devices that you can put on your corporate network to connect your corporate network to an Azure Virtual Network. The on-premises gateway device must have a public address and must not be placed behind a NAT device.
For more information on which VPN gateway devices are supported, see About VPN Devices for Virtual Network.
Note:
While you can connect your on-premises network to multiple Azure Virtual Networks, you cannot connect a single Azure Virtual Network to multiple on-premises points of presence.
A single Azure Virtual Network can be assigned IP addresses in multiple network IDs. You can obtain a summarized block of addresses that represents the number of addresses you anticipate you will need and then you can subnet that block. However, connections between the IP subnets are not routed, and therefore there are no router ACLs that you can apply between the IP subnets.
However, you should still consider whether you will want multiple subnets. One reason for multiple subnets is for accounting purposes, where virtual machines that match certain roles within your hybrid cloud infrastructure are placed on specific subnets that are assigned to those roles. You can also use Network ACLs to control traffic between virtual machines in an Azure Virtual Network. For more information on Network ACLs in Azure Virtual Networks, please see Setting an Endpoint ACL on a Windows Azure VM.
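To illustrate the subnetting approach described above, Python's standard ipaddress module can carve a summarized address block into role-based subnets. The block size, prefix lengths, and role names here are illustrative, not prescribed by Windows Azure.

```python
import ipaddress

# A summarized block sized for anticipated growth, carved into /24
# subnets that are then assigned to infrastructure roles.
block = ipaddress.ip_network("10.4.0.0/16")
subnets = list(block.subnets(new_prefix=24))  # 256 /24 subnets

web, app, db = subnets[0], subnets[1], subnets[2]
# web: 10.4.0.0/24, app: 10.4.1.0/24, db: 10.4.2.0/24
```

Planning the block and its subnets up front matters because, as noted below, the addressing scheme should be settled before any virtual machines are created on the virtual network.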
You should also consider the option of using multiple Azure Virtual Networks to support your hybrid cloud infrastructure. While different Azure Virtual Networks can’t directly communicate with each other over the Azure network fabric, they can communicate with each other by looping back through the on-premises VPN gateway. Keep in mind that there are egress traffic costs involved with this option, so you need to assess cost issues when considering it. This is also the case when you host some virtual machines in the Windows Azure PaaS services (which are part of a different cloud service than the virtual machines); the virtual machines in the PaaS services need to loop back through the on-premises VPN gateway to communicate with the virtual machines in the Azure Infrastructure Services Virtual Networks.
You should decide on the IP addressing scheme, whether to use subnets, and the number of Azure Virtual Networks you will need before creating any virtual machines. After these decisions are made, you should create or move virtual machines onto those virtual networks.
Another important consideration is that the Azure site-to-site VPN uses pre-shared keys for the IPsec connection. Some enterprises may not consider pre-shared keys an enterprise-ready approach for securing IPsec site-to-site VPN connections, so you will want to confer with your security team to determine whether this approach is consistent with corporate security policy. For more information on this issue, please see Preshared Key Authentication. Note that your IT organization may consider the security and management issues for pre-shared keys to be a remote access VPN client problem only.
For more information on Azure Virtual Networks and how to configure and manage them, see Windows Azure Virtual Network Overview.
Dedicated WAN Link
A dedicated WAN link is a permanent telco connection that is established directly between the on-premises network and the cloud infrastructure service provider’s network. Unlike the site-to-site VPN, which represents a virtual link layer connection over the Internet, the dedicated WAN link enables you to create a true link layer connection between your corporate network and the service provider’s network.
For more information on dedicated WAN links, see Wide Area Network.
At the time this document was written, Windows Azure did not support dedicated WAN link connections between the on-premises network and Azure Virtual Networks.
Point-to-Site Connections
A point-to-site connection (typically referred to as a remote access VPN client connection) enables you to connect individual devices to the public cloud service provider’s network. For example, suppose a hybrid cloud infrastructure administrator works from home from time to time. The administrator could establish a point-to-site connection from a home computer to the public cloud service provider’s network that hosts the organization’s virtual machines.
For more information on remote access VPN connections, see Remote Access VPN Connections.
Windows Azure supports point-to-site connectivity that uses a Secure Socket Tunneling Protocol (SSTP)–based remote access VPN client connection. This VPN client connection is done using the native Windows VPN client. When the connection is established, the VPN client can access any of the virtual machines over the network connection. This enables administrators to connect to the virtual machines using any administrative web interfaces that are hosted on the virtual machines, or by establishing a Remote Desktop Protocol (RDP) connection to the virtual machines. This enables hybrid cloud infrastructure administrators to manage the virtual machines at the machine level without requiring them to open publicly accessible RDP ports to the virtual machines.
In order to authenticate VPN clients, certificates must be created and exported. If you have a PKI, you can use an X.509 certificate issued by your CA. If you don’t have a PKI, you must generate a self-signed root certificate and client certificates chained to the self-signed root certificate. You can then install the client certificates with private key on every client computer that requires connectivity.
For more information on point-to-site connections to Windows Azure Virtual Networks, see About Secure Cross-Premises Connectivity.
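The chained-certificate trust model described above can be sketched conceptually. This is not the Azure implementation, only an illustration of the rule that a gateway accepts just those client certificates chained to an uploaded trusted root; the thumbprint values are hypothetical.

```python
# Conceptual sketch of point-to-site client authentication: the gateway
# trusts one or more root certificates and accepts a client only if its
# certificate chains to a trusted root. All thumbprints are hypothetical.

TRUSTED_ROOT_THUMBPRINTS = {"AB12CD34EF56"}  # roots uploaded to the gateway

def accept_vpn_client(client_cert: dict) -> bool:
    """Accept the client only if its issuing root is trusted."""
    return client_cert.get("root_thumbprint") in TRUSTED_ROOT_THUMBPRINTS
```

A client certificate chained to the uploaded root is accepted; one chained to any other root is rejected.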
The following table lists the advantages and disadvantages of each of the approaches that are discussed in this section.
Connectivity options | Advantages | Disadvantages |
---|---|---|
Site-to-site VPN | | |
Dedicated WAN Link | | |
Point-to-site connection (remote access VPN client connection) | | |
5.3.2 Inbound Connectivity to the Public Cloud Infrastructure Service Network
Inbound connectivity to the public cloud infrastructure provider’s network is about how users will connect to the services that are hosted by the virtual machines within the provider’s network. Important options to consider include:
- All access to services that are hosted on the public cloud infrastructure provider’s network will be done over the Internet.
- All access to services that are hosted in the public cloud infrastructure provider’s network will be done over the corporate network and any site-to-site VPN or dedicated WAN link connection that connects the corporate network to the public cloud infrastructure service provider’s network.
- Some access to services that are hosted on the public cloud infrastructure provider’s network will be done over the Internet and some will be done from within the corporate network.
All Access to Cloud Hosted Services is Through the Internet
With the first option, all connections to services in the public cloud infrastructure service provider’s network are made over the Internet. It doesn’t matter whether the client system is inside or outside the corporate network. With this configuration you need to maintain only a single DNS entry for inbound access to the service, because all client machines will be accessing the same IP address. In Windows Azure, this is the address of the VIP that is assigned to the front ends of the service that is hosted in the Azure Infrastructure Services Virtual Network.
All Access to Cloud Hosted Services is Through Site-to-Site VPN or WAN Link
The second option represents the opposite of the first, in that all clients that need to connect to parts of the service that are hosted in the public cloud infrastructure provider’s network will need to do it from within the confines of the corporate network. The service will not be available to users on the Internet “at large” and client systems will have to take a path through the corporate network to reach the services.
That doesn’t mean that the client systems must be physically attached to the corporate network (or attached through the corporate wireless). A client system could be off-site, but connected to the corporate network over a remote-access VPN client connection or similar technology, such as Windows DirectAccess. The DNS configuration in this case would require just a single entry, because all access to the resources in the public cloud infrastructure service provider’s network will be to the IP address that is assigned to the virtual machine in the public cloud infrastructure service provider’s network. In an Azure Virtual Network, this would be the DIP that is assigned to the front-end virtual machines of the service.
For more information on DirectAccess in Windows Server 2012, see Remote Access (DirectAccess, Routing and Remote Access) Overview.
Access to Cloud Hosted Services Varies with Client Location
The third option allows for hosts that are not connected to the corporate network to connect through the Internet to the service that is hosted in the public cloud infrastructure service provider’s network. Clients that are connected to the corporate network can access the service by going through a site-to-site VPN or dedicated WAN link that connects the corporate network to the public cloud infrastructure service provider’s network.
This option requires that you maintain a DNS record that client systems can use when they are not on the corporate network, which in Azure represents the VIP that is used to access the virtual machine. It also requires a DNS record that clients will use when they are connected to the corporate network, which in Azure represents the DIP that is assigned to the virtual machine. This design requires that you create a split DNS infrastructure.
For more information on a split DNS infrastructure, see You Need A Split DNS!
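The split DNS arrangement described above can be illustrated with a short sketch: the same name returns the public VIP to Internet clients and the internal DIP to corporate clients. The names and addresses below are hypothetical.

```python
# Split-DNS sketch: one name, two answers, chosen by client location.
# "internet" clients get the public VIP; "corpnet" clients get the DIP.
# All names and addresses are hypothetical.

RECORDS = {
    "app.contoso.com": {"internet": "137.117.0.10",  # VIP (public)
                        "corpnet":  "10.1.2.4"},     # DIP (internal)
}

def resolve(name: str, client_location: str) -> str:
    """Return the answer appropriate to where the client sits."""
    return RECORDS[name][client_location]
```

In a real split DNS infrastructure this selection is made by running separate internal and external DNS zones for the same namespace rather than by inspecting the client.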
The following table describes some of the advantages and disadvantages of the three options for inbound connectivity.
Inbound connectivity option | Advantages | Disadvantages |
---|---|---|
All inbound access is done over the Internet. | | |
All inbound access is done through the corporate network. | | |
Some inbound access is over the Internet and some is over the corporate network. | | |
5.3.3 Load Balancing of Inbound Connections to Virtual Machines of a Public Infrastructure Service
Services that you place in the public cloud infrastructure service provider’s network may need to be load balanced to support the performance and availability characteristics that you require for a hybrid application running on a hybrid cloud infrastructure. There are several ways you can enable load balancing of connections to services that are hosted on the public cloud infrastructure service provider’s network. These include:
- Use a load balancing mechanism that is provided by the public cloud infrastructure service provider that’s integrated with the service provider’s fabric management system.
- Use some form of network load balancing that is enabled by the operating systems, or use an add-on product that runs on the virtual machines themselves.
- Use an external network load balancer to perform load balancing of the incoming connections to the service components that are hosted in the public cloud infrastructure provider’s network.
Load Balancing Mechanism Provided by the Public Cloud Infrastructure Service Provider
The first option requires that the service provider has a built-in load balancing capability that is included with its service offering. In Windows Azure, external communication with virtual machines can occur through endpoints. These endpoints are used for various purposes, such as load-balanced traffic or direct virtual machine connectivity, like RDP or SSH.
Windows Azure provides round-robin load balancing of network traffic to publicly defined ports of a cloud service that is represented by these endpoints. For virtual machines, you can set up load balancing by creating new virtual machines, connecting them under a cloud service, and then adding load-balanced endpoints to the virtual machines.
For more information on load balancing for virtual machines in Windows Azure, see Load Balancing Virtual Machines
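Round-robin distribution of the kind described above can be sketched in a few lines. The VM names are hypothetical, and this illustrates the algorithm itself, not Azure’s load balancer.

```python
import itertools

# Round-robin sketch: each incoming connection to a load-balanced
# endpoint is handed to the next virtual machine in turn.

class RoundRobinBalancer:
    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def next_backend(self):
        return next(self._cycle)

lb = RoundRobinBalancer(["web-vm-0", "web-vm-1", "web-vm-2"])
assignments = [lb.next_backend() for _ in range(6)]
# With three back ends, each VM receives every third connection.
```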
Load Balancing Enabled on the Virtual Machines
The second option requires that the operating system running on the virtual machines in the public cloud infrastructure service provider’s network run some kind of software-based load-balancing system.
For example, Windows Server 2012 includes the Network Load Balancing feature, which can be installed on any virtual machine that runs that operating system. There are other load balancing applications that can be installed on virtual machines. The service provider must be able to support these guest-based load balancing techniques, because they often change the characteristics of the MAC address that is exposed to the network. At the time this paper was written, Azure Virtual Networks did not support this type of load balancing.
For more information about Windows Server Network Load Balancing, see Network Load Balancing Overview
Use an External Network Load Balancer
The third option is a relatively specialized one because it requires that you can control the path between the client of the service that is hosted in the public cloud infrastructure service provider’s network and the destination virtual machines. The reason for this is that the clients must pass through the dedicated hardware load balancer so that the hardware load balancer can perform the load balancing for the client systems.
This option is likely not going to be available from the public cloud infrastructure service provider’s side, because public providers in general do not allow you to place your own equipment on their network. This method would work if you are hosting an application in the service provider’s network that is accessible only to clients on the corporate network. Because you have control of what path internal clients will use to reach the service, you can easily put a load balancer in the path.
For more information on external load balancers, see Load Balancing (computing)
The following table describes the advantages and disadvantages of each of these three approaches.
Load-balancing mechanism | Advantages | Disadvantages |
---|---|---|
Public cloud infrastructure service provider load balancing solution | | |
OS-based or add-on load-balancing solution | | |
External load-balancing solution | | |
5.3.4 Name Resolution for the Public Infrastructure Service Network
Name resolution is a critical activity for any application in a hybrid cloud infrastructure. Applications that span on-premises components and those in the public cloud infrastructure provider’s network must be able to resolve names on both sides in order for all tiers of the application to work easily with one another.
There are several options for name resolution in a hybrid cloud infrastructure:
- Name resolution support that is provided by the public cloud infrastructure service provider
- Name resolution support that is based on your on-premises DNS infrastructure
- Name resolution support that is based on external DNS infrastructure
Name Resolution Services Provided by the Public Cloud Infrastructure Service Provider
The public cloud infrastructure service provider may provide some type of DNS services as part of its service offering. The nature of these DNS services will vary. For example, Azure Virtual Networks provide basic DNS services for name resolution of virtual machines that are part of the same cloud service. Be aware that this is not the same as virtual machines that are on the same Azure Virtual Network. If two virtual machines are on the same Azure Virtual Network, but are not part of the same cloud service, they will not be able to resolve each other’s names by using the Azure Virtual Network DNS service.
For more information on Azure Virtual Network DNS services, see Windows Azure Name Resolution.
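The scoping rule above (same cloud service, not merely same virtual network) can be sketched as follows; the VM names and addresses are hypothetical.

```python
# Sketch of the built-in name resolution scoping rule: resolution
# succeeds only between VMs in the same cloud service, even when the
# VMs share a virtual network. All names below are hypothetical.

VMS = {
    "web-1": {"cloud_service": "svc-a", "vnet": "vnet-1", "ip": "10.0.0.4"},
    "web-2": {"cloud_service": "svc-a", "vnet": "vnet-1", "ip": "10.0.0.5"},
    "sql-1": {"cloud_service": "svc-b", "vnet": "vnet-1", "ip": "10.0.0.6"},
}

def builtin_resolve(source_vm: str, target_vm: str):
    src, dst = VMS[source_vm], VMS[target_vm]
    if src["cloud_service"] == dst["cloud_service"]:
        return dst["ip"]
    return None  # same virtual network is not enough; resolution fails
```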
Name Resolution Services Based on On-Premises DNS Infrastructure
The second option is the one you’ll typically use in a hybrid cloud infrastructure where applications span on-premises networks and the cloud infrastructure service provider’s network. You can configure the virtual machines in the service provider’s network to use DNS servers that are located on premises, or you can create virtual machines in the public cloud infrastructure service provider’s network that host corporate DNS services and are part of the corporate DNS replication topology. This makes name resolution for both on-premises and cloud-based resources available to all machines that support the hybrid application.
Name Resolution Services External to Cloud and On-Premises Systems
The third option is less typical, as it would be used when there is no direct link, such as a site-to-site VPN or dedicated WAN link, between the corporate network and the public cloud infrastructure services network. However, in this scenario you still want to enable some components of the hybrid application to live in the public cloud and yet keep some components on premises. Communications between the public cloud infrastructure service provider’s components and those on premises can be done over the Internet. If on-premises components need to initiate connections to the off-premises components, they must use Internet host name resolution to reach those components. Likewise, if components in the public cloud infrastructure service provider’s network need to initiate connections to those that are located on premises, they would need to do so over the Internet by using a public IP address that can forward the connections to the components on the on-premises network. This means that you would need to publish the on-premises components to the Internet, although you could create access controls that limit the incoming connections to only those virtual machines that are located in the public cloud infrastructure services network.
The following table describes some advantages and disadvantages of each of these approaches.
Name resolution approach | Advantages | Disadvantages |
---|---|---|
Public cloud infrastructure service provider supplies DNS. | | |
DNS is integrated with on-premises DNS infrastructure. | | |
DNS is based on public/external DNS infrastructure. | | |
5.4 Storage Design Considerations
When considering options for storage in a hybrid cloud infrastructure scenario, you will need to assess current storage practices and storage options that are available with your public cloud infrastructure service provider.
Storage issues that you might consider include:
- Storage tiering options
- IaaS database options
- PaaS database options
5.4.1 Storage Tiering
Storage tiering enables you to place workloads on storage that supports the IOPS requirements of a particular workload. For example, you might have a database-bound application that needs to handle a large number of transactions per second. You would want the public cloud infrastructure service provider to offer an option to host your database on fast storage, perhaps solid-state disk (SSD) storage. On the other hand, you may have other applications that do not require ultra-fast storage, in which case you could place those applications on a slower storage tier. The assumption is that the public cloud infrastructure service provider will charge more for high-performance storage and less for lower-performance storage.
At the time this document was written, Azure Infrastructure Services did not provide an option for tiered storage. However, the service evolves constantly; make sure to refer to the Windows Azure documentation pages on a regular basis during your design process.
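As a sketch of how a tiering decision works, the function below places a workload on the cheapest tier that still meets its IOPS requirement. The tier names, IOPS ceilings, and prices are illustrative only, not any provider’s actual offering.

```python
# Hypothetical storage-tier selection: cheapest tier that meets the
# workload's IOPS requirement. Tiers and prices are invented.

TIERS = [  # ordered cheapest-first: (name, max IOPS, $/GB-month)
    ("archive",   100, 0.01),
    ("standard",  500, 0.05),
    ("ssd",      5000, 0.20),
]

def choose_tier(required_iops: int) -> str:
    for name, max_iops, _price in TIERS:
        if required_iops <= max_iops:
            return name
    raise ValueError("no tier satisfies the requirement")
```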
5.4.2 IaaS Database
There are scenarios in a hybrid cloud infrastructure where the front-end and application tiers will be hosted in the public cloud infrastructure service provider’s network and the database tier is hosted on premises. Another possibility is that the front-end, application and database tiers are hosted on the public cloud infrastructure service provider’s network. In this scenario, you will need to investigate whether or not the public cloud infrastructure service provider supports running database applications on virtual machines hosted on its network.
Windows Azure supports placing SQL Server on Azure Infrastructure Services. For applications that need full SQL Server functionality, Azure Infrastructure Services is a viable solution. SQL Server 2012 and SQL Server 2008 R2 images are available, and they include the Standard, Web, and Enterprise editions. If you have an existing SQL Server license with Software Assurance, you can move that license to Windows Azure and pay only for compute and storage.
Running SQL Server in Azure Infrastructure Services is a viable option in the following scenarios:
- Developing and testing new SQL Server applications quickly
- Hosting existing tier-2 and tier-3 SQL Server applications
- Backing up and restoring on-premises databases
- Extending on-premises applications
- Creating multi-tiered cloud applications
5.4.3 PaaS Database and Storage
While the focus of this document is on core IaaS functionality and considerations in a hybrid cloud infrastructure, there may be scenarios where you will want to take advantage of a PaaS database offering provided by your public cloud infrastructure service provider. Sometimes referred to as “Database as a Service”, you can take advantage of this option to simplify your design by allowing the service provider to manage the infrastructure that supports the database application so that you can focus on the front-end and application tiers.
Windows Azure has a PaaS database-as-a-service offering. For applications that need a full-featured relational database as a service, Windows Azure offers SQL Database, formerly known as SQL Azure Database. SQL Database offers a high level of interoperability, enabling you to build applications that use many of the major development frameworks.
Table storage is another option that your public cloud service provider might offer. It can be used to store large amounts of unstructured data. Windows Azure offers table-based storage that is an ISO 27001-certified managed service, which can automatically scale to meet massive volumes of up to 200 terabytes. Tables are accessible from virtually anywhere via REST and managed APIs.
Finally, your public cloud infrastructure service provider may offer blob storage for your applications and virtual machines. Blobs are an easy way to store large amounts of unstructured text or binary data such as video, audio, and virtual machine images. Like table storage, Windows Azure blobs are an ISO 27001-certified managed service that can automatically scale to meet massive volumes of up to 200 terabytes. Blobs are accessible from virtually anywhere via REST and managed APIs.
For more information on these storage options, please see Azure Data Management.
5.5 Compute Design Considerations
Compute design considerations center on the virtual machines that will be hosted on premises and in the public cloud service provider’s network. In some cases, the only virtual machines that are participating in a hybrid cloud infrastructure will be on the public cloud infrastructure service provider’s network, since the on-premises resources will be hosted on physical hardware instead of being virtualized. Whether current services are run on physical or virtualized hardware, you will need to take into account issues related to the virtual machine offering made available by the public cloud service provider.
Consider the following issues when designing the hybrid cloud infrastructure’s compute components:
- Does your public infrastructure service provider make operating system images available?
- Can you port on-premises images into the public cloud service provider’s network?
- What types of disks does the public infrastructure service provider make available?
- What level of customization for virtual machine virtual hardware is available?
- How will you access the virtual machines on the public cloud infrastructure service provider’s network?
- What virtual machine availability options does your public cloud infrastructure service provider support?
- What are your backup and disaster recovery options?
5.5.1 Operating System and Service Images
An image is a virtual disk file that you use as a template to create a new virtual machine. An image is a template because, unlike a running virtual machine, it doesn't have specific settings such as the computer name and user account settings. When you create a virtual machine from an image, an operating system disk is automatically created for the new virtual machine.
Some public cloud infrastructure service providers will provide images that contain not only operating systems, but also services that run on top of the operating system. These are sometimes referred to as “service templates,” and such templates can enable you to stand up services more quickly than you could if you had to first install the operating system and then install the services that you want to run.
Windows Azure makes both operating system and service images available to you. You can either use an image provided by Windows Azure in the Image Gallery, or you can create your own image to use as a template. For example, you can create a virtual machine from an image in the Image Gallery. Windows Azure provides a selection of Windows and Linux images, as well as images that have BizTalk and other applications already installed.
For more information on operating system and service images in Windows Azure, please see Manage Disks and Images.
5.5.2 On-Premises Physical and Virtual Service Images and Disks
Another option available to you when designing a hybrid cloud infrastructure is to create your own images and post them to the public cloud infrastructure service provider’s network. This enables you to:
- Create your own operating system images that contain your own customizations
- Create your own service images that contain the services that you want to be ready to run
- Perform physical to virtual conversions so that you can move applications running on physical hardware to virtual hardware on the public cloud infrastructure service provider’s network
- Move virtual disks that host services in your own datacenter to the public cloud infrastructure service provider’s network
Windows Azure enables you to use not only images provided by Azure, but also images that you create on premises. To create a Windows Server image, you must run the Sysprep command on your development server to generalize the image and shut the server down before you can upload the .vhd file that contains the operating system.
For more information about using Sysprep, see How to Use Sysprep: An Introduction.
To create a Linux image, depending on the software distribution, you must run a set of commands that are specific to the distribution and you must run the Windows Azure Linux Agent.
For more information on creating and moving on premises disk images, please see Manage Disks and Images.
5.5.3 Virtual Disk Formats and Types
Virtual Disk Formats
Each virtualization platform vendor typically supports its own virtual disk container format, so you will need to determine which virtual disk formats your public cloud infrastructure service provider supports. If the provider you choose does not support the disk formats you currently have in production for the services you want to move to the provider’s network, you will need to perform a disk format conversion before posting those disks to the provider’s network.
For example, Windows Azure currently supports only the .vhd file format. If you have virtual machines running on a non-Hyper-V virtualization infrastructure, or virtual machines on a Windows Server 2012 virtualization infrastructure that use the .vhdx format, you will need to convert those disks to .vhd. There are a number of tools available for converting disk formats. For one example, please see How to Deploy a Virtual Machine by Converting a Virtual Machine (V2V).
Virtual Disk Types
Some public cloud infrastructure service providers will make different virtual disk types available to you that you can use in your hybrid cloud infrastructure. These virtual disk types might be useful in different scenarios, such as disks that can be used as operating system disks or storage disks.
Azure Infrastructure Services supports an operating system disk VHD that you can boot and mount as a running version of an operating system. Any VHD that is attached to virtualized hardware and that is running as part of a service is an operating system disk. After an image is provisioned, it becomes an operating system disk. An operating system disk is always created when you use an image to create a virtual machine. The VHD that is intended to be used as an operating system disk contains the operating system, any operating system customizations, and your applications. Azure Infrastructure Services operating system disks are read-write cache enabled.
Azure Infrastructure Services also supports using a VHD as a data disk to enable a virtual machine to store application data. After you create a virtual machine, you can either attach an existing data disk to the machine, or you can create and attach a new data disk. Whenever you use a data-intensive application in a virtual machine, it’s highly recommended that you use a data disk to store application data rather than the operating system disk. Azure Infrastructure Services data disks have read-write caching disabled by default.
A third type of disk, known as a “caching disk,” is automatically included with any virtual machine created in Azure Infrastructure Services. This disk is used for the pagefile by default. If you have other temporary data that you want to save to local storage, you can place that data on the caching disk. The information on the caching disk is not persistent and does not survive reboots of the virtual machine.
For more information about Azure Infrastructure Services operating system and data disks, please see Azure Virtual Machines.
5.5.4 Virtual Machine Customization
Different public cloud infrastructure service providers will provide various levels of customization for your virtual machines. Typical customizations at the infrastructure layer include how much memory, how many processors (and at what speeds), and how much storage you can make available to a virtual machine. In some cases the public cloud infrastructure service provider will allow granular options for provisioning memory, processors, and storage; in other cases the provider will require you to select from a set of “t-shirt”-sized virtual machines, with each size defining the amount of processing, memory, and storage resources available. Windows Azure Infrastructure Services uses this “t-shirt” size model.
For more information on the types of virtual hardware available to you, please see Virtual Machines.
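Selecting from a fixed set of “t-shirt” sizes amounts to picking the smallest size that satisfies the workload. The size table below is illustrative (loosely modeled on sizes available at the time of writing) and is not an authoritative specification.

```python
# "T-shirt" sizing sketch: choose the smallest predefined size that
# meets the workload's CPU and memory needs. The table is illustrative.

SIZES = [  # ordered smallest-first: (name, cores, memory in GB)
    ("Small",      1, 1.75),
    ("Medium",     2, 3.5),
    ("Large",      4, 7.0),
    ("ExtraLarge", 8, 14.0),
]

def smallest_fit(cores_needed: int, memory_gb_needed: float) -> str:
    for name, cores, mem in SIZES:
        if cores >= cores_needed and mem >= memory_gb_needed:
            return name
    raise ValueError("no predefined size fits; consider scaling out")
```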
The amount you pay for virtual machines on the public cloud infrastructure service provider’s network is typically proportional to the size and number of virtual machines you choose. Consider in advance what virtual machines you require to support your hybrid cloud infrastructure, and investigate whether the public cloud infrastructure service provider offers a price calculator to help you estimate the costs of running them.
Windows Azure Infrastructure Services has a pricing calculator to help you assess what your costs will be. Please see Windows Azure Pricing Calculator.
5.5.5 Virtual Machine Access
You will need to consider how you will access the virtual machines running on the public cloud infrastructure service provider’s network. The method of access will vary with the operating system running within the virtual machine. For Windows based operating systems, you have the option to use the Remote Desktop Protocol (RDP) to connect to the virtual machine so that you can manage it. You also have the option of using remote PowerShell commands. If the virtual machine is running a Linux-based operating system, you can use the SSH protocol.
For more information about logging on to a virtual machine running Windows Server in Azure Infrastructure Services, see How to Log on to a Virtual Machine Running Windows Server 2008 R2.
For more information about logging on to a virtual machine running Linux in Azure Infrastructure Services, see How to Log on to a Virtual Machine Running Linux.
5.5.6 Virtual Machine and Service Availability
Service Availability
When designing a hybrid cloud infrastructure you will need to consider how you will make the virtual machines running in the public cloud infrastructure service provider’s network highly available. You will need to consider how to make the application highly available as well as the virtual machines that run the application.
Load balancing incoming connections to the virtual machines running the application can help increase application availability. Incoming connections can be spread across multiple virtual machines. These virtual machines typically host the front-end stateless component of the application. If one of the virtual machines hosting the front-end component becomes disabled, connections can be load balanced to other front-end virtual machines. Different public cloud service providers will likely use different load balancing algorithms, so you will want to consider the load balancing algorithm used by the provider when designing application high availability into your hybrid cloud infrastructure.
Azure Infrastructure Services supports load balancing connections to virtual machines on an Azure Virtual Network. For more information about this, please see Load Balancing Virtual Machines.
Virtual Machine Availability
The hardware that supports the virtual machines needs to be maintained on a periodic basis. Your public cloud infrastructure service provider will need to schedule times when software and hardware is serviced and upgraded. In order to make sure that the services that run on those virtual machines continue to be available during maintenance and upgrade windows, you need to consider options that the public cloud service provider makes available to you to prevent downtime during these cycles.
For example, Windows Azure periodically updates the operating system that hosts the virtual machines. A virtual machine is shut down when an update is applied to its host server. An update domain is used to ensure that not all of the virtual machine instances are updated at the same time. When you assign multiple virtual machines to an availability set, Windows Azure ensures that the machines are assigned to different update domains. The previous diagram shows two virtual machines running Internet Information Services (IIS) in separate update domains and two virtual machines running SQL Server also in separate update domains.
For more information on availability for Azure Infrastructure Services virtual machines, please see Manage the Availability of Virtual Machines.
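The update-domain behavior described above can be sketched as a simple round-robin assignment; the VM names are hypothetical and the real fabric controller is considerably more sophisticated.

```python
# Update-domain sketch: VMs in an availability set are spread across
# update domains so a host update never takes the whole tier down.
# This round-robin placement is a simplified illustration.

def assign_update_domains(vms, domain_count):
    return {vm: i % domain_count for i, vm in enumerate(vms)}

placement = assign_update_domains(["iis-1", "iis-2", "sql-1", "sql-2"], 2)
# iis-1 and iis-2 land in different update domains, as do sql-1 and sql-2,
# so one IIS and one SQL Server instance stay up during any single update.
```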
5.6 Management and Support Design Considerations
From the perspective of basic cloud infrastructure considerations, there are some basic issues around management and support design that you'll want to consider. The primary areas include, but are not limited to, the following:
- Consumer and provider portals
- Usage and billing
- Service reporting
- Public cloud infrastructure service provider authentication and authorization
- Application authentication and authorization
- Backup services and disaster recovery
The remainder of this section discusses the options and considerations in these areas.
5.6.1 Consumer and Provider Portal
If you’re providing cloud services to your consumers today, then you already provide them a consumer portal. When your users interact with Windows Azure services, they use the Windows Azure Management Portal as a consumer portal. How similar is the Windows Azure Management Portal experience to the consumer portal experience you provide for your private cloud services? Recall the Create a Seamless User Experience principle mentioned earlier in this document.
A few options are available to you to provide a seamless user experience across both your private cloud services and the Windows Azure public cloud services.
- System Center 2012 App Controller: App Controller provides a common self-service experience that can help you configure, deploy, and manage virtual machines and services across your private cloud, the Windows Azure public cloud, as well as public clouds provided by some public hosting service providers. If your consumers use App Controller to provision new capacity, they could use it instead of using the Windows Azure Management Portal for many of their tasks, though some tasks would still need to be completed through the Windows Azure Management Portal.
- Windows Azure Services for Windows Server: Windows Azure Services for Windows Server includes a consumer portal that you can install on-premises. It integrates with System Center 2012 Virtual Machine Manager, and provides an almost-identical experience to the Windows Azure Management Portal experience. It does not, however, enable you to provision services both on-premises and on Windows Azure, as App Controller does. So while it provides a similar experience to the Windows Azure Management Portal, your consumers will still have to use your on-premises portal for provisioning on-premises services, and the Windows Azure Management Portal to provision Windows Azure services.
Note:
Windows Azure Services for Windows Server integrates with Windows Server 2012 and System Center 2012. The next version of Windows Azure Services for Windows Server is the Windows Azure Pack for Windows Server. It will integrate with Windows Server 2012 R2 and System Center 2012 R2.
5.6.2 Usage and Billing
If you’re providing cloud services to your consumers today, then you are already able to track resource consumption by your consumers. You use this data to either charge your customers for their consumption, or simply report back to them on their consumption. Public cloud service providers each have their own pricing and billing options.
Windows Azure Virtual Machines pricing is publicly available, and provides purchase options by credit card or invoicing. Purchase options are connected to a Microsoft Account. When using Windows Azure services, you’ll need to determine which purchase option you’ll choose, and how those costs will be either charged back or shown back to the individuals within the organization who consumed the resources. As of the writing of this document, Windows Azure billing is provided at the subscription level, and doesn’t provide much granularity for the individual resources consumed within a subscription. Thus, you may decide to set up multiple subscriptions to track resource consumption, or to develop strategies for tracking resource consumption through a single subscription.
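Because billing granularity stops at the subscription, one common approach is to attribute a subscription's invoice back to consumers in proportion to usage that you meter yourself. The following is a minimal sketch of that arithmetic; the usage records, team names, and invoice total are hypothetical examples, not data from any Azure API.

```python
# Show-back sketch: split one subscription's monthly invoice across teams
# in proportion to the VM-hours each team consumed. All figures here are
# illustrative assumptions, not values from any billing system.

def show_back(invoice_total, usage_by_team):
    """Allocate invoice_total proportionally to recorded VM-hours."""
    total_hours = sum(usage_by_team.values())
    return {team: round(invoice_total * hours / total_hours, 2)
            for team, hours in usage_by_team.items()}

usage = {"finance": 300, "marketing": 100, "engineering": 600}  # VM-hours
charges = show_back(1000.00, usage)
print(charges)  # finance: 300.0, marketing: 100.0, engineering: 600.0
```

A tagging convention in virtual machine names can supply the per-team usage numbers until more granular billing becomes available.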
5.6.3 Service Reporting
If you’re providing cloud services to your consumers today, then you already provide reports to your consumers as to whether services met their service level agreements (SLAs) in areas such as performance and availability. Public service providers offer SLAs for the services they provide, as well as service reporting so you know whether or not they met their SLAs.
Windows Azure provides availability SLAs for its various services. An example of the availability SLA offered with the Windows Azure Virtual Machines service is defined in the Virtual Machines article. You’ll need to determine whether it’s possible to integrate the service reporting offered by the public provider with your own service reporting capability, and if so, whether you want to do that. If you’re providing a service to your consumers which has some components running on-premises, and others running on a public provider’s cloud, you’ll have to integrate the service reporting capability so that you can provide service level reporting to your consumers.
5.6.4 Public Cloud Infrastructure Service Provider Authentication
When working with a public cloud infrastructure service provider’s system, you need to understand what authentication and authorization/access control options are available to you. In addition, you’ll need to understand how authentication and authorization come together to support your overall account management requirements. In this section we will cover these issues.
5.6.4.1 Authentication
Users need to authenticate to the provider’s system to gain access to system resources. When designing your hybrid cloud infrastructure, you need to determine what authentication options are available to you and what the advantages and disadvantages might be to each approach.
There are several options that might be possible for your authentication to the service provider’s system design:
- The service provider maintains an authentication system completely separate from yours. The service provider requires you to create accounts on that system and those accounts are managed separately from accounts that you maintain on premises.
- The service provider and the enterprise IT group can create a direct federation between their systems or use some method of directory synchronization.
- The service provider and the enterprise IT group can create an indirect federation by leveraging a third-party federation service.
The following table describes the advantages and disadvantages of each of these options by using Active Directory Federation Services and Windows Azure Active Directory as examples of technologies that you can use for direct and indirect federation, respectively.
Option | Advantages | Disadvantages
You authenticate to the service provider’s proprietary authentication mechanism, separately from any you have already on premises. | |
You can federate your on-premises authentication mechanism with the service provider or use some form of directory synchronization. | |
You can federate your on-premises authentication mechanism with the service provider’s through a federation service such as Windows Azure Active Directory. | |
For more information on how to integrate your on-premises Active Directory domain with Windows Azure Active Directory, see Windows Azure, now with more enterprise access management.
5.6.4.2 Authorization and Access Control
In a hybrid cloud infrastructure, you need to determine what authorization capabilities your public cloud infrastructure provider makes available to you and also what authorization capabilities that you already have, or plan to enable, in your on-premises components of the solution. Important issues that you need to consider include:
- Have you enabled, or do you plan to enable, role-based administrative access control on the on-premises side of your hybrid cloud infrastructure? If so, how will you define the roles? Will you separate roles between service owners and account owners?
- Does the public cloud infrastructure service provider enable role-based access control? If so, how does it define the roles? Does the public cloud infrastructure service provider enable you to separate service management and account management roles?
- Are all employees in the company authorized to request resources from the hybrid cloud infrastructure? If not, how will you determine which employees will have access to the services acquisition portal? Will you add more granularity and allow certain groups of authorized users to have access to specific components of the hybrid cloud infrastructure?
- How do you plan to reflect your current IT organizational structure on the public cloud infrastructure component of the hybrid cloud solution? Do you plan to mirror your IT organization the way it is now? Will you assign members of the current IT organization to the hybrid cloud infrastructure? Will you let the service owners who consume hybrid cloud resources have access to management of the infrastructure that their service runs on—in effect mirroring in the cloud components of their on-premises siloed IT infrastructure?
The following table shows advantages and disadvantages of each of these options in authorization and access control.
AuthN and access control option | Advantages | Disadvantages
On-premises role-based administrative access control. | |
Public cloud infrastructure service role-based administrative access control. | |
Authorized employees are allowed to acquire hybrid cloud infrastructure resources. | |
Dedicated hybrid cloud infrastructure group. | |
Reflect IT organizational structure to hybrid cloud infrastructure. | |
Allow consumers of the hybrid cloud infrastructure to mirror on-premises siloed infrastructure. | |
For more information on role-based access control in Hyper-V, see Configure Hyper-V for Role Based Access Control
For more information on role-based access control in System Center Virtual Machine Manager, see Private Cloud in System Center Virtual Machine Manager 2012 - Part 2 – Delegate Control.
At this time granular role-based access control is not available in Azure Infrastructure Services.
5.6.4.3 Account Management
You need to consider workflow issues regarding who has access to both the public cloud service account that is used for billing services and any sub-accounts that might be used for administration of the public infrastructure service components.
For example, suppose there is a manager who is responsible for the infrastructure service account. What might happen if that manager were released from the company? It’s possible that if the former manager left on bad terms, that person could potentially cancel the account and thereby block access to all the services. Similarly, what might happen if a member of the hybrid cloud infrastructure team were released from the company, and that person’s administrative account were still active? If the administrator who was released left the company on bad terms, that person could delete virtual machines, leave an exploit on the service, and any number of other things that a person with administrative access could achieve.
For these reasons and more, it’s critical that you have a workflow or account provisioning and deprovisioning process that can prevent these problems from happening. You may already have a workflow and account management system in place that performs these actions for you for on-premises accounts. If that is the case, you can investigate the possibilities of connecting your on-premises account management system with the management system that is used by your public cloud infrastructure service provider.
For example, as mentioned in the table in section 5.6.4.1 Authentication, you may have the option to federate your on-premises account system with the service provider’s system. If that is the case, user accounts that are provisioned and de-provisioned on premises will automatically be managed for access to the service provider’s system. You might consider an on-premises solution that is based on Forefront Identity Manager (FIM) to help you with this type of account management and tie it into the federated environment.
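The value of federation for deprovisioning can be sketched in a few lines: because the provider trusts the on-premises directory rather than keeping its own accounts, one on-premises action cuts off cloud access too. The classes below are illustrative stand-ins, not FIM or Windows Azure APIs.

```python
# Sketch of federated deprovisioning: the cloud portal defers all
# sign-in decisions to the on-premises directory, so disabling an
# account on premises also blocks cloud access. Illustrative only.

class OnPremDirectory:
    def __init__(self):
        self.enabled = {}
    def provision(self, user):
        self.enabled[user] = True
    def deprovision(self, user):
        self.enabled[user] = False

class FederatedCloudPortal:
    """Trusts the on-premises directory instead of holding its own accounts."""
    def __init__(self, directory):
        self.directory = directory
    def can_sign_in(self, user):
        return self.directory.enabled.get(user, False)

ad = OnPremDirectory()
portal = FederatedCloudPortal(ad)
ad.provision("alice")
assert portal.can_sign_in("alice")
ad.deprovision("alice")                 # one on-premises action...
assert not portal.can_sign_in("alice")  # ...also revokes cloud access
```

Without federation, the same deprovisioning event would require a second, separately audited action against the provider's account system.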
At this time in Windows Azure, you have the option of assigning an account to be a Service Administrator or Service Co-Administrator. The difference between these two roles is that the Service Co-Administrator cannot delete the Service Administrator account for a subscription. Only the Windows Azure account owner can delete a Service Administrator.
For more information on administrative roles in Windows Azure, see Provisioning Windows Azure for Web Applications.
5.6.5 Application Authentication and Authorization
In a hybrid cloud infrastructure, you will need to consider the options available for authentication and authorization. While there are a number of authentication and authorization options available for the applications that you’ll run in the public cloud infrastructure service provider’s network, in the majority of cases those applications will depend to a certain degree on Active Directory. For this reason, it’s important to consider your design options for applications that run some or all of their components in the public cloud infrastructure service provider’s network.
Key issues for consideration include:
- Active Directory domain controller considerations in the public cloud infrastructure service provider’s network
- Read-only domain controller considerations
- Domain controller locator considerations
- Domain, forest, and global catalog considerations
- Active Directory name resolution and geo-location considerations
- Active Directory Federation Services (ADFS) considerations
- Windows Azure Active Directory considerations
The remainder of this section will detail considerations in each of these areas.
5.6.5.1 Active Directory Domain Controllers in the Public Cloud Infrastructure Provider's Network Considerations
Historically the recommendation has been not to virtualize domain controllers. Many virtualization infrastructure designers have virtualized domain controllers only to experience a failure related to a virtualized domain controller.
For example, backing up and restoring domain controllers can roll back the state of the domain controller and lead to issues that are related to inconsistencies in the Active Directory database. Restoring snapshots from a virtualized domain controller would have the same effect as restoring from backup—the previous state would be restored and lead to Active Directory database inconsistencies. The same effects are seen when you use more advanced technologies to restore a domain controller, such as creating SAN snapshots and restoring those, or creating a disk mirror and then breaking the mirror and using the version on one side of the mirror at a later time as part of a restore process.
Update Sequence Number (USN) “bubbles” create the problems that are most commonly encountered with virtualized domain controllers. USN bubbles can lead to a number of problems, including:
- Lingering objects in the Active Directory database
- Inconsistent passwords
- Inconsistent attribute values
- Schema mismatch if the Schema Master is rolled back
- Duplicated security principals
For these reasons and more, it is critical to avoid USN bubbles.
For more information on USN bubbles, see How the Active Directory Replication Model Works.
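The replication failure behind a USN bubble can be modeled in a few lines. The sketch below is a toy model of the high-watermark mechanism, not the actual replication protocol: a partner pulls only changes whose USN is higher than the highest it has already seen from the source, so a snapshot restore that rolls the source's USN counter back causes new writes to reuse "already seen" USNs and silently never replicate.

```python
# Toy model of a USN "bubble". Restoring a snapshot rolls the source DC's
# USN counter back; subsequent writes reuse USNs the partner believes it
# already has, so those changes fall inside the bubble and never replicate.

class DomainController:
    def __init__(self):
        self.usn = 0
        self.objects = {}            # name -> (usn, value)
        self.seen_from_partner = 0   # high-watermark for the partner

    def write(self, name, value):
        self.usn += 1
        self.objects[name] = (self.usn, value)

    def snapshot(self):
        return (self.usn, dict(self.objects))

    def restore(self, snap):
        self.usn, self.objects = snap[0], dict(snap[1])

    def pull_from(self, source):
        for name, (usn, value) in source.objects.items():
            if usn > self.seen_from_partner:   # only "new" USNs replicate
                self.objects[name] = (usn, value)
        self.seen_from_partner = max(self.seen_from_partner, source.usn)

dc1, dc2 = DomainController(), DomainController()
dc1.write("user-a", "v1")
snap = dc1.snapshot()          # dc1's USN is now 1
dc1.write("user-b", "v1")      # USN 2
dc2.pull_from(dc1)             # dc2's high-watermark for dc1 is now 2
dc1.restore(snap)              # rollback: USN counter back to 1
dc1.write("user-c", "v1")      # reuses USN 2 -- inside the bubble
dc2.pull_from(dc1)
print("user-c" in dc2.objects)  # False: the change never replicates
```

The real protocol tracks watermarks per invocation ID, which is exactly what the VM Generation ID safeguard (described next) exploits to detect a rollback.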
VM Generation ID
Virtualization makes it easier to create a USN bubble scenario, and therefore the recommendation in the past has been that you should not virtualize domain controllers. However, with Windows Server 2012, virtualizing domain controllers is now fully supported.
Full support for virtualizing domain controllers is enabled by a feature in the hypervisor which is called the VM Generation ID. When a domain controller is virtualized on a supported virtualization platform, the domain controller will wait until replication takes place to be told what its state and role is. If the virtualized domain controller is one that was restored from a snapshot, it will wait to be told what the correct state is instead of replicating a previous state and causing a USN bubble.
For more information on VM Generation IDs, see Introduction to Active Directory Domain Services Virtualization.
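The safeguard can be summarized as a comparison on each transaction: the generation ID stored in the directory database is checked against the one the hypervisor currently exposes, and a mismatch means a snapshot was applied. The sketch below illustrates the protective actions in pseudocode-like Python; it is not the actual NTDS implementation, and the state fields are illustrative.

```python
# Illustrative sketch of the VM Generation ID safeguard: on a mismatch,
# the DC resets its invocation ID, discards its RID pool, and waits for
# inbound replication rather than replicating potentially stale state.

import uuid

def check_generation(stored_gen_id, current_gen_id, state):
    if stored_gen_id == current_gen_id:
        return state  # normal operation, nothing to do
    # Snapshot rollback detected: take the protective actions
    return {
        "generation_id": current_gen_id,
        "invocation_id": uuid.uuid4(),   # new replication identity
        "rid_pool": None,                # discard; request a fresh pool
        "needs_sync": True,              # wait for inbound replication
    }

state = {"generation_id": "gen-1", "invocation_id": uuid.uuid4(),
         "rid_pool": (500, 999), "needs_sync": False}
after = check_generation("gen-1", "gen-2", state)  # hypervisor rolled back
print(after["needs_sync"], after["rid_pool"])      # True None
```

Because the invocation ID changes, replication partners treat the restored DC as a new source and resend the changes it is missing, which closes the USN bubble.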
Note:
VM Generation IDs must be supported by both the hypervisor and the guest operating system. Used together, Windows Server 2012 Hyper-V and the Windows Server 2012 operating system acting as a guest will support VM Generation IDs. VMware also supports VM Generation ID when running Windows Server 2012 domain controller guests. Windows Azure Infrastructure Services also supports VM Generation ID and therefore also supports virtualization of domain controllers.
When creating domain controllers in Azure Infrastructure Services, you have the option to create them new on an Azure Virtual Network, or to use one that you created on premises and move it to an Azure Virtual network.
Note:
Do not sysprep domain controllers—sysprep will generate an error when you try to run it on a domain controller.
Instead of using sysprep, consider moving the VHD file to Azure storage and then create a new virtual machine by using that VHD file. If your on-premises domain controller is running on physical hardware, you have the option to do a physical to virtual conversion and move the resultant .vhd file to Azure storage. Then you can create the new virtual machine from that .vhd file.
You also have the option to create a new domain controller in Azure Infrastructure Services and enable inbound replication to the domain controller. In this case, all the replication traffic is inbound, so there are no bandwidth charges due to egress traffic during the initial inbound replication, but there will be egress traffic costs for outbound replication.
Active Directory Related File Placement
When creating an Active Directory design to support hybrid application authentication, you will need to consider the disk types that are available from the public cloud infrastructure service provider. There may be some disk types and caching schemes that are more or less favorable to specific Active Directory domain controller data types.
For example, Windows Azure supports two disk types where you can store information for virtual machines:
- Operating System Disks (OS Disks)—used to store the operating system files
- Data Disks—used to store any other kind of data
As mentioned earlier in this paper, Windows Azure Infrastructure Services also supports a “temporary disk,” but you should avoid storing data on a temporary disk because the information on the temporary disk is not persistent across reboots of the virtual machine. In Windows Azure, the temporary disk is primarily used for the page file and it helps speed up the virtual machine boot process.
In Windows Azure, the main difference between a data disk and an OS disk relates to their caching policies. The default caching policy for an OS disk is read/write. When read/write activity takes place, it will first be performed on a caching disk. After a period of time, it will be written to permanent blob storage. The reason for this is that for the OS disk, which should contain only the core operating system support files, the reads and writes will be small. This makes local caching a more efficient mechanism than making the multiple and frequent small writes directly to permanent storage.
Note:
The OS Disk size limit at the time this was written was 127 GB. However, this might change in the future, so watch the support pages on the Windows Azure website for updates.
The default caching policy for Data Disks is “none,” which means that no caching is performed. Data is written directly to permanent storage. Unlike OS Disks, which are currently limited to 127 GB, Data Disks currently support up to 1 TB. If you need more storage for a disk, you can span up to 16 disks for up to 16 TB, which is available as part of the current Extra Large, A6, and A7 virtual machine disk offerings.
Note:
These are current maximum Data Disk sizes and numbers. This might change in the future. Please check the Windows Azure support pages for updates.
With all this in mind, consider where you want to place the Active Directory database (the DIT) and the SYSVOL folder. Should they go on a disk where caching could lead to a failure to write, or on a disk where Active Directory related information is immediately written to permanent storage? The latter is the preferred option.
The main reason for this is that write-behind disk caching invalidates some core assumptions made by Active Directory:
- Domain controllers assert forced unit access (FUA) and expect the I/O infrastructure to honor that assumption.
- FUA is intended to ensure that sensitive writes make it to permanent media (not temporary cache locations).
- Active Directory seeks to prevent (or at least reduce the chances of) encountering a USN bubble.
For more information related to Active Directory and FUA, see Things to consider when you host Active Directory domain controllers in virtual hosting environments.
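The risk the FUA requirement guards against can be shown with a toy disk model: a write-behind cache loses buffered writes on a crash, while a write-through (FUA-honoring) path does not. This is purely illustrative and is not how Azure storage is implemented.

```python
# Why FUA matters for the DIT and SYSVOL: with write-behind caching, a
# "completed" write can still be lost on a crash; with write-through,
# a completed write is already on durable media. Illustrative model only.

class Disk:
    def __init__(self, write_through):
        self.write_through = write_through
        self.cache, self.permanent = [], []

    def write(self, record):
        if self.write_through:
            self.permanent.append(record)   # straight to durable media
        else:
            self.cache.append(record)       # buffered: at risk until flushed

    def flush(self):
        self.permanent += self.cache
        self.cache = []

    def crash(self):
        self.cache = []                     # buffered writes are lost

cached = Disk(write_through=False)
direct = Disk(write_through=True)
for d in (cached, direct):
    d.write("password-change usn=1001")     # DC believes this is durable
cached.crash(); direct.crash()
print(len(cached.permanent), len(direct.permanent))  # 0 1
```

A lost-but-acknowledged write is exactly the kind of rollback that produces database inconsistencies and USN bubbles, which is why the data disk (no caching) is the recommended location.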
The following table describes some of the advantages and disadvantages of Azure Infrastructure Services disk types in the context of Active Directory domain controllers.
Windows Azure disk type | Advantages in domain-controller scenario | Disadvantages in domain-controller scenario
OS Disk | |
Data Disk | |
Temporary Disk | |
5.6.5.2 Read-Only Domain Controller Considerations
There are several options available to you for putting Active Directory in the Azure Infrastructure Services cloud:
- Full read/write domain controllers in a production domain
- Full read/write domain controllers in a trusting domain or forest
- Read-only domain controllers in the production domain
- Read-only domain controllers in a trusting domain or forest
- No domain controllers at all, and you use Active Directory Federation Services
In a hybrid cloud environment, you might consider the public cloud infrastructure service provider’s network as being similar to a branch office, or as an off-premises hosted data center. So it would make sense to take advantage of read-only domain controllers, because they were designed for a branch office deployment.
However, while a public cloud infrastructure service provider’s network may be treated as similar to a branch office, there are some significant differences between the branch office environment that was envisioned by the creators of the read-only domain controller role and the environment seen in a public cloud infrastructure service provider’s network. The main difference is that the branch office scenario is seen as a low-security environment, where the domain controller might not be in a physically secure location, which makes it vulnerable to theft or tampering. Because of this, the read-only domain controller was designed as a good alternative for branch offices, providing the following benefits:
- Faster authentication.
- Authentication even when the link between the branch office and main office goes down.
- Limited damage if a read-only domain controller is compromised.
The following table describes the advantages and disadvantages of deploying a read-only domain controller (RODC) in a public cloud infrastructure provider’s network, such as Windows Azure.
Advantages | Disadvantages
For more information on attribute filtering and credential caching, see RODC Filtered Attribute Set, Credential Caching, and the Authentication Process with an RODC.
5.6.5.3 Domain-Controller Locator Considerations
When putting Active Directory domain services in a public cloud infrastructure service provider’s network, you need to think about how to correctly define and connect Active Directory subnets and sites to the off-premises components—as the choices you make here will influence the cost of the overall solution.
Sites, site links, and subnets affect where authentication takes place and also the topology of domain controller replication. To begin with, here are some definitions:
- A collection of subnets defines a site.
- You connect the sites together using site links.
- You can then create replication policies.
When creating replication policies, consider the following:
- How frequently do you want replication to take place? If you put a read/write domain controller into Azure Virtual Machines and Virtual Networks, then inbound replication events are going to increase the cost because these are seen as egress traffic by Azure Infrastructure Services.
- What days of the week do you want replication to take place? Fewer days mean lower costs that are related to egress traffic.
- Consider creating a replication policy that is based on when you want them to take place, and not have them be event-driven, as this can also run up egress traffic costs.
One option is to define the Azure Virtual Network’s network ID (or any public cloud service provider’s network ID) as a subnet in Active Directory; machines on that subnet will then use the local domain controllers for authentication (assuming that they are available). This means that services that are situated in Azure Infrastructure Services won’t have to reach out to on-premises domain controllers for authentication services. This also reduces cost, because if the service in Azure Infrastructure Services had to authenticate using on-premises domain controllers, that would generate egress traffic, which you must pay for.
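The effect of defining the provider's network as an Active Directory subnet can be sketched as a longest-prefix match: a client's IP address is matched to the most specific defined subnet, which names the site whose domain controllers it should prefer. The subnets and site names below are made-up examples.

```python
# Sketch of subnet-to-site mapping: the most specific matching subnet
# determines the site, and therefore which DCs a client authenticates
# against. Subnet ranges and site names are illustrative assumptions.

import ipaddress

SUBNET_TO_SITE = {
    "10.0.0.0/8": "OnPremises-HQ",
    "10.4.0.0/16": "AzureVirtualNetwork",   # the Virtual Network's range
}

def locate_site(client_ip):
    """Longest-prefix match of client_ip against the defined AD subnets."""
    addr = ipaddress.ip_address(client_ip)
    matches = [ipaddress.ip_network(s) for s in SUBNET_TO_SITE
               if addr in ipaddress.ip_network(s)]
    if not matches:
        return None
    best = max(matches, key=lambda n: n.prefixlen)
    return SUBNET_TO_SITE[str(best)]

print(locate_site("10.4.1.20"))   # AzureVirtualNetwork
print(locate_site("10.1.1.20"))   # OnPremises-HQ
```

If the Virtual Network's range were not defined as a subnet, cloud-hosted clients would fall into the broader on-premises match and authenticate across the (billable) link.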
For more information on Active Directory sites, see Active Directory Sites.
Also consider what costs you want to set on the site links. For example, the Azure Infrastructure Services connection represents a much higher-cost link. You’ll also want to ensure that when the “next closest site” logic comes into play, the domain controllers in Azure Infrastructure Services are not considered to be the next closest (unless that’s what you intend, such as in the case of remote offices that use a domain controller in Azure Infrastructure Services as a backup).
For more information on this issue, see Enabling Clients to Locate the Next Closest Domain Controller.
Active Directory replication also supports compression. The more compressed the data is, the lower the egress costs will be.
For more information on Active Directory compression, see Active Directory Replication Traffic.
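A back-of-the-envelope estimate shows how compression and schedule interact with egress cost. The traffic volume, compression ratio, and per-gigabyte rate below are hypothetical placeholders used only to show the arithmetic; check current Windows Azure pricing for real rates.

```python
# Rough egress-cost estimate for outbound replication. All inputs are
# hypothetical assumptions, not actual Azure prices or measured traffic.

def monthly_egress_cost(gb_per_day, days_per_month, compression_ratio,
                        price_per_gb):
    """Billable bytes shrink roughly in proportion to the compression ratio."""
    billable_gb = gb_per_day * days_per_month * (1 - compression_ratio)
    return round(billable_gb * price_per_gb, 2)

# 2 GB/day raw replication, 30 days, 75% compression, $0.12/GB (example)
print(monthly_egress_cost(2.0, 30, 0.75, 0.12))  # 1.8
```

Reducing the number of replication days or batching changes (so that only final attribute values replicate) lowers `gb_per_day` directly, which is why schedule-driven rather than event-driven replication is suggested above.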
Finally, consider putting together your replication schedule based on anticipated latency issues. Remember that domain controllers replicate only the last state of a value, so slowing down replication saves cost if there's sufficient churn in your environment.
5.6.5.4 Domain, Forest, and Global Catalog Considerations
Domain and Forest Considerations
A read-only domain controller is not the only option for placing a domain controller into a public cloud infrastructure service provider’s network. Another viable option is to place full read/write domain controllers into the off-premises side of a hybrid cloud infrastructure.
When considering putting a full read/write domain controller on to the public cloud infrastructure service provider’s network, you’ll first want to ask yourself about their security model and operational principles. Azure Infrastructure Services is a public cloud offering, which means that you’re using a shared compute, networking, and storage infrastructure. In such an environment, isolation is a key operating principle, and the Azure team has ensured that isolation is enforced to the extent that placing a domain controller in Azure Infrastructure Services is a supported and secure deployment model.
For more information on Azure security, see Windows Azure Security Overview.
The next step is to consider what kind of domain/forest configuration you want to deploy. Some of the options are:
- Deploy domain controllers that are part of the same domain in the same forest.
- Deploy domain controllers that are part of a different domain in the same forest, and configure a one-way trust.
- Deploy domain controllers that are part of a different domain in a different forest, and configure a one-way trust.
The first option might represent the least secure option of the three, because if the domain controller in the cloud is compromised, the entire production directory services infrastructure would be affected. The second and third options can be considered incrementally more secure, because there is only a one-way trust, but the overhead of maintaining trusts might not fit organizational requirements.
The last option might be considered the most secure, but there is administrative overhead that you need to take into account, and not all deployment scenarios will support this kind of configuration. You need to consider these issues before deciding on a domain and forest model.
Given the Azure security model, the consensus is that the first option is the preferred option when you weigh the options for application compatibility, management overhead, and security.
Another important consideration is regulatory and compliance issues. A great deal of personally identifiable information (PII) can be stored in these read/write domain controllers, and there may be regulatory issues that you need to consider. There are also cost considerations. You’ll end up generating more egress traffic (depending on authentication load), and there will also be egress replication traffic that you’ll need to factor into the cost equation.
For detailed information about Active Directory security considerations, see Best Practice Guide for Securing Active Directory Installations.
The following table describes some of the advantages and disadvantages of each of the domain and forest models.
Domain/forest model | Advantages | Disadvantages
Deploy domain controllers that are part of the same domain in the same forest. | |
Deploy domain controllers that are part of a different domain in the same forest, and configure a one-way trust. | |
Deploy domain controllers that are part of a different domain in a different forest, and configure a one-way trust. | |
Global Catalog Considerations
When designing a hybrid cloud infrastructure, you need to consider whether you want to put a Global Catalog domain controller into the off-premises component of your infrastructure. A Global Catalog server is a domain controller that keeps information about all objects in its domain and partial information about objects in other domains.
To learn more about Global Catalog servers, see What is the Global Catalog.
A Global Catalog enables an application to ask a single domain controller one question that might refer to multiple domains, even though that domain controller is not a member of the domain for which the question is being asked. A Global Catalog server contains a partial copy of the rest of the forest, and this information is a defined attribute set that is filtered to a Global Catalog server. This is also known as the Partial Attribute Set or PAS.
For more information on the Partial Attribute Set, see How the Global Catalog Works.
There are some reasons why you might not want your domain controller in the Azure Infrastructure Services to be a Global Catalog server. These reasons include:
- Size—inbound replication will include information you don’t necessarily need. Inbound replication itself is free, so there is no bandwidth cost in that regard; however, depending on the size of your organization, you might need a larger VM instance with the requisite storage, which will cost more.
- Global catalog servers replicate Partial Attribute Set content between themselves—which means that this information will be passed to Global Catalog servers that you put in the Azure Infrastructure Services cloud.
- There’s a chance that the Global Catalog in the Azure Virtual Machines and Virtual Networks cloud will become a preferential source for another Global Catalog somewhere in the world. This is not optimal, because it results in sending updates from the cloud for changes in a domain that you don’t really care about, which requires a lot of outbound bandwidth that you have to pay for.
Those are some reasons why you wouldn’t want to put a Global Catalog in the cloud. With those reasons in mind, when would you put a Global Catalog in the cloud? One answer would be, when you have a single-domain forest.
What should you do if you have two domains in the same forest? For example, suppose that one domain is on premises and the second domain is in the Azure Infrastructure Services cloud. The answer is to make the domain controllers in the cloud into Global Catalogs. The reason is that authentication (as the user logs on) requires access to a group type in Active Directory called a Universal Group, and Universal Groups require a Global Catalog in order to be populated. This means that a Global Catalog is required in all authentication scenarios where you have more than a single domain.
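This dependency can be sketched in a few lines: a user's logon token must include Universal Group memberships, which live in the forest-wide partial attribute set that only a Global Catalog holds, so a plain domain controller cannot complete a multi-domain logon on its own. The user, group, and domain names below are made-up examples.

```python
# Toy illustration of the Universal Group / Global Catalog dependency.
# A plain DC knows only its own domain's data; Universal Group
# expansion requires forest-wide data held only by a Global Catalog.

FOREST_UNIVERSAL_GROUPS = {          # forest-wide data, held by GCs only
    "alice@cloud.contoso.com": ["All-Employees", "VPN-Users"],
}
DOMAIN_LOCAL_DATA = {                # data any DC in the domain holds
    "alice@cloud.contoso.com": ["Cloud-Admins"],
}

def build_logon_token(user, gc_available):
    groups = list(DOMAIN_LOCAL_DATA.get(user, []))
    if not gc_available:
        raise RuntimeError("no Global Catalog reachable: cannot expand "
                           "Universal Groups, logon fails")
    groups += FOREST_UNIVERSAL_GROUPS.get(user, [])
    return groups

print(build_logon_token("alice@cloud.contoso.com", gc_available=True))
```

If the only reachable Global Catalog is on premises, every cloud logon pays the round trip (and the egress bill) that the next paragraph describes.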
Also, consider whether you want the domain controllers in Azure Infrastructure Services to require a round trip to the on-premises network in order to access a Global Catalog at every single authentication attempt. This is a tradeoff, and the decision depends on what the replication requirements would be versus how many authentication attempts are made. You probably don’t think so much about these issues when Active Directory is on premises only, but when you design a hybrid cloud infrastructure in which egress traffic is billable, your design considerations must take this factor into account.
Workloads in the cloud that authenticate against a domain controller in the cloud will still generate outbound authentication traffic if you don’t have a Global Catalog in the cloud. It’s difficult to provide hard and fast guidance because this scenario is fairly new, and you’re likely going to have to figure out the relative costs of the different options (authentication traffic versus replication traffic) or wait until we have something that is based on our experiences that we might be able to share with you in the future.
What we do know is that Global Catalogs are used to expand Universal Group membership, which is likely to make costs even less predictable, because a Global Catalog hosts a partial replica of every domain in the forest. Something that might complicate matters further, or at least require more study, is the effect of creating an Internet-facing service that authenticates with Active Directory.
One option is to take advantage of Universal Group Membership Caching, but this solution has issues of its own that you will want to consider.
For more information on Universal Group Membership Caching, see Enabling Universal Group Caching for a Site.
Finally, most replication for the Global Catalogs in the Azure Infrastructure Services cloud is going to be inbound, so cost is not an issue there. Outbound replication is possible, but this can be avoided by configuring the right site links.
The following table summarizes some of the advantages and disadvantages of putting a Global Catalog server in the public cloud infrastructure service provider’s network.
Advantages of a Global Catalog in the cloud | Disadvantages of a Global Catalog in the cloud |
Cloud-hosted workloads can authenticate locally, without a round trip to an on-premises Global Catalog at every authentication attempt. | The Global Catalog can become a preferred replication source for another Global Catalog, generating billable outbound replication traffic for domains you don’t care about. |
Required for authentication in forests with more than one domain (Universal Group membership expansion). | Hosts a partial replica of every domain in the forest, which makes egress costs less predictable. |
Most replication to a cloud-hosted Global Catalog is inbound, which is not billable. | Outbound replication is possible unless site links are configured to avoid it. |
5.6.5.5 Active Directory Name Resolution and Geo-Distribution Considerations
You will need to consider both Active Directory name resolution and geo-distribution issues when designing your hybrid cloud infrastructure.
Active Directory Name Resolution Considerations
As mentioned earlier, Azure Virtual Networks has its own DNS service that it enables when you put new virtual machines in a Virtual Network. This is very basic name resolution that allows machines in the same cloud service to resolve each other’s names. While this is useful if all your machines and all the services running on them depend only on each other, it’s not enough to support an Active Directory environment. If your design includes any level of support for Active Directory authentication for services that run on the Azure Virtual Network, you will need to deploy a name-resolution infrastructure that goes beyond what the Azure Virtual Network DNS service provides. The reason is that the Azure Virtual Network DNS service does not meet the complex name-resolution requirements of Active Directory domains (dynamic DNS registration, SRV record support, and others).
Domain controllers and their clients must be able to register and resolve resources within their own domains and forest, as well as across trusts. And because static IP addressing isn’t supported in Azure Virtual Networks, the DNS server settings must be configured within the Virtual Network definition.
There are several ways to approach the name resolution requirements for Active Directory in a hybrid cloud infrastructure. The following is one suggested approach:
- Create an Azure Virtual Network.
- Use DHCP to assign IP addresses to the domain controllers you plan to put in the Virtual Network.
- Install and configure Windows Server DNS on the domain controller(s) you’ve placed in Windows Azure.
- Configure the domain controllers and the domain members’ DNS client resolver settings so that:
- The on-premises DNS server is set as the primary preferred DNS server.
- There is an alternate DNS server that has the IP address of a domain controller that is also a DNS server on the Azure Virtual Network.
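The resolver behavior these steps set up can be sketched as follows. This is a deliberate simplification of the Windows DNS client (which has its own retry and server-selection logic), and all server addresses and names here are hypothetical:

```python
# Sketch of preferred/alternate DNS resolver fallback: the client tries the
# preferred (on-premises) DNS server first and falls back to the alternate
# (cloud-hosted) server only when the preferred one is unreachable.
# This simplifies real Windows DNS client behavior; addresses are hypothetical.

def resolve(name, servers, query_fn):
    """Try each configured DNS server in order; return the first answer."""
    for server in servers:
        try:
            return query_fn(server, name)
        except TimeoutError:
            continue  # this server is unreachable; try the next one
    raise RuntimeError(f"no configured DNS server could resolve {name}")

# Hypothetical resolver configuration matching the steps above.
dns_servers = [
    "10.0.0.4",    # preferred: on-premises DNS server
    "172.16.0.4",  # alternate: DC/DNS server on the Azure Virtual Network
]

# Simulated query function: the site-to-site link to the on-premises server
# is down, so only the cloud-hosted DNS server answers.
def simulated_query(server, name):
    if server == "10.0.0.4":
        raise TimeoutError("site-to-site link down")
    return "172.16.0.10"

print(resolve("dc01.corp.example.com", dns_servers, simulated_query))
# falls back to the alternate server and prints 172.16.0.10
```

The point of the ordering is that normal traffic prefers the on-premises server, while machines in the Virtual Network can still resolve Active Directory resources if the site-to-site link is unavailable.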
Geo-Distribution Considerations
Your hybrid cloud infrastructure design might include geo-distributed domain controllers hosted on Azure Virtual Networks. Azure Infrastructure Services can be an attractive option for geo-distributing domain controllers because it can provide:
- Off-site fault tolerance
- Lower latency for branch offices where you don’t want to house the domain controller on premises
However, keep in mind that Virtual Networks are isolated from one another. If you want different Virtual Networks to communicate, you must establish site-to-site links with each of them and then have traffic loop back through the corporate network to reach the other Azure Virtual Networks. This means that all replication traffic will route through your on-premises domain controllers, which will generate some egress traffic. Consider piloting such a configuration to see what your egress numbers look like before deploying a full-blown geo-distributed architecture.
5.6.5.6 Active Directory Federation Service (ADFS) Considerations
Another Active Directory function that might be appropriate to consider when constructing a hybrid cloud infrastructure is Active Directory Federation Services, or ADFS. While the scenarios might not be as broad as those for Active Directory Domain Services, there are some scenarios where you will want to consider this option.
The three primary advantages of deploying ADFS in a public cloud infrastructure services network are:
- It enables you to provide high availability for ADFS by using the native server load-balancing capabilities of the public cloud infrastructure services network (if the provider makes them available, as Azure Infrastructure Services does).
- It enables you to more simply deploy a set of federated applications to employees and partners without the complexities and requirements inherent in deploying ADFS in a perimeter network on your corporate network.
- You can deploy corporate domain controllers alongside ADFS in a public cloud infrastructure service provider’s network, which provides additional guarantees of service availability in the event of unforeseen failures such as natural disasters. This is especially true for online services such as Microsoft Office 365, which can authenticate users directly from their on-premises corporate Active Directory.
Deploying Windows Server ADFS in a public cloud infrastructure service provider’s network is very similar to doing so on premises; however, differences do exist. Any Windows Server ADFS requirement to connect back to the on-premises network depends upon the relative placement of the roles. If Windows Server ADFS is running on a public cloud infrastructure service provider’s network and its domain controllers are deployed only on-premises, then the off-premises side of the solution must connect the virtual machines back to the on-premises network by using the link that connects the public and private sides of the hybrid cloud solution.
Important issues to consider when designing a hybrid cloud infrastructure to support ADFS include:
- If you deploy a Windows Server ADFS proxy server on a public cloud infrastructure services network, connectivity to the ADFS federation servers is needed. If they are on premises, you will need a connection between the on-premises and off-premises networks, such as a site-to-site VPN or dedicated WAN link.
- If you deploy a Windows Server ADFS federation server on the public cloud infrastructure services network, then connectivity to Windows Server Active Directory domain controllers, Attribute Stores, and Configuration databases is required.
- If you deploy Windows Server ADFS (or any other workload) on a virtual machine on the public cloud infrastructure service provider’s network so that it can be reached directly from the Internet, you must configure the cloud service to expose public-facing ports that map to the ADFS HTTP (80 by default) and HTTPS (443 by default) ports.
- If you deploy and configure Windows Server ADFS on a virtual machine in a public cloud infrastructure service provider’s network so that it can be reached directly from the Internet, it is also advisable to treat the cluster as though it were deployed on an on-premises perimeter network. This includes additional considerations such as server hardening or deploying Windows Server ADFS proxy instead of the Windows Server ADFS federation server role itself.
- Charges may be applied to all virtual-machine egress traffic, such as when the virtual machine is placed in Azure Infrastructure Services. If cost is the driving factor, it is advisable to deploy Windows Server ADFS proxy on Windows Azure, leaving the Windows Server ADFS federation servers on premises. If Windows Server ADFS federation is deployed on Windows Azure virtual machines instead of Windows Server ADFS proxy, it could create unnecessary costs to authenticate intranet users.
- If you choose Azure Infrastructure Services for your public cloud infrastructure service provider, we recommend that you use Windows Azure native server load-balancing capabilities for high availability of Windows Server ADFS servers in your deployment. Windows Azure software load balancing is supported only on virtual IP addresses (VIPs), not on internal dynamic IP addresses (DIPs). The load balancer provides probes that are used to determine the health of the virtual machines within the cloud service. In the case of Windows Azure Virtual Machines, you configure the type of probe you would like to use, such as TCP, UDP, or ICMP. For simplicity, you might use a custom TCP probe, which requires only that a TCP connection (a completed SYN/SYN-ACK handshake) be successfully established to determine virtual machine health. You can configure the custom probe to use any TCP port that is actively listening on your virtual machines.
Note:
Machines that need to expose the same set of ports directly to the Internet (such as port 80 and 443) cannot share the same cloud service. Therefore, we recommend that you create a dedicated cloud service for your Windows Server ADFS servers to avoid potential overlaps between port requirements for an application and for Windows Server Active Directory.
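The essence of a custom TCP probe, as described above, is simply "the machine is healthy if a TCP connection to the probe port succeeds." The following sketch illustrates that check; it is an illustration of the concept only, not the load balancer's actual implementation, and it stands in a local listening socket for an ADFS server:

```python
# Sketch of a custom TCP health probe: a virtual machine is considered
# healthy if a TCP connection to the probe port can be established (the
# three-way handshake completes). Illustrative only; the real Azure load
# balancer performs this check itself on the port you configure.

import socket

def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demonstration against a local listener standing in for an ADFS server.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))      # let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

print(tcp_probe("127.0.0.1", port))  # the port is listening: healthy

listener.close()
print(tcp_probe("127.0.0.1", port))  # nothing listening now: unhealthy
```

Because the probe only verifies that the port accepts connections, pointing it at a port served directly by the application (rather than a generic port) gives a more meaningful health signal.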
For more information on Active Directory Federation Services and Active Directory Domain Services in Azure Infrastructure Services, see Guidelines for Deploying Windows Server Active Directory on Windows Azure Virtual Machines.
5.6.5.7 Windows Azure Active Directory Considerations
This document does not discuss the use of Windows Azure Active Directory, which is a REST-based service that provides identity management and access control capabilities for cloud applications. Windows Azure Active Directory and Windows Server Active Directory Domain Services are designed to work together to provide an identity and access management solution for today’s hybrid cloud environments and modern cloud-based applications. This paper focuses on the core infrastructure requirements for a hybrid cloud infrastructure that does not include cloud-based PaaS and SaaS applications, which are the key scenario for Windows Azure Active Directory.
To help you understand the differences and relationships between Windows Server AD DS and Windows Azure AD, consider the following:
- You might run Windows Server AD DS in the cloud on Azure Infrastructure Services when you’re using Windows Azure to extend your on-premises datacenter into the cloud.
- You might use Windows Azure Active Directory to give your users single sign-on to Software-as-a-Service (SaaS) applications. Microsoft’s Office 365 uses this technology, for example, and applications running on Windows Azure or other cloud platforms can also use it.
- You might use Windows Azure Active Directory (specifically, its Access Control Service) to let users sign in to applications hosted in the cloud or on-premises by using identities from Facebook, Google, Microsoft, and other identity providers.
For more information about Windows Azure Active Directory, please see Identity.
5.6.6 Backup Service and Disaster Recovery
When designing your hybrid cloud infrastructure, you will want to consider backup and disaster recovery options.
5.6.6.1 Backup Services
Consider asking your public cloud infrastructure service provider whether it offers backup services that you can use as an off-site backup for on-premises data. This is a useful option because, in the event of a disaster at the primary datacenter, a backup copy of the information will exist on the public cloud service provider’s network.
Windows Azure offers a backup service that you can use to back up on-premises data. The service can help you protect important server data offsite with automated backups to Windows Azure, where the backups are available for data restoration.
You can manage cloud backups from the backup tools in Windows Server 2012, Windows Server 2012 Essentials, or System Center 2012 Data Protection Manager. These tools provide similar experiences when configuring, monitoring, and recovering backups whether to local disk or Windows Azure storage. After data is backed up to Windows Azure, authorized users can recover backups to any server.
Windows Azure backup also supports incremental backups, where only changes to files are transferred to the cloud. This helps ensure efficient use of storage, reduced bandwidth consumption, and point-in-time recovery of multiple versions of the data. Configurable data retention policies, data compression and data transfer throttling also offer you added flexibility and help boost efficiency. Backups are stored in Windows Azure and are "offsite," which reduces the need to secure and protect onsite backup media.
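The incremental-backup idea described above can be sketched as follows. This illustrates the concept only; the actual Windows Azure Backup agent uses its own change tracking rather than the per-file hash catalog shown here, and the file names are hypothetical:

```python
# Sketch of incremental backup: keep a catalog of content hashes from the
# previous backup and transfer only files whose content has changed.
# Conceptual illustration only; not how the Azure Backup agent works internally.

import hashlib

def incremental_backup(files, catalog):
    """Return (changed_files, updated_catalog).
    `files` maps path -> bytes content; `catalog` maps path -> sha256 hex."""
    changed = {}
    new_catalog = {}
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        new_catalog[path] = digest
        if catalog.get(path) != digest:
            changed[path] = content  # only this file needs to transfer
    return changed, new_catalog

# First backup: no catalog yet, so everything transfers.
snapshot1 = {"a.txt": b"hello", "b.txt": b"world"}
changed, catalog = incremental_backup(snapshot1, {})
print(sorted(changed))   # ['a.txt', 'b.txt']

# Second backup: only the modified file transfers.
snapshot2 = {"a.txt": b"hello", "b.txt": b"world!"}
changed, catalog = incremental_backup(snapshot2, catalog)
print(sorted(changed))   # ['b.txt']
```

Keeping the per-backup catalogs also gives you the point-in-time recovery property mentioned above: each catalog describes exactly which version of each file belongs to which backup.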
For more information on Windows Azure Backup, please see Windows Azure Backup Overview.
5.6.6.2 Disaster Recovery
Another important option to consider is the role a public cloud infrastructure service provider can play in disaster recovery and business continuity. Some public cloud infrastructure service providers will make various disaster recovery options available to you.
For example, Windows Azure currently offers Recovery Services. If you are using Hyper-V Recovery Manager, you create Hyper-V Recovery Manager vaults to orchestrate failover and recovery for virtual machines managed by System Center 2012 Virtual Machine Manager (VMM). You configure and store information about the VMM servers, clouds, and virtual machines in a source location that are protected by Windows Azure Recovery Services, and about the VMM servers, clouds, and virtual machines in a target location that are used for failover and recovery. You can create recovery plans that specify the order in which virtual machines fail over, and customize these plans to run additional scripts or manual actions.
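The ordered-failover idea behind a recovery plan can be sketched as follows. The group numbers, tier layout, and virtual machine names are hypothetical, and a real recovery plan would run scripts or wait on manual actions between groups:

```python
# Sketch of a recovery plan: virtual machines fail over group by group, so
# that dependencies (identity before data, data before application tier)
# come up in the right order. Group numbers and VM names are hypothetical.

def run_recovery_plan(plan, failover_fn):
    """Fail over VMs group by group; a group starts only after the
    previous group has finished."""
    recovered = []
    for group in sorted(plan):          # lower group numbers fail over first
        for vm in plan[group]:
            failover_fn(vm)
            recovered.append(vm)
    return recovered

plan = {
    1: ["dc01"],              # identity tier first (domain controller)
    2: ["sql01"],             # then the data tier
    3: ["web01", "web02"],    # then the application tier
}

order = run_recovery_plan(plan, failover_fn=lambda vm: print(f"failing over {vm}"))
print(order)  # ['dc01', 'sql01', 'web01', 'web02']
```

Expressing the dependencies explicitly in the plan is what makes orchestrated failover repeatable; an ad-hoc manual failover in the wrong order can leave, for example, application servers running before the domain controller they authenticate against is available.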
For more information about Windows Azure Recovery services, please see Recovery Services Overview.
6.0 Summary
After identifying the requirements and constraints in your environment and then evaluating each of the design considerations that are detailed within this document, you can create a hybrid cloud infrastructure design that best meets your unique needs. Then, you can implement it in a test environment, test it, and deploy it into production.
To complement this document, Microsoft has created reference implementation (RI) guidance sets for hybrid cloud infrastructure solutions that are designed for specific audiences. Each RI guidance set includes the following documents:
- Scenario Definition: For a particular domain, different audiences generally have different requirements and constraints. This document describes a fictitious example organization that is implementing a hybrid cloud infrastructure solution. It provides answers to the questions in the Envisioning The Hybrid Cloud Solution section of this document—answers that relate to the fictitious organization. Many organizations within this audience type will find that they have requirements and constraints similar to those of the fictitious organization. This document is most helpful to people responsible for designing hybrid cloud infrastructure solutions at organizations similar to those that are defined by the RI’s audience type.
- Design: This document details which specific products, technologies, and configuration options were selected, out of the hundreds of individual available options, to meet the unique requirements for the example organization that is defined in the Scenario Definition document. This document also explains the rationale for why specific design decisions were made. For organizations that have requirements and constraints similar to the example organization, the lab-tested design and rationale in this document can help decrease both the implementation time and the risk of implementing a custom hybrid cloud solution. This document is most helpful to those responsible for designing a hybrid cloud infrastructure or implementing solutions within enterprise IT organizations, because it details an example design, and the rationale for the design.
Note:
The Design document within a Reference Implementation (RI) guidance set uses one combination of the almost infinite number of combinations of design and configuration options that are presented in this Hybrid Cloud Infrastructure Design Considerations article. The specific design options from this Hybrid Cloud Infrastructure Design Considerations document that are chosen in an RI Design document are based on the unique requirements from the Scenario Definition document in the RI guidance set. As a result, many people who read this Hybrid Cloud Infrastructure Design Considerations document will find it helpful to also read the RI guidance set for this domain that is targeted at an audience type similar to their own. The RI guidance set shows which design options from this document were chosen for the example organization, and helps the reader to better understand why those options were chosen. Other people will decide that reading an RI guidance set is unnecessary for them, and that this Hybrid Cloud Infrastructure Design Considerations document provides all the information they need to create their own custom design.
Although the Design document in an RI guidance set is related to this Hybrid Cloud Infrastructure Design Considerations document, there are no dependencies between the documents.
- Implementation: This document provides a step-by-step approach to implementing the design in your environment. While this document lists implementation steps to install and configure the solution, the steps are written at a level that assumes you already have some familiarity with the technologies that are used in the design that is detailed in the Design document. In cases where new technologies are used, more detailed implementation steps are included in the document. To review lower-level implementation steps than those that are provided in this document, you’re encouraged to read the information found at the hyperlinks that are included throughout this document. This document is most helpful to those responsible for implementing hybrid cloud infrastructure solutions within types of organizations that are identified by the audience type for the RI.
7.0 Technologies Discussed in this Article
- Windows Server 2012 DNS services
- Active Directory Domain Services
- Windows Azure Active Directory
- Windows Azure Virtual Machines
- Windows Azure Cloud Services
- Windows Azure Storage
- Windows Azure Recovery Services
- Windows Azure Virtual Network
8.0 Authors and Reviewers
Authors:
- Thomas W. Shinder - Microsoft
- Jim Dial - Microsoft
Reviewers:
- Yuri Diogenes - Microsoft
- John Dawson - Microsoft
- Cheryl McGuire - Microsoft
- Kathy Davies - Microsoft
- John Morello - Microsoft
- Jamal Malik - Microsoft
This article is maintained by the Microsoft DDEC Solutions Team.
9.0 Change Log
Version | Date | Change Description |
1.0 | 7/1/2013 | Initial posting and editing complete. |
1.1 | 8/22/2013 | New hybrid cloud principles and patterns were added. Fixed table entries in multiple tables so that disadvantages are all moved to the disadvantages columns. |