Resource Pooling Related Private Cloud Security Operations Challenges

Article
2024-01-17

As an operator of a private cloud solution:

How can I detect any breaches in isolation between tenants' resources and how do I respond to the situation?
How do I design my disaster recovery procedures to ensure isolation?
What impact does the private cloud have on my plans for maintaining confidentiality, integrity, and availability of services and data?

Security Functionality

The operational procedures associated with managing a private cloud should include the following security functionality in relation to the resource pooling attribute of the private cloud:

Resources that belong to different tenants may co-exist on the same physical device and one tenant's resources may be spread across multiple physical devices. You must ensure that authentication and authorization rules protect one tenant's resources from other tenants.
Resources that belong to a tenant may move between physical devices. You must ensure that authentication and authorization rules continue to protect the tenant's resources.
Resources that belonged to one tenant may be recycled for use by another tenant. You must ensure that this does not accidentally reveal data belonging to a previous tenant.
You must monitor for attempts to gain access to another tenant's resources.
You must maintain a full audit trail by logging all access to resources in the cloud where possible (noting that you may not have access to consumer virtual machines in IaaS scenarios).
You must ensure that the resource pooling works efficiently to maintain the overall availability of the services hosted in the private cloud.
You must ensure that any data protection functionality is not compromised by resource pooling.

The following sections describe in more detail how to provide this functionality in the private cloud.

Infrastructure Security

In the private cloud, the monitoring that supports the event management and incident management processes must include monitoring for attacks on the infrastructure launched from the virtual machines hosted in the platform layer. Such attacks may be launched in an attempt to cause damage to the infrastructure or in an attempt to gain access to other services that may be hosted on the same physical device. An automated incident management response can then shut down the virtual machine that originated the attack, and notify operators so that they can investigate the problem. An alternate approach would be to reallocate resources temporarily to keep the service running while attempting to fix the problem.

Note:
This document is part of a collection of documents that comprise the Reference Architecture for Private Cloud document set. The Solution for Private Cloud is a community collaboration project.

Your operational procedures should ensure that host operating systems and software have security patches and updates applied in a timely manner to help mitigate the threat of any attack on the infrastructure from the hosted virtual environments (and elsewhere). These security updates should also include updates to computer BIOS, switch firmware, and virtualization environments.

In the private cloud, tenant applications and services can be hosted on any physical host device in the cloud, and load balancing mechanisms can dynamically move applications to other servers. Any operational controls over physical access to the data center for operators must assume that any host server could hold the most sensitive or business critical data or application.

You should ensure that you have operational procedures in place for the secure disposal of all physical hardware that may still contain data. In a private cloud, it will be difficult to track what data may have been stored on a device, so all hardware must be subject to the same rigorous disposal procedures. Although virtual machine images and virtual hard disks may reside in SAN storage, the servers that run the virtual machines may still cache information on local storage, or in some other form of persistent or volatile memory.

Platform Security

Operational processes and procedures (such as those that relate to service continuity, availability management, and incident management) must preserve and respect the authentication and authorization rules defined to control access to virtual cloud resources and to hosted applications and services.

For example, if access to an application's data in a virtual machine is not permitted for operators, then operators should not be able to access archived data from the same application. Similarly, access controls must be preserved if a tenant application moves to a different host or even to a different host in a different data center.

Although the infrastructure elements of the private cloud (such as the hypervisor) should ensure that virtual environments are isolated from each other, resource pooling opens up additional threats to the confidentiality, integrity and availability of data in virtual environments in the private cloud.

For example, an attacker with access to an application owned and managed by a tenant could exploit a weakness in the isolation provided by the hypervisor or virtual network infrastructure to gain access to another virtual machine on the same host. Therefore, the virtual machines in the private cloud must take steps to protect themselves, and operational procedures should ensure that the protection continues to be effective:

Automated platform provisioning processes should apply baseline security settings to the platform or virtualized guest operating system. These baseline settings should include any firewall configuration settings, monitoring behavior, and IPsec configuration. When you create baseline virtual machine images, you should also ensure that they do not contain any sensitive data (for example, keys or certificates) that could be used to gain access to other virtual environments created from the same baseline.
Where consumers do not want the cloud service provider (CSP) to have visibility into application data or behavior, the SLA must explicitly define the processes that tenants must carry out to protect their applications.
If your SLA includes security monitoring for tenants, then your logging and monitoring should record any changes made to the baseline security configuration. For example, you should record any change to the host-based firewall port rules so that you can determine which ports were open in the virtual environment at any point in its lifecycle while you are investigating a problem.
The automated provisioning process must also ensure that resources are cleaned to prevent data accidentally leaking from one tenant to another as a consequence of the resource pooling behavior reallocating a resource to another tenant. For example, when a virtual hard disk is deleted, no data should be left on the underlying physical storage that could become available to another tenant when a new virtual hard disk is initialized.
Guest operating systems should have security patches and updates applied in a timely manner to help mitigate the threat of any attack on the platform and its resources. The patching and updating procedures should ensure that the base images from which new virtual environments are created are kept up to date, and both running and dormant virtual guest operating systems should be patched and updated. The update arrangements for virtual machines should be recorded in the SLA so that, in IaaS environments, the CSP and the tenant understand the point at which responsibility for a virtual machine is handed over to the tenant. Note that with PaaS and SaaS environments, the tenant will not have responsibility for this security monitoring.
Regular anti-malware scanning can also help to protect the virtualized guest environments. Again, responsibility for these scans may lie either with the CSP or the tenant, depending on how security permissions are partitioned within the SLA. If the CSP has responsibility then you should plan carefully when these scans are scheduled because running scans on multiple virtual machines on the same physical machine simultaneously will have a major impact on the performance of all the guests hosted on that physical machine.
Vendors take different approaches to malware scanning in virtual environments. For example, Microsoft Forefront Endpoint Protection uses an agent installed in the virtual machine to perform the scan, whereas VMware vShield Endpoint enables the scan to be performed from a separate virtual appliance from a security vendor without installing an agent in the virtual machine.
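
The scheduling concern for CSP-managed scans can be illustrated with a minimal Python sketch. It assumes a simple mapping from virtual machine name to physical host and a fixed scan duration; the function name stagger_scans and both parameters are hypothetical and do not come from any product API. The sketch gives each virtual machine on the same host a different start time so that scans on one physical machine never overlap.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def stagger_scans(vm_hosts, window_start, scan_minutes=30):
    """Assign each VM a scan start time so that no two VMs on the
    same physical host are scanned at the same time.

    vm_hosts maps a VM name to the name of the host it runs on.
    scan_minutes is an assumed upper bound on how long one scan takes.
    """
    next_slot = defaultdict(int)  # per-host count of slots already handed out
    schedule = {}
    for vm in sorted(vm_hosts):  # sorted so the result is deterministic
        host = vm_hosts[vm]
        offset = timedelta(minutes=scan_minutes * next_slot[host])
        schedule[vm] = window_start + offset
        next_slot[host] += 1
    return schedule
```

Because load balancing can move a virtual machine to another host after the schedule is computed, a plan like this would need to be recalculated from current placement data shortly before the scan window opens.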

The allocation of pooled resources to tenants is handled automatically by the private cloud infrastructure and may be adjusted dynamically as part of the private cloud's load balancing function. Problem management often requires detailed information about the state of the environment in order to carry out root cause analysis and understand the consequences. Your logging solution must enable you to track which tenant applications were deployed on which physical servers at a particular time. This information will help you to track and contain any cross virtual machine attack or host-based attack that might have compromised a tenant's data.
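
As an illustration of the kind of query such a log should support, the following Python sketch keeps one record per placement interval and answers the question "which tenants had a virtual machine on this host at this time?". The Placement record and the tenants_on_host function are hypothetical names chosen for this example; a real deployment would read the same information from its own placement or audit log.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Placement:
    tenant: str
    vm: str
    host: str
    start: datetime
    end: Optional[datetime] = None  # None means the VM is still on this host


def tenants_on_host(records, host, at):
    """Return the set of tenants that had a VM on `host` at time `at`."""
    return {
        r.tenant
        for r in records
        if r.host == host and r.start <= at and (r.end is None or at < r.end)
    }
```

If a host-based attack is discovered on a given server, a query of this shape identifies every tenant that shared the server during the suspect period and therefore needs to be notified.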

One of the specific goals of monitoring in the private cloud should be to identify attempts to gain unauthorized access to a tenant's data. The attack could be:

Directly from one virtual machine to another
Indirectly from one virtual machine to another through the virtualization layer
On the shared virtual storage infrastructure (for example in a SAN)
Directly from the host environment
From an external source

Alternatively, the attack might be a denial-of-service attack that attempts to over-allocate new resources and empty the shared pool of resources.
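
One common mitigation for pool exhaustion is a per-tenant quota that is checked before any allocation is granted. The Python sketch below is an assumption about how such a check might look, not a description of any particular product; the ResourcePool class, its limits, and the use of CPU cores as the pooled resource are all hypothetical.

```python
class QuotaExceeded(Exception):
    """Raised when a request would exceed a tenant quota or empty the pool."""


class ResourcePool:
    def __init__(self, total_cores, per_tenant_limit):
        self.total_cores = total_cores
        self.per_tenant_limit = per_tenant_limit
        self.allocated = {}  # tenant name -> cores currently held

    def allocate(self, tenant, cores):
        """Grant `cores` to `tenant` or raise QuotaExceeded; return the new total held."""
        held = self.allocated.get(tenant, 0)
        if held + cores > self.per_tenant_limit:
            raise QuotaExceeded(f"{tenant} would exceed {self.per_tenant_limit} cores")
        if sum(self.allocated.values()) + cores > self.total_cores:
            raise QuotaExceeded("shared pool exhausted")
        self.allocated[tenant] = held + cores
        return self.allocated[tenant]
```

Rejected requests are themselves worth logging, because a burst of them from one tenant may indicate an attempt to exhaust the pool.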

Monitoring should attempt to detect such attacks before they succeed. Automated incident management processes should trigger a response to contain the attack and to notify the appropriate operations staff and the relevant tenants.

For example, if an attack is detected that originates from another virtual machine in the private cloud, the automated response should shut down that virtual machine, notify an operator, and notify the owner of that virtual machine. You should also ensure that sufficient log information is collected to be able to understand what data might have been compromised should such an attack only be detected after it succeeded. Identifying what constitutes sufficient logging information is a major design process in itself, as you do not want the attack to compromise the cloud environment yet you also need enough information to help counter any repeated attacks.
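
The response sequence in this example (contain, record, notify) can be sketched in Python as follows. The IncidentResponder class is hypothetical: it records its actions in memory, where a real implementation would call the virtualization platform's management interface and a notification service.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentResponder:
    owners: dict                                   # VM name -> owning tenant
    stopped: set = field(default_factory=set)      # VMs shut down so far
    evidence: list = field(default_factory=list)   # details kept for investigation
    notifications: list = field(default_factory=list)

    def handle_attack(self, source_vm, detail):
        """Contain an attack that originates from a VM inside the cloud."""
        # 1. Contain: stand-in for a call to the hypervisor management API.
        self.stopped.add(source_vm)
        # 2. Record what was observed so the incident can be analyzed later.
        self.evidence.append((source_vm, detail))
        # 3. Notify an operator and the owner of the offending VM.
        owner = self.owners.get(source_vm, "unknown")
        for recipient in ("operator", owner):
            self.notifications.append((recipient, source_vm, detail))
        return owner
```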

In certain cases, you might leave a virtual machine running as a honeypot to attempt to identify what the attacker is doing. Your cloud environment may also include dedicated honeypot virtual machines specifically to trap and track attacks.

Rapid, automated incident responses are necessary in case an attack manages to spread to multiple virtualized environments in the private cloud, compromising the isolation of other services and applications. However, you should be aware that a false-positive detection of an attack, in combination with an automated incident response, could shut down a number of tenant services unnecessarily.
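
One way to limit the damage from false positives, offered here as an assumption rather than a recommendation from this document set, is to cap the number of automated shutdowns within a time window and escalate further detections to an operator. The ShutdownBudget class in this Python sketch is hypothetical, and the cap trades some containment speed for protection against a misfiring detector.

```python
from collections import deque


class ShutdownBudget:
    """Allow at most `limit` automated shutdowns per `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.recent = deque()  # timestamps of recent automated shutdowns

    def decide(self, now):
        """Return "shutdown" or "escalate" for a detection at time `now` (seconds)."""
        # Forget shutdowns that have aged out of the window.
        while self.recent and now - self.recent[0] >= self.window:
            self.recent.popleft()
        if len(self.recent) < self.limit:
            self.recent.append(now)
            return "shutdown"
        # Budget used up: leave the decision to a human operator.
        return "escalate"
```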

All administrative access to the platform or guest operating system from operations staff and the owner of the virtual resource should be fully logged, auditable, and subject to role-based access controls.

In some scenarios, the operators and automated processes may not have access to the virtual environment, in which case responsibility for the security of the virtual machine lies entirely with the business unit owner. In this scenario, the SLA should specify what the owner must do to maintain the security of the environment, for example:

Installing patches and updates.
Maintaining security logs.
Recording configuration changes.

Software Security

Tenant applications and services hosted in the cloud may take steps to ensure the confidentiality, integrity, and trustworthiness of their own data by using encryption technologies. Operational procedures must ensure that data encrypted by a tenant's application remains available and usable, in addition to maintaining its confidentiality, integrity, and trustworthiness. For example:

Operational procedures, such as backups or problem management processes, must ensure that the private encryption keys are not accidentally exposed to other tenants or to unauthorized operations staff.
Any encryption that hosted applications and services use must continue to protect archived and backed-up data so that it is inaccessible without the private key, but can still be accessed after any disaster recovery procedures have been performed.
Automated processes, such as those that move a virtual machine between servers, must continue to protect any private encryption keys that may be stored in a virtual environment.

Although the design and management of software running in virtualized environments is not typically the responsibility of the cloud service provider in the IaaS and PaaS models, there may be recommended or mandated processes and procedures for the tenant to follow in a private cloud. You should audit and verify that tenants are complying with any mandatory processes in order to ensure the overall security of the cloud environment.

For example, tenants may be mandated to change their storage access keys on a regular basis and because the software is owned and managed by the tenant, this process may not be easily automated. In this scenario, you should regularly audit the tenant to ensure that they are changing the keys.
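
A periodic audit of this kind can be as simple as comparing each tenant's last key change against the mandated interval. The Python sketch below assumes that the date of the last change is available per tenant and that the policy is 90 days; the function name overdue_keys and the 90-day default are illustrative assumptions.

```python
from datetime import date, timedelta


def overdue_keys(last_rotated, today, max_age_days=90):
    """Return the tenants whose storage access key is older than the policy allows.

    last_rotated maps a tenant name to the date its key was last changed.
    """
    cutoff = today - timedelta(days=max_age_days)
    return sorted(t for t, changed in last_rotated.items() if changed < cutoff)
```

A key changed exactly max_age_days ago is treated as still compliant; tenants returned by the function would be candidates for a follow-up under the terms of the SLA.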

Management Security

Operational procedures such as those related to incident management and IT service continuity must also ensure that data continues to be protected from unauthorized access. For example:

Analyzing problems that have occurred in the private cloud may use monitoring data that has been collected from the infrastructure, the platform, and the software. In addition to the problems associated with correlating the data from the different physical and virtual sources, you must also be aware of the possibility that log data contains sensitive information. Infrastructure logs may contain information relevant to many tenant applications and services, and software and platform logs may contain sensitive information relating to the tenant application. You must carefully manage who has access to those log files or implement processes to scrub sensitive data before you share log files with tenants for the purpose of forensic investigations.
Business continuity plans for a private cloud must take into account the effects of resource pooling. For example, you must ensure that isolation is preserved when data belonging to multiple tenants is backed up from a physical device, while it is archived on a pooled storage resource, and when it is restored to a new virtual environment.
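
The log scrubbing step mentioned above can be sketched in Python as a filter followed by redaction. The sketch assumes that each log line carries a "tenant=<name> " tag and that secrets appear as key=value pairs; the log format, the PATTERNS list, and the function names are hypothetical, and a real deployment would maintain patterns that match its own log formats.

```python
import re

# Hypothetical redaction rules: (pattern, replacement) pairs applied in order.
PATTERNS = [
    (re.compile(r"(password|api_key|token)=\S+", re.IGNORECASE), r"\1=[REDACTED]"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "[IP]"),  # IPv4-looking values
]


def scrub(line):
    """Replace secrets and IPv4 addresses in one log line."""
    for pattern, replacement in PATTERNS:
        line = pattern.sub(replacement, line)
    return line


def lines_for_tenant(lines, tenant):
    """Keep only the lines tagged for `tenant`, then scrub each of them."""
    tag = f"tenant={tenant} "  # trailing space avoids matching longer tenant names
    return [scrub(line) for line in lines if tag in line]
```

Filtering before sharing ensures that one tenant never receives log lines that belong to another, and redaction removes secrets even from the tenant's own lines.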

All management operations, whether performed by the CSP or the tenant, must be logged and be auditable.

Legal Issues

Operational procedures such as backups, planning for IT service continuity, and the collection of monitoring data for problem analysis must all comply with any legal requirements that affect data storage and data privacy. One consequence of the resource pooling behavior in private clouds is the difficulty of identifying what data is stored in what location at what time. As a result, pooling may make it more difficult to verify compliance.

You should implement a regular review of current industry and governmental regulations that affect your private or hybrid cloud environment. Changes to laws and regulatory requirements at state, country, or supranational level (such as the European Union) can all affect the standard operating procedures of your cloud service and may require adjustments to your SLA.
