Understand Microsoft Online Services incident response phase 2 - detection and analysis
Microsoft focuses on key threat scenarios and complementary detection and analysis activities to enable security response as early in the attack lifecycle as possible. Detection tools are configured to provide enough information for effective and efficient response actions when a potential incident is detected. Microsoft has dedicated security signal teams who are responsible for improving the detection of potential security incidents using lessons learned from the security response teams and their partners.
Detection tools and strategies
While Microsoft is prepared to handle any incident, detection strategies are focused on common attack vectors such as insider threat, web service attacks, denial-of-service attacks, and tenant attacks. Signs of incidents fall into one of two categories: precursors and indicators. A precursor is a sign that an incident may occur in the future and an indicator is a sign that an incident may have occurred or may be occurring now.
One of the most challenging parts of the incident response process is accurately detecting and assessing possible incidents due to the sheer volume of activity associated with Microsoft Online Services. Even if an indicator is accurate, it does not necessarily mean that an incident has occurred. Microsoft uses multiple techniques with varying levels of detail and fidelity to detect potential incidents.
Centralized audit logging and analysis is one of the main methods used to detect anomalous or suspicious activity. Log files from Microsoft Online Services servers and infrastructure devices are collected and stored in a central, consolidated database. Centralized log analysis allows Microsoft's security response teams to comprehensively monitor the environment and correlate log entries from different services.
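To illustrate what correlating log entries from different services can look like, the following is a minimal sketch and not Microsoft's implementation. It assumes every entry carries a shared correlation key (here called `principal`, such as an account or source address) and flags any key that appears in at least two distinct services within a short time window. The `LogEntry` type, the `correlate` function, the five-minute window, and the service names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LogEntry:
    timestamp: datetime
    service: str    # hypothetical service name
    principal: str  # hypothetical correlation key, e.g. an account or source IP
    event: str


def correlate(entries, window=timedelta(minutes=5), min_services=2):
    """Return {principal: [entries]} for each principal that appears in at
    least `min_services` distinct services within one sliding time window."""
    by_principal = defaultdict(list)
    for entry in entries:
        by_principal[entry.principal].append(entry)

    flagged = {}
    for principal, items in by_principal.items():
        items.sort(key=lambda e: e.timestamp)
        start = 0
        for end in range(len(items)):
            # Shrink the window from the left until it spans at most `window`.
            while items[end].timestamp - items[start].timestamp > window:
                start += 1
            in_window = items[start:end + 1]
            if len({e.service for e in in_window}) >= min_services:
                flagged[principal] = in_window
                break
    return flagged


t0 = datetime(2024, 1, 1, 12, 0)
sample = [
    LogEntry(t0, "auth", "alice", "failed sign-in"),
    LogEntry(t0 + timedelta(minutes=2), "storage", "alice", "bulk download"),
    LogEntry(t0, "auth", "bob", "sign-in"),
    LogEntry(t0, "auth", "carol", "sign-in"),
    LogEntry(t0 + timedelta(hours=1), "storage", "carol", "read"),
]
flagged = correlate(sample)
```

In this sample only `alice` is flagged, because her two events come from different services two minutes apart. A correlated match like this is still only an indicator to be triaged, not confirmation that an incident occurred.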
Other detection tools include network-based and host-based intrusion detection systems, centrally managed anti-virus and anti-malware suites, and manual detection methods, such as observations from engineers and end users. Microsoft employs highly experienced engineers with competencies in all components of the cloud stack. Their expertise complements and supports the automated detection mechanisms.
Escalation and investigation
Because not every observation is a security issue, service teams must perform an initial triage and preliminary review to examine the nature of the issue and determine its severity. Microsoft’s security response teams create and maintain the escalation criteria and procedures for service teams to follow if the observation is determined to be a true security incident.
Once escalated, the security response team serves as the key orchestrator for the remainder of the security incident response process. The security response team is responsible for analyzing the detection indicators to determine whether a security incident has occurred and to adjust its severity level if needed. If at any point the team discovers that customer data has been disclosed, modified, or destroyed, the team initiates the customer security notification process.
At the beginning of the investigation, the security response team, working together with the service team, records all information relevant to the incident and maintains its accuracy throughout the incident response process. Relevant information may include:
- A summary of the incident
- The incident's severity and priority based on its potential impact
- A list of all indicators that led to detection of the incident
- A list of any related incidents
- A list of all actions taken by the security response team and any associated service teams
- Any evidence gathered during the incident response process, which will be preserved for post-mortem analysis and potential forensic investigations
- Recommended next steps and actions
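The items above map naturally onto a structured record. The following is a minimal sketch of such a record, assuming a simple in-memory representation. The `IncidentRecord` class, its field names, the severity and priority labels, and the incident identifier format are hypothetical and do not reflect Microsoft's internal tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class IncidentRecord:
    summary: str
    severity: str  # hypothetical label, e.g. "high"
    priority: str  # hypothetical label, e.g. "P1"
    indicators: list = field(default_factory=list)         # signs that led to detection
    related_incidents: list = field(default_factory=list)  # IDs of related incidents
    actions: list = field(default_factory=list)            # (time, team, description)
    evidence: list = field(default_factory=list)           # references to preserved evidence
    next_steps: list = field(default_factory=list)         # recommended next steps

    def log_action(self, team, description):
        """Append a timestamped action so the record stays current."""
        self.actions.append((datetime.now(timezone.utc), team, description))


record = IncidentRecord(
    summary="Unusual bulk download from a storage account",
    severity="high",
    priority="P1",
    indicators=["failed sign-ins followed by bulk download"],
)
record.log_action("security response team", "Opened investigation")
record.next_steps.append("Review access logs for the affected account")
```

Using a fresh list per record (`default_factory=list`) keeps one incident's actions and evidence from leaking into another's, and the timestamp on each action helps preserve an accurate timeline for post-mortem analysis.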
When a potential security incident is escalated, the corresponding investigation team includes only personnel who are critical to the investigation. Personnel who are not Microsoft full-time employees, such as subprocessor or staff augmentation personnel, are disengaged. These personnel are re-engaged only if necessary and in a capacity with limited scope.