Breaking Oplocks
After an oplock is requested and granted, the owner of that oplock has access to the stream based on the type of oplock that was requested. If a later operation on the stream isn't compatible with the current oplock, the system breaks the oplock.
When an oplock is granted, the system pends the requesting IRP. When an oplock is broken, the pended oplock request IRP is completed with STATUS_SUCCESS. For Level 1, Batch, and Filter oplocks, the IoStatus.Information member of the IRP is set to indicate the level to which the oplock is breaking. These levels are:
FILE_OPLOCK_BROKEN_TO_NONE: The oplock was broken and there's no current oplock on the stream. The oplock is said to be "broken to None."
FILE_OPLOCK_BROKEN_TO_LEVEL_2: The current oplock (Level 1 or Batch) was converted to a Level 2 oplock. Filter oplocks never break to Level 2; they always break to None.
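As a rough illustration of the two break levels above, the following sketch interprets the IoStatus.Information value of a completed Level 1, Batch, or Filter oplock request. The enum types and the `classify_legacy_break` helper are hypothetical names made up for this example. The two constants are redefined locally, with the values they are believed to have in the Windows headers, only so that the sketch compiles on its own; real code should take them from the headers.

```c
#include <stdint.h>

/* Values believed to match the Windows headers; redefined here only so the
   sketch is self-contained. Verify against your own SDK/WDK headers. */
#define FILE_OPLOCK_BROKEN_TO_LEVEL_2 0x00000007u
#define FILE_OPLOCK_BROKEN_TO_NONE    0x00000008u

/* Hypothetical helper types, not part of any Windows API. */
typedef enum { HELD_LEVEL_1, HELD_BATCH, HELD_FILTER } HeldOplock;
typedef enum { NOW_NONE, NOW_LEVEL_2, NOW_UNEXPECTED } OplockAfterBreak;

/* Interpret IoStatus.Information from a completed legacy oplock request. */
OplockAfterBreak classify_legacy_break(HeldOplock held, uint32_t information)
{
    if (information == FILE_OPLOCK_BROKEN_TO_NONE)
        return NOW_NONE;

    /* Filter oplocks never break to Level 2, so treat that pairing as
       unexpected rather than as a valid Level 2 downgrade. */
    if (information == FILE_OPLOCK_BROKEN_TO_LEVEL_2 && held != HELD_FILTER)
        return NOW_LEVEL_2;

    return NOW_UNEXPECTED;
}
```

An owner would typically branch on the result: after `NOW_LEVEL_2` it can keep read caching only, and after `NOW_NONE` it must stop caching entirely.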
For Read-Handle, Read-Write, and Read-Write-Handle oplocks, the level to which the oplock is breaking is reported as a combination of zero or more of the flags OPLOCK_LEVEL_CACHE_READ, OPLOCK_LEVEL_CACHE_HANDLE, and OPLOCK_LEVEL_CACHE_WRITE in the NewOplockLevel member of the REQUEST_OPLOCK_OUTPUT_BUFFER structure passed as the lpOutBuffer parameter of DeviceIoControl. Kernel-mode callers can request Windows 7 oplocks in a similar manner by using FltFsControlFile or ZwFsControlFile. For more information, see FSCTL_REQUEST_OPLOCK.
When the system's oplock package breaks a Level 1, Batch, Filter, Read-Write, Read-Write-Handle, or, under certain circumstances, a Read-Handle oplock:
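The flag combination in NewOplockLevel can be decoded with simple bit tests. The sketch below turns a level into the usual R/W/H shorthand. The `oplock_level_letters` helper is a hypothetical name, and the three flag values are redefined locally (as they are believed to appear in the Windows headers) purely to keep the example self-contained.

```c
#include <stdint.h>

/* Values believed to match the Windows headers; redefined here only so the
   sketch is self-contained. Verify against your own SDK/WDK headers. */
#define OPLOCK_LEVEL_CACHE_READ   0x00000001u
#define OPLOCK_LEVEL_CACHE_HANDLE 0x00000002u
#define OPLOCK_LEVEL_CACHE_WRITE  0x00000004u

/* Hypothetical helper: writes one letter per caching flag that is set
   (R, W, H, in that order) plus a terminating NUL into out, and returns
   the number of flags found. Zero flags yields an empty string, which
   corresponds to "broken to None". */
int oplock_level_letters(uint32_t new_level, char out[4])
{
    int n = 0;

    if (new_level & OPLOCK_LEVEL_CACHE_READ)
        out[n++] = 'R';
    if (new_level & OPLOCK_LEVEL_CACHE_WRITE)
        out[n++] = 'W';
    if (new_level & OPLOCK_LEVEL_CACHE_HANDLE)
        out[n++] = 'H';

    out[n] = '\0';
    return n;
}
```

For example, a NewOplockLevel of `OPLOCK_LEVEL_CACHE_READ | OPLOCK_LEVEL_CACHE_HANDLE` produces "RH", meaning that write caching was lost but read and handle caching remain.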
- The oplock package completes the pended oplock request IRP.
- The operation that caused the oplock break is itself pended.
The I/O manager causes the operation to block, rather than return STATUS_PENDING, if the operation:
- Is issued on a synchronous handle.
- Is an IRP_MJ_CREATE, which is always synchronous.
The I/O manager waits for an acknowledgment from the oplock owner, which tells the oplock package that the owner has finished its processing and that it's safe for the pended operation to proceed. This delay allows the oplock owner to put the stream back into a consistent state before the current operation proceeds. There's no timeout, so the system waits indefinitely for the acknowledgment. It's therefore incumbent on the owner of the oplock to acknowledge the break in a timely manner. The pended operation's IRP is set into a cancelable state. If the application or driver performing the wait terminates, the oplock package immediately completes the IRP with STATUS_CANCELLED.
An IRP_MJ_CREATE IRP can specify the FILE_COMPLETE_IF_OPLOCKED create option to avoid being blocked as part of oplock break acknowledgment. This option tells the oplock package not to block the create IRP until the oplock break acknowledgment is received. Instead, the create is allowed to proceed. If a successful create results in an oplock break, the return code is STATUS_OPLOCK_BREAK_IN_PROGRESS, rather than STATUS_SUCCESS. The FILE_COMPLETE_IF_OPLOCKED flag is typically used to avoid deadlocks. For example, if a client owns an oplock on a stream and the same client later opens the same stream, the client would block waiting for itself to acknowledge the oplock break. In this scenario, use of the FILE_COMPLETE_IF_OPLOCKED flag avoids the deadlock.
The NTFS file system initiates oplock breaks for Batch and Filter oplocks before checking for sharing violations. It's therefore possible for a create that specified FILE_COMPLETE_IF_OPLOCKED to fail with STATUS_SHARING_VIOLATION but still cause a Batch or Filter oplock to break. In this case, the Information member of the IO_STATUS_BLOCK structure is set to FILE_OPBATCH_BREAK_UNDERWAY so that the caller can detect it.
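The create outcomes described above for FILE_COMPLETE_IF_OPLOCKED can be told apart from the returned status and the Information value. The following is a minimal sketch, not a definitive implementation: `CreateOutcome` and `classify_create` are hypothetical names, `NTSTATUS` is a local stand-in typedef, and the constants are redefined with the values they are believed to have in the Windows headers so that the example compiles by itself.

```c
#include <stdint.h>

/* Local stand-ins so the sketch is self-contained. The values are believed
   to match the Windows headers; verify against your own SDK/WDK headers. */
typedef int32_t NTSTATUS;
#define STATUS_SUCCESS                  ((NTSTATUS)0x00000000)
#define STATUS_OPLOCK_BREAK_IN_PROGRESS ((NTSTATUS)0x00000108)
#define STATUS_SHARING_VIOLATION        ((NTSTATUS)0xC0000043)
#define FILE_OPBATCH_BREAK_UNDERWAY     0x00000009u

/* Hypothetical classification of a create that used FILE_COMPLETE_IF_OPLOCKED. */
typedef enum {
    CREATE_OPENED,                  /* opened, no oplock break was triggered  */
    CREATE_OPENED_BREAK_PENDING,    /* opened, but an oplock break is pending */
    CREATE_SHARING_BREAK_UNDERWAY,  /* failed, yet a Batch/Filter break began */
    CREATE_OTHER                    /* any other result                       */
} CreateOutcome;

CreateOutcome classify_create(NTSTATUS status, uintptr_t information)
{
    /* This status has a non-negative value, so a generic "status >= 0"
       success test would not separate it from STATUS_SUCCESS. Compare
       against it explicitly. */
    if (status == STATUS_OPLOCK_BREAK_IN_PROGRESS)
        return CREATE_OPENED_BREAK_PENDING;

    if (status == STATUS_SHARING_VIOLATION &&
        information == FILE_OPBATCH_BREAK_UNDERWAY)
        return CREATE_SHARING_BREAK_UNDERWAY;

    if (status == STATUS_SUCCESS)
        return CREATE_OPENED;

    return CREATE_OTHER;
}
```

A caller that sees `CREATE_SHARING_BREAK_UNDERWAY` might reasonably retry the open later, because the oplock holder may close its handle in response to the break. Whether a retry is appropriate depends on the caller.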
For Read-Handle and Read-Write-Handle oplocks, the oplock break is initiated after NTFS checks for and detects a sharing violation. This sequence gives the oplock holders an opportunity to close their handles and get out of the way, thus allowing for the possibility of not returning the sharing violation to the user. It also avoids unconditionally breaking the oplock in cases where the handle that the oplock caches doesn't conflict with the new create.
When Level 2, Read, and, under certain circumstances, Read-Handle oplocks break, the system doesn't wait for an acknowledgment. The reason is that there should be no cached state on the stream that needs to be restored to the file before allowing other clients access to it.
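The acknowledgment behavior described so far can be summarized per oplock type. The sketch below encodes only what this article states. The `OplockKind` and `BreakWait` enums and the `break_wait_for` helper are hypothetical names for illustration and don't correspond to any Windows API.

```c
/* Hypothetical names for the eight oplock types discussed in this article. */
typedef enum {
    KIND_LEVEL_1,
    KIND_LEVEL_2,
    KIND_BATCH,
    KIND_FILTER,
    KIND_READ,
    KIND_READ_HANDLE,
    KIND_READ_WRITE,
    KIND_READ_WRITE_HANDLE
} OplockKind;

/* Whether the operation that breaks the oplock is held for an acknowledgment. */
typedef enum {
    BREAK_WAITS_FOR_ACK,   /* breaking operation is pended until acknowledged */
    BREAK_MAY_WAIT,        /* depends on the operation; see per-operation articles */
    BREAK_NEVER_WAITS      /* no cached state to flush, so no acknowledgment */
} BreakWait;

BreakWait break_wait_for(OplockKind kind)
{
    switch (kind) {
    case KIND_LEVEL_2:
    case KIND_READ:
        return BREAK_NEVER_WAITS;
    case KIND_READ_HANDLE:
        return BREAK_MAY_WAIT;
    case KIND_LEVEL_1:
    case KIND_BATCH:
    case KIND_FILTER:
    case KIND_READ_WRITE:
    case KIND_READ_WRITE_HANDLE:
        return BREAK_WAITS_FOR_ACK;
    }
    /* Values outside the enum: assume the conservative answer. */
    return BREAK_WAITS_FOR_ACK;
}
```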
There are certain file system operations that check the current oplock state to determine if the oplock needs to be broken. The following operation-specific articles describe what triggers an oplock break, what determines the level to which the oplock breaks, and whether an acknowledgment of the break is required:
- IRP_MJ_CREATE
- IRP_MJ_READ
- IRP_MJ_WRITE
- IRP_MJ_CLEANUP
- IRP_MJ_LOCK_CONTROL
- IRP_MJ_SET_INFORMATION
- IRP_MJ_FILE_SYSTEM_CONTROL
A break of a Windows 7 oplock requires an acknowledgment if the REQUEST_OPLOCK_OUTPUT_FLAG_ACK_REQUIRED flag is set in the Flags member of the REQUEST_OPLOCK_OUTPUT_BUFFER structure passed as the output parameter of DeviceIoControl(lpOutBuffer), FltFsControlFile(OutBuffer), or ZwFsControlFile(OutBuffer). For more information, see FSCTL_REQUEST_OPLOCK.
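Checking for a required acknowledgment is a single bit test on the Flags member. In the sketch below, `OplockOutputSketch` is a simplified, hypothetical stand-in that carries only the two members discussed in this article; real code should use the REQUEST_OPLOCK_OUTPUT_BUFFER definition from the Windows headers. The flag value is redefined locally, as it is believed to appear in those headers, so that the example is self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value believed to match the Windows headers; redefined here only so the
   sketch is self-contained. Verify against your own SDK/WDK headers. */
#define REQUEST_OPLOCK_OUTPUT_FLAG_ACK_REQUIRED 0x00000001u

/* Simplified stand-in for REQUEST_OPLOCK_OUTPUT_BUFFER. The real structure
   has additional members and a different layout. */
typedef struct {
    uint32_t NewOplockLevel;  /* combination of OPLOCK_LEVEL_CACHE_* flags */
    uint32_t Flags;           /* REQUEST_OPLOCK_OUTPUT_FLAG_* flags        */
} OplockOutputSketch;

/* Returns true when the break must be acknowledged by the oplock owner. */
bool break_requires_ack(const OplockOutputSketch *out)
{
    return (out->Flags & REQUEST_OPLOCK_OUTPUT_FLAG_ACK_REQUIRED) != 0;
}
```

When this returns true, the owner should finish restoring the stream to a consistent state and then acknowledge promptly, because the operation that triggered the break stays pended until it does.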
The listed per-operation articles describe the details of when a break of a Read-Handle oplock results in the pending of the operation that broke the oplock. For example, the IRP_MJ_CREATE article contains the associated Read-Handle details.