Exploring the CFB File Format 7
Exploring the CFB File Format 7
-------------------------------
[- Red-Black Trees -]
As we have seen, OLE Structured Storage files are a series
of storages and streams like a modern day FAT filesystem. One design goal of a
filesystem in this case is the ability to keep track of the entire hierarchy of
data. Each storage object and stream object within a compound file has a
directory entry. Thus all data is contained in this tree-like structure. The
structure itself is comprised of the nature of a red-black tree. In such a
tree, each node will have a color code, black or red. This particular tree
implementation in Office files has 3 specific constraints:
- The root storage object MUST always be black.
Because the root directory does not have siblings, its color is irrelevant and
can therefore be either red or black. - Two consecutive nodes MUST NOT both be red.
- The left sibling MUST always be less than the
right sibling. This sorting relationship is defined as follows.
- A node with a shorter name is less than a node
with a longer name (compare the length of the names from the Directory Entry
Name Length field). - For nodes with the same name length from
Directory Entry Name Length, iterate through each UTF-16 code point, one at a
time, from the beginning of the Unicode string. - For each UTF-16 code point, convert to
upper-case with the Unicode Default Case Conversion Algorithm, simple case
conversion variant (simple case foldings), with the following notes.<2>
Compare each upper-cased UTF-16 code point binary value.
- Unicode surrogate characters are never
upper-cased, because they are represented by two
UTF-16 code points, while the sorting relationship upper-cases a single
UTF-16 code point at a time.
- Lowercase characters defined in a newer, later
version of the Unicode standard can be upper-cased by an implementation that
conforms to that later Unicode standard.
The compound file producer will save the contents of the
data in the file via the color and sibling attributes. Thus, the significance
and value of these attributes are decided before data is ultimately saved on
disk. I will thus present a simple mechanism / app that takes into
consideration the constraints behind the red-black tree in cfb files. This
mechanism was not created by me; however, it was implemented by Mark C.
Chu-Carroll:
class RedBlackTree(object):
def __init__(self):
self._tree = None
def Insert(self, n):
if self._tree == None:
self._tree = RedBlackTreeNode(n)
self._tree.SetColor("Black")
else:
self._tree = self._tree.Insert(n)
def Print(self):
if self._tree == None:
print "Empty"
else:
self._tree.Print(1)
Noting the structure above, this is the actual red-black
tree object. Now to represent each node:
class RedBlackTreeNode(object):
def __init__(self, value):
self._left = None
self._right = None
self._value = value
self.SetColor("Red")
self._parent = None
def GetParent(self):
return self._parent
def SetParent(self, parent):
self._parent = parent
def GetColor(self):
return self._color
def SetColor(self, color):
self._color = color
def GetLeft(self):
return self._left
def SetLeft(self, left):
self._left = left
def GetRight(self):
return self._right
def SetRight(self, right):
self._right = right
def GetGrandParent(self):
if self.GetParent() != None:
return self.GetParent().GetParent()
else:
return None
def GetUncle(self):
grand = self.GetGrandParent()
if grand is not None:
if grand.GetLeft() == self.GetParent():
return grand.GetRight()
else:
return grand.GetLeft()
else:
return None
def Rebalance(self):
# WP case 1: tree root
if self.GetParent() is None:
self.SetColor("Black")
return self
# WP case 2: The parent of the target node is BLACK,
# so the tree is in fine balance shape; just return the
# tree root
if self.GetParent().GetColor() == "Black":
return self.GetRoot()
# From here on, we know that parent is Red.
# WP Case 3: self, parent, and uncle are all red.
if self.GetUncle() is not None and self.GetUncle().GetColor() == "Red":
self.GetUncle().SetColor("Black")
self.GetParent().SetColor("Black")
self.GetGrandParent().SetColor("Red")
return self.GetGrandParent().Rebalance()
# By now, we know that self and parent are red; and the uncle is black.
# We also know that the grandparent is not None, because if it were, the
# parent would be root, which must be black. So this means that we
# need to do a pivot on the parent
return self.PivotAndRebalance()
def GetRoot(self):
if self.GetParent() is None:
return self
else:
return self.GetParent().GetRoot()
def PivotAndRebalance(self):
# First, distinguish between the case where where my parent is
# a left child or a right child.
if self.GetGrandParent().GetLeft() == self.GetParent():
if self.GetParent().GetRight() == self:
# WP case 4: I'm the right child of my parent, and my parent is the
# left child of my grandparent. Pivot right around me.
return self.PivotLeft(False)
else:
# WP case 5
# If I'm the left child, and my parent is the left child, then
# pivot right around my parent.
return self.GetParent().PivotRight(True)
else: # My parent is the right child.
if self.GetParent().GetLeft() == self:
# WP case 4, reverse.
return self.PivotRight(False)
else:
# WY case 5 reverse
return self.GetParent().PivotLeft(True)
def PivotRight(self, recolor):
# Hurrah, I'm going to be the new root of the subtree!
left = self.GetLeft()
right = self.GetRight()
parent = self.GetParent()
grand = self.GetGrandParent()
# move my right child to be the left of my soon-to-be former parent.
parent.SetLeft(right)
if right is not None:
right.SetParent(parent)
# Move up, and make my old parent my right child.
self.SetParent(grand)
if grand is not None:
if grand.GetRight(parent) == parent:
grand.SetRight(self)
else:
grand.SetLeft(self)
self.SetRight(parent)
parent.SetParent(self)
if recolor is True:
parent.SetColor("Red")
self.SetColor("Black")
return self.GetRoot()
else:
# rebalance from the new position of my former parent.
return parent.Rebalance()
def PivotLeft(self, recolor):
# Hurrah, I'm going to be the new root of the subtree!
left = self.GetLeft()
right = self.GetRight()
parent = self.GetParent()
grand = self.GetGrandParent()
# move my left child to be the right of my soon-to-be former parent.
parent.SetRight(left)
if left is not None:
left.SetParent(parent)
# Move up, and make my old parent my right child.
self.SetParent(grand)
if grand is not None:
if grand.GetRight() == parent:
grand.SetRight(self)
else:
grand.SetLeft(self)
self.SetLeft(parent)
parent.SetParent(self)
if recolor is True:
parent.SetColor("Red")
self.SetColor("Black")
return self.GetRoot()
else:
# rebalance from the position of my former parent.
return parent.Rebalance()
def Insert(self, value):
if self._value > value:
if self.GetLeft() is None:
self.SetLeft(RedBlackTreeNode(value))
self.GetLeft().SetParent(self)
return self.GetLeft().Rebalance()
else:
return self.GetLeft().Insert(value)
else:
if self.GetRight() is None:
self.SetRight(RedBlackTreeNode(value))
self.GetRight().SetParent(self)
return self.GetRight().Rebalance()
else:
return self.GetRight().Insert(value)
def Print(self, indent):
for i in range(indent):
print " ",
print "%s (%s)" % (self._value, self.GetColor())
if self.GetLeft() is None:
for i in range(indent+1):
print " ",
print "None(Black)"
else:
self.GetLeft().Print(indent+1)
if self.GetRight() is None:
for i in range(indent+1):
print " ",
print "None(Black)"
else:
self.GetRight().Print(indent+1)
The actual data needed to create the tree in memory is
already available on disk in the various fields in the relevant directory
entries. Therefore, it is merely parsing each directory entry and performing
the insert logic based upon the color of the entry and the entries left and
right sibling value. Pushing this data back out to disk is merely traversing
the tree from the root using recursion like any other tree.