Simplifying boolean expressions (part 3)

or "Where we wrap together all that prototype code and get something actually useful"

 

So now we get to the function that does all the work for us:

 

let rec reduce_normal_form nform1 =

    let rnf = reduce_normal_form

    in

    let rec replace snform var const =

        match snform with

            SemiNormalConstant(_) -> snform

          | SemiNormalVariable(_) ->

              if snform = var then

                  const

              else

                  snform

          | SemiNormalIf(a,b,c) -> SemiNormalIf(replace a var const, replace b var const, replace c var const)

    in

    let simple_cases nform2 =

        match nform2 with

            SemiNormalConstant(_) -> nform2

          | SemiNormalVariable(_) -> nform2

          | SemiNormalIf(SemiNormalConstant(Constant(true)), x, _) -> reduce_normal_form x

          | SemiNormalIf(SemiNormalConstant(Constant(false)), _, y) -> reduce_normal_form y

          | SemiNormalIf(SemiNormalVariable(_), x, y) when x = y -> reduce_normal_form x

          | SemiNormalIf(SemiNormalVariable(_) as v, SemiNormalConstant(Constant(true)), SemiNormalConstant(Constant(false))) -> v

          | _ -> nform2

    in

    let simplified = simple_cases nform1 in

        match simplified with

          | SemiNormalIf(SemiNormalVariable(_) as v, y, z) ->

              let reduced_y = rnf (replace y v (SemiNormalConstant(Constant(true)))) in

              let reduced_z = rnf (replace z v (SemiNormalConstant(Constant(false)))) in

              simple_cases (SemiNormalIf(v, reduced_y, reduced_z))

          | _ -> simplified;;

 

It’s a little complicated so I’ll break it down into constituent parts to explain.  The first part is a helper function called “replace.”  The idea behind it is to take a formula and replaces all instances of a variable in it with the specified constant.  In the previous post I showed how normal-form allowed us to know the values of variables in the “then” and “else” portions of an if-clause.  This function allows for that functionality.  As you can see, if it ever sees a variable in an expression it tests if that variable is equal to the variable passed in.  If so, it replaces it with constant passed in.  If it sees an if-expression then it just recursively calls itself on each constituent part of the expression.  This way the knowledge we have about variables will now get passed down throughout the entire structure.  As it stands, this is actually quite inefficient as we are now traversing the entire tree structure on every replace (which will lead to exponential running time).  In practice we would actually use a better system for doing this that would defer this traversal, but this suffices for the sake of a simple prototype.

 

After that helper function is defined we see another helper function “simple_cases”.  This function handles the 6 cases I specified in the first post.  It deals with the basic simplifications like transforming “if (true) { return x; } else { return y; }” into “return x;”

 

Finally these two functions get pulled into the final bit of work we have.  First we take the expression we have and see if it can be trivially reduced.  After that we see if we now have an expression which we know cannot be trivially reduced anymore.  After this we go into the “then” and “else” portions of the if-expression and we replaces the test expression in the “if” with either “true” or “false”.  After that replacement we then try to reduce the subclauses and after that we might now be in a position where the original rules might again apply and so we run “simple_cases” again.

 

So now we’ve gone through and have removed extraneous boolean checks and we’ve reduced the equation to the smallest possible form.  All that remains is a little function that can take this simplified form and reconstitute it into a boolean_formula that can then be dumped into the code.  This is fairly trivial to write as it us just decomposition and the rules of boolean algebrea that we used before in reverse:

 

let rec semi_normal_form_to_bool_expr snform =

    let snf_to_be = semi_normal_form_to_bool_expr

    in

    match snform with

        SemiNormalConstant(a) -> a

      | SemiNormalVariable(a) -> a

      | SemiNormalIf(a, SemiNormalConstant(Constant(true)), SemiNormalConstant(Constant(false))) -> snf_to_be a

      | SemiNormalIf(a, SemiNormalConstant(Constant(false)), SemiNormalConstant(Constant(true))) -> Not (snf_to_be a)

      | SemiNormalIf(a, b, SemiNormalConstant(Constant(false))) -> And (snf_to_be a, snf_to_be b)

      | SemiNormalIf(a, SemiNormalConstant(Constant(false)), b) -> And (Not (snf_to_be a), snf_to_be b)

      | SemiNormalIf(a, SemiNormalConstant(Constant(true)), b) -> Or (snf_to_be a, snf_to_be b)

      | SemiNormalIf(a, b, SemiNormalConstant(Constant(true))) -> Or (Not (snf_to_be a), snf_to_be b);;

 

Smarter rewrite rules can get us back Equivalence operators, and can also easily be extended to handle relational.

 

And there we have it.  Problem easily handled through a series of simple and understandable steps.  End benefit to the user is that we can now examine complicated expressions and offer simpler forms to help you get clearer code.  As long as your code is obeying the semantics of the boolean algebra operators, then this will be totally safe.  If your code doesn’t obey those semantics… well be pretty assured that people who use your types will end up getting very confused at some point in time.

 

A later post will discuss a type in the .Net framework which doesn’t obey these expected relational semantics.  2 points if you can guess what it is J

Comments

  • Anonymous
    January 24, 2005
    I can think of quite a few exceptions in the framework. String differs in its implementation of <, >, <=, and >= when compared to == and !=.

    Another example is Single/Double, when comparing to NaN. I think there are cases where !(a<b) && !(b<a) && !(b==a).

    And the SqlTypes and Nullable<T> follow tri-state boolean algebrea.

    Still, it'd be nice to be able to simplify boolean expressions, even if there are some classes that do not follow all the rules.
  • Anonymous
    January 24, 2005
    Matthew: Yup. I didn't know about string, and i'm going to have to look into that, but you got the other ones :)
  • Anonymous
    January 25, 2005
    The comment has been removed
  • Anonymous
    January 26, 2005
    So is this feature going to be available as an extension for VS2005?

    I love this kind of thing because I often find that when I am manually able to simplify a boolean expression, it turns out that the simpler form reveals an insight into the actual behavior of the code I'm writing that I hadn't yet realized.

    Your original "a == b" code is an example. Perhaps you originally thought you were testing to avoid a couple of particular combinations that were problematic, but when reduced to "a == b" it's entirely clear that you're actually testing to see whether two separate conditions have the same result.

    This sometimes has led me to a better understanding of code I've already written, to the point that I can sometimes then simplify entire chunks of architecture with this new realization in mind.

    A tool that automates the process of gaining new insights into your own code is quite something!
  • Anonymous
    January 26, 2005
    Duncan: That's a fair point, and i have given it a lot of thought. The result of a boolen expression should not be affected by shortcircuiting unless you're shortcircuiting an expression that would have side effects (like throwing an exception, getting into an infinite loop, changing state etc.).

    The reason that short circuiting works is because in the context of a boolean expression there is no need ot to some bit of work because the rewsult would be irrelevant regardless of the result of true or false.

    Now, because there can be side effects in code this could cause problems. However, that's why there would safe vs unsafe ways to transform your code. "safe" would preserve the semantics and would only reduce purely boolean expressions. unsafe would assume that all executable code in the expression was not something that changed state and it would simplify accordingly.
  • Anonymous
    January 26, 2005
    The comment has been removed
  • Anonymous
    January 26, 2005
    The comment has been removed
  • Anonymous
    January 27, 2005
    Stuart: There are many reasons we've chosen not to use F#, starting with the fact that we don't knwo how stable or mature it is. And we've definitely investigated that idea and talked with Don Syme about that sort of thing.
  • Anonymous
    January 27, 2005
    The comment has been removed