Defining Equals on an interface
I was wondering today if ICollection should declare an Equals method on it and state what the semantics of it would be. This came around because I currently have 3 implementations of ICollection:
- ArrayCollection (an ICollection backed by an array)
- EmptyCollection (a specialized collection that holds no elements. Uses a singleton for memory efficiency and specialized implementations of the methods for speed)
- SingletonCollection (a specialized collection that holds only one element. Useful for when you have a single element and you want ot pass it to something that expects a collection).
Now, it seemed reasonable that the following would be true:
new
ArrayCollection<int>().Equals(EmptyCollection<int>.Instance) and
ICollection<int> c = new ArrayCollection<int>();
c.Add(4);
c.Equals(new SingletonCollection<int>(4));
Now, the question is “why are those true”. It's intuitive to me that any empty collections should be equal, and any collection with one element in it would be equal to a SingletonCollection with that same element in it. However, is there a way to formalize that? The best i could come up with is “Two collections are equal if as you iterate over both of them you find their corresponding elements are equals _and_ you finish iterating over both collections at the same time.” Expressed in code that would be:
public override bool Equals(object obj)
{
ICollection<A> other = obj as ICollection<A>;
if (null == other)
{
return false;
}
IEnumerator<A> e1 = this.GetEnumerator();
IEnumerator<A> e2 = other.GetEnumerator();
while (true)
{
bool e1MoveNext = e1.MoveNext();
bool e2MoveNext = e2.MoveNext();
if (e1MoveNext != e2MoveNext)
{
//they have a different number of elements
return false;
}
if (e1MoveNext == false)
{
//no more elements in either
return true;
}
if (!this.authenticate(e1.Current, e2.Current))
{
return false;
}
}
}
That seems to work fine and works within my intuitive definition of how collections are felt of as equal. The problem I then ran into was that I tried to think down the line about whatever types of collections might be in the system. Pretty quickly i though about ISet (a collection containing only unique elements). If I were an ISet i would only consider something equal to me if it contained all my elements, no more, and no duplicate. Wow. That's something that's woudl be pretty difficult to check. I'd have to iterate over both checking if my elements were in it and it's elements were in me. I'd then have to figure out some way to make sure that it had no duplicates. That's a pretty tall order. (I'm not even sure how I would go ahead checking for duplicates). The alternative is to place the restriction that the thing I am comparing myself to is also an ISet. Then I know that the elements must be unique. I can then check my containment in it and it's containment in me. I.e. (this.IsSubsetOf(that) && that.IsSubsetOf(this)). If I were to eschew infinite sets and have a Count property then I could do: if (this.Count == that.Count && this.IsSubsetOf(that)). (note: taht would also be possible with an IOptional Count property.
However, I face a conundrum. Say I do: EmptyCollection<int>.Instance(EmptySet<int>.Instance). I get true based on the rules of Collection equality. However, if I fo EmptySet<int>.Instance(EmptyCollection<int>.Instance), i get falgs based on the rules of Set equality.
This is a big no-no (i think) as I've violated the requirement that a.Equals(b) <==> b.Equals(a). :-(
Is there a way to reconcile this system?
Has anyone else out there faced this problem of defining equals on interfaces in the presense of sub-interfaces?
Comments
- Anonymous
May 19, 2004
The comment has been removed - Anonymous
May 19, 2004
Damien: I never stated that collections took ordering into account. In fact, I would not consider two collections the same if their ordering was different. That would be a very unintuitive model for me. However, for Sets, I could see this relationship holding. I.e. two sets can be equal regardless of the ordering of Iterate.
Are you proposing that ICollection not set down rules for equality, and instead those rules should only be on its subinterfaces? (I have no problem with that, I'm just trying to see if that's what you're saying). - Anonymous
May 20, 2004
You never defined ordering for collections. You implied it by using an array.
Im suggesting that the rules for equality, if any, are determined by the details of the collection implementation, which determine which equality algorithms are availble to us, and which is most efficient.
For example, I have a HashSet class, with union, difference, and intersect operations defined. The arguments can be either ICollection or ISet. In the case of intersect, the approach taken is based on what I know about ICollection (that i can iterate over it efficiently), and what I know about ISet (that I can query its contents efficiently). As a result if my arguments are one ICollection and one ISet, I iterate over the collection testing for membership in the set. If my arguments are two ISets, I iterate over the smaller set testing for membership in the larger one.
What Im suggesting is that ArrayCollection should not have a Contains method, and that it would be nice to be able to know more about the properties of the sequence returned by the enumerator of ICollection, maybe even simply attributes that tell us Deterministic, Sorted, Ordered, etc etc.
Youll forgive me if these seem like half-constructed thoughts - they are. - Anonymous
May 20, 2004
The comment has been removed - Anonymous
May 20, 2004
Damien: I'm interested in your implementation of Union/Intersect/Difference. They seem very costly (O(n)). I wanted to implement the following:
class Union<A> : ISet<A> {
private readonly ISet<A> set1;
private readonly ISet<A> set2;
public Union(ISet<A> set1, ISet<A> set2) {
//assign the class fields
}
public bool Contains(A element) {
return set1.Contains(element) || set2.Contains(element);
}
//etc.
}
This has the benefit of making construction of a union extremely fast, and it is also nice because the original sets are unnaffected. I don't see Union as an operation on a set, but rather a binary operation that produces a new set. When I think about where I've seen unions (in math and CS proofs/algorithms) they've always been non destructive. i.e. c = A U B, neither A nor B is modified. - Anonymous
May 20, 2004
The comment has been removed - Anonymous
May 20, 2004
Not lazy, just functional :-P
Good point about enumerations. I would imagine that people would probably want the uniqueness property satisfied:
Since i am dealing with immutable sets, the same this would happen when you added: you'd get a new set without affecting that last one. - Anonymous
May 20, 2004
I just realized that something as simple as a Count would involve iterating over your expression-tree type sets and that immutable sets would be extremely inefficient when adding one item at a time. - Anonymous
May 20, 2004
Damien: Yes. That's why i am not implementing Count (For now).
I'm not sure why you think it would be innefficient though. Functional sets are quite effcient even when adding elements one at a time. - Anonymous
May 20, 2004
I dont know how functional sets are implemented efficiently. Perhaps you could enlighten me.
I was working on a problem similar to this last week, where I wanted to efficiently copy hashsets. It turned out that without reference counting and finalizers, I couldnt find a way of implmenting a copy operation in the most efficient way.
I wanted to make a copy operation reference the underlying store, which would then be marked as immutable. I also wanted the underlying store to revert to a mutable state when only 1 reference remained. Once a copy was made, writes would be done to a local store, while reads would happen locally with a fallthrough to the lower-level store. As with the Union operation you outlined, the complexity of the expression tree can get out of control, and some kind of flattening strategy would be required. - Anonymous
May 20, 2004
Damien: I would love to. (It would probably make for a good post too). However, I'm going home now. I will try to get to it soon! - Anonymous
May 20, 2004
Have you looked at Java's collection package (source or API?)
I imagine the design decisions there would be of interest to you. Many find the package to be well thought out.
Or does that raise legal issues? - Anonymous
May 20, 2004
David: I'm very familiar with teh java collections API, and certainly something I consider when making my decisions. however, my porpose in doing this is not to clone a preexisting package, but rather to create a ewll designed framework from teh ground up where i understand and can justify every design decision that was made.
A clear example of this w.r.t. java collections is the interplay of ICollection and IMap. At some point I will probably have to cross that bridge or deciding "are they related, or are they separate interface hierarchies". However when I do I really want to have a clear understanding of the benefits/downsides of either choice. If i have that then in the future there will e a clear place for people to turn to if they're curious "why did you do it that way?". - Anonymous
May 21, 2004
Random thought here: Would it be possible to define Equals() such that it's only true if the two collections both agree that they are equal?
Making that check efficient would be a slick trick, and probably doable, but right this minute I can't think of how. - Anonymous
May 21, 2004
Scot: Logically, that's exactly what you'd want.
A = B <==> A.Equals(B) && B.Equals(A).
So it might be possible to do in an external utilities class. So:
CollectionsUtilities.Equals<A>(ICollections<A> c1, ICollection<A> c2) {
return c1.Equals(c2) && c2.Equals(c1);
}
That actually doesn't sound unreasonable. And it seems to extract the arbitraty equality definition away from the individual collection and into a helper (which you could replace with another helper at any given time). - Anonymous
May 24, 2004
The only problem I see with calling Equals() on each collection twice, is that each collection is iterated over twice. The only way around this I've been able to come up with is an interface designed to allow cooperative iteration. So, an interface ICheckEquality would have 3 methods: bool BeginCheck(ICollection<T> collection), void CheckElement(T item), bool EndCheck(T item).
The third-party method would do something like this:
CollectionsUtilities.Equals<A>(ICheckEquality c1, ICheckEquality c2)
{
c1.BeginCheck(c2);
c2.BeginCheck(c1);
//loop to iterate both collections omitted
{
c2.CheckElement(c1.Current);
c1.CheckElement(c2.Current);
c2.MoveNext();
c1.MoveNext();
}
//last element in one of the collections
return c1.EndCheck(c2.Current) && c2.EndCheck(c1.Current);
}
I have butchered the Collection syntax and omitted the control statements, but that should illustrate the idea. It's also probably over-engineering the problem - but it would only iterate each collection once.