Using LINQ to write constraints in OCL style v3
In a comment to my last post, Keith Short challenged me to write a version of this constraint with good error reporting, i.e. to post an error to the task list that gives the names of the duplicate properties and supports navigation to the element that is in error. After a bit of head-scratching, I decided I couldn't do this easily using the built-in functions (it turns out that Except() will not subtract a set from a bag and leave the duplicates behind). But, using extension methods, I can build my own supporting functions. This is what I ended up with.
[ValidationMethod(ValidationCategories.Menu | ValidationCategories.Save)]
private void TestExampleElement(ValidationContext context)
{
    var propnames = this.Properties.Select(p => p.Name);
    var duplicateProps = propnames.Duplicates();
    if (duplicateProps.Count() > 0)
    {
        string errors = duplicateProps.CommaSeparatedList();
        context.LogError(string.Format("Non-unique property names: {0}", errors), "Error 1", this);
    }
    var subpropnames = this.Properties.SelectMany(p => p.SubProperties).Select(p => p.Name);
    var duplicateSubProps = subpropnames.Duplicates();
    if (duplicateSubProps.Count() > 0)
    {
        string suberrors = duplicateSubProps.CommaSeparatedList();
        context.LogError(string.Format("Non-unique sub property names: {0}", suberrors), "Error 2", this);
    }
}
public static class C
{
    public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> source)
    {
        return source.Aggregate(Enumerable.Empty<T>(),
            (agg, p) => source.Count(x => x.Equals(p)) > 1 ?
                agg.Union(new T[] { p }) : agg);
    }

    public static string CommaSeparatedList(this IEnumerable<string> source)
    {
        return source.Aggregate(String.Empty,
            (agg, s) => String.IsNullOrEmpty(agg) ? s : (agg + ", " + s));
    }
}
The Duplicates function goes through the source collection, tests whether each item appears in the source more than once, and if so puts it into the result collection. The errors put into the task list contain a comma-separated list of the duplicated names. The third parameter to LogError is the model element that carries the error; double-clicking on the error navigates to that element.
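To see the helpers at work outside the DSL, here is a minimal console sketch (not part of the original validation code; the sample property names are invented, and it assumes the extension class C above is compiled in the same namespace). It also shows why Except() alone doesn't surface the duplicates, since it computes a distinct set difference:

using System;
using System.Linq;

static class DuplicatesDemo
{
    static void Main()
    {
        var names = new[] { "Id", "Name", "Id", "Created", "Name" };

        // Except() is a set difference and yields distinct results,
        // so subtracting the distinct names leaves nothing behind.
        Console.WriteLine(names.Except(names.Distinct()).Count());   // 0

        // The extension methods defined above report the duplicated names.
        Console.WriteLine(names.Duplicates().CommaSeparatedList());  // Id, Name
    }
}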
Comments
Anonymous
August 30, 2007
CommaSeparatedList can be written like this: source.Aggregate((agg, s) => agg + ", " + s); The overload of Aggregate that doesn't take a seed value uses the first value as the seed: http://www.atrevido.net/blog/2007/08/16/Practical+Functional+C+Part+III+Loops+Are+Evil.aspx It probably doesn't matter in this case, but I think the Duplicates function performs in N*N?
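For illustration, a minimal sketch of that seedless overload (the sample values are invented):

// The seedless Aggregate overload uses the first element as the seed, so no
// String.Empty start value or IsNullOrEmpty check is needed.
var names = new[] { "Id", "Name", "Created" };
string list = names.Aggregate((agg, s) => agg + ", " + s);   // "Id, Name, Created"
// Note: this overload throws InvalidOperationException on an empty sequence.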
Anonymous
August 30, 2007
Michael, thanks for the agg tip, especially since I'm always calling it for a non-empty collection. You are right about the N*N - it's not particularly easy to make it faster though, so for this exercise it seemed ok. Any pointers to a more efficient version?
Anonymous
August 30, 2007
I think if you pass new List<Property>(duplicateProperties).ToArray() to the error message, you'll get the individual properties selected rather than just the parent, which might be a bit nicer too.
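For illustration, a rough sketch of that suggestion against the validation method above (it assumes Property derives from ModelElement and that LogError accepts a params array of elements; duplicateProps and errors are the locals from the method):

// Sketch: pass the duplicated Property elements themselves, so double-clicking
// the error selects those properties rather than the parent element.
var duplicateProperties = this.Properties.Where(p => duplicateProps.Contains(p.Name));
context.LogError(
    string.Format("Non-unique property names: {0}", errors),
    "Error 1",
    duplicateProperties.ToArray());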
Anonymous
August 30, 2007
Actually, perf is what's going to get interesting here. As these LINQ constructs stack up against the brute-force, non-indexed collections we have, it'd be easy to end up trading terseness for perf a bit too far over large models. We'll need to see where the tipping point is before more efficient query processing is needed.
Anonymous
August 30, 2007
Well, a very quick way is to use a HashSet:

public static IEnumerable<T> Duplicates2<T>(this IEnumerable<T> source)
{
    HashSet<T> primary = new HashSet<T>();
    HashSet<T> secondary = new HashSet<T>();
    return source.Where(i => !primary.Add(i) && secondary.Add(i));
}

That performs linearly as far as I know (but uses N + D memory). If there were a way to set the capacity on the HashSet, then it'd perform in constant time. There are probably more efficient but less 'cute' ways of doing it. The difference on a list of 10,000 integers is about 2000x (12000ms versus 6ms). I'm thinking there could be a good use for a conditional extension method for sets so you wouldn't have to add if statements all over the place for logging. Something like: propNames.Duplicates().IfAny(dups => { Log... });
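For illustration, one way that IfAny helper might look (the name and shape are just a guess at what the comment describes):

// Sketch: run an action only when the sequence has at least one element.
// (Declared in a static class, such as C above.)
public static void IfAny<T>(this IEnumerable<T> source, Action<IEnumerable<T>> action)
{
    if (source.Any())
    {
        action(source);
    }
}

// Possible usage inside the validation method:
// propnames.Duplicates().IfAny(dups =>
//     context.LogError(string.Format("Non-unique property names: {0}",
//         dups.CommaSeparatedList()), "Error 1", this));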
Anonymous
August 30, 2007
Sorry, it wouldn't be constant time overall; I was just referring to the HashSet usage (when the HashSet grows beyond its capacity, it has to resize, which is an O(N) operation).
Anonymous
September 02, 2007
The comment has been removed
Anonymous
September 02, 2007
Oh yes, those hashsets are captured... silly me missing something like that! :S