Udostępnij za pośrednictwem


Longest consecutive elements sequence

A tiny detail that can be uncovered by looking at a problem on a different angle usually is the key to the solution. So many times I looked at a great API design or problem solution saying “how I didn’t see it”. But sometimes I do see. It was a problem of searching for the longest consecutive elements sequence within unsorted array of integers. For example, in {5, 7, 3, 4, 9, 10, 1, 15, 1, 3, 12, 2, 11} sought sequence is {1, 2, 3, 4, 5}.

The first thing that comes mind is to sort given array in O(n log n ) and look for longest consecutive elements sequence. Eh, we can do better than that.

Using bit vector indexed by numbers from original array may not be justified due incomparable solution space and original array size (although time complexity is O(n)).

Let’s look at the problem more closely. The problem may be reduced to problem of effective range manipulation. Disjoint-set structure or interval trees offer O(n log n) building time complexity. But they do not take into consideration fact that we are dealing with integers. Knowing range boundaries we can definitely say what numbers are in there (for example, [1..3] contains 1, 2, 3). O(1) time complexity for range manipulation operations with O(1) space complexity for each range are the things we are looking for. We can do this using two hash tables:

  • ‘low’ with range start numbers as keys and range end numbers as values
  • ‘high’ with range end numbers as keys and range start numbers as values

For example, for a range [x..y] the tables will hold low[x] = y and high[y] = x. The algorithm looks the following:

  • Scan original array and for each element:
    • If it already belongs to any range skip it
    • Otherwise create unit size range [i..i]
    • If there is a range next to the right [i+1.. y] merge with it to produce [i..y] range
    • If there is a range next to the left [x..i-1] merge with it as well (either [i..i] or merged [i..y] will be merged with [x..i-1])
  • Scan through created ranges to find out the longest

The only question left is how to check if an element already belongs to some range as we are keeping only range boundaries in hash tables. Any number is either processed previously and thus in one of the ranges or a new range created out of it (and potentially merged with others). Thus if a number previously seen it is in some range and we do not need to process it. So before scanning original array we can simply remove any duplicates in O(n) time and space.

 class Solver
{
    public static Tuple<int, int> FindMaxRange(IEnumerable<int> seq)
    {
        // Generate all ranges and select maximum
        return EnumerateRanges(seq)
            .Aggregate((x, y) => Length(x) > Length(y) ? x : y);
    }

    public static IEnumerable<Tuple<int, int>> EnumerateRanges(IEnumerable<int> seq)
    {
        var low = new Dictionary<int, int>();
        var high = new Dictionary<int, int>();

        // Remove duplicates
        foreach (var val in seq.Distinct())
        {
            // Create unit size range
            low[val] = high[val] = val;
            // Merge [i..i] with [i+1..y]
            var endsWith = MergeWithNext(val, low, high, 1);
            // Merge [i..endsWith] with [x..i-1]
            MergeWithNext(endsWith, high, low, -1);
        }

        return low.Select(p => Tuple.Create(p.Key, p.Value));
    }

    static int MergeWithNext(int currStart, IDictionary<int, int> low, IDictionary<int, int> high, int sign)
    {
        var currEnd = low[currStart];
        var nextStart = currEnd + sign;
        if (low.ContainsKey(nextStart))
        {
            low[currStart] = low[nextStart];
            high[low[currStart]] = currStart;
            low.Remove(nextStart);
            high.Remove(currEnd);
        }
        return low[currStart];
    }

    static int Length(Tuple<int, int> t)
    {
        return t.Item2 - t.Item1;
    }
}

The solution has O(n) time and space complexity assuming hash table implementation has O(1) time and space complexity for each operation/element.