HOW TO：使用 parallel_invoke 來撰寫平行排序常式

發行項
05/01/2013

本文件說明如何使用 parallel_invoke 演算法改善雙調排序演算法的效能。雙調排序演算法會以遞迴方式將輸入序列分割成較小的已排序資料分割。因為每個分割作業都與所有其他作業無關，所以雙調排序演算法可以以平行方式執行。

雖然雙調排序是一種會將所有輸入序列組合排序的「排序網路」(Sorting Network)，但是它會將長度為 2 的次方的序列排序。

注意事項
這個範例說明使用平行迴圈的排序常式。您也可以使用 PPL 所提供的內建排序演算法: concurrency::parallel_sort、 concurrency::parallel_buffered_sort和 concurrency::parallel_radixsort。如需詳細資訊，請參閱平行演算法。

章節

本文件將說明下列工作：

以循序方式執行雙調排序
使用 parallel_invoke 以平行方式執行雙調排序

以循序方式執行雙調排序

下列範例顯示雙調排序演算法的循序版本。bitonic_sort 函式會將序列分割成兩個資料分割，並將這些資料分割反向排序，然後合併結果。這個函式會以遞迴方式呼叫自己兩次，來將每個資料分割排序。

const bool INCREASING = true;
const bool DECREASING = false;

// Comparator function for the bitonic sort algorithm.
template <class T>
void compare(T* items, int i, int j, bool dir)
{
   if (dir == (items[i] > items[j]))
   {
      swap(items[i], items[j]);
   }
}

// Sorts a bitonic sequence in the specified order.
template <class T>
void bitonic_merge(T* items, int lo, int n, bool dir)
{
   if (n > 1)
   {
      int m = n / 2;
      for (int i = lo; i < lo + m; ++i)
      {
         compare(items, i, i + m, dir);
      }
      bitonic_merge(items, lo, m, dir);
      bitonic_merge(items, lo + m, m, dir);
   }
}

// Sorts the given sequence in the specified order.
template <class T>
void bitonic_sort(T* items, int lo, int n, bool dir)
{
   if (n > 1)
   {
      // Divide the array into two partitions and then sort 
      // the partitions in different directions.
      int m = n / 2;
      bitonic_sort(items, lo, m, INCREASING);
      bitonic_sort(items, lo + m, m, DECREASING);

      // Merge the results.
      bitonic_merge(items,lo, n, dir);
   }
}

// Sorts the given sequence in increasing order.
template <class T>
void bitonic_sort(T* items, int size)
{
    bitonic_sort(items, 0, size, INCREASING);
}

[]上述

使用 parallel_invoke 以平行方式執行雙調排序

本節說明如何使用 parallel_invoke 演算法來平行執行雙調排序演算法。

程序

若要以平行方式執行雙調排序演算法

加入標頭檔 ppl.h 的 #include 指示詞。
```
#include <ppl.h>
```
加入 concurrency 命名空間的 using 指示詞。
```
using namespace concurrency;
```

建立名為 parallel_bitonic_mege 的函式，這個函式會在有足夠的工作量時使用 parallel_invoke 演算法平行合併序列。否則呼叫 bitonic_merge，循序合併序列。

// Sorts a bitonic sequence in the specified order.
template <class T>
void parallel_bitonic_merge(T* items, int lo, int n, bool dir)
{   
   // Merge the sequences concurrently if there is sufficient work to do.
   if (n > 500)
   {
      int m = n / 2;
      for (int i = lo; i < lo + m; ++i)
      {
         compare(items, i, i + m, dir);
      }

      // Use the parallel_invoke algorithm to merge the sequences in parallel.
      parallel_invoke(
         [&items,lo,m,dir] { parallel_bitonic_merge(items, lo, m, dir); },
         [&items,lo,m,dir] { parallel_bitonic_merge(items, lo + m, m, dir); }
      );
   }
   // Otherwise, perform the work serially.
   else if (n > 1)
   {
      bitonic_merge(items, lo, n, dir);
   }   
}

使用 bitonic_sort 函式，執行與前述步驟類似的程序。

// Sorts the given sequence in the specified order.
template <class T>
void parallel_bitonic_sort(T* items, int lo, int n, bool dir)
{   
   if (n > 1)
   {
      // Divide the array into two partitions and then sort 
      // the partitions in different directions.
      int m = n / 2;

      // Sort the partitions in parallel.
      parallel_invoke(
         [&items,lo,m] { parallel_bitonic_sort(items, lo, m, INCREASING); },
         [&items,lo,m] { parallel_bitonic_sort(items, lo + m, m, DECREASING); }
      );

      // Merge the results.
      parallel_bitonic_merge(items, lo, n, dir);
   }
}

建立 parallel_bitonic_sort 函式的多載版本，依遞增順序排序陣列。

// Sorts the given sequence in increasing order.
template <class T>
void parallel_bitonic_sort(T* items, int size)
{
   parallel_bitonic_sort(items, 0, size, INCREASING);
}

parallel_invoke 演算法會在呼叫端內容上執行最後一個工作序列，以減少額外負荷。例如，在 parallel_bitonic_sort 函式中，第一個工作會在個別內容上執行，而第二個工作會在呼叫端內容上執行。

// Sort the partitions in parallel.
parallel_invoke(
   [&items,lo,m] { parallel_bitonic_sort(items, lo, m, INCREASING); },
   [&items,lo,m] { parallel_bitonic_sort(items, lo + m, m, DECREASING); }
);

下列完整範例會執行雙調排序演算法的循序和平行版本。這個範例也會將每個計算需要的執行時間列印至主控台。

// parallel-bitonic-sort.cpp
// compile with: /EHsc
#include <windows.h>
#include <algorithm>
#include <iostream>
#include <random>
#include <ppl.h>

using namespace concurrency;
using namespace std;

// Calls the provided work function and returns the number of milliseconds 
// that it takes to call that function.
template <class Function>
__int64 time_call(Function&& f)
{
   __int64 begin = GetTickCount();
   f();
   return GetTickCount() - begin;
}

const bool INCREASING = true;
const bool DECREASING = false;

// Comparator function for the bitonic sort algorithm.
template <class T>
void compare(T* items, int i, int j, bool dir)
{
   if (dir == (items[i] > items[j]))
   {
      swap(items[i], items[j]);
   }
}

// Sorts a bitonic sequence in the specified order.
template <class T>
void bitonic_merge(T* items, int lo, int n, bool dir)
{
   if (n > 1)
   {
      int m = n / 2;
      for (int i = lo; i < lo + m; ++i)
      {
         compare(items, i, i + m, dir);
      }
      bitonic_merge(items, lo, m, dir);
      bitonic_merge(items, lo + m, m, dir);
   }
}

// Sorts the given sequence in the specified order.
template <class T>
void bitonic_sort(T* items, int lo, int n, bool dir)
{
   if (n > 1)
   {
      // Divide the array into two partitions and then sort 
      // the partitions in different directions.
      int m = n / 2;
      bitonic_sort(items, lo, m, INCREASING);
      bitonic_sort(items, lo + m, m, DECREASING);

      // Merge the results.
      bitonic_merge(items,lo, n, dir);
   }
}

// Sorts the given sequence in increasing order.
template <class T>
void bitonic_sort(T* items, int size)
{
    bitonic_sort(items, 0, size, INCREASING);
}

// Sorts a bitonic sequence in the specified order.
template <class T>
void parallel_bitonic_merge(T* items, int lo, int n, bool dir)
{   
   // Merge the sequences concurrently if there is sufficient work to do.
   if (n > 500)
   {
      int m = n / 2;
      for (int i = lo; i < lo + m; ++i)
      {
         compare(items, i, i + m, dir);
      }

      // Use the parallel_invoke algorithm to merge the sequences in parallel.
      parallel_invoke(
         [&items,lo,m,dir] { parallel_bitonic_merge(items, lo, m, dir); },
         [&items,lo,m,dir] { parallel_bitonic_merge(items, lo + m, m, dir); }
      );
   }
   // Otherwise, perform the work serially.
   else if (n > 1)
   {
      bitonic_merge(items, lo, n, dir);
   }   
}

// Sorts the given sequence in the specified order.
template <class T>
void parallel_bitonic_sort(T* items, int lo, int n, bool dir)
{   
   if (n > 1)
   {
      // Divide the array into two partitions and then sort 
      // the partitions in different directions.
      int m = n / 2;

      // Sort the partitions in parallel.
      parallel_invoke(
         [&items,lo,m] { parallel_bitonic_sort(items, lo, m, INCREASING); },
         [&items,lo,m] { parallel_bitonic_sort(items, lo + m, m, DECREASING); }
      );

      // Merge the results.
      parallel_bitonic_merge(items, lo, n, dir);
   }
}

// Sorts the given sequence in increasing order.
template <class T>
void parallel_bitonic_sort(T* items, int size)
{
   parallel_bitonic_sort(items, 0, size, INCREASING);
}

int wmain()
{  
   // For this example, the size must be a power of two.
   const int size = 0x200000;

   // Create two large arrays and fill them with random values.
   int* a1 = new int[size];
   int* a2 = new int[size];

   mt19937 gen(42);
   for(int i = 0; i < size; ++i)
   {
      a1[i] = a2[i] = gen();
   }

   __int64 elapsed;

   // Perform the serial version of the sort.
   elapsed = time_call([&] { bitonic_sort(a1, size); });
   wcout << L"serial time: " << elapsed << endl;

   // Now perform the parallel version of the sort.
   elapsed = time_call([&] { parallel_bitonic_sort(a2, size); });
   wcout << L"parallel time: " << elapsed << endl;

   delete[] a1;
   delete[] a2;
}

下列是針對配備四個處理器之電腦的範例輸出。

serial time: 4353
parallel time: 1248

[]上述

編譯程式碼

編譯程式碼，請複製並貼到 Visual Studio 專案或貼入名為平行雙調 sort.cpp 並在 Visual Studio 命令提示字元] 視窗中執行下列命令的檔案。

cl.exe /EHsc parallel-bitonic-sort.cpp

健全的程式設計

，因為存留期每個工作群組不會比函式長，擴充這個範例會使用 parallel_invoke 演算法而非 concurrency::task_group 類別。建議您盡量使用 parallel_invoke，因為它的執行額外負荷小於 task group 物件，因此可讓您撰寫執行效果更佳的程式碼。

有些演算法的平行版本只有在有足夠的工作要執行時，才會有較佳的執行效果。例如，如果序列中有 500 個以下的項目，則 parallel_bitonic_merge 函式會呼叫 bitonic_merge 這個循序版本。您也可以根據工作量來規劃整體排序策略。例如，如果陣列包含少於 500 個項目，使用快速排序演算法的循序版本可能更有效率，如下列範例所示：

template <class T>
void quick_sort(T* items, int lo, int n)
{
   // TODO: The function body is omitted for brevity.
}

template <class T>
void parallel_bitonic_sort(T* items, int lo, int n, bool dir)
{
   // Use the serial quick sort algorithm if there are relatively few
   // items to sort. The associated overhead for running few tasks in 
   // parallel may not overcome the benefits of parallel processing.
   if (n - lo + 1 <= 500)
   {
      quick_sort(items, lo, n);
   }
   else if (n > 1)
   {
      // Divide the array into two partitions and then sort 
      // the partitions in different directions.
      int m = n / 2;

      // Sort the partitions in parallel.
      parallel_invoke(
         [&items,lo,m] { parallel_bitonic_sort(items, lo, m, INCREASING); },
         [&items,lo,m] { parallel_bitonic_sort(items, lo + m, m, DECREASING); }
      );

      // Merge the results.
      parallel_bitonic_merge(items, lo, n, dir);
   }
}

如同任何平行演算法，建議您適當地進行程式碼剖析和微調。

共用方式為