restrict(amp) restrictions part 3 of N – function declarators and calls

12/22/2011

This post assumes that you have read the introductory post to this series, which also includes a table of contents. With that out of the way, let's look at the restrictions around function declarators and calls.

Function declarators with restrict(amp)

For a function declarator with restrict(amp) (or restrict(amp, cpu)), besides the obvious rules that its return type and parameter types must be supported for amp, there are some extra rules, as follows:

· It is not allowed to have a trailing ellipsis (…) in its parameter list;

· It is not allowed to have an exception specification (including the empty throw() and __declspec(nothrow) );

· It is not allowed to have extern "C" linkage when it has multiple restriction specifiers;

· It is not allowed to be virtual.

Variadic functions require direct support from the C runtime, which is not amp-compatible in C++ AMP v1. In addition, C++ AMP does not support exception handling; therefore, an exception cannot be thrown inside an amp-restricted function, and the function cannot have an exception specification either. The empty exception specification is harmless, but we disallow it for consistency. The limitation on extern "C" linkage exists because the current C++ AMP implementation generates multiple symbols for a function with multiple restriction specifiers. This cannot be done for extern "C" functions: they do not have C++ decorated names, so those symbols cannot be differentiated. Finally, the non-virtual requirement is due to the lack of hardware function call support.
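As a rough illustration of the declarator rules above, the following sketch shows two declarations that satisfy them, with one rule-breaking declaration per rule kept in comments. All function and type names are made up for this example. SKETCH_RESTRICT is a hypothetical helper macro, not part of C++ AMP: it expands to restrict(amp, cpu) only when building with Visual C++ and SKETCH_USE_AMP defined, so that the sketch also compiles as ordinary C++ with other compilers.

```cpp
// Hypothetical helper macro (an assumption of this sketch, not a C++ AMP feature).
#if defined(_MSC_VER) && defined(SKETCH_USE_AMP)
#define SKETCH_RESTRICT restrict(amp, cpu)
#else
#define SKETCH_RESTRICT
#endif

// Allowed: amp-supported return and parameter types, no ellipsis,
// no exception specification, C++ linkage, non-virtual.
inline float scale(float x, float factor) SKETCH_RESTRICT
{
    return x * factor;
}

struct Point {
    float x;
    float y;

    // Allowed: a non-virtual member function.
    float sum() const SKETCH_RESTRICT { return x + y; }
};

// Each commented-out declaration below breaks one of the listed rules.
// int log_values(int count, ...) restrict(amp);               // trailing ellipsis
// float scale_nothrow(float x) restrict(amp) throw();         // exception specification
// extern "C" float scale_c(float x) restrict(amp, cpu);       // extern "C" with multiple restriction specifiers
// struct Shape { virtual float area() const restrict(amp); }; // virtual function
```

On the host, scale(2.0f, 3.0f) and Point{1.0f, 2.0f}.sum() behave like ordinary C++ functions. The restriction specifier only constrains what the declarator and body may contain.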

Function calls

Within an amp-restricted function, the target of a function-like invocation (e.g., functions, member functions, object constructors & destructors, operators) must be amp-restricted too. It follows from the amp type restrictions that the target cannot be a virtual function, a function pointer, or a pointer to member function either. In addition, due to the lack of hardware stack and function call support, a function is not allowed to invoke itself recursively, either directly or indirectly via other functions.
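The call rules can be sketched in the same spirit. The example below, with made-up function names, shows an amp-restricted function calling another amp-restricted function, and a factorial written as a loop because the recursive form would be rejected. SKETCH_RESTRICT is again a hypothetical helper macro that expands to restrict(amp, cpu) only under Visual C++ with SKETCH_USE_AMP defined; elsewhere it expands to nothing, and the disallowed calls stay in comments.

```cpp
// Hypothetical helper macro (an assumption of this sketch, not a C++ AMP feature).
#if defined(_MSC_VER) && defined(SKETCH_USE_AMP)
#define SKETCH_RESTRICT restrict(amp, cpu)
#else
#define SKETCH_RESTRICT
#endif

// amp-restricted callee: may be called from amp-restricted code.
inline int square(int v) SKETCH_RESTRICT { return v * v; }

// Not amp-restricted: may only be called from cpu-restricted code.
inline int cpu_only_square(int v) { return v * v; }

inline int sum_of_squares(int n) SKETCH_RESTRICT
{
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += square(i);             // OK: target is amp-restricted
        // total += cpu_only_square(i); // error under restrict(amp): target is not amp-restricted
    }
    return total;
}

// Recursion is not allowed, so the factorial is computed with a loop.
inline int factorial(int n) SKETCH_RESTRICT
{
    int result = 1;
    for (int i = 2; i <= n; ++i) {
        result *= i;
    }
    return result;
}

// Disallowed under restrict(amp): the function invokes itself.
// int factorial_rec(int n) restrict(amp) { return n <= 1 ? 1 : n * factorial_rec(n - 1); }
```

Rewriting recursion as iteration, as in factorial above, is one common way to stay within the rule. Whether it is practical depends on the algorithm.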

Comments

Anonymous
May 16, 2013
I know the functionality is implied, but just for clarity could you discuss considerations for functions declared with "inline"? How will the VC++11 compiler respond to this keyword and will the rules for inlining functions be different for amp restricted code?
Anonymous
May 17, 2013
Hi Arman, when a restrict(amp) or restrict(amp, cpu) function is called within the call graph rooted at the parallel_for_each, it will always be inlined. Please take a look at: blogs.msdn.com/.../c-amp-full-inlining-requirement.aspx. When a restrict(amp, cpu) function is called on the host, its inlining behavior is unchanged.
Anonymous
May 28, 2014
Is it possible to call variadic template amp restricted functions in parallel_for_each with restrict(amp) like this?
    template <typename... Functions>
    int FillArray(std::vector<double>& vArray, Functions... functs)
    {
        double dParam = 1.0;
        std::vector<std::function<bool(double)>> vFunctions = { functs... };
        for (auto funct : vFunctions)
            parallel_for_each(vArray.begin(), vArray.begin(), [funct, dParam](double& d) { d += funct(dParam); });
    }
Anonymous
May 29, 2014
The comment has been removed
Anonymous
May 30, 2014
Thanks a lot Lukasz. I've known about the pointer restriction, and it is good to know that lambda closures will inline parallel_for_each(...) restrict (amp) statically at compile time. My problem is more complicated because I need to use it for multiple GPUs running on vArray.section like this:
    // std::vector<double> vParams = { 0, 1, 2, ..., N};
    array_view<double> avP(vParams);
    std::vector<std::function<double(double)>> vFunctions = { functs... };
    for (auto funct: vFunctions)
        parallel_for_each(vGPUs.begin(), vGPUs.end(), [&](pair<accelerator, int> accel)
        {
            accelerator_view device = accel.first.get_default_view();
            accel.first.set_default_cpu_access_type(access_type_auto);
            device.wait();
            int nGPU = accel.second;
            auto vArray_index = concurrency::index<1>(nGPU * vArray.extent / nGPUCount);
            auto vArray_extent = concurrency::extent<1>(vArray.extent / nGPUCount);
            auto P_index = concurrency::index<1>(nGPU * vArray.extent / nGPUCount);
            auto vArraySection = vArray.section(vArray_index, vArray_extent);
            auto avP_section = avP.section(P_index);
            vArraySection.discard_data();
            parallel_for_each(device, vArraySection.extent, [&, funct, avP_section, vArraySection](index<1> idx) restrict(amp)
            {
                vArraySection(idx).val = funct(avP_section(idx));
            });
            vArraySection.synchronize();
        });
The problem is that vArray.section is determined at run time.
Anonymous
June 01, 2014
The comment has been removed
Anonymous
June 03, 2014
Thank you Łukasz! It works, even though I need to use std::tuple instead of std::pair because of additional parameters for extent<2> and extent<3> array_view ;-)
