SYCL 1.2.1 Reference Card
www.khronos.org/sycl ©2018 Khronos Group - Rev. 0818
SYCL 1.2.1 API Reference Guide Page 1
Runtime: Device class [4.6.4]
device();
explicit device(const device_selector &deviceSelector);
explicit device(cl_device_id deviceId);
cl_device_id get() const;
platform get_platform() const;
bool is_host() const;
bool is_cpu() const;
bool is_gpu() const;
bool is_accelerator() const;
template <info::device param> typename info::param_traits<
info::device, param>::return_type get_info() const;
bool has_extension(const string_class &extension) const;
template <info::partition_property prop>
vector_class<device> create_sub_devices(
size_t nbSubDev) const;
template <info::partition_property prop>
vector_class<device> create_sub_devices(
const vector_class<size_t> &counts) const;
template <info::partition_property prop>
vector_class<device> create_sub_devices(
info::partition_affinity_domain affinityDomain) const;
static vector_class<device> get_devices(
info::device_type deviceType =
info::device_type::all);
(Continued on next page)
SYCL™ is a C++ programming model for OpenCL which builds on the underlying concepts,
portability, and efficiency of OpenCL while adding much of the ease of use and flexibility of C++.
[n.n] refers to sections in the SYCL 1.2.1 specification available at khronos.org/registry/sycl
SYCL example [3.1.3]
Below is an example of a typical SYCL application which schedules a job
to run in parallel on any OpenCL accelerator.
#include <CL/sycl.hpp>
#include <iostream>

int main() {
  using namespace cl::sycl;

  int data[1024]; // Allocates data to be worked on

  // Include all the SYCL work in a {} block to ensure all
  // SYCL tasks are completed before exiting the block.
  {
    // Create a queue to enqueue work to
    queue myQueue;

    // Wrap the data array variable in a buffer.
    // The buffer element type must match the type of data.
    buffer<int, 1> resultBuf { data, range<1> { 1024 } };

    // Create a command group to
    // issue commands to the queue.
    myQueue.submit([&](handler & cgh)
    {
      // Request access to the buffer
      auto writeResult = resultBuf.get_access<access::mode::write>(cgh);

      // Enqueue a parallel_for task.
      cgh.parallel_for<class simple_test>(range<1>{ 1024 }, [=](id<1> idx)
      {
        writeResult[idx] = static_cast<int>(idx[0]);
      }); // End of the kernel function
    }); // End of the queue commands
  } // End of scope, so wait for the queued work to complete

  // Print result
  for (int i = 0; i < 1024; i++) {
    std::cout << "data[" << i << "] = " << data[i] << std::endl;
  }
  return 0;
}
Runtime: Context class [4.6.3]
explicit context(async_handler asyncHandler = {});
context(const device &dev,
async_handler asyncHandler = {});
context(const platform &plt,
async_handler asyncHandler = {});
context(const vector_class<device> &deviceList,
async_handler asyncHandler = {});
context(cl_context clContext,
async_handler asyncHandler = {});
cl_context get() const;
bool is_host() const;
template <info::context param>
typename info::param_traits
<info::context, param>::return_type get_info() const;
platform get_platform() const;
vector_class<device> get_devices() const;
Context queries using get_info():
Descriptor Return type
info::context::reference_count cl_uint
info::context::platform platform
info::context::devices vector_class<device>
Runtime: Platform class [4.6.2]
platform();
explicit platform(cl_platform_id platformID);
explicit platform(const device_selector &deviceSelector);
cl_platform_id get() const;
static vector_class<platform> get_platforms();
vector_class<device> get_devices(
info::device_type = info::device_type::all) const;
template <info::platform param>
typename info::param_traits<
info::platform, param>::return_type get_info() const;
bool has_extension(const string_class & extension) const;
bool is_host() const;
Platform information descriptors:
Platform descriptors Return type
info::platform::profile string_class
info::platform::version string_class
info::platform::name string_class
info::platform::vendor string_class
info::platform::extensions vector_class<string_class>
Header file
SYCL programs must include the
<CL/sycl.hpp> header file to provide
all of the SYCL features.
Namespace
All SYCL names are defined in the
cl::sycl namespace.
Queue
See queue class functions [4.6.5]
on page 2 of this reference guide.
Buffer
See buffer class functions [4.7.2]
on page 2 of this reference guide.
Accessor
See accessor class functions [4.7.6]
on page 3 of this reference guide.
Handler
See handler class functions [4.8.3]
on page 5 of this reference guide.
Scopes
The kernel scope specifies a single
kernel function that will be, or has
been, compiled by a device compiler
and executed on a device. See
Invoking Kernels [4.8.5] in the spec.
The command group scope specifies
a unit of work comprising
a kernel function and accessors.
The application scope specifies all
other code outside of a command
group scope.
Runtime: Device selection class [4.6.1]
device_selector();
device_selector(const device_selector &rhs);
device_selector &operator=(const device_selector &rhs);
virtual ~device_selector();
device select_device() const;
virtual int operator()(const device &device) const;
SYCL device selectors:
Device selectors Description
default_selector Devices selected by heuristics of the system
gpu_selector Select devices according to device type info::device::device_type::gpu
cpu_selector Select devices according to device type info::device::device_type::cpu
host_selector Selects the SYCL host
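A custom selector derives from device_selector and overrides operator() to return a score for each device; the SYCL runtime then picks the device with the highest score, and a device given a negative score is never chosen. The sketch below mimics that scoring loop in plain C++ so it can run without a SYCL implementation. FakeDevice, FakeSelector, and PreferGpu are hypothetical stand-ins and are not part of the SYCL API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for cl::sycl::device; not part of SYCL.
struct FakeDevice {
  std::string name;
  bool is_gpu;
};

// Mirrors the shape of device_selector: operator() scores one device,
// select_device() returns the device with the highest score.
struct FakeSelector {
  virtual ~FakeSelector() {}
  virtual int operator()(const FakeDevice &dev) const = 0;

  FakeDevice select_device(const std::vector<FakeDevice> &devices) const {
    int bestScore = -1;
    const FakeDevice *best = nullptr;
    for (const FakeDevice &dev : devices) {
      const int score = (*this)(dev);
      if (score > bestScore) { // negative scores can never win
        bestScore = score;
        best = &dev;
      }
    }
    // A real SYCL runtime reports an error here; this sketch just asserts.
    assert(best != nullptr && "no device received a non-negative score");
    return *best;
  }
};

// Example policy: strongly prefer GPUs, but accept anything else.
struct PreferGpu : FakeSelector {
  int operator()(const FakeDevice &dev) const override {
    return dev.is_gpu ? 100 : 1;
  }
};
```

With a real SYCL implementation the same policy would be written by deriving from cl::sycl::device_selector and testing dev.is_gpu() inside operator().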
Runtime: Common interface [4.3.2 - 4.3.4]
Member functions for reference semantics
T may be device, context, queue, program, kernel, event,
buffer, image, sampler, accessor, or stream.
T(const T &rhs);
T(T &&rhs);
T &operator=(const T &rhs);
T &operator=(T &&rhs);
~T();
bool operator==(const T &rhs) const;
bool operator!=(const T &rhs) const;
Member functions for by-value semantics
In the following functions, T may be id, range, item, nd_item,
h_item, group, or nd_range.
T(const T &rhs) = default;
T(T &&rhs) = default;
T &operator=(const T &rhs) = default;
T &operator=(T &&rhs) = default;
~T() = default;
bool operator==(const T &rhs) const;
bool operator!=(const T &rhs) const;
Properties interface [4.3.4.1]
Some runtime classes provide this properties interface.
class T {
. . .
template <typename propertyT>
bool has_property() const;
template <typename propertyT>
propertyT get_property() const;
. . .
};
class property_list {
public:
template <typename... propertyTN>
property_list(propertyTN... props);
};
SYCL 1.2.1 API Reference Guide Page 2
Runtime: Device class (continued)
Device queries using get_info():
Descriptor in info::device Return type
device_type info::device_type
vendor_id cl_uint
max_compute_units cl_uint
max_work_item_dimensions cl_uint
max_work_item_sizes id<3>
max_work_group_size size_t
preferred_vector_width_char cl_uint
preferred_vector_width_short cl_uint
preferred_vector_width_int cl_uint
preferred_vector_width_long cl_uint
preferred_vector_width_float cl_uint
preferred_vector_width_double cl_uint
preferred_vector_width_half cl_uint
native_vector_width_char cl_uint
native_vector_width_short cl_uint
native_vector_width_int cl_uint
native_vector_width_long cl_uint
native_vector_width_float cl_uint
native_vector_width_double cl_uint
native_vector_width_half cl_uint
max_clock_frequency cl_uint
address_bits cl_uint
max_mem_alloc_size cl_ulong
image_support bool
Descriptor in info::device Return type
max_read_image_args cl_uint
max_write_image_args cl_uint
image2d_max_width size_t
image2d_max_height size_t
image3d_max_width size_t
image3d_max_height size_t
image3d_max_depth size_t
image_max_buffer_size size_t
image_max_array_size size_t
max_samplers cl_uint
max_parameter_size size_t
mem_base_addr_align cl_uint
half_fp_config vector_class<info::fp_config>
single_fp_config vector_class<info::fp_config>
double_fp_config vector_class<info::fp_config>
global_mem_cache_type info::global_mem_cache_type
global_mem_cache_line_size cl_uint
global_mem_cache_size cl_ulong
global_mem_size cl_ulong
max_constant_buffer_size cl_ulong
max_constant_args cl_uint
local_mem_type info::local_mem_type
local_mem_size cl_ulong
error_correction_support bool
host_unified_memory bool
profiling_timer_resolution size_t
is_endian_little bool
Descriptor in info::device Return type
is_available bool
is_compiler_available bool
is_linker_available bool
execution_capabilities vector_class<info::execution_capability>
queue_profiling bool
built_in_kernels vector_class<string_class>
platform platform
name string_class
vendor string_class
driver_version string_class
profile string_class
version string_class
opencl_c_version string_class
extensions vector_class<string_class>
printf_buffer_size size_t
preferred_interop_user_sync bool
parent_device device
partition_max_sub_devices cl_uint
partition_properties vector_class<info::partition_property>
partition_affinity_domains vector_class<info::partition_affinity_domain>
partition_type_property info::partition_property
partition_type_affinity_domain info::partition_affinity_domain
reference_count cl_uint
Runtime: Queue class [4.6.5]
Property class members:
property::queue::enable_profiling::enable_profiling();
explicit queue(const property_list &propList = {});
queue(const async_handler &asyncHandler,
const property_list &propList = {});
queue(const device_selector &deviceSelector,
const property_list &propList = {});
queue(const device_selector &deviceSelector,
const async_handler &asyncHandler,
const property_list &propList = {});
queue(const device &syclDevice,
const property_list &propList = {});
queue(const device &syclDevice,
const async_handler &asyncHandler,
const property_list &propList = {});
queue(const context &syclContext,
const device_selector &deviceSelector,
const property_list &propList = {});
queue(const context &syclContext,
const device_selector &deviceSelector,
const async_handler &asyncHandler,
const property_list &propList = {});
queue(cl_command_queue clQueue,
const context &syclContext,
const async_handler &asyncHandler = {});
cl_command_queue get() const;
context get_context() const;
device get_device() const;
bool is_host() const;
void wait();
void wait_and_throw();
void throw_asynchronous();
template <info::queue param>
typename info::param_traits<
info::queue, param>::return_type get_info() const;
template <typename T> event submit(T cgf);
template <typename T> event submit(T cgf,
const queue &secondaryQueue);
Queue queries using get_info():
Descriptor Return type
info::queue::context context
info::queue::device device
info::queue::reference_count cl_uint
Buffer class [4.7.2]
Property class members:
property::buffer::use_host_ptr::use_host_ptr();
property::buffer::use_mutex::use_mutex(mutex_class &mutexRef);
property::buffer::context_bound::context_bound(
context boundContext);
mutex_class *property::buffer::use_mutex::get_mutex_ptr() const;
context property::buffer::context_bound::get_context() const;
Class Declaration
template <typename T, int dimensions = 1,
typename AllocatorT = cl::sycl::buffer_allocator>
class buffer;
Member Functions
buffer(const range<dimensions> &bufferRange,
const property_list &propList = {});
buffer(const range<dimensions> &bufferRange,
AllocatorT allocator, const property_list &propList = {});
buffer(T *hostData, const range<dimensions> &bufferRange,
const property_list &propList = {});
buffer(T *hostData, const range<dimensions> &bufferRange,
AllocatorT allocator, const property_list &propList = {});
buffer(const T *hostData,
const range<dimensions> &bufferRange,
const property_list &propList = {});
buffer(const T *hostData,
const range<dimensions> &bufferRange,
AllocatorT allocator, const property_list &propList = {});
buffer(shared_ptr_class<T> &hostData,
const range<dimensions> &bufferRange,
AllocatorT allocator, const property_list &propList = {});
buffer(shared_ptr_class<T> &hostData,
const range<dimensions> &bufferRange,
const property_list &propList = {});
template <class InputIterator>
buffer(InputIterator first, InputIterator last,
AllocatorT allocator, const property_list &propList = {});
template <class InputIterator>
buffer(InputIterator first, InputIterator last,
const property_list &propList = {});
buffer(buffer<T, dimensions, AllocatorT> &b,
const id<dimensions> &baseIndex,
const range<dimensions> &subRange);
buffer(cl_mem clMemObject, const context &syclContext,
event availableEvent = {});
range<dimensions> get_range() const;
size_t get_count() const;
size_t get_size() const;
AllocatorT get_allocator() const;
template <access::mode mode,
access::target target = access::target::global_buffer>
accessor<T, dimensions, mode, target>
get_access(handler &commandGroupHandler);
template <access::mode mode>accessor<T, dimensions,
mode, access::target::host_buffer> get_access();
template <access::mode mode,
access::target target = access::target::global_buffer>
accessor<T, dimensions, mode, target>
get_access(handler &commandGroupHandler,
range<dimensions> accessRange,
id<dimensions> accessOffset = {});
template <access::mode mode>
accessor<T, dimensions, mode, access::target::host_buffer>
get_access(range<dimensions> accessRange,
id<dimensions> accessOffset = {});
template <typename Destination = std::nullptr_t>
void set_final_data(Destination finalData = nullptr);
void set_write_back(bool flag = true);
bool is_sub_buffer() const;
template <typename reinterpretT, int reinterpretDim>
buffer<reinterpretT, reinterpretDim, AllocatorT>
reinterpret(range<reinterpretDim> reinterpretRange)
const;
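The two size queries differ in units: get_count() reports the number of elements in the buffer, while get_size() reports the size in bytes, that is, the element count multiplied by sizeof(T). The plain C++ sketch below works through that arithmetic for a buffer range. The names element_count and byte_size are hypothetical helpers for illustration, not SYCL functions.

```cpp
#include <cassert>
#include <cstddef>

// Element count of a 1-, 2-, or 3-dimensional range: the product of
// its extents. This is the quantity get_count() reports.
inline std::size_t element_count(const std::size_t *extents, int dimensions) {
  assert(dimensions >= 1 && dimensions <= 3);
  std::size_t count = 1;
  for (int d = 0; d < dimensions; ++d) {
    count *= extents[d];
  }
  return count;
}

// Size in bytes for elements of type T. This is the quantity
// get_size() reports.
template <typename T>
std::size_t byte_size(const std::size_t *extents, int dimensions) {
  return element_count(extents, dimensions) * sizeof(T);
}
```

For a buffer<float, 2> with range {4, 8}, this gives 32 elements and 32 * sizeof(float) bytes.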
Runtime: Event class [4.6.6]
event();
event(cl_event clEvent, const context &syclContext);
cl_event get();
vector_class<event> get_wait_list();
void wait();
void wait_and_throw();
static void wait(const vector_class<event> &eventList);
static void wait_and_throw(
const vector_class<event> &eventList);
template <info::event param>typename info::param_traits<
info::event, param>::return_type get_info() const;
template <info::event_profiling param>
typename info::param_traits<info::event_profiling,
param>::return_type get_profiling_info() const;
Event queries using get_info()
Descriptor Return type
info::event::command_execution_status info::event_command_status
info::event::reference_count cl_uint
info::event_profiling::command_submit cl_ulong
info::event_profiling::command_start cl_ulong
info::event_profiling::command_end cl_ulong
SYCL 1.2.1 API Reference Guide Page 3
Accessor class [4.7.6]
Enums
access::mode: read, write, read_write, discard_write, discard_read_write, atomic
access::target: global_buffer, constant_buffer, local, image, host_buffer, host_image, image_array
Class Declaration
template <typename dataT, int dimensions,
access::mode accessmode, access::target accessTarget
= access::target::global_buffer, access::placeholder
isPlaceholder = access::placeholder::false_t> class accessor;
Members of class accessor for buffers
accessor(buffer<dataT, 1> &bufferRef);
accessor(buffer<dataT, 1> &bufferRef,
handler &commandGroupHandlerRef);
accessor(buffer<dataT, dimensions> &bufferRef);
accessor(buffer<dataT, dimensions> &bufferRef,
handler &commandGroupHandlerRef);
accessor(buffer<dataT, dimensions> &bufferRef,
range<dimensions> accessRange,
id<dimensions> accessOffset = {});
accessor(buffer<dataT, dimensions> &bufferRef,
handler &commandGroupHandlerRef,
range<dimensions> accessRange,
id<dimensions> accessOffset = {});
constexpr bool is_placeholder() const;
size_t get_size() const;
size_t get_count() const;
range<dimensions> get_range() const;
id<dimensions> get_offset() const;
operator dataT &() const;
dataT &operator[](id<dimensions> index) const;
dataT &operator[](size_t index) const;
operator dataT() const;
dataT operator[](id<dimensions> index) const;
dataT operator[](size_t index) const;
operator atomic<dataT, access::address_space::global_space>
() const;
atomic<dataT, access::address_space::global_space>
operator[](id<dimensions> index) const;
atomic<dataT, access::address_space::global_space>
operator[](size_t index) const;
__unspecified__ &operator[](size_t index) const;
dataT *get_pointer() const;
global_ptr<dataT> get_pointer() const;
constant_ptr<dataT> get_pointer() const;
Members of class accessor for local access
accessor(handler &commandGroupHandlerRef);
accessor(range<dimensions> allocationSize,
handler &commandGroupHandlerRef);
size_t get_size() const;
size_t get_count() const;
operator dataT &() const;
dataT &operator[](id<dimensions> index) const;
dataT &operator[](size_t index) const;
operator atomic<dataT, access::address_space::local_space>
() const;
atomic<dataT, access::address_space::local_space>
operator[](id<dimensions> index) const;
atomic<dataT, access::address_space::local_space>
operator[](size_t index) const;
__unspecified__ &operator[](size_t index) const;
local_ptr<dataT> get_pointer() const;
Members of class accessor for images
template <typename AllocatorT>
accessor(image<dimensions, AllocatorT> &imageRef);
template <typename AllocatorT>
accessor(image<dimensions, AllocatorT> &imageRef,
handler &commandGroupHandlerRef);
template <typename AllocatorT>
accessor(image<dimensions + 1, AllocatorT> &imageRef,
handler &commandGroupHandlerRef);
accessor<dataT, dimensions, mode, image>
operator[](size_t index) const;
size_t get_size() const;
size_t get_count() const;
template <typename coordT>
dataT read(const coordT &coords) const;
template <typename coordT>
dataT read(const coordT &coords, const sampler &smpl)
const;
template <typename coordT> void write(const coordT
&coords, const dataT &color) const;
Accessor Capabilities
Buffer [Table 4.44]
The data type must match that of the SYCL buffer. The
dimensionality is 0, 1, 2, or 3.
Access Target | Accessor Type | Access Mode | Placeholder modes
global_buffer | Device | read, write, read_write, discard_write, discard_read_write, atomic | false_t, true_t
constant_buffer | Device | read | false_t, true_t
host_buffer | Host | read, write, read_write, discard_write, discard_read_write | false_t
Local [Table 4.47]
The data type must match that of the SYCL buffer. The
dimensionality is 0, 1, 2, or 3.
Access Target | Accessor Type | Access Mode | Placeholder modes
local | Device | read_write, atomic | false_t
(Continued on next page)
Image class [4.7.3]
Property class members:
property::image::use_host_ptr::use_host_ptr();
property::image::use_mutex::use_mutex(mutex_class &mutexRef);
property::image::context_bound::context_bound(
context boundContext);
mutex_class *property::image::use_mutex::get_mutex_ptr() const;
context property::image::context_bound::get_context() const;
Enums
image_channel_order: a, r, rx, rg, rgx, ra, rgb, rgbx, rgba, argb, bgra, intensity, luminance, abgr
image_channel_type: snorm_int8, snorm_int16, unorm_int8, unorm_int16, unorm_short_565, unorm_short_555, unorm_int_101010, signed_int8, signed_int16, signed_int32, unsigned_int8, unsigned_int16, unsigned_int32, fp16, fp32
Class Declaration
template <int dimensions = 1, typename AllocatorT =
cl::sycl::image_allocator> class image;
Member Functions
image(image_channel_order order,
image_channel_type type,
const range<dimensions> &range,
const property_list &propList = {});
image(image_channel_order order,
image_channel_type type,
const range<dimensions> &range, AllocatorT allocator,
const property_list &propList = {});
image(image_channel_order order, image_channel_type type,
const range<dimensions> &range,
const range<dimensions - 1> &pitch,
const property_list &propList = {});
image(image_channel_order order,
image_channel_type type,
const range<dimensions> &range,
const range<dimensions - 1> &pitch, AllocatorT allocator,
const property_list &propList = {});
image(void *hostPointer,
image_channel_order order, image_channel_type type,
const range<dimensions> &range,
const property_list &propList = {});
image(void *hostPointer,
image_channel_order order, image_channel_type type,
const range<dimensions> &range, AllocatorT allocator,
const property_list &propList = {});
image(void *hostPointer,
image_channel_order order, image_channel_type type,
const range<dimensions> &range,
const range<dimensions - 1> &pitch,
const property_list &propList = {});
image(void *hostPointer, image_channel_order order,
image_channel_type type,
const range<dimensions> &range, const
range<dimensions - 1> &pitch, AllocatorT allocator,
const property_list &propList = {});
image(const void *hostPointer, image_channel_order order,
image_channel_type type,
const range<dimensions> &range,
const property_list &propList = {});
image(const void *hostPointer,
image_channel_order order, image_channel_type type,
const range<dimensions> &range, AllocatorT allocator,
const property_list &propList = {});
image(shared_ptr_class<void> &hostPointer,
image_channel_order order, image_channel_type type,
const range<dimensions> &range,
const property_list &propList = {});
image(shared_ptr_class<void> &hostPointer,
image_channel_order order, image_channel_type type,
const range<dimensions> &range, AllocatorT allocator,
const property_list &propList = {});
image(shared_ptr_class<void> &hostPointer,
image_channel_order order, image_channel_type type,
const range<dimensions> &range,
const range<dimensions - 1> &pitch,
const property_list &propList = {});
image(shared_ptr_class<void> &hostPointer,
image_channel_order order, image_channel_type type,
const range<dimensions> &range, const
range<dimensions - 1> &pitch, AllocatorT allocator,
const property_list &propList = {});
image(cl_mem clMemObject, const context &syclContext,
event availableEvent = {});
range<dimensions> get_range() const;
range<dimensions-1> get_pitch() const;
size_t get_count() const;
size_t get_size() const;
AllocatorT get_allocator() const;
template <typename dataT, access::mode accessMode>
accessor<dataT, dimensions,
accessMode, access::target::image>
get_access(handler &commandGroupHandler);
template <typename dataT, access::mode accessMode>
accessor<dataT, dimensions, accessMode,
access::target::host_image> get_access();
template <typename Destination = std::nullptr_t>
void set_final_data(Destination finalData = nullptr);
void set_write_back(bool flag = true);
Data management: Host allocation [4.7.1]
The default allocator for memory objects is implementation
defined, but users can supply their own allocator class.
buffer<int, 1, UserDefinedAllocator<int> > b(d);
Default allocators
Allocators Description
buffer_allocator Default buffer allocator used by the runtime,
when no allocator is defined by the user.
image_allocator Default image allocator used by the runtime for
the image class, when no allocator is defined by
the user. Must be a byte-sized allocator.
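As an illustration of what a user-supplied allocator class might look like, the sketch below defines a minimal allocator that follows the standard C++ allocator interface and records how many bytes it has handed out. CountingAllocator is a hypothetical name, and the assumption that a SYCL implementation accepts any allocator of this shape should be checked against section 4.7.1 of the specification. The allocator is generic, so it can be exercised here with std::vector instead of a SYCL buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical allocator that counts the bytes it allocates.
// It provides only the members the C++ allocator requirements demand;
// std::allocator_traits supplies the rest.
template <typename T>
struct CountingAllocator {
  using value_type = T;

  CountingAllocator() = default;

  // Allow rebinding between element types.
  template <typename U>
  CountingAllocator(const CountingAllocator<U> &) {}

  T *allocate(std::size_t n) {
    // Guard against overflow in the byte computation.
    assert(n <= static_cast<std::size_t>(-1) / sizeof(T));
    bytes_allocated() += n * sizeof(T);
    return static_cast<T *>(::operator new(n * sizeof(T)));
  }

  void deallocate(T *p, std::size_t) { ::operator delete(p); }

  // Running total of bytes requested through CountingAllocator<T>.
  static std::size_t &bytes_allocated() {
    static std::size_t total = 0;
    return total;
  }
};

// All instances are interchangeable, so they always compare equal.
template <typename T, typename U>
bool operator==(const CountingAllocator<T> &, const CountingAllocator<U> &) {
  return true;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T> &, const CountingAllocator<U> &) {
  return false;
}
```

Under the assumption above, the same class would be named as the third template argument of a buffer, in the same position as UserDefinedAllocator<int> in the declaration shown earlier.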
SYCL 1.2.1 API Reference Guide Page 4
Multi_ptr class [4.7.7]
Class Declaration
template <typename elementType,
access::address_space addressSpace> class multi_ptr;
Member Functions
multi_ptr();
multi_ptr(const multi_ptr&);
multi_ptr(multi_ptr&&);
multi_ptr(pointer_t);
multi_ptr(elementType*);
multi_ptr(void*);
multi_ptr(std::nullptr_t);
~multi_ptr();
multi_ptr &operator=(const multi_ptr&);
multi_ptr &operator=(multi_ptr&&);
multi_ptr &operator=(pointer_t);
multi_ptr &operator=(elementType*);
multi_ptr &operator=(void*);
multi_ptr &operator=(std::nullptr_t);
elementType &operator*() const;
elementType *operator->() const;
template <int dimensions, access::mode addressMode>
multi_ptr(accessor<elementType, dimensions,
addressMode, access::target::global_buffer>);
template <int dimensions, access::mode addressMode>
multi_ptr(accessor<elementType, dimensions,
addressMode, access::target::local>);
template <int dimensions, access::mode addressMode>
multi_ptr(accessor<elementType, dimensions,
addressMode, access::target::constant_buffer>);
template <typename elementType, int dimensions,
access::mode addressMode>
multi_ptr(accessor<elementType, dimensions,
addressMode, access::target::global_buffer>);
template <typename elementType, int dimensions,
access::mode addressMode>
multi_ptr (accessor<elementType, dimensions,
addressMode, access::target::local>);
template <typename elementType, int dimensions,
access::mode addressMode>
multi_ptr(accessor<elementType, dimensions,
addressMode, access::target::constant_buffer>);
pointer_t get() const;
operator void*() const;
template <typename elementType>
explicit operator multi_ptr<elementType, addressSpace>()
const;
Non-member Functions
template <typename elementType, access::address_space
addressSpace> multi_ptr<elementType, addressSpace>
make_ptr(multi_ptr<elementType,
addressSpace>::pointer_t);
template
<typename elementType, access::address_space
addressSpace> multi_ptr<elementType, addressSpace>
make_ptr(elementType*);
template
<typename elementType, access::address_space
addressSpace>
bool operatorOP(const multi_ptr<elementType,
addressSpace> & lhs, const multi_ptr<elementType,
addressSpace>& rhs);
OP is one of ==, !=, <, >, <=, >=
template
<typename elementType, access::address_space
addressSpace>
bool operatorOP(const multi_ptr<elementType,
addressSpace>& lhs, std::nullptr_t rhs);
OP is one of ==, !=, <, >, <=, >=
template
<typename elementType, access::address_space
addressSpace>
bool operatorOP(std::nullptr_t lhs,
const multi_ptr<elementType, addressSpace>& rhs);
OP is one of ==, !=, <, >, <=, >=
Template specialization aliases [4.7.7.2]
template <typename elementType>
using global_ptr = multi_ptr<elementType,
access::address_space::global_space>;
template <typename elementType>
using local_ptr = multi_ptr<elementType,
access::address_space::local_space>;
template <typename elementType>
using constant_ptr = multi_ptr<elementType,
access::address_space::constant_space>;
template <typename elementType>
using private_ptr = multi_ptr<elementType,
access::address_space::private_space>;
void prefetch(size_t numElements) const;
Ranges and index space identifiers [4.8.1]
Class range [4.8.1.1]
Class declaration
template <int dimensions = 1> class range;
Member functions
range(size_t dim0);
range(size_t dim0, size_t dim1);
range(size_t dim0, size_t dim1, size_t dim2);
size_t get(int dimension) const;
size_t &operator[](int dimension);
size_t size() const;
range<dimensions> operatorOP(
const range<dimensions> &rhs) const;
range<dimensions> operatorOP(const size_t &rhs) const;
range<dimensions> &operatorOP(
const range<dimensions> &rhs);
range<dimensions> &operatorOP(const size_t &rhs);
Non-member function
template <int dimensions>
range<dimensions> operatorOP(const size_t &lhs,
const range<dimensions> &rhs);
Class nd_range [4.8.1.2]
Class declaration
template <int dimensions = 1> struct nd_range;
Member functions
nd_range(range<dimensions> globalSize,
range<dimensions> localSize,
id<dimensions>
offset = id<dimensions>());
range<dimensions> get_global_range() const;
range<dimensions> get_local_range() const;
range<dimensions> get_group_range() const;
id<dimensions> get_offset() const;
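The three range queries of nd_range are related: the group range is the global range divided by the local range in each dimension, and the global size must be a whole multiple of the local size. The plain C++ sketch below shows that calculation for two dimensions. Extent2 and group_range are hypothetical stand-ins for range<2> and get_group_range(), used so the example runs without a SYCL implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for range<2>; not part of SYCL.
using Extent2 = std::array<std::size_t, 2>;

// Number of work-groups in each dimension: global size / local size.
inline Extent2 group_range(const Extent2 &global, const Extent2 &local) {
  Extent2 groups = {{0, 0}};
  for (std::size_t d = 0; d < 2; ++d) {
    // The local size must evenly divide the global size.
    assert(local[d] > 0 && global[d] % local[d] == 0);
    groups[d] = global[d] / local[d];
  }
  return groups;
}
```

For a global range of {1024, 512} and a local range of {16, 8}, the result is 64 work-groups in each dimension.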
Class id [4.8.1.3]
Class declaration
template <int dimensions = 1> struct id;
Member functions
id();
id(size_t dim0);
id(size_t dim0, size_t dim1);
id(size_t dim0, size_t dim1, size_t dim2);
id(const range<dimensions> &range);
id(const item<dimensions> &item);
size_t get(int dimension) const;
size_t &operator[](int dimension) const;
operator size_t();
id<dimensions> operatorOP(const id<dimensions> &rhs)
const;
where OP is one of +, -, *, /, %, <<, >>, &, |, ^, &&, ||, <, >, <=, >=
id<dimensions> operatorOP(const size_t &rhs) const;
where OP is one of +, -, *, /, %, <<, >>, &, |, ^, &&, ||, <, >, <=, >=
id<dimensions> &operatorOP(const id<dimensions> &rhs);
where OP is one of +=, -=, *=, /=, %=, <<=, >>=, &=, |=, ^=
id<dimensions> &operatorOP(const size_t &rhs);
where OP is one of +=, -=, *=, /=, %=, <<=, >>=, &=, |=, ^=
Non-member functions
template <int dimensions> id<dimensions> operatorOP(
const size_t &lhs, const id<dimensions> &rhs);
where OP is one of +, -, *, /, %, <<, >>, &, |, ^
Class item [4.8.1.4-5]
Class declaration
template <int dimensions = 1, bool with_offset = true>
struct item;
Member functions
id<dimensions> get_id() const;
size_t get_id(int dimension) const;
size_t &operator[](int dimension);
range<dimensions> get_range() const;
id<dimensions> get_offset() const;
operator item<dimensions, true>() const;
size_t get_linear_id() const;
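get_linear_id() collapses a multi-dimensional id into a single index within the item's range. The plain C++ sketch below assumes row-major linearization, where the last dimension varies fastest; check the linearization equation in the specification before relying on this order. The function linear_id is a hypothetical helper, not a SYCL function, and it ignores any offset.

```cpp
#include <cassert>
#include <cstddef>

// Row-major linearization of an id within a range (assumed order:
// last dimension varies fastest). For three dimensions this equals
// id[0]*range[1]*range[2] + id[1]*range[2] + id[2].
inline std::size_t linear_id(const std::size_t *id, const std::size_t *range,
                             int dimensions) {
  assert(dimensions >= 1 && dimensions <= 3);
  std::size_t linear = 0;
  for (int d = 0; d < dimensions; ++d) {
    assert(id[d] < range[d]); // each index must lie inside its extent
    linear = linear * range[d] + id[d];
  }
  return linear;
}
```

For id {1, 2, 3} in range {4, 5, 6}, the result under this assumption is 1*30 + 2*6 + 3 = 45.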
Class nd_item [4.8.1.6]
Class declaration
template <int dimensions = 1> struct nd_item;
Member functions
id<dimensions> get_global_id() const;
size_t get_global_id(int dimension) const;
size_t get_global_linear_id() const;
id<dimensions> get_local_id() const;
size_t get_local_id(int dimension) const;
size_t get_local_linear_id() const;
group<dimensions> get_group() const;
size_t get_group(int dimension) const;
size_t get_group_linear_id() const;
size_t get_group_range(int dimension) const;
range<dimensions> get_group_range() const;
range<dimensions> get_global_range() const;
range<dimensions> get_local_range() const;
id<dimensions> get_offset() const;
nd_range<dimensions> get_nd_range() const;
void barrier(access::fence_space accessSpace =
access::fence_space::global_and_local) const;
template
<access::mode accessMode = access::mode::read_write>
void mem_fence(access::fence_space accessSpace =
access::fence_space::global_and_local) const;
template <typename dataT>
device_event async_work_group_copy(
local_ptr<dataT> dest, global_ptr<dataT> src,
size_t numElements) const;
template <typename dataT>
device_event async_work_group_copy(
global_ptr<dataT> dest, local_ptr<dataT> src,
size_t numElements) const;
(Continued on next page)
Sampler class [4.7.8]
Enums
addressing_mode: mirrored_repeat, repeat, clamp_to_edge, clamp, none
filtering_mode: nearest, linear
coordinate_normalization_mode: normalized, unnormalized
Class sampler members
sampler(coordinate_normalization_mode
normalizationMode, addressing_mode addressingMode,
filtering_mode filteringMode);
sampler(cl_sampler clSampler, const context &syclContext);
addressing_mode get_addressing_mode() const;
filtering_mode get_filtering_mode() const;
coordinate_normalization_mode
get_coordinate_normalization_mode() const;
Accessor class (continued)
Image [Table 4.50]
The data types are cl_int4, cl_uint4, cl_float4, and cl_half4, and
the dimensionality is between 1 and 3 (inclusive).
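Access Target | Accessor Type | Dimensionality | Access Mode | Placeholder
image | Device | 1, 2, or 3 | read, write, discard_write | false_t
image_array | Device | 1 or 2 | read, write, discard_write | false_t
host_image | Host | 1, 2, or 3 | read, write, discard_write | false_t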
SYCL 1.2.1 API Reference Guide Page 5
Kernel class [4.8.7]
kernel(cl_kernel clKernel, const context &syclContext);
cl_kernel get() const;
bool is_host() const;
context get_context() const;
program get_program() const;
template <info::kernel param>
typename info::param_traits<
info::kernel, param>::return_type get_info() const;
template <info::kernel_work_group param>
typename info::param_traits<info::kernel_work_group,
param>::return_type
get_work_group_info(const device &dev) const;
Kernel queries using get_info()
Descriptor Return type
info::kernel::function_name string_class
info::kernel::num_args cl_uint
info::kernel::context context
info::kernel::program program
info::kernel::reference_count cl_uint
info::kernel::attributes string_class
Ranges, index space identifiers (cont.)
template <typename dataT>
device_event async_work_group_copy(
local_ptr<dataT> dest, global_ptr<dataT> src,
size_t numElements, size_t srcStride) const;
template <typename dataT>
device_event async_work_group_copy(
global_ptr<dataT> dest, local_ptr<dataT> src,
size_t numElements, size_t destStride) const;
template <typename... eventTN>
void wait_for(eventTN... events) const;
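The global, local, and group ids returned by nd_item are tied together in each dimension: the global id is the group id times the local range, plus the local id, plus any offset. The plain C++ sketch below works through that relation for one dimension, assuming the OpenCL convention that the global id includes the offset. The function global_id is a hypothetical helper, not the SYCL member function.

```cpp
#include <cassert>
#include <cstddef>

// Global id of a work-item in one dimension, computed from its
// work-group id, the work-group size, and its id within the group.
// Assumes the OpenCL convention that the offset is included.
inline std::size_t global_id(std::size_t groupId, std::size_t localRange,
                             std::size_t localId, std::size_t offset = 0) {
  assert(localId < localRange); // local id must lie inside the work-group
  return groupId * localRange + localId + offset;
}
```

For example, work-item 5 of work-group 3 with a work-group size of 16 has global id 3*16 + 5 = 53 when no offset is used.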
Class h_item [4.8.1.7]
Class declaration
template <int dimensions> struct h_item;
Member functions
item<dimensions, false> get_global() const;
item<dimensions, false> get_local() const;
item<dimensions, false> get_logical_local() const;
item<dimensions, false> get_physical_local() const;
range<dimensions> get_global_range() const;
size_t get_global_range(int dimension) const;
id<dimensions> get_global_id() const;
size_t get_global_id(int dimension) const;
range<dimensions> get_local_range() const;
size_t get_local_range(int dimension) const;
id<dimensions> get_local_id() const;
size_t get_local_id(int dimension) const;
range<dimensions> get_logical_local_range() const;
size_t get_logical_local_range(int dimension) const;
id<dimensions> get_logical_local_id() const;
size_t get_logical_local_id(int dimension) const;
range<dimensions> get_physical_local_range() const;
size_t get_physical_local_range(int dimension) const;
id<dimensions> get_physical_local_id() const;
size_t get_physical_local_id(int dimension) const;
Class group [4.8.1.8]
Class declaration
template <int dimensions = 1> struct group;
Member functions
id<dimensions> get_id() const;
size_t get_id(int dimension) const;
range<dimensions> get_global_range() const;
size_t get_global_range(int dimension) const;
range<dimensions> get_group_range() const;
range<dimensions> get_local_range() const;
size_t get_local_range(int dimension) const;
size_t get_group_range(int dimension) const;
size_t operator[](int dimension) const;
size_t get_linear_id() const;
template<typename workItemFunctionT>
void parallel_for_work_item(workItemFunctionT func)
const;
template<typename workItemFunctionT>
void parallel_for_work_item(range<dimensions>
flexibleRange, workItemFunctionT func) const;
template
<access::mode accessMode = access::mode::read_write>
void mem_fence(access::fence_space accessSpace =
access::fence_space::global_and_local) const;
template <typename dataT>
device_event async_work_group_copy(
local_ptr<dataT> dest, global_ptr<dataT> src,
size_t numElements) const;
template <typename dataT>
device_event async_work_group_copy(
global_ptr<dataT> dest, local_ptr<dataT> src,
size_t numElements) const;
template <typename dataT>
device_event async_work_group_copy(
local_ptr<dataT> dest, global_ptr<dataT> src,
size_t numElements, size_t srcStride) const;
template <typename dataT>
device_event async_work_group_copy(
global_ptr<dataT> dest, local_ptr<dataT> src,
size_t numElements, size_t destStride) const;
template <typename... eventTN>
void wait_for(eventTN... events) const;
class device_event member [4.8.1.9 - 4.8.1.10]
void wait() const;
Program class [4.8.8]
Enums
program_state
none compiled linked
Member functions
explicit program(const context &context);
program(const context &context,
vector_class<device> deviceList);
program(vector_class<program> programList,
string_class linkOptions = "");
program(const context &context, cl_program clProgram);
cl_program get() const;
bool is_host() const;
template <typename kernelT>
void compile_with_kernel_type(
string_class compileOptions = "");
template <typename kernelT>
void build_with_kernel_type(
string_class buildOptions = "");
void compile_with_source(string_class kernelSource,
string_class compileOptions = "");
void build_with_source(string_class kernelSource,
string_class buildOptions = "");
void link(string_class linkOptions = "");
template <typename kernelT>
bool has_kernel<kernelT>() const;
bool has_kernel(string_class kernelName) const;
template <typename kernelT>
kernel get_kernel<kernelT>() const;
kernel get_kernel(string_class kernelName) const;
template <info::program param>
typename info::param_traits<
info::program, param>::return_type get_info() const;
vector_class<vector_class<char>> get_binaries() const;
context get_context() const;
vector_class<device> get_devices() const;
string_class get_compile_options() const;
string_class get_link_options() const;
string_class get_build_options() const;
program_state get_state() const;
Information descriptors:
Descriptor Return type
info::program::reference_count cl_uint
info::program::context cl_context
info::program::devices vector_class<device>
Command group class handler [4.8.3 - 6]
template <typename dataT, int dimensions, access::mode
accessMode, access::target accessTarget>
void require(accessor<dataT, dimensions, accessMode,
accessTarget, placeholder::true_t> acc);
template <typename T> void set_arg(int argIndex, T && arg);
template <typename... Ts> void set_args(Ts &&... args);
template <typename KernelName, typename KernelType>
void single_task(KernelType kernelFunc);
template <typename KernelName,
typename KernelType, int dimensions>
void parallel_for(range<dimensions> numWorkItems,
KernelType kernelFunc);
template <typename KernelName,
typename KernelType, int dimensions>
void parallel_for(range<dimensions> numWorkItems,
id<dimensions> workItemOffset, KernelType kernelFunc);
template <typename KernelName,
typename KernelType, int dimensions>
void parallel_for(nd_range<dimensions> executionRange,
KernelType kernelFunc);
template <typename KernelName,
typename WorkgroupFunctionType, int dimensions>
void parallel_for_work_group(
range<dimensions> numWorkGroups,
WorkgroupFunctionType kernelFunc);
template <typename KernelName, typename
WorkgroupFunctionType, int dimensions>
void parallel_for_work_group(
range<dimensions> numWorkGroups,
range<dimensions> workGroupSize,
WorkgroupFunctionType kernelFunc);
void single_task(kernel syclKernel);
template <int dimensions>
void parallel_for(range<dimensions> numWorkItems,
kernel syclKernel);
template <int dimensions>
void parallel_for(range<dimensions> numWorkItems,
id<dimensions> workItemOffset, kernel syclKernel);
template <int dimensions> void parallel_for(
nd_range<dimensions> ndRange, kernel syclKernel);
template <typename T, int dim, access::mode mode,
access::target tgt> void copy(accessor<T, dim, mode, tgt>
src, shared_ptr_class<T> dest);
template <typename T, int dim, access::mode mode,
access::target tgt> void copy(shared_ptr_class<T> src,
accessor<T, dim, mode, tgt> dest);
template <typename T, int dim, access::mode mode,
access::target tgt> void copy(
accessor<T, dim, mode, tgt> src, T * dest);
template <typename T, int dim, access::mode mode,
access::target tgt> void copy(
const T * src, accessor<T, dim, mode, tgt> dest);
template <typename T, int dim, access::mode mode,
access::target tgt> void copy(
accessor<T, dim, mode, tgt> src,
accessor<T, dim, mode, tgt> dest);
template <typename T, int dim, access::mode mode,
access::target tgt> void update_host(
accessor<T, dim, mode, tgt> acc);
template<typename T, int dim, access::mode mode,
access::target tgt> void fill(
accessor<T, dim, mode, tgt> dest, const T& src);
class private_memory members [4.8.5.3]
template <typename T, int Dimensions = 1>
class private_memory {
public:
private_memory(const group<Dimensions> &);
T &operator()(const h_item<Dimensions> &id);
};
Scalar data types [4.10.1, 6.5]
SYCL supports the C++11 ISO standard data types. The following data type aliases are defined for OpenCL interoperability in the cl::sycl namespace:
char,
signed char,
unsigned char,
short int,
unsigned short int,
bool,
int,
unsigned int,
long int,
unsigned long int,
long long int,
unsigned long long int,
size_t,
float,
double,
cl_bool,
cl_char,
cl_uchar,
cl_short,
cl_ushort,
cl_int,
cl_uint,
cl_long,
cl_ulong,
cl_float,
cl_double,
cl_half,
half,
byte
Synchronization & atomics [4.11]
Enums
fence_space : char
local_space, global_space, global_and_local
memory_order : int
relaxed
Class atomic
Class declaration
template <typename T, access::address_space addressSpace
= access::address_space::global_space> class atomic;
Member functions
template <typename pointerT>
atomic(multi_ptr<pointerT, addressSpace> ptr);
void store(T operand, memory_order memoryOrder =
memory_order::relaxed) volatile;
T load(memory_order memoryOrder =
memory_order::relaxed) const volatile;
T exchange(T operand,
memory_order memoryOrder = memory_order::relaxed)
volatile;
bool compare_exchange_strong(T& expected, T desired,
memory_order successMemoryOrder =
memory_order::relaxed,
memory_order failMemoryOrder =
memory_order::relaxed) volatile;
T fetch_X(T operand, memory_order memoryOrder =
memory_order::relaxed) volatile;
where X is one of add, sub, and, or, xor, min, max
Available only when T != float
Non-member functions
template <typename T, access::address_space addressSpace>
T atomic_load(atomic<T, addressSpace> object,
memory_order memoryOrder = memory_order::relaxed)
volatile;
template <typename T, access::address_space addressSpace>
void atomic_store(atomic<T, addressSpace>
object, T operand, memory_order memoryOrder =
memory_order::relaxed) volatile;
template <typename T, access::address_space addressSpace>
T atomic_exchange(atomic<T, addressSpace> object,
T operand, memory_order memoryOrder =
memory_order::relaxed) volatile;
template <typename T, access::address_space addressSpace>
bool atomic_compare_exchange_strong(
atomic<T, addressSpace> object, T &expected, T desired,
memory_order successMemoryOrder =
memory_order::relaxed, memory_order failMemoryOrder
= memory_order::relaxed) volatile;
template <typename T, access::address_space addressSpace>
T atomic_fetch_X(atomic<T, addressSpace> object,
T operand, memory_order memoryOrder =
memory_order::relaxed) volatile;
where X is one of add, sub, and, or, xor, min, max
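The interface above closely mirrors `std::atomic`. As a host-side analogy only (this is standard C++, not the SYCL `atomic` class), the sketch below shows how a `fetch_max`-style operation can be built from `compare_exchange_strong` with relaxed ordering, and how a failed exchange writes the current value back into `expected`. The helper names `fetch_max_relaxed` and `atomics_demo` are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: emulate fetch_max with a compare-exchange loop.
// Returns the value held before the operation, like the fetch_X members.
inline int fetch_max_relaxed(std::atomic<int>& obj, int operand) {
    int old = obj.load(std::memory_order_relaxed);
    // On failure, compare_exchange_strong reloads `old` with the current value,
    // so the loop re-tests the condition against fresh data.
    while (old < operand &&
           !obj.compare_exchange_strong(old, operand,
                                        std::memory_order_relaxed,
                                        std::memory_order_relaxed)) {
    }
    return old;
}

// Small usage example with checks.
inline void atomics_demo() {
    std::atomic<int> value(5);
    assert(fetch_max_relaxed(value, 9) == 5);  // 9 replaces 5
    assert(fetch_max_relaxed(value, 3) == 9);  // 3 is smaller, value unchanged
    int expected = 1;                          // deliberately wrong guess
    bool swapped = value.compare_exchange_strong(expected, 2);
    assert(!swapped && expected == 9);         // failure reports the real value
}
```

Calling `atomics_demo()` from a host program exercises both paths: a successful update and a failed exchange.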
Exception classes [4.9.2]
Class: exception
const char *what() const;
context get_context() const;
bool has_context() const;
cl_int get_cl_code() const;
Class: exception_list
size_t size() const;
iterator begin() const;
iterator end() const;
Exception types
Derived from cl::sycl::runtime_error class:
kernel_error, nd_range_error, accessor_error, event_error,
invalid_parameter_error
Derived from cl::sycl::device_error class:
compile_program_error, link_program_error, invalid_object_error,
memory_allocation_error, platform_error, profiling_error,
feature_not_supported
Stream class [4.12]
Enums
stream_manipulator
scientific,
hex,
oct,
noshowbase,
showbase,
showpos,
endl,
noshowpos,
fixed,
hexfloat,
defaultfloat,
dec
Member functions
stream(size_t bufferSize, size_t maxStatementSize, handler& cgh);
size_t get_size() const;
size_t get_max_statement_size() const;
Non-member functions
template <typename T>
const stream& operator<<(const stream& os, const T &rhs);
__precision_manipulator__ setprecision(int precision);
__width_manipulator__ setw(int width);
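The stream manipulators listed above share their names with the standard C++ iostream manipulators. As a host-side analogy (standard `<iomanip>`, not the SYCL `stream` class, whose device-side output may differ in detail), this sketch shows what `hex`, `showbase`, `fixed`, `setprecision`, and `setw` do to formatted output. The function names are hypothetical.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Format an integer in hexadecimal with a base prefix.
inline std::string format_hex(int value) {
    std::ostringstream os;
    os << std::hex << std::showbase << value;
    return os.str();
}

// Format a floating-point value in fixed notation, right-aligned in a field.
inline std::string format_fixed(double value, int precision, int width) {
    assert(precision >= 0 && width >= 0);
    std::ostringstream os;
    os << std::fixed << std::setprecision(precision) << std::setw(width) << value;
    return os.str();
}
```

For example, `format_hex(255)` yields `0xff`, and `format_fixed(3.14159, 2, 8)` yields `3.14` padded on the left to eight characters.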
Vector data types [4.10.2]
RET’s dataT template parameter depends on vec’s dataT
template parameter:
dataT | RET
cl_char, cl_uchar | cl_char
cl_short, cl_ushort or cl_half | cl_short
cl_int, cl_uint or cl_float | cl_int
cl_long, cl_ulong or cl_double | cl_long
Class declaration
template <typename dataT, int numElements> class vec;
Members
using element_type = dataT;
vec();
explicit vec(const dataT &arg);
template <typename... argTN> vec(const argTN&... args);
vec(const vec<dataT, numElements> &rhs);
#ifdef __SYCL_DEVICE_ONLY__
vec(vector_t openclVector);
operator vector_t() const;
#endif
operator dataT() const; // Available only if numElements == 1
size_t get_count() const;
size_t get_size() const;
template <typename convertT,
rounding_mode roundingMode>
vec<convertT, numElements> convert() const;
template <typename asT> asT as() const;
The following XYZW members are available only when
numElements <= 4. RGBA members are available only when
numElements == 4.
template<int... swizzleIndexes>
__swizzled_vec__ swizzle() const;
__swizzled_vec__ XYZW_ACCESS() const;
__swizzled_vec__ RGBA_ACCESS() const;
__swizzled_vec__ INDEX_ACCESS() const;
#ifdef SYCL_SIMPLE_SWIZZLES
__swizzled_vec__ XYZW_SWIZZLE() const;
__swizzled_vec__ RGBA_SWIZZLE() const;
#endif
The following lo, hi, odd, and even members are available only
when numElements > 1.
__swizzled_vec__ lo() const;
__swizzled_vec__ hi() const;
__swizzled_vec__ odd() const;
__swizzled_vec__ even() const;
template <access::address_space addressSpace>
void load(size_t offset,
multi_ptr<dataT, addressSpace> ptr);
template <access::address_space addressSpace>
void store(size_t offset,
multi_ptr<dataT, addressSpace> ptr) const;
vec<dataT, numElements> &operator=(const vec<dataT, numElements> &rhs);
vec<dataT, numElements> &operator=(const dataT &rhs);
vec<RET, numElements> operator!();
vec<dataT, numElements> operator~(); // Not available for floating point types
Member functions with OP variable
vec<dataT, numElements> operatorOP(const vec<dataT, numElements> &rhs) const;
vec<dataT, numElements> operatorOP(const dataT &rhs) const;
For all types, OP may be: +, -, *, /
For non floating-point types, OP may be: %, &, |, ^, <<, >>
vec<dataT, numElements> &operatorOP(const vec<dataT, numElements> &rhs);
vec<dataT, numElements> &operatorOP(const dataT &rhs);
For all types, OP may be: +=, -=, *=, /=
For non floating-point types, OP may be: %=, &=, |=, ^=, <<=, >>=
vec<RET, numElements> operatorOP(const vec<dataT, numElements> &rhs) const;
vec<RET, numElements> operatorOP(const dataT &rhs) const;
For all types, OP may be: &&, ||, ==, !=, <, >, <=, >=
vec<dataT, numElements> &operatorOP();
vec<dataT, numElements> operatorOP(int);
For all types, OP may be: ++, --
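To make the `lo`, `hi`, `odd`, `even` members and the `RET`-typed comparison operators more concrete, here is a plain C++ sketch using `std::array` as a stand-in for a four-element `vec`. It assumes the OpenCL convention that `lo`/`hi` are the lower and upper halves, `even`/`odd` select elements by index parity, and that element-wise comparisons yield -1 for true and 0 for false. The type aliases and function names are hypothetical, not SYCL APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Int4 = std::array<std::int32_t, 4>;  // stand-in for vec<cl_int, 4>
using Int2 = std::array<std::int32_t, 2>;  // stand-in for vec<cl_int, 2>

inline Int2 lo(const Int4& v)   { return Int2{{v[0], v[1]}}; }  // lower half
inline Int2 hi(const Int4& v)   { return Int2{{v[2], v[3]}}; }  // upper half
inline Int2 even(const Int4& v) { return Int2{{v[0], v[2]}}; }  // indices 0, 2
inline Int2 odd(const Int4& v)  { return Int2{{v[1], v[3]}}; }  // indices 1, 3

// Element-wise operator< analogue: -1 where a[i] < b[i], otherwise 0.
inline Int4 less_than(const Int4& a, const Int4& b) {
    Int4 result{};
    for (std::size_t i = 0; i < result.size(); ++i) {
        result[i] = (a[i] < b[i]) ? -1 : 0;
    }
    return result;
}

// Small usage example with checks.
inline void vec_demo() {
    const Int4 v{{10, 20, 30, 40}};
    assert((hi(v) == Int2{{30, 40}}));
    assert((odd(v) == Int2{{20, 40}}));
    assert((less_than(v, Int4{{15, 15, 35, 35}}) == Int4{{-1, 0, -1, 0}}));
}
```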
Non-member functions
Non-member functions with OP variable
template <typename dataT, int numElements>
vec<dataT, numElements> operatorOP(const dataT &lhs,
const vec<dataT, numElements> &rhs);
For all types, OP may be: +, -, *, /
For non floating-point types, OP may be: %, &, |, ^, <<, >>
template <typename dataT, int numElements> vec<RET, numElements>
operatorOP(const dataT &lhs, const vec<dataT, numElements> &rhs);
For all types, OP may be: &&, ||, ==, !=, <, >, <=, >=
Math functions [4.13.3]
Math functions are available in the namespace cl::sycl. In all cases
below, n may be one of 2, 3, 4, 8, or 16.
Tf (genfloat in the spec) is type float[n], double[n], or half[n].
Tff (genfloatf) is type float[n].
Tfd (genfloatd) is type double[n].
Th (genfloath) is type half[n].
sTf (sgenfloat) is type float, double, or half.
Ti (genint) is type int[n].
uTi (ugenint) is type unsigned int or uintn.
uTli (ugenlonginteger) is unsigned long int, ulonglongn, ulongn,
unsigned long long int.
N indicates native variants, available in cl::sycl::native.
H indicates half variants, available in cl::sycl::half_precision,
implemented with a minimum of 10 bits of accuracy.
Tf acos (Tf x) Arc cosine
Tf acosh (Tf x) Inverse hyperbolic cosine
Tf acospi (Tf x) acos (x) / π
Tf asin (Tf x) Arc sine
Tf asinh (Tf x) Inverse hyperbolic sine
Tf asinpi (Tf x) asin (x) / π
Tf atan (Tf y_over_x) Arc tangent
Tf atan2 (Tf y, Tf x) Arc tangent of y / x
Tf atanh (Tf x) Hyperbolic arc tangent
Tf atanpi (Tf x) atan (x) / π
Tf atan2pi (Tf y, Tf x) atan2 (y, x) / π
Tf cbrt (Tf x) Cube root
Tf ceil (Tf x) Round to integer toward + infinity
Tf copysign (Tf x, Tf y) x with sign changed to sign of y
Tf cos (Tf x)
Tff cos (Tff x) N H
Cosine
Tf cosh (Tf x) Hyperbolic cosine
Tf cospi (Tf x) cos (π x)
Tff divide (Tff x, Tff y) N H
x / y
(Not available in cl::sycl.)
Tf erfc (Tf x) Complementary error function
Tf erf (Tf x) Calculates error function
Tf exp (Tf x)
Tff exp (Tff x) N H
Exponential base e
Tf exp2 (Tf x)
Tff exp2 (Tff x) N H
Exponential base 2
Tf exp10 (Tf x)
Tff exp10 (Tff x) N H
Exponential base 10
Tf expm1 (Tf x) e^x - 1.0
Tf fabs (Tf x) Absolute value
Tf fdim (Tf x, Tf y) Positive difference between x and y
Tf floor (Tf x) Round to integer toward - infinity
Tf fma (Tf a, Tf b, Tf c) Multiply and add, then round
Tf fmax (Tf x, Tf y)
Tf fmax (Tf x, sTf y)
Return y if x < y,
otherwise it returns x
Tf fmin (Tf x, Tf y)
Tf fmin (Tf x, sTf y)
Return y if y < x,
otherwise it returns x
Tf fmod (Tf x, Tf y) Modulus. Returns x – y * trunc (x/y)
Tf fract (Tf x, Tf *iptr) Fractional value in x
Tf frexp (Tf x, Ti *exp) Extract mantissa and exponent
Tf hypot (Tf x, Tf y) Square root of x^2 + y^2
Ti ilogb (Tf x) Return exponent as an integer value
Tf ldexp (Tf x, Ti k)
doublen ldexp (doublen x, int k)
x * 2^k
Tf lgamma (Tf x) Log gamma function
Tf lgamma_r (Tf x, Ti *signp) Log gamma function
Tf log (Tf x)
Tff log (Tff x) N H
Natural logarithm
Tf log2 (Tf x)
Tff log2 (Tff x) N H
Base 2 logarithm
Tf log10 (Tf x)
Tff log10 (Tff x) N H
Base 10 logarithm
Tf log1p (Tf x) ln (1.0 + x)
Tf logb (Tf x) Return exponent as an integer value
Tf mad (Tf a, Tf b, Tf c) Approximates a * b + c
Tf maxmag (Tf x, Tf y) Maximum magnitude of x and y
Tf minmag (Tf x, Tf y) Minimum magnitude of x and y
Tf modf (Tf x, Tf *iptr) Decompose floating-point number
Tff nan(uTi nancode)
Tfd nan(uTli nancode)
Quiet NaN
(Return is scalar when nancode
is scalar)
Tf nextafter (Tf x, Tf y) Next representable floating-point
value after x in the direction of y
Tf pow (Tf x, Tf y) Compute x to the power of y
Tf pown (Tf x, Ti y) Compute x^y, where y is an integer
Tf powr (Tf x, Tf y)
Tff powr (Tff x, Tff y) N
Tff powr (Tff x, Th y) H
Compute x^y, where x is >= 0
Tff recip (Tff x) N H
1 / x
(Not available in cl::sycl.)
Tf remainder (Tf x, Tf y) Floating point remainder
Tf remquo (Tf x, Tf y, Ti *quo) Remainder and quotient
Tf rint (Tf x) Round to nearest even integer
Tf rootn (Tf x, Ti y) Compute x to the power of 1/y
Tf round (Tf x) Integral value nearest to x, rounding
halfway cases away from zero
Tf rsqrt (Tf x)
Tff rsqrt (Tff x) N H
Inverse square root
Tf sin (Tf x)
Tff sin (Tff x) N H
Sine
Tf sincos (Tf x, Tf *cosval) Sine and cosine of x
Tf sinh (Tf x) Hyperbolic sine
Tf sinpi (Tf x) sin (π x)
Tf sqrt (Tf x)
Tff sqrt (Tff x) N H
Square root
Tf tan (Tf x)
Tff tan (Tff x) N H
Tangent
Tf tanh (Tf x) Hyperbolic tangent
Tf tanpi (Tf x) tan (π x)
Tf tgamma (Tf x) Gamma function
Tf trunc (Tf x) Round to integer toward zero
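Several entries in the table above are defined by a short formula. The sketch below spells out a few of them for scalar `double` in standard C++ so the definitions can be checked on the host. These are naive reference formulas with hypothetical names (`acospi_d`, `sinpi_d`, `fmod_d`, `fdim_d`), not the SYCL built-ins, and they make no claim about the accuracy the specification requires.

```cpp
#include <cassert>
#include <cmath>

static const double kPi = 3.14159265358979323846;

// acospi: acos(x) / pi
inline double acospi_d(double x) { return std::acos(x) / kPi; }

// sinpi: sin(pi * x)
inline double sinpi_d(double x) { return std::sin(kPi * x); }

// fmod: x - y * trunc(x / y), as stated in the table
inline double fmod_d(double x, double y) {
    assert(y != 0.0);
    return x - y * std::trunc(x / y);
}

// fdim: positive difference between x and y
inline double fdim_d(double x, double y) { return (x > y) ? (x - y) : 0.0; }
```

For instance, `fmod_d(7.5, 2.0)` gives 1.5 and `fmod_d(-7.5, 2.0)` gives -1.5, because the quotient is truncated toward zero before multiplying back.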
Integer functions [4.13.4]
Integer functions are available in the namespace cl::sycl. In all
cases below, n may be one of 2, 3, 4, 8, or 16. If a type in the
functions below is shown with [xbit] in its name, this indicates
that the type is x bits in size.
Tint (geninteger in the spec) is type int[n], uint[n], unsigned int,
char, char[n], signed char, scharn, ucharn, unsigned
short[n], unsigned short, ushort[n], longn, ulongn, long
int, unsigned long int, long long int, longlongn, ulonglongn
unsigned long long int.
uTint (ugeninteger) is type unsigned char, ucharn,
unsigned short, ushortn, unsigned int, uintn, unsigned long int,
ulongn, ulonglongn, unsigned long long int.
iTint (igeninteger) is type signed char, scharn, short[n], int[n],
long int, longn, long long int, longlongn.
sTint (sgeninteger) is type char, signed char, unsigned char, short,
unsigned short, int, unsigned int, long int, unsigned long int,
long long int, unsigned long long int.
uTint abs (Tint x) | x |
uTint abs_diff (Tint x, Tint y) | x – y | without modulo overflow
Tint add_sat (Tint x, Tint y) x + y and saturates the result
Tint hadd (Tint x, Tint y) (x + y) >> 1 without modulo overflow
Tint rhadd (Tint x, Tint y) (x + y + 1) >> 1
Tint clamp (Tint x, Tint min,
Tint max)
Tint clamp (Tint x, sTint min,
sTint max)
min(max(x, min), max)
Tint clz (Tint x) number of leading 0-bits in x
Tint mad_hi (Tint a, Tint b,
Tint c)
mul_hi(a, b) + c
Tint mad_sat (Tint a, Tint b,
Tint c)
a * b + c and saturates the result
Tint max (Tint x, Tint y)
Tint max (Tint x, sTint y)
y if x < y, otherwise it returns x
Tint min (Tint x, Tint y)
Tint min (Tint x, sTint y)
y if y < x, otherwise it returns x
Tint mul_hi (Tint x, Tint y) high half of the product of x and y
Tint popcount (Tint x) Number of non-zero bits in x
Tint rotate (Tint v, Tint i) result[indx] = v[indx] << i[indx]
Tint sub_sat (Tint x, Tint y) x - y and saturates the result
uTint16bit upsample (uTint8bit hi, uTint8bit lo)
result[i] = ((ushort)hi[i] << 8) | lo[i]
iTint16bit upsample (iTint8bit hi, uTint8bit lo)
result[i] = ((short)hi[i] << 8) | lo[i]
uTint32bit upsample (uTint16bit hi, uTint16bit lo)
result[i] = ((uint)hi[i] << 16) | lo[i]
iTint32bit upsample (iTint16bit hi, uTint16bit lo)
result[i] = ((int)hi[i] << 16) | lo[i]
uTint64bit upsample (uTint32bit hi, uTint32bit lo)
result[i] = ((ulonglong)hi[i] << 32) | lo[i]
iTint64bit upsample (iTint32bit hi, uTint32bit lo)
result[i] = ((longlong)hi[i] << 32) | lo[i]
Tint32bit mad24 (Tint32bit x,
Tint32bit y, Tint32bit z)
Multiply 24-bit integer values x, y, add 32-bit
integer result to 32-bit integer z
Tint32bit mul24 (Tint32bit x,
Tint32bit y)
Multiply 24-bit integer values x and y
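The "without modulo overflow" and "saturates" wording above is easiest to see on narrow types. The following plain C++ sketch gives scalar reference versions of `hadd`, `rhadd`, `add_sat`, `upsample`, and `mul_hi` for fixed-width unsigned types. The function names with `_u8` and `_u32` suffixes are hypothetical, chosen for this illustration; the SYCL built-ins are overloaded across all the types listed above.

```cpp
#include <cassert>
#include <cstdint>

// hadd: (x + y) >> 1 computed without overflowing the 8-bit range.
inline std::uint8_t hadd_u8(std::uint8_t x, std::uint8_t y) {
    return static_cast<std::uint8_t>((x >> 1) + (y >> 1) + (x & y & 1));
}

// rhadd: (x + y + 1) >> 1, again without intermediate overflow.
inline std::uint8_t rhadd_u8(std::uint8_t x, std::uint8_t y) {
    return static_cast<std::uint8_t>((x >> 1) + (y >> 1) + ((x | y) & 1));
}

// add_sat: x + y clamped to the largest representable value.
inline std::uint8_t add_sat_u8(std::uint8_t x, std::uint8_t y) {
    const unsigned sum = static_cast<unsigned>(x) + static_cast<unsigned>(y);
    return static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
}

// upsample: ((ushort)hi << 8) | lo
inline std::uint16_t upsample_u8(std::uint8_t hi, std::uint8_t lo) {
    return static_cast<std::uint16_t>((static_cast<unsigned>(hi) << 8) | lo);
}

// mul_hi: high 32 bits of the 64-bit product.
inline std::uint32_t mul_hi_u32(std::uint32_t x, std::uint32_t y) {
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(x) * y) >> 32);
}

// Small usage example with checks.
inline void integer_demo() {
    assert(hadd_u8(255, 255) == 255);   // a wrapping 8-bit add would give 127
    assert(add_sat_u8(200, 100) == 255);
    assert(upsample_u8(0x12, 0x34) == 0x1234);
}
```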
© 2018 Khronos Group. All rights reserved. SYCL is a trademark of the Khronos Group. The Khronos Group
is an industry consortium creating open standards for the authoring and acceleration of parallel computing,
graphics, dynamic media, and more on a wide variety of platforms and devices. See www.khronos.org to
learn more about the Khronos Group. See www.khronos.org/sycl to learn more about SYCL.
Relational built-in functions [4.13.7]
Relational functions are available in the namespace cl::sycl. In
all cases below, n may be one of 2, 3, 4, 8, or 16. If a type in the
functions below is shown with [xbit] in its name, this indicates
that the type is x bits in size.
Tint (geninteger in the spec) is type int[n], uint[n], unsigned int,
char, char[n], signed char, scharn, ucharn, unsigned
short[n], unsigned short, ushort[n], longn, ulongn, long
int, unsigned long int, long long int, longlongn, ulonglongn
unsigned long long int.
iTint (igeninteger) is type signed char, scharn, short[n], int[n],
long int, longn, long long int, longlongn.
uTint (ugeninteger) is type unsigned char, ucharn,
unsigned short, ushortn, unsigned int, uintn,
unsigned long int, ulongn, ulonglongn, unsigned long long int.
Ti (genint) is type int[n].
uTi (ugenint) is type unsigned int or uintn.
Tff (genfloatf) is type float[n].
Tfd (genfloatd) is type double[n].
T (gentype) is type float[n], double[n], or half[n], or any type
listed above for Tint.
int any (iTint x) 1 if MSB in any component of x is
set; else 0
int all (iTint x) 1 if MSB in all components of x is
set; else 0
T bitselect (T a, T b, T c) Each bit of result is the corresponding
bit of a if the corresponding bit of c is 0; otherwise it is the
corresponding bit of b
Tint select (Tint a, Tint b, iTint c)
Tint select (Tint a, Tint b, uTint c)
Tff select (Tff a, Tff b, Ti c)
Tff select (Tff a, Tff b, uTi c)
Tfd select (Tfd a, Tfd b, iTint64bit c)
Tfd select (Tfd a, Tfd b,
uTint64bit c)
For each component of a vector type,
result[i] = (MSB of c[i] is set) ? b[i] : a[i].
For a scalar type, result = c ? b : a
iTint32bit function (Tff x, Tff y)
iTint64bit function (Tfd x, Tfd y)
function may be one of isequal, isnotequal,
isgreater, isgreaterequal, isless, islessequal,
islessgreater, isordered, isunordered.
This format is used
for many relational
functions. Replace
function with the
function name.
iTint32bit function (Tff x)
iTint64bit function (Tfd x)
function may be one of isfinite, isinf, isnan,
isnormal, signbit.
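The difference between `bitselect` and the two flavours of `select` described above can be sketched in plain C++ on 32-bit integers. The function names are hypothetical and only model one component or one scalar; the vector rule tests the most significant bit of `c`, while the scalar rule tests whether `c` is non-zero.

```cpp
#include <cassert>
#include <cstdint>

// bitselect: take each bit from a where c has a 0 bit, from b where c has a 1 bit.
inline std::uint32_t bitselect_u32(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    return (a & ~c) | (b & c);
}

// select on one vector component: b if the MSB of c is set, otherwise a.
inline std::int32_t select_component(std::int32_t a, std::int32_t b, std::int32_t c) {
    const bool msbSet = (static_cast<std::uint32_t>(c) >> 31) != 0u;
    return msbSet ? b : a;
}

// select on scalars: c ? b : a
inline std::int32_t select_scalar(std::int32_t a, std::int32_t b, std::int32_t c) {
    return c ? b : a;
}

// Small usage example with checks.
inline void relational_demo() {
    assert(bitselect_u32(0xF0F0F0F0u, 0x0F0F0F0Fu, 0x0000FFFFu) == 0xF0F00F0Fu);
    assert(select_component(10, 20, 1) == 10);  // MSB clear: picks a
    assert(select_scalar(10, 20, 1) == 20);     // non-zero: picks b
}
```

Note how the same condition value `1` selects `a` in the vector-component rule but `b` in the scalar rule.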
Geometric Functions [4.13.6]
Geometric functions are available in the namespace cl::sycl.
Tgf (gengeofloat in the spec) is type float, float2, float3, float4.
Tgd (gengeodouble) is type double, double2, double3, double4.
float4 cross (float4 p0, float4 p1)
float3 cross (float3 p0, float3 p1)
double4 cross (double4 p0, double4 p1)
double3 cross (double3 p0, double3 p1)
Cross product
float distance (Tgf p0, Tgf p1)
double distance (Tgd p0, Tgd p1)
Vector distance
float dot (Tgf p0, Tgf p1)
double dot (Tgd p0, Tgd p1)
Dot product
float length (Tgf p)
double length (Tgd p)
Vector length
Tgf normalize (Tgf p)
Tgd normalize (Tgd p)
Normal vector length 1
float fast_distance (Tgf p0, Tgf p1) Vector distance
float fast_length (Tgf p) Vector length
Tgf fast_normalize (Tgf p) Normal vector length 1
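As a host-side reference for what the geometric functions above compute, here is a plain C++ sketch for three-component float vectors, with `std::array` standing in for `float3`. The names `dot3`, `cross3`, `length3`, `distance3`, and `normalize3` are hypothetical; the SYCL versions are overloaded on the Tgf and Tgd types listed above.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Float3 = std::array<float, 3>;  // stand-in for float3

inline float dot3(const Float3& a, const Float3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

inline Float3 cross3(const Float3& a, const Float3& b) {
    return Float3{{a[1] * b[2] - a[2] * b[1],
                   a[2] * b[0] - a[0] * b[2],
                   a[0] * b[1] - a[1] * b[0]}};
}

inline float length3(const Float3& p) { return std::sqrt(dot3(p, p)); }

inline float distance3(const Float3& a, const Float3& b) {
    return length3(Float3{{a[0] - b[0], a[1] - b[1], a[2] - b[2]}});
}

// Returns a vector in the same direction as p with length 1.
inline Float3 normalize3(const Float3& p) {
    const float len = length3(p);
    assert(len > 0.0f);  // this sketch does not handle the zero vector
    return Float3{{p[0] / len, p[1] / len, p[2] / len}};
}
```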
Common functions [4.13.5]
Common functions are available in the namespace cl::sycl. On the
host the vector types use the vec class and on an OpenCL device
use the corresponding OpenCL vector types. In all cases below, n
may be one of 2, 3, 4, 8, or 16.
Tf (genfloat in the spec) is type float[n], double[n], or half[n].
Tff (genfloatf) is type float[n].
Tfd (genfloatd) is type double[n].
Tf clamp (Tf x, Tf minval, Tf maxval);
Tff clamp (Tff x, float minval,float maxval);
Tfd clamp (Tfd x, double minval,
double maxval);
Clamp x to range given by
minval, maxval
Tf degrees (Tf radians); radians to degrees
Tf abs (Tf x, Tf y);
Tff abs (Tff x, float y);
Tfd abs (Tfd x, double y);
Max of x and y
Tf max (Tf x, Tf y);
Tff max (Tff x, float y);
Tfd max (Tfd x, double y);
Max of x and y
Tf min (Tf x, Tf y);
Tff min (Tff x, float y);
Tfd min (Tfd x, double y);
Min of x and y
Tf mix (Tf x, Tf y, Tf a);
Tff mix (Tff x, Tff y, float a);
Tfd mix (Tfd x, Tfd y, double a) ;
Linear blend of x and y
Tf radians (Tf degrees); degrees to radians
Tf step (Tf edge, Tf x);
Tff step (float edge, Tff x);
Tfd step (double edge, Tfd x);
0.0 if x < edge, else 1.0
Tf smoothstep (Tf edge0, Tf edge1, Tf x);
Tff smoothstep(float edge0,float edge1,
Tff x);
Tfd smoothstep(double edge0,
double edge1,Tfd x);
Step and interpolate
Tf sign (Tf x); Sign of x
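The one-line descriptions above hide a few formulas. The sketch below writes out scalar `float` versions of `clamp`, `mix`, `step`, and `smoothstep` in standard C++. The `_f` suffixed names are hypothetical, and the `smoothstep` body uses the usual Hermite interpolation from OpenCL, which is an assumption here since the table only says "Step and interpolate".

```cpp
#include <cassert>
#include <cmath>

// clamp: restrict x to the range [minval, maxval].
inline float clamp_f(float x, float minval, float maxval) {
    return std::fmin(std::fmax(x, minval), maxval);
}

// mix: linear blend, x + (y - x) * a.
inline float mix_f(float x, float y, float a) { return x + (y - x) * a; }

// step: 0.0 if x < edge, else 1.0.
inline float step_f(float edge, float x) { return (x < edge) ? 0.0f : 1.0f; }

// smoothstep: 0 below edge0, 1 above edge1, smooth Hermite curve in between.
inline float smoothstep_f(float edge0, float edge1, float x) {
    assert(edge0 < edge1);
    const float t = clamp_f((x - edge0) / (edge1 - edge0), 0.0f, 1.0f);
    return t * t * (3.0f - 2.0f * t);
}
```

For example, `mix_f(2.0f, 4.0f, 0.5f)` is 3.0 and `smoothstep_f(0.0f, 1.0f, 0.5f)` is 0.5, the midpoint of the curve.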
Preprocessor directives and macros [6.6]
CL_SYCL_LANGUAGE_VERSION Integer version, e.g.: 121
__FAST_RELAXED_MATH__ -cl-fast-relaxed-math
__SYCL_DEVICE_ONLY__ defined when in device compilation
__SYCL_SINGLE_SOURCE__ produce host and device binary
SYCL_EXTERNAL allow kernel external linkage (optional)
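A common use of these macros is to branch at compile time between the host and device passes. The sketch below is plain C++ that also compiles without any SYCL implementation, in which case neither macro is defined. The function names and the convention of returning 0 when `CL_SYCL_LANGUAGE_VERSION` is absent are assumptions made for this illustration.

```cpp
#include <cassert>
#include <string>

// Reports which compilation pass produced this code.
inline std::string compilation_pass() {
#ifdef __SYCL_DEVICE_ONLY__
    return "device";
#else
    return "host";
#endif
}

// Returns the SYCL language version, or 0 (sketch convention) if the macro
// is not defined, for example when built with an ordinary C++ compiler.
inline long sycl_language_version() {
#ifdef CL_SYCL_LANGUAGE_VERSION
    return CL_SYCL_LANGUAGE_VERSION;
#else
    return 0;
#endif
}

// Small usage example with a check.
inline void macros_demo() {
    assert(!compilation_pass().empty());
}
```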
Examples of how to invoke kernels
Example: single_task invoke [4.8.5.1]
SYCL provides a simple interface to enqueue a kernel that will be
sequentially executed on an OpenCL device.
myQueue.submit([&](handler & cgh) {
cgh.single_task<class kernel_name>(
[=] () {
// [kernel code]
});
});
Examples: parallel_for invoke [4.8.5.2]
Example #1
Using a lambda function for a kernel invocation.
myQueue.submit([&](handler & cgh) {
auto acc = myBuffer.get_access<access::mode::write>(cgh);
cgh.parallel_for<class myKernel>
(range<1>(numWorkItems),
[=] (id<1> index) {
acc[index] = 42.0f;
});
});
Example #2
Allowing the runtime to choose the index space that best matches
the range provided, using the information given by the item
interface instead of the general id interface.
class MyKernel;
myQueue.submit([&](handler & cgh)
{
auto acc=myBuffer.get_access<access::mode::write>(cgh);
// The kernel name must match the forward declaration above
cgh.parallel_for<MyKernel>(range<1>(
numWorkItems), [=] (item<1> item)
{
// item has no get_global(); use the linearized id instead
size_t index = item.get_linear_id();
acc[index] = 42.0f;
});
});
Example #3
Launching a kernel over a 3D grid. Work-item IDs range
from 0 to 2 in each dimension.
myQueue.submit([&](handler & cgh) {
cgh.parallel_for<class example_kernel1>(
range<3>(3,3,3), // global range
[=] (item<3> it) {
//[kernel code]
});
});
Example #4
Launching sixty-four work-items in a three-dimensional grid with
four in each dimension and divided into eight work-groups.
myQueue.submit([&](handler & cgh) {
cgh.parallel_for<class example_kernel>(
nd_range<3>(range<3>(4, 4, 4), range<3>(2, 2, 2)),
[=] (nd_item<3> item) {
//[kernel code]
// Work-group synchronization
item.barrier(access::fence_space::global_space);
//[kernel code]
});
});
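The counts quoted for Example #4 (sixty-four work-items, eight work-groups) follow from simple arithmetic on the two ranges. The plain C++ sketch below reproduces that arithmetic with `std::array` as a stand-in for `range<3>` and `id<3>`. The helper names are hypothetical, and the linearization assumes the row-major convention in which the last dimension varies fastest.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Range3 = std::array<std::size_t, 3>;  // stand-in for range<3> / id<3>

// Total number of work-items (or work-groups) in a range.
inline std::size_t total_size(const Range3& r) { return r[0] * r[1] * r[2]; }

// Number of work-groups per dimension: global range divided by local range.
inline Range3 group_range(const Range3& global, const Range3& local) {
    Range3 groups{};
    for (std::size_t d = 0; d < 3; ++d) {
        // The local range must divide the global range evenly.
        assert(local[d] != 0 && global[d] % local[d] == 0);
        groups[d] = global[d] / local[d];
    }
    return groups;
}

// Row-major linearization of a 3-D id within a range (assumed convention).
inline std::size_t linear_id(const Range3& id, const Range3& range) {
    return id[2] + id[1] * range[2] + id[0] * range[1] * range[2];
}
```

With a global range of (4, 4, 4) and a local range of (2, 2, 2), `total_size` gives 64 work-items and `total_size(group_range(...))` gives 8 work-groups, matching the description of Example #4.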
Parallel For hierarchical invoke [4.8.5.3]
myQueue.submit([&](handler & cgh) {
// Issue 8 work-groups of 8 work-items each
cgh.parallel_for_work_group<class example_kernel>(
range<3>(2, 2, 2), range<3>(2, 2, 2),
[=](group<3> myGroup)
{
// [workgroup code]
int myLocal; // this variable is shared between workitems
// This variable will be instantiated for each work-item
// separately
private_memory<int, 3> myPrivate(myGroup);
// Issue parallel sets of work-items each sized using the
// runtime default
myGroup.parallel_for_work_item([&](h_item<3> myItem)
{
// [work-item code]
myPrivate(myItem) = 0;
});
// Carry private value across loops
myGroup.parallel_for_work_item([&](h_item<3> myItem)
{
//[work-item code]
output[myItem.get_global()] = myPrivate(myItem);
});
//[workgroup code]
});
});
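The control flow of the hierarchical example can be mimicked sequentially on the host, which may help when reasoning about what `private_memory` carries from one `parallel_for_work_item` call to the next. The sketch below is plain C++, not SYCL: `run_hierarchical` is a hypothetical helper, work-groups and work-items are flattened to one dimension, and each private slot stores the work-item's global index instead of 0 so that the result is observable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential host emulation of a hierarchical launch (not a SYCL API).
inline std::vector<int> run_hierarchical(std::size_t numGroups, std::size_t groupSize) {
    assert(numGroups > 0 && groupSize > 0);
    std::vector<int> output(numGroups * groupSize, -1);
    for (std::size_t g = 0; g < numGroups; ++g) {
        // Stand-in for private_memory<int>: one slot per work-item of this group.
        std::vector<int> myPrivate(groupSize, 0);

        // First parallel_for_work_item: each work-item writes its private slot.
        for (std::size_t l = 0; l < groupSize; ++l) {
            myPrivate[l] = static_cast<int>(g * groupSize + l);
        }

        // The first loop has fully finished here, which models all work-items
        // of the group completing the first phase before the second begins.

        // Second parallel_for_work_item: the private value is carried across.
        for (std::size_t l = 0; l < groupSize; ++l) {
            output[g * groupSize + l] = myPrivate[l];
        }
    }
    return output;
}
```

Calling `run_hierarchical(8, 8)` corresponds to the 8 work-groups of 8 work-items in the example and returns 64 values, each equal to its own index.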