ArrayFire is a high-performance software library for parallel computing with an easy-to-use API. Its array-based function set makes parallel programming more accessible.


## Easy to use

The array object is beautifully simple.

Array-based notation expresses computational algorithms in a readable form that closely resembles mathematical notation. You do not need expertise in parallel programming to use ArrayFire.

A few lines of ArrayFire code can accomplish what may take hundreds of lines of complicated CUDA or OpenCL kernel code.

## ArrayFire is extensive!

#### Support for multiple domains

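ArrayFire contains hundreds of functions across various domains, including:

• vector algorithms
• image processing
• computer vision
• signal processing
• linear algebra
• statistics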

Each function is hand-tuned by ArrayFire developers with low-level optimizations.

#### Support for various data types and sizes

ArrayFire operates on common data shapes and sizes, including vectors, matrices, volumes, and N-dimensional arrays.

It supports common data types, including single and double precision floating point values, complex numbers, booleans, and 32-bit signed and unsigned integers.

#### Extending ArrayFire

ArrayFire can be used on its own or integrated with existing CUDA or OpenCL code. The memory behind any ArrayFire array can be shared with other CUDA or OpenCL data structures.

## Code once, run anywhere!

With support for x86, ARM, CUDA, and OpenCL devices, ArrayFire runs on a comprehensive range of hardware.

Each ArrayFire installation comes with:

• a CUDA version (named 'libafcuda') for NVIDIA GPUs;
• an OpenCL version (named 'libafopencl') for OpenCL devices;
• a CPU version (named 'libafcpu') to fall back on when CUDA or OpenCL devices are not available.

## ArrayFire is highly efficient

#### Vectorized and Batched Operations

ArrayFire supports batched operations on N-dimensional arrays. Batched operations run in parallel, making efficient use of your CUDA or OpenCL device.

ArrayFire can also execute loop iterations in parallel with the gfor construct.

#### Just-in-time compilation

ArrayFire analyzes your array expressions at run time to increase arithmetic intensity and memory throughput while avoiding unnecessary temporary allocations. Its internal just-in-time (JIT) compiler applies these optimizations automatically.

## Simple Example

The following example estimates pi with a Monte Carlo simulation. You create arrays, which reside on the CUDA or OpenCL device (or in host memory with the CPU backend), and then apply ArrayFire functions to them.

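```cpp
#include <arrayfire.h>
#include <cstdio>
using namespace af;
int main() {
    // sample 20 million random (x, y) points in the unit square on the device
    const int n = 20000000;
    array x = randu(n), y = randu(n);
    array dist = sqrt(x * x + y * y);
    // pi is about 4 times the fraction of points inside the unit circle
    float pi = 4.0f * sum<float>(dist < 1) / n;
    std::printf("pi ~= %f\n", pi);
}
```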