5. LvArray::ArrayOfArrays
¶
The LvArray::ArrayOfArrays
provides functionality similar to a std::vector< std::vector< T > >
but with just three allocations. The primary purpose is to reduce the number of memory transfers when moving between memory spaces but it also comes with the added benefit of reduced memory fragmentation.
5.1. Template arguments¶
The LvArray::ArrayOfArrays
requires three template arguments.
T
: The type of values stored in the inner arrays.INDEX_TYPE
: An integral type used in index calculations, the suggested type isstd::ptrdiff_t
.BUFFER_TYPE
: A template template parameter specifying the buffer type used for allocation and de-allocation, theLvArray::ArrayOfArrays
contains aBUFFER_TYPE< T >
along with twoBUFFER_TYPE< INDEX_TYPE >
.
5.2. Usage¶
LvArray::ArrayOfArrays
has a single constructor which takes two optional parameters; the number of inner arrays and the capacity to allocate for each inner array. Both values default to zero.
Given an ArrayOfArrays< T, ... > array
, an equivalent std::vector< std::vector< T > > vector
, a non negative integer n
, integer i
such that 0 <= i < vector.size()
, integer j
such that 0 <= j < vector[ i ].size()
, two iterators first
and last
and a variadic pack of parameters ... args
then following are equivalent:
Getting information about the outer array or a specific inner array | |
---|---|
LvArray::ArrayOfArrays< T, ... > |
std::vector< std::vector< T > > |
array.size() |
vector.size() |
array.capacity() |
vector.capacity() |
array.sizeOfArray( i ) |
vector[ i ].size() |
array.capacityOfArray( i ) |
vector[ i ].capacity() |
Modifying the outer array | |
---|---|
LvArray::ArrayOfArrays< T, ... > |
std::vector< std::vector< T > > |
array.reserve( n ) |
vector.reserve( n ) |
array.resize( n ) |
vector.resize( n ) |
array.appendArray( n ) |
vector.push_back( std::vector< T >( n ) ) |
array.appendArray( first, last ) |
vector.push_back( std::vector< T >( first, last ) ) |
array.insertArray( i, first, last ) |
vector.insert( vector.begin() + i, std::vector< T >( first, last ) ) |
array.eraseArray( i ) |
vector.erase( vector.first() + i ) |
Modifying an inner array | |
---|---|
LvArray::ArrayOfArrays< T, ... > |
std::vector< std::vector< T > > |
array.resizeArray( i, n, args ... ) |
vector[ i ].resize( n, T( args ... ) ) |
array.clearArray( i ) |
vector[ i ].clear() |
array.emplaceBack( i, args ... ) |
vector[ i ].emplace_back( args ... ) |
array.appendToArray( i, first, last ) |
vector[ i ].insert( vector[ i ].end(), first, last ) |
array.emplace( i, j , args ... ) |
vector[ i ].emplace( j, args ... )` |
array.insertIntoArray( i, j , first, last ) |
vector[ i ].insert( vector[ i ].begin() + j, first, last ) |
eraseFromArray( i, j, n ) |
vector[ i ].erase( vector[ i ].begin() + j, vector[ i ].begin() + j + n ) . |
Accessing the data | |
---|---|
LvArray::ArrayOfArrays< T, ... > |
std::vector< std::vector< T > > |
array( i, j ) |
vector[ i ][ j ] |
array[ i ][ j ] |
vector[ i ][ j ] |
It is worth noting that operator[]
returns a one dimensional LvArray::ArraySlice
.
TEST( ArrayOfArrays, construction )
{
// Create an empty ArrayOfArrays.
LvArray::ArrayOfArrays< std::string, std::ptrdiff_t, LvArray::MallocBuffer > arrayOfArrays;
EXPECT_EQ( arrayOfArrays.size(), 0 );
// Append an array of length 2.
arrayOfArrays.appendArray( 2 );
EXPECT_EQ( arrayOfArrays.size(), 1 );
EXPECT_EQ( arrayOfArrays.sizeOfArray( 0 ), 2 );
EXPECT_EQ( arrayOfArrays.capacityOfArray( 0 ), 2 );
arrayOfArrays( 0, 0 ) = "First array, first entry.";
arrayOfArrays( 0, 1 ) = "First array, second entry.";
// Append another array of length 3.
arrayOfArrays.appendArray( 3 );
EXPECT_EQ( arrayOfArrays.size(), 2 );
EXPECT_EQ( arrayOfArrays.sizeOfArray( 1 ), 3 );
EXPECT_EQ( arrayOfArrays.capacityOfArray( 1 ), 3 );
arrayOfArrays( 1, 0 ) = "Second array, first entry.";
arrayOfArrays( 1, 2 ) = "Second array, third entry.";
EXPECT_EQ( arrayOfArrays[ 0 ][ 1 ], "First array, second entry." );
EXPECT_EQ( arrayOfArrays[ 1 ][ 2 ], "Second array, third entry." );
// Values are default initialized.
EXPECT_EQ( arrayOfArrays[ 1 ][ 1 ], std::string() );
}
TEST( ArrayOfArrays, modification )
{
LvArray::ArrayOfArrays< std::string, std::ptrdiff_t, LvArray::MallocBuffer > arrayOfArrays;
// Append an array.
arrayOfArrays.appendArray( 3 );
arrayOfArrays( 0, 0 ) = "First array, first entry.";
arrayOfArrays( 0, 1 ) = "First array, second entry.";
arrayOfArrays( 0, 2 ) = "First array, third entry.";
// Insert a new array at the beginning.
std::array< std::string, 2 > newFirstArray = { "New first array, first entry.",
"New first array, second entry." };
arrayOfArrays.insertArray( 0, newFirstArray.begin(), newFirstArray.end() );
EXPECT_EQ( arrayOfArrays.size(), 2 );
EXPECT_EQ( arrayOfArrays.sizeOfArray( 0 ), 2 );
EXPECT_EQ( arrayOfArrays.sizeOfArray( 1 ), 3 );
EXPECT_EQ( arrayOfArrays( 0, 1 ), "New first array, second entry." );
EXPECT_EQ( arrayOfArrays[ 1 ][ 1 ], "First array, second entry." );
// Erase the values from what is now the second array.
arrayOfArrays.clearArray( 1 );
EXPECT_EQ( arrayOfArrays.sizeOfArray( 1 ), 0 );
// Append a value to the end of each array.
arrayOfArrays.emplaceBack( 1, "Second array, first entry." );
arrayOfArrays.emplaceBack( 0, "New first array, third entry." );
EXPECT_EQ( arrayOfArrays.sizeOfArray( 0 ), 3 );
EXPECT_EQ( arrayOfArrays.sizeOfArray( 1 ), 1 );
EXPECT_EQ( arrayOfArrays[ 1 ][ 0 ], "Second array, first entry." );
EXPECT_EQ( arrayOfArrays( 0, 2 ), "New first array, third entry." );
}
[Source: examples/exampleArrayOfArrays.cpp]
5.3. LvArray::ArrayOfArraysView
¶
LvArray::ArrayOfArraysView
is the view counterpart to LvArray::ArrayOfArrays
. The LvArray::ArrayOfArraysView
shares the three template parameters of LvArray::ArrayOfArraysView
but it has an additional boolean parameter CONST_SIZES
which specifies if the view can change the sizes of the inner arrays. If CONST_SIZES
is true the sizes
buffer has type BUFFER_TYPE< INDEX_TYPE const >
otherwise it has type BUFFER_TYPE< std::remove_const_t< INDEX_TYPE > >
.
The LvArray::ArrayOfArraysView
doesn’t have a default constructor, it must always be created from an existing LvArray::ArrayOfArrays
. From a LvArray::ArrayOfArrays< T, INDEX_TYPE, BUFFER_TYPE >
you can get three different view types.
toView()
returns aLvArray::ArrayOfArraysView< T, INDEX_TYPE const, false, BUFFER_TYPE >
which can modify existing values and change the size of the inner arrays as long as their size doesn’t exceed their capacity. -toViewConstSizes()
returns aLvArray::ArrayOfArraysView< T, INDEX_TYPE const, true, BUFFER_TYPE >
which can modify existing values but cannot change the size of the inner arrays.toViewConst()
returns aLvArray::ArrayOfArraysView< T const, INDEX_TYPE const, true, BUFFER_TYPE >
which provides read only access.
TEST( ArrayOfArrays, view )
{
// Create an ArrayOfArrays with 10 inner arrays each with capacity 9.
LvArray::ArrayOfArrays< int, std::ptrdiff_t, LvArray::MallocBuffer > arrayOfArrays( 10, 9 );
{
// Then create a view.
LvArray::ArrayOfArraysView< int,
std::ptrdiff_t const,
false,
LvArray::MallocBuffer > const view = arrayOfArrays.toView();
EXPECT_EQ( view.size(), 10 );
EXPECT_EQ( view.capacityOfArray( 7 ), 9 );
// Modify every inner array in parallel
RAJA::forall< RAJA::omp_parallel_for_exec >(
RAJA::TypedRangeSegment< std::ptrdiff_t >( 0, view.size() ),
[view] ( std::ptrdiff_t const i )
{
for( std::ptrdiff_t j = 0; j < i; ++j )
{
view.emplaceBack( i, 10 * i + j );
}
}
);
EXPECT_EQ( view.sizeOfArray( 9 ), view.capacityOfArray( 9 ) );
// The last array is at capacity. Therefore any attempt at increasing its size through
// a view is invalid and may lead to undefined behavior!
// view.emplaceBack( 9, 0 );
}
{
// Create a view which cannot modify the sizes of the inner arrays.
LvArray::ArrayOfArraysView< int,
std::ptrdiff_t const,
true,
LvArray::MallocBuffer > const viewConstSizes = arrayOfArrays.toViewConstSizes();
for( std::ptrdiff_t i = 0; i < viewConstSizes.size(); ++i )
{
for( int & value : viewConstSizes[ i ] )
{
value *= 2;
}
// This would be a compilation error
// viewConstSizes.emplaceBack( i, 0 );
}
}
{
// Create a view which has read only access.
LvArray::ArrayOfArraysView< int const,
std::ptrdiff_t const,
true,
LvArray::MallocBuffer > const viewConst = arrayOfArrays.toViewConst();
for( std::ptrdiff_t i = 0; i < viewConst.size(); ++i )
{
for( std::ptrdiff_t j = 0; j < viewConst.sizeOfArray( i ); ++j )
{
EXPECT_EQ( viewConst( i, j ), 2 * ( 10 * i + j ) );
// This would be a compilation error
// viewConst( i, j ) = 0;
}
}
}
// Verify that all the modifications are present in the parent ArrayOfArrays.
EXPECT_EQ( arrayOfArrays.size(), 10 );
for( std::ptrdiff_t i = 0; i < arrayOfArrays.size(); ++i )
{
EXPECT_EQ( arrayOfArrays.sizeOfArray( i ), i );
for( std::ptrdiff_t j = 0; j < arrayOfArrays.sizeOfArray( i ); ++j )
{
EXPECT_EQ( arrayOfArrays( i, j ), 2 * ( 10 * i + j ) );
}
}
}
[Source: examples/exampleArrayOfArrays.cpp]
LvArray::ArrayView
also has a method emplaceBackAtomic
which is allows multiple threads to append to a single inner array in a thread safe manner.
TEST( ArrayOfArrays, atomic )
{
// Create an ArrayOfArrays with 1 array with a capacity of 100.
LvArray::ArrayOfArrays< int, std::ptrdiff_t, LvArray::MallocBuffer > arrayOfArrays( 1, 100 );
EXPECT_EQ( arrayOfArrays.sizeOfArray( 0 ), 0 );
// Then create a view.
LvArray::ArrayOfArraysView< int,
std::ptrdiff_t const,
false,
LvArray::MallocBuffer > const view = arrayOfArrays.toView();
// Append to the single inner array in parallel
RAJA::forall< RAJA::omp_parallel_for_exec >(
RAJA::TypedRangeSegment< std::ptrdiff_t >( 0, view.capacityOfArray( 0 ) ),
[view] ( std::ptrdiff_t const i )
{
view.emplaceBackAtomic< RAJA::builtin_atomic >( 0, i );
}
);
EXPECT_EQ( arrayOfArrays.sizeOfArray( 0 ), 100 );
// Sort the entries in the array since they were appended in an arbitrary order
LvArray::sortedArrayManipulation::makeSorted( arrayOfArrays[ 0 ].begin(), arrayOfArrays[ 0 ].end() );
for( std::ptrdiff_t i = 0; i < arrayOfArrays.sizeOfArray( 0 ); ++i )
{
EXPECT_EQ( arrayOfArrays( 0, i ), i );
}
}
[Source: examples/exampleArrayOfArrays.cpp]
5.4. Data structure¶
In order to use the LvArray::ArrayOfArrays
efficiently it is important to understand the underlying data structure. A LvArray::ArrayOfArrays< T, INDEX_TYPE, BUFFER_TYPE > array
contains three buffers:
BUFFER_TYPE< T > values
: Contains the values of each inner array.BUFFER_TYPE< INDEX_TYPE > sizes
: Of lengtharray.size()
,sizes[ i ]
contains the size of inner arrayi
.BUFFER_TYPE< INDEX_TYPE > offsets
: Of lengtharray.size() + 1
, inner arrayi
begins atvalues[ offsets[ i ] ]
and has capacityoffsets[ i + 1 ] - offsets[ i ]
.
Given M = offsets[ array.size() ]
which is the sum of the capacities of the inner arrays then
array.appendArray( n )
is O( n ) because it entails appending a new entry tosizes
andoffsets
and then appending then
new values tovalues
.array.insertArray( i, first, last )
isO( array.size() + M + std::distance( first, last )
. It involves inserting an entry intosizes
andoffsets
which isO( array.size() )
, making room invalues
for the new array which isO( M )
and finally copying over the new values which isO( std::distance( first, last ) )
.array.eraseArray( i )
isO( array.size() + M )
. It involves removing an entry fromsizes
andoffsets
which isO( array.size() )
and then it involves shifting the entries invalues
down which isO( M )
.- The methods which modify an inner array have the same complexity as their
std::vector
counterparts if the capacity of the inner array won’t be exceeded by the operation. Otherwise they have an added cost ofO( M )
because new space will have to be made invalues
. So for examplearray.emplaceBack( i, args ... )
isO( 1 )
ifarray.sizeOfArray( i ) < array.capacityOfArray( i )
andO( M )
otherwise.
LvArray::ArrayOfArrays
also supports two methods that don’t have an std::vector< std::vector >
counterpart. The first is compress
which shrinks the capacity of each inner array to match it’s size. This ensures that the inner arrays are contiguous in memory with no extra space between them.
TEST( ArrayOfArrays, compress )
{
// Create an ArrayOfArrays with three inner arrays each with capacity 5.
LvArray::ArrayOfArrays< int, std::ptrdiff_t, LvArray::MallocBuffer > arrayOfArrays( 3, 5 );
EXPECT_EQ( arrayOfArrays.sizeOfArray( 1 ), 0 );
EXPECT_EQ( arrayOfArrays.capacityOfArray( 1 ), 5 );
// Append values to the inner arrays.
std::array< int, 5 > values = { 0, 1, 2, 3, 4 };
arrayOfArrays.appendToArray( 0, values.begin(), values.begin() + 3 );
arrayOfArrays.appendToArray( 1, values.begin(), values.begin() + 4 );
arrayOfArrays.appendToArray( 2, values.begin(), values.begin() + 5 );
EXPECT_EQ( arrayOfArrays.sizeOfArray( 0 ), 3 );
EXPECT_EQ( arrayOfArrays.capacityOfArray( 0 ), 5 );
// Since the first array has some extra capacity it is not contiguous in memory
// with the second array.
EXPECT_NE( &arrayOfArrays( 0, 2 ) + 1, &arrayOfArrays( 1, 0 ) );
// Compress the ArrayOfArrays. This doesn't change the sizes or values of the
// inner arrays, only their capacities.
arrayOfArrays.compress();
EXPECT_EQ( arrayOfArrays.sizeOfArray( 0 ), 3 );
EXPECT_EQ( arrayOfArrays.capacityOfArray( 0 ), 3 );
// The values are preserved.
EXPECT_EQ( arrayOfArrays( 2, 4 ), 4 );
// And the inner arrays are now contiguous.
EXPECT_EQ( &arrayOfArrays( 0, 2 ) + 1, &arrayOfArrays( 1, 0 ) );
}
[Source: examples/exampleArrayOfArrays.cpp]
The second is resizeFromCapacities
which takes in a number of inner arrays and the capacity of each inner array. It clears the ArrayOfArrays
and reconstructs it with the given number of empty inner arrays each with the provided capacity. It also takes a RAJA execution policy as a template parameter which specifies how to compute the offsets array from the capacities. Currently only host execution policies are supported.
TEST( ArrayOfArrays, resizeFromCapacities )
{
// Create an ArrayOfArrays with two inner arrays
LvArray::ArrayOfArrays< int, std::ptrdiff_t, LvArray::MallocBuffer > arrayOfArrays( 2 );
// Append values to the inner arrays.
std::array< int, 5 > values = { 0, 1, 2, 3, 4 };
arrayOfArrays.appendToArray( 0, values.begin(), values.begin() + 3 );
arrayOfArrays.appendToArray( 1, values.begin(), values.begin() + 4 );
EXPECT_EQ( arrayOfArrays.sizeOfArray( 0 ), 3 );
// Resize the ArrayOfArrays from a new list of capacities.
std::array< std::ptrdiff_t, 3 > newCapacities = { 3, 5, 2 };
arrayOfArrays.resizeFromCapacities< RAJA::loop_exec >( newCapacities.size(), newCapacities.data() );
// This will clear any existing arrays.
EXPECT_EQ( arrayOfArrays.size(), 3 );
EXPECT_EQ( arrayOfArrays.sizeOfArray( 1 ), 0 );
EXPECT_EQ( arrayOfArrays.capacityOfArray( 1 ), newCapacities[ 1 ] );
}
[Source: examples/exampleArrayOfArrays.cpp]
5.5. Usage with LvArray::ChaiBuffer
¶
The three types of LvArray::ArrayOfArrayView
obtainable from an LvArray::ArrayOfArrays
all act differently when moved to a new memory space.
LvArray::ArrayOfArraysView< T, INDEX_TYPE const, false, LvArray::ChaiBuffer >
, obtained by callingtoView()
. When it is moved to a new space the values are touched as well as the sizes. The offsets are not touched.LvArray::ArrayOfArraysView< T, INDEX_TYPE const, true, LvArray::ChaiBuffer >
, obtained by callingtoViewConstSizes()
. When it is moved to a new space the values are touched but the sizes and offsets aren’t.LvArray::ArrayOfArraysView< T const, INDEX_TYPE const, true, LvArray::ChaiBuffer >
, obtained by callingtoViewConst()
. None of the buffers are touched in the new space.
Calling the explicit move
method with the touch parameter set to true
on a view type has the behavior described above. However calling move( MemorySpace::host )
on an LvArray::ArrayOfArrays
will also touch the offsets (if moving to the GPU the offsets aren’t touched). This is the only way to touch the offsets so if an LvArray::ArrayOfArrays
was previously on the device then it must be explicitly moved and touched on the host before any modification to the offsets can safely take place.
CUDA_TEST( ArrayOfArrays, ChaiBuffer )
{
LvArray::ArrayOfArrays< int, std::ptrdiff_t, LvArray::ChaiBuffer > arrayOfArrays( 10, 9 );
{
// Create a view.
LvArray::ArrayOfArraysView< int,
std::ptrdiff_t const,
false,
LvArray::ChaiBuffer > const view = arrayOfArrays.toView();
// Capture the view on device. This will copy the values, sizes and offsets.
// The values and sizes will be touched.
RAJA::forall< RAJA::cuda_exec< 32 > >(
RAJA::TypedRangeSegment< std::ptrdiff_t >( 0, view.size() ),
[view] __device__ ( std::ptrdiff_t const i )
{
for( std::ptrdiff_t j = 0; j < i; ++j )
{
view.emplace( i, 0, 10 * i + j );
}
}
);
}
{
// Create a view which cannot modify the sizes of the inner arrays.
LvArray::ArrayOfArraysView< int,
std::ptrdiff_t const,
true,
LvArray::ChaiBuffer > const viewConstSizes = arrayOfArrays.toViewConstSizes();
// Capture the view on the host. This will copy back the values and sizes since they were previously touched
// on device. It will only touch the values on host.
RAJA::forall< RAJA::loop_exec >(
RAJA::TypedRangeSegment< std::ptrdiff_t >( 0, viewConstSizes.size() ),
[viewConstSizes] ( std::ptrdiff_t const i )
{
for( int & value : viewConstSizes[ i ] )
{
value *= 2;
}
}
);
}
{
// Create a view which has read only access.
LvArray::ArrayOfArraysView< int const,
std::ptrdiff_t const,
true,
LvArray::ChaiBuffer > const viewConst = arrayOfArrays.toViewConst();
// Capture the view on device. Since the values were previously touched on host it will copy them over.
// Both the sizes and offsets are current on device so they are not copied over. Nothing is touched.
RAJA::forall< RAJA::loop_exec >(
RAJA::TypedRangeSegment< std::ptrdiff_t >( 0, viewConst.size() ),
[viewConst] ( std::ptrdiff_t const i )
{
for( std::ptrdiff_t j = 0; j < viewConst.sizeOfArray( i ); ++j )
{
LVARRAY_ERROR_IF_NE( viewConst( i, j ), 2 * ( 10 * i + i - j - 1 ) );
}
}
);
}
// This won't copy any data since everything is current on host. It will however touch the values,
// sizes and offsets.
arrayOfArrays.move( LvArray::MemorySpace::host );
// Verify that all the modifications are present in the parent ArrayOfArrays.
EXPECT_EQ( arrayOfArrays.size(), 10 );
for( std::ptrdiff_t i = 0; i < arrayOfArrays.size(); ++i )
{
EXPECT_EQ( arrayOfArrays.sizeOfArray( i ), i );
for( std::ptrdiff_t j = 0; j < arrayOfArrays.sizeOfArray( i ); ++j )
{
EXPECT_EQ( arrayOfArrays( i, j ), 2 * ( 10 * i + i - j - 1 ) );
}
}
}
[Source: examples/exampleArrayOfArrays.cpp]
5.6. Usage with LVARRAY_BOUNDS_CHECK
¶
When LVARRAY_BOUNDS_CHECK
is defined access via operator[]
and operator()
is checked. If an invalid access is detected the program is aborted. Methods such as sizeOfArray
, insertArray
and emplace
are also checked.
TEST( ArrayOfArrays, boundsCheck )
{
#if defined(LVARRAY_BOUNDS_CHECK)
LvArray::ArrayOfArrays< int, std::ptrdiff_t, LvArray::MallocBuffer > arrayOfArrays;
// Append an array.
std::array< int, 5 > values = { 0, 1, 2, 3, 4 };
arrayOfArrays.appendArray( values.begin(), values.end() );
EXPECT_EQ( arrayOfArrays.size(), 1 );
EXPECT_EQ( arrayOfArrays.sizeOfArray( 0 ), 5 );
// Out of bounds access aborts the program.
EXPECT_DEATH_IF_SUPPORTED( arrayOfArrays( 0, -1 ), "" );
EXPECT_DEATH_IF_SUPPORTED( arrayOfArrays[ 0 ][ 6 ], "" );
EXPECT_DEATH_IF_SUPPORTED( arrayOfArrays[ 1 ][ 5 ], "" );
EXPECT_DEATH_IF_SUPPORTED( arrayOfArrays.capacityOfArray( 5 ), "" );
EXPECT_DEATH_IF_SUPPORTED( arrayOfArrays.insertArray( 5, values.begin(), values.end() ), "" );
EXPECT_DEATH_IF_SUPPORTED( arrayOfArrays.emplace( 0, 44, 4 ), "" );
EXPECT_DEATH_IF_SUPPORTED( arrayOfArrays.emplace( 1, 44, 4 ), "" );
#endif
}
[Source: examples/exampleArrayOfArrays.cpp]
5.7. Guidelines¶
Like with the LvArray::Array
it is a good idea to pass around the most restrictive LvArray::ArrayOfArrays
object possible. If a function only reads the values of an array it should accept an LvArray::ArrayOfArraysView< T const, INDEX_TYPE const, true, BUFFER_TYPE >
. If it only reads and writes to the values of the inner arrays it should accept an LvArray::ArrayOfArraysView< T, INDEX_TYPE const, true, BUFFER_TYPE >
. If it will modify the size of the inner arrays and can guarantee that their capacity won’t be exceeded it should accept a LvArray::ArrayOfArraysView< T, INDEX_TYPE const, false, BUFFER_TYPE >
. Only when the function needs to modify the outer array or can’t guarantee that the capacity of an inner array won’t be exceeded should it accept an LvArray::ArrayOfArrays
.
Examining the computational complexities listed above resizing the inner arrays of an LvArray::ArrayOfArrays
can be as fast as the equivalent operations on a std::vector< std::vector< T > >
only if the capacity of the inner arrays is not exceeded. Whenever possible preallocate space for the inner arrays. The following examples demonstrate the impact this can have.
One use case for the LvArray::ArrayOfArrays
is to represent the node-to-element map of a computational mesh. Usually this is constructed from a provided element-to-node map which is a map from an element index to a list of node indices that make up the element. For a structured mesh or an unstructured mesh made up of a single element type this map can be represented as a two dimensional LvArray::Array
since every element has the same number of nodes. However the inverse node-to-element map cannot be easily represented this way since in general not all nodes are a part of the same number of elements. In a two dimensional structured mesh an interior node is part of four elements while the four corner nodes are only part of a single element.
This is an example of how to construct the node-to-element map represented as a std::vector< std::vector >
from the element-to-node map.
void NaiveNodeToElemMapConstruction::
vector( ArrayView< INDEX_TYPE const, 2, 1, INDEX_TYPE, DEFAULT_BUFFER > const & elementToNodeMap,
std::vector< std::vector< INDEX_TYPE > > & nodeToElementMap,
INDEX_TYPE const numNodes )
{
nodeToElementMap.resize( numNodes );
for( INDEX_TYPE elementIndex = 0; elementIndex < elementToNodeMap.size( 0 ); ++elementIndex )
{
for( INDEX_TYPE const nodeIndex : elementToNodeMap[ elementIndex ] )
{
nodeToElementMap[ nodeIndex ].emplace_back( elementIndex );
}
}
}
[Source: benchmarks/benchmarkArrayOfArraysNodeToElementMapConstructionKernels.cpp]
For a mesh with N
nodes this construction has complexity O( N )
since there are N
calls to std::vector::emplace_back
each of which is O( 1 )
. The same approach works when the node-to-element map is a LvArray::ArrayOfArrays
however the complexity is much higher.
void NaiveNodeToElemMapConstruction::
naive( ArrayView< INDEX_TYPE const, 2, 1, INDEX_TYPE, DEFAULT_BUFFER > const & elementToNodeMap,
ArrayOfArrays< INDEX_TYPE, INDEX_TYPE, DEFAULT_BUFFER > & nodeToElementMap,
INDEX_TYPE const numNodes )
{
nodeToElementMap.resize( numNodes );
for( INDEX_TYPE elementIndex = 0; elementIndex < elementToNodeMap.size( 0 ); ++elementIndex )
{
for( INDEX_TYPE const nodeIndex : elementToNodeMap[ elementIndex ] )
{
nodeToElementMap.emplaceBack( nodeIndex, elementIndex );
}
}
}
[Source: benchmarks/benchmarkArrayOfArraysNodeToElementMapConstructionKernels.cpp]
Since nothing is done to preallocate space for each inner array every call to LvArray::ArrayOfArrays::emplaceBack
has complexity O( M )
where M
is the sum of the capacities of the inner arrays. M
is proportional to the number of nodes in the mesh N
so the entire algorithm runs in O( N * N )
.
This can be sped up considerably with some preallocation.
template< typename POLICY >
void NodeToElemMapConstruction< POLICY >::
overAllocation( ArrayView< INDEX_TYPE const, 2, 1, INDEX_TYPE, DEFAULT_BUFFER > const & elementToNodeMap,
ArrayOfArrays< INDEX_TYPE, INDEX_TYPE, DEFAULT_BUFFER > & nodeToElementMap,
INDEX_TYPE const numNodes,
INDEX_TYPE const maxNodeElements )
{
using ATOMIC_POLICY = typename RAJAHelper< POLICY >::AtomicPolicy;
// Resize the node to element map allocating space for each inner array.
nodeToElementMap.resize( numNodes, maxNodeElements );
// Create an ArrayOfArraysView
ArrayOfArraysView< INDEX_TYPE, INDEX_TYPE const, false, DEFAULT_BUFFER > const nodeToElementMapView =
nodeToElementMap.toView();
// Launch a RAJA kernel that populates the ArrayOfArraysView.
RAJA::forall< POLICY >(
RAJA::TypedRangeSegment< INDEX_TYPE >( 0, elementToNodeMap.size( 0 ) ),
[elementToNodeMap, nodeToElementMapView] ( INDEX_TYPE const elementIndex )
{
for( INDEX_TYPE const nodeIndex : elementToNodeMap[ elementIndex ] )
{
nodeToElementMapView.emplaceBackAtomic< ATOMIC_POLICY >( nodeIndex, elementIndex );
}
}
);
}
[Source: benchmarks/benchmarkArrayOfArraysNodeToElementMapConstructionKernels.cpp]
Since this method guarantees that the capacity of each inner arrays won’t be exceeded the complexity is reduced back down to O( N )
. In addition the loop appending to the inner arrays can be parallelized.
One problem with this approach is that it may allocate significantly more memory than is necessary to store the map since most of the inner arrays may not have the maximal length. Another potential issue is that the array is not compressed since some inner arrays will have a capacity that exceeds their size. Precomputing the size of each inner array (the number of elements each node is a part of) and using resizeFromCapacities
solves both of these problems and is almost as fast as over allocating.
template< typename POLICY >
void NodeToElemMapConstruction< POLICY >::
resizeFromCapacities( ArrayView< INDEX_TYPE const, 2, 1, INDEX_TYPE, DEFAULT_BUFFER > const & elementToNodeMap,
ArrayOfArrays< INDEX_TYPE, INDEX_TYPE, DEFAULT_BUFFER > & nodeToElementMap,
INDEX_TYPE const numNodes )
{
using ATOMIC_POLICY = typename RAJAHelper< POLICY >::AtomicPolicy;
// Create an Array containing the size of each inner array.
Array< INDEX_TYPE, 1, RAJA::PERM_I, INDEX_TYPE, DEFAULT_BUFFER > elementsPerNode( numNodes );
// Calculate the size of each inner array.
RAJA::forall< POLICY >(
RAJA::TypedRangeSegment< INDEX_TYPE >( 0, elementToNodeMap.size( 0 ) ),
[elementToNodeMap, &elementsPerNode] ( INDEX_TYPE const elementIndex )
{
for( INDEX_TYPE const nodeIndex : elementToNodeMap[ elementIndex ] )
{
RAJA::atomicInc< ATOMIC_POLICY >( &elementsPerNode[ nodeIndex ] );
}
}
);
// Resize the node to element map with the inner array sizes.
nodeToElementMap.resizeFromCapacities< POLICY >( elementsPerNode.size(), elementsPerNode.data() );
// Create an ArrayOfArraysView
ArrayOfArraysView< INDEX_TYPE, INDEX_TYPE const, false, DEFAULT_BUFFER > const nodeToElementMapView =
nodeToElementMap.toView();
// Launch a RAJA kernel that populates the ArrayOfArraysView.
RAJA::forall< POLICY >(
RAJA::TypedRangeSegment< INDEX_TYPE >( 0, elementToNodeMap.size( 0 ) ),
[elementToNodeMap, nodeToElementMapView] ( INDEX_TYPE const elementIndex )
{
for( INDEX_TYPE const nodeIndex : elementToNodeMap[ elementIndex ] )
{
nodeToElementMapView.emplaceBackAtomic< ATOMIC_POLICY >( nodeIndex, elementIndex );
}
} );
}
[Source: benchmarks/benchmarkArrayOfArraysNodeToElementMapConstructionKernels.cpp]
The following timings are from a clang 10 release build on LLNL’s Quartz system run with a 200 x 200 x 200
element structured mesh. The reported time is the best of ten iterations.
Function | RAJA Policy | Time |
---|---|---|
vector |
N/A | 0.99s |
overAllocation |
loop_exec |
0.49s |
resizeFromCapacities |
loop_exec |
0.58s |
overAllocation |
omp_parallel_for_exec |
0.11s |
resizeFromCapacities |
omp_parallel_for_exec |
0.17s |
The naive
method is much to slow to run on this size mesh. However on a 30 x 30 x 30
mesh it takes 1.28 seconds.