5. LvArray::ArrayOfArrays

The LvArray::ArrayOfArrays provides functionality similar to a std::vector< std::vector< T > > but with just three allocations. The primary purpose is to reduce the number of memory transfers when moving between memory spaces but it also comes with the added benefit of reduced memory fragmentation.

5.1. Template arguments

The LvArray::ArrayOfArrays requires three template arguments.

  1. T: The type of values stored in the inner arrays.
  2. INDEX_TYPE: An integral type used in index calculations, the suggested type is std::ptrdiff_t.
  3. BUFFER_TYPE: A template template parameter specifying the buffer type used for allocation and de-allocation, the LvArray::ArrayOfArrays contains a BUFFER_TYPE< T > along with two BUFFER_TYPE< INDEX_TYPE >.

5.2. Usage

LvArray::ArrayOfArrays has a single constructor which takes two optional parameters; the number of inner arrays and the capacity to allocate for each inner array. Both values default to zero.

Given an ArrayOfArrays< T, ... > array, an equivalent std::vector< std::vector< T > > vector, a non negative integer n, integer i such that 0 <= i < vector.size(), integer j such that 0 <= j < vector[ i ].size(), two iterators first and last and a variadic pack of parameters ... args then following are equivalent:

Getting information about the outer array or a specific inner array
LvArray::ArrayOfArrays< T, ... > std::vector< std::vector< T > >
array.size() vector.size()
array.capacity() vector.capacity()
array.sizeOfArray( i ) vector[ i ].size()
array.capacityOfArray( i ) vector[ i ].capacity()
Modifying the outer array
LvArray::ArrayOfArrays< T, ... > std::vector< std::vector< T > >
array.reserve( n ) vector.reserve( n )
array.resize( n ) vector.resize( n )
array.appendArray( n ) vector.push_back( std::vector< T >( n ) )
array.appendArray( first, last ) vector.push_back( std::vector< T >( first, last ) )
array.insertArray( i, first, last ) vector.insert( vector.begin() + i, std::vector< T >( first, last ) )
array.eraseArray( i ) vector.erase( vector.first() + i )
Modifying an inner array
LvArray::ArrayOfArrays< T, ... > std::vector< std::vector< T > >
array.resizeArray( i, n, args ... ) vector[ i ].resize( n, T( args ... ) )
array.clearArray( i ) vector[ i ].clear()
array.emplaceBack( i, args ... ) vector[ i ].emplace_back( args ... )
array.appendToArray( i, first, last ) vector[ i ].insert( vector[ i ].end(), first, last )
array.emplace( i, j , args ... ) vector[ i ].emplace( j, args ... )`
array.insertIntoArray( i, j , first, last ) vector[ i ].insert( vector[ i ].begin() + j, first, last )
eraseFromArray( i, j, n ) vector[ i ].erase( vector[ i ].begin() + j, vector[ i ].begin() + j + n ).
Accessing the data
LvArray::ArrayOfArrays< T, ... > std::vector< std::vector< T > >
array( i, j ) vector[ i ][ j ]
array[ i ][ j ] vector[ i ][ j ]

It is worth noting that operator[] returns a one dimensional LvArray::ArraySlice.

TEST( ArrayOfArrays, construction )
{
  // Create an empty ArrayOfArrays.
  LvArray::ArrayOfArrays< std::string, std::ptrdiff_t, LvArray::MallocBuffer > arrayOfArrays;
  EXPECT_EQ( arrayOfArrays.size(), 0 );

  // Append an array of length 2.
  arrayOfArrays.appendArray( 2 );
  EXPECT_EQ( arrayOfArrays.size(), 1 );
  EXPECT_EQ( arrayOfArrays.sizeOfArray( 0 ), 2 );
  EXPECT_EQ( arrayOfArrays.capacityOfArray( 0 ), 2 );
  arrayOfArrays( 0, 0 ) = "First array, first entry.";
  arrayOfArrays( 0, 1 ) = "First array, second entry.";

  // Append another array of length 3.
  arrayOfArrays.appendArray( 3 );
  EXPECT_EQ( arrayOfArrays.size(), 2 );
  EXPECT_EQ( arrayOfArrays.sizeOfArray( 1 ), 3 );
  EXPECT_EQ( arrayOfArrays.capacityOfArray( 1 ), 3 );
  arrayOfArrays( 1, 0 ) = "Second array, first entry.";
  arrayOfArrays( 1, 2 ) = "Second array, third entry.";

  EXPECT_EQ( arrayOfArrays[ 0 ][ 1 ], "First array, second entry." );
  EXPECT_EQ( arrayOfArrays[ 1 ][ 2 ], "Second array, third entry." );

  // Values are default initialized.
  EXPECT_EQ( arrayOfArrays[ 1 ][ 1 ], std::string() );
}

TEST( ArrayOfArrays, modification )
{
  LvArray::ArrayOfArrays< std::string, std::ptrdiff_t, LvArray::MallocBuffer > arrayOfArrays;

  // Append an array.
  arrayOfArrays.appendArray( 3 );
  arrayOfArrays( 0, 0 ) = "First array, first entry.";
  arrayOfArrays( 0, 1 ) = "First array, second entry.";
  arrayOfArrays( 0, 2 ) = "First array, third entry.";

  // Insert a new array at the beginning.
  std::array< std::string, 2 > newFirstArray = { "New first array, first entry.",
                                                 "New first array, second entry." };
  arrayOfArrays.insertArray( 0, newFirstArray.begin(), newFirstArray.end() );

  EXPECT_EQ( arrayOfArrays.size(), 2 );
  EXPECT_EQ( arrayOfArrays.sizeOfArray( 0 ), 2 );
  EXPECT_EQ( arrayOfArrays.sizeOfArray( 1 ), 3 );

  EXPECT_EQ( arrayOfArrays( 0, 1 ), "New first array, second entry." );
  EXPECT_EQ( arrayOfArrays[ 1 ][ 1 ], "First array, second entry." );

  // Erase the values from what is now the second array.
  arrayOfArrays.clearArray( 1 );
  EXPECT_EQ( arrayOfArrays.sizeOfArray( 1 ), 0 );

  // Append a value to the end of each array.
  arrayOfArrays.emplaceBack( 1, "Second array, first entry." );
  arrayOfArrays.emplaceBack( 0, "New first array, third entry." );

  EXPECT_EQ( arrayOfArrays.sizeOfArray( 0 ), 3 );
  EXPECT_EQ( arrayOfArrays.sizeOfArray( 1 ), 1 );

  EXPECT_EQ( arrayOfArrays[ 1 ][ 0 ], "Second array, first entry." );
  EXPECT_EQ( arrayOfArrays( 0, 2 ), "New first array, third entry." );
}

[Source: examples/exampleArrayOfArrays.cpp]

5.3. LvArray::ArrayOfArraysView

LvArray::ArrayOfArraysView is the view counterpart to LvArray::ArrayOfArrays. The LvArray::ArrayOfArraysView shares the three template parameters of LvArray::ArrayOfArraysView but it has an additional boolean parameter CONST_SIZES which specifies if the view can change the sizes of the inner arrays. If CONST_SIZES is true the sizes buffer has type BUFFER_TYPE< INDEX_TYPE const > otherwise it has type BUFFER_TYPE< std::remove_const_t< INDEX_TYPE > >.

The LvArray::ArrayOfArraysView doesn’t have a default constructor, it must always be created from an existing LvArray::ArrayOfArrays. From a LvArray::ArrayOfArrays< T, INDEX_TYPE, BUFFER_TYPE > you can get three different view types.

  • toView() returns a LvArray::ArrayOfArraysView< T, INDEX_TYPE const, false, BUFFER_TYPE > which can modify existing values and change the size of the inner arrays as long as their size doesn’t exceed their capacity. -
  • toViewConstSizes() returns a LvArray::ArrayOfArraysView< T, INDEX_TYPE const, true, BUFFER_TYPE > which can modify existing values but cannot change the size of the inner arrays.
  • toViewConst() returns a LvArray::ArrayOfArraysView< T const, INDEX_TYPE const, true, BUFFER_TYPE > which provides read only access.
TEST( ArrayOfArrays, view )
{
  // Create an ArrayOfArrays with 10 inner arrays each with capacity 9.
  LvArray::ArrayOfArrays< int, std::ptrdiff_t, LvArray::MallocBuffer > arrayOfArrays( 10, 9 );

  {
    // Then create a view.
    LvArray::ArrayOfArraysView< int,
                                std::ptrdiff_t const,
                                false,
                                LvArray::MallocBuffer > const view = arrayOfArrays.toView();
    EXPECT_EQ( view.size(), 10 );
    EXPECT_EQ( view.capacityOfArray( 7 ), 9 );

    // Modify every inner array in parallel
    RAJA::forall< RAJA::omp_parallel_for_exec >(
      RAJA::TypedRangeSegment< std::ptrdiff_t >( 0, view.size() ),
      [view] ( std::ptrdiff_t const i )
    {
      for( std::ptrdiff_t j = 0; j < i; ++j )
      {
        view.emplaceBack( i, 10 * i + j );
      }
    }
      );

    EXPECT_EQ( view.sizeOfArray( 9 ), view.capacityOfArray( 9 ) );

    // The last array is at capacity. Therefore any attempt at increasing its size through
    // a view is invalid and may lead to undefined behavior!
    // view.emplaceBack( 9, 0 );
  }

  {
    // Create a view which cannot modify the sizes of the inner arrays.
    LvArray::ArrayOfArraysView< int,
                                std::ptrdiff_t const,
                                true,
                                LvArray::MallocBuffer > const viewConstSizes = arrayOfArrays.toViewConstSizes();
    for( std::ptrdiff_t i = 0; i < viewConstSizes.size(); ++i )
    {
      for( int & value : viewConstSizes[ i ] )
      {
        value *= 2;
      }

      // This would be a compilation error
      // viewConstSizes.emplaceBack( i, 0 );
    }
  }

  {
    // Create a view which has read only access.
    LvArray::ArrayOfArraysView< int const,
                                std::ptrdiff_t const,
                                true,
                                LvArray::MallocBuffer > const viewConst = arrayOfArrays.toViewConst();

    for( std::ptrdiff_t i = 0; i < viewConst.size(); ++i )
    {
      for( std::ptrdiff_t j = 0; j < viewConst.sizeOfArray( i ); ++j )
      {
        EXPECT_EQ( viewConst( i, j ), 2 * ( 10 * i + j ) );

        // This would be a compilation error
        // viewConst( i, j ) = 0;
      }
    }
  }

  // Verify that all the modifications are present in the parent ArrayOfArrays.
  EXPECT_EQ( arrayOfArrays.size(), 10 );
  for( std::ptrdiff_t i = 0; i < arrayOfArrays.size(); ++i )
  {
    EXPECT_EQ( arrayOfArrays.sizeOfArray( i ), i );
    for( std::ptrdiff_t j = 0; j < arrayOfArrays.sizeOfArray( i ); ++j )
    {
      EXPECT_EQ( arrayOfArrays( i, j ), 2 * ( 10 * i + j ) );
    }
  }
}

[Source: examples/exampleArrayOfArrays.cpp]

LvArray::ArrayView also has a method emplaceBackAtomic which is allows multiple threads to append to a single inner array in a thread safe manner.

TEST( ArrayOfArrays, atomic )
{
  // Create an ArrayOfArrays with 1 array with a capacity of 100.
  LvArray::ArrayOfArrays< int, std::ptrdiff_t, LvArray::MallocBuffer > arrayOfArrays( 1, 100 );
  EXPECT_EQ( arrayOfArrays.sizeOfArray( 0 ), 0 );

  // Then create a view.
  LvArray::ArrayOfArraysView< int,
                              std::ptrdiff_t const,
                              false,
                              LvArray::MallocBuffer > const view = arrayOfArrays.toView();

  // Append to the single inner array in parallel
  RAJA::forall< RAJA::omp_parallel_for_exec >(
    RAJA::TypedRangeSegment< std::ptrdiff_t >( 0, view.capacityOfArray( 0 ) ),
    [view] ( std::ptrdiff_t const i )
  {
    view.emplaceBackAtomic< RAJA::builtin_atomic >( 0, i );
  }
    );

  EXPECT_EQ( arrayOfArrays.sizeOfArray( 0 ), 100 );

  // Sort the entries in the array since they were appended in an arbitrary order
  LvArray::sortedArrayManipulation::makeSorted( arrayOfArrays[ 0 ].begin(), arrayOfArrays[ 0 ].end() );

  for( std::ptrdiff_t i = 0; i < arrayOfArrays.sizeOfArray( 0 ); ++i )
  {
    EXPECT_EQ( arrayOfArrays( 0, i ), i );
  }
}

[Source: examples/exampleArrayOfArrays.cpp]

5.4. Data structure

In order to use the LvArray::ArrayOfArrays efficiently it is important to understand the underlying data structure. A LvArray::ArrayOfArrays< T, INDEX_TYPE, BUFFER_TYPE > array contains three buffers:

  1. BUFFER_TYPE< T > values: Contains the values of each inner array.
  2. BUFFER_TYPE< INDEX_TYPE > sizes: Of length array.size(), sizes[ i ] contains the size of inner array i.
  3. BUFFER_TYPE< INDEX_TYPE > offsets: Of length array.size() + 1, inner array i begins at values[ offsets[ i ] ] and has capacity offsets[ i + 1 ] - offsets[ i ].

Given M = offsets[ array.size() ] which is the sum of the capacities of the inner arrays then

  • array.appendArray( n ) is O( n ) because it entails appending a new entry to sizes and offsets and then appending the n new values to values.
  • array.insertArray( i, first, last ) is O( array.size() + M + std::distance( first, last ). It involves inserting an entry into sizes and offsets which is O( array.size() ), making room in values for the new array which is O( M ) and finally copying over the new values which is O( std::distance( first, last ) ).
  • array.eraseArray( i ) is O( array.size() + M ). It involves removing an entry from sizes and offsets which is O( array.size() ) and then it involves shifting the entries in values down which is O( M ).
  • The methods which modify an inner array have the same complexity as their std::vector counterparts if the capacity of the inner array won’t be exceeded by the operation. Otherwise they have an added cost of O( M ) because new space will have to be made in values. So for example array.emplaceBack( i, args ... ) is O( 1 ) if array.sizeOfArray( i ) < array.capacityOfArray( i ) and O( M ) otherwise.

LvArray::ArrayOfArrays also supports two methods that don’t have an std::vector< std::vector > counterpart. The first is compress which shrinks the capacity of each inner array to match it’s size. This ensures that the inner arrays are contiguous in memory with no extra space between them.

TEST( ArrayOfArrays, compress )
{
  // Create an ArrayOfArrays with three inner arrays each with capacity 5.
  LvArray::ArrayOfArrays< int, std::ptrdiff_t, LvArray::MallocBuffer > arrayOfArrays( 3, 5 );
  EXPECT_EQ( arrayOfArrays.sizeOfArray( 1 ), 0 );
  EXPECT_EQ( arrayOfArrays.capacityOfArray( 1 ), 5 );

  // Append values to the inner arrays.
  std::array< int, 5 > values = { 0, 1, 2, 3, 4 };
  arrayOfArrays.appendToArray( 0, values.begin(), values.begin() + 3 );
  arrayOfArrays.appendToArray( 1, values.begin(), values.begin() + 4 );
  arrayOfArrays.appendToArray( 2, values.begin(), values.begin() + 5 );

  EXPECT_EQ( arrayOfArrays.sizeOfArray( 0 ), 3 );
  EXPECT_EQ( arrayOfArrays.capacityOfArray( 0 ), 5 );

  // Since the first array has some extra capacity it is not contiguous in memory
  // with the second array.
  EXPECT_NE( &arrayOfArrays( 0, 2 ) + 1, &arrayOfArrays( 1, 0 ) );

  // Compress the ArrayOfArrays. This doesn't change the sizes or values of the
  // inner arrays, only their capacities.
  arrayOfArrays.compress();

  EXPECT_EQ( arrayOfArrays.sizeOfArray( 0 ), 3 );
  EXPECT_EQ( arrayOfArrays.capacityOfArray( 0 ), 3 );

  // The values are preserved.
  EXPECT_EQ( arrayOfArrays( 2, 4 ), 4 );

  // And the inner arrays are now contiguous.
  EXPECT_EQ( &arrayOfArrays( 0, 2 ) + 1, &arrayOfArrays( 1, 0 ) );
}

[Source: examples/exampleArrayOfArrays.cpp]

The second is resizeFromCapacities which takes in a number of inner arrays and the capacity of each inner array. It clears the ArrayOfArrays and reconstructs it with the given number of empty inner arrays each with the provided capacity. It also takes a RAJA execution policy as a template parameter which specifies how to compute the offsets array from the capacities. Currently only host execution policies are supported.

TEST( ArrayOfArrays, resizeFromCapacities )
{
  // Create an ArrayOfArrays with two inner arrays
  LvArray::ArrayOfArrays< int, std::ptrdiff_t, LvArray::MallocBuffer > arrayOfArrays( 2 );

  // Append values to the inner arrays.
  std::array< int, 5 > values = { 0, 1, 2, 3, 4 };
  arrayOfArrays.appendToArray( 0, values.begin(), values.begin() + 3 );
  arrayOfArrays.appendToArray( 1, values.begin(), values.begin() + 4 );

  EXPECT_EQ( arrayOfArrays.sizeOfArray( 0 ), 3 );

  // Resize the ArrayOfArrays from a new list of capacities.
  std::array< std::ptrdiff_t, 3 > newCapacities = { 3, 5, 2 };
  arrayOfArrays.resizeFromCapacities< RAJA::loop_exec >( newCapacities.size(), newCapacities.data() );

  // This will clear any existing arrays.
  EXPECT_EQ( arrayOfArrays.size(), 3 );
  EXPECT_EQ( arrayOfArrays.sizeOfArray( 1 ), 0 );
  EXPECT_EQ( arrayOfArrays.capacityOfArray( 1 ), newCapacities[ 1 ] );
}

[Source: examples/exampleArrayOfArrays.cpp]

5.5. Usage with LvArray::ChaiBuffer

The three types of LvArray::ArrayOfArrayView obtainable from an LvArray::ArrayOfArrays all act differently when moved to a new memory space.

  • LvArray::ArrayOfArraysView< T, INDEX_TYPE const, false, LvArray::ChaiBuffer >, obtained by calling toView(). When it is moved to a new space the values are touched as well as the sizes. The offsets are not touched.
  • LvArray::ArrayOfArraysView< T, INDEX_TYPE const, true, LvArray::ChaiBuffer >, obtained by calling toViewConstSizes(). When it is moved to a new space the values are touched but the sizes and offsets aren’t.
  • LvArray::ArrayOfArraysView< T const, INDEX_TYPE const, true, LvArray::ChaiBuffer >, obtained by calling toViewConst(). None of the buffers are touched in the new space.

Calling the explicit move method with the touch parameter set to true on a view type has the behavior described above. However calling move( MemorySpace::host ) on an LvArray::ArrayOfArrays will also touch the offsets (if moving to the GPU the offsets aren’t touched). This is the only way to touch the offsets so if an LvArray::ArrayOfArrays was previously on the device then it must be explicitly moved and touched on the host before any modification to the offsets can safely take place.

CUDA_TEST( ArrayOfArrays, ChaiBuffer )
{
  LvArray::ArrayOfArrays< int, std::ptrdiff_t, LvArray::ChaiBuffer > arrayOfArrays( 10, 9 );

  {
    // Create a view.
    LvArray::ArrayOfArraysView< int,
                                std::ptrdiff_t const,
                                false,
                                LvArray::ChaiBuffer > const view = arrayOfArrays.toView();

    // Capture the view on device. This will copy the values, sizes and offsets.
    // The values and sizes will be touched.
    RAJA::forall< RAJA::cuda_exec< 32 > >(
      RAJA::TypedRangeSegment< std::ptrdiff_t >( 0, view.size() ),
      [view] __device__ ( std::ptrdiff_t const i )
    {
      for( std::ptrdiff_t j = 0; j < i; ++j )
      {
        view.emplace( i, 0, 10 * i + j );
      }
    }
      );
  }

  {
    // Create a view which cannot modify the sizes of the inner arrays.
    LvArray::ArrayOfArraysView< int,
                                std::ptrdiff_t const,
                                true,
                                LvArray::ChaiBuffer > const viewConstSizes = arrayOfArrays.toViewConstSizes();

    // Capture the view on the host. This will copy back the values and sizes since they were previously touched
    // on device. It will only touch the values on host.
    RAJA::forall< RAJA::loop_exec >(
      RAJA::TypedRangeSegment< std::ptrdiff_t >( 0, viewConstSizes.size() ),
      [viewConstSizes] ( std::ptrdiff_t const i )
    {
      for( int & value : viewConstSizes[ i ] )
      {
        value *= 2;
      }
    }
      );
  }

  {
    // Create a view which has read only access.
    LvArray::ArrayOfArraysView< int const,
                                std::ptrdiff_t const,
                                true,
                                LvArray::ChaiBuffer > const viewConst = arrayOfArrays.toViewConst();

    // Capture the view on device. Since the values were previously touched on host it will copy them over.
    // Both the sizes and offsets are current on device so they are not copied over. Nothing is touched.
    RAJA::forall< RAJA::loop_exec >(
      RAJA::TypedRangeSegment< std::ptrdiff_t >( 0, viewConst.size() ),
      [viewConst] ( std::ptrdiff_t const i )
    {
      for( std::ptrdiff_t j = 0; j < viewConst.sizeOfArray( i ); ++j )
      {
        LVARRAY_ERROR_IF_NE( viewConst( i, j ), 2 * ( 10 * i + i - j - 1 ) );
      }
    }
      );
  }

  // This won't copy any data since everything is current on host. It will however touch the values,
  // sizes and offsets.
  arrayOfArrays.move( LvArray::MemorySpace::host );

  // Verify that all the modifications are present in the parent ArrayOfArrays.
  EXPECT_EQ( arrayOfArrays.size(), 10 );
  for( std::ptrdiff_t i = 0; i < arrayOfArrays.size(); ++i )
  {
    EXPECT_EQ( arrayOfArrays.sizeOfArray( i ), i );
    for( std::ptrdiff_t j = 0; j < arrayOfArrays.sizeOfArray( i ); ++j )
    {
      EXPECT_EQ( arrayOfArrays( i, j ), 2 * ( 10 * i + i - j - 1 ) );
    }
  }
}

[Source: examples/exampleArrayOfArrays.cpp]

5.6. Usage with LVARRAY_BOUNDS_CHECK

When LVARRAY_BOUNDS_CHECK is defined access via operator[] and operator() is checked. If an invalid access is detected the program is aborted. Methods such as sizeOfArray, insertArray and emplace are also checked.

TEST( ArrayOfArrays, boundsCheck )
{
#if defined(LVARRAY_BOUNDS_CHECK)
  LvArray::ArrayOfArrays< int, std::ptrdiff_t, LvArray::MallocBuffer > arrayOfArrays;

  // Append an array.
  std::array< int, 5 > values = { 0, 1, 2, 3, 4 };
  arrayOfArrays.appendArray( values.begin(), values.end() );

  EXPECT_EQ( arrayOfArrays.size(), 1 );
  EXPECT_EQ( arrayOfArrays.sizeOfArray( 0 ), 5 );

  // Out of bounds access aborts the program.
  EXPECT_DEATH_IF_SUPPORTED( arrayOfArrays( 0, -1 ), "" );
  EXPECT_DEATH_IF_SUPPORTED( arrayOfArrays[ 0 ][ 6 ], "" );
  EXPECT_DEATH_IF_SUPPORTED( arrayOfArrays[ 1 ][ 5 ], "" );

  EXPECT_DEATH_IF_SUPPORTED( arrayOfArrays.capacityOfArray( 5 ), "" );
  EXPECT_DEATH_IF_SUPPORTED( arrayOfArrays.insertArray( 5, values.begin(), values.end() ), "" );
  EXPECT_DEATH_IF_SUPPORTED( arrayOfArrays.emplace( 0, 44, 4 ), "" );
  EXPECT_DEATH_IF_SUPPORTED( arrayOfArrays.emplace( 1, 44, 4 ), "" );
#endif
}

[Source: examples/exampleArrayOfArrays.cpp]

5.7. Guidelines

Like with the LvArray::Array it is a good idea to pass around the most restrictive LvArray::ArrayOfArrays object possible. If a function only reads the values of an array it should accept an LvArray::ArrayOfArraysView< T const, INDEX_TYPE const, true, BUFFER_TYPE >. If it only reads and writes to the values of the inner arrays it should accept an LvArray::ArrayOfArraysView< T, INDEX_TYPE const, true, BUFFER_TYPE >. If it will modify the size of the inner arrays and can guarantee that their capacity won’t be exceeded it should accept a LvArray::ArrayOfArraysView< T, INDEX_TYPE const, false, BUFFER_TYPE >. Only when the function needs to modify the outer array or can’t guarantee that the capacity of an inner array won’t be exceeded should it accept an LvArray::ArrayOfArrays.

Examining the computational complexities listed above resizing the inner arrays of an LvArray::ArrayOfArrays can be as fast as the equivalent operations on a std::vector< std::vector< T > > only if the capacity of the inner arrays is not exceeded. Whenever possible preallocate space for the inner arrays. The following examples demonstrate the impact this can have.

One use case for the LvArray::ArrayOfArrays is to represent the node-to-element map of a computational mesh. Usually this is constructed from a provided element-to-node map which is a map from an element index to a list of node indices that make up the element. For a structured mesh or an unstructured mesh made up of a single element type this map can be represented as a two dimensional LvArray::Array since every element has the same number of nodes. However the inverse node-to-element map cannot be easily represented this way since in general not all nodes are a part of the same number of elements. In a two dimensional structured mesh an interior node is part of four elements while the four corner nodes are only part of a single element.

This is an example of how to construct the node-to-element map represented as a std::vector< std::vector > from the element-to-node map.

void NaiveNodeToElemMapConstruction::
  vector( ArrayView< INDEX_TYPE const, 2, 1, INDEX_TYPE, DEFAULT_BUFFER > const & elementToNodeMap,
          std::vector< std::vector< INDEX_TYPE > > & nodeToElementMap,
          INDEX_TYPE const numNodes )
{
  nodeToElementMap.resize( numNodes );

  for( INDEX_TYPE elementIndex = 0; elementIndex < elementToNodeMap.size( 0 ); ++elementIndex )
  {
    for( INDEX_TYPE const nodeIndex : elementToNodeMap[ elementIndex ] )
    {
      nodeToElementMap[ nodeIndex ].emplace_back( elementIndex );
    }
  }
}

[Source: benchmarks/benchmarkArrayOfArraysNodeToElementMapConstructionKernels.cpp]

For a mesh with N nodes this construction has complexity O( N ) since there are N calls to std::vector::emplace_back each of which is O( 1 ). The same approach works when the node-to-element map is a LvArray::ArrayOfArrays however the complexity is much higher.

void NaiveNodeToElemMapConstruction::
  naive( ArrayView< INDEX_TYPE const, 2, 1, INDEX_TYPE, DEFAULT_BUFFER > const & elementToNodeMap,
         ArrayOfArrays< INDEX_TYPE, INDEX_TYPE, DEFAULT_BUFFER > & nodeToElementMap,
         INDEX_TYPE const numNodes )
{
  nodeToElementMap.resize( numNodes );

  for( INDEX_TYPE elementIndex = 0; elementIndex < elementToNodeMap.size( 0 ); ++elementIndex )
  {
    for( INDEX_TYPE const nodeIndex : elementToNodeMap[ elementIndex ] )
    {
      nodeToElementMap.emplaceBack( nodeIndex, elementIndex );
    }
  }
}

[Source: benchmarks/benchmarkArrayOfArraysNodeToElementMapConstructionKernels.cpp]

Since nothing is done to preallocate space for each inner array every call to LvArray::ArrayOfArrays::emplaceBack has complexity O( M ) where M is the sum of the capacities of the inner arrays. M is proportional to the number of nodes in the mesh N so the entire algorithm runs in O( N * N ).

This can be sped up considerably with some preallocation.

template< typename POLICY >
void NodeToElemMapConstruction< POLICY >::
overAllocation( ArrayView< INDEX_TYPE const, 2, 1, INDEX_TYPE, DEFAULT_BUFFER > const & elementToNodeMap,
                ArrayOfArrays< INDEX_TYPE, INDEX_TYPE, DEFAULT_BUFFER > & nodeToElementMap,
                INDEX_TYPE const numNodes,
                INDEX_TYPE const maxNodeElements )
{
  using ATOMIC_POLICY = typename RAJAHelper< POLICY >::AtomicPolicy;

  // Resize the node to element map allocating space for each inner array.
  nodeToElementMap.resize( numNodes, maxNodeElements );

  // Create an ArrayOfArraysView
  ArrayOfArraysView< INDEX_TYPE, INDEX_TYPE const, false, DEFAULT_BUFFER > const nodeToElementMapView =
    nodeToElementMap.toView();

  // Launch a RAJA kernel that populates the ArrayOfArraysView.
  RAJA::forall< POLICY >(
    RAJA::TypedRangeSegment< INDEX_TYPE >( 0, elementToNodeMap.size( 0 ) ),
    [elementToNodeMap, nodeToElementMapView] ( INDEX_TYPE const elementIndex )
  {
    for( INDEX_TYPE const nodeIndex : elementToNodeMap[ elementIndex ] )
    {
      nodeToElementMapView.emplaceBackAtomic< ATOMIC_POLICY >( nodeIndex, elementIndex );
    }
  }
    );
}

[Source: benchmarks/benchmarkArrayOfArraysNodeToElementMapConstructionKernels.cpp]

Since this method guarantees that the capacity of each inner arrays won’t be exceeded the complexity is reduced back down to O( N ). In addition the loop appending to the inner arrays can be parallelized.

One problem with this approach is that it may allocate significantly more memory than is necessary to store the map since most of the inner arrays may not have the maximal length. Another potential issue is that the array is not compressed since some inner arrays will have a capacity that exceeds their size. Precomputing the size of each inner array (the number of elements each node is a part of) and using resizeFromCapacities solves both of these problems and is almost as fast as over allocating.

template< typename POLICY >
void NodeToElemMapConstruction< POLICY >::
resizeFromCapacities( ArrayView< INDEX_TYPE const, 2, 1, INDEX_TYPE, DEFAULT_BUFFER > const & elementToNodeMap,
                      ArrayOfArrays< INDEX_TYPE, INDEX_TYPE, DEFAULT_BUFFER > & nodeToElementMap,
                      INDEX_TYPE const numNodes )
{
  using ATOMIC_POLICY = typename RAJAHelper< POLICY >::AtomicPolicy;

  // Create an Array containing the size of each inner array.
  Array< INDEX_TYPE, 1, RAJA::PERM_I, INDEX_TYPE, DEFAULT_BUFFER > elementsPerNode( numNodes );

  // Calculate the size of each inner array.
  RAJA::forall< POLICY >(
    RAJA::TypedRangeSegment< INDEX_TYPE >( 0, elementToNodeMap.size( 0 ) ),
    [elementToNodeMap, &elementsPerNode] ( INDEX_TYPE const elementIndex )
  {
    for( INDEX_TYPE const nodeIndex : elementToNodeMap[ elementIndex ] )
    {
      RAJA::atomicInc< ATOMIC_POLICY >( &elementsPerNode[ nodeIndex ] );
    }
  }
    );

  // Resize the node to element map with the inner array sizes.
  nodeToElementMap.resizeFromCapacities< POLICY >( elementsPerNode.size(), elementsPerNode.data() );

  // Create an ArrayOfArraysView
  ArrayOfArraysView< INDEX_TYPE, INDEX_TYPE const, false, DEFAULT_BUFFER > const nodeToElementMapView =
    nodeToElementMap.toView();

  // Launch a RAJA kernel that populates the ArrayOfArraysView.
  RAJA::forall< POLICY >(
    RAJA::TypedRangeSegment< INDEX_TYPE >( 0, elementToNodeMap.size( 0 ) ),
    [elementToNodeMap, nodeToElementMapView] ( INDEX_TYPE const elementIndex )
  {
    for( INDEX_TYPE const nodeIndex : elementToNodeMap[ elementIndex ] )
    {
      nodeToElementMapView.emplaceBackAtomic< ATOMIC_POLICY >( nodeIndex, elementIndex );
    }
  } );
}

[Source: benchmarks/benchmarkArrayOfArraysNodeToElementMapConstructionKernels.cpp]

The following timings are from a clang 10 release build on LLNL’s Quartz system run with a 200 x 200 x 200 element structured mesh. The reported time is the best of ten iterations.

Function RAJA Policy Time
vector N/A 0.99s
overAllocation loop_exec 0.49s
resizeFromCapacities loop_exec 0.58s
overAllocation omp_parallel_for_exec 0.11s
resizeFromCapacities omp_parallel_for_exec 0.17s

The naive method is much to slow to run on this size mesh. However on a 30 x 30 x 30 mesh it takes 1.28 seconds.