ST Neural-ART - Requested cache maintenance operations


ST Edge AI Core



for STM32 target, based on ST Edge AI Core Technology 2.2.0



r1.0

Overview

As mentioned in the “ST Neural-ART NPU concepts” article, no hardware mechanism is in place to ensure coherence between the NPU memory domain and the HOST memory domain when a shared memory region is marked as cached (from the NPU and/or HOST memory domain point of view).


Memory coherence in this context means that when both the NPU and the HOST access the same cached memory region, the data seen by each side must be consistent and up to date. Without a hardware mechanism ensuring coherence, changes made by one domain may not be immediately visible to the other, leading to potential data inconsistency. Consequently, when a buffer is shared between the HOST and the NPU before, during, or after inference, the appropriate cache maintenance operations must be performed first.

For performance reasons, it is recommended to enable the MCU/NPU caches for the different memory regions shared and used by the NPU. The ‘CACHEABLE_ON’ attribute must be set in the memory-pool descriptor file for the external memory devices. The ‘--cache-maintenance --Ocache-opt’ NPU compiler options are required so that the LL ATON stack generates calls to the requested cache maintenance operations. If a non-hardware-assisted operation (fallback mechanism) is performed by the MCU, memory consistency is guaranteed by the generated code.
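As an illustration, a memory-pool descriptor entry enabling the cacheable attribute for an external memory device might look like the fragment below. Apart from the ‘CACHEABLE_ON’ value itself, the field names and values shown here are placeholders and must be adapted to the actual descriptor schema of the target:

```json
{
  "mempools": [
    {
      "name": "extMemPool",
      "rights": "ACC_READ",
      "cacheable": "CACHEABLE_ON"
    }
  ]
}
```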

IO buffers

Allocated in the memory pools (default behavior)

If the input buffers are filled by the MCU, the MCU cache clean-and-invalidate maintenance operation must be called by the user application before performing the inference. If the input buffers are placed in a memory region cached by the NPU cache (as decided by the NPU compiler), the generated code guarantees data consistency. If a buffer is filled by a hardware engine, which writes to memory without going through the MCU cache, no MCU cache maintenance operation is required.

  while (user_app_not_finished()) {
    /* Fill the input buffers */
    user_fill_inputs(input_0);
    /* Write the MCU-cached input data back to memory so the NPU sees it */
    LL_ATON_Cache_MCU_Clean_Invalidate_Range(input_0, input_size_0);
    /* Optionally drop stale MCU cache lines covering the output buffer */
    //  LL_ATON_Cache_MCU_Invalidate_Range(prediction_0, prediction_size_0);
    /* Perform a complete inference */
    ai_run();
    /* Post-process the predictions */
    user_post_process(prediction_0);
  }

Note that, depending on the post-processing and on how the output buffers are used, it is also recommended to invalidate the output buffers to avoid unpredictable behavior: a dirty MCU cache line covering the output region could be evicted and overwrite data written by the NPU.

Provided by the user application (‘--no-output/input-allocation’)