Continuing from the last post, this article explores features specific to early members of the ARM Cortex-A family, such as the Cortex-A9: namely, the L2 cache and TLB lockdown features found in those processors. It's important to note that those two features are not available in more recent 32-bit and 64-bit ARM processors such as the Cortex-A7. Newer cores have a simpler TLB and, more often than not, an integrated L2 cache instead of the external L2 found on the A9. However, the Cortex-A9 is still a popular core, found on the Xilinx Zynq-7000 and NXP i.MX 6 SoCs, to name a few.

In theory, cache lockdown, or isolation, prevents other data and instructions from taking the place of critical code and data. TLB lockdown, on the other hand, prevents TLB misses on those same areas. When asked whether those features can help improve interrupt latency in a customer's application, it's often hard to put numbers behind the discussion. Worst-case interrupt latency is very application specific and, consequently, the effect of cache lockdown depends a lot on the application's layout and memory access patterns. However, in this article, I'll attempt to demonstrate how it works and what can be achieved using a synthetic workload.

The test is relatively simple: it consists of performing a random number of (random) data and instruction accesses and then triggering an interrupt. Measuring the time between the trigger and the interrupt gives a distribution of interrupt latency. This method of measuring interrupt latency is explained in detail in another article. Everything is done in a minimal bare metal environment to minimize the effect of code and data outside of our interest; the results would be very similar with an efficient RTOS.

This is the result of 2 million iterations without any lockdown.

Figure 1 – Interrupt latency distribution, no lockdown.

The distribution is relatively wide and bell shaped, which isn't surprising considering the random nature of the input. The overall response time is rather good, with an average of 876 cycles and a fastest response time of 154 cycles. Most important, however, is the worst case, far to the right at 1778 cycles and very infrequent, happening once in 2 million iterations. It's this worst case that we are trying to improve.

The Translation Lookaside Buffer, or TLB for short, is a specialized cache used by modern processors that have a Memory Management Unit (MMU). The MMU is responsible for performing virtual-to-physical memory address translation, as well as permission checking, for every memory access done by the CPU core. The MMU uses a more or less complex page table in memory to perform the translation. A translation, known as a page table walk, is a rather long operation that can consist of multiple accesses to main memory. To improve performance, the TLB stores recently used translation results.

The main TLB of a Cortex-A9 consists of 64 to 512 2-way set-associative entries, plus an additional four fully associative and lockable entries. Those lockable entries can be filled and then locked in software to prevent them from being replaced automatically. The goal is to lock those four entries to cover everything in the critical path of interrupt processing. For this demonstration, the translation table uses 1 MiB sections, so the four entries can map four 1 MiB regions of the address space. This includes the ISR itself, the kernel or bare metal interrupt handling functions and data, and any data and peripherals accessed by the ISR.
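The trigger-to-interrupt time described above can be captured with the ARMv7-A PMU cycle counter. A minimal sketch, assuming PL1 access to the PMU registers; the function names are mine, not from the article:

```c
#include <stdint.h>

/* Read the ARMv7-A PMU cycle counter (PMCCNTR). */
static inline uint32_t pmccntr_read(void)
{
    uint32_t cycles;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(cycles));
    return cycles;
}

/* Enable the cycle counter: set PMCR.E, then PMCNTENSET bit 31. */
static inline void pmu_enable_cycle_counter(void)
{
    uint32_t pmcr;
    __asm__ volatile("mrc p15, 0, %0, c9, c12, 0" : "=r"(pmcr));
    pmcr |= 1u; /* PMCR.E: enable counters */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 0" :: "r"(pmcr));
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 1" :: "r"(1u << 31));
}
```

The idea is to snapshot PMCCNTR right before raising the interrupt and again at ISR entry; the difference is the latency sample added to the distribution.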
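For reference, a 1 MiB section entry in the ARMv7-A short-descriptor translation table format packs the physical base and attributes into a single word. A minimal sketch of building such an entry; the helper name and parameter choices are mine:

```c
#include <stdint.h>

/* Build an ARMv7-A short-descriptor 1 MiB section entry:
   bits[31:20] physical section base, bits[11:10] AP[1:0],
   bits[8:5] domain, bit[3] C, bit[2] B, bits[1:0] = 0b10 (section). */
static uint32_t section_entry(uint32_t phys_base,
                              uint32_t ap,      /* AP[1:0] */
                              uint32_t domain,  /* 0..15 */
                              int cacheable, int bufferable)
{
    uint32_t e = phys_base & 0xFFF00000u; /* 1 MiB aligned base */
    e |= (ap & 0x3u) << 10;
    e |= (domain & 0xFu) << 5;
    e |= (cacheable ? 1u : 0u) << 3;  /* C bit */
    e |= (bufferable ? 1u : 0u) << 2; /* B bit */
    e |= 0x2u;                        /* descriptor type: section */
    return e;
}
```

For example, mapping the megabyte at 0x00100000 with full access (AP=3), domain 0, cacheable and bufferable yields the entry 0x00100C0E. With one such entry covering each of the four locked regions, a single TLB entry maps the whole megabyte.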
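Filling and locking one of the four lockable entries follows a preserve-bit sequence: set the preserve bit in the CP15 TLB Lockdown Register, invalidate any stale entry for the address, touch the region to force a page table walk into the lockable region, then clear the bit. A hedged sketch, assuming the CP15 c10 lockdown register and TLBIMVA encodings; the exact procedure and register layout should be checked against the Cortex-A9 TRM, and `tlb_lock_region` is an illustrative name:

```c
#include <stdint.h>

/* Write the CP15 TLB Lockdown Register (assumed encoding c10, c0, 0;
   bit 0 is the preserve (P) bit on the Cortex-A9). */
static inline void tlb_lockdown_write(uint32_t val)
{
    __asm__ volatile("mcr p15, 0, %0, c10, c0, 0" :: "r"(val));
}

static void tlb_lock_region(const volatile uint32_t *addr)
{
    /* 1. Set P=1 so the next walk allocates into the lockable region. */
    tlb_lockdown_write(1u);
    /* 2. Invalidate any existing unified TLB entry for this MVA. */
    __asm__ volatile("mcr p15, 0, %0, c8, c7, 1" :: "r"((uint32_t)addr));
    __asm__ volatile("dsb");
    __asm__ volatile("isb");
    /* 3. Touch the region to force a page table walk; the resulting
          translation lands in a lockable entry. */
    (void)*addr;
    /* 4. Clear P so normal allocation resumes; the entry stays locked. */
    tlb_lockdown_write(0u);
}
```

Calling this once per region, for the ISR code, the interrupt handling code and data, and the interrupt controller's peripheral page, would pin the whole critical path into the four lockable entries.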