Final Implementation of the Secure Element
E. Camacho Ruiz, P. Navarro Torrero, P. Brox (CSIC)
03/10/2025
The Secure Element (SE) is a critical component in the Post-Quantum (PQ) transition of IoT devices, for both MCU-based (microcontroller) and MPU-based (microprocessor) embedded devices. At the system level, several building blocks are combined within Pilot #1 of the QUBIP project, the Quantum-secure IoT-based Digital Manufacturing Pilot, as previously described in the blog post “PQC Implementation on IoT: Challenges and Solutions”.
In essence, the SE is a hardware-based solution that consolidates multiple cryptographic functions into a single design, including PQC encapsulation/decapsulation, hash functions, symmetric and asymmetric cryptography, key storage, True Random Number Generator (TRNG) for key generation, and Message Authentication Codes (MACs).
This post presents the final design of the SE, detailing key features, architectural choices, and performance analysis.
The Software CRYPTO-API
While the SE was still under development, the consortium needed an intermediate solution that could be used before the hardware was fully ready. To address this, the CSIC team from the Microelectronics Institute of Seville developed a software CRYPTO-API [1], designed as a mirror of the SE. It implements all the functionalities of the final SE and exposes the same API functions. This approach simplifies integration into the pilot: integrators can use the CRYPTO-API directly without requiring a physical SE. Once the SE is available, only the underlying drivers need to be migrated; the API remains unchanged, ensuring a seamless transition.
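The "same API, different backend" idea described above can be sketched as follows. This is a minimal illustration in Python, not the actual CRYPTO-API: the names `CryptoBackend`, `SoftwareBackend`, `SecureElementBackend`, `fingerprint`, and `fake_transport` are hypothetical, and SHA3-256 is used only because it is one of the primitives the SE provides.

```python
import hashlib
from typing import Callable, Protocol


class CryptoBackend(Protocol):
    """Hypothetical interface; the real CRYPTO-API function set may differ."""

    def sha3_256(self, data: bytes) -> bytes: ...


class SoftwareBackend:
    """Pure-software stand-in, playing the role of the software CRYPTO-API."""

    def sha3_256(self, data: bytes) -> bytes:
        return hashlib.sha3_256(data).digest()


class SecureElementBackend:
    """Stand-in for a driver-backed implementation that forwards to hardware."""

    def __init__(self, transport: Callable[[str, bytes], bytes]):
        # `transport` is an assumed callable: (opcode, payload) -> response bytes.
        self._transport = transport

    def sha3_256(self, data: bytes) -> bytes:
        return self._transport("SHA3_256", data)


def fingerprint(backend: CryptoBackend, message: bytes) -> str:
    """Application code depends only on the interface, not on the backend."""
    return backend.sha3_256(message).hex()


def fake_transport(opcode: str, payload: bytes) -> bytes:
    """Simulated hardware link, present only to make the sketch runnable."""
    if opcode != "SHA3_256":
        raise ValueError(f"unsupported opcode: {opcode}")
    return hashlib.sha3_256(payload).digest()


sw = fingerprint(SoftwareBackend(), b"qubip")
hw = fingerprint(SecureElementBackend(fake_transport), b"qubip")
print(sw == hw)  # the caller sees identical results from either backend
```

The design point is that `fingerprint` never changes when the backend is swapped, which is the property that lets integrators start with software and move to the SE later.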
To that end, we tested the PQC algorithms on the MCU (Arm Cortex-M4, single core, 84 MHz) and the MPU (Arm Cortex-A53, quad core, 1.5 GHz) selected as reference platforms in this project. As a demonstration of the integration capability of the CRYPTO-API, the following tables present the average execution times (over 1000 runs) of the PQC algorithms implemented on MCU-based and MPU-based IoT devices. The results lead to the following conclusions:
- Device gap (MCU vs MPU): Execution on the MCU-based device is nearly 1000× slower than on the MPU-based device. The gap is particularly striking for SLH-DSA, where signature generation may take over 10 minutes on the MCU.
- Algorithm family comparison: Lattice-based algorithms (ML-KEM and ML-DSA) run significantly faster than the hash-based SLH-DSA, with differences around 1000× in execution time.
- Fast vs. small variants (SLH-DSA): For SLH-DSA, the “fast” (f) versions are roughly 50× faster than the corresponding “small” (s) versions. This highlights the importance of selecting the right algorithm depending on area constraints or time requirements.
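Averages of this kind can be collected with a simple timing loop. The sketch below is an assumption about methodology, since the post does not describe the measurement harness. The function `average_time_us` and the SHA3-256 stand-in workload are hypothetical; a real measurement would call the PQC routine under test instead.

```python
import hashlib
import statistics
import time


def average_time_us(operation, runs=1000):
    """Return the mean wall-clock time of `operation` in microseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        operation()
        elapsed_ns = time.perf_counter_ns() - start
        samples.append(elapsed_ns / 1_000)  # nanoseconds -> microseconds
    return statistics.mean(samples)


def stand_in_workload():
    # Placeholder workload; replace with the key generation, encapsulation,
    # signing, or verification call being benchmarked.
    hashlib.sha3_256(b"\x00" * 1024).digest()


mean_us = average_time_us(stand_in_workload, runs=1000)
print(f"mean over 1000 runs: {mean_us:.2f} us")
```

On a bare-metal MCU the same structure applies, but the timestamps would typically come from a hardware cycle counter rather than an operating-system clock.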
Table 1: ML-KEM and ML-DSA execution time in μs
Table 2: SLH-DSA execution time in ms [1]
Post-Quantum Secure Element Architecture
The final design of the Post-Quantum SE developed within the QUBIP project is shown in Figure 1. It features an AXI interface for MPU devices and an I²C protocol for connectivity with MCU devices. At its core, a lightweight RISC-V processor (≈3000 LUTs) manages overall control, including read/write operations across all hardware modules.
In terms of cryptographic primitives, the SE integrates both classical and post-quantum algorithms. For classical asymmetric cryptography, it supports X25519 (KEM) and EdDSA25519 for digital signatures. On the symmetric side, it includes AES with multiple modes, hash functions (SHA-2 and SHA-3), and a TRNG for random number generation. Secure key storage is also provided for the private keys generated by the algorithms.
All PQC algorithms standardized to date have been incorporated: ML-KEM is a full hardware implementation, while ML-DSA and SLH-DSA follow a hardware/software co-design methodology. In particular, the SLH-DSA implementation builds upon the work presented in [2], with both software and hardware adapted to fit the SE architecture.
Figure 1: Post-Quantum Secure Element
Performance Analysis
The final version of the Secure Element (SE) utilizes 99,252 LUTs, 58,255 FFs, 89 BRAMs, and 303 DSPs, operating at 350 MHz on a Zynq UltraScale+ platform and 100 MHz on a Kintex-7 FPGA. To enable a fair comparison with software implementations, we executed the complete set of PQC algorithms 1,000 times on the MPU, obtaining the results summarized in the following table.
Table 3: Performance analysis
As can be seen in the table, the SE achieves a 2× performance gain for ML-KEM and a 10× gain for SLH-DSA, as most of the processing is offloaded to hardware. In contrast, ML-DSA receives only partial hardware acceleration, limited to its hash functions, so its execution on the small RISC-V core is slower than the software execution on the MPU’s main processor.
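The gains quoted above are ratios of software time to SE time. The sketch below shows that arithmetic, plus a rough way to relate timings at the two reported clock rates (350 MHz and 100 MHz). The helper names and the input numbers are hypothetical and are not taken from Table 3. The clock scaling assumes the cycle count is identical on both platforms, which real designs only approximate.

```python
def speedup(t_software, t_hardware):
    """Ratio of software time to hardware time; > 1 means hardware is faster."""
    if t_hardware <= 0:
        raise ValueError("hardware time must be positive")
    return t_software / t_hardware


def scale_to_clock(t_at_f1, f1_mhz, f2_mhz):
    """Estimate the time at clock f2 from a time measured at clock f1.

    Assumes the operation takes the same number of cycles at both clocks.
    """
    if f2_mhz <= 0:
        raise ValueError("target clock must be positive")
    return t_at_f1 * f1_mhz / f2_mhz


# Hypothetical timings in microseconds, for illustration only.
t_sw_us = 200.0
t_hw_us = 100.0

print(speedup(t_sw_us, t_hw_us))          # 2.0
print(scale_to_clock(t_hw_us, 350, 100))  # 350.0
```

A ratio below 1, as in the ML-DSA case described above, indicates that the SE path is slower than the software baseline.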
Conclusions
An open-source Secure Element with the required security features has been designed to address the PQ transition in two types of IoT devices (MPU-based and MCU-based) and to enable its use in the Quantum-secure IoT-based Digital Manufacturing Pilot.