The Atrevido 423 has a wider, 4-way pipeline that can decode and retire up to twice as many instructions per cycle as the recently launched 2-way 223 core. It is also coupled with more functional units, which significantly increases the IPC (instructions per cycle).
According to Roger Espasa, Semidynamics’ CEO, “The Atrevido 423 is particularly well suited for applications that require massive amounts of data. It shines when the data required cannot fit in memory hierarchy levels that are closer to the core (such as L1, L2 or even L3) by tolerating very large latencies without compromising on throughput thanks to our Gazzillion misses technology. This can handle up to 128 simultaneous requests for data and track them back to the correct place in whatever order they are returned.
“Gazzillion allows the core to access memory hierarchy levels far away from the core without an impact in bandwidth or throughput. Effectively, Gazzillion technology removes the latency issues that can occur when using CXL technology to enable far away memory to be accessed at the supercharged rates that it was designed to deliver. This makes Atrevido very well positioned to handle AI and HPC workloads, which typically need to rapidly access very large amounts of data from main memory.”
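The link between outstanding requests and latency tolerance can be illustrated with Little's law. The sketch below is only a back-of-the-envelope model, not a description of how Gazzillion is implemented: the function names, the latencies and the comparison core with 8 outstanding misses are hypothetical, and the peak of one cache line per cycle is taken from the throughput claim made elsewhere in this announcement.

```python
def sustained_lines_per_cycle(outstanding: int, latency_cycles: int,
                              peak: float = 1.0) -> float:
    """Little's law: throughput = requests in flight / latency.

    The result is capped at `peak` cache lines per cycle, because the core
    cannot consume lines faster than its own data path allows.
    """
    return min(peak, outstanding / latency_cycles)


def max_hidden_latency(outstanding: int, lines_per_cycle: float = 1.0) -> float:
    """Longest memory latency (in cycles) that still sustains `lines_per_cycle`."""
    return outstanding / lines_per_cycle


if __name__ == "__main__":
    # 128 is the figure quoted above; 8 is a hypothetical comparison point.
    for in_flight in (8, 128):
        for latency in (100, 256):
            rate = sustained_lines_per_cycle(in_flight, latency)
            print(f"{in_flight:3d} in flight, {latency} cycles latency: "
                  f"{rate:.4f} lines/cycle")
```

Under these assumptions, 128 requests in flight are enough to sustain one line per cycle at any latency up to 128 cycles, and throughput falls off proportionally beyond that.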
Atrevido can be configured as a coherent core with a CHI NoC or as a simpler, non-coherent core connected via an AXI interface. Furthermore, with an improved TLB and MMU and support for Sv39, Sv48 and Sv57 virtual memory, the core is well suited to running applications with large memory footprints under Linux.
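For a sense of scale, the three paging modes use 39-, 48- and 57-bit virtual addresses respectively, as defined in the RISC-V privileged specification. The short sketch below converts those widths into address-space sizes; the helper name and the unit table are illustrative only.

```python
# Virtual-address widths of the RISC-V paging modes named above.
SV_MODES = {"Sv39": 39, "Sv48": 48, "Sv57": 57}
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def virtual_space(mode: str) -> str:
    """Return the size of the virtual address space for a paging mode."""
    size = 1 << SV_MODES[mode]  # 2**bits bytes
    unit = 0
    # Divide by 1024 until the value fits the largest suitable binary unit.
    while size >= 1024 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return f"{size} {UNITS[unit]}"


if __name__ == "__main__":
    for mode in SV_MODES:
        print(mode, virtual_space(mode))
```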
The out-of-order core offers a large menu of optional RISC-V extensions. Most notably, it can be configured with the in-house Vector Unit, which fully supports the latest RISC-V vector specification.
Other important extensions are bit manipulation, crypto, half-, single- and double-precision floating point, and bfloat16. Customers can also choose to protect the data cache with ECC and the instruction cache with parity, if required for their target markets.
Furthermore, the Atrevido core is fully compliant with the latest RVA22 RISC-V profile. The cores are process agnostic with versions already being supplied down to 5nm.
Espasa added, “Semidynamics has the fastest cores on the market for moving large amounts of data with a cache line per clock at high frequencies even when the data does not fit in the cache. And this can be done at frequencies up to 2.4 GHz on the right node. The rest of the market averages about a cache line every many, many cycles, that is nowhere near Semidynamics’ one every cycle.”
The scalar crypto extension follows the latest specification (Zk and Zks) and provides high-performance support for cryptographic algorithms such as SHA2-256, SHA2-512, ShangMi 3, ShangMi 4, AES-128, AES-192, and AES-256. The Atrevido 423's constant-time implementation provides security against side-channel attacks while still delivering a high-performance crypto solution.
“Customers for these kinds of state-of-the-art cores want to have unique solutions with their own special secret sauce built,” explained Espasa. “We are unique in offering Open Core Surgery where we open up the core to insert custom instructions within it. This is unique as other companies’ cores are only configurable from a set of predetermined options. This completely protects the customer’s ASIC from copying and protects its multi-million-dollar investment in the new ASIC. It also means that it is optimised for Power, Performance and Area with no unnecessary overheads or compromises.”
Semidynamics can implement a customer’s ‘secret sauce’ features into the RTL in a matter of weeks, which is something that no-one else currently offers. Semidynamics also helps customers achieve a fast time to market for their customised core: a first drop can be delivered that runs on an FPGA, so the customer can check functionality and run software on it while Semidynamics verifies the core. Doing these in parallel brings the product to market faster and with reduced risk.
Key to this is Semidynamics’ Vector Unit, the largest fully customisable Vector Unit in the RISC-V market, which delivers up to 2048b of computation per cycle for unprecedented data handling. The Vector Unit is composed of several 'vector cores', each roughly equivalent to a GPU core, that perform multiple calculations in parallel. Each vector core has arithmetic units capable of performing addition, subtraction, fused multiply-add, division, square root, and logic operations.
Semidynamics' vector core can be tailored to support different data types: FP64, FP32, FP16, BF16, INT64, INT32, INT16 or INT8, depending on the customer’s target application domain. The largest data type size in bits defines the vector core width or ELEN. Customers then select the number of vector cores to be implemented within the Vector Unit, either 4, 8, 16 or 32 cores, catering for a very wide range of power-performance-area trade-off options. Once these choices are made, the total Vector Unit data path width or DLEN is ELEN x number of vector cores. Semidynamics supports DLEN configurations from 128b to 2048b.
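The sizing rule above can be written out as a short calculation. This is a minimal sketch of the arithmetic only, not a Semidynamics configuration tool: the function names and the validation behaviour are assumptions, while the data types, core counts and the 128b to 2048b range come from the paragraph above.

```python
# Bit width of each data type a vector core can be tailored to support.
TYPE_BITS = {
    "FP64": 64, "FP32": 32, "FP16": 16, "BF16": 16,
    "INT64": 64, "INT32": 32, "INT16": 16, "INT8": 8,
}
CORE_COUNTS = (4, 8, 16, 32)       # selectable number of vector cores
DLEN_MIN, DLEN_MAX = 128, 2048     # supported data path widths in bits


def elen(types: list[str]) -> int:
    """ELEN is the width of the largest supported data type."""
    return max(TYPE_BITS[t] for t in types)


def dlen(types: list[str], cores: int) -> int:
    """DLEN = ELEN x number of vector cores, checked against the stated range."""
    if cores not in CORE_COUNTS:
        raise ValueError(f"core count must be one of {CORE_COUNTS}")
    width = elen(types) * cores
    if not DLEN_MIN <= width <= DLEN_MAX:
        raise ValueError(f"DLEN of {width}b is outside {DLEN_MIN}b to {DLEN_MAX}b")
    return width


if __name__ == "__main__":
    print(dlen(["FP64", "FP32", "INT8"], 32))  # widest configuration
    print(dlen(["FP32", "BF16"], 4))           # narrowest configuration
```

For instance, a design that needs FP64 has ELEN = 64, so 32 vector cores give the maximum DLEN of 2048b, while an FP32-based design with 4 cores gives the minimum of 128b.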
Semidynamics also offers a second key choice in the Vector Unit: the number of bits in each vector register (known as VLEN) can also be tailored to the customer’s needs. While most other vendors assume that VLEN is equal to DLEN (i.e., a 1X ratio), Semidynamics offers 2X, 4X and 8X ratios. When VLEN is larger than DLEN, a vector operation takes multiple cycles to execute. For example, when VLEN=2048 and DLEN=512, each vector arithmetic operation takes 4 clock cycles to execute.
Having VLEN larger than DLEN is an important feature for tolerating large memory latencies and for reducing power. It lets the Vector Unit process unprecedented amounts of data, which Gazzillion feeds to it continuously.
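The VLEN-to-DLEN relationship described above reduces to a simple division. The sketch below reproduces the worked example from the text; the function name is hypothetical, and treating 1X as a valid ratio alongside 2X, 4X and 8X is an assumption.

```python
# Ratios mentioned in the text; including 1X here is an assumption.
RATIOS = (1, 2, 4, 8)


def cycles_per_vector_op(vlen: int, dlen: int) -> int:
    """Cycles needed to push one VLEN-bit register through a DLEN-bit data path."""
    ratio, remainder = divmod(vlen, dlen)
    if remainder or ratio not in RATIOS:
        raise ValueError(f"VLEN/DLEN must be one of {RATIOS}")
    return ratio


if __name__ == "__main__":
    # The example from the text: VLEN=2048 and DLEN=512.
    print(cycles_per_vector_op(2048, 512))
```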