The Vector Unit is totally compliant with the RISC-V Vector Specification 1.0 with many, additional, customisable features to provide enhanced data handling capabilities, improving data handling both in terms of speed and volume.
Semidynamics’ CEO and founder, Roger Espasa, explained, “Our recently announced Atrevido core is unique in that we can do ‘Open Core Surgery’ on it. This means that, unlike other vendors’ cores that are just configurable from a set of options, we actually open up the core and change the inner workings to add features or special instructions to create a totally bespoke solution. We have taken the same approach with our new Vector Unit to perfectly complement the ability of our cores to rapidly process massive amounts of data.”
A Vector Unit is composed of several 'vector cores', roughly equivalent to a GPU core, that perform multiple calculations in parallel. Each vector core has arithmetic units capable of performing addition, subtraction, fused multiply-add, division, square root, and logic operations.
According to Semidynamics, its vector core can be tailored to support different data types: FP64, FP32, FP16, BF16, INT64, INT32, INT16 or INT8, depending on the customer’s target application domain.
The largest data type size in bits defines the vector core width or ELEN. Customers then select the number of vector cores to be implemented within the Vector Unit, either 4, 8, 16 or 32 cores, catering for a very wide range of power-performance-area trade-off options.
Once these choices are made, the total Vector Unit data path width or DLEN is ELEN x number of vector cores. Semidynamics supports DLEN configurations from 128b to 2048b.
Semidynamics has equipped its Vector Unit with a high-performance, cross-vector-core network that provides all-to-all connectivity between the vector cores at high bandwidth, even for the very large, 32-vector core option. The cross-vector-core unit is used for specific instructions in the RISC-V standard that shuffle data between the different vector cores, such as vrgather, vslide, etc.
Uniquely, Semidynamics also offers a second key choice in the Vector Unit: the number of bits of each vector register (known as VLEN) can also be tailored to customer’s needs. While most other vendors assume that VLEN is equal to DLEN (i.e., 1X ratio), Semidynamics offers 2X, 4X and 8X ratios. When the VLEN is larger than the DLEN, a vector operation uses multiple cycles to execute. For example, when VLEN=2048 and DLEN=512, each vector arithmetic operation will take 4 clocks to execute. This is a great feature for tolerating large memory latencies and for reducing power.
“This unleashes the ability for the Vector Unit to process unprecedented amounts of data bits,” added Espasa. “And to fetch all this data from memory, we have our Gazzillion technology that can handle up to 128 simultaneous requests for data and track them back to the correct place in whatever order they are returned. Together our technologies take RISC-V to a whole new level with the fastest handling of big data currently available that will open up opportunities in many application areas of High-Performance Computing such as video processing, AI and ML.”
The new Vector Unit is Out-Of-Order and pairs with Semidynamics’ Out-Of-Order Atrevido core and upcoming In-Order cores. If required, Semidynamics can do Open Core Surgery on cores and Vector Units to provide special interfaces and protocols to a customer’s proprietary IP block.