( ESNUG 511 Item 4 ) -------------------------------------------- [10/09/12]
Subject: Metrics checklist for selecting commercial Network-on-Chip (NoC)
From: [ Jim Hogan of Vista Ventures LLC ]
Hi, John,
A Network-on-Chip, or NoC (pronounced "KNOCK"), is a subset of a complete
On-Chip Communications Network (OCCN) that's typically characterized as the
packetized/serialized configurable network used to connect all of the many
IP cores within a System-on-Chip (SoC).
Here's a list of factors to consider when selecting a commercial NoC.
---- ---- ---- ---- ---- ---- ----
NOC PRIMARY CHARACTERISTICS - PERFORMANCE, POWER AND AREA
The first priority for most chip designers is to assess whether the NoC's
specs meet their objectives for performance, power, and area.
Performance is a mix of three aspects: frequency, throughput, and latency.
Capabilities like QoS and Virtual Channels, in combination with your chip's
frequency, also impact performance and/or throughput.
Power is really leakage power plus frequency-related dynamic power, and it
dictates what process node you'll design your chip in (e.g. 90 nm, 65 nm,
28 nm).
Power can be reduced by your apps turning off all the currently unused parts
of your chip -- which is what Apple does with the iPhone. This requires
extremely careful planning of all scenarios. Not easy to do!
Area is mostly a marketing spec because it directly relates to chip cost.
Smaller is always better. But most designs will trade area for power and
performance.
---- ---- ---- ---- ---- ---- ----
NETWORK-ON-CHIP CAPABILITIES
1. Scalable/Adaptable for Changing Market Requirements
Most consumer digital camera buyers don't know that, whether they buy a $600
camera or a $50 camera, many times both cameras have the exact same core
chipsets.
[ Figure: Milbeaut MB91696AM (courtesy Fujitsu) ]
To save on development costs, digital camera chipset designers like Sony,
Qualcomm, Samsung, and ST all make one "super chip" for their cameras and
"customize" it by adding (more or less) external memory, a lens, a sensor,
and by tweaking firmware at the last moment in final camera design to meet
a variety of local market demands. One chipset; 90 "different" cameras.
A NoC, by design, is a modular, flexible structure that allows the on-chip
network to grow -- and shrink -- in a scalable manner. When new IP cores
are added to the system, the NoC should grow linearly, without changing
fundamental characteristics. This notion is the heart of platform design.
An ideal architecture supports socket-based (AMBA or custom or a mix of
both) interfaces; and separates cores from fabric, for the flexibility to
extend the platform for derivative designs. Another key feature for the
overall scalability of the NoC is a seamless low-latency interface to other
network components and peripheral networks. This ensures the best system
performance with the lowest gate count.
2. Quality-of-Service (QoS)
This is the no-18-wheeler-trucks-allowed-in-the-commuter-express-lane rule.
When the total communication use (traffic) desired on the on-chip network
exceeds what can be accommodated simultaneously, the NoC must incorporate
intelligent algorithms to determine sensible priority rankings of the
various transmissions. Since the NoC is the only part of the chip that's
fully aware of all traffic to-and-from all IP blocks, it must be the one
that does this system tuning.
The Network-on-Chip needs a mechanism for the SoC architects to define
deterministic data flows to guarantee critical/minimal bandwidth for each
CPU/GPU/DSP in the system. Each NoC vendor has their own QoS scheme.
In one scheme, each CPU acts as a master (also called an "initiator
core"). Each master in the system assigns a QoS priority field to its
data. Each downstream arbiter in the network uses this information to
give the data priority. This is referred to as initiator-based QoS.
In another scheme, QoS is determined at the DRAM or target core
interface. This is referred to as target-based QoS.
There are trade-offs for each approach. Initiator-based QoS is easy and
intuitive to implement, but is often subject to network congestion failure.
Target-based QoS is more robust, but is also more complicated to implement.
With SoC complexity increasing, it is important to assess the NoC's QoS
scheme for robustness and flexibility.
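To make the initiator-based scheme concrete, here is a minimal sketch (my own illustration, not any vendor's actual implementation) of a downstream arbiter that simply honors the priority field each master attached to its request. The class and field names, and the convention that a lower number means more urgent, are assumptions for this sketch.

```python
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class Request:
    priority: int                          # lower number = more urgent (sketch convention)
    seq: int                               # tie-breaker preserves arrival order
    initiator: str = field(compare=False)  # which master issued the request
    data: bytes = field(compare=False)

class Arbiter:
    """Downstream arbiter for initiator-based QoS: it trusts the
    priority field that each master (initiator core) assigned."""
    def __init__(self):
        self._queue = []
        self._seq = 0

    def submit(self, initiator, priority, data):
        heapq.heappush(self._queue, Request(priority, self._seq, initiator, data))
        self._seq += 1

    def grant(self):
        # Pop the most urgent pending request (FIFO among equal priorities).
        return heapq.heappop(self._queue) if self._queue else None
```

The congestion failure mentioned above shows up here: the arbiter has no global view, so if every master tags itself urgent, the priority field stops discriminating -- which is what target-based QoS tries to fix.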
3. Virtual Channels
Virtual Channels (or Multi-Threading), by way of time-division MUXing of the
bus, provide up to 16 virtual network connections over one single physical
bus connection. Virtual Channels improve system concurrency and efficient
resource usage -- by saving gates and wires, plus reducing system latency
-- all in exchange for bus bandwidth. They create "non-blocking" data flows
by forming "passing lanes" over the same physical wires. (A detailed
discussion of Virtual Channels is listed in the Related Articles below.)
Because of the reduced latency, a NoC that has Virtual Channels provides a
much better user QoR than NoCs that lack them.
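The "passing lane" effect can be sketched in a few lines: one physical link, several per-VC queues, and a round-robin pick that skips any VC whose receiver has no credits. This is an illustrative model only (the class name, 4-VC default, and credit callback are my assumptions, not a vendor API).

```python
from collections import deque

class VirtualChannelLink:
    """One physical link time-division-MUXed into several virtual channels.
    A flit stuck at the head of one VC does not block the others --
    the 'passing lane' effect. Illustrative sketch, not real RTL."""
    def __init__(self, num_vcs=4):
        self.vcs = [deque() for _ in range(num_vcs)]
        self._next = 0          # round-robin pointer

    def send(self, vc, flit):
        self.vcs[vc].append(flit)

    def tick(self, credit_ok):
        """Transmit at most one flit per cycle. `credit_ok(vc)` says
        whether the receiver can accept from that VC; blocked VCs
        are skipped instead of stalling the whole link."""
        for i in range(len(self.vcs)):
            vc = (self._next + i) % len(self.vcs)
            if self.vcs[vc] and credit_ok(vc):
                self._next = (vc + 1) % len(self.vcs)
                return vc, self.vcs[vc].popleft()
        return None
```

Note the trade the text describes: all VCs still share one link's bandwidth, so the win is concurrency and non-blocking behavior, not extra throughput.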
4. Layout Friendly
Commercial NoCs are soft configurable IP, which must be sufficiently layout-
aware such that the final physical implementation still meets the original
intent of the system architect. The network-on-chip must ensure that as the
physical floorplan topology changes, system performance is unchanged.
[ Figure: Logical and Physical Topologies ]
For example, due to physical layout constraints, the NoC will be distributed
out over the die -- while pipeline stages must be added to maintain timing
and performance. It's critical when selecting a NoC to be sure that these
pipeline stages do NOT damage system performance (changing latency and QoS
performance). In other words, *logical* topology and *physical* topology
must be independent of each other. Due to less wiring, a Virtual Channel
architecture helps in managing this.
5. Power Domain Partitioning
In order to keep most of their chip as non-power-using "dark silicon", most
designers partition their SoCs into numerous independent power and voltage
domains. Therefore your NoC IP must support:
1. power and voltage domain partitioning,
2. switched power domains,
3. multiple clock frequency domains, and
4. skew management domains.
Within each voltage domain, the NoC needs to support multiple power domains.
Within a particular power domain, various clock crossings (asynchronous,
synchronous and mesochronous) must be supported, too.
[ Figure: On-Chip Communications Network, with 4 power domains ]
Above is an example of an SoC partitioned into 4 different power domains.
For example, while Power Domain 1 ALWON (shown in the top left corner above)
is always "on", Power Domain 2 CPU, Power Domain 3 VIDEO, and Power Domain 4
PER may go into "sleep" mode. Notice that this NoC spans only 3 power
domains, while the On-Chip Communication Network (the NoC plus 2 Peripheral
Networks) must span all 4 of the power domains.
6. Memory optimization
On most big chips, memory access by the many CPUs/GPUs/DSPs is usually the
most contested resource.
Your on-chip network has to be extremely careful how it schedules memory
accesses. It's critical that bus memory accesses have fewer "dead" (unused)
cycles during memory reads and writes. The goal is to maximize memory access
utilization -- ideally 100%, but even 85% is challenging -- while minimizing
costs and avoiding routing congestion.
Cores need shared access to external DRAM; often greater than 50% of system
traffic is to-and-from external DRAM, with DRAMs organized as banks for
internal parallelism. Companies often move to smaller, faster DRAMs to get a
higher bandwidth for their SoC.
Conversely, companies often move to bigger, slower DRAMs to save on design
costs -- leading to inefficiencies caused by increased DDR burst lengths.
Therefore the NoC you use needs analysis tools to explore and optimize
different multi-processor architectures and to validate that your final
architecture meets your system specs.
The NoC (with a memory scheduler) can support memory optimizations such as:
- Maximizes page hits to minimize bank cycling costs.
- Pipelines a large number of accesses to minimize the latency
between the cores and the external memory.
- Exploits bank-level parallelism to hide page misses.
- Clusters "reads" and "writes" to minimize the changes from
read-to-write, given the many-cycle penalty from those changes.
- Interleaves memory channels by indexing off of intermediate
memory address bits. (e.g. when the memory footprint grows,
it's easier to scale bandwidth sharing with two independent
16 bit channels rather than one 32-bit channel.)
- Suddenly intervenes, when needed, to ensure a good user QoS,
minimizing wasted cycles during critical moments. For example,
it keeps the latency to main memory low, because that latency
can impact the speed of Internet browsing.
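Two of the optimizations above -- maximizing page hits and clustering reads with reads (and writes with writes) -- can be shown as a tiny greedy scheduler. This is a toy illustration under my own assumptions (request format, tuple-based scoring), not how any commercial memory scheduler actually works.

```python
def pick_next(pending, last_dir, open_rows):
    """Greedy DDR scheduler sketch. Preference order:
       1. requests that hit an already-open row (no precharge/activate cost),
       2. requests that keep the current read/write direction
          (avoiding the many-cycle bus-turnaround penalty).
    `pending` is a list of dicts with 'bank', 'row', 'dir' keys;
    `open_rows` maps bank -> currently open row. Illustrative only."""
    def score(req):
        page_hit = open_rows.get(req["bank"]) == req["row"]
        same_dir = req["dir"] == last_dir
        return (page_hit, same_dir)      # True sorts above False
    best = max(pending, key=score)
    pending.remove(best)
    open_rows[best["bank"]] = best["row"]  # this access (re)opens the row
    return best
```

A real scheduler also bounds how long a low-scoring request can be starved -- which is exactly where the "suddenly intervenes" QoS hook above comes in.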
7. Cache Coherency
To scale adequately, a Network-on-Chip must support cache coherency that
extends beyond host processor clusters. (Core vendors deploy various
levels of cache memory for faster recall of important data.) Multiple IP
cores make it harder to use caches effectively, especially when the cores
are different sizes and run at different frequencies. CPU cache coherence
within a local processor cluster used to be sufficient, but with more
complex SoCs you will often have other cores like GPUs/DSPs that use their
own local memories. In this case either I/O or full cache coherency may
be required.
Cache coherency is a headache for a NoC since it must determine if the
bus traffic is coherent or non-coherent and then route it to the proper
destination.
At the least, a commercial NoC should have its own I/O cache coherency
scheme, or support a widely used 3rd party scheme like ARM's ACE-Lite.
The upside to using ACE-Lite is the NoC now works with ARM cores in an
I/O coherent environment.
8. System Verification
Once the architect customizes the complete Network-on-Chip for their SoC,
the next step will be to verify the resulting architecture. Typically, if
an individual IP core in the SoC fails, that corner of the design doesn't
work. In contrast, everything in the SoC is dependent on the on-chip
communication network working.
Therefore a robust verification methodology must be included with the NoC
since it impacts every aspect of the chip. Verification works in layers,
such as functional verification and performance verification. Functional
errors include protocol mismatches -- for example, various IP vendors
implementing AMBA differently -- or connectivity errors such as bit-width
conversion errors. Performance verification checks that the SoC is running
up-to-speed for clocks, data, etc.
Commercial NoCs ideally ship with protocol checkers, as well as testbenches
for standard protocols (e.g. AMBA AXI3/4, OCP). The test environments must
also be open and flexible enough to integrate a company's custom protocol
checkers. For system level verification it is important to support UVM to
enable the best reuse of verification work that has already been developed.
9. Security
NoCs have a range of available security features. Heterogeneous multi-
processing complicates the security architecture. Multiple processors
on the SoC mean more than just the host processor is vulnerable to attack.
This becomes even more important as you build more SW for each processor;
you can have multiple activities occurring simultaneously, with varying
degrees of security.
Highly secure NoCs allow firewalls in the network fabric so that certain
masters can neither "read" nor "write" to certain slaves. They allow you
to set up private secure regions that applications can or cannot write to.
Security can be put in place to ensure that specific regions of memory are
only accessed by authorized processes. Signaling must be reconciled across
differing interfaces, and corrective actions coordinated across multiple
cores.
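The master/slave firewall described above amounts to a permission table with a default-deny rule. Here is a minimal sketch (class and method names are mine, not any NoC vendor's API):

```python
class Firewall:
    """NoC fabric firewall sketch: a table of (master, slave) -> allowed
    operations. Any request not explicitly permitted is rejected
    (default-deny), so a rogue master cannot reach a secure slave."""
    def __init__(self):
        self._rules = {}   # (master, slave) -> set of ops, e.g. {"read", "write"}

    def allow(self, master, slave, *ops):
        self._rules.setdefault((master, slave), set()).update(ops)

    def check(self, master, slave, op):
        return op in self._rules.get((master, slave), set())
```

In hardware this check happens at wire speed in the fabric, and a rejected request must also trigger the coordinated corrective action the text mentions (e.g. an error response plus an interrupt to a security monitor), rather than silently dropping.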
One can have a NoC with a proprietary security scheme, or use the widely
accepted ARM 'TrustZone', where you can define a secure address region.
By supporting ARM's TrustZone, a commercial NoC can provide secure zones
for applications such as: secured PIN entry, anti-malware, digital rights
management, SW license management, access control of cloud-based documents.
10. Chip-Package-Board/Interposer support
To increase memory bandwidth with embedded SoCs, several die stacking
techniques are being developed. Wide I/O is one method that stacks DRAM
directly on logic using TSV's. This technique separates the DRAM into
4 individual channels of memory on the die. In this configuration, it is
important for the NoC to support Wide I/O by balancing DRAM traffic among
each of the 4 channels of memory. This simplifies SW design since memory
will look like one large bank.
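Balancing traffic across the 4 Wide I/O channels is typically done by indexing off intermediate address bits, as in the memory-channel interleaving described in item 6. A one-function sketch (the 256-byte stripe granularity is a made-up value for illustration):

```python
CHANNELS = 4
INTERLEAVE = 256   # bytes per stripe; an assumed granularity, not a Wide I/O spec value

def channel_for(addr):
    """Pick one of the 4 stacked-DRAM channels from intermediate address
    bits, so sequential traffic is striped evenly across all channels
    while software still sees one large, flat memory."""
    return (addr // INTERLEAVE) % CHANNELS
```

Because consecutive stripes land on consecutive channels, a sequential burst naturally spreads across all four channels, which is what lets the NoC present the stacked DRAM as one large bank to software.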
- Jim Hogan
Vista Ventures, LLC Los Gatos, CA
Editor's Note: As mentioned earlier, Jim's on the Sonics board. - John
---- ---- ---- ---- ---- ---- ----
Related Articles
Hogan outlines key market drivers for Network-on-Chip (NoC) IP
Common definitions of On-Chip Communication Network (OCCN) terms
Exploring a designer's Make-or-Buy decision for On-Chip Networks
A detailed discussion of On-Chip Networks with Virtual Channels
Hogan compares Sonics SGN vs. Arteris Flex NOC vs. ARM NIC 400