( ESNUG 393 Item 11 ) -------------------------------------------- [04/25/02]

Subject: ( ESNUG 388 #19 ) My 31% Speed-up By Hand-Tweaking DW Arithmetic

> Here's a simple datapath example and different timing/area results with
> different flows.
>
>    module mux4 (m0,m1,x,b0,b1,z);
> 
>    parameter n=32;
>
>    input  [n-1:0]   m0,m1,x;
>    input  [2*n-1:0] b0,b1;
>    output [n:0]   z ;
>
>    wire   [2*n:0] y1; 
>    wire   [2*n:0] y0; 
>    wire   [2*n:0] y2; 
>
>    assign y0 = m0*x + b0;
>    assign y1 = m1*x + b1;
>
>    assign y2 = (y1>y0) ? y1 : y0;
>    assign z = y2[2*n:n] + y1[(n-1):0];
>
>    endmodule
>
>
> All of these have been achieved using TSMC's 0.13 um technology and the
> 2001.08-SP2 release of DC. 
>
>  Flow                  Path Length  Path Slack   Design Area  Compile Time
>  --------------------  -----------  ----------   -----------  ------------
>  DC-Expert + DW_Standard     13.74    -7.24        240760.28     2745.29
>  DC-Expert + DW               7.33    -0.83        213677.06     3161.75
>  DC-Ultra + DW + MCI + TCSA   6.63    -0.13        210615.31     2754.35
>  DC-Ultra + DW + MCI + PD     6.50     0.00        174409.92      952.84
>
> DW_Standard is the standard library shipped with DC.  DW is the full
> DesignWare library.  "DW + MCI + TCSA" means DesignWare, with
> dw_prefer_mc_inside set to true and the transform_csa command.  "DW + MCI
> + PD" means DesignWare, with dw_prefer_mc_inside set to true and the
> partition_dp command.
>
>     - Oliver Meisel
>       Synopsys, Inc.                             Mountain View, CA


From: [ Papa Smurf ]

John, anon pls.

I have quite a bit of experience building arithmetic units for graphics
chips, mainframe processors and DSPs.  I ran Oliver Meisel's example using
a "DC-Ultra + DW + MCI + TCSA" flow and found that it could meet 6.0 nsec
using a similar library which also targets TSMC's 0.13 micron process.
I set max_fanout to 20, ungrouped all and set the operating conditions to
worst case military.  I used -map_effort high and max_area 0.  I'm using
2001.08-SP1.  (Be sure to use SP1 or later if you're using the
transform_csa command, as *bad* logic can result otherwise.)

I then synthesized a version which instantiated hand-optimized multipliers
and adders.  These results are listed as "RTL-1" in the chart below.  This
design met 5.5 nsec and was smaller than the DW implementations.

To further improve performance I reorganized/re-architected the code,
duplicating an adder so that the magnitude compare and last addition were
performed in parallel.  These results are listed as "RTL-2" in my chart.
I went from:

   assign y2 = (y1>y0) ? y1 : y0;
   assign z  = y2[2*n:n] + y1[(n-1):0];

to:

   assign y3 = y0[2*n:n] + y1[(n-1):0];
   assign y4 = y1[2*n:n] + y1[(n-1):0];
   assign z  = (y1>y0) ? y4 : y3;

This design (also using my hand-optimized multiply-accumulators and
adders) achieved a 4.5 nsec timing.
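
For reference, here is a sketch of how the "RTL-2" fragment might sit inside
the complete module, with a small self-checking testbench that compares it
against the original expression.  The module name mux4_rtl2, the widths chosen
for y3 and y4 (n+1 bits, matching z), and the testbench itself are assumptions
for illustration and are not from the original post.  It uses plain "*" and
"+" operators, not the hand-optimized arithmetic blocks.

```verilog
// Re-architected datapath: both candidate sums are formed in parallel
// with the magnitude compare, which then only drives the final mux.
module mux4_rtl2 (m0, m1, x, b0, b1, z);
   parameter n = 32;

   input  [n-1:0]   m0, m1, x;
   input  [2*n-1:0] b0, b1;
   output [n:0]     z;

   wire   [2*n:0]   y0, y1;
   wire   [n:0]     y3, y4;   // assumed widths, same as z

   assign y0 = m0*x + b0;
   assign y1 = m1*x + b1;

   assign y3 = y0[2*n:n] + y1[(n-1):0];   // result if y0 is selected
   assign y4 = y1[2*n:n] + y1[(n-1):0];   // result if y1 is selected
   assign z  = (y1 > y0) ? y4 : y3;
endmodule

// Hypothetical testbench: checks RTL-2 against the original formulation
// on random stimulus.
module tb_mux4_rtl2;
   parameter n = 32;

   reg  [n-1:0]   m0, m1, x;
   reg  [2*n-1:0] b0, b1;
   wire [n:0]     z;

   reg  [2*n:0]   y0, y1, y2;
   reg  [n:0]     z_ref;
   integer        i, errors;

   mux4_rtl2 #(n) dut (m0, m1, x, b0, b1, z);

   initial begin
      errors = 0;
      for (i = 0; i < 1000; i = i + 1) begin
         m0 = $random;  m1 = $random;  x = $random;
         b0 = {$random, $random};
         b1 = {$random, $random};
         #1;
         // Original formulation: compare first, then add.
         y0    = m0*x + b0;
         y1    = m1*x + b1;
         y2    = (y1 > y0) ? y1 : y0;
         z_ref = y2[2*n:n] + y1[(n-1):0];
         if (z !== z_ref) errors = errors + 1;
      end
      if (errors == 0) $display("PASS");
      else             $display("FAIL: %0d mismatches", errors);
      $finish;
   end
endmodule
```

The two forms are functionally the same because the low-order addend
y1[(n-1):0] does not depend on the compare; only the high-order addend does,
so the select can be moved past the adder at the cost of one extra adder.
This should simulate under any Verilog-2001 simulator.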

  Flow                                         Path Length    Area
  --------------------------                   -----------   -------
  DC-Ultra + DW + MCI + TCSA (original RTL)       6.0 ns     150,805
  DC-Expert "RTL-1" (hand optimized arith)        5.5 ns     118,097
  DC-Ultra "RTL-2" DW + MCI + TCSA (re-arch)      5.0 ns     157,753
  DC-Expert "RTL-2" (re-arch) (hand op arith)     4.5 ns     129,329


Note that the big gains are from architectural changes.  The transform_csa
command is a powerful architectural tool which saves significant area and
delay.  My hand-optimized multipliers and adders also used a carry-save
architecture and saved about 0.5 ns and significant area over the DW
implementation, but reorganizing the code had an even larger impact.  I went
from 6.0 nsec on Oliver's original RTL down to 4.5 nsec overall.  That's a
25% reduction in path delay.
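
For readers unfamiliar with the term, a minimal sketch of the carry-save idea
follows.  It is a generic textbook 3:2 compressor, not the hand-optimized
blocks described above and not what transform_csa generates; the module names
csa32, add3 and tb_add3 and the parameter w are hypothetical.

```verilog
// 3:2 carry-save adder: reduces three operands to a sum vector and a
// carry vector with no carry propagation between bit positions.
module csa32 (a, b, c, sum, carry);
   parameter w = 8;

   input  [w-1:0] a, b, c;
   output [w-1:0] sum;
   output [w:0]   carry;

   assign sum   = a ^ b ^ c;                             // per-bit sum
   assign carry = {(a & b) | (a & c) | (b & c), 1'b0};   // per-bit carry, weight x2
endmodule

// Three-operand add: one carry-save stage, then a single
// carry-propagate adder instead of two in series.
module add3 (a, b, c, total);
   parameter w = 8;

   input  [w-1:0] a, b, c;
   output [w+1:0] total;

   wire   [w-1:0] s;
   wire   [w:0]   cy;

   csa32 #(w) u_csa (a, b, c, s, cy);
   assign total = s + cy;
endmodule

// Hypothetical exhaustive check for w = 4.
module tb_add3;
   reg  [3:0] a, b, c;
   wire [5:0] total;
   integer    i, errors;

   add3 #(4) dut (a, b, c, total);

   initial begin
      errors = 0;
      for (i = 0; i < 4096; i = i + 1) begin
         {a, b, c} = i;
         #1;
         if (total !== a + b + c) errors = errors + 1;
      end
      if (errors == 0) $display("PASS");
      else             $display("FAIL: %0d mismatches", errors);
      $finish;
   end
endmodule
```

The point of the structure is that several operands (for example, multiplier
partial products plus an accumulate term) can be compressed through stages
like csa32, leaving only one slow carry-propagate addition at the end.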

All of these runs met the path length timing constraint listed.

    - [ Papa Smurf ]


