( ESNUG 317 Item 5 ) --------------------------------------------- [5/13/99]

From: Robert Wiegand <rwiegand@ensoniq.com>
Subject: Seven Cool Tricks From My Adventures Using Test Compiler

Hi John,

I just completed a design using Test Compiler.  (I did most of the work in
the 1998.08 release.  When I ran into the last bug described below, I
switched to the 1998.08-1 CD release, but it behaved the same.  I have yet
to try 1999.05; I have several days of script modifications ahead of me for
the non-backward-compatible command changes in the new version.)  This particular design, 
although relatively small (~35K + RAMs), packs a lot of nasty for Test 
Compiler to deal with.  There are 4 external clocks which become 8 internal 
clock domains at the core level after various gating (for power control), 
inverting and dividing logic in the clocks block (the top level instantiates 
a pads block, a clocks block, and a hierarchical core).  Our ASIC vendor 
allows for 4 scan chains, and prefers us to manage our own test_se (scan 
enable) buffering.  This led to some interesting problems and some cool 
tricks to solve them.

COOL TRICK #1:  The -scan and -incremental_mapping compile switches can 
be used together.

This trick doesn't fix a particular problem, but I thought I'd mention it 
here. My intention was to economize on Test Compiler/DCXP license usage. 
Design exploration compiles can proceed without scan until design issues 
are worked out, then scan is added incrementally with all the benefits of 
compiling with scan.  It did help to fit scan into my methodology as 
described in the SNUG99 paper about MIN/MAX compile (shameless plug, 
sorry!).

I chose not to route the scan chains at each module, but to wait for full 
core level visibility before routing the chains with insert_scan.  This led 
to an interesting problem:  compile -scan does not generate a test_se input 
to a module.  Insert_scan does, and at the core level the tool now sees all 
loads on test_se at once.  The compile log started with a timing violation 
of about 300ns, and I killed the run after 6 hours.

COOL TRICK #2:  To manage test_se fanout when running insert_scan at the 
core level, run insert_scan with no optimization and use incremental 
compile block by block to clean up the mess.  Here's the script:

   /* route scan chains without optimization */
   set_drive 0 find(port,scan_enable_port) /* scan_enable_port = "test_se" */
   set_resistance 0 find(net,scan_enable_port) /* net & port are same name */
   set_dont_touch find(net,scan_enable_port)
   insert_scan -ignore_compile_design_rules -map_effort low
   remove_attribute find(port,scan_enable_port) rise_drive
   remove_attribute find(port,scan_enable_port) fall_drive
   remove_attribute find(net,scan_enable_port) ba_net_resistance
   remove_attribute find(net,scan_enable_port) dont_touch

   /* fix design rules from scan chain routing - buffer test_se */
   suppress_errors = suppress_errors + {UID-95}
   foreach (design_name, (find(design) - core_block)) {
     /* core_block = name of core design */
     current_design design_name
     if (find(port,test_se)) {
       set_max_fanout 1 test_se
       compile -incremental_mapping -only_design_rule
     }
   }
   current_design core_block
   suppress_errors = suppress_errors - {UID-95}
   check_test

With the above procedure, the scan chains were routed and the test_se tree 
was buffered in ~15 minutes.  Be sure compile_no_new_cells_at_top_level
is set to false, or non-hierarchical blocks will not be fixed as I found
out the hard way.

Since I had 8 clock domains at the core, I now had 8 unbalanced scan 
chains.  It turns out that clocks 1-3 were fairly balanced, but I needed to 
mix clocks 4-8 on the fourth chain to balance the rest.  DCXP allows you to 
mix edges or mix clocks in scan chains, but for the whole design, not 
individual scan chains.  Also, DCXP doesn't pay attention to insertion 
delays on clocks.  This is ok if you have one clock per chain, or mix edges 
of one clock on a chain (DCXP will put negedge flops ahead of posedge 
flops), but bad news if you want to mix multiple edges of multiple clocks 
in a single chain.  The desired order is from largest insertion delay 
negedge to smallest insertion delay posedge.  It can be done, as long as 
you split your dual edge clocks into separate posedge and negedge domains.


COOL TRICK #3:  To specify the clock domain order when mixing clocks and 
edges on the same chain, split posedge and negedge into separate domains, 
then use the all_registers command to specify the order:

   /* name scan chains */
   set_scan_path chain_1
   set_scan_path chain_2
   set_scan_path chain_3
   set_scan_path chain_4
   /* specify clock ordering in chains */
   set_scan_path chain_1 all_registers(-clock clock1) -complete true
   set_scan_path chain_2 all_registers(-clock clock2) -complete true
   set_scan_path chain_3 all_registers(-clock clock3) -complete true
   set_scan_path chain_4 all_registers(-clock clock5) \
     + all_registers(-clock clock4) + all_registers(-clock clock8) \
     + all_registers(-clock clock6) + all_registers(-clock clock7) \
     -complete true

The mapping of core level clocks 1-8 to external clocks A-D is as follows:

   Clock1 = posedge clockA
   Clock2 = gated posedge clockA
   Clock3 = posedge clockB
   Clock4 = negedge clockB
   Clock5 = gated negedge clockB
   Clock6 = divided posedge clockC
   Clock7 = posedge clockD
   Clock8 = negedge clockD
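
The ordering rule described earlier (largest insertion delay negedge first, 
smallest insertion delay posedge last) can be sketched outside dc_shell.  
This is only an illustration in Python, not part of the original flow.  The 
edges come from the mapping above; the insertion delay numbers are made up 
for the example and were not measured on this design.  With these assumed 
delays the sort happens to give the same chain_4 order used in the script.

```python
# Illustrative sketch: order clock domains on one mixed-clock scan chain.
# Rule from the post: negedge domains first, then posedge domains, and
# within each group from largest to smallest clock insertion delay.
from typing import List, NamedTuple


class Domain(NamedTuple):
    name: str
    edge: str             # "neg" or "pos", taken from the clock mapping
    insertion_ns: float   # ASSUMED value, hypothetical, for illustration


def chain_order(domains: List[Domain]) -> List[str]:
    """Return domain names in the desired scan chain order."""
    ordered = sorted(
        domains,
        # False sorts before True, so negedge domains come first;
        # negating the delay puts the largest delay first in each group.
        key=lambda d: (d.edge != "neg", -d.insertion_ns),
    )
    return [d.name for d in ordered]


# Domains mixed on chain_4 (delays are hypothetical placeholders).
chain_4 = [
    Domain("clock4", "neg", 1.5),
    Domain("clock5", "neg", 2.0),
    Domain("clock6", "pos", 1.8),
    Domain("clock7", "pos", 0.9),
    Domain("clock8", "neg", 1.0),
]

order = chain_order(chain_4)
print(order)
```

The resulting list is what would be handed, domain by domain, to the 
all_registers terms of the set_scan_path command.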

Now for the top level.  Here there were two new complications.  First, the 
clocks block had dividing logic in it and needed to be added to chain_4 (a 
test_mode signal was used to bypass the dividing logic to make the divided 
internal clock controllable).  Second, the clocks were now defined from 
external pins, and now paths crossing between edges of the same clock were 
generating capture violations.  At the core level, DCXP assumed all 8 
clocks were posedge which hid this problem.  The second problem, by itself, 
can be solved by multi_pass ATPG.  More on that in a bit.  These two 
problems interacted in some interesting ways.  Based on the insertion delay 
of the clock performing the divide in the clocks block, I wanted these 
registers to be inserted in chain_4 between clock8 and clock6 with lockup 
latches.  Even though I had the core declared as existing scan, everything 
was getting jumbled around.  Once again, I needed to specify the scan 
chain order, leading me to...

COOL TRICK #4:  To specify the scan chain order from the top level when 
mixing clocks, create temporary clock domains, then use the all_registers 
command to specify the order:

   /* create temporary internal clocks for register grouping */
   create_clock core_block/clock1 -period default_period -name clock1
   create_clock core_block/clock2 -period default_period -name clock2
   create_clock core_block/clock3 -period default_period -name clock3
   create_clock core_block/clock4 -period default_period -name clock4
   create_clock core_block/clock5 -period default_period -name clock5
   create_clock core_block/clock6 -period default_period -name clock6
   create_clock core_block/clock7 -period default_period -name clock7
   create_clock core_block/clock8 -period default_period -name clock8
   create_clock clocks_block/clock9 -period default_period -name clock9
   /* name scan chains */
   set_scan_path chain_1
   set_scan_path chain_2
   set_scan_path chain_3
   set_scan_path chain_4
   /* specify clock ordering in chains */
   set_scan_path chain_1 all_registers(-clock clock1) -complete true
   set_scan_path chain_2 all_registers(-clock clock2) -complete true
   set_scan_path chain_3 all_registers(-clock clock3) -complete true
   set_scan_path chain_4 all_registers(-clock clock5) \
     + all_registers(-clock clock4) + all_registers(-clock clock8) \
     + all_registers(-clock clock9) + all_registers(-clock clock6) \
     + all_registers(-clock clock7) -complete true
   /* remove temporary clocks */
   remove_clock clock1
   remove_clock clock2
   remove_clock clock3
   remove_clock clock4
   remove_clock clock5
   remove_clock clock6
   remove_clock clock7
   remove_clock clock8
   remove_clock clock9

This worked great, except DCXP was juggling the order of the clock domains 
in chain_4.  It turned out that DCXP was assuming an inverted waveform on 
one of the dual edge clocks to get a smaller number of cross clock 
violations.

COOL TRICK #5:  Before running the top level insert_scan, check for the 
presence of a .tpf (test protocol) file.  If it exists, load it.  If 
it does not, create one.  If there are ordering problems after insert_scan, 
check and edit the .tpf file so that all clock waveforms are rising edge.

   which scan_directory + top_block + ".tpf"	
   if (dc_shell_status) {
      read_init_protocol scan_directory + top_block + ".tpf"
   } else {
      write_test_protocol -out scan_directory + top_block + ".tpf"
   }

Ok, scan chains are routed in the specified order, life is good!  Not quite 
yet.  I noticed a pile of TEST-294 messages in the log file telling me that 
all scan_enable inputs to the flops get disconnected during the top level 
insert_scan.  Since the insert_scan finished quickly, I figured it must 
have reconnected these inputs to my existing test_se tree.  Examining the 
netlist, I found this to be true, but I also found a duplication of test_so 
muxes at the core and in the pads, i.e., the scan outputs were muxed with 
functional signals twice.  I got around this by forcing the core level 
insert_scan to generate dedicated scan inputs and outputs:

   set_scan_configuration -dedicated_scan_ports true

I have not yet gotten around the other problem.  I took the DC-XP Advanced  
Scan Synthesis course after SNUG, and picked up this bit of info:  DCXP 
will interpret a pre-connected test_se as an unsupported functional 
connection.  The solution presented was to break the connection and remove 
the net before running insert_scan.  I tried various combinations of places 
to break this connection, all of which produced the same TEST-294 messages 
along with different incorrect implementations.  The closest solution was 
to leave the connection intact, producing the TEST-294 message but 
producing a correct implementation.  Has anyone else run into this?  I've 
tried it with 1998.08 and 1998.08-1.

Two more for the road:  Multi-pass ATPG works great for mixed edge designs. 
Take the .tpf file as generated earlier, copy it to same_name.pass2.tpf 
and invert the necessary waveform.
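
The copy step can be scripted outside dc_shell.  The Python sketch below is 
an assumption on my part rather than anything from the original flow: it 
only makes the same_name.pass2.tpf copy, and the helper name 
make_pass2_copy and the placeholder file contents are hypothetical.  
Inverting the waveform is still a manual edit of the copied file.

```python
# Illustrative sketch: create the second-pass protocol file by copying
# <name>.tpf to <name>.pass2.tpf.  The waveform inversion itself is NOT
# automated here; the copy still has to be edited by hand.
import shutil
import tempfile
from pathlib import Path


def make_pass2_copy(tpf_path: Path) -> Path:
    """Copy foo.tpf to foo.pass2.tpf next to it and return the new path."""
    src = Path(tpf_path)
    dst = src.with_name(src.stem + ".pass2.tpf")
    shutil.copyfile(src, dst)
    return dst


# Demonstration in a throwaway directory with placeholder contents.
with tempfile.TemporaryDirectory() as scan_directory:
    first = Path(scan_directory) / "top_block.tpf"
    first.write_text("placeholder protocol text\n")
    second = make_pass2_copy(first)
    second_name = second.name
    copied_ok = second.read_text() == first.read_text()

print(second_name, copied_ok)
```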

COOL TRICK #6:  Use tpf files for multipass ATPG.  A generalized ATPG 
script can be written by checking for the second tpf file:

   read_init_protocol scan_directory + top_block + ".tpf"
   check_test
   create_test_patterns -output scan_directory + top_block + "_atpg.vdb"
   which scan_directory + top_block + ".pass2.tpf"
   if (dc_shell_status) {
     multi_pass_test_generation = true
     read_init_protocol scan_directory + top_block + ".pass2.tpf"
     check_test
     create_test_patterns -input scan_directory + top_block + "_atpg.vdb" \
        -output scan_directory + top_block + "_atpg2.vdb"
   }

COOL TRICK #7:  If there is no preference for bidirectional pins to be 
input or output during scan, try both and pick the one with higher 
coverage.
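
The selection step amounts to comparing the reported fault coverage of the 
two runs.  A minimal Python sketch follows, assuming the coverage figures 
have already been pulled out of the two ATPG logs by hand; the numbers 
below are hypothetical placeholders, not results from this design, and 
pick_bidi_mode is an illustrative name.

```python
# Illustrative sketch: pick the bidirectional pin setting with the higher
# fault coverage.  Coverage values are HYPOTHETICAL placeholders.
from typing import Dict


def pick_bidi_mode(coverage_by_mode: Dict[str, float]) -> str:
    """Return the mode name whose coverage percentage is highest."""
    return max(coverage_by_mode, key=coverage_by_mode.get)


coverage = {
    "input": 98.8,    # bidirectionals held as inputs during scan (assumed)
    "output": 99.1,   # bidirectionals held as outputs during scan (assumed)
}

best = pick_bidi_mode(coverage)
print(best)
```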

The last trick gave me an additional 3 or 4 tenths of a percent coverage. 
With all the above tricks, and most importantly the up-front commitment of 
all the designers to write scan compatible RTL, we were able to get just 
over 99% coverage on this design.

    - Bob Wiegand
      Ensoniq, Corp.                                 Malvern, PA


