( ESNUG 317 Item 5 ) --------------------------------------------- [5/13/99]
From: Robert Wiegand <rwiegand@ensoniq.com>
Subject: Seven Cool Tricks From My Adventures Using Test Compiler
Hi John,
I just completed a design using Test Compiler. I did most of the work in
the 1998.08 release. When I ran into the last bug described below, I
switched to the 1998.08-1 CD release, but it behaved the same. I have yet
to try it with 1999.05, since I face several days of script modifications
for non-backward-compatible command changes in the new version. This
particular design,
although relatively small (~35K + RAMs), packs a lot of nasty for Test
Compiler to deal with. There are 4 external clocks which become 8 internal
clock domains at the core level after various gating (for power control),
inverting and dividing logic in the clocks block (top level instances a
pads block, a clocks block, and a hierarchical core). Our ASIC vendor
allows for 4 scan chains, and prefers us to manage our own test_se (scan
enable) buffering. This led to some interesting problems and some cool
tricks to solve them.
COOL TRICK #1: The -scan and -incremental_mapping compile switches can
be used together.
This trick doesn't fix a particular problem, but I thought I'd mention it
here. My intention was to economize on Test Compiler/DCXP license usage.
Design exploration compiles can proceed without scan until design issues
are worked out, then scan is added incrementally with all the benefits of
compiling with scan. It did help to fit scan into my methodology as
described in the SNUG99 paper about MIN/MAX compile (shameless plug,
sorry!).
I chose not to route the scan chains at each module, but to wait for full
core level visibility before routing the chains with insert_scan. This led
to an interesting problem: compile -scan does not generate a test_se input
to a module. Insert_scan does, and at the core level the tool now sees all
loads on test_se at once. The compile log started at about 300ns timing
violation and I killed it after 6 hours.
COOL TRICK #2: To manage test_se fanout when running insert_scan at the
core level, run insert_scan with no optimization and use incremental
compile block by block to clean up the mess. Here's the script:
/* route scan chains without optimization */
set_drive 0 find(port,scan_enable_port) /* scan_enable_port = "test_se" */
set_resistance 0 find(net,scan_enable_port) /* net & port are same name */
set_dont_touch find(net,scan_enable_port)
insert_scan -ignore_compile_design_rules -map_effort low
remove_attribute find(port,scan_enable_port) rise_drive
remove_attribute find(port,scan_enable_port) fall_drive
remove_attribute find(net,scan_enable_port) ba_net_resistance
remove_attribute find(net,scan_enable_port) dont_touch
/* fix design rules from scan chain routing - buffer test_se */
suppress_errors = suppress_errors + {UID-95}
foreach (design_name, (find(design) - core_block)) {
    /* core_block = name of core design */
    current_design design_name
    if (find(port,test_se)) {
        set_max_fanout 1 test_se
        compile -incremental_mapping -only_design_rule
    }
}
current_design core_block
suppress_errors = suppress_errors - {UID-95}
check_test
With the above procedure, the scan chains were routed and the test_se tree
was buffered in ~15 minutes. Be sure compile_no_new_cells_at_top_level
is set to false, or non-hierarchical blocks will not be fixed as I found
out the hard way.
Since I had 8 clock domains at the core, I now had 8 unbalanced scan
chains. It turns out that clocks 1-3 were fairly balanced, but I needed to
mix clocks 4-8 on the fourth chain to balance the rest. DCXP allows you to
mix edges or mix clocks in scan chains, but for the whole design, not
individual scan chains. Also, DCXP doesn't pay attention to insertion
delays on clocks. This is ok if you have one clock per chain, or mix edges
of one clock on a chain (DCXP will put negedge flops ahead of posedge
flops), but bad news if you want to mix multiple edges of multiple clocks
in a single chain. The desired order is from largest insertion delay
negedge to smallest insertion delay posedge. It can be done, as long as
you split your dual edge clocks into separate posedge and negedge domains.
COOL TRICK #3: To specify the clock domain order when mixing clocks and
edges on the same chain, split posedge and negedge into separate domains,
then use the all_registers command to specify the order:
/* name scan chains */
set_scan_path chain_1
set_scan_path chain_2
set_scan_path chain_3
set_scan_path chain_4
/* specify clock ordering in chains */
set_scan_path chain_1 all_registers(-clock clock1) -complete true
set_scan_path chain_2 all_registers(-clock clock2) -complete true
set_scan_path chain_3 all_registers(-clock clock3) -complete true
set_scan_path chain_4 all_registers(-clock clock5) \
    + all_registers(-clock clock4) + all_registers(-clock clock8) \
    + all_registers(-clock clock6) + all_registers(-clock clock7) -complete true
The mapping of core level clocks 1-8 to external clocks A-D is as follows:
Clock1 = posedge clockA
Clock2 = gated posedge clockA
Clock3 = posedge clockB
Clock4 = negedge clockB
Clock5 = gated negedge clockB
Clock6 = divided posedge clockC
Clock7 = posedge clockD
Clock8 = negedge clockD
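The ordering rule behind the chain_4 line in Cool Trick #3 can be sketched outside dc_shell. The Python below is only an illustration, not part of the original flow: the edges come from the mapping above, but the insertion delay numbers are invented placeholders, and the sort rule (all negedge domains first, then posedge, largest insertion delay first within each group) is one reading of "largest insertion delay negedge to smallest insertion delay posedge". With real insertion delays from your own clock tree the result may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockDomain:
    name: str                # core-level clock name from the mapping above
    edge: str                # "neg" or "pos", taken from the mapping above
    insertion_delay: float   # HYPOTHETICAL value in ns, not from the design


def chain_order(domains):
    """Negedge domains first, then posedge; larger delay first within each edge."""
    return sorted(domains, key=lambda d: (d.edge != "neg", -d.insertion_delay))


# Delays below are invented placeholders chosen only to illustrate the rule.
chain_4 = [
    ClockDomain("clock4", "neg", 2.0),
    ClockDomain("clock5", "neg", 3.0),
    ClockDomain("clock6", "pos", 2.5),
    ClockDomain("clock7", "pos", 1.0),
    ClockDomain("clock8", "neg", 1.5),
]

ordered_names = [d.name for d in chain_order(chain_4)]
print(" -> ".join(ordered_names))
```

With these placeholder delays the printed order is clock5, clock4, clock8, clock6, clock7, which is the order used for chain_4 in the script above.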
Now for the top level. Here there were two new complications. First, the
clocks block had dividing logic in it and needed to be added to chain_4 (a
test_mode signal was used to bypass the dividing logic to make the divided
internal clock controllable). Second, the clocks were now defined from
external pins, and now paths crossing between edges of the same clock were
generating capture violations. At the core level, DCXP assumed all 8
clocks were posedge which hid this problem. The second problem, by itself,
can be solved by multi_pass ATPG. More on that in a bit. These two
problems interacted in some interesting ways. Based on the insertion delay
of the clock performing the divide in the clocks block, I wanted these
registers to be inserted in chain_4 between clock8 and clock6 with lockup
latches. Even though I had the core declared as existing scan, everything
was getting jumbled around. Once again, I needed to specify the scan
chain order, leading me to...
COOL TRICK #4: To specify the scan chain order from the top level when
mixing clocks, create temporary clock domains, then use the all_registers
command to specify the order:
/* create temporary internal clocks for register grouping */
create_clock core_block/clock1 -period default_period -name clock1
create_clock core_block/clock2 -period default_period -name clock2
create_clock core_block/clock3 -period default_period -name clock3
create_clock core_block/clock4 -period default_period -name clock4
create_clock core_block/clock5 -period default_period -name clock5
create_clock core_block/clock6 -period default_period -name clock6
create_clock core_block/clock7 -period default_period -name clock7
create_clock core_block/clock8 -period default_period -name clock8
create_clock clocks_block/clock9 -period default_period -name clock9
/* name scan chains */
set_scan_path chain_1
set_scan_path chain_2
set_scan_path chain_3
set_scan_path chain_4
/* specify clock ordering in chains */
set_scan_path chain_1 all_registers(-clock clock1) -complete true
set_scan_path chain_2 all_registers(-clock clock2) -complete true
set_scan_path chain_3 all_registers(-clock clock3) -complete true
set_scan_path chain_4 all_registers(-clock clock5) \
    + all_registers(-clock clock4) + all_registers(-clock clock8) \
    + all_registers(-clock clock9) + all_registers(-clock clock6) \
    + all_registers(-clock clock7) -complete true
/* remove temporary clocks */
remove_clock clock1
remove_clock clock2
remove_clock clock3
remove_clock clock4
remove_clock clock5
remove_clock clock6
remove_clock clock7
remove_clock clock8
remove_clock clock9
This worked great, except DCXP was juggling the order of the clock domains
in chain_4. It turned out that DCXP was assuming an inverted waveform on
one of the dual edge clocks to get a smaller number of cross clock
violations.
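The create_clock / set_scan_path / remove_clock lines in Cool Trick #4 are repetitive, so they could be generated rather than typed by hand. The Python below is a hypothetical helper, not something from the original flow. It only emits the same dc_shell commands shown in the script above; the function name and the dictionary layout are assumptions made for this sketch, and each set_scan_path command is written on one long line instead of using backslash continuations.

```python
def scan_path_script(clock_pins, chains, period="default_period"):
    """Return dc_shell lines: temporary clocks, chain definitions, cleanup.

    clock_pins: dict mapping temporary clock name -> hierarchical pin
    chains:     dict mapping chain name -> ordered list of clock names
    """
    lines = ["/* create temporary internal clocks for register grouping */"]
    for name, pin in clock_pins.items():
        lines.append(f"create_clock {pin} -period {period} -name {name}")

    lines.append("/* name scan chains */")
    lines.extend(f"set_scan_path {chain}" for chain in chains)

    lines.append("/* specify clock ordering in chains */")
    for chain, clocks in chains.items():
        regs = " + ".join(f"all_registers(-clock {c})" for c in clocks)
        lines.append(f"set_scan_path {chain} {regs} -complete true")

    lines.append("/* remove temporary clocks */")
    lines.extend(f"remove_clock {name}" for name in clock_pins)
    return lines


# Clock names, pins, and chain order copied from the script above.
clock_pins = {f"clock{i}": f"core_block/clock{i}" for i in range(1, 9)}
clock_pins["clock9"] = "clocks_block/clock9"
chains = {
    "chain_1": ["clock1"],
    "chain_2": ["clock2"],
    "chain_3": ["clock3"],
    "chain_4": ["clock5", "clock4", "clock8", "clock9", "clock6", "clock7"],
}

script = scan_path_script(clock_pins, chains)
print("\n".join(script))
```

Keeping the chain order in one data structure means a change in clock ordering touches a single list rather than several hand-edited command lines.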
COOL TRICK #5: Before running the top level insert_scan, check for the
presence of a .tpf (test protocol) file. If it exists, load it. If
it does not, create one. If there are ordering problems after insert_scan,
check and edit the .tpf file so that all clock waveforms are rising edge.
which scan_directory + top_block + ".tpf"
if (dc_shell_status) {
    read_init_protocol scan_directory + top_block + ".tpf"
} else {
    write_test_protocol -out scan_directory + top_block + ".tpf"
}
Ok, scan chains are routed in the specified order, life is good! Not quite
yet. I noticed a pile of TEST-294 messages in the log file telling me that
all scan_enable inputs to the flops get disconnected during the top level
insert_scan. Since the insert_scan finished quickly, I figured it must
have reconnected these inputs to my existing test_se tree. Examining the
netlist, I found this to be true, but I also found a duplication of test_so
muxes at the core and in the pads, i.e. the scan outputs were muxed with
functional signals twice. I got around this by forcing the core level
insert_scan to generate dedicated scan inputs and outputs:
set_scan_configuration -dedicated_scan_ports true
I have not yet gotten around the other problem. I took the DC-XP Advanced
Scan Synthesis course after SNUG, and picked up this bit of info: DCXP
will interpret a pre-connected test_se as an unsupported functional
connection. The solution presented was to break the connection and remove
the net before running insert_scan. I tried various combinations of places
to break this connection, all of which produced the same TEST-294 messages
along with different incorrect implementations. The closest solution was
to leave the connection intact, producing the TEST-294 message but
producing a correct implementation. Has anyone else run into this? I've
tried it with 1998.08 and 1998.08-1.
Two more for the road: Multi-pass ATPG works great for mixed edge designs.
Take the .tpf file as generated earlier, copy it to same_name.pass2.tpf
and invert the necessary waveform.
COOL TRICK #6: Use tpf files for multipass ATPG. A generalized ATPG
script can be written by checking for the second tpf file:
read_init_protocol scan_directory + top_block + ".tpf"
check_test
create_test_patterns -output scan_directory + top_block + "_atpg.vdb"
which scan_directory + top_block + ".pass2.tpf"
if (dc_shell_status) {
    multi_pass_test_generation = true
    read_init_protocol scan_directory + top_block + ".pass2.tpf"
    check_test
    create_test_patterns -input scan_directory + top_block + "_atpg.vdb" \
        -output scan_directory + top_block + "_atpg2.vdb"
}
COOL TRICK #7: If there is no preference for bidirectional pins to be
input or output during scan, try both and pick the one with higher
coverage.
The last trick gave me an additional 3 or 4 tenths of a percent coverage.
With all the above tricks, and most importantly the up-front commitment of
all the designers to write scan compatible RTL, we were able to get just
over 99% coverage on this design.
- Bob Wiegand
Ensoniq, Corp. Malvern, PA