( ESNUG 249 Item 6 ) -------------------------------------------- [8/30/96]

Subject: ( ESNUG 248 #5 ) FSM Treatment Doesn't Seem Consistant Or Coherent

>I am busy with state machine synthesis and I have been fighting with design
>compiler some days in order to get the last picoseconds out of my FSM, that
>is supposed to reach 66 MHz.  ... I decided to check Design Compiler using
>an identical FSM coding by swapping the columns of my state coding.  I was
>sure to get an identical result, where the synthesized flipflops
>(SIG_st_sm_next_reg[0][1][2][3]) just changed their order.
>
>  attribute ENUM_ENCODING of sm_next_state_type : type is
>  -- order ABCD
>  --   "0011 0010 0001 1010 0000 0100 0101 1101 1100 1011 1111 1001";
>  -- order ADBC
>       "0101 0001 0100 1001 0000 0010 0110 1110 1010 1101 1111 1100";
>
>I am very unhappy to see that the IDENTICAL synthesis script on an
>IDENTICALLY coded state machine produces DIFFERENT results!  ...
>I understand that I get quite different results when using different state
>assignments.  Do you know an explanation, why the results also differ, if
>I change only the ORDER of my state vectors?


From: [ Synopsys HDL Advisor Technical Marketing ]

Dear John,

Although, I can not provide an explanation of the results of state machine
compiler I can sympathize.  The starting point for synthesis is *extremely*
important.  In fact, the Design Compiler optimization is limited by this
starting point.  This means that in many situations it is more efficient to
change the source code.  You get a better result and it may even take less
time than iterating with constraints.

Last year we released a new tool, HDL Advisor, which provides metrics
about the starting point for synthesis.  In many cases we have adjusted
the source to take into account critical paths (late arriving signals)
and made significant gains in results.  Since you are already spending
quite a bit of time in trying to meet timing, It may make sense to
see what the tool says about the source. 

  - [ Synopsys HDL Advisor Technical Marketing ]

      ----    ----    ----    ----    ----    ----    ----    ----
      ----    ----    ----    ----    ----    ----    ----    ----

 [ Editor's Note:  What follows is an exchange of e-mails between Ted
   Boydston and Deiter Peer on the synthesizing FSMs problem.  - John ]


From: tboydsto@su102s.ess.harris.com (Ted Boydston)
To: peer@iis.fhg.de (Dieter Peer)

Dieter,

If I understood the question, what you did by changing flip flops ABCD to 
flip flops ADBC is to change the sequence the machine counts.  So, yes,
order does make a difference, because the transfer function that generated
the  next state transistions had to change.  That is, a state machine with
two different state vector series, must have two diffenent next-state
transfer functions to generate the output, hence different timing parameters.
A simple example would help here:

Let's say you are making a 16 state state machine.  You could use a 4 bit 
binary counter as your state bits or a 4 bit linear shift feedback register
(LSFR) as your state bits.  Since the counter generates a binary sequence,
while the LSFR generates a psuedorandom sequence, the binary counter is
inherently much slower than the LSFR.  So, to obtain maximum performance in
this simple case, you would choose the LSFR.

By the way, your encoding method is probably not the best encoding if speed
is a concern.  My recommendation, if area is not a concern, is to use one-hot 
(or one-cold) encoding in Synopsys's FSM compiler -- this will give you the 
highest performance.  An example of one-hot in an 8 state state machine:

    00000001
    00000010
    00000100
    00001000
    00010000
    00100000
    01000000
    10000000

As you can see, each state is assigned to a flip flop, allowing for minimal
decode delay.  If area is a concern, then grey codes are the most efficent.
An example of grey codes in an 8 state state machine would be:

      001
      011
      111
      110
      100
      000
      100
      110

In this case, only one bit is changing at any one time, allowing for minimal
decode delay.

  - Theodore L. Boydston IV                  
    Harris Corporation

     ----    ----    ----    ----    ----    ----    ----    ----

To: tboydsto@su102s.ess.harris.com (Ted Boydston)
From: peer@iis.fhg.de (Dieter Peer)

Ted,

Your answer is correct, but my question was different.  (Sorry I am not a
native English speaker.)

Using one-hot for my 12-state statemachine does *not* give the best (max
speed, any area) results.  The gate library (ES2 Gate Array) provides
maximum 4-input-gates (and 6-input-complex-gates).  DC synthesizes too many
logic gates to the outputs of (here: 12) flipflops so the time thru logic
gates within the fsm limits the speed.  My best results due to speed were
achieved using 4- or 5- bit state assignment.

Lets assume your 8-state-assignment example using one-hot, naming the
flipflops A,B,C,D,E,F,G,H.  If you swap the assignment by changing some of
the columns to lets say C,D,E,F,G,H,A,B you get "another" one-hot.  But
wouldn t you expect to get identical circuitry synthsized?

   ABCDEFGH    CDEFGHAB
   00000001    00000100
   00000010    00001000
   00000100    00010000
   00001000    00100000
   00010000    01000000
   00100000    10000000
   01000000    00000001
   10000000    00000010

I would expect to get a (functionally) identical netlist, where only the
*names* of the flipflops are swapped. (A->C, B->D, .... F->H, G->A, H->B)
If you do not agree with me, do you have some advise on how to optimize the
order of 1 s in a one-hot assignment (or the gray code assignment) for the
fastest possible circuit?  The same would apply for your gray-code
assignment (ABC swapped to ACB):

   ABC    ACB
   001    010
   011    011
   111    111
   110    101
   100    100
   000    000
   100    100
   110    101

  - Dieter Peer
    Fraunhofer Gesellschaft, Germany

     ----    ----    ----    ----    ----    ----    ----    ----

From: tboydsto@su102s.ess.harris.com (Ted Boydston)
To: peer@iis.fhg.de (Dieter Peer)

Dieter,

I now see what you are saying, and for your specific column swap example on
the one-hot case I would agree, because you are not changing the actual
transition pattern (explained latter).  For the gray code case, though, I
believe there is a problem.  In this case, you performed the following
column swap:

   ABC    ACB   Transition
   000    000   1
   001    010   2
   011    011   3    
   111    111   4
   110    101   5
   100    100   6
   110    101   7
   010    001   8

If you notice, there are very few similar transitions between case ABC and
case ACB (like 3->4), while non-similar transitions dominate (1->2 and 4->5).  
Since the machine "counts" a different pattern, I believe that different 
control logic will be synthesized to produce the desired pattern, resulting
in a slight performance difference.

Another example would be a straight binary counter with a swap:

  ABC  ACB  Transition
  000  000  1
  001  010  2
  010  001  3
  011  011  4
  100  100  5 
  101  110  6
  110  101  7
  111  111  8

In this case, you can see that case ABC would synthesize to an adder and 
control logic, while case ACB would synthesize to some random logic state
machine, resulting in a signifigant performance difference.

Your one hot case of:

   ABCDEFGH   Transition  CDEFGHAB    Transition	
   00000001   1 <- reset  00000100    3 <- reset
   00000010   2           00001000    4
   00000100   3           00010000    5
   00001000   4           00100000    6
   00010000   5           01000000    7   Transition order
   00100000   6           10000000    8   does not change,
   01000000   7           00000001    1   its only shifted.
   10000000   8           00000010    2

is a special case because the transitions are exactly the same -- just
shifted--  and I would hope to see the same logic -- a shift register. :^)
Now, let's say you changed the order of the machine from ABCDEFGH to
HACFGEBD, the machine would no longer transition in the order of a shift
register, resulting in a differenct circuit.  By the way, I think if you
only shift the order of the state bits as a set, without swapping them, you
should achieve the same results with minimal performance impact in any
circuit.

As for your question of increasing performance, my first post still holds
about one-hot and gray codes.  For one hot, your mealy or moore output
should be the state, and since there is one flip flop per state, you should
not need wide combinatorial decodes to get valid state transitions.  Take
your circuit into the FSM Compiler and perform a one-hot synthesis -- that
should give you the fastest output.  You will need to modifiy your HDL a bit
to get it to read it into FSM Compiler, but overall you should not have
many problems.  I hope that you can find a solution!

  - Theodore L. Boydston IV                  
    Harris Corporation



 Sign up for the DeepChip newsletter.
Email
 Read what EDA tool users really think.


Feedback About Wiretaps ESNUGs SIGN UP! Downloads Trip Reports Advertise

"Relax. This is a discussion. Anything said here is just one engineer's opinion. Email in your dissenting letter and it'll be published, too."
This Web Site Is Modified Every 2-3 Days
Copyright 1991-2024 John Cooley.  All Rights Reserved.
| Contact John Cooley | Webmaster | Legal | Feedback Form |

   !!!     "It's not a BUG,
  /o o\  /  it's a FEATURE!"
 (  >  )
  \ - / 
  _] [_     (jcooley 1991)