[SOLVED] In FPGA, why does a counter with a raw full-adder implementation have better clock performance than inferred addition '+'?

I'm testing counter and addition performance on iCE40 and GateMate FPGAs.

I wrote the counter in two different ways:

NaturalCount, using Chisel's '+' operator (view source):

import chisel3._
import chisel3.util.log2Ceil

// Natural counter with classic addition
class NaturalCount(val COUNT_WIDTH: Int = 32) extends Module {
  val io = IO(new Bundle {
    val count = Output(UInt(COUNT_WIDTH.W))
  })

  val MAXCOUNT = BigInt(1) << COUNT_WIDTH
  val counterSize = log2Ceil(MAXCOUNT)
  val counterValue = RegInit(0.U(counterSize.W))
  counterValue := counterValue + 1.U
  io.count := counterValue 
}

FullAdderCount, which instantiates a chain of FullAdder modules described in Chisel here:

import chisel3._

/* FullAdder counter */
class FullAdderCount(val COUNT_WIDTH: Int = 32) extends Module {
  val io = IO(new Bundle {
    val count = Output(UInt(COUNT_WIDTH.W))
  })

  val counterValue = RegInit(0.U(COUNT_WIDTH.W))
  val addition = Module(new FullAdderAddition(COUNT_WIDTH))
  addition.io.a := counterValue
  addition.io.b := 1.U
  counterValue := addition.io.s

  io.count := counterValue
}
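The post does not show FullAdderAddition itself. To make its structure concrete, here is a minimal plain-Scala software model; it is an assumption about what the module computes, not the Chisel source linked above, and the names `FullAdderModel`, `rippleAdd` and `nextCount` are hypothetical. It shows that a chain of full adders fed with b = 1 produces the same next value as `counterValue + 1.U`, and that each bit's carry depends on the previous stage, so the carry path grows with the counter width.

```scala
// Plain-Scala software model (not Chisel hardware) of a counter built from a
// ripple chain of full adders, as in FullAdderCount with b = 1.
// Hypothetical names; this is not the FullAdderAddition source.
object FullAdderModel {
  // One full adder: returns (sum, carryOut) for single-bit inputs.
  def fullAdder(a: Boolean, b: Boolean, cin: Boolean): (Boolean, Boolean) = {
    val s    = a ^ b ^ cin
    val cout = (a && b) || (cin && (a ^ b))
    (s, cout)
  }

  // Chain `width` full adders from the LSB up. Each stage's carry-out feeds
  // the next stage's carry-in, and the final carry-out is dropped, giving the
  // same wrap-around behaviour as a fixed-width UInt register.
  def rippleAdd(a: BigInt, b: BigInt, width: Int): BigInt = {
    var carry = false
    var sum   = BigInt(0)
    for (i <- 0 until width) {
      val (s, c) = fullAdder(a.testBit(i), b.testBit(i), carry)
      if (s) sum = sum.setBit(i)
      carry = c
    }
    sum
  }

  // Next register value, mirroring `addition.io.a := counterValue` and
  // `addition.io.b := 1.U` in FullAdderCount.
  def nextCount(value: BigInt, width: Int): BigInt =
    rippleAdd(value, BigInt(1), width)
}
```

For example, `FullAdderModel.nextCount(BigInt(0xFFF), 44)` returns `BigInt(0x1000)`: the carry has to ripple through the twelve low bits before it reaches bit 12.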

If I synthesize (and place and route) these counters as 44-bit counters in a blinker design for iCE40 using the IceStorm flow (Yosys and nextpnr), I get these results:

NaturalCount:

2.48. Printing statistics.

=== IcestickBlink ===

   Number of wires:                 14
   Number of wire bits:            240
   Number of public wires:          14
   Number of public wire bits:     240
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              0
   Number of cells:                130
     SB_CARRY                       42
     SB_DFF                         44
     SB_LUT4                        44

Info: Max frequency for clock 'clk$SB_IO_IN_$glb_clk': 117.32 MHz (PASS at 12.00 MHz)
Info: Max frequency for clock 'clk$SB_IO_IN_$glb_clk': 121.15 MHz (PASS at 12.00 MHz)

FullAdderCount:

   Number of wires:                581
   Number of wire bits:            810
   Number of public wires:         581
   Number of public wire bits:     810
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              0
   Number of cells:                173
     SB_DFF                         44
     SB_DFFE                        43
     SB_LUT4                        86

Info: Max frequency for clock 'clk$SB_IO_IN_$glb_clk': 436.68 MHz (PASS at 12.00 MHz)
Info: Max frequency for clock 'clk$SB_IO_IN_$glb_clk': 380.37 MHz (PASS at 12.00 MHz)

The NaturalCount version uses fewer LUTs (44 SB_LUT4 plus 42 SB_CARRY, versus 86 SB_LUT4), but the clock performance is far better with FullAdderCount (about 380 to 437 MHz, versus about 117 to 121 MHz).

Is this normal? What is the purpose of SB_CARRY if it performs worse than "normal" LUTs?
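SB_CARRY is the iCE40 dedicated carry cell: it computes CO = (I0 & I1) | (CI & (I0 | I1)), and consecutive SB_CARRY cells are linked by dedicated carry wiring rather than general routing, which is why '+' would normally be expected to be the faster option. The minimal sketch below is a plain-Scala model with hypothetical names showing one way a +1 counter splits into a LUT4 (sum) and an SB_CARRY (carry) per bit. The width - 2 carry-cell count is an assumption about constant folding, not taken from Yosys, but it is consistent with the 42 SB_CARRY cells reported for 44 bits.

```scala
// Plain-Scala model of a +1 counter split into iCE40-style cells:
// the sum of bit i in a LUT4 and the carry into bit i+1 in an SB_CARRY.
// Assumption: this illustrates the idea; it is not Yosys's mapping code.
object Ice40IncrementModel {
  // SB_CARRY behaviour: CO = (I0 & I1) | (CI & (I0 | I1)).
  def sbCarry(i0: Boolean, i1: Boolean, ci: Boolean): Boolean =
    (i0 && i1) || (ci && (i0 || i1))

  // Next counter value. c(0) = 1 is the "+1"; the LUT4 for bit i computes
  // q(i) xor c(i), and SB_CARRY(I0 = q(i), I1 = 0, CI = c(i)) gives c(i+1).
  def increment(value: BigInt, width: Int): BigInt = {
    var carry = true
    var next  = BigInt(0)
    for (i <- 0 until width) {
      val q = value.testBit(i)
      if (q ^ carry) next = next.setBit(i)
      carry = sbCarry(q, false, carry)
    }
    next
  }

  // Carry cells that cannot be folded away: c(1) is just q(0) because c(0)
  // is the constant 1, and the carry out of the MSB is unused, so only
  // c(2) .. c(width-1) need an SB_CARRY.
  def carryCellsNeeded(width: Int): Int = math.max(width - 2, 0)
}
```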

I tried the same with the GateMate FPGA, which uses the same synthesis software (Yosys) but a different, in-house tool for place and route (p_r).

NaturalCount with a 44-bit counter on GateMate:

2.49. Printing statistics.
 
=== GatemateBlink ===
 
   Number of wires:                 72
   Number of wire bits:            352
   Number of public wires:          24
   Number of public wire bits:     217
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              0
   Number of cells:                144
     CC_ADDF                        44
     CC_BUFG                         1
     CC_DFF                         44
     CC_IBUF                         2
     CC_LUT2                        44
     CC_OBUF                         8
     CC_PLL                          1

Maximum Clock Frequency on CLK 160 (160/3):  189.79 MHz

FullAdderCount on GateMate:

2.49. Printing statistics.
 
=== GatemateBlink ===
 
   Number of wires:                595
   Number of wire bits:            835
   Number of public wires:         506
   Number of public wire bits:     745
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              0
   Number of cells:                186
     CC_BUFG                         1
     CC_DFF                         87
     CC_IBUF                         2
     CC_LUT2                         2
     CC_LUT3                        85
     CC_OBUF                         8
     CC_PLL                          1

Maximum Clock Frequency on CLK 160 (160/3):  189.79 MHz

With this FPGA, the maximum clock frequency is exactly the same for both versions (189.79 MHz).

I wonder why the clock performance isn't better when the '+' operator is mapped to SB_CARRY and CC_ADDF cells.

Is it a "bug" in my design, or is this normal?

Solution

OK, it was in fact a bug in my code: the FullAdder option actually instantiated a PdChain fast counter, as described on OpenCores.

    case CounterTypes.FullAdderCount => {
      println("Generate FullAdderCount of " + COUNT_WIDTH + " bits")
      val counter = Module(new PdChain(COUNT_WIDTH)) // <---- wrong class instantiated (should have been the FullAdderCount module)
      io.leds := counter.io.count(counter.counterSize-1, counter.counterSize-LED_WIDTH)
    }
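The post does not describe PdChain's internals, and they are not reproduced here. As a rough illustration of why a pipelined counter can reach much higher clock rates, here is a minimal plain-Scala model of a different, simpler technique; it is an assumption, not the OpenCores design, and the class name `RegisteredCarryCounter` is hypothetical. The counter is split into two segments and the carry between them is computed one cycle early and held in a register, so the longest combinational carry path covers one segment instead of the full width, at the cost of an extra flip-flop.

```scala
// Cycle-by-cycle software model of a counter split into a low and a high
// segment, with the carry between them computed one cycle early and held in
// a register. Hypothetical illustration, not the OpenCores PdChain design.
final class RegisteredCarryCounter(lowBits: Int, highBits: Int) {
  require(lowBits >= 1 && highBits >= 1)
  private val lowMask  = (BigInt(1) << lowBits) - 1
  private val highMask = (BigInt(1) << highBits) - 1
  private var low      = BigInt(0)
  private var high     = BigInt(0)
  // Registered carry: true exactly during the cycle in which `low` is all ones.
  private var carryReg = false

  def value: BigInt = (high << lowBits) | low

  // One rising clock edge: every "register" is updated from the old values.
  def tick(): Unit = {
    // Lookahead: `low` will be all ones after this edge if it is all ones minus one now.
    val nextCarry = low == lowMask - 1
    // Short increment that spans only the low segment.
    val nextLow   = (low + 1) & lowMask
    // The high segment is enabled by a register, so no combinational carry
    // path runs from bit 0 all the way to the MSB.
    val nextHigh  = if (carryReg) (high + 1) & highMask else high
    low = nextLow
    high = nextHigh
    carryReg = nextCarry
  }
}
```

The value sequence stays an exact binary count; for a 44-bit counter, `new RegisteredCarryCounter(22, 22)` would limit each carry path in this model to 22 bits.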

After fixing the code, I get worse timing with FullAdderCount, as I initially expected.

Sorry for the inconvenience, but asking this question allowed me to find the mistake.