ice40 clock delay, output timing analysis

I have an ice40 that drives the clock and data inputs of an ASIC.

The ice40 drives the ASIC's clock with the same clock that drives the ice40's internal logic. The problem is that the rising clock triggers the ice40's internal logic and changes the ice40's data outputs a few nanoseconds before the rising clock reaches the ASIC, and therefore the ASIC observes the wrong data at its rising clock.

I've solved this issue by using an inverter chain to delay the ice40's internal clock without delaying the clock driving the ASIC. That way, the rising clock reaches the ASIC before the ice40's data outputs change. But that raises a few questions:

1. Is my strategy -- using an inverter chain to delay the ice40 internal clock -- a good strategy?

2. To diagnose the problem, I used Lattice's iCEcube2 to analyze the min/max delays between the internal clock and output pins:

Notice that the asic_dataX delays are shorter than the clk_out delay, indicating the problem.

Is there a way to get this information from yosys/nextpnr?

Thank you for any insight!

Solution

Instead of tinkering with the delays, I would recommend using established techniques. For example, SPI simply samples the data on one clock edge and changes it on the other.
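Applied to the setup in the question, the same idea could look like the minimal sketch below. It assumes the ASIC samples its data inputs on the rising edge of its clock. The module and port names (asic_tx, asic_clk, asic_data, next_data) are made up for illustration and are not taken from the question's design.

```verilog
// Minimal sketch, not a drop-in solution: launch the data on the falling
// edge so that it is stable around the rising edge on which the ASIC is
// assumed to sample. All names here are hypothetical.
module asic_tx #(parameter WIDTH = 8) (
    input  wire             clk,        // FPGA logic clock
    input  wire [WIDTH-1:0] next_data,  // value computed by the posedge logic
    output wire             asic_clk,   // clock forwarded to the ASIC
    output reg  [WIDTH-1:0] asic_data   // data forwarded to the ASIC
);

// Forward the clock unchanged.
assign asic_clk = clk;

// Update the outputs half a period away from the ASIC's sampling edge.
always @(negedge clk) begin
    asic_data <= next_data;
end

endmodule
```

With this arrangement the data changes roughly half a clock period before the next rising edge, so a few nanoseconds of skew between the clock and data outputs no longer matter. The price is that the path from the posedge logic to these output registers only has half a period to settle.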

The logic to implement that is rather simple. Here is an example implementation for an SPI slave:

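module SPI_slave #(parameter WIDTH = 6'd16, parameter phase = 1'b0,
                   parameter polarity = 1'b0, parameter bits = 5) (
    input wire rst,
    input wire CS,
    input wire SCLK,
    input wire MOSI,
    output reg MISO,
    output wire data_avbl,
    input wire [WIDTH-1:0] data_tx,
    output reg [WIDTH-1:0] data_rx
    );

reg [bits:0]    bitcount;   // bits left in the current word
reg [WIDTH-1:0] buf_send;   // transmit shift register

// Declared explicitly so the module does not rely on implicit nets.
wire clk, int_rst, tx_clk;

assign clk          = phase ^ polarity ^ SCLK;  // normalized clock: sample on its rising edge
assign int_rst      = rst | CS;                 // deselecting the slave (CS high) also resets it
assign tx_clk       = clk | CS;                 // held high while deselected
assign data_avbl    = bitcount == 0;            // a complete word has been received

// Change MISO on the falling edge, half a period away from the sampling edge.
always @(negedge tx_clk or posedge rst) begin
    if (rst)
        MISO <= 1'b0;
    else
        MISO <= buf_send[WIDTH-1];
end

// Sample MOSI and advance the shift registers on the rising edge.
always @(posedge clk or posedge int_rst) begin
    if (int_rst) begin
        bitcount    <= WIDTH;
        data_rx     <= 0;
        buf_send    <= 0;
    end else begin
        bitcount    <= (data_avbl ? WIDTH : bitcount) - 1'b1;
        data_rx     <= { data_rx[WIDTH-2:0], MOSI };
        buf_send    <= bitcount == 1 ? data_tx[WIDTH-1:0] : { buf_send[WIDTH-2:0], 1'b0};
    end
end

endmodule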

As one can see, the data is captured on the positive edge and changed on the negative edge. If one wants to avoid mixing edge sensitivities, a clock running at twice the frequency can be used instead.
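A minimal sketch of the doubled-clock variant, for the case where the FPGA drives both the clock and the data: everything runs on the rising edge of one clock at twice the output rate, and the data is changed on the cycle that produces the falling edge of the forwarded clock. This assumes a 2x clock is available (for example from a PLL) and that the receiver samples on the rising edge. The names asic_tx_2x, clk2x, asic_clk, asic_data and next_data are hypothetical.

```verilog
// Minimal sketch, assuming a clock at twice the desired output rate.
// Both outputs come from registers on the same clock edge, so only
// rising-edge logic is used. All names here are hypothetical.
module asic_tx_2x #(parameter WIDTH = 8) (
    input  wire             clk2x,      // runs at twice the output clock rate
    input  wire             rst,        // synchronous reset
    input  wire [WIDTH-1:0] next_data,  // value to present to the receiver
    output reg              asic_clk,   // divided-by-two clock for the receiver
    output reg  [WIDTH-1:0] asic_data   // data for the receiver
);

always @(posedge clk2x) begin
    if (rst) begin
        asic_clk  <= 1'b0;
        asic_data <= {WIDTH{1'b0}};
    end else begin
        // Toggle the forwarded clock on every clk2x cycle.
        asic_clk <= ~asic_clk;
        // asic_clk is currently high, so this cycle produces its falling
        // edge: change the data together with that edge.
        if (asic_clk)
            asic_data <= next_data;
    end
end

endmodule
```

The data then stays constant for one full clk2x period on either side of each rising edge of asic_clk, which gives the receiver setup and hold margin without any hand-tuned delay elements.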