Using the Methodology Report Part Five: DDR4 IP post calibration hardware failures that indicate a timing issue but no violation in timing report

The analysis in this blog entry is based on a real customer issue in which DDR4 post-calibration data errors were seen in hardware. The issue turned out to be timing related, but the timing report showed no violation. The Methodology Report was not the method originally used to pinpoint the root cause, but this blog shows how the report could have sped up the debug, or even avoided the hardware failure entirely.

The root cause was determined to be a race condition upon calibration completion. The race occurred on paths covered by a multicycle path constraint. Because this timing exception relaxed the requirement on those paths, the timing analysis reports did not flag them.

This is part five of the Using the Methodology Report series. For all entries in the series, see here.

Issue Explanation:

The customer encountered post-calibration data errors in hardware with the UltraScale+ DDR4 IP.

The issue was build dependent: it varied with the routing and implementation of the design, so it could appear and then disappear across the multiple build images tested during product development. Additionally, the issue might manifest on only a few boards out of a population.

The timing report showed no violation.

Debug Methods:

Inserting an ILA core requires re-implementing the design, and the issue could disappear after re-implementation, so we could not use ILA debugging.

Instead, we routed internal signals out to unused pins using an ECO in the routed DCP, and observed them with an oscilloscope to see which signal(s) started to show errors.

Eventually, we narrowed down the issue to a particular net. After we re-routed the net in the DCP, the failure disappeared.

We then checked the timing analysis and the timing constraints on the paths related to this net:

1. Report timing on paths through the net. This report showed that the paths involved were covered by multicycle path constraints.

report_timing -through [get_nets <net_name>]

2. Open the Timing Constraints window to find the corresponding multicycle path constraints.

Tools -> Timing -> Edit Timing Constraints

We found the following multicycle path constraints in the Timing Constraints window:

set_multicycle_path -setup -from [get_pins */u_ddr_cal_top/calDone*/C] 8
set_multicycle_path -hold -end -from [get_pins */u_ddr_cal_top/calDone*/C] 7

Based on the above analysis, we determined that there was a race condition on those paths.

The multicycle path constraints should not have been added. In this use case the data should be captured correctly every clock cycle to avoid the race condition, so these are not multicycle paths.
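
To make the effect of these two constraints concrete, the sketch below works out which capture clock edges the setup and hold checks use. It assumes the launch and capture registers share one clock and counts the launch edge as edge 0. The function name is hypothetical, and Python is used only for the arithmetic; it is not part of the Vivado flow.

```python
def multicycle_check_edges(setup_multiplier=1, hold_multiplier=0):
    """Return (setup_edge, hold_edge) as capture-clock edge numbers.

    Assumes a single clock domain, with the launch edge counted as edge 0.
    """
    # A setup multiplier of N moves the setup check to the Nth edge after launch.
    setup_edge = setup_multiplier
    # The hold check defaults to one edge before the setup edge; a hold
    # multiplier of M moves it M edges earlier.
    hold_edge = setup_edge - 1 - hold_multiplier
    return setup_edge, hold_edge


# Multiplier values taken from the two set_multicycle_path constraints above.
setup_edge, hold_edge = multicycle_check_edges(setup_multiplier=8, hold_multiplier=7)
print(f"setup checked at edge {setup_edge}, hold checked at edge {hold_edge}")
```

With a setup multiplier of 8 and a hold multiplier of 7, the hold check stays at the launch edge while the setup check moves out to the eighth edge, so timing analysis accepts an arrival anywhere within an eight-cycle window.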

Root cause analysis:

The following are the paths where the race condition issue occurred.

[Figure: multicycle_path.png]

The two destinations are closely related and need to receive the calDone signal on the same clock cycle. However, they sit on different timing paths, and the multicycle constraint allows each path to meet timing with a delay of anywhere between 1 and 8 clock cycles. As a result, calDone can reach the two destinations on different clock cycles, leading to incorrect functionality.

In addition, neither destination register has clock-enable control (the CE pins are tied to VCC), so each one captures data on every clock cycle rather than only on the cycle that the multicycle constraint assumes. Nothing guarantees that the two paths capture calDone on the same clock cycle, so they do not qualify as multicycle paths.
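
The race can be illustrated with a minimal model, written here in Python. The clock period and the two path delays are hypothetical numbers chosen only for illustration (they are not measurements from the customer design), the function names are made up, and setup and hold times are ignored. Both paths pass an eight-cycle setup requirement, yet the two registers, which sample on every clock edge, first see calDone on different edges.

```python
import math


def first_capture_edge(path_delay_ns, clock_period_ns):
    """Edge number (launch edge = 0) at which a register that samples on every
    clock edge first sees the new value. Setup and hold times are ignored."""
    return max(1, math.ceil(path_delay_ns / clock_period_ns))


def meets_multicycle_setup(path_delay_ns, clock_period_ns, setup_multiplier=8):
    """True if the delay fits inside the relaxed multicycle requirement."""
    return path_delay_ns <= setup_multiplier * clock_period_ns


CLOCK_PERIOD_NS = 3.0  # hypothetical fabric clock period

# Hypothetical routed delays from calDone to each destination register.
path_delays_ns = {
    "init_calib_complete_r_reg": 2.1,
    "periodic_config_gap_enable_reg": 3.4,
}

for name, delay in path_delays_ns.items():
    edge = first_capture_edge(delay, CLOCK_PERIOD_NS)
    ok = meets_multicycle_setup(delay, CLOCK_PERIOD_NS)
    print(f"{name}: meets constraint={ok}, first captures calDone at edge {edge}")
```

In this model the first register captures calDone one cycle before the second, even though neither path violates the constraint. A different routing changes the delays and can hide or expose the mismatch, which is consistent with the build-dependent behavior described earlier.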

This multicycle constraint violation is actually caught by the Methodology Report:

TIMING-46#1 Warning

Multicycle path with tied CE pins 

One or more multicycle paths are defined between registers u_mig/inst/u_ddr4_mem_intfc/u_ddr_cal_top/calDone_gated_reg/Q and u_example_tb/init_calib_complete_r_reg/D with a direct connection and the CE pins connected to VCC (see constraint position 6 in the Timing Constraint window in Vivado IDE). This can result in an inaccurate path requirement.

TIMING-46#2 Warning

Multicycle path with tied CE pins 

One or more multicycle paths are defined between registers u_mig/inst/u_ddr4_mem_intfc/u_ddr_cal_top/calDone_gated_reg/Q and u_mig/inst/u_ddr4_mem_intfc/u_ddr_mc/u_ddr_mc_periodic/periodic_config_gap_enable_reg/D with a direct connection and the CE pins connected to VCC (see constraint position 6 in the Timing Constraint window in Vivado IDE).  This may result in an inaccurate path requirement.

It is a good idea to check the Methodology Report at an early stage of the design flow. In examples like this one it could help you to catch and fix the multicycle violation and avoid the hardware failure. You can also run the Methodology Report first as part of the debugging process, and the warnings highlighting the violation will help to speed up the investigation.
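
When the Methodology Report has been saved as text, a short script can pull out the warnings of interest. The following Python sketch assumes that each violation starts with a header line of the form shown in the warnings above (check ID, "#", index, then severity) followed by its title and description. The layout of a real report file may differ, and the function name and the abbreviated sample text are hypothetical.

```python
import re

# Header line such as "TIMING-46#1 Warning" (layout assumed from the excerpt above).
HEADER = re.compile(r"^(?P<check>[A-Z]+-\d+)#(?P<index>\d+)\s+(?P<severity>.+)$")


def find_violations(report_text, check_id):
    """Collect every violation of check_id, with its severity and text lines."""
    violations = []
    current = None
    for raw_line in report_text.splitlines():
        line = raw_line.strip()
        match = HEADER.match(line)
        if match:
            # A new header ends the previous violation's text.
            current = None
            if match.group("check") == check_id:
                current = {
                    "index": int(match.group("index")),
                    "severity": match.group("severity"),
                    "text": [],
                }
                violations.append(current)
        elif current is not None and line:
            current["text"].append(line)
    return violations


# Abbreviated sample based on the two warnings quoted above.
SAMPLE_REPORT = """
TIMING-46#1 Warning
Multicycle path with tied CE pins
One or more multicycle paths are defined between registers ...

TIMING-46#2 Warning
Multicycle path with tied CE pins
One or more multicycle paths are defined between registers ...
"""

for violation in find_violations(SAMPLE_REPORT, "TIMING-46"):
    print(violation["index"], violation["severity"], violation["text"][0])
```

A check like this could run after each build so that a TIMING-46 warning is noticed before hardware testing rather than during debug.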

Resolution:

  • (Xilinx Answer 73068) provides patches that resolve this issue in releases prior to 2020.1.
  • Starting from the 2020.1 release, the multicycle path constraints are removed and pipeline stages are added on the paths to ease timing closure while ensuring that all destinations are timed to the same fabric cycle.

Conclusion:

  1. Run the Methodology Report early in the design flow to catch and fix potential issues.
  2. Be careful when applying multicycle constraints to paths whose destination registers have CE pins tied to VCC.

Source: https://forums.xilinx.com/t5/Design-and-Debug-Techniques-Blog/Using-the-Methodology-Report-Part-Five-DDR4-IP-post-calibration/ba-p/1168263
