Tips on Controlling Clock Skew

Tiny differences in propagation delay, when compounded across all the clock nets in a complex digital product, often lead to unacceptable degradations in overall system-timing margins. This generic problem is often referred to as the "clock skew" problem. Recently, our ability to manage and control these tiny differences has been improved by the introduction of a new generation of multi-output, low-skew clock drivers. The latest examples of these new parts will typically have one common clock input and eight or more ganged outputs. The multi-output parts are intended for use in a "spider" or "tree" clock distribution topology, where each output feeds only a single load (or a group of loads located at the end of a long trace). Each trace is used in point-to-point fashion. One working assumption of this method is that the delays of all the clock traces will be balanced.

Uncompensated versions of the new, low-skew clock driver parts commonly guarantee worst-case skew between all outputs of no more than 500 ps. More sophisticated versions of these same parts, with active skew correction circuitry included, are also available. The active skew correction circuitry can shave the worst-case output skew specification down to as little as 50 ps. Both styles are a boon to designers of high-speed digital circuitry, and I welcome their development.

Regarding the use of low-skew clock drivers, I would like to make two points.

  • Point 1: The skew specification between outputs has nothing to do with the variation in delay from input to output. Parts with a 500-ps output skew specification often will have a variability as large as 1000 to 2000 ps from input to output. In applications that use only one clock driver chip, this subtle point is of no consequence. Applications that require multi-level clock trees are different. In a multi-level clock tree, we need to control the worst-case skew between any two leaf nodes, even if those leaf nodes are sourced by different driver chips. Whenever the path between two leaf nodes traverses a driver input, the input-to-output skew specification for that driver enters the overall skew equation. Good multi-level trees need drivers with low input-to-output skew. Some clock drivers use PLL technology to actually advance their output timing to the point where it closely matches the input timing, thus guaranteeing good input-to-output skew performance. A pyramid of these so-called zero-delay clock repeaters may be cascaded to form large clock trees with very low leaf-to-leaf skew.
  • Point 2: Our applications demand low skew between clock signals as received. Our low-skew clock drivers give us low skew between the outputs as transmitted. To obtain the former from the latter, we must do more than simply provide equal length traces on all clock nets. We must use traces of equal delay (remember that outer layer, or microstrip, traces go a little faster than inner layer, or stripline, traces), we must use the same termination strategy on each trace, and we must place the same loads at the end of each line. To the extent that we have achieved these three objectives, the trace delays will be properly balanced.
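
To make Point 1 concrete, here is a rough skew budget in Python. It is only a sketch under a deliberately pessimistic assumption: the two leaf paths diverge at the outputs of the root driver (contributing its output skew), and at every lower level they pass through different chips, each level contributing the full input-to-output variation. The function name and the additive model are illustrative assumptions, not a vendor formula; the numbers are the ones quoted above.

```python
def worst_case_leaf_skew_ps(levels, output_skew_ps, in_to_out_variation_ps):
    """Pessimistic leaf-to-leaf skew for a tree built from identical drivers.

    Assumed model (illustrative only):
      - the root driver contributes its output-to-output skew;
      - each level below the root contributes its full input-to-output
        variation, because the two paths use different chips there.
    Trace, termination, and loading mismatches are NOT included.
    """
    if levels < 1:
        raise ValueError("a clock tree needs at least one driver level")
    return output_skew_ps + (levels - 1) * in_to_out_variation_ps


# Single driver chip: only the output skew specification matters.
print(worst_case_leaf_skew_ps(1, 500, 2000))  # 500

# Two-level tree: the input-to-output variation dominates the budget.
print(worst_case_leaf_skew_ps(2, 500, 1000))  # 1500
print(worst_case_leaf_skew_ps(2, 500, 2000))  # 2500
```

Under these assumptions a second level of ordinary drivers can multiply the skew budget several times over, which is why the input-to-output specification deserves attention in multi-level trees.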

Figure 1 indicates how trace delay can vary with termination type and capacitive load, even for short traces. This figure charts the actual line delay versus line length for various combinations of termination style and capacitive loading. In the figure, the assumed signal risetime is 3.00 ns, the assumed trace impedance is 75 Ω, and the assumed trace delay is 180 ps/inch (FR-4 stripline at 25°C).

Figure 1—Trace delay versus trace length, for various termination types and values of loading.

Results for a series-terminated line are shown in red. With zero loading, the lowest red line shows an ideal delay of 180 ps/inch (540 ps at 3 inches). As each increment of 10 pF is added to the line, the delay goes up by anywhere from 500 to 750 ps (this is approximately R×C, where R=75 Ω and C is the load capacitance). In this example the extra delay introduced by loading is quite large compared to the driver skew specification, but we get a nicely damped response with almost no overshoot or ringing.
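
The arithmetic behind these two figures can be checked with a few lines of Python. This is a first-order estimate only, using the values assumed for Figure 1 (180 ps/inch, 75 Ω); the function names are illustrative, and the R×C product is an approximation that lands at the upper end of the 500 to 750 ps range read from the figure.

```python
TRACE_DELAY_PS_PER_IN = 180.0  # assumed trace delay from the Figure 1 setup
SERIES_R_OHMS = 75.0           # R in the R x C estimate (matches the 75-ohm line)


def unloaded_delay_ps(length_in):
    """Ideal delay of an unloaded, series-terminated trace."""
    return TRACE_DELAY_PS_PER_IN * length_in


def rc_added_delay_ps(load_pf, r_ohms=SERIES_R_OHMS):
    """First-order extra delay from a capacitive load.

    Ohms times picofarads gives picoseconds directly.
    """
    return r_ohms * load_pf


print(f"{unloaded_delay_ps(3):.0f} ps")   # 540 ps
print(f"{rc_added_delay_ps(10):.0f} ps")  # 750 ps
```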

Results for a short, unterminated line are shown with blue dotted lines. Note that these are non-linear curves. This non-linearity renders most rule-of-thumb delay equations quite useless. The non-linearity derives from minor amounts of overshoot or undershoot at the end of the unterminated line, which shift the precise time at which the output signal crosses the clock threshold. This non-linear effect dominates the delay performance of short unterminated lines. (In this example we have confined the line length to no more than 3 in., at which point ringing is still controlled to an acceptable degree.)

As a general rule, shorter lines show less variation with capacitive loading than longer lines. That is because a capacitive load, if located near the driver, is charged directly by the rather low source impedance of the driver. When the load is moved away from the driver, even if only by a couple of inches, the interposing series inductance of the trace slows the charging of the capacitor, thus producing a somewhat slower risetime and a correspondingly longer effective trace delay.

The ringing effects cause unexpected results. For example, an unterminated line with no load shows almost no variation in line delay with length because, as the line gets longer, the signal overshoot increases. The increasing overshoot actually advances the clock threshold crossing almost enough to compensate for the increase in line length. As shown in Figure 1, the effective delay of an unloaded 3-in. unterminated trace (with 3-ns risetime logic) is only 100 ps, not 540 ps as you would expect from (3 in.)×(180 ps/in.). Even minor amounts of overshoot cause this effect. The overshoot at the end of the 3-in. line in this example is only 20%.

If you are serious about controlling clock skew, you will achieve the best results by carefully balancing your clock distribution tree. Pay close attention to the specifications for input-to-output delay on the drivers. Use the same drivers at every level of the clock hierarchy. Balance the nominal trace delays at each level. Use the same termination strategy on each line. Balance the loading on each line, even if you have to add dummy capacitors to one branch to balance out loads on the other branches. Lastly, check your results with a high-quality probe and high-bandwidth oscilloscope.
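
As a small illustration of the load-balancing step, the following Python sketch computes how much dummy capacitance each branch would need to match the most heavily loaded branch. The branch names and load values are hypothetical, and the calculation considers only lumped capacitance at the end of each line; it says nothing about trace delay or termination, which must be balanced separately.

```python
def dummy_caps_pf(branch_loads_pf):
    """Return the dummy capacitance (pF) to add to each branch so that
    every branch carries the same load as the heaviest one."""
    if not branch_loads_pf:
        return {}
    target = max(branch_loads_pf.values())
    return {name: target - load for name, load in branch_loads_pf.items()}


# Hypothetical loads on three clock branches, in pF.
loads = {"cpu": 10.0, "memory": 5.0, "io": 7.5}
print(dummy_caps_pf(loads))  # {'cpu': 0.0, 'memory': 5.0, 'io': 2.5}
```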

Tight control of clock skew can be accomplished only with a complete awareness of all the relevant circuit parameters. Merely balancing the trace lengths is not enough.