Manage Improvement Metrics -- Mark Graban on SPC

A KaiNexus webinar with Mark Graban, hosted by Jeff Roussel

Watch the webinar here:

And here is the "bonus content" about how to create control charts:

See the Slides:

How to Manage Your Improvement Metrics More Efficiently and Effectively from KaiNexus

Bonus Content: How to Create a Control Chart (a.k.a. Process Behavior Chart) from KaiNexus

Most organizations measure performance. Far fewer measure it well. The most common failure modes look like best practices on the surface -- dashboards on the wall, red and green bowling charts in monthly reports, linear trend lines in PowerPoint, daily huddle boards comparing yesterday's numbers to a goal. All of these are visible improvement infrastructure. None of them reliably answer the question the measurement is supposed to answer in the first place.

The question is: are we improving?

The methods that produce confident answers to that question are not new. They go back to Walter Shewhart at Bell Labs in the 1920s, W. Edwards Deming's work in postwar Japan, and decades of refinement since. They sit in plain sight in books like Donald Wheeler's Understanding Variation: The Key to Managing Chaos, which Mark cites in this session as one of the most influential books on his shelf -- not a Lean book, not a Deming book, but the practical management book about variation that connects everything else.

This webinar is the most direct treatment in the KaiNexus catalog of how to use statistical process control methods to manage performance data in daily improvement work. It is also one of the most operationally useful sessions for leaders who have inherited dashboards they don't trust, huddle boards that demoralize their teams, and metrics that feel busy without producing change.

About the presenter

Mark Graban is an internationally recognized author, speaker, and advisor on Lean management, particularly in healthcare, manufacturing, and service industries. At the time of this webinar, he was VP of Improvement and Innovation Services at KaiNexus. His books include Lean Hospitals, Healthcare Kaizen, The Executive Guide to Healthcare Kaizen, Practicing Lean, and Measures of Success: React Less, Lead Better, Improve More, which expands the material covered in this session into a full-length treatment. He writes regularly at LeanBlog.org and hosts multiple podcasts on Lean and continuous improvement.

This session was hosted by Jeff Roussel, then VP of Sales at KaiNexus.

Why most measurement fails

The session opens with a diagnostic. Most organizations practicing daily Lean management have boards, huddles, and metrics on display. The infrastructure is right. What's commonly wrong is how the data on those boards gets interpreted.

The simplest version of the problem shows up in what Mark calls bowling charts -- monthly performance grids with red and green cells indicating whether each metric hit its target that month. The name comes from the visual resemblance to a bowling scorecard. Monthly cadence is too slow to support real improvement work, and the binary red-green frame asks only one question (did we hit the goal?) when at least two questions matter (did we hit the goal, and are we actually improving?).

The two questions are different. An organization can be improving but not yet hitting its goal. An organization can be exceeding its goal while its underlying performance is stable or eroding. A red-green frame conflates these scenarios.

The same diagnostic applies to dense tables of numbers labeled as dashboards. Mark contrasts them with a car dashboard, which shows a small set of information needed for immediate decisions: speed, fuel level, maybe oil temperature when something is wrong. Many executive dashboards are closer to an airplane cockpit, overwhelming the user with information rather than focusing attention on what matters. A useful dashboard is the opposite of comprehensive. It surfaces the few things that actually drive decisions.

Run charts -- simple line charts showing data over time -- are the minimum viable visualization for performance data. They make trends visible in a way that tables of numbers and red-green grids don't. They're also the starting point for the more rigorous analysis the rest of the session is about.

The patient satisfaction trap

Mark uses a worked example to illustrate how measurement can mislead. A consulting case study he encountered claimed that an intervention had increased average patient satisfaction from 87.2% to 89%. The case study presented this as a definitive improvement. A column chart with a y-axis truncated to make the bars visually dramatic reinforced the conclusion.

The problem: when the same data was shown across a longer timeframe with a full y-axis, the "improvement" looked very different. The 89% figure was the highest point on the chart. The 87.2% figure was the average across the baseline period. The comparison was between an average and a peak. If the analyst had cherry-picked a different pair of points -- say, comparing a baseline peak to a follow-up trough -- the same data would have supported the opposite conclusion (that satisfaction had decreased).

The deeper issue isn't dishonesty. It's that point-to-point comparisons of metrics are fundamentally unreliable when the underlying process has natural variation. The same data, sliced different ways, supports contradictory conclusions. Without a method for distinguishing meaningful change from natural fluctuation, leaders end up making decisions based on whichever slice was presented to them.
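A minimal Python sketch of the slicing problem, using made-up monthly scores chosen so that the baseline averages 87.2% and the follow-up period peaks at 89%. The numbers and the `compare` helper are hypothetical illustrations, not data from the case study.

```python
from statistics import mean

# Hypothetical monthly patient-satisfaction scores (%), invented for illustration.
baseline = [86.5, 88.0, 86.9, 87.6, 87.0]   # averages 87.2
follow_up = [86.8, 89.0, 87.1, 86.6]        # peaks at 89.0

def compare(label, before, after):
    """Print a point-to-point comparison and return the change (after - before)."""
    change = after - before
    direction = "up" if change > 0 else "down"
    print(f"{label}: {before:.1f} -> {after:.1f} ({direction} {abs(change):.1f} points)")
    return change

# Same data, two different slices, two opposite stories.
avg_vs_peak = compare("Baseline average vs. follow-up peak", mean(baseline), max(follow_up))
peak_vs_trough = compare("Baseline peak vs. follow-up trough", max(baseline), min(follow_up))
```

The first comparison reports a 1.8-point gain and the second a 1.4-point drop, even though nothing about the underlying data changed.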

Linear trend lines are not what they appear

The second worked example takes on a habit even more widespread than misleading point-to-point comparisons. Excel makes it trivially easy to add a linear trend line to a chart. Right-click, select "add trendline," and the chart now suggests an unambiguous direction.

Mark demonstrates the failure mode by taking a single dataset of patient satisfaction scores and drawing a linear trend line. The line slopes upward. Things are improving.

Then he removes the first and last data points -- not a fundamental change to the dataset -- and redraws the trend line. The line now slopes downward. Things are getting worse.

The same data. Two opposite conclusions. The mathematical mechanism is straightforward: a least-squares trend line is most heavily influenced by the points farthest from the middle of the time range, so small changes at the boundaries can flip the slope. But the implication for management practice is substantial. Decisions made on the basis of linear trend lines drawn over arbitrary date ranges are often decisions made on the basis of noise, framed as signal.
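The endpoint effect is easy to reproduce. This sketch fits an ordinary least-squares slope by hand (no plotting or statistics library assumed) to a hypothetical series of seven scores, then refits after dropping the first and last points, mirroring the demonstration described above.

```python
def ols_slope(ys):
    """Ordinary least-squares slope of ys against their positions 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    x_bar = (n - 1) / 2          # mean of the positions 0..n-1
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    return sxy / sxx

# Hypothetical satisfaction scores, invented for illustration.
scores = [80, 88, 87, 86, 85, 84, 92]

full = ols_slope(scores)             # positive: "things are improving"
trimmed = ols_slope(scores[1:-1])    # negative: "things are getting worse"
print(f"All points: {full:+.2f} per period; first and last dropped: {trimmed:+.2f} per period")
```

Here the full series gives a slope of about +0.93 per period and the trimmed series exactly -1.00, because the two outer points sit farthest from the center of the time axis and dominate the fit.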

The pep talk and the kick in the butt

The session's most operationally useful diagnostic is about how leaders behave in response to fluctuating performance data without a method to interpret it.

Mark walks through a scenario. The team's production starts in the mid-30s. The goal is 42. On day two, production climbs into the high 30s. The leader praises the team. On day three, production drops. The leader gives a pep talk and a verbal kick in the butt. On day four, production climbs above goal. Praise. On day five, it drops. Another kick. And so on.

Over time, leaders in this pattern start to develop a model that praise causes performance to drop and criticism causes performance to improve. The model is wrong. Both effects are produced by regression to the mean within a stable system: an unusually good day tends to be followed by a more ordinary one, and so does an unusually bad day, whatever the leader says. But the leader's behavior is being calibrated by what looks like cause and effect, and the calibration is producing more pep talks and more kicks without producing any change in the underlying performance.

The cost isn't just demoralization, though that's substantial. The cost is that leadership attention is being consumed by reacting to natural fluctuations rather than working on the systemic changes that would actually shift performance. A team member at a lab Mark coached put it most clearly: "When we have a good day they say way to go, and when we have a bad day they say don't worry, it's the system. Isn't it always the system?"

It almost always is.
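The praise-and-criticism illusion can be simulated. This sketch assumes a stable process whose daily output is independent and normally distributed around a fixed center (38 units with a standard deviation of 3, both hypothetical), and a leader who praises any day above that center and criticizes any day below it. Nothing the leader does affects the numbers, yet the pattern appears anyway.

```python
import random

def praise_and_criticism_outcomes(days=10_000, center=38.0, sd=3.0, seed=1):
    """Simulate a stable process with no real change and report:
    - the share of praised days followed by a worse day, and
    - the share of criticized days followed by a better day."""
    rng = random.Random(seed)
    output = [rng.gauss(center, sd) for _ in range(days)]
    praised = worse_after_praise = 0
    criticized = better_after_criticism = 0
    for today, tomorrow in zip(output, output[1:]):
        if today > center:                     # good day: leader praises
            praised += 1
            worse_after_praise += int(tomorrow < today)
        else:                                  # bad day: leader criticizes
            criticized += 1
            better_after_criticism += int(tomorrow > today)
    return worse_after_praise / praised, better_after_criticism / criticized

worse, better = praise_and_criticism_outcomes()
print(f"Praise followed by a worse day: {worse:.0%}")
print(f"Criticism followed by a better day: {better:.0%}")
```

Both shares come out near three in four. For independent, normally distributed data the expected share is 75% in each case, so a leader watching this pattern would "learn" that praise backfires and criticism works, even though neither has any effect.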

Control charts: separating signal from noise

The methodological core of the session is control charts. The premise comes directly from Wheeler: every dataset contains noise; some datasets also contain signals; before you can detect a signal, you have to filter out the noise. Control charts are the filter.

A control chart shows the data over time with three calculated reference lines:

The center line is the mean of the data during a baseline period.

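The upper control limit is the mean plus three standard deviations of the data's natural variation. For an individuals chart, the standard deviation is estimated from the average moving range between consecutive points divided by 1.128, so the upper limit works out to the mean plus 2.66 times the average moving range.

The lower control limit is the mean minus three of those estimated standard deviations, or the mean minus 2.66 times the average moving range.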

Data points that fall within the control limits represent the system performing as it normally does. Statistical theory predicts that future data points will almost always fall within these limits as long as the system itself doesn't change. Data points that fall outside the limits represent something genuinely different -- a signal that warrants investigation.

The specific control chart Mark uses throughout the session is the individuals control chart, sometimes called X-MR (X for the individual data points, MR for the moving range chart that accompanies it). The method is described in detail in Wheeler's book and is the most practical chart for the kinds of performance metrics most organizations track. The session's companion bonus video walks through the calculation step by step.

The mathematical detail is less important than the management implication. With control limits drawn, the leader has a defensible answer to the daily question of whether to react. If today's data point falls within the limits, the variation is part of the system. The right response is not to investigate that data point. If today's data point falls outside the limits, something has changed, and investigation is warranted.
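The calculation behind an individuals (XmR) chart fits in a few lines. This sketch uses the scaling constants commonly published for XmR charts (2.66 for the individuals limits, 3.268 for the moving range limit) and a hypothetical baseline of daily production counts; the function names are illustrative, not part of any KaiNexus or spreadsheet API.

```python
from statistics import mean

def xmr_limits(baseline):
    """Natural process limits for an individuals (XmR) chart.

    2.66 = 3 / 1.128, i.e. three sigma with sigma estimated from the
    average two-point moving range; 3.268 is the upper limit factor for
    the moving range chart itself.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline points")
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    avg_mr = mean(moving_ranges)
    return {
        "center": center,
        "lcl": center - 2.66 * avg_mr,
        "ucl": center + 2.66 * avg_mr,
        "mr_ucl": 3.268 * avg_mr,
    }

def is_signal(value, limits):
    """True only when a new point falls outside the natural process limits."""
    return value < limits["lcl"] or value > limits["ucl"]

# Hypothetical daily production counts for a baseline period.
baseline = [36, 39, 35, 41, 37, 38, 34, 40]
limits = xmr_limits(baseline)
for today in (42, 35, 50):
    print(today, "investigate" if is_signal(today, limits) else "routine variation")
```

With this made-up baseline the center line is 37.5 and the limits are about 26.9 and 48.1, so a day at the goal of 42 and a dip to 35 are both routine variation; only the 50 would call for investigation.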

The Western Electric rules

Single points outside the control limits aren't the only signals worth investigating. The Western Electric rules name additional patterns that are statistically unlikely in a stable system and therefore probably signals.

The most useful rules:

A single data point outside the three-sigma control limits is a signal. Investigate.

Eight consecutive data points above the mean (or eight below the mean) is a signal. The probability of this happening by chance in a stable system is very low. Something has shifted.

Six consecutive points trending upward (or six trending downward) is a signal. Something is producing directional change.

Fourteen consecutive points alternating up and down is a signal. Someone or something is over-adjusting the process and producing artificial oscillation.
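The four rules above can be checked mechanically. This is a minimal sketch that takes the center line and limits as inputs (computed elsewhere, for example from an XmR baseline) and reports the index of the point that completes each pattern; overlapping windows mean a long run is reported more than once. Other published rule sets differ in the details, so treat the window lengths as the ones described in this session rather than a universal standard.

```python
def detect_signals(values, center, lcl, ucl):
    """Return (rule, index) pairs for the four patterns described above."""
    signals = []
    diffs = [b - a for a, b in zip(values, values[1:])]   # diffs[j] = values[j+1] - values[j]

    for i, v in enumerate(values):
        # Rule 1: a single point outside the three-sigma limits.
        if v < lcl or v > ucl:
            signals.append(("outside limits", i))
        # Rule 2: eight consecutive points on the same side of the center line.
        if i >= 7:
            window = values[i - 7 : i + 1]
            if all(x > center for x in window) or all(x < center for x in window):
                signals.append(("eight on one side", i))
        # Rule 3: six consecutive points steadily rising or falling (five same-sign steps).
        if i >= 5:
            steps = diffs[i - 5 : i]
            if all(d > 0 for d in steps) or all(d < 0 for d in steps):
                signals.append(("six trending", i))
        # Rule 4: fourteen consecutive points alternating up and down (13 steps).
        if i >= 13:
            steps = diffs[i - 13 : i]
            if all(s * t < 0 for s, t in zip(steps, steps[1:])):
                signals.append(("fourteen alternating", i))
    return signals

print(detect_signals([11, 12, 11, 12, 11, 12, 11, 12], center=10, lcl=4, ucl=16))
# [('eight on one side', 7)]
```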

The trade-off underlying these rules is between false positives and missed signals. The three-sigma limits are calibrated to produce roughly one false positive in every 400 data points -- conservative enough that practitioners can trust signals when they appear. The rules together let leaders detect meaningful change without chasing the everyday noise that consumes attention in red-green dashboard environments.
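The arithmetic behind these false-alarm rates can be checked directly. Under the textbook assumption of independent, normally distributed data (an assumption, not something the session states), the chance of a point landing beyond three sigma works out to about 1 in 370, in line with the rough one-in-400 figure above; and for any particular eight consecutive points, the chance that all fall on the same side of the center line is 1 in 128.

```python
from math import erfc, sqrt

# Two-sided tail probability beyond three sigma for a normal distribution:
# P(|Z| > 3) = erfc(3 / sqrt(2)).
p_beyond_three_sigma = erfc(3 / sqrt(2))

# Any particular eight consecutive points all above the center line, or all
# below it, assuming each point is equally likely to fall on either side.
p_eight_same_side = 2 * 0.5 ** 8

print(f"Beyond three sigma: {p_beyond_three_sigma:.2%} (about 1 in {1 / p_beyond_three_sigma:.0f})")
print(f"Eight on one side: {p_eight_same_side:.2%} (1 in {1 / p_eight_same_side:.0f})")
```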

Testing improvement hypotheses

The session's other major use of control charts is testing whether a specific change actually improved performance.

Mark walks through a hospital lab turnaround time example. Baseline control limits sit between five and 32 minutes. The team makes a change to the process. The first data point after the change is higher than the baseline -- but inside the control limits, so it's not a signal. A linear-trend-line mindset would conclude the intervention failed.

Over the next several days, the data points trend downward. Eventually, there's a run of more than eight consecutive points below the baseline mean. By the Western Electric rules, that's a signal. The intervention worked. A new baseline can be calculated with a lower mean and (in this case) narrower control limits.

The discipline matters. Without control charts, leaders judge interventions on the basis of the first one or two data points after the change. The judgment is unreliable. Some real improvements look like failures in the first days. Some apparent improvements are just noise that will regress to baseline. Waiting for a statistically meaningful run of evidence -- the run-of-eight or some other Western Electric rule -- produces judgments that hold up over time.
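The before-and-after test can be sketched the same way. This example uses hypothetical turnaround times in minutes (not the lab's actual data), looks for the first run of eight consecutive post-change points below the baseline mean, and, once that signal appears, recalculates limits starting from the first point of the run. Recalculating from the start of the run is one reasonable convention, assumed here.

```python
from statistics import mean

RUN_LENGTH = 8

def natural_limits(data):
    """Center line and XmR limits: center +/- 2.66 x average moving range."""
    center = mean(data)
    avg_mr = mean(abs(b - a) for a, b in zip(data, data[1:]))
    return center, center - 2.66 * avg_mr, center + 2.66 * avg_mr

def run_below(values, center, length=RUN_LENGTH):
    """Index of the point completing the first run of `length` consecutive
    values below `center`, or None if there is no such run."""
    streak = 0
    for i, v in enumerate(values):
        streak = streak + 1 if v < center else 0
        if streak == length:
            return i
    return None

# Hypothetical turnaround times (minutes) before and after a process change.
baseline = [20, 14, 25, 17, 12, 22, 19, 16, 24, 18]
after = [24, 17, 15, 13, 16, 14, 12, 15, 13, 14, 12, 13]

base_center, base_lcl, base_ucl = natural_limits(baseline)
signal_at = run_below(after, base_center)
if signal_at is None:
    print("No signal yet: keep collecting data before judging the change.")
else:
    shift_start = signal_at - RUN_LENGTH + 1
    new_center, new_lcl, new_ucl = natural_limits(after[shift_start:])
    print(f"Baseline: {base_center:.1f} ({base_lcl:.1f} to {base_ucl:.1f})")
    print(f"New baseline: {new_center:.1f} ({new_lcl:.1f} to {new_ucl:.1f})")
```

With these made-up numbers, the first post-change point (24 minutes) is worse than the baseline mean of 18.7 but well inside the limits, the run completes at the ninth post-change point, and the new baseline has a lower center line (14.0) and narrower limits, echoing the pattern described in the session.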

The goal line problem

The session closes with the question of how to think about goals when you're using control charts.

Goals matter. The example Mark uses is door-to-balloon time for heart attack patients arriving at an emergency department. The 45-minute goal is grounded in clinical evidence about patient outcomes. Hitting that goal isn't a management aspiration. It's a survival factor.

The problem isn't goals themselves. It's the relationship between the goal and the system's current capability. Two distinct situations call for distinct responses:

The system is hitting its goal on average. For a lower-is-better measure like door-to-balloon time, both control limits sit below the goal, so nearly every data point meets it. The system is capable. The improvement question is whether to tighten the variation, set a more demanding goal, or focus attention elsewhere.

The system is not hitting its goal on average. The control limits sit above or straddle the goal. Many or most data points exceed the goal even though the system itself is stable. The team is doing their best within the system, and the system can't reliably meet the target. The improvement question is what about the system needs to change -- physical layout, batching, staffing, technology, training -- not why the team failed to hit the goal on any particular day.
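The two situations can be told apart mechanically once a stable system's limits are known. This hypothetical helper compares a goal with the natural process limits, handles both lower-is-better measures (like door-to-balloon time) and higher-is-better ones, and separates out the in-between case in which the limits straddle the goal; the numbers in the usage example are invented.

```python
def goal_position(lcl, ucl, goal, lower_is_better=True):
    """Classify where a goal sits relative to a stable system's natural limits."""
    def meets(value):
        return value <= goal if lower_is_better else value >= goal

    # For lower-is-better measures the LCL is the best routine result and the
    # UCL the worst; for higher-is-better measures the roles are reversed.
    best, worst = (lcl, ucl) if lower_is_better else (ucl, lcl)
    if meets(worst):
        return "capable"        # even the worst routine result meets the goal
    if not meets(best):
        return "not capable"    # even the best routine result misses the goal
    return "straddles"          # routine variation alone produces both hits and misses

# Hypothetical door-to-balloon limits in minutes against a 45-minute goal.
print(goal_position(lcl=28, ucl=44, goal=45))   # capable
print(goal_position(lcl=30, ucl=70, goal=45))   # straddles
# Hypothetical higher-is-better measure whose whole range sits below its goal.
print(goal_position(lcl=60, ucl=80, goal=85, lower_is_better=False))  # not capable
```

In the "straddles" and "not capable" cases the session's advice applies: change the system or recalibrate the goal rather than pressuring people over day-to-day misses.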

Mark uses a chart from an organization that had set a goal nearly equal to the system's average. Roughly half their data points were red (below goal) and half were green (at or above goal). The team was being praised and criticized in a roughly random pattern that had nothing to do with their actual performance. The system was stable. The goal was poorly calibrated relative to the system's capability. Either improve the system or recalibrate the goal -- but stop holding people accountable for natural variation around the system's mean.

A second example from the same organization: a process whose entire data range sat below the goal line, every data point shown in red. The team had quietly added yellow to the chart "because it was getting demoralizing that we were always in the red." The added color didn't change anything about the system. The fundamental issue was that the goal had been set without regard to what the system was actually capable of producing.

The lesson generalizes. Goals are useful when they're tied to outcomes that matter and calibrated against systems capable of achieving them. They're harmful when they generate constant red on dashboards for processes that need systemic change rather than performance pressure on the people running them.

Common cause and special cause

The framing that ties the session together comes from Deming's distinction between common cause and special cause variation.

Common cause variation is the routine fluctuation produced by a stable system. It can't be explained by any specific event. The variation is the system doing what the system does. The right response is either to leave it alone (if performance is acceptable) or to change the system (if it isn't). Asking what went wrong on a particular day with common cause variation is asking the wrong question -- there's no useful answer.

Special cause variation is variation that exceeds what the system typically produces. Something has changed. The right response is to investigate that specific event and either fix what caused a degradation or capture what caused an improvement.

The two types of variation require completely different management responses. Confusing them is one of the most costly mistakes in performance management. Treating common cause variation as special cause produces wild goose chases, demoralization, and inadvertent process tampering. Treating special cause variation as common cause means missing the signals that would let the organization learn from what changed.

Mark's framing on what to do when control charts show common cause variation that isn't meeting the goal: don't pressure people to try harder within the existing system. Study the system, form a hypothesis about what to change, test the hypothesis with a structured improvement effort, and use the control chart to verify whether the change actually shifted performance. The methodology is the same one that underlies A3 thinking and DMAIC -- root cause analysis, countermeasure design, evaluation against measurable targets -- with control charts providing the statistical infrastructure for the evaluation step.

How KaiNexus connects

KaiNexus customers track performance measures within the platform alongside their improvement work. The same statistical methods Mark describes apply whether the data lives on paper, in Excel, in statistical software, or in the platform.

The advantage of platform-based metric tracking is the connection between the metric and the improvement work intended to change it. A control chart on a paper huddle board can show that a system is performing as expected. It can't show what the team is doing about it. A control chart connected to active improvement projects shows both -- the current state of the system and the work being done to change it. That connection is what makes the methodology operational rather than purely analytical.

Several specific platform capabilities support the methods in this session:

Performance metrics can be tracked within the system and displayed in time-series formats that support run chart and control chart interpretation. Teams can update metrics in the context of huddles, A3 work, or daily management practices rather than as a separate reporting exercise.

The connection between metrics and improvement work means that when a control chart shows a stable system that isn't meeting its goal, the platform supports launching the structured improvement work the situation actually calls for -- A3 projects, kaizen events, or PDCA cycles aimed at the systemic changes that would shift the mean.

When an improvement project completes and the team is testing whether the change produced the expected shift, the platform holds both the before-state metrics and the after-state metrics in a place where the evaluation can use the kind of run-of-eight analysis Mark describes. The discipline of distinguishing real shifts from noise becomes part of how improvements get closed out, rather than an afterthought.

For organizations practicing daily management with tiered huddles, the platform supports the visibility of metrics across multiple levels of the organization. Frontline teams see their own metrics. Department leaders see aggregated metrics across their teams. Senior leaders see organization-wide performance. The same statistical discipline applies at every level: react to signals, not to noise; investigate special causes, work on the system to address common cause.

See KaiNexus in action →

Frequently Asked Questions

What is statistical process control?

A method for distinguishing routine variation in a process from genuinely meaningful changes. Developed by Walter Shewhart at Bell Labs in the 1920s and extended by W. Edwards Deming and Donald Wheeler, SPC uses control charts to visualize data over time alongside statistically calculated control limits. Data points within the limits represent the system performing as it normally does. Data points outside the limits represent signals worth investigating. The method gives leaders a defensible way to decide when to react to a number and when to leave it alone.

What is a control chart?

A time-series chart showing data points over time along with three calculated reference lines: the mean of the baseline data, an upper control limit at three standard deviations above the mean, and a lower control limit at three standard deviations below the mean. The standard deviation is estimated from the average moving range between consecutive data points rather than calculated directly from the data. The individuals control chart (sometimes called X-MR) is the most common variant for the kinds of performance metrics most organizations track. The chart is also called a process behavior chart, the term Donald Wheeler prefers because it describes what the chart actually shows.

Why are linear trend lines misleading?

Linear regression is sensitive to the endpoints of the data range. Small changes at the beginning or end of the time period being analyzed can flip the slope of the trend line. Mark demonstrates this in the session with patient satisfaction data: the same dataset shows an upward trend with the original endpoints and a downward trend when the first and last points are removed. The trend line gives an unambiguous-looking visualization of a relationship that may not actually exist in the data. Control charts avoid this failure mode by analyzing the data against statistically calculated reference lines rather than fitting a line to whatever range happened to be selected.

What's the difference between common cause and special cause variation?

Common cause variation is the routine fluctuation produced by a stable system. It can't be explained by any specific event because no specific event caused it -- the system is just doing what it does. Special cause variation is variation that exceeds what the system typically produces, indicating that something has changed. The two types require completely different management responses. Treating common cause as special cause produces wild goose chases and demoralization. Treating special cause as common cause means missing the signals that would let the organization learn from what changed.

What are the Western Electric rules?

A set of rules for identifying signals in a control chart beyond the basic rule of single points outside the control limits. The most useful are: a single point outside the three-sigma control limits, eight consecutive points above or below the mean, six consecutive points trending in one direction, and fourteen consecutive points alternating up and down. The rules are named for the Western Electric Company, which formalized them in its statistical quality control work in the mid-twentieth century; strictly, the trend and alternating-points rules come from Lloyd Nelson's later extension of that rule set, though the two are often taught together. Each rule identifies a pattern that is statistically unlikely in a stable system and therefore probably signals that something has changed.

What is the difference between hitting a goal and actually improving?

They're separate questions. An organization can be improving but not yet hitting its goal (the trajectory is right, the destination hasn't been reached). An organization can be exceeding its goal while its underlying performance is stable or eroding (the goal was set lower than the system's capability, or recent good results are masking a real decline). Red-green dashboards collapse both questions into the single binary of goal achievement, which loses important information. Control charts answer the improvement question by distinguishing signals from noise. Goal achievement is a separate question that should be evaluated in light of whether the system is capable of the goal.

What should I do if my system is stable but not hitting its goal?

Don't pressure people to try harder within the existing system. Stable performance that doesn't meet the goal is a sign that the system itself needs to change. Investigate the system. Do current-state analysis through process mapping or value stream mapping. Form a hypothesis about what part of the system to change. Test the change using a structured improvement effort -- A3 thinking, kaizen events, or PDCA cycles -- and evaluate the result with a control chart. The combination of asking what about the system needs to change (rather than what about the people needs to change) and verifying real improvements through statistical analysis is what distinguishes effective Lean management from performance pressure.

Do I need Six Sigma certification to use control charts?

No. Mark notes in the session that he is not a Six Sigma Black Belt and that the methods in Donald Wheeler's Understanding Variation are accessible to anyone willing to learn arithmetic-level math. Control limits can be calculated in Excel or by hand. Many of the production associates at General Motors twenty-plus years ago were using control charts on the manufacturing floor without statistical training, and the same approach is available to anyone tracking performance metrics today. The bonus video accompanying this session walks through the calculation step by step.

How do I track 5S progress with metrics?

Some organizations track 5S audit scores directly and put them in control charts. That works as a way of monitoring whether the 5S program itself is stable or improving. The deeper question is whether 5S is producing the operational outcomes it's supposed to enable -- safety improvements, quality improvements, on-time delivery, turnaround times. Companies don't directly make money because their 5S is better. They make money because the operational metrics 5S was supposed to enable have improved. Tracking the downstream metrics using control charts lets you test the hypothesis that 5S work is actually changing the operational outcomes that matter.

What's the relationship between control charts and confidence intervals?

Both are statistical methods for distinguishing meaningful differences from random variation. Confidence intervals are typically used in significance testing -- comparing two specific samples or testing a specific hypothesis. Control charts are continuous monitoring tools that flag when a process has changed without requiring you to specify in advance what change you're looking for. Mark notes that both can be useful in different contexts. For ongoing performance management and improvement work, the control chart approach is generally more practical because it surfaces changes as they happen rather than requiring a structured statistical test for every question.

What book do you recommend for going deeper?

Mark recommends Donald Wheeler's Understanding Variation: The Key to Managing Chaos. The book is widely cited as one of the most accessible practical treatments of statistical thinking for managers. Wheeler holds a PhD in statistics but writes for a management audience without assuming statistical background. Mark also notes his own book, Measures of Success: React Less, Lead Better, Improve More, which expands the material in this session into a full-length treatment with extensive case studies.
