Process Behavior Charts to Improve Performance

Featuring Mark Graban, Senior Advisor to KaiNexus and author of "Measures of Success." Hosted by Greg Jacobson from KaiNexus.

Reminder: Here is a blog post with a previous webinar and some "how to" detail. Please check that out if you're unfamiliar with Process Behavior Charts before watching this new webinar.

See the Slides:

How to Use Process Behavior Charts to Improve: Case Studies v2 from KaiNexus

Extra bonus material (cut for time)

Videos coming soon... see Mark's blog posts about using Process Behavior Charts for snapshot comparisons.

How to Use Process Behavior Charts to Improve: Case Studies (BONUS MATERIAL) from KaiNexus

Is it a signal?

Mark Graban opened with a question disguised as a game show. The session's running format was "Is It a Signal?", a question he put to headlines, news charts, workplace metrics, and one hospital story whose source he declined to name. It is the question most organizations don't ask before they react: did this number actually change in a meaningful way, or is it just the system doing what the system has always done?

The premise sounds technical. The consequences are operational. When organizations don't filter signal from noise, leaders spend their time asking analysts to explain fluctuations that don't have explanations, improvement teams spend their time writing fictional root causes for events that didn't really have causes, and the work of actually improving the system gets crowded out by the work of generating monthly metric narratives. Mark's framing throughout the session: when leaders react less to noise, they lead better, and when they lead better, they improve more.

The session followed an earlier KaiNexus webinar Mark had presented on Halloween 2018 ("Metrics and Statistics Don't Have to Be Scary") and a separate bonus session on how to build a Process Behavior Chart step by step. This session went heavier on case studies than methodology — running through more than a dozen examples from public news data, organizational metrics, and one anonymized hospital — to show what the practice looks like in working use. The examples are the point. The underlying methodology (Donald Wheeler's Process Behavior Chart framework, also known as XmR charts or control charts) was presented quickly so the examples could carry the weight.

Greg Jacobson, KaiNexus CEO and co-founder, hosted and added one of the session's most useful pieces of leadership reflection at the end — a story about how Mark's approach had changed his own behavior with the marketing director and what that change meant for the team.

About the presenter

Mark Graban is Senior Advisor to KaiNexus and has been with the company in various capacities since 2011. He is the author of "Measures of Success: React Less, Lead Better, Improve More," among other books, and the foreword to that book was written by Donald Wheeler — the statistician whose work on variation forms the technical foundation of the methodology Mark teaches. Mark holds a Bachelor of Science in Industrial Engineering from Northwestern University and a Master's in Mechanical Engineering and an MBA from MIT.

Mark's work focuses on the application of Lean management, continuous improvement, and statistical thinking in healthcare, manufacturing, and other complex industries. He hosts the Lean Blog Interviews podcast and writes regularly on related topics.

A quick refresher on what a Process Behavior Chart actually does

Mark opened with the methodology only briefly because he'd covered it in depth in earlier webinars. The shape is straightforward.

A Process Behavior Chart is a line chart — data points connected over time — with a calculated average line and calculated upper and lower natural process limits. The limits aren't goals. They aren't targets. They're descriptions of what the system has been doing, calculated from the inherent variation in the baseline data. The limits answer the question: given how this system has been performing, what range of values can we expect it to produce going forward?
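The session did not dwell on the arithmetic, so the following is a minimal sketch of how the average and natural process limits are commonly calculated for an XmR chart. It assumes the standard scaling constants (2.66 for the limits, 3.268 for the upper limit of the moving range chart). The function name `xmr_limits` and the sample data are hypothetical and are not taken from the webinar.

```python
from statistics import mean


def xmr_limits(baseline):
    """Return the average and natural process limits for an XmR chart."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline points")
    # Moving ranges: absolute difference between each pair of consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    average = mean(baseline)
    mr_bar = mean(moving_ranges)
    return {
        "average": average,
        "mr_bar": mr_bar,
        # 2.66 and 3.268 are the standard XmR scaling constants.
        "lower": average - 2.66 * mr_bar,
        "upper": average + 2.66 * mr_bar,
        "mr_upper": 3.268 * mr_bar,
    }


# Made-up illustrative data, not from the webinar.
example = [10, 12, 9, 11, 13, 10, 12, 11]
limits = xmr_limits(example)
print(limits)
```

For metrics that cannot go negative, such as counts, a calculated lower limit below zero is usually treated as zero.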

Three signal rules tell you when something has changed. Any data point outside the upper or lower limits. Eight or more consecutive data points above or below the average. Three out of four consecutive data points closer to one of the limits than to the average. When any of those patterns appear, the system has likely changed and the moment is worth investigating. The rest of the time, the fluctuations are noise — routine variation that any stable system will produce.
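The three rules can be sketched as a small function. This is one possible reading of the rules as described above, not code shown in the session. The name `find_signals` is hypothetical, and each reported index marks the point that completes a pattern.

```python
def find_signals(values, average, lower, upper):
    """Return {rule_name: [indices]} for the three signal rules."""
    signals = {"rule1": [], "rule2": [], "rule3": []}

    # Rule 1: a point outside either natural process limit.
    for i, v in enumerate(values):
        if v > upper or v < lower:
            signals["rule1"].append(i)

    # Rule 2: eight or more consecutive points on the same side of the average.
    run_side, run_len = 0, 0
    for i, v in enumerate(values):
        side = (v > average) - (v < average)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == run_side:
            run_len += 1
        else:
            run_side, run_len = side, (1 if side != 0 else 0)
        if run_len >= 8:
            signals["rule2"].append(i)

    # Rule 3: three of four consecutive points closer to the same limit
    # than to the average (beyond the midpoint between average and limit).
    upper_mid = (average + upper) / 2
    lower_mid = (average + lower) / 2
    for i in range(3, len(values)):
        window = values[i - 3 : i + 1]
        near_upper = sum(v > upper_mid for v in window)
        near_lower = sum(v < lower_mid for v in window)
        if near_upper >= 3 or near_lower >= 3:
            signals["rule3"].append(i)

    return signals
```

A call such as `find_signals([10, 11, 9, 17, 10], average=10, lower=4, upper=16)` flags only the fourth point, under Rule 1.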

The methodological point Mark returned to throughout the session: the chart tells you whether something changed. It doesn't tell you what changed. That's still the practitioner's work. Go to gemba, talk to people, examine the system, identify the cause. The chart shifts the question from "why did this happen?" — asked of every data point indiscriminately — to "what changed in our system at the moments when the chart shows something actually changed?" The narrower question is the one that produces useful answers.

White truffles and the headline industry

The first round of "Is It a Signal?" came from the Wall Street Journal. The headline: foodies rejoice, white truffle prices have sunk to lows not seen in more than a decade, less than half of last year's average.

The article delivered a column chart of annual prices going back to 2006. The numbers fluctuated substantially year to year. Mark plotted the same data as a Process Behavior Chart. None of the points were outside the calculated limits. The mR chart showed no signal either. The dramatic headline described what was, statistically, routine year-to-year variation in a commodity price.

The lesson generalizes. News headlines are not statistical analysis. A 50% drop from one year to the next sounds enormous, but in a commodity whose normal range spans roughly that much variation, the change is what the system has always done. Foodies can rejoice if they want to, but next year's price is most likely going to fall somewhere within the established range unless something fundamentally changes about worldwide truffle supply or demand.

Mark used the example to make a structural point: news media has different incentives than organizational metrics. The job of a headline is to attract attention. The job of an organizational metric is to support better decisions. When organizations import news-media patterns into their own metric reviews — treating every fluctuation as a story, every percentage change as meaningful — they get news-media performance: a lot of words about not very much.

A question from Sergei (an attendee from Russia) made the same point during the Q&A. News, Sergei noted, has a goal of making noise look like signal so people read more. The same pattern, when imported into organizational dashboards, becomes a problem rather than a strategy.

Webinar registrations and the Jess Orr signal

Mark used KaiNexus's own webinar registration data as the next example. The session he was presenting at had 352 registrations — the second-most of any webinar that year. By the standard descriptive language, that was a strong number. Above average. Up 26.6% from the previous month. Third consecutive webinar above average.

The Process Behavior Chart, built in KaiNexus's own software using webinar registration data going back to 2014, told a different story. The system's average was roughly 260 registrations. The lower limit was essentially zero (registrations can't go negative). The upper limit was a little over 500. The 352-registration session was above average but well within the routine range.

There had been a genuine signal in the data, though. A previous webinar with Jess Orr had drawn more than 700 registrations — clearly above the upper natural process limit. That data point was unusual enough to warrant the question of what changed. Mark's team had analyzed it (the LinkedIn promotion played a role) but hadn't fully cracked what made that particular session attract such an outsize audience.

The contrast between the 700-registration outlier and the 352-registration "above average" session is exactly the kind of distinction Process Behavior Charts make visible. Both are above average. Only one is a signal. Treating them the same way — celebrating each, asking each to be explained — would waste the energy that should be reserved for understanding the genuine outlier.

Book sales: the daily metric, the spike, the shift, and the PR firm

The session's longest worked example was Mark's own daily book sales data for "Measures of Success." He presented it with deliberate self-awareness: a book about Process Behavior Charts could hardly go without a Process Behavior Chart of its own sales.

The chart showed a near-signal on August 4th, the official ebook launch date. That spike had an obvious explanation. Then the data fluctuated within an established range — between roughly zero and 18 books per day, around an average that Mark might not love but that was at least predictable. Then two consecutive data points appeared above the upper limit. Two days of unusually high sales.

That was a signal. The chart told Mark something had changed. The chart didn't tell him what. He investigated. The cause turned out to be a single person at a company who had read the book and recommended it to a large internal email list. The spike wasn't a sustained shift; it was an isolated event tied to a one-time recommendation. The chart had correctly identified the moment something changed, and the investigation had correctly identified the cause as a transient external event rather than a durable improvement.

The chart kept telling stories. Starting around September 4th, eight consecutive data points appeared below the previous average. That was a different kind of signal — a sustained shift downward in the underlying system. Mark recalculated to reflect the new lower average and the new upper limit. The chart now reflected what the system was actually doing rather than what it had previously been doing.
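Recalculating after a sustained shift amounts to computing a separate average and limits for each stretch of data. The sketch below assumes the standard XmR constant of 2.66 and uses made-up daily counts, not Mark's actual sales figures. The function names and the `shift_start` index are hypothetical.

```python
from statistics import mean


def limits(points):
    """Average, lower limit, and upper limit using the XmR constant 2.66."""
    moving_ranges = [abs(b - a) for a, b in zip(points, points[1:])]
    avg = mean(points)
    spread = 2.66 * mean(moving_ranges)
    return avg, avg - spread, avg + spread


def segmented_limits(values, shift_start):
    """Compute separate limits before and after the index where the run began."""
    return {
        "before": limits(values[:shift_start]),
        "after": limits(values[shift_start:]),
    }


# Made-up daily counts: a stable stretch, then eight consecutive lower points.
daily = [9, 11, 8, 10, 12, 9, 11, 10, 6, 5, 7, 6, 4, 6, 5, 7]
result = segmented_limits(daily, shift_start=8)
print(result)
```

In this made-up series all eight later points fall below the earlier average of 10, which is the Rule 2 pattern that would justify splitting the chart at that point.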

Mark made a change in October. He hired a PR firm. The hypothesis: PR work should eventually move book sales upward. The hypothesis included an important honest qualification — PR isn't a magic sales booster, and the effect of working with a PR firm wouldn't show up immediately. There's lag time. Articles take time to publish. Interviews take time to schedule and air. Mark wouldn't expect the chart to show a Rule 1 signal the day after he signed the contract. He'd expect to wait, gather data, and see whether the average shifted over time.

The weekly chart told essentially the same story with smoother variation. The five most recent data points were below the new average, which raised a question Mark couldn't yet answer: was this the start of a new shift, or was it the routine noise of a system that occasionally produces runs of below-average results? "I guess in three more weeks I might know the answer to that" was the honest framing. The chart's discipline is partly about being comfortable with not knowing yet.

Voter turnout and the danger of short baselines

The 2018 US midterm election turnout produced a wave of headlines. Voter turnout soared. Highest level in more than a century. The level was undeniably higher than recent comparable elections. The question Mark wanted to ask: was it actually a statistical signal, or was it a headline-friendly description of a fluctuation?

The Vox.com chart showed midterm voter turnout going back to 1912. The visual told a richer story than any headline could — turnout had fluctuated substantially over the decades, with a noticeable drop around World War I, another around World War II, and a sustained lower band from the 1970s onward.

Mark pulled additional historical data going back to 1790. The longer view changed the picture again. Earlier in American history, midterm voter turnout had been in the 60-70% range. The 2018 turnout of nearly 50%, described in headlines as the highest in more than a century, was in fact considerably lower than the 19th-century norm.

The lesson Mark drew: be careful about charts that only show recent data. The conclusions you draw from a 5-year window can be substantially different from the conclusions you'd draw from a 20-year window, which can be different again from the conclusions a 200-year window would support. Headlines that frame current performance as "highest in X years" implicitly choose the X. Choosing a different X produces a different headline. The careful version of this is to acknowledge what time horizon you're working with and what would change if you extended it.

Using 1974 to 2014 as the baseline — covering the post-Watergate stable era — Mark calculated limits of roughly 34% to 45%. The 2018 turnout of nearly 50% fell above that upper limit. It was a Rule 1 signal. Something had changed in the political system. The chart didn't tell Mark what. That was a conversation for after the webinar.

Winnipeg ER wait times and the headline cycle

A friend of Mark's in Winnipeg, Manitoba, sends him articles about regional emergency room wait times roughly every month. The headlines describe wait times as getting better, getting worse, holding steady, dropping by some percentage, rising by some other percentage — sometimes in the same week. The story changes month to month even though the underlying system is presumably the same system it was the previous month.

Mark cobbled together data from the various articles to build a run chart. Then he added the calculated average and limits. The chart showed routine fluctuation around a stable average. The headlines describing dramatic month-to-month changes were describing noise.

The Winnipeg Regional Health Authority had implemented a change in October 2017 intended to reduce waiting times. The chart showed that the change didn't appear to have moved the system. The data after October 2017 was fluctuating in the same range as before. Three consecutive data points near the upper limit in January, February, and March looked like a possible Rule 3 signal, but the seasonality of winter ER demand made the interpretation ambiguous.

The substantive point: the headlines, by being noisy themselves, made it harder for readers to know whether the system was actually changing. A reader who tracked the headlines would have a constantly shifting mental model of the ER's performance. A reader who tracked the chart would see what was actually happening — which, in this case, was a stable system that hadn't responded to the attempted countermeasure.

Mark also looked at the broader provincial data going back to 2013. The qualitative pattern was the same. Some February seasonality. Otherwise, routine variation around a stable average. Without a substantial change to the system, the wait times were going to keep fluctuating in their established range.

US traffic fatalities and the NHTSA report format

The session moved to US traffic fatality data, a topic Mark acknowledged was grim and one closely tied to the emergency rooms of the previous example. The headlines followed the same pattern: lowest in 27 years, continues to climb, rate has stabilized, trending higher, nearing last year's level. Pick a story, find a headline.

The National Highway Traffic Safety Administration's reports were instructive in a different way. NHTSA published a chart showing the percentage change year over year. The chart required mental gymnastics to interpret because the reader had to translate the percentage changes back into the underlying numbers to understand what was actually happening.

Mark's preference: just plot the actual numbers. The quarterly numbers as a run chart and Process Behavior Chart showed routine fluctuation with a possible pattern of Q1 being slightly lower than the other quarters — surprising in light of winter weather but visible in the data. The annual fatalities showed something more interesting. 2007 stood out as a relatively low year. The numbers had increased from 2007 to 2008, which raised a question worth examining: what was different about 2007 versus 2008? Smartphones and distracted driving had been increasing as factors over that exact period.

The chart raised the question. The investigation would have to answer it. Mark didn't try to settle the substantive question in the webinar. The point was that the chart made the relevant question askable in a way the NHTSA percentage-change chart didn't.

NHTSA officials, to their credit, had been honest in their commentary. Heidi King had said there was no single reason for the decline. Mark called that a perfect way of describing noise. When there isn't a signal in the data, there isn't a single cause to identify. The right response is to acknowledge that and continue working on the underlying system rather than constructing fictional explanations.

Pedestrian fatalities and the 26% / 19% trap

Mark pulled one more example from NHTSA's data: pedestrian fatalities. The agency's website reported that 26% of pedestrian fatalities occur between 6:00 p.m. and 9:00 p.m. Sounds alarming. Mark did the math out loud. People sleep about 8 hours a day. They're out being pedestrians for roughly the other 16 hours. Three out of 16 hours is about 19% of waking hours. So 26% of fatalities occurring during 19% of waking hours isn't an obvious signal of disproportionate risk during that window. It might be entirely routine.

The example illustrated how easily descriptive statistics can mislead when they're not put in context. "26% of fatalities occur during 6-9 p.m." sounds like a clear safety insight. "26% of fatalities occur during 19% of waking hours" reframes the same number as either routine noise or a modest disproportionate risk, depending on what other factors are involved.
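The back-of-the-envelope arithmetic can be checked in a few lines. The inputs are the figures quoted in the talk. The variable names are illustrative, and the even-spread comparison is a simplification that ignores factors such as lighting and traffic volume.

```python
# Assumptions from the talk: about 8 hours asleep, so 16 waking hours per day.
waking_hours = 24 - 8
window_hours = 3              # 6:00 p.m. to 9:00 p.m.
share_of_fatalities = 0.26    # figure quoted from the NHTSA website

# Fraction of waking time covered by the three-hour window.
share_of_waking_time = window_hours / waking_hours

# How concentrated fatalities are compared with an even spread over waking hours.
relative_concentration = share_of_fatalities / share_of_waking_time

print(f"Share of waking hours: {share_of_waking_time:.1%}")
print(f"Fatalities relative to an even spread: {relative_concentration:.2f}x")
```

A ratio somewhat above 1 is consistent with the modest-risk reading, while the headline figure on its own suggests something far more dramatic.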

The Process Behavior Chart of annual pedestrian fatalities showed something more interesting. Earlier years were near or below a calculated lower limit. Recent years (2015, 2016, 2017) were above the calculated upper limit. The system appeared to have shifted upward. Drawing the chart as a system with a step shift around 2015 made the picture clearer — a stable lower average for many years, then a jump to a higher average. The chart didn't tell Mark why. The likely hypothesis (smartphones, distracted walking, distracted driving) was a hypothesis to be tested, not a conclusion the chart had reached.

Washington DC right turns on red: evaluating a countermeasure

The pedestrian safety thread led to a Washington Post article about Washington DC considering a ban on right turns on red — and the predictable controversy about whether it would actually save lives. Some experts argued the ban would improve safety. Other experts argued it might actually decrease safety by increasing risk in different ways.

Mark used the example to walk through how Process Behavior Charts help evaluate countermeasures, with hypothetical future data. If DC banned right turns on red at the end of 2017, and 2018 showed a data point well below the calculated lower limit, that would be a strong signal. The chart would say something had changed. Whether the change was caused by the right-turn ban specifically, by other safety improvements implemented at the same time, by a one-time fluctuation that happened to coincide with the policy change, or by some combination — that's the judgment call that requires understanding the system beyond what any chart can tell you.

Mark walked through alternative hypothetical scenarios. What if 2018 showed a below-average data point but not a Rule 1 signal? The single data point wouldn't be conclusive. Would the team give up on the countermeasure? Probably not — the data wasn't sufficient to make that call. What if the next several years showed a sustained shift to a lower average (a Rule 2 signal)? Then there'd be evidence of real improvement, though attribution to the right-turn ban specifically would still be a judgment about the system.

The framework matters because organizations face this question constantly. Did our improvement work? Did our PDSA cycle produce the intended effect? Without a way to distinguish signal from noise, organizations either claim victory prematurely (declaring impact from data that hasn't actually shifted) or abandon countermeasures prematurely (giving up before enough data has accumulated to evaluate them). Both errors are common. Both are expensive.

The hospital that claimed a big impact (but didn't)

Mark told one story whose source he was careful not to name. A presentation at a hospital. A team described a project. The narrative was definitive: we made a big impact on this metric, and then it ticked back up. The team had two data points before the project and two data points after.

Mark plotted the dots. The "big impact" data point was lower than the previous two. The "ticked back up" data point was within the same range as the originals. Four data points is the bare minimum for calculating a Process Behavior Chart, and the chart wouldn't be the world's most precise. But it was at least useful. The four-point chart suggested that the metric was fluctuating around an average and that neither the "big impact" nor the "uptick" was a signal.

Mark's framing of the implication: are we fooling ourselves in a way that's harmful? Are we declaring victory on improvements that didn't actually shift the system? The answer matters because the next move is different in each case. If the project really did produce a sustained shift, the work is to stabilize and spread it. If the project didn't actually shift the system, the work is to run more PDSA cycles, try different countermeasures, and keep iterating until something does shift.

The discipline of waiting for a signal before claiming impact is uncomfortable for organizations that need to report results. It's also the discipline that distinguishes real improvement from improvement theater. Mark wasn't suggesting the hospital team had been dishonest. He was suggesting they had probably been honestly misreading the data, which is more common than dishonesty and probably more damaging because it can persist for years without anyone noticing.

Lightning round and the goal-versus-signal distinction

The session's lightning round ran through several news examples quickly.

US companies expanding factories — a survey reported the largest share planning expansion in at least a decade. The Wall Street Journal's chart showed the last data point clearly above the established range. The Process Behavior Chart confirmed: a real signal. Something had changed in the underlying conditions that made companies want to expand. The chart didn't tell Mark what (tax policy, demand conditions, supply chain considerations, all of the above) but it confirmed the signal was real.

Holiday gift buying anticipated to be 18% higher than the previous year. The Wall Street Journal's chart had a misleading y-axis. The Process Behavior Chart showed routine year-to-year fluctuation. No signal. The 18% comparison sounded meaningful but didn't represent a statistical shift.

Tesla warranty costs falling sharply in Q3 — Mark used this example to discuss the gray area in baseline selection. Looking at six data points as the baseline, Q2 was higher than the others but not a Rule 1 signal. Looking at only the first four data points as the baseline, Q2 would be a signal. The practitioner has to make a judgment about what baseline reflects the relevant system. Mark wasn't advocating for choosing baselines to prove a predetermined point — he was acknowledging that the choice of baseline involves real judgment and isn't fully mechanical.
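The baseline sensitivity can be illustrated with a short sketch. The quarterly values below are hypothetical, chosen only to reproduce the pattern described, and are not Tesla's figures. The upper limit uses the standard XmR constant of 2.66, and `upper_limit` is an illustrative name.

```python
from statistics import mean


def upper_limit(baseline):
    """Upper natural process limit from the standard XmR formula."""
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    return mean(baseline) + 2.66 * mean(moving_ranges)


# Hypothetical quarterly values; the last point is the one in question.
quarterly = [20, 22, 19, 21, 20, 29]
candidate = quarterly[-1]

verdicts = {}
for n in (4, 6):
    limit = upper_limit(quarterly[:n])
    verdicts[n] = candidate > limit  # True means a Rule 1 signal on the high side
    print(f"baseline of {n} points: upper limit {limit:.1f}, signal: {verdicts[n]}")
```

With the four-point baseline the last value sits above the limit. With all six points in the baseline, the high value widens the limits enough to absorb itself. Neither answer is mechanically right, which is the judgment call described above.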

The final lightning round example was about an organization presenting on internal customer satisfaction. The team celebrated being above their goal for two consecutive quarters. The goal happened to sit at or near the upper natural process limit. The team was correct that something had changed (the data was showing a real upward shift), but the evidence for the shift wasn't that the number had crossed the goal line. The evidence was that the number had crossed the upper natural process limit. The distinction matters because in many organizations the goal is set arbitrarily relative to the system's actual capability. A goal that happens to be near the upper limit will be crossed occasionally by noise. A goal that's well above the upper limit will require genuine system improvement to reach. The two cases require different responses, and conflating them produces confused conversations about what the metric is actually saying.

Chunky data and the days-between trick

Mark covered one more methodological wrinkle that comes up regularly in healthcare and other low-event-count contexts: a central line-associated bloodstream infection chart with monthly counts of zero, one, or two infections. The data was "chunky": there wasn't enough range of variation in the monthly counts for a standard Process Behavior Chart to be useful. With an average around one and counts that vary from zero to two, almost everything looks like noise.

The alternative is to plot days between events rather than count of events per period. Every infection gets a data point. The metric becomes a duration rather than a count. The chart shows the days between consecutive infections fluctuating around an average, with the same signal rules applying.
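Converting event dates into intervals is a small transformation. The sketch below uses made-up dates, not the hospital's data, and the name `days_between` is hypothetical. The resulting intervals would then be charted like any other metric.

```python
from datetime import date


def days_between(event_dates):
    """Return the number of days between each pair of consecutive events."""
    ordered = sorted(event_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical infection dates for illustration only.
infections = [
    date(2018, 1, 5),
    date(2018, 1, 19),
    date(2018, 2, 2),
    date(2018, 4, 20),
]
intervals = days_between(infections)
print(intervals)
```

Each interval becomes one data point, so a long gap between events shows up as a high value rather than as a run of indistinguishable zero-count months.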

In Mark's example, the days-between chart showed two data points where the interval between infections was much longer than the routine pattern. Those were signals. Something had changed — either the system had genuinely improved (longer intervals between infections means lower infection rate) or something specific had prevented infections during those periods that wasn't sustainable. The chart raised the question. The investigation answered it.

The technique generalizes to any rare-event situation. Time between equipment failures. Time between safety incidents. Time between customer complaints. When the count per period is too low for the count-based chart to be useful, the interval-based chart often reveals what the count-based chart can't.

Greg Jacobson's reflection: how the practice changed his behavior

The session's most useful piece of leadership reflection came at the end, when Greg shared his own experience. Before Mark introduced Process Behavior Chart thinking at KaiNexus, Greg would routinely ask the marketing director — Maggie Millard at the time — why a particular month's number was higher or lower than the previous month's. Greg's framing in the moment was charitable: he wasn't blaming, he was asking what they could learn from the variation.

What Greg came to understand, over time and with Mark's coaching, was that the questions he was asking didn't have meaningful answers. The variation from month to month was mostly noise. Maggie would spend time trying to construct explanations for fluctuations that didn't actually have causes worth identifying. The time spent constructing those explanations was time not spent doing the marketing work that would actually move the underlying system.

Greg's framing of the shift: the questions had appeared on the surface to be a leader's job — engaged, curious, learning-oriented. In reality, they were diverting resources to producing answers that didn't add value. Once Greg understood the signal-versus-noise distinction, he stopped asking those questions, which freed Maggie to do work that actually shifted the metrics he cared about.

The lesson generalizes. Leaders who ask "why did this number change?" of every fluctuation aren't being curious. They're imposing a tax on their teams. The tax is paid in time, in attention, and in the slow erosion of the team's willingness to bring honest data forward (because honest data invites questions the team can't answer in any useful way). The leaders who learn to ask the question only when the chart shows something has actually changed get more useful answers and more honest data — and they leave their teams with the time and attention to do the work that produces real improvement.

How KaiNexus connects

The Process Behavior Chart methodology is independent of any particular software. A practitioner with a notebook and a calculator can build and interpret the charts the same way Mark did on screen. Mark mentioned in earlier sessions that when he started at General Motors in 1995, operators on the shop floor — many without high school degrees — were maintaining control charts by hand.

What infrastructure does is preserve the practice at scale across many metrics, many teams, and the time horizons that real improvement work operates over.

The volume problem is the first place infrastructure earns its keep. A modern organization tracks many metrics. A hospital tracking quality, safety, and operational measures across many units produces hundreds of charts that need to be built, kept current, and made available to the people who need to read them. The administrative burden of maintaining all of those charts in spreadsheets is substantial, and the burden grows as the organization grows. Infrastructure that holds the charts alongside the operational data, recalculates as appropriate, and makes the charts available to teams as a standard capability is what makes the practice sustainable rather than heroic. The example Mark showed of the KaiNexus webinar registration chart was generated inside KaiNexus itself — the platform building Process Behavior Charts on its own operational data.

The PDSA evaluation problem is the second. The session walked through several examples of using Process Behavior Charts to evaluate whether countermeasures had actually shifted the system — the Washington DC right-turn ban hypothetical, the hospital that claimed a "big impact," Mark's own PR firm experiment with book sales. In each case, the chart was the mechanism for distinguishing real improvement from improvement theater. For that mechanism to work operationally, the chart has to live next to the work that produced the change. The improvement project gets documented somewhere; the metric the project was supposed to move gets tracked somewhere; the connection between the project and the metric has to be visible enough that the question "did this work?" can actually be asked and answered. Infrastructure that holds both the improvement work and the operational metrics in a single system makes that connection visible.

The cultural shift Mark and Greg were both pointing toward is the third. Moving an organization away from monthly-fluctuation reactivity toward signal-versus-noise statistical thinking is a leadership change before it's a technical one. The shift is supported when the displays leaders see are designed to support the better thinking. A dashboard that only shows current value against target reinforces the reactive pattern. A dashboard that shows the value in the context of the system's actual behavior over time — average, limits, signal indicators — supports the better pattern. The display itself becomes a coaching tool for leaders learning to react less and lead better.

The spread problem is the fourth. A single skilled CI practitioner can build Process Behavior Charts for their own team. The practitioner can't easily make the same capability available to twenty other teams, each of whom would need to build their own spreadsheets, maintain their own data, and develop their own interpretive discipline. Infrastructure that provides the methodology as a standard capability lets teams without their own CI practitioner participate in the practice that more advanced teams have already adopted. The methodology stops being something that only the analytically sophisticated teams use and becomes something that's available to everyone.

None of this changes what Mark was teaching. The methodology is the methodology. Donald Wheeler's "Understanding Variation" is the foundational text. Mark's "Measures of Success" is the worthwhile complement that connects the statistical method to broader Lean management practice. The signal rules are the signal rules. What infrastructure does is preserve the integrity of the practice when the practice is being applied across an organization rather than by a single skilled individual.


Frequently asked questions

What is a Process Behavior Chart? A line chart of a metric over time with a calculated average and calculated upper and lower natural process limits. The limits describe the range within which a stable system will routinely fluctuate. Data points outside the limits, or specific patterns within them, indicate that the system has likely changed and is worth investigating. The chart is also called a control chart or an XmR chart. Mark prefers "Process Behavior Chart" because the work isn't about controlling the process — it's about understanding the process's behavior.
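
As a rough illustration of what "calculated" means here, the sketch below computes the average and limits for an XmR chart. It assumes the standard XmR scaling constants (2.66 for the natural process limits, 3.268 for the moving range upper limit), which were not spelled out in the session; the function name `xmr_limits` and the data are made up for illustration.

```python
from statistics import mean

def xmr_limits(values):
    """Return the average, average moving range, and limits for an XmR chart."""
    if len(values) < 2:
        raise ValueError("need at least two data points")
    # Moving range: absolute difference between each pair of consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    average = mean(values)
    mr_bar = mean(moving_ranges)
    return {
        "average": average,
        "mr_bar": mr_bar,
        "upper_limit": average + 2.66 * mr_bar,   # upper natural process limit
        "lower_limit": average - 2.66 * mr_bar,   # lower natural process limit
        "mr_upper_limit": 3.268 * mr_bar,         # upper limit for the mR chart
    }

# Hypothetical baseline data, not taken from the webinar.
baseline = [10, 12, 11, 13, 12, 10, 11, 12]
limits = xmr_limits(baseline)
```

For a count or rate that cannot go below zero, a negative lower limit would normally be ignored or shown as zero; the sketch leaves that decision to the caller.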

What's the difference between signal and noise? Noise is the routine variation a stable system produces from period to period. Even a perfectly stable system generates results that fluctuate. Signal is a change in the system itself — a data point outside the natural process limits, a sustained shift in the average, or a pattern indicating the system is no longer behaving as it was. Process Behavior Charts filter out roughly 99% of routine noise, so the signals they identify are very likely to be genuine.

What are the three rules for finding a signal? Rule 1: any data point above the upper natural process limit or below the lower limit. Rule 2: eight or more consecutive data points above the average, or eight or more consecutive data points below the average. Rule 3: three out of four consecutive data points closer to one of the limits than to the average. Rule 1 also applies to the moving range (mR) chart. There are additional rules — the Western Electric rules and the Nelson rules — that some practitioners use, but Mark has scaled back the number of rules he uses over time because adding rules adds complexity and increases the risk of false signals.
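
The three rules can be sketched as simple checks on the X chart, given an already calculated average and limits. This is one possible reading of the rules as described above, not Mark's implementation: the function names are hypothetical, a point exactly on the average is assumed to break a Rule 2 run, and Rule 3 is assumed to require the three points to be near the same limit.

```python
def rule1(values, lower, upper):
    """Indices of points outside the natural process limits."""
    return [i for i, v in enumerate(values) if v > upper or v < lower]

def rule2(values, average, run_length=8):
    """Indices where a run of run_length or more points sits on one side of the average."""
    flagged = []
    run, side = 0, 0
    for i, v in enumerate(values):
        s = (v > average) - (v < average)  # +1 above, -1 below, 0 on the average
        if s != 0 and s == side:
            run += 1
        else:
            side = s
            run = 1 if s != 0 else 0       # assumption: a point on the average resets the run
        if run >= run_length:
            flagged.append(i)
    return flagged

def rule3(values, average, lower, upper):
    """Indices ending a 4-point window with 3+ points closer to one limit than to the average."""
    upper_mid = (average + upper) / 2      # halfway between average and upper limit
    lower_mid = (average + lower) / 2      # halfway between average and lower limit
    flagged = []
    for i in range(3, len(values)):
        window = values[i - 3:i + 1]
        near_upper = sum(v > upper_mid for v in window)
        near_lower = sum(v < lower_mid for v in window)
        if near_upper >= 3 or near_lower >= 3:
            flagged.append(i)
    return flagged
```

With an average of 10 and limits of 4 and 16, for example, `rule1([10, 11, 9, 17, 10], 4, 16)` flags index 3, the point at 17.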

How many baseline data points do you need? Donald Wheeler's recommendation is 15 to 20 data points for a robust baseline. The methodology can work with as few as four or six if that's all you have, with the understanding that the calculated limits will be less precise. As more data accumulates, the limits can be recalculated until you reach roughly 15 data points, at which point you typically stop recalculating and use the established limits to detect future signals.

Should you keep recalculating the limits as new data comes in? No. After roughly 15 data points, the limits are stable enough that you should leave them in place and use them as the basis for detecting future signals. Continuously recalculating turns the chart into a moving average, which defeats the purpose. The methodology depends on having a fixed baseline against which subsequent data can be evaluated. The exception is when a genuine sustained shift occurs (a Rule 2 signal indicating the system has changed) — at that point, you'd recalculate to reflect the new system and use the new limits going forward.
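
A minimal sketch of the fixed-baseline idea, assuming a 15-point baseline and the standard 2.66 XmR scaling constant: the limits are computed once from the baseline and later points are judged against them without recalculation. The function names and data are hypothetical.

```python
from statistics import mean

BASELINE_SIZE = 15  # assumption: roughly the point at which limits are frozen

def frozen_limits(values, baseline_size=BASELINE_SIZE):
    """Compute average and limits from the baseline period only."""
    baseline = values[:baseline_size]
    avg = mean(baseline)
    mr_bar = mean([abs(b - a) for a, b in zip(baseline, baseline[1:])])
    return avg, avg - 2.66 * mr_bar, avg + 2.66 * mr_bar

def points_outside(values, baseline_size=BASELINE_SIZE):
    """Indices of post-baseline points that fall outside the frozen limits."""
    _, lower, upper = frozen_limits(values, baseline_size)
    return [i for i in range(baseline_size, len(values))
            if not lower <= values[i] <= upper]

# Hypothetical data: 15 baseline points followed by three new ones.
history = [50, 52, 49, 51, 50, 53, 48, 50, 51, 49, 52, 50, 49, 51, 50]
new_points = [51, 49, 62]
signals = points_outside(history + new_points)
```

Because the limits come only from the first 15 points, the late value of 62 cannot widen the limits it is being judged against.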

How do you evaluate whether a PDSA cycle worked using Process Behavior Charts? The chart tells you whether the system has shifted in response to the change. A Rule 1 signal after the change indicates a strong shift. A Rule 2 signal (eight or more consecutive data points on one side of the previous average) indicates a sustained shift. The judgment about whether the shift was caused by the specific change you made (versus other factors that happened to coincide) is still a judgment about the system — the chart confirms that something changed, but it doesn't attribute the change to a specific cause. The Washington DC right-turn ban example walked through how to think about this attribution question with hypothetical data.

What if you don't have enough data points after a change to evaluate the PDSA cycle quickly? One option is to use a more frequent metric — daily instead of weekly, or weekly instead of monthly. More frequent measurements give you more data points and let you detect signals more quickly. The trade-off is that daily metrics generally have wider variation than weekly metrics, so the calculated limits will be wider. The methodology handles that — wider limits for noisier metrics, narrower limits for less variable metrics — so the underlying logic is the same. You get faster signal detection with more frequent measurement.

What's "chunky data" and how do you handle it? Chunky data is data where the values fall in a narrow range and the count per period is so low that the standard chart isn't useful. Mark's example was central line-associated bloodstream infections where the monthly count was either one or two. With that little variation, almost everything looks like noise. The workaround is to plot the days between events rather than counts per period. Every event gets a data point. The metric becomes a duration. The chart shows time between events fluctuating around an average, with the same signal rules applying. Long intervals between events appear as signals (the system has improved); short intervals also appear as signals (the system has gotten worse).
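
The conversion from event dates to a time-between-events metric can be sketched as follows. The dates are invented for illustration, and the helper name `days_between_events` is hypothetical. Note that the first event has no preceding event, so a list of n events yields n - 1 intervals to plot.

```python
from datetime import date

def days_between_events(event_dates):
    """Return the number of days between each pair of consecutive events."""
    ordered = sorted(event_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]

# Hypothetical event dates, not data from the session.
events = [date(2019, 1, 3), date(2019, 1, 20), date(2019, 3, 1), date(2019, 3, 9)]
intervals = days_between_events(events)
```

The resulting intervals would then be charted like any other metric, with an average and natural process limits calculated from the durations.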

What about gray areas in choosing a baseline? The Tesla warranty cost example illustrated this. With six data points as the baseline, Q2 wasn't a signal. With only the first four data points as the baseline, Q2 would have been a signal. The choice of baseline involves real judgment. Mark's guidance: don't choose baselines to prove a predetermined point, but do think carefully about what period represents the relevant system. If you know the system was genuinely different during certain periods (a major change happened, the business context shifted), it can be appropriate to exclude that period from the baseline or treat the system as having multiple regimes.

Why does Mark say processes with significant variation are harder to evaluate but the methodology still works? Because the calculated limits incorporate the variation in the data. A noisy system will have wide limits. A system with little routine variation will have narrow limits. A 33% change in a metric that normally fluctuates 30-40% per period might be entirely within the limits, which makes it pure noise. A 33% change in a metric that normally fluctuates plus or minus 2% would be far outside the limits, which makes it a strong signal. Two-data-point percentage comparisons can't distinguish these cases. Process Behavior Charts can.
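
To make the contrast concrete, here is a small sketch with two invented series that both end at 100 and then jump to 133. It assumes the standard 2.66 XmR scaling constant; the function names and all data are hypothetical.

```python
from statistics import mean

def natural_process_limits(baseline):
    """Lower and upper natural process limits from a baseline series."""
    avg = mean(baseline)
    mr_bar = mean([abs(b - a) for a, b in zip(baseline, baseline[1:])])
    return avg - 2.66 * mr_bar, avg + 2.66 * mr_bar

def is_signal(baseline, new_value):
    """True if the new value falls outside the baseline's limits (Rule 1 only)."""
    lower, upper = natural_process_limits(baseline)
    return new_value < lower or new_value > upper

# Invented data: one metric that swings widely, one that barely moves.
noisy_baseline = [135, 98, 132, 95, 130, 100, 136, 100]
quiet_baseline = [100, 102, 99, 101, 98, 100, 102, 100]
new_value = 133  # a 33% increase over the last baseline value in both series

noisy_result = is_signal(noisy_baseline, new_value)
quiet_result = is_signal(quiet_baseline, new_value)
```

The same 33% jump falls inside the wide limits of the noisy series and far outside the narrow limits of the quiet one, which is the distinction a two-point percentage comparison cannot make.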

What's the difference between a signal and a number being above the goal? The signal is statistical — has the system changed? The goal is managerial — what does the organization want the system to do? The two questions are different. A goal can be set anywhere relative to the system's actual capability, and crossing the goal isn't necessarily evidence that the system has changed. The internal customer satisfaction example walked through a case where the team correctly identified a real improvement (the chart showed a signal) but had incorrectly framed the evidence (they were celebrating crossing the goal line, which happened to coincide with the upper natural process limit by accident).

How do you introduce Process Behavior Charts to a leadership team used to red/green dashboards? Mark's recommendation in this session and elsewhere: start with run charts. A run chart is just the data plotted over time, with no calculated limits — a smaller cognitive step from red/green than a full Process Behavior Chart. Once leaders are comfortable looking at patterns over time rather than at single-point comparisons, you can add the average line, then the limits, then the signal rules. The shift is a change-management process more than a technical one. The leaders who have been trained to react to every red number don't adopt the new approach immediately, but consistent exposure to the better view tends to make the case more persuasively than any single training session can.

Are linear trend lines on charts a good idea? Usually not. Linear trend lines are easy to add in Excel and look authoritative, but they're sensitive to the choice of endpoints and the range of data included. The same data can produce a "trending up" or "trending down" line depending on which points are included. Process Behavior Charts use signal rules that don't depend on fitted lines — they identify genuine shifts based on statistical patterns rather than on a line of best fit. There's a more sophisticated version of the methodology that uses a sloped center line when the system genuinely has an underlying trend, but the casually-added Excel trend line is rarely the right tool.
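
The endpoint sensitivity is easy to demonstrate. The sketch below fits an ordinary least-squares line to two overlapping six-point windows of the same invented series; the helper name `least_squares_slope` and the data are hypothetical.

```python
def least_squares_slope(values):
    """Slope of the least-squares line through values plotted at x = 0, 1, 2, ..."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = sum(values) / n
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    denominator = sum((x - x_bar) ** 2 for x in range(n))
    return numerator / denominator

# Invented data that simply fluctuates around 10.
data = [10, 12, 8, 11, 9, 12, 8, 10]

slope_first_window = least_squares_slope(data[0:6])   # points 1 through 6
slope_shifted_window = least_squares_slope(data[1:7]) # points 2 through 7
```

Shifting the window by a single point flips the fitted line from sloping up to sloping down, even though nothing about the underlying series has changed.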

Does the methodology become obsolete with newer analytics and AI? Mark's view: no. The foundational work (Walter Shewhart in the 1920s, refined into Process Behavior Charts in the 1940s) has been well-validated for nearly a century. The methods don't depend on the technology used to display them. Whether the chart is drawn by hand, calculated in Excel, generated by a sophisticated platform, or surfaced by an AI agent, the underlying methodology is the same. The question isn't whether to replace the methodology with newer technology but whether the new technology preserves the methodology's discipline.

What's the connection between Process Behavior Charts and Lean daily management? A natural one. Daily management is about leaders engaging with operational reality at the work — visiting the gemba, reviewing huddle boards, examining metrics with the teams that produce them. The Process Behavior Chart is the right form for the metrics that get reviewed in daily management because it supports the right conversation — not "explain this fluctuation" but "is this a signal that something changed, and if so, what?" Mark closed the session by encouraging practitioners to incorporate Process Behavior Charts into daily management and strategy deployment at all the levels where metrics get reviewed.

What resources does Mark recommend for someone starting out? Donald Wheeler's "Understanding Variation: The Key to Managing Chaos" is the foundational practitioner text. Mark's "Measures of Success: React Less, Lead Better, Improve More" builds on Wheeler's work with additional examples and connections to Lean management practice. The earlier KaiNexus webinars Mark referenced — the Halloween 2018 "Metrics and Statistics Don't Have to Be Scary" session and the bonus session on building a Process Behavior Chart step by step — are available in the KaiNexus webinar library. Wheeler's articles on specific topics (the choice of mR-bar versus standard deviation, the handling of different data types) are useful references for practitioners who want to go deeper.