Structured Problem-Solving at AGCO -- Chad Westbrook

A KaiNexus webinar with Chad Westbrook of AGCO, hosted by Mark Graban

Most problems get solved twice in most organizations. The first round produces a quick fix, often delivered under time pressure, often based on assumptions about what's wrong, and often successful enough to make the immediate symptom go away. The second round happens weeks or months later when the same problem returns, because the first fix addressed the symptom rather than the cause.

Chad Westbrook is direct about how this used to play out at AGCO's Hesston, Kansas facility. The team was focused on containment and speed. Problems got resolved fast enough to keep product moving out the door. But the same issues kept coming back. The fixes were covering up the issues rather than actually resolving them, and the same root causes kept producing the same symptoms in slightly different forms.

This session walks through the methodology AGCO developed to make that pattern stop. The approach is structured enough to be repeatable across a thousand-employee facility and rigorous enough to prevent the most common failure modes -- jumping to solutions, drawing conclusions without confirmation, defining the problem too broadly to act on, and treating the fishbone diagram as an output rather than an investigation tool.

What makes the session distinctive is not the individual tools, which are familiar to most Lean practitioners. It's the discipline that connects them, particularly the insistence on getting from an initial phenomenon to a revised phenomenon before any cause analysis, the use of the 5G process to ground that work in direct observation, and the confirmation step that catches the most common error -- declaring a root cause without verifying it actually produces the symptom.

Chad Westbrook serves as Manufacturing Engineering Manager and AGCO Production System Manager at AGCO Corporation. He's based at the company's Hesston, Kansas facility, which builds hay equipment, combines, and combine headers under the Massey Ferguson, Challenger, Gleaner, and Fendt brands. The site has approximately 1,000 employees and is vertically integrated -- fabrication, welding, machining, and laser work happen in-house alongside final assembly. Chad holds a bachelor's degree in mechanical engineering related technologies from Kansas State University.

The session is hosted by Mark Graban, then VP of Improvement and Innovation Services at KaiNexus and the author of Lean Hospitals, Healthcare Kaizen, Measures of Success, and The Mistakes That Make Us.

The methodology in sequence

Chad's structured problem-solving process is a connected sequence of steps, each feeding the next. The order matters because each step prevents a specific class of error in the next one.

The full sequence:

5G (the observation discipline) → 5W1H (the phenomenon description, producing both an initial and a revised phenomenon) → 4M/1D fishbone (contributing factor analysis) → Confirmation (verifying which contributing factors actually drive the symptom) → 5 Whys (root cause analysis on the verified factors) → Levels of Countermeasures (selecting the right type of fix).

The entire workflow lives in KaiNexus through a PDCA-formatted template, with associated tasks for implementation and SMART validation for impact tracking.

What separates this approach from generic problem-solving training is the discipline of the transitions. The 5W1H doesn't begin until the team has actually gone to the gemba. The fishbone doesn't begin until the phenomenon has been revised from a broad initial statement to a narrowed, specific description. The 5 Whys doesn't begin until the confirmation step has identified which contributing factors actually turn the symptom on and off. Skip any of these transitions and the methodology fails -- not because the tools are wrong, but because the tools are being applied to the wrong target.
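
The ordering rule can be made concrete with a short sketch. The Python below is illustrative only and is not AGCO's or KaiNexus's implementation; the names `Step` and `can_start` are hypothetical. It lists the six steps in sequence and refuses to start a step until every earlier step is complete.

```python
from enum import IntEnum


class Step(IntEnum):
    """The six steps, numbered in the order the session describes."""
    FIVE_G = 1
    FIVE_W_ONE_H = 2
    FISHBONE = 3
    CONFIRMATION = 4
    FIVE_WHYS = 5
    COUNTERMEASURE = 6


def can_start(step, completed):
    """Return True only if every step before `step` is in `completed`."""
    return all(earlier in completed for earlier in Step if earlier < step)


done = {Step.FIVE_G, Step.FIVE_W_ONE_H}
print(can_start(Step.FISHBONE, done))   # True: 5G and 5W1H are complete
print(can_start(Step.FIVE_WHYS, done))  # False: fishbone and confirmation are missing
```

The gate is deliberately strict: a later step cannot be unlocked by skipping an earlier one, which mirrors the transitions described above.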

The 5G process: grounding the work in direct observation

The 5G framework comes from the Japanese tradition of Lean problem-solving and refers to five terms: Genba (the actual place, often romanized as gemba), Genbutsu (the actual thing), Genjitsu (the actual facts), Genri (principles or theory), and Gensoku (rules or standards). Different Lean traditions emphasize different subsets. AGCO focuses primarily on the first three, with the latter two reserved for cases where direct observation alone doesn't reveal the cause.

Genba. Go to the actual place where the problem occurred. Don't try to solve the problem from a conference room. Chad is direct that this is where AGCO's old approach failed most often -- the team would assume they understood where the problem was happening and design countermeasures based on those assumptions. Half the time the countermeasures addressed something other than what was actually wrong, because the team never went to look.

Genbutsu. Examine the actual objects involved. The materials, the machinery, the parts, the tools, the people doing the work. At a vertically integrated facility like Hesston, this can mean walking the problem back through multiple departments -- a quality issue at final assembly might trace back to fabrication, welding, or machining, and tracing it requires walking the value stream rather than working from the assembly drawings alone.

Genjitsu. Check the actual facts. Separate what's observed from what's assumed. What did the operator actually do? What did the machine actually produce? What does the part actually measure? This is where direct measurement, photographs, and primary observation replace secondhand reports. Chad's framing: get the real facts and analyze the actual data, not the version that has survived two rounds of summary.

The remaining two G's -- Genri and Gensoku -- come into play when direct observation doesn't reveal the cause. Genri is the underlying theory of how the process should work. Gensoku is the standard or rule that defines what good looks like. If the gap between observation and outcome can't be explained at the level of place, thing, and fact, the team examines whether the underlying theory is wrong or whether the standard is being followed.

The 5G process is the foundation that makes everything downstream work. Skip it, and the rest of the methodology runs on assumptions instead of evidence.

Initial phenomenon vs. revised phenomenon: where most problem-solving goes wrong

The single most important move in Chad's methodology is the discipline of moving from an initial phenomenon to a revised phenomenon before any cause analysis begins.

The initial phenomenon is whatever the person experiencing the problem first reports. "The shaft is the wrong size." "The hole is oversized." "The cab is leaking air." These statements are starting points, not problem definitions. They're too broad to act on and too imprecise to investigate.

The revised phenomenon is what the team arrives at after applying the 5G process and the 5W1H discipline. It's narrowed, specific, and tied to direct observation. "The shaft measures differently when measured by three different operators with the same micrometer" is a revised phenomenon for the "wrong size" complaint. "The drill is at an off angle producing an oblong hole" is a revised phenomenon for the "hole is oversized" complaint. "The air leak occurs at the connection between the air hose and the air fittings on the 9980 windrower cab suspension assembly" is a revised phenomenon for the "cab is leaking air" complaint.

The shift from initial to revised is what narrows the scope of investigation. Without it, the team is trying to solve a broad symptom, and the fishbone diagram fills up with twenty plausible contributing factors, none of which can be definitively confirmed because the symptom itself isn't specific enough to test. With it, the team has a focused target. The fishbone diagram fills up with four or five real contributing factors, each of which can be tested and confirmed.

Chad's example for why this matters: a micrometer reading discrepancy. Three operators measure the same shaft and get three different readings. The initial phenomenon was that the shaft size was wrong. The revised phenomenon is that the measurement varies by operator -- which is a completely different problem, with completely different root causes (operator technique, gauge R&R, tool selection), and completely different countermeasures (training, calibration, or replacing micrometers with go/no-go gauges).

If the team had stopped at the initial phenomenon and started analyzing why the shaft was wrong, they would have run down dozens of irrelevant paths. By revising the phenomenon first, they reframed the entire investigation.
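
One way to see the difference between the two framings is to separate variation between operators from variation within one operator's repeated readings. The sketch below is a simplified illustration, not a formal gauge R&R study and not AGCO's procedure; the function names and the readings are hypothetical.

```python
from statistics import mean


def between_operator_spread(readings):
    """Largest gap between any two operators' average readings."""
    averages = [mean(values) for values in readings.values()]
    return max(averages) - min(averages)


def within_operator_spread(readings):
    """Largest range seen in any single operator's repeated readings."""
    return max(max(values) - min(values) for values in readings.values())


# Hypothetical readings (mm) of the same shaft with the same micrometer.
readings = {
    "operator_a": [25.00, 25.02],
    "operator_b": [25.10, 25.12],
    "operator_c": [24.90, 24.92],
}

between = between_operator_spread(readings)
within = within_operator_spread(readings)
if between > within:
    print("Readings differ more between operators than within one operator.")
```

When the between-operator gap dominates, the data points at the measurement process rather than the shaft, which is the shift the revised phenomenon captures.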

5W1H: the questions that produce the revised phenomenon

The 5W1H is the structured set of questions AGCO uses to move from initial phenomenon to revised phenomenon. The form is filled out while the team is at the gemba, applying the 5G process to the actual situation.

The questions:

What. What happened? What is the phenomenon? Can the team reproduce it -- can they turn it on and off? What does it look like? What machine, product, material, and size are involved?

When. When did it occur? When did it start? Is this a startup problem, an intermittent problem, or a continuous problem? Did it happen during a shift change, a model changeover, or a shutdown? Is there a pattern in the timing?

Where. Where did the phenomenon occur? Where on the equipment or material? This question does two distinct kinds of work -- it informs containment (where else could the problem be present?) and it narrows the scope (which specific location does the root cause analysis need to address?).

Who. Who is doing the work? Who is affected? Is the phenomenon skill-related -- does it happen with some operators but not others? Chad notes that this question often reveals shift-to-shift variation that hadn't been communicated. One operator on first shift welds a component in a different sequence than another operator on second shift. The 5W1H surfaces those variations as part of the phenomenon definition, not as something to discover later.

Which. Which trend or pattern does the phenomenon have? Which factors influence it? Which process variables are involved? Is the problem related to a specific sequence -- like the order of loading parts into a fixture?

How. How is the work being done? How much variation is present? Is the process in control? How does the equipment's state differ from optimal? How many times has the phenomenon occurred?

Before the 5W1H questions begin, the form documents the containment actions taken to prevent the problem from passing to the next customer. This is typically a cross-functional move -- production, quality, manufacturing engineering, and planning aligning on what gets quarantined, reworked, or held.

The output of a properly executed 5W1H is the revised phenomenon -- specific enough to investigate, narrow enough to act on, grounded in direct observation.
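
The form described above can be pictured as a simple record with one field per section. This is a minimal sketch under assumptions: `FiveW1HForm` and `missing_sections` are hypothetical names, the field layout is inferred from the questions listed here rather than taken from AGCO's actual form, and the containment text is invented for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class FiveW1HForm:
    """One text field per section of the form; empty means not yet answered."""
    containment: str = ""
    what: str = ""
    when: str = ""
    where: str = ""
    who: str = ""
    which: str = ""
    how: str = ""
    revised_phenomenon: str = ""

    def missing_sections(self):
        """Names of sections that are still blank, in form order."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


form = FiveW1HForm(
    containment="Suspect units held before the next station",  # hypothetical
    what="Air leak at the cab suspension",
    where="Connection between air hose and air fittings",
)
print(form.missing_sections())
```

A check like this only shows which sections are blank. Whether the answers are grounded in what the team saw at the gemba is still a judgment the team has to make.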

4M/1D fishbone: the contributing factor analysis

Once the revised phenomenon is established, the team moves to the fishbone diagram (also called the Ishikawa diagram). The 4M/1D refers to the categories AGCO uses for organizing potential contributing factors: Material, Machine, Method, Man, and Design.

Chad notes that the categories aren't fixed. Other traditions use Mother Nature (environment), Management, Money, Maintenance, or Measurement instead of some of the M's. The category set should match the actual categories of contributing factors relevant to the problem -- if a category doesn't fit, replace it with one that does.

The fishbone exercise is a structured brainstorm with a cross-functional team. The revised phenomenon goes at the head of the fish. The major categories form the bones. The team fills in contributing factors under each category, drawing on the 5G observations and the team's collective knowledge.

A diagnostic worth knowing about: if the team ends up with 10 or 20 contributing factors, the revised phenomenon is probably too broad. Four or five contributing factors per fishbone is normal. More than that usually signals that the team skipped the phenomenon revision step and is working at the level of the original symptom rather than the narrowed observation.

After the brainstorm, the team eliminates the trivial factors (cross them off) and circles the important ones for further analysis.
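
The factor-count diagnostic lends itself to a quick check. In the sketch below, the fishbone is a plain dictionary of category to factors, and the cutoff of 10 is an assumption drawn from the "10 or 20" figure above, not a rule AGCO states. The function names are hypothetical.

```python
def count_factors(fishbone):
    """Total number of contributing factors across all categories."""
    return sum(len(factors) for factors in fishbone.values())


def looks_too_broad(fishbone, limit=10):
    """Flag a fishbone whose factor count reaches `limit` (an assumed cutoff)."""
    return count_factors(fishbone) >= limit


# Factors from the 9980 windrower example described later in this article.
fishbone = {
    "Material": ["air hose not to spec", "air fittings not to spec"],
    "Design": ["correct air hoses and fittings used", "hose-fitting compatibility"],
    "Method": ["correct work instructions available", "correct tools available"],
}
print(count_factors(fishbone), looks_too_broad(fishbone))  # 6 False
```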

Confirmation: the step that prevents false root causes

The confirmation step is where AGCO's methodology becomes distinctive. Most problem-solving training jumps directly from fishbone analysis to 5 Whys analysis. AGCO inserts a confirmation step between them, and the discipline of that step is what separates real root cause work from theatrical root cause work.

The confirmation works by transferring the important contributing factors from the fishbone onto a confirmation sheet, then defining for each factor a way to turn the symptom on and off. The team then actually tests whether the contributing factor produces the symptom in the way the team hypothesized.

If the test confirms the relationship, the factor is marked not-OK and passes through to the 5 Whys analysis. If the test shows that the factor doesn't actually produce the symptom -- the team can't replicate the relationship on demand -- the factor is marked OK and eliminated from further analysis. Only the not-OK factors carry forward.

This is the aha moment for most cross-functional teams, Chad notes. They identify what they think is a contributing factor, set up a test, and discover that the relationship they assumed exists doesn't actually exist. The team had been ready to develop countermeasures for the wrong cause. The confirmation step caught it.

The reverse case is equally important. The team identifies a contributing factor, sets up a test, and successfully turns the symptom on and off using that factor. Now they have evidence-based confidence that the factor is real. The downstream 5 Whys analysis is working with confirmed inputs, not speculation.
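
The confirmation sheet can be sketched as a list of entries, each pairing a factor with its on/off test and the result. This is an illustration under assumptions: `ConfirmationEntry` and `carry_forward` are hypothetical names, and the two entries are invented examples rather than anything from AGCO's records.

```python
from dataclasses import dataclass


@dataclass
class ConfirmationEntry:
    factor: str
    test: str               # how the team tries to turn the symptom on and off
    symptom_follows: bool   # True if the symptom switched on and off with the factor


def carry_forward(sheet):
    """Keep only the factors whose test reproduced the symptom on demand."""
    return [entry.factor for entry in sheet if entry.symptom_follows]


# Hypothetical entries for illustration only.
sheet = [
    ConfirmationEntry("worn fixture clamp", "swap in a new clamp, then the old one", True),
    ConfirmationEntry("coolant temperature", "run at low and high temperature", False),
]
print(carry_forward(sheet))  # ['worn fixture clamp']
```

The filter is trivial on purpose. The work is in designing and running a test that can actually turn the symptom on and off.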

5 Whys: extending root cause analysis until the "therefore" test passes

The 5 Whys analysis takes the confirmed not-OK factors from the confirmation step and asks why repeatedly until the team reaches a cause that, when addressed, eliminates the symptom.

Chad emphasizes that the number five is not the point. The point is to keep going until you've reached a cause that produces an effective countermeasure and you can go no further. Sometimes that's three whys. Sometimes it's seven. The discipline is in the substance, not the count.

The verification technique AGCO uses: the "therefore" test. Walk the chain backwards using the word "therefore" between each step. If the story makes sense in reverse, the chain is solid. If it doesn't, the team has a logic gap somewhere.

Chad's machine example illustrates the technique. Coolant on the floor. Why? The coolant is leaking from the machine. Why? The seal was damaged. (Historically, this is where the team would have stopped, replaced the seal, and moved on.) Why was the seal damaged? Metal shavings got into the coolant. Why? The coolant pump guard allowed shavings to pass behind the coolant screen.

Now run the therefore test in reverse. The coolant pump guard allowed shavings to pass behind the coolant screen, therefore metal shavings got into the coolant, therefore the seal was damaged, therefore coolant is leaking from the machine, therefore coolant appears on the floor.

The story makes sense in reverse. The chain is verified. The countermeasure -- redesigning and installing a new guard over the screen -- addresses the actual cause rather than the most recent symptom.

Stopping at "the seal was damaged" would have produced a temporary fix. The seal would have been replaced, the machine would have run for a while, and then the same problem would have returned. Stopping at "the coolant pump guard allowed shavings to pass" produces a durable fix because it addresses the underlying mechanism.
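
The reverse reading is mechanical enough to sketch in a few lines. The hypothetical helper below, `therefore_story`, takes a why-chain ordered from symptom to deepest cause and produces the backward narrative. It only assembles the sentence; deciding whether each "therefore" actually holds remains the team's judgment.

```python
def therefore_story(why_chain):
    """Join a 5 Whys chain in reverse order, deepest cause first.

    `why_chain` starts with the symptom and ends with the deepest cause.
    """
    return ", therefore ".join(reversed(why_chain))


# The coolant example from this section, symptom first.
why_chain = [
    "coolant appears on the floor",
    "coolant is leaking from the machine",
    "the seal was damaged",
    "metal shavings got into the coolant",
    "the coolant pump guard allowed shavings to pass behind the coolant screen",
]
print(therefore_story(why_chain))
```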

The levels of countermeasures: choosing the right type of fix

Not all countermeasures are equally effective. AGCO uses a four-level hierarchy that ranks countermeasures by their durability and risk profile.

Level 1: Defect Awareness. The least effective. Visual controls, training, documentation, and communication that make operators aware of the defect and its cause. Awareness alone doesn't prevent the defect -- it relies on the operator remembering and responding correctly. Used for one-off issues and as a stop-gap while more durable countermeasures are developed.

Level 2: Defect Detection. Inspection, testing, and validation that catch defects before they reach the next customer. Better than awareness because it doesn't rely on operator vigilance, but still allows defects to be produced -- they're just caught before they ship. Examples include the quality network and quality matrix that AGCO uses to verify product before it moves downstream.

Level 3: Defect Prevention (Mistake-Proofing / Poka-Yoke). Process or product designs that minimize the chance of producing the defect. Go/no-go gauges that make incorrect measurements visible immediately. Fixtures that can only accept correctly-oriented parts. Better than detection because it reduces defects rather than catching them. Chad notes that poka-yoke designs that make incorrect installation physically impossible are stronger than gauges that require an operator to interpret a reading.

Level 4: Defect Elimination. The most effective. Redesign the product or process so the defect cannot occur. Engineering changes that remove the failure mode entirely. The design team works closely with manufacturing engineering to apply this level whenever possible, because it eliminates the failure mode for all future production rather than just managing it.

The hierarchy is also a cost and risk hierarchy. Awareness is cheap and weak. Elimination is expensive and strong. The choice of countermeasure level depends on the severity of the defect, the cost of recurrence, and the feasibility of upstream redesign.
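
The ranking can be expressed as an ordered enumeration. The sketch below is a simplification under assumptions: the names are hypothetical, and it treats severity, recurrence cost, and redesign feasibility as already folded into the set of levels the team judges feasible, then prefers the strongest of those.

```python
from enum import IntEnum


class CountermeasureLevel(IntEnum):
    DEFECT_AWARENESS = 1
    DEFECT_DETECTION = 2
    DEFECT_PREVENTION = 3
    DEFECT_ELIMINATION = 4


def strongest_feasible(feasible):
    """Return the highest level among the options judged feasible."""
    if not feasible:
        raise ValueError("at least one feasible countermeasure level is required")
    return max(feasible)


options = {CountermeasureLevel.DEFECT_AWARENESS, CountermeasureLevel.DEFECT_PREVENTION}
print(strongest_feasible(options).name)  # DEFECT_PREVENTION
```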

A real-world example: the 9980 windrower air leak

To illustrate the methodology end to end, Chad walks through a real example from a new model year of the Massey Ferguson 9980 windrower.

Initial phenomenon. Air ride suspension leaks after assembly on the 9980 windrowers.

5G observations. The team went to the gemba (the assembly area where leaks were being detected), examined the genbutsu (the air ride suspension, the pump, the hoses), and pulled the design drawings to understand the intent.

Revised phenomenon. Air leak on the tractor cab suspension assembly occurring at the connection point between the air hose and the air fittings.

4M/1D contributing factors. Material: air hose not to spec, air fittings not to spec. Design: correct air hoses and fittings used, hose-fitting compatibility. Method: correct work instructions available, correct tools available to operators.

Confirmation. Material checks passed (hose and fittings both met spec). The design side did not pass -- the team discovered that the incorrect hose was being used for the fittings called out, and the work method required operators to use pliers and lubricant when the design intent was tool-free installation by hand.

5 Whys on the confirmed factors. Why was the incorrect hose used? The design didn't consider the compatibility of the hose and the fittings. Why? The hose called out wasn't approved for use with the push fittings selected. Therefore the hose couldn't be installed properly. Therefore the seal was incomplete. Therefore air leaked at the connection.

The therefore test passes. The chain is verified.

Countermeasure (Level 4: Defect Elimination). Change the hose specification to match the fitting requirements. This eliminated the leak and also eliminated the need for pliers and lubricant during installation.

The countermeasure was implemented through KaiNexus using a PDCA workflow with associated tasks and SMART validation that calculated the total cost of rejects plus project implementation cost.

The follow-up that's almost as important as the original fix: the team's kaizen expansion process caught the same hose-fitting compatibility issue on a different product before that product launched. The root cause analysis didn't just fix the 9980. It prevented the same problem on a different machine. This is the compounding return on doing the work properly -- a single confirmed root cause prevents recurrence on the original problem and prevents occurrence on related problems that haven't surfaced yet.

What the Q&A added

Several questions in the live session produced substantive additions worth preserving.

On time pressure. AGCO initially set fixed time limits for completing the structured problem-solving process. They eventually removed the time limits. The reason: the quality of the root cause analysis and countermeasures suffered when teams were rushed, and the containment step on the first page of the 5W1H form already prevented downstream contamination during the analysis period. Time pressure on the analysis produced worse outcomes than allowing the analysis to take however long the rigor required.

On sustainment. The most durable form of sustainment is designing the problem out (Level 4 countermeasures), which eliminates the failure mode entirely. For countermeasures that can't reach Level 4, the quality group conducts 30/60/90-day checks to verify the countermeasures are holding. The cross-functional team composition itself is part of sustainment -- when production, quality, manufacturing engineering, design, planning, and logistics are all engaged in the root cause work, the countermeasures don't depend on a single owner staying engaged.

On fishbone team composition. AGCO doesn't structure fishbone teams as "experts and non-experts." They structure by value stream -- one team per assembly product, and separate teams by core competency for components (fabrication, welding, machining, laser). The cross-functional team is built around the value stream, not around individual expertise levels.

On training and coaching. All 1,000 employees at the Hesston site have been through the structured problem-solving training. Skill develops with practice. Chad notes that a year into the rollout, fishbone diagrams were filled out completely in every category -- the team felt obligated to populate every box. Today, four or five contributing factors per fishbone is typical, because the team has learned that focused analysis beats exhaustive analysis.

On the discipline of coaching. AGCO holds a weekly kaizen review where teams present their structured problem-solving work to the staff. The presentation requires telling the story from initial phenomenon to revised phenomenon to confirmed contributing factors to root cause to countermeasure. If the story doesn't hold together -- if the questions asked during the review reveal logic gaps -- the kaizen is rejected and the team works through it again as a coaching exercise. The rejection isn't punitive. It's the mechanism that maintains the quality of the methodology over time.

On methodology comparison. Chad uses DMAIC and 8D for project-level work and the structured problem-solving approach described in this session for day-to-day problem-solving. The structured approach is faster because the 5G grounding work tightens the scope before analysis begins. DMAIC is a fit for larger, more complex projects where the analysis itself is part of the work.

On scope narrowing. Chad's metaphor: trying to solve a broad initial problem is like trying to eat an elephant in one bite. The 5G and 5W1H discipline is about eating the elephant one bite at a time. The narrowed revised phenomenon also feeds the kaizen expansion process -- when a confirmed root cause from one problem can be applied to prevent similar problems elsewhere in the organization, the return on the original analysis compounds.

How KaiNexus connects

Several elements of Chad's methodology benefit from the kind of infrastructure KaiNexus provides.

The full PDCA workflow from initial phenomenon to revised phenomenon to root cause to countermeasure to validation lives as a single connected record. Teams can move through the methodology without losing context between steps, and the record provides the full story of how the problem was solved -- which is exactly what the weekly kaizen review needs.

The kaizen expansion work depends on being able to search and reference completed problem-solving records across the organization. When a confirmed root cause on one product turns out to be relevant to a different product, the team needs to be able to find the original work and apply the lesson. A platform that captures and surfaces past work makes this kind of cross-product learning operational rather than dependent on individual memory.

The cross-functional collaboration that AGCO's methodology requires is supported through the platform's notification, task assignment, and shared visibility features. When production, quality, manufacturing engineering, design, and planning all need to see and act on the same problem, the platform replaces email chains and scattered spreadsheets with a single source of truth.

The SMART validation that AGCO uses to track the cost of rejects and the cost of countermeasure implementation lives in the same record as the problem-solving work. The financial discipline isn't a separate exercise -- it's part of the workflow.

The 30/60/90-day sustainment checks the quality group performs are supported through scheduled tasks and reporting. The work doesn't depend on someone remembering to follow up.

None of this substitutes for the discipline Chad describes. The methodology is the thing that matters. The platform is what allows a 1,000-person facility to operate the methodology consistently across teams, products, and time without losing rigor as the volume scales.

About the presenter

Chad Westbrook serves as Manufacturing Engineering Manager and AGCO Production System Manager at AGCO Corporation, based at the company's Hesston, Kansas facility. The Hesston site builds hay equipment (round balers, large and small square balers, windrowers, mowers, headers) and combines under the Massey Ferguson, Challenger, Gleaner, and Fendt brands. Chad holds a bachelor's degree in mechanical engineering related technologies from Kansas State University.

Frequently Asked Questions

What is the difference between an initial phenomenon and a revised phenomenon?

The initial phenomenon is whatever the person experiencing the problem first reports -- typically a broad symptom that's too imprecise to investigate. The revised phenomenon is what the team arrives at after applying the 5G process and the 5W1H discipline -- a narrowed, specific description grounded in direct observation. The move from initial to revised is the single most important step in AGCO's methodology because it determines whether the downstream analysis is targeting the right problem. A fishbone that fills up with twenty contributing factors usually signals that the team skipped the phenomenon revision step.

What is the 5G process and how is it different from gemba walks?

The 5G refers to five Japanese terms: Genba (the actual place), Genbutsu (the actual thing), Genjitsu (the actual facts), Genri (principles or theory), and Gensoku (rules or standards). AGCO focuses primarily on the first three for problem-solving work, reserving the latter two for cases where direct observation doesn't reveal the cause. A gemba walk is a leadership routine for observing operations -- the 5G is the investigative discipline that grounds problem-solving in direct observation rather than secondhand reports. Both share the principle that you have to go see, but the 5G is more structured because it's tied to a specific problem rather than to general operational awareness.

Why does AGCO add a confirmation step between fishbone analysis and 5 Whys?

Because most problem-solving methodologies allow teams to jump from "this is a plausible contributing factor" to "this is the root cause" without actually testing the relationship. The confirmation step requires the team to define a way to turn the symptom on and off using each contributing factor and to actually run that test. Factors that don't produce the symptom on demand get eliminated. Factors that do produce the symptom carry forward to the 5 Whys analysis. The step prevents the team from developing countermeasures for a cause that turns out not to be the cause, which is one of the most common failure modes in problem-solving work.

How does the "therefore test" verify a 5 Whys analysis?

By walking the chain backward and inserting the word "therefore" between each step. If the story makes sense in reverse, the logical chain is solid. If it doesn't, there's a gap in the analysis that needs to be addressed. Example: the coolant pump guard allowed shavings to pass behind the screen, therefore metal shavings got into the coolant, therefore the seal was damaged, therefore coolant leaked from the machine, therefore coolant appeared on the floor. The reverse reading is internally consistent, which verifies the chain. If the reverse reading required a leap that wasn't in the original analysis, the team knows where to look for the gap.

What are the four levels of countermeasures, and how does AGCO choose between them?

In order from least to most effective: Defect Awareness (visual controls, training, communication), Defect Detection (inspection, testing, validation), Defect Prevention (mistake-proofing, poka-yoke), and Defect Elimination (engineering the failure mode out entirely). The choice depends on the severity of the defect, the cost of recurrence, and the feasibility of upstream redesign. AGCO works toward Defect Elimination whenever possible because it removes the failure mode for all future production rather than managing it. The lower levels are used when elimination isn't feasible, with a preference for prevention over detection and detection over awareness.

Why did AGCO remove time limits from the structured problem-solving process?

Because time pressure on the analysis produced worse outcomes -- rushed teams stopped at superficial root causes and developed countermeasures that didn't hold. The containment step on the first page of the 5W1H form already prevents the problem from passing to downstream customers during the analysis period, which removes the urgency argument for rushing the root cause work. AGCO learned that the quality of the analysis matters more than the speed of the analysis, and they removed the time aspect from the methodology.

How does AGCO sustain countermeasures after they're implemented?

The most durable form of sustainment is Level 4 countermeasures (Defect Elimination), which remove the failure mode entirely and don't require ongoing maintenance. For countermeasures at lower levels, the quality group conducts 30/60/90-day checks to verify the countermeasures are still holding. The cross-functional team composition also supports sustainment -- when production, quality, manufacturing engineering, design, planning, and logistics are all engaged in the root cause work, the countermeasures don't depend on a single owner staying engaged.
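
The 30/60/90-day checks can be laid out with simple date arithmetic. The sketch below is plain Python, not a KaiNexus feature. It assumes the checks are counted in calendar days from the implementation date, which the session does not spell out; `sustainment_checks` is a hypothetical name and the date is invented.

```python
from datetime import date, timedelta


def sustainment_checks(implemented_on, offsets=(30, 60, 90)):
    """Return the dates that fall 30, 60, and 90 days after implementation."""
    return [implemented_on + timedelta(days=days) for days in offsets]


# Hypothetical implementation date.
for check_date in sustainment_checks(date(2024, 1, 1)):
    print(check_date.isoformat())
```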