Digital Andon: What Changes When You Move from Cord to Software Alert

Rina Nakamura · September 5, 2024 · 6 min read

The andon cord, along with its variants the andon button and the illuminated column light, is one of the Toyota Production System's most immediately copied and most frequently misunderstood tools. Taiichi Ohno conceived it as a mechanism for making problems visible and giving operators the authority to stop the line when something was wrong. Sixty years later, most factories have an andon system of some kind. Where they differ is in whether it actually works as intended.

Digital andon — alerts generated from sensor data, PLC event streams, or threshold logic in a process intelligence layer — extends the physical signal into a software notification. A shift supervisor's tablet lights up with the same amber urgency as the stack light above Line 4. The promise is faster response, richer data context, and escalation to people who are not physically on the floor. The reality, when implemented without TPS discipline, is a system that generates alerts that operators mute within two weeks.

What the Original Cord Was Designed to Do

Understanding what breaks in digital andon requires first understanding what the physical system was designed to enforce. The andon cord (or its modern equivalent, a push-button station) had three design properties that are easy to overlook:

It required human decision to trigger. An operator who noticed an abnormality — a dimensional drift, a missing component, an equipment irregularity — made a judgment call and pulled the cord. The human decision step filtered noise: the operator's tacit knowledge meant that not every variation generated a signal, only variations that the operator assessed as production-meaningful.

It caused a specific, visible response. The zone light illuminated, the team leader walked to the station, and a conversation happened at the line. The escalation path was immediate, physical, and time-bounded — if no one responded within a defined interval, the line stopped.

It created a record by forcing resolution. The andon event was only "closed" when the problem was addressed. Every cord pull became part of the production log, creating a traceable record of abnormalities over time — the raw material for 改善 (kaizen) analysis at the weekly production review.

These three properties (a human-filtered trigger, a time-bounded response protocol, and record creation through forced resolution) are what most digital alert systems fail to replicate fully.

Where Digital Andon Goes Wrong

The most common failure mode is threshold proliferation. A process intelligence system can monitor dozens of signals simultaneously: cycle time deviation, temperature out of band, OEE drop below X%, vibration spike on a spindle motor, pressure variance on a pneumatic actuator. The temptation is to alert on all of them, because "you have the data, you might as well use it."

The result is a notification stream that looks like a security operations center: constant amber, occasional red, and operators who have learned that amber means nothing requires attention right now. This is not hypothetical. It is a widely reported failure mode in industrial alerting, sometimes called "alarm flood" in process industries and well documented in IEC 62682, which provides guidance on alarm management for the process industries.

The IEC 62682 framework is worth borrowing even for discrete manufacturing contexts. In essence, it treats an alarm as a notification requiring a timely operator response: not a notification of any abnormal condition, but specifically one requiring operator action within a defined time window. By this definition, many conditions that get configured as alarms in digital andon implementations are not actually alarms. They are data points that might become alarms under certain conditions, or informational signals that should appear on a dashboard but not trigger a notification.

Consider a 3-line precision stamping operation in Shizuoka Prefecture. In an early phase of their digital andon deployment, cycle time deviation alerts were set at 5% above ideal cycle time (ICT). With a 4-second ICT, that threshold sat only 0.2 seconds above nominal and triggered on essentially any micro-stoppage: a parts feeder hesitation, or a transfer arm timing variation that self-corrected. The result was 40 to 60 alerts per shift, per line. After 10 days, operators had set the mobile app to vibration-only and kept the device in an apron pocket. Alert response time, measured from alert timestamp to acknowledgment in the system, was averaging over 22 minutes, worse than before the system was installed.
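The arithmetic behind that threshold can be sketched in a few lines. The cycle time, percentage, and alert counts come from the example above; the 8-hour shift length is an assumption added only to estimate how many cycles a shift contains.

```python
# Figures from the stamping example. The 8-hour shift is an assumption,
# since the example does not state a shift length.
ideal_cycle_time_s = 4.0
threshold_pct = 0.05
shift_hours = 8  # assumed

# Absolute margin between the ideal cycle time and the alert threshold.
margin_s = ideal_cycle_time_s * threshold_pct

# Approximate number of cycles in one shift at the ideal cycle time.
cycles_per_shift = int(shift_hours * 3600 / ideal_cycle_time_s)


def share_of_cycles(alerts_per_shift: int) -> float:
    """Fraction of cycles in a shift that raised an alert."""
    return alerts_per_shift / cycles_per_shift


print(f"alert margin: {margin_s:.2f} s")
print(f"cycles per shift: {cycles_per_shift}")
print(f"40 alerts: {share_of_cycles(40):.2%} of cycles")
print(f"60 alerts: {share_of_cycles(60):.2%} of cycles")
```

Under that assumption, even 60 alerts per shift correspond to fewer than 1% of cycles. The volume that drove operators to mute the app came from a small fraction of cycles crossing a very narrow margin.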

Designing Alert Logic with TPS Discipline

The fix is not better notification UI or more persistent reminders. It is redesigning the alert conditions themselves with the same discipline Ohno applied to the andon cord: alerts should only fire when they require a response, and the response expectation should be clear.

A practical starting framework uses three severity tiers:

Tier 1 — Operator attention: Conditions that require the operator to check something before the situation escalates. Cycle time deviation sustained for 3 consecutive cycles (not a single outlier). Minor stoppage count exceeding 3 in a 10-minute window. These appear as ambient indicators — a subtle status change on the operator's HMI panel or a dashboard glow change — not push notifications.

Tier 2 — Team leader response: Conditions that require a team leader or maintenance contact. Equipment fault code with known resolution procedure. OEE dropping below shift target with 45 minutes remaining in the shift. These trigger a mobile push notification with a 10-minute response expectation, same as the original andon pull-cord interval.

Tier 3 — Production stop consideration: Conditions that may require halting the line to prevent quality escape or equipment damage. Quality deviation threshold breached on a critical dimension. Safety interlock activation. Consecutive failed cycles on an automated test station. These trigger immediate notification with escalation if unacknowledged within 5 minutes — the digital equivalent of the line stopping and waiting.
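The two Tier 1 conditions above can be sketched as small stateful detectors. This is a minimal illustration, not a reference implementation: the class names are hypothetical, and the 5% tolerance is borrowed from the stamping example rather than prescribed by the tier definition.

```python
from collections import deque


class SustainedDeviation:
    """Fires only after `required` consecutive cycles exceed the limit."""

    def __init__(self, ideal_s: float, tolerance_pct: float, required: int = 3):
        self.limit_s = ideal_s * (1 + tolerance_pct)
        self.required = required
        self.streak = 0

    def observe(self, cycle_time_s: float) -> bool:
        # A single in-limit cycle resets the streak, so isolated outliers
        # never reach the required count.
        self.streak = self.streak + 1 if cycle_time_s > self.limit_s else 0
        return self.streak >= self.required


class StoppageWindow:
    """Fires when more than `max_count` stoppages fall inside the window."""

    def __init__(self, max_count: int = 3, window_s: float = 600.0):
        self.max_count = max_count
        self.window_s = window_s
        self.times = deque()

    def observe(self, timestamp_s: float) -> bool:
        self.times.append(timestamp_s)
        # Drop stoppages that are older than the window.
        while self.times and timestamp_s - self.times[0] > self.window_s:
            self.times.popleft()
        return len(self.times) > self.max_count


detector = SustainedDeviation(ideal_s=4.0, tolerance_pct=0.05)
print([detector.observe(t) for t in [4.5, 4.0, 4.5, 4.5, 4.5]])
# [False, False, False, False, True]
```

The first slow cycle in the usage line is followed by a normal one, so it never raises a signal. Only the run of three slow cycles at the end does.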

The key design principle: each tier should have a defined owner (who receives the alert), a defined response expectation (what they are expected to do and in what time window), and a defined closure condition (how the alert is resolved in the system). Alerts without defined response expectations do not modify behavior — they accumulate as digital background noise.
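One way to keep owner, response expectation, and closure condition explicit is to store them alongside each tier definition. The sketch below is hypothetical: the field names, channel strings, Tier 3 owner, and closure texts are placeholders, and only the 10-minute and 5-minute windows come from the tiers described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    owner: str                         # who receives the alert
    channel: str                       # how it is delivered (placeholder labels)
    response_window_s: Optional[int]   # None means ambient, no deadline
    closure: str                       # how the alert is resolved (placeholder text)


TIERS = {
    1: Tier("Operator attention", "operator", "hmi_indicator", None,
            "condition clears or operator confirms the check"),
    2: Tier("Team leader response", "team_leader", "mobile_push", 600,
            "team leader records the action taken"),
    3: Tier("Production stop consideration", "team_leader", "immediate_push", 300,
            "stop or continue decision recorded"),
}


def is_overdue(tier: Tier, raised_at_s: float, acknowledged: bool, now_s: float) -> bool:
    """True when an unacknowledged alert has outlived its response window."""
    if acknowledged or tier.response_window_s is None:
        return False
    return now_s - raised_at_s > tier.response_window_s


# A Tier 3 alert raised at t=0 and still unacknowledged at t=301 s is overdue.
print(is_overdue(TIERS[3], raised_at_s=0, acknowledged=False, now_s=301))  # True
```

A tier that cannot be filled in completely in a structure like this is a hint that the alert has no defined response expectation yet.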

Record Creation as a Feature, Not an Afterthought

One property of digital andon that genuinely improves on the physical cord is automatic record creation. Every alert event, response timestamp, acknowledging operator, and resolution note becomes a data record without requiring manual log entry. Over 3 months, this creates an andon event database that supports structured loss analysis: which condition types are most frequent, which lines have the highest alert burden, which shift pattern sees the most Tier 2 escalations.
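The three questions above reduce to simple counts over the event log. The following sketch assumes a hypothetical `AndonEvent` record; the sample events are made up purely to show the shape of the output.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class AndonEvent:
    line: str
    shift: str
    tier: int
    condition: str


def summarize(events):
    """Count events by condition type, by line, and Tier 2 events by shift."""
    by_condition = Counter(e.condition for e in events)
    by_line = Counter(e.line for e in events)
    tier2_by_shift = Counter(e.shift for e in events if e.tier == 2)
    return by_condition, by_line, tier2_by_shift


# Illustrative records only, not real production data.
events = [
    AndonEvent("Line 1", "day", 1, "feeder_jam"),
    AndonEvent("Line 1", "night", 2, "feeder_jam"),
    AndonEvent("Line 2", "night", 2, "fault_code"),
    AndonEvent("Line 1", "day", 3, "dimension_out_of_tolerance"),
]

by_condition, by_line, tier2_by_shift = summarize(events)
print(by_condition.most_common(1))    # [('feeder_jam', 2)]
print(by_line.most_common(1))         # [('Line 1', 3)]
print(tier2_by_shift.most_common(1))  # [('night', 2)]
```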

This is the kaizen data loop in its most direct form. The andon alert log becomes the input for the weekly 現場 (genba) review: which recurring conditions should be eliminated through maintenance, process adjustment, or operator training, rather than managed through repeated alerting. The goal, consistent with jidoka (自働化) principles, is not a system that alerts better but a process that requires fewer alerts over time.

We are not saying digital andon should replace human judgment at the line. The operator's ability to interpret context ("this stoppage pattern started when we switched to the new supplier's material batch on Tuesday") cannot be fully encoded in threshold logic. What digital andon does is ensure that the operator's observation is recorded, escalated appropriately, and traceable back to specific production conditions. That combination of speed, structure, and record is what makes the digital version useful. Without TPS-grounded alert design, it is just a faster way to create noise.

Migration from Physical to Digital: Practical Sequence

For factories that have existing physical andon infrastructure — stack lights, zone boards, cord pull stations — a phased migration is more reliable than a full replacement. Digital andon, in its first phase, should shadow the physical system: the same conditions that light the physical stack light also generate a digital record. This phase validates that PLC event mapping is correct and gives operators visibility without changing their response behavior.
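Validating the mapping in the shadow phase amounts to reconciling two event lists: what the stack lights did and what the digital layer recorded. A minimal sketch follows, assuming both sources can be exported as (station, timestamp) pairs; the station names and the 2-second matching tolerance are illustrative assumptions.

```python
def reconcile(physical, digital, tolerance_s=2.0):
    """Match physical stack-light events to digital records.

    Both inputs are lists of (station, timestamp_s) tuples. Returns a pair:
    physical events with no digital record, and digital records with no
    physical event.
    """
    remaining = list(digital)
    missed = []
    for station, ts in physical:
        match = next(
            (d for d in remaining
             if d[0] == station and abs(d[1] - ts) <= tolerance_s),
            None,
        )
        if match is None:
            missed.append((station, ts))
        else:
            # Each digital record may account for only one physical event.
            remaining.remove(match)
    return missed, remaining


physical = [("L4-S2", 100.0), ("L4-S3", 250.0)]
digital = [("L4-S2", 100.8), ("L4-S5", 400.0)]
print(reconcile(physical, digital))
# ([('L4-S3', 250.0)], [('L4-S5', 400.0)])
```

A physical event with no digital record suggests a missing PLC mapping. A digital record with no physical event suggests a mapping that fires on a condition the stack light does not represent.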

In the second phase, digital alerts supplement physical signals for conditions the physical system cannot cover: remote monitoring, supervisor tablet views, historical trending. The physical system remains the primary response trigger for line-level operators.

Only in a third phase, after 3-6 months of validated digital records and confirmed operator familiarity, should the physical system be considered optional for any individual condition category. For most mid-size facilities, the physical andon layer never goes away — it is faster than any mobile notification for a line operator standing two meters from a stack light, and it requires no network connectivity or device interaction.

Sixty years of TPS refinement is embedded in that cord. Digital tools extend its reach; they do not replace its clarity.
