Smart Developers Leverage Built-In Processor Support for Safety-Critical Applications
Contributed By DigiKey's North American Editors
2016-08-09
Software continues to play an increasingly important role in controlling vehicles and equipment. At the same time, the inevitability of software errors threatens safe operation in applications where software-controlled machines interact with humans. In collaborative robotics, for example, where robots and humans work in close proximity, software bugs can directly result in physical harm to humans.
For these applications and others where computer-system errors in machine operation can endanger humans, designers need to build safety controls into their systems at a fundamental level. Smart use of hardware features that are already built into microcontrollers (MCUs) and digital signal processors (DSPs) can help mitigate the effects of errors in hardware or software.
Automation and manual labor continue to complement each other across a wide range of product costs and features. While automation can efficiently deliver high-volume, low-cost products, manual assembly is most effective for production of high-cost, low-volume products. Collaborative robotics offers a hybrid approach intended to merge the advantages of both (Figure 1).
Figure 1: Collaborative robotic systems provide a hybrid approach, merging the advantages of manual production and automated systems. (Source: ABB/European Robotics Forum)
Collaborative robotic systems are designed to work in direct cooperation with a human within a specific work area. This hybrid approach combines the precision, power, and endurance of a robot with the problem-solving ability of a human. At the same time, this approach presents significant risk to human workers.
Robotic systems can swing mechanical arms with great acceleration and grip objects with great force. Placed in a work cell, a typical robot is constrained to move within a well-defined operating window while collaborating with a human worker, yet may move within a larger restricted envelope while a maintenance worker is tending to the system (Figure 2). Rarely is the robotic system allowed to move through its complete maximum envelope in the presence of a human.
Figure 2: Industrial robots require deeply embedded safety mechanisms able to ensure safe operation within specific zones associated with normal production and maintenance. (Source: US Dept. of Labor Occupational Safety & Health Administration)
Conventional robotic assembly systems are designed with physical constraints or fenced-off working areas to protect nearby humans (Figure 3). When working in close proximity to one or more humans, a collaborative robotic system has little physical protection designed into the work cell. Instead, these systems depend on built-in mechanisms to ensure safe operation. In such systems, system-level errors arising from software bugs or hardware faults could easily allow a moveable robotic arm to accelerate into a human or carry out some physical movement that could present an immediate risk to the human working in collaboration.
Figure 3: On the factory floor, manufacturers place physical barriers around industrial robots to prevent accidental contact with nearby human workers. (Source: DNC)
Safety strategies
For designers, ensuring safe operation of a robotic system starts with confirming that its underlying control systems are operating as expected. DSPs and MCUs designed for real-time control applications offer multiple built-in safety features intended specifically for that purpose. For example, Analog Devices' ADSP-BF707 DSP and Texas Instruments' Hercules™ Series MCUs each integrate multiple memory-check features including error-checking-and-correction (ECC) for on-chip memory, parity-bit checking for off-chip memory, and cyclic redundancy check (CRC) for direct-memory access (DMA) transfers.
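As a rough illustration of the detection idea behind parity and CRC checks (not of the on-chip hardware itself), the following Python sketch computes an even-parity bit for a word and compares CRC signatures of a source and destination buffer. The helper names `parity_bit` and `transfer_is_intact` and the use of `zlib.crc32` are assumptions made for illustration; the polynomial and mechanism a given device actually uses are defined in its reference manual.

```python
import zlib


def parity_bit(word: int) -> int:
    """Return the even-parity bit for a non-negative integer word."""
    # The parity bit is 1 when the word holds an odd number of set bits.
    return bin(word).count("1") % 2


def transfer_is_intact(source: bytes, destination: bytes) -> bool:
    """Compare CRC-32 signatures of the source and destination buffers."""
    return zlib.crc32(source) == zlib.crc32(destination)


if __name__ == "__main__":
    payload = bytes(range(16))
    corrupted = bytearray(payload)
    corrupted[3] ^= 0x04  # flip a single bit to mimic a transfer error

    print(parity_bit(0b1011))
    print(transfer_is_intact(payload, payload))
    print(transfer_is_intact(payload, bytes(corrupted)))
```

Parity catches any odd number of flipped bits in a word, while a CRC over the whole buffer catches a much wider class of errors at the cost of more computation, which is why hardware support for it is useful on DMA transfers.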
Although these memory features can correct or at least identify errors during memory operations, ensuring correct operation of the processor itself is a more complex problem. A hardware fault in the processor core can cause unpredictable errors, including dangerous movements of a robotic arm or manipulator. To identify core faults, the Analog Devices ADSP-BF707 DSP and TI Hercules Series MCUs each include built-in self-test (BIST) mechanisms integrated in the processor core. Although BIST cannot run while application code is executing, system designers can interleave complete or partial self-test runs with application code execution, providing frequent if not continuous monitoring of processor function.
With the TI Hercules Series MCUs, for example, designers can initiate a BIST scan through a few simple steps:
- Configure clock domain frequencies
- Select number of test intervals to be run
- Configure the timeout period for the self-test run
- Save register state including CPU core registers and coprocessor registers as well as hardware break point and watch point registers
- Enable self-test by writing the appropriate bits into the CPU self-test controller register (Figure 4)
- Wait for CPU reset
- In the reset handler, read CPU self-test status to identify any failures
- Retrieve CPU state
Figure 4: Engineers can easily implement complete or partial BIST scans, interleaved between application execution. For the TI Hercules Series MCUs, engineers can implement code to quickly perform various housekeeping steps before finally enabling self-test by setting bits in the self-test controller register shown here. (Source: Texas Instruments)
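The flow of the self-test steps listed above can be sketched in Python as a simulation. Everything here is hypothetical: `MockSelfTestController`, `self_test_slice`, and `control_loop` are illustrative names, not TI APIs or register definitions, and the clock and timeout configuration steps are omitted. The sketch only shows how partial self-test slices might be interleaved with application work while register state is saved and restored around each slice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MockSelfTestController:
    """Hypothetical stand-in for a CPU self-test controller (not a TI API)."""
    total_intervals: int = 8
    faulty_interval: Optional[int] = None  # set an index to inject a failure
    next_interval: int = 0
    failed: bool = False

    def run(self, intervals: int) -> None:
        """Run up to `intervals` test intervals, resuming after the last run."""
        end = min(self.next_interval + intervals, self.total_intervals)
        for index in range(self.next_interval, end):
            if index == self.faulty_interval:
                self.failed = True
        self.next_interval = end % self.total_intervals


def self_test_slice(controller, cpu_context: dict, intervals: int) -> bool:
    """Run one partial self-test and return True if no failure was reported."""
    saved = dict(cpu_context)       # save register state before the test
    controller.run(intervals)       # enable self-test for this slice
    cpu_context.clear()             # mimic register state lost across CPU reset
    passed = not controller.failed  # reset handler reads the self-test status
    cpu_context.update(saved)       # retrieve the saved CPU state
    return passed


def control_loop(controller, cycles: int, intervals_per_slice: int = 2):
    """Alternate one unit of application work with one self-test slice."""
    context = {"r0": 0}
    for _ in range(cycles):
        context["r0"] += 1  # stand-in for one unit of application work
        if not self_test_slice(controller, context, intervals_per_slice):
            return "fault", context["r0"]
    return "ok", context["r0"]
```

Choosing the number of intervals per slice trades test coverage per pass against the time taken away from the application, a choice that depends on the timing budget of the control loop.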
Intercepting software errors
As with most leading-edge applications, collaborative robotic systems rely on software for the major share of their functionality, in particular for highly differentiated capabilities. During operation, any number of software errors, including incorrectly set pointer variables, stack overflows, or code errors that escaped test coverage can cause the system to hang or otherwise spin off into unintended operations. To prevent this type of system behavior, designers can implement run-time checks in the software itself.
Use of a watchdog timer provides a particularly effective mechanism for catching errors that hang the processor. In this conventional approach, software resets the watchdog timer at a period shorter than the timeout setting of the watchdog timer. If the watchdog timer does time out, the expected timer reset never occurred, indicating that application code is hanging and the system is in some indeterminate state. Safety-oriented processors such as Analog Devices' ADSP-BF707 DSP and Texas Instruments' Hercules Series MCUs integrate watchdog timer capabilities. Even so, use of an independent, external watchdog timer further reduces the chance that an error scenario impacts the built-in watchdog.
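The watchdog pattern described above can be modeled in a few lines of Python. This is a software simulation under stated assumptions, not driver code for any particular device: the `Watchdog` class, the `run` helper, and the tick-based timing are all hypothetical.

```python
from typing import Optional


class Watchdog:
    """Software model of a watchdog timer driven by an explicit tick count."""

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout_ticks = timeout_ticks
        self.counter = 0
        self.reset_requested = False

    def kick(self) -> None:
        """Called by healthy application code to restart the countdown."""
        self.counter = 0

    def tick(self) -> None:
        """Advance time by one tick; request a reset once the timeout elapses."""
        self.counter += 1
        if self.counter >= self.timeout_ticks:
            self.reset_requested = True


def run(ticks: int, hang_at: Optional[int],
        timeout_ticks: int = 5, kick_period: int = 3) -> bool:
    """Return True if the watchdog requested a reset during the run."""
    dog = Watchdog(timeout_ticks)
    for now in range(ticks):
        hung = hang_at is not None and now >= hang_at
        # Healthy code kicks the watchdog more often than the timeout.
        if not hung and now % kick_period == 0:
            dog.kick()
        dog.tick()
    return dog.reset_requested
```

With the default values the kick period (3 ticks) is shorter than the timeout (5 ticks), so `run(20, None)` never requests a reset, while `run(20, 6)` does because kicks stop once the simulated application hangs.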
Devices such as Texas Instruments' TPS3823-33 supervisor IC provide DSP- and MCU-based systems with circuit initialization and timing supervision as well as supply voltage monitoring. This device integrates a watchdog timer that must be periodically triggered by either a positive or negative transition at the WDI input to avoid a reset signal being issued (Figure 5). When the processor fails to retrigger the watchdog circuit within the time-out interval, t_tout, RESET becomes active for the time period t_d, causing the processor to enter its hardware reset cycle. Much as laptop users cycle power when their systems hang, this approach may provide the only effective method for recovering from deep software bugs.
Figure 5: Along with other supervisory capabilities, the Texas Instruments TPS3823-33 IC provides a hardware watchdog timer, offering further protection against errors that could compromise on-chip watchdog timers. (Source: Texas Instruments)
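To make the timing relationship concrete, the sketch below models an edge-triggered external watchdog in Python. It is an assumption-laden simplification: the function name `first_reset_window` is hypothetical, times are integer milliseconds, the timer is assumed to start at time zero, and the timeout and reset-pulse values passed in are placeholders rather than datasheet figures.

```python
def first_reset_window(edge_times_ms, t_tout_ms, t_d_ms, horizon_ms):
    """Return (start, end) of the first RESET pulse, or None if WDI stays on time.

    edge_times_ms: times of WDI transitions (rising or falling) in milliseconds.
    """
    last_edge = 0  # assume the watchdog starts counting at time zero
    for edge in sorted(edge_times_ms):
        if edge - last_edge > t_tout_ms:
            break  # the gap before this edge exceeded the timeout
        last_edge = edge  # any transition restarts the timeout interval
    reset_start = last_edge + t_tout_ms
    if reset_start > horizon_ms:
        return None  # no timeout within the observed window
    return (reset_start, reset_start + t_d_ms)


if __name__ == "__main__":
    # Placeholder timing values chosen for illustration only.
    print(first_reset_window([1000, 2000, 3000], 1600, 200, 4000))
    print(first_reset_window([1000, 2000, 5000], 1600, 200, 6000))
```

Because the timer restarts on either edge, firmware can simply toggle the WDI pin from its main loop; a stuck-high or stuck-low pin then produces no transitions and eventually triggers the reset.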
System-level safety
Along with low-level mechanisms, experienced designers use a variety of methods to ensure the processor-based control system continues to generate expected output. Higher-level "safety" architectures rely on designs that execute critical functions redundantly, producing what should be the same result through multiple parallel paths that independently perform the same function on the same data. In mission-critical applications such as avionics that depend on accurate sensor readings from engines and control surfaces, hardware designers might build in multiple independent sensor paths to allow continued operation even if one (or more) sensor fails during flight operations.
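One common way to combine redundant sensor paths is majority voting. The Python sketch below is a minimal illustration of that idea, assuming three or more channels that should agree within a tolerance; the function name `vote` and the tolerance-based agreement rule are assumptions for illustration, not a method described by any particular standard.

```python
from statistics import median


def vote(readings, tolerance):
    """Return (value, outlier_count) from redundant readings.

    The median is taken as the voted value. A RuntimeError is raised when
    no majority of channels agrees with the median within `tolerance`.
    """
    mid = median(readings)
    agreeing = [r for r in readings if abs(r - mid) <= tolerance]
    if len(agreeing) * 2 <= len(readings):
        raise RuntimeError("sensor channels disagree: no majority")
    return mid, len(readings) - len(agreeing)
```

For example, `vote([100, 101, 250], 2)` returns the median along with a count of one outlier, allowing operation to continue while the failed channel is flagged, whereas `vote([10, 50, 90], 1)` raises because no two channels agree.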
Translated to the software level for robotic-movement algorithms, for example, software engineers might use two (or more) different algorithms or different implementations of the same algorithm to help validate continued system functionality. Here, engineers would create different objects set to the same values as the original data objects to compute the same algorithm using a different code pattern. Supervisory code would compare the results and generate an exception if different. For example, for an original function:
function work(VariableA, VariableB):
    if (VariableA > VariableB):
        // execute required function
        // based on VariableA and VariableB
        ...
    return Result
Engineers would create an alternate implementation, using different variable objects set to the same values to compute some result:
// set new object instances to values in original variables
function workNew(VariableA, VariableB):
    VariableNewA = deepcopy(VariableA)
    VariableNewB = deepcopy(VariableB)
    if (VariableA > VariableB):
        if (VariableNewA <= VariableNewB):
            // generate exception - possible memory fault
        else:
            // execute required function
            // using new variables and
            // alternate coding pattern
            ...
    return ResultNew

// supervisor
X = work(A,B)
XNew = workNew(A,B)
if ( X <> XNew ):
    // generate system fault exception
// otherwise continue processing with X
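A runnable Python rendering of this pattern might look like the following. The "required function" is not specified in the pseudocode, so a placeholder computation (the excess of one value over the other) stands in for it; `SystemFault`, `work_new`, and `supervised` are illustrative names. In CPython, `copy.deepcopy` of an immutable integer returns the same object, so the copy step here mainly mirrors the structure of the pseudocode and becomes meaningful for mutable data objects.

```python
import copy


class SystemFault(Exception):
    """Raised when redundant computations disagree."""


def work(a, b):
    """Primary implementation: excess of a over b, or zero (placeholder)."""
    if a > b:
        return a - b
    return 0


def work_new(a, b):
    """Alternate implementation operating on independent copies of the inputs."""
    new_a = copy.deepcopy(a)
    new_b = copy.deepcopy(b)
    # Cross-check the copies against the originals before using them.
    if (a > b) != (new_a > new_b):
        raise SystemFault("copied inputs disagree: possible memory fault")
    # Different code pattern for the same result: clamp instead of branch.
    return max(new_a - new_b, 0)


def supervised(a, b, primary=work, alternate=work_new):
    """Run both implementations and return the result only if they agree."""
    x = primary(a, b)
    x_new = alternate(a, b)
    if x != x_new:
        raise SystemFault(f"results differ: {x} != {x_new}")
    return x
```

Passing the implementations as parameters keeps the supervisor independent of the functions it checks, which also makes it easy to inject a deliberately faulty implementation when testing the fault path.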
Specialized multi-core MCUs use conceptually the same approach to ensure that the underlying processor itself is functioning correctly. Here, multiple copies of the same processor core execute the same software in lockstep – at the same time or with a small clock offset. Any difference in output of these cores running in lockstep indicates the MCU has a low-level fault – possibly a transient fault due to some sort of soft error such as radiation or a hard fault due to some internal breakdown of functionality.
ARM provides Dual-Core Lock-Step (DCLS) redundant core configurations that allow MCU manufacturers to implement lockstep cores for safety applications. Texas Instruments has implemented this feature in its Hercules Series MCUs. Here, a pair of ARM® Cortex®-R5F cores operate in lockstep, with execution separated by two clock cycles, to reduce the chance that a soft error from an alpha particle would produce the same (erroneous) result in cores executing exactly the same code at the same time. Note: Alpha particles, which can cause soft errors, are an unavoidable side effect of IC packaging materials, which contain low levels of radioactive material with low decay rates.
In the TMS570LS0714, a special unit, the CPU Compare Module (CCM), compares the output signals of both cores (Figure 6). To a programmer, this process is transparent. No special coding is needed and the comparison between locked cores happens automatically.
Figure 6: Integrated in the Hercules Series MCUs, the CPU Compare Module (CCM) monitors output from the MCU's dual cores operating in lockstep with a two-cycle delay. This delay helps prevent a situation where a transient error could affect both cores and pass undetected. (Source: Texas Instruments)
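The value of the two-cycle offset can be seen in a small Python simulation. This is a conceptual model only: real compare logic operates on hardware signals, and the names `core_output` and `run_lockstep`, as well as the single-bit glitch model, are assumptions made for illustration.

```python
def core_output(step: int) -> int:
    """Deterministic stand-in for the output a core produces at a program step."""
    return (step * 37 + 11) & 0xFF


def run_lockstep(cycles: int, delay: int = 2, glitch_cycle=None) -> list:
    """Return the program steps at which the compare logic saw a mismatch."""
    mismatches = []
    pending = {}  # outputs of the leading core, keyed by program step
    for cycle in range(cycles):
        # A transient disturbs both cores during the same clock cycle.
        glitch = 0x01 if cycle == glitch_cycle else 0x00
        # Leading core executes program step `cycle`.
        pending[cycle] = core_output(cycle) ^ glitch
        # Trailing core executes the step the leader ran `delay` cycles ago.
        step = cycle - delay
        if step >= 0:
            trailing = core_output(step) ^ glitch
            if pending.pop(step) != trailing:
                mismatches.append(step)
    return mismatches
```

With a delay of two cycles, a glitch that hits both cores at once corrupts different program steps in each core, so the comparison flags it; with `delay=0` the same glitch corrupts both outputs identically and passes undetected.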
Although these hardware features help ensure proper operations, building a functional safety application imposes rigorous demands for software development and certification. TI's SafeTI design packages support functional safety application development in a wide range of segments including industrial machinery, industrial process, medical, automotive, rail, and aviation. Along with the functionality provided with SafeTI software libraries, the design packages support requirements associated with achieving compliance to safety standards such as ISO 26262, IEC 61508, and IEC 60730.
Conclusion
Collaborative robotics presents significant opportunities for combining human experience with machine precision. Because they work in close quarters with humans, however, collaborative robotic systems present significant challenges in ensuring safe operation. Designed specifically to support functional safety applications, specialized DSPs and MCUs offer integrated features able to help identify errors and mitigate their effects at the application level. Using these devices, designers can build a foundation of safety for collaborative robotic systems.
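Disclaimer: The opinions, beliefs, and viewpoints expressed by the various authors and/or forum participants on this website do not necessarily reflect the opinions, beliefs, and viewpoints of DigiKey or official policies of DigiKey.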