FPGA Based Fault Detection, Isolation and Healing for Integrated

Ali Akoglu, Sonia Vohnout, and Justin Judkins

Advances in VLSI technology have led to the fabrication of chips whose transistor counts are projected to reach 10 billion in the near future. Emerging system-on-chip (SoC) platforms need affordable fault-tolerant solutions that are transparent to applications and add minimal hardware overhead to the microarchitecture in order to mitigate component-level errors. Ridgetop Group and the University of Arizona have developed innovative methods and systems for _detection of anomalous conditions_ that lead to faults in highly complex electronic systems. Through built-in self-testing and fault detection, isolation, and recovery capabilities, we can offer 100% system availability, proactively avoid false or missed alarms, and estimate the remaining useful life of critical electronic components and their associated subsystems. We introduce a novel self-healing mechanism for SoCs based on field-programmable gate array (FPGA) technology: it localizes and isolates the faulty area and then restores the lost functionality through partial reconfiguration of the FPGA. Prognostic techniques are leveraged to address resource allocation and distribution, enabling a more fault-tolerant, time-efficient, and robust system. When prognostic detection methods are combined with reconfiguration strategies, system reliability and availability improve, and the probability of failure falls without compromising service quality or performance and without requiring redundant components on the chip.
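
The localize, isolate, and replace cycle described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' implementation: the `Fabric` class, its region ids, the function names `"uart"` and `"fft"`, and the self-test results are all hypothetical, and assigning a function to a spare region stands in for loading a partial bitstream on real hardware.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Fabric:
    """Toy model of an FPGA divided into reconfigurable regions (hypothetical)."""
    # region id -> function configured in that region (None means spare/empty)
    regions: Dict[int, Optional[str]]
    # regions that have been isolated after a detected fault
    faulty: Set[int] = field(default_factory=set)

    def detect(self, health: Dict[int, bool]) -> List[int]:
        """Localize: return ids of not-yet-isolated regions whose self-test failed."""
        return [r for r, ok in sorted(health.items())
                if not ok and r not in self.faulty]

    def heal(self, region: int) -> Optional[int]:
        """Isolate a faulty region and relocate its function to a spare region.

        Returns the id of the spare region used, or None if nothing was relocated.
        """
        function = self.regions[region]
        self.faulty.add(region)       # isolate: this region is never reused
        self.regions[region] = None
        if function is None:
            return None               # a spare failed; nothing to relocate
        for spare, current in sorted(self.regions.items()):
            if current is None and spare not in self.faulty:
                # Assumption: stands in for partial reconfiguration of the spare.
                self.regions[spare] = function
                return spare
        return None                   # no healthy spare region is left


# Assumed self-test results: region 1 (running "fft") reports a failure.
fabric = Fabric(regions={0: "uart", 1: "fft", 2: None, 3: None})
for bad in fabric.detect({0: True, 1: False, 2: True, 3: True}):
    fabric.heal(bad)
print(fabric.regions)
```

In this model the spare regions are ordinary unused fabric rather than dedicated redundant copies of each function, which mirrors the goal of restoring functionality without duplicating components on the chip.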