Improved Intel IDXD Driver for Enhanced Accelerator Management in Case of Hardware Failures

Intel has posted patches to the Linux kernel mailing list that let the IDXD driver recover more robustly from hardware errors. The IDXD driver supports the Data Streaming Accelerator (DSA) found in Intel Xeon processors starting with Sapphire Rapids. With these patches, when the accelerator detects a hardware error, the driver performs a PCIe function level reset (FLR) instead of simply printing an error message, which gives the device a path back to normal operation.

According to Intel, when an IDXD device detects a hardware error, it enters a halted state and raises an interrupt to the IDXD driver. Currently, the driver only prints an error message from its interrupt handler. The new code instead performs an FLR and then restores the device's hardware and software configuration to its previous working state, so the device and the software using it can continue operating after the error.
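The recovery sequence can be illustrated with a toy model. This is a minimal Python sketch, not kernel code: the class, method, and field names (`AcceleratorModel`, `DeviceConfig`, `handle_halt_with_flr`, and so on) are hypothetical and do not correspond to functions in the real IDXD driver, and the device configuration is reduced to two illustrative fields. It assumes that an FLR discards the configuration software had programmed, which is why the handler keeps a copy and writes it back after the reset.

```python
from dataclasses import dataclass, field, replace
from enum import Enum


class State(Enum):
    ENABLED = "enabled"
    HALTED = "halted"
    DISABLED = "disabled"


@dataclass
class DeviceConfig:
    # Hypothetical stand-ins for settings software programs into the device.
    work_queues: int = 0
    engines: int = 0


@dataclass
class AcceleratorModel:
    """Toy accelerator model; not the real IDXD driver."""
    state: State = State.ENABLED
    config: DeviceConfig = field(default_factory=DeviceConfig)

    def hardware_error(self) -> None:
        # On a hardware error the device halts (and would raise an interrupt).
        self.state = State.HALTED

    def flr(self) -> None:
        # Model of a function level reset: the function returns to its
        # defaults, discarding the configuration software had written.
        self.config = DeviceConfig()
        self.state = State.DISABLED


def handle_halt_with_flr(dev: AcceleratorModel) -> bool:
    """Sketch of the new handling: save config, reset, restore, re-enable."""
    if dev.state is not State.HALTED:
        return False  # nothing to recover
    saved = replace(dev.config)  # copy the last known-good configuration
    dev.flr()                    # reset clears the live configuration
    dev.config = saved           # write the saved configuration back
    dev.state = State.ENABLED    # resume operation
    return True


# Usage: a configured device hits an error and is recovered.
dev = AcceleratorModel(config=DeviceConfig(work_queues=8, engines=4))
dev.hardware_error()
recovered = handle_halt_with_flr(dev)
print(recovered, dev.state.value, dev.config)
```

In this model, a device configured with 8 work queues and 4 engines ends up enabled again with the same settings after the error, whereas log-only handling would have left it halted.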

The IDXD patches are still under review and are expected to land in a future kernel release. With the Linux v6.11 merge window approaching, it is unclear whether they will be ready in time for that release or will be deferred to a later kernel version.

This work matters for the reliability of the accelerators on Intel's Xeon processors. Automatically recovering from a hardware error, rather than leaving the device halted, means workloads that offload tasks to the accelerator are less likely to be disrupted by a single fault.

In short, Intel's patches make the IDXD driver respond to hardware errors with a PCIe FLR followed by a restore of the previous configuration, rather than just an error message. The patches are still under review; once merged, they should make the accelerators on Intel's Xeon processors more resilient to hardware faults.

Article Source
https://www.phoronix.com/news/Intel-IDXD-FLR-Reset-HW-Errors