Intel® MAX® 10 Based System Management


Thermal management is considered the next most critical aspect of system management design. A system that does not operate within the appropriate thermal range of its components is more prone to reliability issues, early wear-out, and potentially catastrophic failure. The system manager must be able to monitor environmental conditions and analyze the data it gathers. Ultimately, it must be able to act as needed to mitigate poor conditions and/or initiate steps to prevent system damage.

In complex systems with multiple datapath devices, board-level microclimates can exist. One datapath FPGA may be more heavily loaded than another, so it is good design practice to track the temperature at multiple locations on the board for complete coverage of board conditions. The system management device must have sufficient analog and digital resources to monitor multiple temperature sensors and to control active cooling systems, often fans. Temperature sensors come in several types; the most common interfaces are analog, IIC, and SPI. The integrated ADCs of the MAX® 10 FPGA enable it to connect to analog sensors, and as an FPGA it can easily support the IIC and SPI interfaces, or all three variants in the same device.

Even boards with a robust system management implementation will sometimes fail. When the system management design cannot prevent a board failure, its job is still not done: the system management device should be able to record events prior to, during, and after the failure.

Consider again the Arria® 10 FPGA design example from earlier. In this case, a temperature sensor next to the Arria® 10 device shows an increase in temperature. As long as the temperature is still within the normal operating range, the system manager increases active cooling (for example, by increasing fan speed).

If the temperature continues to increase and crosses a threshold approaching critical, the system may do any combination of the following: increase active cooling, increase the frequency of monitoring, log the event, and possibly even redirect traffic to reduce the load on the device in question. Ultimately, if the temperature continues to rise to a critical threshold, the system manager should shut down the device or the entire board as necessary.
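The escalation described above can be sketched as a small threshold policy. This is only an illustrative model in Python, not MAX® 10 firmware or any vendor API: the threshold values, the sensor names (`arria10`, `board_edge`), the fan duty curve, and the function names `classify`, `fan_duty`, and `evaluate` are all assumptions made for the example. It takes readings from several sensors, acts on the hottest one, and keeps a bounded event log.

```python
from collections import deque
from enum import IntEnum


class Zone(IntEnum):
    NORMAL = 0    # within the normal operating range
    WARNING = 1   # crossed the threshold approaching critical
    CRITICAL = 2  # crossed the critical threshold


# Hypothetical thresholds in degrees Celsius; real limits would come from
# the datasheets of the monitored devices.
WARNING_C = 85.0
CRITICAL_C = 100.0
FAN_START_C = 40.0  # at or below this, the fan idles at FAN_MIN_DUTY
FAN_MIN_DUTY = 20   # percent


def classify(temp_c):
    """Map a temperature to a policy zone."""
    if temp_c >= CRITICAL_C:
        return Zone.CRITICAL
    if temp_c >= WARNING_C:
        return Zone.WARNING
    return Zone.NORMAL


def fan_duty(temp_c):
    """Scale fan duty linearly from FAN_MIN_DUTY to 100% across the normal range."""
    if temp_c <= FAN_START_C:
        return FAN_MIN_DUTY
    if temp_c >= WARNING_C:
        return 100
    span = (temp_c - FAN_START_C) / (WARNING_C - FAN_START_C)
    return round(FAN_MIN_DUTY + span * (100 - FAN_MIN_DUTY))


def evaluate(readings, event_log):
    """Decide actions from the hottest of several board sensors."""
    sensor, temp_c = max(readings.items(), key=lambda item: item[1])
    zone = classify(temp_c)
    actions = {
        "zone": zone,
        "hottest_sensor": sensor,
        "fan_duty_pct": fan_duty(temp_c),
        # Monitor more frequently once outside the normal range.
        "poll_interval_s": 1.0 if zone == Zone.NORMAL else 0.1,
        "redirect_traffic": zone >= Zone.WARNING,
        "shutdown": zone == Zone.CRITICAL,
    }
    if zone != Zone.NORMAL:
        event_log.append((sensor, temp_c, zone.name))
    return actions


event_log = deque(maxlen=32)  # bounded history, oldest entries drop first
for sample in (
    {"arria10": 55.0, "board_edge": 41.0},
    {"arria10": 90.0, "board_edge": 45.0},
    {"arria10": 101.0, "board_edge": 50.0},
):
    result = evaluate(sample, event_log)
    print(result["zone"].name, result["fan_duty_pct"], result["shutdown"])
```

Acting on the hottest sensor is one simple way to account for board-level microclimates; a real design might instead apply separate thresholds per device. The bounded log stands in for the event recording mentioned earlier, which in hardware would more likely be written to nonvolatile storage so it survives a shutdown.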

PTM Published on: 2015-06-26
