Nvidia driver runs into problems with the Linux kernel


With the changes to the Power architecture code for the upcoming Linux 5.3, chief developer Linus Torvalds has also merged code into the kernel’s main branch that could cause problems for Nvidia. The patches, written by developer Christoph Hellwig, remove code paths that are currently used only by Nvidia’s proprietary driver.

The code in question provides direct memory access (DMA) support for the so-called NPUs, the NVLink processing units that connect Nvidia GPUs to IBM’s Power processors. Some smaller pieces of architecture code are removed as well. Hellwig justifies the removal by pointing out that no other part of the kernel’s main branch uses the code.

This decision reflects a fundamental principle of kernel development. The Linux community deliberately does without stable internal kernel interfaces so that even large changes remain relatively easy to implement. Likewise, every kernel interface should have code in the main branch that actually uses it, so that the interface can be maintained and, above all, understood. From the kernel developers’ point of view, drivers maintained outside the main branch, such as the Nvidia driver, stand in the way of these principles, and they are criticized for it again and again.

Hurdles for the Nvidia driver

As expected, the removal of the code did not go unopposed. IBM developer Alexey Kardashevskiy, for example, complains that his criticism of the plans has been ignored. He also objects that the kernel developers are acting as if the Nvidia driver simply did not exist. From Hellwig’s point of view this is exactly the case, because the driver is not part of the main branch of Linux.

Greg Kroah-Hartman, the most important kernel developer after Linus Torvalds, also supports this view. The community even voted on the approach last year. Despite the criticism, the code will no longer be available in the upcoming kernel version. For Nvidia this simply means more work: the current driver will no longer run with Linux 5.3, and Nvidia will have to rebuild and maintain the removed parts itself.

The Linux community took a similar step at the end of last year, when developers marked important interfaces for heterogeneous memory management (HMM) as GPL-only symbols. As a result, only GPL-compatible drivers and modules can use this code, which in turn could hinder Nvidia to some extent. The HMM code, too, has been used only by external drivers for years.

Both the HMM code and the now-removed NPU code concern supercomputing, a business area that is very lucrative and prestigious for Nvidia and one in which the company works closely with IBM.


About Author

Mette Frederiksen is a Washington Newsday correspondent. She has been on the science desk’s technology beat since joining Washington Newsday in 2018, covering general science, NASA and the interface between technology and society.
