The Best of Casimusings: Memoirs on the Nvidia Kernel Module


Casimusings

An introduction to binary-only modules
Nvidia is kind to its Linux-using customers; not as kind as it could be, but kind nonetheless. It provides its users with a binary-only, 3D-accelerated XFree86 kernel module. That string of adjectives contains some very good descriptions, as well as some that would make the average kernel developer rabid. The main descriptor that would induce this frothing hacker syndrome is "binary-only." This means that a precompiled object, which handles all the nuts-and-bolts graphics work, is wrapped in a user-compiled kernel interface. This makes things easy on Nvidia, easy on users, and easy on X developers. The only question is, what happens when something goes wrong? A user of a binary-only module is helpless to see inside the code and find the source of the problem. Even when the system is kind enough to produce a stack trace, the function names are cryptic and the source is hidden. Ideally nothing goes wrong, and if something does, the company providing the module (in this case, Nvidia) provides support and new, patched modules. Unfortunately, this ties the end user to a single company for updates and bug fixes, something the Linux community at large can't stand (since companies are rarely that attentive and never on the ball).
Problems with the Nvidia Kernel Module
Where to start? The kernel module provided by Nvidia is excellent for many purposes. For some it is kludgy, and for others it simply does not work. There are a number of problems associated with running an accelerated X server on this module. In the interest of being succinct, I'll list them.
  • Long wait for a 2.6-compatible interface
  • Larger memory footprint than similar modules
  • Not framebuffer aware and causes problems with consoles
  • 2.6 stability issues
  • Memory leak
To be fair, most of these problems don't affect the average Linux end user on a day-to-day basis, and most advanced users either tinker with these problems to death or discover a workaround. For those still struggling with these and other problems, I've compiled a list of possible solutions that have been tried, along with their results.
Getting the module to work with the 2.6 series kernel
This point may be moot now that Nvidia officially supports the 2.6 kernel. However, if like me you're interested in getting the 2.6 kernel to work on a laptop that has fallen out of the Nvidia support cycle, you will still need to patch your driver with the information found at minion.de. Also, if you are interested in using the 2.6 implementation of ACPI, your solution may require a bit more finesse. There is a kernel parameter that leaves ACPI in place while taking IRQ control away from it. Depending on how you configure this, it can severely affect 3D acceleration performance. On my desktop, the relevant parameters are as follows:
acpi=off noapic
Turning off ACPI without also providing the noapic option severely impeded performance in 3D-accelerated applications.
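After a reboot, it is worth confirming that these parameters actually reached the running kernel, since a typo in the boot loader entry is easy to miss. A minimal Python sketch that compares them against /proc/cmdline follows; the helper name missing_boot_params is hypothetical, and it assumes each parameter appears as a whitespace-separated token, which is how /proc/cmdline presents the command line.

```python
from pathlib import Path
from typing import List

# The parameters used on the desktop described above.
REQUIRED = ["acpi=off", "noapic"]


def missing_boot_params(cmdline: str, required: List[str]) -> List[str]:
    """Return the entries of `required` that are absent from `cmdline`.

    Hypothetical helper: assumes parameters are whitespace-separated
    tokens, so "acpi=offline" does not count as "acpi=off".
    """
    tokens = set(cmdline.split())
    return [param for param in required if param not in tokens]


def main() -> None:
    proc_cmdline = Path("/proc/cmdline")
    if not proc_cmdline.exists():
        print("/proc/cmdline not found; is this a Linux system?")
        return
    missing = missing_boot_params(proc_cmdline.read_text(), REQUIRED)
    if missing:
        print("Missing boot parameters:", " ".join(missing))
    else:
        print("All required boot parameters are present.")


if __name__ == "__main__":
    main()
```

Matching whole tokens rather than substrings keeps a parameter like acpi=offline from being mistaken for acpi=off.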

If you experience stability problems using the 2.6 kernel and the Nvidia kernel module, the following steps should get you started. Try them in order, testing after each one, until the stability problems stop or your computer starts smoking.
  1. First, upgrade to the latest 2.6 kernel (2.6.4 or later), which includes recent fixes for problems in the search.c file.
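  2. When compiling the kernel, make sure agpgart is compiled as a module, not into the kernel itself. In any configuration, compiling agpgart into the kernel will cause problems, since a built-in agpgart cannot be unloaded to make way for the driver's internal AGP support (see step 5).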
  3. Compile a new kernel interface from the latest Nvidia driver.
  4. Add the kernel parameters acpi=off and noapic to your boot loader's kernel command line.
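  5. In your XF86Config file, add the line Option "NvAGP" "1" to the "Device" section (the one containing Driver "nvidia"). This tells the driver to use its own internal AGP support instead of agpgart.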
  6. Use the kernel interface patches found at minion.de, preferably those for the last module version that worked for you.
  7. Recompile the kernel without framebuffer support if it currently includes it.
  8. Use ALSA if you aren't already using it. It sounds odd, but it works as a last resort, and you should be using it anyway.
  9. Replace the Option "NvAGP" "1" line in your XF86Config file with Option "NvAGP" "0", which disables AGP entirely. (USE THIS ONLY AS A LAST RESORT; YOUR PERFORMANCE WILL SUFFER!)
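Several of these steps can be checked before you reboot into a new setup. The Python sketch below is one way to do that, under assumptions worth stating plainly: the kernel configuration is read from /proc/config.gz (present only if the kernel was built with CONFIG_IKCONFIG_PROC) or /boot/config-<release>, agpgart corresponds to the CONFIG_AGP kernel option, the X configuration lives at /etc/X11/XF86Config-4 or /etc/X11/XF86Config, and every function name is hypothetical. It reports the kernel version against step 1, whether agpgart is modular or built in (step 2), and the NvAGP value in the Device section (steps 5 and 9).

```python
import gzip
import platform
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumed XF86Config locations; adjust for your distribution.
XF86CONFIG_PATHS = [Path("/etc/X11/XF86Config-4"), Path("/etc/X11/XF86Config")]


def kernel_at_least(release: str, minimum: Tuple[int, int, int] = (2, 6, 4)) -> bool:
    """Step 1: True if a release string such as '2.6.4-1-686' is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        return False
    # A missing third component (e.g. '2.6') is treated as 0.
    version = tuple(int(part or 0) for part in match.groups())
    return version >= minimum


def agp_build_mode(config_text: str) -> Optional[str]:
    """Step 2: 'm' (module) or 'y' (built in) for CONFIG_AGP, None if unset."""
    for line in config_text.splitlines():
        # Exact key match, so CONFIG_AGP_INTEL=y and similar lines are ignored.
        if line.startswith("CONFIG_AGP="):
            return line.split("=", 1)[1].strip()
    return None


def nvagp_setting(xf86config_text: str) -> Optional[str]:
    """Steps 5 and 9: the NvAGP value inside a Device section, or None."""
    in_device = False
    for raw in xf86config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        if re.fullmatch(r'Section\s+"Device"', line, re.IGNORECASE):
            in_device = True
        elif line.lower() == "endsection":
            in_device = False
        elif in_device:
            match = re.fullmatch(r'Option\s+"NvAGP"\s+"(\d)"', line, re.IGNORECASE)
            if match:
                return match.group(1)
    return None


def read_kernel_config(release: str) -> Optional[str]:
    """Read the running kernel's config from the assumed locations, if possible."""
    try:
        # /proc/config.gz only exists when CONFIG_IKCONFIG_PROC is enabled.
        with gzip.open("/proc/config.gz", "rt") as handle:
            return handle.read()
    except OSError:
        pass
    try:
        return Path(f"/boot/config-{release}").read_text()
    except OSError:
        return None


def main() -> None:
    release = platform.release()
    status = "ok" if kernel_at_least(release) else "older than 2.6.4"
    print(f"Kernel {release}: {status}")

    config = read_kernel_config(release)
    if config is None:
        print("Kernel config not found; cannot check agpgart.")
    elif agp_build_mode(config) == "y":
        print("agpgart is built into the kernel; rebuild it as a module.")
    else:
        print(f"agpgart (CONFIG_AGP): {agp_build_mode(config) or 'not set'}")

    for path in XF86CONFIG_PATHS:
        if path.exists():
            print(f"{path}: NvAGP = {nvagp_setting(path.read_text())}")
            break
    else:
        print("No XF86Config found at the assumed paths.")


if __name__ == "__main__":
    main()
```

A "built into the kernel" warning means step 2 still needs doing, and an NvAGP value of 0 means you are already on the last-resort setting from step 9. The XF86Config parsing is deliberately simple; it ignores commented-out lines but does not handle every quirk of the file format.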
In closing...
There are some high-quality guides out there to help solve the problems surrounding the most notorious of binary-only modules. You're not likely to get help from kernel developers on this issue, since Nvidia hides the details of the module's workings from them. The downside of binary modules, they will tell you, is that you put yourself in the position of relying on the supplier to fix your problems, and they will refer you to Nvidia in the hope that you will see the light and switch to an open-source module. This is all true: kernel developers have every right to make things as difficult as possible for binary-only modules, since every other module is free for them to patch and correct as needed. For the end user, however, this can be an annoyance when frequent X crashes threaten the stability of the system.

Sites you should visit when seeking help with Nvidia modules:
  • Nvidia, the most obvious.
  • minion.de, a great site with patches for several legacy versions of the driver, great for laptop users.
If you still experience problems, post a comment on my blog or in one of the forums to see if anyone else has had similar problems.