I’ve been experiencing periodic lockups on my Ryzen-based Fedora Workstation. It has been a constant puzzle, so I went looking for a resolution. Apparently I’m not the only one with this problem (Kernel Bug 196683).
As a workaround, I’ve opted to set the kernel boot parameter `rcu_nocbs=0-15`. Some have reported success with this; others have had to disable C6 states directly in the BIOS. I am opting for the former for now and hoping for the best. If I continue to have issues, I will update this accordingly.
These notes were written to remind me what I did, but they may be of use to others.
- Confirm that `CONFIG_RCU_NOCB_CPU` is set and compiled into the kernel. This is required for the `rcu_nocbs` setting to work. Luckily, it is compiled into the stock Fedora 27 kernel (source: Kernel Bug 196683: Comment 87).
```
$ fgrep CONFIG_RCU_NOCB_CPU /boot/config-$(uname -r)
CONFIG_RCU_NOCB_CPU=y
```
If you’re using Ubuntu, see Programster’s "Ubuntu 16.04 - Compile Custom Kernel For Ryzen" for help compiling a custom kernel, or just disable C6 states in the BIOS.
- Add `rcu_nocbs=0-15` to the boot parameters. This setting is for the Ryzen 1700X, which has 16 threads. As stated in this comment, to determine the range for your own CPU, take its thread count (16 for the 1700X) and subtract one (15).
```
$ sudo vi /etc/default/grub
```

Add `rcu_nocbs=0-15` to the list of `GRUB_CMDLINE_LINUX` options.

```
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rcu_nocbs=0-15"
```

You'll probably have more than one option already listed in `GRUB_CMDLINE_LINUX`; just add the setting in with the rest.
- Apply the changes to the boot config and reboot.
```
$ sudo grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg
$ sudo reboot
```
This command is somewhat distro- and build-dependent. The path above works for Fedora 27 systems booting from UEFI. More information can be found in the Fedora 27 System Administrator’s Guide: Working with the GRUB 2 Boot Loader.
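The two machine-specific pieces of the steps above (the CPU range and the `grub.cfg` location) can be sketched in shell. This is only a rough illustration: `rcu_nocbs_range` and `grub_cfg_path` are hypothetical helper names I made up, the `/boot/grub2/grub.cfg` path is what I understand Fedora uses on BIOS systems, and other distros will differ.

```shell
#!/bin/sh
# Build the rcu_nocbs value for a given thread (logical CPU) count.
# rcu_nocbs_range is a hypothetical helper name, not a standard tool.
rcu_nocbs_range() {
    threads="$1"
    # CPUs are numbered from 0, so the last one is (threads - 1).
    echo "rcu_nocbs=0-$((threads - 1))"
}

# Guess the grub.cfg location. UEFI-booted systems expose /sys/firmware/efi.
# Both paths are Fedora-specific assumptions.
grub_cfg_path() {
    if [ -d /sys/firmware/efi ]; then
        echo /boot/efi/EFI/fedora/grub.cfg
    else
        echo /boot/grub2/grub.cfg
    fi
}

# Print the values for the machine this runs on.
rcu_nocbs_range "$(nproc --all)"
grub_cfg_path
```

Nothing here modifies the system; it only prints the parameter to add to `GRUB_CMDLINE_LINUX` and the path to pass to `grub2-mkconfig -o`.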
If it was effective, you should see something similar to the following in your boot logs:

```
kernel: Hierarchical RCU implementation.
kernel: RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=16.
kernel: Tasks RCU enabled.
kernel: RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=16
kernel: NR_IRQS: 524544, nr_irqs: 1096, preallocated irqs: 16
kernel: Offload RCU callbacks from CPUs: 0-15.
```
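A quicker sanity check is to confirm that the running kernel was actually booted with the parameter. The sketch below reads `/proc/cmdline`; `cmdline_rcu_nocbs` is a hypothetical helper name of my own, not an existing command.

```shell
#!/bin/sh
# Extract the rcu_nocbs value from a kernel command line string.
# cmdline_rcu_nocbs is a hypothetical helper name.
cmdline_rcu_nocbs() {
    # Split the command line into one option per line, then keep the value
    # that follows "rcu_nocbs=".
    printf '%s\n' "$1" | tr ' ' '\n' | grep '^rcu_nocbs=' | cut -d= -f2
}

value="$(cmdline_rcu_nocbs "$(cat /proc/cmdline)")"
if [ -n "$value" ]; then
    echo "rcu_nocbs is active for CPUs: $value"
else
    echo "rcu_nocbs is not on the running kernel's command line"
fi
```

On a systemd-based system, something like `journalctl -k -b | grep 'Offload RCU callbacks'` should also pull the relevant line out of the current boot's kernel log, assuming the journal is available.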
We’ll see if this improves stability. I’m hoping it does. I was locking up (on average) at least once a day. The bug is still active and open, so for now the workaround appears to be the only option.