Discussion:
[ubuntu-us-mi] Need verification of nosmp bug in Hardy
Jeff Hanson
2008-07-12 23:22:47 UTC
While trying to troubleshoot a different problem with VMware Player
(now fixed) I attempted to disable the other cores on my Phenom 9550
with the "nosmp" boot option. It hangs my system during boot every
time. There is an old bug report that describes the same issue:
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/94192

I would appreciate it if some of you could verify this so I can
determine if it's just my configuration that causes it. Basically hit
"Esc" at the Grub prompt, select the default kernel config (usually
the top one) and press "e" to edit it. Move down to the kernel line
and press "e" again. You should then be able to edit the line. It
should look similar to this:
kernel=/vmlinuz-2.6.24-19-generic root=/dev/mapper/vg0-lv0_crypt ro quiet splash

Edit the end of the line to look like:
kernel=/vmlinuz-2.6.24-19-generic root=/dev/mapper/vg0-lv0_crypt ro nosplash nosmp

Then press "Enter" and then "b" to boot with it. Don't worry, the
change is temporary. Mine hangs part way through around floppy
detection (yes I have one). This may be caused by it prompting for
the password to my encrypted / partition. I don't get the prompt so
I'm not sure. "maxcpus=0" is equivalent to nosmp. If you have a
single-core hyperthreading CPU try "noht".
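
For anyone whose machine does finish booting with the option, a quick way to confirm that the edited line actually took effect is to look at /proc/cmdline. The following is only a minimal sketch: smp_disabled is a made-up helper name, and it checks for nothing beyond the two spellings mentioned above (nosmp and maxcpus=0).

```shell
#!/bin/sh
# smp_disabled is a hypothetical helper, not an existing tool.
# It succeeds if the given kernel command line contains "nosmp"
# or "maxcpus=0", the two equivalent spellings mentioned above.
smp_disabled() {
    # Deliberately unquoted so the command line splits into words.
    for param in $1; do
        case "$param" in
            nosmp|maxcpus=0) return 0 ;;
        esac
    done
    return 1
}

# Report on the running kernel when /proc/cmdline is readable.
if [ -r /proc/cmdline ]; then
    if smp_disabled "$(cat /proc/cmdline)"; then
        echo "SMP is disabled on the kernel command line"
    else
        echo "SMP is not disabled on the kernel command line"
    fi
fi
```

If the second message appears after a supposedly nosmp boot, the GRUB edit was probably not applied to the entry that was booted.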
Robert Citek
2008-07-13 04:12:54 UTC
Does it hang or does it just boot very slowly?

On my ThinkPad T61 it just boots very slowly, about 5-6 times longer
than without the nosmp. Also wifi doesn't work and I get some errors
on boot up that I don't get when I do normal boot. Some details:

With nosmp:

$ ( set -x ; cat /proc/cmdline ; dmesg | grep -C 2 -i error ; ls /sys/devices/system/cpu/ )
+ cat /proc/cmdline
root=LABEL=ubuntu ro vga=791 nosmp
+ dmesg
+ grep -C 2 -i error
[ 18.731422] Early unpacking initramfs... done
[ 18.993120] ACPI: Core revision 20070126
[ 18.993180] ACPI: Looking for DSDT in initramfs... error, file /DSDT.aml not found.
[ 18.997686] ACPI: setting ELCR to 0200 (from 0c00)
[ 19.332954] CPU0: Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz stepping 0b
--
[ 38.575652] usbcore: registered new interface driver hci_usb
[ 38.651920] ata1.00: qc timeout (cmd 0xec)
[ 38.656689] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 38.661487] ata1: failed to recover some devices, retrying in 5 secs
[ 39.123636] ricoh-mmc: Ricoh MMC Controller disabling driver
--
[ 41.736545] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 42.326739] ata1.00: qc timeout (cmd 0xec)
[ 42.332604] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 42.338494] ata1: failed to recover some devices, retrying in 5 secs
[ 42.561223] ata1: port is slow to respond, please be patient (Status 0x80)
--
[ 42.677419] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 43.272929] ata1.00: qc timeout (cmd 0xec)
[ 43.278730] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 43.284481] ata1: failed to recover some devices, retrying in 5 secs
[ 43.504172] ata1: port is slow to respond, please be patient (Status 0x80)
--
[ 51.875379] ACPI: \_SB_.PCI0.IDE0.PRIM.MSTR: found ejectable bay
[ 51.875384] ACPI: \_SB_.PCI0.IDE0.PRIM.MSTR: Adding notify handler
[ 51.875402] ACPI: Error installing bay notify handler
[ 51.875405] ACPI: Bay [\_SB_.PCI0.IDE0.PRIM.MSTR] Added
[ 53.240657] NET: Registered protocol family 10
+ ls --color=tty /sys/devices/system/cpu/
cpu0 cpuidle sched_mc_power_savings

Normal boot:

$ ( set -x ; cat /proc/cmdline ; dmesg | grep -C 2 -i error ; ls /sys/devices/system/cpu/ )
+ cat /proc/cmdline
root=LABEL=ubuntu ro vga=791
+ dmesg
+ grep -C 2 -i error
[ 13.018474] Early unpacking initramfs... done
[ 13.280123] ACPI: Core revision 20070126
[ 13.280184] ACPI: Looking for DSDT in initramfs... error, file /DSDT.aml not found.
[ 13.319461] CPU0: Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz stepping 0b
[ 13.319483] SMP alternatives: switching to SMP code
--
[ 36.619249] ACPI: \_SB_.PCI0.IDE0.PRIM.MSTR: found ejectable bay
[ 36.619254] ACPI: \_SB_.PCI0.IDE0.PRIM.MSTR: Adding notify handler
[ 36.619277] ACPI: Error installing bay notify handler
[ 36.619280] ACPI: Bay [\_SB_.PCI0.IDE0.PRIM.MSTR] Added
[ 37.425703] NET: Registered protocol family 10
+ ls --color=tty /sys/devices/system/cpu/
cpu0 cpu1 cpuidle sched_mc_power_savings

BTW, the kernel is ...

$ uname -a
Linux Ubuntu804 2.6.24-19-generic #1 SMP Wed Jun 18 14:43:41 UTC 2008 i686 GNU/Linux
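
The two captures above are easier to compare if each boot writes the same details to a file. This is a rough sketch only: count_cpus and capture_boot_info are hypothetical helper names, and the sed expression assumes dmesg timestamps in the "[ 18.731422] " form seen in the output above.

```shell
#!/bin/sh
# count_cpus is a hypothetical helper: it counts the cpuN entries in a
# sysfs-style directory (default: /sys/devices/system/cpu).
count_cpus() {
    dir=${1:-/sys/devices/system/cpu}
    n=0
    for entry in "$dir"/cpu[0-9]*; do
        # An unmatched glob stays literal, so check the entry exists.
        if [ -e "$entry" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# capture_boot_info is also hypothetical: it writes the command line,
# the CPU count, and the error lines from dmesg to the named file.
# Timestamps are stripped so two captures can be diffed.
capture_boot_info() {
    out=$1
    {
        echo "cmdline: $(cat /proc/cmdline)"
        echo "cpus: $(count_cpus)"
        dmesg | grep -C 2 -i error | sed 's/^\[ *[0-9.]*\] //'
    } > "$out"
}
```

One possible use: run "capture_boot_info nosmp.txt" after a nosmp boot and "capture_boot_info normal.txt" after a normal boot, then "diff nosmp.txt normal.txt" to see only the messages that differ.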

Hope this is at least somewhat helpful.

Regards,
- Robert
Post by Jeff Hanson
I would appreciate it if some of you could verify this so I can
determine if it's just my configuration that causes it.
Jeff Hanson
2008-07-13 17:40:22 UTC
Post by Robert Citek
Does it hang or does it just boot very slowly?
Hangs forever. It may be the LUKS/dm-crypt password prompt that
causes it, but there is no prompt and it doesn't accept the password,
so it may be something else.
Post by Robert Citek
On my ThinkPad T61 it just boots very slowly, about 5-6 times longer
than without the nosmp. Also wifi doesn't work and I get some errors
on boot up that I don't get when I do normal boot.
Theoretically, there should be no difference except in performance.
I've had problems with the generic kernels causing flaky PS/2 port
behavior resulting in random keyboard key repeats. Switching to the
i386 kernels solved it. There are fundamental issues with interrupt
handling on SMP systems that I've seen kernel patches for, so they
aren't directly comparable to non-SMP systems. Regardless,
disabling a core/CPU/virtual HT processor should never impact
anything other than performance. This points to problems in the
kernel or initrd scripts that, although they won't affect most
people, shouldn't exist.

Another example of this is some of the informal tests users have
performed with Firefox to evaluate its "robustness". They feed
random garbage-filled web pages to it to see how it fails. It often
does. These types of problems indicate faulty or weak programming.
Robert Citek
2008-07-14 16:04:42 UTC
Post by Jeff Hanson
It points to problems with the kernel or initrd scripts that although
won't affect most people they shouldn't exist.
Agreed. So what's the next step? How do we track down this bug?

Regards,
- Robert
Jeff Hanson
2008-07-14 18:36:07 UTC
Post by Robert Citek
It points to problems with the kernel or initrd scripts that although won't affect most people they
shouldn't exist.
Agreed. So what's the next step? How do we track down this bug?
Debugging the kernel and initrd scripts is way over my head. I added
my findings to bug #94192 for posterity. You should report your
findings also to help show it's not unique to my system. It's not a
major problem but the kernel maintainers should be aware of it the
next time they make changes to the kernel scripts. It may be a bigger
issue for the Ubuntu Server edition or for clusters.
Robert Citek
2008-07-14 19:36:51 UTC
Post by Jeff Hanson
Debugging the kernel and initrd scripts is way over my head.
Ditto. Although it would be interesting to learn how to think about
troubleshooting these types of issues, at least at a basic level.

For example, I came across the debug and break boot parameters for Ubuntu:

https://help.ubuntu.com/community/BootOptions#Initrd%20break%20points
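
As an untested illustration of how those parameters might be combined with the nosmp test, the edited kernel line could look something like the one below. It reuses the example line from earlier in the thread. The break point name premount is an assumption based on the initramfs-tools documentation, so check the page linked above for the names your release accepts.

```text
kernel=/vmlinuz-2.6.24-19-generic root=/dev/mapper/vg0-lv0_crypt ro nosplash nosmp break=premount
```

If the boot reaches the break point, it should drop to a shell inside the initramfs before the root filesystem is mounted, which would at least show whether the hang happens before or after that stage.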
Post by Jeff Hanson
I added my findings to bug #94192 for posterity. You should report your
findings also to help show it's not unique to my system. It's not a
major problem but the kernel maintainers should be aware of it the
next time they make changes to the kernel scripts. It may be a bigger
issue for the Ubuntu Server edition or for clusters.
Will do. It's entirely possible that this bug is a symptom of a larger issue.

Regards,
- Robert
