Grub: Fixing The Bootloader

by Pete Kelly (critter)

It's getting late and tomorrow you have a busy day. So you save your work and shut down your computer. The following day, you power up your computer and your heart sinks when you see the message "kernel panic … " or "grub error xx" and your machine will not start.

What to do? Well, the good news is that getting one of those messages suggests that this is no more than a software problem, and you may just need to fix up a configuration file or reinstall the boot loading program. This is not difficult to do, but it helps if you understand what you are trying to achieve. So first of all, we need a little background information. I'll try to keep this simple.

The boot process


When you apply power to a computer, the processor needs to be told what to do. The motherboard stores a list of drives that the computer can boot from, along with the order in which they should be tried, and points the processor to the first drive on the list. The processor goes to the very beginning of the drive's storage area to look for more information. This storage area is divided into smaller areas, known as sectors, and the processor looks at the first sector on the drive, which is known as the Master Boot Record, or MBR. The first sector of each partition is also reserved, and is known as that partition's boot sector. The MBR doesn't have enough space to hold all of the information the processor needs, so it tells the processor where to find the code that completes the rest of the boot loading process, allowing the processor to continue to a full system boot.

The boot loading program used by PCLinuxOS is called grub (GRand Unified Bootloader), and the code stored in the MBR is known as 'grub stage1.' The final bit of code loaded into memory is 'grub stage2.'

Stage 2 loads the kernel and starts it, and also loads a temporary file system into memory, which contains things like modules and drivers that the kernel may need to complete a successful system boot. This file system comes from an image known as the 'initial ram disk,' or initrd.img.

Unfortunately, we have a problem here. On the one hand, we have stage2, which knows where the kernel and initrd are stored on the file system; on the other, we have stage1, which knows nothing about file systems. Enter stage1_5. There are several of these, and each one is file-system-specific, with names such as e2fs_stage1_5 and reiserfs_stage1_5. The stage1_5 code in these files is the bridge between the two, but it has to be somewhere stage1 can find it. Fortunately, due to the way that partitions are laid out on a drive, there are usually some free sectors between the MBR and the start of the first partition, and this is where the extra code goes. Stage1 knows to look in the second sector of the drive, and after executing the code there, grub will be able to find things in the file system.

When enough has been set up for the kernel to manage physical file systems, control is handed to the kernel.

That's roughly how things work in PCLinuxOS, but there is a whole lot more to the grub system than described here, and this is not the only way to boot the system.

Recovery

What follows applies to PCLinuxOS distributions and has always worked for me, but it may need some modification for other systems. Ubuntu, and all Ubuntu-based versions, now use grub2, so this will definitely not work there.

From the above, we can see that grub needs to be told three things to boot your operating system:

  • Where is the kernel
  • Where is the initrd
  • Which drive or partition holds grub's stage1_5 and stage2

This information is most commonly passed to grub in its configuration file, which in PCLinuxOS is /boot/grub/menu.lst. Some systems call this file grub.conf. If the information passed in this file is incorrect, then a missing kernel or initrd will give a grub error, and a wrongly defined root file system will cause the kernel to panic.
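Since a missing kernel or initrd is one of the commonest causes of a grub error, it can help to check every file that menu.lst names before rebooting. The following is a small Python sketch, not a PCLinuxOS tool: the function name missing_boot_files is hypothetical, and it assumes that every kernel and initrd path lives on the single partition you have mounted (for example at /a, as described below), so the leading grub device such as (hd0,0) can simply be dropped.

```python
import re
from pathlib import Path
from typing import List

# A leading grub device name such as "(hd0,0)" in front of a path.
DEVICE_PREFIX = re.compile(r"^\(hd\d+(?:,\d+)?\)")


# Hypothetical helper: not part of grub or PCLinuxOS.
def missing_boot_files(menu_text: str, mount_point: str) -> List[str]:
    """List kernel and initrd paths from menu_text that do not exist under mount_point."""
    missing = []
    for raw in menu_text.splitlines():
        words = raw.split()
        # Only 'kernel' and 'initrd' lines name files that grub has to load.
        if len(words) < 2 or words[0] not in ("kernel", "initrd"):
            continue
        # Assumption: the file is on the partition mounted at mount_point,
        # so the grub device prefix can be removed.
        path = DEVICE_PREFIX.sub("", words[1])
        if not (Path(mount_point) / path.lstrip("/")).exists():
            missing.append(path)
    return missing
```

Run as root from the Live CD, something like missing_boot_files(open("/a/boot/grub/menu.lst").read(), "/a") would print any paths grub will fail to find. It cannot spot a wrongly defined root file system, which shows up as a kernel panic instead.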

The information can also be supplied on the command line at boot time, although this requires a little more effort to master.

To repair the system, you can boot from the Live CD that you used to install PCLinuxOS. This will get an operating system running in memory, and then you can repair any damaged files on your hard drive.


Finding the information

The first thing you need to know is the drive and partition on which PCLinuxOS is installed, and grub can help here. After booting from a Live CD, open up a terminal and get administrative powers by typing:

su <enter>

You will be prompted for a password. This is the administrative, or root, password, not your own user password. Please remember that now that you have administration rights, you should be extra careful about what you enter on the command line.

Type:

grub <enter>

You will get a message about probing, and then you will enter the grub command shell, where you can enter commands and even reinstall the resident part of grub:

GNU GRUB version 0.97 (640K lower / 3072K upper memory)

[ Minimal BASH-like line editing is supported. For the first word, TAB lists possible command completions. Anywhere else TAB lists the possible completions of a device/filename. ]

grub>

At the new prompt, enter the following:

find /boot/grub/stage2 <enter>

You will get a list of partitions that contain the grub stage2 file. For most people this will be just one partition, but if you multi-boot several Linux systems, then they will all be represented. The partitions are listed as (hd0,0) or similar, because grub uses its own device names rather than Linux names such as hda1 or sda1. Grub counts starting at zero, not one: the first number is the drive number, and the second is the partition number. Type quit <enter> to leave grub.
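The translation between the two naming schemes is mechanical: grub counts drives and partitions from zero, while Linux letters drives a, b, c and numbers partitions from one. Here is a minimal Python sketch of that rule. The function grub_to_linux is hypothetical, and it assumes the BIOS drive order matches the order in which Linux detects the drives, which is usually but not always the case; pass prefix="hd" for an IDE drive.

```python
import re


# Hypothetical helper: not part of grub or PCLinuxOS.
def grub_to_linux(grub_name: str, prefix: str = "sd") -> str:
    """Translate a grub legacy name like '(hd0,0)' into a Linux name like '/dev/sda1'."""
    match = re.fullmatch(r"\(hd(\d+)(?:,(\d+))?\)", grub_name.strip())
    if match is None:
        raise ValueError(f"not a grub hard disk name: {grub_name!r}")
    drive = int(match.group(1))
    if drive > 25:
        raise ValueError("this sketch only handles drives a to z")
    # grub counts drives from 0; Linux letters them a, b, c ...
    # Assumption: BIOS drive order and Linux detection order agree.
    device = f"/dev/{prefix}{chr(ord('a') + drive)}"
    if match.group(2) is None:
        return device  # a whole drive, e.g. (hd0) -> /dev/sda
    # grub counts partitions from 0; Linux numbers them from 1.
    return f"{device}{int(match.group(2)) + 1}"
```

With this rule, (hd0,0) becomes /dev/sda1 and (hd0,4) becomes /dev/sda5.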

Now that we know the drive and partition that our operating system is installed on, we can mount it and have a look around. I'm going to assume from here on that the partition is (hd0,0), the first partition on the first drive. In the terminal, still as root, type:

mkdir /a <enter>
mount /dev/sda1 /a <enter>

You will have to change sda to hda if your drive is IDE. If you don't know, try typing fdisk -l <enter> to get a list of recognized drives.

The drive is now mounted at /a, so the kernel and initrd should be in /a/boot. The names of the kernel and the initrd are quite long and complex, so there are usually easy-to-type links (shortcuts) that refer to them. The kernel's name begins with vmlinuz and the initrd's name begins with initrd. To see them, type the following and make a note of the names.

ls /a/boot/vmlinuz* <enter>
ls /a/boot/initrd* <enter>

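If your ls marks file types, the name that ends in an '@' is the link; ls -l /a/boot shows each link with an arrow pointing to the file it refers to. You can use the link name in your grub configuration file.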
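A link is only useful if the file it points to still exists. As a rough sketch, the Python below lists the files in a boot directory whose names start with a given prefix and reports where each link points, flagging broken ones. The function name boot_images is hypothetical, and it only uses the standard os and pathlib modules.

```python
import os
from pathlib import Path
from typing import Dict, Optional


# Hypothetical helper: not part of grub or PCLinuxOS.
def boot_images(boot_dir: str, stem: str) -> Dict[str, Optional[str]]:
    """Map each name in boot_dir starting with stem to its link target, or None for a real file."""
    found: Dict[str, Optional[str]] = {}
    for name in sorted(os.listdir(boot_dir)):
        if not name.startswith(stem):
            continue
        entry = Path(boot_dir) / name
        if entry.is_symlink():
            target = os.readlink(entry)
            # exists() follows the link, so it is False when the target is gone.
            found[name] = target if entry.exists() else target + " (broken)"
        else:
            found[name] = None
    return found
```

Calling boot_images("/a/boot", "vmlinuz") and boot_images("/a/boot", "initrd") gives the same information as the two ls commands, with broken links marked.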

Labeling the partitions

I now have two links named vmlinuz and initrd.img. I also know that the root device is (hd0,0), but I find using labels makes life easier. So, typing tune2fs -L kde4 /dev/sda1 gives my partition a label of kde4. I label all of my partitions in this manner. If you prefer, you can use the graphical PCLinuxOS Control Center, by going to Local disks > Manage disk partitions > Expert Mode to label your partitions.

The configuration file

Now I can try to fix up the menu.lst file. I'll edit it with nano, a command line text editor that is really easy to use, but you can use any text editor you like that can save a file as pure text with no formatting. Open the menu.lst file from your installed system:

nano /a/boot/grub/menu.lst <enter>

The original file looks like this:

timeout 10
color white/blue yellow/blue
gfxmenu (hd0,0)/boot/gfxmenu
default 0

title linux
kernel (hd0,0)/boot/vmlinuz BOOT_IMAGE=linux root=UUID=442bec9e-f143-4cc2-866c-d65a92fbac69 resume=UUID=afccaaaa-054d-424b-8a3c-093f1b2a743d splash=silent vga=788
initrd (hd0,0)/boot/initrd.img

title linux-nonfb
kernel (hd0,0)/boot/vmlinuz BOOT_IMAGE=linux-nonfb root=UUID=442bec9e-f143-4cc2-866c-d65a92fbac69 resume=UUID=afccaaaa-054d-424b-8a3c-093f1b2a743d
initrd (hd0,0)/boot/initrd.img

title failsafe
kernel (hd0,0)/boot/vmlinuz BOOT_IMAGE=failsafe root=UUID=442bec9e-f143-4cc2-866c-d65a92fbac69 failsafe
initrd (hd0,0)/boot/initrd.img

It really isn't as complicated as it looks. The first four lines set up the menu, and each block of three lines is an entry in the menu, known as a stanza. Each of the above stanzas contains only three lines, although the magazine typesetting will probably break these up. The formatting in the menu.lst file is important.

The three lines begin with 'title', 'kernel' and 'initrd'. Each should be exactly one line long, even though the 'kernel' line often becomes a rather long line. A stanza can contain additional lines, but each command must likewise fit on a single line.
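To see why the one-line rule matters, it helps to read the file the way grub legacy does: everything before the first 'title' line is a menu-wide setting, and each 'title' line opens a new entry that collects the commands after it. The Python sketch below (the function parse_menu is hypothetical) splits menu.lst text that way, skipping blank lines and '#' comments.

```python
from typing import List, Tuple

Entry = Tuple[str, List[str]]


# Hypothetical helper: not part of grub or PCLinuxOS.
def parse_menu(text: str) -> Tuple[List[str], List[Entry]]:
    """Split menu.lst text into global commands and (title, commands) entries."""
    global_lines: List[str] = []
    entries: List[Entry] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments hold no commands
        words = line.split(None, 1)
        if words[0] == "title":
            # Each title line opens a new menu entry.
            entries.append((words[1] if len(words) > 1 else "", []))
        elif entries:
            entries[-1][1].append(line)  # belongs to the most recent entry
        else:
            global_lines.append(line)  # before any title: menu-wide settings
    return global_lines, entries
```

If an editor wraps a kernel line, the tail (for example root=LABEL=kde4) turns up as a separate "command" that grub cannot make sense of, which is exactly the failure described above.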


Adding a new menu entry

I'm going to add a new stanza (menu item) at the beginning, i.e. between the line that reads 'default 0' and the one that reads 'title linux':

title kde4
kernel (hd0,0)/boot/mykernel root=LABEL=kde4
initrd (hd0,0)/boot/myinitrd

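Replace mykernel and myinitrd with the kernel and initrd names you noted earlier (in my case, the links vmlinuz and initrd.img). Leave a blank line before and after each stanza. Grub starts a new entry at each 'title' line, but the blank lines make it much easier for you to see where each stanza starts and ends.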
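If you prefer to script the edit, the Python sketch below does the same thing as the manual edit: it builds a three-line stanza and places it just before the first existing 'title' line, leaving the rest of the file untouched. The function insert_first_stanza and its parameters are hypothetical. Because default 0 counts from zero, the new first stanza also becomes the default entry.

```python
# Hypothetical helper: not part of grub or PCLinuxOS.
def insert_first_stanza(menu_text: str, title: str, device: str,
                        kernel: str, initrd: str, root: str) -> str:
    """Return menu_text with a new three-line stanza placed before the first 'title'."""
    stanza = [
        f"title {title}",
        f"kernel {device}{kernel} root={root}",
        f"initrd {device}{initrd}",
        "",  # blank line separating it from the next stanza
    ]
    lines = menu_text.splitlines()
    # Index of the first existing menu entry; the end of the file if there is none.
    first = next((i for i, line in enumerate(lines) if line.split()[:1] == ["title"]), len(lines))
    return "\n".join(lines[:first] + stanza + lines[first:]) + "\n"
```

For example, insert_first_stanza(text, "kde4", "(hd0,0)", "/boot/vmlinuz", "/boot/initrd.img", "LABEL=kde4") produces the kde4 stanza shown above, with the real link names filled in.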

After typing that in, hold down the Control key and press X. When nano asks whether to save the modified file, press Y, then press Enter to accept the file name.

That should be enough to get you booted, although you will want to pretty it up once you are satisfied. You could, in fact, have just those three lines in menu.lst.

I didn't attempt to repair the existing entries. Rather, I added my own set of instructions, which I knew to be correct, to the grub configuration file, menu.lst, and in that way I am in control. I also left the original set of instructions for grub intact. Later, when I am sure that I have a bootable system, I can go back and edit the file, but I still have the original file contents.

What it all means

The four lines at the beginning of the menu.lst file perform the following functions.

timeout 10 sets the length of time, in seconds, that grub will wait before booting the default menu item or, if none is defined, the first menu item. Pressing any key before that time cancels the countdown.

color white/blue yellow/blue sets the colors for the text menu, which you can get to by pressing the Escape key while at the graphical menu (there are times when you may need to do this). The first pair of colors sets the foreground/background colors for the bulk of the menu, and the second pair highlights the selected line.

gfxmenu (hd0,0)/boot/gfxmenu tells grub where to find the graphical menu.

default 0 sets the default menu item to boot, counting from 0.
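Counting from zero is easy to get wrong, so here is a small Python sketch that reports which title grub would boot by default. The function default_title is hypothetical; it assumes the default line holds a plain number (grub legacy also accepts other forms, such as default saved, which this sketch does not handle) and, like grub, ignores commented-out entries.

```python
# Hypothetical helper: not part of grub or PCLinuxOS.
def default_title(menu_text: str) -> str:
    """Return the title grub boots by default, counting entries from 0."""
    default = 0  # no 'default' line means the first entry
    titles = []
    for raw in menu_text.splitlines():
        words = raw.split(None, 1)
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comments
        if words[0] == "default" and not titles and len(words) > 1:
            default = int(words[1])  # assumption: a plain number, not 'saved'
        elif words[0] == "title":
            titles.append(words[1].strip() if len(words) > 1 else "")
    return titles[default]
```

With the original file, default 0 picks 'linux'; changing it to default 2 would pick 'failsafe'.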

To improve on the simple 'no frills' stanza we can start adding things to the 'kernel' line.

Adding splash=silent vga=788 to the end of this line will allow the installed plymouth graphics theme to hide the scrolling text. vga=788 selects an 800x600 framebuffer mode with 16-bit color, which most displays can handle.
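The vga= number is not arbitrary: the Linux kernel's VESA framebuffer takes the VESA mode number plus 0x200 (512). The Python sketch below decodes the common modes from a small lookup table; the function decode_vga is hypothetical, and the table covers only the standard VESA modes listed, not every value a graphics card might accept.

```python
from typing import Tuple

# Standard VESA mode numbers -> (width, height, bits per pixel).
VESA_MODES = {
    0x110: (640, 480, 15), 0x111: (640, 480, 16), 0x112: (640, 480, 24),
    0x113: (800, 600, 15), 0x114: (800, 600, 16), 0x115: (800, 600, 24),
    0x116: (1024, 768, 15), 0x117: (1024, 768, 16), 0x118: (1024, 768, 24),
    0x119: (1280, 1024, 15), 0x11A: (1280, 1024, 16), 0x11B: (1280, 1024, 24),
}


# Hypothetical helper: not part of the kernel or grub.
def decode_vga(value: int) -> Tuple[int, int, int]:
    """Return (width, height, bits per pixel) for a kernel vga= number."""
    # The kernel's vga= value is the VESA mode number plus 0x200.
    return VESA_MODES[value - 0x200]
```

So vga=788 is VESA mode 0x114 (800x600, 16-bit), and vga=791 would be 1024x768 at the same color depth.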

If you intend to use hibernation to shut down your machine, you will have to tell the kernel where to find the data it needs to resume the session. This is stored on your swap partition, and for that reason this partition should be slightly larger than the installed memory if hibernation is to succeed. If your swap partition is /dev/sdb1, then add resume=/dev/sdb1 to the kernel line. We can use the Linux device name here because grub simply passes this option on to the kernel, which does understand that notation.
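You can check the swap-versus-memory rule of thumb from the figures the kernel reports in /proc/meminfo. This Python sketch uses hypothetical function names and applies only the simple comparison given above; the real hibernation image is often smaller than the installed memory, so treat the result as a conservative guide rather than a guarantee.

```python
# Hypothetical helpers: not part of PCLinuxOS.
def meminfo_kib(text: str, key: str) -> int:
    """Return the value of a /proc/meminfo field such as 'MemTotal', in kB."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name == key:
            return int(rest.split()[0])  # values are reported in kB
    raise KeyError(key)


def swap_fits_hibernation(meminfo_text: str) -> bool:
    """Rule of thumb from the text: swap should be larger than installed memory."""
    return meminfo_kib(meminfo_text, "SwapTotal") > meminfo_kib(meminfo_text, "MemTotal")
```

On a running system, swap_fits_hibernation(open("/proc/meminfo").read()) applies the check to your own machine.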

A default installation of PCLinuxOS supplies three stanzas.

  • The first one will provide a full graphical boot to the login screen.
  • The next one is named 'linux-nonfb,' or similar, and allows booting without a graphical boot splash, so you can see system messages as the system boots. This is useful for troubleshooting. You can pause the scrolling text with the Scroll Lock key on your keyboard.
  • The last one, named 'failsafe,' will boot to a limited shell in single user mode, where you can perform some administrative tasks, such as file system checks and root password recovery. When all is well, typing init 5 should get you back up to the login screen.

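These three modes are identified by adding one of the following to the 'kernel' line, between the kernel and root partition declarations.

BOOT_IMAGE=linux
BOOT_IMAGE=linux-nonfb
BOOT_IMAGE=failsafe

The stanzas also differ in the options that follow: in the file shown earlier, the linux entry ends with splash=silent vga=788, the linux-nonfb entry leaves these out, and the failsafe entry adds failsafe and has no resume= setting.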
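Comparing long kernel lines by eye is tedious. This Python sketch (the function split_kernel_line is hypothetical) separates a menu.lst kernel line into the kernel path, the key=value options and any bare words such as failsafe, so the differences between stanzas stand out. It splits each option at the first '=' only, so a value such as UUID=... stays intact.

```python
from typing import Dict, List, Tuple


# Hypothetical helper: not part of grub or PCLinuxOS.
def split_kernel_line(line: str) -> Tuple[str, Dict[str, str], List[str]]:
    """Split a 'kernel' line into image path, key=value options and bare flags."""
    words = line.split()
    if len(words) < 2 or words[0] != "kernel":
        raise ValueError("not a kernel line")
    options: Dict[str, str] = {}
    flags: List[str] = []
    for word in words[2:]:
        key, sep, value = word.partition("=")
        if sep:
            options[key] = value  # root=UUID=... keeps everything after the first '='
        else:
            flags.append(word)  # bare words such as 'failsafe'
    return words[1], options, flags
```

Running it on the failsafe line from the original file returns the path (hd0,0)/boot/vmlinuz, the BOOT_IMAGE and root options, and the single flag failsafe.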

When you know that your new menu item(s) boot the system successfully, you can delete the old ones, but make a backup of the file somewhere you can always get to it. If you precede each line with a '#' then that item will not appear on the menu. Any line starting with a '#' is treated as a comment, and is not executed.
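Commenting out a whole stanza by hand means adding a '#' to every one of its lines. The Python sketch below does that for the entry with a given title, treating everything from that 'title' line up to the next 'title' line as the stanza; the function comment_out_entry is hypothetical. Note that a commented-out entry no longer counts toward the numbering, so if it came before your default entry, the default line may need adjusting.

```python
# Hypothetical helper: not part of grub or PCLinuxOS.
def comment_out_entry(menu_text: str, title: str) -> str:
    """Prefix '#' to every non-blank line of the stanza whose title matches."""
    out = []
    inside = False
    for line in menu_text.splitlines():
        words = line.split(None, 1)
        if words and words[0] == "title":
            # A title line starts a new stanza; check whether it is the target.
            inside = len(words) > 1 and words[1].strip() == title
        if inside and line.strip():
            line = "#" + line
        out.append(line)
    return "\n".join(out) + "\n"
```

The original file is left untouched; the function returns the edited text, so you can save it under a new name and keep your backup.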

Reinstalling the boot loader

If the grub setup itself has become corrupted, then it is a relatively easy task to reset it. As before, open a terminal, use the command su to get root privileges, and start the grub command shell with the command

grub <enter>

Use grub's find command to discover which partitions have grub's files on them.

grub> find /boot/grub/stage2 <enter>
(hd0,0) <--- This is grub's output

If more than one partition is listed, choose the one that holds your repaired menu.lst file. Tell grub about it by typing the following, using the partition you chose in place of (hd0,0):

grub> root (hd0,0) <enter>
Filesystem type is ext2fs, partition type 0x83

Now tell grub where to put its stage1 file. This is the drive that your motherboard BIOS will try to boot from. Note that no partition number is required, as we are specifying a whole device.

grub> setup (hd0) <enter>
Checking if "/boot/grub/stage1" exists … yes
Checking if "/boot/grub/stage2" exists … yes
Checking if "/boot/grub/e2fs_stage1_5" exists … yes
Running "embed /boot/grub/e2fs_stage1_5 (hd0)" … 17 sectors are embedded.
succeeded
Running "install /boot/grub/stage1 (hd0) (hd0)1+17 p(hd0,0)/boot/grub/stage2 /boot/grub/menu.lst" … succeeded
Done.

That's it, we're done. Type quit <enter> to leave grub.

Multi-booting operating systems

If you want to be able to boot into one of several distributions, then this is most easily achieved as follows.

At the end of the installation process you are asked where you would like to install grub, and the default is the MBR of the drive you have installed to. Instead, select the installation partition. You will then be prompted for the drive to boot from. This is the same as the root and setup sequence we performed manually in grub at the terminal.

You will now have two menu.lst files, one in the /boot/grub folder of the new installation and the original one. I am using /dev/sda5, which grub knows as (hd0,4), for a new installation of Zen-Mini. Add the following lines to the original menu.lst.

title zen-mini
root (hd0,4)
configfile /boot/grub/menu.lst

When I select this menu item, I will be taken to a new screen showing the menu items and the graphics of the new installation. I can now have the menu.lst file in the new installation almost identical to my default menu.lst: every occurrence of (hd0,0) becomes (hd0,4), and the root= setting names the new partition instead of kde4, which makes system maintenance so much easier. Assuming the new partition has been labeled zen with tune2fs, a stanza in the zen menu.lst might look like this:

title zen
kernel (hd0,4)/boot/mykernel root=LABEL=zen
initrd (hd0,4)/boot/myinitrd

Here's part of my default menu.lst:

title lxde
root (hd0,5)
configfile /boot/grub/menu.lst

title e17
root (hd0,6)
configfile /boot/grub/menu.lst

title Phoenix
root (hd0,7)
configfile /boot/grub/menu.lst

I find this much easier to follow.
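Each of these hand-over stanzas follows the same pattern, so it can be generated from the Linux device name of the other installation. This Python sketch (the function configfile_stanza is hypothetical) converts a name like /dev/sda5 into grub's (hd0,4) by counting the drive letter and partition number from zero, assuming, as before, that the BIOS drive order matches the Linux order.

```python
import re


# Hypothetical helper: not part of grub or PCLinuxOS.
def configfile_stanza(title: str, linux_device: str) -> str:
    """Build a stanza that hands over to another installation's own menu.lst."""
    match = re.fullmatch(r"/dev/[sh]d([a-z])(\d+)", linux_device)
    if match is None or int(match.group(2)) < 1:
        raise ValueError(f"expected a partition such as /dev/sda5, got {linux_device!r}")
    # Assumption: BIOS drive order matches Linux order; grub counts from 0.
    drive = ord(match.group(1)) - ord("a")
    partition = int(match.group(2)) - 1
    return (
        f"title {title}\n"
        f"root (hd{drive},{partition})\n"
        "configfile /boot/grub/menu.lst\n"
    )
```

For instance, configfile_stanza("zen-mini", "/dev/sda5") reproduces the zen-mini stanza given earlier.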

Adding Windows to the menu

To boot an operating system such as Windows, that doesn't use grub but has its own boot loader, you can proceed in a similar manner. You have to add a stanza to menu.lst like this:

title Windows
rootnoverify (hd0,2)
makeactive
chainloader +1

Note: Change (hd0,2) to the partition that Windows is installed on.

rootnoverify works like the root command, telling grub where the next part of the boot code is, but grub makes no attempt to mount the partition, as it may not be able to read a Windows file system.

makeactive marks this root partition as the active (bootable) partition.

chainloader +1 tells grub to load the first sector of the partition, its boot sector, which holds the start of Windows' own boot code, and to hand control to it.
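The +1 is a grub legacy block list, written [offset]+length, where an omitted offset means sector 0; the same notation appears in the setup output earlier as (hd0)1+17, meaning 17 sectors starting at sector 1. This Python sketch (the function parse_blocklist is hypothetical) turns that notation into sector ranges, counting from zero from the start of the drive or partition.

```python
import re
from typing import List, Tuple


# Hypothetical helper: not part of grub.
def parse_blocklist(blocklist: str) -> List[Tuple[int, int]]:
    """Turn grub legacy block list text into (first sector, last sector) pairs."""
    # Drop a leading device such as "(hd0)"; sectors then count from 0
    # at the start of that drive or partition.
    text = re.sub(r"^\([^)]*\)", "", blocklist.strip())
    ranges = []
    for part in text.split(","):
        offset, plus, length = part.partition("+")
        if not plus:
            raise ValueError(f"expected [offset]+length, got {part!r}")
        start = int(offset) if offset else 0  # no offset means sector 0
        ranges.append((start, start + int(length) - 1))
    return ranges
```

So +1 covers sector 0 only (the boot sector), and (hd0)1+17 covers sectors 1 to 17, the area just after the MBR where stage1_5 was embedded.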

Using the grub commands at boot time

If booting fails, then it is still possible to get to a working system by using the grub command shell. If you get the graphical menu, then pressing the escape key will drop to the text mode menu after displaying a confirmation dialog.

It is possible that you have made a typing error in your configuration file, or that your editor broke a long line into two and grub can make no sense of it. From here, you can edit the entry. Select the menu entry that doesn't boot with the arrow keys and press 'E' then Enter. Highlight the line you want to change and press 'E' again to open it for editing. The cursor will be at the end of the line, but you can move around with the arrow keys, as well as the Home and End keys. Press Enter to accept any changes, or the Escape key to return to the previous screen without saving them. Press 'D' on a highlighted line to delete it, which you may need to do if your editor has broken the line and left a half line of garbage.

Press 'B' to attempt to boot with the modified lines. The changes exist only in memory and are not written to the menu.lst file, so once you have a successful boot, make the same changes to menu.lst.

If you can find no errors in the lines, then all is not lost. Press 'C' to get a command line, which puts you in a similar environment to the one we used from a terminal to reinstall the boot loader.

grub> find /boot/grub/stage2 <enter>

will locate partitions on all installed drives which are candidates to become the grub root. Use the root command to point grub at that partition.

grub> root (hd0,0) <enter>

To find the kernel on this partition, which should reside in the /boot directory, we can use grub's command completion feature. The kernel will be named vmlinuz … something.

grub> kernel /boot/vm <tab>

Pressing the Tab key here tells grub to fill in as much as it can and list all the possibilities.

grub> kernel /boot/vmlinuz

Possible files are: vmlinuz vmlinuz-2.6.32.11-pclos2

We know that vmlinuz is a link, so the other file must be the actual kernel. We will use that, as the link may be broken. We don't have to type in the full name; just add the hyphen and press Tab to let grub fill in the rest. This also avoids typing errors.

grub> kernel /boot/vmlinuz- <tab>
grub> kernel /boot/vmlinuz-2.6.32.11-pclos2
[Linux-bzImage, setup=0x3a00, size=0x1f4400]

That seems to have worked so now we can do the same for the initrd.

grub> initrd /boot/init <tab>
grub> initrd /boot/initrd

Possible files are: initrd-2.6.32.11-pclos2.img initrd.img

grub> initrd /boot/initrd- <tab>
grub> initrd /boot/initrd-2.6.32.11-pclos2.img
[Linux-initrd @ 0x1f9a3000, 0x63c8e2 bytes]

Ok! grub has all the information it needs so now we can try booting the system.

grub> boot <enter>

All of the above grub session was taken from an actual installation, so I know that it works.

The safest way to try out some of these techniques is to practice on a VirtualBox installation. It is easy to set one up, and there is an excellent article on this, written by parnote, the current editor, in the October 2008 issue of the magazine. The article installs Windows, but the principles are the same for a PCLinuxOS installation.