Booting Fedora with BKO

Where I am getting stuck

Following are the messages which are shown before it gets stuck:

Mounting proc filesystem
Mounting sysfs filesystem
Creating initial /dev
Running plymouthd
udev: starting version 141

It hangs after this point. I have a bad feeling that it is not even reaching the step where it executes the /init script.

squashfs: version 4.0 (2009/01/31) Phillip Lougher

Found the problem: plymouth is not working properly. I don't know the reason yet, but for the time being I have commented out the line which starts plymouth, and the boot progresses past that point.

Mounting iso over httpfs

Quite a few things are missing from the Fedora initrd, so I am adding them as and when I hit the errors related to them.

  1. Added ifconfig and route as soft links to busybox. (Note: I am using the Knoppix busybox here, not the Fedora one, because the Fedora busybox is not statically linked.)
  2. Added the fuse module. It was missing; I had assumed that fuse would be present in the kernel by default, but that assumption seems to be wrong.
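
The busybox step can be sketched roughly as below. This is only an illustration: the function name link_busybox_applets and the directory passed to it are my own placeholders, and it assumes a statically linked busybox binary has already been copied into that directory of the unpacked initrd.

```shell
#!/bin/sh
# Create applet symlinks that point at a busybox binary in the same directory.
# $1 = directory inside the unpacked initrd (hypothetical, e.g. ./initrd/bin)
# remaining arguments = applet names to link
link_busybox_applets() {
    bindir=$1
    shift
    [ -f "${bindir}/busybox" ] || { echo "no busybox in ${bindir}" >&2; return 1; }
    for applet in "$@"; do
        # Relative link, so it stays valid once the tree becomes the initrd root.
        ln -sf busybox "${bindir}/${applet}" || return 1
    done
}
```

It would be called as: link_busybox_applets ./initrd/bin ifconfig route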

Finally, the ISO is mounted over httpfs.

Next set of problems

/dev/root: error opening volume
/dev/root: error opening volume
JBD: barrier-based sync failed on dm-0:8 - disabling barriers
transfering control to /sbin/init
Bug in initramfs /init detected. Dropping to a shell. Good luck!

It seems Fedora wants something to be set for /dev/root. The other thing I need to find out is what the JBD error about the barrier-based sync failing means.

/dev/root

The "/dev/root: error opening volume" error is handled. I looked for the various places where /dev/root is used and wrapped each of them with if [ -z "${HTTPFS}" ], so that they run only when HTTPFS is not set.
Now I need to find out what the JBD problem is.
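
A minimal sketch of that guard, assuming HTTPFS is a variable that the modified /init sets to something non-empty when booting over HTTP. The function names and the echoed messages are made up for illustration; the real /init does its /dev/root handling where the echo lines are. Note the spaces inside the brackets and the quotes around the variable.

```shell
#!/bin/sh
# HTTPFS is assumed to be non-empty only when booting over HTTP.
# Succeeds for a normal boot, where the /dev/root handling is wanted.
uses_dev_root() {
    [ -z "${HTTPFS}" ]
}

# Placeholder for the places in /init that touch /dev/root.
setup_root_device() {
    if uses_dev_root; then
        echo "normal boot: keeping /dev/root handling"
    else
        echo "HTTPFS boot: skipping /dev/root handling"
    fi
}
```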

JBD problem

I initially suspected the function do_live_overlay, but it is clean. Now I am concentrating on its parent function, do_live_from_base_loop.

mount -n -o ro,remount /dev/mapper/live-rw /sysroot

gives the JBD errors shown above. This is not fatal, since the boot continues after it. But it seems there might be problems in the run-init script.

solution

Finally the problem is solved: there was one more reference to plymouth that I had to comment out. Most of the plymouth references were optional, meaning that even if the command fails, execution does not stop:

plymouth --show-splash || :

But there was one particular reference which was not made optional. I don't know whether that is intentional or an error.
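
To show what the "|| :" suffix does, here is a small sketch. It assumes a script that aborts on the first failing command (set -e), which may not be exactly how the Fedora /init behaves; broken_splash is a made-up stand-in for a plymouth call that fails.

```shell
#!/bin/sh
# Stand-in for a plymouth invocation that fails (hypothetical helper).
broken_splash() {
    return 1
}

# Guarded call: ":" is the shell's no-op builtin, so "cmd || :" always succeeds.
run_with_optional_splash() {
    (
        set -e
        broken_splash || :
        echo "continuing boot"
    )
}

# Unguarded call: when run on its own, the subshell stops at the failing command.
run_without_guard() {
    (
        set -e
        broken_splash
        echo "continuing boot"
    )
}
```

Calling run_with_optional_splash prints "continuing boot", which is the behaviour wanted for every plymouth reference.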

Maybe the problem is in the run-init script.

Next problem

The graphical mode and run level 3 are not booting. There is some problem in the sendmail daemon: it causes a segmentation fault, which freezes execution, and I get a kernel panic. I will try disabling sendmail and see if that works.
Here is the error:

Starting Bluetooth services:                                 [ OK ]
EXT4-fs error (device dm-0): ext4_find_entry: reading directory #13191 offset 0
/etc/rc.d/rc : line 100 : /etc/rc3.d/S80sendmail: Input/output error

and so on….

solution

I disabled sendmail and tried again. It partially worked. There was some error which I could not read because it scrolled past, but then I got a login prompt. The problem is that when I press the Enter key it is taken as ^M, so I am not able to log in :-(

attempt 2

Disabled SELinux with selinux=0 and tried again.
I removed sendmail and the other S99 services such as firstboot and local, but I still get an error:

EXT4-fs error (device dm-0): __ext4_get_inode_loc: unable to read inode block - inode=9642, block=827

There is one warning regarding device dm-0 at boot time. I am not sure if it is relevant to this error, but I will document it here anyway:

JBD: barrier-based sync failed on dm-0:8 - disabling barriers

Following are the logs from the apache2 server which was serving this ISO image:

$ cat /var/log/apache2/access.log | tail
192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 4096 "-" "-"
192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 16384 "-" "-"
192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 32768 "-" "-"
192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 65536 "-" "-"
192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 131072 "-" "-"
192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 131072 "-" "-"
192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 4096 "-" "-"
192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 16384 "-" "-"
192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 32768 "-" "-"
192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 65536 "-" "-"

I am not sure if they are significant, but I have put them here for reference.
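
Status 206 is HTTP "Partial Content", so every one of these requests asked for a byte range of the ISO rather than the whole file, which is what I would expect from httpfs reading the image in chunks. A rough sketch for summarising such requests is below; the function name is mine, and the field numbers assume the log layout shown above (status in field 9, size in field 10).

```shell
#!/bin/sh
# Summarise partial-content (HTTP 206) requests from an Apache access log on stdin.
# Prints "<number of 206 requests> <total bytes served by them>".
summarise_range_requests() {
    awk '$9 == 206 { n++; bytes += $10 } END { printf "%d %d\n", n, bytes }'
}
```

It would be used as: summarise_range_requests < /var/log/apache2/access.log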

Attempt-3

I wanted to remove /etc/rc3.d/S80sendmail, but I am not able to delete it: rm says the filesystem is read-only. But when I run the mount command, it shows the filesystem mounted rw. Following are the commands I tried:

# mount
/dev/root on / type ext4 (rw,noatime)
proc on /proc type proc (rw)
/sys on /sys type sysfs (rw)
udev on /dev type tmpfs (rw,mode=0755)
/dev/pts on /dev/pts type devpts (rw,gid=5,mode=620)

# rm /sysroot/etc/rc3.d/S80sendmail
rm: cannot remove '/sysroot/etc/rc3.d/S80sendmail' : Read-only file system

# mount / -o remount,rw

# rm /sysroot/etc/rc3.d/S80sendmail
rm: cannot remove '/sysroot/etc/rc3.d/S80sendmail' : Read-only file system
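
One thing worth checking here: the mount output above lists / but has no entry for /sysroot, and the mount command may be reporting from /etc/mtab rather than asking the kernel. A small sketch for reading the kernel's own view from /proc/mounts is below. The function name is mine, and the optional second argument exists only so that it can be tried against a saved copy of the table.

```shell
#!/bin/sh
# Print the mount options the kernel reports for a mount point.
# $1 = mount point, $2 = mounts table (defaults to /proc/mounts)
# If several entries share the mount point, the last (topmost) one is printed.
kernel_mount_opts() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' "${2:-/proc/mounts}"
}
```

For example, kernel_mount_opts /sysroot prints nothing if /sysroot is not a mount point at all, and an option list starting with ro or rw if it is.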

What is working?

Single user mode is working fine, so the user gets a shell where he can do whatever he wants.

Fedora 11 live over NFS

Trying to see if the Fedora 11 live CD can boot over NFS.
The reasoning behind this experiment is that if it works over NFS, that may help in locating the problem.

Testing NFS setup

Exported /var/www (because it contains all the ISO images) over NFS. Following is the excerpt from /etc/exports:

/var/www *(ro,async)

This NFS volume does get mounted properly on the local machine.

$ sudo mount 192.168.111.11:/var/www mpoint
$ mount
192.168.111.11:/var/www on /home/pravin/Etherboot/mpoint type nfs (rw,addr=192.168.111.11)

Testing the NFS mount from virtualization

The Fedora 11 live CD was booted in VirtualBox. The network was working, and the host machine was accessible: the URL http://192.168.111.11 correctly reached the host's website.
But the NFS mount failed with the following error:

# mount 192.168.111.11:/var/www /home/liveuser/mpoint/
mount.nfs: access denied by server while mounting 192.168.111.11:/var/www

Why would this fail if mounting from localhost is working fine?

Got help from rwcr and fixed the problem.
It seems one more option, insecure, has to be added to the export options. Restating the explanation given by rwcr:

rwcr: Try making it /var/www *(ro,async,insecure)
rwcr: Linux generally requires NFS requests to come from privileged ports, and the Fedora livecd might be using a nonstandard NFS mounter that doesn't do that.
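
A quick way to check whether an exports line already carries a given option is sketched below. The helper name is my own and the parsing is deliberately simple: it only looks at the last parenthesised option list on the line.

```shell
#!/bin/sh
# Succeed if the option list of an /etc/exports line contains the given option.
# $1 = one line from /etc/exports, $2 = option name (e.g. insecure)
export_has_option() {
    opts=$(printf '%s\n' "$1" | sed -n 's/.*(\(.*\)).*/\1/p')
    case ",${opts}," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, export_has_option '/var/www *(ro,async,insecure)' insecure succeeds. After changing /etc/exports, the server has to re-read it (exportfs -ra as root) before the new option takes effect.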

Next step : mount NFS partition from initramfs

Debian uses a special program called nfsmount for NFS mounting at boot time. I will try out both the mount command and the nfsmount utility.
Kernel modules are also needed. The following modules and the executable /sbin/mount.nfs were needed for NFS to work:

  1. sunrpc.ko
  2. lockd.ko
  3. auth_rpcgss.ko
  4. nfs_acl.ko
  5. nfs.ko

In addition to this, I had to pass the option -o nolock for mount to work without problems:

mount "${NFS_PATH}" /iso -o nolock
mount /iso/Fedora-11-i686-Live.iso /sysroot -o loop -o ro
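
Putting the module list and the two mounts together, the initramfs part can be sketched as follows. MODULE_DIR, ISO_NAME, the run helper and DRY_RUN are placeholders I introduced for illustration; the module order is simply the order of the list above, and the default NFS_PATH is the export used earlier. With DRY_RUN=1 the commands are only printed, so the sequence can be checked outside an initramfs.

```shell
#!/bin/sh
# Hypothetical locations; adjust to wherever the modules live in the initramfs.
MODULE_DIR="${MODULE_DIR:-/lib/modules/nfs}"
NFS_PATH="${NFS_PATH:-192.168.111.11:/var/www}"
ISO_NAME="${ISO_NAME:-Fedora-11-i686-Live.iso}"

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Load the NFS modules, mount the export, then loop-mount the ISO read-only.
mount_iso_over_nfs() {
    for mod in sunrpc lockd auth_rpcgss nfs_acl nfs; do
        run insmod "${MODULE_DIR}/${mod}.ko" || return 1
    done
    run mkdir -p /iso /sysroot || return 1
    run mount -t nfs -o nolock "${NFS_PATH}" /iso || return 1
    run mount -o loop,ro "/iso/${ISO_NAME}" /sysroot
}
```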

NFS works fine in run level 3. One can also start the GUI with startx after logging in as root at the multiuser prompt.
The only issue with NFS_Fedora is that plymouthd still causes problems and has to be disabled, and this somehow stops the GUI from coming up automatically.
So the user needs to log in at run level 3 and then run startx.

HTTPFS improvement

Some progress has been made on the HTTPFS front as well. Until now, all the tests of booting over HTTPFS were done using qemu, which is inherently slow because it is emulation. When the same tests were run on VMware, which is much faster, errors started coming after the run level 3 login prompt.
Following are the errors, which are much the same as the errors that used to come around the sendmail daemon before:

# startx
-bash: /usr/bin/startx: Input/output error

# top
top : error while loading shared libraries: /lib/libncursesw.so.5: cannot read file data: Input/output error

EXT4-fs error (device dm-0): ext4_find_entry: reading directory #10651 offset 0

With this observation, we can claim that the errors are thrown because of a delay in the response from fuse; ext4 gives up because of this delay.
Now I need to find a way to increase the tolerance for this delay.

Removing plymouth

As marc has suggested, remove plymouth from the original ISO and see if it works without plymouth. If it doesn't, then the blame can surely be put on plymouth and not on the network-related complications.

Modifying ISO

Now the question is how to add the new initramfs to the ISO and still keep it bootable.
From the Knoppix Remastering Howto, the following command works for Knoppix:

mkisofs -pad -l -r -J -v -V "KNOPPIX" -no-emul-boot -boot-load-size 4 \
   -boot-info-table -b boot/isolinux/isolinux.bin -c boot/isolinux/boot.cat \
   -hide-rr-moved -o /mnt/hda1/knx/knoppix.iso /mnt/hda1/knx/master

I need to modify it so that it works for Fedora:

mkisofs -pad -l -r -J -v -V "Fedora-11-i686-Live" -no-emul-boot -boot-load-size 4 \
   -boot-info-table -b isolinux/isolinux.bin -c isolinux/boot.cat \
   -hide-rr-moved -o /var/www/iso/fedora_11.iso /home/pravin/Etherboot/git/BKO.git/pxeknife/red_hat/fedora_11_live_cd/newfedora
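
Before mkisofs is run, the modified initramfs has to be packed and copied into the ISO tree. A sketch of unpacking and repacking is below. It assumes the Fedora live initrd is a gzip-compressed cpio archive in "newc" format, which I have not confirmed here, and the function names are mine.

```shell
#!/bin/sh
# Unpack a gzip-compressed cpio initramfs image into a directory.
# $1 = image file, $2 = target directory
unpack_initrd() {
    mkdir -p "$2" || return 1
    img=$(cd "$(dirname "$1")" && pwd)/$(basename "$1")
    ( cd "$2" && gzip -dc "$img" | cpio -idm )
}

# Pack a directory back into a gzip-compressed "newc" cpio image.
# $1 = source directory, $2 = output image
repack_initrd() {
    out=$(cd "$(dirname "$2")" && pwd)/$(basename "$2")
    ( cd "$1" && find . | cpio -o -H newc | gzip -9 > "$out" )
}
```

The repacked image then replaces the initrd file under isolinux/ in the newfedora tree (the exact file name there is whatever the isolinux configuration refers to) before the mkisofs command above is run.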

running startx from single user mode

Tried the experiment of running startx from single user mode to see if it works.
It did not work at all.

Problem Found

With the help of andyTim, the cause of the problem has been located.
The network and NetworkManager services restart networking, which breaks the existing HTTPFS mount.

Solution

The temporary solution I tried is to delete both of the following files:

  1. /etc/init.d/network
  2. /etc/init.d/NetworkManager

So the user has to first boot into single user mode, delete the above files, and then boot into run level 5.
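
The manual step can be sketched as a small helper. Note that this version renames the two scripts instead of deleting them, which is my own variation so that the change can be undone; the function name and the .disabled suffix are also mine.

```shell
#!/bin/sh
# Move the two init scripts aside so that they cannot restart networking.
# $1 = root of the target filesystem ("/" for the running system)
disable_network_scripts() {
    root=${1%/}
    for name in network NetworkManager; do
        script="${root}/etc/init.d/${name}"
        if [ -e "${script}" ]; then
            mv "${script}" "${script}.disabled" || return 1
        fi
    done
}
```

From single user mode it would be run as: disable_network_scripts /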