Differences
This shows you the differences between two versions of the page.
soc:2009:pravin:journal:fedora11bko [2009/07/04 15:17] less1 |
soc:2009:pravin:journal:fedora11bko [2009/07/27 05:35] (current) less1 |
||
---|---|---|---|
Line 34: | Line 34: | ||
It seems Fedora wants something to be set for ''/dev/root''. Another thing I need to find out is what the ''JBD'' error about ''barrier-based sync failed'' means. | It seems Fedora wants something to be set for ''/dev/root''. Another thing I need to find out is what the ''JBD'' error about ''barrier-based sync failed'' means. | ||
+ | |||
+ | ==== /dev/root ==== | ||
+ | ''/dev/root: error opening volume'' is handled: I looked for the various places where ''/dev/root'' is used and wrapped each of them with ''if [ -z "${HTTPFS}" ]''. \\ | ||
+ | Now I need to find out what the JBD problem is. | ||
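The guard described above can be sketched roughly as follows. This is only an illustration, not the actual init script: the function name ''root_device_action'' and the example URL are made up, and only the ''HTTPFS'' variable comes from the journal.

```shell
#!/bin/sh
# Sketch of wrapping /dev/root handling in an HTTPFS check.
# root_device_action is a hypothetical name; the echo lines stand in
# for the real /dev/root code, which is not reproduced here.
root_device_action() {
    if [ -z "${HTTPFS:-}" ]; then
        # HTTPFS is empty or unset: normal live boot, /dev/root code runs.
        echo "use /dev/root"
    else
        # HTTPFS is set: booting over HTTPFS, /dev/root code is skipped.
        echo "skip /dev/root"
    fi
}

HTTPFS=""
root_device_action

# Placeholder value; the real contents of HTTPFS are an assumption.
HTTPFS="http://example.invalid/live.iso"
root_device_action
```

Note the spaces inside the brackets and the quotes around the variable: ''[-z ${HTTPFS}]'' without them is not a valid test.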
+ | |||
+ | |||
+ | ==== JBD problem ==== | ||
+ | |||
+ | I initially suspected the function ''do_live_overlay'', but it is clean. | ||
+ | Now I am concentrating on its parent function, ''do_live_from_base_loop''. | ||
+ | <code> | ||
+ | mount -n -o ro,remount /dev/mapper/live-rw /sysroot | ||
+ | </code> | ||
+ | gives the JBD-related errors above. This is not fatal, though; execution continues after it. | ||
+ | It seems there might be problems in the ''run-init'' script instead. | ||
+ | |||
+ | |||
+ | ===== solution ===== | ||
+ | Finally the problem is solved: there was one more reference to ''plymouth'' that I had to comment out. | ||
+ | Most of the plymouth references were optional, meaning that even if the command fails, execution will not stop: | ||
+ | <code> | ||
+ | plymouth --show-splash || : | ||
+ | </code> | ||
+ | But there was one particular reference which was not made optional. I don't know whether that is intentional or an error. | ||
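To show why the ''|| :'' suffix matters, here is a small sketch. ''false'' stands in for a failing plymouth call, and both helper names are hypothetical. Whether the real init script actually aborts on a failed command (for example because it runs with ''set -e'') is an assumption.

```shell
#!/bin/sh
# "cmd || :" runs the no-op ":" when cmd fails, so the combined
# command always succeeds. Helper names are hypothetical.
run_optional() {
    "$@" || :
}

# No fallback: the failure status is passed on to the caller.
run_required() {
    "$@"
}

run_optional false
echo "optional call status: $?"

if run_required false; then
    echo "required call succeeded"
else
    echo "required call failed"
fi
```

The first call reports status 0 even though the command failed, which is how the optional plymouth references behave; the second reports the failure, like the one reference that had to be commented out.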
+ | |||
+ | Maybe the problem is in the ''run-init'' script. | ||
+ | |||
+ | ===== Next problem ===== | ||
+ | Graphical mode and runlevel 3 are not booting. There is some problem with the sendmail daemon: it causes a segmentation fault, which freezes execution, and I get a kernel panic. I will try disabling sendmail and see if it works.\\ | ||
+ | Here is the error: | ||
+ | <code> | ||
+ | Starting Bluetooth services: [ OK ] | ||
+ | EXT4-fs error (device dm-0): ext4_find_entry: reading #13191 offset 0 | ||
+ | /etc/rc.d/rc : line 100 : /etc/rc3.d/S80sendmail: Input/output error | ||
+ | </code> | ||
+ | and so on.... | ||
+ | |||
+ | === solution === | ||
+ | I disabled sendmail and tried again. It partially worked: there was some error that I could not read because it scrolled past, but then I got a login prompt. | ||
+ | The problem is that when I press the Enter key, it is taken as ''^M'', so I am not able to log in :-( | ||
+ | |||
+ | === attempt 2 === | ||
+ | Disabled SELinux with ''selinux=0'' and tried again.\\ | ||
+ | I removed sendmail and the other services related to S99, such as firstboot and local, | ||
+ | but I am still getting an error: | ||
+ | <code> | ||
+ | EXT4-fs error (device dm-0): __ext4_get_inode_loc: unable to read inode block - inode=9642, block=827 | ||
+ | </code> | ||
+ | |||
+ | There is one warning about device dm-0 at boot time. I am not sure whether it is relevant to this error, but I will document it here anyway: | ||
+ | <code> | ||
+ | JBD: barrier-based sync failed on dm-0:8 - disabling barriers | ||
+ | </code> | ||
+ | |||
+ | |||
+ | The following are the logs from the apache2 server that was serving this ISO image. | ||
+ | <code> | ||
+ | $ cat /var/log/apache2/access.log | tail | ||
+ | 192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 4096 "-" "-" | ||
+ | 192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 16384 "-" "-" | ||
+ | 192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 32768 "-" "-" | ||
+ | 192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 65536 "-" "-" | ||
+ | 192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 131072 "-" "-" | ||
+ | 192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 131072 "-" "-" | ||
+ | 192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 4096 "-" "-" | ||
+ | 192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 16384 "-" "-" | ||
+ | 192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 32768 "-" "-" | ||
+ | 192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 65536 "-" "-" | ||
+ | </code> | ||
+ | I am not sure whether they are meaningful, but I have included them for reference. | ||
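One thing the log does show: every response has status 206 (Partial Content), so the ISO is being fetched in byte ranges rather than as a whole file. A rough way to summarise such lines is sketched below. The function name is hypothetical, and the field positions assume the access-log format shown above (status in field 9, size in field 10).

```shell
#!/bin/sh
# Count the 206 responses and add up the bytes served, reading
# access-log lines from standard input. summarize_ranges is a
# hypothetical helper name.
summarize_ranges() {
    awk '$9 == 206 { n++; bytes += $10 }
         END { printf "%d partial responses, %d bytes\n", n, bytes }'
}

# Three lines copied from the log above.
summarize_ranges <<'EOF'
192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 4096 "-" "-"
192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 16384 "-" "-"
192.168.0.1 - - [06/Jul/2009:23:56:27 +0200] "GET /Fedora-11-i686-Live.iso HTTP/1.1" 206 32768 "-" "-"
EOF
# prints: 3 partial responses, 53248 bytes
```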
+ | |||
+ | ===== Attempt-3 ===== | ||
+ | I wanted to remove ''/etc/rc3.d/S80sendmail'', but I am not able to delete it. It says the filesystem is read-only. | ||
+ | But when I run the ''mount'' command, it shows the filesystem in ''rw'' mode. The following are the commands I tried. | ||
+ | <code> | ||
+ | # mount | ||
+ | /dev/root on / type ext4 (rw,noatime) | ||
+ | proc on /proc type proc (rw) | ||
+ | /sys on /sys type sysfs (rw) | ||
+ | udev on /dev type tmpfs (rw,mode=0755) | ||
+ | /dev/pts on /dev/pts type devpts (rw,gid=5,mode=620) | ||
+ | |||
+ | # rm /sysroot/etc/rc3.d/S80sendmail | ||
+ | rm: cannot remove '/sysroot/etc/rc3.d/S80sendmail' : Read-only file system | ||
+ | |||
+ | # mount / -o remount,rw | ||
+ | |||
+ | # rm /sysroot/etc/rc3.d/S80sendmail | ||
+ | rm: cannot remove '/sysroot/etc/rc3.d/S80sendmail' : Read-only file system | ||
+ | |||
+ | </code> | ||
+ | |||
+ | ====== What is working? ====== | ||
+ | Single-user mode is working fine, so the user gets a shell and can do whatever is needed from there. | ||
+ | |||
+ | |||
+ | ====== Fedora 11 live over NFS ====== | ||
+ | Trying to see if Fedora 11 live can boot over NFS.\\ | ||
+ | The reasoning behind this experiment is that if it works over NFS, it may help in locating the problem. | ||
+ | |||
+ | |||
+ | ==== Testing NFS setup ==== | ||
+ | Exported ''/var/www'' (because it contains all the ISO images) over NFS. The following is an excerpt from ''/etc/exports'': | ||
+ | <code> | ||
+ | /var/www *(ro,async) | ||
+ | </code> | ||
+ | This NFS volume does get mounted properly on the local machine. | ||
+ | <code> | ||
+ | $ sudo mount 192.168.111.11:/var/www mpoint | ||
+ | $ mount | ||
+ | 192.168.111.11:/var/www on /home/pravin/Etherboot/mpoint type nfs (rw,addr=192.168.111.11) | ||
+ | </code> | ||
+ | |||
+ | ==== Testing the NFS mount from virtualization ==== | ||
+ | The Fedora 11 live CD was booted in VirtualBox. The network was working and the host machine was accessible: the URL http://192.168.111.11 correctly resolved to the host's website.\\ | ||
+ | But the NFS mount failed with the following error: | ||
+ | <code> | ||
+ | # mount 192.168.111.11:/var/www /home/liveuser/mpoint/ | ||
+ | mount.nfs: access denied by server while mounting 192.168.111.11:/var/www | ||
+ | </code> | ||
+ | Why would this fail if mounting from localhost is working fine? | ||
+ | |||
+ | Got help from rwcr and fixed the problem.\\ | ||
+ | It seems one more option, ''insecure'', has to be added to the export options. Quoting the explanation given by rwcr: | ||
+ | <code> | ||
+ | rwcr: Try making it /var/www *(ro,async,insecure) | ||
+ | rwcr: Linux generally requires NFS requests to come from privileged ports, and the Fedora livecd might be using a nonstandard NFS mounter that doesn't do that. | ||
+ | </code> | ||
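As a toy model of rwcr's explanation (not the NFS server's actual code), the check can be pictured like this. Both function names are hypothetical. The 1024 boundary for privileged ports, and ''insecure'' lifting that requirement, match what the exports documentation describes for the default ''secure'' behaviour.

```shell
#!/bin/sh
# Toy model: does an export with the given options accept a request
# coming from the given source port? Function names are hypothetical.
is_privileged_port() {
    [ "$1" -gt 0 ] && [ "$1" -lt 1024 ]
}

export_accepts() {
    port=$1
    options=$2
    case ",$options," in
        *,insecure,*) return 0 ;;   # insecure: any source port is fine
    esac
    is_privileged_port "$port"      # default: privileged ports only
}

for port in 800 40000; do
    for options in "ro,async" "ro,async,insecure"; do
        if export_accepts "$port" "$options"; then
            verdict="accepted"
        else
            verdict="denied"
        fi
        echo "source port $port, export ($options): $verdict"
    done
done
```

Only the combination of a high source port and an export without ''insecure'' is denied, which fits the ''access denied by server'' error seen from the live CD.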
+ | |||
+ | ===== Next step : mount NFS partition from initramfs ===== | ||
+ | Debian uses a special program called ''nfsmount'' for NFS mounting at boot time. I will try out both the ''mount'' command and the ''nfsmount'' utility.\\ | ||
+ | Kernel modules are also needed. | ||
+ | The following modules and the executable ''/sbin/mount.nfs'' were needed for NFS to work: | ||
+ | - sunrpc.ko | ||
+ | - lockd.ko | ||
+ | - auth_rpcgss.ko | ||
+ | - nfs_acl.ko | ||
+ | - nfs.ko | ||
+ | In addition, I had to pass the option ''-o nolock'' for ''mount'' to work without problems: | ||
+ | <code> | ||
+ | mount -o nolock "${NFS_PATH}" /iso | ||
+ | mount -o loop,ro /iso/Fedora-11-i686-Live.iso /sysroot | ||
+ | </code> | ||
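Putting the module list and the two mounts together, the boot-time sequence might look roughly like the dry run below. It only prints the commands. Using ''modprobe'' inside the initramfs is an assumption (the real script may use ''insmod'' with full paths, where load order matters), and the function name is hypothetical.

```shell
#!/bin/sh
# Dry run: print the commands for an NFS-based boot instead of
# running them. Modules are listed in the journal's order.
NFS_MODULES="sunrpc lockd auth_rpcgss nfs_acl nfs"

# nfs_boot_commands is a hypothetical name; $1 is the NFS export.
nfs_boot_commands() {
    nfs_path=$1
    for module in $NFS_MODULES; do
        echo "modprobe $module"    # assumption: modprobe is available
    done
    echo "mount -o nolock \"$nfs_path\" /iso"
    echo "mount -o loop,ro /iso/Fedora-11-i686-Live.iso /sysroot"
}

nfs_boot_commands "192.168.111.11:/var/www"
```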
+ | |||
+ | NFS works fine in runlevel 3. One can also start the GUI with ''startx'' after logging in as ''root'' at the multiuser prompt.\\ | ||
+ | The only issue with Fedora over NFS is that ''plymouthd'' still causes a problem and is disabled, which somehow stops the GUI from coming up automatically.\\ | ||
+ | So the user needs to log in at runlevel 3 and then run ''startx''. | ||
+ | |||
+ | |||
+ | ==== HTTPFS improvement ==== | ||
+ | Some progress has been made on the HTTPFS front as well. Until now, all the tests of booting over HTTPFS were done using ''qemu'', which is inherently slow because it is emulation. | ||
+ | When the same tests were run on VMware, which is much faster, errors started appearing after the runlevel 3 login prompt.\\ | ||
+ | The errors are listed below; they are much the same as the errors that used to appear around the sendmail daemon. | ||
+ | |||
+ | <code> | ||
+ | # startx | ||
+ | -bash: /usr/bin/startx: Input/output error | ||
+ | |||
+ | # top | ||
+ | top : error while loading shared libraries: /lib/libncursesw.so.5: cannot read file data: Input/output error | ||
+ | |||
+ | EXT4-fs error (device dm-0): ext4_find_entry: reading directory #10651 offset 0 | ||
+ | </code> | ||
+ | This observation suggests that the errors are thrown because of delays in the responses from FUSE; ext4 appears to give up because of the delay.\\ | ||
+ | Now I need to find a way to increase the tolerance for this delay. | ||
+ | |||
+ | |||
+ | ===== Removing plymouth ===== | ||
+ | As marc suggested, remove ''plymouth'' from the original ISO and see if it works without it. | ||
+ | If it doesn't, the blame can be put on ''plymouth'' rather than on the network-related complications. \\ | ||
+ | |||
+ | ==== Modifying ISO ==== | ||
+ | Now the question is how to add a new initramfs to the ISO and still keep it bootable.\\ | ||
+ | From the Remastering Knoppix Howto, the following command works for Knoppix: | ||
+ | <code> | ||
+ | mkisofs -pad -l -r -J -v -V "KNOPPIX" -no-emul-boot -boot-load-size 4 \ | ||
+ | -boot-info-table -b boot/isolinux/isolinux.bin -c boot/isolinux/boot.cat \ | ||
+ | -hide-rr-moved -o /mnt/hda1/knx/knoppix.iso /mnt/hda1/knx/master | ||
+ | </code> | ||
+ | |||
+ | I need to modify it so that it works for Fedora: | ||
+ | |||
+ | <code> | ||
+ | mkisofs -pad -l -r -J -v -V "Fedora-11-i686-Live" -no-emul-boot -boot-load-size 4 \ | ||
+ | -boot-info-table -b isolinux/isolinux.bin -c isolinux/boot.cat \ | ||
+ | -hide-rr-moved -o /var/www/iso/fedora_11.iso /home/pravin/Etherboot/git/BKO.git/pxeknife/red_hat/fedora_11_live_cd/newfedora | ||
+ | </code> | ||
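The whole remastering round trip could be planned roughly as below. This is a dry run that only prints commands; it has not been tested against the real ISO. The function name, the ''/mnt/iso'' mount point, the new initramfs file name and the ''isolinux/initrd0.img'' target are all assumptions. The ''chmod'' step is there because files copied from a read-only ISO may not be writable, and ''-boot-info-table'' makes mkisofs modify ''isolinux.bin'' in place.

```shell
#!/bin/sh
# Dry run: print the steps for swapping a new initramfs into a copy
# of the ISO tree and rebuilding a bootable image.
# remaster_plan is a hypothetical name.
# Arguments: original ISO, work tree, new initramfs, output ISO.
remaster_plan() {
    iso=$1
    tree=$2
    initrd=$3
    out=$4
    cat <<EOF
mkdir -p /mnt/iso $tree
mount -o loop,ro $iso /mnt/iso
cp -a /mnt/iso/. $tree/
chmod -R u+w $tree
umount /mnt/iso
cp $initrd $tree/isolinux/initrd0.img
mkisofs -pad -l -r -J -v -V "Fedora-11-i686-Live" -no-emul-boot -boot-load-size 4 -boot-info-table -b isolinux/isolinux.bin -c isolinux/boot.cat -hide-rr-moved -o $out $tree
EOF
}

remaster_plan /var/www/Fedora-11-i686-Live.iso \
    /home/pravin/Etherboot/git/BKO.git/pxeknife/red_hat/fedora_11_live_cd/newfedora \
    ./initrd-new.img \
    /var/www/iso/fedora_11.iso
```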
+ | ===== running startx from single user mode ===== | ||
+ | Tried running ''startx'' from single-user mode to see if it works.\\ | ||
+ | It did not work at all. | ||
+ | |||
+ | ===== Problem Found ===== | ||
+ | With the help of andyTim, the cause of the problem has been located.\\ | ||
+ | The ''network'' and ''NetworkManager'' services restart networking, which breaks the existing HTTPFS mount. | ||
+ | |||
+ | |||
+ | ===== Solution ===== | ||
+ | The temporary solution tried is to delete both of the following files: | ||
+ | - ''/etc/init.d/network'' | ||
+ | - ''/etc/init.d/NetworkManager'' | ||
+ | So the user first has to boot into single-user mode, delete the above files, | ||
+ | and then boot into runlevel 5. | ||
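The deletion step can be sketched as a small helper that takes the root directory as an argument, so it can be tried on a scratch directory before touching a real system. The function name is hypothetical, and the demo below only works inside a temporary directory.

```shell
#!/bin/sh
# Remove the two init scripts that restart networking.
# disable_network_scripts is a hypothetical name; $1 is the root of
# the filesystem to modify ("" would mean the running system).
disable_network_scripts() {
    root=$1
    for name in network NetworkManager; do
        rm -f "$root/etc/init.d/$name"
    done
}

# Demo on a throwaway tree instead of the real /etc/init.d.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/etc/init.d"
touch "$demo_root/etc/init.d/network" \
      "$demo_root/etc/init.d/NetworkManager" \
      "$demo_root/etc/init.d/sshd"
disable_network_scripts "$demo_root"
ls "$demo_root/etc/init.d"
rm -rf "$demo_root"
```

In the demo only ''sshd'' is left in the listing, showing that the other scripts are untouched.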
+ | |||