The volume management functions of ZFS, combined with the COMSTAR (STMF) system in Opensolaris, create a very powerful storage platform for boot-from-SAN functionality. This document endeavors to describe that configuration.
The following are the major components of this system:
Here's a brief run-down of the steps required to get this going:
Okay - here are the actual details!
Setting up the back-end storage is really up to you, but there are a few things to take into consideration. First, for storage, you have the option of attaching some sort of pre-configured array to your Opensolaris box, or attaching a bunch of individual drives. There are advantages either way - with a pre-configured array you have a controller (or, better yet, redundant controllers) handling all of the RAID tasks, so you don't have to worry about that in the Opensolaris configuration. However, if you present individual drives and use software RAID, you get better flexibility and visibility into the configuration of the array, optimization of data transfers, and so on. In practice it probably doesn't matter much, except that, if you choose to do the RAID with ZFS, you need to make sure you've got a beefy enough processor and enough memory to handle the RAID calculations and caching operations - functions that would normally be handled by a dedicated RAID controller.
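If you do go the individual-drive route, it's worth checking which disks Opensolaris actually sees before you start building pools. One simple way is the format utility - just quit out of it once you've noted the device names, which will look something like c0t1d0:
# format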
Should you choose to use ZFS for RAID, you have a few options for the type of RAID. Here's a quick run-down: mirror keeps a complete copy of the data on each disk in the set; raidz1 uses single parity and can survive one disk failure per set; raidz2 uses double parity and can survive two; raidz3 uses triple parity and can survive three.
Another consideration for your back-end storage, especially for SAN booting, is caching. You need enough memory in your Opensolaris box that a significant amount of data can be cached at any one time. If you're going to be booting 10, 20, or 50 machines from your SAN volumes, chances are that a lot of the data they access is identical, so the ability of the storage server to cache that data will greatly improve performance. Also, ZFS supports using solid state disks as caching disks, and will automatically keep track of the most frequently used data in your pool and cache it on the solid state disks you specify. If you can get your hands on some solid state disks and add them to your ZFS pool as caching disks, you can further improve the performance of your boot-from-SAN clients.
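As a sketch, adding a solid state disk to an existing pool as a cache device looks something like this (the device name is whatever your SSD shows up as):
# zpool add <pool name> cache <ssd disk>
# zpool status <pool name>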
Finally, you want to make sure your storage is redundant enough to survive some failures. Simple things, like RAID, redundant power supplies, etc., can protect against the inevitable power supply failure, tripped-over power cord, hard drive failure, and so on. However, if your environment is critical enough, you probably also want to investigate some of the HA features of Opensolaris. First, there's the ha-cluster framework, which provides things like IP address failover. Second, there's the ability of ZFS to send data to and receive data from remote systems. If your Opensolaris box is not attached to shared storage, this can protect you from complete failure of the system by synchronizing the data with another box. Obviously you have to have enough storage on the other system to allow for this, but that storage probably does not need to be as redundant as the primary storage - e.g. your primary storage may use RAID5 or raidz1, but perhaps your secondary Opensolaris box could just be striped.
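As a rough sketch, replicating one of your volumes to a second Opensolaris box with zfs send/receive might look like the following (the host, pool, and snapshot names here are made up for illustration; in practice you'd snapshot and send incrementally on a schedule):
# zfs snapshot <storage pool>/<volume name>@replica1
# zfs send <storage pool>/<volume name>@replica1 | ssh backup-host zfs receive <backup pool>/<volume name>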
The second step is to grab Opensolaris and install and configure it. As I mentioned before, I use the development builds to take advantage of some of the recent features of the system. It's up to you to determine your level of comfort with using "stable" builds vs. "development" builds - there are certainly trade-offs. If you do use the development builds, make sure to keep an eye on the bug list for Opensolaris and try to spot any red flags for things like data loss and corruption. Crashes of the O/S can usually be recovered from (or hedged against with an HA setup), but bugs that involve loss or corruption of your data are much harder to deal with.
The actual installation of Opensolaris is beyond the scope of this document, but here are a few things you need to make sure you have installed:
Once you have Opensolaris installed, you need to make sure that the COMSTAR services are running. You should enable stmf and whatever target services you need. For iSCSI, enable iscsi/target. For FC and SCSI, stmf alone is fine. For FCoE, enable fcoe_target. For SRP, enable ibsrp/target.
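For example, on an iSCSI-only setup, enabling and checking the services looks something like this:
# svcadm enable stmf
# svcadm enable -r iscsi/target
# svcs stmf iscsi/target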
After the services are running, you can move on to setting up ZFS as desired.
Now for the fun part - setting up ZFS! First, how you set up ZFS depends heavily upon whether you've decided to use an external array presented as a single LUN to your Opensolaris machine or disks presented individually. If you decide to use an external RAID array, simply create your storage pool with the one disk you have presented:
# zpool create <pool name> <disk>
Note that if you're not logged in as root you'll need a “pfexec” or “sudo” in front of that. If you've decided to let ZFS manage the RAID, you'll need to do the following:
# zpool create <pool name> <RAID type> <disk 1> <disk 2> <disk 3> <disk 4> … <disk n> spare <disk x> …
Remember, the RAID type can be raidz1, raidz2, raidz3, or mirror. If you do mirrors, precede each pair of disks with another "mirror" keyword:
# zpool create <pool name> mirror <disk 1> <disk 2> mirror <disk 3> <disk 4>
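To make that concrete, a hypothetical four-disk raidz2 pool named "tank" with one hot spare (device names are invented for the example) would be created and checked like so:
# zpool create tank raidz2 c0t1d0 c0t2d0 c0t3d0 c0t4d0 spare c0t5d0
# zpool status tank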
Once your pool is created, you can start creating filesystems and volumes. Filesystems aren't generally useful for SAN booting, as they're only exportable via NFS or CIFS; for iSCSI or FC booting, we're really interested in volumes. Creating volumes with ZFS is done via the "zfs create -V" command. The syntax is something like this:
# zfs create -V <size> [-s] <storage pool>/<volume name>
Size can be specified any number of ways - 3G, 1024M, etc. The -s option enables thin provisioning, which sets the volume to a certain size but does not allocate the disk space. This allows for flexibility in disk allocation - you can allocate more than you have without running out of disk - but also opens you to the risk that you'll fill up your disks with little or no warning. So, if you decide to use thin provisioning, watch your volumes and storage pools closely to make sure you don't fill them up.
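For example, a 20G thin-provisioned volume (pool and volume names are just placeholders) can be created and then checked with "zfs get" to see how much space it is actually consuming versus its logical size:
# zfs create -s -V 20G <storage pool>/<volume name>
# zfs get volsize,used,refreservation <storage pool>/<volume name>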