
The volume management functions of ZFS, combined with the COMSTAR (STMF) system in Opensolaris, create a very powerful storage platform for boot-from-SAN functionality. This document endeavors to describe that configuration.

Components

The following are the major components of this system:

  * Back-end storage - either a pre-configured RAID array or a set of individual disks.
  * Opensolaris, providing ZFS for volume management.
  * COMSTAR (STMF) - the SCSI target framework that exports ZFS volumes as LUNs.
  * gPXE - the network boot loader used by the clients.
  * The client machines that boot from the SAN.

Overview

Here's a brief run-down of the steps required to get this going:

  1. Configure back-end storage.
  2. Install & configure Opensolaris.
  3. Configure ZFS pools & volumes.
  4. Set up COMSTAR/STMF targets.
  5. Configure gPXE.
  6. Install or transfer the operating system.
  7. Maintenance.

Detailed Instructions

Okay - here are the actual details!

1) Back-End Storage

Setting up the back-end storage is really up to you, but there are a few things to take into consideration. First, you have the option of attaching some sort of pre-configured array to your Opensolaris box, or attaching a bunch of individual drives. There are advantages either way: with a pre-configured array, a controller (or, better yet, redundant controllers) handles all of the RAID tasks, so you don't have to worry about that in the Opensolaris configuration. If you instead present individual drives and use software RAID, you get better flexibility and visibility into the configuration of the array, optimization of data transfers, and so on. In practice it probably doesn't matter much, except that if you choose to do the RAID with ZFS, you need to make sure you've got a beefy enough processor and enough memory to support the RAID calculations and caching operations - functions that would normally be handled by a dedicated RAID controller.

Should you choose to use ZFS for RAID, you have a few options for the type of RAID. Here's a quick run-down:

  * mirror - each disk in a mirror set holds a complete copy of the data; the set survives as long as one disk remains.
  * raidz1 - single parity; survives the loss of one disk in the set.
  * raidz2 - double parity; survives the loss of two disks.
  * raidz3 - triple parity; survives the loss of three disks.

Another consideration for your back-end storage, especially for SAN booting, is caching. You need enough memory in your Opensolaris box that a significant amount of data can be cached at any one time. If you're going to be booting 10, 20, or 50 machines from your SAN volumes, chances are that much of the data they access is identical, so the storage server's ability to cache that data will greatly improve performance. ZFS also supports using solid state disks as cache devices: it automatically keeps track of the most frequently used data in your pool and caches it on the solid state disks you specify. If you can get your hands on some solid state disks and add them to your ZFS pool as cache devices, you can further improve the performance of your boot-from-SAN clients.
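
Adding an SSD to an existing pool as a cache device is a one-liner; the placeholders below stand in for your own pool and device names:

# zpool add <pool name> cache <ssd disk>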

Finally, you want to make sure your storage is redundant enough to survive some failures. Simple things, like RAID and redundant power supplies, protect against the inevitable hard drive failure, power supply failure, tripped-over power cord, and so on. However, if your environment is critical enough, you probably also want to investigate some of the HA features of Opensolaris. First, there's the ha-cluster framework, which provides things like IP address failover. Second, there's the ability of ZFS to send and receive data to and from remote systems. If your Opensolaris box is not attached to shared storage, this can protect you from complete failure of the system by synchronizing the data with another box. Obviously you have to have enough storage on the other system to allow for this, but that storage probably does not need to be as redundant as the primary storage - e.g. your primary storage may use RAID5 or raidz1, but perhaps your secondary Opensolaris box could just be striped.
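
Here's a minimal sketch of that send/receive replication; the snapshot name, backup host, and pool names are all placeholders for your own:

# zfs snapshot <pool>/<volume>@<snapshot>
# zfs send <pool>/<volume>@<snapshot> | ssh <backup host> zfs receive <backup pool>/<volume>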

2) Install & Configure Opensolaris

The second step is to grab Opensolaris and install and configure it. As I mentioned before, I use the development builds to take advantage of some of the recent features of the system. It's up to you to determine your level of comfort with using “stable” builds vs. “development” builds - there are certainly trade-offs. If you do use the development builds, make sure to keep an eye on the bug list for Opensolaris and try to spot any red flags for things like data loss and corruption. Crashes of the O/S can usually be recovered from (or hedged against with an HA setup), but bugs that involve loss or corruption of your data are hard ones to deal with.

The actual installation of Opensolaris is beyond the scope of this document, but here are a few things you need to make sure you have installed:

  * The COMSTAR/STMF framework itself.
  * Target-mode support for whatever protocol you plan to use (iSCSI, FC, FCoE, or SRP).
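
If your install is missing these, they can be pulled in with pkg(1). Package names vary between builds - the ones below are typical of contemporary Opensolaris releases, so verify them against your build's repository:

# pkg install storage-server
# pkg install SUNWiscsit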

Once you have Opensolaris installed, you need to make sure that the COMSTAR services are running. You should enable stmf and whatever target services you need: for iSCSI, enable iscsi/target; for FCoE, enable fcoe_target; for SRP, enable ibsrp/target. For FC and SCSI, stmf alone is sufficient.
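
For example, to bring up the framework and the iSCSI target, and then verify that both are online:

# svcadm enable stmf
# svcadm enable -r iscsi/target
# svcs stmf iscsi/target

The other target services are enabled the same way.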

After the services are running, you can move on to setting up ZFS as desired.

3) ZFS Pool & Volume Configuration

Now for the fun part - setting up ZFS! First, how you set up ZFS depends heavily upon whether you've decided to use an external array presented as a single LUN to your Opensolaris machine or disks presented individually. If you decide to use an external RAID array, simply create your storage pool with the one disk you have presented:

# zpool create <pool name> <disk>

Note that if you're not logged in as root you'll need a “pfexec” or “sudo” in front of that. If you've decided to let ZFS manage the RAID, you'll need to do the following:

# zpool create <pool name> <RAID type> <disk 1> <disk 2> <disk 3> <disk 4> ... <disk n> spare <disk x> ...

Remember, the RAID type can be raidz1, raidz2, raidz3, or mirror. If you use mirrors, precede each pair of disks with another “mirror” keyword: mirror <disk 1> <disk 2> mirror <disk 3> <disk 4>.

Once your pool is created, you can start creating filesystems and volumes. Filesystems aren't generally useful for SAN booting, as they're only exportable via NFS or CIFS; for iSCSI or FC booting, we're really more interested in volumes. Creating volumes with ZFS is done via the “zfs create -V” command. The syntax is something like this:

# zfs create -V <size> [-s] <storage pool>/<volume name>
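
To make that concrete, here's a hypothetical four-disk raidz1 pool with one hot spare, plus a sparse 20 GB boot volume on it (the pool, volume, and device names are made up):

# zpool create tank raidz1 c7t2d0 c7t3d0 c7t4d0 c7t5d0 spare c7t6d0
# zfs create -V 20G -s tank/bootvol01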

Size can be specified any number of ways - 3G, 1024M, etc. The -s option enables thin provisioning, which sets the volume's apparent size but does not reserve the underlying disk space. This allows for flexibility in disk allocation - you can allocate more than you actually have without running out of disk - but also opens you to the risk that you'll fill up your disks with little or no warning. So, if you decide to use thin provisioning, watch your volumes and storage pools closely to make sure you don't fill them up.
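
Keeping an eye on that consumption is straightforward with the standard tools; the dataset and pool names below are placeholders:

# zfs get volsize,used <storage pool>/<volume name>
# zpool list <pool name>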