Friday, November 17, 2006

Vacation

I'm finally leaving for vacation :) Then directly from my vacation I'm going to LISA Tech Days, so see you there.

Thursday, November 16, 2006

ZFS RAID-Z2 Performance

While ZFS's RAID-Z2 can actually offer worse random read performance than HW RAID-5, it should offer much better write performance, especially when you are doing random writes or writing to a lot of different files concurrently. After doing some tests I was happy to find that it works exactly as expected. Now the hard question was: would RAID-Z2 be good enough, in terms of performance, in an actual production environment? There's no simple answer, as in production we see a mix of reads and writes. With HW RAID-5, once your write throughput is large enough its write cache can't help much, and your write performance drops dramatically with random writes. Also, one write IO to the array is converted into several IOs, so you have fewer IOs left for reads. ZFS RAID-Z and RAID-Z2 don't behave that way and give you excellent write performance whether it's random or not. They should also generate fewer write IOs per disk than HW RAID-5. So the real question is: will that be enough of an offset to get better overall performance in production?

After some testing I wasn't really any closer to answering that question - so I settled on a pool configuration and other details and put it into production. The business comparison is that I need at least two HW RAID-5 arrays to carry our production traffic. One array just can't do it, and the main problem is writes. Well, a single x4500 with RAID-Z2 seems to do the job in the same environment without any problems - at least so far. It'll be interesting to see how it behaves with more and more data on it (only a few TBs right now), as that will also mean more reads. But from what I've seen so far I'm optimistic.
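
I didn't go into the exact pool layout above, but just to illustrate what a RAID-Z2 setup on an x4500 can look like, here is a minimal sketch - the pool name, disk names and vdev widths below are examples only, not the actual production configuration:

# example only - pool name, disks and vdev widths are illustrative,
# not the real production layout
zpool create tank \
    raidz2 c0t0d0 c1t0d0 c4t0d0 c5t0d0 c6t0d0 c7t0d0 \
    raidz2 c0t1d0 c1t1d0 c4t1d0 c5t1d0 c6t1d0 c7t1d0 \
    raidz2 c0t2d0 c1t2d0 c4t2d0 c5t2d0 c6t2d0 c7t2d0

# check the layout and health
zpool status tank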

Tuesday, November 14, 2006

Caiman

If you install Solaris on servers using Jumpstart then you never actually see the Solaris interactive installer. But more and more people are using Solaris on their desktops and laptops, and the installer is often their first contact with Solaris. I must admit it's not a good one. Fortunately Sun realizes that, and some time ago project Caiman was started to address this problem. See the Caiman Architecture document and the Install Strategy document. Also see the early propositions for the Caiman GUI installer.

Friday, November 10, 2006

ZFS tuning

Recently 6472021 was integrated. If you want to tune ZFS, here you can get a list of tunables. Some default values for the tunables, with short comments, can be found here, here, and here.
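
Just as a sketch of how such a tunable is applied (assuming zfs_arc_max is among the tunables exposed on your build - check the list first), it goes into /etc/system and takes effect after a reboot:

* example /etc/system entry - zfs_arc_max is assumed to be
* available on your build; verify it against the tunables list
* cap the ARC at 1 GB (value in bytes)
set zfs:zfs_arc_max = 0x40000000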

St Paul Blade - Niagara blade from Sun in Q1/07

I was looking through the latest changes to OpenSolaris and found this:


Date: Mon, 30 Oct 2006 19:45:33 -0800
From: Venkat Kondaveeti
To: onnv-gate at onnv dot sfbay dot sun dot com, on-all at sun dot com
Subject: Heads-up:St Paul platform support in Nevada

Today's putback for the following
PSARC 2006/575 St Paul Platform Software Support
6472061 Solaris support for St Paul platform

provides the St Paul Blade platform support in Nevada.
uname -i O/P for St Paul platform is SUNW,Sun-Blade-T6300.

The CRs aganist Solaris for St Paul Blade platform support
should be filed under platform-sw/stpaul/solaris-kernel in bugster.

If you're changing sun4v or Fire code, you'll want to test on St Paul.
You can get hold of one by contacting stpaul_sw at sun dot com alias with
"Subject: Need St Paul System Access" and blades
will be delivered to ON PIT and ON Dev on or about Feb'8th,2007.

St Paul eng team will provide the technical support.
Please send email to stpaul_sw at sun dot com if any issues.

FYI, StPaul is a Niagara-1 based, 1P, blade server designed exclusively
for use in the Constellation chassis (C-10). The blades are comprised of
an enclosed motherboard that hosts 1 system processor, 1 FIRE ASIC, 8 DIMMS,
4 disks, 2 10/100/1000Mbps Ethernet ports, 2 USB 2.0 ports and a Service
processor. Power supplies, fans and IO slots do not reside on the blade,
but instead exist as part of the C-10 chassis. Much of the blade design is
highly leveraged from the Ontario platform. St Paul RR date per plan is
03/2007.

Thanks

St Paul SW Development Team

Wednesday, November 08, 2006

ZFS saved our data

Recently we migrated a Linux NFS server to a Solaris 10 NFS server with Sun Cluster 3.2 and ZFS. The system has two SCSI JBODs connected and each node has two SCSI adapters; a RAID-10 spanning both JBODs and both SCSI adapters was created using ZFS. We used rsync to migrate the data. During the migration we noticed in the system logs that one of the SCSI adapters reported some warnings from time to time, then more serious warnings about bad firmware or a broken adapter - but data kept being written. When we ran rsync again, ZFS reported some checksum errors, but only on the disks connected to the bad adapter. I ran a scrub on the entire pool and ZFS reported and corrected thousands of checksum errors - all of them on the bad controller. We removed the bad controller, reconnected the JBOD to the good one and ran a scrub again - this time no errors. Then we completed the data migration. So far everything works fine and no checksum errors are reported by ZFS.

The important thing here is that ZFS detected that the bad SCSI adapter was actually corrupting data and was able to correct it on-the-fly, so we didn't have to start over from the beginning. With a classic file system we probably wouldn't even have noticed that our data was corrupted until a system panic or an fsck was needed. And with so many errors fsck probably wouldn't have restored file system consistency anyway, not to mention that it wouldn't correct the bad data at all.
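
For reference, these are roughly the commands involved (the pool name below is just an example): zpool status shows per-device read/write/checksum error counters, and zpool scrub re-reads every block and repairs anything that fails its checksum from the redundant copy.

# per-vdev READ/WRITE/CKSUM error counters (pool name is an example)
zpool status -v nfspool

# verify every block in the pool, repairing from the mirror where needed
zpool scrub nfspool

# watch scrub progress and the error counters afterwards
zpool status -v nfspool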

Friday, November 03, 2006

Thumper throughput

For some testing I'm creating eight RAID-5 devices under SVM with a 128k interleave size. It's really amazing how much throughput the x4500 server can deliver. Right now all those RAID-5 volumes together are generating over 2GB/s of write throughput! Woooha! It can write more data to disks than most (all?) Intel servers can read or write to memory :))))


bash-3.00# metainit d101 -r c0t0d0s0 c1t0d0s0 c4t0d0s0 c6t0d0s0 c7t0d0s0 -i 128k
d101: RAID is setup
bash-3.00# metainit d102 -r c0t1d0s0 c1t1d0s0 c5t1d0s0 c6t1d0s0 c7t1d0s0 -i 128k
d102: RAID is setup
bash-3.00# metainit d103 -r c0t2d0s0 c1t2d0s0 c5t2d0s0 c6t2d0s0 c7t2d0s0 -i 128k
d103: RAID is setup
bash-3.00# metainit d104 -r c0t4d0s0 c1t4d0s0 c4t4d0s0 c6t4d0s0 c7t4d0s0 -i 128k
d104: RAID is setup
bash-3.00# metainit d105 -r c0t3d0s0 c1t3d0s0 c4t3d0s0 c5t3d0s0 c6t3d0s0 c7t3d0s0 -i 128k
d105: RAID is setup
bash-3.00# metainit d106 -r c0t5d0s0 c1t5d0s0 c4t5d0s0 c5t5d0s0 c6t5d0s0 c7t5d0s0 -i 128k
d106: RAID is setup
bash-3.00# metainit d107 -r c0t6d0s0 c1t6d0s0 c4t6d0s0 c5t6d0s0 c6t6d0s0 c7t6d0s0 -i 128k
d107: RAID is setup
bash-3.00# metainit d108 -r c0t7d0s0 c1t7d0s0 c4t7d0s0 c5t7d0s0 c6t7d0s0 c7t7d0s0 -i 128k
d108: RAID is setup
bash-3.00#
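
Right after metainit the RAID-5 volumes start initializing their parity, which is presumably what's driving the write load shown below. A quick way to check their state and progress (using one of the volumes created above):

# state and initialization progress of one RAID-5 volume
metastat d101

# compact overview of all metadevices
metastat -c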


bash-3.00# iostat -xnzCM 1 | egrep "device| c[0-7]$"
[omitted first output as it's the average since reboot]
extended device statistics
r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
0.0 367.5 0.0 367.5 0.0 8.0 0.0 21.7 0 798 c0
0.0 389.5 0.0 389.5 0.0 8.0 0.0 20.5 0 798 c1
0.0 276.4 0.0 276.4 0.0 6.0 0.0 21.7 0 599 c4
5.0 258.4 0.0 258.4 0.0 6.0 0.0 22.9 0 602 c5
0.0 394.5 0.0 394.5 0.0 8.0 0.0 20.2 0 798 c6
0.0 396.5 0.0 396.5 0.0 8.0 0.0 20.1 0 798 c7
extended device statistics
r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
0.0 376.0 0.0 376.0 0.0 8.0 0.0 21.2 0 798 c0
0.0 390.0 0.0 390.0 0.0 8.0 0.0 20.5 0 798 c1
0.0 281.0 0.0 281.0 0.0 6.0 0.0 21.3 0 599 c4
0.0 250.0 0.0 250.0 0.0 6.0 0.0 24.0 0 599 c5
0.0 392.0 0.0 392.0 0.0 8.0 0.0 20.4 0 798 c6
0.0 386.0 0.0 386.0 0.0 8.0 0.0 20.7 0 798 c7
extended device statistics
r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
0.0 375.0 0.0 375.0 0.0 8.0 0.0 21.3 0 798 c0
0.0 407.0 0.0 407.0 0.0 8.0 0.0 19.6 0 798 c1
0.0 275.0 0.0 275.0 0.0 6.0 0.0 21.8 0 599 c4
0.0 247.0 0.0 247.0 0.0 6.0 0.0 24.2 0 599 c5
0.0 388.0 0.0 388.0 0.0 8.0 0.0 20.6 0 798 c6
0.0 382.0 0.0 382.0 0.0 8.0 0.0 20.9 0 798 c7
^C
bash-3.00# bc
376.0+390.0+281.0+250.0+392.0+386.0
2075.0
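
Adding the Mw/s numbers by hand in bc works, but the same sum can be scripted; a quick sketch with awk, summing field 4 (the Mw/s column) of the per-controller lines for each interval:

# print the aggregate Mw/s across controllers for every iostat interval
iostat -xnzCM 1 | awk '/ c[0-7]$/ { mw += $4 }
                       /extended device statistics/ { if (mw > 0) print mw " MB/s"; mw = 0 }'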