Friday, August 25, 2006

Q&A on Microsoft Exchange, Snapshots and ESEUTIL

Question: I have a two-year-old Exchange 2003 database that I have never defragmented. I am about to migrate the database to NetApp using SnapManager for Exchange. Should I run ESEUTIL before the migration?
Dr. Toaster's Answer: I would recommend it, as there are a number of advantages beyond optimizing the database. A completely new database is created and populated with all the valid records in the current one, which directly addresses any underlying issues a database this "old" may have accumulated (an incremented repair count, etc.). It is a very good preventative measure.
In general, there is no need to run Exchange or NTFS defragmentation after the migration. It is also worth reviewing and upgrading to Data ONTAP 7.2 to enjoy the performance improvements available through the new WAFL extents feature in 7.2.
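For reference, the offline defragmentation itself is run with ESEUTIL's /D switch while the database is dismounted. A minimal sketch, assuming the default Exchange 2003 database path (adjust to your environment):

eseutil /d "C:\Program Files\Exchsrvr\MDBDATA\priv1.edb"

The /D switch rebuilds the database into a new file and discards empty pages, which is exactly the "totally new database" effect described above.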

Question: What are common best practices for snapshot schedules in SnapManager for Exchange?
Answer: That depends entirely on how much time you can afford to spend during a restore, which is directly affected by the number of logs that need to be rolled forward. More time between backups means more logs to roll during a restore. I would recommend a backup every 4 hours, combined with a focus on deferred/remote verification.

Question: Regarding SnapManager verification and the new SME 3.2 I/O throttling for verification - does it work well enough to run verification during the daytime?
Answer: See the previous answer - consider deferring verification, or using a remote verification server first, if this is a high-I/O environment. ESEUTIL throttling works very effectively and translates into a *real* maximum throughput for ESEUTIL, but the "cost" is that the verification itself takes longer to complete.
With a small number of users (a few hundred) it is possible to perform verification at backup time. What actually takes longest is the mounting/unmounting of the LUNs, which is another reason to consider a less aggressive backup schedule.

Question: How long does the SnapManager for Exchange restore process take?
Answer: It is first important to note that SnapManager for Exchange releases use two distinct methods to restore data:
1. Single File SnapRestore
2. LUN Clone Split

Single File SnapRestore (SFSR) was the original form of LUN restoration, introduced as a way to recover large database files back in Data ONTAP 6.2, even before FCP and iSCSI LUN support was announced. It uses a sophisticated algorithm that walks the inodes of a file and relinks them to the relevant data blocks from a snapshot. SFSR usually takes a few seconds to a few minutes, but for very large files (hundreds of GBs) it can take longer.
LUN Clone Split was invented to quickly create a read/write virtual entry point that resembles the original LUN. Its running time is not proportional to the size of the LUN, so it takes a few seconds regardless of size.
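At the filer console, the two methods map roughly to the following commands. This is only a sketch - SnapManager drives these automatically, and the snapshot and LUN names here are illustrative:

snap restore -t file -s exchsnap__monkey__recent /vol/vol1/db1disk1.lun
lun clone create /vol/vol1/db1disk1_restore -b /vol/vol1/db1disk1.lun exchsnap__monkey__recent
lun clone split start /vol/vol1/db1disk1_restore

The first command is the SFSR path; the last two create the clone entry point and then split it away from the snapshot in the background.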

SnapManager for Exchange 3.2 and Data ONTAP 7.1 support the new LUN Clone Split restore, and it is the recommended method at this point. Make sure that the following option is turned on (do not try to turn it on prior to 7.1 - before that, the feature was hidden and is ignored by SME 3.2):
options lun.clone_restore on

Thursday, August 24, 2006

SnapManager for Exchange Restore fails with: 0xe000020d

The Problem: SnapManager for Exchange Restore fails with: 0xe000020d
Components:
Microsoft iSCSI Initiator 2.02
SnapDrive 4.1
SnapManager for Exchange 3.2

The following error appears in the Restore log in SnapManager for Exchange:


***VERIFYING DATABASES IN SNAPSHOT

Verifying databases in snapshot...
Mounting Snapshot [exchsnap__monkey__recent] for LUN F
[SnapDrive Error]: Error 0xe000020d occured. Description is not found.(SnapDrive Error Code: 0xe000020d)
Unable to mount snapshot, aborting database verification...
Failed to verify log and database for restore operation.
Error code: 0xe000020d, storage group name: First Storage Group
A filer autosupport message is sent on failure.

***SNAPMANAGER RESTORE JOB ENDED AT: [08-24-2006 11.09.00]
Failed to restore storage group.

**** RESTORE RESULT SUMMARY*****
Restore failed.


SnapManager for Exchange is attempting to mount the LUN from the snapshot as part of the restore. This is actually performed by SnapDrive, which in turn integrates with the Virtual Disk Service (VDS).

The Solution:
Restart the Virtual Disk Service - this restarts the SnapDrive service as well.
Close and re-open the Computer Management console as well as SnapManager for Exchange, and retry the restore operation.
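From an elevated command prompt, the restart amounts to the following (assuming the default Windows service name, vds):

net stop vds
net start vds

If the SnapDrive service does not come back up on its own, restart it from the Services snap-in before retrying.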

Tuesday, August 22, 2006

How to work around ONTAP upgrade issues?

The problem: Having trouble upgrading ONTAP

Explanation:
A small and useful undocumented tip - if your upgrades fail, try deleting the contents of /etc/boot (\\filer\etc$\boot from Windows).
While it sounds scary, the filer does not actually reference this directory except when the download command runs, so it is practically safe to delete its contents and then retry the software installation.
Another tip: stop using the old setup.exe or tar file trick. Instead, copy the setup.exe file to /etc/software (\\filer\etc$\software) and run software install setup.exe; download.
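Put together, a retry looks roughly like this; the first two commands run on a Windows admin host, the last two on the filer console (file name is illustrative):

del /q \\filer\etc$\boot\*.*
copy setup.exe \\filer\etc$\software\
software install setup.exe
download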

Thursday, August 10, 2006

How to configure NetBackup SSO with NDMP?

The problem: Configure the filer to support NetBackup Shared Storage Option (SSO)

Explanation:
Data ONTAP 7.1.1 and 7.2.1 (but not 7.2) add support for SCSI Reserve/Release, replacing the old Persistent Reservations code that previously existed in ONTAP; this is now supported for NetBackup SSO (starting with NetBackup 6.0).

To enable:
options tape.reservations scsi
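To check the current value afterwards, run the option without an argument and ONTAP prints the setting:

options tape.reservations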

Wednesday, August 02, 2006

Configuring FCP Multipathing in RedHat Linux/CentOS

The task: Create LUNs on a filer, discover them in RedHat Enterprise Linux or CentOS, and multipath them with the device-mapper-multipath (dm-multipath) mechanism.
Assumptions:
1. The FC HBA drivers are already installed and running.
2. This example is using two single-port QLogic HBAs.
3. The multipathing package is already installed - look for dm-multipath or device-mapper-multipath.
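A quick way to confirm assumptions 1 and 3 before starting (module and package names are the common ones and may vary by distribution and HBA model):

lsmod | grep qla2xxx
rpm -q device-mapper-multipath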

1. Connect the filer using the cabling rules documented in the guides.
2. Create a LUN on the filer:


lun create -s <size> -t linux /vol/volname/lunname


For example:

lun create -s 100g -t linux /vol/vol1/db1disk1.lun

3. Detect the relevant FC HBA initiator WWNs on the filer:

fcp show initiator

4. Configure the WWNs into an initiator group on the filer, and map the LUN to it:

igroup create -f -t linux <igroup_name> <WWPN1> <WWPN2>
lun map /vol/vol1/db1disk1.lun <igroup_name>

5. Check whether Linux already recognizes the LUNs (it should):

[root@hostname /]# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: NETAPP Model: LUN Rev: 0.2
Type: Direct-Access ANSI SCSI revision: 04
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: NETAPP Model: LUN Rev: 0.2
Type: Unknown ANSI SCSI revision: 04
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: NETAPP Model: LUN Rev: 0.2
Type: Unknown ANSI SCSI revision: 04
Host: scsi1 Channel: 00 Id: 01 Lun: 00
Vendor: NETAPP Model: LUN Rev: 0.2
Type: Direct-Access ANSI SCSI revision: 04

6. Edit the multipath configuration file, /etc/multipath.conf:

[root@hostname /]# cat /etc/multipath.conf

## Use user friendly names, instead of using WWIDs as names.
defaults {
user_friendly_names yes
}

# Blacklist all devices by default. Remove this to enable multipathing
# on the default devices.
devnode_blacklist {
devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
devnode "^hd[a-z]"
devnode "^cciss!c[0-9]d[0-9]*"
}

devices {
device {
vendor "NETAPP"
product "LUN"
path_grouping_policy multibus
features "1 queue_if_no_path"
path_checker readsector0
path_selector "round-robin 0"
failback immediate
no_path_retry queue
}
}
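After saving the file, (re)start the multipath daemon so the configuration is picked up (standard RHEL service commands assumed):

service multipathd restart
chkconfig multipathd on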

7. The following commands can be used to rescan the SCSI bus. Verify the paths with cat /proc/scsi/scsi:

echo "scsi-qlascan" > /proc/scsi/qla2xxx/0
echo "- - -" > /sys/class/scsi_host/host0/scan
echo "scsi-qlascan" > /proc/scsi/qla2xxx/1
echo "- - -" > /sys/class/scsi_host/host1/scan
echo "scsi add-single-device controller 0 0 1 ">/proc/scsi/scsi
echo "scsi add-single-device controller 0 0 1 ">/proc/scsi/scsi

8. As a result of the configuration file, multipath devices should already have been created:

/dev/mapper/mpath0

9. Use the following commands to troubleshoot the multipathing setup:

multipath
multipath -d -l

10. Create a filesystem on top of the multipath device. While it is possible to create partitions on the underlying LUNs and then let the multipathing code discover the partitions (which seems to require a reboot, and results in devices named, for example, /dev/mapper/mpath0p1), it is not recommended and is tricky at best. Creating the filesystem is simple:

mkfs -t ext3 /dev/mapper/mpath0

11. Mount the filesystem and check df.
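For example, assuming /db as the mount point (a sketch; add an /etc/fstab entry if the mount should persist across reboots):

mkdir -p /db
mount /dev/mapper/mpath0 /db

The df output should then show the new filesystems: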

[root@hostname ~]# df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
413267016 2766160 389508040 1% /
/dev/cciss/c0d0p1 101086 23291 72576 25% /boot
none 8126884 0 8126884 0% /dev/shm
/dev/mapper/mpath0 103212320 93852 97875588 1% /db
/dev/mapper/mpath1 103212320 93852 97875588 1% /logs


Dr. Toaster Recommends:
There are issues that require further attention when creating partitions on NetApp LUNs. For more information on these issues and how to avoid them, check the following links: http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=156121 and http://now.netapp.com/Knowledgebase/solutionarea.asp?id=kb8190.