Wednesday, September 13, 2006

Setting VERITAS NetBackup with a non-root NDMP user

VERITAS NetBackup NDMP setup with filers - How do I change the NDMP authentication from using root to another non-root user?
Add a user called ndmpuser for NDMP usage on the filer:
useradmin user add ndmpuser -g Users
ndmpd password ndmpuser

Type the challenge password into the following command on the NBU server:
set_ndmp_attr -insert -auth <filer_hostname> ndmpuser <password>

In case things go bad, delete and recreate the filer entries:
set_ndmp_attr -delete -robot
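
For example, assuming a hypothetical filer named toaster1, the full sequence looks like this - first on the filer:

useradmin user add ndmpuser -g Users
ndmpd password ndmpuser

and then on the NetBackup server, pasting the password that ndmpd password printed:

set_ndmp_attr -insert -auth toaster1 ndmpuser <password>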

Saturday, September 09, 2006

Sharing of Oracle Environments thru NFS

Question: What can be shared in Oracle environments?
Dr. Toaster's Answer:
There are 3 types of sharing models available when thinking of Oracle and NFS:
1. Shared Oracle Binaries - Sharing a single Oracle DB installation and configuring multiple databases to mount and use that single directory via NFS mounts.
2. Shared Oracle_HOME - Enabling multiple database servers to share the same set of binaries, similar to what RAC does. Oracle 9i originally did not support a Shared Oracle_HOME, but at this point it is supported over NFS mounts. Nevertheless, it is best suited to testing and development environments; the Network Appliance™ Best Practice Guidelines for Oracle® recommend not using it for production and HA environments (see the example mount entry after this list).
3. Shared APPL_TOP - Sharing the Oracle E-Business Suite binaries. See Reducing Administration Costs for Oracle® E-Business Suite Using NetApp Filers for more information.
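
As an illustration of option 2, here is a hypothetical /etc/fstab entry for a Shared Oracle_HOME exported from a filer named toaster1 (the export path, mount point and mount options are made up for the example - take the exact mount options from the NetApp best-practice documents for your OS and Oracle release):

toaster1:/vol/orabin/10.2.0 /u01/app/oracle/product/10.2.0 nfs rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,vers=3,timeo=600 0 0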

Thursday, September 07, 2006

What is space reservation?

Question: I have a 100GB LUN inside a 200GB volume - I am trying to expand it using SnapDrive but I can only increase it by a few GB. Can't I expand it by more?
Dr. Toaster's Answer: It is important to first understand the concept of disk space reservation. The simplest way to explain that is to think of a regular magnetic disk drive - it has addresses that hosts can refer to in order to read or write data - and hosts can always send I/O write commands to these data block addresses again and again - that's the whole idea of an address space.
The WAFL filesystem has a different write allocation policy - when you keep snapshots (point-in-time views of the same filesystem), WAFL leaves the blocks they reference untouched, so that one can recover from those snapshots later.
So to connect the story here - LUNs are implemented on top of WAFL, so when snapshots are being taken and a host (which is totally unaware of this virtualization) keeps writing data into a LUN, more and more data blocks are held "captive" by the snapshots. This is where space reservation kicks in: to protect against this behaviour causing SCSI write errors to LUNs, every LUN by default reserves its original size plus another 100% of its size, as protection against this rare case of frequent overwrites to a LUN on a volume with many snapshots.

The simple solution to enable the expansion of the LUN is to enlarge the size of the underlying volume - with FlexVols that would be an easy change. If there is not enough disk space in the aggregate, one can reduce the amount of space reservations per volume using a command such as:
vol options vol_name fractional_reserve 80

where 80 is a number below 100. Note that if snapshots are taken and the rate of change in the volume climbs above 80%, SCSI writes to the LUNs may fail - in which case the filer will take the LUNs in the volume offline, and manual action will be needed to free up space (most likely by deleting some snapshots) and bring the LUNs online again (lun online lun_pathname).
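
As a worked example (the volume name dbvol is hypothetical): with a 100GB LUN in a 200GB volume and the default fractional_reserve of 100, up to another 100GB is set aside for overwrites once snapshots exist - which is exactly why SnapDrive can only grow the LUN by a few GB. Lowering the reserve frees part of that space:

vol options dbvol fractional_reserve 80
df -r /vol/dbvol

With the reserve at 80, only 80GB is held back for overwrites, so roughly 20GB that was previously reserved becomes available (keep in mind that growing the LUN also grows its overwrite reserve, so the LUN itself can grow by somewhat less than that).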

A few other important notes:
1. Use the following commands to review the status of space reservation:

df
df -r
snap delta

2. Data ONTAP 7.2 adds another solution which is to allow volumes to automatically clean up old snapshots and/or grow:

vol options vol_name try_first volume_grow
vol autosize vol_name -m 1000g -i 1g on

where 1000g is the maximum size that the autosize feature will allow the volume to grow to, and 1g is the increment.
The autosize feature will try to grow the volume in 1GB increments, and if the aggregate is full it will try to delete snapshots. It is also possible to start with deleting snapshots instead, by using the snap_delete policy rather than the volume_grow policy suggested above (see the sketch below).
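
For completeness, a hedged sketch of the snapshot-deletion variant (dbvol is a hypothetical volume name - check the Data ONTAP 7.2 documentation for the exact snap autodelete policy settings):

vol options dbvol try_first snap_delete
snap autodelete dbvol on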

Friday, August 25, 2006

Q&A on Microsoft Exchange, Snapshots and ESEUTIL

Question: I have a two-year-old Exchange 2003 database that I never defragmented. I am about to migrate the database to NetApp using SnapManager for Exchange. Should I run ESEUTIL before the migration?
Dr. Toaster's Answer: I would recommend it, as there are a number of other advantages in addition to optimizing the database. A completely new database gets created and populated with all the valid records from the current one. This directly addresses any underlying issues there might be with a db as "old" as this one (incremented repair count etc.). It would be a very good preventative measure.
In general, after the migration there is no need to run Exchange and/or NTFS defragmentation. It is recommended to review Data ONTAP 7.2 and upgrade to it, to enjoy the performance improvements available thru the new WAFL extents feature in 7.2.

Question: What are common best practices for snapshot schedules in SnapManager for Exchange?
Answer: That entirely depends on the amount of time that one could spend during a restore and this is directly affected by the amount of logs that need to be rolled. Of course, more time between backups = more logs to roll during a restore. I would recommend a backup every 4 hours, and then focus more on deferred/remote verification in conjunction with this.

Question: SnapManager Verification and the new SME 3.2 I/O throttling for verification - does it work well enough to configure verification during daytime?
Answer: See the previous answer - consider deferring verification or using a remote verification server first if this is a high-IO environment. ESEUTIL throttling works very effectively, and translates to a *real* maximum throughput for ESEUTIL, but the "cost" associated with this is that the verification itself takes longer to complete.
With a small number of users (a few hundred) it is possible to perform verification at backup time. What actually takes longest is the mounting/unmounting of the LUNs, and that is another reason to consider a less aggressive backup schedule.

Question: How long is the Restore process of SnapManager for Exchange?
Answer: It is first important to note that SnapManager for Exchange releases use two distinct methods to restore data:
1. Single File SnapRestore
2. LUN Clone Split

The Single File SnapRestore was the original form of LUN restoration, introduced as a way to recover large database files back in Data ONTAP 6.2, even prior to the announcement of FCP and iSCSI LUN support. It uses a sophisticated algorithm that goes thru the inodes of a file and relinks them to the relevant data blocks from a snapshot. Usually Single File SnapRestore (SFSR) takes a few seconds to a few minutes, but in some cases - very large files (hundreds of GBs) - it can take longer than that.
The LUN Clone Split was invented to quickly create a read/write virtual entry point that resembles the original LUN. The way it works does not depend on the size of the LUN, so it takes a few seconds regardless of size.

SnapManager for Exchange 3.2 and Data ONTAP 7.1 support the new LUN Clone Split, and this is the recommended method at this point. Make sure that the following option is turned on (do not try to turn it on prior to 7.1 - the feature was hidden and would be ignored by SME 3.2):
options lun.clone_restore on
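
A quick way to verify both prerequisites from the filer console (a sketch - the exact output format varies by release):

version
options lun.clone_restore

The first command confirms you are running Data ONTAP 7.1 or later, and the second should show the option set to on.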

Thursday, August 24, 2006

SnapManager for Exchange Restore fails with: 0xe000020d

The Problem: SnapManager for Exchange Restore fails with: 0xe000020d
Components:
Microsoft iSCSI Initiator 2.02
SnapDrive 4.1
SnapManager for Exchange 3.2

The following error appears in the Restore log in SnapManager for Exchange:


***VERIFYING DATABASES IN SNAPSHOT

Verifying databases in snapshot...
Mounting Snapshot [exchsnap__monkey__recent] for LUN F
[SnapDrive Error]: Error 0xe000020d occured. Description is not found.(SnapDrive Error Code: 0xe000020d)
Unable to mount snapshot, aborting database verification...
Failed to verify log and database for restore operation.
Error code: 0xe000020d, storage group name: First Storage Group
A filer autosupport message is sent on failure.

***SNAPMANAGER RESTORE JOB ENDED AT: [08-24-2006 11.09.00]
Failed to restore storage group.

**** RESTORE RESULT SUMMARY*****
Restore failed.


SnapManager for Exchange is attempting to mount the LUN from the snapshot as part of the restore. This is actually performed by SnapDrive, which in turn is integrated with the Virtual Disk Service (VDS).

The Solution:
Restart the Virtual Disk Service (VDS) - this will restart the SnapDrive service as well.
Close and re-open the Computer Management screens as well as SnapManager for Exchange and retry the restore operation.
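
If you prefer the command line over the Services console, a hypothetical sequence from a command prompt on the Exchange server (vds is the standard service name for the Virtual Disk Service):

net stop vds
net start vds

Answer Yes if it asks to stop dependent services; if SnapDrive does not come back on its own, start it again from the Services console before retrying the restore.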

Tuesday, August 22, 2006

How to work around ONTAP upgrade issues?

The problem: Having trouble upgrading ONTAP

Explanation:
A small and useful undocumented tip - if your upgrades fail, try to delete the contents of /etc/boot (\\filer\etc$\boot in Windows).
While it sounds scary, the filer doesn't really reference this directory other than when the download command is used, so it's practically safe to delete it, and then retry the software installation.
Another tip is to stop using the old trick of extracting the setup.exe or tar file directly; instead, copy the setup.exe file to /etc/software (\\filer\etc$\software) and run software install setup.exe followed by download, as in the sketch below.
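
A hedged example of that flow, assuming the image file is called setup.exe and has already been copied to \\filer\etc$\software:

software list
software install setup.exe
download

software list simply confirms that the filer sees the image before you install it.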

Thursday, August 10, 2006

How to configure NetBackup SSO with NDMP?

The problem: Configure the filer to support NetBackup Shared Storage Option (SSO)

Explanation:
Data ONTAP 7.1.1 and 7.2.1 (but not 7.2) add support for SCSI Reserve/Release, replacing the old Persistent Reservations code that previously existed in ONTAP; this is now supported for NetBackup SSO (starting with NetBackup 6.0).

To enable:
options tape.reservations scsi

Wednesday, August 02, 2006

Configuring FCP Multipathing in RedHat Linux/CentOS

The task: Create LUNs on a filer, discover them in RedHat Enterprise Linux or CentOS, and multipath them with the device-mapper-multipath (dm-multipath) mechanism.
Assumptions:
1. The FC HBA drivers are already installed and running.
2. This example is using two single-port QLogic HBAs.
3. The multipathing package is already installed - look for dm-multipath or device-mapper-multipath.

1. Connect the filer using the cabling rules documented in the guides.
2. Create a lun on the filer:


lun create -s <size> -t linux /vol/volname/lunname


For example:

lun create -s 100g -t linux /vol/vol1/db1disk1.lun

3. Detect the relevant FC HBA initiator WWNs on the filer:

fcp show initiator

4. Configure the WWNs into an initiator group on the filer:

igroup create -f -t linux <igroup_name> <initiator_WWN> [<initiator_WWN> ...]
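
For illustration, a hypothetical full form of this step (the igroup name and WWPNs below are made up; the LUN path is the one from the earlier example); note that the LUN also has to be mapped to the igroup before the host can see it:

igroup create -f -t linux linux_db_hosts 21:00:00:e0:8b:01:02:03 21:00:00:e0:8b:04:05:06
lun map /vol/vol1/db1disk1.lun linux_db_hosts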

5. Check whether Linux already recognizes the luns (it should):

[root@hostname /]# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: NETAPP Model: LUN Rev: 0.2
Type: Direct-Access ANSI SCSI revision: 04
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: NETAPP Model: LUN Rev: 0.2
Type: Unknown ANSI SCSI revision: 04
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: NETAPP Model: LUN Rev: 0.2
Type: Unknown ANSI SCSI revision: 04
Host: scsi1 Channel: 00 Id: 01 Lun: 00
Vendor: NETAPP Model: LUN Rev: 0.2
Type: Direct-Access ANSI SCSI revision: 04

6. Configure the multipath configuration file:

[root@hostname /]# cat /etc/multipath.conf

## Use user friendly names, instead of using WWIDs as names.
defaults {
user_friendly_names yes
}

# Blacklist all devices by default. Remove this to enable multipathing
# on the default devices.
devnode_blacklist {
devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
devnode "^hd[a-z]"
devnode "^cciss!c[0-9]d[0-9]*"
}

devices {
device {
vendor "NETAPP"
product "LUN"
path_grouping_policy multibus
features "1 queue_if_no_path"
path_checker readsector0
path_selector "round-robin 0"
failback immediate
no_path_retry queue
}
}

7. The following commands can be used to rescan the SCSI bus. Verify the paths using cat /proc/scsi/scsi:

echo "scsi-qlascan" > /proc/scsi/qla2xxx/0
echo "- - -" > /sys/class/scsi_host/host0/scan
echo "scsi-qlascan" > /proc/scsi/qla2xxx/1
echo "- - -" > /sys/class/scsi_host/host1/scan
echo "scsi add-single-device controller 0 0 1 ">/proc/scsi/scsi
echo "scsi add-single-device controller 0 0 1 ">/proc/scsi/scsi

8. As a result of the configuration file, there should be multipathing devices created already:

/dev/mapper/mpath0

9. Use the following commands to troubleshoot the multipathing setup:

multipath
multipath -d -l

10. Create a filesystem on top of the multipathing device. While it is possible to create partitions on the underlying luns and then let the multipathing code discover the partitions (which seems to require a reboot, and results in devices named /dev/mapper/mpath0p1, for example), it is not recommended and tends to be tricky at best. The steps to create the filesystem and mount it are simple:

mkfs -t ext3 /dev/mapper/mpath0

11. Mount the filesystem and check df:
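
The mount itself might look like this (a sketch assuming the /db and /logs mount points shown in the df output below):

mkdir -p /db /logs
mount /dev/mapper/mpath0 /db
mount /dev/mapper/mpath1 /logs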

[root@hostname ~]# df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
413267016 2766160 389508040 1% /
/dev/cciss/c0d0p1 101086 23291 72576 25% /boot
none 8126884 0 8126884 0% /dev/shm
/dev/mapper/mpath0 103212320 93852 97875588 1% /db
/dev/mapper/mpath1 103212320 93852 97875588 1% /logs


Dr. Toaster Recommends:
There are issues that require further attention when creating partitions on NetApp luns. For more information on these issues and the way to avoid them check the following links: http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=156121 and http://now.netapp.com/Knowledgebase/solutionarea.asp?id=kb8190.

Wednesday, July 05, 2006

Identifying incoming traffic using Data ONTAP Classic pktt

The task: Identify thru which interface(s) incoming traffic is arriving at the filer.

There are different methods to find out such information:
1. Send traffic from a particular client, and use the following command:
   pktt start all -b 2m -d / -s 5m -i <client_ip_address>
   

If this client is sending traffic, you will see console messages reporting the packets seen. You can also check whether packets have been observed from this client using:
   pktt status -v
   

2. Capture a network trace file and examine it. Capture using the same pktt utility as described above. The files will end up in the filer's root directory (/), and only administrator/root can read them:
To start the trace on all interfaces:
   pktt start all -b 2m -d / -s 5m
   

To stop the trace:
   pktt stop all
   


Dr. Toaster Recommends:
Use Ethereal to decode the network trace file. Ethereal decodes all the common file and block protocols - NFS, CIFS, iSCSI, as well as management protocols such as RSH, SNMP, Telnet.
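
For example, after copying the trace file off the filer over the administrative C$ share, Ethereal's command-line companion gives a quick first look (the file name below is a placeholder):

tethereal -r <trace_file.trc>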

Monday, June 12, 2006

Copying ACLs without copying data

There are a few options:

Data ONTAP 7.2 and CIFS permissions

Prior to Data ONTAP 7.2, a few issues could arise when dealing with unix/mixed qtrees and files with Unix permissions:
  • Unix permissions are lost when using Microsoft Office applications to rewrite files (the new Unix permissions are inherited from the parent folder).

  • Windows file/directory Properties does not show the Security tab for unix-style qtrees.

The solution: a new option called cifs.preserve_unix_security [on|off]
Dr. Toaster recommends: Read more in Data ONTAP 7.2 Commands: Manual Page Reference, Volume 1, search for cifs.preserve_unix_security.

Tuesday, June 06, 2006

Changing RMC IP Address

The task: Changing IP address for the RMC card

The steps:
Run:
rmc setup

This wizard will ask for the IP of the RMC card.

Thursday, June 01, 2006

Configure EtherChannel trunking and VLAN tagging with Cisco 3560

The task: Configure a redundant networking setup between a NetApp filer and a Cisco switch running IOS.

The steps:
1. Pick 2 or more switch ports you would like to trunk together.
2. Configure the switch ports to use 802.1q standard:


conf t
interface gigabitEthernet 0/1
switchport trunk encapsulation dot1q
switchport mode trunk
channel-group 1 mode on
no ip address


3. Make sure that the channel-group mode is set to on and not desirable, which is the default.
4. Configure the filer to use a multi vif, and then configure VLAN tagged interfaces on top of the vif:

Note: In this example VLAN id 101 is used.
From /etc/rc:

vif create multi vif0 e4a e5a
vlan create vif0 101
ifconfig vif0-101 ... ...
...
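
A fuller hypothetical fragment (the IP address and netmask are made-up values for illustration):

vif create multi vif0 e4a e5a
vlan create vif0 101
ifconfig vif0-101 192.168.101.10 netmask 255.255.255.0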


Dr. Toaster Recommends:

  • Avoid using VLAN id 1 - The filer expects tagging of all Ethernet frames, yet some switches are configured to not tag VLAN id 1.

  • Naming conventions - I advise calling the trunks single0 and multi0 according to their type, and using names such as vif0 for any top-level trunks.
    Some customers prefer to call the interfaces by their tasks, for example iscsi or exchange.

Sunday, January 22, 2006

What is options cifs.max_mpx?

The CIFS Technical Report describes the Multiplex-ID (MID) field as:
The multiplex ID (Mid) is used along with the Pid to allow multiplexing the single client and server connection among the client's multiple processes, threads, and requests per thread. Clients may have many outstanding requests (up to the negotiated number, MaxMpxCount) at one time.
Servers MAY respond to requests in any order, but a response message MUST always contain the same Mid and Pid values as the corresponding request message. The client MUST NOT have multiple outstanding requests to a server with the same Mid and Pid.

Dr. Toaster Recommends:
I advise setting this option to 1124 if you use any of the following products against CIFS shares on a filer:

  • Microsoft IIS

  • Citrix

  • Microsoft SQL Server (before SQL 2000)

  • Microsoft Exchange (before Exchange 2000)



The memory allocation on every CIFS client will increase; however, experience shows that setting this option is completely safe.
You need to restart CIFS on the filer for the change to take effect.
Examine the output of cifs stat and watch for the last lines - Max Multiplex. The value shown is usually the highest number of outstanding multiplexed requests seen from one client ("What is the largest number of MIDs coming into the filer from one client?"), minus 1.
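
Putting that together, a hedged sketch of the change from the filer console (plan for a short CIFS outage, since the restart disconnects all CIFS sessions):

options cifs.max_mpx 1124
cifs terminate
cifs restart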