Old FAQ

From DellLinuxWiki

Historical pre-RHEL FAQ

Meta-FAQ - Lists, People, web sites

What Linux mailing lists does Dell offer?

Dell offers several Linux mailing lists. See http://lists.us.dell.com/mailman/listinfo/ for the complete list.


Who participates on these mailing lists?

Most of the Linux OS Development team at Dell is on the lists. Our primary function is the engineering, testing, and certification of new Red Hat Linux releases on Dell hardware, and most of us become very familiar with the intricacies of the hardware and software during our testing and development. We are on these lists for two reasons: 1) we are generally Linux experts who can help with problems, and 2) we want to gauge customer demand so that we can provide feedback within Dell on what customers are asking for.

The next set of people on this list are some of the Server Support folks. These are generally the techs you might talk to on the phone if you call in and ask for support. These folks get paid to answer phones, mostly. They chime in on the lists when they can, but it's not part of their job description.

The other people on the list are customers.

The bottom line is that most of the Dell people who are on the Linux mailing lists are not people who are getting paid to act as email support as their primary job function. Thus, you may get an answer like: "Please call Dell tech support", if you ask a particularly hard question. Please realize that sometimes we don't have time to reproduce different problems and have to refer you to "the professionals", as support may not be our primary job.

Thus, we are very interested in getting customers who can help each other on the list. A couple of great examples have emerged already. Debian install disks have been made and posted to http://linux.dell.com by Dell customers. We also have a few people who actively answer email who are not Dell employees and are a great help.


Which web sites have useful Linux information?

Dell web sites:

  • www.dell.com/linux - The official home page for Linux on Dell systems.
  • linux.dell.com/linux - Dell Linux Engineering team's community web site, for more than just official Dell information.
  • lists.us.dell.com - Dell & Linux mailing lists, this FAQ.
  • support.dell.com - Support site with downloads (drivers, firmware) for your system.
  • docs.us.dell.com - Dell Documentation site, with soft copies of all those manuals you probably threw away.

Non-Dell web sites:

  • Linux-Dell-Laptops group at Yahoo!

When will (new Red Hat release / new platform) be supported/available?

Dell has a general policy of not pre-announcing products. As Dell is a mail-order company, the FTC gets somewhat unhappy if we pre-announce and then, for whatever reason, the product isn't available within 30 days. We'll be sure to post to the Linux-PowerEdge, Linux-PowerEdge-Announce, and Linux-Announce mailing lists when new products (servers or workstations, new Red Hat Linux releases, etc.) are available.


Can we have the Subject: line modified to help mail filters?

(Taken verbatim from the Linux Kernel FAQ at http://www.tux.org/lkml/) The usual proposition is that a string like [LINUX-POWEREDGE] is prepended to the subject line. This question has been raised many times before, and the answer has always been "no" or "there are better ways to filter email". The list maintainers take this position. Some of the reasons are:

  • It would increase the size of the Subject: line. This is a problem, as it limits the amount of useful information that can be seen in the Subject: line, making it harder to scan through a list of subject lines looking for interesting subjects.
  • It doesn't work for cross-posted messages, as the subject line for a single email will change depending on which list it was sent via. Not only can this confuse simple-minded filtering recipes, it can also break threaded mail readers (people may end up reading the same message twice).

All mailing lists on lists.us.dell.com use GNU Mailman. People subscribed to linux-poweredge@dell.com may want to use something like this example procmail recipe:

# linux-poweredge
:0
* ^X-BeenThere: linux-poweredge@dell\.com
/home/fred/mfilter/linux/kernel-poweredge

People using mailagent might try this in their .rules file (thanks to Martin Smith <martin@sharrow.demon.co.uk>):

To CC: /linux-poweredge@dell.com/
{ SPLIT -adi ~/linux-poweredge }


Similarly to procmail you can omit the mail folder from the split command. This causes the split messages to go back into the mailagent queue for further processing. Most mailers with filtering capabilities can be similarly configured. If not, then you can simply install procmail. If perchance you're running a damaged OS that can't filter properly, and there is no procmail port for it, then you should either upgrade, or accept that you won't be able to filter linux-poweredge. Don't bother asking for a subject line modification.
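For anyone who would rather script the check than use procmail or mailagent, the same test the procmail recipe performs (matching the X-BeenThere header that Mailman adds) can be sketched in Python. This is only an illustration and not something the list maintainers ship; the function name and the sample message are made up for the example.

```python
from email import message_from_string

LIST_ADDRESS = "linux-poweredge@dell.com"


def is_list_mail(raw_message, list_address=LIST_ADDRESS):
    """Return True if any X-BeenThere header names the given list."""
    msg = message_from_string(raw_message)
    # A cross-posted message may carry several X-BeenThere headers,
    # so look at all of them rather than only the first.
    headers = msg.get_all("X-BeenThere", [])
    return any(h.strip().lower() == list_address for h in headers)


# Made-up sample message, for illustration only.
sample = (
    "From: someone@example.com\n"
    "To: linux-poweredge@dell.com\n"
    "X-BeenThere: linux-poweredge@dell.com\n"
    "Subject: PERC question\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    print(is_list_mail(sample))
```

Because the test looks at a header added by the list software and not at the Subject: line, it keeps working for cross-posted messages.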


Supported Dell systems

What PowerEdge Servers does Dell support running Linux?

Dell supports all PowerEdge servers in production since May 1999. This list includes:

  • PowerEdge 300 / 300SC
  • PowerEdge 350
  • PowerEdge 700
  • PowerEdge 750
  • PowerEdge 800
  • PowerEdge 850
  • PowerEdge 1300
  • PowerEdge 1400 / 1400SC
  • PowerEdge 1550
  • PowerEdge 1425SC
  • PowerEdge 1650
  • PowerEdge 1800
  • PowerEdge 1850
  • PowerEdge 2300
  • PowerEdge 2400
  • PowerEdge 2450
  • PowerEdge 2500 / 2500SC
  • PowerEdge 2550
  • PowerEdge 2650
  • PowerEdge 2800
  • PowerEdge 2850
  • PowerEdge 3250 (Itanium2)
  • PowerEdge 4300
  • PowerEdge 4350
  • PowerEdge 4400
  • PowerEdge 4600
  • PowerEdge 6300
  • PowerEdge 6350
  • PowerEdge 6400
  • PowerEdge 6450
  • PowerEdge 6600
  • PowerEdge 6650
  • PowerEdge 6800
  • PowerEdge 6850
  • PowerEdge 7150 (Itanium)
  • PowerEdge 7250 (Itanium2)
  • PowerEdge 8450

Dell intends to support new platforms running Linux when they first release to the public.

Why does /proc/cpuinfo report twice as many CPUs as I have?

On a newly installed Linux system (PowerEdge 2650 with two 2GHz CPUs), the following problem occurred: four CPUs are reported (by dmesg, /proc/cpuinfo, and top) instead of two, even though the BIOS reports two CPUs.

Each physical processor is being recognized as two logical CPUs. This is due to hyper-threading in the new P4 Xeon processors. For more info take a look at this: http://www.intel.com/technology/hyperthread/intro_nexgen/

You can turn off hyperthreading at the kernel level either by recompiling the kernel without it or by entering "noht" on the kernel command line. The converse is true as well: to enable hyperthreading in the kernel, either compile it in or enter "ht" on the kernel command line.

In most BIOSes you can also disable hyperthreading via the BIOS Setup (F2) screen immediately after power-on.
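To see the logical/physical split for yourself, a rough sketch like the following counts "processor" entries (logical CPUs) and distinct "physical id" values (packages) in /proc/cpuinfo-style text. It assumes the kernel prints a "physical id" field, which older kernels may not; the sample text is invented for the example and is not output from a real PowerEdge 2650.

```python
def count_cpus(cpuinfo_text):
    """Return (logical, physical) CPU counts from /proc/cpuinfo-style text.

    physical is None when no "physical id" field is present.
    """
    logical = 0
    physical_ids = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line between processor stanzas
        key = key.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_ids.add(value.strip())
    physical = len(physical_ids) if physical_ids else None
    return logical, physical


# Invented sample: four logical CPUs spread over two packages.
SAMPLE = "".join(
    "processor\t: %d\nmodel name\t: Intel(R) Xeon(TM) CPU\nphysical id\t: %d\n\n"
    % (number, package)
    for number, package in enumerate([0, 0, 3, 3])
)

if __name__ == "__main__":
    print(count_cpus(SAMPLE))
```

On a live system you could pass in the contents of /proc/cpuinfo instead of SAMPLE.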


Blinking blue lights?

I have three 1650s. On two of them the blue lights blink; on the other the blue light stays lit steadily. What is the difference?

The lights will blink when you press the "i" button so you can identify the system within a rack. (There are LEDs on both the front and back, which makes it easier when you have a rack full of 1U servers.) If you press the "i" button once, the LEDs will stop blinking. (Also, the LEDs will turn amber and blink when there is a problem with the system.) - Tom


I have 4GB (or more) RAM in my system. How come Linux sees less than that?

The BIOS must reserve some address space below 4GB for PCI devices such as RAID controllers, SCSI controllers, NICs, etc. RAID controllers in particular may request and be given 256MB each. This is address space that would normally be occupied by RAM, but instead is used by PCI devices.

RAM addresses start at 0 and grow up. PCI device addresses start at 4GB and grow down. As long as there is no overlap, the OS will see all available RAM and make use of it. If there is overlap, the PCI devices win, and that RAM is not made available to the OS.

This is working as designed per PCI, BIOS, and system chipset specifications.
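As a back-of-the-envelope illustration of the overlap rule above, the sketch below computes how much RAM remains visible for a given amount of PCI address space. It is a simplified model, not a description of any particular BIOS: it assumes the PCI window sits directly below 4GB, that overlapped RAM is simply lost, and that RAM above 4GB is otherwise usable. The function name and the example figures are made up.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def visible_ram(installed_bytes, pci_reserved_bytes):
    """Estimate RAM visible to the OS under the simplified overlap model."""
    # PCI addresses start at 4GB and grow down.
    pci_start = 4 * GB - pci_reserved_bytes
    # RAM addresses start at 0 and grow up; only the part below 4GB can collide.
    ram_below_4g = min(installed_bytes, 4 * GB)
    # Where the two ranges overlap, the PCI devices win.
    lost = max(0, ram_below_4g - pci_start)
    return installed_bytes - lost


if __name__ == "__main__":
    # Example: 4GB installed, two RAID controllers given 256MB each.
    print(visible_ram(4 * GB, 2 * 256 * MB) // MB, "MB visible")
```

With 2GB installed and the same two controllers there is no overlap, so the model reports all 2GB as visible.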

PowerEdge 1855MC Shared LOM and IPMI BMC - TCP/IP Port Usage

The Baseboard Management Controller on the shared LOM captures traffic destined for the host operating system on specific TCP/IP ports. These ports must be disabled in the host OS to prevent the host OS from trying to use them. Failure to disable them may result in the portmapper being unable to serve NFS mounts to clients, seemingly at random.

Add the following to /etc/xinetd.conf:

service porthog-1
{
 server = /bin/true
 protocol = udp
 port = 623
 wait = no
 user = nobody
}

service porthog-2
{
 server = /bin/true
 protocol = tcp
 port = 623
 wait = no
 user = nobody
}

service porthog-3
{
 server = /bin/true
 protocol = udp
 port = 664
 wait = no
 user = nobody
}

service porthog-4
{
 server = /bin/true
 protocol = tcp
 port = 664
 wait = no
 user = nobody
}

Add the following to /etc/services

porthog-1 623/udp
porthog-2 623/tcp
porthog-3 664/udp
porthog-4 664/tcp

Restart xinetd (Red Hat style):

 service xinetd restart

(SUSE style):

 rcxinetd restart
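The four stanzas and four /etc/services lines above follow one pattern (ports 623 and 664, each for udp and tcp), so they can be generated instead of typed. The sketch below only reproduces the text shown in this FAQ; the function names are made up, and it does not write to /etc or restart xinetd for you.

```python
PORTS = (623, 664)
PROTOCOLS = ("udp", "tcp")


def porthog_entries(ports=PORTS, protocols=PROTOCOLS):
    """Return (name, port, protocol) tuples numbered porthog-1, porthog-2, ..."""
    entries = []
    number = 1
    for port in ports:
        for proto in protocols:
            entries.append(("porthog-%d" % number, port, proto))
            number += 1
    return entries


def xinetd_stanza(name, port, proto):
    """Format one xinetd service stanza exactly as laid out in the FAQ."""
    return (
        "service %s\n"
        "{\n"
        " server = /bin/true\n"
        " protocol = %s\n"
        " port = %d\n"
        " wait = no\n"
        " user = nobody\n"
        "}\n" % (name, proto, port)
    )


def services_line(name, port, proto):
    """Format the matching /etc/services line."""
    return "%s %d/%s" % (name, port, proto)


if __name__ == "__main__":
    for entry in porthog_entries():
        print(xinetd_stanza(*entry))
    for entry in porthog_entries():
        print(services_line(*entry))
```

Review the printed text before appending it to /etc/xinetd.conf and /etc/services by hand.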

Dell OpenManage Applications

This category includes questions about Dell OpenManage Server Administrator (OMSA), Dell Remote Access Card (DRAC), and Dell Server Assistant (DSA).


Dell Server Assistant (DSA)

Dell Server Assistant is a useful CD that helps you install any supported OS along with Dell-validated drivers (the latest available as of the release date of the DSA version) for all of the detected Dell-supported hardware on your Dell server. The DSA can also be used to create or re-create the Dell Utility partition on a server: a bootable hard drive partition which contains diagnostic utilities for troubleshooting hardware problems.

The DSA is one of the CDs shipped with your server and is also downloadable in an ISO format from support.dell.com.

You may also hear this tool referred to by other names and nicknames such as DOSA (for Dell OpenManage Server Assistant), Server Assistant, Dell Installation CD, or (erroneously and confusingly) OMSA (for OpenManage Server Assistant). Use of the latter name is improper because of the potential for confusion with the OpenManage Server Administrator which is a totally different application and is also known as OMSA.

Dell Remote Access Card (DRAC)

The Dell Remote Access Card is an add-in PCI card that allows remote viewing of the console, power-management, and other useful functions.


Dell OpenManage Server Administrator (OMSA)

The Dell OpenManage Server Administrator is Dell's one-to-one systems management solution. It includes a loadable kernel module, command-line utilities, an SNMP sub-agent, and a secure web server. The statistics it reports include system temperature, fan speed, service tag, and more.


Why won't OMSA 1.2 install on kernels < 2.4.18-17 ?

[Note: This Q&A is retained for legacy purposes only -- OMSA is now at or above version 4.5 so this issue is no longer relevant.] Essentially, the OMSA code needs a small header file inclusion change to work with the 2.4.18-17* and above kernels. Please see the following email message describing the fix. This change will be included in the next release of OMSA. http://lists.us.dell.com/pipermail/linux-poweredge/2002-October/004614.html

A customer has tried to make this process less painful: http://lists.us.dell.com/pipermail/linux-poweredge/2002-November/005059.html

Why won't OMSA 1.2 install with kernel 2.4.18-3 as found in Red Hat Linux 7.3?

The 2.4.18-3 kernel has a nasty data corruption bug with the ext3 file system. This was corrected in the 2.4.18-4 kernel and all subsequent kernels. Dell recommends customers *not* use 2.4.18-3 for this reason; please upgrade to a later errata kernel using Red Hat Network. OMSA knows of this issue and refuses to install on 2.4.18-3 to avoid this data corruption problem.

Dell PowerEdge RAID Controllers

This category includes questions about the various PowerEdge Expandable RAID Controllers (PERC) that we support. Included are systems with RAID-on-Motherboard (ROMB), adapters based on AMI products (megaraid), and adapters produced by Adaptec (aacraid).


Adaptec-based PERC Controllers and ROMB

This category includes questions about all Adaptec-based Dell RAID controllers, including add-in cards and Adaptec-based RAID-on-Motherboard implementations.


I want to run my own, custom kernel. Where can I find the patches?

The aacraid driver has been included in the stock kernel.org kernels since 2.4.17. You must enable CONFIG_EXPERIMENTAL=y and CONFIG_AACRAID=[y|m] for the driver to be built. http://linux.dell.com has links to historical kernel patches for the aacraid driver.


Are there management utilities so that I can create and delete LUNs under Linux?

Yes. See http://linux.dell.com for the aacraid-util package.


What is this "AAC:Batte" message on the console?

You need to perform a battery recondition on the battery for the RAID on the motherboard. You will need the afacli RAID utility installed to perform the recondition; you can get it from http://support.dell.com/. Once it is installed, start it with the command "afacli". The next command is "open afa0". Then type "controller battery_recondition /always=TRUE" to start the recondition. This will disable the cache and slow down performance on the server, so it is best run during slow business hours. It will take 8 - 10 hours to complete, after which the error should go away. By the way, this recurs every 6 months, so that the controller can discharge and then recharge the battery to extend its life. - Steve_Boley@dell.com

This is actually a truncated message. Newer aacraid drivers (for kernels 2.2.20, 2.4.13, and above) have a fix so that entire adapter-generated messages are printed to syslog properly.


What commands does afacli take?

Documentation for afacli can be found at http://docs.us.dell.com/docs/storage/57kgr/cli/en/index.htm. Please note that Linux doesn't support the mail commands.


How do I install Red Hat Linux 7.2 on a PE1650 or PE2650?

To install by passing the PCI IDs to the aacraid driver as a parameter, follow the steps below:

1. Boot your system with Red Hat Linux 7.2 CD 1.
2. At the "boot" prompt, type "expert noprobe".
3. Say "No" when asked "Do you have a driver disk".
4. Choose a language, keyboard type, and installation method.
5. Select "Add Device" when you get to the "Devices" section.
6. Select "SCSI" as the "Device" type.
7. Move the cursor down to "Adaptec AACRAID (aacraid)" and choose the "Specify module parameters" box.
8. In the window that pops up:

  • For PowerEdge 1650, enter
aacraid_pciid=0x1028,0x0A,0x1028,0x011B
  • For PowerEdge 2650, enter
aacraid_pciid=0x1028,0x0A,0x1028,0x0121

9. If you want to load other drivers, choose "Add device"; otherwise choose "Done" and continue with the installation.


If doing it in kickstart's ks.cfg file, you put:

device scsi aacraid aacraid_pciid=....

with the proper values for .... of course.


Which controllers use the Adaptec aacraid driver?

  • PowerEdge 2400 PERC2/Si
  • PowerEdge 2450 PERC3/Si
  • PowerEdge 4400 PERC3/Di
  • PowerEdge 2500/2500SC PERC3/Di
  • PowerEdge 2550 PERC3/Di
  • PowerEdge 1650 PERC3/Di
  • PowerEdge 2650 PERC3/Di
  • PowerEdge 4600 PERC3/Di
  • PERC2 = PowerEdge Expandable RAID Controller 2 4-channel Ultra2 RAID controller. Firmware 2.1 or greater is required.

After upgrading RHL 7.2 to kernel 2.4.9-{21,31,34} or to RHL 7.3 or later, I get a kernel panic when the aacraid driver loads

When you installed Red Hat Linux 7.2, you passed the PCI IDs of the on-board RAID controller to the aacraid driver. Your /etc/modules.conf file contains:

options aacraid aacraid_pciid=......

This option is no longer necessary in kernels 2.4.9-{21,31,34}, any Red Hat Linux 7.3 kernels, or later kernels, and its presence causes the driver load to fail. You may simply remove that line from /etc/modules.conf, remake your initial ramdisk with 'mkinitrd', and reboot.

The mkinitrd command will look something like:

/sbin/mkinitrd -v -f /boot/initrd-2.4.9-34smp.img 2.4.9-34smp

(with appropriate changes for your kernel version)

(A request to fix insmod to not fail under these circumstances has been filed and accepted with Keith Owens, modutils maintainer, and is pending for the 2.4.15 modutils release.)
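If you have several machines to fix, the /etc/modules.conf edit described above can be sketched as a small text filter. This is only an illustration: the function name is made up, the alias lines in the sample are invented, and the sketch works on a string so that nothing on disk is touched. You would still need to rerun mkinitrd and reboot afterwards, as described above.

```python
def strip_aacraid_pciid(modules_conf_text):
    """Return modules.conf text with any aacraid_pciid= option removed."""
    kept = []
    for line in modules_conf_text.splitlines():
        words = line.split()
        if words[:2] == ["options", "aacraid"]:
            # Keep any other aacraid options; drop only aacraid_pciid=...
            remaining = [w for w in words[2:] if not w.startswith("aacraid_pciid=")]
            if not remaining:
                continue  # nothing left to set, so drop the whole line
            line = " ".join(["options", "aacraid"] + remaining)
        kept.append(line)
    return "\n".join(kept) + "\n"


# Invented sample file contents, for illustration only.
SAMPLE_CONF = (
    "alias eth0 e1000\n"
    "alias scsi_hostadapter aacraid\n"
    "options aacraid aacraid_pciid=0x1028,0x0A,0x1028,0x0121\n"
)

if __name__ == "__main__":
    print(strip_aacraid_pciid(SAMPLE_CONF), end="")
```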

LSI-based PERC controllers

This category includes questions and answers for Dell RAID controllers based on LSI (formerly AMI) products which use the megaraid driver.


The 'megaraid' driver does not see any containers that I have created.

This is usually a firmware problem. Update your firmware to the latest version. You can download firmware from http://support.dell.com/filelib


I want to run my own, custom kernel. Where can I find the patches?

The megaraid device driver team at LSI works closely with the kernel team and, for the most part, you can find the most recent validated build of the megaraid driver already in the kernel.


Are there management utilities so that I can create and delete LUNs under Linux?

Yes. See http://linux.dell.com for the megaraid-util package.


Where are Red Hat Linux 6.2 Drivers?

Tesfamariam has made a Driver Diskette for Red Hat Linux 6.2 SBE2 for use with the PERC 3/DCL, 3/DC, and 3/QC controllers. I've posted it to http://linux.dell.com in the megaraid section. Untar it to a FAT-formatted floppy so that all the files are in the top-level of the floppy, and install using 'linux dd' so it prompts you for a driver disk.


Which controllers use the LSI megaraid driver?

  • PERC - PowerEdge Expandable RAID Controller. The original PERC card was a two-channel Wide SCSI RAID adapter. This card is supported by the AMI megaraid driver, v1.05 or greater.
  • PERC2/SC - single-channel w/o battery LSI RAID controller (MegaRAID 466). This card is supported by the AMI megaraid driver, v1.05 or greater.
  • PERC2/DC - dual-channel w/o battery LSI RAID controller (MegaRAID 467). This card is supported by the AMI megaraid driver, v1.05 or greater.
  • PERC3/QC - quad-channel w/ battery. LSI RAID controller (MegaRAID 471). This card is supported by the AMI megaraid driver, v1e08 or greater.
  • PERC3/DC - dual-channel w/ battery. LSI RAID controller (MegaRAID 493). This card is supported by the AMI megaraid driver, v1e08 or greater.
  • PERC3/DCL - dual-channel, no battery. LSI RAID controller (MegaRAID 493). This card is supported by the AMI megaraid driver, v1e08 or greater.
  • PERC3/DCP - dual-channel, no battery, sold in Precision workstations. LSI RAID controller (MegaRAID 493). This card is supported by the AMI megaraid driver, v1e08 or greater.
  • PERC3/SC - single-channel w/o battery. LSI RAID controller (MegaRAID 475). This card is supported by the AMI megaraid driver, v1e08 or greater.

Why don't I get good performance from my RAID controller?

Newer PERC4e-based controllers get the best performance.

What does PERC stand for?

  • PERC = PowerEdge Expandable RAID Controller
  • ROMB = RAID On MotherBoard
  • The '2' designation means Ultra2 SCSI (80MB/sec)
  • The '3' designation means Ultra160/m (160MB/sec) SCSI.
  • The '4' designation means Ultra320 (320MB/sec) SCSI.
  • The 'e' designation means PCI Express
  • The 'Si' designation means single-channel internal.
  • The 'Di' designation means dual-channel internal.
  • DC = dual-channel add-in card
  • DCL = dual-channel lite (no battery backup) add-in card
  • DCP = dual-channel card for Precision workstations
  • QC = quad-channel add-in card
  • SC = single-channel add-in card
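The naming scheme above is regular enough to decode mechanically. The following sketch maps a name such as "PERC3/Di" to the meanings listed in this section. The tables are copied from the list above; the function name is made up, and the assumption that the 'e' sits directly after the generation digit (as in "PERC 4e/DC") is mine and is not stated in this FAQ.

```python
import re

GENERATIONS = {
    "2": "Ultra2 SCSI (80MB/sec)",
    "3": "Ultra160/m SCSI (160MB/sec)",
    "4": "Ultra320 SCSI (320MB/sec)",
}

# Keys are upper-cased so that "Di" and "DI" both match.
SUFFIXES = {
    "SI": "single-channel internal",
    "DI": "dual-channel internal",
    "DC": "dual-channel add-in card",
    "DCL": "dual-channel lite (no battery backup) add-in card",
    "DCP": "dual-channel card for Precision workstations",
    "QC": "quad-channel add-in card",
    "SC": "single-channel add-in card",
}

# Assumed layout: PERC, generation digit, optional 'e', slash, suffix.
NAME_RE = re.compile(r"PERC\s*([234])(e?)/([A-Za-z]+)")


def decode_perc(name):
    """Return a list of plain-language descriptions for a PERC model name."""
    match = NAME_RE.fullmatch(name.strip())
    if match is None:
        raise ValueError("unrecognized PERC name: %r" % name)
    generation, express, suffix = match.groups()
    key = suffix.upper()
    if key not in SUFFIXES:
        raise ValueError("unrecognized PERC suffix: %r" % suffix)
    parts = [GENERATIONS[generation]]
    if express:
        parts.append("PCI Express")
    parts.append(SUFFIXES[key])
    return parts


if __name__ == "__main__":
    for model in ("PERC2/Si", "PERC3/Di", "PERC 3/QC"):
        print(model, "->", ", ".join(decode_perc(model)))
```

Names that do not fit the pattern, such as the original PERC with no digit or suffix, raise ValueError instead of being guessed at.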

Dell Supported Linux Distributions

This category includes questions about supported Linux installations on Dell platforms. For now, this includes only Red Hat Linux; thus, the questions in this section deal exclusively with Red Hat Linux.


Red Hat Linux 6.2 SBE2 is no longer sold! How can I get it?

You can download Red Hat Linux 6.2 SBE2 from http://lists.us.dell.com/62sbe2/


Are there any known errata with Red Hat Linux 7.x?

Please see our online information update for Red Hat Linux 7.x:

  • PowerEdge Servers: http://docs.us.dell.com/docs/software/oslin7x/
  • Precision Workstations: http://docs.us.dell.com/docs/software/oslinux/

Also, please see http://www.redhat.com/ for Red Hat released errata.


My CDROM is corrupting data.

You need to turn off DMA to your CDROM. To do this, add the line below:

hdparm -d0 /dev/cdrom

to the end of your /etc/rc.d/rc.local file. Or, to disable DMA for all IDE devices globally, pass "ide=nodma" to the kernel via the boot loader (lilo or grub).


After upgrading to kernel 2.4.18-17* or -18*, my system locks up regularly. Is there a fix?

Yes, there is a bug in the tg3 network driver which causes this lockup. The temporary workaround is to change tg3 to bcm5700 in /etc/modules.conf and reboot.

This problem has been fixed in the Red Hat erratum kernel 2.4.18-19*.
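The workaround above amounts to swapping the driver name on the relevant alias line(s) in /etc/modules.conf. A minimal sketch of that substitution follows; the function name and the sample lines are invented for the example, and it works on a string so that nothing on disk is modified.

```python
def swap_tg3_for_bcm5700(modules_conf_text):
    """Return modules.conf text with 'alias <iface> tg3' changed to bcm5700."""
    result = []
    for line in modules_conf_text.splitlines():
        words = line.split()
        # Only touch lines of the form: alias <name> tg3
        if len(words) == 3 and words[0] == "alias" and words[2] == "tg3":
            line = "alias %s bcm5700" % words[1]
        result.append(line)
    return "\n".join(result) + "\n"


# Invented sample file contents, for illustration only.
SAMPLE_CONF = (
    "alias eth0 tg3\n"
    "alias eth1 tg3\n"
    "alias scsi_hostadapter megaraid\n"
)

if __name__ == "__main__":
    print(swap_tg3_for_bcm5700(SAMPLE_CONF), end="")
```

A reboot is still needed afterwards, as noted above.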


Installing Red Hat Linux 7.2 on a PowerEdge 2600

The PowerEdge 2600 has advanced features that are not supported by the 2.4.7-10 kernel contained in the Red Hat Linux 7.2 install media. It is necessary to install Red Hat Linux 7.2 using the UP kernel and Dell-supplied drivers for the storage controllers.

The PowerEdge 2600 can be configured with either a PERC controller or the LSI SCSI controller.

Please download the device drivers from support.dell.com and make device driver diskettes as per the instructions accompanying the download.

At the time of this writing the following files were available:

SCSI RAID:

  • PERC 3/QC and PERC 3/DC: perc3-118c-72e.tar.gz
  • PERC 4/DI: perc4di-118c-72c.tar.gz

SCSI Non-RAID:

  • LSI Logic Ultra 320 SCSI v 2.00.11, A00: 22320-linux-2.00.11.tar.gz and mptlinux-2.00.11rpm.tar.gz
  • NOTE: You will need both files. The mptlinux-2.00.11rpm.tar.gz file may appear under "Previous versions" on the support.dell.com website.


Also you will need to download a patched 2.4.7-10 kernel that contains the changes necessary to operate on the PE2600. Please obtain the 2.4.7-10a kernel from Red Hat's support site: http://www.redhat.com/support/errata/partners/dell/. The link on that page will take you to an FTP site. (ftp://ftp.redhat.com/pub/redhat/support/enterprise/dell/2.4.7-10a/i686/)

You can download the kernel-smp-2.4.7-10a.i686.rpm and the kernel-enterprise-2.4.7-10a.i686.rpm. You will only need to install one of those kernels. Install the smp rpm if you have a system with less than 4GB of RAM. Install the enterprise rpm if you have more than 4GB of RAM.

If you are installing on a SCSI RAID system skip ahead to the next section:

SCSI Non-RAID:

  • Boot your PE2600 with DISC 1 of the Red Hat Linux 7.2 media.
  • At the boot: prompt, type "linux dd" so that the installer prompts for driver diskettes.
  • When asked whether you have a driver diskette, select YES.
  • Insert the driver diskette previously made from the file 22320-linux-2.00.11.tar.gz
  • Continue the install following the on-screen instructions.
  • When prompted, select GRUB as the bootloader.
  • When you reach "Boot Disk Creation":

    o Press CTRL-ALT-F2 to switch to the install shell console.
    o Type the following commands:

chroot /mnt/sysimage
cd /boot
/sbin/mkinitrd -f initrd-2.4.7-10.img 2.4.7-10

    o Ignore the error message about "binary operator expected".
    o If you used LILO as the boot loader, you will have to rerun lilo to handle the change just made to the initrd.
    o Leave the chroot:

exit

    o Press CTRL-ALT-F7 to return to the install screens.

  • Remove the driver diskette if you haven't already.
  • Continue the install and reboot your machine.
  • You must now select the UP kernel (2.4.7-10).
  • Log in as root using the password you supplied during the install.
  • Remove the current smp kernel:

rpm -e kernel-smp

  • Transfer the 2.4.7-10a kernel over and install it with one of the following:

rpm -ivh kernel-smp-2.4.7-10a.i686.rpm (for systems < 4GB RAM)
rpm -ivh kernel-enterprise-2.4.7-10a.i686.rpm (for systems > 4GB RAM)

  • Transfer the mptlinux-2.00.11rpm.tar.gz file.
  • Untar the mptlinux-2.00.11rpm.tar.gz file:

tar -xvzf mptlinux-2.00.11rpm.tar.gz

  • Install the mptlinux rpm:

rpm -ivh mptlinux-2.00.11-0.i386.rpm

  • Edit the grub.conf file to change the default kernel to the SMP-enabled one by changing the default=1 line to default=0.
  • Reboot into the SMP-enabled kernel:

reboot

The install is now complete. You should also update your Network Adapter drivers, which can be obtained at http://support.dell.com


SCSI RAID:

Using the driver diskette for the appropriate PowerEdge RAID Controller (PERC) contained in your machine:

  • Boot your PE2600 with DISC 1 of the Red Hat Linux 7.2 media.
  • At the boot: prompt, type "linux dd" so that the installer prompts for driver diskettes.
  • When asked whether you have a driver diskette, select YES.
  • Insert the driver diskette previously made for your PERC and press enter.
  • Continue the install following the on-screen instructions.
  • When prompted, select GRUB as the bootloader.
  • When you reach "Boot Disk Creation":

    o Press CTRL-ALT-F2 to switch to the install shell console.
    o Type the following commands:

chroot /mnt/sysimage
cd /boot
/sbin/mkinitrd -f initrd-2.4.7-10.img 2.4.7-10

    o If you used LILO as the boot loader, you will have to rerun lilo to handle the change just made to the initrd.
    o Leave the chroot:

exit

    o Press CTRL-ALT-F7 to return to the install screens.

  • Remove the driver diskette if you haven't already.
  • Continue the install and reboot your machine.
  • You must now select the UP kernel (2.4.7-10).
  • Log in as root using the password you supplied during the install.
  • Remove the current smp kernel:

rpm -e kernel-smp

  • Transfer the 2.4.7-10a kernel over to your PE2600 and install it with one of the following:

rpm -ivh kernel-smp-2.4.7-10a.i686.rpm (for systems < 4GB RAM)
rpm -ivh kernel-enterprise-2.4.7-10a.i686.rpm (for systems > 4GB RAM)

  • Insert the driver diskette again and run the install script:

mount /mnt/floppy
cd /mnt/floppy
./megarpmwrapper.sh

    o Answer yes to "install dialog" and to "rpm -ivh".
    o Press enter when prompted for command line options for rpm.
    o When prompted for the "Initial RAMDISK", use:

2.4.7-10asmp or 2.4.7-10aenterprise

    o Change the name of the initrd image to:

initrd-2.4.7-10asmp.img or initrd-2.4.7-10aenterprise.img

    o Select User Level NOVICE.
    o Select EXIT (press enter).
    o Select Yes to proceed (press enter).
    o Select OK to specify GRUB as your bootloader (press enter).
    o Select OK (press enter).
    o Select vi (press enter).
    o Exit the editor (press "ESC", then ":q" and enter).
    o Eject the floppy and select NO to skip the reboot (press enter).

  • Edit the grub.conf file to change the default kernel to the SMP-enabled one by changing the default=1 line to default=0.

The install is now complete. You should also update your Network Adapter drivers, which can be obtained at http://support.dell.com

Thanks to Robert Hentosh for creating these instructions.


How do Dell and Red Hat Support groups complement each other?

Red Hat is focused on testing their OS for stability across the spectrum of generally available hardware platforms. Dell is focused on testing and qualifying their specific hardware on the Red Hat OS. As such, those efforts often run in parallel and have overlapping schedules. Both companies work closely to maintain identical driver revisions, but the tested driver revisions will not always be synchronized.

Dell works very hard to make sure that Red Hat (and the kernel.org kernels, in general) have very up-to-date drivers for all our hardware. For small windows of time, drivers may be available on support.dell.com which are newer than what is available in a Red Hat errata kernel, though we work closely with Red Hat to minimize that window.

Dell will support officially released Red Hat kernels and errata, but the resolution to a customer-specific issue may include installing a Dell-qualified driver to address known issues. If the customer chooses to not install these drivers, they may be exposed to these known issues until they are addressed in a future Red Hat release.


User Supported Linux Distributions

This category includes answers to commonly asked questions about Linux distributions that Dell does not support. Questions about Debian, SuSE, Slackware, Mandrake, and other distributions should be found here.


How do I patch the kernel?

Here is a really detailed guide on how to apply kernel patches. http://people.redhat.com/dledford/patching.html


Help! I need a driver disk for {Slackware|Debian|SuSE|Mandrake|other distro}

A good place to find updated driver disks for Linux distributions that Dell does not support is http://linux.dell.com. For the most part, the only devices that need a driver disk are the Adaptec-based RAID controllers, which use the 'aacraid' driver.


How can I install a megaraid-based card on SuSE?

Chris Seline reports: I was able to get 7.2 working on the 2550. The key is to do a manual install and have the install tool search for SCSI drivers. It then successfully identifies the drives for the PERC3/DC.

Use the SP1 version of SLES9; it incorporates a new megaraid driver, which seems to work well.

[davetree2-dell]



Debian On Dell Servers (2.4.26 ISO - w/patches)

Scott Kveton has been gracious enough to maintain this page: http://wiki.osuosl.org/display/LNX/Debian+on+Dell+Servers

[atr]


Debian OMSA + boot floppies

Scott Kveton provides a great resource for Debian users. It also has good information for people using newer releases of other distributions (Mandrake, etc.) on why a default boot floppy, even from a newer release, might not work (for example, with gigE or RAID cards). Usually a custom boot floppy containing a kernel module and/or a recompiled kernel can fix most things. http://wiki.osuosl.org/display/LNX/Debian+on+Dell+Servers

[rydera]


Support

This category includes questions on how to purchase support for your Dell system running Red Hat Linux, and other avenues of support that may be available to you.


Why are all of your driver downloads in .exe format?

As you have probably noticed, most BIOS and firmware updates posted to the Dell website are floppy disk images packaged in an .exe format. We are working with the various firmware and BIOS teams to get this changed. As Dell is a very large company with many disparate groups, it is somewhat difficult to build mind-share to get this extra work done.


Clustering

This category is for questions related to High Availability (HA) and High Performance Computing (HPC) clusters.


What Clustering products does Dell offer?

Please see http://www.dell.com/clustering/ for a list of all Dell clustering products.


Network

This category is for questions related to Network Interface Cards (NICs) and general Linux networking topics on Dell systems.

How can I control the names assigned to my network interfaces?

In most distributions, these names (like "eth0", "eth1", etc.) are assigned based on the order in which the devices are discovered by the kernel. Since that order is subject to change with future changes in hardware, algorithms used in future kernels, and other factors, it is probably best to manually assign meaningful names to each interface based on MAC addresses.

On Red Hat, that's a simple matter of creating a file /etc/sysconfig/network-scripts/ifcfg-foo for each port name foo you like, and setting DEVICE=foo and HWADDR=yourmacaddress in it. You can use ethtool -p to blink the lights on a NIC port to tell visually which physical port goes with which port name.
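As a rough sketch of the naming scheme just described, the snippet below builds the path and the two naming-related lines for one interface. It uses DEVICE, the key name Red Hat's ifcfg files use for the interface name. The function name, the port name "lan0", and the MAC address are all made up, and a real ifcfg file also needs the usual settings (boot protocol, addresses, and so on), which are not shown here.

```python
import re

# Six colon-separated pairs of hex digits.
MAC_RE = re.compile(r"^[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}$")

IFCFG_DIR = "/etc/sysconfig/network-scripts"


def ifcfg_for(name, mac):
    """Return (path, contents) tying an interface name to a MAC address."""
    if not MAC_RE.match(mac):
        raise ValueError("not a MAC address: %r" % mac)
    path = "%s/ifcfg-%s" % (IFCFG_DIR, name)
    contents = "DEVICE=%s\nHWADDR=%s\n" % (name, mac.upper())
    return path, contents


if __name__ == "__main__":
    # Hypothetical port name and MAC address, for illustration only.
    path, contents = ifcfg_for("lan0", "00:0b:db:aa:bb:cc")
    print("# " + path)
    print(contents, end="")
```

The sketch only prints the text; review it and merge it into the real file by hand.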


How do I get the most up-to-date drivers for my network card?

Search through the Dell support file library for Linux drivers, at http://support.dell.com/filelib. Your best bet is to use the Search function. Enter "linux" and the manufacturer of your card.

[edit] Why doesn't my Intel Gigabit card work with Red Hat Linux 7.2?

The Intel e1000 driver is broken on Multiprocessor (SMP) machines in an out-of-the-box Red Hat Linux 7.2 installation. The driver will transmit exactly 256 packets before locking up. The uniprocessor kernel does not exhibit this problem. For an updated e1000 driver, you can download and install the latest Red Hat Linux 7.2 errata kernel from http://www.redhat.com/errata, or download replacement drivers from http://support.dell.com/filelib (search for "Linux Intel"). Updated drivers can also be found on recent Dell Server Assistant version 6.7 or above CDs. If you use the Dell Server Assistant v6.7 or above CD to install Red Hat Linux 7.2, this will automatically be installed.


[edit] Gigabit Support

Newer 2.4 kernels (later than 2.4.25) and 2.6 kernels have much of the bcm5700 and Intel Pro 1000 card support built in. If you're using an older version of RedHat, or know your kernel is older than this, it is best to check for support on the Dell Support web pages and in the kernel code itself. Googling for an answer is always helpful too.

[atr]


[edit] Crypto SSL Accelerators

Dell has rebranded quite a few Broadcom SSL PCI crypto cards (the BCM5820, to name one). Modern FreeBSD and Linux kernels have complete working support for these. They are sometimes known as "800" cards, meaning they can do up to 800 signs a second.

For these cards to work, the kernel must first see them. After that, taking advantage of the card typically (at time of writing, 20041014) involves compiling openssl with EXPERIMENTAL modifications (please see the documentation at http://www.openssl.org). Linux and FreeBSD also have kernel level hardware offloading for software VPNs (kame, racoon, ipsec, openswan, etc.), so you can start to see over 10Mb/s of crypto capability without CPU lag.

Apache with mod_ssl also supports these cards but requires the custom compiled openssl libraries with the experimental option configured.

[edit] Inspiron Notebooks

[edit] How can I control the fans on my Inspiron 8000 or 8100 with Linux?

Massimo Dal Zotto [dz@cs.unitn.it] has created a device driver and tools which allow for monitoring and controlling system fans. Please see http://people.debian.org/~dz/i8k/.


[edit] How can I use the volume and start/stop/fwd/rewind keyboard buttons under Linux?

Massimo Dal Zotto [dz@cs.unitn.it] provides examples of how to do this at http://people.debian.org/~dz/i8k/.



[edit] Miscellaneous

[edit] I'm performing system updates (BIOS, firmware, etc) with an update package from Dell and the installation instructions tell me to "import the Dell Public Key". What is this and where can I obtain it?

The Dell Public Key is used to verify that the package you have was not corrupted during downloading or otherwise altered since it was originally created by Dell engineers. It can be downloaded from http://lists.us.dell.com/.

An example of the use of the Dell Public Key can be found at http://support.dell.com/support/edocs/software/smdup/dup23/en/uglinhtm/2using.htm#wp1055244.

[edit] Where can I find Visio stencils of Dell products?

Dell provides Visio stencils for several of its product lines. http://www.dell.com/us/en/esg/topics/segtopic_visio.htm


[edit] Expanding Storage on Linux-Based Servers

http://www1.us.dell.com/content/topics/global.aspx/power/en/ps1q03_michael?c=us&cs=555&l=en&s=biz


[edit] Cooling

Anoop Mavath provided this from the PE list. Very handy information

Model                   Amps  KVA     BTUS/      AC tons/machine@115V
DELL 1500               2     0.23    86.273     0.007189417
DELL 1550               1.5   0.1725  64.70475   0.005392063
DELL 6100/200           3     0.345   129.4095   0.010784125
DELL GN +               1.1   0.1265  47.45015   0.003954179
DELL GX 110             1.1   0.1265  47.45015   0.003954179
DELL GX 1P              1.1   0.1265  47.45015   0.003954179
DELL GX PRO             1.1   0.1265  47.45015   0.003954179
DELL OPTIPLEX GX 150    1.3   0.1495  56.07745   0.004673121
DELL OPTIPLEX GX 200    1.1   0.1265  47.45015   0.003954179
DELL OPTIPLEX GXI       1.1   0.1265  47.45015   0.003954179
DELL POWER EDGE 1750    2.3   0.2645  99.21395   0.008267829
DELL POWER EDGE 2200    3     0.345   129.4095   0.010784125
DELL POWER EDGE 2300    1.6   0.184   69.0184    0.005751533
DELL POWER EDGE 2400    3     0.345   129.4095   0.010784125
DELL POWER EDGE 2500    3.5   0.4025  150.97775  0.012581479
DELL POWER EDGE 2550    1.8   0.207   77.6457    0.006470475
DELL POWER EDGE 2650    1.8   0.207   77.6457    0.006470475
DELL POWER EDGE 410     3     0.345   129.4095   0.010784125
DELL POWER EDGE 4100/   3.2   0.368   138.0368   0.011503067
DELL POWER EDGE 4300    2     0.23    86.273     0.007189417
DELL POWER EDGE 6300    3     0.345   129.4095   0.010784125
DELL POWER EDGE 6450    3     0.345   129.4095   0.010784125
DELL POWER EDGE 6650    2.8   0.322   120.7822   0.010065183
DELL POWEREDGE 1650     1.1   0.1265  47.45015   0.003954179
DELL POWEREDGE 1850     2.7   0.3105  116.46855  0.009705713
DELL POWEREDGE 2450     3     0.345   129.4095   0.010784125
DELL POWEREDGE 4000     2.1   0.2415  90.58665   0.007548888
DELL POWEREDGE 4400     2.1   0.2415  90.58665   0.007548888
DELL PRECISION 330      3     0.345   129.4095   0.010784125
DELL PW 2550            3     0.345   129.4095   0.010784125
DELL Precision PW 530   3     0.345   129.4095   0.010784125
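
The columns in the table above appear to follow from the Amps column by simple arithmetic. The Python sketch below reproduces one row under that reading. The factor of 375.1 between the KVA and BTU columns is inferred from the table rows themselves and is an assumption of this sketch, not a standard conversion constant; the 115 V figure comes from the table header, and 12,000 BTU per hour per ton is the usual definition of a ton of cooling.

```python
VOLTS = 115.0             # line voltage stated in the table header
BTU_PER_KVA = 375.1       # assumption: ratio inferred from the table rows
BTU_PER_AC_TON = 12000.0  # one ton of cooling is 12,000 BTU per hour


def cooling_row(amps: float) -> tuple[float, float, float]:
    """Return (kVA, BTU, AC tons) for a machine drawing `amps` at 115 V."""
    kva = amps * VOLTS / 1000.0
    btu = kva * BTU_PER_KVA
    tons = btu / BTU_PER_AC_TON
    return kva, btu, tons


# Amps value taken from the "DELL 1500" row of the table.
kva, btu, tons = cooling_row(2.0)
print(f"{kva:.4f} kVA, {btu:.3f} BTU, {tons:.9f} AC tons")
```

Summing the last value over all machines in a room gives a rough cooling figure in tons, within the limits of whatever assumptions went into the original table.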

[edit] Rebuilding a RAID Container

Steve Boley from Dell provided this great document:

Dell(tm) PowerEdge(tm) Expandable RAID Controller 2, 2/si, 3/si, and 3/di Drive Rebuild Guide

HOW TO REBUILD A FAILED DRIVE WITH PERC2, 2/si, 3/si, and 3/di RAID CONTROLLERS.

The first command to use before rebuilding any failed drive is the container list command. A failed drive will either show as a missing member or have an exclamation mark next to its drive ID. The syntax for all SCSI IDs is (bus[channel]:scsi id:lun[always zero]). The end state necessary for a drive to rebuild is MISSING MEMBER (remember this).

Original Drive Rebuilding

a. Quicker method (but more difficult) :

If the drive is already a missing member, skip the next step. If instead there is an exclamation mark next to the drive's SCSI ID:

1. Use disk remove dead_partitions (bus, id, lun), using the ID of the drive identified by the container list command. NEVER pull a drive that is not showing as a MISSING MEMBER. Further down I will add the instructions for preparing and removing a drive that is in an array and not failed or missing (i.e. SMART ALERTS). Even a failed drive is still a member of its container, including when the container is not showing as degraded or critical; if such a drive is PULLED or INITIALIZED, it will DROP the CONTAINER and DATA LOSS will occur.

2. Next command is controller rescan and then do another container list.

3. If drive is not showing as a missing member repeat the controller rescan.

4. The next command is container set failover x (bb,ii,ll), where x is the container number found in container list and (bb,ii,ll) is the bus, ID, and LUN of the drive. If the drive is part of more than one container, start with the lowest-numbered container and proceed numerically up through all the containers the drive is part of.

For example: container set failover 0 (0,3,0) [SCSI ID 3 for container 0]

You should hear the drive array being hammered. The command to check the status of the rebuild is task list.
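
The ordering rule in step 4 (lowest container number first) can be sketched in Python as follows. This only builds the command strings in the form given in the example above; it does not talk to the controller, and the container numbers and SCSI ID used here are hypothetical.

```python
def failover_commands(containers, bus, scsi_id, lun=0):
    """Build one 'container set failover' command per container, lowest number first."""
    device = f"({bus},{scsi_id},{lun})"
    return [f"container set failover {n} {device}" for n in sorted(containers)]


# Hypothetical drive at SCSI ID 3 on bus 0 that is part of containers 1 and 0.
for command in failover_commands([1, 0], bus=0, scsi_id=3):
    print(command)
```

The LUN defaults to 0 because the guide states that it is always zero.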

b. Easier method but requires reboot:

You have to have 2.X firmware on the controller and 2.5 or higher is preferred. Do NOT flash firmware on controller while drive is failed!

To check if autorebuild feature of controller is enabled run 'controller show automatic_failover'. If disabled do 'controller set automatic_failover' for autorebuild to be turned on.

1. Do the container list and identify which drive is failed with exclamation mark.

2. Reboot the system and while it is reposting and before the raid controller initializes, pull the failed drive.

3. It will then come up as missing member when raid initializes. After booted up insert the drive and the autorebuild will kick in and reinitialize the drive and start rebuild. The autorebuild will only work when the drive is in missing member status.

4. task list -- will give status of rebuild

New Drive Rebuilding

1. Follow the instructions in section a, steps 1 through 3, until the drive shows as a missing member. Alternatively, follow the procedure in section b, which is easier.

2. Once the drive shows as a missing member, insert the new drive into the system. The RAID controller should scan the bus and spin up the drive, and the autorebuild function will kick in on the controller.

3. Use the task list command to monitor the rebuild progress.

Non-Failed Drive (SMART Alert)

1. container list to see what drive is failed with exclamation mark.

2. enclosure show slot -- to show slot versus scsi id

3. enclosure prepare slot X (x is number of slot)

4. enclosure show slot X again to see if slot is deactivated

5. Remove drive and do enclosure prepare slot again to reactivate and be missing member.

6. controller rescan

7. Insert new drive and should auto initialize and start rebuild.

8. task list to follow progress of rebuild

Steve Boley Advanced SCSI Solutions Team Dell Incorporated [rydera]


[edit] Testing Failover

From: BHohl@grotecompany.com

A few weeks ago I was looking for a basic write-up on how to install the PERC CLI tool and test the RAID5 failover. I didn't find one, so I made the following write-up, which may be useful to others. If someone sees something incorrect in it, please respond. I did note that there was recently a thread regarding hot versus cold swapping of disks. In the thread, one person said that Dell tech support recommended cold swap, and one person referenced a link to a Dell support doc recommending hot swap (http://support.dell.com/support/topics/global.aspx/support/kb/en/document?DN=1070984). For my test I used cold swap, as I considered that to be less risky to the hardware.


C - Install RAID PERC CLI software (afacli) and test fail over.

1. Link to downloadable PERC CLI software for Linux: http://support.dell.com/support/topics/global.aspx/support/kb/en/document?dn=1089105&c=us&l=en&s=gen&cs=

2. Install of CLI tool: RPMs for the CLI tool and snmp are inside afa-linux-app-A01.tar.gz. Unzip with KDE Ark or the CLI tar tool. Install with KDE Yast or the CLI rpm tool.

From shell:

  # tar -xzvf afa-linux-app-A01.tar.gz
  # rpm -ivh afaapps-2.7-1.i386.rpm

3. Some basic lookups

To open the PERC CLI FAST interface:

  # afacli
  FASTCMD>

To see the controller list (there is one controller [afa0] on this box):

  FASTCMD> controller list

To open the afa0 controller on a read only basis:

  FASTCMD> open /readonly=true afa0

The result is the following command prompt:

  AFA0>

Some basic lookups:

  AFA0> container list
  AFA0> disk list
  AFA0> task list
  AFA0> enclosure show slot
  AFA0> disk show space
  AFA0> disk show partition
  AFA0> container show failover
  AFA0> controller show automatic_failover

Disk light blinking to match physical disks to SCSI device IDs: use disk list to find the SCSI device IDs. The commands below give a 5 sec blink followed by a 0 sec blink.

  AFA0> disk blink <SCSI device ID> 5
  AFA0> disk blink <SCSI device ID> 0


4. Adding a failover disk (hot spare)

Shutdown computer. Insert new disk. Boot computer.

a) Use the disk list command to find the disk SCSI device ID:

  filesrv1:~# afacli
  FASTCMD> open afa0
  AFA0> disk list

Get the disk SCSI device ID from the list.

b) Initialize disk and verify:

  AFA0> disk initialize <SCSI device ID>
  AFA0> disk list

c) Make disk a global failover disk and verify. If the SCSI device ID of the disk is (0,4,0), then the command is:

  AFA0> container set global_failover (0,4,0)

Verify success with the following lookup:

  AFA0> container show failover

Remove a global failover disk:

  AFA0> (0,4,0)

d) Make sure automatic failover is enabled:

  AFA0> controller show automatic_failover

If not enabled, enable as follows:

  AFA0> controller set automatic_failover /failover_enabled=true



5. Test disk failover

a) Shutdown computer; remove a disk that is part of the RAID5 array; boot computer. If a hot spare is available (as in the above set up), it will automatically be added to the array.

b)Monitoring the rebuild of the array: Open the FAST CLI. The rebuild status should be displayed at the bottom of the console. The "task list" command also shows the rebuild information. The "enclosure show slot" command shows disk status information.

c)Before, during and after "enclosure show slot" information:

BEFORE REMOVING DISK:

  AFA0> enclosure show slot
  Executing: enclosure show slot

  Enclosure ID (B:ID:L) Slot scsiId  Insert Status
  --------- ----------- ---- ------- ------ --------------------------------------
  0         0:06:0      0    0:00:0  1      OK ACTIVATE
  0         0:06:0      1    0:01:0  1      OK ACTIVATE
  0         0:06:0      2    0:02:0  1      OK ACTIVATE
  0         0:06:0      3    0:03:0  1      OK ACTIVATE
  0         0:06:0      4    0:04:0  1      OK UNCONFIG HOTSPARE ACTIVATE

DURING ARRAY REBUILD:

  AFA0> enclosure show slot
  Executing: enclosure show slot

  Enclosure ID (B:ID:L) Slot scsiId  Insert Status
  --------- ----------- ---- ------- ------ --------------------------------------
  0         0:06:0      0    0:00:0  1      OK UNCONFIG EMPTY I/R READY NOTACTIVATE
  0         0:06:0      1    0:01:0  1      OK REBUILD FAILED CRITICAL ACTIVATE
  0         0:06:0      2    0:02:0  1      OK REBUILD FAILED CRITICAL ACTIVATE
  0         0:06:0      3    0:03:0  1      OK REBUILD FAILED CRITICAL ACTIVATE
  0         0:06:0      4    0:04:0  1      OK REBUILD FAILED CRITICAL HOTSPARE ACTIVATE

AFTER ARRAY REBUILD:

  AFA0> enclosure show slot
  Executing: enclosure show slot

  Enclosure ID (B:ID:L) Slot scsiId  Insert Status
  --------- ----------- ---- ------- ------ --------------------------------------
  0         0:06:0      0    0:255:0 0      OK UNCONFIG EMPTY I/R READY NOTACTIVATE
  0         0:06:0      1    0:01:0  1      OK ACTIVATE
  0         0:06:0      2    0:02:0  1      OK ACTIVATE
  0         0:06:0      3    0:03:0  1      OK ACTIVATE
  0         0:06:0      4    0:04:0  1      OK HOTSPARE ACTIVATEID
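
For anyone who wants to watch this output from a script, here is a minimal Python sketch that splits one slot row into its columns (Enclosure, ID, Slot, scsiId, Insert, then the status words) and flags rows worth a closer look. It assumes one row per slot with whitespace-separated fields, as in the listings above. The choice of which status words count as alerts is an assumption of this sketch, not something documented for afacli.

```python
from typing import NamedTuple


class SlotRow(NamedTuple):
    enclosure: int
    enclosure_id: str   # the "ID (B:ID:L)" column
    slot: int
    scsi_id: str
    inserted: bool
    flags: tuple        # status words, e.g. ("OK", "ACTIVATE")


def parse_slot_row(line: str) -> SlotRow:
    """Split one 'enclosure show slot' row into its columns."""
    fields = line.split()
    return SlotRow(int(fields[0]), fields[1], int(fields[2]), fields[3],
                   fields[4] == "1", tuple(fields[5:]))


# Assumption: these status words are the ones worth alerting on.
ALERT_FLAGS = {"FAILED", "CRITICAL", "REBUILD"}


def needs_attention(row: SlotRow) -> bool:
    return bool(ALERT_FLAGS.intersection(row.flags))


# Two rows copied from the listings above.
sample = [
    "0 0:06:0 1 0:01:0 1 OK ACTIVATE",
    "0 0:06:0 1 0:01:0 1 OK REBUILD FAILED CRITICAL ACTIVATE",
]
for line in sample:
    row = parse_slot_row(line)
    print(row.slot, row.scsi_id, "ALERT" if needs_attention(row) else "ok")
```

Header and separator lines would need to be skipped before calling parse_slot_row, since it expects a data row.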


6. Replacing a failed disk after automatic failover has rebuilt the array with a hot spare:

d) Shutdown computer; replace the failed disk; boot computer. Use the "enclosure show slot" command to determine the SCSI device ID for the disk. Use the "disk blink" command to determine the physical location of the disk.

e) Check if the replacement disk is initialized:

  AFA0> disk list

If not, initialize the new disk as follows:

  AFA0> disk initialize <SCSI device ID>

f) Remove the global failover designation for the original failover disk:

  AFA0> (0,4,0)

Verify success with the following lookups:

  AFA0> container show failover
  AFA0> enclosure show slot

Note: disk (0,4,0) continued to show as a HOTSPARE from the "enclosure show slot" command after the " (0,4,0)". A "controller rescan" did not correct this problem, but it was corrected after a reboot.

g) Make the replacement disk a global failover disk and verify. If the SCSI device ID of the disk is (0,0,0), then the command is:

  AFA0> container set global_failover (0,0,0)

Verify success with the following lookups:

  AFA0> container show failover
  AFA0> enclosure show slot

h) Make sure automatic failover is enabled:

  AFA0> controller show automatic_failover

If not enabled, enable as follows:

  AFA0> controller set automatic_failover /failover_enabled=TRUE


7. Adding RAID event notification: Follow the instructions included in raid.cron.script, obtained from http://linux.dell.com/files/aacraid/aacraid_monitoring_script.txt. Be sure to convert the txt file to unix format (dos2unix <filename>). [rydera]


  • Dell PowerEdge Remote Access Controller (DRAC)

The Dell(tm) Remote Access Card III (DRAC III), DRAC III/XT, Embedded Remote Access (ERA), and the Embedded Remote Access Option (ERA/O) are systems management hardware and software solutions designed to provide remote management capabilities for Dell PowerEdge(tm) systems. Collectively, these solutions are known as remote access controllers (RACs). RACs allow you to remotely manage and monitor your system even when the system is down.

NOTE: Throughout the remainder of this document, the DRAC III, DRAC III/XT, ERA, and ERA/O controllers are referred to collectively as "RACs", except when it is necessary to distinguish between each controller. When information applies only to a specific RAC, it is identified explicitly. Information that refers to "RAC" applies to all of the controllers.

[atr]


[edit] DRAC II

Put simply, the DRAC II is a full length 33MHz PCI card that connects via the SMB connector on a PowerEdge motherboard. It has a PCMCIA slot for a PSION gold card, a 10Mb/s RJ45 port, and a connector for an external AC adapter power source.

Part numbers varied depending on the model/number used.

The AC adapter is sometimes hard to find. I've found the part number is: Globtek GS-30 6V AC adapter (AC Adapter for Dell 4574E / American Megatrends DRAC II cards)

It was based on the AMI MegaRAC card, which provided a Java based web interface (known to work only on IE in Windows) for viewing the console, BMC error logs, etc. It is not based on IPMI but shares similar concepts in function. It can send out alphanumeric pages using an external modem, and SNMP traps via the ethernet port. Users can configure these settings with the racadm utility (works under Windows and RedHat).

PowerEdge servers from the 1998-2001 era tended to use this card (not the later DRAC III), including but not limited to the 1300, 1400, 2300, 2400, 2450, 4300, 4350, 4400, 4450, 6300, 6350, 6400, and 6450.

Dell Documentation: http://support.dell.com/support/edocs/software/smdrac/index.htm

[atr]


[edit] DRAC III (ERA/O)

The DRAC III card was a 64bit 66MHz half length PCI card designed for the PowerEdge 1650, 26* and 66* machines.

It was pushed as an upgrade because of the industry wide move towards IPMI (research Intel for the IPMI 1.0-2.0 standards). The card incorporated a Java based webserver that did not require specific client code. It was originally known to work only on IE with JRE >1.4 installed, but now supports Mozilla. I've even seen it work on Safari within OS X 10.3.

The hardware involved was a drastic change. The main chipset was based on Agilent rather than AMI, and the monitoring capabilities got much more complex. There exists a perl module that makes it easy to interface your own Nagios monitoring scripts with the card: http://lanceerplaats.nl/PowerEdge/RAC/

The author includes great documentation along with some scripts to get RRDTOOL graphs setup to monitor sensors on the BMC.

This card also included an RJ45 port, AC adapter, and PCMCIA slot for modem and incorporated many of the features the DRAC II card offered.

A note to anyone with a 1650: by stock, Dell shipped the 1650 with the 1G824 riser card (2x 64bit-66MHz). Because of the lack of 64bit support on the 1650 (Dual P3 Tualatin vs. Dual P4 Xeon), the DRAC III card required the 1G825 riser card (1x 64bit-66MHz and 1x 32bit-33MHz) to function properly.

The ERA/O option was a daughterboard offered with the 1650 and 2650 (1u and 2u, respectively) servers. Instead of wasting a riser spot on the PCI backplane, the daughterboard fit nicely into its spot and used a builtin 100Mb connector found on each of these servers. The port is unused if this card is not there. It offered similar features (minus the modem and external AC adapter) as the DRAC III card. I've managed to add one into a system that did not originally ship with the card, but success will vary here.

Dell Documentation: http://support.dell.com/support/edocs/software/smdrac3/drac3/index.htm

[atr]


[edit] Differences in Dell DRAC Generations

From Tim Murphy of Dell:

Some of the main technical differences, e.g.:

- DRACIII (6G RAC) has no coresident BMC firmware (RAC and BMC have separate FW images, H/W i/f's). Console support via VNC, Java applet. Support for battery, separate power supply. IPMI 1.0 based. Local/OS/agent interface: PPP, IP addressable. DRACIII has a few sensors of its own (separate from server sensors).

- DRACIII/XT (7G RAC) has coresident BMC FW running onboard (requires ribbon cable connector to main chassis). No battery, nor separate power supply. Console, IPMI support, Local/OS/agent: all similar to DRACIII.

- DRAC4 (8G RAC): no coresident BMC (separate FW). No main chassis sensor support thru RAC. No battery, separate power supply. Only RAC w/Virtual media support (to date). Console: superior Virtual KVM support (no VNC, but still a Java applet). IPMI 1.5 support. Local agent interface: raw serial (no PPP), requires OS agent running for host to RAC communication.

And I'm sure I missed some others.. so yes, there are quite a few technical differences across the RAC H/W and FW generations. It is also *not* the usual case that more than one DRAC H/W is supported on a given Dell server (i.e. the DRAC offerings are often closely tied to their server generation counterparts). So a good question is also what DRAC offering is appropriate for a given Dell server model. [rydera]

[edit] ERA

From Tim Murphy:

ERA is "embedded remote access", ERA/O is "embedded remote access option" -- ERA FW is integral/resident on the main chassis (like DRACIII/XT, it also has a coresident BMC FW, but it doesn't require the ribbon cable). ERA/O is a daughterboard implementation with no coresident BMC FW (most similar to a DRACIII). ERA and ERA/O are 7G RAC. [rydera]


[edit] PowerEdge 1850 DRAC4/RAC

Ronan,

Those settings seem fine. Note, the RAC requires a full null modem cable, including routing RTS/CTS, DTR/DSR/DCD. The User's Guide calls out the signal routing required for the null modem cable:

<snip>

Connecting the DB-9 Cable

If you want to connect to the managed system using a serial text console, you must connect a DB-9 null modem cable to the COM port that you are using on the managed system. Not all DB-9 cables carry the pinout/signals necessary for this connection. The DB-9 cable for this connection must conform to the specification shown in Table 3-4.

NOTE: You can also use this cable for BIOS text console redirection with the DRAC 4 serial console disabled.

Table 3-4. Required Pinout for DB-9 Null Modem Cable

Signal Name               | DB-9 Pin | DB-9 Pin
FG (Frame Ground)         | -        | -
TD (Transmit data)        | 3        | 2
RD (Receive Data)         | 2        | 3
RTS (Request To Send)     | 7        | 8
CTS (Clear To Send)       | 8        | 7
SG (Signal Ground)        | 5        | 5
DSR (Data Set Ready)      | 6        | 4
CD (Carrier Detect)       | 1        | 4
DTR (Data Terminal Ready) | 4        | 1
DTR (Data Terminal Ready) | 4        | 6
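
To make the point about incomplete cables concrete, the Python sketch below encodes the pin pairs from Table 3-4 and reports which of them a given cable lacks. Representing a cable as a set of (pin, pin) connections is a choice made for this sketch, and the three-wire cable is a hypothetical example of one that carries data and ground but no handshake lines.

```python
# (signal, pin on one end, pin on the far end), copied from Table 3-4.
REQUIRED_WIRES = [
    ("TD", 3, 2), ("RD", 2, 3), ("RTS", 7, 8), ("CTS", 8, 7), ("SG", 5, 5),
    ("DSR", 6, 4), ("CD", 1, 4), ("DTR", 4, 1), ("DTR", 4, 6),
]


def missing_wires(cable):
    """Return the table rows that a cable (a set of pin-to-pin pairs) lacks."""
    return [wire for wire in REQUIRED_WIRES if (wire[1], wire[2]) not in cable]


# Hypothetical three-wire cable: TD, RD and signal ground only.
three_wire = {(3, 2), (2, 3), (5, 5)}
for signal, near, far in missing_wires(three_wire):
    print(f"missing {signal}: pin {near} -> pin {far}")
```

A cable that passes this check has every connection the table lists; the check says nothing about extra or shorted wires.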

[atr]

[edit] Dell Spare Parts

Dell formally has a spare parts department available at: 1-866-

Here are some links below to other places to search for spare parts (no guarantee on warranty, and not supported by Dell).

[atr] Other known sellers:


[edit] Nagios and Monitoring

Nagios (http://www.nagios.org) is an open source network monitoring application. There are many ways to use it to monitor the health of a PowerEdge server.


[edit] nagios afacli plugin (adaptec raid)

http://www.ibnads.com/afacli-nagios/

A substantial number of people are using Nagios with afacli derivatives for monitoring. See: http://marc.theaimsgroup.com/?l=linux-poweredge&w=2&r=1&s=nagios&q=b Specifically, http://marc.theaimsgroup.com/?l=linux-poweredge&m=109365805300287&w=2

[rydera]


[edit] afacli under SLES9

I thank Jonathan Delgado for posting this to the linux-pe list.


http://www.techno-obscura.com/~delgado/notes/sles9-NagiosAfacli.html


Monitoring PERC3Di controllers with afacli and Nagios on SLES9

Intro

These docs describe the basic process of going about monitoring a Dell PERC3Di controller (as found on the PowerEdge 1650) via Nagios and afacli under SuSE Linux Enterprise Server 9 (SLES9).


Just to say, Nagios is a super useful open source tool for monitoring various network services and such. You can find the full deal on it at the Nagios home.


Also, these directions would presumably work for any other system, Dell PowerEdge or not, with the same family of Adaptec RAID controllers which use the aacraid driver and can thus be monitored via the afacli utility.


As always, any comments, code enhancements, etc that you might have are always appreciated.


The Problem

So, I've got a rack full of Dell PowerEdge servers... mostly 1650s and 1750s. They have nifty RAID controllers, but we hadn't really been monitoring them actively, mainly the occasional check of the status lights on the systems. Not much point in having a RAID if you don't know when it stops having redundancy.


Now, with Dell it would seem that if I ran Red Hat in their preferred releases, I would be able to use some of the canned Dell management systems for Linux. One problem (of many) is that I am lazy and I didn't want to go through the whole hassle of trying to get the Dell management solution running under SLES. The other problem is that I just don't trust running the Dell stuff, besides, I already have Nagios installed and it rocks.


The Solution

The basic way that things work is like so:


  • My Nagios central monitoring system polls the remote server for its RAID status as the schedule demands.

  • A daemon process listening on the remote server receives the requests and kicks off the local plugin.

  • The local plugin dumps a set of commands to the command line RAID utility and then parses the logged output.

  • The plugin returns an appropriate result code for the interpreted logs back to the Nagios server.


Getting all of this to work requires three basic parts:


  • The RAID controller monitoring utility, afacli.

  • A basic Nagios installation.

  • The nagios plugin which provides the glue between afacli and the centralized monitoring, check_afacli.


afacli

afacli is the command line interface (thus the cli in afacli) for the Adaptec RAID controller which Dell uses as their PERC 3Di. There are links to some RPMs for it from Dell's Linux RAID page. The most recent version listed on that page (at the time of writing this) has the afaapps-2.7 RPM as part of it. 2.7 works fine, but whoever built it is a real tool and managed to leave some dependencies on audio libs (WTF???) in the package. So, if you use that, you actually need to install the arts RPM.


Otherwise you want to find afaapps-2.8, which is less broken. With 2.8, I found that I really needed to run sh MAKEDEV.afa afa0, using the MAKEDEV.afa provided in the RPM, to make the appropriate device. This was not an issue with 2.7.


Nagios


I can't and won't go into the details of setting up Nagios monitoring; please refer to the Nagios home for that. For the purposes of this doc, I am assuming that you are remotely monitoring the RAID. If it is a local RAID, then you can obviously cut out many steps.


There isn't much Nagios-wise that needs to be installed on the system to be monitored. Basically, you need to install all of the glue to enable the remote execution and results gathering from the Nagios plugin. SLES9 comes with a nagios-plugins-1.3.1 RPM which I installed. This gives me some Perl libs that my plugin depends upon and other plugins that I would want to use anyways.


Because I am checking the state of the RAID remotely, I need to setup a daemon on the system to answer the requests to check on the RAID. The tool used for this is nrpe (Nagios Remote Plugin Executor), which can be downloaded from the Nagios: Extras and Addons page. This is pretty trivial to build and install. Be sure to create an unprivileged nagios user and group for nrpe to run under as a daemon.


nrpe needs to be configured with info on which commands it will accept and what it does when they are called, so my nrpe.cfg has the following line in it to call my plugin:

command[check_afacli]=/usr/lib/nagios/plugins/check_afacli


The Nagios server has to know how to call check_afacli on the remote system, so my checkcommands.cfg has an entry like:


define command{
command_name    check_afacli
command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_afacli -t 30
}


check_afacli

There isn't a whole lot to say about my script, check_afacli. It is written in my unpolished Perl. I would like to think that the code is sound, but my regex may be ugly. You are warned.


If you decide to adopt it for your own use, you will need to customize any paths to required files as needed, of course.


Also, the script is being executed as the nagios user by the nrpe daemon, so you need to be sure that the nagios user has permission to run afacli. I did this by enabling the nagios user to sudo afacli without requiring a password. So my sudoers file has a line like this:

nagios ALL=(ALL) NOPASSWD: /sbin/afacli


The plugin redirects a set of commands to afacli from a file called afascript which looks like:

logfile start '/tmp/afacli.log'
open afa0
controller details
container list /all /full
enclosure show slot
close
logfile end
exit

Yes, the spaces do seem to be required in there, that isn't just indenting for the sake of it. You could also add more commands to be passed to afacli, but check_afacli won't do much with them. What it does do is:


Check the battery state.

List all of the RAID containers. Report if they have any dead or missing partitions.
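
The last step of any such plugin is turning its findings into a Nagios result. The Python sketch below shows one way to do that using the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). It is not the check_afacli script described here, which is written in Perl; the function name, the message wording, and the mapping of each finding to a severity are all assumptions of this sketch.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3   # standard Nagios plugin exit codes


def nagios_result(battery_ok, bad_containers):
    """Map findings to (exit code, status line).

    battery_ok: True or False, or None if the state could not be read.
    bad_containers: labels of containers with dead or missing partitions.
    """
    if bad_containers:
        return CRITICAL, ("RAID CRITICAL - dead or missing partitions in: "
                          + ", ".join(bad_containers))
    if battery_ok is None:
        return UNKNOWN, "RAID UNKNOWN - could not read battery state"
    if not battery_ok:
        return WARNING, "RAID WARNING - battery not OK"
    return OK, "RAID OK"


# Hypothetical findings; a real plugin would print the message and
# pass the code to sys.exit() so that nrpe can hand it back to Nagios.
code, message = nagios_result(battery_ok=True, bad_containers=[])
print(code, message)
```

Checking the containers before the battery means the most serious condition wins when several are present at once.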


To-Dos

Some things I need to work on with this:


I originally tried to get all of this going on SLES 8, not 9, and had a hell of a time. In short, I was unable to get afacli to run in a non-interactive terminal session. I could run check_afacli on the local system just fine from the command line, but as soon as nrpe tried to run it, it would abort before any commands could get executed by afacli. Likewise, I couldn't run afacli in any cron jobs etc. So now that I have got things working with 9, I need to look at what may have changed and made this difference. I suspect that it is some sort of libncurses SNAFU.

If you have a drive set as a hotspare in the system, you may very well not get notified if one drive has gone away and the spare has been activated. With the spare in place you have RAID integrity still, but you still should be informed that something has gone amiss. I think the docs out there for what to expect for afacli are rather poor, so much of this is going to have to be through experimentation.

[atr]

[edit] Monitoring with MRTG

Using the Dell OMSA SNMP agent, McFadden Associates was able to set up mrtg/rrdtool graphs monitoring each OID. These can be VERY useful to see if a fan is dying, if a voltage is changing in a way that indicates a power supply might die, etc.

The page is very nicely formatted and well documented. Please see: http://web.csma.biz/resources/resources.dellpedge.shtml

[atr]

[edit] Server Temperature Abnormality

From Brian Smith: We also get this error every once in a while on our systems. It is an erroneous error being reported by the Adaptec controller. We just ignore it, although it is annoying. If you go into afacli and do a "diag show hist" to show the log history, you can see what temperature it reported:

root# afacli
FASTCMD> open afa0
Executing: open "afa0"
AFA0> diag show hist
Executing: diagnostic show history
No switches specified, defaulting to "/current".
*** HISTORY BUFFER FROM CURRENT CONTROLLER RUN ***
.
.
[log lines deleted]
.
[97]: Enclosure 0 - Temperature 225, over threshold 120
AFA0> close
Executing: close
FASTCMD> exit
root#
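
If you would rather pull these readings out of a saved log than read them by eye, a short Python sketch follows. The pattern is based only on the single history line shown above, so other firmware versions may word the message differently; treat the regular expression as an assumption.

```python
import re

# Matches history lines of the form:
#   [97]: Enclosure 0 - Temperature 225, over threshold 120
# \s+ before "threshold" also tolerates a line wrap at that point.
TEMP_RE = re.compile(r"Enclosure (\d+) - Temperature (\d+), over\s+threshold (\d+)")


def over_threshold_events(log_text):
    """Return (enclosure, temperature, threshold) for each matching line."""
    return [tuple(int(g) for g in m.groups()) for m in TEMP_RE.finditer(log_text)]


sample = "[97]: Enclosure 0 - Temperature 225, over threshold 120"
print(over_threshold_events(sample))  # prints [(0, 225, 120)]
```

A reading such as 225 against a threshold of 120 is the kind of implausible value that suggests a reporting glitch rather than a real temperature.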

[atr]


[edit] FreeBSD

FreeBSD is another open source operating system that can be run on the x86 PowerEdge systems. While not as common as Linux, it is still used, and many users find the same power and stability in it as in Linux.

At this moment, specific things like OMSA, racadm, etc are not available for FreeBSD even under Linux emulation.

Currently, 4.10-RELEASE and 5.2.1-RELEASE contain support for the Adaptec (Perc 2/Di, Perc 3/Di) and American Megatrends (Perc 2/DC, Perc 3/DC, Perc 4/DC) controllers. Using a DRAC card and IT Assistant/OMSA on Windows, you are still able to view BMC information and logs remotely.

FreeBSD has also added support for the em (Intel Gigabit Ethernet) and bge (Broadcom gigabit, like the 2550) drivers in recent releases, so any modern release should run fine.

As always, see http://www.freebsd.org for information, and search the Dell PowerEdge Linux lists, since they contain relevant information for all users of a non-Microsoft operating system.

[atr]

PowerVault External (112t) Tape Drives

FreeBSD 5.2.1-RELEASE and earlier releases have supported the Adaptec 39160 and 29160 for quite some time. FreeBSD also supports SCSI sequential access devices (tape drives) and can easily be used with a wide variety of DDS/DLT/etc. tape solutions.

It is wise to search the freebsd.org mailing lists if you are using something new or a tape changer. Tape changers do work (122t), but since drivers are constantly being worked on, it is best to search and/or ask.

sa0 at ahc0 bus 0 target 1 lun 0
sa0: <ARCHIVE Python 06408-XXX 9050> Removable Sequential Access SCSI-3 device
sa0: 80.000MB/s transfers (40.000MHz, offset 32, 16bit)

This is what a DDS4 drive inside a PowerVault 112t (1U tape enclosure) looks like to FreeBSD once properly configured. The 'mt' command can be used to set tape density, compression, and other tape options.

Many users have had good luck with http://www.bacula.org if they are not using an enterprise backup solution, TapeWare, Veritas, or a home-brewed script.
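For scripting, the 'mt' calls can be wrapped in a few lines. The following is a minimal Python sketch, assuming the drive above is reachable through the non-rewinding device /dev/nsa0 (adjust for your system). The helper names are hypothetical, and only the "status" and "comp" subcommands of FreeBSD's mt are shown.

```python
import subprocess

# ASSUMPTION: sa0 is the tape drive, so /dev/nsa0 is its non-rewinding device.
DEFAULT_DEVICE = "/dev/nsa0"


def mt_command(action, *args, device=DEFAULT_DEVICE):
    """Build the argument list for an mt invocation without running it."""
    return ["mt", "-f", device, action, *map(str, args)]


def run_mt(action, *args, device=DEFAULT_DEVICE):
    """Run mt and return its standard output; raises if mt exits non-zero."""
    result = subprocess.run(mt_command(action, *args, device=device),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Only print the commands here, so the sketch is safe without a drive.
    print(" ".join(mt_command("status")))
    print(" ".join(mt_command("comp", "on")))
```

Building the argument list separately from running it makes it easy to log or review exactly what will be sent to the drive before a backup job starts.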

[atr]



Backup

Linux offers many backup options, from the home-brewed #!/bin/sh script, dump, scp, and rsync, to a variety of media including floppies, zip disks, tape drives, SAN, etc.

A commercial solution might be in order, but every system administrator finds tranquility with something they trust. AMANDA was the "old faithful" and still serves many on the UNIX end. Its complex configuration was not suited to simple backups or smaller operations, but that has changed over the years, as have media and solutions.
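To give an idea of what a home-brewed backup can look like, here is a minimal sketch in Python rather than /bin/sh. It writes a timestamped, gzip-compressed tar archive of one directory using only the standard library. The function name and the archive naming scheme are hypothetical, and it makes no attempt at incremental backups, rotation, or verification.

```python
import tarfile
import tempfile
import time
from pathlib import Path


def backup_directory(source, dest_dir, stamp=None):
    """Write <dest_dir>/<name>-<stamp>.tar.gz containing source; return its path."""
    source = Path(source)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores paths relative to the directory name,
        # not the full absolute path on this machine.
        tar.add(source, arcname=source.name)
    return archive


if __name__ == "__main__":
    # Self-contained demo: back up a throwaway directory into another one.
    with tempfile.TemporaryDirectory() as work:
        data = Path(work) / "data"
        data.mkdir()
        (data / "hello.txt").write_text("hello\n")
        print(backup_directory(data, Path(work) / "backups"))
```

The destination can be any mounted path, so the same sketch works for a local disk, an NFS share, or removable media.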

[rydera]


bacula

http://www.bacula.org

Bacula could be considered the premier up-and-coming backup application. Currently, it has clients for Linux, FreeBSD, and Windows, and it uses a MySQL backend. Configuration is very modular and easy to understand. As far as I know, it has been used in a wide range of settings, from home backups to medium and large businesses. Its strength lies in its ability to work with both Linux and FreeBSD systems connected to a single tape drive or autochanger. It also has no problem using a NAS/SAN or anything that can be mounted as a writable file system on UNIX. My personal experience with it has been very good, and I use it for both home and corporate backups without a problem. The MySQL backend (like dump levels) makes it very handy too. See the website; the creator and user base have been very helpful in contributing good documentation.

[rydera]


Mondo

When the Linux-PE list was polled in Oct 04 about backup applications, quite a number of users responded about Mondo.

http://www.microwerks.net/~hugo/

I have not used it personally, but respondents reported both personal and corporate use. From Mondo's webpage, "Mondo is reliable. It backs up your GNU/Linux server or workstation to tape, CD-R, CD-RW, NFS or hard disk partition. In the event of catastrophic data loss, you will be able to restore all of your data [or as much as you want], from bare metal if necessary. Mondo is in use by Lockheed-Martin, Nortel Networks, Siemens, HP (US and France), IBM, NASA's JPL, the US Dept of Agriculture, dozens of smaller companies, and tens of thousands of users.

Mondo is comprehensive. Mondo supports LVM, RAID, ext2, ext3, JFS, XFS, ReiserFS, VFAT, and can support additional filesystems easily: just e-mail the mailing list with your request. It supports adjustments in disk geometry, including migration from non-RAID to RAID. Mondo runs on all major Linux distributions and is getting better all the time. You may even use it to backup non-Linux partitions, such as NTFS.

Mondo is free! It has been published under the GPL (GNU Public License), partly to expose it to thousands of potential beta-testers but mostly as a contribution to the Linux community. I charge for 1-to-1 technical support to fund Mondo's development."

[rydera]


AMANDA

http://www.amanda.org

The quintessential UNIX backup application. When you think backup and UNIX, AMANDA is the yardstick of civilization in this territory.

[rydera]
