Archives for: February 2010

When Open Directory goes bad

23/02/10 | by admin | Categories: Software, Networking, Mac

Open Directory is one of the harder components of Mac OS X Server to recover.

I recently had two opportunities to fix bad OD services on two Macs running 10.6 server.

The first server was in a bad way. Open Directory (OD) was not accessible from Workgroup Manager and was not running in Server Admin. Users could not authenticate against the OD.

The usual poking around showed that LDAP was dead. Google led me to:
sudo db_recover -h /var/db/openldap/openldap-data

Which failed. However, reading man db_recover, I came across this option:
-c      Perform catastrophic recovery instead of normal recovery.
Which did the trick nicely:
sudo db_recover -c -h /var/db/openldap/openldap-data
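
Before trying catastrophic recovery it is probably worth stopping slapd and copying the database directory aside. The following is only a sketch of that sequence, not the exact steps I ran: it assumes slapd is managed by the launchd job at /System/Library/LaunchDaemons/org.openldap.slapd.plist, and the backup destination is a hypothetical path you supply. With DRY_RUN=1 it prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: back up the OpenLDAP database, try normal recovery, and fall back
# to catastrophic recovery (-c) only if that fails.
# Assumptions: the launchd job path and the backup location are illustrative.

DB_DIR="/var/db/openldap/openldap-data"
SLAPD_PLIST="/System/Library/LaunchDaemons/org.openldap.slapd.plist"

# run: execute a command, or just print it when DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# recover_od BACKUP_DIR
recover_od() {
    backup_dir="$1"   # hypothetical backup destination
    run sudo launchctl unload "$SLAPD_PLIST"
    run sudo cp -Rp "$DB_DIR" "$backup_dir"
    # Normal recovery first; catastrophic recovery only as a fallback.
    if ! run sudo db_recover -h "$DB_DIR"; then
        run sudo db_recover -c -h "$DB_DIR"
    fi
    run sudo launchctl load "$SLAPD_PLIST"
}
```

For example, DRY_RUN=1 followed by recover_od /var/backups/od-copy shows what would be run. In a dry run the fallback line is never printed, because the printed "normal" recovery always succeeds.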

What caused this failure? Well, an fsck and then a permissions repair both showed issues. But since it was fixed, I did not spend too much time indulging in 'why'. I am guessing that some hiccup caused file corruption.

The second issue is a more commonplace one. It was a new Snow Leopard Server install; authentication against the OD was taking forever, and accessing the OD via Workgroup Manager was slow and painful.

This was an issue I remember from our early experiences with Leopard Server. Basically, if DNS is not up and running nicely, you get big delays accessing Server Admin and Workgroup Manager, and in authenticating.

This user had set up the server using a FQDN ending in .local. This seemed not to be working nicely. However, they had registered a domain to use with this server (the server will be doing some Web hosting), so a good FQDN was available.

Since there were only 10 users set up, and all the passwords were available, I exported the users and groups using Workgroup Manager and then demoted the server to standalone. I then set up DNS correctly using the registered domain name, and re-promoted the server to OD Master. I then imported the accounts and groups, and reset the passwords. I have found that it is quicker to do this than to use the changeip command (and the rest) to change the Kerberos realm and LDAP, because experience has shown that it is rarely just a matter of issuing changeip.
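
Before re-promoting, it is worth confirming that forward and reverse DNS agree. Mac OS X Server has a built-in check for this (sudo changeip -checkhostname). The sketch below does a similar comparison by hand with the standard host utility; the function names are my own, and the parsing assumes the usual "has address" and "domain name pointer" output lines.

```shell
#!/bin/sh
# Sketch: check that a name resolves to an address and that the address
# resolves back to the same name. Uses the standard `host` utility.

# strip_dot: drop the trailing dot that DNS tools print after a FQDN
strip_dot() {
    printf '%s\n' "${1%.}"
}

# same_host: compare two host names case-insensitively, ignoring a trailing dot
same_host() {
    a="$(strip_dot "$1" | tr '[:upper:]' '[:lower:]')"
    b="$(strip_dot "$2" | tr '[:upper:]' '[:lower:]')"
    [ "$a" = "$b" ]
}

# check_dns FQDN
check_dns() {
    fqdn="$1"
    # First A record, e.g. "server.example.com has address 192.0.2.10"
    ip="$(host "$fqdn" | awk '/has address/ {print $4; exit}')"
    [ -n "$ip" ] || { echo "no forward record for $fqdn"; return 1; }
    # PTR record, e.g. "... domain name pointer server.example.com."
    ptr="$(host "$ip" | awk '/domain name pointer/ {print $5; exit}')"
    if same_host "$fqdn" "$ptr"; then
        echo "OK: $fqdn <-> $ip"
    else
        echo "MISMATCH: $fqdn -> $ip -> ${ptr:-none}"
        return 1
    fi
}
```

Usage would be something like check_dns server.example.com (a placeholder name). A mismatch or a missing record here is a good hint that the OD slowness is a DNS problem.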


RAID disaster stories

22/02/10 | by admin | Categories: Hardware, Mac, PC

Over the last few months I've run into a few RAID (mostly RAID-5) horror stories.

Story 1:
MegaRAID breaks, killing RAID.

If you have ever had an Xserve G5 with hardware RAID, you are probably familiar with this story. Basically, what happens is that your RAID 5 will die for no apparent reason. Sometimes, a restart and removing one of the drives and replacing it will fix the issue. Sometimes you need to boot from CD and repair the RAID using the terminal commands. Sometimes, you need to re-add it in Open Firmware.

The problems with this setup are common enough that I routinely remove the cards and set the server up as a software mirror.

Story 2:
Xserve RAID disaster 1

The Apple Xserve RAID was a nice product, since discontinued. Basically it consisted of a 3U box split into two hardware RAID arrays. The two arrays are independent, but can have software RAID applied to them to create, for example, a mirror. Apple details a number of setups, including such things as striping the two halves and concatenating the two halves (WTF???). You can probably guess what happens next.

The Xserve RAID was set up with two arrays (RAID level 5). They were then software striped. The RAID was left unmonitored. One side failed (two bad drives on one array). All the data was gone. The only backup was several months old.

The Xserve RAID was set up from scratch, using larger drives and only using one half of the array. A backup system was implemented to create daily backups.

Story 3:
Xserve RAID disaster 2

Similar situation, but in this scenario they simply concatenated the two sides. Then the controller in the second half started having 'issues'. This turned out a little better, because most of the data was physically on the first half of the RAID. The downside was that to access the data, the whole concatenated volume needed to be online, but as soon as you accessed anything on the second half of the RAID, the concatenated volume went offline and the server needed to be restarted.

Again, there was no current backup. All the data that could be retrieved was taken off the RAID. The Xserve RAID was replaced, and a backup strategy implemented.

Story 4:
PC hardware RAID mirror

This is an older story. This was a cheap Linux box with a "RAID" card which basically supported striping and mirroring across 2 volumes. What happened was that the RAID controller itself failed, and wrote garbage to both halves. Some data was retrievable, but since this was also the boot drive, the system was down for over a day.

Story 5:
External RAID 5 box

This is another system set up by someone who failed to grasp the basics of RAID. This is another familiar story.
The RAID box connected to the server via eSATA. It supported a number of useful features: RAID 5, hot spares, mixed drive capacities, a web interface, visual and audible alarms, email monitoring etc.
The array was set up as a straight RAID-5. Identical drives were used, all from what turned out to be a bad batch. Two drives failed in close sequence. This company outsourced its IT, and the staff had no training, so they were unaware of the fault until the RAID failed and went offline. They had a backup system in place, but it too was unmonitored, and they were untrained in using it.

They changed IT support companies and proactively sought training in how to monitor the RAID and their backup system. A hot spare and email monitoring were set up.

Story 6:
External RAID box.

This was another external RAID box. The company which set it up for them said that they didn't need a separate backup because two drives would need to fail.

One weekend, there was a small fire in the office which set off the fire sprinklers. The RAID was destroyed, and with no offsite backup they lost their data.

Story 7:
Xserve with hardware RAID

This company shelled out for an Intel Xserve with hardware RAID. The hardware RAID in the Intel Xserve replaces the regular SATA backplane. The problem here, again, was the use of identical drives from a bad batch in combination with a hot weekend.

This company leaves their IT equipment running over the weekend, but turns the aircon off. One weekend the temperature hit 43C. They came in on Monday to a server which would not boot. The RAID controller itself failed, and two of the drives were working intermittently. Thankfully, the company had a backup of their vital data.

RAID lessons

A RAID is not a substitute for a backup system; it is just one part of your business continuity strategy. A RAID allows you to continue working while the repair is carried out. A properly implemented RAID reduces the chances of catastrophic failure.

Data recovery from a RAID-5 is far more expensive and difficult than data recovery from a mirror or a single drive. This means a good backup system is MORE important if you are running RAID-5.

Don't put all your faith in a RAID controller. Sometimes controllers go bad. Do some research. Be wary of systems that make it hard to change to a non-hardware RAID set up.

Understand the underlying technologies. If you are using hardware RAID-5 it probably uses a proprietary algorithm to write data across the array. If your concern is redundancy, then don't stripe (RAID-0) in software or hardware. Concatenation is rarely useful.

Understand Redundancy. RAID is not a simple technology, and there are a number of things that trip up the inexperienced. Use drives from different batches: you are more likely to have two drives fail in quick succession if they are from the same batch. Consider adding a hot spare or using a higher RAID level. The longer it takes to get your RAID back up and running, the greater the risk of that second drive failing; long rebuild times can push the chance of a second drive failure above acceptable limits.
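
To put rough numbers on the rebuild-time point, here is a back-of-envelope sketch. It assumes drives fail independently at a constant annual failure rate (AFR), which is exactly the assumption that breaks down with a bad batch, so treat the result as a floor rather than an estimate. The function name and any figures you feed it are hypothetical.

```shell
#!/bin/sh
# Sketch: chance that at least one surviving drive fails during a rebuild,
# assuming independent failures at a constant annual failure rate (AFR).
# All input figures are hypothetical; same-batch drives will do worse.

# second_failure_risk AFR REMAINING_DRIVES REBUILD_HOURS
second_failure_risk() {
    awk -v afr="$1" -v n="$2" -v hours="$3" 'BEGIN {
        # Per-drive probability of surviving the rebuild window
        survive = (1 - afr) ^ (hours / 8760)
        # Probability that not all n remaining drives survive
        printf "%.6f\n", 1 - survive ^ n
    }'
}
```

For example, second_failure_risk 0.05 3 48 gives the risk for three surviving drives, a made-up 5% AFR and a 48 hour rebuild. Doubling the rebuild time roughly doubles the risk while the numbers stay small, which is the argument for hot spares: the rebuild starts at once instead of whenever someone notices.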

Educate Users. IT support won't always be around when something goes 'funny'. If users know what to look out for, and they know that a flashing amber light or a beep is not normal, then they can report it. The sooner the problem is noticed, generally the better the outcome is.

Have a backup strategy. An offsite backup makes sense.


Installing eyeOS on Mac Leopard Server 10.5 (and probably 10.6)

22/02/10 | by admin | Categories: Software, Mac

eyeOS is a PHP-based web/cloud app, which runs happily on Apache with the PHP5 module.

Mac OS X Server 10.5 in its default install almost satisfies the requirements for eyeOS. There are just a couple of bits to tidy up.

So, where to start?

Install Mac OS X Server 10.5, run the updates as you see fit.

From Server Admin go to the Web service and enable the PHP module. Then create a website.

Download eyeOS, extract and point your website to the eyeOS folder. I moved the eyeOS folder to /Library/WebServer/Documents. You need the whole extracted folder.

Change privileges on the eyeOS folder to enable read/write for all (not sure about this bit, might get away with lower security settings).

Now for the tricky bit:

You need to have three PHP extensions installed and active:
php-mbstring
php-imap
php-sqlite
php-mbstring and php-sqlite are part of the default config for Mac OS X Server, so we just need php-imap. Even php-imap is not strictly necessary: it is only required for email support.

So to enable the php-imap extension, we need to download the source and compile it.

Fortunately, someone has done this on Leopard before, so I based my install on their instructions.

The steps I used were:
Get the c-client IMAP libraries by downloading, extracting, compiling, and copying the libraries

$ curl -O ftp://ftp.cac.washington.edu/imap/c-client.tar.Z
$ tar -zxvf c-client.tar.Z
$ cd imap-2007e
$ make oxp EXTRACFLAGS="-arch ppc -arch ppc64 -arch i386 -arch x86_64 -g -Os -pipe -no-cpp-precomp"

NB: the extracted folder may have a different name. When I extracted it was imap-2007e, but the original instructions refer to imap-2007b.
When compiling the source under Snow Leopard, you will need to leave out the references to ppc "-arch ppc -arch ppc64" because there is no PPC under Snow Leopard.
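
The commands above stop at the compile step. The copy step might look like the sketch below, which assumes the build left the headers and c-client.a in the c-client subfolder, and that the destination prefix is the same one passed to --with-imap later on (/usr in my case). The function name and the SUDO variable are my own conventions, not part of the imap distribution.

```shell
#!/bin/sh
# Sketch: copy the freshly built c-client headers and static library to
# where PHP's configure script can find them.
# Assumption: PREFIX must match the --with-imap= value passed to configure.

# install_c_client SRC_DIR [PREFIX]
# Set SUDO="" to run without sudo (e.g. when copying into a scratch prefix).
install_c_client() {
    src="$1"
    prefix="${2:-/usr}"
    sudo_cmd="${SUDO-sudo}"
    $sudo_cmd mkdir -p "$prefix/include" "$prefix/lib"
    # Headers (rfc822.h, mail.h, ...) produced by the build
    $sudo_cmd cp "$src"/c-client/*.h "$prefix/include/"
    # The static library, renamed to the conventional lib*.a form
    $sudo_cmd cp "$src/c-client/c-client.a" "$prefix/lib/libc-client.a"
}
```

Run from the folder that contains the extracted source, this would be install_c_client imap-2007e /usr.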

Next, get the current php source, extract, compile the php-imap extension

First find out the version of PHP you are currently running by using:
$ php -v

Download the corresponding version (in my case on 10.5.8 with security patch it is 5.2.11), and extract

$ curl -O http://www.opensource.apple.com/source/apache_mod_php/apache_mod_php-44.2/php-5.2.11.tar.bz2
$ tar -xjf php-5.2.11.tar.bz2

NB: if the correct source version is not at Apple, try php.net. Something like:
$ curl -O http://www.php.net/distributions/php-5.2.11.tar.bz2
$ tar -xjf php-5.2.11.tar.bz2
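
If you would rather not type the version by hand, the two steps can be glued together. This is just a sketch: the helper names are made up, and the URL pattern is simply the php.net one used above, which may not hold for every old release.

```shell
#!/bin/sh
# Sketch: pull the version number out of `php -v` and build the matching
# php.net source URL. Older releases may have moved, so treat the result
# as a starting point.

# php_version_from: read `php -v` output on stdin, print e.g. 5.2.11
php_version_from() {
    awk 'NR == 1 && $1 == "PHP" {print $2}'
}

# php_source_url VERSION
php_source_url() {
    printf 'http://www.php.net/distributions/php-%s.tar.bz2\n' "$1"
}
```

Usage would be along the lines of curl -O "$(php_source_url "$(php -v | php_version_from)")".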

change directory to the imap extension in the untarballed source
$ cd php-5.2.11/ext/imap

compile
$ phpize
$ MACOSX_DEPLOYMENT_TARGET=10.5 CFLAGS="-arch ppc64 -g -Os -pipe -no-cpp-precomp" CCFLAGS="-arch ppc64 -g -Os -pipe" CXXFLAGS="-arch ppc64 -g -Os -pipe" LDFLAGS="-arch ppc64 -bind_at_load" ./configure --with-kerberos=/usr --with-imap-ssl=/usr --with-imap=/usr
$ make
$ sudo make install

modify the php.ini

To do this I first had to make a php.ini file by
$ sudo cp /etc/php.ini.default /etc/php.ini

Then
$ echo "extension=imap.so" | sudo tee -a /etc/php.ini

and also you'll want to comment out the line 'extension_dir = "./"' in the php.ini file, since it will interfere with the extension loading (apparently).
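
Both php.ini changes can be scripted so they are safe to repeat. A minimal sketch, assuming the extension_dir line looks exactly like the default one quoted above; the function name is my own, and for /etc/php.ini the script needs to run as root.

```shell
#!/bin/sh
# Sketch: enable imap.so in a php.ini and comment out `extension_dir = "./"`.
# Edits a temporary copy and writes it back, which avoids `sed -i` (whose
# syntax differs between BSD and GNU sed) and keeps the file's permissions.

# enable_imap INI_FILE
enable_imap() {
    ini="$1"
    tmp="$ini.tmp.$$"
    # Comment out an active extension_dir = "./" line, leave the rest alone
    sed -e 's|^\([[:space:]]*extension_dir[[:space:]]*=[[:space:]]*"\./".*\)$|;\1|' "$ini" > "$tmp" || return 1
    # Append the extension line only if it is not already there
    grep -q '^extension=imap\.so' "$tmp" || echo 'extension=imap.so' >> "$tmp"
    cat "$tmp" > "$ini" && rm -f "$tmp"
}
```

Running enable_imap /etc/php.ini a second time changes nothing, so it does not matter if you lose track of whether you already did it.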

Check your work with:
$ php -m

Now, restart the web service

$ sudo serveradmin stop web
$ sudo serveradmin start web

And then browse to the site you set up with eyeOS, and carry on with the web-based setup.
