OSX Server Migrations – The absolute opposite of fun

Ever start an upgrade or change a setting on a production server, or even a testbed server, and get that nervous, excited feeling in the pit of your stomach? Like everyone else’s, my stomach leads to my intestines, and beyond that, well, you get the point. So when upgrades go smoothly I’m sitting pretty; when they don’t, I make regular trips to the men’s room. Apple has QA issues with its server software. Since 10.7 I’ve had that feeling with every operating system and Server.app upgrade. Microsoft might be evil to some folks and largely irrelevant these days, but they know how to make stable server software and, more importantly, how to document it fully. Apple, God love ’em, doesn’t have that gift.

The issues at hand are easily avoided with a “nuke and pave” attitude, and it’s something I wish I could do every summer. Unfortunately my job is rarely easy or fun, so nuke and paves happen only when something breaks. Guess what? OS X Server breaks itself quite frequently.

The migration from 10.8.5 and Server 2.2.2 to 10.9.3 and Server 3.1 seemed smooth. The OS upgraded without a hitch, and Server.app installed with no nastygrams thrust at me on the screen. No, it’s when you navigate to Open Directory, Certificates, Logs, and Profile Manager that things take a turn for the worse.

Open Directory seems to run on gasoline mixed with dirty puddle water. The moment you turn it over and take it for a drive, it’s a ticking time bomb. Edit a user the wrong way in Workgroup Manager and watch Server.app blow a gasket. Attempt to remove a replica and watch the OD Master spew radiator fluid like a geyser. For me, replication has always been fraught with issues. If you plan a fresh OS install on a replica, you’d better remove that replica first. God forbid a hardware failure occurs: not even your best Terminal-fu can get rid of the replica node in Server.app. Luckily Time Machine is around, along with a great APS app that automates OD backups for extra peace of mind. If I were forbidden from taking backups, I’d quit my job faster than... well, pretty fast.

Open Directory issues found after the 10.9 upgrade

  • One user so far had his account disabled even though Server.app, Workgroup Manager, Directory Utility, pwpolicy, and dscl all reported it as enabled. I had to delete the account and recreate it with a different UID.
  • Somewhat cryptic log messages hinting at doom in the Kerberos database and/or Password Server for a handful of users. Despite this, things are stable, and password resets, modifications, etc. still work.
  • A grocery-bill-sized certificate renewal list. If Apple intends to dumb down the interface and remove useful features, it needs to consolidate settings in a more useful way.
  • Replicas would not reattach without big yellow exclamation triangles shoved in your face. Between server restarts and removing cruft from the replica itself, they eventually reattached. Psychological trickery helps too.

We then come to Profile Manager. Apple pioneered the MDM protocol, so one would expect Profile Manager to be the height of greatness. Eh..., not so much. Upgrade experience aside, if it’s not working at 99.9 or 100%, it’s working at -10%: push notifications go out when it feels like it, navigation is slower than molasses on a bitterly cold day, and then there are the push certs, Open Directory certs, certs, certs, certs. Eventually it all succumbs to the cruft that builds up over the year, and when it comes time to renew those certificates you’d better pray everything goes smoothly. MCX was much less error-prone, and its settings applied without fuss. Luckily a large portion of MCX still lives on in 10.9 and will hopefully continue into 10.10 (yes, I know 10.1 was already used, but they added a zero so it’s all good). Between open- and closed-source scripts and applications, setbacks with profiles can be corrected easily.

Profile Manager issues after the 10.9 upgrade

  • Despite a loginwindow setting being applied in a Device Group for all of our OS X machines, it was not applied on the clients. The mobileconfig file was missing the payload entirely. After some brain-racking and other trickery, I tried creating separate Device Groups, each with a single payload, and nesting a single Device Group containing the actual clients. That worked, but it was messy and I didn’t like it. I eventually found that when I combined settings into one profile, if it contained any payload from the Security & Privacy section, everything in the loginwindow payload failed to apply. Once the Security & Privacy payload was dropped, the clients applied the entire combined profile.
  • As with Open Directory, a myriad of unexplained errors pile up in profilemanager.log and system.log.
  • A corrupt group profile, which means I either A. drop the profile completely from the OD group or B. recreate both the profile and the OD group.
  • An odd error showed up after the migration claiming that Postgres wasn’t running when it in fact was. Unless PM runs on unicorn farts and pixie dust, I doubt that error was grounded in reality.
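When a client isn’t getting a setting, the quickest sanity check is the one from the first bullet above: look inside the mobileconfig and see whether the payload is there at all. A mobileconfig is just a property list: a top-level dictionary whose PayloadContent array holds one dictionary per payload, each tagged with a PayloadType such as com.apple.loginwindow. Below is a minimal Python sketch that lists those types. It assumes the profile is unsigned; a signed profile is wrapped in a CMS envelope that must be unwrapped first (for example with macOS’s `security cms -D -i`). The script and function names are hypothetical, not part of Server.app.

```python
import plistlib
import sys

# PayloadType identifier used by the loginwindow payload.
LOGINWINDOW = "com.apple.loginwindow"


def payload_types_from_bytes(data: bytes) -> list:
    """Return the PayloadType of each payload in an unsigned .mobileconfig.

    Assumption: the profile is a plain XML or binary plist. Signed profiles
    are CMS-wrapped and will not parse until they are unwrapped.
    """
    profile = plistlib.loads(data)
    return [p.get("PayloadType", "<missing>") for p in profile.get("PayloadContent", [])]


def has_loginwindow(data: bytes) -> bool:
    """True if the profile carries a loginwindow payload."""
    return LOGINWINDOW in payload_types_from_bytes(data)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 check_profile.py path/to/profile.mobileconfig
    with open(sys.argv[1], "rb") as f:
        raw = f.read()
    for payload_type in payload_types_from_bytes(raw):
        print(payload_type)
    if not has_loginwindow(raw):
        print("WARNING: no loginwindow payload found")
```

Run against the profile a client actually received, a missing com.apple.loginwindow line reproduces the symptom from the first bullet, which at least tells you the problem is on the server side rather than the client side.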

File sharing has been the most stable service and the least affected by the upgrade. The only gotcha I’ve found so far is that some users aren’t being unmounted/disconnected from the AFP service. This could be a “hint hint” to switch to the SMB2 service.

The one service in which I already see great potential is the Caching service. With multiple caching servers across campus, it will greatly cut down on bandwidth from external connections (i.e., the internet). With our 1:1 iPad initiative, this will certainly speed up iOS app distribution.

Hopefully Server 4 is more mature and Apple doesn’t pull another ship-then-fix.

Next year it’s clean installs and fresh user accounts.