This may seem like a bit of a strange title, but it's a subject I've been struggling with for the longest time and really just need to get off my chest. I've put off writing this post again and again, but I still feel it needs to be said.
Firstly, I'll start by saying I think Citrix Provisioning Services is an amazing piece of technology. Hosting, maintaining and deploying a single image to as many servers as you like is a solution you would have killed for a few years ago, and the fact that it's rolled into the Platinum licensing model only further whets the appetites of potential customers. In short, it's a wonderful product and you would be mad (in most cases) not to use it.
Citrix Provisioning Services is also a consultant's best friend: you need only install a single XenApp server, configure it with best practices, install some software, negotiate some caveats with antivirus/EdgeSight, document the change procedure, then deploy en masse. The customer is amazed you got the job done so quickly and you can retreat in success, safe in the knowledge that if the customer is capable of reading and following instructions, it should be OK.
This works great for a small, local company deployment where a single administrator oversees the changes and understands the process himself. But as it scales up, paranoia (for me) sets in.
Here's the bit I don't like:
1: Citrix Provisioning Services employs the assumption that you must store as many revisions of the golden image as you can afford, to reduce your chances of being caught out. This may not sound like a big issue to some, but with a large rate of change in the environment (say even 5 changes a month) and a large golden image (let's say 100 GB), your storage costs will quickly become unjustifiable over time.
I am aware Provisioning Services 6 has the ability to layer changes, but even Citrix recommend chaining no more than 8 layers before consolidating these images. Even with the above figures this would spiral quickly.
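To put rough numbers on the figures above (a 100 GB golden image, 5 changes a month, every revision kept as a full copy — illustrative assumptions, not a storage-sizing model), a quick back-of-the-envelope sketch:

```python
# Back-of-the-envelope storage growth for retained golden images.
# Figures are illustrative, taken from the scenario in the text.
IMAGE_SIZE_GB = 100      # size of one golden image
CHANGES_PER_MONTH = 5    # full-image revisions created per month

def storage_after(months: int) -> int:
    """Total GB consumed if every revision is kept as a full copy."""
    return IMAGE_SIZE_GB * CHANGES_PER_MONTH * months

for months in (1, 6, 12):
    print(f"{months} months: {storage_after(months)} GB")
```

That's 500 GB a month and 6 TB after a single year, before you've stored anything else, which is why "just keep more images" doesn't scale as a safety net.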
2: Citrix Provisioning Services employs the assumption that the administrators are going to fully document the changes they made between images.
I'll openly admit I don't trust techies to follow even the simplest personal hygiene routine, never mind change management procedures. You can try to protect yourself with these procedures till you are blue in the face, but a process is only as good as your laziest or most forgetful administrator, and ultimately it's your ass in the hotseat when the shit hits the fan.
3: Corruption. If corruption occurs in an infrequently accessed part of the registry or file system, or in a seldom-used application, before your oldest golden image backup, you are well and truly sunk, unless you have full and concise documentation of how that oldest image came to be and all the necessary steps thereafter.
4: Ever troubleshot an application under intense pressure, made 4-5 changes at once and found the problem fixed, but the urgency dictated the change needed to go live yesterday? This, basically.
Even the most meticulous administrator will suffer pressure-related shortcuts, and when the pressure is off it's all too easy to forget that you still need to perform root cause analysis on the issue. Provisioning Services allows these shortcuts all too easily.
But what are the alternatives to ensure control?
Who said scripting?
Products that use scripted installs, such as RES Automation Manager, FrontRange DSM and Microsoft's WDS/MDT, have the benefit of forcing the technical guys to script the task ahead of time. This means they must fully test the install procedure before it gets to the production environment. Scripting the install forces documentation of some kind ahead of time and ensures the package sources are kept in a shared, backed-up location.
The downsides to this method are the time taken to script a deployment, the potential (albeit rare) unscriptable changes, and the fact that deploying an image from bare metal to the last job can be quite lengthy.
“But what if you use these tools to manage the golden image, then deploy via Citrix PVS?”
Well, you could do this, but then what are you really achieving? Sure, you have a scripted install of your golden image, but you've gone to the bother of scripting everything, performing a full reinstall (to ensure full operation), then importing the image into PVS. You also now have a licensing overhead (in most cases) for each server you deploy thereafter, as the vendor's software is installed in the golden image.
I've spoken to three vendors in this market and they all got a little uncomfortable with the idea of using their product up to the finish of the build, then uninstalling it in favour of PVS before sealing the image. In RES Automation Manager's case, if you leave it installed you at least have the benefit of automating user-based tasks as well, so I suppose it's not a great loss.
Whichever way you turn, ask yourself: what are you really still achieving with PVS if you have already scripted the entire process?
And what could Citrix do to address my concerns?
To be honest, I'm not sure. I spoke about this frequently at Synergy last year and there were quite a few people with similar concerns. Either way, here's my 2 cents.
Scan and report:
I would like to see Provisioning Services scan the file system, registry and known keys (e.g. installed applications) of the old and new image, then perform a comparison (much like RegShot) and report on the changes each time a new revision is added. This scan would identify key changes to both structures and provide a report of change, e.g.:
- New folder: c:\program files (x86)\application1
- New file: c:\program files (x86)\application1\runme.exe
- Newer file: c:\windows\system32\drivers\etc\hosts
- Deleted file: c:\windows\system32\some.dll
- New registry value: hklm\software\somevendor\dangerouskey – DWORD – 1
Smart reporting too:
- New system ODBC connection: Application1
- New installed application: Application1
- Windows updates applied: KBxxxxx, KBxxxxy
Etc, you get the idea…
This scan should be mandatory, and the results should be stored in the Provisioning Services database, where they can be referenced or searched for keywords at a later point. The scan should also include a mandatory sign-off from the upgrading administrator in question, so you have accountability for who made the change.
I can't see this being difficult to build; even a PowerShell script could perform the comparison, and it would give you real certainty about what has happened between images.
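To illustrate the kind of comparison I mean (a rough sketch, not anything Citrix ship — I've used Python rather than PowerShell here, and the report format is invented), you snapshot the file system before and after a change, then diff the two snapshots:

```python
import hashlib
import os

def snapshot(root: str) -> dict[str, str]:
    """Map each file path under root to a hash of its contents."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    result[path] = hashlib.md5(f.read()).hexdigest()
            except OSError:
                pass  # skip files we can't read (locked, no permission)
    return result

def diff(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Produce a RegShot-style change report between two snapshots."""
    report = []
    for path in sorted(after.keys() - before.keys()):
        report.append(f"New file: {path}")
    for path in sorted(before.keys() - after.keys()):
        report.append(f"Deleted file: {path}")
    for path in sorted(before.keys() & after.keys()):
        if before[path] != after[path]:
            report.append(f"Newer file: {path}")
    return report
```

The same before/after diff would apply to exported registry hives or an installed-applications list; the point is that the comparison is cheap to automate, so there is no excuse for the product not doing it on every new revision.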
To really get across what I'm trying to say: as techies we love detailed exam questions, so below are some real-life examples of where I've seen Citrix Provisioning Services fall down in production environments.
George works as an architect in Company A, George’s SBC environment publishes hosted desktops for over 5,000 users on roughly 200 XenApp servers.
- Company A has a large application landscape of roughly 1,000 applications.
- Company A has a rapid rate of change, with an estimated 2-4 changes per week.
- Company A's golden image, including the App-V cache, is over 100 GB.
- Due to the vast size of the golden image, only the last 20 images are stored.
- George has employed a strategy to ensure all changes to the XenApp environment are fully documented in a SharePoint instance.
- The corporate policy on acceptance testing dictates no additional changes can be processed until the acceptance test is complete. The acceptance testing period is 10 business days.
Scenario A: An issue has been experienced recently and an Office application is crashing. This crash has been ongoing for roughly 6 months but never generated enough steam until now. After troubleshooting, George finds a DLL called OpenEX.dll has been added to Excel, generating random crashes.
Due to a breakdown in documentation there is no record of where this DLL came from. The DLL has been included in the last twenty images and it's unclear who needs it…
Scenario B: After a recent purge and merge of layers to the master golden image, the Super Calculation 1 application requires an upgrade to version 2. During upgrade testing it is found that this application requires an uninstallation of version 1 before installing version 2.
The uninstallation fails and cannot be completed. No stored golden image predates the installation of Super Calculation 1. The support vendor cannot assist, as they insist a reinstall is required.
Scenario C: Roughly a year ago, a new piece of software (Craptastic App) was purchased by the Finance team. A consultant was employed to assist the administration team in deploying the software; due to a communication error between the project manager and the consultant, no documentation or source files were provided on how to install the application.
The administrator involved has since been fired for stealing pens.
Finance are performing end-of-year calculations and a seldom-used part of Craptastic App is no longer working; it worked last year when they performed closing statements, but this year it errors out. At some point you realise an ODBC connection was removed from your image, but you are not sure which image contains the correct one. You have 20 images, one or none of which may contain the right value.
The software vendor has closed early for a Christmas party. They won't be back till Monday; you won't have a job on Monday.
This issue needs to be fixed yesterday.
Andrew, you're being a pedantic, paranoid eejit.
Maybe, but I'd rather be the above than suffer a major loss of face, or loss of job, from the wrong recommendation!
I'd really be interested in your real-world experiences in similar scenarios to the above:
- How do you enforce documentation and change management?
- Do you need to regularly police these kinds of changes?
- Do you share these concerns?
- Is a certain amount of trust (or faith) in administrators required?