
On IOPS, shared storage and a fresh idea. (Part 3) Tying it all together in the stack

Note: This is part three; have a read of parts one and two.

Hello there, and thank you for dropping back for part 3…

I suppose I should start with the disappointing news that I have yet to test this option for VDI in a Box. And despite Aaron Parker’s suggestions, it wasn’t due to lack of inspiration, it was down to lack of time! This series has gathered a lot of interest from community and storage vendors alike, and I feel I should set the record straight before I go any further:

  1. This isn’t a production idea; you would be crazy to use this idea in a live environment.
  2. Throughout this entire project, we’re focusing on pooled stateless. Stateful desktops would be a separate post entirely.
  3. This wasn’t an attack on products in this market space, merely a fresh view on an old problem.
  4. If I had the skills or funds necessary to take this project to a production solution, I wouldn’t have posted it. I would already be hard at work creating a reasonably priced product!

Now that my declarations are out of the way, I’d first like to talk about the moral of the story. This isn’t an unfamiliar expression:

IOPS mitigation is not about read IOPS it’s about WRITE IOPS!

VMware, Citrix and Microsoft have similar, yet very different, solutions for read IOPS negation. Similar in the sense that they all try to negate storage read IOPS. But the key difference with XenServer is that its local disk cache, IntelliCache, can cache the majority of reads to local disk (think SSD*) out of the box, without the baked-in 512 MB soft limit of the Microsoft Hyper-V and VMware equivalents.

Long story short, VMware and Microsoft’s solutions give you roughly 512 MB of recognisable read IOPS negation un-tuned, but enabled. Of course this value can be tuned upwards, but the low default cache size would suggest, at least to me, that tuning it up will have an upsetting effect on the host.

This, to me, is why IntelliCache has the upper hand in the (value-add product) VDI space for read IOPS negation, and Citrix even throw in the hypervisor as part of your XenDesktop licensing, so win-win. But what about those pesky write IOPS?


On IOPS, shared storage and a fresh idea. (Part 2) GO GO Citrix Machine Creation Services.

Welcome back to part two of this little adventure, exploring the RAM caching ability of our new favourite little file system filter driver, the Microsoft Enhanced Write Filter. If you missed part 1, you can find it here.

So after the first post, my mailbox blew up with queries on how to do this, how the RAM cache weighs up versus PVS, and how you can manipulate and “write out” to physical disk in a spill-over. So before I go any further, let’s have a quick look at the EWF.

Deeper into the EWF:

As previously mentioned, the EWF ships in all Windows Embedded operating systems and also Windows Thin PC. The EWF is a mini filter driver that redirects all reads and writes to memory or the local file system, depending on where the file currently lives.

In the case of this experiment, we’re using the RAM REG mode of the EWF. Have a read of that link if you would like more information, but basically the EWF creates an overlay in RAM for the files to live in, and the configuration is stored in the registry.

Once the EWF is enabled and the machine rebooted, on next login you can run ewfmgr -all from an administrative command prompt to view the current status of the EWF:
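If you'd rather check the overlay state programmatically than eyeball the console, a small script can shell out to ewfmgr and parse its report. This is only a sketch: the exact output layout varies between Embedded releases, so the field labels used below are assumptions based on a typical ewfmgr report.

```python
import subprocess

def ewf_status(volume="c:"):
    """Run ewfmgr for a volume and return its report as a dict.

    Assumes ewfmgr is on the PATH (Windows Embedded / Thin PC only).
    """
    out = subprocess.run(
        ["ewfmgr", volume], capture_output=True, text=True, check=True
    ).stdout
    return parse_ewf_report(out)

def parse_ewf_report(text):
    """Turn an ewfmgr report into a {label: value} dict.

    Splits each line on its first run of whitespace; multi-word labels
    (e.g. "Boot Command") will need special-casing in real use.
    """
    info = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            info[parts[0]] = parts[1]
    return info
```

On the box itself, `ewf_status("c:")["State"]` would tell you whether the filter is ENABLED before you start testing.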

So what happens if I write a large file to the local disk?

Well, this basically! Below I have an installer for SQL Express which is roughly 250 MB. I copied this file to the desktop from a network share, and as you can see below, the file has been stuck into the RAM overlay.

And that’s pretty much it! Simple stuff for the moment. When I venture into the API of the EWF in a later post, I’ll show you how to write this file out to storage if we are reaching capacity, allowing us to spill to disk and address the largest concern currently surrounding Citrix Provisioning Server and RAM caching.
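A spill-over guard could be as simple as a loop that watches how full the overlay is and flushes it past a threshold. The sketch below is purely illustrative: the capacity and threshold figures are made up, `overlay_bytes_used` is a hypothetical probe the caller supplies (in practice the EWF API, or parsed ewfmgr output, would be the real source of that figure), and whether `ewfmgr -commit` is the right flush mechanism for a given scenario is an assumption.

```python
import subprocess
import time

# Hypothetical numbers: a 2 GB RAM overlay that we flush at 80% full.
OVERLAY_CAPACITY = 2 * 1024 ** 3
SPILL_THRESHOLD = 0.8

def should_spill(bytes_used, capacity=OVERLAY_CAPACITY, threshold=SPILL_THRESHOLD):
    """Return True once the overlay passes the spill threshold."""
    return bytes_used >= capacity * threshold

def guard_loop(overlay_bytes_used, poll_seconds=30):
    """Poll a probe callable and commit the overlay when it runs hot.

    overlay_bytes_used is caller-supplied; this sketch never measures
    the overlay itself.
    """
    while True:
        if should_spill(overlay_bytes_used()):
            # Ask EWF to write the overlay contents down to the
            # protected volume, freeing RAM before a blue screen.
            subprocess.run(["ewfmgr", "c:", "-commit"], check=True)
        time.sleep(poll_seconds)
```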

Next up to the block. Machine Creation Services…

I won’t go into why I think this technology is cool, I covered that quite well in the previous post. But this was the first technology I planned to test with the EWF.

So I created a lab at home consisting of two XenServers and an NFS share, with MCS enabled as below:

Now before we get down to the nitty gritty, let’s remind ourselves of what our tests looked like before and after using the Enhanced Write Filter in this configuration, when booting, using and shutting down a single VM:

As with last time, black denotes writes, red denotes reads.


So looking at our little experiment earlier, we pretty much killed write IOPS on this volume when using the Enhanced Write Filter driver. But the read IOPS (in red above) were still quite high for a single desktop.

And this is where I had hoped MCS would step into the fray. Even on the first boot with MCS enabled the read negation was notable:


Test performed on a newly built host.


At no point do reads hit the 400+ peak we saw in the previous test, but we still see spiky read IOPS as IntelliCache begins to read and store the image.

So now that we know what a first boot looks like, I kicked the test off again from a pre-cached device. This time the image should be cached on the local disk of the XenServer, as I’ve booted the image a number of times.

Below are the results of the pre-cached test:


Holy crap…


Well, I was incredibly impressed with this result… But did it scale?

So next up, I added additional desktops to the pool to allow for 10 concurrent desktops (the constraints of my lab size).



Once all the desktops had booted, I logged in 10 users and sent a mass shutdown command from Desktop Studio, with fairly cool results:



And what about the other way around: what about a 10-VM cold boot?


Pretty much IO-free boots and shutdowns!


So let’s do some napkin math for a second.


Taking what we’ve seen so far, let’s get down to the nitty gritty and look at maximums / averages.

In the first test, minus EWF and MCS, from a single desktop, we saw:

  • Maximum Write IOPS: 238
  • Average Write IOPS: 24
  • Maximum Read IOPS: 613
  • Average Read IOPS: 83

And in the end result, with EWF and MCS, and even with the workload times 10, we saw the following:

  • Maximum Write IOPS: 26 (2,380*)
  • Average Write IOPS: 1.9 (240*)
  • Maximum Read IOPS: 34 (6130*)
  • Average Read IOPS: 1.3 (830*)

* Denotes the original single-desktop figures times 10
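The napkin math is simple enough to script. The snippet below scales the single-desktop baseline by ten (the starred figures) and works out the reduction the EWF-plus-MCS stack achieved against that scaled workload; the figures are the ones from the tests above.

```python
# Single-desktop baseline without EWF or MCS (from the first test).
baseline = {"max_write": 238, "avg_write": 24, "max_read": 613, "avg_read": 83}

# Measured with EWF + MCS across ten desktops.
measured = {"max_write": 26, "avg_write": 1.9, "max_read": 34, "avg_read": 1.3}

# The starred figures: the baseline scaled to a ten-desktop workload.
scaled = {k: v * 10 for k, v in baseline.items()}
print(scaled)  # {'max_write': 2380, 'avg_write': 240, 'max_read': 6130, 'avg_read': 830}

# Percentage reduction versus the scaled baseline.
for k in measured:
    cut = 100 * (1 - measured[k] / scaled[k])
    print(f"{k}: {cut:.1f}% fewer IOPS")
```

Every figure comes out north of a 98% reduction against the scaled baseline, which is why the shutdown and boot graphs look so flat.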

I was amazed by these results: two readily available technologies coupled together to tackle a problem in VDI that we are all aware of and regularly struggle with.


What about other, similar technologies?

Now, VMware and Microsoft have their own technologies similar to MCS (CBRC and CSV Cache, respectively). I would be interested in seeing similar tests to see if their solutions can match the underestimated IntelliCache. So if you have a lab with either of these technologies, get in touch and I’ll send you the method to test this.


What’s up next:

In the following two blog posts, I’ll cover off the remaining topics:

  • VDI in a Box.
  • EWF API for spill over
  • Who has this technology at their disposal?
  • Other ways to skin this cat.
  • How to recreate this yourself for your own test.

Last up, I wanted to address a few quick queries I received via email / comments:

“Will this technology approach work with Provisioning Server and local disk caching, allowing you to leverage PVS but spill to a disk write cache?”

No. The Provisioning Server filter driver has a higher altitude than the poor EWF, so PVS grabs and deals with the write before EWF ever sees it.

“Couldn’t we just use a RAM disk on the hypervisor?”

Yes, maybe, and not yet…

Not yet: with Citrix MCS and Citrix VDI in a Box, separating the write cache and identity disk from the LUN on which the image is hosted is a bit of a challenge.

Maybe: if using Hyper-V v3 with shared-nothing migration, you now have migration options for live VMs. This would allow you to move the write cache / identity disk from one RAM cache to another.

Yes: if using Citrix Provisioning Server, you could assign the write cache to a local storage object on the host where the VM lives. This would be tricky with VMware ESXi and XenServer, but feel free to give it a try; Hyper-V, on the other hand, would be extremely easy, as many RAM disks are available online.

“Atlantis ILIO also has inline dedupe, offering more than just RAM caching?”

True, and I never meant, even for a second, to say this technology from Atlantis was anything but brilliant. But with RAM caching on a per-VM basis, wouldn’t VMware’s Transparent Page Sharing also deliver similar benefits, without the associated cost?

On E2E, Geek speak, IOPS, shared storage and a fresh idea. (Part 1)

Note: this is part 1 of a 4-post blog series.

While attending E2EVC Vienna recently, I found myself in the Citrix Geek Speak session. Although the opportunity to rub shoulders with fellow Citrix aficionados was a blast, I found myself utterly frustrated spending time talking about storage challenges in VDI.

The topic was interesting and informative, and while there were a few ideas shared about solid state arrays and read IOPS negation with PVS or MCS, there really wasn’t a vendor-based, one-size-fits-all solution to reduce both read and write IOPS without leveraging paid-for (and quite expensive, I may add) solutions like Atlantis ILIO, Fusion-io, etc. The conversation was tepid and deflating, and we moved on fairly quickly to another topic.

So we got to talking…

In the company of great minds and my good friends Barry Schiffer, Iain Brighton, Ingmar Verheij, Kees Baggerman, Remko Weijnen and Simon Pettit, we got to talking over lunch about this challenge and the pros and cons of IO negation technologies.

Citrix Provisioning Server… 

We spoke about the pros and cons of Citrix Provisioning Server: namely that rather than actually reducing read IOPS, it merely moves the pressure onto the network rather than the SAN or local resources in the hypervisor.

The RAM caching option for differencing in Citrix Provisioning Server is really clever, but it’s rarely utilised due to the risk of filling the cache in RAM and the inevitable blue screen that follows.

Citrix Provisioning Server also requires quite a bit of up-front work to get the PVS server configured and highly available, then there’s the education process of learning how to manage images with Provisioning Server. It’s more of an enterprise tool.

Citrix Machine Creation Services…

We spoke about the pros and cons of Citrix Machine Creation Services and IntelliCache. This technology is much smarter about caching regular reads to a local device, but its adoption is mainly SMB, and the requirements for NFS and XenServer were a hard sell… particularly given a certain hypervisor’s dominance to date.

Machine Creation Services is stupidly simple to deploy: install XenServer, stick the image on an NFS share (even Windows servers can host these)… bingo. Image changes are based on snapshots, so the educational process is a moot point.

But again, MCS only negates read IO, and this assumes you have capable disks under the hypervisor to run multiple workspaces from a single disk or array. It’s also specific to XenDesktop, sadly, so no hosted shared desktop solution.

We mulled over SSD-based storage quite a lot with MCS, and we agreed that while MCS and IntelliCache could be leveraged on an SSD, the write cache / differencing disk activity would be so write-intensive that it would not be long before the SSD started to wear out.

The mission Statement:

So all this got us thinking: without paying for incredibly expensive shared storage, redirecting the issue onto another infrastructure service, or adding on to your current shared storage to squeeze out those precious few IOPS, what could be done with VDI and storage to negate this issue?

So we got back to talking about RAM:

RAM has been around for a long time, and it’s getting cheaper as time goes by. We pile the stuff into hypervisors for shared server workloads. Citrix Provisioning Server chews it up for caching images server side, and the PVS client even offers a RAM caching feature too, but at present that limits the customer to a Provisioning Server with XenDesktop or XenApp.

But what if we could leverage a RAM caching mechanism decoupled from a paid-for technology or Provisioning Server: one that’s freely available and can be leveraged in any hypervisor or technology stack, providing a catch-all, free and easy caching mechanism?

And then it hit me…


This technology has been around for years, cleverly tucked away in thin client architecture, and Microsoft have been gradually adding more and more functionality to this heavily underestimated driver…

Re-introducing, ladies and gents, the Microsoft Enhanced Write Filter.

The Microsoft Enhanced Write Filter has been happily tucked away in the Microsoft embedded operating systems since XP and is fairly unfamiliar to most administrators. The Enhanced Write Filter saves all disk writes to memory (except for specific chunks of the registry), resulting in the PC being clean each time it is restarted.

This technology has been mostly ignored unless you are particularly comfortable with thin client architectures. Interestingly, during my initial research for this mini project, I found that many car PC enthusiasts and early SSD adopters have been hacking this component out of Windows Embedded and installing it into their mainstream Windows operating systems to cut down on wear and tear to flash or solid state disks.
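Conceptually, the filter behaves like a copy-on-write overlay: writes only ever land in RAM, reads fall through to the protected volume unless a newer copy exists in the overlay, and a reboot empties the overlay. The toy model below is exactly that, a model to illustrate the semantics, not a representation of how the driver is actually implemented.

```python
class ToyWriteFilter:
    """A dictionary-based model of an EWF-style RAM overlay."""

    def __init__(self, protected_volume):
        self.disk = protected_volume   # the real, protected contents
        self.overlay = {}              # writes live here, in "RAM"

    def write(self, path, data):
        # Writes never touch the protected volume.
        self.overlay[path] = data

    def read(self, path):
        # Prefer the overlay copy; fall through to disk otherwise.
        return self.overlay.get(path, self.disk.get(path))

    def reboot(self):
        # The overlay is volatile: a restart discards every change.
        self.overlay.clear()

disk = {"C:/base.txt": "golden image"}
fs = ToyWriteFilter(disk)
fs.write("C:/base.txt", "scribbled on")
print(fs.read("C:/base.txt"))  # scribbled on
fs.reboot()
print(fs.read("C:/base.txt"))  # golden image
```

That "clean on every restart" behaviour is precisely what makes it interesting for pooled stateless desktops, where the image is meant to reset anyway.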

Microsoft have been gradually building this technology out, with each release adding to a powerful API that can view cache hits, list the contents of the cache and even write the cache out to disk as required: in an image update scenario… or, even better, when a RAM spill-over is about to happen…

If this technology is tried and trusted in the car PC world, could it be translated to VDI?

To the lab!

Excited by my research thus far, I decided to take the plunge. I began by extracting the EWF drivers and registry keys necessary to run this driver from a copy of Windows Embedded Standard…

With a bit of trial, error, hair pulling and programming, I managed to get this driver into a clean Windows 7 x86 image. From here I disabled the driver and started testing, first without this driver.

(I’m not going to go into how to extract the driver yet, please check back later for a follow up post)

Test A: XenDesktop, no RAM caching

To start my testing, I went with a XenDesktop image hosted on XenServer without IntelliCache. I sealed the shared image on an NFS share hosted on a Windows server, performed a cold boot, opened a few applications, then shut down again.

I tracked the read and write IOPS via Windows Perfmon on the shared NFS disk; below were my findings:

(Black line denotes Write IOPS to shared storage)

Test B: XenDesktop, with the Microsoft Enhanced Write Filter.

The results below pretty much speak for themselves…

(again, black line denotes write IOPS)

I couldn’t believe how much of a difference this little driver had made to the write IOPS…

Just for ease of comparison, here’s a side by side look at what exactly is happening here:

So where do we go from here:

Well, no matter what you do, Microsoft will not support this option, and they’d probably sue the pants off me if I wrapped this solution up for download…

But this exercise in thinking outside of the box raised some really interesting questions about other technologies and the offerings they already have in their stacks.

In my next few posts on this subject, I’ll cover the topics below at length and discuss my findings.

  • I’ll be looking at how to leverage this with Citrix Intellicache for truly IO less desktops.
  • I’ll be looking at how this technology could be adopted by Microsoft for their MED-V stack.
  • I’ll be looking at one of my personal favourite technologies, VDI in a Box.
  • I’ll be looking at how we could leverage this API for spill-over to disk in a cache flood scenario, or even a management appliance to control the spill-overs to your storage.

And finally, I’ll be looking at how Citrix or Microsoft could quite easily combine two of their current solutions to provide an incredible offering.

And that’s it for now, just a little teaser for the follow up blog posts which I’ll gradually release before Citrix Synergy.

I would like to thank Barry Schiffer, Ingmar Verheij, Kees Baggerman and Remko Weijnen for their help and input on this mini project. It was greatly appreciated.

Date and time shift when using Lotus Notes in Server 2008 R2 / XenApp

This was an extremely strange / rare issue, so I figured I would share it.

In this customer’s environment, they are using XenApp 6.5 on Server 2008 R2 for published desktops. This environment is a hosted desktop environment for a number of countries in Europe.

Infrequently, an issue could be observed where the users’ time zones would shift out by one or two hours within the Lotus Notes application. This would cause Sametime conversations and calendar times to display out by the aforementioned amount.

When this issue occurred, it happened to all users on the server. A restart of the server did not fix the issue.

Interestingly, “TZUtil /g” was reporting that the client was in the correct time zone:

If you ran “TZUtil /s "GMT Standard Time"”, then closed and reopened Lotus Notes, the problem was resolved for that user, in that session, until they logged off.

It’s worth pointing out, that this issue was only seen in Lotus Notes, not in any other application, java or otherwise.

When comparing the TimeZone settings from a problematic server to a working server, I found the following difference:

These keys are stored under the HKLM\SYSTEM\CurrentControlSet\Control\TimeZoneInformation key:


And the working server looked as follows:


Now that is weird! So we copied the correct keys from a working server to the broken servers, and the issue was resolved on all servers once users closed and reopened Lotus Notes.

But what caused this?

With a workaround in place, I began to dig deeper into what caused the time zone to change on the servers, despite the fact that no users have the ability to do so.

Analysing the logins to the servers, I spotted an administrator account logging into each of the affected servers as the day went by. This user didn’t log into the correctly working servers, so this was the first clue.

Now if you’ve used Lotus Notes combined with XenApp and time zones before, you’ll know it’s a complete nightmare. Interestingly, the administrator in question (me, shamefully) was logging onto a XenApp session from a client with a Linux time zone to replicate an issue.

More embarrassingly, I then decided to Remote Desktop from inside the XenApp session to the affected servers, and with my admin account being who it was… it seems I inadvertently changed the time zone of those servers.

That doesn’t sound right? You RDP’d from a client in a different time zone and it changed the server’s time zone?

I agree, but I have since been able to replicate this in a test environment. As of Server 2008, Microsoft handle time zone redirection themselves as part of Group Policy, and administrative accounts will intermittently change the time zone of the server.

Now most customers probably wouldn’t even notice this unless they are using Lotus Notes, as all other applications behaved correctly.

How do you work around this issue?

Ensure that the Group Policy you use to configure time zone redirection is configured not to apply to any local administrator who may log in to the XenApp server.

Citrix Clipboard issues in a published desktop environment

It amazes me in this day and age that something as fundamentally simple as the clipboard in a Windows environment can have issues, but it does happen… especially in a multi-user environment.

If you experience clipboard issues with Office, Lotus Notes and other copy-and-paste-capable programs, you may have a session memory issue!

The session view size and pool size have a massive part to play in 32-bit Citrix environments; these settings dictate the amount of graphical memory that can be assigned to each session. The default for these options on 32-bit is just 16 MB, which is ridiculously low these days.

Taking equal parts graphically heavy applications like Office 2007 and known Citrix killers like Lotus Notes, it really isn’t long until these limits spill over into horrible clipboard and copy / paste disasters. Take it one step further and implement Microsoft App-V, and you are in serious trouble.

If, like me, you run an environment that requires all of the above nasties, help is at hand. After 12 months of continued troubleshooting, I’ve found 64 MB to be a happy medium between too-low and too-high limits.

To test the same, simply run the following commands from an elevated prompt and reboot, waving goodbye to your horrible clipboard issues:

reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v SessionPoolSize /t reg_dword /d 64 /f
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v SessionViewSize /t reg_dword /d 64 /f


We currently run with 10 GB of memory, allowing us to allocate this extra 112 MB of memory per user. If you can’t spare that, consider a lower value or upgrade your RAM capacity.
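As a rough check on that memory budget, the arithmetic is simple enough to script. The numbers below assume both values are raised to 64 MB as above, and they treat the whole 10 GB as available to sessions, which is generous: this is worst-case napkin math, not a sizing guide.

```python
# Worst case: every session consumes its full view + pool allocation.
session_view_mb = 64
session_pool_mb = 64
per_session_mb = session_view_mb + session_pool_mb
print(per_session_mb)  # 128

# How many such sessions would fit in a 10 GB server (ignoring the
# OS and application footprint, so the real ceiling is lower).
server_ram_gb = 10
sessions = (server_ram_gb * 1024) // per_session_mb
print(sessions)  # 80
```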