Tag Archives: RAM Caching

On IOPS, shared storage and a fresh idea. (Part 2) GO GO Citrix Machine Creation Services.

Welcome back to part two of this little adventure exploring the RAM caching ability of our new favourite little file system filter driver, the Microsoft Extended Write Filter. If you missed part 1, you can find it here.

So after the first post, my mailbox blew up with queries on how to do this, how the RAM cache weighs up against PVS, and how you can manipulate it and “write out” to physical disk in a spill over. So before I go any further, let’s have a quick look at the EWF.

Deeper into the EWF:

As previously mentioned, the EWF is contained in all Windows Embedded operating systems and also Windows Thin PC. The EWF is a mini filter driver that redirects all reads and writes to memory or the local file system, depending on where the file currently lives.

In the case of this experiment, we’re using the RAMREG method of EWF. Have a read of that link if you would like more information, but basically the EWF creates an overlay in RAM for the files to live in, and the configuration is stored in the registry.
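To picture what a RAM overlay does, here is a minimal toy model in Python. It is only a sketch of the idea, not how the EWF is implemented: the class name `RamOverlay`, the block numbers and the dictionary standing in for the disk are all made up for illustration.

```python
class RamOverlay:
    """Toy model of a RAM write overlay sitting in front of a protected disk."""

    def __init__(self, disk):
        self.disk = disk      # protected volume: block number -> bytes (hypothetical)
        self.overlay = {}     # RAM overlay: block number -> bytes

    def write(self, block, data):
        # Writes never touch the protected disk; they land in RAM.
        self.overlay[block] = data

    def read(self, block):
        # Serve the block from RAM if it was changed, otherwise from disk.
        if block in self.overlay:
            return self.overlay[block]
        return self.disk[block]

    def reboot(self):
        # With a RAM overlay, a restart throws the changes away.
        self.overlay.clear()


disk = {0: b"boot", 1: b"data"}
vm = RamOverlay(disk)
vm.write(1, b"changed")
print(vm.read(0), vm.read(1))  # block 0 comes from disk, block 1 from RAM
vm.reboot()
print(vm.read(1))              # back to the original contents
```

The point to take away is that the underlying disk dictionary is never modified, which is why the write IOPS to shared storage drop away.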

Once the EWF is enabled and the machine rebooted, on next login you can run ewfmgr -all (from an administrative command prompt) to view the current status of the EWF:

So what happens if I write a large file to the local disk?

Well, this basically! Below I have an installer for SQL Express which is roughly 250 MB. I copied this file to the desktop from a network share and, as you can see below, the file has been stuck into the RAM overlay.

And that’s pretty much it! Simple stuff for the moment. When I venture into the API of the EWF in a later post I’ll show you how to write this file out to storage if we are reaching capacity, allowing us to spill to disk and address the largest concern surrounding Citrix Provisioning Server and RAM caching presently.
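Until that post, the spill over idea can be pictured with another toy model. To be clear, this is not the EWF API and makes no claim about how the EWF behaves: `SpillingOverlay`, the byte limit and the "oldest block first" eviction rule are all assumptions made purely for illustration.

```python
class SpillingOverlay:
    """Toy RAM cache that moves the oldest blocks to disk when RAM fills up."""

    def __init__(self, ram_limit_bytes):
        self.ram_limit = ram_limit_bytes  # hypothetical cap on the RAM overlay
        self.ram = {}                     # block number -> bytes held in RAM
        self.spill = {}                   # stands in for a spill file on local disk

    def ram_used(self):
        return sum(len(data) for data in self.ram.values())

    def write(self, block, data):
        # A rewritten block replaces any older copy, wherever it lived.
        self.spill.pop(block, None)
        self.ram.pop(block, None)
        self.ram[block] = data
        # Over the limit? Move the oldest blocks out of RAM until we fit.
        while self.ram and self.ram_used() > self.ram_limit:
            oldest = next(iter(self.ram))
            self.spill[oldest] = self.ram.pop(oldest)

    def read(self, block):
        if block in self.ram:
            return self.ram[block]
        return self.spill.get(block)


cache = SpillingOverlay(ram_limit_bytes=8)
cache.write(0, b"aaaa")
cache.write(1, b"bbbb")
cache.write(2, b"cccc")  # pushes RAM use to 12 bytes, so block 0 spills
print(sorted(cache.ram), sorted(cache.spill))
```

Instead of the cache filling and the machine falling over, the overflow lands on disk and reads still succeed, which is exactly the behaviour we would want from a spill over.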

Next up to the block. Machine Creation Services…

I won’t go into why I think this technology is cool, I covered that quite well in the previous post. But this was the first technology I planned to test with the EWF.

So I created a lab at home consisting of two XenServers and an NFS share with MCS enabled, as below:

Now before we get down to the nitty gritty, let’s remind ourselves of what our tests looked like before and after using the Extended Write Filter in this configuration when booting, using and shutting down a single VM:

As with last time, black denotes writes and red denotes reads.


So looking at our little experiment earlier, we pretty much killed write IOPS on this volume when using the Extended Write Filter driver. But the read IOPS (in red above) were still quite high for a single desktop.

And this is where I had hoped MCS would step into the fray. Even on the first boot with MCS enabled the read negation was notable:


Test performed on a newly built host.


At no point do reads hit as high as the 400+ peak we saw in the previous test. But we still see spiky read IOPS as the Intellicache begins to read and store the image.

So now that we know what a first boot looks like, I kicked the test off again from a pre-cached device. This time the image should be cached on the local disk of the XenServer, as I’ve booted the image a number of times.

Below are the results of the pre cached test:


Holy crap…


Well, I was incredibly impressed with this result… But did it scale?

So next up, I added additional desktops to the pool to allow for 10 concurrent desktops (the constraint of my lab size).



Once all the desktops had been booted, I logged in 10 users and decided to send a mass shutdown command from Desktop Studio, with fairly cool results:



And what about the other way around? What about a 10 VM cold boot?


Pretty much IO Free boots and shutdowns!


So let’s do some napkin math for a second.


Taking what we’ve seen so far, let’s get down to the nitty gritty and look at maximums / averages.

In the first test, minus EWF and MCS, from a single desktop, we saw:

  • Maximum Write IOPS: 238
  • Average Write IOPS: 24
  • Maximum Read IOPS: 613
  • Average Read IOPS: 83

And in the end result, with EWF, MCS and even with the workload times 10, we saw the following:

  • Maximum Write IOPS: 26 (2,380*)
  • Average Write IOPS: 1.9 (240*)
  • Maximum Read IOPS: 34 (6,130*)
  • Average Read IOPS: 1.3 (830*)

* Denotes original figures times 10
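The napkin math above can be checked with a few lines of Python. The figures are the ones from the two lists; the variable names and the "percent lower" calculation are mine, added only to put a number on the gap.

```python
# Single-desktop figures without EWF or MCS (from the first list above).
baseline = {"max write": 238, "avg write": 24, "max read": 613, "avg read": 83}
# Ten-desktop figures with EWF and MCS (from the second list above).
combined = {"max write": 26, "avg write": 1.9, "max read": 34, "avg read": 1.3}
DESKTOPS = 10


def reduction_percent(before, after):
    """How much lower 'after' is than 'before', as a percentage."""
    return (1 - after / before) * 100


results = {}
for metric, single in baseline.items():
    projected = single * DESKTOPS  # the starred "times 10" figure
    results[metric] = reduction_percent(projected, combined[metric])
    print(f"{metric}: {projected} -> {combined[metric]} IOPS "
          f"({results[metric]:.1f}% lower)")
```

Against the times 10 projection, every one of the four metrics comes out roughly 99% lower. Bear in mind the projection simply multiplies the single desktop figures by 10, so it is a rough yardstick rather than a measured 10 desktop baseline.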

I was amazed by the results: two readily available technologies coupled together to tackle a problem in VDI that we are all aware of and regularly struggle with.


What about other, similar technologies?

Now, VMware and Microsoft have their own technologies similar to MCS (CBRC and CSV cache, respectively), so I would be interested in seeing similar tests to see if their solutions can match the underestimated Intellicache. If you have a lab with either of these technologies, get in touch and I’ll send you the method to test this.


What’s up next:

In the following two blog posts, I’ll cover off the remaining topics:

  • VDI in a Box.
  • EWF API for spill over
  • Who has this technology at their disposal?
  • Other ways to skin this cat.
  • How to recreate this yourself for your own test.

Last up I wanted to address a few quick queries I received via email / Comments:

“Will this technology approach work with Provisioning Server and local disk caching, allowing you to leverage PVS but spill to a disk write cache?”

No. The Provisioning Server filter driver has a higher altitude than poor EWF, so PVS grabs and deals with the write before EWF can see it.

“Couldn’t we just use a RAM disk on the hypervisor?”

Yes, maybe and not yet…

Not yet: with Citrix MCS and Citrix VDI in a Box, separating the write cache and identity disk from the LUN on which the image is hosted is a bit of a challenge.

Maybe: if using Hyper-V v3 with shared nothing migration, you now have migration options for live VMs. This would allow you to move the write cache (WC) and identity disk (ID) from one RAM cache to another.

Yes: if using Citrix Provisioning Server, you could assign the WC to a local storage object on the host where the VM lives. This would be tricky with VMware ESXi and XenServer, but feel free to give it a try. Hyper-V, on the other hand, would be extremely easy, as many RAM disks are available online.

“Atlantis Ilio also has inline dedupe, offering more than just ram caching?”

True, and I never meant, even for a second, to say this technology from Atlantis was anything but brilliant. But with RAM caching on a per-VM basis, wouldn’t VMware’s Transparent Page Sharing also deliver similar benefits, without the associated cost?

On E2E, Geek speak, IOPS, shared storage and a fresh idea. (Part 1)

Note: this is part 1 of a 4 post blog.

While attending E2EVC Vienna recently, I found myself at the Citrix Geekspeak session. Although the opportunity to rub shoulders with fellow Citrix aficionados was a blast, I found myself utterly frustrated spending time talking about storage challenges in VDI.

The topic was interesting and informative, and while there were a few ideas shared about solid state arrays and read IOPS negation with PVS or MCS, there really wasn’t a vendor based, one size fits all solution to reduce both read and write IOPS without leveraging paid for (and quite expensive, I may add) solutions like Atlantis ILIO, Fusion IO, etc. The conversation was tepid and deflating, and we moved on fairly quickly to another topic.

So we got to talking…

Over lunch, in the company of great minds and my good friends Barry Schiffer, Iain Brighton, Ingmar Verheij, Kees Baggerman, Remko Weijnen and Simon Pettit, we got to talking about this challenge and the pros and cons of IO negation technologies.

Citrix Provisioning Server… 

We spoke about the pros and cons of Citrix Provisioning Server, namely that rather than actually reducing read IOPS to shared storage, it merely moves the pressure onto the network instead of a SAN or local resources in the hypervisor.

The RAM caching option for differencing is really clever with Citrix Provisioning Server, but it’s rarely utilised due to the risk of filling the cache in RAM and the inevitable blue screen that follows.

Citrix Provisioning Server does require quite a bit of pre work to get the PVS server configured and highly available, and then there’s the education process of learning how to manage images with Provisioning Server. It’s more of an enterprise tool.

Citrix Machine Creation Services…

We spoke about the pros and cons of Citrix Machine Creation Services and Intellicache. This technology is much smarter around caching regular reads to a local device, but its adoption is mainly SMB and the requirements for NFS and XenServer were a hard sell… particularly with a certain hypervisor’s dominance to date.

Machine Creation Services is stupidly simple to deploy: install XenServer, stick the image on an NFS share (even Windows servers can host these)… Bingo. Image changes are based on snapshots, so the educational process is a moot point.

But again MCS is only negating read IO, and this assumes you have capable disks under the hypervisor to run multiple workspaces from a single disk or array. It’s also specific to XenDesktop, sadly. So no hosted shared desktop solution.

We mulled over SSD based storage quite a lot with MCS, and we agreed that MCS and Intellicache could be leveraged on an SSD, but the write cache or differencing disk activity would be so write intensive that it would not be long before the SSD started to wear out.

The mission statement:

So all this got us to thinking, without paying for incredibly expensive shared storage, redirecting the issue onto another infrastructure service, or adding on to your current shared storage to squeeze out those precious few IOPS, what could be done with VDI and storage to negate this issue?

So we got back to talking about RAM:

RAM has been around for a long time, and it’s getting cheaper as time goes by. We pile the stuff into hypervisors for shared server workloads. Citrix Provisioning Server chews up the stuff for caching images server side, and the PVS client even offers a RAM caching feature too, but at present it limits the customer to Provisioning Server and XenDesktop or XenApp.

But what if we could leverage a RAM caching mechanism decoupled from a paid for technology or Provisioning Server, one that’s freely available and can be leveraged in any hypervisor or technology stack, providing a catch all, free and easy caching mechanism?

and then it hit me…


This technology has been around for years, cleverly tucked away in Thin Client architecture and Microsoft have been gradually adding more and more functionality to this heavily underestimated driver…

Reintroducing, ladies and gents, the Microsoft Extended Write Filter.

The Microsoft Extended Write Filter has been happily tucked away in the Microsoft embedded operating systems since XP and is fairly unfamiliar to most administrators. The Extended Write Filter saves all disk writes to memory (except for specific chunks of registry), resulting in the PC being clean each time it is restarted.

This technology has been mostly ignored unless you are particularly comfortable with Thin Client architectures. Interestingly, during my initial research for this mini project, I found that many car PC enthusiasts and early SSD adopters have been hacking this component out of Windows Embedded and installing it into their mainstream Windows operating systems to cut down on wear and tear to flash or solid state disks.

Microsoft have been gradually building this technology with each release, adding a powerful API to view cache hits and the contents of the cache, and even to write the cache out to disk as required, in an image update scenario… or even better, when a RAM spill over is about to happen…

If this technology is tried and trusted in the car PC world, could it be translated to VDI?…

To the lab!

Excited with my research thus far, I decided to take the plunge. I began by extracting the EWF drivers and registry keys necessary to run this driver from a copy of Windows Embedded Standard…

With a bit of trial, error, hair pulling and programming I managed to get this driver into a clean Windows 7 x86 image. From here I disabled the driver and started testing, first without it.

(I’m not going to go into how to extract the driver yet; please check back later for a follow up post.)

Test A: XenDesktop, no RAM caching

To start my testing, I went with a XenDesktop image hosted on XenServer without Intellicache. I sealed the shared image on an NFS share from a Windows server. I performed a cold boot, opened a few applications, then shut down again.

I tracked the read and write IOPS via Windows Perfmon on the shared NFS disk, and below were my findings:

(Black line denotes Write IOPS to shared storage)

Test B: XenDesktop, with the Microsoft Extended Write Filter.

The results below pretty much speak for themselves…

(again, black line denotes write IOPS)

I couldn’t believe how much of a difference this little driver had made to the write IOPS…

Just for ease of comparison, here’s a side by side look at what exactly is happening here:

So where do we go from here:

Well, no matter what you do, Microsoft will not support this option and they’d probably sue the pants off me if I wrapped this solution up for download…

But this exercise in thinking outside of the box raised some really interesting questions around other technologies and the offerings they already have in their stack.

In my next few posts on this subject I’ll cover the below topics at length and discuss my findings.

  • I’ll be looking at how to leverage this with Citrix Intellicache for truly IO less desktops.
  • I’ll be looking at how this technology could be adopted by Microsoft for their MED-V stack.
  • I’ll be looking at one of my personal favourite technologies, VDI in a Box.
  • I’ll be looking at how we could leverage this API for spill over to disk in a cache flood scenario, or even a management appliance to control the spill overs to your storage.

And finally, I’ll be looking at how Citrix or Microsoft could quite easily combine two of their current solutions to provide an incredible offering.

And that’s it for now, just a little teaser for the follow up blog posts which I’ll gradually release before Citrix Synergy.

I would like to thank Barry Schiffer, Ingmar Verheij, Kees Baggerman and Remko Weijnen for their help and input on this mini project. It was greatly appreciated.