On IOPS, shared storage and a fresh idea. (Part 2) GO GO Citrix Machine Creation Services.

Welcome back to part two of this little adventure exploring the RAM caching ability of our new favourite little file system filter driver, the Microsoft Enhanced Write Filter (EWF). If you missed part 1, you can find it here.

So after the first post, my mailbox blew up with queries: how do you set this up, how does the RAM cache stack up against PVS, and how can you manipulate the overlay and "write out" to physical disk in a spill-over scenario? So before I go any further, let's have a quick look at the EWF.


Deeper into the EWF:


As previously mentioned, the EWF is included in all Windows Embedded operating systems as well as Windows Thin PC. The EWF is a file system filter driver that redirects reads and writes to memory or the local file system, depending on where the current copy of the file lives.

In the case of this experiment, we're using the RAM Reg overlay mode of the EWF. Have a read of that link if you would like more information, but in short, the EWF creates an overlay in RAM where writes live, and its configuration is stored in the registry.

Once the EWF is enabled and the machine has been rebooted, you can run ewfmgr -all from an administrative command prompt at next logon to view the current status of the EWF:
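For anyone who wants to poke at this themselves, here is a minimal sketch of the ewfmgr commands involved, run from an elevated command prompt. It assumes the EWF components are already installed in the image and that C: is the volume you want protected:

```
rem Enable EWF protection on the C: volume (takes effect after a reboot)
ewfmgr c: -enable

rem After the reboot, view the overlay state of all protected volumes
ewfmgr -all

rem If you ever need to flush the overlay back to the disk and stop
rem filtering without a reboot, commit and disable live:
ewfmgr c: -commitanddisable -live
```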





So what happens if I write a large file to the local disk?

Well, this basically! Below I have an installer for SQL Express, which is roughly 250 MB. I copied this file to the desktop from a network share, and as you can see below, the file has been stuck into the RAM overlay.
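If you want to repeat this quick check yourself, the copy-and-verify step looks roughly like the below (the share path and installer name are just placeholders for whatever large file you have to hand):

```
rem Copy a large file onto the EWF-protected volume...
copy \\fileserver\share\SQLEXPR_x64_ENU.exe "%USERPROFILE%\Desktop\"

rem ...then check the protected volume; the write should have landed in the
rem RAM overlay rather than on disk
ewfmgr c:
```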





And that's pretty much it! Simple stuff for the moment. When I venture into the API of the EWF in a later post, I'll show you how to write this file out to storage if we are reaching capacity, allowing us to spill to disk and address the biggest concern currently surrounding Citrix Provisioning Server and RAM caching.


Next up to the block: Machine Creation Services…


I won't go into why I think this technology is cool; I covered that quite well in the previous post. But this was the first technology I planned to test with the EWF.

So I created a lab at home consisting of two XenServers and an NFS share, with MCS enabled as below:
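(A quick aside for anyone rebuilding this: the local read caching the results below rely on is XenServer's IntelliCache, which is enabled per host. The sketch below shows roughly what that looks like from the XenServer console; the host name and SR UUID are placeholders, and the local cache SR needs to be thin-provisioned EXT storage.)

```
# Take the host out of service before changing its caching settings
xe host-disable host=<hostname>

# Point local storage caching at the host's local EXT SR
xe host-enable-local-storage-caching host=<hostname> sr-uuid=<local-ext-sr-uuid>

# Bring the host back into service
xe host-enable host=<hostname>
```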





Now, before we get down to the nitty-gritty, let's remind ourselves of what our tests looked like before and after using the Enhanced Write Filter in this configuration when booting, using and shutting down a single VM:


As with last time, black denotes writes and red denotes reads.

 

So, looking at our little experiment earlier, we pretty much killed write IOPS on this volume when using the Enhanced Write Filter driver. But the read IOPS (in red above) were still quite high for a single desktop.

And this is where I had hoped MCS would step into the fray. Even on the first boot with MCS enabled, the reduction in reads was notable:

 

Test performed on a newly built host.

 

At no point do reads hit the 400+ peak we saw in the previous test, but we still see spiky read IOPS as IntelliCache begins to read and store the image.

So now that we know what a first boot looks like, I kicked the test off again from a pre-cached device. This time the image should be cached on the local disk of the XenServer, as I've booted the image a number of times.

Below are the results of the pre-cached test:

 

Holy crap…

 

Well, I was incredibly impressed with this result… But did it scale?

So next up, I added additional desktops to the pool to allow for 10 concurrent desktops (a constraint of my lab size).

 

 

Once all the desktops had been booted, I logged in 10 users and sent a mass shutdown command from Desktop Studio, with fairly cool results:

 

 

And what about the other way around: a 10 VM cold boot?

 

Pretty much IO-free boots and shutdowns!

 

So let's do some napkin math for a second.

 

Taking what we've seen so far, let's get down to the nitty-gritty and look at maximums and averages.

In the first test, without EWF and MCS, from a single desktop, we saw:

  • Maximum Write IOPS: 238
  • Average Write IOPS: 24
  • Maximum Read IOPS: 613
  • Average Read IOPS: 83

And in the end result, with EWF and MCS, and even with ten times the workload, we saw the following:

  • Maximum Write IOPS: 26 (2,380*)
  • Average Write IOPS: 1.9 (240*)
  • Maximum Read IOPS: 34 (6130*)
  • Average Read IOPS: 1.3 (830*)

* Denotes the original single-desktop figures multiplied by 10
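Putting those numbers side by side, and taking the starred projections at face value (i.e. assuming the single-desktop baseline would scale linearly to 10 desktops), the rough reduction factors work out as:

$$\frac{2380}{26} \approx 91\times \text{ (peak writes)}, \qquad \frac{240}{1.9} \approx 126\times \text{ (average writes)},$$

$$\frac{6130}{34} \approx 180\times \text{ (peak reads)}, \qquad \frac{830}{1.3} \approx 638\times \text{ (average reads)}.$$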

I was amazed by the results of two readily available technologies coupled together to tackle a problem in VDI that we are all aware of and regularly struggle with.

 

What about other, similar technologies?

Now, as VMware and Microsoft have their own technologies similar to MCS (CBRC and CSV Cache, respectively), I would be interested in seeing similar tests to see if their solutions can match the underestimated IntelliCache. So if you have a lab with either of these technologies, get in touch and I'll send you the method to test this.

 

What’s up next:

In the following two blog posts, I’ll cover off the remaining topics:

  • VDI-in-a-Box.
  • The EWF API for spill-over.
  • Who has this technology at their disposal?
  • Other ways to skin this cat.
  • How to recreate this yourself for your own tests.



Last up, I wanted to address a few quick queries I received via email and comments:


“Will this technology approach work with Provisioning Server and local disk caching, allowing you to leverage PVS but spill to a disk write cache?”

No. The Provisioning Server filter driver sits at a higher altitude than poor EWF, so PVS grabs and deals with the write before EWF can see it.
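(If you want to see filter altitudes in action on any Windows box, the built-in fltmc utility will list them from an elevated prompt. Note it only shows file system minifilters, so the EWF and PVS drivers themselves may not appear in the output, but it's a quick way to visualise how the filter stack is ordered.)

```
rem List the loaded file system minifilter drivers and their altitudes
fltmc filters

rem Show which volumes each filter instance is attached to
fltmc instances
```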

“Couldn’t we just use a RAM disk on the hypervisor?”

Yes, maybe and not yet…

Not yet: with Citrix MCS and Citrix VDI-in-a-Box, separating the write cache and identity disk from the LUN on which the image is hosted is a bit of a challenge.

Maybe: if using Hyper-V v3 with shared-nothing migration, you now have migration options for live VMs. This would allow you to move the WC/ID disks from one RAM cache to another.

Yes: if using Citrix Provisioning Server, you could assign the write cache to a local storage object on the host where the VM lives. This would be tricky with VMware ESXi and XenServer (but feel free to give it a try); Hyper-V, on the other hand, would be extremely easy, as many RAM disks are available online.

“Atlantis Ilio also has inline dedupe, offering more than just ram caching?”

True, and I never meant, even for a second, to say this technology from Atlantis was anything but brilliant. But with RAM caching on a per-VM basis, wouldn't VMware's Transparent Page Sharing also deliver similar benefits, without the associated cost?


4 Comments About “On IOPS, shared storage and a fresh idea. (Part 2) GO GO Citrix Machine Creation Services.”

  1. Jim Moyle

    VMware transparent page sharing occurs once an hour and is a post process, it’s also got a very poor dedupe ratio, so unfortunately wouldn’t help very much. PS I work for Atlantis Computing :)

  2. seaneyc

    I found much the same as Jim a while back, although this was with XenApp workloads rather than desktop. I agree the Ilio product looks great, I spoke briefly with Jim 6 months ago (hello!) but unfortunately was unable to take it any further. That said I change company in 3 weeks so who knows!

    I am just in the middle of doing some XenDesktop at the moment and the amount of data hitting our NetApp (or rather the lack of it) has really impressed me. Obviously Citrix don’t have a catch all yet with regards to non-pooled random desktops etc but I estimate 99% of our 1200 seats will be fine with the model, which leaves the SAN happy to deal with the 1% that don’t fit.

    Like you Andrew I’ve only been able to test 10 VDIs but will be ramping up to about 50 on one host within the next week so will keep an eye on IOPS on how much spills over as the SSD in my blades are a paltry 50GB.

  3. Pingback: On IOPS, shared storage and a fresh idea. (Part 3) tying it all together in the stack « AndrewMorgan.ie
