Provisioning Services; following a different route, Cache to RAM
On a project we faced issues with streamed Citrix XenApp 6.5 servers, so we started doing some testing.
These issues were mostly caused by too many network printers connecting for each user. When a user logged on they received 40 printers, generating 1GB of data on the C: drive, which effectively ended up in the Provisioning Server cache location. All other users on that XenApp server were affected every time a user logged on. Another issue we had to deal with was a sub-optimal storage environment, for several reasons. For both reasons we decided to take a step aside from best practices and look at different caching solutions.
Not to take credit here: I have to say that my colleague Erik van Veenendaal started this when he got frustrated with the performance of the environment. He can tell everything I write here in far more detail. He was at Briforum London, so if you missed him… you missed out on a great discussion.
So first, about the pilot environment:
- Cisco UCS blades with 2 Intel Xeon hexa-core CPUs at 3.07GHz and 192GB RAM
- SAN connection is 2 x 1Gbit per host in the pilot.
- Citrix XenApp 6.5 and Provisioning Server 5.6, moving to 6.1
- RES Workspace Manager 2011 SR4
- RES Automation Manager
- Microsoft App-V
- VMware vSphere 4.1
So, a very basic environment that many of you work with on a daily basis.
What did we test and what are we still testing:
- Cache to RAM and what that does to stability, BSODs, etc.
- Pagefile location, C: or D:
- Pagefile size
- Fixed pagefile versus system managed pagefile together with the previous tests
- Memory reservation for the virtual machine so that the .vswp file is not created; this would prevent overcommitting of memory (VMware only???)
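On that .vswp point: vSphere sizes a VM's swap file at the configured memory minus the memory reservation, so a full reservation leaves nothing to swap to disk. A minimal sketch of that arithmetic (the helper name is mine):

```python
def vswp_size_gb(configured_gb: float, reservation_gb: float) -> float:
    """vSphere sizes a VM's .vswp file at configured memory minus the
    memory reservation; a full reservation leaves nothing to swap."""
    return max(configured_gb - reservation_gb, 0.0)

# A 16GB XenApp VM with no reservation needs a 16GB .vswp on the datastore;
# reserving all 16GB shrinks that to zero.
print(vswp_size_gb(16.0, 0.0))   # 16.0
print(vswp_size_gb(16.0, 16.0))  # 0.0
```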
Some results (Cache to RAM is used throughout):
Disk queue length
When the cache was set to disk, the disk queue length was 50-70; when changing to “Cache to RAM” the disk queue length vanished and, more importantly, users stopped complaining.
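For context, the rule of thumb we judge these numbers against is a sustained average disk queue length of roughly 2 per spindle; above that, storage is considered a bottleneck. A quick sketch (the function name and default threshold are my framing of that common guideline, not a Citrix number):

```python
def disk_bottlenecked(avg_queue_length: float, spindles: int = 1,
                      threshold_per_spindle: float = 2.0) -> bool:
    """Common rule of thumb: a sustained 'Avg. Disk Queue Length' above
    roughly 2 per spindle points at a storage bottleneck."""
    return avg_queue_length > threshold_per_spindle * spindles

print(disk_bottlenecked(60))   # True  - the 50-70 we saw with cache to disk
print(disk_bottlenecked(0.5))  # False - what was left after Cache to RAM
```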
When you use “Cache to disk” and you have a system-managed pagefile, this pagefile will end up on your persistent disk (if you have one).
With “Cache to RAM” this behavior is different: a system-managed pagefile ends up on C: and therefore ends up in RAM. A result of this test was that we noticed that with a system-managed pagefile and caching set to RAM, the cache fills up damn fast, creating issues.
What we did to solve this was create a fixed pagefile on the D: disk (SAN). The pagefile is 16GB+1% and usage is about 25%. The cache at that time was about 1.3GB with 20 users working on the server. What they were doing I don’t know; we didn’t instruct them, they just did what they normally do.
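As a back-of-the-envelope check of those numbers (the helper name is mine, and I’m assuming “16GB+1%” means the VM’s 16GB of RAM plus one percent):

```python
def pagefile_gb(vm_ram_gb: float, extra_pct: float = 1.0) -> float:
    """'RAM + 1%' pagefile sizing, assuming that's what 16GB+1% stands for."""
    return vm_ram_gb * (1 + extra_pct / 100)

size = pagefile_gb(16)   # fixed pagefile on D:, roughly 16.2GB
used = 0.25 * size       # ~25% observed usage, so roughly 4GB actually in use
print(round(size, 2), round(used, 2))
```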
Another test we did (we love tests) was creating a fixed pagefile on C:. This pagefile is still in RAM, and doing this we see that usage of the pagefile is 90-95%. Performance is great, everything is stable and users are happy. The cache size of course was 1GB higher here, so about 2.6GB with 20 users. So this looks like a good solution… it looks like one, but is it?
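Putting the two runs next to each other, a quick per-user calculation (the helper name is mine) shows where the extra gigabyte goes:

```python
def cache_per_user_mb(total_cache_gb: float, users: int) -> float:
    """Average PVS write-cache footprint per user, in MB."""
    return total_cache_gb * 1024 / users

on_d = cache_per_user_mb(1.3, 20)  # pagefile on D:, roughly 67MB per user
on_c = cache_per_user_mb(2.6, 20)  # pagefile on C:, roughly 133MB per user
# The ~1GB gap between the runs is the fixed pagefile landing in the RAM cache.
print(round(on_d, 1), round(on_c, 1))
```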
We are discussing internally what size the pagefile should be and whether or not 95% usage of a pagefile is bad. For now we put the pagefile on D: (SAN) and set it to 16GB+1%, according to best practices.
…and on we go…
For a small test we tried a RAM disk (Primo Ramdisk Server Edition). We removed the virtual hard drive of the server, created a RAM disk in private mode, set “Cache to hard disk” as the caching method, and ran some tests. The positive thing about this is that you won’t receive a BSOD while the cache is still sent to RAM.
We haven’t tested it further for obvious reasons… I’m curious what Citrix might think about this possibility and whether they ever tested it.
Some things we still wonder about, and are having internal discussions about, relate to the persistent disk. With App-V 5.0 you can use a shared cache more and more, so you rely less and less on the cache on the persistent disk. That said, and taking into consideration that the fixed pagefile on C: with those high usage numbers might turn out fine, why would you not invest in memory instead of disks…
With App-V 5, as I understand it, more will be written to Windows cache locations, and you need to scale your environment for that because it will end up in the PVS cache. For me this is still a big question mark that will need time to figure out.
A pagefile of 16GB+1% is, in my opinion, crazy and should be reduced. We tested this with lower values and saw that it resulted in a BSOD when the pagefile was system managed and available RAM was only 8GB. With a fixed pagefile on D: or C: of, let’s say, 4GB and enough memory assigned to the server, this should be fine…
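The BSOD risk really boils down to simple arithmetic: with Cache to RAM, a pagefile on C: and the write cache both live in RAM, and together with the OS they have to fit in what the VM was given. A rough sanity check (the helper name and the 2GB OS floor are my assumptions):

```python
def fits_in_ram(assigned_ram_gb: float, pagefile_on_c_gb: float,
                expected_cache_gb: float, os_overhead_gb: float = 2.0) -> bool:
    """With Cache to RAM, a pagefile on C: plus the write cache both consume
    RAM; if they don't fit alongside the OS, a BSOD is the likely outcome.
    os_overhead_gb is a rough assumed floor for Windows itself."""
    return assigned_ram_gb >= pagefile_on_c_gb + expected_cache_gb + os_overhead_gb

print(fits_in_ram(8, 16.16, 1.3))  # False: the kind of setup that blue-screened
print(fits_in_ram(24, 4, 2.6))     # True: a 4GB fixed pagefile with ample RAM
```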
Production ready or not?
The discussion I had with Alex at E2EVC was about whether this solution would be production ready. His message was clear: don’t do it, for you will run into several issues with stability and applications.
Looking at our pilot environment, I can say that at this moment we run 4 XenApp 6.5 servers on one blade with over 80 users working happily. We haven’t had any real issue with this caching method. Having said this, I also need to point out that we won’t go into production like this before having a good discussion with the customer and Citrix about support and best practices.
Currently we are doing VSI load tests with Citrix XenApp on a different host; there we run 5 XenApp servers on one blade, all with Cache to RAM. The tests show that disk queue length is simply not there and that CPU suddenly is an issue again… We’ll rerun the test in different configurations, one of which will be cache to device disk, to see which setup is preferable.
I’m interested in your views and comments…