Citrix Local Host Cache – Connection Leasing 2.0, LHC 2.0 or…
With Citrix XenApp / XenDesktop 7.12, Local Host Cache, better known as LHC, was introduced again. Of course everyone knows LHC from the 6.5 era: it was the solution that provided high availability when the database was unavailable. With the 7.x versions Citrix abandoned LHC and relied completely on database high availability. Database high availability is fine, but once the connection to that database is gone, users are impacted.
Citrix’s solution to this was Connection Leasing, a feature that kept a record of the last connection made. While the database was unavailable, users would be connected based on that last connection. This sounds like a good solution in case of an outage, but in pooled VDI environments it was unworkable, as desktops could already be in use. The solution, it seemed, was designed for a XenApp environment or for static, assigned desktops.
Now, in version 7.12, Local Host Cache is introduced again. Is this Connection Leasing 2.0, LHC 2.0 or…? I was wondering about what they created and wanted to put my thoughts on paper. I think something is missing: it is a step forward, but not the solution.
With the 7.12 implementation of LHC, Citrix uses a secondary broker service. This secondary broker service is synchronized with the principal broker service, and the synchronized information is stored in an LHC database residing on the delivery controller. When changes are found, the database is recreated rather than incrementally updated. By doing this, Citrix makes sure the information in the LHC database is identical to that in the site database.
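The "recreate, don't patch" idea can be sketched in a few lines of Python. This is purely an illustration of the principle, not Citrix's implementation; the fingerprint-based change detection is my own assumption for the sketch.

```python
import hashlib
import json


def config_fingerprint(site_config: dict) -> str:
    """Hash the site configuration so changes can be detected cheaply."""
    canonical = json.dumps(site_config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()


def sync_local_cache(site_config: dict, cache: dict) -> dict:
    """Rebuild the local cache from scratch whenever the site data changed.

    Instead of applying deltas, the whole cache is regenerated, which
    guarantees the local copy matches the site database exactly -- the
    behaviour the 7.12 LHC documentation describes.
    """
    fp = config_fingerprint(site_config)
    if cache.get("fingerprint") != fp:
        cache = {"fingerprint": fp, "data": dict(site_config)}  # full rebuild
    return cache
```

The upside of a full rebuild is that the cache can never drift from the source; the downside is that every change pays the full rebuild cost, which matters for large sites.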
The drawing, copied from the Citrix eDocs site, shows the configuration. Simple but effective. So far so good.
When things go bad and the database is unavailable, the secondary broker service takes the lead. This is where my concern starts: although it offers high availability, the switch between the principal and secondary broker has its limitations. Let’s name three of them:
The solution is designed for XenApp (shared environments) and for XenDesktop static / assigned desktop pools. It does not offer high availability for pooled desktops. I may be wrong, but 99% of our VDI customers that use either Citrix or VMware use pooled desktops. The assigned pool is one I rarely see; specific use cases need it, but most scenarios fit the pooled model. Of course, we also have a lot of shared / XenApp environments.
So my thought on this is that Citrix needs to work on a better solution that also works with pooled desktops. It’s a major use case, I think.
Switching from the principal broker service to the secondary broker service isn’t done in a second. There is a timeout that defaults to 10 minutes (you can change it). During a failover this means that, even though the database was synced, not all the information is available… sorry, but that is beyond me. When a user connects during the failover, they could be given a new session even though their old session is still active. This happens because the secondary broker service is still waiting for all the VDAs to register and report back which sessions are active.
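Why a user can end up with a duplicate session is easy to see in a toy model. The sketch below is an assumption-laden simplification (the class and method names are mine, not Citrix’s): the elected broker only knows about sessions on VDAs that have already re-registered, so a connection that arrives earlier gets a fresh session.

```python
class FailoverBroker:
    """Toy model of the elected broker during an LHC failover.

    Not Citrix's implementation -- just an illustration of the
    re-registration window described in the text.
    """

    def __init__(self):
        self.known_sessions = {}  # user -> VDA hosting their session

    def vda_registers(self, vda_id: str, users: list):
        """A VDA re-registers and reports which users have sessions on it."""
        for user in users:
            self.known_sessions[user] = vda_id

    def connect(self, user: str):
        """Reconnect if the session is known; otherwise hand out a new one.

        An old session may still exist on a VDA that has not re-registered
        yet, but the broker has no way to see it -- hence the duplicate.
        """
        if user in self.known_sessions:
            return ("reconnect", self.known_sessions[user])
        return ("new_session", None)
```

If "alice" connects before her VDA has re-registered, `connect("alice")` returns a new session; once the VDA reports back, the same call reconnects her to the existing one.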
So my thought on this is that the solution should allow for an uninterrupted production environment. The current solution is fine if your users get a message that there is a failover and don’t connect for 5–10 minutes. Of course, some will say outages are rare with highly available databases, but that is no reason to design it like this. Together with the previous limitation, I’m wondering whether customers will be that happy with this solution.
When an outage takes place, one broker is elected to take charge. In your design you need to make sure a single broker is able to handle all requests, because all other brokers (the secondary ones in this case) will reject any request. The impact on RAM, IOPS and CPU for that one broker will be massive; remember that it also has a local database to manage. Of course, when this one goes down the next one is elected, but it is a risk: you only have so many brokers in your farm, and each will take the full load once it is chosen.
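The election itself can be done without any coordination traffic if every controller applies the same deterministic rule to the list of surviving brokers. A minimal sketch, assuming an alphabetical rule (the exact rule Citrix uses is not confirmed here; this one is an illustration):

```python
def elect_broker(alive_brokers: set) -> str:
    """Deterministically elect one broker from the survivors.

    Every controller runs this same rule against the same membership
    view, so they all agree on the winner without extra messages.
    The alphabetical-first rule is an illustrative assumption.
    """
    if not alive_brokers:
        raise RuntimeError("no delivery controllers available")
    return min(alive_brokers)
```

Note that the winner always takes 100% of the load: if `ddc-a` dies, `ddc-b` is elected next and inherits everything, which is exactly the sizing risk described above.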
Every time the delivery controllers need to switch to another broker (in case of an outage), all the VDAs need to re-register. During that window a user could even end up with a third session, as not all information is available during the switch. I think some kind of inter-communication channel, like VMware uses between its connection brokers, would be preferable. I don’t see all the timeouts, and the need to size a monster delivery controller, as an enterprise solution.
So my thought on this is that you should size your controllers big enough to make sure they can take the full load during an outage.
The good, the bad and…
Perhaps you think that I’m just looking for the bad things here, but that is not the case. I was reading up on it, and I’m glad Citrix is introducing LHC again; Connection Leasing was a pretty poor way of providing high availability.
The good thing about LHC is that it fails over whenever the database connection is lost; it doesn’t matter whether the database is down or simply unreachable. The failover also works when the outage takes place during synchronization: the controller will revert to the last known state. Citrix also worked out a process to make sure the controllers elect the one to be in charge, including a re-election when a broker goes down.
So that is all good, and a well-thought-out process, but the limitations described earlier are also serious.
The timeout during which the functionality is not optimal is something I think should be improved; we live in a 24/7/365 world, and 5 minutes is a lifetime these days. Also, not supporting pooled desktops is something I can’t understand. I would think that if you design LHC for an FMA environment, “works with all pools” is a requirement.
So for me it is not LHC 2.0; it is perhaps Connection Leasing 2.0, but if we talk about LHC I would call it 0.9, as it is a step back from the great functionality LHC offered in 6.5.
I’m wondering whether Citrix is working on improving this; perhaps more will be told during Synergy. I’ll keep an eye on it and keep you posted.