This was such a weird, wonderful, perfect storm of an issue that I felt I had to post it.
Recently, while configuring a XenApp 6.5 hosted desktop environment, we experienced a weird CPU spike across all XenApp sessions on a server. Intermittently throughout the day, the XenApp server's CPU would max out, and as the load on the XenApp servers increased, the CPU stayed maxed out for longer and longer.
From initial troubleshooting we found that once 7 users were logged in to the XenApp server, the next user login would cause the server to hang.
To Process Explorer!
During this hang, Process Explorer showed an instance of taskhost.exe running for each logged-in user:
As most of you will already know, taskhost.exe is the process handler for scheduled tasks, and as most of you will also know, scheduled tasks have changed in a big way since Server 2003.
In Server 2008 and upwards, scheduled tasks are no longer limited to times or schedules; they can now run on triggers or events too.
The user login process in our case seemed to be triggering an event across all sessions that was consuming all the CPU, but which task runs on every login?
As I ventured further down the rabbit hole on this one, I resorted to my good friend PowerShell to catch a user login and see which processes or “Scheduled Tasks” were running at that moment. As the CPU was maxing out during the login, it was quite easy to catch first time.
Below is the command I used to catch the running tasks during the CPU spike:
$ts = New-Object -ComObject Schedule.Service
The output of the above command gave us our hint: a running task under the CertificateServicesClient path.
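The command as shown above is only its first line. A minimal sketch of how the rest could look, assuming the Task Scheduler COM API's `Connect` and `GetRunningTasks` methods; the flag value and the selected columns are my assumptions, not necessarily what the original command used:

```powershell
# Sketch only: list the tasks that are running right now via the Task Scheduler COM API.
$ts = New-Object -ComObject Schedule.Service
$ts.Connect()                       # connect to the local Task Scheduler service

# 1 = TASK_ENUM_HIDDEN, so hidden tasks are included in the result
$ts.GetRunningTasks(1) |
    Select-Object Name, Path, EnginePID, State
```

Run it from an elevated prompt while the CPU is spiking; the `Path` column is what points you at the task folder to look in.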
To Server Manager!
Taking the above path as an indicator, I launched Windows Server Manager and browsed to Configuration > Task Scheduler > Task Scheduler Library > Microsoft > Windows > CertificateServicesClient:
Under this task category, we can see 3 tasks.
As above, it's the UserTask that seems to be running, so let's have a look at it. Below is a list of triggers configured for this task. As you can see, the two interesting triggers are “On an event” and “At Logon”.
Looking at the history tab, we can see that during a user login both events are triggered, as below. The second trigger is tied to the event log, but it turned out not to be relevant: Microsoft seems to suppress the task from running twice by assigning a zero-value GUID to the task.
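If you would rather not click through the GUI, the same trigger list can be pulled from PowerShell. This is a rough sketch, assuming the task lives at the path browsed to above and that the account running it is allowed to read the task definition; the friendly names in `$triggerNames` are my own labels for a subset of the API's trigger type values:

```powershell
# Sketch only: dump the triggers of the CertificateServicesClient UserTask.
$ts = New-Object -ComObject Schedule.Service
$ts.Connect()

$folder = $ts.GetFolder('\Microsoft\Windows\CertificateServicesClient')
$task   = $folder.GetTask('UserTask')

# Subset of the TASK_TRIGGER_TYPE2 values defined by the Task Scheduler API
$triggerNames = @{ 0 = 'On an event'; 9 = 'At log on'; 11 = 'Session state change' }

foreach ($trigger in $task.Definition.Triggers) {
    $name = $triggerNames[[int]$trigger.Type]
    if (-not $name) { $name = "Type $($trigger.Type)" }   # fall back to the raw number
    New-Object PSObject -Property @{ Trigger = $name; Enabled = $trigger.Enabled }
}
```

`schtasks /Query /TN "\Microsoft\Windows\CertificateServicesClient\UserTask" /XML` gives the same information as raw XML if you prefer the command line.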
The first trigger, however, is the money shot. As you can see below, this user login caused the event to fire, which then cascaded into every user session:
So with the culprit at hand and more troubleshooting, we found that, interestingly, if a user disconnected and reconnected, this issue did not occur. Even more interesting, if a stale profile belonging to the user was still present on the server, the task did not trigger.
As with most of my implementations, I chose to use mandatory profiles with this customer. With some sleuthing, it turned out that this event fired for a user login only when a brand new user profile was being created (from the mandatory profile), and only in that case were further tasks spawned in the user sessions.
So to review, every day, when users were creating “new” or “clean” profiles from the mandatory profile this event would trigger for them and in turn for every other logged in user.
Going to the lab and talking to a Microsoft support representative, it turns out this turn of events is by design. Frustratingly, Microsoft stopped short of confirming that this event triggering in all user sessions is a bug, but agreed it was unnecessary. The confusing thing to arise out of all of this was:
If this is by design, why is only this customer suffering this issue and not every Citrix XenApp environment with Mandatory profiles?
To Group Policy!
Looking further into certificate settings in the customer environment, it turns out that, as part of the default, enforced domain policy, the customer is assigning a large number of trusted root certificates:
Now, this is not the way Microsoft recommends you deploy root certificates, but due to a corruption in Certificate Services, a Microsoft representative had proposed this as a workaround, and this is how the customer was deploying their trusted roots.
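To get a feel for how many root certificates a server is receiving this way, something like the following can help. It is only a sketch, and it assumes the policy-delivered roots land in the usual Group Policy certificate store in the registry, one subkey per certificate; verify the path on your own build before relying on the number:

```powershell
# Sketch only: count root certificates delivered by Group Policy on this machine.
# Assumed location of the policy-delivered trusted root store.
$policyRoots = 'HKLM:\SOFTWARE\Policies\Microsoft\SystemCertificates\Root\Certificates'

if (Test-Path $policyRoots) {
    $count = @(Get-ChildItem $policyRoots).Count   # one subkey per certificate thumbprint
    "Root certificates from Group Policy: $count"
} else {
    'No Group Policy root certificate store found on this machine.'
}
```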
But this is a computer policy? Why is this hampering user logins?
Yep, you guessed it: Group Policy loopback processing!
Group Policy loopback processing was causing this computer policy to reapply on each user login.
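A quick way to confirm whether loopback is in play on a given server is to read the policy value it sets. The sketch below assumes loopback was configured through the standard "user Group Policy loopback processing mode" setting, which writes `UserPolicyMode` under the key shown:

```powershell
# Sketch only: report the loopback processing mode applied to this machine.
$key  = 'HKLM:\SOFTWARE\Policies\Microsoft\Windows\System'
$mode = (Get-ItemProperty -Path $key -Name UserPolicyMode -ErrorAction SilentlyContinue).UserPolicyMode

if ($mode -eq 1) {
    'Loopback processing: Merge'
} elseif ($mode -eq 2) {
    'Loopback processing: Replace'
} else {
    'Loopback processing: not configured via this policy key'
}
```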
So in review, this issue was caused by:
- Task Scheduler in Windows triggering events in user sessions on every login to a server.
- Utilising mandatory profiles with Windows RDS.
- The customer storing a large number of certificates in Group Policy.
- Loopback processing enabled without full consideration of the policies being applied.
You took me this far, how did you resolve it?
Well if you’re curious to know how we got around this issue, read below:
1: Once I realised it was taskhost.exe that was causing the CPU spikes, I utilised an application I wrote, ThreadLocker, to restrict taskhost.exe to:
- two cores
- its process priority to idle
This resulted in the tasks taking much longer to complete but the user sessions were uninterrupted during the process.
This bought us precious time to troubleshoot as this issue was in a production environment.
2: Once we realised this issue was related to the certificate services, we disabled the client scheduled task temporarily while we devised a new solution for Active Directory. We could not disable loopback processing due to its dependency in the environment.
3: The customer moved the certificates to a non-enforced domain policy and we restricted this policy from propagating to the XenApp servers. We then re-enabled the client task and removed the rule from ThreadLocker, as it was no longer needed.
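If you don't have ThreadLocker to hand, the first two steps can be roughly approximated with built-in tools. This is a stand-in sketch, not how ThreadLocker works: it only touches taskhost.exe processes that already exist, so it would need re-running for new sessions, and it assumes the UserTask is the task being disabled:

```powershell
# Sketch only: approximate steps 1 and 2 with built-in tooling (run elevated).

# Step 1 stand-in: pin every running taskhost.exe to two cores at idle priority.
Get-Process -Name taskhost -ErrorAction SilentlyContinue | ForEach-Object {
    $_.ProcessorAffinity = 3        # bitmask 0b11 = cores 0 and 1
    $_.PriorityClass     = 'Idle'
}

# Step 2: temporarily disable the certificate client user task.
schtasks /Change /TN '\Microsoft\Windows\CertificateServicesClient\UserTask' /DISABLE

# Step 3, once the policy has been reworked, re-enable it:
# schtasks /Change /TN '\Microsoft\Windows\CertificateServicesClient\UserTask' /ENABLE
```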