{"id":3141,"date":"2014-10-25T22:27:21","date_gmt":"2014-10-26T06:27:21","guid":{"rendered":"http:\/\/www.atumvirt.com\/?p=3141"},"modified":"2014-10-25T22:27:21","modified_gmt":"2014-10-26T06:27:21","slug":"the-strange-case-of-no-desktops-available","status":"publish","type":"post","link":"https:\/\/avtempwp.azurewebsites.net\/2014\/10\/the-strange-case-of-no-desktops-available\/","title":{"rendered":"The Strange Case of \u201cNo Desktops Available\u201d"},"content":{"rendered":"
I recently spoke with someone and realized I hadn\u2019t created a post for this yet \u2013 much to my disappointment, as it was a major accomplishment to have the issue solved. At my previous employer (K-12 education) the environment was set up as follows:<\/p>\n
On February 28th, there was a brief \u201coutage\u201d period where users were reporting \u201cNo Desktops Available.\u201d I frantically scrambled to find out why \u2013 there were only about 400 users logged in, so we should have had plenty of reserve available and powered on, or at the very least more machines powering on. It wasn\u2019t every login that was failing, either. During my troubleshooting, school ended and I was eventually unable to duplicate the issue, despite having changed nothing. Since I couldn\u2019t duplicate it, I looked a bit longer for a cause, then chalked it up to a ghost I\u2019d have to watch out for in the future.<\/p>\n
The same thing happened on March 5th, at which time my alarms went into overdrive. We didn\u2019t have support, so it was definitely time to call in reinforcements. A quick check in #Citrix proved unfruitful at the time and it seemed we were on our own. We had previously had difficulties getting satisfactory results out of support, so we were reluctant to pay for per-hour support from a partner. As before, while I was troubleshooting the issue, it went away seemingly on its own. It also happened at an earlier time of day, which was strange.<\/p>\n
On March 12th and 13th the same symptoms were experienced, but reports didn\u2019t trickle in through the helpdesk to the right channels, despite our being extra vigilant. On the 21st, however, it happened again, and being on XenDesktop 7 (which was being serviced only through private, non-publicized hotfixes), we decided an \u201cemergency\u201d change request to upgrade to XenDesktop 7.1 was in order. The update proved unfruitful, as the issue recurred on the 26th; however, between the 21st and 26th we had arranged for a Citrix case to be opened through a partner.<\/p>\n
On the 26th, with support on the line, we captured the standard stuff \u2013 Scout, CDF traces, event logs, etc. of every conceivable component that support could muster \u2013 including powering on all 1100 VMs and collecting their broker agent logs – but again, the issue went away on its own. Since support seemed pretty lost (which turns out to be somewhat common, unfortunately), I decided to give #Citrix another whirl. A Citrix Support staff member who participates in the channel reviewed my CDF trace and uploads. The next day I received a nice PM, paraphrased below:<\/p>\n
\u201cWe looked over your logs. An incredibly smart guy on my team has a hunch. You\u2019re using the default power settings, yes?\u201d<\/p>\n
Me: \u201cYes\u201d<\/p>\n
\u201cCheck MaxPowerActionsPerMinute\u201d.<\/p>\n
Since I had a support case that was being escalated at this point, I passed this tidbit along to support, who shrugged it off, despite my emphasizing that this was a recommendation from another unit within support. Given the visibility this case had within the organization, we needed this resolved and an official <\/em><\/strong>answer. There were weeks of inactivity because the issue didn\u2019t recur. I had set the power settings to keep all 1100 VMs powered on in order to mitigate the risk during a critical testing time. However, for the better part of a month I harped on support about that setting but was ignored again and again. Finally, reflecting on my earlier work (and post) digging around in the XenDesktop database, I decided to go find out where the Power Actions were stored. I initially didn\u2019t find the Get-BrokerHostingPowerAction cmdlet (I didn\u2019t look very hard, because support told me the answer from development was that there was \u201cNo way to find this information, we don\u2019t expose it\u201d). So instead, I turned to the VirtualCenter database, which logs the tasks. My fears were immediately confirmed when I saw the pattern in the tasks submitted from svc-xendesktop.<\/p>\n A large number of commands were sent at first, then they slowed to 10 per minute, with 10 new commands every 60 seconds \u2013 indicating that the commands were queuing, then executing. 
When I submitted this database dump, repeated that MaxPowerActionsPerMinute had been suggested by support, and noted that we were not going to renew SA<\/strong> because we weren\u2019t getting satisfactory results with support, I got an e-mail within 6 hours that had the \u201cUse Get-BrokerHostingPowerAction\u201d answer (eye roll, good job there guys).<\/p>\n Get-BrokerHostingPowerAction did exactly what I wanted \u2013 it listed the pending actions, which immediately revealed that during peak hours I was seeing 250-350 queued commands. At 10 actions per minute, a queue that size takes roughly 25 to 35 minutes to drain, so over a period of logons and logoffs there were not enough registered VDAs and new machines could not be powered on in a timely fashion, resulting in \u201cNo desktops available\u201d. Finally, after nearly 2 months, I was almost free. I had the answer, I had proven to support and escalation support that the issue was indeed power management and that the advanced setting \u201cMaxPowerActionsPerMinute\u201d had to be changed in our environment, so we asked for the official recommendation. After a week or two of waiting, they finally came back with (paraphrased), \u201cwe have no official answer, you will need to test it in your environment.\u201d That\u2019s exactly what I did. I found that with our storage configuration, the CPU and RAM of my VDI hosts, the hypervisor (vSphere Standard) and solid PVS servers, I could safely execute about 60 power actions per minute without causing the host CPUs to go hog-wild and degrade the user experience while a large number of VMs booted (the true limit is closer to 100 in that environment, but I intentionally set it much lower to provide a cushion during the daytime).<\/p>\n In a pooled VDI environment, delivery groups are set, by default, to power off machines after use. 
This behavior can be configured with the following PowerShell commands:<\/p>\n Set-BrokerDesktopGroup -Name 'Desktop Group Name' -ShutdownDesktopsAfterUse $True<\/p>\n Set-BrokerDesktopGroup -Name 'Desktop Group Name' -ShutdownDesktopsAfterUse $False<\/p>\n<\/blockquote>\n When a logoff occurs, XenDesktop will send the hypervisor connection associated with that VM a graceful shutdown command (i.e. using VMware Tools). This task is subject to the limit of 10 new actions per minute set on the hypervisor resource in the site configuration, shown below.<\/p>\n <\/p>\nSo why was this an issue?<\/h2>\n
\n