After Wild Waters We Enter Smooth Sailing – VDI Successes This Week


My previous post outlined some of the challenges we faced during our first few weeks on XenDesktop 7 with Atlantis ILIO backing it.  We still have challenges to work out (such as how to deliver multiple, simultaneous sessions that do NOT roam from workstation to workstation) but the environment itself has been stable.  Let me recap.

During the week of 9/4/2013, the entire environment went down on two separate days because the ILIO datastores were filling to capacity.  This problem proved difficult to diagnose given the volume of changes we made over the summer months.  Once we ensured that Outlook was forced to Online mode, that antivirus and configuration / update management software were disabled, and that services generating unique data were in fact disabled, we were in business.  Hyper-V and SCVMM proved to be a real hindrance as well.  Last week, we decided to abandon them in favor of ESXi Standard.  This week has been a dream at the hypervisor level.

One of the biggest challenges this week was a really dreadful error that was not being reported; we discovered it by chance, glancing over the shoulder of a user at a site.  Users were logging in, experiencing a delay at the welcome screen, then being disconnected.  Sometimes it took 2 or 3 tries, sometimes 5, to get a successful login.  At its core, the issue was that logins were taking a long time.  A built-in timer was disconnecting users after 90 seconds, so if they didn’t get in within that window they could be stuck in a frustrating loop until a login finally completed.  Unfortunately, these reports didn’t come in to our helpdesk or filter upstream from it.  After we became aware of the issue, we used the following script to monitor for it:

 

 

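# Collect failed-login events ("reason code NoValue") from every VDI desktop.
# Get-ADComputer requires the ActiveDirectory module.
$vdiVms = Get-ADComputer -Filter { Name -like 'VDIWIN7VS*-*' }
$fails = @()
foreach ($vdiVm in $vdiVms) {
    $computer = $vdiVm.Name
    # Get-EventLog raises an error when nothing matches, so suppress it and test for $null.
    $entries = Get-EventLog -LogName 'Application' -Source 'Citrix Desktop Service' -EntryType 'Information' -Message '*reason code NoValue.*' -ComputerName $computer -ErrorAction SilentlyContinue
    if ($null -ne $entries) {
        # $event is an automatic variable in PowerShell, so use a different name.
        foreach ($entry in $entries) {
            # Read $matches only when the regex matched; otherwise it holds a stale value.
            $user = ''
            if ($entry.Message -match 'user ''(?<user>[A-Za-z0-9]*)'' has ended') {
                $user = $matches['user']
            }

            # Keep Time as a DateTime so sorting is chronological rather than alphabetical.
            $properties = @{'MachineName'=$computer; 'Time'=$entry.TimeGenerated; 'User'=$user}
            $fails += New-Object -TypeName PSObject -Property $properties
        }
    }
}
$fails = @($fails | Sort-Object -Property Time)
$fails | Format-Table -AutoSize | Out-Host
# Output paths; adjust for your environment.
$path0 = "D:\WriteCache\LoginFails_latest.csv"
# HH gives a 24-hour timestamp (hh would repeat every 12 hours).
$path1 = "D:\WriteCache\LoginFails_$(Get-Date -Format 'yyyy-MM-dd_HH-mm_ss').csv"
$fails | Export-Csv -Path $path0 -NoTypeInformation
$fails | Export-Csv -Path $path1 -NoTypeInformation
"Runtime: $(Get-Date)"
"Total Count: $($fails.Count)"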

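Because the symptom was users retrying several times before getting in, one way to read the results is to group the failures by user and flag the repeat offenders. Here is a minimal sketch of that idea, assuming records with the MachineName, Time, and User properties the script exports. The function name Get-RepeatLoginFailure, the -MinimumCount threshold, and the sample records are all made up for illustration, and it needs PowerShell 3.0 or later.

```powershell
# Hypothetical helper (not part of the original script): flags users with
# repeated failed logins. Assumes each record has MachineName, Time, and User.
function Get-RepeatLoginFailure {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [psobject]$Record,

        # Assumed threshold; tune it for your environment.
        [int]$MinimumCount = 2
    )
    begin {
        $records = New-Object 'System.Collections.Generic.List[object]'
    }
    process {
        $records.Add($Record)
    }
    end {
        $records |
            Group-Object -Property User |
            Where-Object { $_.Count -ge $MinimumCount } |
            Sort-Object -Property Count -Descending |
            ForEach-Object {
                [pscustomobject]@{
                    User     = $_.Name
                    Failures = $_.Count
                    Machines = ($_.Group.MachineName | Sort-Object -Unique) -join ', '
                }
            }
    }
}

# Made-up sample records, for illustration only.
$sample = @(
    [pscustomobject]@{ MachineName = 'VDIWIN7VS01-001'; Time = '9/18/2013 8:01:10 AM'; User = 'user1' }
    [pscustomobject]@{ MachineName = 'VDIWIN7VS01-001'; Time = '9/18/2013 8:03:02 AM'; User = 'user1' }
    [pscustomobject]@{ MachineName = 'VDIWIN7VS01-002'; Time = '9/18/2013 8:05:45 AM'; User = 'user1' }
    [pscustomobject]@{ MachineName = 'VDIWIN7VS01-003'; Time = '9/18/2013 8:07:30 AM'; User = 'user2' }
)
$sample | Get-RepeatLoginFailure -MinimumCount 2
```

Against real data, something like `Import-Csv $path0 | Get-RepeatLoginFailure` should work, since the CSV carries the same three columns.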
We contacted Citrix Support asking for a specific way to extend the 90-second timeout so that we could buy time to troubleshoot WHY logins were taking that long. By all accounts, they shouldn’t. It was affecting existing as well as new profiles, too. After we made a change which (we believe) has solved the issue, Citrix did come back with the following suggestion:

Key: HKLM\SOFTWARE\Policies\Citrix\PortICA

Value: AutoLogonTimeout

Type: DWORD

Value is in seconds.

*Update 1/21/2014* Value is in seconds, not milliseconds as previously stated by Citrix support.  Any value above 3600 will result in the default value of 90 being used.
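If you would rather script the change than edit the registry by hand, here is a minimal sketch. It assumes the key path is HKLM\SOFTWARE\Policies\Citrix\PortICA, that the value belongs on the virtual desktop itself (or its master image), and that the session is elevated. The 300-second value is only an illustrative choice, not a recommendation from Citrix.

```powershell
# Assumed key path and an illustrative timeout; adjust both for your environment.
$keyPath        = 'HKLM:\SOFTWARE\Policies\Citrix\PortICA'
$timeoutSeconds = 300

# Values above 3600 reportedly fall back to the 90-second default, so refuse them.
if ($timeoutSeconds -gt 3600) {
    throw "AutoLogonTimeout must be 3600 seconds or less (got $timeoutSeconds)."
}

# Create the key if it does not exist yet.
if (-not (Test-Path -Path $keyPath)) {
    New-Item -Path $keyPath -Force | Out-Null
}

# Write the DWORD value; -Force overwrites any existing value.
New-ItemProperty -Path $keyPath -Name 'AutoLogonTimeout' -PropertyType DWord -Value $timeoutSeconds -Force | Out-Null

# Read it back to confirm.
Get-ItemProperty -Path $keyPath -Name 'AutoLogonTimeout'
```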

 

 

This still wouldn’t have resolved our issue, but it would have bought us some time and made users slightly less agitated (even though they weren’t bothered enough to call in).  So what was it?

 

In folder redirection, we had selected the option to MOVE THE CONTENTS of the redirected folder to the new location.  Normally this wouldn’t be a problem, but our hypothesis is that because our permissions did not exactly mirror Microsoft’s guidelines, the move may have been what was slowing logins down.  In the past we had selected the ‘grant exclusive access’ option to ensure permissions were OK; however, it was deselected at some point.  Once the logon delay from folder redirection was gone, we could not detect a single legitimate failure.  The only other failures the script detected were corrupted profiles, which were far rarer (2 out of 1000+ unique users logging in over 24 hours).

 

The highlight of my day was logging into XD and thinking I had reconnected to an existing session, then realizing I had not.  I have Atlantis ILIO to thank for that.  Fast fast fast!  My login time was 16s to desktop!

                                      
