Kitchen EC2 on Windows - Eventual WinRM::WinRMAuthorizationError

I am using the Kitchen EC2 driver to test some Windows "base" cookbooks on Amazon's Windows Server 2012 R2, 2016, and 1803 AMIs. The problem seems to be triggered after a certain amount of time while a recipe is converging through Kitchen, rather than by any specific line of code.

At some point during the converge, the run stops with a WinRM::WinRMAuthorizationError. If I rerun the converge, it fails immediately with the same error. I've tried changing and moving resources in the recipe, and the failure doesn't seem to be tied to any specific part of it: it appears at a seemingly random point and then keeps happening.

This started when I tried to remove our custom user-data script and use the one that the Kitchen-EC2 driver generates instead. The two scripts are roughly equivalent, but they do some things slightly differently.

I'm planning to do more troubleshooting (this seems like a good resource), but since I'm using vanilla settings, I'm hoping I'm missing something obvious.

Relevant part of the .kitchen.yml:

transport:
  name: 'winrm'
  elevated: true
  username: 'Administrator'
  ssh_key: ~/.ssh/test-kitchen

Default user-data:

# Logic for determining $logfile is removed...
$logfile='C:\Program Files\Amazon\Ec2ConfigService\Logs\kitchen-ec2.log'

# Allow script execution
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Force

# PS Remoting and & winrm.cmd basic config
Enable-PSRemoting -Force -skipnetworkprofilecheck

& winrm.cmd set winrm/config '@{MaxTimeoutms="1800000"}' >> $logfile
& winrm.cmd set winrm/config/winrs '@{MaxMemoryPerShellMB="1024"}' >> $logfile
& winrm.cmd set winrm/config/winrs '@{MaxShellsPerUser="50"}' >> $logfile
& winrm.cmd set winrm/config/winrs '@{MaxMemoryPerShellMB="1024"}' >> $logfile

# Firewall Config
& netsh advfirewall firewall set rule name="Windows Remote Management (HTTP-In)" profile=public protocol=tcp localport=5985 remoteip=localsubnet new remoteip=any  >> $logfile
Set-ItemProperty -Name LocalAccountTokenFilterPolicy -Path HKLM:\software\Microsoft\Windows\CurrentVersion\Policies\system -Value 1
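To confirm on the instance that this user data actually took effect, and to compare against the state after the error starts, you can read the log and the live WinRM settings. This is a sketch rather than anything the driver provides; it assumes an elevated PowerShell session on the instance (for example over RDP once WinRM is failing) and that the log lives at the 2012 R2 path shown above.

```powershell
# Run in an elevated PowerShell session on the test instance.
# Assumption: the EC2Config log path used by the user data above (2012 R2 layout).
$logfile = 'C:\Program Files\Amazon\Ec2ConfigService\Logs\kitchen-ec2.log'
if (Test-Path -Path $logfile) {
    # Output captured from the winrm.cmd calls in the user data
    Get-Content -Path $logfile -Tail 20
} else {
    Write-Output "Log file not found: $logfile"
}

# Live values of the settings the user data changes through winrm.cmd
Get-Item -Path WSMan:\localhost\MaxTimeoutms
Get-Item -Path WSMan:\localhost\Shell\MaxMemoryPerShellMB
Get-Item -Path WSMan:\localhost\Shell\MaxShellsPerUser

# Value written by the final Set-ItemProperty call in the user data
Get-ItemProperty -Path HKLM:\Software\Microsoft\Windows\CurrentVersion\Policies\System -Name LocalAccountTokenFilterPolicy
```

Running it once right after the instance is created and again after the WinRMAuthorizationError appears shows whether something reverted these values in between.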

Solution

My best guess is that this was caused by the AMI I was using, which was a CIS (Center for Internet Security) hardened image.

More specifically, I think the user-data script did configure WinRM correctly for the initial authorization, but some of the CIS image's Group Policy settings conflict with the winrm.cmd changes it makes. When those policies were applied again later, they possibly overrode the user-data settings and killed the connection. My guess is that a gpupdate run as part of the normal recipe was reapplying them.
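One way to test this theory is to check whether Group Policy is supplying WinRM settings on the instance. Settings delivered by policy are stored under HKLM:\Software\Policies and take precedence over values set with winrm.cmd, and `winrm get` marks them with `[Source="GPO"]`. A minimal sketch, assuming an elevated session on the instance (for example over RDP, since WinRM is unusable at that point):

```powershell
# Policy-backed WinRM settings; when present they override winrm.cmd changes
$policyKeys = @(
    'HKLM:\Software\Policies\Microsoft\Windows\WinRM\Service',
    'HKLM:\Software\Policies\Microsoft\Windows\WinRM\Client'
)
foreach ($key in $policyKeys) {
    if (Test-Path -Path $key) {
        Write-Output "Policy values under ${key}:"
        Get-ItemProperty -Path $key | Format-List
    } else {
        Write-Output "No WinRM policy values under $key"
    }
}

# Effective service and client config; policy-controlled entries show [Source="GPO"]
& winrm.cmd get winrm/config/service
& winrm.cmd get winrm/config/client

# Computer-side Group Policy summary, including when it was last applied
gpresult /r /scope computer
```

If AllowBasic (or other auth settings) shows up under the policy keys with values that disagree with the user data, that supports the idea that a policy refresh, not the recipe itself, is breaking the connection.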

I think running the following commands in the user data is what fixed it. They require the PolicyFileEditor module from the PowerShell Gallery, which provides the Set-PolicyFileEntry cmdlet.

# PolicyFileEditor provides Set-PolicyFileEntry (Install-Module needs PowerShellGet, i.e. WMF 5.0 or later)
Install-PackageProvider -Name NuGet -MinimumVersion 2.8.5.201 -Force
Install-Module -Name PolicyFileEditor -Force
# Assumed definition: $MachineDir is the local machine policy file
$MachineDir = "$env:windir\System32\GroupPolicy\Machine\Registry.pol"

Set-PolicyFileEntry -Path $MachineDir -Key Software\Policies\Microsoft\Windows\WinRM\Client -ValueName AllowBasic -Data 1 -Type DWord
Set-PolicyFileEntry -Path $MachineDir -Key Software\Policies\Microsoft\Windows\WinRM\Service -ValueName AllowBasic -Data 1 -Type DWord
Set-PolicyFileEntry -Path $MachineDir -Key Software\Policies\Microsoft\WindowsFirewall\DomainProfile -ValueName EnableFirewall -Data 0 -Type DWord
Set-PolicyFileEntry -Path $MachineDir -Key Software\Policies\Microsoft\WindowsFirewall\PublicProfile -ValueName EnableFirewall -Data 0 -Type DWord
Set-PolicyFileEntry -Path $MachineDir -Key Software\Policies\Microsoft\WindowsFirewall\PrivateProfile -ValueName EnableFirewall -Data 0 -Type DWord
Set-PolicyFileEntry -Path $MachineDir -Key Software\Microsoft\Windows\CurrentVersion\Policies\System -ValueName LocalAccountTokenFilterPolicy -Data 1 -Type DWord

gpupdate /force
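Before running the converge again, it may help to confirm that the entries landed in the local policy file and that the refreshed policy now allows Basic authentication. A sketch, assuming the PolicyFileEditor module is installed and that the machine policy file lives at the path below (both are assumptions, not something the original post shows):

```powershell
# Assumption: PolicyFileEditor is installed and this is the local machine policy file
Import-Module -Name PolicyFileEditor
$MachineDir = "$env:windir\System32\GroupPolicy\Machine\Registry.pol"

# Every value currently stored in the machine Registry.pol
Get-PolicyFileEntry -Path $MachineDir -All | Format-Table Key, ValueName, Data -AutoSize

# After gpupdate, the policy registry keys should carry the new values
Get-ItemProperty -Path HKLM:\Software\Policies\Microsoft\Windows\WinRM\Service -Name AllowBasic
Get-ItemProperty -Path HKLM:\Software\Policies\Microsoft\Windows\WinRM\Client -Name AllowBasic

# Basic should now read true, tagged [Source="GPO"] because policy controls it
& winrm.cmd get winrm/config/service/auth
```

Checking the policy file rather than only the live WinRM config matters here: a later gpupdate applies whatever Registry.pol contains, so that file is what decides whether the settings survive the next refresh.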