Last week my team noticed some of our PowerShell AD group management scripts throwing errors. After a little digging, we found that the 'Get-ADGroupMember' command was timing out when pointed at certain domain controllers. We have nine DCs in total: five VMs and four physical servers. The issue is reproducible only on the four physical servers. Those four are running Server 2016; the VMs are running Server 2022.
When targeting the unaffected DCs, a sample command returns in milliseconds. The impacted DCs time out after the five-minute limit. The test group is not big; it has fewer than 300 members.
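For anyone wanting to reproduce the comparison, here is a minimal sketch of how the per-DC timing could be collected. It assumes the ActiveDirectory (RSAT) module is available and uses 'GroupX' as a placeholder group name; it is not the exact script we ran. Note that each affected DC may take the full timeout before it reports a failure.

```powershell
# Requires the ActiveDirectory module (RSAT) and a domain account with read access.
Import-Module ActiveDirectory

# Placeholder: substitute the group you are testing with.
$groupName = 'GroupX'

# Enumerate every domain controller in the current domain.
$dcs = Get-ADDomainController -Filter *

$results = foreach ($dc in $dcs) {
    # Time the call against this specific DC.
    $timer = [System.Diagnostics.Stopwatch]::StartNew()
    try {
        $members = @(Get-ADGroupMember -Identity $groupName -Server $dc.HostName -ErrorAction Stop)
        $status = "OK ($($members.Count) members)"
    } catch {
        # A timeout or any other error is recorded instead of stopping the loop.
        $status = "Failed: $($_.Exception.Message)"
    }
    $timer.Stop()

    [pscustomobject]@{
        Server  = $dc.HostName
        OS      = $dc.OperatingSystem
        Seconds = [math]::Round($timer.Elapsed.TotalSeconds, 2)
        Status  = $status
    }
}

# Fastest DCs first, so the slow ones cluster at the bottom.
$results | Sort-Object Seconds | Format-Table -AutoSize
```

Including the OS column makes it easy to see whether the slow responses line up with one OS version.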
I put ADWS in debug mode (as described here) and captured the process on one of the slow DCs and on one of the 'normal' DCs. Nothing stands out to me other than that, once the process starts getting each member object, each object takes longer to retrieve. Timings for other commands, such as Get-ADUser, don't seem to show any consistent, measurable differences.
We do not appear to be resource bound, but I did note that while the command is running on the slow DCs, we're seeing all the activity on a single core.
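One way to confirm the single-core pattern is to sample the per-core processor counters while the slow command is running. This is only a sketch: it assumes it is run locally on the DC, and that the counter names are the English ones (they are localized on non-English installs).

```powershell
# Sample per-core CPU usage once per second for 10 seconds.
# Start this while Get-ADGroupMember is running against the DC.
$samples = Get-Counter -Counter '\Processor(*)\% Processor Time' -SampleInterval 1 -MaxSamples 10

$samples.CounterSamples |
    Where-Object { $_.InstanceName -ne '_total' } |   # drop the all-core aggregate
    Group-Object InstanceName |
    ForEach-Object {
        [pscustomobject]@{
            Core       = $_.Name
            AvgPercent = [math]::Round(($_.Group | Measure-Object -Property CookedValue -Average).Average, 1)
        }
    } |
    Sort-Object AvgPercent -Descending |
    Format-Table -AutoSize
```

If one core sits near 100% while the rest stay idle, that matches what we saw and points to a single-threaded bottleneck rather than overall load.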
We're coming off the holiday break and usage has been light, so it's hard to pin down exactly when the issue started. We also made a major change to our IAM processes a few weeks ago that roughly doubled our number of user objects, from circa 30k to 70k. Our DC health checks and replication all show green following that change.
Short of this being a 2016-specific issue with the December updates, I'm at a loss as to where to go next. The physical servers are 8 years old but were significantly over-specced. While one core is being used during the call, overall system usage is low.
Slow log sample:
ActiveDirectoryWebServices: [1/11/2025 11:47:05 PM] [31] GetADGroupMember: found sdsamGroupPrincipal CN=GroupX
ActiveDirectoryWebServices: [1/11/2025 11:47:06 PM] [31] GetADGroupMember: member of CN=XXX
DirectoryUtilities: [1/11/2025 11:47:06 PM] [31] GetTimeRemaining: remaining time is 00:05:09.5457365
ActiveDirectoryWebServices: [1/11/2025 11:47:06 PM] [31] PopulateObjectClasses: adding class top
ActiveDirectoryWebServices: [1/11/2025 11:47:06 PM] [31] PopulateObjectClasses: adding class person
ActiveDirectoryWebServices: [1/11/2025 11:47:06 PM] [31] PopulateObjectClasses: adding class organizationalPerson
ActiveDirectoryWebServices: [1/11/2025 11:47:06 PM] [31] PopulateObjectClasses: adding class user
ActiveDirectoryWebServices: [1/11/2025 11:47:06 PM] [31] GetLocalDnsDomainName: Retrieved domain name 'domain.name'
ActiveDirectoryWebServices: [1/11/2025 11:47:06 PM] [31] GetADGroupMember: member of CN=YYY
DirectoryUtilities: [1/11/2025 11:47:06 PM] [31] GetTimeRemaining: remaining time is 00:05:09.4051086
ActiveDirectoryWebServices: [1/11/2025 11:47:07 PM] [31] PopulateObjectClasses: adding class top
ActiveDirectoryWebServices: [1/11/2025 11:47:07 PM] [31] PopulateObjectClasses: adding class person
ActiveDirectoryWebServices: [1/11/2025 11:47:07 PM] [31] PopulateObjectClasses: adding class organizationalPerson
ActiveDirectoryWebServices: [1/11/2025 11:47:07 PM] [31] PopulateObjectClasses: adding class user
ActiveDirectoryWebServices: [1/11/2025 11:47:07 PM] [31] GetADGroupMember: member of CN=ZZZ
DirectoryUtilities: [1/11/2025 11:47:07 PM] [31] GetTimeRemaining: remaining time is 00:05:07.5925597
ActiveDirectoryWebServices: [1/11/2025 11:47:09 PM] [31] PopulateObjectClasses: adding class top
ActiveDirectoryWebServices: [1/11/2025 11:47:09 PM] [31] PopulateObjectClasses: adding class person
ActiveDirectoryWebServices: [1/11/2025 11:47:09 PM] [31] PopulateObjectClasses: adding class organizationalPerson
ActiveDirectoryWebServices: [1/11/2025 11:47:09 PM] [31] PopulateObjectClasses: adding class user
ActiveDirectoryWebServices: [1/11/2025 11:47:09 PM] [31] GetADGroupMember: member of CN=AAA
DirectoryUtilities: [1/11/2025 11:47:10 PM] [31] GetTimeRemaining: remaining time is 00:05:05.5456304
The first record appears to return quickly. Subsequent records look like they take 1 s to 2 s each.
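The log timestamps only have one-second resolution, but the GetTimeRemaining entries carry fractional seconds, so the gap between successive entries gives a rough per-member cost. A small sketch of that calculation, using the four GetTimeRemaining lines from the sample above (in practice the here-string would be replaced by the contents of the actual debug log file):

```powershell
# The four GetTimeRemaining entries from the slow-DC sample.
$log = @'
DirectoryUtilities: [1/11/2025 11:47:06 PM] [31] GetTimeRemaining: remaining time is 00:05:09.5457365
DirectoryUtilities: [1/11/2025 11:47:06 PM] [31] GetTimeRemaining: remaining time is 00:05:09.4051086
DirectoryUtilities: [1/11/2025 11:47:07 PM] [31] GetTimeRemaining: remaining time is 00:05:07.5925597
DirectoryUtilities: [1/11/2025 11:47:10 PM] [31] GetTimeRemaining: remaining time is 00:05:05.5456304
'@ -split '\r?\n'

$invariant = [System.Globalization.CultureInfo]::InvariantCulture

# Pull the remaining-time value out of each matching line.
$remaining = @(foreach ($line in $log) {
    if ($line -match 'GetTimeRemaining: remaining time is (?<left>[\d:.]+)') {
        [timespan]::Parse($Matches['left'], $invariant)
    }
})

# The drop in remaining time between consecutive entries approximates
# how long the work between them took.
$deltas = @(for ($i = 1; $i -lt $remaining.Count; $i++) {
    ($remaining[$i - 1] - $remaining[$i]).TotalSeconds
})

$deltas | ForEach-Object { '{0:F3} s' -f $_ }
```

For the sample above this works out to roughly 0.14 s, 1.81 s, and 2.05 s, which is consistent with the first member returning quickly and later ones taking far longer.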
It is unclear why or how, but simply starting and stopping the Directory Services Authentication Scripts found here permanently resolved the issue. The issue still manifests after the start-auth script runs; it is only remediated after the stop-auth script is executed. Now that the issue is resolved, I have no way to test or determine which specific command in the script is the key.