RE: gitlab-ce-16.8.2
A few days ago it was reported that opening the CI/CD Settings page returned a 500 error page for every project. I could reproduce this behavior, and no one could recall when it first started.
After some investigation and frantic Google searches, our problem seemed to match an older issue (https://gitlab.com/gitlab-org/gitlab/-/issues/22782).
I ran the commands mentioned in a more recent comment, with the understanding that we might have to re-register runners.
# gitlab-psql
psql (14.10)
Type "help" for help.
gitlabhq_production=# UPDATE projects SET runners_token = null, runners_token_encrypted = null;
UPDATE 153
gitlabhq_production=# UPDATE namespaces SET runners_token = null, runners_token_encrypted = null;
UPDATE 258
gitlabhq_production=# UPDATE application_settings SET runners_registration_token_encrypted = null;
UPDATE 1
gitlabhq_production=# UPDATE ci_runners SET token = null, token_encrypted = null;
UPDATE 4
This allowed us to reach the settings pages. However, a banner is now displayed: "There was an error fetching the variables." and no variables are visible in the web UI. If someone tries to add a variable, the banner switches to "Something went wrong on our end. Please try again."
I can see entries in the ci_variables table in PostgreSQL, with encrypted_value, encrypted_value_salt and encrypted_value_iv populated (sanitized output included):
gitlabhq_production=# select * from ci_variables;
id | key | value | encrypted_value | encrypted_value_salt | encrypted_value_iv | project_id | protected | environment_scope | masked | variable_type | raw | description
21 | ssh_host_ecdsa_key | | <redacted> | <redacted> | <redacted> | 159 | f | * | f | 2 | f | |
(25 rows)
I built a test server, copied the /etc/gitlab contents over, and restored a backup from a couple of months ago. That instance works: I can see the same entries in the ci_variables table and can also browse to them in the web UI.
I'm not sure whether the commands above also caused the variable issue, or whether the runner tokens are related to ci_variables at all. At this point I am unsure if I should remove the ci_variables rows and manually rebuild them from the temporary restore instance (there are around 25, so not terrible) or if there's a better way to handle this. I also don't have a lot of experience with the various container operations, so I would love to isolate a root cause or establish a timeline.
Advice, constructive feedback, articles and/or targeted links are all greatly appreciated.
Delayed follow-up on this: it turned out to be a likely order-of-operations problem during the preparation of a new server. This was for a migration to a more modern OS (CentOS 7 to RHEL 9) several months prior. That this wasn't caught sooner is mostly a test plan issue. My restoration to a temporary VM worked, so I have to assume I skipped a crucial gitlab-ctl reconfigure after copying the gitlab-secrets file but before running gitlab-backup restore, as outlined in the migration documentation.
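For anyone checking the same thing during their own migration, here is a minimal sketch for confirming that the secrets file on the new server is byte-for-byte the one from the old server. It assumes a copy of the old file is readable on the new host. The function names and the example paths in the comment are hypothetical, not part of GitLab. A matching checksum only shows the copy was intact; it says nothing about whether gitlab-ctl reconfigure was run afterwards.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_secrets(old_copy, current):
    """True when both files have identical contents."""
    return sha256_of(old_copy) == sha256_of(current)


# Hypothetical usage on the new server (paths are assumptions):
#   same_secrets("/root/old-server/gitlab-secrets.json",
#                "/etc/gitlab/gitlab-secrets.json")
```

If this returns False, the restored database was most likely encrypted with keys the new server does not have.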
Assuming this was the case, I followed the GitLab troubleshooting guide "When the Secrets File is Lost".
This restored the majority of functionality, and I was able to use the temporary restored instance to manually recreate the 25 variables by copy and paste. Several web UI pages related to CI/CD settings and runners still had issues.
Rerunning the steps under "Verify database values can be decrypted using the current secrets" identified a few more fields.
# gitlab-rake gitlab:doctor:secrets VERBOSE=1
I, [2024-04-09T17:08:22.275377 #1762562] INFO -- : - ApplicationSetting failures: 1
D, [2024-04-09T17:08:22.275468 #1762562] DEBUG -- : - ApplicationSetting[1]: customers_dot_jwt_signing_key, runners_registration_token, error_tracking_access_token
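With more than a handful of failures, the verbose output gets hard to read by eye. Below is a rough sketch that collects the failing attribute names per record from saved output. The line format is assumed purely from the two lines shown above, so other GitLab versions may print something different. The function and pattern names are hypothetical.

```python
import re

# Matches per-record lines such as
#   "... DEBUG -- : - ApplicationSetting[1]: field_a, field_b"
# (format assumed from the output in this post).
RECORD_LINE = re.compile(r"- (?P<model>[\w:]+)\[(?P<id>\d+)\]: (?P<fields>.+)$")


def failing_fields(log_text):
    """Map (model, record id) -> list of attribute names that failed to decrypt."""
    failures = {}
    for line in log_text.splitlines():
        match = RECORD_LINE.search(line.strip())
        if match:
            key = (match.group("model"), int(match.group("id")))
            fields = [f.strip() for f in match.group("fields").split(",") if f.strip()]
            failures[key] = fields
    return failures
```

Summary lines such as "ApplicationSetting failures: 1" are skipped because they carry no record id in brackets.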
When checking the application_settings table, there were no columns named:
customers_dot_jwt_signing_key
error_tracking_access_token
I did find:
encrypted_customers_dot_jwt_signing_key
encrypted_customers_dot_jwt_signing_key_iv
error_tracking_access_token_encrypted
Manually nulling the encrypted columns via gitlab-psql, in a manner similar to the troubleshooting documentation, worked:
UPDATE application_settings SET encrypted_customers_dot_jwt_signing_key = null, encrypted_customers_dot_jwt_signing_key_iv = null, error_tracking_access_token_encrypted = null;
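The attribute names printed by gitlab:doctor:secrets did not match the column names, which cost some time. Here is a small sketch of the lookup, assuming only the naming patterns that appear in this post (encrypted_<name>, encrypted_<name>_iv, encrypted_<name>_salt, <name>_encrypted). The function name is hypothetical, and the real column list should come from something like \d application_settings in gitlab-psql.

```python
def encrypted_columns(attribute, table_columns):
    """Return columns in table_columns that look like encrypted storage for attribute.

    The candidate patterns are guesses based on column names seen in this
    thread, not an official GitLab naming rule.
    """
    candidates = [
        f"encrypted_{attribute}",
        f"encrypted_{attribute}_iv",
        f"encrypted_{attribute}_salt",
        f"{attribute}_encrypted",
    ]
    present = set(table_columns)
    return [name for name in candidates if name in present]
```

Anything it returns is only a candidate; check each column against the table before setting it to null.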
After this, all web UI sections for CI/CD and runners worked. Developers were able to delete existing runner entries and add updated runner configurations, which set up new keys.
Hope this helps someone in the future.