authentication · kubernetes · keycloak · openid-connect · replication

Keycloak OIDC authentication issue on K8s with replicated application servers


I am facing an issue with the authorization_code grant type on a replicated setup in a K8s cluster and am seeking advice. My setup is as follows:

  1. 1 instance of a Keycloak server running on 1 pod on 1 node.
  2. 2 instances of a backend server running on 2 pods on 2 different nodes (call them api1 and api2).

Basically, the problem is this: suppose api1 initiates a PKCE code verification challenge with Keycloak during the authentication workflow. After the user successfully authenticates with Keycloak using a valid username and password, Keycloak redirects the browser to the backend server's redirectURI. However, instead of hitting api1, the redirect hits the other backend instance, api2. Because of this, the session state of the request object on api2 does not contain the code_verifier property, and without it we are unable to call the /protocol/openid-connect/token API to get the access token.

What I am trying to achieve is one of two things: either have the redirectURI always hit the same backend instance that initiated the request, OR find a way for the backend servers (api1 and api2) to share sessions, so that regardless of which instance initiated the request, the session will hold the code_verifier value after successful authentication with Keycloak. I know this is not a Keycloak-specific issue but rather a K8s one (I suppose), but if anyone has run into this situation before and managed a proper resolution (without compromising HA), kindly share your knowledge here. A sketch of what I mean by the second option follows below.
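
For context on the second option: what I have in mind is backing both api pods with an external session store (for example Redis) instead of in-memory sessions, so that a code_verifier saved by api1 is also visible to api2. A very rough sketch of such a store as K8s manifests; the session-store name is made up, and the backend's session middleware would still have to be pointed at this Service:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: session-store   # hypothetical name
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: session-store
      template:
        metadata:
          labels:
            app: session-store
        spec:
          containers:
            - name: redis
              image: redis:7-alpine
              ports:
                - containerPort: 6379
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: session-store   # backends would use session-store:6379 as their session backend
    spec:
      selector:
        app: session-store
      ports:
        - port: 6379
          targetPort: 6379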

I checked whether I could set up sticky sessions between Keycloak and the backend so that the redirectURI always hits the backend instance that started the auth request, but unfortunately I couldn't find any leads, nor any similar problem posted in the community.

Any help or advice is much appreciated. Thanks


Solution

  • So for those who are facing this same issue, here's how I fixed it.

    Since requests to the backend server come through an Ingress, I used the cookie-based session affinity provided by the Ingress; specifically, the Ingress-NGINX Controller for Kubernetes.

    Below is the configuration I added to the ingress annotations in the Helm chart's values.yaml.

    nginx.ingress.kubernetes.io/affinity: 'cookie'
    nginx.ingress.kubernetes.io/session-cookie-path: '/'
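
    For completeness, in the chart I was working with, these annotations sit under the ingress section of values.yaml. The surrounding keys below are chart-specific scaffolding (the host and paths are placeholders), so treat this as an illustration rather than a drop-in file:

    ingress:
      enabled: true
      className: nginx
      annotations:
        # Pin each client to the pod that served its first request via a cookie
        nginx.ingress.kubernetes.io/affinity: 'cookie'
        # Apply the affinity cookie to all paths
        nginx.ingress.kubernetes.io/session-cookie-path: '/'
      hosts:
        - host: api.example.com
          paths:
            - path: /
              pathType: Prefix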
    

    And since the backend is fully stateless except for that one moment during OIDC authorization_code authentication where it stores the code_verifier value in the session, we didn't have to worry about the limitations of this approach: node restarts, container restarts, auto-scaling, resource starvation, new rollouts, and load rebalancing would only impact logged-in users if the backend were keeping other authentication state. We manage all of that through cookies, so even when pods are brought down, destroyed, or replaced, the new pods can still handle the active sessions.
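
    If your backend does hold other state and you need the affinity to behave more predictably across pod restarts and scaling, Ingress-NGINX exposes a few more cookie-affinity annotations worth looking at. The values below are illustrative; check the controller documentation for your version:

    # Name the affinity cookie explicitly (the default is INGRESSCOOKIE)
    nginx.ingress.kubernetes.io/session-cookie-name: 'route'
    # 'persistent' keeps a client pinned to its pod even as the pod set changes,
    # trading away some of the even distribution of the default 'balanced' mode
    nginx.ingress.kubernetes.io/affinity-mode: 'persistent'
    # Re-pin the client to a healthy pod if its current pod goes away
    nginx.ingress.kubernetes.io/session-cookie-change-on-failure: 'true'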

    Here is a link to an article by Paul Dally that I used as a reference for the problem above.