I have a grpc-web frontend application that communicates with my gRPC backend. Since browsers do not "speak" gRPC, I need an Envoy proxy to translate HTTP requests into actual gRPC and back. Locally, my setup works well and looks like this:
Browser (grpc-web) -----> Envoy proxy -------> gRPC backend
I can even dockerize every component and launch them locally and they run fine.
I deployed each containerized component (3 in total) to Google Cloud Run. Cloud Run handles SSL/TLS by default, wrapping it around the provided container, so I can call both the frontend and the Envoy proxy over https. I can also execute gRPC calls against the backend gRPC service using Postman, or from a gRPC client I code myself, as long as I use SSL.
What I cannot do is enable the Envoy proxy to use TLS when initiating connections to the gRPC backend.
The backend host, assigned by Cloud Run, is my-grpc-server.a.run.app.
Postman shows that the backend is up and running correctly: there is a lock icon to the left of the host, signaling the use of SSL/TLS, and the protocol is grpc.
And using this Golang gRPC code I can call the service correctly too:
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

func main() {
	host := "my-grpc-server.a.run.app"
	port := "443"
	address := fmt.Sprintf("%s:%s", host, port)

	var opts []grpc.DialOption
	opts = append(opts, grpc.WithAuthority(host))

	// Verify the server certificate against the system root CAs.
	systemRoots, err := x509.SystemCertPool()
	if err != nil {
		log.Fatalf("Failed to read system root CA certificates: %v", err)
	}
	cred := credentials.NewTLS(&tls.Config{
		RootCAs: systemRoots,
	})
	opts = append(opts, grpc.WithTransportCredentials(cred))

	conn, err := grpc.Dial(address, opts...)
	if err != nil {
		log.Fatalf("Failed to connect to %s:%s: %v", host, port, err)
	}
	defer conn.Close()

	// client := ... // Build the client from conn.
	// client.Check(...) // Executes the health check correctly.
}
I have tried setting up SSL/TLS in my Envoy proxy without success. The non-SSL/TLS configuration is the Envoy gRPC vanilla configuration from the docs with additional CORS configuration, and it looks like:
admin:
  address:
    socket_address: { address: 127.0.0.1, port_value: 9901 }
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 127.0.0.1, port_value: 8080 }
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          codec_type: auto
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              typed_per_filter_config:
                envoy.filters.http.cors:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.cors.v3.CorsPolicy
                  allow_origin_string_match:
                  - safe_regex:
                      regex: \*
                  allow_methods: "GET,POST,PUT,PATCH,DELETE,OPTIONS"
                  allow_headers: "DNT,User-Agent,X-User-Agent,X-Grpc-Web,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Authorization,Access-Control-Allow-Origin"
                  allow_credentials: true
                  expose_headers: grpc-status,grpc-message
                  max_age: "1728000"
              routes:
              - match: { prefix: "/" }
                route: { cluster: my_grpc_service }
          http_filters:
          - name: envoy.filters.http.cors
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.cors.v3.Cors
          - name: envoy.grpc_web
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_web.v3.GrpcWeb
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: my_grpc_service
    connect_timeout: 3.0s
    type: STATIC
    http2_protocol_options: {}
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: my_grpc_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1 # Localhost when using it locally
                port_value: 50051 # The port needed locally
    health_checks:
      timeout: 1s
      interval: 10s
      unhealthy_threshold: 2
      healthy_threshold: 2
      grpc_health_check: {}
I have tried adding the following transport_socket block at the same indentation level as load_assignment, and pointing the endpoint to the right host and port:
# ...
socket_address:
  address: my-grpc-server.a.run.app # The Cloud Run host
  port_value: 443 # The default SSL port
# ...
transport_socket:
  name: envoy.transport_sockets.tls
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
But now Envoy seems to stop trying to use gRPC and instead forwards the HTTP/2 requests from the frontend as they are received, because in the browser I can see a Google-crafted error message from Cloud Run in the Grpc-Message response header, saying (among other things):
That's an error. The requested URL <code>/some.path.Health/Check</code> was not found on this server. That's all we know.
I have also tried adding my trusted CA file, but the error is the same as the previous one:
transport_socket:
  name: envoy.transport_sockets.tls
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
    common_tls_context:
      validation_context:
        match_subject_alt_names:
        - exact: "my-grpc-server.a.run.app"
        trusted_ca:
          filename: /path/to/cert.pem
Using sni did not solve the issue either.
Using FileAccessLog I can see the gRPC Status is:
12, UNIMPLEMENTED
# When successfully running locally without TLS, the status is:
2, UNKNOWN
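For reference, an access log along the following lines is one way to surface those status values together with the host information that matters here. This is only a sketch: the log path and the exact format string are assumptions, not the configuration actually used above. The block would sit inside the HttpConnectionManager typed_config, next to route_config.

```yaml
access_log:
- name: envoy.access_loggers.file
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
    path: /dev/stdout  # assumed destination; any writable path works
    log_format:
      text_format_source:
        # One line per request: method, path, authority, chosen upstream,
        # HTTP status, gRPC status and Envoy response flags.
        inline_string: "%REQ(:METHOD)% %REQ(:PATH)% authority=%REQ(:AUTHORITY)% upstream=%UPSTREAM_HOST% http=%RESPONSE_CODE% grpc=%GRPC_STATUS% flags=%RESPONSE_FLAGS%\n"
```

Logging the authority next to the gRPC status makes it easier to see which host name accompanies a failing request.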
Other resources on the web are confusing because they seem to set up SSL for the Envoy listeners (SSL termination), not for the clusters (SSL origination).
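To make the distinction concrete, here is a minimal sketch of where each TLS context lives in an Envoy v3 configuration. The certificate paths are hypothetical placeholders and both fragments are deliberately incomplete (filters, addresses and load_assignment are omitted).

```yaml
static_resources:
  listeners:
  - name: listener_0
    filter_chains:
    - transport_socket:
        # TLS termination: Envoy is the TLS *server* for downstream clients.
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          common_tls_context:
            tls_certificates:
            - certificate_chain: { filename: /path/to/server-cert.pem }  # hypothetical path
              private_key: { filename: /path/to/server-key.pem }         # hypothetical path
      # filters: HttpConnectionManager etc., omitted
  clusters:
  - name: my_grpc_service
    transport_socket:
      # TLS origination: Envoy is the TLS *client* toward the upstream.
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
    # load_assignment etc., omitted
```

Since Cloud Run already terminates TLS in front of the Envoy container, only the cluster-side UpstreamTlsContext is relevant to this setup.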
Can someone point me in the right direction?
For some additional context, I do not need mutual authentication, and as shown, the frontend and backend code are most probably correct. The issue seems to be contained within the Envoy configuration.
The tools I am using are:
grpc-web 1.4.2 (npm)
envoy version: 7bba38b743bb3bca22dffb4a21c38ccc155fbef8/1.27.0/Distribution/RELEASE/BoringSSL
GCloud Run
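Setting auto_host_rewrite: true at the route level solved the issue. As far as I can tell, Cloud Run routes each request by its Host/:authority header, so forwarding the proxy's own host name produced the "not found" error; with this setting, Envoy rewrites the header to the upstream host name: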
- match: { prefix: "/" }
  route:
    cluster: my_grpc_service
    auto_host_rewrite: true
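For completeness, a cluster definition that fits this route might look like the sketch below. It combines the pieces tried in the question and is not the verified final configuration. In particular, type: LOGICAL_DNS and dns_lookup_family are assumptions: a STATIC cluster expects IP addresses rather than a host name, and the Envoy documentation describes auto_host_rewrite as applying to strict_dns or logical_dns clusters (or endpoints with an explicit hostname).

```yaml
clusters:
- name: my_grpc_service
  connect_timeout: 3.0s
  type: LOGICAL_DNS            # assumption: DNS-based type so the host name resolves
  dns_lookup_family: V4_ONLY   # assumption: restrict resolution to IPv4
  http2_protocol_options: {}   # gRPC upstream requires HTTP/2
  lb_policy: ROUND_ROBIN
  load_assignment:
    cluster_name: my_grpc_service
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: my-grpc-server.a.run.app # The Cloud Run host
              port_value: 443
  transport_socket:
    name: envoy.transport_sockets.tls
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
      sni: my-grpc-server.a.run.app # host name sent in the TLS handshake
```

The gRPC health check from the original cluster is left out here to keep the sketch short; it can be added back unchanged.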
Many thanks to Josef Gattermayer, whose post contains a fully working Cloud Run Envoy proxy setup: https://www.ackee.agency/blog/how-to-setup-a-grpc-web-backend-on-google-cloud-run-with-envoy-proxy
As of November 2023, when I am writing this answer, it still works.