We have set up a GKE cluster with Terraform, using private and shared networking:
Network configuration:
resource "google_compute_subnetwork" "int_kube02" {
  name = "int-kube02"
  region = var.region
  project = "infrastructure"
  network = "projects/infrastructure/global/networks/net-10-23-0-0-16"
  ip_cidr_range = "10.23.5.0/24"

  secondary_ip_range {
    range_name = "pods"
    ip_cidr_range = "10.60.0.0/14" # 10.60 - 10.63
  }

  secondary_ip_range {
    range_name = "services"
    ip_cidr_range = "10.56.0.0/16"
  }
}
Cluster configuration:
resource "google_container_cluster" "gke_kube02" {
  name = "kube02"
  location = var.region
  initial_node_count = var.gke_kube02_num_nodes
  network = "projects/infrastructure/global/networks/net-10-23-0-0-16"
  subnetwork = "projects/infrastructure/regions/europe-west3/subnetworks/int-kube02"

  master_authorized_networks_config {
    cidr_blocks {
      display_name = "admin vpn"
      cidr_block = "10.42.255.0/24"
    }
    cidr_blocks {
      display_name = "monitoring server"
      cidr_block = "10.42.4.33/32"
    }
    cidr_blocks {
      display_name = "cluster nodes"
      cidr_block = "10.23.5.0/24"
    }
  }

  ip_allocation_policy {
    cluster_secondary_range_name = "pods"
    services_secondary_range_name = "services"
  }

  private_cluster_config {
    enable_private_nodes = true
    enable_private_endpoint = true
    master_ipv4_cidr_block = "192.168.23.0/28"
  }

  node_config {
    machine_type = "e2-highcpu-2"
    tags = ["kube-no-external-ip"]
    metadata = {
      disable-legacy-endpoints = true
    }
    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
  }
}
The cluster is online and running fine. If I connect to one of the worker nodes, I can reach the API using curl:
curl -k https://192.168.23.2
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}
I also see a healthy cluster when using a SSH port forward:
❯ k get pods --all-namespaces --insecure-skip-tls-verify=true
NAMESPACE     NAME                                               READY   STATUS    RESTARTS   AGE
kube-system   event-exporter-gke-5479fd58c8-mv24r                2/2     Running   0          4h44m
kube-system   fluentbit-gke-ckkwh                                2/2     Running   0          4h44m
kube-system   fluentbit-gke-lblkz                                2/2     Running   0          4h44m
kube-system   fluentbit-gke-zglv2                                2/2     Running   4          4h44m
kube-system   gke-metrics-agent-j72d9                            1/1     Running   0          4h44m
kube-system   gke-metrics-agent-ttrzk                            1/1     Running   0          4h44m
kube-system   gke-metrics-agent-wbqgc                            1/1     Running   0          4h44m
kube-system   kube-dns-697dc8fc8b-rbf5b                          4/4     Running   5          4h44m
kube-system   kube-dns-697dc8fc8b-vnqb4                          4/4     Running   1          4h44m
kube-system   kube-dns-autoscaler-844c9d9448-f6sqw               1/1     Running   0          4h44m
kube-system   kube-proxy-gke-kube02-default-pool-2bf58182-xgp7   1/1     Running   0          4h43m
kube-system   kube-proxy-gke-kube02-default-pool-707f5d51-s4xw   1/1     Running   0          4h43m
kube-system   kube-proxy-gke-kube02-default-pool-bd2c130d-c67h   1/1     Running   0          4h43m
kube-system   l7-default-backend-6654b9bccb-mw6bp                1/1     Running   0          4h44m
kube-system   metrics-server-v0.4.4-857776bc9c-sq9kd             2/2     Running   0          4h43m
kube-system   pdcsi-node-5zlb7                                   2/2     Running   0          4h44m
kube-system   pdcsi-node-kn2zb                                   2/2     Running   0          4h44m
kube-system   pdcsi-node-swhp9                                   2/2     Running   0          4h44m
So far so good. Then I set up the Cloud Router to announce the 192.168.23.0/28 network. This was successful, and the route was replicated to our local site using BGP. Running show route 192.168.23.2 shows that the correct route is advertised and installed.
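For reference, a custom advertisement like this might be expressed in Terraform roughly as follows. This is only a sketch, not the configuration actually used: the router name `vpn-router` and the ASN `64514` are placeholder assumptions, and keeping `ALL_SUBNETS` in the advertised groups is assumed so that the regular subnet routes stay announced alongside the control plane range.

```hcl
# Hypothetical Cloud Router; name and ASN are placeholders, not taken from the real setup.
resource "google_compute_router" "vpn_router" {
  name    = "vpn-router"
  project = "infrastructure"
  region  = var.region
  network = "net-10-23-0-0-16"

  bgp {
    asn            = 64514
    advertise_mode = "CUSTOM"

    # Keep announcing the VPC subnets in addition to the custom range below.
    advertised_groups = ["ALL_SUBNETS"]

    # The GKE control plane range from master_ipv4_cidr_block.
    advertised_ip_ranges {
      range       = "192.168.23.0/28"
      description = "kube02 control plane"
    }
  }
}
```

Note that this only covers the direction from the on-premises site towards the control plane; it says nothing about how the control plane learns the route back.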
When trying to reach the API from the monitoring server 10.42.4.33, I just run into timeouts. All three, the Cloud VPN, the Cloud Router and the Kubernetes cluster, run in europe-west3.
When I try to ping one of the workers, it works completely fine, so networking in general works:
[me@monitoring ~]$ ping 10.23.5.216
PING 10.23.5.216 (10.23.5.216) 56(84) bytes of data.
64 bytes from 10.23.5.216: icmp_seq=1 ttl=63 time=8.21 ms
64 bytes from 10.23.5.216: icmp_seq=2 ttl=63 time=7.70 ms
64 bytes from 10.23.5.216: icmp_seq=3 ttl=63 time=5.41 ms
64 bytes from 10.23.5.216: icmp_seq=4 ttl=63 time=7.98 ms
Google's documentation gives no hint as to what could be missing. From what I understand, the cluster API should be reachable by now.
What could be missing, and why is the API not reachable via VPN?
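I was missing the peering configuration documented here: https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#cp-on-prem-routing
The control plane runs in a Google-managed VPC that is peered with the cluster's network, and that peering does not export custom routes (which include the routes learned over the VPN via BGP) by default. Without them the control plane has no route back to the on-premises network, which explains the timeouts. Enabling export_custom_routes on the peering fixes this: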
resource "google_compute_network_peering_routes_config" "peer_kube02" {
  peering = google_container_cluster.gke_kube02.private_cluster_config[0].peering_name
  project = "infrastructure"
  network = "net-10-23-0-0-16"
  export_custom_routes = true
  import_custom_routes = false
}
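To see which peering this resource actually modifies, the peering name could be exposed as a Terraform output. This is a small optional sketch; the output name `kube02_peering_name` is arbitrary and not part of the original configuration.

```hcl
# Hypothetical helper output: shows the name of the peering GKE created for the control plane.
output "kube02_peering_name" {
  value = google_container_cluster.gke_kube02.private_cluster_config[0].peering_name
}
```

After `terraform apply`, the printed name should match one of the entries shown by `gcloud compute networks peerings list` for the cluster's network, where the custom route export setting can then be checked.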