Tags: influxdb, azure-aks, chronograf

panic: invalid page type: 2: 10 & page 5 already freed on Chronograf in Kubernetes


I have InfluxDB v1.7.9 on Azure Kubernetes Service and tried to add Chronograf, but it fails to start. It stores its data on a PVC backed by an Azure storage account.

    panic: invalid page type: 2: 10

    goroutine 1 [running]:
    github.com/boltdb/bolt.(*Cursor).search(0xc000551960, 0x2f083e0, 0x5, 0x5, 0x2)
        /root/go/pkg/mod/github.com/boltdb/bolt@v0.0.0-20160719165138-5cc10bbbc5c1/cursor.go:256 +0x354
    github.com/boltdb/bolt.(*Cursor).seek(0xc000551960, 0x2f083e0, 0x5, 0x5, 0xc000b03640, 0x40d619, 0xc0000b92e8, 0x8, 0x8, 0x1417b00, ...)
        /root/go/pkg/mod/github.com/boltdb/bolt@v0.0.0-20160719165138-5cc10bbbc5c1/cursor.go:159 +0x7e
    github.com/boltdb/bolt.(*Bucket).Bucket(0xc000116478, 0x2f083e0, 0x5, 0x5, 0xc000b03718)
        /root/go/pkg/mod/github.com/boltdb/bolt@v0.0.0-20160719165138-5cc10bbbc5c1/bucket.go:112 +0xef
    github.com/boltdb/bolt.(*Tx).Bucket(...)
        /root/go/pkg/mod/github.com/boltdb/bolt@v0.0.0-20160719165138-5cc10bbbc5c1/tx.go:101
    github.com/influxdata/chronograf/bolt.(*BuildStore).get(0xc0000b9188, 0x207e0e0, 0xc0000b4010, 0xc000116460, 0xc00000cea0, 0xc000b03748, 0x42da8f, 0xc000000008, 0xc0000bc040, 0x0)
        /root/go/src/github.com/influxdata/chronograf/bolt/build.go:66 +0x77
    github.com/influxdata/chronograf/bolt.(*BuildStore).Get.func1(0xc000116460, 0x1ed3cb8, 0xc000116460)
        /root/go/src/github.com/influxdata/chronograf/bolt/build.go:30 +0x53
    github.com/boltdb/bolt.(*DB).View(0xc00000cd20, 0xc000b037e8, 0x0, 0x0)
        /root/go/pkg/mod/github.com/boltdb/bolt@v0.0.0-20160719165138-5cc10bbbc5c1/db.go:626 +0x90
    github.com/influxdata/chronograf/bolt.(*BuildStore).Get(0xc0000b9188, 0x207e0e0, 0xc0000b4010, 0xc00047ba00, 0xc000b03880, 0x4d293d, 0xc0000462aa, 0x24, 0xc00000cd20)
        /root/go/src/github.com/influxdata/chronograf/bolt/build.go:28 +0xa2
    github.com/influxdata/chronograf/bolt.(*Client).backup(0xc0000c0dc0, 0x207e0e0, 0xc0000b4010, 0x202af30, 0x6, 0x2071000, 0x28, 0xc000b03900, 0x418bfb)
        /root/go/src/github.com/influxdata/chronograf/bolt/client.go:267 +0x4a
    github.com/influxdata/chronograf/bolt.(*Client).Open(0xc0000c0dc0, 0x207e0e0, 0xc0000b4010, 0x2082660, 0xc0000b9058, 0x202af30, 0x6, 0x2071000, 0x28, 0xc000b03a98, ...)
        /root/go/src/github.com/influxdata/chronograf/bolt/client.go:107 +0x58a
    github.com/influxdata/chronograf/server.openService(0x207e0e0, 0xc0000b4010, 0x202af30, 0x6, 0x2071000, 0x28, 0xc0000462aa, 0x24, 0x2056de0, 0xc0003dffb0, ...)
        /root/go/src/github.com/influxdata/chronograf/server/server.go:451 +0x143
    github.com/influxdata/chronograf/server.(*Server).Serve(0xc000095880, 0x207e0e0, 0xc0000b4010, 0x0, 0x0)
        /root/go/src/github.com/influxdata/chronograf/server/server.go:343 +0x498
    main.main()
        /root/go/src/github.com/influxdata/chronograf/cmd/chronograf/main.go:47 +0x1ec

Now, when I delete the Chronograf deployment and the files from the volume and redeploy, I get a different error.

    panic: page 5 already freed

    goroutine 1 [running]:
    github.com/boltdb/bolt.(*freelist).free(0xc000710f60, 0x3, 0x7f1c06bd8000)
        /root/go/pkg/mod/github.com/boltdb/bolt@v0.0.0-20160719165138-5cc10bbbc5c1/freelist.go:117 +0x2a6
    github.com/boltdb/bolt.(*Tx).Commit(0xc000116620, 0x0, 0x0)
        /root/go/pkg/mod/github.com/boltdb/bolt@v0.0.0-20160719165138-5cc10bbbc5c1/tx.go:176 +0x1b7
    github.com/boltdb/bolt.(*DB).Update(0xc00000c1e0, 0xc000a8f790, 0x0, 0x0)
        /root/go/pkg/mod/github.com/boltdb/bolt@v0.0.0-20160719165138-5cc10bbbc5c1/db.go:602 +0xe8
    github.com/influxdata/chronograf/bolt.(*OrganizationsStore).CreateDefault(0xc0000b88b8, 0x207e0e0, 0xc0000b4010, 0x0, 0x0)
        /root/go/src/github.com/influxdata/chronograf/bolt/organizations.go:55 +0x1b5
    github.com/influxdata/chronograf/bolt.(*OrganizationsStore).Migrate(...)
        /root/go/src/github.com/influxdata/chronograf/bolt/organizations.go:37
    github.com/influxdata/chronograf/bolt.(*Client).migrate(0xc0002aa0a0, 0x207e0e0, 0xc0000b4010, 0x202af30, 0x6, 0x2071000, 0x28, 0x0, 0x0)
        /root/go/src/github.com/influxdata/chronograf/bolt/client.go:186 +0x64
    github.com/influxdata/chronograf/bolt.(*Client).Open(0xc0002aa0a0, 0x207e0e0, 0xc0000b4010, 0x2082660, 0xc0000b8120, 0x202af30, 0x6, 0x2071000, 0x28, 0xc000a8fa98, ...)
        /root/go/src/github.com/influxdata/chronograf/bolt/client.go:116 +0x423
    github.com/influxdata/chronograf/server.openService(0x207e0e0, 0xc0000b4010, 0x202af30, 0x6, 0x2071000, 0x28, 0xc0000462aa, 0x24, 0x2056de0, 0xc000710e70, ...)
        /root/go/src/github.com/influxdata/chronograf/server/server.go:451 +0x143
    github.com/influxdata/chronograf/server.(*Server).Serve(0xc00038a380, 0x207e0e0, 0xc0000b4010, 0x0, 0x0)
        /root/go/src/github.com/influxdata/chronograf/server/server.go:343 +0x498
    main.main()
        /root/go/src/github.com/influxdata/chronograf/cmd/chronograf/main.go:47 +0x1ec

Config YAML files:

Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: chronograf
spec:
  replicas: 1
  selector:
    matchLabels:
      component: chronograf
  template:
    metadata:
      labels:
        component: chronograf
    spec:
      initContainers:
        - name: wait-services
          image: busybox
          command: ['sh', '-c', 'until nslookup influx-svc.default.svc.cluster.local; do echo waiting service start; sleep 2; done;']
      containers:
        - name: chronograf
          image: chronograf:1.7.16
          ports:
            - containerPort: 8888
              name: http
          env:
            - name: "influxdb-url"
              value: "http://influx-svc:8086"
          volumeMounts:
            - mountPath: /var/lib/chronograf
              name: data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: chronograf-data

Service:

apiVersion: v1
kind: Service
metadata:
  name: chronograf-dashboard
  labels:
    component: chronograf
spec:
  externalTrafficPolicy: Cluster
  type: LoadBalancer
  selector:
    component: chronograf
  ports:
    - port: 80
      name: http
      targetPort: 8888

Persistent Volume Claim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: chronograf-data
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: storage-account
  resources:
    requests:
      storage: 5Gi
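
Storage Class (for reference, the storage-account class referenced by the PVC above is an Azure Files class roughly like this; the exact mount options in my cluster may differ):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: storage-account
provisioner: kubernetes.io/azure-file
mountOptions:
  # mount options as in the AKS azure-file example
  - dir_mode=0777
  - file_mode=0777
  - uid=1000
  - gid=1000
  - mfsymlinks
  - nobrl
  - cache=none
parameters:
  skuName: Standard_LRS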

Solution

  • Based on my tests, the problem must be caused by the mountOptions property of the storage class. I got the same error as you when I used the storage class example that AKS provides here:

    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: azurefile
    provisioner: kubernetes.io/azure-file
    mountOptions:
      - dir_mode=0777
      - file_mode=0777
      - uid=1000
      - gid=1000
      - mfsymlinks
      - nobrl
      - cache=none
    parameters:
      skuName: Standard_LRS
    

    And it works perfectly once you delete the mountOptions property and let the azure-file mount use its default options. The storage class then looks like this:

    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: azurefile
    provisioner: kubernetes.io/azure-file
    parameters:
      skuName: Standard_LRS
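
    With that class in place, the PVC from the question only needs to point at it by name; a minimal sketch (claim name and size are taken from the question, only storageClassName changes):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: chronograf-data
    spec:
      accessModes:
        - ReadWriteMany    # Azure Files supports ReadWriteMany
      storageClassName: azurefile
      resources:
        requests:
          storage: 5Gi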
    

    You can run more tests yourself if you really want to pin down which mount option causes the problem. Alternatively, a persistent volume backed by Azure Disk also works well. Good luck!
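
    If you go the Azure Disk route instead, note that Azure Disk volumes only support ReadWriteOnce, so the claim would look something like this (managed-premium is one of the storage classes AKS ships by default; a sketch, adjust to your cluster):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: chronograf-data
    spec:
      accessModes:
        - ReadWriteOnce    # an Azure Disk can only be attached to a single node
      storageClassName: managed-premium
      resources:
        requests:
          storage: 5Gi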