amazon-web-services serverless serverless-framework aws-serverless

Unable to create EFS using Serverless deploy


When I try to deploy my application with sls deploy --stage dev, it runs for several minutes and I can see most of the resources being created in my AWS account, but it eventually fails with:

✖ An error occurred: CheckDashindexDashsizeLambdaFunction - Resource handler returned message: "EFS file system arn:aws:elasticfilesystem:us-east-1:<account_id>:file-system/fs-<id1> referenced by access point arn:aws:elasticfilesystem:us-east-1:<account_id>:access-point/fsap-<id2> has mount targets created in all availability zones the function will execute in, but not all are in the available life cycle state yet. Please wait for them to become available and try the request again. (Service: Lambda, Status Code: 400, Request ID: ac1b6016-fd2d-4306-a7f1-745295b7cdb6)"

The first time I ran this command, it worked fine. But then I ran sls remove --stage dev to purge everything so I could do a clean redeploy. Now, every time I try to deploy, I get this error.

The error suggests retrying, but I've re-run the deploy 10 times over the last 6 hours and it has failed every time. Is this an issue on AWS's end, or is something wrong with my configuration?
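For reference, the exact sequence of commands was:

sls deploy --stage dev   # first run: deployed cleanly
sls remove --stage dev   # tore everything down for a clean redeploy
sls deploy --stage dev   # every run since then fails with the error above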

My serverless.yml looks like:

org: ${env:ORG}
service: lucene-serverless-${env:APP_NAME}
variablesResolutionMode: 20210219

custom:
  name: ${sls:stage}-${self:service}
  region: ${opt:region, "us-east-1"}
  vpcId: ${env:LUCENE_SERVERLESS_VPC_ID}
  subnetId1: ${env:SUBNET_ID1}
  subnetId2: ${env:SUBNET_ID2}
  javaVersion: provided.al2

provider:
  name: aws
  profile: ${env:PROFILE}
  region: ${self:custom.region}
  versionFunctions: false
  apiGateway:
    shouldStartNameWithService: true
  tracing:
    lambda: false
  timeout: 15
  environment:
    stage: prod
    DISABLE_SIGNAL_HANDLERS: true
  iam:
    role:
      statements: ${file(roleStatements.yml)}
  vpc:
    securityGroupIds:
      - Ref: EfsSecurityGroup
    subnetIds:
      - ${self:custom.subnetId1}
      - ${self:custom.subnetId2}

package:
  individually: true

functions:

  index:
    name: ${self:custom.name}-index
    runtime: ${self:custom.javaVersion}
    handler: native.handler
    reservedConcurrency: 1
    memorySize: 256
    timeout: 180
    dependsOn:
      - EfsMountTarget1
      - EfsMountTarget2
      - EfsAccessPoint
    fileSystemConfig:
      localMountPath: /mnt/data
      arn:
        Fn::GetAtt: [EfsAccessPoint, Arn]
    package:
      artifact: target/function.zip
    environment:
      QUARKUS_LAMBDA_HANDLER: index
      QUARKUS_PROFILE: prod
    events:
      - sqs:
          arn:
            Fn::GetAtt: [WriteQueue, Arn]
          batchSize: 5000
          maximumBatchingWindow: 5

  enqueue-index:
    name: ${self:custom.name}-enqueue-index
    runtime: ${self:custom.javaVersion}
    handler: native.handler
    memorySize: 256
    package:
      artifact: target/function.zip
    vpc:
      securityGroupIds: []
      subnetIds: []
    events:
      - http: POST /index
    environment:
      QUARKUS_LAMBDA_HANDLER: enqueue-index
      QUARKUS_PROFILE: prod
      QUEUE_URL:
        Ref: WriteQueue


resources:
  Resources:
    WriteQueue:
      Type: AWS::SQS::Queue
      Properties:
        QueueName: ${self:custom.name}-write-queue
        VisibilityTimeout: 900
        RedrivePolicy:
          deadLetterTargetArn:
            Fn::GetAtt: [WriteDLQ, Arn]
          maxReceiveCount: 5

    WriteDLQ:
      Type: AWS::SQS::Queue
      Properties:
        QueueName: ${self:custom.name}-write-dlq
        MessageRetentionPeriod: 1209600 # 14 days in seconds

    FileSystem:
      Type: AWS::EFS::FileSystem
      Properties:
        BackupPolicy:
          Status: DISABLED
        FileSystemTags:
          - Key: Name
            Value: ${self:custom.name}-fs
        PerformanceMode: generalPurpose
        ThroughputMode: elastic # faster scale up/down
        Encrypted: true
        FileSystemPolicy:
          Version: "2012-10-17"
          Statement:
            - Effect: "Allow"
              Action:
                - "elasticfilesystem:ClientMount"
              Principal:
                AWS: "*"

    EfsSecurityGroup:
      Type: AWS::EC2::SecurityGroup
      Properties:
        VpcId: ${self:custom.vpcId}
        GroupDescription: "mnt target sg"
        SecurityGroupIngress:
          - IpProtocol: -1
            CidrIp: "0.0.0.0/0"
          - IpProtocol: -1
            CidrIpv6: "::/0"
        SecurityGroupEgress:
          - IpProtocol: -1
            CidrIp: "0.0.0.0/0"
          - IpProtocol: -1
            CidrIpv6: "::/0"

    EfsMountTarget1:
      Type: AWS::EFS::MountTarget
      Properties:
        FileSystemId: !Ref FileSystem
        SubnetId: ${self:custom.subnetId1}
        SecurityGroups:
          - Ref: EfsSecurityGroup

    EfsMountTarget2:
      Type: AWS::EFS::MountTarget
      Properties:
        FileSystemId: !Ref FileSystem
        SubnetId: ${self:custom.subnetId2}
        SecurityGroups:
          - Ref: EfsSecurityGroup

    EfsAccessPoint:
      Type: "AWS::EFS::AccessPoint"
      Properties:
        FileSystemId: !Ref FileSystem
        PosixUser:
          Uid: "1000"
          Gid: "1000"
        RootDirectory:
          CreationInfo:
            OwnerGid: "1000"
            OwnerUid: "1000"
            Permissions: "0777"
          Path: "/mnt/data"

And yes, I've made sure to define all the appropriate environment variables.
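For completeness, these are the variables the config reads via ${env:...}; I export them in my shell before deploying (the values below are placeholders, not my real IDs):

export ORG=my-org                          # ${env:ORG}
export APP_NAME=search                     # ${env:APP_NAME}
export PROFILE=my-aws-profile              # ${env:PROFILE}
export LUCENE_SERVERLESS_VPC_ID=vpc-xxxxxxxx
export SUBNET_ID1=subnet-aaaaaaaa
export SUBNET_ID2=subnet-bbbbbbbb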


Solution

  • What fixed it for me was a suggestion from this forum: use the DependsOn keyword to tell CloudFormation that the access point depends on the two mount targets. That way the access point, and the function whose fileSystemConfig references it, is only created after the mount targets exist, which seems to give them enough time to reach the available life cycle state.

    e.g.

    EfsAccessPoint:
      Type: "AWS::EFS::AccessPoint"
      Properties:
        FileSystemId: !Ref FileSystem
        PosixUser:
          Uid: "1000"
          Gid: "1000"
        RootDirectory:
          CreationInfo:
            OwnerGid: "1000"
            OwnerUid: "1000"
            Permissions: "0777"
          Path: "/mnt/data"
      DependsOn:
        - EfsMountTarget1
        - EfsMountTarget2