Tags: javascript, node.js, firebase, google-cloud-firestore, google-cloud-functions

Updating a Firestore document multiple times from an event-triggered function overwrites data or stops updating after a few triggers


I am trying to update a document while a group of images is being resized, to show progress and to keep track of any images that could not be transformed.

We have a few images in a directory whose name is the document ID. Using that ID, we update the document by setting its status field, which holds one object per image describing whether it was transformed or failed to transform: {image: path, processed: string}

It would look like { ... status : [ filename1:{...}, filename2:{...} ] }

But when I upload 5 to 10 images, only the first 4-5 images' info is set in the status field. It pauses (or appears to pause, or some of the info gets overwritten) before writing the last image's info to status.

I also tried simply replacing the old data with the new data, so that status holds only a single object that changes with each image. This works most of the time; I have seen rare cases where it also fails.

I also tried an array of objects, but that misses a few entries as well.

async function updateDoc({filePath, metadata}) {
    const [docId, fileName] = filePath.split('/');
    
    const db = admin.firestore();
    const collectionRef = db.collection(COLLECTION_PACKS).doc(docId);
    const stickersRef = collectionRef.collection(COLLECTION_STICKERS);
  
    let attempts = 0;
    const maxAttempts = 5;

    while (attempts < maxAttempts) {
        try {
            await db.runTransaction(async (transaction) => {

                const updatedStatus = {};
                updatedStatus[fileName] = metadata;

                // Update the document with the new status and progress
                transaction.update(collectionRef, {
                    status: updatedStatus,
                });
            });
            console.log("Updated document for", filePath);
            break; // Exit the loop if successful
        } catch (error) {
            attempts++;
            console.error(`Error updating document (attempt ${attempts}):`, error);
            if (attempts >= maxAttempts) {
                throw error; // Re-throw the error after max attempts
            }
            await new Promise(resolve => setTimeout(resolve, 1000 * attempts)); // Linear backoff: 1s, 2s, 3s, ...
        }
    }
}

I also tried this simpler snippet, which adds a new object to the status array, but it also failed to keep the data intact.


      const db = admin.firestore();
      const docRef = db.collection(COLLECTION_PACKS).doc(docId);

      await db.runTransaction(async (transaction) => {
        const docSnapshot = await transaction.get(docRef);
        if (docSnapshot.exists) {

          docRef.update({
            status: FieldValue.arrayUnion(metadata),
          });

        }
      });

Either nested objects or an array of objects is acceptable.

EDIT: I have observed how the data gets overwritten: a few objects that are inserted early are quickly replaced by new objects later on. For a few seconds it looks like

status : [
   object1, object2
]

then it becomes

status : [
  object3, object4, object5
]

The objects inserted quickly by the early triggers get replaced later, and it seems that newly triggered events are not aware of the data already inserted. Is it a race condition?

Additional code detail: this is how the event trigger function invokes updateDoc():

exports.resize = onObjectFinalized({ 
    bucket:'sticker-app-2ad48', 
    region:'us-central1', 
    timeoutSeconds: 450, 
    memory:'1GiB',
    cpu:2 
  
  },
    transform
)

async function transform(event){
    ...
    // getMetadata() resolves to an array; await it first, then take the first element
    let [metadatainfo] = await bucket.file(filePath).getMetadata();
    
    // Early return; 
    if(metadatainfo?.metadata?.process === "done"){
        console.log("previously done processing  | ",fileName)
        return; 
    }

    try {
        if(valid){
            console.log("image is valid ✅ |",filePath)
            await updateMeta({filePath, valid:true, process:"done"});
            return;
        }

        ...  // invoke image processing with sharpjs
        
        if(buffer?.length > 0){
            // save
            log("✔️ transformed ", fileName)
            await bucket.file(`${filePath}`).save(buffer);
            await updateMeta({filePath, valid:true, process:"done"});
            
        }else{
            // leave image data as it is. set metadata
            log("❌ failed to transform  ", fileName)
            await updateMeta({filePath, valid:false, process:"done"});
        }

        buffer = null;
    } catch (e) {
        console.error('Error_> ', e)
        await updateMeta({filePath, valid:false, process:"done", error: e.message});
    } 
}

async function updateMeta({filePath, process, valid, error}){
    // console.log("updateMeta ",filePath)
    let metadata = { process, valid }
    if (error) { metadata.error = error; }
    await bucket.file(filePath).setMetadata({
        metadata: metadata
    })

    await updateDoc({ filePath, metadata})
}

Edit: adding the client-side code, which may be the cause of the problem. Client-side code:

const promises = selectedFiles.map(..) => fetch(`${baseUrl}/upload`, ...) )

// Uploads the images, which triggers onObjectFinalized
// to process them and add the info to the doc
Promise.all(promises)
.then((responses) => {
  const jsonPromises = responses.map((response) => response.json());
  return Promise.all(jsonPromises);
})
.then(uploadedImages=>{
  // structure the stickers objects with download url
  ...  
})
.then(stickerObjects=>{
  ... 
  // Populate subcollection "stickers"
  return fetch(`${baseUrl}/addStickerPack/${docId}`, { 
    method: 'PUT', 
    headers: {
      'Content-Type': 'application/json'
    },
    body: body 
  })

})
.then(result=>{
  console.log("Populated subcollection 'stickers' of the Doc ", {docId})
  ......
})
.catch(...)
.finally(...)

The last 'then' block, which prints 'Populated subcollection...', helped me find the root cause. When this line prints, the status field of the doc becomes undefined, which wipes the status field; whatever is added to status after this log stays intact without any issue. Removing the fetch call, so that nothing modifies the doc's subcollection, fixes the data loss. I don't know why this is happening.

Here is the backend code that creates and updates the subcollection.

app.put('/addStickerPack/:stickerPackId', async (req, res) => {
  try {
    const stickerPackDocId = req.params.stickerPackId;
    const {
      ...
      name,
      publisher,
      stickers
    } = req.body;
    
    const stickerPackData = {
      name,
      publisher,
      ...
      
    };


    const stickerPackRef = admin.firestore().collection(COLLECTION_PACKS).doc(stickerPackDocId);
    await stickerPackRef.set(stickerPackData);
    

      // Create sticker documents within the pack
    const stickerPromises = stickers.map(async (sticker) => {
      const stickerRef = stickerPackRef.collection(COLLECTION_STICKERS).doc();
      const stickerData = {
        name: sticker.name, // Replace with actual image storage logic
        image_file: sticker.image_file, // Replace with actual image storage logic
        emojis: sticker.emojis,
        filePath: sticker.filePath,
        valid:false,
      };
      console.log(sticker.image_file)
      return stickerRef.set(stickerData);
    });
      
    // Wait for all sticker writes to complete
    await Promise.all(stickerPromises);
    console.log('Sticker pack added successfully:', stickerPackData);
    const stickerPackId = stickerPackRef.id;
    res.status(201).send({message:'Sticker pack added successfully', id:stickerPackId});
  } catch (error) {
    console.error('Error adding sticker pack:', error);
    res.status(500).send('Internal Server Error');
  }
});

Solution

  • db.runTransaction() is not needed here because you never read an existing value to use in the update, so it only adds unnecessary processing.

    If your data structure is actually { ... status : { filename1:{...}, filename2:{...} } } (notice I replaced your square brackets with curly braces; I believe you meant an object, not an array), then you can simply update the nested path without a transaction. No need for retries either:

    // this is a docRef, not a colRef as in your example
    const docRef = db.collection(COLLECTION_PACKS).doc(docId);
    const update = {
      [`status.${fileName}`]: metadata,
    };
    await docRef.update(update);
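
One caveat with the dotted key above: Firestore reads every dot in a string field path as a nesting separator, so a file name with an extension (say `cat.png`) would end up nested one level deeper than intended. A minimal sketch of one way around it, assuming the Admin SDK's `FieldPath` class and the `update(fieldPath, value)` call form. `updateStatusEntry` is a hypothetical helper, and `docRef` and `FieldPath` are passed in as parameters only so the snippet runs without the SDK installed:

```javascript
// Illustration only: a dotted string path is read segment by segment,
// so the file extension becomes its own nesting level.
console.log(`status.${'cat.png'}`.split('.')); // [ 'status', 'cat', 'png' ]

// Hypothetical helper (not from the original post).
// docRef:    a Firestore DocumentReference for the pack document
// FieldPath: the FieldPath class, e.g. admin.firestore.FieldPath
function updateStatusEntry(docRef, FieldPath, fileName, metadata) {
  // FieldPath takes the segments separately, so "cat.png" stays ONE key under status
  return docRef.update(new FieldPath('status', fileName), metadata);
}
```

With the real SDK this would be called roughly as `await updateStatusEntry(db.collection(COLLECTION_PACKS).doc(docId), admin.firestore.FieldPath, fileName, metadata);`.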
    

    ===== EDIT:

    Just to explain further, when you say:

    But when I upload 5 to 10 images, only the first 4-5 images' info is set in the status field. It pauses before writing the last image's info to status.

    I don't think it pauses; I believe you have a race condition. Note that you're overwriting the whole status object for each file. That means if the files trigger in the order 1-2-3-4 but finish processing in the order 1-2-4-3, you'll end up with the information from the 3rd file instead of the last one. That is not consistent with the code you show, but from what you describe, it's a possibility if you intended to read the whole status object before the update.
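
To make the ordering problem concrete, here is a small in-memory model. It uses plain objects rather than Firestore calls, and the names `replaceStatus`, `setStatusKey`, `run` and the `file1`..`file4` keys are made up for illustration. It compares replacing the whole status map with writing a single key per file:

```javascript
// Plain-object model of the pack document; these are NOT Firestore calls.

// Strategy A: models update({ status: { [fileName]: metadata } }).
// The whole status map is replaced by a map with a single entry.
function replaceStatus(doc, fileName, metadata) {
  return { ...doc, status: { [fileName]: metadata } };
}

// Strategy B: models an update that targets only status.<fileName>.
// Entries written by other files are left untouched.
function setStatusKey(doc, fileName, metadata) {
  return { ...doc, status: { ...doc.status, [fileName]: metadata } };
}

// Files were triggered as 1-2-3-4 but finish processing as 1-2-4-3.
const finishOrder = ['file1', 'file2', 'file4', 'file3'];
const run = (write) =>
  finishOrder.reduce(
    (doc, fileName) => write(doc, fileName, { process: 'done', valid: true }),
    { status: {} }
  );

console.log(Object.keys(run(replaceStatus).status)); // [ 'file3' ]
console.log(Object.keys(run(setStatusKey).status));  // [ 'file1', 'file2', 'file4', 'file3' ]
```

With strategy A the last writer wins regardless of which file it is; with strategy B the completion order no longer matters.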

    ===== EDIT 2:

    Regarding the latest code you added, which is called from the client side: it overwrites the whole document, which leaves the status property undefined. That's why the sequence is 1, 2, clear-from-client, 3, 4, 5, and you end up with only 3, 4, 5.

    Change the line that sets the doc, since it clears the rest of the document. Instead of:

    await stickerPackRef.set(stickerPackData);
    

    use either of:

    await stickerPackRef.set(stickerPackData, {merge:true});
    // or
    await stickerPackRef.update(stickerPackData);
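
The sequence described above (1, 2, clear-from-client, 3, 4, 5) can be replayed with a simplified in-memory stand-in. It is not the real SDK: `setDoc`, `addStatus`, `replay` and the sample field values are invented for the sketch, and the merge is modelled only at the top level of the document. One practical difference between the two real calls, as far as I know: update() fails if the document does not exist yet, whereas set() with {merge:true} creates it.

```javascript
// In-memory stand-in for one Firestore document; NOT the real SDK.
// Models set(): without merge the document is replaced wholesale,
// with merge only the supplied top-level fields are written.
function setDoc(doc, data, options = {}) {
  return options.merge ? { ...doc, ...data } : { ...data };
}

// Models update() on a status.<fileName> path: touches one nested key.
function addStatus(doc, fileName, metadata) {
  return { ...doc, status: { ...(doc.status || {}), [fileName]: metadata } };
}

// Replays: two trigger writes, the client's PUT, then three more trigger writes.
function replay(options) {
  const done = { process: 'done', valid: true };
  let doc = { name: 'draft' };
  doc = addStatus(doc, 'img1', done);
  doc = addStatus(doc, 'img2', done);
  doc = setDoc(doc, { name: 'My pack', publisher: 'me' }, options);
  for (const fileName of ['img3', 'img4', 'img5']) {
    doc = addStatus(doc, fileName, done);
  }
  return Object.keys(doc.status);
}

console.log(replay({}));              // [ 'img3', 'img4', 'img5' ]
console.log(replay({ merge: true })); // [ 'img1', 'img2', 'img3', 'img4', 'img5' ]
```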