python, aws-lambda, scikit-learn, vercel

Scikit-learn import error during Vercel deployment


I'm deploying a Flask chatbot backend to Vercel. I used scikit-learn (sklearn) to train my model, but it isn't required at the chatbot's runtime.

During deployment, I encounter the following error:

LAMBDA_WARNING: Unhandled exception. The most likely cause is an issue in the function code. However, in rare cases, a Lambda runtime update can cause unexpected function behavior. For functions using managed runtimes, runtime updates can be triggered by a function change, or can be applied automatically. To determine if the runtime has been updated, check the runtime version in the INIT_START log entry. If this error correlates with a change in the runtime version, you may be able to mitigate this error by temporarily rolling back to the previous runtime version. For more information, see https://docs.aws.amazon.com/lambda/latest/dg/runtimes-update.html [ERROR] 
Runtime.ImportModuleError: Unable to import module 'vc__handler__python': No module named 'sklearn' Traceback (most recent call last): 

I've tried searching for solutions online but haven't found a specific fix for this scenario.

Here's a relevant code snippet from my index.py:

import joblib 

# Load preprocessed data 
words = joblib.load('words.pkl') 
classes = joblib.load('classes.pkl') 
nb_classifier = joblib.load('nb_classifier.joblib') 

My project structure looks like this:

.gitignore
README.md
requirements.txt
vercel.json
api/
    classes.pkl
    index.py 
    intents.json 
    nb_classifier.joblib 
    words.pkl 

My requirements.txt includes:

Flask==3.0.2 
Flask-Cors==4.0.0 
joblib==1.3.2 
nltk==3.8.1 
numpy==1.26.3 
wikipedia==1.4.0 

How can I resolve this sklearn import error during Vercel deployment without impacting my chatbot's functionality?


Solution

  • I inspected nb_classifier.joblib and found that it depends on the sklearn (scikit-learn) module.
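    (How I checked: the sketch below scans a pickle stream and records every module it references, without actually importing any of them. This is an illustrative helper, not part of the project, and it assumes the .joblib file was saved without compression, since compressed joblib archives are not raw pickle streams.)

    ```python
    import pickle

    class ModuleScanner(pickle.Unpickler):
        """Unpickler that records referenced modules instead of importing them."""
        def __init__(self, file):
            super().__init__(file)
            self.modules = set()

        def find_class(self, module, name):
            self.modules.add(module.split('.')[0])
            # Return a harmless stand-in so nothing is actually imported.
            return lambda *args, **kwargs: None

    def referenced_modules(path):
        """Top-level modules an uncompressed pickle/joblib file refers to."""
        with open(path, 'rb') as f:
            scanner = ModuleScanner(f)
            try:
                scanner.load()
            except Exception:
                pass  # a partial decode is fine; we only want the names seen so far
        return scanner.modules
    ```

    If 'sklearn' shows up in referenced_modules('nb_classifier.joblib'), the file cannot be unpickled without scikit-learn installed.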

    So I replaced that file with a newly created one, trained with NLTK's built-in NaiveBayesClassifier instead of scikit-learn, using this code:

    import json
    import nltk
    from nltk.stem import WordNetLemmatizer
    from nltk.tokenize import word_tokenize
    from nltk.classify import NaiveBayesClassifier
    import joblib
    
    # word_tokenize and WordNetLemmatizer need NLTK data; download it once with:
    # nltk.download('punkt'); nltk.download('wordnet')
    lemmatizer = WordNetLemmatizer()
    
    # Load intents data
    intents = json.loads(open('intents.json').read())
    
    words = []
    classes = []
    documents = []
    ignore_letters = ['?', '!', '.', ',']
    
    # Extract words and classes from intents
    for intent in intents['intents']:
        for pattern in intent['patterns']:
            word_list = word_tokenize(pattern)
            words.extend(word_list)
            documents.append((word_list, intent['tag']))
    
            if intent['tag'] not in classes:
                classes.append(intent['tag'])
    
    # Lemmatize words and remove ignored characters
    words = [lemmatizer.lemmatize(word.lower()) for word in words if word not in ignore_letters]
    words = sorted(set(words))
    classes = sorted(set(classes))
    
    # Define a function to extract features
    def extract_features(document):
        document_words = set(document)
        features = {}
        for word in words:
            features[word] = (word in document_words)
        return features
    
    # Prepare training data
    training_set = [(extract_features(doc), tag) for doc, tag in documents]
    
    # Train a Naive Bayes classifier
    nb_classifier = NaiveBayesClassifier.train(training_set)
    
    # Save the model and associated files
    joblib.dump(words, 'words2.pkl')
    joblib.dump(classes, 'classes2.pkl')
    joblib.dump(nb_classifier, 'nb_classifier2.joblib')
    
    print('Done')
    

    The retrained model relies only on nltk and joblib, both of which are already in requirements.txt, so it no longer requires the sklearn module and the import error goes away.
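For completeness, the runtime side in index.py then only needs NLTK's classifier. Here is a minimal sketch of intent classification with the retrained model; `classify_intent` and its pre-tokenized `tokens` argument are illustrative names of my own, and the real code would tokenize/lemmatize the incoming sentence the same way as during training:

```python
from nltk.classify import NaiveBayesClassifier

def extract_features(tokens, vocabulary):
    """Same bag-of-words featurization used at training time."""
    token_set = set(tokens)
    return {word: (word in token_set) for word in vocabulary}

def classify_intent(tokens, vocabulary, classifier):
    """Predict the intent tag for an already-tokenized, lemmatized sentence."""
    return classifier.classify(extract_features(tokens, vocabulary))
```

In index.py, `vocabulary` and `classifier` would come from joblib.load('words2.pkl') and joblib.load('nb_classifier2.joblib'), so no sklearn import appears anywhere in the deployed code.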