Tags: token, large-language-model, mistral-7b

Mistral 7B Instruct input size limited to 4096 tokens


Recently I fine-tuned a Mistral 7B Instruct v0.3 model and deployed it on an AWS SageMaker endpoint, but I got errors like this:

" Received client error (422) from primary with message "{"error":"Input validation error: inputs tokens + max_new_tokens must be <= 4096. Given: 877 inputs tokens and 4096 max_new_tokens","error_type":"validation"}"."

This means I am limited to 4096 tokens, but the maximum context length should be:

  • Mistral 7B Instruct v0.1: 8192 tokens
  • Mistral 7B Instruct v0.2 / v0.3: 32k tokens

I also hosted the base models from Hugging Face on SageMaker endpoints, and they all seem to be limited to 4096 tokens.

Does anyone know how to fix this?


Solution

  • Okay, I figured it out.

    First, I tested every model and fine-tuning parameter that had 4096 as its value, and there were quite a few, since everything is a multiple of 512. That changed nothing, so it was a bust. Once I realized the error most likely comes from the deployment container rather than the model itself, I at least had a hint, and after lengthy Googling I hit the jackpot :)

    So, for anyone with similar problems, here is how you do it: instead of using the deployment functions listed on the Hugging Face page of the Mistral-7B-Instruct model, I used the functions from this notebook: https://github.com/aws-samples/Mistral-7B-Instruct-fine-tune-and-deploy-on-SageMaker/blob/main/Deploy_Mistral_7B_on_Amazon_SageMaker_with_vLLM.ipynb

    Basically:

    1. Download your model.tar.gz (skip to step 3 if already unpacked).
    2. Unpack it.
    3. Generate a serving.properties file as described in the link above (see the sketch after this list).
    4. Put it into the folder with the rest of the model files.
    5. Repack all files into a model.tar.gz and upload it to your S3 bucket.
    6. Deploy the endpoint via the functions used in the link above (a minimal Python sketch covering steps 5 and 6 follows below).
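
    For step 3, here is roughly what the serving.properties for the vLLM-backed LMI (DJL Serving) container looks like; the option names follow the DJL Serving docs, but the values are assumptions you should adapt to your model and instance:

    ```properties
    engine=Python
    # Use vLLM as the rolling-batch backend
    option.rolling_batch=vllm
    # The model weights sit next to this file inside model.tar.gz,
    # so no option.model_id is needed here
    option.tensor_parallel_degree=1
    option.dtype=fp16
    # Raise the context window past the 4096 default;
    # Mistral 7B Instruct v0.3 supports up to 32k
    option.max_model_len=32768
    option.max_rolling_batch_size=8
    ```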
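
    And for steps 5 and 6, a minimal Python sketch of the repack, upload, and deploy flow; the bucket, local paths, and container image URI are placeholders (the linked notebook pins the exact DJL/LMI image to use):

    ```python
    import tarfile

    import sagemaker
    from sagemaker.model import Model

    session = sagemaker.Session()
    role = sagemaker.get_execution_role()

    # Step 5: repack the model directory (which now contains
    # serving.properties next to the weights) into model.tar.gz
    with tarfile.open("model.tar.gz", "w:gz") as tar:
        tar.add("mistral-7b-ft/", arcname=".")  # placeholder local model dir

    model_data = session.upload_data(
        "model.tar.gz",
        bucket=session.default_bucket(),
        key_prefix="mistral-7b-ft",
    )

    # Step 6: deploy with the LMI (DJL Serving) container.
    # Placeholder image URI; take the exact one from the linked notebook.
    image_uri = "<djl-lmi-container-image-uri>"

    model = Model(image_uri=image_uri, model_data=model_data, role=role)
    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.2xlarge",  # assumption; pick what fits your model
        endpoint_name="mistral-7b-instruct-vllm",
    )
    ```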

    Alternatively, I also found a notebook (https://github.com/awslabs/extending-the-context-length-of-open-source-llms/blob/main/MistralLite/sagemaker-tgi-custom/example_usage.ipynb) describing how to modify the Hugging Face environment instead, which probably also does the trick, but I haven't gotten that container to run yet. I got one solution to work, so... meh~ ¯\_(ツ)_/¯
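
    For completeness, that second approach boils down to raising TGI's token limits via environment variables on the HuggingFaceModel. A minimal sketch, assuming the standard TGI environment variables, with placeholder values and instance type:

    ```python
    import sagemaker
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    role = sagemaker.get_execution_role()

    # TGI enforces the 4096-token cap through these variables' defaults;
    # the values below are assumptions to adapt to your use case.
    env = {
        "HF_MODEL_ID": "mistralai/Mistral-7B-Instruct-v0.3",
        "SM_NUM_GPUS": "1",
        "MAX_INPUT_LENGTH": "8192",           # max prompt tokens
        "MAX_TOTAL_TOKENS": "16384",          # prompt + generated tokens
        "MAX_BATCH_PREFILL_TOKENS": "16384",
    }

    model = HuggingFaceModel(
        role=role,
        image_uri=get_huggingface_llm_image_uri("huggingface", version="1.4.2"),
        env=env,
    )
    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.2xlarge",
    )
    ```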