python · machine-learning · deep-learning · large-language-model

How to correctly save a fine-tuned model using the Apple MLX framework


We're using MLX to fine-tune a model fetched from Hugging Face.

from transformers import AutoModel
model = AutoModel.from_pretrained('deepseek-ai/deepseek-coder-6.7b-instruct')

We fine-tuned the model with a command like python -m mlx_lm.lora --config lora_config.yaml, and the config file looks like:

# The path to the local model directory or Hugging Face repo.
model: "deepseek-ai/deepseek-coder-6.7b-instruct"
# Save/load path for the trained adapter weights.
adapter_path: "adapters"
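
For reference, the example lora_config.yaml that ships with mlx_lm also carries the training fields. The keys below follow that example and may differ between versions, so treat this as a sketch rather than our exact config:

# Sketch of a fuller config, modeled on the mlx_lm example lora_config.yaml.
model: "deepseek-ai/deepseek-coder-6.7b-instruct"
adapter_path: "adapters"
train: true            # run training rather than evaluation only
data: "data"           # directory containing train.jsonl / valid.jsonl
batch_size: 4
iters: 1000
learning_rate: 1e-5
lora_parameters:
  # The layer keys to apply LoRA to.
  keys: ["self_attn.q_proj", "self_attn.v_proj"]
  rank: 8
  scale: 20.0
  dropout: 0.0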

Once the adapter files were generated after fine-tuning, we evaluated the model with a script like

from mlx_lm import load, generate

model, tokenizer = load(
    path_or_hf_repo="deepseek-ai/deepseek-coder-6.7b-instruct",
    adapter_path="adapters",  # path to the newly trained adapter
)
text = "Tell sth about New York"
response = generate(model, tokenizer, prompt=text, verbose=True, temp=0.01, max_tokens=100)

and it worked as expected.

However, after we saved the model and evaluated it with mlx_lm.generate, the model performed poorly (the behavior was completely different from invoking the model with generate(model, tokenizer, prompt=text, verbose=True, temp=0.01, max_tokens=100)). The commands we used:

mlx_lm.fuse --model "deepseek-ai/deepseek-coder-6.7b-instruct" --adapter-path "adapters" --save-path new_model
mlx_lm.generate --model new_model --prompt "Tell sth about New York" --adapter-path "adapters" --temp 0.01

Solution

  • Once you fuse the model, you don't want to specify the adapter path; otherwise, it will try to apply adapters to an already fused model (which is a bug).

    Try using:

    mlx_lm.generate --model new_model --prompt "Tell sth about New York" --temp 0.01
    

    Also, fusing can cause some degradation. The adapted weights are W' = W + scale * b^T a. Fusing b^T a into W can be destructive if the adapter contribution (b^T a) has a very different magnitude than the base weights (W), particularly when the base weights are quantized or stored in low precision.

    Tuning the scale parameter can improve model performance after fusion; the sketch below illustrates the effect.
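
    To make the magnitude issue concrete, here is a minimal sketch of the fusion arithmetic in mlx.core. The shapes, initialization magnitudes, and scale value are illustrative assumptions, not values taken from the actual model:

    import mlx.core as mx

    # Illustrative shapes: W is (out_dim, in_dim); the LoRA factors are
    # a: (rank, in_dim) and b: (rank, out_dim), so b.T @ a matches W's shape.
    out_dim, in_dim, rank = 4096, 4096, 8
    W = mx.random.normal((out_dim, in_dim)) * 0.02  # base weights, small magnitude
    a = mx.random.normal((rank, in_dim)) * 0.02
    b = mx.random.normal((rank, out_dim)) * 0.02
    scale = 20.0  # hypothetical value; in mlx_lm this comes from lora_parameters in the config

    # The fusion step described above: W' = W + scale * b^T a
    delta = scale * (b.T @ a)
    W_fused = W + delta

    # Compare magnitudes of the adapter contribution and the base weights.
    print(mx.abs(delta).mean() / mx.abs(W).mean())

    If the printed ratio is large, the fused update overwhelms the base weights, producing the degradation described above (and worse still when W is quantized). Adjusting scale before fusing shifts that ratio, which is why tuning it can recover quality in the fused model.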