Tags: python, pytorch, torchtune, llama3

Fine-tuning the Llama 3 model with torchtune gives an error


I'm trying to fine-tune the Llama 3 model with torchtune.

These are the steps I've already done:

1. pip install torch
2. pip install torchtune
3. tune download meta-llama/Meta-Llama-3-8B --output-dir llama3 --hf-token ***(my token)***
4. tune run lora_finetune_single_device --config llama3/8B_lora_single_device device="cpu"
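
For reference, step 3 downloads the Hugging Face repo snapshot into ./llama3, and the Meta-format checkpoint plus tokenizer that the recipe needs should land in an original/ subfolder. A quick way to check (the exact file listing is an assumption and may differ slightly):

ls llama3/original
# expected to contain something like: consolidated.00.pth  params.json  tokenizer.model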

Then this error happens:

INFO:torchtune.utils.logging:Running LoRAFinetuneRecipeSingleDevice with resolved config:

batch_size: 2
checkpointer:
  _component_: torchtune.utils.FullModelMetaCheckpointer
  checkpoint_dir: /tmp/Meta-Llama-3-8B/original/
  checkpoint_files:
  - consolidated.00.pth
  model_type: LLAMA3
  output_dir: /tmp/Meta-Llama-3-8B/
  recipe_checkpoint: null
compile: false
dataset:
  _component_: torchtune.datasets.alpaca_cleaned_dataset
  train_on_input: true
device: cpu
dtype: bf16
enable_activation_checkpointing: true
epochs: 1
gradient_accumulation_steps: 64
log_every_n_steps: null
loss:
  _component_: torch.nn.CrossEntropyLoss
lr_scheduler:
  _component_: torchtune.modules.get_cosine_schedule_with_warmup
  num_warmup_steps: 100
max_steps_per_epoch: null
metric_logger:
  _component_: torchtune.utils.metric_logging.DiskLogger
  log_dir: /tmp/lora_finetune_output
model:
  _component_: torchtune.models.llama3.lora_llama3_8b
  apply_lora_to_mlp: false
  apply_lora_to_output: false
  lora_alpha: 16
  lora_attn_modules:
  - q_proj
  - v_proj
  lora_rank: 8
optimizer:
  _component_: torch.optim.AdamW
  lr: 0.0003
  weight_decay: 0.01
output_dir: /tmp/lora_finetune_output
profiler:
  _component_: torchtune.utils.profiler
  enabled: false
resume_from_checkpoint: false
seed: null
shuffle: true
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: /tmp/Meta-Llama-3-8B/original/tokenizer.model

DEBUG:torchtune.utils.logging:Setting manual seed to local seed 2762364121. Local seed is seed + rank = 2762364121 + 0
Writing logs to /tmp/lora_finetune_output/log_1717420025.txt
Traceback (most recent call last):
  File "/home/ggpt/.local/bin/tune", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/ggpt/.local/lib/python3.12/site-packages/torchtune/_cli/tune.py", line 49, in main
    parser.run(args)
  File "/home/ggpt/.local/lib/python3.12/site-packages/torchtune/_cli/tune.py", line 43, in run
    args.func(args)
  File "/home/ggpt/.local/lib/python3.12/site-packages/torchtune/_cli/run.py", line 179, in _run_cmd
    self._run_single_device(args)
  File "/home/ggpt/.local/lib/python3.12/site-packages/torchtune/_cli/run.py", line 93, in _run_single_device
    runpy.run_path(str(args.recipe), run_name="__main__")
  File "<frozen runpy>", line 286, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/home/ggpt/.local/lib/python3.12/site-packages/recipes/lora_finetune_single_device.py", line 510, in <module>
    sys.exit(recipe_main())
             ^^^^^^^^^^^^^
  File "/home/ggpt/.local/lib/python3.12/site-packages/torchtune/config/_parse.py", line 50, in wrapper
    sys.exit(recipe_main(conf))
             ^^^^^^^^^^^^^^^^^
  File "/home/ggpt/.local/lib/python3.12/site-packages/recipes/lora_finetune_single_device.py", line 504, in recipe_main
    recipe.setup(cfg=cfg)
  File "/home/ggpt/.local/lib/python3.12/site-packages/recipes/lora_finetune_single_device.py", line 182, in setup
    checkpoint_dict = self.load_checkpoint(cfg_checkpointer=cfg.checkpointer)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ggpt/.local/lib/python3.12/site-packages/recipes/lora_finetune_single_device.py", line 135, in load_checkpoint
    self._checkpointer = config.instantiate(
                         ^^^^^^^^^^^^^^^^^^^
  File "/home/ggpt/.local/lib/python3.12/site-packages/torchtune/config/_instantiate.py", line 106, in instantiate
    return _instantiate_node(config, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ggpt/.local/lib/python3.12/site-packages/torchtune/config/_instantiate.py", line 31, in _instantiate_node
    return _create_component(_component_, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ggpt/.local/lib/python3.12/site-packages/torchtune/config/_instantiate.py", line 20, in _create_component
    return _component_(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ggpt/.local/lib/python3.12/site-packages/torchtune/utils/_checkpointing/_checkpointer.py", line 517, in __init__
    self._checkpoint_path = get_path(self._checkpoint_dir, checkpoint_files[0])
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ggpt/.local/lib/python3.12/site-packages/torchtune/utils/_checkpointing/_checkpointer_utils.py", line 44, in get_path
    raise ValueError(f"{input_dir} is not a valid directory.")
ValueError: /tmp/Meta-Llama-3-8B/original is not a valid directory.

Should I copy the original folder from the llama3 download path to the /tmp folder? It's about a 16 GB model. Can I give the already-downloaded model path to tune?


Solution

  • Try running it with an additional parameter, checkpointer.checkpoint_dir. Its value should be the path to the original folder of the downloaded Llama model (in this case llama3/original, since the model was downloaded with --output-dir llama3). There is no need to copy anything to /tmp.

    More info here: Llama3 in torchtune

    tune run lora_finetune_single_device --config llama3/8B_lora_single_device \
    checkpointer.checkpoint_dir=<checkpoint_dir> \
    tokenizer.path=<checkpoint_dir>/tokenizer.model \
    checkpointer.output_dir=<checkpoint_dir>
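
    For this particular setup, assuming the command is run from the directory containing the llama3 folder created in step 3 and keeping the CPU override from step 4, that would look roughly like:

    tune run lora_finetune_single_device --config llama3/8B_lora_single_device \
    checkpointer.checkpoint_dir=llama3/original \
    tokenizer.path=llama3/original/tokenizer.model \
    checkpointer.output_dir=llama3/original \
    device="cpu"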