I'm trying to fine-tune the Llama 3 model with torchtune.
These are the steps I've already done:
1. pip install torch
2. pip install torchtune
3. tune download meta-llama/Meta-Llama-3-8B --output-dir llama3 --hf-token ***(my token)***
4. tune run lora_finetune_single_device --config llama3/8B_lora_single_device device="cpu"
And then this error happens:
INFO:torchtune.utils.logging:Running LoRAFinetuneRecipeSingleDevice with resolved config:
batch_size: 2
checkpointer:
  _component_: torchtune.utils.FullModelMetaCheckpointer
  checkpoint_dir: /tmp/Meta-Llama-3-8B/original/
  checkpoint_files:
  - consolidated.00.pth
  model_type: LLAMA3
  output_dir: /tmp/Meta-Llama-3-8B/
  recipe_checkpoint: null
compile: false
dataset:
  _component_: torchtune.datasets.alpaca_cleaned_dataset
  train_on_input: true
device: cpu
dtype: bf16
enable_activation_checkpointing: true
epochs: 1
gradient_accumulation_steps: 64
log_every_n_steps: null
loss:
  _component_: torch.nn.CrossEntropyLoss
lr_scheduler:
  _component_: torchtune.modules.get_cosine_schedule_with_warmup
  num_warmup_steps: 100
max_steps_per_epoch: null
metric_logger:
  _component_: torchtune.utils.metric_logging.DiskLogger
  log_dir: /tmp/lora_finetune_output
model:
  _component_: torchtune.models.llama3.lora_llama3_8b
  apply_lora_to_mlp: false
  apply_lora_to_output: false
  lora_alpha: 16
  lora_attn_modules:
  - q_proj
  - v_proj
  lora_rank: 8
optimizer:
  _component_: torch.optim.AdamW
  lr: 0.0003
  weight_decay: 0.01
output_dir: /tmp/lora_finetune_output
profiler:
  _component_: torchtune.utils.profiler
  enabled: false
resume_from_checkpoint: false
seed: null
shuffle: true
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: /tmp/Meta-Llama-3-8B/original/tokenizer.model
DEBUG:torchtune.utils.logging:Setting manual seed to local seed 2762364121. Local seed is seed + rank = 2762364121 + 0
Writing logs to /tmp/lora_finetune_output/log_1717420025.txt
Traceback (most recent call last):
  File "/home/ggpt/.local/bin/tune", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/ggpt/.local/lib/python3.12/site-packages/torchtune/_cli/tune.py", line 49, in main
    parser.run(args)
  File "/home/ggpt/.local/lib/python3.12/site-packages/torchtune/_cli/tune.py", line 43, in run
    args.func(args)
  File "/home/ggpt/.local/lib/python3.12/site-packages/torchtune/_cli/run.py", line 179, in _run_cmd
    self._run_single_device(args)
  File "/home/ggpt/.local/lib/python3.12/site-packages/torchtune/_cli/run.py", line 93, in _run_single_device
    runpy.run_path(str(args.recipe), run_name="__main__")
  File "<frozen runpy>", line 286, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/home/ggpt/.local/lib/python3.12/site-packages/recipes/lora_finetune_single_device.py", line 510, in <module>
    sys.exit(recipe_main())
             ^^^^^^^^^^^^^
  File "/home/ggpt/.local/lib/python3.12/site-packages/torchtune/config/_parse.py", line 50, in wrapper
    sys.exit(recipe_main(conf))
             ^^^^^^^^^^^^^^^^^
  File "/home/ggpt/.local/lib/python3.12/site-packages/recipes/lora_finetune_single_device.py", line 504, in recipe_main
    recipe.setup(cfg=cfg)
  File "/home/ggpt/.local/lib/python3.12/site-packages/recipes/lora_finetune_single_device.py", line 182, in setup
    checkpoint_dict = self.load_checkpoint(cfg_checkpointer=cfg.checkpointer)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ggpt/.local/lib/python3.12/site-packages/recipes/lora_finetune_single_device.py", line 135, in load_checkpoint
    self._checkpointer = config.instantiate(
                         ^^^^^^^^^^^^^^^^^^^
  File "/home/ggpt/.local/lib/python3.12/site-packages/torchtune/config/_instantiate.py", line 106, in instantiate
    return _instantiate_node(config, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ggpt/.local/lib/python3.12/site-packages/torchtune/config/_instantiate.py", line 31, in _instantiate_node
    return _create_component(_component_, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ggpt/.local/lib/python3.12/site-packages/torchtune/config/_instantiate.py", line 20, in _create_component
    return _component_(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ggpt/.local/lib/python3.12/site-packages/torchtune/utils/_checkpointing/_checkpointer.py", line 517, in __init__
    self._checkpoint_path = get_path(self._checkpoint_dir, checkpoint_files[0])
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ggpt/.local/lib/python3.12/site-packages/torchtune/utils/_checkpointing/_checkpointer_utils.py", line 44, in get_path
    raise ValueError(f"{input_dir} is not a valid directory.")
ValueError: /tmp/Meta-Llama-3-8B/original is not a valid directory.
Should I copy the original folder from the llama3 download path to the /tmp folder? It's a ~16 GB model. Can I give the already-downloaded model path to tune?
Try running it with an additional parameter, checkpointer.checkpoint_dir. The value should be the path to the original folder inside your download directory (llama3/original in your case, since you ran tune download with --output-dir llama3). The built-in config points at /tmp/Meta-Llama-3-8B/original/, which is why it fails.
More info here: Llama3 in torchtune
tune run lora_finetune_single_device --config llama3/8B_lora_single_device \
checkpointer.checkpoint_dir=<checkpoint_dir> \
tokenizer.path=<checkpoint_dir>/tokenizer.model \
checkpointer.output_dir=<checkpoint_dir>
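For this thread's setup specifically, here is a filled-in sketch: the llama3/original paths are inferred from step 3 of the question (--output-dir llama3), and device=cpu carries over the override from step 4, so adjust both if your layout differs:

tune run lora_finetune_single_device --config llama3/8B_lora_single_device \
    checkpointer.checkpoint_dir=llama3/original \
    tokenizer.path=llama3/original/tokenizer.model \
    checkpointer.output_dir=llama3/original \
    device=cpu

If you'd rather not pass the overrides on every run, the torchtune CLI also has tune cp, which copies a packaged config to a local file you can edit once (custom_config.yaml is an arbitrary name here):

tune cp llama3/8B_lora_single_device custom_config.yaml
# edit checkpoint_dir, tokenizer.path, and output_dir in custom_config.yaml, then:
tune run lora_finetune_single_device --config custom_config.yaml device=cpu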