
What do the "I" in "IQ" and the "M" in "_M" mean in the file name "Meta-Llama-3-8B-Instruct-IQ3_M.gguf"?

I'd appreciate it if someone could tell me what the "I" in "IQ" and the "M" in "_M" mean in the name "Meta-Llama-3-8B-Instruct-IQ3_M.gguf".

I searched and found that "Q" means quantization, but I can't find what "I" and "M" mean.


  • IQ quants (often called "i-quants" in llama.cpp) are designed to be used with an importance matrix (imatrix). The imatrix is built from activation statistics collected while running the model on calibration text. It tells the quantizer which weights matter most, so errors on those weights are penalized more heavily. An imatrix can also be applied to K-quants. IQ quantization is an alternative to K quantization (names such as Q4_K_M), and IQ quants generally give better quality than K-quants of a similar size. The best choice still depends on the target hardware and performance requirements.

    The "M", "S", "XS" and "XXS" suffixes stand for medium, small, extra small and extra-extra small. They describe the size of the quantized model, with "M" the largest and "XXS" the smallest. The "3" in IQ3 is only the nominal bit width. The actual average is not exactly 3 bits per weight: IQ3_M uses about 3.66 and IQ3_XXS about 3.06.
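The following small Python sketch makes the naming convention concrete. It splits a GGUF file name into its quantization family, nominal bit width and size suffix, then turns an average bits-per-weight figure into a rough file-size estimate.

Several parts of the sketch are assumptions:
- The regular expression, the `QuantTag` type and the helper names are hypothetical.
- File names follow only a community convention, so the pattern will miss some uploads.
- The bits-per-weight values are approximate figures reported by llama.cpp's quantize tool and may change between versions.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Approximate average bits per weight for the IQ3 family, as reported by
# llama.cpp's quantize tool (assumption: may differ between versions).
IQ3_BPW = {"XXS": 3.06, "XS": 3.3, "S": 3.44, "M": 3.66}

SIZE_NAMES = {
    "XXS": "extra-extra-small",
    "XS": "extra-small",
    "S": "small",
    "M": "medium",
    "L": "large",
}

# Hypothetical pattern for the quant tag at the end of a file name, e.g.
# "-IQ3_M.gguf", "-Q4_K_M.gguf" or "-Q8_0.gguf".
TAG_RE = re.compile(r"-(IQ|Q)(\d+)(_K)?(?:_(XXS|XS|S|M|L|\d))?\.gguf$")


@dataclass
class QuantTag:
    family: str          # "I-quant", "K-quant" or "legacy"
    nominal_bits: int    # the digit after Q; the real average differs
    size: Optional[str]  # spelled-out size suffix, or None


def parse_quant_tag(filename: str) -> Optional[QuantTag]:
    m = TAG_RE.search(filename)
    if m is None:
        return None
    prefix, bits, k_marker, suffix = m.groups()
    if prefix == "IQ":
        family = "I-quant"
    elif k_marker:
        family = "K-quant"
    else:
        family = "legacy"  # e.g. Q8_0, Q4_0: the trailing digit is not a size
    size = SIZE_NAMES.get(suffix) if suffix else None
    return QuantTag(family, int(bits), size)


def estimate_size_gb(n_params: float, bits_per_weight: float) -> float:
    # bytes = parameters * bits / 8; ignores metadata and any tensors
    # stored at a different precision, so real files differ somewhat.
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    tag = parse_quant_tag("Meta-Llama-3-8B-Instruct-IQ3_M.gguf")
    print(tag)  # QuantTag(family='I-quant', nominal_bits=3, size='medium')
    # Assumption for illustration: Llama 3 8B has roughly 8.03e9 parameters.
    print(f"{estimate_size_gb(8.03e9, IQ3_BPW['M']):.2f} GB")  # 3.67 GB
```

Spelling the suffix out as "medium" answers the "M" half of the question directly. The size estimate shows why the suffix matters in practice: each extra tenth of a bit per weight adds about 0.1 GB for an 8B-parameter model.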
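The role of an importance matrix can be illustrated with a toy Python quantizer. It tries several scales for one block of weights and keeps the scale that minimizes an importance-weighted squared error, so mistakes on "important" weights cost more.

This is a sketch of the general idea only, not llama.cpp's actual code. Real IQ formats use more elaborate non-uniform codebooks. The importance values here are made up, whereas a real imatrix derives them from activations on calibration text. The function names and the candidate-scale search are hypothetical choices for illustration.

```python
from typing import List, Tuple


def weighted_error(x: List[float], importance: List[float],
                   scale: float, q: List[int]) -> float:
    # Sum of w_i * (x_i - scale * q_i)^2: errors on important weights cost more.
    return sum(w * (v - scale * qi) ** 2 for v, w, qi in zip(x, importance, q))


def quantize_block(x: List[float], importance: List[float], bits: int = 3,
                   n_candidates: int = 64) -> Tuple[float, List[int]]:
    """Toy symmetric quantizer for one block of weights.

    Tries n_candidates scales and keeps the one with the lowest
    importance-weighted squared error. Levels are -qmax..qmax.
    """
    qmax = 2 ** (bits - 1) - 1  # 3 for bits=3, i.e. levels -3..3
    amax = max(abs(v) for v in x)
    if amax == 0.0:
        return 0.0, [0] * len(x)
    best_scale, best_q, best_err = 0.0, [0] * len(x), float("inf")
    for k in range(1, n_candidates + 1):
        # Candidate scales between 0.5x and 1x of the no-clipping scale
        # amax/qmax; smaller scales give finer steps but clip large weights.
        scale = amax / qmax * (0.5 + 0.5 * k / n_candidates)
        # Round to the nearest level (Python's round ties to even), then clamp.
        q = [max(-qmax, min(qmax, round(v / scale))) for v in x]
        err = weighted_error(x, importance, scale, q)
        if err < best_err:
            best_scale, best_q, best_err = scale, q, err
    return best_scale, best_q


if __name__ == "__main__":
    x = [0.95, -0.62, 0.31, 0.07, -0.18, 0.44]
    uniform = [1.0] * len(x)
    imatrix_like = [10.0, 1.0, 1.0, 1.0, 1.0, 1.0]  # first weight marked important
    for name, w in [("uniform", uniform), ("imatrix-like", imatrix_like)]:
        s, q = quantize_block(x, w)
        print(name, round(s, 4), q, round(weighted_error(x, imatrix_like, s, q), 5))
```

The quantized levels depend only on the chosen scale. The importance weights therefore influence the result only by steering which scale wins. By construction, the importance-aware choice never has a higher weighted error than the uniform choice when both are measured with the same importance values.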