Show HN: Finetune Llama-3 2x faster in a Colab notebook

danielhanchen | 45 points

Hey! If you're interested in trying out finetuning Llama-3 8B (Meta's new model trained on 15 trillion tokens!!), I made a Colab that finetunes Llama-3 2x faster, uses 60% less VRAM, and supports 4x longer contexts than HF+FA2.

I also uploaded Llama-3 70B pre-quantized to 4bit so you can download it 4x faster: unsloth/llama-3-70b-bnb-4bit
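The 4x download speedup follows from simple byte math. A rough sketch (it assumes an fp16 original and ignores quantization metadata/scale overhead, so real sizes differ slightly):

```python
# Rough estimate of why a 4bit-quantized 70B model downloads ~4x
# faster than the fp16 original (ignores bnb metadata overhead).
params = 70e9                   # Llama-3 70B parameter count
fp16_bytes = params * 2         # 2 bytes per weight in fp16
int4_bytes = params * 0.5       # 0.5 bytes per weight in 4bit

speedup = fp16_bytes / int4_bytes
print(f"fp16: {fp16_bytes/1e9:.0f} GB, 4bit: {int4_bytes/1e9:.0f} GB, "
      f"~{speedup:.0f}x smaller download")
# → fp16: 140 GB, 4bit: 35 GB, ~4x smaller download
```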

danielhanchen | 15 days ago

I also have a Kaggle notebook: https://www.kaggle.com/code/danielhanchen/kaggle-llama-3-8b-...

Kaggle provides 30 hours of free Tesla T4 time per week!!
