I’ve tried lowering the learning rate but no luck. Wondering if batch size or tokenization might be causing this.
BryanJohansan (Beginner)
Yeah, that kind of erratic loss can definitely be frustrating. From what you’re describing, it could be a learning rate issue — that’s often the first thing I’d look at. When the learning rate is too high, the model starts overshooting during optimization, kind of like it’s bouncing around instead of settling into a groove. Lowering it, even just a bit, can sometimes calm things down noticeably.
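Just to make that concrete, here's roughly what that knob looks like in a plain PyTorch setup. The model and the exact numbers are placeholders, not anything specific to your run:

```python
import torch

# Placeholder model just to have parameters to optimize; swap in your own.
model = torch.nn.Linear(768, 768)

# AdamW is a common default for transformer training. If loss is erratic at
# 3e-4, dropping to something like 1e-4 or 3e-5 is a typical first experiment.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
```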
But it's not always that simple. Sometimes the issue isn't the learning rate itself, but how it changes over time, especially if you're training a transformer. Those models really like a learning rate warmup at the start and a proper decay afterward. If your schedule's too aggressive or missing altogether, that alone could explain the instability; a rough sketch of what I mean is below.
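Here's a minimal warmup-plus-decay sketch, assuming plain PyTorch. The warmup length, total steps, and base LR are made-up placeholders, so tune them to your run:

```python
import math
import torch

# Placeholder model and optimizer; reuse whatever you already have.
model = torch.nn.Linear(768, 768)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

warmup_steps = 1000
total_steps = 100_000

def lr_lambda(step):
    # Linearly ramp the LR from 0 up to the base value over warmup_steps...
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    # ...then decay it along a cosine curve toward 0 by total_steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# In the training loop, call scheduler.step() once per optimizer.step().
```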
Also, not to freak you out, but sometimes the root cause is buried in something like bad input data or tiny batch sizes that make your training super noisy. Even things like not clipping gradients can silently cause chaos behind the scenes.
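For the clipping and small-batch point, here's a minimal training-step sketch, again with dummy data and model just so it runs. The lines worth noting are the clip_grad_norm_ call and the gradient accumulation, which fakes a bigger effective batch to smooth out noisy updates:

```python
import torch

# Dummy data and model so the loop is runnable; replace with your own.
dataset = torch.utils.data.TensorDataset(
    torch.randn(64, 768), torch.randint(0, 2, (64,))
)
loader = torch.utils.data.DataLoader(dataset, batch_size=8)
model = torch.nn.Linear(768, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

accum_steps = 4  # effective batch = 8 * 4 = 32

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(loader):
    loss = loss_fn(model(inputs), targets) / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        # Cap the global gradient norm so one bad batch can't blow up a step.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        optimizer.zero_grad()
```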
If you want to dig deeper, feel free to share a few details like your learning rate, optimizer, and whether you’re using any warmup. Sometimes just tweaking one thing makes a world of difference.