I’m seeing 98–99% accuracy on my validation set, but when I test on truly unseen data, performance drops significantly. I suspect some kind of leakage, but I’m not sure where it’s happening.
DavidBegginer
I once trained a model that was performing way too well on the validation set — like, suspiciously good. At first, I was excited… but something felt off. Turned out, it was data leakage.
Here’s what I did to figure it out: I went hunting for places where information about the validation set could have bled into training, starting with whether the two splits shared any rows, since even a small overlap quietly inflates validation scores. A rough version of that check is sketched below.
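A minimal sketch of that overlap check, assuming the data lives in two pandas DataFrames with identical columns (the names `train_df` and `val_df` are placeholders, not from the original post):

```python
# Minimal split-overlap check. Assumes `train_df` and `val_df` are pandas
# DataFrames with the same columns; both names are placeholders.
import pandas as pd

def overlap_fraction(train_df: pd.DataFrame, val_df: pd.DataFrame) -> float:
    """Return the fraction of validation rows that also appear verbatim in training."""
    # Merging on all shared columns keeps only rows present in both frames.
    shared = val_df.merge(train_df.drop_duplicates(), how="inner")
    return len(shared) / len(val_df)

# Anything above 0.0 means the splits share rows, so the validation score is
# partly measuring memorization rather than generalization.
```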
Lesson learned: if your model feels like it’s “too perfect,” always check for leakage first. It’ll save you a ton of headaches later, and it may well solve the problem you’re seeing here.
Absolutely, I’ve been in that spot: 98–99 percent accuracy on validation, feeling confident, then watching performance collapse on truly unseen data. That’s usually a sign of data leakage. What helped me was carefully checking my data splits to make sure the training and validation sets didn’t overlap. I also reviewed my features for anything that might accidentally reveal the target, since a feature can act as a shortcut without you realizing it. In particular, I looked for very high correlations between individual features and the label, because if something is almost perfectly correlated, that’s suspicious. A quick way to run that screen is sketched below.
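Something like this correlation screen, where `df` and the column name "target" are illustrative assumptions rather than details from the question (it also assumes the label is encoded numerically):

```python
# Quick screen for "shortcut" features: correlate every numeric column with
# the label and flag near-perfect values. `df` and "target" are placeholders.
import pandas as pd

def suspicious_features(df: pd.DataFrame, target: str, threshold: float = 0.95) -> pd.Series:
    """Return features whose absolute correlation with the target exceeds `threshold`."""
    corr = df.corr(numeric_only=True)[target].drop(target)  # drop self-correlation
    flagged = corr[corr.abs() >= threshold]
    return flagged.sort_values(key=abs, ascending=False)
```

Note that correlation only catches linear, numeric leaks, so treat an empty result as necessary but not sufficient; categorical or derived leaks still need a manual audit of how each column was produced.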
Finally, I tried a simple model as a sanity check: if even that performed suspiciously well, it was another clue that leakage was happening (a sketch of that probe is below). Fixing these things usually made validation accuracy drop, but the results then matched real-world performance much better, which is what really matters.
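The simple-model probe might look like this with scikit-learn, reusing the placeholder `train_df` from the sketch above (a numeric feature table with a "target" label column):

```python
# The "simple model" probe, sketched with scikit-learn. `train_df` is the same
# placeholder as above: a numeric feature table with a "target" label column.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X = train_df.drop(columns=["target"])
y = train_df["target"]

# If even plain logistic regression scores near-perfectly under cross-validation,
# the signal is almost certainly leaking in through a feature, not being learned.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(f"baseline accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```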