Technomantic Latest Questions

David (Beginner)
Asked: May 29, 2025 (2025-05-29T21:36:40+00:00) in Machine Learning

I trained my model, but it's performing too well on validation — could this be data leakage? How do I check for that?


I’m seeing 98–99% accuracy on my validation set, but when I test on truly unseen data, the performance drops significantly. I suspect some leakage, but I’m not sure where it’s happening.

2 Answers

  1. joseph1 (Beginner)
    Added an answer on May 29, 2025 at 9:49 pm (2025-05-29T21:49:27+00:00)

    I once trained a model that was performing way too well on the validation set — like, suspiciously good. At first, I was excited… but something felt off. Turned out, it was data leakage.
    Here’s what I did to figure it out:

    • I rechecked my data splits and found that some similar entries had ended up in both training and validation.
    • I reviewed my features — one of them was indirectly revealing the target.
    • I even tested a basic model, and it still performed too well, which confirmed my suspicion.

    Lesson learned: if your model feels “too perfect,” always check for leakage. It’ll save you a ton of headaches later. Try the checks above; they may solve your problem.
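
The split check described in this answer could be sketched roughly as follows. This is only an illustration under assumptions: rows are plain dicts, and `customer_id`, `amount`, and `label` are hypothetical column names that do not come from the original post. It looks for two kinds of overlap: exact duplicate rows, and rows that share the same entity across splits.

```python
def find_overlap(train_rows, valid_rows, key_columns):
    """Return the key tuples that appear in both splits."""
    train_keys = {tuple(row[col] for col in key_columns) for row in train_rows}
    valid_keys = {tuple(row[col] for col in key_columns) for row in valid_rows}
    return train_keys & valid_keys


# Toy data; every column name and value here is hypothetical.
train = [
    {"customer_id": 1, "amount": 10.0, "label": 0},
    {"customer_id": 2, "amount": 25.0, "label": 1},
    {"customer_id": 3, "amount": 7.5, "label": 0},
]
valid = [
    {"customer_id": 2, "amount": 25.0, "label": 1},  # exact copy of a training row
    {"customer_id": 3, "amount": 9.0, "label": 0},   # same customer, different row
    {"customer_id": 4, "amount": 12.0, "label": 1},
]

# Exact duplicates: compare on every column.
exact_overlap = find_overlap(train, valid, ["customer_id", "amount", "label"])

# Near-duplicates: compare only on the column that identifies the entity.
group_overlap = find_overlap(train, valid, ["customer_id"])

print("Rows present in both splits:", sorted(exact_overlap))
print("Customers present in both splits:", sorted(group_overlap))
```

If the entity-level overlap is non-empty, one option is to split by entity so that all rows from the same customer land in a single split. scikit-learn's `GroupKFold` and `GroupShuffleSplit` are built for that, though whether grouping is appropriate depends on your data.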

  2. Hassaan Arif (Enlightened)
    Added an answer on May 29, 2025 at 10:46 pm (2025-05-29T22:46:55+00:00)

    Absolutely, I’ve been in that spot, getting 98 to 99 percent accuracy on validation and feeling confident, only to see the performance drop a lot on truly unseen data. That’s usually a sign of data leakage. What helped me was carefully checking my data splits to make sure training and validation sets didn’t overlap. I also reviewed my features to find anything that might accidentally reveal the target. Sometimes a feature acts like a shortcut without you realizing it. I looked for very high correlations between features and the label because if something is almost perfectly correlated, that’s suspicious.
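
As a rough illustration of the correlation check mentioned here, the sketch below computes the Pearson correlation between each numeric feature and a 0/1 label and flags anything above a cutoff. The column names (`age`, `refund_issued`), the toy values, and the 0.95 threshold are all assumptions made for the example, not values from the original answer. Pearson correlation only captures linear relationships, so a clean result here does not rule out leakage.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, treat as no signal
    return cov / math.sqrt(var_x * var_y)


def suspicious_features(columns, labels, threshold=0.95):
    """Return features whose absolute correlation with the label meets the threshold."""
    scores = {name: pearson(values, labels) for name, values in columns.items()}
    return {name: r for name, r in scores.items() if abs(r) >= threshold}


# Toy data; "refund_issued" stands in for a hypothetical column that is
# only filled in after the outcome is known.
labels = [0, 1, 0, 1, 1, 0]
columns = {
    "age": [23, 45, 31, 38, 52, 29],
    "refund_issued": [0, 1, 0, 1, 1, 0],
}

flagged = suspicious_features(columns, labels)
for name, r in flagged.items():
    print(f"{name}: correlation with label = {r:.3f}")
```

A flagged feature is not automatically a leak. The useful follow-up question is whether that value would actually be available at prediction time.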

    Finally, I tried a simple model. If it also performed too well, that was another clue that leakage was happening. Fixing these issues usually made validation accuracy drop, but the results then matched real-world performance better, which is what really matters.
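
One way to try the "simple model" idea is sketched below: fit a one-feature threshold rule (a decision stump) on the training split for each feature, then score it on validation. This is a stand-in technique chosen for illustration, not necessarily what the answer's author used. It assumes binary 0/1 labels and numeric features, and the column names and values are made up. A single feature that scores near 100% on its own is a strong hint about where the leak is.

```python
def predict_stump(threshold, high_label, values):
    """Predict high_label when value > threshold, the other class otherwise."""
    return [high_label if v > threshold else 1 - high_label for v in values]


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def fit_stump(values, labels):
    """Pick the threshold and direction that best separate the training labels."""
    best_acc, best_threshold, best_high = 0.0, None, None
    for threshold in sorted(set(values)):
        for high_label in (0, 1):
            acc = accuracy(predict_stump(threshold, high_label, values), labels)
            if acc > best_acc:
                best_acc, best_threshold, best_high = acc, threshold, high_label
    return best_threshold, best_high


def stump_scores(train_cols, train_labels, valid_cols, valid_labels):
    """Validation accuracy of a stump fitted separately on each feature."""
    scores = {}
    for name, train_values in train_cols.items():
        threshold, high_label = fit_stump(train_values, train_labels)
        preds = predict_stump(threshold, high_label, valid_cols[name])
        scores[name] = accuracy(preds, valid_labels)
    return scores


# Toy data; all names and values are hypothetical.
train_labels = [0, 1, 0, 1, 1, 0, 1, 0]
train_cols = {
    "age": [23, 45, 51, 38, 29, 33, 47, 41],
    "refund_issued": [0, 1, 0, 1, 1, 0, 1, 0],
}
valid_labels = [1, 0, 0, 1]
valid_cols = {
    "age": [36, 44, 27, 30],
    "refund_issued": [1, 0, 0, 1],
}

scores = stump_scores(train_cols, train_labels, valid_cols, valid_labels)
for name, acc in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: validation accuracy of one-feature rule = {acc:.2f}")
```

In this toy data the `refund_issued` rule is perfect on validation while `age` is at chance, which is the kind of gap worth investigating. On real data, it also helps to drop the suspect feature, retrain, and see whether validation accuracy moves closer to what you observe on unseen data.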

