
What is P-Hacking & How To Avoid It in 2024?

Updated on 28 August, 2023


Statistical analysis is an essential part of data science. One of the most important concepts in statistics is hypothesis testing and the P-values it produces. Interpreting a P-value can be tricky, and it is easy to get wrong. Beware of P-hacking!

By the end of this tutorial, you will know:

  • P-Values
  • How to decide whether to reject a null hypothesis
  • What is P-Hacking and how to avoid it
  • What is Statistical Power


Let’s dive right in!

What are P-Values?

A P-value measures how compatible your sample data are with the null hypothesis. It does not tell you the probability that the null hypothesis is true.

Before performing a statistical test, you need to set a threshold called the significance level, or alpha. A common value for it is 0.05. The P-value is the probability, assuming the null hypothesis is true, of getting a result at least as extreme as the one you actually observed.

Therefore, if the P-value is less than alpha, a result like ours would be unlikely if the null hypothesis were true, and we call it statistically significant. So, if our P-value comes out as, say, 0.04, we reject the null hypothesis.

A low P-value suggests that your sample provides enough evidence to reject the null hypothesis for the entire population. In our case, any P-value below 0.05 leads us to reject the null hypothesis. In other words, the effect seen in the sample is unlikely to be explained by chance alone.
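To make the idea of a P-value concrete, here is a minimal sketch of a permutation test in Python. The recovery times are made-up numbers and the function name `permutation_p_value` is hypothetical. The sketch estimates how often a difference in group means at least as large as the observed one would appear if the group labels carried no information, which is what the null hypothesis claims.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(treated, control, n_permutations=10_000, seed=0):
    """Two-sided permutation P-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling simulates the null hypothesis
        diff = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical recovery times in days (illustrative only, not real data)
treated = [9, 10, 8, 11, 9, 10, 8, 9]
control = [12, 11, 13, 10, 12, 14, 11, 12]
p_value = permutation_p_value(treated, control)
print(f"p-value = {p_value:.4f}")
```

With these illustrative numbers, the two groups are far enough apart that very few random relabellings reproduce the gap, so the estimated P-value falls well below 0.05.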

So what can go wrong?

Treating any P-value below alpha as permission to reject the null hypothesis can still mislead us if the experiment or the analysis itself does not show the right picture. In other words, the result might be a false positive.

Best Practices to Avoid P Hacking

As we explore the intricacies of p-hacking techniques, a growing realization emerges about the ease with which one can inadvertently or deliberately stray into these practices. This highlights the crucial significance of receiving proper statistical training and maintaining an unyielding dedication to upholding scientific integrity. The primary goal should be to present the data, avoiding any inclination to shape it according to our preferences.

P-hacking can silently undermine the very core of scientific research. However, a few best practices help you stay on the correct path:

Develop a Clear Research Plan

Before conducting any research, develop a comprehensive and well-structured plan covering your hypotheses, data collection strategies, and analysis procedures. This roadmap safeguards against the tempting path of p-hacking, where one resorts to trial and error, manipulating variables and trying different analyses until significant results are obtained. By adhering to a predetermined plan, you uphold the integrity of your research and avoid unintentional bias or manipulation that could compromise the validity of your findings.

Pre-Register Your Studies

Before starting the study, make your research plan public. Doing so considerably reduces the temptation to deviate from your original goal in light of preliminary results. This openness also shows other researchers your commitment to impartial and unbiased study, so your work is more likely to be taken seriously. Use a public pre-registration registry to document and publish your research plan, which adds accountability and credibility in the scientific community.

Transparent Reporting

Embrace honesty as your most helpful ally in research by keeping track of all your efforts, including the unsuccessful ones. This dedication to openness necessitates the establishment of comparison groups in advance and delivering a thorough report containing all relevant variables, circumstances, data exclusions, tests, and measurements. By doing this, you can ensure that your study is transparent and that your findings are trustworthy, helping you build confidence in the scientific community.

Education and Training

The prevalence of “p-hacked” research frequently results from ignorance of the dangers rather than deliberate bad intentions. To protect against such practices, it is essential to understand statistical concepts and be aware of the risks associated with p-hacking. Continuous learning belongs in every researcher’s toolset, since it improves their capacity to conduct solid research.

Understanding that any choice made during statistical analysis might impact the outcomes is critical. P-hacking may not necessarily be an intentional act of dishonesty, but it typically results from a lack of statistical knowledge.

We can ensure the reliability of our research and the validity of our conclusions by following these recommended practices. Avoiding p-hacking is essential for maintaining the integrity of the overall scientific method and obtaining reliable results. Adopting these principles strengthens research’s position as a reliable source of information and insight and helps keep research authentic.

What is P-Hacking?

So what is P-hacking? We say we have P-hacked when we misuse statistical analysis, deliberately or not, and falsely conclude that we can reject the null hypothesis. Let’s understand this in detail.

# Hack 1

Suppose we have 5 candidate coronavirus vaccines and need to check which one has an actual impact on patients’ recovery time. Let’s say we run a hypothesis test for each of the 5 vaccines, one by one, with alpha set to 0.05. If the P-value for any vaccine comes out below that, we say we can reject the null hypothesis. Or can we?

Example 1

Say, Vaccine A gives a P-Value of 0.2, Vaccine B gives 0.058, Vaccine C gives 0.4, Vaccine D gives 0.02, Vaccine E gives 0.07.

Given these results, the naive conclusion is that Vaccine D significantly reduces recovery time and can be used as the coronavirus vaccine. But can we really say that just yet? No. If we do, we might be P-hacking, because this result can be a false positive.

Example 2

Okay, let’s look at it another way. Suppose we have a Vaccine X that we know for certain is useless and has no effect on recovery time. We still carry out 10 hypothesis tests, each on a different random sample, with alpha set to 0.05. Say we get the following P-values in our 10 tests: 0.8, 0.7, 0.78, 0.65, 0.03, 0.1, 0.4, 0.09, 0.6, 0.75. Taken on its own, the test with the surprisingly low P-value of 0.03 would have made us reject the null hypothesis, even though in reality the vaccine has no effect.

So what do we see from the above examples? In essence, setting alpha = 0.05 corresponds to a confidence level of 95%. That means that even when the null hypothesis is true, about 5% of tests will still come out significant by chance, as above.
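The 5% figure can be checked with a small simulation. The sketch below is illustrative only: it assumes normally distributed measurements with a known standard deviation of 1 and uses a simple one-sample z-test. The function names are hypothetical. Every simulated sample comes from a "useless vaccine" whose true effect is exactly zero, so every rejection is a false positive.

```python
import random
from statistics import NormalDist

def z_test_p_value(sample, null_mean=0.0, sigma=1.0):
    """Two-sided P-value of a one-sample z-test with known sigma."""
    n = len(sample)
    z = (sum(sample) / n - null_mean) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(n_tests=10_000, sample_size=30, alpha=0.05, seed=1):
    """Fraction of tests that reject a null hypothesis that is actually true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_tests):
        # The "useless vaccine": the true effect is exactly zero
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        if z_test_p_value(sample) < alpha:
            rejections += 1
    return rejections / n_tests

rate = false_positive_rate()
print(f"false positive rate = {rate:.3f}")
```

The printed rate should land close to alpha, which is the point: a significance level of 0.05 is a promise about how often you will be fooled, not a guarantee that you never will be.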

Multiple Testing Problem

One way to tackle this would be to repeat the test many times and check whether most of the tests reject the null hypothesis. But more tests also mean more false positives: when there is no real effect, about 5% of all tests in our case will still come out significant. That is 5 out of 100, 50 out of 1,000, or 500 out of 10,000! This is called the Multiple Testing Problem.
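The arithmetic behind the Multiple Testing Problem can be sketched in a few lines. Assuming the tests are independent and every null hypothesis is true, the chance of at least one false positive across m tests is 1 - (1 - alpha)^m. The function name below is hypothetical.

```python
def family_wise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests

for m in (1, 5, 10, 100):
    print(m, round(family_wise_error_rate(0.05, m), 3))
```

With alpha = 0.05, the chance of at least one false positive is roughly 23% for 5 tests and about 40% for 10 tests, which is why a single low P-value among several tests deserves suspicion.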

False Discovery Rate

One way to tackle the problems above is to adjust all the P-values with a procedure that controls the False Discovery Rate (FDR), which is the expected share of false positives among all the results declared significant. Such a procedure raises each P-value by an amount that depends on its rank among the tests, so that P-values which came out low purely by chance may end up adjusted to values higher than 0.05.
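As an illustration, here is a minimal sketch of one widely used FDR procedure, the Benjamini-Hochberg adjustment, applied to the five P-values from Example 1. The article does not say which FDR procedure it has in mind, so the choice of Benjamini-Hochberg is an assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# P-values from Example 1 (Vaccines A to E)
vaccines = ["A", "B", "C", "D", "E"]
raw = [0.2, 0.058, 0.4, 0.02, 0.07]
adjusted = benjamini_hochberg(raw)
for name, p, q in zip(vaccines, raw, adjusted):
    print(f"Vaccine {name}: raw={p:.3f} adjusted={q:.3f}")
```

After the adjustment, Vaccine D moves from 0.02 to 0.1, so none of the five vaccines remains below 0.05. In practice you would likely reach for a tested library implementation rather than hand-rolled code.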


# Hack 2

Now consider the case from Example 1 where Vaccine B gave a P-value of 0.058. Wouldn’t you be tempted to add some more data and retest to see if the P-value decreases? Say you add a few more data points, and the P-value for Vaccine B comes out as 0.048. Is this legit? No, you’d again be P-hacking. We cannot change or add data later to suit our tests; the exact sample size needs to be decided before performing the tests, by doing a Power Analysis.
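A simulation shows why topping up the sample is a problem. The sketch below is illustrative and rests on stated assumptions: normally distributed data with a known standard deviation of 1, a z-test, no real effect at all, and a hypothetical researcher who starts with 20 data points and adds 10 more after every non-significant result, up to 60. All names and numbers are made up for the example.

```python
import random
from statistics import NormalDist

def p_value(sample):
    """Two-sided z-test P-value against a null mean of 0 with known sigma = 1."""
    n = len(sample)
    z = (sum(sample) / n) * n ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(peek, n_sims=4_000, start_n=20, step=10, max_n=60,
                   alpha=0.05, seed=2):
    """False positive rate with (peek=True) or without topping up the sample."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(start_n)]  # no real effect
        significant = p_value(sample) < alpha
        while peek and not significant and len(sample) < max_n:
            # "Add a few more data points and retest"
            sample += [rng.gauss(0.0, 1.0) for _ in range(step)]
            significant = p_value(sample) < alpha
        rejections += significant
    return rejections / n_sims

fixed = rejection_rate(peek=False)
peeking = rejection_rate(peek=True)
print(f"fixed sample size: {fixed:.3f}, add data until significant: {peeking:.3f}")
```

With a fixed sample size, the false positive rate stays near alpha. When data is added until the result turns significant, the rate climbs well above it, even though nothing about the underlying effect has changed.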

Power Analysis tells us the sample size we need in order to have a good chance of correctly rejecting the null hypothesis when a real effect exists, so that we are not fooled. That chance is called the Statistical Power of the test.
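A rough sample size calculation can be sketched with the Python standard library alone. This uses the common normal approximation for comparing two group means, n = 2 * ((z_(1-alpha/2) + z_power) / d)^2 per group, where d is the standardized effect size you expect. The planning values below (d = 0.5, alpha = 0.05, 80% power) are hypothetical, and so is the function name.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Uses the normal approximation n = 2 * ((z_(1-alpha/2) + z_power) / d)^2,
    where d is the standardized effect size (Cohen's d).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Hypothetical planning values: d = 0.5, alpha = 0.05, 80% power
n = sample_size_per_group(0.5)
print(n)  # 63 per group under this approximation
```

Smaller expected effects need much larger samples, since the required n grows with 1 / d^2. A calculation based on the t-distribution would give a slightly larger number than this approximation, so treat the result as a planning estimate rather than an exact requirement.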


# Hack 3

One more mistake to avoid is changing the alpha after you have run the experiment. Once you see a P-value of 0.058, you might think: what if my alpha were 0.06?

But you cannot change it once your experiment starts. 

Impact Of P-Hacking in Data Science and Machine Learning Projects

P-hacking harms research studies, frequently without the researcher’s knowledge. P-hacking, also known as data dredging, has several well-known negative effects on data science and machine learning projects, including:

  • The generation of false positives, which compromises the accuracy of the findings.
  • Misleading other researchers and distorting research findings.
  • An increase in the analysis’s biases.
  • Significant waste of resources, notably labour.
  • Improper model training, which reduces accuracy and validity.
  • Researchers having to withdraw their findings from publications.
  • A reduction in funding for additional research projects.


Before you go

Hypothesis testing and P-values are a tricky subject that needs to be understood carefully before you draw any conclusions. Statistical Power and Power Analysis are an important part of this and need to be kept in mind before starting the tests.

If you are curious to learn about data science, check out IIIT-B & upGrad’s PG Diploma in Data Science which is created for working professionals and offers 10+ case studies & projects, practical hands-on workshops, mentorship with industry experts, 1-on-1 with industry mentors, 400+ hours of learning and job assistance with top firms.

Frequently Asked Questions (FAQs)

1. What do you understand by P-Hacking?

P-hacking, also called data dredging, is the misuse of data analysis techniques to find patterns in data that appear significant but are not. It harms a study because it reports patterns that do not really exist, which can lead to a drastic increase in the number of false positives.
P-hacking cannot be prevented completely, but some methods can reduce it and help you avoid the trap.

2. What should I keep in mind to avoid p-hacking?

You can follow some safe practices to minimise the chances of p-hacking. First, make a detailed plan of the tests you will carry out and register it in an online registry. Then let each test run to completion as planned, and do not stop early just because the P-value you hoped for has been reached.
Apart from these measures, start with a high-quality data set to reduce the chance of error. Together, these safeguards go a long way towards avoiding data dredging.

3. What is False Discovery Rate?

This is one of the more advanced approaches to the problems caused by p-hacking. It adjusts the P-value of each test so that the expected proportion of false positives among the results declared significant stays below a chosen level. Unlike methods such as the Bonferroni correction, it does not try to rule out every single false positive, which makes it less conservative and better at detecting real effects.
These adjusted P-values are often called q-values. There are other versions of the FDR approach, like the optimised FDR approach.