What you'll learn
- The validity problem: why most teams pick the wrong tests
- Cognitive ability tests: the highest-validity option and the legal considerations
- Work sample tests: the strongest validity with the narrowest application
- Personality assessments: what the evidence actually supports
- Skills tests: role-specific, high signal, and the easiest to design well
- EEOC compliance: building a legally defensible testing program
Most hiring teams have added some form of pre-employment testing in the last three years, but few have built a coherent testing strategy — they have bolted on a personality quiz here, a coding challenge there, and declared the process data-driven. The research tells a different story. Not all tests predict performance equally, some widely used test types have weak or contradictory validity data, and a testing program assembled without legal review can expose your organization to EEOC challenge faster than almost any other hiring practice. This guide cuts through the noise for HR Directors and TA leaders who need to know which test types actually work, which ones carry legal risk, and how to sequence them into a defensible, candidate-respecting pre-hire assessment process.
The validity problem: why most teams pick the wrong tests
The foundational question in pre-employment testing is validity: does a high score on this test actually predict that someone will perform well in the job? Validity is usually reported as a coefficient, the correlation between test scores and later job performance, where 0 means no relationship and 1 means perfect prediction. The research on this is decades deep, and the results are humbling for anyone who has invested in the wrong tools. Unstructured interviews, the most common hiring method, have a validity coefficient of roughly 0.38 in predicting job performance, according to meta-analyses by Schmidt and Hunter. A well-designed cognitive ability test reaches 0.51. Work sample tests hit 0.54. The gap between the best and worst approaches is large enough to matter once it compounds across hundreds of hiring decisions.
The problem is that test vendors rarely lead with validity data. They lead with candidate experience scores, completion rates, and integrations. Those are real considerations, but they are secondary to whether the test measures something that predicts job performance for the specific role you are filling. Before evaluating any platform, your first question should be: can you show me a technical manual with validity coefficients, sample sizes, and norming data for roles comparable to the ones I am hiring?
Vendors who cannot produce that data or who cite only face validity — the test 'looks like' it relates to the job — are selling you a risk. Legally defensible pre-employment testing in the US requires that you can demonstrate job-relatedness under the Uniform Guidelines on Employee Selection Procedures, a standard that has been in place since 1978 and that courts continue to enforce. If a test causes adverse impact against a protected class and you cannot demonstrate criterion validity, content validity, or construct validity, you have a problem that no vendor contract indemnification clause will fully protect you from.
Cognitive ability tests: the highest-validity option and the legal considerations
General cognitive ability (GCA) tests, often called general mental ability or g, consistently produce some of the highest validity coefficients of any pre-employment test type, typically between 0.45 and 0.54 across job families. They predict performance across a wider range of roles than any single work sample or skills test, and their predictive power increases as job complexity increases. For roles involving novel problem-solving, learning new systems, or making decisions under uncertainty, GCA tests are the most defensible single predictor you can add to a process.
The legal consideration is real and should be understood rather than used as a reason to avoid them. Well-designed cognitive tests can produce adverse impact — meaning statistically different pass rates across racial groups — which triggers EEOC scrutiny under the four-fifths rule. This does not make them illegal. It means the organization must be prepared to demonstrate criterion validity: that higher scores on the test predict higher job performance, measured against actual performance data for people in that role. Organizations that have conducted validity studies on their incumbent population and can document the relationship between test scores and supervisor ratings are on solid legal ground.
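At its core, the criterion validity evidence described above is a correlation between test scores and a performance measure. The sketch below shows that calculation on made-up incumbent data; the function name `validity_coefficient` and all numbers are illustrative assumptions. A real study needs a far larger sample and typically applies corrections for range restriction and rating unreliability, which are omitted here.

```python
import math

def validity_coefficient(scores, ratings):
    """Pearson correlation between test scores and performance ratings."""
    if len(scores) != len(ratings) or len(scores) < 2:
        raise ValueError("need two equal-length samples with at least 2 people")
    n = len(scores)
    mean_s = sum(scores) / n
    mean_r = sum(ratings) / n
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(scores, ratings))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_r = sum((r - mean_r) ** 2 for r in ratings)
    if var_s == 0 or var_r == 0:
        raise ValueError("scores and ratings must each vary")
    return cov / math.sqrt(var_s * var_r)

# Hypothetical data: test score at hire, supervisor rating a year later.
test_scores = [52, 61, 70, 48, 66, 75, 58, 80]
supervisor_ratings = [3.1, 3.4, 3.9, 2.8, 3.2, 4.3, 3.5, 4.1]

r = validity_coefficient(test_scores, supervisor_ratings)
print(f"observed validity coefficient: {r:.2f}")
```

An uncorrected coefficient from a small internal sample is not directly comparable to published meta-analytic figures, so treat it as a starting point for a conversation with an I/O psychologist rather than a finished validity study.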
The practical guidance: use cognitive ability tests for roles where learning complexity and problem-solving are central competencies. Select a vendor whose test has demonstrated incremental validity over structured interviews, not just standalone validity. Run an adverse impact analysis on your own applicant data every 12 months. And never use GCA as the only gate — combining it with a structured interview reduces adverse impact while maintaining predictive power, a well-documented finding in the I/O psychology literature.
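Incremental validity can be made concrete with the standard two-predictor multiple correlation formula. The sketch below is a minimal illustration; the function names and the example inputs are assumptions, and the intercorrelation between test and interview has to come from the vendor's technical manual or your own data.

```python
import math

def multiple_r(r_test, r_interview, r_between):
    """Multiple correlation of two predictors with job performance.

    r_test, r_interview: each predictor's own validity coefficient.
    r_between: correlation between the two predictors.
    """
    if abs(r_between) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (
        r_test**2 + r_interview**2 - 2 * r_test * r_interview * r_between
    ) / (1 - r_between**2)
    return math.sqrt(r_squared)

def incremental_validity(r_test, r_interview, r_between):
    """Gain in validity from adding the test on top of the interview alone."""
    return multiple_r(r_test, r_interview, r_between) - r_interview

# Hypothetical inputs, not figures from any published study.
gain = incremental_validity(r_test=0.5, r_interview=0.5, r_between=0.3)
print(f"validity gained by adding the test: {gain:.2f}")
```

The more strongly the test overlaps with the interview (a higher `r_between`), the less it adds, which is why standalone validity alone is not enough to justify a purchase.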
Cognitive ability tests (validity 0.51) and work sample tests (validity 0.54) are the two most predictive pre-employment test types, outperforming unstructured interviews (0.38) and standalone personality assessments (0.31 for conscientiousness).
Work sample tests: the strongest validity with the narrowest application
Work sample tests — asking candidates to perform a task representative of actual job duties — achieve validity coefficients of 0.54, the highest of any common test type. The reason is intuitive: a candidate who can do the work in a structured sample is more likely to be able to do it on the job than a candidate who scores well on a proxy measure. For roles with clearly defined task components, work sample tests are often the most defensible and most face-valid option you can deploy.
The constraint is design cost. A well-constructed work sample for a financial analyst role takes weeks to build: you need to identify the core tasks, develop a scoring rubric that distinguishes performance levels, calibrate rater agreement, and control for construct-irrelevant variance like access to reference materials. Off-the-shelf work sample products that present the same scenario to every applicant regardless of role often have weaker validity than their marketing suggests because they fail the job-relatedness requirement for your specific context.
Where work samples shine: software engineering (code review exercises, debugging tasks), writing-intensive roles (edit-in-brief assignments), customer-facing roles (recorded service interaction scenarios), and data analysis positions (dataset interpretation exercises). IncBot's AI-driven interview platform supports structured skill-based assessment modules that can serve as work sample proxies for technical and analytical roles, delivering consistent scoring across candidates without requiring a human evaluator for every session.
Personality assessments: what the evidence actually supports
Personality assessments are the most widely used and most misunderstood category in pre-employment testing. The five-factor model — commonly known as the Big Five or OCEAN — has the strongest empirical foundation of any personality framework in the hiring context. Conscientiousness, specifically, has a validity coefficient of approximately 0.31 across job types, making it the single most predictive personality trait for job performance. The research on the other four factors is more mixed and highly role-dependent.
The misuse pattern is consistent: teams deploy broad personality assessments without establishing which traits are job-relevant for which roles, use self-report instruments that are easily gamed by test-aware candidates, and apply cutoffs without validity data connecting personality scores to actual performance outcomes. Personality assessments have also generated EEOC enforcement actions when used as the primary screen in high-volume hiring, particularly when vendors cannot document score reliability and criterion validity for the specific job family.
The right approach is narrow application: use conscientiousness-focused measures for roles where self-direction, follow-through, and organized work habits are critical differentiators. For roles where interpersonal orientation matters, consider validated measures of agreeableness and emotional stability combined with structured interview questions targeting the same constructs. Never use a personality assessment as a standalone decision gate. Use it as one signal in a weighted composite, with documented rationale for why the measured traits are job-relevant for that specific role.
Skills tests: role-specific, high signal, and the easiest to design well
Skills testing — measuring whether a candidate can perform specific technical or functional tasks — is the most straightforward category to implement legally and practically. Because skills tests are directly tied to job requirements that you can document from a job analysis, they satisfy the content validity standard under the Uniform Guidelines without requiring a full criterion validity study. A candidate applying for a SQL analytics role who is tested on query writing is being evaluated on a direct job requirement. The content validity argument is clear and easy to defend.
The quality of skills tests varies enormously by vendor. The best platforms offer role-specific test libraries with regularly updated questions, anti-cheating controls, and norming data that tells you how a candidate's score compares to a relevant benchmark population. For technical roles, look for tests where problems require applied reasoning rather than trivia recall. A SQL question that asks a candidate to debug a query against a realistic schema tests something real. A question that asks them to recall a specific function name tests memorization.
Skills tests work best early in the funnel — before or replacing the first phone screen — for roles where a technical minimum is a genuine job requirement. Teams using IncBot for technical pre-screening configure skills assessments as part of the automated interview flow, so candidates complete role-specific exercises and structured questions in a single session. This eliminates the scheduling friction of a separate testing step and gives reviewers a combined signal profile before any human interviewer time is committed.
A weighted composite of cognitive ability, work sample, and conscientiousness measures consistently outperforms any single test in predicting job performance — and a documented composite with a formal job analysis is your strongest defense against EEOC challenge.
EEOC compliance: building a legally defensible testing program
The Uniform Guidelines on Employee Selection Procedures (1978) apply to any employment test or selection procedure used to make hiring decisions, and the EEOC enforces them actively. The core requirement: if a selection procedure causes adverse impact, the employer must demonstrate validity. Adverse impact is generally inferred when the selection rate for any race, sex, or ethnic group is less than four-fifths (80 percent) of the rate for the group with the highest selection rate. The four-fifths rule is the standard trigger for scrutiny, not a safe harbor.
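The four-fifths check is simple arithmetic on selection rates. The following sketch uses hypothetical group labels and counts, and hypothetical function names, to show how each group's rate is compared against the highest-rate group. It covers only the ratio test, not the statistical significance testing that a full adverse impact analysis would normally include.

```python
def selection_rates(outcomes):
    """outcomes maps group -> (number passed, number of applicants)."""
    return {g: passed / total for g, (passed, total) in outcomes.items() if total > 0}

def impact_ratios(outcomes):
    """Each group's selection rate divided by the highest group's rate."""
    rates = selection_rates(outcomes)
    top = max(rates.values())
    if top == 0:
        raise ValueError("no group has any selections")
    return {g: rate / top for g, rate in rates.items()}

def flagged_groups(outcomes, threshold=0.8):
    """Groups whose impact ratio falls below the four-fifths threshold."""
    return sorted(g for g, ratio in impact_ratios(outcomes).items() if ratio < threshold)

# Hypothetical applicant data for one test in one role.
applicants = {
    "group_a": (48, 80),   # 60% pass
    "group_b": (24, 60),   # 40% pass
    "group_c": (27, 50),   # 54% pass
}

print(impact_ratios(applicants))
print(flagged_groups(applicants))
```

Here `group_b` passes at two-thirds of the top group's rate and would be flagged, while `group_c` at nine-tenths would not. A flag means the validity evidence needs to be ready, not that the test is unlawful.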
Three validity strategies are available to employers under the Guidelines. Criterion validity is a statistical demonstration that test scores correlate with job performance measures collected from employees in the same role. Content validity is a demonstration that the test content is a representative sample of the actual job tasks; it suits skills tests and work samples tied to documented task analyses. Construct validity is a demonstration that the test measures a psychological construct that has been shown to predict performance in the relevant job class.
Practical steps for a compliant program: conduct a formal job analysis before deploying any test, documenting the knowledge, skills, abilities, and other characteristics required for success. Map each test in your process to specific KSAOs from the job analysis. Run adverse impact analyses on your applicant data annually, broken down by test type and role. Work with legal counsel to document your validity rationale before a test goes live. Keep test score data, adverse impact analyses, and validity documentation for a minimum of two years.
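The mapping step can be kept honest with a small consistency check. This is a sketch under assumed names: the KSAO labels, test names, and both functions are hypothetical, and a real job analysis would carry far more detail than a set of strings.

```python
def unmapped_tests(job_ksaos, test_map):
    """Tests that measure no KSAO documented in the job analysis."""
    return sorted(
        test for test, measured in test_map.items()
        if not set(measured) & set(job_ksaos)
    )

def uncovered_ksaos(job_ksaos, test_map):
    """Documented KSAOs that no test in the process measures."""
    covered = set()
    for measured in test_map.values():
        covered.update(measured)
    return sorted(set(job_ksaos) - covered)

# Hypothetical job analysis output for a data analyst role.
job_ksaos = {"sql_query_writing", "data_interpretation", "attention_to_detail"}

# Hypothetical mapping of each test to what it claims to measure.
test_map = {
    "sql_skills_test": ["sql_query_writing"],
    "dataset_work_sample": ["data_interpretation", "sql_query_writing"],
    "general_personality_quiz": ["extraversion"],
}

print(unmapped_tests(job_ksaos, test_map))
print(uncovered_ksaos(job_ksaos, test_map))
```

A test that shows up as unmapped has no documented job-relatedness rationale, and a KSAO that shows up as uncovered is a requirement the process never measures.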
How to sequence tests in a hiring funnel
Sequencing matters as much as test selection. The optimal sequence runs low-cost, high-discrimination tests early and reserves resource-intensive assessments for later stages when the candidate pool is smaller. At the top of the funnel — typically attached to or replacing the initial phone screen — skills tests and cognitive measures work well because they can be automated, completed asynchronously, and scored without human intervention.
At the middle funnel, after the first live interview, work sample tests and structured situational judgment assessments add meaningful signal. A work sample at the first stage of a 200-candidate funnel creates unnecessary candidate friction. The same work sample after you have narrowed to 20 candidates adds genuine decision value. Personality assessments, if used, belong here — after you have established minimum competence and before final panel interviews.
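The cost argument for sequencing can be illustrated with rough arithmetic. The sketch below compares two orderings of the same two stages for a 200-candidate pool; the pass rates and reviewer minutes are assumptions chosen for illustration, and `reviewer_minutes` is a hypothetical helper.

```python
def reviewer_minutes(pool_size, stages):
    """Total human review time for a funnel.

    stages: list of (name, minutes_per_candidate, pass_rate) in run order.
    """
    remaining = float(pool_size)
    total = 0.0
    for _name, minutes, pass_rate in stages:
        total += remaining * minutes
        remaining *= pass_rate
    return total

# Assumed stages: an auto-scored skills test needs no reviewer time;
# a work sample takes 30 reviewer minutes per candidate.
skills_test = ("skills test", 0, 0.10)
work_sample = ("work sample", 30, 0.50)

skills_first = reviewer_minutes(200, [skills_test, work_sample])
sample_first = reviewer_minutes(200, [work_sample, skills_test])
print(f"skills test first: {skills_first:.0f} reviewer minutes")
print(f"work sample first: {sample_first:.0f} reviewer minutes")
```

With these assumed numbers, running the automated test first means reviewers score work samples for 20 candidates instead of 200. Candidate time follows the same pattern, since far fewer people are asked to complete the longer exercise.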
A weighted composite combining a cognitive measure, a work sample, and a conscientiousness score consistently outperforms any of the three alone in predictive validity. Document the weights and the validity rationale before the process goes live. When IncBot is integrated into the assessment workflow, it can deliver the cognitive and skills components of this composite in a single automated session, generating a scored profile that feeds directly into the recruiter's decision at the middle-funnel stage.
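One common way to build such a composite is to standardize each measure and then apply the documented weights. The sketch below assumes that approach; the weights, candidate names, and raw scores are all hypothetical, and the right weights for a given role should come from your own validity evidence.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize raw scores so measures on different scales are comparable."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def composite_scores(candidates, weights):
    """candidates: {name: {measure: raw score}}; weights: {measure: weight}."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    names = list(candidates)
    totals = {name: 0.0 for name in names}
    for measure, weight in weights.items():
        standardized = z_scores([candidates[n][measure] for n in names])
        for name, z in zip(names, standardized):
            totals[name] += weight * z
    return totals

# Hypothetical weights, documented before the process goes live.
weights = {"cognitive": 0.4, "work_sample": 0.4, "conscientiousness": 0.2}

# Hypothetical candidate pool with raw scores on three different scales.
candidates = {
    "ana": {"cognitive": 30, "work_sample": 80, "conscientiousness": 4.0},
    "ben": {"cognitive": 20, "work_sample": 60, "conscientiousness": 3.0},
    "cy": {"cognitive": 25, "work_sample": 70, "conscientiousness": 3.5},
}

scores = composite_scores(candidates, weights)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Standardizing first keeps a measure with a wide raw range, such as a 0 to 100 work sample score, from dominating one scored on a 1 to 5 scale. Note that z-scores computed within a single applicant pool are relative to that pool; a norm group would give more stable cutoffs.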
Choosing and evaluating a pre-employment testing vendor
The vendor evaluation criteria that matter, in priority order: technical manual with validity and reliability data for roles comparable to yours; adverse impact data from real applicant populations; a clear statement of what the test does and does not measure; candidate experience quality, including mobile compatibility and accessible design; ATS integration depth; and pricing structure. Vendors who resist providing a technical manual or who claim proprietary methodology as a reason they cannot share validity data are not worth pursuing further.
Due diligence questions to ask every vendor: What is the internal consistency reliability of this test? What criterion validity studies have been conducted, and were they published in peer-reviewed journals? What is the adverse impact profile across racial groups in your test-taker population? What is the recommended score cutoff, and what is the statistical basis for that recommendation? Has the test been challenged in litigation, and what was the outcome?
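When a vendor answers the internal consistency question, the figure quoted is often Cronbach's alpha. For readers who want to see what that number summarizes, here is a minimal sketch of the calculation on made-up item responses; the function name and data are illustrative, and vendors may report other reliability estimates instead.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a test.

    responses: one row per test-taker, each row a list of item scores.
    """
    if len(responses) < 2:
        raise ValueError("need at least two test-takers")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    items = list(zip(*responses))                 # one column per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores must vary across test-takers")
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical responses: 5 test-takers, 4 items scored 1 to 5.
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
]

print(f"alpha: {cronbach_alpha(responses):.2f}")
```

Alpha rises when items move together across test-takers, so a low value suggests the items are not measuring one coherent thing.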
For teams looking to consolidate their assessment workflow rather than managing a separate testing vendor alongside their ATS and interview scheduling tool, IncBot integrates structured skills assessment, cognitive problem-solving exercises, and behavioral interview questions into a single AI-driven interview session. This eliminates the coordination overhead of stitching together multiple point solutions and gives recruiters a unified candidate profile rather than scores arriving from different systems on different timelines.
InCruiter Editorial Team
AI Hiring Research · Interview Intelligence · Enterprise Talent Strategy
The InCruiter editorial team covers AI-driven hiring, interview intelligence, and modern talent acquisition strategy. Our guides draw on platform data from 2,000+ hiring teams, conversations with talent leaders, and published research in industrial-organizational psychology.



