Manual vs Automated Accessibility Testing
An evidence-based comparison of the two main accessibility testing approaches. What automated tools actually catch, where manual review is essential, the cost model for each, and the hybrid approach used in every serious audit.
Overview
Manual or automated testing?
Automated accessibility testing catches roughly 30 to 40 percent of WCAG issues, mainly programmatic things like missing alt text, colour contrast, form labels and heading structure. Manual testing with screen readers, keyboard navigation and human review catches the rest, including all subjective and context-dependent issues like meaningful link text, logical reading order and whether content is genuinely understandable. The right answer for most teams is hybrid: automated tools in CI for fast regression checks, plus manual audits at milestones and before launch.
At-a-glance comparison
These numbers are drawn from independent research by Deque, WebAIM and TPGi over more than a decade. Coverage percentages reflect how many WCAG success criteria each approach can fully evaluate, not how many issues each finds in a particular product.
| | Automated | Manual | Hybrid (recommended) |
|---|---|---|---|
| WCAG issue coverage | ~30 to 40% | ~95%+ | ~95%+, faster to reach |
| Cost per audit | Free to low | High (expert hours) | Moderate (automation covers the cheap wins, expert time focuses on the rest) |
| Speed | Seconds per page | Hours per page | Automation in CI; manual at milestones |
| Expertise needed | Low to read results; some skill to interpret | High (WCAG, ARIA, assistive tech) | Mixed: low for CI, high for audit |
| False positives | Some (mostly resolvable with tuning) | Rare | Manageable |
| False negatives (issues missed) | Many (everything semantic) | Few | Few |
| Best tools | axe DevTools, WAVE, Lighthouse, Accessibility Insights | NVDA, VoiceOver, TalkBack, browser DevTools, keyboard-only review | All of the above, plus a documented audit methodology |
| Best for | CI regression checks, design system QA, fast page-level pre-flight | Pre-launch audits, certification, customer complaints, complex apps | Everyone serious about accessibility, especially government and enterprise |
The two approaches in detail
Automated testing
Coverage: ~30 to 40%
Automated tools analyse a page's HTML, CSS and, in most current tools, the rendered DOM, and flag any rule violations they can detect mechanically. They are excellent at the programmatic checks: missing alt attributes, insufficient colour contrast, form fields without labels, heading-level skips, ARIA misuse, duplicate IDs and missing language declarations. They run in milliseconds, scale to thousands of pages, and integrate into CI pipelines so a regression is caught before it is merged into main.
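Colour contrast shows why these checks automate so well: WCAG 2.x defines contrast as pure arithmetic on two colours. The sketch below implements the relative-luminance and contrast-ratio formula for 6-digit hex colours only. The function names are illustrative. Real tools also have to work out the effective foreground and background colours from the rendered page, which is the harder part.

```typescript
// Minimal sketch of the WCAG 2.x contrast-ratio calculation.
// Assumption for brevity: colours are 6-digit hex strings such as "#767676".

type Rgb = [number, number, number];

function hexToRgb(hex: string): Rgb {
  const m = /^#?([0-9a-f]{6})$/i.exec(hex);
  if (!m) throw new Error(`Unsupported colour: ${hex}`);
  const n = parseInt(m[1], 16);
  return [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
}

// Linearise one 8-bit sRGB channel. WCAG 2.x text uses 0.03928 as the threshold
// (some versions cite 0.04045); for 8-bit values both give the same result.
function linearise(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance: weighted sum of the linearised R, G and B channels.
function relativeLuminance(hex: string): number {
  const [r, g, b] = hexToRgb(hex).map(linearise);
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio = (L1 + 0.05) / (L2 + 0.05), where L1 is the lighter colour.
function contrastRatio(fg: string, bg: string): number {
  const a = relativeLuminance(fg);
  const b = relativeLuminance(bg);
  const lighter = Math.max(a, b);
  const darker = Math.min(a, b);
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG 1.4.3 (AA): at least 4.5:1 for normal text, 3:1 for large text.
function passesAA(fg: string, bg: string, largeText = false): boolean {
  return contrastRatio(fg, bg) >= (largeText ? 3 : 4.5);
}

console.log(contrastRatio("#767676", "#ffffff").toFixed(2));
console.log(passesAA("#777777", "#ffffff"));
```

For example, `#767676` on white comes out just above 4.5:1 and passes AA for body text, while the one-step-lighter `#777777` falls just below it. This is exactly the kind of hard threshold a machine can enforce on every build.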
Strengths
- Fast: a full-page scan in under a second
- Repeatable: same input produces same output, so regressions are obvious
- Scalable: enterprise platforms scan thousands of pages overnight
- CI-friendly: a failed accessibility check can block a pull request the same way a failed unit test does
- Cheap to start: axe DevTools, WAVE and Lighthouse are free
Limitations
- Hard 30 to 40 percent coverage ceiling. Cannot evaluate anything semantic.
- Cannot tell whether alt text is meaningful, just whether it exists
- Cannot tell whether the reading order makes sense to a screen reader user
- Cannot tell whether a custom widget actually works with assistive technology
- Can produce false positives that need triage from someone who knows WCAG
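The alt text limitation is easiest to see in code. Below is a minimal sketch of an existence-style rule over a hypothetical, simplified `ImageNode` model, not any tool's API. It flags the missing `alt` attribute, but every other image on the sample page passes, including two whose alt text tells a screen reader user nothing. Some tools add heuristics on top, such as warning when alt text looks like a file name, but none can decide whether the text actually conveys the image's purpose.

```typescript
// Hypothetical, simplified model of an <img>: `alt` is undefined when the
// attribute is absent, and "" when the image is marked as decorative.
interface ImageNode {
  src: string;
  alt?: string;
}

interface Finding {
  src: string;
  message: string;
}

// Existence-style rule: the only thing it can verify mechanically is whether an
// alt attribute is present. An empty alt is allowed (decorative image).
function checkAltPresent(images: ImageNode[]): Finding[] {
  return images
    .filter((img) => img.alt === undefined)
    .map((img) => ({ src: img.src, message: "Image is missing an alt attribute" }));
}

const page: ImageNode[] = [
  { src: "/hero.jpg" }, // flagged: no alt attribute at all
  { src: "/chart.png", alt: "chart.png" }, // passes, but useless to a listener
  { src: "/team.jpg", alt: "image" }, // passes, but useless to a listener
  { src: "/divider.svg", alt: "" }, // passes, correctly decorative
  { src: "/sales.png", alt: "Bar chart: sales doubled between 2022 and 2023" }, // passes, meaningful
];

const findings = checkAltPresent(page);
console.log(findings);
```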
Best tools
- axe DevTools by Deque - the de facto standard, browser extension and CLI versions, used by most accessibility programs
- WAVE by WebAIM - visual overlays make findings easy to communicate to non-developers
- Lighthouse in Chrome DevTools - bundled, free, covers a subset of axe rules
- Accessibility Insights by Microsoft - free, guided manual review wrapper around axe
- Pa11y and IBM Equal Access - CLI and CI options for build pipelines; Sa11y - an in-page checker aimed at content authors
Manual testing
Coverage: ~95%+
Manual testing is what a trained accessibility specialist does with the actual product: keyboard navigation through every flow, screen reader testing in at least NVDA and VoiceOver, browser zoom to 200 percent and 400 percent, reflow testing at narrow viewports, focus and tab order review, accessibility tree and ARIA inspection, and judgement calls on whether content is genuinely understandable. It catches the 60 to 70 percent of WCAG issues that automation cannot see.
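Tab order shows where the line between the two approaches sits. The order itself is mechanical. Browsers visit elements with a positive `tabindex` first, in ascending order with ties kept in document order. Next come natively focusable elements and `tabindex="0"` elements, in document order. Negative values are skipped. The simplified sketch below computes that order using a hypothetical `FocusCandidate` model that ignores disabled and hidden elements. Deciding whether the result matches the visual and logical flow of the page is the manual part.

```typescript
// Hypothetical, simplified model of an element, listed in document order.
interface FocusCandidate {
  label: string;
  nativelyFocusable: boolean; // e.g. <a href>, <button>, <input>
  tabIndex?: number; // undefined when the element has no tabindex attribute
}

// Approximate sequential focus order (ignores disabled, hidden and inert content).
function tabOrder(elements: FocusCandidate[]): string[] {
  const positive: { label: string; tabIndex: number; index: number }[] = [];
  const natural: string[] = [];
  elements.forEach((el, index) => {
    if (el.tabIndex === undefined) {
      if (el.nativelyFocusable) natural.push(el.label);
    } else if (el.tabIndex > 0) {
      positive.push({ label: el.label, tabIndex: el.tabIndex, index });
    } else if (el.tabIndex === 0) {
      natural.push(el.label);
    }
    // A negative tabindex keeps the element focusable by script but removes it from Tab order.
  });
  // Positive values first, ascending; ties keep document order.
  positive.sort((a, b) => a.tabIndex - b.tabIndex || a.index - b.index);
  return positive.map((p) => p.label).concat(natural);
}

const page: FocusCandidate[] = [
  { label: "Skip to content", nativelyFocusable: true },
  { label: "Search", nativelyFocusable: true, tabIndex: 3 },
  { label: "Home", nativelyFocusable: true },
  { label: "Submit", nativelyFocusable: true, tabIndex: 1 },
  { label: "Main region", nativelyFocusable: false, tabIndex: -1 },
  { label: "Decorative panel", nativelyFocusable: false },
  { label: "Custom widget", nativelyFocusable: false, tabIndex: 0 },
];

console.log(tabOrder(page).join(" -> "));
```

On the sample page, the positive `tabindex` values send focus to Submit and Search before the skip link. A keyboard-only review exists to catch exactly this kind of result.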
Strengths
- Catches every category of WCAG issue that automation cannot
- Validates real-world usability, not just rule compliance
- Produces evidence-rich findings (recordings, screenshots, AT output) that survive procurement review
- Surfaces design-level issues that need product or design intervention, not just code fixes
Limitations
- Slow: hours per template, not seconds
- Expensive: requires WCAG-fluent, AT-fluent specialists
- Hard to run on every PR; usually a milestone activity
- Depends on the auditor's skill and methodology, so consistency requires a documented process
Best practices
- Keyboard-only review first. If you cannot complete every key task without a mouse, you have not started auditing.
- Screen reader review in at least NVDA on Windows and VoiceOver on macOS or iOS. See our comparison of the four major screen readers for selection guidance.
- Browser zoom and reflow at 200 percent and 400 percent. Check that nothing is lost behind sticky bars or modal frames.
- Document everything: WCAG criterion failed, severity, location, evidence, remediation guidance. Vague findings do not get fixed.
- User testing with people with disability for any high-stakes flow. Manual expert testing is not a substitute for actual user experience.
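The 400 percent figure is not arbitrary. WCAG 1.4.10 Reflow sets a 320 CSS pixel width for vertically scrolling content. That is the width a 1280 CSS pixel window becomes at 400 percent zoom. The sketch below works through this arithmetic, assuming a 1280 pixel starting width. The same arithmetic explains why setting a browser's responsive mode to 320 pixels can serve as a quick first pass before the full zoom check.

```typescript
// Layout viewport width in CSS pixels for a window that is `windowWidthPx`
// CSS pixels wide at 100% zoom, after zooming to `zoomPercent`.
function effectiveViewportWidth(windowWidthPx: number, zoomPercent: number): number {
  if (zoomPercent <= 0) throw new RangeError("zoomPercent must be positive");
  return (windowWidthPx * 100) / zoomPercent;
}

// WCAG 1.4.10 Reflow: no two-dimensional scrolling at a width of 320 CSS px
// (for content that scrolls vertically).
const REFLOW_WIDTH_PX = 320;

// Assumption: a 1280 CSS px wide window, the reference width WCAG uses.
for (const zoom of [100, 200, 400]) {
  const width = effectiveViewportWidth(1280, zoom);
  const note = width <= REFLOW_WIDTH_PX ? " (at the WCAG 1.4.10 reflow width)" : "";
  console.log(`${zoom}% zoom on a 1280 px window -> ${width} CSS px${note}`);
}
```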
Hybrid: the actual answer
Recommended strategy
For any serious accessibility program, the answer is both, deployed at different cadences. Automation runs continuously in CI to catch regressions cheaply. Manual audits run at milestones (pre-launch, major feature, quarterly governance) to catch everything automation cannot. User testing with people with disability runs occasionally on high-stakes flows. This is the model the Australian Digital Service Standard expects, and it is what government procurement panels routinely ask about.
Suggested cadence
- Per pull request in CI: axe-core or similar as a build step. Blocks merges that introduce new accessibility regressions.
- Per release: spot manual audit of any new or substantially changed flow. Two to four hours of expert time.
- Quarterly: full manual audit of top user journeys plus a representative sample of templates. One to two weeks of expert time.
- Annually: independent third-party audit for procurement, board reporting and accessibility-statement refresh.
- Project milestones: user testing with people with disability for any flow handling authentication, payments, benefits, health information or applications.
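One common way to implement the per-PR rule that blocks new regressions is to compare each scan against a baseline of known, already-ticketed issues. The build then fails only on violations that a change introduces. The sketch below assumes a simplified subset of the axe-core results object: `violations`, each with an `id` and `nodes[].target` selectors. The fingerprint format, the baseline list and the `gate` function are hypothetical, and the step that actually runs axe-core in a browser is not shown.

```typescript
// Simplified subset of an axe-core results object: only the fields used here.
// (Real results carry many more fields; nested shadow-DOM targets are ignored.)
interface AxeNode { target: string[] }
interface AxeViolation { id: string; nodes: AxeNode[] }
interface AxeResults { violations: AxeViolation[] }

// One fingerprint per (rule, element) pair, e.g. "image-alt|#hero > img".
function fingerprints(results: AxeResults): string[] {
  const out: string[] = [];
  for (const violation of results.violations) {
    for (const node of violation.nodes) {
      out.push(`${violation.id}|${node.target.join(" ")}`);
    }
  }
  return out;
}

// Violations in the current scan that are not in the accepted baseline.
function newRegressions(current: AxeResults, baseline: string[]): string[] {
  return fingerprints(current).filter((fp) => baseline.indexOf(fp) === -1);
}

// Hypothetical CI gate: an uncaught error makes a Node-based build step exit non-zero.
function gate(current: AxeResults, baseline: string[]): void {
  const regressions = newRegressions(current, baseline);
  if (regressions.length > 0) {
    throw new Error(`New accessibility violations:\n${regressions.join("\n")}`);
  }
}

const baseline = ["color-contrast|#footer > p"]; // known issue, tracked separately
const scan: AxeResults = {
  violations: [
    { id: "color-contrast", nodes: [{ target: ["#footer > p"] }] },
    { id: "image-alt", nodes: [{ target: ["#hero > img"] }] },
  ],
};

console.log(newRegressions(scan, baseline));
```

The design choice here is to block only new fingerprints. This keeps the gate from failing every build on an existing backlog, while still stopping a pull request that adds a fresh violation. The baseline should shrink as issues are fixed, not grow.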
Which approach is right for you?
The right mix depends on your team size, governance maturity and risk profile.
Common questions
What percentage of WCAG issues do automated tools actually catch?
Independent research from Deque, WebAIM and others consistently puts automated coverage at around 30 to 40 percent of WCAG success criteria. Automated tools are very good at programmatic checks (missing alt attributes, contrast ratios, form labels, heading structure, ARIA validity) but cannot evaluate anything subjective: whether alt text is meaningful, whether reading order makes sense, whether a label genuinely describes what the control does, whether a custom widget behaves correctly with a screen reader. Most accessibility failures live in that 60 to 70 percent that automation cannot see.
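Heading structure shows both sides of that line. The sketch below, using a hypothetical `Heading` list in document order, detects a skipped level in a few lines of logic. Deciding whether "Highlights" is a useful heading for its section is something the check cannot even attempt.

```typescript
// Headings in document order (hypothetical, simplified model).
interface Heading {
  level: number; // 1 for <h1> through 6 for <h6>
  text: string;
}

function findSkippedLevels(headings: Heading[]): string[] {
  const problems: string[] = [];
  for (let i = 1; i < headings.length; i++) {
    const prev = headings[i - 1];
    const curr = headings[i];
    // Going deeper by more than one level is a skip; going back up is allowed.
    if (curr.level > prev.level + 1) {
      problems.push(`"${curr.text}" is h${curr.level} directly after h${prev.level}`);
    }
  }
  return problems;
}

const outline: Heading[] = [
  { level: 1, text: "Annual report" },
  { level: 2, text: "Highlights" },
  { level: 4, text: "Revenue" },
  { level: 2, text: "Outlook" },
  { level: 3, text: "Risks" },
];

console.log(findSkippedLevels(outline));
```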
Can automated tools find everything if they get better?
No. The hard cap on automated coverage is not a technology problem; it is a semantic one. Whether an alt text is useful in context, whether a link name is clear, whether content is genuinely understandable, whether a custom interaction works for a switch user, all require human judgement. AI-assisted tools (Evinced, Stark, axe AI) extend automation slightly into the semantic layer (suggesting alt text candidates, flagging unclear link names) but they still need human review. The 30 to 40 percent figure has been stable for over a decade and is not expected to move dramatically.
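Flagging unclear link names shows why that review step does not go away. The heuristic below uses a hypothetical list of generic phrases and catches the obvious cases such as "click here". It has no way to judge a link that is not on its list, though. "Report" passes even though it may be ambiguous on a page with several reports. The output is a prompt for human judgement, not a verdict.

```typescript
// Hypothetical list of link names that rarely make sense out of context.
const GENERIC_LINK_NAMES: string[] = ["click here", "here", "read more", "more", "learn more", "link"];

// Lower-case, collapse whitespace and drop trailing punctuation such as "...".
function normalise(name: string): string {
  return name.trim().toLowerCase().replace(/\s+/g, " ").replace(/[.!?:]+$/, "");
}

// Returns the link names a human should review. Anything not on the list
// passes, whether or not it is actually clear.
function flagGenericLinks(linkNames: string[]): string[] {
  return linkNames.filter((name) => GENERIC_LINK_NAMES.indexOf(normalise(name)) !== -1);
}

const links = [
  "Click here",
  "Read more...",
  "Download the 2024 annual report (PDF, 2 MB)",
  "Report",
  "here",
];

console.log(flagGenericLinks(links));
```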
Are paid automated tools much better than free ones?
Marginally. Free tools (axe DevTools, WAVE, Lighthouse, Accessibility Insights) catch roughly the same set of issues as paid platforms (Siteimprove, Level Access, Deque). What you pay for in the paid tier is enterprise infrastructure: scheduled scans across thousands of pages, role-based dashboards, audit-trail reports, ticketing integration, dedicated support. For a single project audit, the free tools are excellent. For an enterprise governance program, the paid platforms earn their cost.
How long does a manual accessibility audit take?
For a typical web product, a manual audit covering 10 to 20 representative templates and key user flows takes one to three weeks of expert time. That includes keyboard testing, screen reader testing (NVDA plus VoiceOver minimum), zoom and reflow testing, ARIA review, and writing up findings with severity, evidence and remediation guidance. Document accessibility audits run two to three days per document on average. ExceedAbility audits combine automated scanning as a pre-pass with a structured manual review, so the manual time is spent on the issues automation cannot see.
Is user testing with people with disability required?
Not by WCAG, no. WCAG conformance can be demonstrated through a combination of automated and manual expert testing. But user testing with people with disability is the only way to validate that the product is actually usable, not just technically conformant. We strongly recommend it for any high-stakes flow (authentication, payment, applications, health information) and for any product whose primary audience includes people with disability. The Australian government Digital Service Standard explicitly requires testing with users, including people with disability.
Scope a hybrid audit for your product
We combine automated scanning, structured manual review and assistive-technology testing. Severity-ranked findings, evidence, and a remediation roadmap.