What the CBSE On-Screen Marking Failure Tells Us About Digital Exam Evaluation

Rishabh Galkar
June 1, 2026

Something happened in May 2026 that every exam administrator in India should study carefully.

The Central Board of Secondary Education rolled out On-Screen Marking for Class 12 board examinations at national scale — covering nearly one crore answer books and approximately 40 crore scanned pages. The idea was sound. The ambition was understandable. The execution collapsed in a way that affected over 17 lakh student households and pushed the national pass percentage to its lowest point in seven years.

This article is not about blame. It is about understanding what went wrong technically and what it tells us about how digital exam evaluation systems should — and should not — be deployed.

What On-Screen Marking Was Supposed to Deliver

The premise of On-Screen Marking is genuinely compelling.

Answer scripts are scanned, uploaded to a secure portal, and evaluated on screen by examiners who can log in from anywhere. Marks are auto-tabulated. Totalling errors — a chronic problem in physical evaluation — are eliminated. Scanned copies can be made available to students who want to verify their evaluation. The system promises speed, accuracy, and transparency.

These are not fictional benefits. On-Screen Marking has been deployed successfully in other contexts, including international board examinations. The technology, in principle, works.

The question is not whether OSM is a valid idea. The question is whether this particular rollout gave it a fair chance to succeed.

Where It Went Wrong

No pilot before full-scale deployment

The single most consequential decision in this rollout was to go from zero to national scale in one step.

Nearly one crore answer books. Forty crore scanned pages. One deployment cycle. No phased rollout. No small-scale pilot to surface edge cases. No real-world stress test of the scanning pipeline or the portal before committing the Class 12 results of 17 lakh students to the system.

In technology deployment, this is the highest-risk approach possible. Systems that behave perfectly in controlled conditions (in testing, in mock sessions, in vendor demonstrations) routinely surface unexpected failure modes at scale. This is not speculation. It is one of the most well-documented patterns in software engineering. The only way to discover what breaks at scale is to run at scale, ideally on low-stakes work first.

When the stakes are Class 12 board examinations and there is no fallback, every failure mode becomes a crisis.

Scanning quality was not validated before evaluation began

At the foundation of any On-Screen Marking system is a simple requirement: the scanned image must be legible. Everything downstream — evaluation, marks, results — depends entirely on an examiner being able to read what a student wrote.

In this rollout, that foundation broke. Students who later accessed their scanned copies found pages that were blurred beyond legibility. In some cases, answer sheets belonging to different students were tagged to the wrong roll numbers — a different student's handwriting, a different student's answers, appearing under someone else's name in the system.

Most alarmingly, some of these illegible or mismatched copies were evaluated and marked before the errors were caught. Examiners assigned scores to content they could not have reliably read. The auto-tabulation system then faithfully calculated totals based on those scores.

A scanning validation step — a systematic check that uploaded images are legible and correctly tagged before they enter the evaluation queue — would have caught this. It was either absent or insufficient at the scale of this deployment.
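To make the idea concrete, here is a minimal sketch of what such an intake gate could look like. It is not a description of any system the Board used. The names (`ScannedBooklet`, `intake_problems`), the choice of Laplacian variance as a blur measure, and the `min_sharpness` threshold are all assumptions made for illustration; a real pipeline would calibrate the threshold against its own scanners.

```python
from dataclasses import dataclass
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over a 2D grid of grayscale values.

    Low values suggest a page with few sharp edges, which often means blur.
    """
    rows, cols = len(gray), len(gray[0])
    responses = [
        gray[r - 1][c] + gray[r + 1][c] + gray[r][c - 1] + gray[r][c + 1]
        - 4 * gray[r][c]
        for r in range(1, rows - 1)
        for c in range(1, cols - 1)
    ]
    return pvariance(responses)


@dataclass
class ScannedBooklet:
    roll_number_on_cover: str  # read from the booklet's own cover page
    roll_number_in_tag: str    # roll number the uploaded file was tagged with
    expected_pages: int
    pages: list                # one grayscale grid (list of rows) per page


def intake_problems(booklet, min_sharpness=100.0):
    """Return reasons to hold a booklet back; an empty list means it may be queued.

    min_sharpness is a hypothetical threshold, not a recommended value.
    """
    problems = []
    if booklet.roll_number_on_cover != booklet.roll_number_in_tag:
        problems.append("roll number mismatch")
    if len(booklet.pages) != booklet.expected_pages:
        problems.append("page count mismatch")
    for number, page in enumerate(booklet.pages, start=1):
        if laplacian_variance(page) < min_sharpness:
            problems.append(f"page {number} may be illegible")
    return problems
```

A booklet with any entry in that list would go to a human for re-scanning or re-tagging instead of to an examiner. A check this crude will also flag genuinely blank pages, which is acceptable here: a false alarm costs a few seconds of human attention, while a missed failure costs a student their marks.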

The safety net was removed before the system was proven

In traditional evaluation, post-result verification of marks acts as a safety net. Students can flag discrepancies. Errors can be caught and corrected. It is an imperfect safety net, but it is one.

For this rollout, the Board decided that the digital system's accuracy made post-result verification unnecessary, and abolished it before the system had been tested at national scale. The rationale, that digital auto-tabulation eliminates totalling errors, is technically correct. But it covers only the final arithmetic. It does not account for errors upstream: scanning failures, mismatched files, marks assigned to illegible content.

Because the safety net was removed before the system had proven it could operate without one, there was no room for correction when the failures arrived.

Warning signs during mock evaluations were not acted upon

Weeks before the actual evaluation began, teachers participating in mandatory mock sessions reported portal access failures, slow system performance, and data errors. These were not minor complaints. They were early signals of the exact problems that later caused the rollout to unravel.

The appropriate response to those signals was to investigate, fix, and re-test before proceeding. That did not happen, and the evaluation went ahead.

This pattern — dismissing pre-deployment warning signals as temporary or manageable — is one of the most common precursors to large-scale system failures. The warning signals in a mock session are the cheapest possible version of a failure. They should be taken seriously precisely because the alternative is a failure at full scale, with real students and real consequences.

What Responsible Deployment Looks Like

The CBSE OSM rollout did not fail because On-Screen Marking is a bad idea. It failed because it was deployed in a way that gave the system no chance to surface and fix its own problems before they affected real students.

The principles that would have prevented this are not complicated. They are the standard practices of any well-run technology deployment in a high-stakes environment.

Pilot before scaling. Run the system on one subject, one exam batch, one set of booklets. Compare the output against manual evaluation. Find the gaps. Fix them. Then expand.

Validate inputs before processing. Before an answer script enters the evaluation queue, confirm the scan is legible, the file is correctly tagged, and the pages are complete. This step costs time upfront. It prevents catastrophic errors downstream.

Keep humans in the oversight loop. Automation should reduce the burden on human reviewers — not eliminate them. A system that flags uncertain inputs for human review is fundamentally more resilient than one that processes everything automatically and only surfaces errors after results are published.

Take pre-deployment warnings seriously. If evaluators report problems during mock sessions, that is not a perception problem to be managed. It is diagnostic data about how the system behaves under realistic conditions.

Do not remove safety nets before the system earns them. Verification and re-evaluation mechanisms exist because every system makes mistakes. They should be maintained until there is substantial evidence — from real deployments, not from controlled demos — that the error rate is low enough to justify removing them.
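What "substantial evidence" might mean can be put in numbers. The sketch below assumes a random sample of evaluated scripts has been re-checked by hand, and computes a one-sided Wilson score upper bound on the true error rate. The function name and the 95% confidence level are choices made for this illustration, not an established standard for exam boards.

```python
from math import sqrt
from statistics import NormalDist


def error_rate_upper_bound(errors_found, scripts_audited, confidence=0.95):
    """One-sided Wilson score upper bound on the true error rate.

    Assumes the audited scripts are a random sample of all evaluated scripts.
    """
    if scripts_audited <= 0:
        raise ValueError("audit at least one script")
    z = NormalDist().inv_cdf(confidence)
    n = scripts_audited
    p = errors_found / n
    centre = p + z * z / (2 * n)
    margin = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre + margin) / (1 + z * z / n)
```

Under these assumptions, an audit of 1,000 scripts that finds zero errors still only supports a bound of roughly 0.27%. Applied to one crore answer books, that is consistent with about 27,000 affected booklets. A clean demo on a small sample is weak evidence for removing verification at national scale.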

The Broader Lesson for Exam Administrators

Every university in India that is considering a move to digital evaluation should take this episode as a case study in deployment risk.

The benefits of digital evaluation are real. Faster logistics. Elimination of transcription errors. Remote examiner access. Transparent audit trails. These are genuine improvements over physical booklet evaluation.

But the shift from physical to digital does not automatically make evaluation more accurate or more reliable. What makes digital evaluation better is not the medium — it is the quality of the implementation, the validation of inputs, and the presence of human oversight at the points where the system is most likely to fail.

An AI-assisted evaluation system that flags low-confidence answers for human review does not just save time. It creates a built-in safety mechanism: the system knows what it is uncertain about, surfaces those cases for human judgment, and produces a full audit trail showing which answers were evaluated by AI, which were reviewed by humans, and what scores were assigned at each step.
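A rough sketch of that routing logic follows. It is an illustration under stated assumptions, not the implementation of any particular product: the `Router` class, its method names, and the 0.8 confidence threshold are hypothetical, and it assumes the grading model reports a confidence between 0 and 1 alongside each score.

```python
from dataclasses import dataclass, field


@dataclass
class AuditEntry:
    answer_id: str
    step: str     # "ai" or "human"
    score: float


@dataclass
class Router:
    threshold: float = 0.8  # hypothetical cut-off; would be tuned during a pilot
    review_queue: list = field(default_factory=list)
    audit_trail: list = field(default_factory=list)

    def record_ai_score(self, answer_id, score, confidence):
        """Log the AI score; return True if the answer was sent for human review."""
        self.audit_trail.append(AuditEntry(answer_id, "ai", score))
        if confidence < self.threshold:
            self.review_queue.append(answer_id)
            return True
        return False

    def record_human_score(self, answer_id, score):
        """Log the reviewer's score and clear the answer from the queue."""
        self.review_queue.remove(answer_id)
        self.audit_trail.append(AuditEntry(answer_id, "human", score))

    def final_score(self, answer_id):
        """The latest entry wins, so a human review overrides the AI score."""
        if answer_id in self.review_queue:
            raise LookupError(f"{answer_id} is still awaiting human review")
        entries = [e for e in self.audit_trail if e.answer_id == answer_id]
        if not entries:
            raise LookupError(f"{answer_id} has not been scored")
        return entries[-1].score
```

The design choice that matters is that `final_score` refuses to return anything for an answer still sitting in the review queue. A low-confidence score cannot slip into a published result just because nobody got around to reviewing it.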

That is not just a better evaluation system. It is a defensible one — one that can withstand scrutiny from students, parents, and institutions precisely because the uncertainty has been acknowledged and addressed, not suppressed.

Where to Start

If your institution is weighing digital evaluation options, the single most useful thing you can do is run a small-scale pilot before committing to any system at scale.

One subject. One exam batch. Full evaluation from scan to result — with quality checks, confidence scoring, and a comparison against manual evaluation.

That comparison will tell you more about how a system actually performs than any product demonstration can.
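As a starting point, the comparison itself can be very simple. The sketch below assumes each script in the pilot has a total from the system under test and a total from manual evaluation, keyed by a script identifier. The function name and the two-mark tolerance are placeholders; an institution would set its own tolerance per subject.

```python
from statistics import mean


def compare_with_manual(system_marks, manual_marks, tolerance=2.0):
    """Summarise how far system marks drift from manual marks on the same scripts."""
    shared = sorted(system_marks.keys() & manual_marks.keys())
    if not shared:
        raise ValueError("no scripts were marked by both methods")
    gaps = [system_marks[s] - manual_marks[s] for s in shared]
    outside = [s for s, gap in zip(shared, gaps) if abs(gap) > tolerance]
    return {
        "scripts_compared": len(shared),
        "mean_absolute_gap": mean([abs(g) for g in gaps]),
        "mean_signed_gap": mean(gaps),  # negative: the system marks lower
        "outside_tolerance": outside,   # scripts to pull and inspect by hand
    }
```

The scripts listed under `outside_tolerance` are the ones worth opening. Each is either a scanning or tagging problem, a grading problem, or a manual marking error, and a pilot is the cheapest place to find out which.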


ParikshAI is an AI-assisted exam evaluation platform built for Indian universities. Our hybrid model — AI grading with human review for low-confidence answers — is designed for institutions that want faster results without compromising on accuracy or oversight.

Request your free pilot at pariksha.tech