CoreTechX is developing AI to convert handwritten Arabic documents into searchable digital text, enabling governments and institutions to unlock decades of inaccessible historical and policy data.
The digital transformation of the Middle East is often depicted through high-rise skylines and futuristic smart cities. However, a more profound revolution is taking place in the quiet archives of government ministries and historical libraries. For decades, millions of vital records, from court files to government contracts, remained functionally invisible because they were handwritten. Today, two founders under 30, Fahad Faisal Fahad AlSaud and Fahad Durukan, are reclaiming this “locked” knowledge by building a first-class AI infrastructure specifically designed for Arabic.
The Vision: From Dead Assets to Active Knowledge
For AlSaud, the inspiration was rooted in a practical problem with regional consequences. While working with government institutions, he realized that important decisions were being made without access to decades of data simply because it was trapped in unreadable formats. “Undigitized data is an economic dead weight,” AlSaud notes. He saw this not just as a technical hurdle, but as a strategic liability that puts national policy and cultural history at risk.
Co-founder Fahad Durukan’s perspective was shaped through the lens of a scholar. An avid reader of historical manuscripts, he faced the repeated frustration of navigating degraded pages and unclear handwriting. He observed that even when documents were available digitally, they were often “locked” inside images, inaccessible to modern search or analytical tools. Together, they founded CoreTechX on the belief that Arabic content deserves technology built specifically for its unique complexities.
The Technical Breakthrough: The ENAHR Pipeline
To bridge this gap, CoreTechX developed ENAHR (End-to-End Arabic Handwritten Recognition). Unlike traditional systems that treat Arabic as an afterthought, ENAHR utilizes a hybrid CNN-Transformer architecture designed to handle the cursive structure and contextual letter shapes that define the language.
The pipeline operates through several sophisticated stages to ensure accuracy:
- Preprocessing & Noise Reduction: Cleaning techniques prepare scanned inputs for recognition, compensating for physical degradation such as ink bleed or faded strokes.
- Line Segmentation & Sorting: The system organizes text lines to preserve the original document flow.
- Core OCR Engine: A transformer-based model captures the “long-range dependencies” of cursive script, producing accurate representations even in complex cases.
- The “LLM Fix”: A lightweight language model is applied at the final stage to refine the recognized text and improve overall readability.
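To make the line-sorting stage concrete, here is a minimal Python sketch of one simple way to put detected text lines into reading order. It is an illustration only, not ENAHR's actual method: the `LineBox` type and `sort_lines` function are hypothetical names, and the sketch assumes a single-column page whose line bounding boxes have already been detected.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LineBox:
    """Hypothetical bounding box of one detected text line, in pixels."""
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int


def sort_lines(boxes: List[LineBox]) -> List[LineBox]:
    """Order line boxes top to bottom for a single-column page.

    Lines are ranked by vertical center. Ties are broken by right edge,
    rightmost first, because Arabic is read right to left.
    """
    return sorted(
        boxes,
        key=lambda b: (b.y + b.height / 2, -(b.x + b.width)),
    )


if __name__ == "__main__":
    detected = [
        LineBox(x=40, y=200, width=500, height=30),
        LineBox(x=40, y=10, width=500, height=30),
        LineBox(x=40, y=100, width=500, height=30),
    ]
    for box in sort_lines(detected):
        print(box)
```

Multi-column layouts, marginal notes, and skewed scans would need more than a coordinate sort, which is one reason a dedicated segmentation stage exists in pipelines of this kind.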
Defying Global Giants: Benchmarking Success
This specialized focus has paid off in benchmark results. According to CoreTechX’s internal research, the company's system holds a wide performance lead over global generalist models on handwritten Arabic:
- State-of-the-Art Accuracy: Achieved a record 3.1% Character Error Rate (CER) on the contemporary KHATT dataset and 5.6% on the historical Muharaf dataset.
- Outperforming Big Tech: In head-to-head comparisons, CoreTechX (approx. 14% CER) significantly outpaced global models such as ChatGPT (60.7% CER) and Gemini-Pro (28% CER) on similar tasks.
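For readers unfamiliar with the metric, CER is conventionally computed as the character-level edit distance between the recognized text and a reference transcription, divided by the length of the reference. The Python sketch below implements that standard definition. The function names are illustrative, and this is not CoreTechX's evaluation code, which may normalize text (for example, diacritics or whitespace) differently before scoring.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # The recognizer dropped one of the four reference letters.
    print(character_error_rate("كتاب", "كتب"))
```

Under this definition, a 3.1% CER corresponds to roughly three character errors for every hundred characters in the reference transcription.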
Scaling for the Future: Introducing CoreTechX’s OCR System

CoreTechX is now building out its OCR System, a comprehensive knowledge platform that signals a shift from backend provider to front-end catalyst for productivity. By layering generative AI on top of structured archives, the platform lets institutions “talk” to their data, generating summaries and statistical analyses with full citations.
The founders’ vision for the next five years is clear: to establish CoreTechX’s OCR System as the backbone of structured Arabic knowledge across the Arab world. As they balance rigorous research with commercial speed, AlSaud and Durukan are guided by the principle of the scholar Al-Ghazali: “Excess in anything becomes a defect.” By finding that balance, they aim to ensure that the GCC’s past is no longer a dead asset, but the fuel for its digital future.