IndQA: OpenAI’s First Cultural Benchmark Begins with Indian Languages
5 November, 2025
What is IndQA?
OpenAI has launched IndQA, a new benchmark to evaluate how well AI models understand Indian languages and cultural context.
It is designed not merely as a translation test, but to probe reasoning, cultural nuance, and local context in Indian languages.
The benchmark covers 2,278 questions across 12 languages and 10 cultural domains (architecture & design, arts & culture, everyday life, food & cuisine, history, law & ethics, literature & linguistics, media & entertainment, religion & spirituality, sports & recreation).
Languages included: Bengali, English, Hindi, Hinglish, Kannada, Marathi, Odia, Telugu, Gujarati, Malayalam, Punjabi, and Tamil.
Each question is authored natively in its language and paired with an English translation, a grading rubric written by domain experts, and an ideal answer.
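OpenAI has not published the exact data format or grading code, but the structure described above (native question, English translation, expert rubric, ideal answer) can be sketched as a simple record with weighted rubric criteria. All class, field, and function names below are hypothetical illustrations, not IndQA's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class RubricCriterion:
    description: str   # what a correct answer must cover, per the expert
    weight: float      # relative importance assigned by the expert
    met: bool = False  # whether a given model answer satisfies it

@dataclass
class IndQAItem:
    language: str
    domain: str
    question_native: str   # authored natively in the language
    question_en: str       # English translation
    ideal_answer: str      # expert-written reference answer
    rubric: list = field(default_factory=list)

def rubric_score(criteria):
    """Weighted fraction of rubric criteria the answer satisfies."""
    total = sum(c.weight for c in criteria)
    earned = sum(c.weight for c in criteria if c.met)
    return earned / total if total else 0.0

# Hypothetical example item (question text elided)
item = IndQAItem(
    language="Hindi",
    domain="food & cuisine",
    question_native="...",
    question_en="...",
    ideal_answer="...",
    rubric=[
        RubricCriterion("Names the dish's regional origin", 2.0, met=True),
        RubricCriterion("Explains its cultural significance", 3.0, met=False),
    ],
)
print(rubric_score(item.rubric))  # → 0.4
```

A weighted rubric like this lets graders give partial credit for culturally grounded answers rather than forcing a binary right/wrong judgment.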
🧐 Why it matters
India is a major market for OpenAI (ChatGPT and related services) and is linguistically highly diverse: roughly one billion Indians do not use English as their primary language, and the country recognizes 22 official languages.
Existing multilingual AI benchmarks (e.g., MMMLU) are argued to be “saturated” (many top models already score very well) and tend to focus on translation or simple Q&A rather than cultural reasoning. IndQA aims to fill this gap.
For the Indian AI ecosystem: IndQA provides a more rigorous way to test models on Indian-language and cultural tasks, which can help improve AI tools for regional languages and contexts.
🔍 Key details & caveats
The benchmark is not intended as a language leaderboard (i.e., it is not meant to show that “Model A is better in Tamil than Model B in Telugu”). Cross-language comparisons should be made cautiously.
The questions were adversarially filtered: only those questions where top models (e.g., GPT-4o, GPT-4.5, GPT-5) failed were retained, meaning the benchmark is deliberately hard.
Early performance: media reports indicate that even OpenAI’s GPT-5 “Thinking High” model averaged only ~34.9% on the benchmark.
Early results show stronger performance in some languages (Hindi, Hinglish) and weaker performance in others (Bengali, Telugu).
The domains are quite broad and culturally grounded: e.g., food & cuisine, architecture & design, everyday life—so the model must reason and understand local culture, not just translate.
🧭 Implications for India
For Indian users: It suggests products like ChatGPT and related AI services are being pushed to serve non-English users better and to incorporate cultural relevance.
For AI developers / researchers in India: IndQA offers a benchmark to test and improve models meant for Indian languages and cultural contexts. It could spur more work in “Indic” language AI.
For policymakers/education: It underlines the importance of indigenous-language support and cultural context in AI tools, with implications for digital inclusion, regional-language content, and AI literacy.
However: because the benchmark is deliberately hard and current models score relatively low, it signals that AI still has a long way to go on Indian-language and cultural tasks.