GovBench: AI Benchmarks on US Government Domains

By: Glenn Parham & justin

JointStaffBench

About this:

https://huggingface.co/GovBench

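| Family | Model Name | Overall | J1 | J2 | J3 | J4 | J5 | J6 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OpenAI | o3 | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| OpenAI | gpt-4.1 | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| OpenAI | o3-mini | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| OpenAI | GPT-4o | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| OpenAI | GPT-3.5 Turbo | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Google | gemini-2.0-flash | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Google | gemma-2-9b-it | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Meta | Llama-4-Scout-17B-16E-Instruct | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Meta | Meta-Llama-3-70B-Instruct-Turbo | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Meta | Meta-Llama-3.1-8B-Instruct-Turbo | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Meta | Meta-Llama-3.1-405B-Instruct-Turbo | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| DeepSeek | DeepSeek V3-0324 | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| DeepSeek | DeepSeek-R1 | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Anthropic | claude-3-5-haiku-20241022 | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Anthropic | claude-3-7-sonnet-20250219 | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Mistral | Mistral-Small-24B-Instruct-2501 | 0% | 0% | 0% | 0% | 0% | 0% | 0% |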