GovBench: AI Benchmarks on US Government domains.
By: Glenn Parham & justin
About this:
https://huggingface.co/GovBench
Family | Model Name | Overall | J1 | J2 | J3 | J4 | J5 | J6 |
---|---|---|---|---|---|---|---|---|
OpenAI | o3 | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
OpenAI | gpt-4.1 | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
OpenAI | o3-mini | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
OpenAI | GPT-4o | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
OpenAI | GPT-3.5 Turbo | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
Google | gemini-2.0-flash | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
Google | gemma-2-9b-it | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
Meta | Llama-4-Scout-17B-16E-Instruct | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
Meta | Meta-Llama-3-70B-Instruct-Turbo | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
Meta | Meta-Llama-3.1-8B-Instruct-Turbo | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
Meta | Meta-Llama-3.1-405B-Instruct-Turbo | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
DeepSeek | DeepSeek V3-0324 | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
DeepSeek | DeepSeek-R1 | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
Anthropic | claude-3-5-haiku-20241022 | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
Anthropic | claude-3-7-sonnet-20250219 | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
Mistral | Mistral-Small-24B-Instruct-2501 | 0% | 0% | 0% | 0% | 0% | 0% | 0% |