JobBench: Aligning Agent Work with Human Will
Measuring agents by GDP alone asks how much of a human's job can be taken away.
JobBench asks how much of that job can be given back — built on the work that experts across real-world professions actually want delegated to AI.
Economics alone is not enough.
The conversation about AI in the workplace has been framed almost entirely in economic terms: What fraction of working hours can agents absorb? How much of GDP is exposed to automation? Benchmarks like OpenAI's GDPval inherit this framing by design: they select tasks that represent economic value and score agents on whether they can deliver the professional knowledge output.
We believe this framing, on its own, is not enough.
If agents are going to share the professional workplace with humans, the question is not only what work is most economically valuable to automate, but also what work the humans in that role actually want automated. This is a humanist problem. It treats the professional not as labor to be displaced but as a collaborator whose judgment about their own craft matters, and it is the premise JobBench is built on.
GDPval (OpenAI): “What fraction of a human's job is economically valuable to automate?”
JobBench (Ours): “What work do the humans in that role actually want automated?”
Model leaderboard
Score = weighted rubric score across all evaluated tasks.
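The scoring rule above can be illustrated with a minimal sketch. The page does not spell out the rubric schema or how weights are pooled, so everything here is an assumption for illustration: the `RubricItem` structure, the partial-credit field, and the choice to pool all rubric items across tasks before normalizing are hypothetical, not JobBench's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricItem:
    """One rubric criterion (hypothetical structure, not JobBench's schema)."""
    weight: float  # relative importance of the criterion
    credit: float  # fraction of credit earned, from 0.0 to 1.0


def weighted_rubric_score(items: list[RubricItem]) -> float:
    """Return the weight-normalized score on a 0-100 scale."""
    total_weight = sum(item.weight for item in items)
    if total_weight == 0:
        return 0.0  # no criteria means nothing to score
    earned = sum(item.weight * item.credit for item in items)
    return 100.0 * earned / total_weight


def benchmark_score(tasks: list[list[RubricItem]]) -> float:
    """Assumed pooling rule: combine the rubric items of all tasks, then normalize."""
    pooled = [item for task in tasks for item in task]
    return weighted_rubric_score(pooled)


if __name__ == "__main__":
    task_a = [RubricItem(weight=3, credit=1.0), RubricItem(weight=1, credit=0.0)]
    task_b = [RubricItem(weight=2, credit=0.5), RubricItem(weight=2, credit=0.0)]
    print(weighted_rubric_score(task_a))      # 75.0
    print(weighted_rubric_score(task_b))      # 25.0
    print(benchmark_score([task_a, task_b]))  # 50.0
```

Under this pooling assumption, tasks with heavier rubrics contribute more to the overall score; averaging per-task scores instead would weight every task equally and could give a different number.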
Far from saturation
From knowledge delivery to professional reasoning
What the agent is actually up against
Every JobBench task is a small dossier. One role is shown in detail below.
Reporter — Connecticut investigative desk
Why: Investigative beat reporting is gated by source-verification time; PDFs, FOIA CSVs, and interview cross-checks eat the day.
Corresponding O*NET: Check reference materials, such as books, news files, or public records, to obtain relevant facts.
Multiple Hartford-area systems exceed the 15 ppb federal action level.
0% of investigated homes identified water as a lead hazard.
CT rows only for 2017–2019; 2020–2022 are dagger-marked non-submissions.
10 ppb action level finalized Oct 2024 — not yet enforceable.
Pediatric referrals up 30% post-threshold change (Dr. Martinez).
Waterbury 16.1 ppb vs. Newark 47.9 ppb — trajectory, not point-in-time.
- Thesis-driven pitch memo
- 3-sheet data workbook
- 15+ entry source verification log
Reasoning challenges by design
Heatmap
| Occupation | GPT-5.4 (37.2) | Sonnet 4.6 (36.3) | Opus 4.6 (35.4) | GPT-5.2 (33.6) | GPT-5.3 Codex (33.1) | Opus 4.5 (31.0) | Sonnet 4.5 (26.8) | GPT-5.1 Codex (26.2) | GPT-5.2 Codex (24.8) | Opus 4 (20.9) | Sonnet 4 (17.9) | Qwen3.5 Plus (17.6) | Haiku 4.5 (15.2) | MiniMax M2.5 (14.2) | Gemini 3 Pro (10.9) | Gemini 3 Flash (10.8) | Kimi K2.5 (8.6) | Grok 4.2 Fast (4.2) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Business / Financial Ops | ||||||||||||||||||
| Bookkeeping & Accounting Clerks | 19 | 23 | 51 | 17 | 43 | 13 | 0 | 19 | 17 | 14 | 4 | 4 | 9 | 4 | 14 | 9 | 0 | 0 |
| HR Specialists | 56 | 31 | 47 | 88 | 34 | 19 | 19 | 19 | 41 | 0 | 9 | 0 | 0 | 0 | 0 | 0 | 9 | 0 |
| Licensing Examiners / Inspectors | 50 | 33 | 33 | 17 | 17 | 42 | 33 | 17 | 8 | 33 | 33 | 33 | 17 | 33 | 17 | 25 | 42 | 0 |
| Management Analysts | 26 | 30 | 18 | 27 | 24 | 13 | 16 | 6 | 0 | 13 | 0 | 0 | 0 | 3 | 10 | 3 | 3 | 0 |
| Personal Financial Advisors | 33 | 41 | 8 | 23 | 36 | 18 | 21 | 10 | 10 | 31 | 10 | 0 | 8 | 0 | 23 | 10 | 0 | 0 |
| Purchasing Agents | 25 | 43 | 47 | 24 | 34 | 39 | 27 | 21 | 18 | 33 | 16 | 16 | 18 | 8 | 7 | 11 | 2 | 2 |
| Training & Development Specialists | 38 | 41 | 34 | 20 | 30 | 42 | 30 | 16 | 30 | 36 | 30 | 22 | 18 | 18 | 16 | 14 | 0 | 4 |
| Avg. | 35 | 35 | 34 | 31 | 31 | 26 | 21 | 15 | 18 | 23 | 15 | 11 | 10 | 10 | 12 | 10 | 8 | 1 |
| Office / Admin Support | ||||||||||||||||||
| Court Clerks | 37 | 32 | 37 | 45 | 37 | 47 | 0 | 24 | 21 | 11 | 13 | 11 | 0 | 0 | 0 | 13 | 0 | 0 |
| Customer Service Reps | 21 | 50 | 29 | 29 | 16 | 50 | 8 | 16 | 29 | 8 | 8 | 16 | 0 | 21 | 0 | 21 | 16 | 0 |
| Data Entry Keyers | 59 | 66 | 55 | 58 | 61 | 54 | 39 | 47 | 51 | 20 | 36 | 28 | 26 | 32 | 22 | 17 | 7 | 9 |
| Medical Secretaries | 51 | 23 | 41 | 38 | 15 | 15 | 8 | 15 | 41 | 8 | 0 | 15 | 15 | 8 | 8 | 8 | 8 | 0 |
| Police / Fire Dispatchers | 36 | 47 | 36 | 36 | 36 | 15 | 47 | 47 | 26 | 15 | 57 | 47 | 19 | 30 | 11 | 11 | 15 | 0 |
| Secretaries & Admin Assistants | 72 | 30 | 46 | 46 | 48 | 37 | 20 | 20 | 11 | 20 | 41 | 22 | 30 | 6 | 0 | 11 | 20 | 6 |
| Avg. | 46 | 41 | 41 | 42 | 35 | 36 | 20 | 28 | 30 | 14 | 26 | 23 | 15 | 16 | 7 | 13 | 11 | 2 |
| Computer / Mathematical | ||||||||||||||||||
| Biostatisticians | 29 | 25 | 12 | 20 | 46 | 18 | 37 | 57 | 28 | 12 | 28 | 22 | 25 | 28 | 15 | 11 | 11 | 9 |
| CS Researchers | 16 | 38 | 19 | 11 | 22 | 12 | 20 | 8 | 9 | 8 | 14 | 15 | 14 | 4 | 0 | 0 | 4 | 11 |
| Statisticians | 36 | 18 | 44 | 36 | 34 | 37 | 36 | 26 | 22 | 30 | 15 | 14 | 14 | 14 | 14 | 8 | 7 | 4 |
| User Support Specialists | 39 | 36 | 57 | 48 | 33 | 45 | 38 | 28 | 32 | 19 | 29 | 39 | 22 | 26 | 12 | 12 | 25 | 0 |
| Web Administrators | 52 | 48 | 36 | 24 | 24 | 24 | 40 | 12 | 12 | 24 | 12 | 12 | 12 | 12 | 12 | 24 | 12 | 0 |
| Avg. | 34 | 33 | 34 | 28 | 32 | 27 | 34 | 26 | 21 | 19 | 19 | 20 | 17 | 17 | 11 | 11 | 12 | 5 |
| Architecture / Engineering | ||||||||||||||||||
| Civil Engineers | 53 | 55 | 52 | 51 | 35 | 43 | 36 | 49 | 42 | 30 | 18 | 22 | 26 | 24 | 18 | 25 | 3 | 6 |
| Mechanical Eng. Technicians | 24 | 32 | 20 | 20 | 19 | 29 | 27 | 25 | 15 | 5 | 12 | 15 | 12 | 9 | 14 | 6 | 15 | 3 |
| Mechanical Engineers | 36 | 27 | 52 | 27 | 0 | 52 | 18 | 18 | 9 | 0 | 0 | 0 | 0 | 0 | 9 | 9 | 9 | 0 |
| Petroleum Engineers | 12 | 28 | 36 | 0 | 16 | 12 | 28 | 32 | 20 | 12 | 0 | 12 | 12 | 20 | 0 | 12 | 0 | 0 |
| Avg. | 31 | 35 | 40 | 25 | 18 | 34 | 27 | 31 | 21 | 12 | 7 | 12 | 12 | 13 | 10 | 13 | 7 | 2 |
| Management | ||||||||||||||||||
| Financial Managers | 14 | 59 | 44 | 24 | 33 | 14 | 26 | 24 | 32 | 9 | 18 | 10 | 18 | 4 | 9 | 15 | 0 | 4 |
| Health Services Managers | 20 | 33 | 20 | 26 | 8 | 19 | 20 | 8 | 8 | 19 | 14 | 14 | 8 | 8 | 14 | 14 | 8 | 4 |
| IT / IS Managers | 41 | 17 | 36 | 49 | 27 | 24 | 12 | 17 | 15 | 17 | 15 | 10 | 10 | 15 | 8 | 12 | 0 | 0 |
| Supply Chain Managers | 17 | 12 | 17 | 6 | 12 | 12 | 12 | 0 | 6 | 17 | 0 | 6 | 0 | 0 | 6 | 0 | 6 | 6 |
| Avg. | 23 | 30 | 29 | 26 | 20 | 17 | 17 | 12 | 15 | 15 | 12 | 10 | 9 | 7 | 9 | 10 | 3 | 3 |
| Arts / Media | ||||||||||||||||||
| Producers | 53 | 64 | 42 | 64 | 53 | 39 | 39 | 72 | 64 | 28 | 31 | 31 | 22 | 22 | 8 | 0 | 0 | 14 |
| Reporters & Correspondents | 47 | 20 | 20 | 37 | 47 | 33 | 23 | 33 | 20 | 43 | 10 | 10 | 13 | 0 | 10 | 0 | 0 | 10 |
| Technical Writers | 55 | 64 | 50 | 45 | 49 | 45 | 54 | 37 | 34 | 35 | 42 | 41 | 41 | 30 | 12 | 11 | 27 | 9 |
| Avg. | 52 | 49 | 37 | 48 | 49 | 39 | 39 | 48 | 39 | 35 | 27 | 27 | 26 | 17 | 10 | 4 | 9 | 11 |
| Other (Legal · Sales · Science · Edu.) | ||||||||||||||||||
| Lawyers | 50 | 25 | 38 | 25 | 25 | 25 | 50 | 25 | 38 | 0 | 13 | 25 | 13 | 13 | 0 | 0 | 25 | 0 |
| Online Merchants | 63 | 32 | 45 | 43 | 43 | 30 | 21 | 59 | 30 | 38 | 20 | 30 | 14 | 25 | 14 | 20 | 14 | 9 |
| Securities Sales Agents | 35 | 35 | 14 | 27 | 59 | 27 | 27 | 0 | 41 | 24 | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Soc. Sci. Research Assistants | 58 | 52 | 47 | 59 | 59 | 55 | 37 | 49 | 52 | 27 | 29 | 20 | 24 | 29 | 18 | 20 | 14 | 13 |
| Sociology Teachers (Postsec.) | 57 | 29 | 28 | 36 | 36 | 45 | 34 | 33 | 41 | 35 | 11 | 17 | 14 | 21 | 17 | 15 | 14 | 7 |
| Tech & Sci. Sales Reps | 12 | 10 | 13 | 30 | 18 | 11 | 6 | 11 | 10 | 13 | 6 | 19 | 10 | 6 | 3 | 0 | 0 | 0 |
| Avg. | 46 | 31 | 31 | 37 | 40 | 32 | 29 | 29 | 35 | 23 | 15 | 19 | 12 | 16 | 9 | 9 | 11 | 5 |
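The "Avg." rows in the heatmap appear consistent with an unweighted mean of the occupation rows in each group, rounded to the nearest integer. That rule is inferred from the numbers, not stated on the page, so treat the sketch below as a consistency check under that assumption. It uses the Arts / Media rows for the first three model columns; the `group_average` helper is hypothetical.

```python
from statistics import mean

# Occupation-level scores copied from the Arts / Media rows of the heatmap
# (first three model columns only).
arts_media = {
    "Producers": {"GPT-5.4": 53, "Sonnet 4.6": 64, "Opus 4.6": 42},
    "Reporters & Correspondents": {"GPT-5.4": 47, "Sonnet 4.6": 20, "Opus 4.6": 20},
    "Technical Writers": {"GPT-5.4": 55, "Sonnet 4.6": 64, "Opus 4.6": 50},
}


def group_average(rows: dict[str, dict[str, int]]) -> dict[str, int]:
    """Assumed rule: unweighted mean over occupations, rounded to an integer."""
    models = next(iter(rows.values())).keys()
    return {m: round(mean(r[m] for r in rows.values())) for m in models}


if __name__ == "__main__":
    print(group_average(arts_media))
    # {'GPT-5.4': 52, 'Sonnet 4.6': 49, 'Opus 4.6': 37}
```

These values match the Arts / Media "Avg." row (52, 49, 37). Because the published cells are themselves rounded, a recomputed average can differ by a point from the table in other groups.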