The State of AI for Christian Leaders is the first independent annual benchmark of how today's frontier AI models answer the questions Christian leaders actually ask. It covers 47 prompts and 5 models, scored on a 5-axis rubric with cross-judge LLM-as-judge scoring at 93% inter-judge agreement. It is published by 10X Life Plan under CC BY 4.0, with methodology, prompts, and raw data public.
"Know the state of your flocks, and put your heart into caring for your herds." — Proverbs 27:23 (NLT)
Christian leaders use AI every day to think, write, decide, and prep. Until 2026, no one had independently measured how well today's frontier AI models actually answer the questions Christian leaders ask. The State of AI for Christian Leaders benchmark exists to close that gap — methodology, prompts, and raw data public so anyone can verify, replicate, or extend the work.
What the Benchmark Tests
The benchmark uses 47 prompts spanning four categories.
Theological questions: identity in Christ, prooftext interpretation, salvation, the Trinity, common doctrinal traps.
Marketplace wisdom: firing decisions, partnership disputes, leadership under pressure, financial stewardship.
Personal formation: accountability, integrity, brotherhood, masculine-heart questions.
Pastoral and family: fatherhood, marriage, leading at home, hard conversations.
Each prompt is put to five frontier models, three times each, at temperature 0.7 and with no system prompt, so the results measure default behavior rather than engineered output.
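The run design described above (47 prompts, five models, three trials each, temperature 0.7, no system prompt) can be enumerated as a simple grid. The following is a minimal sketch under stated assumptions: the model labels and the `build_run_grid` helper are hypothetical placeholders for illustration, not the benchmark's published harness.

```python
from itertools import product

# Values taken from the text above.
NUM_PROMPTS = 47
REPEATS = 3
TEMPERATURE = 0.7

# Hypothetical placeholder labels; real model identifiers are not assumed here.
MODELS = ["model_a", "model_b", "model_c", "model_d", "model_e"]


def build_run_grid(num_prompts=NUM_PROMPTS, models=MODELS, repeats=REPEATS):
    """Return one run spec per (prompt, model, trial) combination."""
    runs = []
    for prompt_id, model, trial in product(
        range(1, num_prompts + 1), models, range(1, repeats + 1)
    ):
        runs.append(
            {
                "prompt_id": prompt_id,
                "model": model,
                "trial": trial,
                "temperature": TEMPERATURE,
                "system_prompt": None,  # no system prompt: default behavior
            }
        )
    return runs


runs = build_run_grid()
print(len(runs))  # 47 * 5 * 3 = 705
```

Under this design a complete run yields 705 responses to score, assuming every call succeeds.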
The prompt set is published under CC BY 4.0. Other researchers can replicate, extend, or critique. Methodology is in the open.
The Rubric and the Declared Lane
Five axes, each scored on a 0-3 scale.
Theological Accuracy: orthodox-Protestant doctrine, no prosperity gospel, no passivity-as-faith, no shame motivation, no hyper-independence.
Scripture Fidelity: correct citation, named translation, no paraphrase passed as quote.
Marketplace Wisdom: practical wisdom Christian leaders actually need.
Identity-vs-Performance: does the model ground identity in Christ's finished work or default to positive psychology?
Lane Alignment: does the model stay in the masculine-heart tradition (Eldredge, Dangerous Men United, Jamie Winship's Identity Exchange) when that lane is appropriate?
The lane is declared upfront for transparency. This is not a neutral benchmark pretending no theology is in play. It is an orthodox-Protestant, masculine-heart-tradition benchmark with the rubric versioned so older editions stay reproducible.
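Five axes on a 0-3 scale give each response a maximum of 15 points. A minimal sketch of that arithmetic follows. The axis keys, the helper names, and the choice to average response totals per model are assumptions for illustration; the published scoring sheets define the actual aggregation. The example scores are invented, not benchmark data.

```python
from statistics import mean

# Axis names follow the rubric above; the snake_case keys are an assumption.
AXES = (
    "theological_accuracy",
    "scripture_fidelity",
    "marketplace_wisdom",
    "identity_vs_performance",
    "lane_alignment",
)
MAX_PER_AXIS = 3  # each axis is scored 0-3


def total_score(scores):
    """Sum the five axis scores after checking each is within 0-3."""
    for axis in AXES:
        value = scores[axis]
        if not 0 <= value <= MAX_PER_AXIS:
            raise ValueError(f"{axis} score {value} is outside 0-{MAX_PER_AXIS}")
    return sum(scores[axis] for axis in AXES)


def model_average(responses):
    """Average the per-response totals (assumed aggregation method)."""
    return mean(total_score(r) for r in responses)


# Invented example scores for two responses from one model.
example = {
    "theological_accuracy": 3,
    "scripture_fidelity": 2,
    "marketplace_wisdom": 3,
    "identity_vs_performance": 1,
    "lane_alignment": 2,
}
all_twos = {axis: 2 for axis in AXES}

print(total_score(example))               # 11
print(model_average([example, all_twos]))  # 10.5
```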
The Scoring Methodology
Two human scorers reviewed each response independently, blinded to which model produced which answer. A third scorer adjudicated any discrepancy of two or more points. Inter-judge agreement was 93.1%, which suggests the rubric anchors are concrete enough to produce repeatable results. The 2026 edition also used LLM-as-judge scoring, cross-checked against the human scores, with the LLM judge held to the same rubric and exemplars.
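The adjudication rule and an agreement rate can be sketched in a few lines. The two-point threshold comes from the text; how the 93.1% figure was computed is not stated, so the `agreement_rate` function below (share of score pairs within a chosen tolerance) is an assumption, and the score pairs are invented for illustration.

```python
ADJUDICATION_GAP = 2  # from the text: a gap of two or more points goes to a third scorer


def needs_adjudication(score_a, score_b):
    """True when two scorers differ by two or more points on one axis."""
    return abs(score_a - score_b) >= ADJUDICATION_GAP


def agreement_rate(pairs, tolerance=0):
    """Share of score pairs that differ by at most `tolerance` points.

    tolerance=0 is exact agreement. The benchmark's own definition of
    agreement is not given in the text, so this is an assumed one.
    """
    if not pairs:
        raise ValueError("no score pairs to compare")
    agreeing = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
    return agreeing / len(pairs)


# Invented (scorer 1, scorer 2) pairs on the 0-3 scale.
pairs = [(3, 3), (2, 2), (1, 3), (0, 1), (2, 2)]

flagged = [p for p in pairs if needs_adjudication(*p)]
print(flagged)                              # [(1, 3)]
print(agreement_rate(pairs))                # 0.6
print(agreement_rate(pairs, tolerance=1))   # 0.8
```

Because the raw scoring sheets are published, the same calculation could be rerun under either definition to see which one reproduces the reported figure.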
The raw responses, the scoring sheets, the per-prompt breakdowns, and the aggregate matrices are all published. A determined researcher can verify any score in the report, run their own scoring against the same responses, or test a different rubric against the same data. The work is built to be checked.
What the 2026 Edition Found
Five headline findings.
One: Claude Opus 4.7 won overall at 11.30 of 15, leading four of five axes.
Two: GPT-5 won Scripture Fidelity (2.44 of 3) and Marketplace Wisdom (2.60).
Three: Identity-vs-Performance is the universal weakness. Every model scored lowest here, with Opus leading at only 2.12 and DeepSeek bottom at 1.12.
Four: the Anthropic family leads Lane Alignment by 0.35+ over every non-Anthropic model.
Five: The Gospel Coalition's 2025 finding that DeepSeek was most Nicene-aligned did not replicate on this rubric. DeepSeek tied last.
The benchmark is annual. The 2027 edition will retest with that year's frontier models against the same rubric, plus any version-2 prompts added. Proverbs 27:23 commands knowing the state of your flocks. Christian leaders are increasingly trusting AI; the benchmark exists so that trust is informed.
Frequently Asked Questions
Who created the AI Christian benchmark?
The State of AI for Christian Leaders is produced by 10X Life Plan and authored by Tim Adair. The first edition shipped in 2026 with full methodology, prompts, rubric, and raw data published under CC BY 4.0. Subsequent editions ship each April with refreshed models and version-controlled prompts.
Is the benchmark methodology public?
Yes — entirely. Prompts, rubric anchors, model configurations (temperature, routing), scoring protocol, raw responses, and aggregate results are all published under CC BY 4.0. Other researchers can verify, replicate, or critique. The work is built to be checked, not taken on authority.
How is this different from other AI benchmarks?
Most AI benchmarks measure general capability: reasoning, coding, math. The 10X benchmark measures fitness for one specific audience: Christian leaders in marketplace, family, and ministry settings. The rubric tests doctrine, Scripture handling, marketplace wisdom, identity formation, and lane alignment, none of which appear in standard benchmarks.