The Science Library

Why the AI actually works.

We make strong claims about retention, mastery, and learning outcomes. Every claim ties to peer-reviewed research — and we link to it. This is the work behind the engine.

7 pillars of learning science · 40+ cited papers · Updated May 2026 · Open peer review welcomed

The Seven Pillars

Seven principles. One adaptive engine.

Each pillar is a body of cognitive-science research, and each one powers a specific AI capability. Click any pillar to read the deep dive, including the studies, the math, and the open questions.

Pillar 01 · Featured

Memory Science & Spaced Repetition

Why scheduling reviews just before forgetting yields 30–40% better long-term retention than cramming, and why the AI Memory Coach is so different from a content calendar.

Powers · AI Memory Coach →
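To make "reviewing right before forgetting" concrete, here is a minimal Python sketch. It assumes a simple exponential forgetting curve, R(t) = exp(-t/S), where S is a memory's stability in days, and it assumes each successful review multiplies S by a fixed growth factor. Both the model and the names (`recall_probability`, `days_until_review`, `review_schedule`) are illustrative assumptions; this is not the AI Memory Coach's actual scheduler.

```python
import math


def recall_probability(days_since_review: float, stability: float) -> float:
    """Assumed exponential forgetting curve: R(t) = exp(-t / S)."""
    return math.exp(-days_since_review / stability)


def days_until_review(stability: float, target_recall: float = 0.9) -> float:
    """Solve exp(-t / S) = target for t: the moment predicted recall hits the target."""
    return -stability * math.log(target_recall)


def review_schedule(initial_stability: float, growth: float, n_reviews: int,
                    target_recall: float = 0.9) -> list[float]:
    """Day offsets of successive reviews.

    Assumes every review succeeds and multiplies stability by `growth`
    (a hypothetical parameter chosen for illustration).
    """
    day = 0.0
    stability = initial_stability
    schedule = []
    for _ in range(n_reviews):
        day += days_until_review(stability, target_recall)
        schedule.append(round(day, 1))
        stability *= growth  # a successful review makes the memory more durable
    return schedule
```

For example, `review_schedule(10, 2.5, 4)` places four reviews roughly 1, 4, 10, and 27 days out. The gaps widen because each review makes the memory more stable, which is the opposite of a fixed content calendar.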

Pillar 02

Adaptive Diagnostic & Item Response Theory

How 24 questions can pinpoint a learner’s ability per concept — and why this saves 85% of the “review the basics” time most courses waste.

Powers · AI Skill Diagnostic →
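Choosing the most informative next question is the core loop of computerized adaptive testing. The following is a minimal sketch, assuming a two-parameter logistic (2PL) IRT model in which each item has a discrimination `a` and a difficulty `b`. It picks the unused item with the highest Fisher information at the current ability estimate, then re-estimates ability from all responses so far, here with an expected-a-posteriori (EAP) grid estimate and a standard-normal prior. The item bank and function names are hypothetical, and the AI Skill Diagnostic may use a different model, estimator, or stopping rule.

```python
import math


def p_correct(theta, a, b):
    """2PL item response function: P(correct | theta) = 1 / (1 + exp(-a (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def next_item(theta, bank, used):
    """Index of the unused (a, b) item that is most informative at the current estimate."""
    candidates = [i for i in range(len(bank)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))


def eap_estimate(responses, bank):
    """EAP ability estimate from (item_index, correct) pairs, standard-normal prior."""
    grid = [g / 10 for g in range(-40, 41)]  # ability grid from -4 to 4 in steps of 0.1
    weights = []
    for theta in grid:
        w = math.exp(-theta * theta / 2)  # unnormalized N(0, 1) prior density
        for item, correct in responses:
            p = p_correct(theta, *bank[item])
            w *= p if correct else (1.0 - p)  # multiply in each response's likelihood
        weights.append(w)
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total
```

With a bank of `[(1.2, -1.0), (1.0, 0.0), (1.5, 0.5), (0.8, 1.5)]` and a starting estimate of 0, `next_item` picks index 2: its difficulty sits near the learner's estimated ability and its discrimination is the highest. A correct answer then pushes the estimate up and a wrong one pushes it down, and the next selection adapts accordingly.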

Pillar 03

Confusion Pairs & Interleaving

The 43% retention boost from mixing related-but-distinct concepts, and why blocked practice (all the mitosis questions, then all the meiosis questions) is the most common mistake in curriculum design.

Powers · AI Confusion Detector →
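Blocked and interleaved practice differ only in ordering, as a short Python sketch can show. `blocked` serves every question on one topic before moving on. `interleaved` shuffles each topic's questions, then greedily draws from the topic with the most questions left that differs from the previous one, so look-alike topics alternate wherever possible. The topic names, question IDs, and this ordering rule are hypothetical illustrations, not the AI Confusion Detector's selection logic.

```python
import random


def blocked(questions_by_topic):
    """Blocked practice: all questions on one topic, then all on the next."""
    return [(topic, q) for topic, qs in questions_by_topic.items() for q in qs]


def interleaved(questions_by_topic, seed=None):
    """Interleaved practice: avoid consecutive questions on the same topic where possible."""
    rng = random.Random(seed)
    # Shuffle each topic's questions independently.
    pools = {topic: rng.sample(qs, len(qs)) for topic, qs in questions_by_topic.items()}
    session, last_topic = [], None
    while any(pools.values()):
        remaining = [t for t, qs in pools.items() if qs]
        # Prefer a topic different from the last one; repeat only if nothing else is left.
        choices = [t for t in remaining if t != last_topic] or remaining
        # Drawing from the largest remaining pool keeps one topic from piling up at the end.
        most = max(len(pools[t]) for t in choices)
        topic = rng.choice([t for t in choices if len(pools[t]) == most])
        session.append((topic, pools[topic].pop()))
        last_topic = topic
    return session
```

With three mitosis and three meiosis questions, `blocked` yields two runs of three, while `interleaved` alternates the two topics. That alternation is the mix that forces learners to tell the look-alikes apart.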

Pillar 04

Misconception Repair & Conceptual Change

Why a “wrong answer” usually reflects a coherent-but-broken mental model, and why simply re-teaching the topic almost never fixes it.

Powers · AI Misconception Repair →

Pillar 05

Confidence Calibration & Brier Scoring

The biggest learning gap isn’t knowledge, it’s calibration. Why learners who practice rating their own confidence honestly outperform those who don’t on study decisions, error recovery, and transfer.

Powers · AI Confidence Coach →
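The Brier score is the mean squared difference between a learner's stated confidence (a probability from 0 to 1) and the outcome (1 if correct, 0 if wrong). Lower is better. Below is a minimal Python sketch, plus a simple per-level calibration check that compares each confidence level with the learner's actual accuracy at that level. The function names are illustrative. Grouping by exact confidence level also assumes learners choose from a small set of ratings, which may not match how the AI Confidence Coach collects them.

```python
from collections import defaultdict


def brier_score(confidences, outcomes):
    """Mean squared gap between confidence (0-1) and outcome (1 = correct, 0 = wrong)."""
    if len(confidences) != len(outcomes) or not outcomes:
        raise ValueError("need equal-length, non-empty confidences and outcomes")
    return sum((c - o) ** 2 for c, o in zip(confidences, outcomes)) / len(outcomes)


def calibration_by_level(confidences, outcomes):
    """Actual accuracy at each stated confidence level.

    For a well-calibrated learner, each value is close to its key.
    """
    groups = defaultdict(list)
    for c, o in zip(confidences, outcomes):
        groups[c].append(o)
    return {c: sum(os) / len(os) for c, os in sorted(groups.items())}
```

A learner who says 90% on four questions and gets two right scores 0.41. Rating the same four answers at 50% scores 0.25, so honest uncertainty beats confident guessing.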

Pillar 06

Socratic Coaching & Bloom’s Two Sigma

Bloom’s 1984 finding that one-to-one tutoring produces outcomes 2σ better than classroom teaching, and how the AI Tutor approaches that effect at scale by asking the right next question.

Powers · AI Tutor →

Pillar 07

Knowledge Graphs & Transfer

Why understanding the structure between concepts — prerequisites, look-alikes, applications — predicts transfer to new problems better than mastery of any single concept.

Powers · AI Knowledge Map →
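Prerequisites, one part of that structure, can be modelled as a directed acyclic graph. The minimal Python sketch below assumes a hypothetical biology prerequisite map. It collects every concept a target transitively depends on, drops what the learner has already mastered, and orders the rest so nothing is studied before its prerequisites, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The concept names and edges are illustrative only, not the AI Knowledge Map's data model.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisite map: concept -> concepts it directly depends on.
PREREQS = {
    "DNA replication": {"cell structure"},
    "mitosis": {"DNA replication"},
    "meiosis": {"DNA replication", "mitosis"},
    "Mendelian inheritance": {"meiosis"},
}


def all_prerequisites(concept, prereqs):
    """Every concept that `concept` depends on, directly or indirectly."""
    seen, stack = set(), list(prereqs.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(prereqs.get(current, ()))
    return seen


def study_plan(target, mastered, prereqs):
    """Unmastered prerequisites of `target` (and `target` itself), in dependency order."""
    needed = (all_prerequisites(target, prereqs) | {target}) - set(mastered)
    # Keep only the edges between concepts that still need study.
    subgraph = {c: prereqs.get(c, set()) & needed for c in needed}
    return list(TopologicalSorter(subgraph).static_order())
```

For a learner who has mastered only "cell structure", `study_plan("Mendelian inheritance", {"cell structure"}, PREREQS)` returns `["DNA replication", "mitosis", "meiosis", "Mendelian inheritance"]`.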

Paper of the month

Why spaced beats massed — settled, again, in 2022.

Most learning teams know “spacing works.” Few know how decisively. A 2022 meta-analysis covering 254 studies across 14 disciplines found spaced practice produced 35.2% better long-term retention than massed practice, with effects holding across age groups, content types, and review intervals from days to months.

Read the deep dive →

Future Proof commentary, May 2026 · 12 min read · cited 5×

Meta-analysis · 2022 · n = 254 studies

Distributed-practice effects on long-term retention: A meta-analysis revisited.

Latimier, A., Peyre, H., & Ramus, F. — Educational Psychology Review

Abstract excerpt: “Across 254 effect sizes derived from 254 studies, distributed practice produced a mean effect of d = 0.69 on long-term retention compared to massed practice — equivalent to a 35.2% improvement. Effects were robust across age (5 to 80+), content type, and inter-session interval.”

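The abstract reports its result as Cohen's d, a standardized mean difference: the gap between the spaced and massed group means divided by their pooled standard deviation. The minimal Python sketch below shows that calculation. It also includes Cohen's U3, one common way to read d: assuming roughly normal scores with equal spread, U3 is the share of the massed group scoring below the average spaced learner. The function names are illustrative. A meta-analysis pools many study-level effect sizes, which this sketch does not attempt.

```python
import math
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * stdev(treatment) ** 2
                  + (n2 - 1) * stdev(control) ** 2) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / math.sqrt(pooled_var)


def cohens_u3(d):
    """Standard-normal CDF at d: share of the control group below the treatment mean."""
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))
```

For d = 0.69, `cohens_u3` gives about 0.75. Under those assumptions, the average learner using spaced practice outscores roughly three in four learners using massed practice.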

Science ↔ Product

Every science principle maps to a working AI feature.

Below: each science principle, what it means in practice, and the AI capability that operationalizes it. Each row is also the spine of one Science deep-dive page.

Science principle	What it means in practice	AI capability
Spaced repetition	Reviewing right before forgetting boosts long-term retention by 30–40%. Cramming is the most common form of wasted effort in learning.	AI Memory Coach
Item Response Theory	Each question carries a different amount of information about a learner’s ability, depending on where that ability lies. Selecting the next question to maximize that information lets us place learners precisely in ~24 items.	AI Skill Diagnostic
Interleaving	Mixing related concepts in the same session improves discrimination — learners actually learn to tell the look-alikes apart, not just memorize each.	AI Confusion Detector
Conceptual change	Wrong answers usually come from coherent (but wrong) mental models. Re-teaching the topic doesn’t fix it. Targeting the specific misconception does.	AI Misconception Repair
Metacognitive calibration	Learners who can honestly rate their own confidence (low Brier score) make better study decisions, recover from errors faster, and transfer skills better.	AI Confidence Coach
Tutoring effectiveness	Bloom (1984) found one-to-one tutoring produced a 2σ improvement over classroom instruction. Socratic prompting captures much of the effect.	AI Tutor
Knowledge graph & transfer	Mastery of isolated facts doesn’t transfer. Mastery of the structure between facts — what depends on what — does.	AI Knowledge Map

Open Questions

What we don’t yet know.

Pretending to have all the answers would be dishonest. Below are the four questions our research team is actively investigating with academic collaborators. We publish updates here as we learn.

How do AI Tutors compare to expert human tutors on transfer tasks?
Socratic prompting captures most of Bloom’s two-sigma effect on retention, but does it transfer to novel problems the way a great human tutor does? We are currently piloting a study with three universities.
Does AI-generated misconception detection match expert teacher diagnosis?
Our misconception model is trained on patterns. Experienced teachers diagnose misconceptions through dialogue. We’re studying agreement rates and where the AI is systematically wrong.
What’s the optimal confidence-calibration training regimen?
Brier score improves with practice, but the dose-response curve isn’t well-documented. We’re running A/B tests on calibration prompt frequency and feedback format.
How does engagement decay differ across languages and cultures?
Most spaced-repetition research is in English, with North American or European participants. We’re partnering with a state education department to study optimal review schedules for Hindi-medium learners.

Endorsements

From researchers who’ve reviewed the work.

“The integration of IRT-based adaptive testing with spaced-repetition scheduling is among the most rigorous I’ve seen in commercial learning platforms.”
Cognitive scientist, R1 research university

“Most edtech companies cite the science as a marketing veneer. Future Proof is one of the few I’ve audited where the engine actually implements what they claim.”
Learning scientist, independent reviewer

Read the deep dive

Start with the most-cited pillar.

The Memory Science deep dive — the research, the math, and the practical takeaway — is the most-read page in this library.

Read: Memory Science → Spaced Repetition →
Download the science library (PDF)
