Why the AI actually works.
We make strong claims about retention, mastery, and learning outcomes. Every claim ties to peer-reviewed research — and we link to it. This is the work behind the engine.
Seven principles. One adaptive engine.
Each pillar is a body of cognitive-science research. Each one shows up in a specific AI capability. Click any pillar to read the deep dive — including the studies, the math, and the open questions.
Pillar 01: Memory Science & Spaced Repetition
Why scheduling reviews right before forgetting improves long-term retention by 30–40% over cramming, and why the AI Memory Coach is so different from a content calendar.
Pillar 02: Adaptive Diagnostic & Item Response Theory
How 24 questions can pinpoint a learner’s ability per concept — and why this saves 85% of the “review the basics” time most courses waste.
Pillar 03: Confusion Pairs & Interleaving
The 43% retention boost from mixing related-but-distinct concepts — and why blocked practice (do all the mitosis questions, then all the meiosis) is the most common mistake in curriculum design.
Pillar 04: Misconception Repair & Conceptual Change
Why a wrong answer usually reflects a coherent-but-broken mental model, and why simply re-teaching the topic almost never fixes it.
Pillar 05: Confidence Calibration & Brier Scoring
The biggest learning gap isn’t knowledge — it’s calibration. Why learners who learn to honestly rate their own confidence outperform those who don’t, on every downstream metric.
Pillar 06: Socratic Coaching & Bloom’s Two Sigma
Bloom’s 1984 finding that personal tutoring produces 2σ better outcomes than classroom teaching — and how the AI Tutor approaches that effect at scale, by asking the right next question.
Pillar 07: Knowledge Graphs & Transfer
Why understanding the structure between concepts — prerequisites, look-alikes, applications — predicts transfer to new problems better than mastery of any single concept.
Why spaced beats massed — settled, again, in 2022.
Most learning teams know “spacing works.” Few know how decisively. A 2022 meta-analysis covering 254 studies across 14 disciplines found spaced practice produced 35.2% better long-term retention than massed practice, with effects holding across age groups, content types, and review intervals from days to months.
Read the deep dive →
Distributed-practice effects on long-term retention: A meta-analysis revisited.
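
To make "right before forgetting" concrete, here is a minimal Python sketch. It assumes a simple exponential forgetting curve, R(t) = exp(-t / S), and assumes that each successful review multiplies memory stability S by a fixed factor. The function names, the 0.9 recall target, and the growth factor are all illustrative assumptions, not a description of how the AI Memory Coach actually schedules reviews.

```python
import math
from typing import List


def recall_probability(days_elapsed: float, stability_days: float) -> float:
    """Predicted recall under an assumed exponential forgetting curve."""
    return math.exp(-days_elapsed / stability_days)


def next_review_in_days(stability_days: float, target_recall: float = 0.9) -> float:
    """Days until predicted recall falls to target_recall.

    Solves exp(-t / S) = target_recall for t.
    """
    return -stability_days * math.log(target_recall)


def review_intervals(
    initial_stability: float,
    growth: float,
    reviews: int,
    target_recall: float = 0.9,
) -> List[float]:
    """Gaps between successive reviews.

    Assumption: every successful review multiplies stability by `growth`.
    """
    intervals = []
    stability = initial_stability
    for _ in range(reviews):
        intervals.append(next_review_in_days(stability, target_recall))
        stability *= growth
    return intervals


if __name__ == "__main__":
    # Hypothetical learner: 2 days of initial stability, 2.5x growth per review.
    for gap in review_intervals(initial_stability=2.0, growth=2.5, reviews=4):
        print(f"review after {gap:.2f} days")
```

Under these assumptions the gap between reviews widens after every successful recall, which is the pattern that separates a spaced schedule from cramming.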
Every science principle maps to a working AI feature.
Below: the claim, the underlying body of research, and the AI capability that operationalizes it. Each row is also the spine of one Science deep-dive page.
| Science principle | What it means in practice | AI capability |
|---|---|---|
| Spaced repetition | Reviewing right before forgetting boosts long-term retention by 30–40%. Cramming is the most common form of wasted effort in learning. | AI Memory Coach |
| Item Response Theory | Each question carries different information about a learner’s ability. Selecting the next question to maximize that information lets us place learners precisely in ~24 items. | AI Skill Diagnostic |
| Interleaving | Mixing related concepts in the same session improves discrimination — learners actually learn to tell the look-alikes apart, not just memorize each. | AI Confusion Detector |
| Conceptual change | Wrong answers usually come from coherent (but wrong) mental models. Re-teaching the topic doesn’t fix it. Targeting the specific misconception does. | AI Misconception Repair |
| Metacognitive calibration | Learners who can honestly rate their own confidence (low Brier score) make better study decisions, recover from errors faster, and transfer skills better. | AI Confidence Coach |
| Tutoring effectiveness | Bloom (1984) found one-to-one tutoring produced 2σ improvement vs classroom instruction. Socratic prompting captures much of the effect. | AI Tutor |
| Knowledge graph & transfer | Mastery of isolated facts doesn’t transfer. Mastery of the structure between facts — what depends on what — does. | AI Knowledge Map |
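
The Item Response Theory row can be illustrated with a short Python sketch. It assumes the two-parameter logistic (2PL) model, in which an item's information at ability θ is a² · P · (1 − P), where a is the item's discrimination and P is the probability of a correct answer. The item bank, parameter values, and function names are hypothetical, and the AI Skill Diagnostic may use a different model.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Item:
    name: str
    discrimination: float  # "a" in the 2PL model
    difficulty: float      # "b" in the 2PL model


def p_correct(theta: float, item: Item) -> float:
    """2PL probability that a learner with ability theta answers correctly."""
    return 1.0 / (1.0 + math.exp(-item.discrimination * (theta - item.difficulty)))


def information(theta: float, item: Item) -> float:
    """Fisher information of the item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, item)
    return item.discrimination ** 2 * p * (1.0 - p)


def pick_next_item(theta: float, items: Sequence[Item]) -> Item:
    """Choose the item that is most informative at the current ability estimate."""
    return max(items, key=lambda item: information(theta, item))


if __name__ == "__main__":
    # Hypothetical three-item bank with equal discrimination.
    bank = [
        Item("easy", 1.0, -2.0),
        Item("medium", 1.0, 0.0),
        Item("hard", 1.0, 2.0),
    ]
    print(pick_next_item(0.0, bank).name)
```

With equal discrimination, the most informative item is the one whose difficulty sits closest to the current ability estimate, which is why an adaptive diagnostic can skip questions that are far too easy or far too hard for a given learner.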
What we don’t yet know.
Pretending we have all the answers would make this section dishonest. Below are the four questions our research team is actively working on, with academic collaborators. We publish updates here as we learn.
- How do AI Tutors compare to expert human tutors on transfer tasks?
  Socratic prompting captures much of Bloom’s two-sigma effect on retention, but does it transfer to novel problems the way a great human tutor does? We are currently piloting a study with three universities.
- Does AI-generated misconception detection match expert teacher diagnosis?
  Our misconception model is trained on patterns. Experienced teachers diagnose misconceptions through dialogue. We’re studying agreement rates and where the AI is systematically wrong.
- What’s the optimal confidence-calibration training regimen?
  Brier score improves with practice, but the dose-response curve isn’t well-documented. We’re running A/B tests on calibration prompt frequency and feedback format.
- How does engagement decay differ across languages and cultures?
  Most spaced-repetition research is in English, with North American or European participants. We’re partnering with a state education department to study optimal review schedules for Hindi-medium learners.
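
For readers unfamiliar with the Brier score named in the calibration question, it is the mean squared difference between a learner's stated confidence and the actual outcome (1 for a correct answer, 0 for an incorrect one), so lower is better. A minimal Python sketch follows; the function name and the confidence ratings are made up for illustration.

```python
from typing import Sequence


def brier_score(confidences: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared gap between stated confidence (0..1) and outcome (0 or 1)."""
    if not confidences or len(confidences) != len(outcomes):
        raise ValueError("need two non-empty sequences of equal length")
    squared_gaps = [(c - o) ** 2 for c, o in zip(confidences, outcomes)]
    return sum(squared_gaps) / len(squared_gaps)


if __name__ == "__main__":
    # Hypothetical learner: three answers with self-rated confidence.
    stated = [0.9, 0.6, 0.8]
    correct = [1, 0, 1]
    print(f"Brier score: {brier_score(stated, correct):.3f}")
```

A learner who rates every answer at 50% confidence scores 0.25 whatever the outcomes, so scores below that suggest the confidence ratings carry real information.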
From researchers who’ve reviewed the work.
“The integration of IRT-based adaptive testing with spaced-repetition scheduling is among the most rigorous I’ve seen in commercial learning platforms.”
Cognitive scientist, R1 research university
“Most edtech companies cite the science as a marketing veneer. Future Proof is one of the few I’ve audited where the engine actually implements what they claim.”
Learning scientist, independent reviewer
Start with the most-cited pillar.
The Memory Science deep dive — the research, the math, and the practical takeaway — is the most-read page in this library.