LeanPolish: Kernel-Verified Proof Compression for Lean 4

Why proof compression?

Modern language-model theorem provers can produce Lean proofs that verify, but the proofs are often verbose, repetitive, and difficult to inspect. A proof may be correct while still containing redundant tactics, dead hypotheses, repeated proof patterns, or artefacts left by search and self-correction. This matters because formal proofs are not only certificates of truth; they are also objects that humans and machines need to read, maintain, transform, and learn from.

LeanPolish asks a different question from proof generation: given a proof that already elaborates, can we improve it in a way that is itself checked by Lean? The answer becomes a structured source of supervision for AI systems: accepted rewrites are verified examples, and rejected rewrites record where plausible transformations failed.

Framework

A polisher, not a minimizer

LeanPolish is designed as a proof polisher, not a proof minimizer. A minimizer might replace a proof with any short tactic that closes the goal. LeanPolish instead rejects edits that make proofs more fragile or less interpretable, even when those edits reduce length. The framework is governed by quality gates, kernel checks, and a final file-level re-verification step.

1

Tactic upgrade

Try ranked tactic replacements and accept only strict quality-tier upgrades that close the goal.

2

In-proof anti-unification

Merge near-duplicate have-blocks into a generalized lemma when shared structure is real.

3

Dead-code removal

Remove unused have, haveI, and let blocks using dependency information from the elaborated proof.

4

Warning cleanup

Apply Lean-aware cleanup passes, including no-op branch removal and small syntactic simplifications.

5

Kernel re-verification

Re-elaborate the shortened file, including an independent out-of-process verifier on the saved file.

Key idea

The generalization gate

Anti-unification can make two proof fragments look like they share structure even when the merge is only textual factoring. LeanPolish introduces a simple gate for this case. If an anti-unified lemma has u free meta-variable holes and it replaces N near-duplicate proof blocks, the merge is accepted only when:

u < N

Intuitively, the generalized lemma should contain less freedom than the number of instances it replaces. Otherwise, the merge has at least one degree of freedom per instance and does not capture real shared mathematical structure. This makes anti-unification a quality-aware proof transformation rather than a purely textual refactor.

Accepted pattern

have h1 := add_nonneg ha hb
have h2 := add_nonneg hb hc
have h3 := add_nonneg hc hd

-- shared scaffold:
-- add_nonneg ?_1 ?_2
-- u = 2, N = 3, so u < N

Rejected pattern

have h1 := by linarith [hx]
have h2 := by linarith [hy]

-- no shared scaffold beyond binders
-- u = 2, N = 2, so u ≥ N

Dataset

Verified proof rewrites with contrastive siblings

The dataset released with LeanPolish is not a standalone scrape: it is generated by the framework. Each accepted proof edit carries the original source span, replacement span, goal state, proof context, quality annotation, and reproducibility metadata. Rejected candidates from the same rewrite attempt are preserved as contrastive siblings, making the data useful for supervised training, contrastive learning, preference learning, and reinforcement learning.

33,402accepted training pairs

65,596contrastive rejected siblings

12,972Lean files across five corpora

5.14 MBLean source saved across accepted pairs

Released dataset shards
Shard	Files	Accepted pairs	Rejected siblings	Median save
Mathlib v4.21.0 subset	2,233	6,695	26,912	14 B
Goedel-Workbook	10,052	20,822	28,525	44 B
miniF2F verified pool	308	1,184	3,753	48 B
PutnamBench Goedel sample	352	4,354	5,930	20 B
PutnamBench verified pool	16	80	254	52 B
Putnam 2025 / AxiomProver per-file	10	142	147	37.5 B
Putnam 2025 / AxiomProver pooled	10	125	75	38 B

Benchmarking

What does LeanPolish recover?

On kernel-verified Goedel-Prover-V2 outputs, LeanPolish recovers substantial symbolic compression under the same Lean-aware tokenizer used for comparison with prior work. The linter baseline targets a much narrower edit class: top-level unused tactics. LeanPolish extends this with tactic upgrades, dead-code removal, and in-proof anti-unification.

21.2% token reduction on miniF2F verified Goedel-V2 outputs

8.2% token reduction on PutnamBench verified Goedel-V2 outputs

<0.01% linter.unusedTactic on the same files

AI training signal

Why this matters for mathematical AI

LeanPolish treats formal proofs as objects that can be improved and learned from. The framework does not ask a model to imitate mathematical language in isolation; it builds training data from transformations that are checked by a proof assistant. This creates a bridge between symbolic verification and learning-based theorem proving.

Supervised fine-tuning

Train models to map verbose proof fragments to verified shorter replacements under the same goal state.

Preference learning

Use accepted and rejected siblings from the same attempt as preference or contrastive training pairs.

Reward modeling

Use local byte or token savings, quality tiers, and rejection outcomes as dense feedback signals.

Evaluation

Benchmark learned compressors against a deterministic symbolic framework under matched inputs.

Personal note

I built LeanPolish because I am interested in AI systems that interact with mathematical structure rather than only generating plausible mathematical text. Lean makes this possible: models and tools can explore, fail, revise, and receive exact feedback from a verifier. My goal with this project is to make proof-improvement data more open, inspectable, and useful for the next generation of AI-assisted mathematics systems.

Citation

Citing LeanPolish

The paper is currently under review. Until a public preprint or proceedings version is available, please cite the project page and dataset.

@misc{bourigault2026leanpolish,
  title        = {LeanPolish: A Kernel-Verified Dataset and Symbolic Compression Framework for Lean 4 Proofs},
  author       = {Bourigault, Pauline},
  year         = {2026},
  note         = {Project page and dataset},
  howpublished = {\url{https://paulinebourigault.github.io/leanpolish/}}
}