Google AI Nears Human Genius Levels in Mathematical Problem Solving
Google DeepMind's AlphaGeometry and AlphaProof have shown abilities in solving complex math problems that approach those of the very best human competitors. AlphaGeometry excels in geometry, reaching a level close to that of a human International Mathematical Olympiad (IMO) gold medallist, while AlphaProof, working together with AlphaGeometry, has reached the level of a silver medallist in the IMO competition.
AlphaGeometry uses a neuro-symbolic approach, combining a neural language model with a symbolic deduction engine. The language model identifies patterns and relationships in geometric data and proposes useful constructions, such as points, lines, and circles. The symbolic engine uses formal logic to make deductions and find solutions based on these suggestions. This approach allows it to tackle geometry problems requiring multiple steps. Importantly, the solutions generated are verifiable and use classical geometry rules.
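The interplay between the two components can be illustrated with a toy sketch. Everything here is an assumption made for illustration: facts are plain strings, the "symbolic engine" is a simple forward-chaining loop, and `toy_proposer` is a hard-coded stand-in for the neural language model. It is not AlphaGeometry's actual code, only the shape of the loop described above, applied to the classic claim that the base angles of an isosceles triangle are equal.

```python
from typing import Callable, FrozenSet, List, Optional, Set, Tuple

# A rule says: if all premises are known, the conclusion is known too.
Rule = Tuple[FrozenSet[str], str]

# Hypothetical toy rules for an isosceles triangle ABC with AB = AC.
RULES: List[Rule] = [
    (frozenset({"D is midpoint of BC"}), "BD = CD"),
    (frozenset({"AB = AC", "BD = CD"}), "triangle ABD congruent to triangle ACD"),
    (frozenset({"triangle ABD congruent to triangle ACD"}), "angle ABC = angle ACB"),
]
FACTS: Set[str] = {"AB = AC"}
GOAL = "angle ABC = angle ACB"


def deduce(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Symbolic step: apply rules until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def toy_proposer(known: Set[str]) -> Optional[str]:
    """Stand-in for the language model: suggest an auxiliary construction."""
    for candidate in ["D is midpoint of BC"]:
        if candidate not in known:
            return candidate
    return None


def solve(
    facts: Set[str],
    rules: List[Rule],
    goal: str,
    propose: Callable[[Set[str]], Optional[str]],
    max_constructions: int = 3,
) -> Optional[List[str]]:
    """Alternate deduction and proposal; return the constructions used, or None."""
    known = deduce(facts, rules)
    added: List[str] = []
    while goal not in known:
        if len(added) >= max_constructions:
            return None
        suggestion = propose(known)
        if suggestion is None:
            return None
        added.append(suggestion)
        known = deduce(known | {suggestion}, rules)
    return added


if __name__ == "__main__":
    print(solve(FACTS, RULES, GOAL, toy_proposer))
```

In this toy problem, deduction alone stalls because nothing follows from `AB = AC` by itself. Only after the proposer adds the midpoint D can the engine chain the rules through to the goal, which mirrors why auxiliary constructions matter in multi-step geometry proofs.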
AlphaProof is a reinforcement-learning system that focuses on formal math reasoning. It can translate natural language problem statements into formal statements. The system generates solution candidates and proves or disproves them using a search over possible proof steps in the formal language Lean. In a virtuous circle, the proofs are then used to reinforce the language model, enabling it to solve increasingly challenging problems.
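To give a feel for what a formal statement looks like, here is a small hand-written Lean 4 example. It is not taken from AlphaProof; the informal sentence, the theorem name `even_add_even`, and the choice of the `omega` tactic are all assumptions made for illustration, and the snippet assumes a recent Lean 4 toolchain in which `omega` is available.

```lean
-- Informal statement: "The sum of two even natural numbers is even."
-- Formal statement: evenness is expressed as "remainder 0 when divided by 2".
theorem even_add_even (a b : Nat) (ha : a % 2 = 0) (hb : b % 2 = 0) :
    (a + b) % 2 = 0 := by
  -- `omega` decides linear arithmetic goals over Nat and Int,
  -- including remainders by numeric literals.
  omega
```

The point of working in a language like Lean is that the proof checker either accepts the proof or rejects it, so every candidate proof found by the search can be verified mechanically before it is used as a training signal.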
A key aspect of these systems is their ability to train on large datasets of synthetic data, generated without human demonstrations. AlphaGeometry used a "symbolic deduction and traceback" process to create 100 million unique examples of varying difficulty.
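The idea behind "symbolic deduction and traceback" can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the pipeline DeepMind used: facts are strings, the rule set is a hypothetical toy, and the function names `deduce_with_parents`, `traceback`, and `synthesize` are invented here. Starting from a set of premises, the engine derives everything it can while remembering which facts each conclusion came from; walking those links backwards then recovers the minimal premises a conclusion actually needs, yielding a (premises, conclusion) training example.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str]

# Hypothetical toy rules; a real system would use a geometry deduction engine.
RULES: List[Rule] = [
    (frozenset({"D is midpoint of BC"}), "BD = CD"),
    (frozenset({"AB = AC", "BD = CD"}), "triangle ABD congruent to triangle ACD"),
    (frozenset({"triangle ABD congruent to triangle ACD"}), "angle ABC = angle ACB"),
]


def deduce_with_parents(premises: Set[str], rules: List[Rule]) -> Dict[str, FrozenSet[str]]:
    """Forward-chain, recording for each derived fact the facts it came from."""
    known = set(premises)
    derived: Dict[str, FrozenSet[str]] = {}
    changed = True
    while changed:
        changed = False
        for needs, conclusion in rules:
            if conclusion not in known and needs <= known:
                known.add(conclusion)
                derived[conclusion] = needs
                changed = True
    return derived


def traceback(fact: str, derived: Dict[str, FrozenSet[str]]) -> Set[str]:
    """Walk the recorded links backwards to the original premises `fact` needs."""
    if fact not in derived:
        return {fact}  # not derived, so it is an original premise
    needed: Set[str] = set()
    for parent in derived[fact]:
        needed |= traceback(parent, derived)
    return needed


def synthesize(premises: Set[str], rules: List[Rule]) -> List[Tuple[List[str], str]]:
    """Turn every derived fact into a (minimal premises, conclusion) example."""
    derived = deduce_with_parents(premises, rules)
    return [(sorted(traceback(fact, derived)), fact) for fact in derived]


if __name__ == "__main__":
    start = {"AB = AC", "D is midpoint of BC", "E is on line AB"}
    for needed, conclusion in synthesize(start, RULES):
        print(needed, "=>", conclusion)
```

Note that the premise about point E never appears in any example: traceback discards it because no derived fact depends on it. Repeating this over many randomly sampled premise sets is one way to see how large numbers of examples can be produced without human-written proofs.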
AlphaProof, for its part, trained by proving or disproving millions of problems. This approach overcomes the data bottleneck that has limited the use of formal languages in machine learning.
In the 2024 IMO competition, a combined system of AlphaProof and AlphaGeometry 2, an improved version of AlphaGeometry, solved four out of six problems, earning a silver-medal-equivalent score of 28 out of 42 points. This achievement marks the first time an AI system has reached this level of performance in the IMO.