AI in Science Research
How can modern AI help to push the boundary of science?
Ding Li 2022.1
MATHEMATICS
AI Aids Intuition in Mathematical Discovery
The cycle of developing mathematical theories by
studying examples.
• After recognizing a possible pattern in the properties of
mathematical objects, such as convex polyhedra (3D
shapes with flat faces, straight edges and vertices that all
point outwards), mathematicians typically go through a
cycle to understand this pattern.
• They first compute the properties of some simple
examples and analyze the possible relationships
between these properties.
• The researchers then refine these relationships. For
example, they might come up with Euler’s polyhedron
formula, which posits that the number of vertices (V)
minus the number of edges (E) plus the number of faces
(F) of a convex polyhedron is always equal to two:
V − E + F = 2.
• They then test this suggested relationship on more
complicated examples, discard irrelevant properties and
attempt to understand why the relationship holds. If it
remains unclear, mathematicians then consider different
examples, and the cycle continues.
• Davies et al. show that machine-learning techniques
can help researchers with the refinement step, which
usually relies strongly on human intuition.
Stump 2021
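The "compute the properties of some simple examples" step above can be sketched in a few lines. This is only an illustration, assuming the five Platonic solids as the examples; the names `SOLIDS` and `euler_characteristic` are hypothetical and do not come from the cited articles.

```python
# Vertex, edge and face counts (V, E, F) of the five Platonic solids.
SOLIDS = {
    "tetrahedron": (4, 6, 4),
    "cube": (8, 12, 6),
    "octahedron": (6, 12, 8),
    "dodecahedron": (20, 30, 12),
    "icosahedron": (12, 30, 20),
}


def euler_characteristic(v, e, f):
    """Return V - E + F for one polyhedron."""
    return v - e + f


# Evaluate the candidate relationship on every example.
results = {name: euler_characteristic(*vef) for name, vef in SOLIDS.items()}

for name, chi in results.items():
    print(f"{name}: V - E + F = {chi}")
```

Every example gives the same value, which is the kind of regularity that would prompt a mathematician to conjecture the formula and then test it on harder cases.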
Advancing mathematics by guiding human intuition with AI Davies 2021
As an illustrative example: let z be a convex polyhedron,
X(z) ∈ ℤ² × ℝ² be the numbers of vertices and edges of z together with
its volume and surface area, and Y(z) ∈ ℤ be the number of faces of z.
Euler’s formula states that there is an exact relationship between X(z)
and Y(z) in this case: X(z) · (−1, 1, 0, 0) + 2 = Y(z).
The framework helps guide the intuition of mathematicians in two
ways: by verifying the hypothesized existence of structure/patterns in
mathematical objects through the use of supervised machine learning;
and by helping in the understanding of these patterns through the use
of attribution techniques.
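The two roles of the framework can be illustrated on the polyhedron example. The sketch below is not the model used by Davies et al.: it assumes a plain linear least-squares fit as the supervised learner and uses the fitted weights as the attribution (for a linear model, the gradient of the output with respect to each input is just its weight). The data are seven unit-edge solids with volume and surface area rounded to four decimals, and `solve`, `DATA`, `weights` and `bias` are hypothetical names.

```python
from fractions import Fraction as Fr

# (V, E, volume, surface area, F) for unit-edge solids.
DATA = [
    (4, 6, "0.1179", "1.7321", 4),       # tetrahedron
    (8, 12, "1", "6", 6),                # cube
    (6, 12, "0.4714", "3.4641", 8),      # octahedron
    (20, 30, "7.6631", "20.6457", 12),   # dodecahedron
    (12, 30, "2.1817", "8.6603", 20),    # icosahedron
    (6, 9, "0.4330", "3.8660", 5),       # triangular prism
    (5, 8, "0.2357", "2.7321", 5),       # square pyramid
]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination in exact arithmetic."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


# Design matrix with a trailing column of ones for the bias term.
rows = [[Fr(v), Fr(e), Fr(vol), Fr(area), Fr(1)] for v, e, vol, area, _ in DATA]
targets = [Fr(f) for *_, f in DATA]

# Normal equations (A^T A) w = A^T y of the least-squares problem.
n = 5
ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
aty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n)]
*weights, bias = solve(ata, aty)

# Attribution: sensitivity of the predicted face count to each input.
for name, w in zip(["vertices", "edges", "volume", "surface area"], weights):
    print(f"{name}: {float(w):+.3f}")
print(f"bias: {float(bias):+.3f}")
```

On this data the fit reproduces the face counts exactly, which supports the hypothesis that a relationship exists, and the zero weights on volume and surface area point to the two features that can be discarded.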
QUANTUM CHEMISTRY
Pushing the Frontiers of Density Functionals
by Solving the Fractional Electron Problem
Kirkpatrick 2021
• Computing electronic energies underpins theoretical chemistry and materials science, and
density functional theory (DFT) promises an exact and efficient approach.
• But the approach has limitations and is known to give the wrong results for certain types of
molecule.
• “It’s sort of the ideal problem for machine learning: you know the answer, but not the
formula you want to apply.”
• The functional was evaluated by integrating local energies computed by a multilayer
perceptron (MLP), which took as input both local and nonlocal features of the occupied
Kohn-Sham (KS) orbitals and can be described as a local range-separated hybrid.
• To train the functional, the sum of two objective functions was used: a regression loss
on the exchange-correlation energy and a gradient regularization term that ensured that the
functional derivatives can be used in self-consistent field (SCF) calculations after training.
Castelvecchi 2021
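The phrase "integrating local energies computed by an MLP" can be made concrete with a toy sketch. It is a simplified illustration, not the published functional: the feature vectors, quadrature weights and network parameters below are made-up placeholders, and the names `mlp_local_energy` and `xc_energy` are hypothetical.

```python
import math


def mlp_local_energy(features, w_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer perceptron mapping grid-point features to a local energy."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def xc_energy(grid_features, quad_weights, params):
    """Approximate the integral of the local energy by a weighted sum over grid points."""
    return sum(
        wq * mlp_local_energy(f, *params)
        for f, wq in zip(grid_features, quad_weights)
    )


# Placeholder inputs: three grid points with two features each.
grid_features = [[0.9, 0.1], [0.5, 0.3], [0.1, 0.05]]
quad_weights = [0.2, 0.5, 0.3]

# Placeholder parameters: two hidden units, one scalar output.
params = (
    [[0.4, -0.2], [0.1, 0.3]],  # hidden weights
    [0.0, 0.1],                 # hidden biases
    [0.7, -0.5],                # output weights
    0.05,                       # output bias
)

print(xc_energy(grid_features, quad_weights, params))
```

Because the energy is a sum over grid points, it is differentiable with respect to both the parameters and the input features, which is what allows a derivative-based regularization term to be added during training.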
BIOLOGY
Primary Structure
Amino acids (20)
Peptide bond
Secondary Structure
Tertiary Structure
Quaternary Structure
Multiple Sequence Alignments (MSA), Nseq × Nres
• Evolutionary constraints
• MSA clustering
• Cluster deletion
• Evolutionary correlations
Pairwise Features, Nres × Nres
• Physical and geometric constraints
• Target feat (amino acids), residue index
• Structural templates
• Template distogram
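To make the two shapes concrete, the sketch below turns a toy MSA (Nseq × Nres) into an Nres × Nres matrix of column-pair mutual information, one classical way to quantify evolutionary correlations. It is an illustration only: the four sequences are invented, `column_mutual_information` is a hypothetical helper, and AlphaFold learns such relationships rather than computing this statistic.

```python
import math
from collections import Counter


def column_mutual_information(msa):
    """Return an Nres x Nres matrix of mutual information (in nats) between MSA columns."""
    n_seq, n_res = len(msa), len(msa[0])
    cols = [[seq[j] for seq in msa] for j in range(n_res)]
    mi = [[0.0] * n_res for _ in range(n_res)]
    for i in range(n_res):
        for j in range(n_res):
            count_a = Counter(cols[i])
            count_b = Counter(cols[j])
            count_ab = Counter(zip(cols[i], cols[j]))
            # Sum of p(a,b) * log(p(a,b) / (p(a) p(b))) over observed pairs.
            mi[i][j] = sum(
                (c / n_seq) * math.log(c * n_seq / (count_a[a] * count_b[b]))
                for (a, b), c in count_ab.items()
            )
    return mi


# Invented toy alignment: Nseq = 4 sequences, Nres = 4 residues.
msa = ["ALDG", "ALDS", "VLEG", "VLES"]
mi = column_mutual_information(msa)

for row in mi:
    print(" ".join(f"{value:.3f}" for value in row))
```

In this toy alignment, columns 0 and 2 always change together and so share high mutual information, while column 3 varies independently of both.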
Near experimental accuracy in
most cases for CASP14 assessment
(May-July 2020)
Jumper 2021 GitHub
AlphaFold Protein Structure Database (JAK2)
Blog
Colab
UniProt (JAK2)
A BERT-style transformer was applied to predict randomly masked
individual residues within the MSA, which encourages the network to
learn to interpret phylogenetic and covariation relationships without
hardcoding a particular correlation statistic into the features.
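The masking objective can be sketched as follows. This is a minimal illustration, assuming every selected residue is simply replaced by a mask token; the mask symbol, the masking rate and the helper `mask_msa` are hypothetical choices, not details taken from the slide.

```python
import random

MASK = "#"  # hypothetical mask token


def mask_msa(msa, rate, rng):
    """Randomly mask residues; return the masked MSA and the hidden targets."""
    masked, targets = [], {}
    for s, seq in enumerate(msa):
        row = list(seq)
        for r, residue in enumerate(row):
            if rng.random() < rate:
                targets[(s, r)] = residue  # what the network must predict
                row[r] = MASK
        masked.append("".join(row))
    return masked, targets


# Invented toy alignment and an assumed masking rate.
msa = ["ALDGKT", "ALDSKT", "VLEGRT", "VLESRT"]
masked_msa, targets = mask_msa(msa, rate=0.3, rng=random.Random(0))

for original, hidden in zip(msa, masked_msa):
    print(original, "->", hidden)
```

A network trained to recover `targets` from `masked_msa` has to use the other rows and columns of the alignment, which is how covariation is learned without a hand-picked statistic.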
The MSA and pair representations exchange information iteratively
to enable direct reasoning about
the spatial and evolutionary
relationships in the proteins.
Combination of the bioinformatics and physical approaches
We hope that AlphaFold—and computational approaches that apply its techniques
for other biophysical problems—will become essential tools of modern biology.
“Do not quench your inspiration
and your imagination; do not
become the slave of your
model.”
– Vincent van Gogh