LAMBDA ZERO
λ₀
LAMBDA ZERO
▸ Minimalist pure lazy functional programming language.
▸ Desugars to the untyped lambda calculus plus
optimization of natural numbers and arithmetic.
▸ Python and Agda-inspired syntax.
▸ Bootstrap interpreter is under 2400 lines of C.
▸ Self-interpreting.
▸ Self-type-inferencing (Hindley-Milner with ADTs).
CODE EXAMPLES
n ! ≔ n ⦊ 0 ↦ 1; ↑ n′ ↦ n ⋅ n′ !
sort ≔ [] ↦ []; n ∷ ns ↦ sort(ns ¦ (≤ n)) ⧺ [n] ⧺ sort(ns ¦ (> n))
primes ≔ p(2 …) where p ≔ [] ↦ []; n ∷ ns ↦ n ∷ p(ns ¦ (% n ≠ 0))
MOTIVATION
▸ Lambda Zero is an attempt to discover the true
foundation of mathematics.
▸ If you try to fully understand anything in mathematics, you
end up getting into the foundations, which generally
means you end up in set theory.
▸ “The language of set theory can be used to define nearly
all mathematical objects.” (Wikipedia)
MOTIVATION
▸ Kuratowski's definition of an ordered pair (the "accepted
definition" according to Wikipedia):
▸ (a, b) := {{a}, {a, b}}
▸ The right is more asymmetric than the left…
▸ This seems unnatural, like a hack rather than the truth.
▸ The lambda calculus captures both sets and tuples without
anything that seems like a hack.
WHY NOT AGDA?
▸ Agda is about 120,000 lines of code.
▸ Agda is written in Haskell…
▸ GHC (Haskell) is 140,000 lines of code (as of 2012).
▸ GHC is written in Haskell…
▸ This doesn't look like a small elegant core of mathematics.
KOLMOGOROV COMPLEXITY
▸ Loosely speaking, Kolmogorov complexity of an object is the
length of the shortest program that produces the object as
output.
▸ Similarly we can define the complexity of a program as the
length of the shortest program that produces the same output.
▸ We can also define the complexity of a language to be the
complexity of its interpreter.
▸ All interpreter implementations of the same language have
the same complexity by definition.
THE COMPLEXITY COMPASS
▸ In designing a language, if you keep the interpreter close
to the minimum complexity, you can use the complexity as
a compass to point you toward the mathematical truth.
▸ If you get too far from the minimum, the compass signal
can get lost in the noise of the non-mathematical
complexity.
▸ Lambda Zero is written in C, which I think of as portable
assembly code.
LESSON LEARNED
▸ On a couple occasions I tried to take a shortcut or do something
potentially clever and useful despite what the math was
suggesting.
▸ Each time, I eventually felt that I had to go back and change it.
▸ Example: Can we represent a triple in terms of pairs? Then both of
these destructurings are valid, which decreases the helpfulness of
type checking (amongst other issues):
▸ (a, b, c) ≔ triple
▸ (a, b) ≔ triple
DISTRACTIONS
▸ Performance and I/O are distractions from this goal.
▸ Performance is not a property of a language.
▸ No language is faster or slower than any other language.
▸ Compromising the language for the sake of performance
makes the language objectively worse.
▸ Examples: Haskell's seq, builtin data types.
▸ I/O does not affect Turing completeness.
COMPONENTS
▸ Bootstrap interpreter: can be interpreted without knowledge of types; type
annotations are ignored.
▸ ~2400 lines of C; 65KB statically linked and stripped.
▸ "Hello world" statically linked and stripped with gcc on linux is 671KB.
▸ Self interpreter: parses and saves type annotations for type checker.
▸ ~2400 lines of Lambda Zero.
▸ Self type inferencer.
▸ ~300 lines of Lambda Zero.
▸ Future: Self compiler, Lambda One (dependent types, implicit calculus of
constructions, requires knowledge of types to interpret).
ADVANTAGES OF LAZY EVALUATION
▸ Automatically avoids computation that is never needed.
▸ Allows simple and consistent representation of infinite
data.
▸ Allows increased modularity.
▸ Example: sqrt function can be split into a digit generator
and a function that truncates the digits at a desired
point.
ADVANTAGES OF STRICT EVALUATION
▸ Easier to implement efficiently.
▸ Easier to reason about performance.
▸ Easier to understand behavior in impure languages.
LAZY EVALUATION
▸ If we only care about making the ideal language (not
performance etc)…
▸ And if we assume the language is purely functional…
▸ Then a lazy evaluated language is strictly better than a
strictly evaluated language.
▸ Lambda Zero uses the Lazy Krivine Machine.
PARSING
▸ There are no statements in the grammar; only expressions.
▸ Shift reduce parser, table-driven. The table associates a shift function
and a reduce function to each operator.
▸ Shift and reduce functions can do arbitrary modifications to the stack
and their arguments, but this is not often needed.
▸ Uses an algorithm very similar to Dijkstra's shunting yard algorithm but
with a combined stack for operators and operands which makes it
more powerful.
▸ For example, newline is defined to be an infix operator: right-
associative application with some special cases.
UNICODE
▸ Unicode makes it easier to write math directly in the language.
▸ Every operator has both an ASCII and a unicode
representation (sometimes they are the same).
▸ You can type the ASCII representation and the editor will
conceal with the unicode form once you leave that line.
▸ Lisp/Agda style lexing: only a few basic characters are
considered delimiters so you have to put spaces between
operators and operands.
SEMANTIC INDENTATION
▸ Just like python.
▸ Very easy to implement: the number of spaces after a
newline is recorded in the newline operator and the more
spaces, the higher the precedence of the operator.
OPERATORS
▸ = for equality
▸ ≔ := for definitions
▸ : for type annotations
▸ :: for prepend
▸ ↦ -> for lambdas
▸ => for function types
▸ ∘ <> for composition
▸ . for high precedence pipelining
▸ ⦊ |> for low precedence pipelining
▸ ; for in-line newline
▸ -- monus (truncating subtraction of naturals)
▸ // floor division, lower precedence than *
▸ ⊓ pair type constructor
▸ ⊔ either type constructor
▸ ? maybe type constructor
▸ ↑ successor
▸ ° function exponentiation
▸ ¦ |: filter
▸ ~ relation
▸ ! factorial
▸ ` flip
PATTERN MATCHING
length := [] -> 0; _ :: xs’ -> 1 + length(xs’)
define length
[] -> 0
_ :: xs’ -> 1 + length(xs’)
define length(xs)
match xs
case []
0
case _ :: xs'
1 + length(xs')
define length(xs)
with xs as _ :: xs'
1 + length(xs')
0
SYNTAX DECLARATIONS
▸ infix, infixL, infixR, prefix, postfix, interfix, alias
▸ interfix is like infix but the definition is always straight application
▸ Lexically scoped
▸ Chaining: syntax(if) ≔ prefix(13)
syntax(then) ≔ interfix(if _)
syntax(else) ≔ interfix(if _ then _)
if a then b else c
if a then b
c
if a
b
c
SPACE OPERATOR
▸ Function application syntax:
▸ Haskell, Agda, Idris, ML: a b c d
▸ Math, mainstream languages: a(b, c, d)
▸ Math notation gives a visual indication of the asymmetry between
applicand and arguments.
▸ In math and physics, space typically means multiplication.
▸ Let the user decide: the space operator can be user-defined.
▸ Space is special though; it has some built-in erasure rules.
ADT NOTATION
▸ Lambda Zero's ADT notation is a generalization of set-
builder notation where the right side of the bar can be
inlined on the left side.
⊥ ⩴ {}
𝕌 ⩴ {()}
a ⊓ b ::= {(first ∈ a, second ∈ b)}
𝔹 ⩴ {False, True}
(a)? ⩴ {Void, Just(_ ∈ a)}
a ⊔ b ⩴ {Left(_ ∈ a), Right(_ ∈ b)}
ℕ ⩴ {0, ↑(_ ∈ ℕ)}
ℤ ⩴ {+(_ ∈ ℕ), -(_ ∈ ℕ)}
ℚ ⩴ {(numerator ∈ ℤ) / (denominator ∈ ℤ)}
List(a) ⩴ {[], (_head ∈ a) ∷ (_tail ∈ List(a))}
ADT ENCODING
▸ Implemented using the Scott encoding, which makes
pattern matching trivial: you just treat the instance as a
function and pass all the case handlers as arguments.
(a, b) ~ f ↦ f(a, b)
[] ~ n ↦ p ↦ n
[a] ~ n ↦ p ↦ p(a, [])
ADT IMPLEMENTATION
▸ For each pattern in the ADT, a constructor and deconstructor is defined.
▸ The deconstructor has a name that is inaccessible to the user.
▸ constructor_n : p1 => ... => pm => A
▸ deconstructor_n : (reconstructor : p1 => ... => pm => B) => (fallback : A
=> B) => A => B
▸ If A was constructed with the constructor that corresponds to the
deconstructor, then the parameters are passed to the reconstructor,
otherwise A is passed to the fallback function.
▸ Deconstructors allow default cases and with/as syntax.
NATURAL NUMBER OPTIMIZATION
▸ Natural numbers are defined within the language by an ADT
just like all other data types.
▸ Naturals can be pattern matched and act just like any other
ADT.
▸ But the interpreter secretly represents naturals as machine
integers and dynamically converts them to their Scott
encodings on demand (whenever they are pattern matched).
ℕ ⩴ {0, ↑(_ ∈ ℕ)}
DATA STRUCTURES
▸ Only two non-trivial data structures were required to implement
everything so far:
▸ 1. Table: AA tree (simplified red-black tree)
▸ 2. Array: Random access list
▸ O(1) head
▸ O(1) tail
▸ O(1) prepend
▸ O(log(n)) lookup
MORE PURE THAN PURE
▸ Haskell is considered a pure functional language because IO is
represented as actions with the IO monad.
▸ But this IO monad is still a relic in the language corresponding to
the impure world.
▸ Lambda Zero has nothing IO-related in the language, so in some
sense it’s even more pure.
▸ The main function takes a string parameter that is mapped to
stdin and the return value is a string that gets mapped to stdout.
main(input) ≔ input
LAZY IO
▸ This restricted form of IO is surprisingly capable due to the
lazy evaluation of the IO streams.
▸ Bytes are read from stdin only when the program accesses
the corresponding element of the input string.
▸ Bytes are sent to stdout as soon as they are computed
even if the remainder of the output is not evaluated.
▸ This makes interactive IO possible.

Lambda Zero Programming Language Introduction

  • 1.
  • 2.
    LAMBDA ZERO ▸ Minimalistpure lazy functional programming language. ▸ Desugars to the untyped lambda calculus plus optimization of natural numbers and arithmetic. ▸ Python and Agda-inspired syntax. ▸ Bootstrap interpreter is under 2400 lines of C. ▸ Self-interpreting. ▸ Self-type-inferencing (Hindley-Milner with ADTs).
  • 3.
    CODE EXAMPLES n !≔ n ⦊ 0 ↦ 1; ↑ n′ ↦ n ⋅ n′ ! sort ≔ [] ↦ []; n ∷ ns ↦ sort(ns ¦ (≤ n)) ⧺ [n] ⧺ sort(ns ¦ (> n)) primes ≔ p(2 …) where p ≔ [] ↦ []; n ∷ ns ↦ n ∷ p(ns ¦ (% n ≠ 0))
  • 4.
    MOTIVATION ▸ Lambda Zerois an attempt to discover the true foundation of mathematics. ▸ If you try to fully understand anything in mathematics, you end up getting into the foundations, which generally means you end up in set theory. ▸ “The language of set theory can be used to define nearly all mathematical objects.” (Wikipedia)
  • 5.
    MOTIVATION ▸ Kuratowski's definitionof an ordered pair (the "accepted definition" according to Wikipedia): ▸ (a, b) := {{a}, {a, b}} ▸ The right is more asymmetric than the left… ▸ This seems unnatural, like a hack rather than the truth. ▸ The lambda calculus captures both sets and tuples without anything that seems like a hack.
  • 6.
    WHY NOT AGDA? ▸Agda is about 120,000 lines of code. ▸ Agda is written in Haskell… ▸ GHC (Haskell) is 140,000 lines of code (as of 2012). ▸ GHC is written in Haskell… ▸ This doesn't look like a small elegant core of mathematics.
  • 7.
    KOLMOGOROV COMPLEXITY ▸ Looselyspeaking, Kolmogorov complexity of an object is the length of the shortest program that produces the object as output. ▸ Similarly we can define the complexity of a program as the length of the shortest program that produces the same output. ▸ We can also define the complexity of a language to be the complexity of its interpreter. ▸ All interpreter implementations of the same language have the same complexity by definition.
  • 8.
    THE COMPLEXITY COMPASS ▸In designing a language, if you keep the interpreter close to the minimum complexity, you can use the complexity as a compass to point you toward the mathematical truth. ▸ If you get too far from the minimum, the compass signal can get lost in the noise of the non-mathematical complexity. ▸ Lambda Zero is written in C, which I think of as portable assembly code.
  • 9.
    LESSON LEARNED ▸ Ona couple occasions I tried to take a shortcut or do something potentially clever and useful despite what the math was suggesting. ▸ Each time, I eventually felt that I had to go back and change it. ▸ Example: Can we represent a triple in terms of pairs? Then both of these destructurings are valid, which decreases the helpfulness of type checking (amongst other issues): ▸ (a, b, c) ≔ triple ▸ (a, b) ≔ triple
  • 10.
    DISTRACTIONS ▸ Performance andI/O are distractions from this goal. ▸ Performance is not a property of a language. ▸ No language is faster or slower than any other language. ▸ Compromising the language for the sake of performance makes the language objectively worse. ▸ Examples: Haskell's seq, builtin data types. ▸ I/O does not affect Turing completeness.
  • 11.
    COMPONENTS ▸ Bootstrap interpreter:can be interpreted without knowledge of types; type annotations are ignored. ▸ ~2400 lines of C; 65KB statically linked and stripped. ▸ "Hello world" statically linked and stripped with gcc on linux is 671KB. ▸ Self interpreter: parses and saves type annotations for type checker. ▸ ~2400 lines of Lambda Zero. ▸ Self type inferencer. ▸ ~300 lines of Lambda Zero. ▸ Future: Self compiler, Lambda One (dependent types, implicit calculus of constructions, requires knowledge of types to interpret).
  • 12.
    ADVANTAGES OF LAZYEVALUATION ▸ Automatically avoids computation that is never needed. ▸ Allows simple and consistent representation of infinite data. ▸ Allows increased modularity. ▸ Example: sqrt function can be split into a digit generator and a function that truncates the digits at a desired point.
  • 13.
    ADVANTAGES OF STRICTEVALUATION ▸ Easier to implement efficiently. ▸ Easier to reason about performance. ▸ Easier to understand behavior in impure languages.
  • 14.
    LAZY EVALUATION ▸ Ifwe only care about making the ideal language (not performance etc)… ▸ And if we assume the language is purely functional… ▸ Then a lazy evaluated language is strictly better than a strictly evaluated language. ▸ Lambda Zero uses the Lazy Krivine Machine.
  • 15.
    PARSING ▸ There areno statements in the grammar; only expressions. ▸ Shift reduce parser, table-driven. The table associates a shift function and a reduce function to each operator. ▸ Shift and reduce functions can do arbitrary modifications to the stack and their arguments, but this is not often needed. ▸ Uses an algorithm very similar to Dijkstra's shunting yard algorithm but with a combined stack for operators and operands which makes it more powerful. ▸ For example, newline is defined to be an infix operator: right- associative application with some special cases.
  • 16.
    UNICODE ▸ Unicode makesit easier to write math directly in the language. ▸ Every operator has both an ASCII and a unicode representation (sometimes they are the same). ▸ You can type the ASCII representation and the editor will conceal with the unicode form once you leave that line. ▸ Lisp/Agda style lexing: only a few basic characters are considered delimiters so you have to put spaces between operators and operands.
  • 17.
    SEMANTIC INDENTATION ▸ Justlike python. ▸ Very easy to implement: the number of spaces after a newline is recorded in the newline operator and the more spaces, the higher the precedence of the operator.
  • 18.
    OPERATORS ▸ = forequality ▸ ≔ := for definitions ▸ : for type annotations ▸ :: for prepend ▸ ↦ -> for lambdas ▸ => for function types ▸ ∘ <> for composition ▸ . for high precedence pipelining ▸ ⦊ |> for low precedence pipelining ▸ ; for in-line newline ▸ -- monus (truncating subtraction of naturals) ▸ // floor division, lower precedence than * ▸ ⊓ pair type constructor ▸ ⊔ either type constructor ▸ ? maybe type constructor ▸ ↑ successor ▸ ° function exponentiation ▸ ¦ |: filter ▸ ~ relation ▸ ! factorial ▸ ` flip
  • 19.
    PATTERN MATCHING length :=[] -> 0; _ :: xs’ -> 1 + length(xs’) define length [] -> 0 _ :: xs’ -> 1 + length(xs’) define length(xs) match xs case [] 0 case _ :: xs' 1 + length(xs') define length(xs) with xs as _ :: xs' 1 + length(xs') 0
  • 20.
    SYNTAX DECLARATIONS ▸ infix,infixL, infixR, prefix, postfix, interfix, alias ▸ interfix is like infix but the definition is always straight application ▸ Lexically scoped ▸ Chaining: syntax(if) ≔ prefix(13) syntax(then) ≔ interfix(if _) syntax(else) ≔ interfix(if _ then _) if a then b else c if a then b c if a b c
  • 21.
    SPACE OPERATOR ▸ Functionapplication syntax: ▸ Haskell, Agda, Idris, ML: a b c d ▸ Math, mainstream languages: a(b, c, d) ▸ Math notation gives a visual indication of the asymmetry between applicand and arguments. ▸ In math and physics, space typically means multiplication. ▸ Let the user decide: the space operator can be user-defined. ▸ Space is special though; it has some built-in erasure rules.
  • 22.
    ADT NOTATION ▸ LambdaZero's ADT notation is a generalization of set- builder notation where the right side of the bar can be inlined on the left side. ⊥ ⩴ {} 𝕌 ⩴ {()} a ⊓ b ::= {(first ∈ a, second ∈ b)} 𝔹 ⩴ {False, True} (a)? ⩴ {Void, Just(_ ∈ a)} a ⊔ b ⩴ {Left(_ ∈ a), Right(_ ∈ b)} ℕ ⩴ {0, ↑(_ ∈ ℕ)} ℤ ⩴ {+(_ ∈ ℕ), -(_ ∈ ℕ)} ℚ ⩴ {(numerator ∈ ℤ) / (denominator ∈ ℤ)} List(a) ⩴ {[], (_head ∈ a) ∷ (_tail ∈ List(a))}
  • 23.
    ADT ENCODING ▸ Implementedusing the Scott encoding, which makes pattern matching trivial: you just treat the instance as a function and pass all the case handlers as arguments. (a, b) ~ f ↦ f(a, b) [] ~ n ↦ p ↦ n [a] ~ n ↦ p ↦ p(a, [])
  • 24.
    ADT IMPLEMENTATION ▸ Foreach pattern in the ADT, a constructor and deconstructor is defined. ▸ The deconstructor has a name that is inaccessible to the user. ▸ constructor_n : p1 => ... => pm => A ▸ deconstructor_n : (reconstructor : p1 => ... => pm => B) => (fallback : A => B) => A => B ▸ If A was constructed with the constructor that corresponds to the deconstructor, then the parameters are passed to the reconstructor, otherwise A is passed to the fallback function. ▸ Deconstructors allow default cases and with/as syntax.
  • 25.
    NATURAL NUMBER OPTIMIZATION ▸Natural numbers are defined within the language by an ADT just like all other data types. ▸ Naturals can be pattern matched and act just like any other ADT. ▸ But the interpreter secretly represents naturals as machine integers and dynamically converts them to their Scott encodings on demand (whenever they are pattern matched). ℕ ⩴ {0, ↑(_ ∈ ℕ)}
  • 26.
    DATA STRUCTURES ▸ Onlytwo non-trivial data structures were required to implement everything so far: ▸ 1. Table: AA tree (simplified red-black tree) ▸ 2. Array: Random access list ▸ O(1) head ▸ O(1) tail ▸ O(1) prepend ▸ O(log(n)) lookup
  • 27.
    MORE PURE THANPURE ▸ Haskell is considered a pure functional language because IO is represented as actions with the IO monad. ▸ But this IO monad is still a relic in the language corresponding to the impure world. ▸ Lambda Zero has nothing IO-related in the language, so in some sense it’s even more pure. ▸ The main function takes a string parameter that is mapped to stdin and the return value is a string that gets mapped to stdout. main(input) ≔ input
  • 28.
    LAZY IO ▸ Thisrestricted form of IO is surprisingly capable due to the lazy evaluation of the IO streams. ▸ Bytes are read from stdin only when the program accesses the corresponding element of the input string. ▸ Bytes are sent to stdout as soon as they are computed even if the remainder of the output is not evaluated. ▸ This makes interactive IO possible.