This post is answer to Jesper Cockx's article “1001 Representations of Syntax with Binding”, so you can read it first (but this is not required).
Note: I will be biased towards practical programs (compilers and interpreters), as opposed to developing metatheory.
Also note that I'm not expert. I have little experience with de Bruijn indices and similar things.
Let me add some methods and comparison criteria to Jesper's article.
First of all, let me introduce one method not mentioned in the Jesper's article: randomly generated globally unique identifiers for all bound variables! I didn't find this method anywhere, so it is possible it has some disadvantages I am not aware of.
I will discuss pros and cons of this method later, but first let me share some criteria not mentioned in the article.
First, ability to perform capture-avoiding substitution for lambda-terms in more or less sane way. (Yes, I'm aware this criteria is not so clearly formulated.) Naive approach (i. e. when you just use strings for variables) obviously fails to confirm to this criteria. On the other hand, de Bruijn indices do meet this criteria. In fact, nearly all existing methods meet this criteria (this is why they are created!). This criteria is so obvious that it is possibly not worth to be mentioned. Yet let me mention it anyway.
Next criteria: any variable is always encoded same. Good example of method, which fails to satisfy this criteria: de Bruijn indices. Let me explain. Consider this term:
λ x. λ y. λ z. (x y)
Let's assume for some reason we want to move subterm x y
out of λ z. (x y)
term (to get λ x. λ y. (x y)
). Then we cannot just move the term as is. Because encoding of x y
depends on where the term is located. So we need to change the term's representation. So, de Bruijn indices don't satisfy this criteria.
On the other hand, both naive encoding and unique identifiers do satisfy this criteria.
At this point some de Bruijn indices fan may say: “Wait! You just use de Bruijn indices wrong way! You should not try to change subterms located deeply inside other terms. You should use locally nameless representation (from “1001 Representations of Syntax with Binding”). Then when you want to mess with some deeply located subterm, you should first open your big term using openT
from “1001 Representations of Syntax with Binding”, possibly multiple times, and then change that deeply located subterm needed way.
Well, yes, I agree. If we call openT
repeatedly, then indeed this de Bruijn indices disadvantage disappears. But this means that every time we want to modify something inside a term, we need to open it!
Let's assume we are writing optimizing compiler. Job of optimizing compiler is to modify some small parts located inside of AST. With de Bruijn indices every time we want to change some small part of AST we will need first to call openT
repeatedly, reach needed part, change it, and then closeT
many times again. This is very artificial. And very slow. Our compiler will simply be too slow.
So, I still think that “any variable is always encoded same” is important property and that de Bruijn indices violate this property.
The other advantage of “any variable is always encoded same” is simple and natural equality function for subterms. Consider this term:
λ x. λ y. ((λ z. (x y)) (x y))
This term contains subterm x y
two times, yet they are encoded differently with de Bruijn indices. And moreover, we simply cannot write function, which will compare them for alpha-equivalence based on terms alone. Because to compare them we must consider not only terms themselves, but also their location! So, with de Bruijn indices we either need some complicated alpha equality function, which deals with locations, either we again need to call openT
many times! Let's again consider optimizer compiler use case. In optimizing compiler we often need to compare some subterms (located deeply inside of bigger term) to decide whether we need to apply some optimization. So, merely to determine whether we need an optimization, we need to call openT
many times! And if we call openT
N
times and our big term has size M
, then, as well as I understand, we need O(NM)
operations for openT
alone!
Next criteria: no need to came up with artificial numbering of variables. Consider this Rust source:
fn f(x: i32, y: i32) {
let z = 0;
let [t, u] = [0, 0];
match foo {
None => {},
Some(v) => println!(x), // <-- how to encode "x" here?
}
}
How to encode x
in println!(x)
with de Bruijn indices? To do this we need to specify how we assign order to all this bindings. Bindings can be introduced by various ways. In this example you can see these ways:
– As function argument
– As normal let
binding
– As destructive let
binding (let [t, u]
)
– As binding in match
(Some(v) =>
)
And we need to came up with methods for encoding all these using de Bruijn indices! So de Bruijn indices fail to satisfy this criteria. But naive approach and unique identifiers do satisfy it!
Next: ability to perform trivial copy of terms without restrictions. Programming languages, such as Haskell and Agda, have a feature: they allow us to copy values very easily. Implicit copying of values is so easy that we did not even think about it. Yet unrestricted copy of terms ruin one of methods, namely unique identifiers. Because if we copy terms with lambdas inside, then our unique identifiers become not so unique anymore. You may say: “So what? What problems this will cause?” Let me describe. Consider this term:
(λ y. y y) (λ x. λ z. x z)
Let's assume we use unique IDs. We will show these unique IDs in square brackets:
(λ y[2]. y[2] y[2]) (λ x[1]. λ z[3]. x[1] z[3])
Now let's do one beta reduction. Let's assume that our beta reduction algorithm is buggy: it blindly performs naive copy of terms. In this case we will get this:
(λ x[1]. λ z[3]. x[1] z[3]) (λ x[1]. λ z[3]. x[1] z[3])
As you can see our unique IDs are not so unique anymore: we have two different x[1]
variables. But what is wrong with this? Well, let's continue our beta reductions to understand where the problem lies. So, let's do another beta reduction:
λ z[3]. (λ x[1]. λ z[3]. x[1] z*[3]) z[3]
Now, ask yourself: what means that z*[3]
? (I marked it with *
to easily refer to it.) It is located inside two lambda expressions, which both bind z[3]
. Oops.
So, this means that if we use unique IDs, then we cannot copy our term using trivial copy (which is implicit in Haskell, Agda, etc). Instead we should use “smart” user-defined copy. I think this is one of the reasons why unique IDs are not popular among functional programmers. Yet there is no any problems in Rust, because Rust has support for affine types! Affine types are types, which are not copied implicitly! So, to implement this in Rust, we simply make our term non-Copy
type and implement user-defined clone
.
Update 16 May 2024. If we use de Bruijn indices, we have some problems with copy, too: if a term is located inside a bigger term, then we cannot simply copy it, we sometimes need to do openT
, as discussed above. Yet, if this is a standalone term, then we can copy it without problems, as opposed to unique IDs method.
Next criteria: no need for external method for generating IDs. “Unique IDs” method fails to satisfy this criteria: you need some method for generating IDs. I. e. counter or random number generator. In both cases you cannot generate IDs from pure computation in pure functional language, such as Haskell or Agda. You need to live inside of some kind of random number generator monad. I think this is another reason why functional programmers don't like unique IDs. Yet this is not problem in imperative languages, such as Rust.
Next criteria: basic operations have no more than linear complexity. As I mentioned above, de Bruijn indices fail to satisfy this criteria: if want to modify (or even inspect) subterm, located deeply inside another term, you need to make O(NM)
operations.
Now let me show you my comparison table.
“LoNa” – locally nameless de Bruijn indices
“Unique” – unique IDs
|
LoNa |
Unique |
ability to perform capture-avoiding substitution for lambda-terms in more or less sane way |
Yes |
Yes |
any variable is always encoded same |
No |
Yes |
no need to came up with artificial numbering of variables |
No |
Yes |
ability to perform trivial copy of terms without restrictions |
Yes |
No |
no need for external method for generating IDs |
Yes |
No |
basic operations have no more than linear complexity |
No |
Yes |
Okay, what we should choose? Well, as you probably guessed, I strongly believe one should always choose unique IDs, if they are creating practical program (compiler or interpreter, etc). No matter what is your implementation language (Rust, Haskell, Agda, etc), you should always choose unique IDs for practical programs. (Yes, unique IDs probably are not so good if you are developing metatheory.)
Why? Because unique IDs are easy to understand AND simple to implement AND fast. I described above, why I think so.
Unique IDs allow you to easily move subterm inside of larger term without need to re-encode it. Unique IDs allow you to inspect or modify subterm inside of larger term without need to repeatedly call openT
(which is both annoying AND very slow, O(NM)
slow). And unique IDs are easier to understand for humans than these strange de Bruijn indices.
Yes, in languages such as Haskell and Agda you need to be careful with copies, and you need to be inside of random number generator monad (or counter monad). So what? I think this is minor point. And you don't have these problems with monads and copies, if your implementation language is Rust.
Update 16 May 2024: Jesper's answer: https://agda.club/notice/AhvAH3SYkaU7vboRoO
This is my first post in this blog. Subscribe to this blog using button in the bottom of https://safinaskar.writeas.com/ or at Mastodon ( https://types.pl/@safinaskar ). I plan to write about formal methods, Rust and Linux.
You can reach me via email safinaskar@gmail.com
Discuss at:
https://lobste.rs/s/lbzg6m/this_is_why_you_should_never_use_de_bruijn
https://types.pl/@safinaskar/112442405322545975 (Mastodon)
https://www.reddit.com/r/rust/comments/1cs8iq3/this_is_why_you_should_never_use_de_bruijn/