Recently, the creator of the bitcoin-based smart contract language sCrypt, Xiaohui Liu, published an article titled "Turing Machines on Bitcoin" where he builds a Turing machine that checks for balanced parentheses. It was well-received within the Script developer community.
When his article was shared outside this community with people who are unfamiliar with Liu's previous work, like on Hacker News, it was mostly misunderstood: common misconceptions were that bitcoin was merely used as a database or that some loop unrolling present in the sCrypt code would constitute the proof. On the other hand, there were also people who overestimate the implications of the proof (in the notion that bitcoin competes with cloud computing).
The purpose of this article is to give some background information for understanding Liu's proof. The article is specifically targeted to people which are not familiar with the preliminary work that went into the proof. Mathematicians and computer scientists may forgive me that a simple explanation is preferred to a precise use of terminology here.
The what and the why
Upfront, it should be clarified that when it says that "bitcoin is Turing complete", this is a simplified way of saying that bitcoin's contract system is Turing complete. Similarly, Ethereum was introduced as a Turing complete blockchain which means that the contract system is Turing complete.
For non-programmers, arguing about whether something is Turing complete or not might appear to be just some theoretical discussion with no practical implications. To some extend that is correct, but in the context of bitcoin contracts the implications are profound; just look at what Ethereum is doing.
From a historical perspective, the discovery that bitcoin's contract system is Turing complete is very interesting: A large percentage of cryptocurrencies was created with the goal of "fixing" perceived issues with the bitcoin protocol. One of the most, if not the most, well-known perceived issues being the lack of Turing completeness as also stated in Ethereum's whitepaper.
Whoever has studied bitcoin's way of moving funds is aware of its UTXO model. It works quite different from the account model where you have a balance and incoming/outgoing transactions. Instead, the UTXO model tracks coins, the coins are moved via transactions: Each transaction contains a list of references to to-be-spent coins (inputs), it merges the value of all these coins and specifies a list of destinations for new coins (outputs).
For simplicity in explanation, we assume only transactions with exactly one input and one output in the following.
Users of bitcoin wallets are accustomed to sending coins to addresses but, under the hood, it gets converted by the wallet software to a small program called Script (the template for that small program is called P2PKH). Script is bitcoin's native and low-level way of writing contracts that was designed even before the genesis block. A script contract describes how the coin is to be spend. The fact that bitcoin has script and the protocol doesn't actually know about addresses might be surprising to many since - for most of its lifetime - more advanced scripts haven't really been in production due to artificial limits in the software.
To spend or not to spend, that is the question
"The script is actually a predicate", Satoshi wrote. A predicate in mathematics is basically a function that takes an input and returns either "yes" or "no" (true/false). One of the simplest examples of a predicate is one that determines whether a natural number is even: As input it takes a natural number and it returns "yes" if the remainder of division with two is zero, "no" otherwise.
A script predicate is concerned about the question of whether a coin is allowed to be spent or not. Each output (coin) in a transaction has, besides its value, an associated script predicate that determines for that and only that coin whether it may be spend. The script is run by bitcoin miners after they checked that the to-be-spent coin has not yet been spent (a coin can only be spent once, obviously). A simple analogy is to think of the script predicate as a lock. The lock can be unlocked using a key which is provided by the input of the spending transaction.
To understand the later proof, we treat the script predicate as having two parameters and not just one: The first being the spending transaction and the second a stack of arguments provided by the input. The latter one corresponds to the key in the analogy from above. The stack of arguments is set up by another script which resides in the spending transaction's input. Albeit also being called script, it's not a predicate but rather some instruction that defines the stack.
Coming back to the most used template for script predicates, a P2PKH predicate returns "yes" if and only if the following conditions are met: The top stack item is a valid public key, the second to top stack item is a valid signature for given public key and transaction, and the hash of the public key matches the hash hardcoded in the predicate (the hardcoded hash of the public key is essentially the address base58-decoded and without the checksum).
Note that while a predicate still computes values internally, it is different from a calculator where you input some numbers and it outputs the result of the calculation: In bitcoin's predicate system, only the fact that some input to a predicate is valid is recorded. The implications of the predicate system are that one cannot offload computation to bitcoin miners, i.e. bitcoin script does not replace a cloud computer.
Do you speak my language?
It is, however, wrong to assume that just because no number or string is output by a script predicate, assertions about how computationally powerful the contracts are cannot be made. Some result is in fact recorded which is that the predicate returned "yes" for a given input.
In theoretical computer science, what is done to argue about the computational powerfulness of some machine that accepts some input (in this case the predicate is a machine) is to look at the set of inputs that the machine accepts. If an encoding for the inputs is provided, the set of inputs is called a formal language. What's important is that these sets can be classified into different categories based on some structural properties. The most important classification scheme is the Chomsky hierarchy.
In the hierarchy, the formal languages are nested: The set of less powerful languages is contained in the set of more powerful languages. To provide an example: For any regular language, the least powerful one in the hierarchy, a plain old regular expression exists that accepts that language. Since the set of regular languages is nested in the set of turing-recognizable (or recursively enumerable) languages it can be concluded that any regular expression can be matched using a Turing complete machine. Also, note that matching a regular expression is also a simple yes-no question, yet it's possible to argue about how computationally powerful (or not) regular expressions are. Try to match balanced parentheses using (plain old) regular expressions!
This leads to the question where bitcoin script fits in. An important property of script predicates is that they are decidable, that is for any input the answer is always a definite "yes" or a definite "no" which makes sense because uncertainty could split the network. It also implies that a script predicate always terminates. The proof that a script predicate is decidable is very simple to follow: A single script predicate in bitcoin has an upper bound on execution since there are no loops, there is a finite count of opcodes in a script and every opcode is guaranteed to halt.
The languages that corresponds to Turing completeness, called turing-recognizable languages, go beyond decidable questions. Therefore, a script predicate is not Turing complete. The subset of turing-recognizable languages called turing-decidable languages (not part of Chomsky) is only concerned about the decidable questions. Since script predicates are always decidable that puts them at most in the latter category.
One important property that's missing from script in order to qualify as Turing complete is that it must also be able to answer semi-decidable questions. These are questions for which only a definite "yes" can be given but no "no". This might not sound intuitive at first but consider the following example from Conway's Game of Life: Given an unbounded initial board (or configuration), can the question of whether the board will ever be in a stable state, i.e. the next generation looks the same as the previous one, be answered for any given input? The only way to answer that question for any input is to actually run the game; if it moves into a stable state, "yes" is returned. But if it keeps generating and generating, it might still end up in a stable state in the future. It might as well keep running to all eternity. Alan Turing proved in 1936 that no universal algorithm exists that can give a definite "no" for any input (note: this is the halting problem).
Back to the claim
Now, that we came to the conclusion that a single script predicate is not Turing complete, the question arises how Turing completeness can still be achieved in bitcoin's contract system. That was the original claim.
Note how "script predicate" has been prefixed with the word "single". A contract for any turing-recognizable language can still be constructed with multiple script predicates. In such setup, the actual spending of coins is implemented in terms of multiple transactions where in the final transaction the possessor of the coin changes.
Proving that something is at least as powerful as something else is more involved than proving the opposite. For a script predicate it was easy to disprove that it's Turing complete because it cannot implement semi-decidable questions as explained with the Game of Life counterexample. Proving that multiple script predicates make the contract system Turing complete means that the validity must be proven for all cases. One way to construct such proof is to simulate the something else in something.
For proving Turing completeness in our case, this means simulating some Turing complete machine in bitcoin's contract system. The most well-known one is a Turing machine which is used for the proof. In order to understand the proof, let us first recap on the building blocks of a Turing machine (simplified):
- An unbounded tape with cells for storage (unbounded should not be confused with infinite)
- A head that points to the current cell; it can be moved one cell to the left or one cell to the right or not at all in each step
- A finite number of states and a register that holds the current state; one or more accepting states exist for which the Turing machine halts
- A finite number of transitions (transition table) that defines what action the Turing machine performs given the current state and cell value: which state to go into next, which value to write into the current cell and which direction the head should be moved to
The Turing machine repeatedly, step by step, performs a lookup in the transition table for the current state and cell value, followed by applying the transition. If an accepting state is reached, the machine halts. The whole state consisting of the tape, the head's position and the current state of the Turing machine is called the configuration.
The bitcoin machine in theory
Liu's Turing machine works by recording each and every step in a separate transaction, thereby forming a chain of transactions. In each transaction, the current configuration is included (read: serialized), that is the tape, the position of the head and the current state. What a program (e.g. wallet software) does to spend a coin locked by a Turing machine is that it computes the next configuration, builds the transaction with the next configuration included and moves the coin there; this is repeated until an accepting state is reached when the program can move the coin anywhere.
One key element yet missing is that nothing forces the program to actually follow this procedure since it could just freely spend the coin to wherever it wants to. This gap is filled with script predicates which verify that the next transaction does indeed include the next valid configuration. This is possible since, as we have seen earlier, one input to the script predicate is the spending transaction. On top of that, a script predicate has to ensure that the spending transaction inherits the script predicate itself so that the next output performs the same check. Since the script predicate has access to the spending transaction and, therefore, the next script predicate this is doable. Essentially, we end up with some kind of recursive enforcement of execution or self-modifying code.
It might also become clear now why it is important to mention that the tape is unbounded, i.e. finite at any step but can grow on demand. This makes the question of whether one Turing machine configuration is the direct next one from another configuration decidable. An infinite tape would make that question semi-decidable since only a definite "no" can be given when comparing the tapes (halt when one cell is different) but no "yes" because it's impossible to check two infinite tapes for equality in finite time. If the tape was infinite, a script predicate couldn't answer the question since it's decidable - as shown before.
The actual implementation
Developers who have used bitcoin script before might have wondered why it was assumed all the time that a script predicate has the spending transaction as an input. The spending transaction might surely be passed to the script interpreter such that it can perform the signature checking operations like
OP_CHECKSIG (and the deprecated
OP_CHECKSEQUENCEVERIFY) but in script itself there is no way to have access to the spending transaction. After all, there was even a proposal to add an opcode that adds requirements on the spending transaction's outputs.
However, it turns out that it was always possible with the original bitcoin opcodes using a technique called OP_PUSH_TX. The "OP_PUSH_TX" is purposefully not formatted in monospace because it should not give the impression that it's an opcode. Personally, I think it's a pretty bad name because neither is it an opcode nor does it push the spending transaction to the stack. It's essentially a macro that requires the spending transaction (for the experts: the sighash preimage) to reside on the top of the stack and it checks that it's indeed the spending transaction. Liu has an article that explains the technique in depth.
With that being clarified, only the script predicate is left to be written. Liu published commented code in the higher-level language sCrypt, to summarise what it does:
- First, it deserialized the current configuration of the Turing machine
- Second, it looks up the suitable transition for the current configuration
- Third, it applies the suitable transition
- Fourth, it verifies that the spending transaction includes exactly this configuration as well as the currently executing script code itself
The last point does not apply if the next configuration would be in an accepting state (the machine halts). In that case the coin may be spent anywhere.
We can now see how this Turing machine deals with semi-decidable question which a single script predicate could not deal with. The Turing machine runs as long as somebody's willing to pay for the transaction fees. It might continue running and never ever end up in an accepting state but it never clogs the network because the Turing machine is processed transaction-by-transaction. Of course, realistically the program that spends the contract would check offchain before that the Turing machine does indeed halt for a maximum limit on fee (this can be compared to Ethereum's gas).
Furthermore, we can also see that the program that spends the contract must be at least as computationally powerful as the contract system itself. In other words, the program must be Turing complete as well. A misconception is that this dependency makes the contract system not Turing complete. Remember, the Turing machine described by Liu allows to construct a contract for any turing-recognizable language that only releases the coin if and only if a valid word of that language is provided. Therefore, the contract system is by definition Turing complete.
Some final and side notes:
- The Turing machine presented by Liu only assists in proving that bitcoin's contract system is Turing complete. In almost all practical cases, you'll find that there is a much simpler and more compact solution for your problem.
- The Turing machine works only within the bounds of bitcoin. The tape, for example, can not exceed the size of 4GiB (which is plenty) in the current transaction format. Strictly speaking, your computer is also only Turing complete within certain bounds (RAM and storage). But with that perspective nothing in this universe is Turing complete.
- OP_PUSH_TX relies on strong collision resistance of double SHA256.
- The example Turing machine in Liu's proof checks for balanced parentheses. This is actually a deterministic context-free problem, so a Turing machine is overkill.
- The way of writing these Turing complete contracts in script or sCrypt require a bit of rethinking as they work with recursively putting enforcements on the spending transaction. In the future there might be an even higher-level language than sCrypt or some kind of framework that simplifies the development process.