Bitcoin Workings Explained, For Real: Part Two

August 13, 2017

Blockchain

The previous part of the series can be found here.

In the last post we designed a fake-proof way to create transactions with the ability to prove their authorship. We now have a way to check that every transaction in our transaction book, containing the complete transaction history, is real and created by the owner of the sending address.

However, what if someone deletes a transaction from the book? Remember, there’s no central place to get the book from; it’s all distributed. We get copies from other people using the book, and vice versa. The person sending the book to you has the opportunity to modify whatever’s in it before they send it. They could, for example, delete an old transaction where they paid someone money. If the money was not sent further, they would regain full ownership of that money according to that version of the book.

We need a way to quickly check whether the book has been tampered with. Fortunately, computer science has tools for that too. Introducing…

Exhibit B: Hash functions

Hash functions allow you to take any text you want and produce its unique “shortcut”, or “fingerprint”, called a hash. For example, one of the most common hash algorithms is called SHA-256, and the SHA-256 hash of the previous sentence is

483fe96f9d66da9096b8f606614e13ad8f81b4d4e5f550cf48a03bdff0c31287

You can check here.

Hash functions have a couple of very useful properties:

  • A hash will always have the same length, independent of the original text’s size.
  • For the same original text, a hash function will always produce the same hash.
  • It’s chaotic — changing even one letter in the original text will change the hash dramatically. Feel free to experiment.
  • A hash looks completely random. There are no patterns in it, and therefore it’s not possible to figure out any hint about the original text based on the hash. It’s also not possible to predict what a text’s hash will look like, except for just computing it and seeing for yourself.
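The properties above are easy to try out yourself. Here is a minimal sketch using Python's standard `hashlib` module; the helper name `sha256_hex` and the sample strings are made up for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of `text` as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Fixed length: 256 bits are always written as 64 hex characters,
# whether the input is two letters or twenty thousand.
short_hash = sha256_hex("hi")
long_hash = sha256_hex("hi" * 10_000)
print(len(short_hash), len(long_hash))

# Deterministic: the same text always gives the same hash.
print(sha256_hex("hello") == sha256_hex("hello"))

# Chaotic: change one letter and compare the two hashes position by position.
a = sha256_hex("hello")
b = sha256_hex("hellp")
differing = sum(1 for x, y in zip(a, b) if x != y)
print(a)
print(b)
print(f"{differing} of {len(a)} hex characters differ")
```

Running it with your own sentences is a quick way to convince yourself that there is no visible relationship between similar inputs and their hashes.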

One of the most widespread uses of hashes is to protect your passwords. Imagine you send over your email address and password to the website to register. The website has to remember this information to create an account for you, so it saves it in a database. From now on, whenever you send your password, it will check whether it matches the one stored in the database.

Sounds okay, right? Except that anyone with authorized access (which includes many members of the website’s development team), and anyone unauthorized who manages to find a way in, can just open the database and read potentially thousands of emails and associated passwords. Nearly everyone is also guilty of reusing passwords at some point, so this gives them access to thousands of email accounts, Facebook profiles, LinkedIn accounts… It’s a scary prospect.

Any website that wants to provide any kind of security to its users will not put your password in the database. It will instead compute a hash of your password and save that. Now whenever you log in, it will hash the password you typed and check whether the result matches the hash in the database. It works just as well, but now the database contains only hashes, and there’s no way to tell what the original password was other than trying all possible combinations of letters and numbers.
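As a rough sketch of that flow, assume a hypothetical in-memory "database" (the `users` dictionary) and made-up function names. It mirrors the plain hashing described above; real sites usually go further and use a salted, deliberately slow function such as `hashlib.pbkdf2_hmac`, which is left out here to keep the idea visible.

```python
import hashlib
import hmac

# Hypothetical stand-in for the website's database: email -> password hash.
users: dict = {}


def hash_password(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def register(email: str, password: str) -> None:
    # Only the hash is stored; the password itself is never saved.
    users[email] = hash_password(password)


def login(email: str, password: str) -> bool:
    stored = users.get(email)
    if stored is None:
        return False
    # Hash the attempt and compare hashes (compare_digest avoids timing leaks).
    return hmac.compare_digest(stored, hash_password(password))


register("alice@example.com", "correct horse battery staple")
print(login("alice@example.com", "correct horse battery staple"))  # True
print(login("alice@example.com", "password123"))                   # False
print(users)  # anyone reading the database sees only the hash
```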

This is why websites can’t remind you of your password if you forget it: they don’t know it themselves! If a site does send you your old password, you should instantly question the security of anything you do on that site. The same goes for sites that forbid certain characters in passwords. It wouldn’t matter for the hashing, so it’s quite likely they store the passwords unhashed. Also, hackers will try to crack the hashes anyway, so long, complex, unusual passwords are more secure: it takes more time to reach them when you’re trying candidate passwords one by one and checking whether their hashes match. Lastly, reusing passwords is a big open door inviting hackers, which I have personally fallen victim to already, so do use long random passwords, save them in a password manager like LastPass and enable two-factor authentication everywhere you can.

Another great use for hashes is verifying data integrity. You can hash anything, even 4K videos; after all, they are ultimately just very long strings of ones and zeroes. If even a single one turns into a zero, or vice versa, the hash will look completely different, which makes changes easy to spot. Sometimes, websites offering programs for download will also publish the program’s hash so you can double check that the file you have on your computer is really the one offered by the website, without any sneaky additional code injected.
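Checking a download against a published hash could look roughly like this. The function names are made up, and the "download" is a temporary file created on the spot so the sketch runs by itself.

```python
import hashlib
import os
import tempfile


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that even huge files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    return file_sha256(path) == published_hex.strip().lower()


# Demo: a temporary file stands in for the downloaded program.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"pretend this is an installer")
    demo_path = tmp.name

published = file_sha256(demo_path)  # what the website would publish
ok_before = matches_published_hash(demo_path, published)

with open(demo_path, "ab") as f:    # someone sneaks in one extra byte
    f.write(b"!")
ok_after = matches_published_hash(demo_path, published)

os.remove(demo_path)
print(ok_before, ok_after)  # True False
```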

Back to cryptocurrency. As you might remember, we wanted a way to quickly check whether someone tampered with our transaction history. Hashes are perfect for this, so how about we hash the whole thing? Turns out it doesn’t help us that much. New transactions are added to our transaction book all the time, and each time it happens, the hash changes. Nothing is stopping someone from modifying something in the history while also adding a new transaction. In this case we expect the hash to change because a new transaction was added, so the tampering would go largely unnoticed. If only there were a way to secure the history piece by piece…
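To see the problem concretely, here is a small sketch with a made-up three-line transaction book. The text format and the `book_hash` helper are illustrative assumptions, not how any real cryptocurrency stores its history.

```python
import hashlib


def book_hash(transactions: list) -> str:
    """Hash the whole transaction book as one piece of text."""
    return hashlib.sha256("\n".join(transactions).encode("utf-8")).hexdigest()


book = ["Alice pays Bob 5", "Bob pays Carol 2"]
old_hash = book_hash(book)

# An honest update: one new transaction appended.
honest = book + ["Carol pays Dave 1"]

# A dishonest update: the same new transaction, plus an old one quietly deleted.
tampered = ["Bob pays Carol 2", "Carol pays Dave 1"]

# Both hashes differ from the old one, and a changed hash is exactly what we
# expect after an update, so old_hash alone cannot tell the two cases apart.
print(book_hash(honest) != old_hash)    # True
print(book_hash(tampered) != old_hash)  # True
```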

The Blockchain emerges

We can solve this by adding transactions to the history in blocks instead of one by one. For this, we can no longer add our transactions by ourselves. We need to gather transactions from others and put them inside the block we want to add to the history. In effect, individual users don’t need to concern themselves with the process of adding transactions to the history. They just share their transactions with the network, and a group of volunteers retrieves them and arranges them into blocks to be added.

Since our transaction history is now divided into blocks, we can now hash each block separately. This allows us to add new transactions without changing the hashes of the existing history. What’s more, if we put the hash of the previous block inside each new block, we can make the new block dependent on its ancestor. If, for example, someone deletes a transaction from a block in history, the block will be different and its hash will change. To replace the original block, they will need to put this hash in the following block, which in turn will change its hash. This change will ripple all the way to the newest block which has nothing attached to it yet. In this case, the tampering is very clear — some blocks have their hashes changed and we can tell exactly which block has been tampered with first by investigating where in the chain the change first appeared. We can then confidently discard this history as fake.

So instead of this:

[Figure: Plain transaction history]

We now have this:

[Figure: Blockchain]

That’s where the name “blockchain” comes from: blocks forming one secured chain thanks to the use of hashes. Using a blockchain, we can preserve the integrity of old entries while reliably adding new ones.
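The chaining idea fits in a few lines of Python. This is only a toy sketch: the `Block` class, the JSON layout and the all-zero hash for the first block are assumptions made for illustration and do not match Bitcoin's real block format.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional

FIRST_BLOCK_PREV = "0" * 64  # placeholder: the first block has no ancestor


@dataclass
class Block:
    transactions: list
    prev_hash: str

    def hash(self) -> str:
        # The previous block's hash is part of what gets hashed.
        payload = json.dumps({"prev": self.prev_hash, "txs": self.transactions})
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: list, transactions: list) -> None:
    prev = chain[-1].hash() if chain else FIRST_BLOCK_PREV
    chain.append(Block(list(transactions), prev))


def first_broken_link(chain: list) -> Optional[int]:
    """Index of the first block whose stored prev_hash no longer matches
    the actual hash of the block before it, or None if all links hold."""
    for i in range(1, len(chain)):
        if chain[i].prev_hash != chain[i - 1].hash():
            return i
    return None


chain: list = []
append_block(chain, ["Alice pays Bob 5"])
append_block(chain, ["Bob pays Carol 2"])
append_block(chain, ["Carol pays Dave 1"])
honest_tip = chain[-1].hash()

# Tampering: delete the transaction in the oldest block.
chain[0].transactions.clear()
broken_at = first_broken_link(chain)
print(broken_at)  # 1: block 1 no longer points at what block 0 has become

# To hide that, the attacker must rewrite every later block's prev_hash...
for i in range(1, len(chain)):
    chain[i].prev_hash = chain[i - 1].hash()
relinked_ok = first_broken_link(chain) is None
# ...and the change ripples all the way to the newest block.
tip_changed = chain[-1].hash() != honest_tip
print(relinked_ok, tip_changed)  # True True
```

Anyone who remembers the hash of the newest honest block can therefore spot the rewritten history, no matter how old the deleted transaction was.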

Integrity is not all

We now have a pretty solid system: cryptographically signed transactions, and a chain of hashed blocks to make sure the history stays as it was. However, so far we’ve been quietly ignoring some issues arising from the fact that everything is decentralized and there is no decision-maker to resolve dilemmas. If our digital banking system gains traction, there will be lots of people wanting to add blocks with new transactions simultaneously. It’s very likely we’ll encounter a situation where two people manage to attach two different blocks to the same ancestor instead of one after the other, because the news that a new block had been attached didn’t reach the other person in time.

Let’s say one of them put your transaction into their block, but not the other. Is the transaction confirmed, then? Do you still have the money or not? Which block should be accepted as the official continuation of the chain? There’s no way to tell. The problem is even worse if more blocks get attached to both of those branches and we get even more splits.

To mitigate this, we can lay down a rule that the longest chain in the history is considered the correct one. That is, if we have a split situation, the branch that gets more blocks added to it should be considered the true history. But even though it’s a good guideline, people wanting to harm the system, or just plain trolls, can still add blocks to alternate branches and try to overtake the longest chain, aiming for a situation where after 100 blocks there’s suddenly another branch that’s 101 blocks long and the previously accepted 100 blocks are no longer true history. These kinds of shenanigans have no place in a reliable banking system and would severely undermine trust.
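The rule itself is trivial to write down, which is part of the problem. In this sketch, branches are simply lists of made-up block names and `pick_longest` is a hypothetical helper; the point is only to show how a rival branch that grows one block longer takes over.

```python
def pick_longest(branches: list) -> list:
    """Apply the rule: the branch with the most blocks is the true history.
    On a tie, max() just returns the first one, which settles nothing."""
    return max(branches, key=len)


common = ["block-0"]  # shared ancestor where the split happened
accepted = common + [f"A{i}" for i in range(1, 101)]  # 100 blocks after the split
rival = common + [f"B{i}" for i in range(1, 101)]     # a competing 100 blocks

# While the branches are equally long, the accepted branch is still chosen.
before = pick_longest([accepted, rival])
print(before is accepted)  # True

# The rival branch gets one more block and suddenly becomes "the truth".
rival.append("B101")
after = pick_longest([accepted, rival])
print(after is rival)  # True
```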

One more thing we need for our system to work is a reliable way to create one history that everyone agrees on, and to discourage attempts to game the system. This is what Bitcoin mining is all about, and that’s the topic for Part Three.

The next part of the series can be found here.
