BitForge and TSShock

Beau Rancourt
Matt Luongo

How we defended against two attacks on the cryptography underlying tBTC — quickly, quietly, and without drama.

TL;DR:

Recently, two attacks against the GG18 threshold ECDSA protocol and many of its implementations were publicly disclosed: BitForge, by the Fireblocks team, and TSShock, by Verichains.

tBTC remains safe.

We're happy to report that thanks to a responsible disclosure from Fireblocks and defensive coding by our development team, both vulnerabilities have been mitigated in tBTC since June, months before other projects across the space, and long before public disclosure. As part of this experience, we've decided to ship and maintain an alternative to Binance's tss-lib.

BitForge Disclosure

On May 10th, 2023, we received a second-hand vulnerability report from a friendly security researcher. The report purportedly originated with the Fireblocks team; while we reached out to them for confirmation, Dan Boneh at a16z verified its authenticity.

Understanding the Report

To understand how BitForge might impact tBTC, it helps to understand tBTC.

tBTC is a decentralized Bitcoin bridge that relies on randomly chosen client operators to manage bitcoin. To do that, operators need to be able to create shared bitcoin wallets and sign bitcoin transactions. Each wallet is controlled by a group of 100 operators, and producing a signature requires 51-of-100 consensus within that group.

The mathematics of creating signatures without any member learning how to sign on behalf of the whole group is complex. The scheme used in production in tBTC today is described in GG18, and the implementation the main client uses is Binance's tss-lib.

Among other building blocks, GG18 (and tss-lib) uses the Paillier cryptosystem for its homomorphic properties.[1] When setting up the Paillier cryptosystem, each participant generates two large prime numbers (p, q) as their private key, and then computes N = p • q as their public key.[2]
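To make the homomorphic property concrete, here is a minimal, insecure toy of Paillier in Python (3.9+ for math.lcm). It assumes the common simplification g = N + 1 and uses two well-known primes that are far too small for real use; it is an illustration of the textbook scheme, not tss-lib's implementation.

```python
import math
import secrets

def keygen(p: int, q: int):
    """Toy Paillier key generation: N = p * q is public, (lam, mu) is private."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    # With g = N + 1, L(g^lam mod N^2) reduces to lam mod N, so mu = lam^-1 mod N.
    mu = pow(lam, -1, n)
    return n, (lam, mu)

def encrypt(n: int, m: int) -> int:
    """c = (N + 1)^m * r^N mod N^2 for a random r coprime to N."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n: int, priv, c: int) -> int:
    """m = L(c^lam mod N^2) * mu mod N, where L(x) = (x - 1) // N."""
    lam, mu = priv
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Toy primes: well-known, but nowhere near the 1024-bit primes a real key uses.
N, priv = keygen(1_000_000_007, 998_244_353)
c1, c2 = encrypt(N, 20), encrypt(N, 22)
# Multiplying ciphertexts modulo N^2 adds the underlying plaintexts modulo N.
print(decrypt(N, priv, (c1 * c2) % (N * N)))  # 42
```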

The BitForge vulnerability revolves around what happens when a malicious signer uses a modulus N = small_prime_1 • small_prime_2 • ... • small_prime_16 • q, and none of the other participants check for it.
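For intuition, the sketch below builds a modulus with that shape using sympy. The sizes are assumptions chosen for illustration (sixteen consecutive primes just above 2^16, plus one large prime so that N is roughly 2048 bits), not the parameters of the actual exploit. It also shows that factors this tiny fall to simple trial division against the public N; the hard case, discussed below, is a single small factor near 2^100, which trial division cannot reach.

```python
from math import prod
from sympy import nextprime, primerange, randprime

# Illustrative sizes only (assumed): sixteen consecutive primes just above 2^16...
small_primes = []
p = 2**16
for _ in range(16):
    p = nextprime(p)
    small_primes.append(p)

# ...and one large prime, so that N is roughly 2048 bits in total.
q = randprime(2**1791, 2**1792)
N = prod(small_primes) * q

# Any participant could trial-divide the public N by every prime below 2^17.
found = [s for s in primerange(2, 2**17) if N % s == 0]
print(N.bit_length(), found == small_primes)
```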

In tBTC, a successful exploit of the BitForge vulnerability could exfiltrate key material, leading to loss of funds. To pull it off, an attacker would need:

  • A successful man-in-the-middle attack against signer communication — unlikely, considering our hardened network layer.
  • An existing, malicious T staker with knowledge of the vulnerability, and enough T staked to be chosen for a wallet.

Immediate Mitigation

Once we understood the vulnerability, there were two actions we could take immediately to mitigate the threat.

First, we paused wallet heartbeats. tBTC relies on periodic signatures from wallets ("heartbeats") to prove signer liveness. Because BitForge targets the signing protocol, and because tBTC did not yet support redemptions in May, disabling heartbeats meant the vulnerability couldn't be further exploited, even by a malicious signer already assigned to a BTC wallet.

Second, we worked with governance to disable new wallet creation. Because tBTC v2 launched with a beta staker restriction, we knew the chance of a malicious signer was low. Still, this move further ensured that a new signer with knowledge of the vulnerability couldn't slip into a new wallet.

Both of these measures were executed on May 11th, 2023. Without signing or DKG (distributed key generation), no code paths in tss-lib could be triggered on mainnet. At this point, less than 24 hours into incident response, we had what every security engineer hopes for in an active vulnerability mitigation: time.

We considered two plans of attack.

"The Big One"

We called the first mitigation plan under consideration "The Big One". As the name implies, it included drastic measures.

First, we'd need to implement an alternative to tss-lib and GG18 that didn't suffer from the BitForge vulnerability. We had already planned to ship support for Schnorr signatures as a performance upgrade.

If you aren't familiar with Schnorr signatures, think of them as a major upgrade to the ECDSA signature scheme widely used across Bitcoin and Ethereum today. Schnorr signatures greatly simplify threshold constructions without the additional assumptions of BLS signatures... but most importantly, they're supported on Bitcoin today!

This addition would mean a major version update to the keep-core P2P client powering tBTC — a difficult thing to do quietly. Fortunately, it was already on our roadmap, so shipping it wouldn't raise suspicion of an undisclosed vulnerability.

The final step in this plan would require rotating all BTC wallets in the system: doable, but slow, requiring a community-wide coordination effort.
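Schnorr's appeal for threshold signing comes down to linearity: a signature is s = k + e·x, so per-signer shares of the nonce k and the key x simply add up to a valid group signature. The toy below shows a 2-of-2 additive case in a tiny prime-order group (p = 2039, q = 1019, generator 4) with SHA-256 as the challenge hash. These parameters and the encoding are assumptions picked for readability; this is not BIP-340, and it deliberately omits the nonce-commitment rounds that FROST and ROAST add to make this safe in practice.

```python
import hashlib
import secrets

# Toy group: p = 2q + 1 with p, q prime; g = 4 generates the subgroup of order q.
P, Q, G = 2039, 1019, 4

def challenge(R: int, X: int, msg: bytes) -> int:
    data = f"{R}|{X}|".encode() + msg
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def verify(X: int, msg: bytes, R: int, s: int) -> bool:
    # Valid iff g^s == R * X^e (mod p).
    e = challenge(R, X, msg)
    return pow(G, s, P) == (R * pow(X, e, P)) % P

msg = b"example message"
# Each signer holds a key share x_i and picks a fresh nonce share k_i.
x1, x2 = secrets.randbelow(Q - 1) + 1, secrets.randbelow(Q - 1) + 1
k1, k2 = secrets.randbelow(Q - 1) + 1, secrets.randbelow(Q - 1) + 1

X = (pow(G, x1, P) * pow(G, x2, P)) % P  # group public key = g^(x1 + x2)
R = (pow(G, k1, P) * pow(G, k2, P)) % P  # group nonce = g^(k1 + k2)
e = challenge(R, X, msg)

# Partial signatures just add up to a signature under the group key.
s1 = (k1 + e * x1) % Q
s2 = (k2 + e * x2) % Q
s = (s1 + s2) % Q
print(verify(X, msg, R, s))  # True
```

Contrast this with threshold ECDSA, where the signature involves the inverse of the nonce and therefore needs machinery like Paillier-based multiplication to compute jointly.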

"Patching the Paillier"

The second mitigation plan, called "Patching the Paillier", would attempt to fix the issue in-place.

First, we would ship a client-side fix to prevent accidentally generated small primes. While this wouldn't defend against a maliciously built client, it would guard against extremely unlikely mistakes in honestly built clients.

Then, we would ship a full mitigation against BitForge, using the "no small factor" Paillier ZK proofs found in the CGGMP21 threshold ECDSA protocol.

After some discussion, it was clear this plan was the more conservative choice, and, as it turned out, we had already implemented most of the required proofs in an earlier research effort.

We began implementing the ZK proofs for a private client release.
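The client-side check from the first step of this plan might look something like the sketch below. The function name check_paillier_primes, the 1024-bit default, and the use of sympy.isprime are assumptions for illustration, not the code that shipped in tss-lib or keep-core. As noted above, it only protects honest clients from their own mistakes; a malicious client can simply skip it.

```python
from sympy import isprime

def check_paillier_primes(p: int, q: int, prime_bits: int = 1024) -> None:
    """Hypothetical sanity check on locally generated Paillier primes.

    Raises ValueError if either factor is the wrong size, not prime,
    or if the two factors are equal.
    """
    for name, f in (("p", p), ("q", q)):
        if f.bit_length() != prime_bits:
            raise ValueError(f"{name} has {f.bit_length()} bits, expected {prime_bits}")
        if not isprime(f):
            raise ValueError(f"{name} is not prime")
    if p == q:
        raise ValueError("p and q must be distinct")

# Toy usage with two 30-bit primes; real keys would use prime_bits=1024.
check_paillier_primes(1_000_000_007, 998_244_353, prime_bits=30)
```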

Proving Operator Honesty

While implementing the full patch, we also set out to confirm our findings so far.

We knew that no new operator could attack tBTC... but what about existing operators? If the details of the BitForge vulnerability were leaked, could operators put funds at risk?

At the time of the disclosure, tBTC had 4 wallets, each of which had 100 members. If we could prove there were no small factors used when those wallets were generated, beyond a reasonable doubt, we could avoid rotating wallets, and be that much more confident in the safety of the system.

Each member of each wallet has its own Paillier modulus, for 400 moduli in total. Our approach was brute force: try to factor every one of them.

The original disclosure was not specific about how the attack worked, but did write:

The origin of the vulnerability is that the parties do not check if the Paillier modulus, denoted N, has small factors (a prime of size less than 2^100 is considered small) or that it is a biprime (so it is the product of exactly two primes)

The hardest-to-detect version of the attack relevant to tBTC is where an attacker uses a small factor on the order of 2^100 and another on the order of 2^1948. The more factors the modulus has, the easier it is to factor it. The smaller any factor is, the easier it is to factor it. We prepared for the worst and hoped for the best!
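Why does the size of the smallest factor matter so much? Pollard's rho method, one classic algorithm, heuristically finds a prime factor p after roughly sqrt(p) steps, so its cost depends on the smallest factor rather than on the size of N. The sketch below is a textbook version for illustration; we are not claiming it is the method sympy's factorint spent its time on during our runs.

```python
import math

def pollard_rho(n: int, c: int = 1):
    """Floyd-cycle Pollard's rho with f(x) = x^2 + c mod n.

    Returns (factor, steps); factor is None if this choice of c failed,
    in which case you would retry with a different c.
    """
    if n % 2 == 0:
        return 2, 0
    x = y = 2
    d, steps = 1, 0
    while d == 1:
        x = (x * x + c) % n  # tortoise: one step
        y = (y * y + c) % n  # hare: two steps
        y = (y * y + c) % n
        d = math.gcd(abs(x - y), n)
        steps += 1
    return (d if d != n else None), steps

print(pollard_rho(8051))  # (97, 3), since 8051 = 83 * 97
```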

First, we crafted a set of malicious keys. The Python library libnum makes this easy:

import libnum

bits = 100
total_bits = 2048

q = libnum.generate_prime(bits)
p = libnum.generate_prime(total_bits - bits)
N = p * q
print(p, q, N)

This generates a random, roughly 2048-bit modulus with one small factor (~2^100) and one large factor (~2^1948).

Then, we benchmarked how long these take to factor using sympy.ntheory.factorint:

import time
from sympy.ntheory import factorint

start = time.time()
factors = factorint(N)
end = time.time()
print(factors, str(end - start))

All 64 test moduli factored! Here are the timing stats, in seconds:

Statistic   Time (s)
count       64
min         4016.88
q1          35906.3
median      65896.9
q3          123775
max         453262
mean        105729
stddev      107148
stderr      13393.6

By May 12th, 2023, two days into our active mitigation, we had used our laptops to confirm that none of the moduli had prime factors under 50 bits... and prepared for a longer factoring process in the cloud.

Then, we spun up eight 48-core CPU-Optimized Digital Ocean Droplets and one 16-core Droplet, for a total of 400 cores. factorint runs single-threaded (and was the fastest of the factoring libraries we found), so each machine was given one modulus to factor per core.

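import json
import sys
import time

from sympy.ntheory import factorint

# Usage: python3 factor.py <hex-encoded modulus> <output name>
N = int(sys.argv[1], 16)

start = time.time()
factors = factorint(N)
end = time.time()

# Write the factorization (as JSON) and the elapsed seconds to <output name>.txt
with open(f'{sys.argv[2]}.txt', "w") as file:
    file.writelines([json.dumps(factors), "\n", str(end - start)])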

We distributed the moduli among the machines and then started factoring.

python3 factor.py <modulus1> 1 & disown
python3 factor.py <modulus2> 2 & disown
...
python3 factor.py <modulus48> 48 & disown

We double-checked that every core on each machine was running at 100% utilization, then checked in periodically to make sure nothing had crashed.

We ran the machines from 2023-05-15T16:01Z until 2023-05-22T19:21Z, which is 7 days, 3 hours, and 20 minutes, or 616,800 seconds in total. None of the 400 processes found any factors.

This runtime is ~4.8 standard deviations above the mean benchmark time and 36% longer than the longest benchmark run, which gives us confidence that we attempted factoring for long enough.
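These figures can be re-derived from numbers quoted in this post. The quick check below recomputes the run duration, how far it sits above the benchmark mean, and its ratio to the longest benchmark; the standard error (stddev / sqrt(count)) matches the table's 13,393.6 up to rounding.

```python
import math
from datetime import datetime, timezone

# Benchmark summary from the table above (seconds, 64 samples).
count, mean, stddev, longest = 64, 105_729, 107_148, 453_262

# Standard error of the mean = stddev / sqrt(count).
stderr = stddev / math.sqrt(count)

# Wall-clock duration of the cloud factoring run.
start = datetime(2023, 5, 15, 16, 1, tzinfo=timezone.utc)
end = datetime(2023, 5, 22, 19, 21, tzinfo=timezone.utc)
run = (end - start).total_seconds()

z = (run - mean) / stddev  # standard deviations above the benchmark mean
ratio = run / longest      # relative to the longest benchmark run

print(stderr, run, round(z, 2), round(ratio, 2))  # 13393.5 616800.0 4.77 1.36
```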

In order for a small factor to be present, the attacker would need to have:

  • Discovered the attack well before the security researchers.
  • Modified the client before the wallet was formed. The most recent wallet was registered on 2023-04-19.
  • Somehow avoided having their small factor detected by the above algorithm.

With all of this together, we knew that none of the keys were at risk from this potential attack.

A Full Patch

The CGGMP21 paper also uses the Paillier cryptosystem. The difference is that during key generation, the protocol makes each participant prove in zero knowledge that:

  • Their modulus contains no small factors (p. 66).
  • Their modulus is a Paillier-Blum modulus (p. 36).
  • Their Ring-Pedersen parameters are well-formed (p. 37).
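For reference, the statement behind the Paillier-Blum proof can be written down directly. As we read CGGMP21, N is a Paillier-Blum modulus when N = p·q for primes p ≡ q ≡ 3 (mod 4) and gcd(N, φ(N)) = 1. The hypothetical helper below checks that statement given p and q, which only the prover knows; the zero-knowledge proof is what convinces the other participants of it without revealing p and q.

```python
from math import gcd
from sympy import isprime

def is_paillier_blum(p: int, q: int) -> bool:
    """Check the Paillier-Blum conditions for N = p * q (prover-side only)."""
    if p == q or not (isprime(p) and isprime(q)):
        return False
    if p % 4 != 3 or q % 4 != 3:
        return False
    n, phi = p * q, (p - 1) * (q - 1)
    return gcd(n, phi) == 1

print(is_paillier_blum(1_000_000_007, 2_147_483_647))  # True: both are 3 mod 4
print(is_paillier_blum(1_000_000_007, 998_244_353))    # False: 998244353 is 1 mod 4
```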

By May 16th, 2023, we had a prototype of all proofs, and began internal review.

On June 5th, we created a private fork, then quietly built and shipped the client binary using a private, custom version of tss-lib that performs the above proofs.

Because we were already planning a major client release to enable a new feature, sweeping, we were able to quietly include the patch — without disclosing the vulnerability or tipping off any would-be hackers.

On June 6th, Trail of Bits began verifying our patch, and on June 7th, we had remediated all findings. [3]

TSShock Disclosure

On July 14th, we received another report of a critical issue, this time from the Verichains team via ImmuneFi. Most concerning, the team included a proof-of-concept that they claimed demonstrated key exfiltration from the keep-core client.

As we dug in, however, we realized that the Verichains team's PoC was based on an older version of the client, v2.0.0-m3 — not the private client release beta stakers had been using.

By July 18th, we confirmed that the privately patched client from the BitForge mitigation also made tBTC safe against the new vulnerability, dubbed TSShock, and that the attack wasn't possible on mainnet.

How? While working on the BitForge mitigation, we discovered the hash collision issue behind TSShock, which had already been fixed publicly, and pulled that fix into the privately patched client. We appreciated the disclosure from Verichains, and are pleased with the outcome.
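The general failure mode behind this class of bug is ambiguous encoding: if a hash joins several inputs with a delimiter but no length information, different input tuples can produce the same byte string, and therefore the same hash. The sketch below illustrates the idea with SHA-256 and a "$" delimiter; it is a simplified illustration of the technique, not tss-lib's actual hashing code. Prefixing each input with its length removes the ambiguity.

```python
import hashlib

def naive_hash(*parts: bytes) -> bytes:
    # Ambiguous: only a delimiter separates the parts.
    return hashlib.sha256(b"$".join(parts)).digest()

def length_prefixed_hash(*parts: bytes) -> bytes:
    # Unambiguous: encode the number of parts, then each part's length and bytes.
    h = hashlib.sha256()
    h.update(len(parts).to_bytes(8, "big"))
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()

a, b = (b"ab$c", b"d"), (b"ab", b"c$d")
print(naive_hash(*a) == naive_hash(*b))                      # True: both hash b"ab$c$d"
print(length_prefixed_hash(*a) == length_prefixed_hash(*b))  # False
```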

Takeaways

Responsible disclosure is difficult

As always, one of the most difficult parts of responsible disclosure is finding the right person to talk to.

Here, we were lucky to receive the BitForge disclosure so early: multiple intermediaries were involved before we were in touch with the Fireblocks team.

It's not clear how we could've done better here... the relevant repositories have a published disclosure process, we have an ImmuneFi bounty program, we respond quickly to emails via security@threshold.network, and we follow the security.txt standard.

On the other hand, we did have trouble verifying that the issue ThorChain suffered on August 16th was, in fact, TSShock, and not a new vulnerability. I wasn't able to get that confirmed, despite reaching out on Twitter, via shared Telegram contacts, and elsewhere.

We need to find a better way across the industry to coordinate, especially between projects with critical shared dependencies like tss-lib.

Where there's smoke, there's fire.

The Binance tss-lib has had a number of issues over the years. It's not updated frequently, the test coverage isn't great, and most code improvements over the past few years have been in response to disclosures.

We know, because we're some of the more frequent contributors ourselves.

After BitForge and TSShock, we're no longer confident in the upstream tss-lib repository, including its mitigation of these issues.

Instead, we recommend that all tss-lib users consider using our fork, which is now open source.

Future Work

On top of the issues with tss-lib... the GG18 family of threshold protocols has also had a number of vulnerabilities over the years. And while all cryptography needs to stand the test of time, where there's smoke, there's often fire, and if we can avoid risk, we should.

Many projects need threshold ECDSA, and for those, we recommend upgrading to CGGMP21.

Luckily, we have a stronger alternative. On November 14th, 2021, Bitcoin activated a new signature scheme: Schnorr. Since April, our plan has been to migrate tBTC to Schnorr signatures, using ROAST/FROST and GJKR (RFC 10) to avoid the complexities of threshold ECDSA implementations. The team has agreed to recommend that the beta staker program remain in place until the migration to Schnorr takes place.

Expect to hear more about this effort later this year!

Acknowledgements

Software security is a big job, and mitigating these vulnerabilities relied on a number of people.

We'd like to thank...

  • Ari and the team at Fireblocks for their findings.
  • MacLane Wilkison, Tux, and Dan Boneh for their help verifying the disclosure.
  • Promethea Rashke for her work on porting the proofs from CGGMP21 to tss-lib, Beau Rancourt for his work factoring existing Paillier moduli, and Jakub Nowakowski, Piotr Dyraga, and Lukasz Zimnoc for their work coordinating our response and reviewing all code.
  • Tjaden Hess and Trail of Bits for their assistance and feedback verifying our mitigation.
  • The Threshold DAO for their incredible responsiveness.

Timeline

For those interested, here's a more detailed timeline. Note that all times are UTC.

  • 2023-05-10 9:19 PM: Receive the original vulnerability disclosure.
  • 2023-05-10 9:53 PM: Distribute work: Kuba fetches the existing Paillier moduli so we can start factoring them. Promethea begins patching tss-lib. Beau begins writing factoring code and benchmarks. Piotr prepares a transaction for tBTC governance to pause new wallet creation.
  • 2023-05-11 12:06 AM: Kuba extracts and shares the existing 400 moduli.
  • 2023-05-11 1:08 PM: Piotr hands the transaction to pause new wallet creation over to governance.
  • 2023-05-11 7:45 PM: Promethea implements the rough draft of the necessary proofs in a private fork of tss-lib.
  • 2023-05-12 4:35 PM: Beau successfully generates a properly sized biprime and starts trying to factor it.
  • 2023-05-12 7:15 PM: Beau uses easier-to-factor bi-primes to benchmark different factoring libraries and get a handle on how performance scales. primefac performs the best.
  • 2023-05-15 12:12 PM: tBTC governance disables new wallet generation.
  • 2023-05-15 4:36 PM: Beau starts factoring all 400 moduli using CPU optimized Digital Ocean droplets.
  • 2023-05-17 2:49 PM: Promethea adjusts the tss-lib patch to be fully backwards compatible with the keep clients.
  • 2023-05-19 1:58 PM: Promethea identifies potential hash collisions and adds in a fix.
  • 2023-05-22 7:21 PM: Beau shuts down the factoring machines. No factors found.
  • 2023-05-25 5:01 PM: Piotr schedules a Trail of Bits audit for the changes and creates a new security advisory fork of tss-lib to facilitate the audit.
  • 2023-06-05 2:25 PM: Piotr quietly inserts the changes into keep-core release v2.0.0-m3.
  • 2023-06-06 9:42 PM: tjade273 (Trail of Bits Auditor) finishes their first pass of the tss-lib security patch.
  • 2023-06-07 2:51 PM: Promethea identifies that Ntilde (another RSA-style modulus, used in tss-lib's range proofs) is also vulnerable, and that the fix also needs to be applied to the resharing protocol (we don't use resharing; we fixed it anyway).
  • 2023-06-13 2:34 PM: Promethea finishes the resharing and Ntilde fix.
  • 2023-06-29 10:49 PM: tjade273 finishes auditing the resharing and Ntilde fix.
  • 2023-07-14 10:11 AM: Discover via ImmuneFi that Verichains found an exploit involving hash collisions that Promethea had already fixed.
  • 2023-07-18 3:04 PM: Kuba confirms the Verichains PoC doesn't impact the privately patched keep-core client.
  • 2023-07-20 11:34 AM: Ship keep-core v2.0.0-m4, which includes the Ntilde and resharing fix.
  • 2023-08-16 1:46 PM: Another project tweets about a new vulnerability (it's the same one). We pause releasing this disclosure to investigate.

[1]: A homomorphic encryption scheme is one where you can perform arithmetic on ciphertexts and still decrypt the result correctly later. For Paillier, decrypt(encrypt(message1) • encrypt(message2)) == message1 + message2, where the ciphertexts are multiplied modulo N^2 and the sum is taken modulo N.

[2]: It is computationally easy to verify that N = p • q, but difficult to figure out what p and q are given just N. See Integer Factorization. The largest such number factored was RSA-250, a number with 250 decimal digits, in 2020. The total computation time was roughly 2700 core-years.

[3]: The pull request review comments were lost when the security disclosure was merged. We're working with GitHub to restore them. In the meantime, here is a screenshot:

And here is the Remediation Summary put together by Trail of Bits.

Appendix

Malicious Ns for Benchmarking

https://gist.github.com/beaurancourt/c26f437baa62ebda45d069998273b053

Benchmark Results

Measured in seconds, in order of index.

Vulnerability Benchmark Timing Results