Hardware wallet signing speeds and limitations

Introduction

As a member of the technical concierge team at Unchained, each day I work with clients to increase their understanding of bitcoin, multisignature wallets, and holding their own keys securely. In particular, I’ve taken on the role of helping clients think about UTXO management strategies, a more advanced yet important aspect of self custody. In 2022, I wrote a trilogy of articles to explain what UTXOs are, as well as why managing them can have a significant effect on someone’s future transaction costs and privacy.

However, after researching and writing these articles, I still had some unsatisfied curiosities. Specifically, I wanted to know more about the abilities of the different hardware wallets supported by Unchained, in relation to signing large transactions. Was there a definite number of UTXOs which would cause a device to fail signing? How do the various hardware wallet models compare in terms of limitations and signing speeds? To better serve our clients, I was interested in being able to predict the outcome of a signing attempt ahead of time, and finding answers to these questions was crucial to that goal.

While working with clients I noticed that sometimes a hardware wallet would be able to quickly and easily sign a transaction moving dozens of UTXOs, while other times a device of the same model would appear to struggle while signing a transaction with a much lower UTXO count. I began to investigate what other factors might be involved, but there seemed to be very little information on the internet addressing this subject. As a result, I consulted some of our engineers and embarked on a series of experiments. It led to some interesting discoveries and conclusions, which are shared across two articles, this being the second. The first can be found here.

Testing context and methodology

First, let’s discuss some details of my testing process, and its goals. At the time of writing, Unchained supports the Trezor Model One, Trezor Model T, Ledger Nano S, Ledger Nano S+, Ledger Nano X, Coldcard Mk3, and Coldcard Mk4. My primary objective for this project was to test all of these hardware wallets in a wide variety of signing situations. I focused on UTXO count and UTXO signing mass (as explained in the earlier article) as the primary factors, and attempted to keep all other potentially relevant variables constant. For any particularly scrupulous scientists among this article’s audience, the tests were performed at a normal room temperature and pressure, during a bitcoin bear market.

Devices and address types

I used exactly one device to represent each model of hardware wallet. Each device was running recent firmware at the time of testing. The specific versions were:

  • Trezor Model One 1.11.2 (universal)

  • Trezor Model T 2.5.2 (bitcoin-only)

  • Ledger Nano S 2.0.0 (bitcoin app: 2.0.6)

  • Ledger Nano S+ 1.0.4 (bitcoin app: 2.0.6)

  • Ledger Nano X 2.0.2 (bitcoin app: 2.0.6)

  • Coldcard Mk3 4.1.6

  • Coldcard Mk4 5.0.2

To simulate various signing situations I generated different numbers of deposits into 2-of-3 multisignature wallets representing Unchained vaults. For technical readers: this was done in a regtest environment, and by using certain scripts I was also able to control the signing mass for the UTXOs deposited. To be specific, I could send bitcoin from a P2PKH address to any number of other P2PKH addresses, with one output also going into a P2SH address for my multisig wallet (by leaving out segwit addresses, the results should be extra conservative). The quantity of bitcoin used for each deposit varied, but should have no impact on the results of the tests. Each deposit came from a transaction with a small number of inputs; almost always just a single input.

Once I had the UTXOs I wanted, I used them to create the transactions that I would actually attempt to sign with my hardware wallets and observe the results. While the number of inputs chosen was a changing variable for the experiments, the number of outputs was always exactly two (simulating a change address). Both destination addresses were 2-of-3 P2SH and each of them was sent arbitrary amounts of bitcoin. The fee rate I chose for each transaction was almost always 3 sat/vb, although there may have been some variance that should be inconsequential to the results.

Signing phases and time measurement

In order to interpret the data we will arrive at momentarily, it is important to understand that when a hardware wallet is tasked with signing a transaction, the process occurs over three separate phases. I will refer to the first phase as pre-confirmation, the second phase as user-approval, and the final phase as post-confirmation.

The pre-confirmation phase begins when the device is fed a transaction and asked to sign. After taking some time to load, the device presents the user with the details of the transaction, and the pre-confirmation is complete. I recorded the pre-confirmation time for all of my tests. 

The user-approval phase begins once the transaction information is displayed on the device screen. The user is given an opportunity to double-check the information and either confirm it or cancel. While the device waits for a selection to be made, it does not progress any further with the signing process. This means I did not have to measure the duration of the user-approval stage, and could take as much time as I wanted before moving on.

Once the transaction details are approved, the device proceeds to the post-confirmation phase. After some additional loading time, the post-confirmation ends by providing the user with the private key signatures that were originally requested, and the process is done. I also recorded the post-confirmation time for all of my tests.

My method of measuring time during these tests was a simple stopwatch phone app. The data somewhat relies on my reaction time, which I can say did not vary by more than a few seconds. For the signatures that took longer than five minutes, and especially those that took over an hour, I was able to divert my attention away from the device, and come back when I suspected it was close to finishing a signing phase. As you will see from the timing data, there are patterns that created predictability, which I noticed early on and utilized.

Categorization and results

I performed tests within four categories, representing four different sources from which a bitcoin wallet might receive a deposit. 

  1. Simple transactions: deposits with only two outputs (the deposit itself and change returning to the sender), simulating peer-to-peer transfers and other basic movements of bitcoin.

  2. Exchange distributions: deposits from 30 output transactions (1 output as the deposit being received, and 29 outputs being sent to other addresses). Remember that the deposit received would be just one UTXO among many other UTXOs generated in a similar fashion. The collection of UTXOs would later act as inputs to the transaction I would be signing and measuring.

  3. Mining pool distributions: deposits from 300 output transactions. Distributions this large produce a much higher signing mass when the deposited UTXOs are later moved.

  4. F2Pool distributions: deposits from 3000 output transactions. This would produce some of the largest signing masses bitcoiners are ever likely to encounter.
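
To get a feel for how differently these four categories weigh on a device, the size of each funding transaction can be roughly estimated from standard legacy serialization sizes. The sketch below assumes that signing mass is approximated by the byte size of the transaction that created the UTXO, that each funding transaction has a single P2PKH input of about 148 bytes, and that it pays one 32-byte P2SH deposit output plus 34-byte P2PKH outputs to everyone else. The function names are illustrative, and the figures are estimates rather than measurements from these tests.

```python
def varint_size(n: int) -> int:
    """Bytes used by Bitcoin's variable-length integer encoding of n."""
    if n < 0xFD:
        return 1
    if n <= 0xFFFF:
        return 3
    if n <= 0xFFFFFFFF:
        return 5
    return 9


P2PKH_INPUT = 148   # typical size: outpoint + scriptSig (signature, pubkey) + sequence
P2PKH_OUTPUT = 34   # 8-byte amount + 1-byte script length + 25-byte script
P2SH_OUTPUT = 32    # 8-byte amount + 1-byte script length + 23-byte script


def estimate_funding_tx_bytes(total_outputs: int, inputs: int = 1) -> int:
    """Approximate size of a legacy transaction paying one P2SH deposit
    plus (total_outputs - 1) P2PKH outputs."""
    return (
        4                                     # version
        + varint_size(inputs)
        + inputs * P2PKH_INPUT
        + varint_size(total_outputs)
        + (total_outputs - 1) * P2PKH_OUTPUT
        + P2SH_OUTPUT
        + 4                                   # locktime
    )


# Output counts taken from the four test categories.
CATEGORIES = {
    "Simple transaction": 2,
    "Exchange distribution": 30,
    "Mining pool distribution": 300,
    "F2Pool distribution": 3000,
}

for name, outputs in CATEGORIES.items():
    size = estimate_funding_tx_bytes(outputs)
    print(f"{name}: roughly {size:,} bytes of previous-transaction data per UTXO")
```

Under these assumptions, each step up in category multiplies the data behind a single UTXO by roughly ten, even though the UTXO itself looks the same in a wallet.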

Simple transactions data

Let’s start by taking a look at the results for the simple transactions. These transactions had the smallest data size (PSBT kB) relative to the number of UTXOs being moved.

There are five transactions shown below, moving either 25, 50, 100, 250 or 500 UTXOs. They are split up by the pre-confirmation and post-confirmation phases. The time taken by each device is displayed in the format of Hours : Minutes : Seconds.

From the above results there are a few valuable takeaways before moving on to the next data set.

  • The pre-confirmation phase was typically very quick compared to the post-confirmation phase. A conclusion from my testing that I’ll share now is that the number of UTXOs being moved mostly impacts the post-confirmation time rather than the pre-confirmation time. 

  • Each additional UTXO included in a transaction added more time to the signing process than the previous UTXO. In other words, doubling the number of UTXOs more than doubled the time it took to sign the transaction. This knowledge is useful, because it means splitting up a huge transaction into smaller chunks can save time. For example, moving 500 UTXOs with a Ledger Nano S+ might take 50 minutes, while moving 250 UTXOs twice will total closer to 30 minutes. However, after finishing all my experiments I realized this may not hold for segwit inputs, which may instead produce a linear relationship.

  • The Trezor Model One performed better than the Trezor Model T across every opportunity. The Model T is a newer and more expensive device, and it even has superior hardware specifications. However, I suspect that the fancy touchscreen eats up a lot of the processing power, which negatively affects its performance while signing transactions. You will notice this trend continues for all subsequent data.

  • The Nano X and Nano S+ produced very similar numbers. This fact will remain true going forward.

  • The Coldcards were the least efficient signers by a conspicuous margin. Even the newly released Mk4 took more than 70 minutes to sign the exact same transaction that a Nano S+ completed in under 15 minutes. The discrepancy may be related to the air-gapped nature of the Coldcards, and it is also worth noting that outside of miners or DCAers, most bitcoin users aren’t moving around 250 UTXOs regularly, if ever.
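
The time saved by splitting can be sketched with a simple model. The example below assumes that signing time follows t(n) = a*n + b*n^2, which is only one possible shape for the more-than-linear growth described above, and fits it to the two illustrative Nano S+ figures from the list (250 UTXOs in about 15 minutes, 500 UTXOs in about 50 minutes). The function names are hypothetical, and the model ignores per-transaction overhead such as user approval and extra fees.

```python
def fit_quadratic(n1, t1, n2, t2):
    """Fit t(n) = a*n + b*n**2 through two (UTXO count, minutes) points."""
    b = (t2 / n2 - t1 / n1) / (n2 - n1)
    a = t1 / n1 - b * n1
    return a, b


def signing_minutes(n, a, b):
    """Estimated minutes to sign one transaction with n inputs."""
    return a * n + b * n * n


def split_minutes(total_utxos, batches, a, b):
    """Estimated total minutes when the UTXOs are spread as evenly as
    possible over the given number of transactions."""
    base, extra = divmod(total_utxos, batches)
    sizes = [base + 1] * extra + [base] * (batches - extra)
    return sum(signing_minutes(size, a, b) for size in sizes)


# Illustrative figures only: 250 UTXOs in ~15 minutes, 500 UTXOs in ~50 minutes.
a, b = fit_quadratic(250, 15, 500, 50)

for batches in (1, 2, 4, 5):
    total = split_minutes(500, batches, a, b)
    print(f"{batches} transaction(s): about {total:.1f} minutes in total")
```

With a model of this shape, every additional split keeps reducing the total, but by less each time, so in practice a handful of batches captures most of the benefit.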

Exchange distributions data

Next, let’s look at the signing times for UTXOs that were generated to simulate withdrawals from a bitcoin exchange. Similar to the last data set, I signed transactions moving 25, 50, 100 and 250 UTXOs. I chose to stop testing transactions moving more than 250 UTXOs, for a couple of reasons. 

Not only was signing such massive transactions a tedious experience, but it is also not particularly useful information. It’s very rare that someone would need to move that many UTXOs at once; 250 is already a fairly extreme number. Additionally, if the need ever did arise to move that many UTXOs, I’ve already demonstrated that it would be faster to move them over multiple transactions rather than all at once.

Separately, it is worth noting that although the number of UTXOs going into the transactions for this data set match up with the previous data set, you will see that the corresponding PSBT sizes are larger. This is because of the concept covered in the earlier article: signing mass.

At this point, most of the valuable observations are produced by comparing this data set with the last one. 

  • The post-confirmation times are quite similar across both data sets, when the device model and number of UTXOs are held constant. 

  • On the other hand, a trend is appearing within the pre-confirmation stage that will become increasingly apparent in the final two data sets. That trend is a significant finding from my experiments: the signing mass (reflected by the PSBT size) typically affects the pre-confirmation time. In other words, you can imagine that during the pre-confirmation step, the device is performing the “input amount safety check” discussed in the earlier article. The larger the signing mass, the longer it will take to complete. Then, after the user-approval, the device will actually sign the UTXOs during the post-confirmation stage, which is why the number of UTXOs matters more at that time. At least conceptually, this framework appears correct for Ledgers and Coldcards. For Trezors, the situation is a bit different, which will be highlighted in the next section.

Mining pool distributions data

The next data set represents moving UTXOs that were received as distributions from a typical bitcoin mining pool. These UTXOs usually have a greater signing mass. The four transactions I tested use the same UTXO counts as the previous data set.

Most of the trends pointed out earlier continue in this data set. With these transactions, the signing mass is becoming so heavy that the pre-confirmation times are in many cases similar to the post-confirmation times, if not longer.

  • Trezors are the exception. The pre-confirmation times for both the Model One and Model T are extremely quick, and similar in this data set to the times in the earlier data sets. Instead, the post-confirmation times are distinctly longer, even with the UTXO count held constant. It is as if the “input amount safety check” task on a Trezor takes place during the post-confirmation, after the user-approval. Someone who can read the open source code better than I can may be able to verify whether this is true.

  • The other obvious thing to highlight is the Coldcard PSBT limits becoming more restrictive. The data indicates that an Mk3 will be unsuccessful signing many more than 25 UTXOs at a time, if they came from typical mining pool distributions. Luckily, this wouldn’t cause someone to be unable to move their bitcoin; they could still use their Mk3 to move as many UTXOs as they want, just not all at once. The UTXOs would have to be split up into smaller chunks and signed over the course of multiple transactions.
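
Splitting UTXOs into chunks that fit under a device limit can be sketched as a simple greedy grouping. The example below is a hypothetical helper, not a feature of any wallet: it assumes you already have an estimate of how many bytes each UTXO adds to the PSBT, and it treats the limit as a plain byte budget. The 384,000-byte limit and the 12,000 bytes per UTXO are placeholder values for illustration.

```python
from typing import List


def batch_by_size(utxo_bytes: List[int], limit_bytes: int,
                  base_bytes: int = 0) -> List[List[int]]:
    """Greedily group UTXOs (given as their estimated PSBT contribution in
    bytes) so that each batch, plus a fixed per-PSBT base size, stays
    within limit_bytes."""
    batches: List[List[int]] = []
    current: List[int] = []
    current_total = base_bytes
    for size in utxo_bytes:
        if base_bytes + size > limit_bytes:
            raise ValueError("a single UTXO exceeds the limit on its own")
        # Start a new batch when adding this UTXO would overflow the budget.
        if current and current_total + size > limit_bytes:
            batches.append(current)
            current, current_total = [], base_bytes
        current.append(size)
        current_total += size
    if current:
        batches.append(current)
    return batches


# Placeholder numbers: 100 UTXOs of 12,000 bytes each, 384,000-byte budget.
utxos = [12_000] * 100
plan = batch_by_size(utxos, limit_bytes=384_000)
print([len(batch) for batch in plan])
```

Each resulting batch would become its own transaction, signed separately.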

F2Pool distributions data

We have now arrived at the final category, where we will look at UTXOs that were generated to simulate distributions from F2Pool. These have some of the heaviest signing masses that anyone in the bitcoin world is ever likely to experience.

For this data set, I used some new numbers for the UTXO counts. While the 25 and 100 UTXO transactions are useful to compare to the previous data sets, I also wanted to showcase a few transactions moving just a small number of UTXOs, which will put a spotlight on the limitations of air-gapped devices such as the Coldcards.

Upon reviewing this data set, the primary observations from the earlier experiments hold true. For Ledgers and Coldcards, the pre-confirmation times are now substantially longer than the post-confirmation times, which is the opposite of the first data set. Trezors, on the other hand, still have a very quick pre-confirmation time, by postponing most of the computations until after the user-approval.

  • Naturally, with a much greater data size per UTXO, the Coldcards bumped into their PSBT limitations with many fewer UTXOs than before. While the Mk4 hit its threshold between 10 and 25, the Mk3 couldn’t even sign a transaction trying to move more than two UTXOs at a time. Running into signing problems after no more than two F2Pool distributions surprised me, and I became interested in verifying this in the real world. With the help of a friend mining mainnet bitcoin using F2Pool, we were able to confirm this fact, as my Mk3 was still unable to sign off on moving three UTXOs at a time.

If you use Coldcards in addition to F2Pool (or ViaBTC), this information may be quite useful to you. Momentarily we will arrive at the final conclusions from these tests, and I will discuss a few strategies to avoid signing problems with your Coldcard.

Final Conclusions

While some of my conclusions were previously expressed as observations in the above sections, there are some important caveats and additional information worth sharing that came from my experiences performing these tests.

Device behaviors

Each hardware wallet was unique and exhibited different behavior depending on the situation. The amount of time they took to sign a transaction doesn’t provide the full picture of the signing experience. Let me take you through some of the favorable and unfavorable features of each brand of device, with a specific focus on signing transactions. This article does not take into account other device features, such as ones related to security, which may be more important to you if you are considering which option to purchase.

Trezors

For web-based applications, the signing process of a Trezor relies on running a program called Trezor Connect on the computer it is connected to. This reliance caused some inconveniences while setting up my experiments. In fact, this brings us to the most important caveat of this entire report: at the time of writing, Trezor Connect has a 10-minute timeout, so anyone attempting to use a Trezor to sign a transaction that would take longer than 10 minutes on a web-based application will encounter an error.

The only reason I was able to sign transactions that took longer than 10 minutes with my Trezors is that I had technical assistance. I mentioned this issue to the team at SatoshiLabs, and they were extremely helpful and quick to act. They posted documentation on GitHub describing the “interactionTimeout” property, and with the help of an engineer at Unchained I was able to modify this property within my testing environment. Thankfully, SatoshiLabs also mentioned that they will look into adjusting or removing this limitation within Trezor Connect, so that people are less likely to run into this issue. I must commend SatoshiLabs for their communication and willingness to make improvements to Trezor Connect.

Another feature of the Trezor I’d like to mention is the signing progress indicator. On the Model One, the screen shows gears turning and a progress bar while signing a transaction. The Model T displays a circular progress bar. I really like the progress bar idea, and it was helpful during my experiments. However, the way it works is also a bit deceptive, for three reasons:

  1. For both devices, the progress bar was stuck on 0% during the pre-confirmation phase, and only became active during post-confirmation. Since the pre-confirmation phase of a Trezor is typically just a few seconds, this is not a big deal, but in a rare scenario with a lot of UTXOs, it may cause someone to think that their Trezor isn’t working.

  2. The Model T seemed to finish signing around the 60% - 80% progress area each time, giving a false impression about how much time was left. The Model One doesn’t have this problem, and would finish at the 100% mark as expected.

  3. In general, the progress bars were not a reliable indicator of how much time remained, because the bar moved at different speeds depending on the task being performed behind the scenes. During the first part of the post-confirmation phase, where I suspect the “input amount safety check” was occurring, the progress bar moved much slower than during the second part, where I suspect the safety check was complete and the Trezor was actually signing. This would cause the appearance of the Trezor needing a lot more time, but then it would suddenly jump forward and export the signed transaction.

Finally, I want to reiterate that while the Model T may have some additional desirable features, the older and less expensive Model One is the more powerful signing device. It performed better in every test.

Ledgers

Taking the (possibly temporary) Trezor Connect timeout issue into account, Ledgers seem to be the most capable device for moving lots of UTXOs at one time, or moving UTXOs with high signing masses. The Ledgers succeeded in signing every transaction I fed them. The Nano S+ and Nano X were also generally quite fast compared to other devices, challenged only by the Trezor Model One.

The Nano X and S+ signing speeds were almost identical to each other, with the S+ actually performing slightly better most of the time. It is worth noting that the Nano X is substantially more expensive, so the extra cost appears to be for the Bluetooth capabilities and an internal battery.

There are a couple of pain points with the Ledgers to be mindful of. Firstly, in order to sign a transaction that would take longer than 5 or 10 minutes, you might need to change your device’s default settings. In the Security Settings menu, there should be options for PIN Lock and Screen Saver. These can be turned off so that the device will keep working on a lengthy signature instead of timing out.

Secondly, at no point during the signing process do any of the Ledger models have a progress indicator (other than within Ledger Live single-sig wallets). This was called out in Jameson Lopp’s signing performance study over two years ago, yet the annoyance still exists. While signing, Ledgers will look like they aren’t doing anything at all, and if the user doesn’t know what to expect in terms of duration, they may begin to think their device is not working. In reality, the device is likely fine, but some complex signatures may take over a half hour to complete.

Coldcards

I’m a big fan of Coldcards due to their feature set and the fact that Coinkite is bitcoin-only, but the truth is that they are not great performers for signing large transactions. This may be true for all air-gapped devices, in which case this would be a tradeoff for consumers to evaluate. Coldcards appear to be optimized for additional security optionality rather than for signing huge transactions.

Coldcards do have a progress indicator while they are signing a transaction. However, much like the Trezors, it leaves something to be desired. During the pre-confirmation, a progress bar is shown, but once it reaches 100%, the Coldcard displays “Validating…” and can take quite a while (in some cases several minutes) to actually finish, while no progress bar is seen. During the post-confirmation, a new progress bar is pictured, but for hefty transactions the device will still need some time to finish up after the bar has already reached 100%.

As a fun note, the PSBT limit of an Mk3 is advertised as 384 kB, but I was able to get my Mk3 to recognize an unsigned PSBT as large as 390 kB, and sign it in full. However, my 395 kB PSBT from the data above was unrecognized. This makes me wonder whether the 384 kB limit is merely an approximation, or if the precise limit is actually something like 394 kB.

Transaction signing

From these experiments, I was pleased to learn that the time it will take a device to sign a transaction is highly predictable. If the address types, quorums, and output counts are held constant (as they often are in a withdrawal from a collaborative custody arrangement) then the input count (UTXOs being moved) and the signing mass of those inputs are enough information to estimate the full signing experience for any of the devices I tested.

As I mentioned earlier, this fact became evident early on, and was demonstrated by my ability to walk away from a device during signing, returning my attention to it shortly before I expected it to finish, with consistent accuracy. The two critical trends I noticed that allowed me to make accurate predictions were the following:

  1. There was a strong linear relationship between the amount of data (PSBT size) and the “input amount safety check” part of the process (found within the pre-confirmation phase for Ledgers and Coldcards, and found within the post-confirmation phase for Trezors). In other words, if the data size was doubled, you could guess that “input amount safety check” duration would also be approximately doubled.

  2. There was a strong non-linear relationship between the number of UTXOs being moved, and the actual signing part of the process (found within the post-confirmation phase for all devices). Although non-linear, the relationship is indeed strong. I noticed that if I added 10 UTXOs to an existing set, and recorded the time increase created by those 10 UTXOs, then when I added yet another 10 UTXOs the time difference would be about the same, plus a little more. As mentioned earlier, this means that moving a large number of UTXOs can actually be done more efficiently in multiple transactions rather than all at once. After my experiments concluded, I learned that this may not hold for segwit inputs, which might instead show a linear relationship.
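
These two trends can be combined into a rough estimator. The sketch below is an assumption-laden illustration rather than a fitted model: the coefficients are placeholders you would have to derive from your own timings, the non-linear signing cost is modeled as a quadratic, and the Trezor pre-confirmation phase is treated as negligible. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class DeviceModel:
    """Hypothetical per-device coefficients; fit them from your own timings."""
    check_in_pre: bool       # True: safety check before approval (Ledger/Coldcard-like)
    sec_per_kb: float        # linear cost of the input amount safety check
    sec_per_utxo: float      # linear part of the signing cost
    sec_per_utxo_sq: float   # growth term that makes signing non-linear


def estimate_phases(model: DeviceModel, psbt_kb: float,
                    utxo_count: int) -> Tuple[float, float]:
    """Return estimated (pre-confirmation, post-confirmation) seconds."""
    check = model.sec_per_kb * psbt_kb
    signing = (model.sec_per_utxo * utxo_count
               + model.sec_per_utxo_sq * utxo_count ** 2)
    if model.check_in_pre:
        return check, signing
    # Trezor-like behavior: the check appears to run after user approval.
    return 0.0, check + signing


# Placeholder coefficients, chosen only to show the mechanics.
ledger_like = DeviceModel(check_in_pre=True, sec_per_kb=0.5,
                          sec_per_utxo=1.0, sec_per_utxo_sq=0.01)
trezor_like = DeviceModel(check_in_pre=False, sec_per_kb=0.5,
                          sec_per_utxo=1.0, sec_per_utxo_sq=0.01)

for name, model in (("Ledger-like", ledger_like), ("Trezor-like", trezor_like)):
    pre, post = estimate_phases(model, psbt_kb=100, utxo_count=50)
    print(f"{name}: pre {pre:.0f} s, post {post:.0f} s")
```

The total is the same for both placeholder devices; only the phase in which the waiting happens differs, which matches the pattern seen in the data sets.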

This knowledge satisfied the questions I originally set out to answer, and I learned a lot of interesting things along the way. For example, I was unaware of the various obstacles one must get around to sign a huge transaction with a Trezor or Ledger, or the extent to which certain mining distributions can cause problems for someone with a Coldcard.

Overcoming and preventing signing failures

Let’s wrap up this article with some strategies to help you overcome signing failures, or prevent them in the first place. 

If you are trying to move bitcoin with a hardware wallet and it doesn’t appear to be working, it doesn’t necessarily mean your device is broken. If you wait patiently, it may succeed after a much longer period than you expected. If that doesn’t work, you may need to move the bitcoin in several smaller transactions.

Remember that so long as your device is not broken, you will always be able to use it to move your bitcoin. Even in the most extreme situations, such as tasking an Mk3 with moving F2Pool distributions, it can be done (although it may be a cumbersome and slightly more expensive process). If your device actually is broken, you can load your backup seed phrase into a new device.

Additionally, there are things you can do to avoid these tricky situations in the first place. You can control the number of UTXOs you are holding in your bitcoin stash by performing UTXO consolidations, which you can learn more about in our previous article.

Not only can you control the number of UTXOs you are holding, but you can also control the signing mass of those UTXOs. If you receive payouts from a mining pool, consider having the payouts sent to an address you control outside of your cold storage wallet, and then move the bitcoin into cold storage yourself, manually. Because signing mass is only affected by the transaction immediately prior, by adding in the intermediary transaction you are essentially clearing the signing mass. When you send the bitcoin from your own address into cold storage, it will have the signing mass of a “simple transaction” as opposed to that of a mining pool distribution.
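
The effect of the intermediary transaction can be illustrated with rough size estimates. The sketch below assumes legacy transactions with typical sizes (about 148 bytes per P2PKH input, 34 bytes per P2PKH output, 32 bytes per P2SH output), approximates signing mass as the byte size of the transaction that created each UTXO, and uses a hypothetical count of 10 payouts from 3000-output distributions. It compares receiving the payouts directly into cold storage with sweeping them from an intermediate address in one transaction.

```python
P2PKH_INPUT = 148   # typical legacy input size in bytes
P2PKH_OUTPUT = 34
P2SH_OUTPUT = 32


def count_bytes(n: int) -> int:
    """Size of the input/output counter (valid for fewer than 65,536 entries)."""
    return 1 if n < 253 else 3


def legacy_tx_bytes(p2pkh_inputs: int, p2pkh_outputs: int, p2sh_outputs: int) -> int:
    """Approximate serialized size of a legacy transaction."""
    outputs = p2pkh_outputs + p2sh_outputs
    return (
        4                                   # version
        + count_bytes(p2pkh_inputs)
        + p2pkh_inputs * P2PKH_INPUT
        + count_bytes(outputs)
        + p2pkh_outputs * P2PKH_OUTPUT
        + p2sh_outputs * P2SH_OUTPUT
        + 4                                 # locktime
    )


payouts = 10  # hypothetical number of pool payouts received

# Direct: each payout lands in cold storage as its own UTXO, and each UTXO's
# parent is a 3000-output distribution transaction.
direct_parent = legacy_tx_bytes(1, 2999, 1)
direct_total = payouts * direct_parent

# Intermediary: the payouts are first received at an address you control,
# then swept into cold storage as one UTXO by a small transaction.
sweep_parent = legacy_tx_bytes(payouts, 0, 1)

print(f"Direct: {payouts} UTXOs, about {direct_total:,} bytes of parent data")
print(f"Swept: 1 UTXO, about {sweep_parent:,} bytes of parent data")
```

The sweep costs an extra transaction fee, which is the tradeoff for handing the cold storage device far less data to process later.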

If you don’t want to bother with extra transactions but are concerned about signing mass from mining pool distributions, consider avoiding F2Pool and ViaBTC, and instead join a mining pool with more friendly distribution practices. I reached out to both mining pools to ask if they have any plans to change their distribution method, and an F2Pool service representative said they do intend to make changes at this time. I never heard back from ViaBTC.
