[[path_finding]]
== Pathfinding and Payment Delivery
((("pathfinding", id="ix_12_path_finding-asciidoc0", range="startofrange")))Payment ((("payment delivery", id="ix_12_path_finding-asciidoc1", range="startofrange")))delivery on the Lightning Network depends on finding a path from the sender to the recipient, a process called _pathfinding_. Since the routing is done by the sender, the sender must find a suitable path to reach the destination. This path is then encoded in an onion, as we saw in <<onion_routing>>.
In this chapter we will examine the problem of pathfinding, understand how uncertainty about channel balances complicates this problem, and look at how a typical pathfinding implementation attempts to solve it.
=== Pathfinding in the Lightning Protocol Suite
((("Lightning Network Protocol","pathfinding in")))((("pathfinding","Lightning Protocol Suite and")))Pathfinding, path selection, multipart payments (MPP), and the payment attempt trial-and-error loop occupy the majority of the payment layer at the top of the protocol suite.
These components are highlighted by an outline in the protocol suite, shown in <<LN_protocol_pathfinding_highlight>>.
[[LN_protocol_pathfinding_highlight]]
.Payment delivery in the Lightning protocol suite
image::images/mtln_1201.png["Payment delivery in the Lightning protocol suite"]
==== Where Is the BOLT?
((("BOLT (Basis of Lightning Technology) standards documents","pathfinding and")))((("pathfinding","BOLT standard and")))So far we've looked at several technologies that are part of the Lightning Network and we have seen their exact specification as part of a BOLT standard. You may be surprised to find that pathfinding is not part of the BOLTs!
That's because pathfinding isn't an activity that requires any form of coordination or interoperability between different implementations. As we've seen, the path is selected by the sender. Even though the routing details are specified in detail in the BOLTs, the path discovery and selection are left entirely up to the sender. So each node implementation can choose a different strategy/algorithm to find paths. In fact, the different node/client and wallet implementations can even compete and use their pathfinding algorithm as a point of differentiation.
=== Pathfinding: What Problem Are We Solving?
((("pathfinding","nature of problem solved by", id="ix_12_path_finding-asciidoc2", range="startofrange")))The term pathfinding may be somewhat misleading because it implies a search for _a single path_ connecting two nodes. In the beginning, when the Lightning Network was small and not well interconnected, the problem was indeed about finding a way to join payment channels to reach the recipient.
But, as the Lightning Network has grown explosively, the pathfinding problem's nature has shifted. In mid-2021, as we finish this book, the Lightning Network consists of 20,000 nodes connected by at least 55,000 public channels with an aggregate capacity of almost 2,000 BTC. A node has on average 8.8 channels, while the top 10 most connected nodes have between 400 and 2,000 channels _each_. A visualization of just a small subset of the LN channel graph is shown in <<lngraph>>.
[[lngraph]]
.A visualization of part of the Lightning Network as of July 2021
image::images/mtln_1202.png[]
[NOTE]
====
The network visualization in <<lngraph>> was produced with a simple Python script you can find in code/lngraph in the book's repository.
====
If the sender and recipient are connected to other well-connected nodes and have at least one channel with adequate capacity, there will be thousands of paths. The problem becomes selecting the _best_ path that will succeed in payment delivery, out of a list of thousands of possible paths.
==== Selecting the Best Path
((("pathfinding","selecting the best path")))To select the best path, we have to first define what we mean by "best." There may be many different criteria, such as:
* Paths with enough liquidity. Obviously if a path doesn't have enough liquidity to route our payment, then it is not a suitable path.
* Paths with low fees. If we have several candidates, we may want to select ones with lower fees.
* Paths with short timelocks. We may want to avoid locking our funds for too long and therefore select paths with shorter timelocks.
All of these criteria may be desirable to some extent, and selecting paths that are favorable across many dimensions is not an easy task. Optimization problems like this may be too complex to solve for the "best" solution, but often can be solved for some approximation of the optimal, which is good news because otherwise pathfinding would be an intractable problem.
==== Pathfinding in Math and Computer Science
((("pathfinding","math and computer science")))Pathfinding in the Lightning Network falls under a general category of _graph theory_ in mathematics and the more specific category of _graph traversal_ in computer science.
A network such as the Lightning Network can be represented as a mathematical construct called a _graph_, where _nodes_ are connected to each other by _edges_ (equivalent to the payment channels). ((("directed graph")))The Lightning Network forms a _directed graph_ because the nodes are linked _asymmetrically_, since the channel balance is split between the two channel partners and the payment liquidity is different in each direction. ((("flow network")))A directed graph with numerical capacity constraints on its edges is called a _flow network_, a mathematical construct used to optimize transportation and other similar networks. Flow networks can be used as a framework when solutions need to achieve a specific flow while minimizing cost, known as the minimum cost flow problem (MCFP).
==== Capacity, Balance, Liquidity
((("pathfinding","capacity, balance, and liquidity")))To better understand the problem of transporting satoshis from point A to point B, we need to better define three important terms: capacity, balance, and liquidity. We use these terms to describe a payment channel's ability to route a payment.
In a payment channel connecting A<-->B:
Capacity:: ((("capacity, payment channel")))This is the aggregate amount of satoshis that were funded into the 2-of-2 multisig with the funding transaction. It represents the maximum amount of value held in the channel. The channel capacity is announced by the gossip protocol and is known to nodes.
Balance:: ((("balance, in payment channel")))This is the amount of satoshis held by each channel partner that can be sent to the other channel partner. A subset of the balance of A can be sent in the direction (A->B) toward node B. A subset of the balance of B can be sent in the opposite direction (A<-B).
Liquidity:: ((("liquidity","in payment channel")))The available (subset) balance that can actually be sent across the channel in one direction. Liquidity of A is equal to the balance of A minus the channel reserve and any pending HTLCs committed by A.
The only value known to the network (via gossip announcements) is the aggregate capacity of the channel. Some unknown portion of that capacity is distributed as each partner's balance. Some subset of that balance is available to send across the channel in one direction:
++++
<ul class="simplelist">
<li>capacity = balance(A) + balance(B)</li>
<li>liquidity(A) = balance(A) - channel_reserve(A) - pending_HTLCs(A)</li>
</ul>
++++
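The two relations above can be checked with a few lines of Python. This is only a minimal sketch: the +ChannelSide+ class, its field names, and the example amounts are hypothetical, and flooring the result at zero is an added assumption rather than something taken from a specification.

```python
from dataclasses import dataclass


@dataclass
class ChannelSide:
    """One partner's share of a channel; all amounts in satoshis (hypothetical model)."""
    balance: int          # satoshis currently held by this partner
    channel_reserve: int  # amount this partner must keep unspent
    pending_htlcs: int    # amount locked in HTLCs committed by this partner

    def liquidity(self) -> int:
        # liquidity = balance - channel_reserve - pending_HTLCs
        # (floored at zero here as an assumption of this sketch)
        return max(0, self.balance - self.channel_reserve - self.pending_htlcs)


# Hypothetical A<-->B channel.
side_a = ChannelSide(balance=600_000, channel_reserve=10_000, pending_htlcs=50_000)
side_b = ChannelSide(balance=400_000, channel_reserve=10_000, pending_htlcs=0)

capacity = side_a.balance + side_b.balance  # capacity = balance(A) + balance(B)
print(capacity)            # 1000000
print(side_a.liquidity())  # 540000
print(side_b.liquidity())  # 390000
```

Only +capacity+ would be visible to the rest of the network; the two balances and liquidities are known to A and B alone.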
==== Uncertainty of Balances
((("pathfinding","uncertainty of balances")))If we knew the exact channel balances of every channel, we could compute one or more payment paths using any of the standard pathfinding algorithms taught in good computer science programs. But we don't know the channel balances; we only know the aggregate channel capacity, which is advertised by nodes in channel announcements. In order for a payment to succeed, there must be adequate balance on the sending side of the channel. If we don't know how the capacity is distributed between the channel partners, we don't know if there is enough balance in the direction we are trying to send the payment.
Balances are not announced in channel updates for two reasons: privacy and scalability. First, announcing balances would reduce the privacy of the Lightning Network because it would allow surveillance of payments through statistical analysis of the changes in balances. Second, if nodes announced balances (globally) with every payment, the Lightning Network's scaling would be as bad as that of on-chain Bitcoin transactions, which are broadcast to all participants. Therefore, balances are not announced. To solve the pathfinding problem in the face of uncertainty of balances, we need innovative pathfinding strategies. These strategies must fit the routing algorithm that is used: source-based onion routing, in which the sender is responsible for finding a path through the network.
((("range of liquidity")))The uncertainty problem can be described mathematically as a _range of liquidity_, indicating the lower and upper bounds of liquidity based on the information that is known. Since we know the capacity of the channel and we know the channel reserve balance (the minimum allowed balance on each end), the liquidity can be defined as:
++++
<ul class="simplelist">
<li>min(liquidity) = channel_reserve</li>
<li>max(liquidity) = capacity - channel_reserve</li>
</ul>
++++
[role="pagebreak-before"]
or as a range:
++++
<ul class="simplelist">
<li>channel_reserve &lt;= liquidity &lt;= (capacity - channel_reserve)</li>
</ul>
++++
Our channel liquidity uncertainty range is the range between the minimum and maximum possible liquidity. The actual liquidity is unknown to everyone on the network except the two channel partners. However, as we will see, we can use failed HTLCs returned from our payment attempts to update our liquidity estimate and reduce uncertainty. If, for example, we get an HTLC failure code that tells us that a channel cannot fulfill an HTLC that is smaller than our estimate for maximum liquidity, that means the maximum liquidity can be lowered to just below the amount of the failed HTLC. In simpler terms, if we think the liquidity can handle an HTLC of _N_ satoshis and we find out it fails to deliver _M_ satoshis (where _M_ is smaller), then we can update our estimate to __M__ - 1 as the upper bound. We tried to find the ceiling and bumped against it, so it's lower than we thought!
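As a rough illustration of this bookkeeping, the sketch below keeps a lower and an upper bound for one direction of a channel and tightens the upper bound after a failed HTLC. The +LiquidityRange+ class and its method names are hypothetical, as are the capacity and reserve values; real implementations track more state than this.

```python
from dataclasses import dataclass


@dataclass
class LiquidityRange:
    """Sender's estimate for one direction of a channel, in satoshis (hypothetical)."""
    lower: int
    upper: int

    @classmethod
    def from_gossip(cls, capacity: int, channel_reserve: int) -> "LiquidityRange":
        # Initial bounds as defined in the text:
        # channel_reserve <= liquidity <= capacity - channel_reserve
        return cls(lower=channel_reserve, upper=capacity - channel_reserve)

    def record_failure(self, amount: int) -> None:
        # An HTLC of `amount` could not be forwarded, so liquidity < amount.
        # Only ever tighten the upper bound, and never push it below the lower bound.
        if amount <= self.upper:
            self.upper = max(self.lower, amount - 1)


estimate = LiquidityRange.from_gossip(capacity=1_000_000, channel_reserve=10_000)
print(estimate)                   # LiquidityRange(lower=10000, upper=990000)

estimate.record_failure(400_000)  # M = 400,000 fails, so the ceiling becomes M - 1
print(estimate)                   # LiquidityRange(lower=10000, upper=399999)

estimate.record_failure(700_000)  # above the current ceiling: nothing new is learned
print(estimate)                   # LiquidityRange(lower=10000, upper=399999)
```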
==== Pathfinding Complexity
((("pathfinding","complexity")))Finding a path through a graph is a problem modern computers can solve rather efficiently.
Developers mainly choose breadth-first search if the edges are all of equal weight.
In cases where the edges are not of equal weight, an algorithm based on ((("Dijkstra&apos;s algorithm")))Dijkstra's algorithm is used, such as https://en.wikipedia.org/wiki/A*_search_algorithm[A* (pronounced "A-star")].
In our case the weights of the edges can represent the routing fees.
Only edges with a capacity larger than the amount to be sent will be included in the search.
In this basic form, pathfinding in the Lightning Network is very simple and straightforward.
However, channel liquidity is unknown to the sender. This turns our easy theoretical computer science problem into a rather complex real-world problem.
We now have to solve a pathfinding problem with only partial knowledge.
For example, we suspect which edges might be able to forward a payment because their capacity seems big enough.
But we can't be certain unless we try it out or ask the channel owners directly.
Even if we were able to ask the channel owners directly, their balance might change by the time we have asked others, computed a path, constructed an onion, and sent it along.
Not only do we have limited information but the information we have is highly dynamic and might change at any point in time without our knowledge.
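The basic form described above, before uncertainty enters the picture, can be sketched as a breadth-first search over a capacity-filtered graph. This example treats all edges as equal weight and ignores fees; the function name, the node names, and the capacities are hypothetical.

```python
from collections import deque


def bfs_path(channels, source, target, amount):
    """Return the path with the fewest hops, using only channels whose
    announced capacity can hold `amount`, or None if there is no such path.

    channels: iterable of (node_a, node_b, capacity_sat) tuples.
    """
    graph = {}
    for a, b, capacity in channels:
        if capacity >= amount:  # drop channels that are too small
            graph.setdefault(a, []).append(b)
            graph.setdefault(b, []).append(a)

    previous = {source: None}  # also serves as the "visited" set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical five-node network.
channels = [
    ("s", "u", 2_000_000), ("u", "t", 500_000),
    ("s", "v", 3_000_000), ("v", "w", 1_500_000), ("w", "t", 2_000_000),
]
print(bfs_path(channels, "s", "t", 400_000))    # ['s', 'u', 't']
print(bfs_path(channels, "s", "t", 1_000_000))  # ['s', 'v', 'w', 't']
```

The second call has to take the longer route because the u--t channel is too small. Note that the filter uses capacity, which is all the sender knows; the path it returns may still fail for lack of liquidity.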
==== Keeping It Simple
((("pathfinding","simplicity")))The pathfinding mechanism implemented in Lightning nodes is to first create a list of candidate paths, filtered and sorted by some function. Then, the node or wallet will probe paths (by attempting to deliver a payment) in a trial-and-error loop until a path is found that successfully delivers the payment.
[NOTE]
====
This probing is done by the Lightning node or wallet and is not directly observed by the user of the software.
However, the user might suspect that probing is taking place if the payment is not completed instantly.
====
While blind probing is not optimal and leaves ample room for improvement, it should be noted that even this simplistic strategy works surprisingly well for smaller payments and well-connected nodes.
Most Lightning node and wallet implementations improve on this approach by ordering/weighting the list of candidate paths. Some implementations order the candidate paths by cost (fees) or some combination of cost and capacity.(((range="endofrange", startref="ix_12_path_finding-asciidoc2")))
=== Pathfinding and Payment Delivery Process
((("pathfinding","payment delivery process")))((("payment delivery","pathfinding and delivery process")))Pathfinding and payment delivery involve several steps, which we list here. Different implementations may use different algorithms and strategies, but the basic steps are likely to be very similar:
. Create a _channel graph_ from announcements and updates containing the capacity of each channel, and filter the graph, ignoring any channels with insufficient capacity for the amount we want to send.
. Find paths connecting the sender to the recipient.
. Order the paths by some weight (this may be part of the previous step's pass:[<span class="keep-together">algorithm</span>]).
. Try each path in order until payment succeeds (the trial-and-error loop).
. Optionally use the HTLC failure returns to update our graph, reducing pass:[<span class="keep-together">uncertainty</span>].
We can group these steps into three primary activities:
* Channel graph construction
* Pathfinding (filtered and ordered by some heuristics)
* Payment attempt(s)
These three activities can be repeated in a _payment round_ if we use the failure returns to update the graph, or if we are doing multipart payments (see <<mpp>>).
In the next sections we will look at each of these steps in more detail, as well as more advanced payment strategies.
=== Channel Graph Construction
((("channel graph","construction of", id="ix_12_path_finding-asciidoc3", range="startofrange")))((("pathfinding","channel graph construction", id="ix_12_path_finding-asciidoc4", range="startofrange")))In <<gossip>> we covered the three main messages that nodes use in their gossip: +node_announcement+, +channel_announcement+, and +channel_update+. These three messages allow any node to gradually construct a "map" of the Lightning Network in the form of a _channel graph_. Each of these messages provides a critical piece of information for the channel graph:
+node_announcement+:: ((("node_announcement message")))This contains the information about a node on the Lightning Network, such as its node ID (public key), network address (e.g., IPv4/6 or Tor), capabilities/features, etc.
+channel_announcement+:: ((("channel_announcement message","channel graph and")))((("channel_update message")))This contains the capacity and channel ID of a public (announced) channel between two nodes and proof of the channel's existence and ownership.
+channel_update+:: This contains a node's fee and timelock (CLTV) expectations for routing an outgoing (from that node's perspective) payment over a specified channel.
In terms of a mathematical graph, the +node_announcement+ is the information needed to create the nodes or _vertices_ of the graph. The +channel_announcement+ allows us to create the _edges_ of the graph representing the payment channels. Since each direction of the payment channel has its own balance, we create a directed graph. The +channel_update+ allows us to incorporate fees and timelocks to set the _cost_ or _weight_ of the graph edges.
Depending on the algorithm we will use for pathfinding, we may establish a number of different cost functions for the edges of the graph.
For now, let's ignore the cost function and simply establish a channel graph showing nodes and channels, using the +node_announcement+ and +channel_announcement+ messages.
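A minimal sketch of this first step might look as follows. The messages are represented as simplified Python dictionaries, not the actual wire format; the keys, the node IDs, and the channel IDs are hypothetical, and the capacity is carried directly in the announcement to match the simplified description above.

```python
def build_channel_graph(node_announcements, channel_announcements):
    """Return (nodes, edges) built from simplified gossip messages."""
    nodes = {msg["node_id"]: {"alias": msg["alias"]} for msg in node_announcements}
    edges = {}
    for msg in channel_announcements:
        a, b = msg["node_id_1"], msg["node_id_2"]
        # A channel may be announced before we have heard from its nodes.
        nodes.setdefault(a, {"alias": None})
        nodes.setdefault(b, {"alias": None})
        # One channel becomes two directed edges, one per direction.
        for source, target in ((a, b), (b, a)):
            edges[(source, target)] = {
                "short_channel_id": msg["short_channel_id"],
                "capacity_sat": msg["capacity_sat"],
                "liquidity_sat": None,  # unknown to us at this point
            }
    return nodes, edges


node_msgs = [
    {"node_id": "02aa", "alias": "Selena"},
    {"node_id": "02bb", "alias": "Alice"},
]
channel_msgs = [
    {"short_channel_id": "700000x1x0", "node_id_1": "02aa",
     "node_id_2": "02bb", "capacity_sat": 2_000_000},
    {"short_channel_id": "700000x2x1", "node_id_1": "02bb",
     "node_id_2": "02cc", "capacity_sat": 1_000_000},
]
nodes, edges = build_channel_graph(node_msgs, channel_msgs)
print(len(nodes), len(edges))  # 3 4
```

Keying edges by the node pair keeps the sketch short, but it would overwrite parallel channels between the same two nodes; a fuller version would include the channel ID in the key.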
In this chapter we will see how Selena attempts to find a path to pay Rashid one million satoshis. To start, Selena is constructing a channel graph using the information from Lightning Network gossip to discover nodes and channels. Selena will then explore her channel graph to find a path to send a payment to Rashid.
This is _Selena's_ channel graph. There is no such thing as _the_ channel graph; there is only ever _a channel graph_, and it is always from the perspective of the node that has constructed it (see <<map_territory_relation>>).
[TIP]
====
Selena does not construct a channel graph only when sending a payment. Rather, Selena's node is _continuously_ building and updating a channel graph. From the moment Selena's node starts and connects to any peer on the network, it will participate in the gossip and use every message to learn as much as possible about the network.
====
[[map_territory_relation]]
.The Map-Territory Relation
****
((("channel graph","map-territory relation")))From Wikipedia's https://en.wikipedia.org/wiki/Map%E2%80%93territory_relation[page on the Map-Territory Relation], "The map-territory relation describes the relationship between an object and a representation of that object, as in the relation between a geographical territory and a map of it."
The map-territory relation is best illustrated in "Sylvie and Bruno Concluded," a novel by Lewis Carroll which describes a fictional map drawn at a 1:1 scale of the territory it maps, therefore having perfect accuracy but becoming completely useless as it would cover the entire territory if unfolded.
What does this mean for the Lightning Network? The Lightning Network is the territory, and a channel graph is a map of that territory.
While we could imagine a theoretical (Platonic ideal) channel graph that represents the complete, up-to-date map of the Lightning Network, such a map is simply the Lightning Network itself. Each node has its own channel graph which is constructed from announcements and is necessarily incomplete, incorrect, and out-of-date!
The map can never completely and accurately describe the territory.
****
Selena listens to +node_announcement+ messages and discovers four other nodes (in addition to Rashid, the intended recipient). The resulting graph represents a network of six nodes: Selena and Rashid are the sender and recipient, respectively; Alice, Bob, Xavier, and Yan are intermediary nodes. Selena's initial graph is just a list of nodes, shown in <<channel_graph_nodes>>.
[[channel_graph_nodes]]
.Node announcements
image::images/mtln_1203.png[]
Selena also receives seven +channel_announcement+ messages with the corresponding channel capacities, allowing her to construct a basic "map" of the network, shown in <<channel_graph_1>>. (The names Alice, Bob, Selena, Xavier, Yan, and Rashid have been replaced by their initials: A, B, S, X, Y, and R, respectively.)
[[channel_graph_1]]
.The channel graph
image::images/mtln_1204.png[]
===== Uncertainty in the channel graph
((("channel graph","uncertainty in")))As you can see from <<channel_graph_1>>, Selena does not know any of the balances of the channels. Her initial channel graph contains the highest level of uncertainty.
But wait: Selena does know _some_ channel balances! She knows the balances of the channels that her own node has connected with other nodes. While this does not seem like much, it is in fact very important information for constructing a path—Selena knows the actual liquidity of her own channels. Let's update the channel graph to show this information. We will use a "?" symbol to represent the unknown balances, as shown in <<channel_graph_2>>.
[[channel_graph_2]]
.Channel graph with known and unknown balances
image::images/mtln_1205.png[]
While the "?" symbol seems ominous, a lack of certainty is not the same as complete ignorance. We can _quantify_ the uncertainty and _reduce_ it by updating the graph with the successful/failed HTLCs we attempt.
Uncertainty can be quantified, because we know the maximum and minimum possible liquidity and can calculate probabilities for smaller (more precise) ranges.
Once we attempt to send an HTLC, we can learn more about channel balances: if we succeed, then the balance was _at least_ sufficient to transport the specific amount. Meanwhile if we get a "temporary channel failure" error, the most likely reason is a lack of liquidity for the specific amount.
[TIP]
====
You may be thinking, "What's the point of learning from a successful HTLC?" After all, if it succeeded we're "done." But consider that we may be sending one part of a multipart payment. We also may be sending other single-part payments within a short time. Anything we learn about liquidity is useful for the next attempt!
====
==== Liquidity Uncertainty and Probability
((("channel graph","liquidity uncertainty and probability")))((("liquidity","uncertainty and probability")))To quantify the uncertainty of a channel's liquidity, we can apply probability theory. A basic model of the probability of payment delivery will lead to some rather obvious, but important, conclusions:
* Smaller payments have a better chance of successful delivery across a path.
* Larger capacity channels will give us a better chance of payment delivery for a specific amount.
* The more channels (hops), the lower the chance of success.
While these may be obvious, they have important implications, especially for the use of multipart payments (see <<mpp>>). The math is not difficult to follow.
Let's use probability theory to see how we arrived at these conclusions.
First, let's posit that a channel with capacity _c_ has liquidity on one side with an unknown value in the range of (0, _c_) or "range between 0 and _c_." For example, if the capacity is 5, then the liquidity will be in the range (0, 5). Now, from this we see that if we want to send 5 satoshis, our chance of success is only 1 in 6 (16.67%), because we will only succeed if the liquidity is exactly 5.
More simply, if the possible values for the liquidity are 0, 1, 2, 3, 4, and 5, only one of those six possible values will be sufficient to send our payment. To continue this example, if our payment amount was 3, then we would succeed if the liquidity was 3, 4, or 5. So our chances of success are 3 in 6 (50%). Expressed in math, the success probability function for a single channel is:
[latexmath]
++++
$P_c(a) = (c + 1 - a) / (c + 1)$
++++
where _a_ is the amount and _c_ is the capacity.
From the equation we see that if the amount is close to 0, the probability is close to 1, whereas if the amount exceeds the capacity, the probability is zero.
In other words: "Smaller payments have a better chance of successful delivery" or "Larger capacity channels give us better chances of delivery for a specific amount" and "You can't send a payment on a channel with insufficient capacity."
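The single-channel formula is easy to try out. The following sketch assumes, as the text does, that every liquidity value from 0 to _c_ is equally likely; the function name is hypothetical, and amounts above the capacity are mapped to zero explicitly.

```python
def channel_success_probability(amount: int, capacity: int) -> float:
    """P_c(a) = (c + 1 - a) / (c + 1), assuming uniformly distributed liquidity."""
    if amount > capacity:
        return 0.0  # the channel can never carry more than its capacity
    return (capacity + 1 - amount) / (capacity + 1)


print(channel_success_probability(3, 5))  # 0.5
print(channel_success_probability(0, 5))  # 1.0
print(channel_success_probability(6, 5))  # 0.0
print(round(channel_success_probability(5, 5), 4))  # 0.1667
```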
Now let's think about the probability of success across a path made of several channels. Let's say our first channel has a 50% chance of success (_P_ = 0.5). Then if our second channel has a 50% chance of success (_P_ = 0.5), it is intuitive that our overall chance is 25% (_P_ = 0.25).
We can express this as an equation that calculates the probability of a payment's success as the product of probabilities for each channel in the path(s):
[latexmath]
++++
$P_{payment} = \prod_{i=1}^n P_i$
++++
where __P__~__i__~ is the probability of success over one path or channel, and __P__~__payment__~ is the overall probability of a successful payment over all the paths/channels.
From the equation we see that since the probability of success over a single channel is always less than or equal to 1, the probability across many channels will _drop exponentially_.
In other words, "The more channels (hops) you use, the lower the chance of success."
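To see how quickly the product shrinks, the sketch below multiplies the per-channel probabilities along a path. It assumes the channels are independent and uses the same uniform model as the single-channel formula; the function name and the capacities are hypothetical.

```python
import math


def path_success_probability(amount: int, capacities: list) -> float:
    """Product of the per-channel probabilities (c + 1 - a) / (c + 1)."""
    return math.prod(
        (c + 1 - amount) / (c + 1) if amount <= c else 0.0
        for c in capacities
    )


# Each channel of capacity 5 carries an amount of 3 with probability 0.5.
for hops in range(1, 5):
    print(hops, path_success_probability(3, [5] * hops))
# 1 0.5
# 2 0.25
# 3 0.125
# 4 0.0625
```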
[NOTE]
====
There is a lot of mathematical theory and modeling behind the uncertainty of the liquidity in the channels. Fundamental work about modeling the uncertainty intervals of the channel liquidity can be found in the paper https://arxiv.org/abs/2103.08576["Security and Privacy of Lightning Network Payments with Uncertain Channel Balances"] by (coauthor of this book) Pickhardt pass:[<span class="keep-together">et al</span>].
====
==== Fees and Other Channel Metrics
((("channel graph","fees and other channel metrics", id="ix_12_path_finding-asciidoc5", range="startofrange")))((("fees","channel graph and", id="ix_12_path_finding-asciidoc6", range="startofrange")))Next, our sender will add information to the graph from +channel_update+ messages received from the intermediary nodes. As a reminder, the +channel_update+ contains a wealth of information about a channel and the expectations of one of the channel partners.
In <<channel_graph_3>> we see how Selena can update the channel graph based on +channel_update+ messages from A, B, X, and Y. Note that the channel ID and channel direction (included in +channel_flags+) tell Selena which channel and which direction this update refers to. Each channel partner gossips one or more +channel_update+ messages to announce their fee expectations and other information about the channel. For example, in the top left we see the +channel_update+ sent by Alice for the channel A--B and the direction A-to-B. With this update, Alice tells the network how much she will charge in fees to route an HTLC to Bob over that specific channel. Bob may announce a channel update (not shown in this diagram) for the opposite direction with completely different fee expectations. Any node may send a new +channel_update+ to change the fees or timelock expectations at any time.
[[channel_graph_3]]
.Channel graph fees and other channel metrics
image::images/mtln_1206.png[]
The fee and timelock information are very important, not just as path selection metrics. As we saw in <<onion_routing>>, the sender needs to add up fees and timelocks (+cltv_expiry_delta+) at each hop to make the onion. The process of calculating fees happens from the recipient to the sender _backward_ along the path because each intermediary hop expects an incoming HTLC with a higher amount and expiry timelock than the outgoing HTLC they will send to the next hop. So, for example, if Bob wants 1,000 satoshis in fees and 30 blocks of expiry timelock delta to send a payment to Rashid, then that amount and expiry delta must be added to the HTLC _from Alice_.
It is also important to note that a channel must have liquidity that is sufficient not only for the payment amount but also for the cumulative fees of all the subsequent hops. Even though Selena's channel to Xavier (S->X) has enough liquidity for a 1M satoshi payment, it _does not_ have enough liquidity once we consider fees. We need to know fees because only paths that have sufficient liquidity for _both payment and all fees_ will be considered(((range="endofrange", startref="ix_12_path_finding-asciidoc6")))(((range="endofrange", startref="ix_12_path_finding-asciidoc5"))).(((range="endofrange", startref="ix_12_path_finding-asciidoc4")))(((range="endofrange", startref="ix_12_path_finding-asciidoc3")))
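The backward calculation can be sketched for the path S->A->B->R. Bob's policy (1,000 satoshis, 30 blocks) comes from the example above; Alice's policy and the final expiry delta of 18 blocks are hypothetical values chosen for illustration. The field names follow +channel_update+, and the fee is computed as the base fee plus a proportional part, in millisatoshis.

```python
from dataclasses import dataclass


@dataclass
class HopPolicy:
    node: str
    fee_base_msat: int
    fee_proportional_millionths: int
    cltv_expiry_delta: int


def routing_fee_msat(policy: HopPolicy, amount_msat: int) -> int:
    # base fee plus a proportional fee on the forwarded amount
    return policy.fee_base_msat + amount_msat * policy.fee_proportional_millionths // 1_000_000


def build_htlcs(forwarders, amount_msat, final_cltv_delta):
    """Walk backward from the recipient and return one (amount_msat, cltv_delta)
    pair per HTLC, ordered from the sender's outgoing HTLC to the final hop.

    forwarders: policies of the intermediary nodes in path order.
    """
    htlcs = [(amount_msat, final_cltv_delta)]  # HTLC that reaches the recipient
    for policy in reversed(forwarders):
        out_amount, out_delta = htlcs[0]
        htlcs.insert(0, (out_amount + routing_fee_msat(policy, out_amount),
                         out_delta + policy.cltv_expiry_delta))
    return htlcs


alice = HopPolicy("Alice", fee_base_msat=1_000,
                  fee_proportional_millionths=100, cltv_expiry_delta=40)  # hypothetical
bob = HopPolicy("Bob", fee_base_msat=1_000_000,
                fee_proportional_millionths=0, cltv_expiry_delta=30)      # from the text

htlcs = build_htlcs([alice, bob], amount_msat=1_000_000_000, final_cltv_delta=18)
for hop, (amount, delta) in zip(["S->A", "A->B", "B->R"], htlcs):
    print(hop, amount, delta)
# S->A 1001101100 88
# A->B 1001000000 48
# B->R 1000000000 18
```

The first line is what Selena's own channel must be able to carry: the payment plus every fee charged further along the path.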
=== Finding Candidate Paths
((("pathfinding","finding candidate paths")))Finding a suitable path through a directed graph like this is a well-studied computer science problem (known broadly as the _shortest path problem_), which can be solved by a variety of algorithms depending on the desired optimization and resource constraints.
((("Dijkstra&apos;s algorithm")))The most famous algorithm solving this problem was invented by Dutch mathematician E. W. Dijkstra in 1956, known simply as https://en.wikipedia.org/wiki/Dijkstra's_algorithm[_Dijkstra's algorithm_]. In addition to the original Dijkstra's algorithm, there are many variations and optimizations, such as https://en.wikipedia.org/wiki/A*_search_algorithm[A* ("A-star")], which is a heuristic-based algorithm.
As mentioned previously, the "search" must be applied _backward_ to account for fees that are accumulated from recipient to sender. Thus, Dijkstra, A*, or some other algorithm would search for a path from the recipient to the sender, using fees, estimated liquidity, and timelock delta (or some combination) as a cost function for each hop.
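One way to picture the backward search is a Dijkstra variant that starts at the recipient and tracks the amount each node would have to receive. The sketch below is an illustration under simplifying assumptions, not the algorithm of any particular implementation: it uses fees as the only cost, filters on announced capacity, and ignores liquidity estimates and timelocks. The graph, its policies, and all names are hypothetical.

```python
import heapq


def find_cheapest_path(edges, sender, recipient, amount_msat):
    """Search backward from the recipient; return (path, total_msat) or None.

    edges maps (u, v) to u's policy for forwarding over u->v:
    {"capacity_msat", "fee_base_msat", "fee_proportional_millionths"}.
    """
    incoming = {}
    for (u, v), policy in edges.items():
        incoming.setdefault(v, []).append((u, policy))

    need = {recipient: amount_msat}  # amount that must arrive at each node
    next_hop = {}
    heap = [(amount_msat, recipient)]
    while heap:
        amount, node = heapq.heappop(heap)
        if amount > need.get(node, float("inf")):
            continue  # stale heap entry
        if node == sender:
            break
        for prev, policy in incoming.get(node, []):
            if policy["capacity_msat"] < amount:
                continue  # channel too small for amount plus downstream fees
            # The sender does not pay a fee on its own channel.
            fee = 0 if prev == sender else (
                policy["fee_base_msat"]
                + amount * policy["fee_proportional_millionths"] // 1_000_000)
            if amount + fee < need.get(prev, float("inf")):
                need[prev] = amount + fee
                next_hop[prev] = node
                heapq.heappush(heap, (amount + fee, prev))
    if sender not in need:
        return None
    path = [sender]
    while path[-1] != recipient:
        path.append(next_hop[path[-1]])
    return path, need[sender]


def policy(base, ppm, capacity=1_000_000_000):
    return {"capacity_msat": capacity, "fee_base_msat": base,
            "fee_proportional_millionths": ppm}


edges = {("s", "a"): policy(0, 0), ("a", "t"): policy(1_000, 100),
         ("s", "b"): policy(0, 0), ("b", "t"): policy(5_000, 0)}
print(find_cheapest_path(edges, "s", "t", 1_000_000))    # (['s', 'a', 't'], 1001100)
print(find_cheapest_path(edges, "s", "t", 100_000_000))  # (['s', 'b', 't'], 100005000)
```

The cheapest route changes with the amount: the proportional fee charged by +a+ is negligible for a small payment but outweighs the flat fee charged by +b+ for a large one.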
Using one such algorithm, Selena calculates several possible paths to Rashid, sorted by shortest path:
1. S->A->B->R
2. S->X->Y->R
3. S->X->B->R
4. S->A->B->X->Y->R
But, as we saw previously, the channel S->X does not have enough liquidity for a 1M satoshi payment once fees are considered. So Paths 2 and 3 are not viable. That leaves Paths 1 and 4 as possible paths for the payment.
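The candidate list can be reproduced with a small enumeration of loop-free paths. The seven channels below are inferred from the four paths listed above, not read from the figure, and the exclusion of S->X is simply hardcoded to reflect the liquidity shortfall described in the text. This is a brute-force sketch that only suits tiny graphs.

```python
def simple_paths(channels, source, target):
    """Return every loop-free path from source to target, shortest first."""
    graph = {}
    for a, b in channels:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)

    paths = []

    def visit(node, path):
        if node == target:
            paths.append(path)
            return
        for neighbor in graph.get(node, []):
            if neighbor not in path:  # never revisit a node
                visit(neighbor, path + [neighbor])

    visit(source, [source])
    return sorted(paths, key=len)  # stable sort: ties keep discovery order


# Channel list inferred from the candidate paths in the text (an assumption).
channels = [("S", "A"), ("S", "X"), ("A", "B"), ("B", "R"),
            ("X", "Y"), ("Y", "R"), ("B", "X")]

candidates = simple_paths(channels, "S", "R")
for number, path in enumerate(candidates, start=1):
    print(number, "->".join(path))
# 1 S->A->B->R
# 2 S->X->Y->R
# 3 S->X->B->R
# 4 S->A->B->X->Y->R

# S->X cannot carry 1M satoshis plus fees, so drop paths that start with it.
viable = [path for path in candidates if path[:2] != ["S", "X"]]
print(["->".join(path) for path in viable])
# ['S->A->B->R', 'S->A->B->X->Y->R']
```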
With two possible paths, Selena is ready to attempt delivery!
=== Payment Delivery (Trial-and-Error Loop)
Selena's ((("payment delivery","trial-and-error loop", id="ix_12_path_finding-asciidoc8", range="startofrange")))((("trial-and-error loop", id="ix_12_path_finding-asciidoc9", range="startofrange")))node starts the trial-and-error loop by constructing the HTLCs, building the onion, and attempting delivery of the payment. For each attempt, there are three possible outcomes:
[role="pagebreak-before"]
- A successful result (+update_fulfill_htlc+)
- An error (+update_fail_htlc+)
- A "stuck" payment with no response (neither success nor failure)
If the payment fails, it can be retried via a different path by updating the graph (changing a channel's metrics) and recalculating an alternative path.
We looked at what happens if the payment is "stuck" in <<stuck_payments>>. The important detail is that a stuck payment is the worst outcome because we cannot retry with another HTLC since both (the stuck one and the retry one) might go through eventually and cause a double payment.
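The loop and its three outcomes can be sketched as follows. The +attempt+ callback stands in for building the onion and sending the HTLC; it, the +Outcome+ enum, and the hidden liquidity values used to simulate the network are all hypothetical, and fees are ignored to keep the example short.

```python
from enum import Enum


class Outcome(Enum):
    FULFILLED = "update_fulfill_htlc"
    FAILED = "update_fail_htlc"
    STUCK = "no response"


def deliver(candidate_paths, attempt):
    """Try paths in order. attempt(path) returns (Outcome, failing_hop_or_None)."""
    failed_hops = set()
    for path in candidate_paths:
        hops = list(zip(path, path[1:]))
        if any(hop in failed_hops for hop in hops):
            continue  # skip paths through a channel that already failed
        outcome, failing_hop = attempt(path)
        if outcome is Outcome.FULFILLED:
            return path
        if outcome is Outcome.STUCK:
            # Retrying now could lead to paying twice, so stop here.
            raise RuntimeError("payment stuck; not retrying")
        failed_hops.add(failing_hop)
    return None


# Simulated network state that the sender cannot see (hypothetical values).
hidden_liquidity = {
    ("S", "A"): 2_000_000, ("A", "B"): 1_500_000, ("B", "R"): 500_000,
    ("B", "X"): 1_200_000, ("X", "Y"): 3_000_000, ("Y", "R"): 2_500_000,
}


def simulated_attempt(path, amount=1_000_000):
    for hop in zip(path, path[1:]):
        if hidden_liquidity.get(hop, 0) < amount:
            return Outcome.FAILED, hop
    return Outcome.FULFILLED, None


candidates = [["S", "A", "B", "R"], ["S", "A", "B", "X", "Y", "R"]]
result = deliver(candidates, simulated_attempt)
print("->".join(result))  # S->A->B->X->Y->R
```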
==== First Attempt (Path #1)
Selena attempts the first path (S->A->B->R). She constructs the onion and sends it, but receives a failure code from Bob's node. Bob reports back a +temporary_channel_failure+ with a +channel_update+ identifying the channel B->R as the one that can't deliver. This attempt is shown in <<path_1_fail>>.
[[path_1_fail]]
.Path #1 attempt fails
image::images/mtln_1207.png[]
===== Learning from failure
From this failure code, Selena will deduce that Bob doesn't have enough liquidity to deliver the payment to Rashid on that channel. Importantly, this failure narrows the uncertainty of the liquidity of that channel! Previously, Selena's node assumed that the liquidity on Bob's side of the channel was somewhere in the range (0, 4M). Now, she can assume that the liquidity is in the range (0, 999999). Similarly, because the two sides of the channel add up to its capacity, Selena can now assume that the liquidity on Rashid's side is in the range (3000001, 4M), instead of (0, 4M). Selena has learned a lot from this failure.
==== Second Attempt (Path #4)
Now Selena attempts the fourth candidate path (S->A->B->X->Y->R). This is a longer path and will incur more fees, but it's now the best option for delivery of the payment.
Fortunately, Selena receives an +update_fulfill_htlc+ message from Alice, indicating that the payment was successful, as shown in <<path_4_success>>.
[[path_4_success]]
.Path #4 attempt succeeds
image::images/mtln_1208.png[]
===== Learning from success
Selena has also learned a lot from this successful payment. She now knows that all the channels on the path S->A->B->X->Y->R had enough liquidity to deliver the payment. Furthermore, she now knows that each of these channels has moved the HTLC amount (1M &#x2b; fees) to the other end of the channel. This allows Selena to recalculate the range of liquidity on the receiving side of all the channels in that path, replacing the minimum liquidity with 1M &#x2b; fees.
===== Stale knowledge?
Selena now has a much better "map" of the Lightning Network (at least as far as these seven channels go). This knowledge will be useful for any subsequent payments that Selena attempts to make.
However, this knowledge becomes somewhat "stale" as the other nodes send or route payments. Selena will never see any of these payments (unless she is the sender). Even if she is involved in routing payments, the onion routing mechanism means she can only see the changes for one hop (her own channels).
Therefore, Selena's node must consider how long to keep this knowledge before assuming that it is stale and no longer useful(((range="endofrange", startref="ix_12_path_finding-asciidoc9")))(((range="endofrange", startref="ix_12_path_finding-asciidoc8"))).
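One possible way to age this knowledge is to relax the learned bounds gradually back toward the full range (0, capacity) as time passes. The sketch below is an assumption made for illustration, not a policy taken from this chapter or from a specific implementation: the function +decay_bounds+ and the one-hour half-life are hypothetical.

```python
def decay_bounds(low, high, capacity, age_seconds, half_life_seconds=3600.0):
    """Relax learned (low, high) bounds toward (0, capacity) as they age.

    After each half-life, half of the learned information is forgotten:
    the lower bound moves halfway to 0 and the upper bound moves halfway
    to the channel capacity. The half-life is a hypothetical parameter.
    """
    keep = 0.5 ** (age_seconds / half_life_seconds)
    decayed_low = int(low * keep)
    decayed_high = capacity - int((capacity - high) * keep)
    return decayed_low, decayed_high


# Selena's estimate for Bob's side of the B->R channel, right after the failure...
print(decay_bounds(0, 999_999, 4_000_000, age_seconds=0))     # (0, 999999)
# ...and one hypothetical half-life later.
print(decay_bounds(0, 999_999, 4_000_000, age_seconds=3600))  # (0, 2500000)
```

A simpler alternative is to discard an estimate entirely once it is older than some fixed age. Either way, the trade-off is the same: keeping knowledge too long leads to attempts based on outdated balances, while discarding it too early wastes what earlier attempts revealed.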
[[mpp]]
=== Multipart Payments
((("multipart payments (MPP)", id="ix_12_path_finding-asciidoc10", range="startofrange")))((("payment delivery","multipart payments", id="ix_12_path_finding-asciidoc11", range="startofrange")))_Multipart payments (MPP)_ are a feature that was introduced in the Lightning Network in 2020 and are already very widely available. Multipart payments allow a payment to be split into multiple _parts_ which are sent as HTLCs over several different paths to the intended recipient, preserving the _atomicity_ of the overall payment. In this context, atomicity means that either all the HTLC parts of a payment are eventually fulfilled or the entire payment fails and all the HTLC parts fail. There is no possibility of a partially successful payment.
Multipart payments are a significant improvement in the Lightning Network because they make it possible to send amounts that won't "fit" in any single channel by splitting them into smaller amounts for which there is sufficient liquidity. Furthermore, multipart payments have been shown to increase the probability of a successful payment, as compared to a single-path payment.
[TIP]
====
Now that MPP is available, it is best to think of a single-path payment as a subcategory of an MPP. Essentially, a single-path payment is just a multipart payment with one part. Every payment can be treated as a multipart payment; when the payment size and the available liquidity allow it, the payment is simply delivered in a single part.
====
==== Using MPP
MPP is not something that a user will select, but rather a pathfinding and payment delivery strategy used by the node. The same basic steps apply: create a graph, select paths, and run the trial-and-error loop. The difference is that during path selection we must also consider how to split the payment to optimize delivery.
In our example we can see some immediate improvements to our pathfinding problem that become possible with MPP. First, we can utilize the S->X channel, which is known to have insufficient liquidity to transport 1M satoshis plus fees. By sending a smaller part along that channel, we can use paths that were previously unavailable. Second, the B->R channel has unknown liquidity that is insufficient to transport the 1M amount, but might be sufficient to transport a smaller amount.
===== Splitting payments
((("multipart payments (MPP)","splitting payments", id="ix_12_path_finding-asciidoc12", range="startofrange")))((("payment","splitting", id="ix_12_path_finding-asciidoc13", range="startofrange")))The fundamental question is how to split the payment. More specifically, what is the optimal number of parts, and what is the optimal amount for each part?
This is an area of ongoing research where novel strategies are emerging. Multipart payments lead to a different algorithmic approach than single-path payments, even though single-path solutions can emerge from a multipart optimization (i.e., a single path may be the optimal solution suggested by a multipart pathfinding algorithm).
If you recall, we found that the uncertainty of liquidity/balances leads to some (somewhat obvious) conclusions that we can apply in MPP pathfinding, namely:
* Smaller payments have a higher chance of succeeding.
* The more channels you use, the (exponentially) lower the chance of success becomes.
From the first of these insights, we might conclude that splitting a large payment (e.g., 1 million satoshis) into tiny payments increases the chance that each of those smaller payments will succeed. The number of possible paths with sufficient liquidity will be greater if we send smaller amounts.
To take this idea to an extreme, why not split the 1M satoshi payment into one million separate one-satoshi parts? Well, the answer lies in our second insight: since we would be using more channels/paths to send our million single-satoshi HTLCs, our chance of success would drop exponentially.
If it's not obvious, the two preceding insights create a "sweet spot" where we can maximize our chances of success: splitting into smaller payments but not too many splits!
Quantifying this optimal balance of size/number of splits for a given channel graph is beyond the scope of this book, but it is an active area of research. Some current implementations use a very simple strategy of splitting the payment into two halves, four quarters, etc.
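The two insights can be made concrete with a small calculation. The sketch below assumes that the liquidity on the forwarding side of a channel is equally likely to be any whole number of satoshis from 0 to the channel capacity, and that channels are independent of each other. The 4M capacity and the hop counts are hypothetical values, and the function names are invented for this example.

```python
from math import prod


def channel_success(amount_sat, capacity_sat):
    """P(liquidity >= amount) when liquidity is uniform on 0..capacity."""
    if amount_sat > capacity_sat:
        return 0.0
    return (capacity_sat + 1 - amount_sat) / (capacity_sat + 1)


def path_success(amount_sat, capacities_sat):
    """Probability that every channel on the path can forward the amount."""
    return prod(channel_success(amount_sat, c) for c in capacities_sat)


three_hops = [4_000_000] * 3  # hypothetical capacities
five_hops = [4_000_000] * 5

print(f"{path_success(1_000_000, three_hops):.2f}")  # 0.42
print(f"{path_success(500_000, three_hops):.2f}")    # 0.67 (smaller amount)
print(f"{path_success(1_000_000, five_hops):.2f}")   # 0.24 (more channels)
```

Halving the amount raises the chance of success on the same path, while adding channels multiplies in more factors smaller than one. This simple model does not by itself locate the "sweet spot," which also depends on how many suitable paths the channel graph actually offers.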
[NOTE]
====
To read more about minimum-cost flows, the optimization problem involved in splitting payments into different sizes and allocating them to paths, see the paper https://arxiv.org/abs/2107.05322["Optimally Reliable & Cheap Payment Flows on the Lightning Network"] by René Pickhardt (coauthor of this book) and Stefan Richter.
====
In our example, Selena's node will attempt to split the 1M satoshi payment into 2 parts with 600k and 400k satoshi, respectively, and send them on 2 different paths. This is shown in <<mpp_paths>>.
Because the S->X channel can now be utilized, and (luckily for Selena), the B->R channel has sufficient liquidity for 600k satoshis, the 2 parts are successful along paths that were previously not possible.(((range="endofrange", startref="ix_12_path_finding-asciidoc13")))(((range="endofrange", startref="ix_12_path_finding-asciidoc12")))
[[mpp_paths]]
.Sending two parts of a multipart payment
image::images/mtln_1209.png[]
==== Trial and Error over Multiple "Rounds"
((("multipart payments (MPP)","trial-and-error over multiple rounds")))((("payment delivery","trial-and-error loop")))((("trial-and-error loop")))Multipart payments lead to a somewhat modified trial-and-error loop for payment delivery. Because we are attempting multiple paths in each round, we have four possible outcomes:
* All parts succeed, the payment is successful
* Some parts succeed, some fail with errors returned
* All parts fail with errors returned
* Some parts are "stuck," no errors are returned
In the second case, where some parts fail with errors returned and some parts succeed, we can now _repeat_ the trial-and-error loop, but _only for the residual amount_.
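A minimal sketch of this bookkeeping follows. It classifies one round into the four outcomes and computes the residual amount to retry. The names +PartResult+, +classify_round+, and +residual_sat+ are hypothetical and exist only for this illustration.

```python
from enum import Enum


class PartResult(Enum):
    SUCCESS = "success"
    FAILED = "failed"  # an error was returned, so the amount can be retried
    STUCK = "stuck"    # no error returned; retrying risks paying twice


def classify_round(results):
    """Map the results of one round to one of the four outcomes."""
    if PartResult.STUCK in results:
        return "some parts stuck"
    if all(r is PartResult.SUCCESS for r in results):
        return "all parts succeeded"
    if all(r is PartResult.FAILED for r in results):
        return "all parts failed"
    return "some parts failed"


def residual_sat(parts):
    """Sum the amounts of the failed parts: the amount for the next round."""
    return sum(amount for amount, result in parts if result is PartResult.FAILED)


# A round with two parts, one of which failed with an error.
parts = [(600_000, PartResult.SUCCESS), (400_000, PartResult.FAILED)]
print(classify_round([result for _, result in parts]))  # some parts failed
print(residual_sat(parts))                              # 400000
```

The stuck case is checked first because, as long as any part is unresolved, the sender cannot safely treat its amount as residual.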
Let's assume for example that Selena had a much larger channel graph with hundreds of possible paths to reach Rashid. Her pathfinding algorithm might find an optimal payment split consisting of 26 parts of varying sizes. After attempting to send all 26 parts in the first round, 3 of those parts failed with errors.
If those 3 parts consisted of, say, 155k satoshis, then Selena would restart the pathfinding effort, only for 155k satoshis. The next round could find completely different paths (optimized for the residual amount of 155k), and divide the 155k amount into completely different parts!
[TIP]
====
While it seems like 26 split parts are a lot, tests on the Lightning Network have successfully delivered a payment of 0.3679 BTC by splitting it into 345 parts.
====
Furthermore, Selena's node would update the channel graph using the information gleaned from the successes and errors of the first round to find the best paths and splits for the second round.
Let's say that Selena's node calculates that the best way to send the 155k residual is 6 parts split as 80k, 42k, 15k, 11k, 6.5k, and 500 satoshis. In the next round, Selena gets only one error, indicating that the 11k satoshi part failed. Again, Selena updates the channel graph based on the information gleaned and runs the pathfinding again to send the 11k residual. This time, she succeeds with 2 parts of 6k and 5k satoshis, respectively.
This multiround example of sending a payment using MPP is shown in <<mpp_rounds>>.
[[mpp_rounds]]
.Sending a payment in multiple rounds with MPP
image::images/mtln_1210.png[]
In the end, Selena's node used 3 rounds of pathfinding to send the 1M satoshis in 30 parts.(((range="endofrange", startref="ix_12_path_finding-asciidoc11")))(((range="endofrange", startref="ix_12_path_finding-asciidoc10")))
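The totals in this example can be checked with a short tally. The figures come from the three rounds described in the text. The +Round+ record and the +tally+ function are names made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Round:
    amount_sat: int    # amount this round tries to deliver
    parts_sent: int    # number of parts attempted
    parts_failed: int  # number of parts that returned an error
    failed_sat: int    # total amount carried by the failed parts


rounds = [
    Round(amount_sat=1_000_000, parts_sent=26, parts_failed=3, failed_sat=155_000),
    Round(amount_sat=155_000, parts_sent=6, parts_failed=1, failed_sat=11_000),
    Round(amount_sat=11_000, parts_sent=2, parts_failed=0, failed_sat=0),
]


def tally(rounds):
    """Return (satoshis delivered, parts delivered) across all rounds."""
    delivered_sat = sum(r.amount_sat - r.failed_sat for r in rounds)
    delivered_parts = sum(r.parts_sent - r.parts_failed for r in rounds)
    return delivered_sat, delivered_parts


print(tally(rounds))  # (1000000, 30)
```

Note that 30 counts only the parts that were delivered (23, then 5, then 2). Including the 4 parts that failed along the way, Selena's node attempted 34 parts in total.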
=== Conclusion
In this chapter we looked at pathfinding and payment delivery. We saw how to use the channel graph to find paths from a sender to a recipient. We also saw how the sender will attempt to deliver payments on a candidate path and repeat in a trial-and-error loop.
We also examined the uncertainty of channel liquidity (from the perspective of the sender) and the implications that has for pathfinding. We saw how we can quantify the uncertainty and use probability theory to draw some useful conclusions. We also saw how we can reduce uncertainty by learning from both successful and failed payments.
Finally, we saw how the newly deployed multipart payments feature allows us to split payments into parts, increasing the probability of success even for larger payments(((range="endofrange", startref="ix_12_path_finding-asciidoc1"))).(((range="endofrange", startref="ix_12_path_finding-asciidoc0")))