Introduction
IPFS provides a highly developed solution for sharing content-addressable data in a network of peers. The distributed approach to preserving and exchanging data that it offers invites interest in sharing more operations between the peers, in order to build decentralized networks that use and modify shared data collaboratively. It might seem that such a network can be built independently of IPFS, and indeed there are already a number of peer-to-peer applications that use IPFS as a file sharing tool. However, it also seems promising to seek a closer integration between the distributed storage and the various methods of accessing and modifying the data it keeps. Supposing that peer-to-peer networks evolve in this direction, we may expect the storage function itself to become only one aspect, though the fundamental one, of what peers can do together with the data they share.
Among the various approaches to processing data in a cooperative and distributed manner, two general organising principles can be distinguished. The first is to distribute all functions evenly and randomly among the peers. Consequently, such networks have no authorities and need some sort of decentralized consensus about all the dynamic content they produce. This is the field of trustless cooperation. The alternative principle is to retain one form of authority or another and to cooperate relying on explicit trust instead of randomization. While the trustless approach brings a new quality to peer-to-peer networks and to the whole Internet, trustful cooperation in a self-governing network can reproduce the best practices of the early Internet, raising them to a new level.
The present paper describes the means to build an open network where dynamic content is produced and shared as a result of the trustful cooperation of peers. IPFS seems very suitable as a basis for such a network, whose dynamic functions would, in turn, facilitate the growth of the shared static content and accelerate its exchange.
The trust model
Exchanging dynamic data between a pair of peers seems no harder than transferring static data. However, when a peer asks for a file with a known hash, it is able to verify the replies of the others and be sure that it has received neither garbage nor a fake. That is the key feature of CID addressing, which makes it very transparent and handy. In the case of dynamic data we generally have no such option. And if we cannot verify the remote computation itself (which is true most of the time), we have to either trust the peer or refuse the reply. More precisely, we should have dynamic data transactions only with peers we trust.
Having this in mind, we could start to share dynamic services with our friends on a peer-to-peer network. Apparently, offline communication is the first source of trust we all use one way or another.
The second principle we can consider is to trust dynamic data when it is authored by the same source as the static data it is related to. For instance, if a person has taken a picture and told us its CID, we could ask him or her to crop and resize it for our needs or to convert it to a format we want, and be almost certain the result would not be a fake. Why? Because there is little sense in faking the result that way, and because the original CID could have been a fake in the first place. So, if someone provides an online picture-processing service for his or her own pictures, there is the same reason to trust that service as to trust the picture data itself.
On the other hand, sometimes trust might be critical for one user and irrelevant for the other. For instance, the use of a trusted email or messaging server is normally more important for the recipient than for the sender.
These facts encourage the use of self-hosted services in a network of peers. However, if we consider the possibility of using offline communication as the basis for trust, it further leads us to the idea of a network where particular services are provided by a peer on behalf of one or more other peers that trust the provider. The only question here is how to make such delegation observable and verifiable for the rest of the network.
That’s the place where IPNS should come into play. It provides a way to publish authorized updates in the IPFS peer-to-peer network.
The simplest form in which a trust relationship could be stated on IPNS requires the identifier of one peer to be published by the other peer somewhere in its personal IPNS area. However, in order to state the trust as specific to a particular service only, we need a way to reference that service along with the ID of the peer that provides it on our behalf. Moreover, it seems reasonable to reference services in relation to a specific context or dataset, suggesting their usage for a particular purpose. Therefore, we need to define some kind of layout in order to publish all the necessary information unambiguously.
Example 1: A personal picture gallery
A network service is defined by a particular protocol that is implemented in software and sometimes in hardware. A protocol, in turn, is usually described elsewhere in human- and machine-readable forms and is assigned a unique name and a version number.
In IPFS and other libp2p-based applications a path-like form of a
protocol identifier is commonly accepted. Using this form seems very
suitable for our task because a path name of a service protocol
a) can be naturally placed under a context as a sub-path and
b) itself defines a sub-context for any service-related data.
Now, let us try to lay out the picture gallery service mentioned above, using protocol path names to delimit trust contexts:
/ipns/<PeerID>/ClipArt/
    dir/
        cat_0001.jpg
        dog_0002.jpg
        ...
    convert/
        imagick/v1.0.1/
            endpoints
    app/
        index.html
        js/
        css/
    search/
        filesearch/v1.3.0/
            endpoints
First of all, we see the ClipArt/ directory that states the name of
the publication. The pictures themselves can be found under
dir/. Next to it is placed the convert/ directory with
imagick/v1.0.1/ subdirectory sequence containing the
endpoints file. The path of that sequence matches the name-version
identifier of a hypothetical “imagick” protocol that might be used
to communicate with an image processing service (the name resembles
the well-known “ImageMagick” suite). As for the endpoints file, it
should contain the list of peers that are trusted by the clip-art
publisher to handle the image processing requests.
What procedure could be used to access that service? The basic idea is
the following. When client software is pointed at
/ipns/<PeerID>/ClipArt/convert/ it should search for the endpoints
file, trying various paths corresponding to the protocols it supports
until a match is found. Then it should try to connect to one of the
endpoint peers and use the protocol appropriately.
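The lookup procedure described above can be sketched as follows. This is a minimal illustration, not a definitive client: the resolved IPNS tree is replaced by a plain dictionary, the function name find_endpoints and the endpoint peer names are hypothetical, and the endpoints file is assumed to hold one peer ID per line.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a resolved IPNS tree:
# maps a file path to its lines (assumed: one peer ID per line).
IpnsTree = Dict[str, List[str]]


def find_endpoints(
    tree: IpnsTree, service_path: str, supported_protocols: List[str]
) -> Optional[Tuple[str, List[str]]]:
    """Try each supported protocol path under service_path and return
    (protocol, endpoint peers) for the first endpoints file found."""
    base = service_path.rstrip("/")
    for protocol in supported_protocols:
        peers = tree.get(f"{base}/{protocol}/endpoints")
        if peers:
            return protocol, peers
    return None


tree = {
    "/ipns/<PeerID>/ClipArt/convert/imagick/v1.0.1/endpoints": [
        "<EndpointPeerA>",
        "<EndpointPeerB>",
    ],
}

# The client supports two protocol versions; only the second is published.
match = find_endpoints(
    tree,
    "/ipns/<PeerID>/ClipArt/convert/",
    ["imagick/v2.0.0", "imagick/v1.0.1"],
)
print(match)
```

A real client would then dial one of the returned peers over the matched protocol; that step is left out here because it depends on the libp2p implementation in use.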
However, where could compatible client software be obtained?
Depending on how widespread the protocol is, a suitable client may or
may not be known to the end user. For the sake of completeness we
assume here that a particular version of a client is shipped along
with the gallery in the form of a JavaScript application that uses
js-libp2p to communicate with the endpoint peers (the app/ directory
in the tree).
At least one more thing is needed to make the gallery user really
happy: the gallery author should allow the user to search for
pictures in some way. An authorized search service is made possible by
including one more hypothetical protocol node in the ClipArt/ IPNS
tree: search/filesearch/v1.3.0/.
Who should actually perform the search? In the simplest case, the
corresponding endpoints file would list a single peer: the gallery
publisher's own peer ID (the same applies to the image conversion
service). Therefore, using a self-endpoint setup one could easily run
a self-hosted site with static content on IPFS and libp2p-accessible
services authorized on IPNS.
However, a more advanced setup with more than one endpoint peer is
also possible. The services, including the search service, could also
be provided by multiple third-party peers on behalf of the gallery
publisher. The publisher, for his or her part, has to list the trusted
service providers as authorized endpoints in the corresponding
endpoints file. In the case of the search service, the trust
relationship between the publisher and the providers assumes that the
latter watch /ipns/<PeerID>/ClipArt/dir/ and perform metadata
indexing on updates. It also assumes the trust relationship to be
mutual: the providers in turn list on their IPNS some information
that allows others to verify that the given peer agrees to
act as an endpoint for another peer. A possible layout for the
counter-authorization is the following:
/ipns/<ProviderPeerID>/supported/
    imagick/v1.0.1/
        for/
            <hash-of-PeerID_1>
            <hash-of-PeerID_2>
            ...
        under/
            conditions
    filesearch/v1.3.0/
        for/
            <hash-of-PeerID_1>
            <hash-of-PeerID_2>
            ...
        under/
            conditions
This time it seems better to list peer IDs directly as file names, which simplifies the verification procedure when there are many authorized peers. Using a hash of a peer ID instead of the plain peer ID seems reasonable as a protection for personal information.
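The verification of a counter-authorization could look roughly like the sketch below. The paper does not fix a hash function or an encoding, so SHA-256 over the UTF-8 peer ID string with a hexadecimal digest is purely an assumption here, as are the function names. The set for_entries stands for the file names found under the for/ directory of the provider's IPNS tree; the `*` entry corresponds to the open-to-anybody record that the paper mentions in the section on trust forwarding.

```python
import hashlib
from typing import Set


def peer_id_hash(peer_id: str) -> str:
    # Assumption: SHA-256 hex digest of the peer ID string.
    return hashlib.sha256(peer_id.encode("utf-8")).hexdigest()


def is_counter_authorized(for_entries: Set[str], publisher_peer_id: str) -> bool:
    """Check whether a provider confirms serving on behalf of the publisher.

    for_entries holds the file names listed under
    /ipns/<ProviderPeerID>/supported/<protocol>/for/.
    """
    return "*" in for_entries or peer_id_hash(publisher_peer_id) in for_entries


# Hypothetical peer IDs, used only for illustration.
entries = {peer_id_hash("PublisherPeerA"), peer_id_hash("PublisherPeerB")}
print(is_counter_authorized(entries, "PublisherPeerA"))
print(is_counter_authorized(entries, "UnknownPeer"))
```

Because the check is a single set lookup, it stays cheap even when a provider serves many publishers, which is the motivation given above for using file names.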
The under/conditions sub-layout is a place to define some
requirements under which the service is provided by that particular
peer. A number of possible requirements are discussed in the
“Service consumer design overview” section below.
The advantages of a multi-endpoint setup are obvious: only some
providers of the same service have to be online at a time, and
load balancing also becomes possible (though it requires additional
procedures). Indeed, in order to make the whole site function properly
the parallel providers of the same service would have to synchronize
their states in some way. Although the means of such synchronization
lie outside the scope of the present paper, one thing can be said for
sure: the larger the project is and the more people and peers it
brings together, the more advanced technologies become available. One
possibility is to use a peer-to-peer message delivery network such as
“gossipsub” from libp2p.
In the present example of a picture gallery site, a multi-endpoint setup would be the configuration of choice for a large set of pictures managed by a community. That seems natural, but how does it get along with the authority of the publisher? In the case of a community-managed dataset no single peer is the author.
A prominent solution to this problem can be found in the world of free and open-source software. There we see a distinctive division of labour between the direct author of code and the maintainer, who is responsible for including that code in the project and for making current and release publications. It seems that the proposed trust model can be used effectively with the role of maintainer in place of the author. However, from the theory as well as the practice of free software we also know that the idea of self-governing public projects is virtually inseparable from the idea (and the practice) of forks. That shouldn’t stop us from building community-based peer-to-peer networks.
Example 2: Dissolving a git hub
Let us consider the next example: a bunch of services for a public software project. The main difference here, compared to the personal clip-art site discussed above, is the collaboration services:
/ipns/<PeerID>/CodeSample/
    branches/
        master
        ...
    app/
        index.html
        css/
        js/
    issues/
        bugreport/v1.2.1/
            endpoints
    pullreqs/
        changes/v0.9.0/
            endpoints
    news/
        newsfeed/v1.0.0/
            endpoints
Issue tracking and bug reporting are done using a
hypothetical “bugreport” protocol. The endpoint peers defined for it
under issues/ are responsible for managing the reports, including
registration of new issues, updates and search. The purpose of the
pullreqs/ endpoints is to manage pull requests (and, possibly,
patches) in a similar way via a hypothetical “changes”
protocol. Finally, the set of news/ endpoints handles a
hypothetical “newsfeed” protocol for reading and posting news.
The listed service providers are authorized by the project maintainer
<PeerID> to receive messages from the public community and to supply
current and past information about the related topics.
Where should issue and pull request data be processed and stored? That
depends on the agreement between the maintainer and the endpoint, and
might be additionally restricted by the project policy. The following
considerations, however, seem to be rational. Firstly, given the risk
of spam, moderation of the incoming messages should be arranged.
That assumes the maintainer has privileged access to the endpoint
service and is able to edit or delete any message. Secondly, the
issues, pull requests and any other discussion material might be kept
in the project repo together with the code. The latter approach isn’t
widely used, yet it has a long history in software development and a
number of implementations (including Bugs Everywhere, git-issue
and deft)[1].
With the proposed trust model, placing the discussion material into the source tree has a number of additional advantages. First of all, the service endpoints need to keep the received messages only temporarily, until they are committed to the source tree by the maintainer. The other is that the ability to clone both the code and the discussions seems to be crucial for a decentralized platform.
A question may arise about the potential use of a publish/subscribe network to get the project activity updates. For instance, an author may propose changes by broadcasting a CID of a commit containing both code modifications and the corresponding pull-request message. Other community members may join the discussion in the same way, by broadcasting additional commits on top of the existing ones. A canonical form of such a procedure, however, has an obvious drawback: peers miss the updates when they’re offline. In turn, that makes it more difficult to merge frequent and small updates together to reconstruct the discussion. The use of endpoint services to receive individual updates and to share the resulting discussion would bring the necessary coordination basis to the process. Moreover, publish/subscribe might still be useful as an endpoint communication method.
The above is true for many communication tasks other than a topic discussion. One of the key features of the proposed approach with peer-authorized endpoints is that it solves the problem of delivering a message to a temporarily offline destination. Unlike an ordinary user peer, especially a desktop or a mobile one, the endpoint peer is primarily a stable online peer. As a custom coordination center or, even better, a custom coordination cluster, authorized endpoints would act as a network glue connecting peers through space and time.
The use of authorized stable peers to hold and cache the content should improve reliability of services. However, it can also reduce their scalability as compared to pure peer-to-peer techniques where the number of caching participants is potentially unlimited. For the cases where scalability is critical the proposed trust model can be extended with the idea of trust forwarding.
Trust forwarding
The example of a collaborative service layout above can be modified in the following way in order to increase application availability:
/ipns/<PeerID>/CodeSample/
    branches/
        master
        ...
    app/
        index.html
        css/
        js/
    issues/
        bugreport/v1.2.1/
            endpoints
            downlinks:
                /ipns/<PeerID1>/.../bugreport/v1.2.1/endpoints
                /ipns/<PeerID2>/.../bugreport/v1.2.1/endpoints
                /ipns/<PeerID2>/.../bugreport/v1.2.1/downlinks
                ...
    ...
The only difference here is the presence of the downlinks file. The file contains a list of other IPNS paths and can be thought of as a sort of symbolic linking between IPNS trees.
In the simplest trust forwarding case, the set of IPNS paths published by one party should point directly to endpoint list files published by another party. In that case the publisher informs the users that any endpoint selected by the listed third party is a trusted endpoint for the published application. In other words, the linked third party is trusted by the application publisher to select service endpoints on his or her behalf.
However, the trust forwarding can potentially go one, two or any
number of steps deeper by publishing links to other downlinks files!
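Following downlinks to any depth amounts to a graph traversal, and a client would presumably want to guard against link cycles and overly deep chains. The sketch below is one possible way to do it under stated assumptions: the IPNS trees are mocked as a dictionary from file path to lines, the function name collect_endpoints and the max_depth limit are hypothetical, and all peer names are invented for the example.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for resolved IPNS trees: file path -> lines.
IpnsTree = Dict[str, List[str]]


def collect_endpoints(tree: IpnsTree, start_dir: str, max_depth: int = 4) -> List[str]:
    """Gather endpoint peers from start_dir/endpoints and, recursively,
    from every file linked through start_dir/downlinks."""
    peers: List[str] = []
    visited: Set[str] = set()

    def visit(path: str, depth: int) -> None:
        # Skip files already seen (cycles) and links beyond the depth limit.
        if path in visited or depth > max_depth:
            return
        visited.add(path)
        lines = tree.get(path, [])
        if path.endswith("/endpoints"):
            for peer in lines:
                if peer not in peers:
                    peers.append(peer)
        elif path.endswith("/downlinks"):
            for link in lines:
                visit(link, depth + 1)

    base = start_dir.rstrip("/")
    visit(f"{base}/endpoints", 0)
    visit(f"{base}/downlinks", 0)
    return peers


tree = {
    "/ipns/A/app/bugreport/v1.2.1/endpoints": ["peer-a1"],
    "/ipns/A/app/bugreport/v1.2.1/downlinks": [
        "/ipns/B/x/bugreport/v1.2.1/endpoints",
        "/ipns/C/y/bugreport/v1.2.1/downlinks",
    ],
    "/ipns/B/x/bugreport/v1.2.1/endpoints": ["peer-b1", "peer-a1"],
    "/ipns/C/y/bugreport/v1.2.1/downlinks": [
        "/ipns/A/app/bugreport/v1.2.1/downlinks",  # link cycle back to A
        "/ipns/D/z/bugreport/v1.2.1/endpoints",
    ],
    "/ipns/D/z/bugreport/v1.2.1/endpoints": ["peer-d1"],
}

found = collect_endpoints(tree, "/ipns/A/app/bugreport/v1.2.1/")
print(found)
```

The depth limit also gives the client a simple policy knob: a lower value means trusting only endpoints that are few forwarding steps away from the publisher.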
How should trust forwarding interact with counter-authorization? In general, the relationship between the application publisher and a service provider should be considered mutual, as there can be technical reasons for that. So it makes no difference how the relationship was established: directly or indirectly, via a third party. Hence the hash of the publisher peer ID should be listed on the service provider's IPNS in confirmation of their trust relationship (and it is the job of the third party to inform the endpoint about the application).
However, in some cases, where it is technically possible, the service
provider might want to state the service as open to anybody by making
the corresponding record on IPNS. For example:
/ipns/<ProviderPeerID>/supported/imagick/v1.0.1/for/*.
Integration with the Bitswap protocol
The use of IPNS layouts discussed so far doesn’t affect normal IPFS communication in any way. The “Bitswap” data exchange protocol that is normally used in IPFS is based on a simple credit-like system: peers provide data blocks for each other while tracking the credit/debt balance. The protocol was designed for the exchange of content-addressable data and can’t be directly applied to the transfer of dynamic data. That isn’t really necessary, though, if we consider Bitswap as one particular way of exchanging data, integrated with other ways. What might we want from it as an integrated component?
The data exchange rate in Bitswap is varied in order to make the exchange fair for the peers, and normally depends on the ratio of bytes sent to bytes received. However, only the amount of data that is transferred through Bitswap itself is taken into account.
“In some cases, nodes must work for their blocks. In the case that a node has nothing that its peers want (or nothing at all), it seeks the pieces its peers want, with lower priority than what the node wants itself.” (IPFS specification[2], sec. 3.4). However, a service provider peer does another kind of work: it provides dynamic content to peers that ask for it. Should that work be taken into account too?
Without means to adjust the Bitswap ratio with factors other than the static data exchange balance, a service provider peer that provides mostly dynamic content to its peers may be unfairly limited in downloading content from IPFS. That would be especially surprising when the IPFS content in question is the expected input data for a given service! In order to avoid that drawback we need a way to prioritize sending static blocks to the service provider from the peers it serves, that is, a way to trade dynamic content for static content.
For instance, a photo processing service would probably want to download a lot of image content from IPFS on behalf of its clients. In order for the service to be completed as soon as possible, the clients should send the necessary image blocks to the service provider bypassing the normal Bitswap ratio.
The prioritized traffic from service consumers to the service provider could also be thought of as a form of reward for the service. In that case, the peers should negotiate the amount of the reward before the service is actually provided and update their credit/debt balance accordingly. Most likely we need a special protocol for such agreements (a name to consider is “Workswap”).
It seems natural to use a special Bitswap strategy to control
the extra flow of static blocks to service providers. The use of
different strategies is discussed in the IPFS specification
(sec. 3.4.2) and is already implemented in go-bitswap in the form of
the WithScoreLedger option[3].
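To make the idea of such a strategy more concrete, here is a small sketch of a ledger that a service consumer might keep for a provider peer. The debt ratio and the sigmoid send probability follow the example strategy given in the IPFS paper (sec. 3.4.2). The service_credit field, which counts agreed dynamic work in byte-equivalents, is a hypothetical extension introduced only for this illustration; it is not part of Bitswap or go-bitswap.

```python
import math
from dataclasses import dataclass


@dataclass
class Ledger:
    bytes_sent: int = 0      # static bytes we have sent to the peer
    bytes_recv: int = 0      # static bytes we have received from the peer
    service_credit: int = 0  # hypothetical: dynamic work done for us, in byte-equivalents

    def debt_ratio(self) -> float:
        # Example debt ratio r = bytes_sent / (bytes_recv + 1),
        # extended here so that service work counts like received bytes.
        return self.bytes_sent / (self.bytes_recv + self.service_credit + 1)


def send_probability(r: float) -> float:
    # Example strategy: P(send | r) = 1 - 1 / (1 + exp(6 - 3r)).
    return 1.0 - 1.0 / (1.0 + math.exp(6.0 - 3.0 * r))


# The provider has returned few static bytes, so its plain debt ratio is high.
plain = Ledger(bytes_sent=3000, bytes_recv=999)
# The same peer once its processing work is credited.
credited = Ledger(bytes_sent=3000, bytes_recv=999, service_credit=2000)

print(round(send_probability(plain.debt_ratio()), 3))
print(round(send_probability(credited.debt_ratio()), 3))
```

With the credit applied, the consumer keeps sending blocks to the provider even though the static balance alone would suggest throttling it.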
Another way to settle with a service provider could be to provide another service in return. That is the way to trade dynamic content for dynamic content.
Service consumer design overview
Let us outline and discuss a possible design of client software to be used for accessing the proposed network services.
The main component of such software seems to be a Router: the thing that could be asked to find a provider of the desired service. What information should the Router use for that?
First, a search query pointing to the service descriptor on IPNS,
for instance, /ipns/<PeerID>/ClipArt/convert/. That part of a
query is determined by the application and the peer who maintains it.
Then, a protocol identifier, for instance, imagick/v1.0.1,
imagick/v1 or just imagick. Less definitive protocol IDs
expand the search with more providers to consider.
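How a less definitive protocol ID widens the search can be illustrated with a simple matching rule. The rule below is an assumption made for the sake of the example: an ID is taken to be either a bare name or name/version, and a partial version such as v1 matches any published version that continues it after a dot. The function names are hypothetical.

```python
from typing import List


def protocol_matches(query: str, published: str) -> bool:
    """Return True if the published protocol ID equals or refines the query."""
    q_name, _, q_version = query.strip("/").partition("/")
    p_name, _, p_version = published.strip("/").partition("/")
    if q_name != p_name:
        return False
    if not q_version:
        return True  # a bare name matches every version
    # "v1" matches "v1" and "v1.0.1", but not "v10.0.0".
    return p_version == q_version or p_version.startswith(q_version + ".")


def select_protocols(query: str, published: List[str]) -> List[str]:
    return [p for p in published if protocol_matches(query, p)]


published = ["imagick/v1.0.1", "imagick/v1.2.0", "imagick/v2.0.0", "filesearch/v1.3.0"]

print(select_protocols("imagick/v1.0.1", published))
print(select_protocols("imagick/v1", published))
print(select_protocols("imagick", published))
```

Each step from the exact version to the bare name returns a superset of the previous result, which is the sense in which the search expands.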
Having a query the Router should return the best provider that
matches it or the list of providers sorted from best to worst.
The matches can be found by traversing down the endpoints and
downlinks, if any. However, if the tree to traverse is relatively
large that would be slow. What other information might help to make
the search faster? And what should it mean — “best” and “worst”
in regard to providers? The answer to these questions is probably the
Accumulated Statistics table the Router should maintain.
The Accumulated Statistics table seems to be a local datastore with records on particular service provider peers that are updated after each communication with them, successful or not. After a series of communication sessions there would be a basis from which to predict future communications.
The peers in the table could be ranked for their speed, availability, the amount and ratio of data exchanged, etc. The exchange ratio here is very important as it can be used for integration with the “Bitswap” protocol that is normally used in IPFS for exchange of static content (see the previous section).
If a provider of a service has put a price on it in the form of some amount of prioritized static traffic, it could be selected as the “best” provider if the price is relatively low in comparison to other providers, or if the actual price would be low for the consumer given the provider’s debt in terms of the Bitswap sent/received ratio. That’s why the prices should also be recorded in the Accumulated Statistics, turning it into a consolidated service price list.
The same principle should be used in the case of service exchange, i.e. when the price of one service is expressed in units of another service and the consumer, in turn, is able to act as a provider of that service.
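One way to picture the Accumulated Statistics table is sketched below. The record fields, the byte-based price unit and the ordering rule (availability first, then effective price, then mean latency) are all assumptions chosen for illustration; the paper leaves the actual ranking criteria open.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProviderStats:
    successes: int = 0
    failures: int = 0
    total_latency_s: float = 0.0
    price_bytes: int = 0   # hypothetical: price in prioritized static traffic
    credit_bytes: int = 0  # hypothetical: what the provider already owes us

    def record(self, ok: bool, latency_s: float = 0.0) -> None:
        # Update the record after each session, successful or not.
        if ok:
            self.successes += 1
            self.total_latency_s += latency_s
        else:
            self.failures += 1

    def availability(self) -> float:
        total = self.successes + self.failures
        return self.successes / total if total else 0.5  # neutral value for unknown peers

    def mean_latency(self) -> float:
        return self.total_latency_s / self.successes if self.successes else float("inf")

    def effective_price(self) -> int:
        # The provider's existing debt lowers what the consumer actually pays.
        return max(0, self.price_bytes - self.credit_bytes)


def rank_providers(table: Dict[str, ProviderStats]) -> List[str]:
    """Order peer IDs from best to worst."""
    return sorted(
        table,
        key=lambda pid: (
            -table[pid].availability(),
            table[pid].effective_price(),
            table[pid].mean_latency(),
        ),
    )


table = {
    "peer-a": ProviderStats(successes=9, failures=1, total_latency_s=4.5, price_bytes=1000),
    "peer-b": ProviderStats(successes=9, failures=1, total_latency_s=9.0,
                            price_bytes=2000, credit_bytes=1500),
    "peer-c": ProviderStats(successes=1, failures=3, total_latency_s=0.2),
}

ranking = rank_providers(table)
print(ranking)
```

In this example peer-b advertises the higher price but wins over peer-a because its outstanding debt makes the effective price lower, which mirrors the reasoning about the provider's debt given above.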
The supposed place where the prices should be defined is the
under/conditions sub-layout, introduced in the
discussion of the example personal picture gallery
service above. However, it seems fair to assume that the actual price
of the service might be customized for the particular consumer by the
particular provider. Therefore, the actual price of the service is
subject to negotiation. That need for negotiation about the service
prior to providing it is the first reason to consider a
specific protocol, whose name could be “Workswap”.
What about the use of some public statistics about the providers? That seems possible too, but again depends on a source of trust. One solution to that problem might be to wrap the public provider rating itself into the form of a special service conforming to the proposed model of trust. The user should then be able to rely on one or another public rating maintained by one or another peer.
After the service provider peer is determined (and verified), the client should try to reach it over the desired protocol and update the Accumulated Statistics afterwards. If the provider needs some data from IPFS in order to process the client’s request, the client should provide the needed blocks bypassing the normal Bitswap strategy, to the benefit of both peers.
Service provider design overview
Now, let’s look at the other side: service provider software. At first glance it seems quite simple: just listen for incoming connections and answer over application-specific protocols. However, just as finding the best provider is the first thing for a client to do, a server has to decide which client to serve and when. The need to serve peers in some order leads us to the idea of a Request Queue as the central component of the service provider.
Being a counterpart of a client, the server should also maintain the credit/debt balance of its peers. And as both static and dynamic traffic could be traded, both kinds of traffic should be accounted for. The server, therefore, should also maintain its version of an Accumulated Statistics table.
The peers with good balance should probably be served first. However, what about those peers who can’t yet pay the price? Should they be served at all?
It seems wise to serve them, though with a low
priority, resembling the optimistic nature of “Bitswap”. Moreover,
some services, such as the issues/ and pullreqs/ services
discussed in the software development hub example
above (and all messaging services in general), are
maintainer-oriented, i.e. they should be provided at no cost
if their maintainer wants to stay open and reachable by anyone.
Summing up the above, a service provider should add the incoming requests to the Request Queue, assigning a priority rank to each one in accordance with the Accumulated Statistics table; then communicate with the next client in order to agree on the service conditions (the “Workswap” protocol), adjusting its priority rank according to the results; and finally provide the service to the next client (possibly a different one) using all necessary sorts of communication.
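A Request Queue of this kind can be sketched with a binary heap. The sketch assumes that the Accumulated Statistics table has already been reduced to a single signed balance per peer (positive when the peer is in credit), and that unknown peers start at zero; the class and method names are hypothetical. The “Workswap” negotiation step is not modelled.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class RequestQueue:
    """Serve requests from peers with a better balance first."""

    def __init__(self, balances: Dict[str, int]) -> None:
        self._balances = balances  # hypothetical: peer ID -> signed credit
        self._heap: List[Tuple[int, int, str, str]] = []
        self._order = itertools.count()  # tie-breaker keeps arrival order

    def push(self, peer_id: str, request: str) -> None:
        # heapq is a min-heap, so a higher balance becomes a lower rank.
        rank = -self._balances.get(peer_id, 0)
        heapq.heappush(self._heap, (rank, next(self._order), peer_id, request))

    def pop(self) -> Optional[Tuple[str, str]]:
        if not self._heap:
            return None
        _, _, peer_id, request = heapq.heappop(self._heap)
        return peer_id, request


queue = RequestQueue({"alice": 500, "bob": -200})
queue.push("bob", "convert cat_0001.jpg")
queue.push("carol", "convert dog_0002.jpg")  # unknown peer, balance 0
queue.push("alice", "search cats")

served = []
while (item := queue.pop()) is not None:
    served.append(item[0])
print(served)
```

Peers in debt are not rejected; they simply wait behind the others, which corresponds to the low-priority service for peers that cannot yet pay.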
Conclusions
In the present paper the possible means of providing dynamic services on top of the IPFS infrastructure have been outlined and discussed. An important feature of the proposed system is that it tries to integrate with the IPFS traffic baseline, the “Bitswap” protocol, making it possible to trade dynamic content for static content (and vice versa). That integration is supposed to stimulate the exchange of both static and dynamic data and to increase the availability of peers.
Some known IPFS and general P2P problems that the proposed system is intended to solve:
- dynamic web-sites on IPFS;
- deferred delivery: preservation of undelivered messages while the recipient peer is offline;
- public and personal data-pinning services.
License
This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
Source
The source of this paper can be downloaded from IPFS: https://ipfs.io/ipfs/QmXrXbCea2y48rtcPMfbAkMrcTQTS13ZzB4jbfs2Xp1waR. In order to clone the repository type:
git clone https://ipfs.io/ipfs/QmXrXbCea2y48rtcPMfbAkMrcTQTS13ZzB4jbfs2Xp1waR/.git