Introduction

IPFS provides a mature solution for sharing content-addressable data in a network of peers. Its distributed approach to preserving and exchanging data invites the idea of sharing more operations between peers, so as to build decentralized networks that use and modify shared data collaboratively. Such a network could be built independently of IPFS, and there are already a number of peer-to-peer applications that use IPFS purely as a file-sharing tool. However, it also seems promising to seek a closer integration between the distributed storage and the various methods of accessing and modifying the data it keeps. If peer-to-peer networks evolve in this direction, we may expect the storage function itself to become only one aspect, albeit the fundamental one, of what peers can do together with the data they share.

Among the various approaches to processing data in a cooperative and distributed manner, two general organising principles can be distinguished. The first is to distribute all functions evenly and randomly among the peers. Such networks have no authorities and therefore need some form of decentralized consensus about all the dynamic content they produce. This is the field of trustless cooperation. The alternative principle is to retain some form of authority and to cooperate by relying on explicit trust instead of randomization. While the trustless approach brings a new quality to peer-to-peer networks and to the Internet as a whole, trustful cooperation in a self-governing network can reproduce the best practices of the early Internet and raise them to a higher level.

The present paper describes the means to build an open network in which dynamic content is produced and shared as a result of trustful cooperation between peers. IPFS seems well suited as a basis for such a network, whose dynamic functions would, in turn, help to increase the amount of shared static content and to accelerate its exchange.

The trust model

Exchanging dynamic data between a pair of peers does not seem harder than transferring static data. However, when a peer asks for a file with a known hash, it can verify the replies of other peers and be sure that it has received neither garbage nor a fake. That is the key feature of CID addressing, which makes it transparent and convenient. With dynamic data we generally have no such option. If we cannot verify the remote computation itself (which is true most of the time), we have to either trust the peer or reject the reply. More precisely, we should engage in dynamic data transactions only with peers we trust.
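
The check that is available for static data, and missing for dynamic data, can be sketched as follows. This is a simplified illustration only: real CIDs wrap the digest in additional multihash and multibase framing, whereas the hypothetical content_id and verify_block helpers below compare bare SHA-256 hex digests.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return a simplified content identifier: the SHA-256 hex digest.

    Assumption: a real CID adds multihash/multibase framing around a digest.
    """
    return hashlib.sha256(data).hexdigest()


def verify_block(expected_id: str, data: bytes) -> bool:
    """Accept a reply only if it hashes to the identifier that was requested."""
    return content_id(data) == expected_id


original = b"cat picture bytes"
cid = content_id(original)
print(verify_block(cid, original))         # genuine reply is accepted
print(verify_block(cid, b"forged bytes"))  # fake or garbage reply is rejected
```

No comparable check exists for the result of a remote computation, because the requester does not know the hash of the answer in advance.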

With this in mind, we could start by sharing dynamic services with our friends on a peer-to-peer network. Offline communication is, apparently, the first source of trust that we all use in one way or another.

The second principle we can consider is to trust dynamic data when it is authored by the same source as the static data it relates to. For instance, if a person has taken a picture and told us its CID, we could ask him or her to crop and resize it for our needs, or to convert it to the format we want, and be almost certain that the result is not a fake. Why? Because there is little sense in faking the result that way, and because the original CID could just as well have been a fake in the first place. So, if someone provides an online picture-processing service for his or her own pictures, there is the same reason to trust that service as to trust the picture data itself.

On the other hand, trust may sometimes be critical for one user and irrelevant for another. For instance, the use of a trusted email or messaging server is normally more important for the recipient than for the sender.

These facts encourage the use of self-hosted services in a network of peers. However, if we consider offline communication as a basis for trust, we are led further to the idea of a network in which particular services of a peer are provided on behalf of one or more other peers that trust the provider. The only question is how to make the fact that one peer acts on behalf of another observable and verifiable for the rest of the network.

That’s the place where IPNS should come into play. It provides a way to publish authorized updates in the IPFS peer-to-peer network.

The simplest way to state a trust relationship on IPNS is for one peer to publish the identifier of the other peer somewhere in its personal IPNS area. However, in order to state that the trust is specific to a particular service, we need a way to reference that service along with the ID of the peer that provides it on our behalf. Moreover, it seems reasonable to reference services in relation to a specific context or dataset, which suggests their use for a particular purpose. Therefore, we need to define some kind of layout in order to publish all the necessary information unambiguously.

Example 1: A personal picture gallery

A network service is defined by a particular protocol that is implemented in software and sometimes in hardware. A protocol, in turn, is usually described elsewhere in human-readable and machine-readable forms and is assigned a unique name and a version number.

In IPFS and other libp2p-based applications a path-like form of a protocol identifier is commonly accepted. Using this form seems very suitable for our task because a path name of a service protocol a) can be naturally placed under a context as a sub-path and b) itself defines a sub-context for any service-related data.

Now, let us try to lay out the picture gallery service mentioned above, using protocol path names to delimit trust contexts:

/ipns/<PeerID>/ClipArt/
    dir/
        cat_0001.jpg
        dog_0002.jpg
        ...
    convert/
        imagick/v1.0.1/
            endpoints
    app/
        index.html
        js/
        css/
    search/
        filesearch/v1.3.0/
            endpoints

First of all, we see the ClipArt/ directory that states the name of the publication. The pictures themselves can be found under dir/. Next to it is placed the convert/ directory with imagick/v1.0.1/ subdirectory sequence containing the endpoints file. The path of that sequence matches the name-version identifier of a hypothetical “imagick” protocol that might be used to communicate with an image processing service (the name resembles the well-known “ImageMagick” suite). As for the endpoints file, it should contain the list of peers that are trusted by the clip-art publisher to handle the image processing requests.

What procedure could be used to access that service? The basic idea is the following. When client software is pointed to /ipns/<PeerID>/ClipArt/convert/, it should search for the endpoints file, trying the paths that correspond to the protocols it supports until a match is found. Then it should try to connect to one of the endpoint peers and use the protocol appropriately.
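
The lookup procedure can be sketched as follows. The IPNS_FILES dictionary is a hypothetical in-memory stand-in for content already resolved from IPNS, and the peer names and the find_endpoints helper are assumptions made for illustration; connecting to the selected peer is left out.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for resolved IPNS content: file path -> listed peer IDs.
IPNS_FILES: Dict[str, List[str]] = {
    "/ipns/PublisherPeer/ClipArt/convert/imagick/v1.0.1/endpoints": [
        "PeerA",
        "PeerB",
    ],
}


def find_endpoints(
    service_path: str, supported: List[str]
) -> Optional[Tuple[str, List[str]]]:
    """Try each protocol the client supports until an endpoints file is found."""
    base = service_path.rstrip("/")
    for protocol in supported:
        peers = IPNS_FILES.get(f"{base}/{protocol}/endpoints")
        if peers:
            return protocol, peers
    return None  # no supported protocol is published under this path


match = find_endpoints(
    "/ipns/PublisherPeer/ClipArt/convert/",
    ["imagick/v2.0.0", "imagick/v1.0.1"],  # client preference order
)
print(match)
```

The order of the supported list expresses the client's preference, so a newer protocol version is tried before an older one.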

However, where could compatible client software be obtained? Depending on how widespread the protocol is, a suitable client may or may not be known to the end user. For the sake of completeness, we assume here that a particular version of a client is shipped along with the gallery in the form of a JavaScript application that uses js-libp2p to communicate with the endpoint peers (the app/ directory in the tree).

At least one more thing is needed to make the gallery user really happy: the gallery author should give the user some way to search for pictures. The authorized search service is made possible by including another hypothetical protocol node in the ClipArt/ IPNS tree: search/filesearch/v1.3.0/.

Who should actually perform the search? In the simplest case, the corresponding endpoints file would list a single peer — the self peer ID of the gallery publisher (the same thing for the image conversion service). Therefore, using a self-endpoint setup one could easily run a self-hosted site with static content on IPFS and libp2p-accessible services authorized on IPNS.

However, a more advanced setup with more than one endpoint peer is also possible. The services, including the search service, could also be provided by multiple third-party peers on behalf of the gallery publisher. The publisher, for his or her part, has to list the trusted service providers as authorized endpoints in the corresponding endpoints file. In the case of the search service, the trust relationship between the publisher and the providers assumes that the latter watch /ipns/<PeerID>/ClipArt/dir/ and re-index the metadata on updates. It also assumes that the trust relationship is mutual: the providers, in turn, list on their IPNS some information that allows others to verify that the given peer agrees to act as an endpoint for another peer. A possible layout for this counter-authorization is the following:

/ipns/<ProviderPeerID>/supported/
    imagick/v1.0.1/
        for/
            <hash-of-PeerID_1>
            <hash-of-PeerID_2>
            ...
        under/
            conditions
    filesearch/v1.3.0/
        for/
            <hash-of-PeerID_1>
            <hash-of-PeerID_2>
            ...
        under/
            conditions

This time it seems better to list peer IDs directly as file names, which simplifies verification when there are many authorized peers. Using a hash of a peer ID instead of the plain peer ID seems reasonable as a protection for personal information.
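
A client could check both directions of trust roughly as follows. The choice of SHA-256 for hashing peer IDs and the peer_hash and is_mutually_authorized helpers are assumptions made for this sketch; the paper does not fix a hash function or an API.

```python
import hashlib
from typing import Set


def peer_hash(peer_id: str) -> str:
    """Hash a peer ID for use as a file name (SHA-256 is an assumption)."""
    return hashlib.sha256(peer_id.encode("utf-8")).hexdigest()


def is_mutually_authorized(
    publisher_id: str,
    provider_id: str,
    publisher_endpoints: Set[str],
    provider_for_dir: Set[str],
) -> bool:
    """Check both directions of trust for a single protocol.

    publisher_endpoints: peer IDs listed in the publisher's endpoints file.
    provider_for_dir: file names found under the provider's for/ directory.
    """
    listed_by_publisher = provider_id in publisher_endpoints
    confirmed_by_provider = peer_hash(publisher_id) in provider_for_dir
    return listed_by_publisher and confirmed_by_provider


# Hypothetical peers: the publisher lists the provider, and the provider
# confirms by publishing a file named after the hash of the publisher's ID.
print(is_mutually_authorized(
    "PublisherPeer", "ProviderPeer",
    publisher_endpoints={"ProviderPeer"},
    provider_for_dir={peer_hash("PublisherPeer")},
))
```

Because the hashed ID is a file name, the confirmation amounts to a single path lookup, however many peers the provider serves.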

The under/conditions sub-layout is a place to define some requirements under which the service is provided by that particular peer. A number of possible requirements are discussed in the “Service consumer design overview” section below.

The advantages of a multi-endpoint setup are obvious: only some providers of the same service have to be online at any given time, and load balancing becomes possible (although it requires additional procedures). Indeed, for the whole site to function properly, the parallel providers of the same service would have to synchronize their states somehow. Although the means of such synchronization lie outside the scope of the present paper, one thing can be said for sure: the larger the project is, and the more people and peers it brings together, the more advanced the technologies that become available. One possibility is to use a peer-to-peer message delivery network such as “gossipsub” from libp2p.

In the context of the present picture gallery example, a multi-endpoint setup would be the configuration of choice for a large set of pictures managed by a community. That seems natural, but how does it fit with the authority of the publisher? In the case of a community-managed dataset, no single peer is the author.

A prominent solution to this problem can be found in the world of free and open-source software. There we see a distinctive division of labour between the direct author of code and the maintainer, who is responsible for including that code in the project and for publishing current and release versions. It seems that the proposed trust model can be used effectively with the role of the maintainer in place of the author. However, from the theory as well as the practice of free software, we also know that the idea of self-governing public projects is virtually inseparable from the idea (and the practice) of forks. That shouldn’t stop us from building community-based peer-to-peer networks.

Example 2: Dissolving a git hub

Let us consider the next example: a set of services for a public software project. The main difference from the personal clip-art site discussed above is the presence of collaboration services:

/ipns/<PeerID>/CodeSample/
    branches/
        master
        ...
    app/
        index.html
        css/
        js/
    issues/
        bugreport/v1.2.1/
            endpoints
    pullreqs/
        changes/v0.9.0/
            endpoints
    news/
        newsfeed/v1.0.0/
            endpoints

Issue tracking and bug reporting are handled by a hypothetical “bugreport” protocol. The endpoint peers defined for it under issues/ are responsible for managing the reports, including registering new issues, making updates and searching. The purpose of the pullreqs/ endpoints is to manage pull requests (and, possibly, patches) in a similar way via a hypothetical “changes” protocol. Finally, the news/ endpoints handle a hypothetical “newsfeed” protocol for reading and posting news.

The listed service providers are authorized by the project maintainer <PeerID> to receive messages from the public community and to supply current and past information about the related topics.

Where should issue and pull request data be processed and stored? That depends on the agreement between the maintainer and the endpoint, and might be additionally restricted by the project policy. The following considerations, however, seem reasonable. Firstly, given the risk of spam, moderation of incoming messages should be arranged. That assumes the maintainer has privileged access to the endpoint service and is able to edit or delete any message. Secondly, the issues, pull requests and any other discussion material might be kept in the project repository together with the code. The latter approach isn’t widely used, yet it has a long history in software development and a number of implementations (including Bugs Everywhere, git-issue and deft)[1].

With the proposed trust model, placing the discussion material in the source tree has a number of additional advantages. First of all, the service endpoints need to keep the received messages only temporarily, until they are committed to the source tree by the maintainer. Furthermore, the ability to clone both the code and the discussions seems to be crucial for a decentralized platform.

A question may arise about the potential use of a publish/subscribe network to receive project activity updates. For instance, an author may propose changes by broadcasting the CID of a commit containing both the code modifications and the corresponding pull-request message. Other community members may join the discussion in the same way, by broadcasting additional commits on top of the existing ones. In its canonical form, however, such a procedure has an obvious drawback: peers miss the updates when they’re offline. That, in turn, makes it more difficult to merge frequent and small updates in order to reconstruct the discussion. The use of endpoint services to receive individual updates and to share the resulting discussion would give the process the coordination it needs. Moreover, publish/subscribe might still be useful as an endpoint communication method.

The above is true for many communication tasks other than topic discussion. One of the key features of the proposed approach with peer-authorized endpoints is that it solves the problem of delivering a message to a temporarily offline destination. Unlike an ordinary user peer, especially a desktop or mobile one, an endpoint peer is primarily a stable online peer. As a custom coordination center, or better still a custom coordination cluster, authorized endpoints would act as a network glue connecting peers through space and time.

The use of authorized stable peers to hold and cache the content should improve reliability of services. However, it can also reduce their scalability as compared to pure peer-to-peer techniques where the number of caching participants is potentially unlimited. For the cases where scalability is critical the proposed trust model can be extended with the idea of trust forwarding.

Trust forwarding

The example of a collaborative service layout above can be modified in the following way in order to increase application availability:

/ipns/<PeerID>/CodeSample/
    branches/
        master
        ...
    app/
        index.html
        css/
        js/
    issues/
        bugreport/v1.2.1/
            endpoints
            downlinks:
                /ipns/<PeerID1>/.../bugreport/v1.2.1/endpoints
                /ipns/<PeerID2>/.../bugreport/v1.2.1/endpoints
                /ipns/<PeerID2>/.../bugreport/v1.2.1/downlinks
            ...
    ...

The only difference here is the presence of the downlinks file. The file contains a list of other IPNS paths and can be thought of as a kind of symbolic link between IPNS trees.

In the simplest case of trust forwarding, the IPNS paths published by one party point directly to endpoint list files published by another party. In that case the publisher informs the users that any endpoint selected by the listed third party is a trusted endpoint for the published application. In other words, the linked third party is trusted by the application publisher to select service endpoints on his or her behalf.

However, the trust forwarding can potentially go one, two or any number of steps deeper by publishing links to other downlinks files!
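
Following such links could be sketched as below. The FILES dictionary is a hypothetical stand-in for resolved IPNS content, and the concrete path components, the depth limit and the trusted_endpoints helper are assumptions for illustration. Since downlinks may eventually point back to an earlier file, the sketch tracks visited paths and bounds the forwarding depth.

```python
from typing import Dict, List, Set

# Hypothetical resolved IPNS files: path -> lines of the file.
# An endpoints file lists peer IDs; a downlinks file lists other IPNS paths.
FILES: Dict[str, List[str]] = {
    "/ipns/Pub/app/bugreport/v1.2.1/endpoints": ["PeerA"],
    "/ipns/Pub/app/bugreport/v1.2.1/downlinks": [
        "/ipns/P1/x/bugreport/v1.2.1/endpoints",
        "/ipns/P2/y/bugreport/v1.2.1/downlinks",
    ],
    "/ipns/P1/x/bugreport/v1.2.1/endpoints": ["PeerB"],
    "/ipns/P2/y/bugreport/v1.2.1/downlinks": [
        "/ipns/P3/z/bugreport/v1.2.1/endpoints",
        "/ipns/Pub/app/bugreport/v1.2.1/downlinks",  # cycle back to the start
    ],
    "/ipns/P3/z/bugreport/v1.2.1/endpoints": ["PeerC"],
}


def trusted_endpoints(base: str, max_depth: int = 4) -> List[str]:
    """Collect direct endpoints plus those reachable through downlinks."""
    peers: List[str] = []
    visited: Set[str] = set()

    def walk(path: str, depth: int) -> None:
        if path in visited or depth > max_depth:
            return  # already seen, or forwarded too many steps
        visited.add(path)
        lines = FILES.get(path, [])
        if path.endswith("/endpoints"):
            for peer in lines:
                if peer not in peers:
                    peers.append(peer)
        elif path.endswith("/downlinks"):
            for target in lines:
                walk(target, depth + 1)

    walk(f"{base}/endpoints", 0)
    walk(f"{base}/downlinks", 0)
    return peers


print(trusted_endpoints("/ipns/Pub/app/bugreport/v1.2.1"))
```

A client that wants to limit how far it extends trust can lower max_depth; with a value of 0 only the publisher's own endpoints file is used.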

How should trust forwarding interact with counter-authorization? In general, the relationship between the application publisher and a service provider should be considered mutual, as there can be technical reasons for that. It therefore makes no difference how the relationship was established: directly or indirectly, via a third party. The hash of the publisher’s peer ID should thus be listed on the service provider’s IPNS in confirmation of their trust relationship (and it is the job of the third party to inform the endpoint about the application).

However, in some cases, where it is technically possible, the service provider might want to state the service as open to anybody by making the corresponding record on IPNS. For example: /ipns/<ProviderPeerID>/supported/imagick/v1.0.1/for/*.

Integration with the Bitswap protocol

The use of IPNS layouts discussed so far doesn’t affect normal IPFS communications in any way. The “Bitswap” data exchange protocol that is normally used in IPFS is based on a simple credit-like system: peers provide data blocks to each other while tracking the credit/debt balance. The protocol was designed for the exchange of content-addressable data and can’t be directly applied to the transfer of dynamic data. That isn’t really necessary, though, if we consider Bitswap as one particular way of exchanging data that is integrated with other ways. What might we want from it as an integrated component?

The data exchange rate in Bitswap is varied in order to keep the exchange balanced between peers, and normally depends on the ratio of bytes sent to bytes received. However, only the data transferred through Bitswap itself is taken into account.

“In some cases, nodes must work for their blocks. In the case that a node has nothing that its peers want (or nothing at all), it seeks the pieces its peers want, with lower priority than what the node wants itself.” (IPFS specification[2], sec. 3.4). However, a service provider peer does another kind of work: it provides dynamic content to the peers that ask for it. Should that work be taken into account too?

Without a means to adjust the Bitswap ratio with factors other than the static data exchange balance, a service provider peer that provides mostly dynamic content to its peers may be unfairly limited in downloading content from IPFS. That would be especially surprising when the IPFS content in question is the expected input data for a given service! In order to avoid that drawback, we need a way to prioritize sending static blocks to the service provider from the peers it serves, that is, a way to trade dynamic content for static content.

For instance, a photo processing service would probably want to download a lot of image content from IPFS on behalf of its clients. For the service to complete as soon as possible, the clients should send the necessary image blocks to the service provider, bypassing the normal Bitswap ratio.

The prioritized traffic from service consumers to the service provider could also be thought of as a form of reward for the service. In that case, the peers should negotiate the amount of the reward before the service is actually provided and update their credit/debt balance accordingly. Most likely we need a special protocol for such agreements (a name to consider is “Workswap”).

It seems natural to use a special Bitswap strategy to control the extra flow of static blocks to service providers. The use of different strategies is discussed in the IPFS specification (sec. 3.4.2) and is already implemented in go-bitswap in the form of the WithScoreLedger option[3].
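
As an illustration of what such a strategy might compute, the sketch below starts from the debt ratio and the sigmoid send probability described in sec. 3.4.2 of the IPFS specification and adds a hypothetical work_credit term for dynamic services. That term and the function names are assumptions made for this example; they are not part of Bitswap or of go-bitswap.

```python
import math


def debt_ratio(bytes_sent: int, bytes_recv: int, work_credit: int = 0) -> float:
    """Debt ratio of a peer as seen by the node deciding whether to send.

    bytes_sent: static bytes this node has sent to the peer.
    bytes_recv: static bytes this node has received from the peer.
    work_credit: hypothetical byte-equivalent value of the dynamic services
        the peer has provided to this node (an assumption of this sketch).
    """
    return bytes_sent / (bytes_recv + work_credit + 1)


def send_probability(ratio: float) -> float:
    """Probability of sending a block, falling quickly as the debt grows."""
    return 1 - 1 / (1 + math.exp(6 - 3 * ratio))


# A provider of dynamic services that has uploaded few static blocks.
without_credit = send_probability(debt_ratio(4000, 999))
with_credit = send_probability(debt_ratio(4000, 999, work_credit=3000))
print(f"{without_credit:.4f} {with_credit:.4f}")
```

With the credit counted, the provider's debt ratio drops from 4 to 1 and the probability that a client sends it the next block rises from well under 1% to about 95%.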

Another way to settle with a service provider could be to provide another service in return. That is the way to trade dynamic content for dynamic content.

Service consumer design overview

Let us outline and discuss a possible design of client software for accessing the proposed network services.

The main component of such software seems to be a Router: the component that can be asked to find a provider of the desired service. What information should the Router use for that?

First, a search query pointing to the service descriptor on IPNS, for instance, /ipns/<PeerID>/ClipArt/convert/. That part of the query is determined by the application and the peer who maintains it. Then, a protocol identifier, for instance, imagick/v1.0.1, imagick/v1 or just imagick. Less specific protocol IDs widen the search to include more providers.
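
One possible reading of this matching rule is sketched below: a query that omits trailing version components matches every published protocol ID that agrees on the components given. The matches helper and the component-wise interpretation are assumptions of this example.

```python
def matches(query: str, protocol_id: str) -> bool:
    """True if a published protocol ID satisfies a possibly partial query.

    "imagick" matches any version, "imagick/v1" matches any 1.x.y version,
    and "imagick/v1.0.1" matches only that exact version.
    """
    q_name, _, q_version = query.partition("/")
    p_name, _, p_version = protocol_id.partition("/")
    if q_name != p_name:
        return False
    if not q_version:
        return True  # name-only query: any version is acceptable
    q_parts = q_version.lstrip("v").split(".")
    p_parts = p_version.lstrip("v").split(".")
    # Compare whole version components, so that "v1" does not match "v10".
    return p_parts[: len(q_parts)] == q_parts


published = ["imagick/v1.0.1", "imagick/v1.2.0", "imagick/v10.0.0"]
for query in ("imagick", "imagick/v1", "imagick/v1.0.1"):
    print(query, [p for p in published if matches(query, p)])
```

Comparing components rather than raw string prefixes avoids accidental matches between unrelated major versions.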

Given a query, the Router should return the best provider that matches it, or a list of providers sorted from best to worst. The matches can be found by traversing the endpoints and downlinks files, if any. However, if the tree to traverse is relatively large, that would be slow. What other information might help to make the search faster? And what should “best” and “worst” mean with regard to providers? The answer to these questions is probably the Accumulated Statistics table that the Router should maintain.

The Accumulated Statistics table would be a local datastore with records on particular service provider peers, updated after each communication with them, successful or not. After a series of communication sessions there would be a basis from which to predict future communications.
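
A minimal in-memory sketch of such a table is given below. The PeerStats fields, the AccumulatedStatistics class and its methods are hypothetical names chosen for this example; a real client would persist the records and would likely track more than success counts and latency.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PeerStats:
    """Hypothetical per-provider record in the Accumulated Statistics table."""

    successes: int = 0
    failures: int = 0
    total_seconds: float = 0.0  # time spent in successful sessions

    @property
    def availability(self) -> float:
        attempts = self.successes + self.failures
        return self.successes / attempts if attempts else 0.0

    @property
    def mean_latency(self) -> float:
        if not self.successes:
            return float("inf")  # never reached successfully
        return self.total_seconds / self.successes


class AccumulatedStatistics:
    def __init__(self) -> None:
        self._table: Dict[str, PeerStats] = {}

    def record(self, peer_id: str, ok: bool, seconds: float = 0.0) -> None:
        """Update a peer's record after a session, successful or not."""
        stats = self._table.setdefault(peer_id, PeerStats())
        if ok:
            stats.successes += 1
            stats.total_seconds += seconds
        else:
            stats.failures += 1

    def get(self, peer_id: str) -> PeerStats:
        return self._table.get(peer_id, PeerStats())


table = AccumulatedStatistics()
table.record("PeerA", ok=True, seconds=0.2)
table.record("PeerA", ok=True, seconds=0.4)
table.record("PeerA", ok=False)
print(table.get("PeerA").availability, table.get("PeerA").mean_latency)
```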

The peers in the table could be ranked by their speed, availability, the amount and ratio of data exchanged, etc. The exchange ratio here is very important, as it can be used for integration with the “Bitswap” protocol that is normally used in IPFS for the exchange of static content (see the previous section).

If a provider has put a price on a service in the form of some amount of prioritized static traffic, it could be selected as the “best” provider either when its price is low in comparison with other providers, or when the actual price would be low for the consumer given the provider’s existing debt in terms of the Bitswap sent/received ratio. That is why prices should also be recorded in the Accumulated Statistics, turning it into a consolidated service price list.
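
The selection rule can be sketched as follows, under the simplifying assumption that both the price and the provider's existing debt are expressed in bytes of static traffic and offset each other one to one. The Offer fields and both helpers are hypothetical.

```python
from typing import List, NamedTuple


class Offer(NamedTuple):
    peer_id: str
    price_bytes: int  # prioritized static traffic asked for the service
    debt_bytes: int   # static bytes the provider already owes the consumer


def effective_price(offer: Offer) -> int:
    """Price left to pay once the provider's existing debt is offset."""
    return max(0, offer.price_bytes - offer.debt_bytes)


def rank_offers(offers: List[Offer]) -> List[Offer]:
    """Sort offers from best (cheapest for this consumer) to worst."""
    return sorted(offers, key=lambda o: (effective_price(o), o.price_bytes))


offers = [
    Offer("PeerA", price_bytes=1000, debt_bytes=0),
    Offer("PeerB", price_bytes=5000, debt_bytes=4800),
    Offer("PeerC", price_bytes=800, debt_bytes=0),
]
print([o.peer_id for o in rank_offers(offers)])
```

Here PeerB asks the highest nominal price but is ranked first, because most of that price is covered by what it already owes the consumer.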

The same principle should be used in the case of service exchange, i.e. when the price of one service is expressed in units of another service and the consumer, in turn, is able to act as a provider of that service.

The natural place to define prices is the under/conditions sub-layout introduced in the discussion of the personal picture gallery example above. However, it seems fair to assume that a particular provider might customize the actual price of the service for a particular consumer. The actual price is therefore subject to negotiation. This need to negotiate before the service is provided is the first reason to consider a specific protocol, whose name could be “Workswap”.

What about the use of some public statistics about the providers? That seems possible too, but again depends on a source of trust. One solution to that problem might be to wrap the public provider rating itself into the form of a special service conforming to the proposed model of trust. The user should then be able to rely on one or another public rating maintained by one or another peer.

After the service provider peer is determined (and verified) the client should try to reach it over the desired protocol and update the Accumulated Statistics afterwards. If the provider needs some data from IPFS in order to process the client’s request, the client should provide needed blocks bypassing the normal Bitswap strategy — to the benefit of both peers.

Service provider design overview

Now, let us give an overview of the other side: service provider software. At first glance it seems quite simple: just listen for incoming connections and answer over application-specific protocols. However, just as finding the best provider is the first thing for a client to do, a server has to decide which client to serve and when. The need to serve peers in some order leads us to the idea of a Request Queue as the central component of the service provider.

As the counterpart of a client, the server should also maintain the credit/debt balance of its peers. And as both static and dynamic traffic could be traded, both kinds of traffic should be accounted for. The server, therefore, should also maintain its own version of an Accumulated Statistics table.

The peers with good balance should probably be served first. However, what about those peers who can’t yet pay the price? Should they be served at all?

It seems wise to serve them, though with a low priority, in keeping with the optimistic nature of “Bitswap”. Moreover, some services, such as the issues/ and pullreqs/ services discussed in the software development hub example above, and messaging services in general, are maintainer-oriented, i.e. they should be provided at no cost if their maintainer wants to stay open and reachable by anyone.

Summing up, a service provider should add incoming requests to the Request Queue, assigning a priority rank to each one in accordance with the Accumulated Statistics table. It should then communicate with the next client in order to agree on the service conditions (the “Workswap” protocol), adjusting that client’s priority rank according to the result. Finally, it should provide the service to the next client in the queue (possibly a different one), using all the necessary kinds of communication.
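
The queueing part of this procedure can be sketched with a priority heap. The RequestQueue class, the integer balance (positive for credit, negative for debt) and the default balance given to unknown peers are assumptions of this example; the “Workswap” negotiation and the re-ranking that follows it are left out.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class RequestQueue:
    """Serve requests from peers with the best balance first."""

    def __init__(self, balances: Dict[str, int], default_balance: int = 0) -> None:
        # balances: hypothetical per-peer credit (positive) or debt (negative),
        # as it would be read from the Accumulated Statistics table.
        self._balances = balances
        self._default = default_balance  # rank given to peers not seen before
        self._heap: List[Tuple[int, int, str, str]] = []
        self._counter = itertools.count()  # tie-breaker keeps arrival order

    def add(self, peer_id: str, request: str) -> None:
        balance = self._balances.get(peer_id, self._default)
        # heapq is a min-heap, so the balance is negated to pop the best first.
        entry = (-balance, next(self._counter), peer_id, request)
        heapq.heappush(self._heap, entry)

    def pop(self) -> Optional[Tuple[str, str]]:
        """Return the next (peer_id, request) to serve, or None if empty."""
        if not self._heap:
            return None
        _, _, peer_id, request = heapq.heappop(self._heap)
        return peer_id, request


queue = RequestQueue({"rich": 500, "poor": -100})
queue.add("poor", "resize cat_0001.jpg")
queue.add("newcomer", "convert dog_0002.jpg")
queue.add("rich", "crop cat_0001.jpg")
print(queue.pop())
```

Peers in debt are still served, only later, which mirrors the low-priority service suggested above for peers that cannot yet pay.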

Conclusions

In the present paper, possible means of providing dynamic services on top of the IPFS infrastructure have been outlined and discussed. An important feature of the proposed system is that it tries to integrate with the IPFS traffic baseline, the “Bitswap” protocol, making it possible to trade dynamic content for static content (and vice versa). That integration is intended to stimulate the exchange of both static and dynamic data and to increase the availability of peers.

Some known IPFS and general P2P problems that the proposed system is intended to solve:

  • dynamic web-sites on IPFS;

  • deferred delivery — preservation of undelivered messages while the recipient peer is offline;

  • public and personal data-pinning services.

License

This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

Source

The source of this paper can be downloaded from IPFS: https://ipfs.io/ipfs/QmXrXbCea2y48rtcPMfbAkMrcTQTS13ZzB4jbfs2Xp1waR. In order to clone the repository type:

git clone https://ipfs.io/ipfs/QmXrXbCea2y48rtcPMfbAkMrcTQTS13ZzB4jbfs2Xp1waR/.git


1. Thanks to git-issue maintainers for providing a comprehensive list of the related projects!
2. https://ipfs.io/ipfs/QmV9tSDx9UiPeWExXEeH6aoDvmihvx6jD5eLb4jbTaKGps
3. https://github.com/ipfs/go-bitswap/pull/430