Manifests¶
Manifests in Swarm¶
In general manifests declare a list of strings associated with swarm hashes. A manifest matches to exactly one hash, and it consists of a list of entries declaring the content which can be retrieved through that hash. Let us begin with an introductory example.
This is demonstrated by the following example. Let’s create a directory containing the two orange papers and an html index file listing the two pdf documents.
$ ls -1 orange-papers/
index.html
smash.pdf
sw^3.pdf
$ cat orange-papers/index.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
</head>
<body>
<ul>
<li>
<a href="./sw^3.pdf">Viktor Trón, Aron Fischer, Dániel Nagy A and Zsolt Felföldi, Nick Johnson: swap, swear and swindle: incentive system for swarm.</a> May 2016
</li>
<li>
<a href="./smash.pdf">Viktor Trón, Aron Fischer, Nick Johnson: smash-proof: auditable storage for swarm secured by masked audit secret hash.</a> May 2016
</li>
</ul>
</body>
</html>
We now use the swarm up
command to upload the directory to swarm to create a mini virtual site.
swarm --recursive --defaultpath orange-papers/index.html --bzzapi http://swarm-gateways.net/ up orange-papers/ 2> up.log
> 2477cc8584cc61091b5cc084cdcdb45bf3c6210c263b0143f030cf7d750e894d
The returned hash is the hash of the manifest for the uploaded content (the orange-papers directory):
We now can get the manifest itself directly (instead of the files they refer to) by using the bzz-raw protocol bzz-raw
:
wget -O - "http://localhost:8500/bzz-raw:/2477cc8584cc61091b5cc084cdcdb45bf3c6210c263b0143f030cf7d750e894d"
> {
"entries": [
{
"hash": "4b3a73e43ae5481960a5296a08aaae9cf466c9d5427e1eaa3b15f600373a048d",
"contentType": "text/html; charset=utf-8"
},
{
"hash": "4b3a73e43ae5481960a5296a08aaae9cf466c9d5427e1eaa3b15f600373a048d",
"contentType": "text/html; charset=utf-8",
"path": "index.html"
},
{
"hash": "69b0a42a93825ac0407a8b0f47ccdd7655c569e80e92f3e9c63c28645df3e039",
"contentType": "application/pdf",
"path": "smash.pdf"
},
{
"hash": "6a18222637cafb4ce692fa11df886a03e6d5e63432c53cbf7846970aa3e6fdf5",
"contentType": "application/pdf",
"path": "sw^3.pdf"
}
]
}
Manifests contain content_type information for the hashes they reference. In other contexts, where content_type is not supplied or, when you suspect the information is wrong, it is possible to specify the content_type manually in the search query. For example, the manifest itself should be text/plain:
http://localhost:8500/bzz-raw:/2477cc8584cc61091b5cc084cdcdb45bf3c6210c263b0143f030cf7d750e894d?content_type="text/plain"
Now you can also check that the manifest hash matches the content (in fact swarm does it for you):
$ wget -O- http://localhost:8500/bzz-raw:/2477cc8584cc61091b5cc084cdcdb45bf3c6210c263b0143f030cf7d750e894d?content_type="text/plain" > manifest.json
$ swarm hash manifest.json
> 2477cc8584cc61091b5cc084cdcdb45bf3c6210c263b0143f030cf7d750e894d
Path Matching¶
A useful feature of manifests is that we can match paths with URLs. In some sense this makes the manifest a routing table and so the manifest acts as if it was a host.
More concretely, continuing in our example, when we request:
GET http://localhost:8500/bzz:/2477cc8584cc61091b5cc084cdcdb45bf3c6210c263b0143f030cf7d750e894d/sw^3.pdf
Swarm first retrieves the document matching the manifest above. The url path sw^3
is then matched against the entries. In this case a perfect match is found and the document at 6a182226… is served as a pdf.
As you can see the manifest contains 4 entries, although our directory contained only 3. The extra entry is there because of the --defaultpath orange-papers/index.html
option to swarm up
, which associates the empty path with the file you give as its argument. This makes it possible to have a default page served when the url path is empty.
This feature essentially implements the most common webserver rewrite rules used to set the landing page of a site served when the url only contains the domain. So when you request
GET http://localhost:8500/bzz:/2477cc8584cc61091b5cc084cdcdb45bf3c6210c263b0143f030cf7d750e894d/
you get served the index page (with content type text/html
) at 4b3a73e43ae5481960a5296a08aaae9cf466c9d5427e1eaa3b15f600373a048d
.