Intermediate: Paths
Paths are an extension of anchors that was introduced in RSM. They fill the same role but add much more flexibility and dimensionality, and they allow you to create complex indexes for querying the DHT quickly.
You can think of paths as anchor trees: instead of creating a single anchor entry to hold all the links to a particular type of entry, we create several, so that those links are distributed much more evenly across the DHT. If you haven't done the anchors exercise, do it before starting this one.
The content of each path is a string with segments separated by dots, for example: all_tasks.project1.finished. This path will create these entries:
all_tasks
all_tasks.project1
all_tasks.project1.finished
Here, you can see that the root parent of the path is all_tasks, which has all_tasks.project1 as a child. Each of these entries has a hash in the DHT like any other entry. Also, every parent has a link pointing to each of its children.
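The expansion described above can be sketched in plain Rust. This is only an illustration of how a dotted path implies its ancestor entries and parent-to-child links; `path_entries` and `parent_child_links` are hypothetical helper names, not HDK functions, and no DHT is involved.

```rust
/// Expand a dot-separated path into every entry it implies, root first.
/// Hypothetical helper for illustration, not part of the HDK.
fn path_entries(path: &str) -> Vec<String> {
    let mut entries = Vec::new();
    let mut current = String::new();
    for segment in path.split('.').filter(|s| !s.is_empty()) {
        if !current.is_empty() {
            current.push('.');
        }
        current.push_str(segment);
        // Each prefix of the path is an entry of its own.
        entries.push(current.clone());
    }
    entries
}

/// Pair each entry with its direct child: these are the parent -> child links.
fn parent_child_links(path: &str) -> Vec<(String, String)> {
    let entries = path_entries(path);
    entries
        .windows(2)
        .map(|pair| (pair[0].clone(), pair[1].clone()))
        .collect()
}
```

Calling `path_entries("all_tasks.project1.finished")` yields the three entries listed above, and `parent_child_links` yields one link per parent and child pair.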
There are two goals we have in mind when using paths:
- Reducing DHT hotspots
If we only create one anchor entry and attach all the links to posts to that entry, the nodes holding that entry end up holding all those links as well, which can get big in terms of storage. Creating multiple entries distributes the links around the DHT much more evenly.
- Read performance
Usually we don't want to query "all the posts that have ever been created". Imagine that you want to get the posts from the last day. If we only have one anchor entry, this can get really slow, because we need to do a get for every post to check whether it was made in the last day, and then return only the ones that were. Instead, if we are a bit smart in the way we create the paths, we can query just the anchors that hold the posts for that day.
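One way to be "a bit smart" about time-based paths is to encode the date and hour in the path segments. The sketch below only builds the path strings; the `all_posts` root, the `year-month-day.hour` layout, and the function names are assumptions for illustration, not something the exercise prescribes.

```rust
/// Path under which all posts of a given UTC day would be indexed.
/// The `all_posts` root and the date layout are assumptions.
fn day_path(year: u32, month: u32, day: u32) -> String {
    format!("all_posts.{:04}-{:02}-{:02}", year, month, day)
}

/// Path for a single hour of that day: a child of the day path.
fn hour_path(year: u32, month: u32, day: u32, hour: u32) -> String {
    format!("{}.{:02}", day_path(year, month, day), hour)
}
```

With this layout, the posts made between 21:00 and 22:00 on 21st February, 2021 hang off a single hour path, and the posts for the whole day can be collected from the children of the day path.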
Try it!
Here you can create paths yourself, and see which entries and links are created.
These entries are useful mainly because you can attach links to them. If you attach links from the all_tasks.project1.finished path to all the tasks in project1 that have finished, you can then do a get_links on that path to get only those.
If, on the other hand, you want to get all tasks within the project regardless of status, you can get all the child paths of all_tasks.project1, which will give you, for example, all_tasks.project1.todo, all_tasks.project1.doing and all_tasks.project1.finished, and then do a get_links on each of those to get the tasks.
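The two query patterns above can be mimicked with a toy in-memory index. This is a sketch under stated assumptions: `PathIndex` and its methods are hypothetical stand-ins that imitate the behaviour described here with ordinary maps, and they are not the HDK API.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy in-memory stand-in for the DHT: each path keeps its child paths
/// and the links attached to it. Hypothetical type, not part of the HDK.
#[derive(Default)]
struct PathIndex {
    children: BTreeMap<String, BTreeSet<String>>,
    links: BTreeMap<String, Vec<String>>,
}

impl PathIndex {
    /// Create every ancestor of `path` and link each parent to its child.
    fn ensure(&mut self, path: &str) {
        let mut parent: Option<String> = None;
        let mut current = String::new();
        for segment in path.split('.') {
            if !current.is_empty() {
                current.push('.');
            }
            current.push_str(segment);
            self.children.entry(current.clone()).or_default();
            if let Some(p) = parent.take() {
                self.children.entry(p).or_default().insert(current.clone());
            }
            parent = Some(current.clone());
        }
    }

    /// Attach a link from `path` to `target` (for example, a task hash).
    fn attach(&mut self, path: &str, target: &str) {
        self.ensure(path);
        self.links
            .entry(path.to_string())
            .or_default()
            .push(target.to_string());
    }

    /// Links attached directly to `path`.
    fn get_links(&self, path: &str) -> Vec<String> {
        self.links.get(path).cloned().unwrap_or_default()
    }

    /// Links attached to every direct child of `path`, in child-path order.
    fn get_links_of_children(&self, path: &str) -> Vec<String> {
        self.children
            .get(path)
            .into_iter()
            .flatten()
            .flat_map(|child| self.get_links(child))
            .collect()
    }
}
```

Attaching tasks to all_tasks.project1.todo and all_tasks.project1.finished and then calling `get_links_of_children("all_tasks.project1")` returns the tasks of every status, while `get_links("all_tasks.project1.finished")` returns only the finished ones.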
You can imagine different types of indexes built on top of paths, with multidimensional properties.
entry_defs![
PathEntry::entry_def(),
...
];
Exercise
Problem statement
We need to code a small zome that satisfies these capabilities:
- Create a new post, passing a content and some tags
- Get all posts within a day or an hour, examples:
- "get me all posts posted on 21st February, 2021"
- "get me all posts posted between 21:00 and 22:00 of 21st February, 2021"
- Get all the tags that have been created
- Get all posts that have been created with a certain tag
- "get me all posts that have been posted with the tag 'nature'"
You can follow this entry design to accomplish it:
- Go to the developer-exercises.
- Enter the nix-shell: nix-shell (you should run this in the folder containing the default.nix file)
- Go to the folder with the exercise: intermediate/1.paths
- Inside zome/exercise/src/lib.rs
  - Implement all unimplemented!() functions
- Compile and test your code: cd tests && npm test
- Don't stop until the test runs green