Intermediate: Paths

Paths are an extension of anchors, introduced in RSM. They fill the same role but add much more flexibility and dimensionality, and allow you to create complex indexes for querying the DHT quickly.

You can think of paths as anchor trees: instead of creating a single anchor entry to hold all the links to a particular type of entry, we create several, which distributes those links much more evenly across the DHT. If you haven't done the anchors exercise, do it before starting this one.

The content of each path is a string with segments separated by a dot, for example: all_tasks.project1.finished. This path will create these entries:

all_tasks
all_tasks.project1
all_tasks.project1.finished

Here, you can see that the root of the path is all_tasks, which has all_tasks.project1 as a child, which in turn has all_tasks.project1.finished as a child. Each of these entries has a hash in the DHT like any other entry. Also, every parent has a link pointing to each of its children.
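To make the expansion concrete, here is a minimal sketch in plain Rust using only the standard library. It does not call the HDK, and the function names expand_path and parent_child_links are hypothetical helpers invented for this illustration. It lists the entries that a dotted path implies and the parent-to-child links between them.

```rust
/// Returns every entry implied by a dotted path, from the root down.
/// Hypothetical helper for illustration; not part of the HDK.
fn expand_path(path: &str) -> Vec<String> {
    let mut entries = Vec::new();
    let mut current = String::new();
    for segment in path.split('.') {
        if !current.is_empty() {
            current.push('.');
        }
        current.push_str(segment);
        entries.push(current.clone());
    }
    entries
}

/// Returns the (parent, child) pairs that would be linked for one path.
/// Hypothetical helper for illustration; not part of the HDK.
fn parent_child_links(path: &str) -> Vec<(String, String)> {
    let entries = expand_path(path);
    entries
        .windows(2)
        .map(|pair| (pair[0].clone(), pair[1].clone()))
        .collect()
}
```

Calling expand_path("all_tasks.project1.finished") yields the three entries listed above, and parent_child_links on the same path yields two links: all_tasks to all_tasks.project1, and all_tasks.project1 to all_tasks.project1.finished.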

There are two goals we have in mind when using paths:

Reducing DHT hotspots

If we create only one anchor entry and attach all the links to posts to it, the nodes holding that entry end up holding all those links as well, which can grow large in terms of storage. Creating multiple entries distributes the links much more evenly across the DHT.

Read performance

Usually we don't want to query "all the posts that have ever been created". Imagine that you want to get the posts from the last day. With only one anchor entry this can get really slow, because we need to do a get for every post to check whether it was created in the last day, and then return only the ones that were. If we are instead a bit smart in the way we create the paths, we can query just the appropriate anchors, which hold only the posts for that day.
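The difference in read cost can be sketched with an in-memory model in plain Rust. This is not HDK code: PostIndex, its methods and the posts.<day> path naming are assumptions made for this illustration, and a HashMap stands in for the DHT. Each post is linked both from a single anchor and from a per-day path, so the two query styles can be compared.

```rust
use std::collections::HashMap;

/// A post together with the day it was created, e.g. "2021-02-21".
#[derive(Clone, Debug, PartialEq)]
struct Post {
    content: String,
    day: String,
}

/// In-memory stand-in for the DHT: maps a path to the posts linked from it.
#[derive(Default)]
struct PostIndex {
    links: HashMap<String, Vec<Post>>,
}

impl PostIndex {
    /// Links the post from the single "posts" anchor and from its day path.
    fn add_post(&mut self, content: &str, day: &str) {
        let post = Post {
            content: content.to_string(),
            day: day.to_string(),
        };
        self.links
            .entry("posts".to_string())
            .or_default()
            .push(post.clone());
        self.links
            .entry(format!("posts.{}", day))
            .or_default()
            .push(post);
    }

    /// Single-anchor style: fetch every post, then filter by day.
    /// Returns the matches and how many posts had to be inspected.
    fn by_day_scanning(&self, day: &str) -> (Vec<Post>, usize) {
        let all = self.links.get("posts").cloned().unwrap_or_default();
        let inspected = all.len();
        let matches = all.into_iter().filter(|p| p.day == day).collect();
        (matches, inspected)
    }

    /// Path style: read only the links attached to that day's path.
    fn by_day_path(&self, day: &str) -> Vec<Post> {
        self.links
            .get(&format!("posts.{}", day))
            .cloned()
            .unwrap_or_default()
    }
}
```

In this model by_day_scanning inspects every post ever added, while by_day_path touches only the posts linked from the requested day, which is the saving the paragraph above describes.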

Try it!

Here you can create paths yourself, and see which entries and links are created.

These entries are useful mainly because you can attach links to them. If you attach links from all_tasks.project1.finished to all the finished tasks of project1, you can then do a get_links on that path to get only those tasks.

If, on the other hand, you want to get all tasks within the project regardless of status, you can get all the child paths of all_tasks.project1, which will give you, for example, all_tasks.project1.todo, all_tasks.project1.doing and all_tasks.project1.finished, and then do a get_links on each of them to get the tasks.
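Both kinds of query can be modelled with a small sketch in plain Rust. PathTree and its methods are hypothetical names made up for this illustration, and the get_links method here is an in-memory stand-in that only borrows the name of the HDK call; it is not the real function and does not share its signature.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// In-memory stand-in for paths stored in the DHT.
#[derive(Default)]
struct PathTree {
    /// Parent path -> child paths.
    children: BTreeMap<String, BTreeSet<String>>,
    /// Path -> targets linked from that path (task names in this sketch).
    links: BTreeMap<String, Vec<String>>,
}

impl PathTree {
    /// Ensures every ancestor of `path` is linked to its child.
    fn ensure(&mut self, path: &str) {
        let mut parent: Option<String> = None;
        let mut current = String::new();
        for segment in path.split('.') {
            if !current.is_empty() {
                current.push('.');
            }
            current.push_str(segment);
            if let Some(p) = parent.take() {
                self.children.entry(p).or_default().insert(current.clone());
            }
            parent = Some(current.clone());
        }
    }

    /// Attaches a link from `path` to `target`, creating the path if needed.
    fn attach(&mut self, path: &str, target: &str) {
        self.ensure(path);
        self.links
            .entry(path.to_string())
            .or_default()
            .push(target.to_string());
    }

    /// Targets linked directly from `path`.
    fn get_links(&self, path: &str) -> Vec<String> {
        self.links.get(path).cloned().unwrap_or_default()
    }

    /// Targets linked from every direct child of `path`.
    fn get_links_of_children(&self, path: &str) -> Vec<String> {
        let mut targets = Vec::new();
        if let Some(children) = self.children.get(path) {
            for child in children {
                targets.extend(self.get_links(child));
            }
        }
        targets
    }
}
```

After attaching tasks to the todo, doing and finished paths of project1, get_links on all_tasks.project1.finished returns only the finished tasks, while get_links_of_children on all_tasks.project1 returns the tasks of every status. Nothing is linked directly from all_tasks.project1 itself in this model.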

You can imagine different types of indexes built on top of paths, with multidimensional properties.

Keep in mind that paths are already part of the core hdk, so you don't need to import them from an external library. You do, however, need to add them as an entry definition in your zome, like this:

entry_defs![
    PathEntry::entry_def(),
    ...
];

Exercise

Problem statement

We need to code a small zome that provides these capabilities:

Create a new post, passing its content and some tags
Get all posts within a day or an hour, examples:
- "get me all posts posted on 21st February, 2021"
- "get me all posts posted between 21:00 and 22:00 of 21st February, 2021"
Get all the tags that have been created
Get all posts that have been created with a certain tag
- "get me all posts that have been posted with the tag "nature""
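One possible way to name the paths for these queries is sketched below in plain Rust. The path prefixes all_posts and all_tags, the date format and the three function names are all assumptions made for this illustration; the exercise may use a different layout, and the sketch does not create any entries or links.

```rust
/// Hypothetical path for all posts created on a given day,
/// e.g. "all_posts.2021-02-21".
fn day_path(year: u32, month: u32, day: u32) -> String {
    format!("all_posts.{:04}-{:02}-{:02}", year, month, day)
}

/// Hypothetical path for all posts created within one hour of that day,
/// e.g. "all_posts.2021-02-21.21". It is a child of the day path.
fn hour_path(year: u32, month: u32, day: u32, hour: u32) -> String {
    format!("{}.{:02}", day_path(year, month, day), hour)
}

/// Hypothetical path for all posts carrying a tag, e.g. "all_tags.nature".
/// Assumes the tag contains no dot, since dots separate path segments.
fn tag_path(tag: &str) -> String {
    format!("all_tags.{}", tag)
}
```

With this layout, the posts for one hour would be the links attached to the hour path, the posts for a whole day could be collected from the children of the day path, and the children of all_tags would be the tags created so far.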

You can follow this entry design to accomplish it:

Go to the developer-exercises.
Enter the nix-shell: nix-shell (run this in the folder containing the default.nix file)
Go to the folder with the exercise: intermediate/1.paths
Inside zome/exercise/src/lib.rs
- Implement all unimplemented!() functions
Compile and test your code: cd tests && npm test.
Don't stop until the tests run green.
