# Caching slow storages

To improve access times for slow, remote containers, such as those making up the Pandora forest, Wildland provides cache storages. A cache storage is a local storage that Wildland manages automatically and uses to mirror a container's contents for faster access. The contents of a cache storage are kept in sync with the container's original storage automatically, so any changes made to the local cache are mirrored to the remote end, and vice versa.

# Single container example

Let's create a Dropbox-based container. For that, we need a storage template (here named mydropbox):

$ wl template create dropbox mydropbox --app-key <Dropbox app key>

Now let's create the container (cache-test):

$ wl container create --path /mydropbox --template mydropbox cache-test

Created: /home/user/.config/wildland/containers/cache-test.container.yaml
Created base path: /6dd6bc9a-c32a-4d63-a3e7-c12497b75011
Adding storage 873eb097-ec52-4ede-83e6-7381b095b2e7 to container.
Saved container /home/user/.config/wildland/containers/cache-test.container.yaml

$ wl container info cache-test

Sensitive fields are hidden.
/home/user/.config/wildland/containers/cache-test.container.yaml
object: container
owner: '0xfa24b153948426f9451f9ebd0103b508b484f7c9042864c0b8f333d974b261b6'
paths:
- /.uuid/6dd6bc9a-c32a-4d63-a3e7-c12497b75011
- /mydropbox
backends:
  storage:
  - type: dropbox
    backend-id: 873eb097-ec52-4ede-83e6-7381b095b2e7
title: null
categories: []
version: '1'

As expected, this container has a single storage, of type dropbox.

Let's mount the container and add some files to it:

$ wl start --skip-forest-mount
Starting Wildland at: /home/user/wildland

$ wl c mount cache-test
Loading containers (from 'cache-test'): 1
Checking container references (from 'cache-test'): 1
Preparing mount of container references (from 'cache-test'): 1
Mounting one storage

$ echo "test 1" > ~/wildland/mydropbox/test1.txt

$ dd if=/dev/urandom of=/tmp/rnd bs=4K count=1024
1024+0 records in
1024+0 records out
4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.15051 s, 27.9 MB/s

$ time cp /tmp/rnd ~/wildland/mydropbox/test2.rnd
real    0m4.305s
user    0m0.001s
sys     0m0.011s

$ tree ~/wildland/mydropbox/
/home/user/wildland/mydropbox/
├── test1.txt
└── test2.rnd

0 directories, 2 files

$ time find ~/wildland/mydropbox/test* -type f -print0 | xargs -0 sha256sum
3cd203ac11340842055a6de561c9d69ca4493e912bd4c3c440c80711e16d5aee  /home/user/wildland/mydropbox/test1.txt
e6811bb1913eec21d89ecbb0c2fa72df7671995408e6a8b6291a2595ba48c157  /home/user/wildland/mydropbox/test2.rnd

real    0m3.017s
user    0m0.079s
sys     0m0.004s

$ wl c unmount cache-test
Loading containers (from 'cache-test'): 0
Unmounting 1 containers

In practice, a container may hold a lot of data, and we may not want to hit the network every time we access a file in it. So, let's add a cache to this container. First, we need to create a default storage template that will be used for container caches. This should usually be a fast local storage:

$ mkdir -p ~/storage/cache
$ wl template create local --location ~/storage/cache cache-tpl
Storage template [cache-tpl] created in /home/user/.config/wildland/templates/cache-tpl.template.jinja

The ~/storage/cache directory will be the root of all subsequently created local cache storages. Our cache template is named cache-tpl. Now we set the default cache template for Wildland:

$ wl set-default-cache cache-tpl
Set template cache-tpl as default for container cache storages

The steps above only need to be performed once; after that we are ready to create and use caches. To make our cache-test container use a cache, we simply add the -c or --with-cache option to the mount command:

$ wl c mount -c cache-test

Loading containers (from 'cache-test'): 1
Checking container references (from 'cache-test'): 1
Preparing mount of container references (from 'cache-test'): 1
Mounting one storage

$ wl status
Mounted containers:

/.users/0xfa24b153948426f9451f9ebd0103b508b484f7c9042864c0b8f333d974b261b6:/.backends/6dd6bc9a-c32a-4d63-a3e7-c12497b75011/097fc435-1f34-4b9c-b058-487c18765472
  storage: local
  paths:
    /mydropbox
/.users/0xfa24b153948426f9451f9ebd0103b508b484f7c9042864c0b8f333d974b261b6:/.backends/6dd6bc9a-c32a-4d63-a3e7-c12497b75011/873eb097-ec52-4ede-83e6-7381b095b2e7
  storage: dropbox

Sync jobs:
:/mydropbox: SYNCED 'dropbox'(backend_id=873eb097-ec52-4ede-83e6-7381b095b2e7) <-> 'local'(backend_id=097fc435-1f34-4b9c-b058-487c18765472)

Now we can see something interesting. The mounted container shows two storages: the first one is local, and it is our new cache storage. The Dropbox storage is now the second storage. We also see that an automatic job was created to keep these two storages in sync. The status of this sync job may initially be displayed as ONE_SHOT, which indicates that the initial sync from Dropbox to the cache is in progress. The wl container info command now also shows that the container has a cache storage associated with it:

$ wl c info cache-test
Sensitive fields are hidden.
/home/user/.config/wildland/containers/cache-test.container.yaml
object: container
owner: '0xfa24b153948426f9451f9ebd0103b508b484f7c9042864c0b8f333d974b261b6'
paths:
- /.uuid/6dd6bc9a-c32a-4d63-a3e7-c12497b75011
- /mydropbox
backends:
  storage:
  - type: dropbox
    backend-id: 873eb097-ec52-4ede-83e6-7381b095b2e7
title: null
categories: []
version: '1'
cache:
  type: local
  backend_id: 097fc435-1f34-4b9c-b058-487c18765472
  location: /home/user/storage/cache/6dd6bc9a-c32a-4d63-a3e7-c12497b75011

Let's make sure our files are intact and compare the performance:

$ tree ~/wildland/mydropbox/
/home/user/wildland/mydropbox/
├── test1.txt
└── test2.rnd

0 directories, 2 files

$ time find ~/wildland/mydropbox/test* -type f -print0 | xargs -0 sha256sum
3cd203ac11340842055a6de561c9d69ca4493e912bd4c3c440c80711e16d5aee  /home/user/wildland/mydropbox/test1.txt
e6811bb1913eec21d89ecbb0c2fa72df7671995408e6a8b6291a2595ba48c157  /home/user/wildland/mydropbox/test2.rnd

real    0m0.108s
user    0m0.033s
sys     0m0.010s

The checksums match and the operation is much faster since we're now operating on local files. Let's make some modifications to our files now:

$ echo "modified" > ~/wildland/mydropbox/test1.txt
$ time cp /tmp/rnd ~/wildland/mydropbox/test3.rnd

real    0m0.030s
user    0m0.003s
sys     0m0.003s

$ cat ~/storage/cache/6dd6bc9a-c32a-4d63-a3e7-c12497b75011/test1.txt
modified

The last command demonstrates that the physical location of our cache storage is in ~/storage/cache/6dd6bc9a-c32a-4d63-a3e7-c12497b75011. ~/storage/cache is our cache root, as set up at the beginning of this tutorial, and the UUID is the container UUID.
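The layout described above can be written down as a small helper. This is only a sketch of the convention observed in this tutorial (cache root followed by the container UUID); the function name `cache_directory` is made up for illustration and is not part of Wildland.

```python
from pathlib import PurePosixPath
from uuid import UUID


def cache_directory(cache_root: str, container_uuid: str) -> PurePosixPath:
    """Return the directory expected to hold a container's cached files.

    Assumption: the layout is <cache root>/<container UUID>, as seen in
    this tutorial. This helper is hypothetical, not a Wildland API.
    """
    # Reject malformed UUIDs early so a typo does not point somewhere else.
    UUID(container_uuid)
    return PurePosixPath(cache_root) / container_uuid


print(cache_directory("/home/user/storage/cache",
                      "6dd6bc9a-c32a-4d63-a3e7-c12497b75011"))
```

The container UUID is the one shown in the /.uuid/ path of the wl container info output.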

After creating the cache with the wl c mount -c command we no longer need to add the -c option: the cache will be used by default whenever the container is mounted:

$ wl c unmount cache-test
Loading containers (from 'cache-test'): 0
Unmounting 1 containers

$ wl c mount cache-test
Loading containers (from 'cache-test'): 1
Checking container references (from 'cache-test'): 1
Preparing mount of container references (from 'cache-test'): 1
Mounting one storage

$ wl status
Mounted containers:

/.users/0xfa24b153948426f9451f9ebd0103b508b484f7c9042864c0b8f333d974b261b6:/.backends/6dd6bc9a-c32a-4d63-a3e7-c12497b75011/097fc435-1f34-4b9c-b058-487c18765472
  storage: local
  paths:
    /mydropbox
/.users/0xfa24b153948426f9451f9ebd0103b508b484f7c9042864c0b8f333d974b261b6:/.backends/6dd6bc9a-c32a-4d63-a3e7-c12497b75011/873eb097-ec52-4ede-83e6-7381b095b2e7
  storage: dropbox

Sync jobs:
:/mydropbox: SYNCED 'dropbox'(backend_id=873eb097-ec52-4ede-83e6-7381b095b2e7) <-> 'local'(backend_id=097fc435-1f34-4b9c-b058-487c18765472)

Now let's demonstrate how to disable the cache and double-check that the container contents are preserved. Operations like adding or removing a cache should be done on an unmounted container.

$ wl c unmount cache-test
Loading containers (from 'cache-test'): 0
Unmounting 1 containers

$ wl c delete-cache cache-test
Deleting cache: /home/user/.config/wildland/cache/0xfa24b153948426f9451f9ebd0103b508b484f7c9042864c0b8f333d974b261b6.6dd6bc9a-c32a-4d63-a3e7-c12497b75011.storage.yaml

$ wl c info cache-test
Sensitive fields are hidden.
/home/user/.config/wildland/containers/cache-test.container.yaml
object: container
owner: '0xfa24b153948426f9451f9ebd0103b508b484f7c9042864c0b8f333d974b261b6'
paths:
- /.uuid/6dd6bc9a-c32a-4d63-a3e7-c12497b75011
- /mydropbox
backends:
  storage:
  - type: dropbox
    backend-id: 873eb097-ec52-4ede-83e6-7381b095b2e7
title: null
categories: []
version: '1'

Cache manifests are kept in the cache subdirectory of the Wildland config directory (~/.config/wildland by default). Their names are built from the container owner ID and the container UUID, joined with a dot. After removing the cache we see that our container is back to having only one storage, the Dropbox one. Note: to prevent accidental data loss, the delete-cache command only removes the cache manifest; it does not delete any files from the cache directory.
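As a rough illustration of that naming scheme, the sketch below rebuilds the manifest path printed by delete-cache. The `.storage.yaml` suffix and the dot separator are inferred from that output, and `cache_manifest_path` is a hypothetical helper, not something Wildland provides.

```python
from pathlib import PurePosixPath


def cache_manifest_path(config_dir: str, owner_id: str,
                        container_uuid: str) -> PurePosixPath:
    """Build the expected location of a container's cache manifest.

    Assumption: the file is named <owner ID>.<container UUID>.storage.yaml
    and lives in <config dir>/cache, as the delete-cache output suggests.
    """
    file_name = f"{owner_id}.{container_uuid}.storage.yaml"
    return PurePosixPath(config_dir) / "cache" / file_name


print(cache_manifest_path(
    "/home/user/.config/wildland",
    "0xfa24b153948426f9451f9ebd0103b508b484f7c9042864c0b8f333d974b261b6",
    "6dd6bc9a-c32a-4d63-a3e7-c12497b75011",
))
```

Checking whether this file exists could be one way to tell if a container currently has a cache configured, assuming the naming holds for your Wildland version.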

$ wl c mount cache-test
Loading containers (from 'cache-test'): 1
Checking container references (from 'cache-test'): 1
Preparing mount of container references (from 'cache-test'): 1
Mounting one storage

$ wl status
Mounted containers:

/.users/0xfa24b153948426f9451f9ebd0103b508b484f7c9042864c0b8f333d974b261b6:/.backends/6dd6bc9a-c32a-4d63-a3e7-c12497b75011/873eb097-ec52-4ede-83e6-7381b095b2e7
  storage: dropbox
  paths:
    /mydropbox

No sync jobs running

We can see that the cache is not being used now, and only the Dropbox storage is mounted. Let's verify that everything is intact:

$ tree ~/wildland/mydropbox/
/home/user/wildland/mydropbox/
├── test1.txt
├── test2.rnd
└── test3.rnd

0 directories, 3 files

$ cat ~/wildland/mydropbox/test1.txt
modified

$ time find ~/wildland/mydropbox/test* -type f -print0 | xargs -0 sha256sum
4487e24377581c1a43c957c7700c8b49920de7b8500c05590cee74996ef73f42  /home/user/wildland/mydropbox/test1.txt
e6811bb1913eec21d89ecbb0c2fa72df7671995408e6a8b6291a2595ba48c157  /home/user/wildland/mydropbox/test2.rnd
e6811bb1913eec21d89ecbb0c2fa72df7671995408e6a8b6291a2595ba48c157  /home/user/wildland/mydropbox/test3.rnd

real    0m6.173s
user    0m0.098s
sys     0m0.030s

Everything is fine and we're back to slow file operations 😉
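To put the timings from this section side by side, the sketch below converts the "real" values reported by time into seconds and computes the ratio between the uncached and cached runs. The figures are the single measurements shown above, so treat the ratios as rough; the cached copy presumably measures only the local write, with the upload to Dropbox handled afterwards by the sync job.

```python
import re

# Matches values printed by `time`, such as "0m3.017s" or "1m52.631s".
_ELAPSED = re.compile(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s")


def seconds(elapsed: str) -> float:
    """Convert a `time` value like '1m52.631s' into seconds."""
    match = _ELAPSED.fullmatch(elapsed.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {elapsed!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


# (uncached, cached) "real" times copied from the transcripts above.
TIMINGS = {
    "checksum both files": ("0m3.017s", "0m0.108s"),
    "copy 4 MiB file": ("0m4.305s", "0m0.030s"),
}

for label, (uncached, cached) in TIMINGS.items():
    ratio = seconds(uncached) / seconds(cached)
    print(f"{label}: about {ratio:.1f}x faster with the cache")
```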

# Forest example

Now let's see how we can improve the performance of accessing a large Wildland forest, e.g. the Pandora forest. We're assuming here that the Pandora forest was correctly imported as described in the Public Forests howto. As a reminder, here is the import command:

$ wl user import --path /mydirs/ariadne https://ariadne.wildland.io

We mount all containers in the Pandora forest to record baseline performance:

$ time wl c mount ':/mydirs/ariadne:/forests/pandora:*:'
Warning: cannot load bridge to [/forests/codepoets]
Warning: *: cannot load subcontainer .uuid/fe42cec7-3044-45c8-81a9-2298cd31b393.yaml: Cannot decrypt manifest.
Warning: cannot load bridge to [/forests/besidethepark]
Loading containers (from ':/mydirs/ariadne:/forests/pandora:*:'): 124
Checking container references (from ':/mydirs/ariadne:/forests/pandora:*:'): 124
Preparing mount of container references (from ':/mydirs/ariadne:/forests/pandora:*:'): 124
Mounting storages for containers: 124

real    1m52.631s
user    0m8.521s
sys     0m0.362s

$ time tree -F -L 2 ~/wildland/mydirs/ariadne:/forests/pandora:/
/home/user/wildland/mydirs/ariadne:/forests/pandora:/
├── README/
│   └── Copyright Notice and Terms of Use/
├── agent/
│   ├── @persons/
│   ├── @timeline/
│   └── agent-client-protocol/
├── arch/
│   ├── @clients/
...
    ├── Pandora Docs/
    ├── UX thoughts/
    └── Users onboarding - marketplace/

269 directories, 10 files

real    0m5.409s
user    0m0.015s
sys     0m0.016s

$ wl c unmount ':/mydirs/ariadne:/forests/pandora:*:'
Warning: cannot load bridge to [/forests/codepoets]
Warning: *: cannot load subcontainer .uuid/fe42cec7-3044-45c8-81a9-2298cd31b393.yaml: Cannot decrypt manifest.
Warning: cannot load bridge to [/forests/besidethepark]
Loading containers (from ':/mydirs/ariadne:/forests/pandora:*:'): 123
Unmounting 125 containers

Now let's see how adding a cache to Pandora changes things:

$ time wl c mount -c ':/mydirs/ariadne:/forests/pandora:*:'
Warning: cannot load bridge to [/forests/codepoets]
Warning: *: cannot load subcontainer .uuid/fe42cec7-3044-45c8-81a9-2298cd31b393.yaml: Cannot decrypt manifest.
Warning: cannot load bridge to [/forests/besidethepark]
Loading containers (from ':/mydirs/ariadne:/forests/pandora:*:'): 124
Checking container references (from ':/mydirs/ariadne:/forests/pandora:*:'): 124
Preparing mount of container references (from ':/mydirs/ariadne:/forests/pandora:*:'): 124
Mounting storages for containers: 124

real    1m5.364s
user    0m10.324s
sys     0m0.441s

$ time tree -F -L 2 ~/wildland/mydirs/ariadne:/forests/pandora:/
/home/user/wildland/mydirs/ariadne:/forests/pandora:/
├── README/
│   └── Copyright Notice and Terms of Use/
├── agent/
│   ├── @persons/
│   ├── @timeline/
│   └── agent-client-protocol/
├── arch/
│   ├── @clients/
...
    ├── Pandora Docs/
    ├── UX thoughts/
    └── Users onboarding - marketplace/

269 directories, 10 files

real    0m0.128s
user    0m0.012s
sys     0m0.006s

$ wl c info ':/mydirs/ariadne:/forests/pandora:/home/omeg:'
Sensitive fields are hidden.

object: container
owner: '0x1ea3909882be658d0ab69a822f7c923d12454ec024f4d8dd8f7113465167fcbe'
paths:
- /.uuid/24b3b45c-57e1-44ec-b66c-33d952c99c6a
- /home/omeg
backends:
  storage:
  - type: delegate
    backend-id: 3f65e772-ec1e-419a-9d62-4d1b8c311589
title: null
categories: []
version: '1'
access:
- user: '*'
cache:
  type: local
  backend_id: 57b6de5e-4341-4399-ae8a-04d4b1d2e907
  location: /home/user/storage/cache/24b3b45c-57e1-44ec-b66c-33d952c99c6a

When caches are created for the first time, the initial mount can take slightly longer. Once the files are mirrored locally, accessing Pandora is nearly instantaneous. If we have write access, we can of course modify the files, and the changes will be synced to the remote storages. Note: when such a forest is mounted with a cache for the first time, the initial sync to the cache may take a while. We can always check the sync status with the wl status command:

$ wl status
Mounted containers:

...large number of containers...

Sync jobs:
:/mydirs/ariadne:/forests/pandora:/.uuid/014f1f66-c5a3-41e3-9c39-8e000e42f062: SYNCED 'delegate'(backend_id=66ba2c9c-abd5-4056-b609-0a3f8e2a8985) <-> 'local'(backend_id=733a3d3a-f278-4e42-ba22-85a2d4ad06e8)
:/mydirs/ariadne:/forests/pandora:/home/maja.kostacinska: SYNCED 'delegate'(backend_id=a2556f45-ed5c-48a5-ac57-f21c4c21856a) <-> 'local'(backend_id=79200c1f-ab9c-48b1-aa8f-70ab7dd1b37d)

...etc...
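With more than a hundred sync jobs, scanning this output by eye gets tedious. The sketch below parses the "Sync jobs" lines and lists the jobs that are not yet SYNCED. The line format is inferred from the output shown in this tutorial and is not a documented, stable interface; `parse_sync_jobs` and `pending` are hypothetical names, and the second sample line has been changed to ONE_SHOT purely for illustration.

```python
import re
from typing import List, NamedTuple


class SyncJob(NamedTuple):
    container: str  # Wildland path of the container, e.g. ":/mydropbox:"
    status: str     # e.g. "SYNCED" or "ONE_SHOT"
    source: str     # storage type on the left of "<->"
    target: str     # storage type on the right of "<->"


# Assumed line format, inferred from the `wl status` output above.
_JOB = re.compile(
    r"^(?P<container>:.+:) (?P<status>[A-Z_]+) "
    r"'(?P<source>[^']+)'\(backend_id=[0-9a-f-]+\) <-> "
    r"'(?P<target>[^']+)'\(backend_id=[0-9a-f-]+\)$"
)


def parse_sync_jobs(status_output: str) -> List[SyncJob]:
    """Extract sync jobs from the text printed by `wl status`."""
    jobs = []
    for line in status_output.splitlines():
        match = _JOB.match(line.strip())
        if match:
            jobs.append(SyncJob(**match.groupdict()))
    return jobs


def pending(jobs: List[SyncJob]) -> List[SyncJob]:
    """Return the jobs whose status is anything other than SYNCED."""
    return [job for job in jobs if job.status != "SYNCED"]


# Sample input; the ONE_SHOT status on the second line is illustrative.
SAMPLE = """\
Sync jobs:
:/mydirs/ariadne:/forests/pandora:/.uuid/014f1f66-c5a3-41e3-9c39-8e000e42f062: SYNCED 'delegate'(backend_id=66ba2c9c-abd5-4056-b609-0a3f8e2a8985) <-> 'local'(backend_id=733a3d3a-f278-4e42-ba22-85a2d4ad06e8)
:/mydirs/ariadne:/forests/pandora:/home/maja.kostacinska: ONE_SHOT 'delegate'(backend_id=a2556f45-ed5c-48a5-ac57-f21c4c21856a) <-> 'local'(backend_id=79200c1f-ab9c-48b1-aa8f-70ab7dd1b37d)
"""

for job in pending(parse_sync_jobs(SAMPLE)):
    print(f"still syncing: {job.container} ({job.status})")
```

In real use, the input text would be the captured output of the wl status command rather than a hard-coded sample.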

As with a single container, after using the initial mount command to create caches, we don't need to specify --with-cache or -c to use caches when mounting, as they are used by default from then on.

To stop using a cache for Pandora we can use the wl c delete-cache command:

$ wl c delete-cache ':/mydirs/ariadne:/forests/pandora:*:'
Deleting cache: /home/user/.config/wildland/cache/0x1ea3909882be658d0ab69a822f7c923d12454ec024f4d8dd8f7113465167fcbe.014f1f66-c5a3-41e3-9c39-8e000e42f062.storage.yaml
Deleting cache: /home/user/.config/wildland/cache/0x1ea3909882be658d0ab69a822f7c923d12454ec024f4d8dd8f7113465167fcbe.0389cbcd-53d2-47a6-9a7d-e9b03185711c.storage.yaml
...