WIP data-mover #2474

wants to merge 9 commits into
base: master
74 changes: 74 additions & 0 deletions docs/support/tutorials/data-mover.md
@@ -0,0 +1,74 @@
# Data-mover

Data-mover is a tool for moving data between the local filesystems of Puhti and
Mahti and the Allas and LUMI-O object storage servers when
[simple transfers](../faq/how-to-move-data-between-puhti-and-allas.md#move-data-with-rclone)
are not practical, either because there are many small files or because the
dataset is large.

We want the `data-mover` tool to be simple to use and to handle the difficult
corner cases. It is essentially a wrapper around the [Restic backup tool](https://restic.readthedocs.io),
and stores the data in the Restic repository format.
Restic (as used by data-mover) in turn uses the [Rclone](https://rclone.org) backend for the actual data transfers to
the object storage servers and back. In addition, the data-mover tool does the
data transfers in the background, using batch jobs, allowing larger transfers
than would be practical in regular interactive login sessions.

## Simple example case, moving data from Puhti to Allas and back

Below is a guide for a simple scenario: moving data from a Puhti project scratch
directory to the corresponding project in Allas, and then back. Similar works with
Collaborator

similar what works? Unclear

Collaborator

I guess you mean the exact same instructions work on Mahti, which is true. LUMI-O is slightly different; you need to specify more.

Collaborator Author

These object storages are really bad from the perspective of traditional HPC use. The mapping from filesystem to object storage is far from 1-to-1, the object storage is a completely separate machine with its own authentication and authorisation, and there are many different transfer tools/clients, APIs, and object storage server configurations, all different and often incompatible, instead of the OS just handling it... I started writing it all out, noticed that it would be a long article, wrote the TLDR text (what it is now), and deleted the start of the more complete guide. This tool is supposed to be easy to use. If the documentation is long, it means the tool is not easy to use :)

Collaborator Author

I'll see if I can redirect the reader more quickly to more comprehensive docs for using services other than Puhti and Allas.

Collaborator Author

Ok. I reread the doc. Similar is exactly how it is. Very unclear, but truthfully so :)

Collaborator

You could just say that the easy instructions work the same way in Mahti. Cross-service usage, such as using LUMI-O instead of Allas, is possible; please go read the advanced section if you are interested.

Mahti and LUMI-O. Please have a look at `data-mover help` and `data-mover <sub-command> --help`
for additional documentation.

### Setting up the connection from Puhti to Allas

1. Your CSC project needs to have the Allas service enabled. The project PI can add
the Allas service for the project in [my.csc.fi](https://my.csc.fi), if not already enabled, and
the project members need to [accept the service terms](../../accounts/how-to-add-service-access-for-project.md).

2. Create a configuration for rclone and store the authentication token in the
file `$HOME/.config/rclone/rclone.conf` in Puhti. This is easiest to do from the
[Puhti web interface](https://puhti.csc.fi). Open "Cloud storage configuration" from the
"Tools" drop-down menu, and
[create an Allas S3 rclone configuration for the project](../../computing/webinterface/file-browser.md#accessing-allas-and-lumi-o).

Contributor

Ok. So this uses the Open OnDemand style S3 configuration. This will be a bit confusing for the old users of S3 Allas.

3. Open a terminal to Puhti, and take the `data-mover` tool into use with
    ```
    module load .data-mover
    ```
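
Before loading the module, it can help to confirm that the configuration file
from step 2 actually contains a remote. The following is a minimal sketch and
not part of data-mover: the function name `list_rclone_remotes` is made up for
illustration, and it relies only on `rclone.conf` being an INI-style file in
which each remote is a `[name]` section.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of data-mover or rclone):
# print the remote names defined in an rclone configuration file.
list_rclone_remotes() {
    # Default to the path used in this tutorial if no file is given.
    local conf="${1:-$HOME/.config/rclone/rclone.conf}"
    if [ ! -r "$conf" ]; then
        echo "no readable rclone config at $conf" >&2
        return 1
    fi
    # Keep only "[name]" section header lines and strip the brackets.
    sed -n 's/^\[\(.*\)][[:space:]]*$/\1/p' "$conf"
}
```

Calling `list_rclone_remotes` without arguments reads
`$HOME/.config/rclone/rclone.conf`. If the `rclone` command itself is available
in your session, `rclone listremotes` reports the same information.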

### Moving a single directory in Puhti to Allas

1. Delete all files that are not needed from the directory to be moved,
for example `/scratch/project_<projid>/exampledir`. There is no need
to compress the files.

2. Move the data to Allas
    ```
    data-mover export /scratch/project_<projid>/exampledir
    ```

3. Check the status of the data transfer with
    ```
    data-mover status
    ```
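
To review what is about to be moved after the clean-up in step 1, a short
summary of the directory can be useful. The sketch below is an assumption-laden
illustration, not a data-mover feature: `summarize_dir` is a hypothetical helper
that uses only `find` and `du` to print the number of regular files and the
disk usage in KiB.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of data-mover):
# print "<file count> <size in KiB>" for a directory.
summarize_dir() {
    local dir="$1"
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    local files kib
    # Count NUL terminators so file names containing newlines are counted once.
    files="$(find "$dir" -type f -print0 | tr -cd '\0' | wc -c)"
    # du -sk reports the total disk usage in units of 1024 bytes.
    kib="$(du -sk "$dir" | cut -f1)"
    # Arithmetic expansion strips any padding that wc adds.
    echo "$((files)) $kib"
}
```

For example, `summarize_dir /scratch/project_<projid>/exampledir` prints the two
numbers on one line. On a large directory tree the scan itself may take a
while, since every file has to be visited.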

### Listing the data in Allas

```
data-mover list
```

### Moving data from Allas to Puhti

Import data back to the original directory with
```
data-mover import /scratch/project_<projid>/exampledir
```
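
If you want an independent check that the imported files match what was
exported, one option is to record checksums before the export and verify them
after the import. This is a sketch under stated assumptions and not something
data-mover does for you: `make_manifest` and `verify_manifest` are hypothetical
helper names, and the manifest file is assumed to be stored outside the
directory being moved.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of data-mover).

# Write a sorted SHA-256 manifest of all regular files under a directory.
# Paths are recorded relative to the directory. Keep the manifest file
# outside the directory, otherwise it would list itself.
make_manifest() {
    local dir="$1" manifest="$2"
    ( cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$manifest"
}

# Verify a directory against a manifest. Returns non-zero if any listed
# file is missing or differs. Extra files in the directory are not detected.
verify_manifest() {
    local dir="$1" manifest="$2"
    # The manifest is read from standard input so a relative path still works.
    ( cd "$dir" && sha256sum --check --quiet ) < "$manifest"
}
```

A typical sequence would be to run `make_manifest` on the directory before
`data-mover export`, and `verify_manifest` with the same two arguments after
`data-mover import` has finished. Checksumming reads every file, so it is slow
for very large datasets.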

Contributor

What happens for the overlapping files in exampledir?

Contributor

How do you remove old exports?

Collaborator Author

What do you mean by overlapping files?

Deleting an export from Allas means you moved something that you should have simply deleted in the first place :D

Ok, there is dm delete ..., says dm help :D There are also lots of other dm subcommands, but my concern is to keep this documentation short. If it is long, it looks like the tool is complicated and not easy to use.

## Links to related material

- [Lue tool for data inventory](lue.md)
- [Data cleaning](clean-up-data.md)
- [Allas introduction](../../data/Allas/introduction.md)
7 changes: 5 additions & 2 deletions docs/support/tutorials/index.md
@@ -3,18 +3,21 @@
## General
* [Getting started with supercomputing at CSC](hpc-quick.md)
* [Getting started with Helmi](../../computing/quantum-computing/helmi/helmi-from-lumi.md)
* [Managing data on Puhti and Mahti scratch disks](clean-up-data.md)
* [CSC Quick reference (pdf)](../../img/csc-quick-reference/csc-quick-reference.pdf)
* [Linux basics for CSC](env-guide/index.md)
* [Interactive and batch job hands-on in Puhti](cmdline-handson.md)
* [Using csc-env command](using_csc_env.md)
* [Developing scripts remotely](remote-dev.md)
* [Using CSC HPC environment efficiently](https://csc-training.github.io/csc-env-eff/)
* [How to run existing containers in Puhti](../../computing/containers/run-existing.md)
* [Getting disk usage using Lue](lue.md)
* [Running Julia jobs on Puhti and Mahti clusters](julia.md)
* [Using Python on CSC supercomputers](python-usage-guide.md)

## Data management
* [Managing data on Puhti and Mahti scratch disks](clean-up-data.md)
* [Getting disk usage using Lue](lue.md)
* [Moving large datasets to Allas](data-mover.md)

## Installation of tools on supercomputers
* [Installing software with Spack](user-spack.md)
* [Building Singularity containers from scratch](singularity-scratch.md)