Let me get straight to it: I used to transfer data from Files.com (or any other platform where files were dropped) to our cloud buckets using scripts. It was all okay when the file sizes were within a few MBs, but things got painful once they grew into GBs; transfers started taking a lot more time.
To speed things up, I tried running the transfer on a VM. It did get faster, but not faster faster, especially when the size crossed 400+ GB.
That’s when I started looking for a better way to connect my GCP/AWS buckets directly with these storage platforms, something that could make the transfer process faster and more reliable. And that’s where rclone came into the picture.
Rclone
I have set it up on my VM as a job that runs the backups/transfers with ease (a rough cron sketch is at the end of this post).
sudo apt update
curl https://rclone.org/install.sh | sudo bash
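If the install script ran without errors, a quick sanity check confirms the binary is on the PATH (rclone version is a standard subcommand that prints the installed version):
rclone version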
That's the usual installation process. Once done with it, let's set up the config. This is the place where we mention the details of the storage platforms we are transferring the data from and to:
rclone config
It's gonna throw options to set up a remote.
From here, n (new remote) will take you to a bunch of storage platform options supported by rclone that can be mounted.
Choose the one you prefer (I went with Files.com) and give it a name, which will be used to refer to this remote later on. I did the auth using an API key here.
PS: You might not find the API key option right away, so wait for the "edit advanced config" option.
Now we are done with one remote. Moving on to the next, follow similar steps: rclone config -> new remote -> pick the one you want and provide the auth method. I have gone for a GCS bucket here, mentioned the project number, and performed auth using the service account JSON key.
Also, if you're concerned and specific about object ACLs and storage classes, you can pick the appropriate ones from the options.
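For reference, here's roughly what the resulting config can look like once both remotes are set up (you can view it with rclone config show). The remote names match the ones used later in this post, but the API key, project number, key path, and the ACL/storage class values are placeholders, and the exact Files.com options may vary with your rclone version:
[filescom]
type = filescom
api_key = XXXXXXXXXXXX
[gcs]
type = google cloud storage
project_number = 123456789012
service_account_file = /path/to/service-account.json
object_acl = private
storage_class = ARCHIVE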
Once you're done with that, you can check whether the remote setup has been successful by using the ls command along with the remote name:
rclone ls filescom:
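A couple of other handy checks before kicking off a big copy (these are standard rclone subcommands; the remote names and path are just the ones from this example): rclone lsd lists the directories/buckets on a remote, and rclone size totals up the object count and bytes, which gives a rough idea of how much data is about to move.
rclone lsd gcs:
rclone size filescom:/hawk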
And to copy the files, the usual syntax is:
rclone copy <source> <destination> [flags]
We've got a bunch of flags: --progress (-P) to show the progress, --transfers [number] to set the number of parallel transfers, --dry-run to perform a simulation (there's a quick sketch of that after the flag list below), and --exclude / --include to exclude or include specific files. Here's the full command I ran:
rclone copy filescom:/hawk gcs:vault-archive/ -P --transfers=8 --checkers=10 --buffer-size=64M --fast-list --retries=5 --low-level-retries=10 --timeout=5m --contimeout=30s --retries-sleep=10s --log-file=/home/mohamed-roshan-k/rclone_transfer.log --log-level=INFO
Some screenshots on the time taken for the transfer.
-P = progress bar
--checkers = number of parallel checkers that verify whether a file already exists in the destination
--buffer-size = in-memory buffer size used per file during the transfer
--retries = number of times it should retry the transfer if it fails
--low-level-retries = similar to --retries, but for low-level network and file errors
--timeout = aborts the task if it's stuck for more than the mentioned time
--contimeout = connection timeout
--retries-sleep = interval between each retry
--log-file = path to the logs
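Before running the real thing, a dry run is a cheap way to sanity-check the command, and rclone check can be used afterwards to compare source and destination (both are standard rclone features; the paths are the same ones used above):
rclone copy filescom:/hawk gcs:vault-archive/ --dry-run -P
rclone check filescom:/hawk gcs:vault-archive/ --one-way -P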
Do note, the process can be made faster if we increase:
- Transfers = --transfers
- Checkers = --checkers
- Buffer size = --buffer-size
If your VM has the specs to handle the increased load (CPU, RAM, and network), you’ll see a noticeable improvement in performance (pretty obvious but yea)
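And since I mentioned running this as a job on the VM: here's a minimal cron sketch of how it can be scheduled (the schedule, flags, and log path below are placeholders, not the exact setup from this post). Add a line like this with crontab -e to run the copy every night at 2 AM:
0 2 * * * /usr/bin/rclone copy filescom:/hawk gcs:vault-archive/ --transfers=8 --checkers=10 --buffer-size=64M --fast-list --log-file=/var/log/rclone_transfer.log --log-level=INFO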