This post is the second in a series; see here for part 1.
First, I created a test environment on a Windows Server 2016 machine that proportionally mirrors the files and directory structure on the actual user’s computer. I did this for two reasons: I didn’t want to use actual user files for this kind of testing, and since I’m running several scenarios, I wanted to speed things up as much as possible. So while the user’s actual system has about 3 million files averaging about 50 KB each, my test system has about 1.5 million files averaging 15 KB each.
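For anyone who wants to reproduce a similar test set, here is a minimal Python sketch of one way to generate a large tree of small files. It is not the script used for this post: the function name `build_test_tree`, the directory and file naming, and the `files_per_dir` grouping are all hypothetical, and the files are filled with random bytes, which compress far worse than typical user documents.

```python
import os
import random
from pathlib import Path


def build_test_tree(root, n_files, avg_size, files_per_dir=1000, seed=0):
    """Create n_files files of random bytes under root, grouped into subdirectories.

    Sizes are drawn uniformly from 1 to 2 * avg_size - 1 bytes, so the mean
    file size is avg_size. Returns the total number of bytes written.
    """
    rng = random.Random(seed)  # seeded so the size distribution is repeatable
    root = Path(root)
    total = 0
    for i in range(n_files):
        # Hypothetical layout: dir00000, dir00001, ... each holding files_per_dir files
        subdir = root / f"dir{i // files_per_dir:05d}"
        subdir.mkdir(parents=True, exist_ok=True)
        size = rng.randint(1, 2 * avg_size - 1)
        (subdir / f"file{i:07d}.bin").write_bytes(os.urandom(size))
        total += size
    return total


# Example call for a set shaped like the one in this post (slow, and needs
# roughly 22 GB of disk): build_test_tree("testdata", 1_500_000, 15 * 1024)
```

A real mirror would follow the user’s actual directory depth and file-type mix; the flat layout above only approximates the file count and average size.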
Everything Will Be Going Through the VPN
The VPN I will be using is Hamachi, which is managed from a centralized system via LogMeIn. It won’t be the fastest VPN option, but it does offer secure encryption and it is very, very easy to set up. I also turned off the VPN’s compression for all test cases, as leaving it on caused freezing and slowdowns.
Method One: Compress, then Xcopy, then Uncompress
- Compress all the files into one big 27GB file using 7zip – 443 minutes
- Send compressed file across VPN using xcopy – 200 minutes
- Decompress the file on the Windows 10 computer – 136 minutes
M1 Final Result: 779 Minutes
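The arithmetic behind that total is worth a quick look, because it shows where the time went. The short Python sketch below adds up the three steps and estimates the average rate of the network copy; it assumes 1 GB = 1000 MB, and the helper name `throughput_mb_per_s` is just for illustration.

```python
def throughput_mb_per_s(size_gb, minutes):
    """Average transfer rate in MB/s, assuming 1 GB = 1000 MB."""
    return size_gb * 1000 / (minutes * 60)


# Minutes per step, taken from the list above
steps = {"compress": 443, "xcopy": 200, "decompress": 136}
total_minutes = sum(steps.values())

print(total_minutes)                            # 779
print(round(throughput_mb_per_s(27, 200), 2))   # 2.25
```

So the 27GB archive crossed the VPN at roughly 2.25 MB/s on average, and compressing plus decompressing (579 of the 779 minutes) cost nearly three times as long as the transfer itself.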
Method Two: Set up Linux Subsystem and use Rsync to Send the Files
Rsync is a utility included with most Linux distributions, and it’s not native to Windows, so I used a newer feature included with Windows 10 machines: the Windows Subsystem for Linux (WSL). This allows a user to run a relatively resource-light version of Linux inside an existing Win10 installation, much like a virtual machine. Rsync is a great utility for scanning many files and folders for changes, deletions, and additions. In this case, though, rsync was simply not up to the task: it took in excess of 24 hours to run. It didn’t complete; I actually killed the task, as that’s too long to be of any use to me.
I was shocked, as rsync has always been a useful utility, though I had never run it on so many tiny, different files. I suspected it may have been slowed down by reading all the metadata related to each file, such as ownership and permissions. Since I didn’t care about this data, just the content of the files themselves, I tried running rsync with that metadata stripped out:
rsync -avz --no-perms --no-owner --no-group //source/ //destination/
…but there seemed to be no change; it was still in excess of 24 hours. Perhaps it was because it was running as a virtual machine inside Windows, or perhaps rsync runs poorly over a VPN?
M2 Final Result: 24 hours +
Method Three: Use Xcopy
This method is simply one command:
xcopy <source> <destination> /i /d /y /e
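The /d switch ensures it only copies source files that are newer than the copies already at the destination. The /e switch copies all subdirectories, including empty ones, /i treats the destination as a directory, and /y suppresses overwrite prompts. The result: it took 38 minutes! I immediately ran it again, to have it verify there were no new changes: this took a mere 42 minutes.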
M3 Final Result: 38 Minutes
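To make the "only copy newer files" idea concrete, here is a rough Python sketch of that logic. It is not how xcopy is implemented, just an illustration of the comparison that /d implies; the function name `copy_newer` is hypothetical, and unlike /e this sketch does not recreate empty directories.

```python
import shutil
from pathlib import Path


def copy_newer(src_root, dst_root):
    """Copy files from src_root to dst_root when missing or older at the destination.

    Returns the number of files actually copied.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied = 0
    for src in src_root.rglob("*"):
        if not src.is_file():
            continue
        dst = dst_root / src.relative_to(src_root)
        # Skip when the destination copy exists and is at least as new as the source
        if dst.exists() and dst.stat().st_mtime >= src.stat().st_mtime:
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also carries over the modification time
        copied += 1
    return copied
```

Note that even when nothing needs copying, every file still has to be looked up on both sides. With this many files, those per-file checks over the VPN may be a large part of why a repeat run is far from free.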
Method Four: Use Robocopy
Robocopy is a newer command-line utility for Windows which includes advanced features like scheduling, monitoring, retrying of failed files, etc. Think of it like xcopy on steroids. The exact command I ran is this:
robocopy <source> <destination> /e /copyall /xo /sec /secfix /R:1 /W:3 /LOG:robocopyLog.txt /NP
The /sec, /secfix, and /copyall switches copy and check metadata such as file ownership, permissions, and attributes. They apparently slow things down considerably, as I had suspected metadata did in the rsync test, because robocopy was still running after 6 hours, so I killed it.
I then simplified the robocopy command to the following:
robocopy <source> <destination> /e /R:1 /W:3
This time, the command finished in 33 minutes! Again, I don’t mind losing the metadata/file attributes, as long as the file content itself is copied.
M4 Final Result: 33 Minutes
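If you want to time the simplified robocopy run from a script, one detail to watch is the exit code: robocopy returns a bitmask in which 1 means files were copied, and only values of 8 or higher indicate a failure. The Python sketch below is one possible wrapper, not what I ran for this test; the function names are hypothetical, and `timed_robocopy` only works on Windows where robocopy is available.

```python
import subprocess
import time


def robocopy_args(source, destination, retries=1, wait_seconds=3):
    """Build the simplified robocopy command line used above."""
    return ["robocopy", source, destination, "/e",
            f"/R:{retries}", f"/W:{wait_seconds}"]


def robocopy_succeeded(exit_code):
    """Robocopy exit codes are a bitmask; 8 or more means at least one failure."""
    return 0 <= exit_code < 8


def timed_robocopy(source, destination):
    """Run robocopy (Windows only) and return (succeeded, elapsed_minutes)."""
    start = time.monotonic()
    result = subprocess.run(robocopy_args(source, destination))
    elapsed_minutes = (time.monotonic() - start) / 60
    return robocopy_succeeded(result.returncode), elapsed_minutes
```

Treating any nonzero exit code as an error, as many scripts do by default, would flag a perfectly good copy as failed.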
Using VPN, The Winner Is…
Method Four: Robocopy!
Lessons Learned:
- Metadata, like file permissions and attributes, really slows things down; decide if you actually need to include this data
- Since Windows is the “host” operating system, it appears the more direct Windows-to-Windows methods of copying work better
- Next up, Transferring over SSH!
I can’t help but think using the VPN and being in a virtual machine had a huge detrimental effect on rsync. However, 24+ hours can’t be explained by those variables alone, given the speed of the other methods. Interesting results for sure, and definitely not what I would have picked.