I'm doing the hard-drive shuffle thing. I have a lot of data and I'm paranoid about losing it. I've been let down once or twice by bad copies, so I thought I should take checksums before copying. I had a lot of fun arguing with wildcards and string escaping, so I thought I'd share my adventure, since I've already worked out how to do this and then forgotten it at least once.
My first attempt. It dies when fed directories.
I try again, and it occurs to me to use tee in append mode so I get output to the screen as well:
for foo in `ls -R`; do
md5sum $foo | tee -a md5sums.txt
done
This doesn't work; md5sum complains. I have a look at the output of ls -R:
Quote:
some.file
another.file
Directory1/
Directory2/
Directory1:
inDirectory1.file
inDirectory1again.file
Directory2:
inDirectory2.file
inDirectory2again.file
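The listing shows why md5sum complains: each name is printed relative to the directory header above it, so the loop asks md5sum for inDirectory1.file in the current directory, and also feeds it the headers and the directory names themselves. This is not the route taken in this post, but for comparison here is a minimal sketch, assuming bash 4 or later, whose globstar option makes **/* expand to every path below the current directory with the directory part included. The function name hash_tree is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: assumes bash 4+ (globstar) and GNU md5sum.
# globstar: ** matches any number of directory levels
# nullglob: an empty tree expands to nothing instead of a literal '**/*'
# dotglob:  include hidden files, which ls -R would skip without -a
shopt -s globstar nullglob dotglob

# Hypothetical helper: print "digest  relative/path" for every regular
# file below the current directory.
hash_tree() {
  local f
  for f in **/*; do
    # **/* also matches directories; skip anything that isn't a file
    [[ -f $f ]] || continue
    # Quoting "$f" keeps names with spaces in one piece
    md5sum -- "$f"
  done
}
```

Send its output to a file outside the tree being hashed, because the redirection creates the output file before the glob is expanded, and the half-written list would otherwise be hashed too.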
I RTFM and can't find an option to list full paths, so I consider using find.
for foo in `find -type f`; do
md5sum $foo | tee -a md5sums.txt
done
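This fails when anything has a space in the filename, because the shell splits the output of the backticks on whitespace, so "my file.txt" reaches md5sum as "my" and "file.txt". I RTFM on find. find -ls does the escaping I need, but it also prints all the extra info that ls -l spits out. Not too handy, and I don't feel like resorting to sed or awk. Further through the fine manual I find: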
man find wrote:
-exec command ;
Execute command; true if 0 status is returned. All following arguments to find are taken to be
arguments to the command until an argument consisting of ‘;’ is encountered. The string ‘{}’ is
replaced by the current file name being processed everywhere it occurs in the arguments to the
command
To cut a long story short, that's not quite the whole truth: the shell treats a bare ; as the end of the command (and {} is safest quoted too), so we need to put quotes around {} and a backslash in front of ;.
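find . -type f ! -name md5sums.txt -exec md5sum '{}' \; | tee md5sums.txt
(The ! -name md5sums.txt test stops find from hashing the list itself while tee is still writing it.)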
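As written, -exec ... \; starts one md5sum process per file. A sketch of the same idea using find's '{} +' terminator, which packs as many names as fit onto each md5sum command line; both forms hand every name to md5sum as a separate argument, which is why spaces stop being a problem. make_sums is a hypothetical helper name, and the sketch assumes GNU find and md5sum.

```shell
#!/usr/bin/env bash
# Sketch, assuming GNU find and md5sum.
# Hypothetical helper: write "digest  ./relative/path" lines for every
# regular file under $1 into the file $2. The output file should live
# outside the tree, or find would also hash the half-written list.
make_sums() {
  local root=$1 out=$2
  # '{} +' packs many names into each md5sum call; find passes each
  # name as its own argument, so spaces in names need no escaping.
  # The redirection is opened in the caller's directory, before the cd.
  ( cd "$root" && find . -type f -exec md5sum '{}' + ) > "$out"
}
```

Because the listed names are relative (./...), running md5sum -c on the list from the top of the copy checks every file by name, whatever order find happened to walk the copy in.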
Of course, as I watch the damn thing run, I start wondering how to compute the md5sums in parallel to get it done faster. I start the same command on the copy of the files and notice that find lists them in a different order, so I'm going to need to sort the output. Whilst I'm at it, I should probably use the list to strip out the inevitable duplicate files I have kicking around.
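Those three follow-ups (parallel hashing, sorting, duplicates) can be sketched together. This is one possible approach rather than anything from the original post: it assumes GNU find, xargs, sort and uniq (find -print0, xargs -0 -P -r and uniq -w/-D are not all POSIX), and parallel_sums and find_dupes are hypothetical helper names. Sorting on the path field makes the lists from the original and the copy line up for diff, and since an MD5 digest is 32 hex characters, uniq -w 32 -D on a hash-sorted list prints every file whose contents match another's.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU findutils and coreutils.

# Hypothetical helper: hash every regular file under $1 using up to $2
# md5sum processes at once (default 4). -print0/-0 keep odd names intact,
# -r skips md5sum entirely if there are no files, and -n 1 gives each
# md5sum one file so each process writes a single short line. Lines
# arrive in arbitrary order, so they are sorted on the path (field 2 on).
parallel_sums() {
  local root=$1 jobs=${2:-4}
  ( cd "$root" && find . -type f -print0 | xargs -0 -r -P "$jobs" -n 1 md5sum ) |
    LC_ALL=C sort -k 2
}

# Hypothetical helper: print every line of checksum list $1 whose
# 32-character digest also appears on another line (identical contents).
find_dupes() {
  LC_ALL=C sort "$1" | uniq -w 32 -D
}
```

For example, hash the original and the copy into two lists, then diff them: no output means every path and digest matched. find_dupes on either list gives candidates to review before deleting anything.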
Or I could have just installed md5deep.