Walking a directory tree with bash

I'm doing the hard-drive shuffle thing. I have a lot of data and I'm paranoid about losing it. I've been let down once or twice by bad copies, so I thought I should take checksums before copying. I had a lot of fun arguing with wildcards and string escaping, so I thought I'd share my adventure, not least because I've already worked out how to do this, and forgotten it again, at least once.

My first attempt. It dies when fed directories.

md5sum * > md5sums.txt

I try again, and it occurs to me to use tee in append mode so I get output on the screen as well.

for foo in `ls -R`; do
md5sum $foo | tee -a md5sums.txt
done

This doesn't work; md5sum complains that it can't find the files. I have a look at the output of ls -R

Quote:
some.file
another.file

Directory1/
Directory2/

Directory1:
inDirectory1.file
inDirectory1again.file

Directory2:
inDirectory2.file
inDirectory2again.file

I RTFM and can't find an option to list full paths. I consider using find.

for foo in `find -type f`; do
md5sum $foo | tee -a md5sums.txt
done

This fails when anything has a space in the filename, because the shell splits the output of find on whitespace. I RTFM on find. find -ls does the escaping I need, but it also prints all the extra info that ls -l spits out. Not too handy, and I don't feel like resorting to sed or awk. Further through the fine manual I find

man find wrote:
-exec command ;
Execute command; true if 0 status is returned. All following arguments to find are taken to be
arguments to the command until an argument consisting of ‘;’ is encountered. The string ‘{}’ is
replaced by the current file name being processed everywhere it occurs in the arguments to the
command

To cut a long story short, that's not quite the whole truth: the shell works its magic on {} and ; before find ever sees them, so we need to add quotes and a backslash.

find -type f -exec md5sum '{}' \; | tee md5sums.txt  
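A variant worth knowing: ending -exec with + instead of \; lets find hand many file names to a single md5sum run rather than starting one process per file. The following is only a small sketch, assuming md5sum from GNU coreutils; the scratch directory and the file names in it are made up for the demonstration. It also writes the list outside the tree being scanned, so the list cannot end up checksumming itself.

```shell
# Throwaway tree with a file name containing spaces (hypothetical data).
demo=$(mktemp -d)
mkdir -p "$demo/tree/Directory1"
printf 'one\n' > "$demo/tree/some.file"
printf 'two\n' > "$demo/tree/Directory1/name with spaces.file"

# '+' batches file names into as few md5sum invocations as possible.
# find passes each name as one argument, so spaces need no extra escaping.
(cd "$demo/tree" && find . -type f -exec md5sum {} + | tee "$demo/md5sums.txt")

# md5sum -c re-reads the list and checks every file against it.
(cd "$demo/tree" && md5sum -c "$demo/md5sums.txt")
```

Run from the root of the copied tree, the same md5sum -c call would check the copy against the list taken from the original.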

Of course, as I watch the damn thing run, I start wondering how to perform the md5sums in parallel to get it done faster. I start the same command on the copy of the files and notice that find lists the files in a different order, so I'm going to need to apply sort before I can compare the two lists. Whilst I'm at it, I should probably use the list to strip out any of the inevitable duplicate files I have kicking around.
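Those three ideas (parallel checksums, sorted lists, duplicate hunting) could be sketched roughly as follows. This is not what I actually ran, just one way it might look, assuming GNU find, xargs, sort and uniq (-print0, -0, -P and uniq's -w/-D are extensions, not POSIX). The two scratch trees, the sums helper and the choice of 4 processes are all made up for illustration.

```shell
# Two throwaway trees standing in for the original and the copy (hypothetical data).
demo=$(mktemp -d)
mkdir -p "$demo/orig/Directory1" "$demo/copy/Directory1"
for tree in orig copy; do
    printf 'one\n' > "$demo/$tree/some.file"
    printf 'two\n' > "$demo/$tree/Directory1/name with spaces.file"
    printf 'one\n' > "$demo/$tree/Directory1/duplicate of some.file"
done

# Checksum one tree with up to 4 md5sum processes at once, then sort by
# file name so the list no longer depends on the order find walked the tree.
# -n 1 gives each process a single file, so each writes one short line.
sums() {
    (cd "$1" && find . -type f -print0 | xargs -0 -n 1 -P 4 md5sum | sort -k 2)
}

sums "$demo/orig" > "$demo/orig.md5"
sums "$demo/copy" > "$demo/copy.md5"

# Identical lists mean every file in the copy matches the original.
cmp -s "$demo/orig.md5" "$demo/copy.md5" && echo "copy matches"

# Sorting by checksum puts duplicates next to each other; uniq -w 32 compares
# only the 32-character hash and -D prints every member of a duplicate group.
sort "$demo/orig.md5" | uniq -w 32 -D
```

Whether the parallel run is actually faster depends on whether the disk or the CPU is the bottleneck; on a single spinning drive, several readers at once may well be slower.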

Or I could have just installed md5deep.
