
I am trying to write a script that reports the number of characters in the n-th largest file in a sub-directory. I was trying to pass n and the name of the sub-directory as arguments, i.e. $1 and $2.

Current directory: Greetings
   Sub-directory: language_files, others
     Sub-directory: English, German, French
          Files: Goodmorning.csv, Goodafternoon.csv, Goodevening.csv ….

I would be in the directory “Greetings”. When I indicate a subdirectory (English, German, French), the script should find the n-th largest file in that subdirectory and report its number of characters.

For instance, to figure out the number of characters in the 2nd-largest file in English, I did:

langs=$1
n=$2
for langs in language_files/;
 Do count=$(find language_files/$1 name "*.csv" | wc -m | head -n -1 | sort -n -r | sed -n $2(p))
Done | echo "The file has $count bytes!"

The result I wanted was:

$ ./script1.sh English 2
The file has 1100 bytes!

My main problem is that I don't understand how variables and loops work in bash scripts.

  • So start learning. The script is almost OK. Reminders: bash is case sensitive, so for is not For. To grab command output, use command substitution, even with for loops: count_bytes=$(for langs in language/*; do echo $langs; done). bash is space sensitive: var='a' will work, but var = 'a' will not! Use wc -c, not wc -m; I think -m doesn't do what you think. bash uses plain " quotes, not typographic quotes. And use divide and conquer: split your problem into many little problems. Commented Jan 28, 2019 at 20:02
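The points in this comment can be tried in isolation. Below is a minimal sketch, not taken from the original script: the variable names and the three language names are placeholders chosen for illustration.

```shell
#!/usr/bin/env bash
# Assignment: no spaces around '='.
greeting='hello'

# Command substitution captures everything the loop prints.
# The three names are plain words here, not real paths.
names=$(for lang in English German French; do echo "$lang"; done)

# wc -c counts bytes; wc -m counts characters (they can differ for multibyte text).
byte_count=$(printf '%s' "$greeting" | wc -c)

echo "$names"
# $(( )) strips the padding some wc implementations put before the number.
echo "bytes: $((byte_count))"
```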

2 Answers


No need for looping:

find language_files/"$1" -name "*.csv" | xargs wc -m | sort -nr | sed -n "$2{p;q}"

For byte counting you should use -c, since -m counts characters (the two may give the same result for your files).

You don't use the loop variable in the script anyway.
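One caveat worth checking on your system: when wc receives more than one file name, it also prints a "total" line, and after sort -nr that line comes first and shifts every rank by one. A possible variant, sketched below, runs wc once per file via find -exec so no total is produced. The function name nth_largest_line is made up for this example, and it is slower than the xargs form on large directories.

```shell
#!/usr/bin/env bash
# nth_largest_line DIR N   (hypothetical helper name)
# Prints "<bytes> <path>" for the N-th largest .csv file under DIR.
nth_largest_line() {
  local dir=$1 n=$2
  # -exec ... \; runs wc once per file, so wc never prints a "total" line.
  find "$dir" -name '*.csv' -exec wc -c {} \; |
    sort -nr |
    sed -n "${n}{p;q;}"
}
```

Inside a script it could be called as nth_largest_line "language_files/$1" "$2".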


3 Comments

Always an efficiency benefit from avoiding the loops.
What does {p;q} mean after $2?
Print and quit. To prevent scanning the rest of the lines.
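To see the print-and-quit behaviour on its own, here is a tiny sketch with made-up input. The trailing semicolon before the closing brace is added only because some sed implementations require it.

```shell
#!/usr/bin/env bash
# Print only line 2 of the input, then stop reading further lines.
second=$(printf '%s\n' 300 200 100 | sed -n '2{p;q;}')
echo "$second"   # prints 200
```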
0

Bash loops are interesting. You are encouraged to learn more about them when you have some time. However, this particular problem might not need a loop. Set lang (you can call it langs if you prefer) and n appropriately, and then try this:

count=$(stat -c'%s %n' "language_files/$lang"/* | sort -nr | head -n "$n" | tail -n 1 | sed -re 's/^[[:space:]]*([[:digit:]]+).*/\1/')

That should give you the $count you need. Then you can echo it however you like.

EXPLANATION

If you wish to learn how it works:

  • The stat command outputs various statistics about the named file (or files), in this case %s, the file's size in bytes, and %n, the file's name.
  • The sort -nr command sorts those lines numerically in descending order, so the largest file comes first.
  • The head and tail commands output, respectively, the first and last several lines of their input. Together, they select one specific line: the n-th.
  • The sed command extracts the leading size from that line and discards the file name. (You can use cut instead, if you prefer.)
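The steps above can be sketched as a small script, one stage per line, with cut in place of sed. This is only an illustration under some assumptions: the function name nth_largest_bytes is invented here, and stat -c is the GNU coreutils form (BSD/macOS stat takes different options).

```shell
#!/usr/bin/env bash
# nth_largest_bytes DIR N   (hypothetical helper name)
# Assumes GNU stat; BSD/macOS stat uses different options.
nth_largest_bytes() {
  local dir=$1 n=$2
  stat -c '%s %n' "$dir"/* |   # one "<size> <name>" line per file
    sort -nr |                 # largest size first
    head -n "$n" |             # keep the top N lines
    tail -n 1 |                # the last of those is the N-th largest
    cut -d ' ' -f 1            # keep only the size field
}

# Example call, guarded so nothing runs unless both arguments are given.
if [ "$#" -eq 2 ]; then
  count=$(nth_largest_bytes "language_files/$1" "$2")
  echo "The file has $count bytes!"
fi
```

Note that if the directory holds fewer than N files, head and tail simply return the smallest one, so you may want to check the file count first.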

If you wish to be cleverer, then you can optimize as @karafka has done.
