0

I am trying to run the grep command from my Python module using the subprocess library. Since I am doing this on a .doc file, I am using the third-party catdoc tool to extract the content as plain text, which I want to store in a file. I don't know where I am going wrong, but the program fails to generate the plain text file and therefore never produces the grep result. I have checked the error log, but it's empty. Thanks for all the help.

def search_file(name, keyword):
    #Extract and save the text from doc file
    catdoc_cmd = ['catdoc', '-w' , name, '>', 'testing.txt']
    catdoc_process = subprocess.Popen(catdoc_cmd, stdout=subprocess.PIPE,stderr=subprocess.PIPE, shell=True)
    output = catdoc_process.communicate()[0]
    grep_cmd = []
    #Search the keyword through the text file
    grep_cmd.extend(['grep', '%s' %keyword , 'testing.txt'])
    print grep_cmd
    p = subprocess.Popen(grep_cmd,stdout=subprocess.PIPE,stderr=subprocess.PIPE, shell=True)
    stdoutdata = p.communicate()[0]
    print stdoutdata

2 Answers

4

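On UNIX, when you pass a list together with shell=True, only the first item is treated as the command string to execute; all remaining items are passed as arguments to the shell itself, not to the command. In your code that means the shell runs plain catdoc with no arguments. The > has no effect, because with /bin/sh -c everything after the command string becomes a positional parameter of the shell ($0, $1, ...) that the command never uses.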

Therefore, you should actually use

catdoc_cmd = ['catdoc -w "%s" > testing.txt' % name]
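To see the difference, here is a small sketch that uses echo as a stand-in for catdoc (an assumption, so that it runs without catdoc installed). It assumes a POSIX shell, and the file names list_form.txt and string_form.txt are made up for the demo.

```python
import os
import subprocess
import tempfile

workdir = tempfile.mkdtemp()

# List + shell=True: only 'echo hello' is the command string. The '>' and
# the file name become the shell's $0 and $1, so nothing is redirected.
list_out = subprocess.check_output(
    ['echo hello', '>', 'list_form.txt'], shell=True, cwd=workdir)

# Single string + shell=True: the shell parses '>' and redirects stdout.
subprocess.check_call('echo hello > string_form.txt', shell=True, cwd=workdir)

list_file_exists = os.path.exists(os.path.join(workdir, 'list_form.txt'))
with open(os.path.join(workdir, 'string_form.txt')) as f:
    string_file_text = f.read()

print(list_out, list_file_exists, repr(string_file_text))
```

In the list form the text comes back on stdout and no file is created; in the string form the file is written.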

A better solution, though, would probably be to just read the text out of the subprocess' stdout, and process it using re or Python string operations:

catdoc_cmd = ['catdoc', '-w' , name]
catdoc_process = subprocess.Popen(catdoc_cmd, stdout=subprocess.PIPE,stderr=subprocess.PIPE)
for line in catdoc_process.stdout:
    if keyword in line:
        print line.strip()
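If you are on Python 3, note that the pipe yields bytes by default, so `keyword in line` with a str keyword would raise a TypeError. Below is a minimal sketch of the same idea that also runs there, assuming text mode via universal_newlines=True. The helper name matching_lines is hypothetical, and the demo command is a stand-in so the example runs without catdoc.

```python
import subprocess
import sys

def matching_lines(cmd, keyword):
    """Run cmd (a list, no shell) and return the stdout lines containing keyword."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            universal_newlines=True)  # decode stdout to text
    stdout_text = proc.communicate()[0]
    return [line.strip() for line in stdout_text.splitlines()
            if keyword in line]

# Stand-in producer (assumption): a tiny Python script instead of catdoc.
demo_cmd = [sys.executable, '-c', 'print("first line"); print("second one")']
found = matching_lines(demo_cmd, 'second')
print(found)
```

For the question's case the call would look like matching_lines(['catdoc', '-w', name], keyword).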

2 Comments

Agreed. You could use shell=True with a single command string, or shell=False with a list of arguments if you drop the '>' in favor of passing stdout directly to grep's stdin, which makes it 'streamy'. That is, you can connect the stdout pipe to grep's stdin pipe if you want to use grep at all.
Yeah, passing stdout is a nice idea. You could also do Popen('catdoc -w "%s" | grep "%s"' % (name, keyword), shell=True) :)
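For anyone who wants the piped variant without a shell, here is a rough sketch of connecting one process's stdout to grep's stdin. The helper name grep_command_output is hypothetical, printf stands in for catdoc in the demo, and it assumes grep is on the PATH.

```python
import subprocess

def grep_command_output(producer_cmd, keyword):
    """Pipe producer_cmd's stdout into grep and return the matching lines."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    # -e marks the next argument as the pattern, even if it starts with '-'.
    grep = subprocess.Popen(['grep', '-e', keyword],
                            stdin=producer.stdout, stdout=subprocess.PIPE)
    # Close our copy so the producer sees a broken pipe if grep exits early.
    producer.stdout.close()
    matches = grep.communicate()[0]
    producer.wait()
    return matches.decode().splitlines()

# Stand-in producer (assumption): printf instead of catdoc.
demo_matches = grep_command_output(['printf', 'alpha\nbeta\nalphabet\n'], 'alpha')
print(demo_matches)
```

For the question's case the call would look like grep_command_output(['catdoc', '-w', name], keyword). No temporary file is needed, and the keyword never passes through a shell.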
2

I think you're trying to pass the > to the shell, but that's not going to work the way you've done it. If you want to spawn a process, you should arrange for its standard output to be redirected. Fortunately, that's really easy to do: open the file you want the output to go to for writing and pass it to Popen using the stdout keyword argument instead of PIPE. (PIPE attaches the output to a pipe that you read with communicate().)
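A minimal sketch of that approach follows. The helper name save_command_output is hypothetical, and echo stands in for catdoc so the example runs anywhere with a POSIX userland.

```python
import os
import subprocess
import tempfile

def save_command_output(cmd, out_path):
    """Run cmd (a list, no shell) with its stdout written to out_path."""
    with open(out_path, 'wb') as out_file:
        # The child process writes straight into the open file.
        return subprocess.call(cmd, stdout=out_file)

# Stand-in command (assumption): echo instead of catdoc.
demo_path = os.path.join(tempfile.mkdtemp(), 'testing.txt')
exit_code = save_command_output(['echo', 'hello'], demo_path)
with open(demo_path) as f:
    saved_text = f.read()
print(exit_code, repr(saved_text))
```

For the question's case the call would look like save_command_output(['catdoc', '-w', name], 'testing.txt'), after which grep can be run on testing.txt as before.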

2 Comments

This isn't actually true for Python on UNIX; > never even gets to catdoc (try subprocess.Popen(['ls', '>', 'foo'], shell=True): it doesn't warn about a missing > file)
Yeah, corrected. And I always forget to write > the first time.
