
I have multiple Hadoop commands to run, and they are going to be invoked from a Python script. Currently, I tried the following way.

import os
import xml.etree.ElementTree as etree
import subprocess

filename = "sample.xml"
__currentlocation__ = os.getcwd()
__fullpath__ = os.path.join(__currentlocation__,filename)
tree = etree.parse(__fullpath__)
root = tree.getroot()
hivetable = root.find("hivetable").text
dburl = root.find("dburl").text
username = root.find("username").text
password = root.find("password").text
tablename = root.find("tablename").text
mappers = root.find("mappers").text
targetdir = root.find("targetdir").text
print hivetable
print dburl
print username
print password
print tablename
print mappers
print targetdir

p = subprocess.call(['hadoop','fs','-rmr',targetdir],stdout = subprocess.PIPE, stderr = subprocess.PIPE)

But the code is not working. It neither throws an error nor deletes the directory.
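One possible reason nothing seems to happen is that `stdout` and `stderr` are piped but never read, so any error message from `hadoop` is silently discarded. A minimal sketch of capturing and inspecting both streams with `subprocess.Popen` (using `echo` as a stand-in for the `hadoop` binary, since I can't assume your cluster setup; swap in `['hadoop', 'fs', '-rmr', targetdir]`):

```python
import subprocess

def run(cmd):
    """Run a command, returning (returncode, stdout, stderr) as text."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                         universal_newlines=True)
    out, err = p.communicate()  # read the pipes so errors aren't lost
    return p.returncode, out, err

# Stand-in for: run(['hadoop', 'fs', '-rmr', targetdir])
rc, out, err = run(['echo', 'deleted'])
print('return code: %d' % rc)
print('stdout: %s' % out)
print('stderr: %s' % err)  # a real hadoop error message would appear here
```

A non-zero return code or a populated `stderr` would show why the directory isn't being deleted.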

  • Have you verified targetdir = root.find("targetdir").text isn't empty? Commented Feb 1, 2016 at 17:12
  • yes, it is not empty. Commented Feb 1, 2016 at 17:13
  • Can you also try to run a simple Unix command from subprocess.call? Maybe `echo targetdir`? Commented Feb 1, 2016 at 17:36

1 Answer


I suggest you slightly change your approach; this is how I'm doing it. I make use of the Python `commands` module, and then it depends how you will use it (https://docs.python.org/2/library/commands.html). Here is a small demo:

import commands as com
print com.getoutput('hadoop fs -ls /')

This gives you output like the following (depending on what you have in the HDFS directory):

/usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop/hadoop-env.sh: line 25: /Library/Java/JavaVirtualMachines/jdk1.8.0_112.jdk/Contents/Home: Is a directory
Found 2 items
drwxr-xr-x   - someone supergroup          0 2017-03-29 13:48 /hdfs_dir_1
drwxr-xr-x   - someone supergroup          0 2017-03-24 13:42 /hdfs_dir_2

Note: the `commands` module doesn't work with Python 3 (to my knowledge); I'm using Python 2.7. Also be aware of the limitations of `commands`.

If you use `subprocess`, which is the Python 3 equivalent of `commands`, then you might consider finding a proper way to deal with your 'pipelines'. I find this discussion useful in that sense: (subprocess popen to run commands (HDFS/hadoop))
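For Python 3, the closest drop-in for `commands.getoutput` is `subprocess.getoutput` (a shell string in, combined stdout+stderr out), and `subprocess.check_output` is the usual argument-list alternative. A small sketch, with `echo` standing in for the `hadoop fs` command since I'm assuming no cluster is available:

```python
import subprocess

# Shell-string style, mirrors commands.getoutput
# (stderr is merged into the result, trailing newline stripped):
listing = subprocess.getoutput('echo /hdfs_dir_1')
print(listing)

# Argument-list style avoids shell quoting problems and raises
# CalledProcessError on a non-zero exit code:
out = subprocess.check_output(['echo', '/hdfs_dir_2'])
print(out.decode().strip())
```

With a real cluster you would pass e.g. `'hadoop fs -ls /'` or `['hadoop', 'fs', '-ls', '/']` instead.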

I hope this suggestion helps you!

Best

