
I want to find the info about a webpage using curl, but in Python, so far I have this:

os.system("curl --head www.google.com")

If I run that, it prints out:

HTTP/1.1 200 OK
Date: Sun, 15 Apr 2012 00:50:13 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=3e39ad65c9fa03f3:FF=0:TM=1334451013:LM=1334451013:S=IyFnmKZh0Ck4xfJ4; expires=Tue, 15-Apr-2014 00:50:13 GMT; path=/; domain=.google.com
Set-Cookie: NID=58=Giz8e5-6p4cDNmx9j9QLwCbqhRksc907LDDO6WYeeV-hRbugTLTLvyjswf6Vk1xd6FPAGi8VOPaJVXm14TBm-0Seu1_331zS6gPHfFp4u4rRkXtSR9Un0hg-smEqByZO; expires=Mon, 15-Oct-2012 00:50:13 GMT; path=/; domain=.google.com; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Transfer-Encoding: chunked

What I want to do is match the 200 in it using a regex (I don't need help with that), but I can't find a way to get all the text above into a string. How do I do that? I tried info = os.system("curl --head www.google.com"), but info was just 0.

  • "The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function. See the Replacing Older Functions with the subprocess Module section in the subprocess documentation for some helpful recipes." -docs.python.org/library/os.html#os.system Commented Apr 15, 2012 at 1:02
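As that comment suggests, info was 0 because os.system() returns the command's exit status, not its output. Below is a minimal sketch of the subprocess replacement, assuming Python 3.7+ (for capture_output and text). The helper name run_and_capture is hypothetical, and a tiny Python child process stands in for curl so the example runs without curl or network access.

```python
import subprocess
import sys


def run_and_capture(args):
    """Run a command (given as a list of arguments) and return (exit_code, stdout_text)."""
    # capture_output=True collects stdout and stderr instead of letting them print;
    # text=True decodes the captured bytes to str.
    completed = subprocess.run(args, capture_output=True, text=True)
    return completed.returncode, completed.stdout


# With curl installed this would be: run_and_capture(["curl", "--head", "www.google.com"])
code, out = run_and_capture([sys.executable, "-c", "print('HTTP/1.1 200 OK')"])
print(code)         # the exit status, which is all os.system() gives you
print(out.strip())  # the captured output as a string, ready for a regex
```

The returned string can then be searched for the status code like any other Python string.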

7 Answers

48

For some reason I need to use curl (no pycurl, httplib2...), so maybe this can help somebody:

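import os
# os.popen() runs the command through the shell and returns a file-like object
# connected to its stdout; .read() collects everything as one string.
result = os.popen("curl http://google.es").read()
print(result)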

3 Comments

Thanks, this is more intuitive than the other answers; handy for quick-and-dirty scripts :)
Thanks, but it also prints: % Total % Received % Xferd Average Speed Time Time Time Current... How can I remove this extra info from the log file?
The info you see is curl's progress meter, which curl writes to stderr, so it won't be captured in the result variable. So python t.py >> hola.txt will only leave the output of result in the file. But if you don't want to see the meter at all (for example because your log captures everything), add --silent to the curl command: result = os.popen("curl http://google.es --silent").read() Hope this helps.
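To illustrate the point in that comment: os.popen() pipes back only the child's standard output, so anything the child writes to standard error (where curl sends its progress meter) never lands in the returned string. A sketch in which a small Python child process stands in for curl; with curl itself you would write os.popen("curl --silent http://google.es").read(), since --silent suppresses the meter.

```python
import os
import sys

# A child process that writes one line to stdout and one to stderr,
# standing in for curl (body on stdout, progress meter on stderr).
child = (
    f'"{sys.executable}" -c '
    '"import sys; print(\'body\'); print(\'progress\', file=sys.stderr)"'
)

# os.popen() runs the command through the shell and pipes back stdout only;
# the stderr line still shows up on the terminal but is not part of `result`.
result = os.popen(child).read()
print(repr(result))
```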
27

Try this, using subprocess.Popen():

import subprocess
# text=True (Python 3.7+) makes communicate() return str instead of bytes
proc = subprocess.Popen(["curl", "--head", "www.google.com"], stdout=subprocess.PIPE, text=True)
(out, err) = proc.communicate()
print(out)

As stated in the documentation:

The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. This module intends to replace several other, older modules and functions, such as:

os.system
os.spawn*
os.popen*
popen2.*
commands.*

3 Comments

@user1333973: Because subprocess works and os.system() doesn't.
@user1333973 added link to the documentation
In order to also get err, we need to call Popen as: proc = subprocess.Popen(fullCommand, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
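Building on that comment, here is a sketch that pipes both streams and also checks the exit status. Assumptions: Python 3.7+ for text=True, and a small Python child process stands in for ["curl", "--head", "www.google.com"] so the snippet runs without network access.

```python
import subprocess
import sys

# Stand-in for curl: writes to both streams and exits with status 2.
cmd = [
    sys.executable,
    "-c",
    "import sys; print('out line'); print('err line', file=sys.stderr); sys.exit(2)",
]

# Pipe both stdout and stderr so communicate() returns both;
# text=True makes them str rather than bytes.
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
out, err = proc.communicate()  # waits for the process to exit and reads both pipes

print(proc.returncode)  # exit status, available once communicate() returns
print(out, end="")
print(err, end="")
```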
5
import os
cmd = 'curl https://randomuser.me/api/'
os.system(cmd)

Result

{"results":[{"gender":"male","name":{"title":"mr","first":"çetin","last":"nebioğlu"},"location":{"street":"5919 abanoz sk","city":"adana","state":"kayseri","postcode":53537},"email":"çetin.nebioğ[email protected]","login":{"username":"heavyleopard188","password":"forgot","salt":"91TJOXWX","md5":"2b1124732ed2716af7d87ff3b140d178","sha1":"cb13fddef0e2ce14fa08a1731b66f5a603e32abe","sha256":"cbc252db886cc20e13f1fe000af1762be9f05e4f6372c289f993b89f1013a68c"},"dob":"1977-05-10 18:26:56","registered":"2009-09-08 15:57:32","phone":"(518)-816-4122","cell":"(605)-165-1900","id":{"name":"","value":null},"picture":{"large":"https://randomuser.me/api/portraits/men/38.jpg","medium":"https://randomuser.me/api/portraits/med/men/38.jpg","thumbnail":"https://randomuser.me/api/portraits/thumb/men/38.jpg"},"nat":"TR"}],"info":{"seed":"0b38b702ef718e83","results":1,"page":1,"version":"1.1"}}
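Note that os.system() only prints this response; it does not hand it back to Python (its return value is the exit status). To actually work with JSON like the output above, capture stdout and parse it with the standard json module. A sketch assuming Python 3.7+; the helper fetch_json is hypothetical, and an offline Python child process stands in for curl.

```python
import json
import subprocess
import sys


def fetch_json(args):
    """Run a command whose stdout is JSON and return the parsed object."""
    # check=True raises CalledProcessError on a non-zero exit status
    # instead of trying to parse empty or partial output.
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


# Real use (needs curl and network access):
#   data = fetch_json(["curl", "--silent", "https://randomuser.me/api/"])
#   print(data["results"][0]["name"]["first"])
stand_in = [sys.executable, "-c",
            "import json; print(json.dumps({'results': [{'gender': 'male'}]}))"]
data = fetch_json(stand_in)
print(data["results"][0]["gender"])
```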

Comments

1

You could use an HTTP client library in Python instead of calling the curl command. In fact, there is a curl library (pycurl) that you can install (as long as you have a compiler on your OS).

Other choices are httplib2 (recommended), which is a fairly complete HTTP protocol client that also supports caching, plain httplib (http.client in Python 3), or a library named Requests.

If you really, really want to just run the curl command and capture its output, you can do this with Popen from the built-in subprocess module, documented here: http://docs.python.org/library/subprocess.html

Comments

1

Well, there is an easier-to-read but messier way to do it. Here it is:

import os
outfile = ''  # put your file path here
os.system("curl --head www.google.com >> {x}".format(x=outfile))  # Appends the command's output to the log file (and creates it if it doesn't exist).
readOut = open(outfile, "r")  # Opens the file in reading mode.
for line in readOut:
    print(line, end='')  # Prints each line in the file (lines already end in a newline)
readOut.close()  # Closes the file
os.remove(outfile)  # Optional: deletes the log file after use (portable replacement for "del").

This should work properly for your needs. :)
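The same write-to-a-file-then-read-it idea can be done without building a shell command string. A sketch assuming Python 3.7+; the helper name capture_via_file is hypothetical. It uses tempfile.mkstemp() for the log file and passes the open file as the child's stdout, so no shell redirection or del is involved. A Python child process stands in for curl in the demo line.

```python
import os
import subprocess
import sys
import tempfile


def capture_via_file(args):
    """Run a command with stdout redirected to a temporary file, then return the file's text."""
    fd, path = tempfile.mkstemp(suffix=".log")  # creates the file and returns (fd, path)
    try:
        # Hand the open file to the child as its stdout; no shell ">>" needed.
        with os.fdopen(fd, "w") as log:
            subprocess.run(args, stdout=log, check=True)
        with open(path) as log:
            return log.read()
    finally:
        os.remove(path)  # clean up the temporary file on every platform


# With curl installed: capture_via_file(["curl", "--head", "www.google.com"])
text = capture_via_file([sys.executable, "-c", "print('HTTP/1.1 200 OK')"])
print(text, end="")
```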

Comments

0

Try this:

import http.client  # this module was called httplib in Python 2

conn = http.client.HTTPConnection("www.python.org")
conn.request("GET", "/index.html")
r1 = conn.getresponse()
print(r1.status, r1.reason)
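Since the question used curl --head, the same idea with http.client (the Python 3 name of httplib) would send a HEAD request rather than GET. A sketch; head_request is a hypothetical helper name, and collapsing the headers into a dict keeps only the last value of repeated headers such as Set-Cookie.

```python
import http.client


def head_request(host, path="/", port=None):
    """Send a HEAD request (like curl --head) and return (status, reason, headers)."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        # getheaders() returns a list of (name, value) tuples; dict() keeps
        # only the last value for any header name that appears more than once.
        return resp.status, resp.reason, dict(resp.getheaders())
    finally:
        conn.close()


# Example (needs network access):
#   status, reason, headers = head_request("www.google.com")
#   print(status, reason, headers.get("Content-Type"))
```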

1 Comment

This does not really answer the question of how to capture output from curl. Often you need curl to send specific cookies and other parameters.
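That comment's point is one reason to keep calling curl. A sketch of assembling such a call as an argument list, so cookie values need no shell quoting; build_curl_args is a hypothetical helper, and the cookie name and value are placeholders. curl's --cookie option accepts a "NAME1=VALUE1; NAME2=VALUE2" string.

```python
def build_curl_args(url, cookies=None, head_only=False):
    """Build a curl argument list; cookies is a dict of name -> value (hypothetical helper)."""
    args = ["curl", "--silent"]
    if head_only:
        args.append("--head")
    if cookies:
        # curl's --cookie accepts "NAME1=VALUE1; NAME2=VALUE2"
        args += ["--cookie", "; ".join(f"{k}={v}" for k, v in cookies.items())]
    args.append(url)
    return args


args = build_curl_args("http://www.google.com", cookies={"PREF": "abc"}, head_only=True)
print(args)
# To run it and capture the output (needs curl installed):
#   import subprocess
#   out = subprocess.run(args, capture_output=True, text=True).stdout
```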
-1

Try this:

import subprocess as sp

cmd = "curl --head www.google.com"
p1 = sp.Popen(cmd, 
              stdin=sp.PIPE,
              stdout=sp.PIPE,
              stderr=sp.PIPE,
              text=True,
              shell=True)  
(output, err) = p1.communicate()
print('output: ', output)
print('err: ', err)
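The shell=True above is not needed for a fixed command like this. A sketch that splits the string with the standard shlex module and passes a list instead, which also avoids shell quoting problems if the URL ever comes from user input. Assumptions: POSIX-style splitting (shlex's default) and Python 3.7+ for text=True; a Python child process stands in for curl when actually running something.

```python
import shlex
import subprocess
import sys

cmd = "curl --head www.google.com"
# shlex.split() tokenizes the string with POSIX shell rules, giving the
# argument list Popen expects without needing shell=True.
args = shlex.split(cmd)
print(args)  # ['curl', '--head', 'www.google.com']

# Offline stand-in for running `args`: same Popen/communicate pattern, no shell involved.
p1 = subprocess.Popen(
    [sys.executable, "-c", "print('HTTP/1.1 200 OK')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
)
output, err = p1.communicate()
print("output:", output, end="")
print("err:", repr(err))
```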

1 Comment

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
