I am working on downloading a ZIP file from a URL, and I have run into a problem. The first step of my algorithm is to check the Content-Type and Content-Length of the given URL:

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, "https://www.dropbox.com/s/0hvgw7nvbdnh13d/ColaClassic.zip");
curl_setopt($ch, CURLOPT_HEADER, 1); //I
curl_setopt($ch, CURLOPT_NOBODY, 1); //without body
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); //L
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

curl_exec($ch);
$content_type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);

However, the value of the variable $content_type is text/html; charset=utf-8.

Then I checked Content-Type from command line like this:

curl -IL https://www.dropbox.com/s/0hvgw7nvbdnh13d/ColaClassic.zip

and I got the correct result (application/zip).

So what is the difference between these two requests, and how do I get the correct Content-Type in my PHP script?

Edit:

curl_setopt($ch, CURLOPT_URL, 'https://www.dropbox.com/s/0hvgw7nvbdnh13d/ColaClassic.zip');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD');
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_STDERR, $verbose);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);

Verbose output from php curl:

* Hostname was found in DNS cache
* Hostname in DNS cache was stale, zapped
*   Trying 162.125.69.1...
* Connected to www.dropbox.com (162.125.69.1) port 443 (#14)
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* SSL connection using ECDHE-RSA-AES128-GCM-SHA256
* Server certificate:
*    subject: businessCategory=Private Organization; 1.3.6.1.4.1.311.60.2.1.3=US; 1.3.6.1.4.1.311.60.2.1.2=Delaware; serialNumber=4348296; C=US; ST=California; L=San Francisco; O=Dropbox, Inc; CN=www.dropbox.com
*    start date: 2017-11-14 00:00:00 GMT
*    expire date: 2020-02-11 12:00:00 GMT
*    subjectAltName: www.dropbox.com matched
*    issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 Extended Validation Server CA
*    SSL certificate verify ok.
> HEAD /s/0hvgw7nvbdnh13d/ColaClassic.zip HTTP/1.1
Host: www.dropbox.com
Accept: */*

Verbose output from cmdline curl:

*   Trying 162.125.69.1...
* TCP_NODELAY set
* Connected to www.dropbox.com (162.125.69.1) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-CHACHA20-POLY1305
* ALPN, server accepted to use h2
* Server certificate:
*  subject: businessCategory=Private Organization; jurisdictionCountryName=US; jurisdictionStateOrProvinceName=Delaware; serialNumber=4348296; C=US; ST=California; L=San Francisco; O=Dropbox, Inc; CN=www.dropbox.com
*  start date: Nov 14 00:00:00 2017 GMT
*  expire date: Feb 11 12:00:00 2020 GMT
*  subjectAltName: host "www.dropbox.com" matched cert's "www.dropbox.com"
*  issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 Extended Validation Server CA
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7fd8c4007a00)
> HEAD /s/0hvgw7nvbdnh13d/ColaClassic.zip HTTP/2
> Host: www.dropbox.com
> User-Agent: curl/7.54.0
> Accept: */*
  • curl_setopt($ch, CURLOPT_HEADER, 1); //I - yeah, nope, wishful thinking. -I means make a HEAD request, CURLOPT_HEADER means include the response headers in the output. You want curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD'); to properly translate that -I. Commented Aug 30, 2019 at 11:37
  • @misorude even after I added CURLOPT_CUSTOMREQUEST parameter, I still get text/html as Content-Type Commented Aug 30, 2019 at 11:41
  • You can get a “translation” of your cURL command to PHP here, incarnate.github.io/curl-to-php If it still doesn’t work with that - then I’d start by sending a request using both methods to a script of my own, that simply logs all request headers, and then check for significant differences. Commented Aug 30, 2019 at 11:49
  • Yes, I already tried. I removed FOLLOWLOCATION (set it to false), and in PHP I get HTTP status code 200 while on the command line I get 301. How is this possible? It is the same link. Commented Aug 30, 2019 at 11:56
  • Well something about those two requests must be different somehow - hence my suggestion to start by logging what they actually look like. Commented Aug 30, 2019 at 12:01
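
The logging script suggested in the comments could look roughly like the sketch below. It is only an illustration: the function name extract_request_headers and the log file name request_headers.log are made up for this example, and it assumes PHP exposes the incoming request headers as HTTP_* entries in $_SERVER. Pointing both the PHP script and the command-line curl at such a script should make a missing or different header easy to spot.

```php
<?php
// Hypothetical helper: rebuild request header names from the HTTP_* keys
// that PHP places in $_SERVER (e.g. HTTP_USER_AGENT becomes User-Agent).
function extract_request_headers(array $server): array
{
    $headers = [];
    foreach ($server as $key => $value) {
        if (strpos((string) $key, 'HTTP_') === 0) {
            $name = strtolower(substr($key, 5));   // "user_agent"
            $name = ucwords($name, '_');           // "User_Agent"
            $name = str_replace('_', '-', $name);  // "User-Agent"
            $headers[$name] = $value;
        }
    }
    ksort($headers); // stable ordering makes two log lines easy to compare
    return $headers;
}

// Only log when the script is actually handling an HTTP request.
// The log file name is an assumption for this sketch.
if (isset($_SERVER['REQUEST_METHOD'])) {
    $line = date('c') . ' ' . $_SERVER['REQUEST_METHOD'] . ' '
          . json_encode(extract_request_headers($_SERVER)) . "\n";
    file_put_contents(__DIR__ . '/request_headers.log', $line, FILE_APPEND);
}
```

Comparing the two logged lines side by side is the point of the exercise; in the verbose output above, for instance, the command-line request carries a User-Agent line and the PHP request does not.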

1 Answer

It seems Dropbox issues a different response depending on the User-Agent header, or rather the lack of one. Your command-line request sends something like curl/7.47.0 (or whatever your version is), while the PHP script sends no User-Agent header at all. Adding the user agent to your PHP request gets Dropbox to respond with HTTP/1.1 301 Moved Permanently, and your script then follows the Location header as expected:

$ch = curl_init();
// emulates user agent from command line.
$user_agent = 'curl/' . curl_version()['version'];
curl_setopt($ch, CURLOPT_URL, "https://www.dropbox.com/s/0hvgw7nvbdnh13d/ColaClassic.zip");
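curl_setopt($ch, CURLOPT_HEADER, 1); // include response headers in the output
curl_setopt($ch, CURLOPT_NOBODY, 1); // no body: sends a HEAD request (like -I)
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects (like -L)
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // disables certificate verification (insecure; kept from the question)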
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);

curl_exec($ch);
$content_type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
echo $content_type;

UPDATE: Oddly, I tried a few other things, such as emulating various browser user-agent strings, and Dropbox seems to issue the redirect only when presented with a curl/X.X.X user agent. ¯\_(ツ)_/¯
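
To see which status code and Content-Type each hop returns (the 200 versus 301 difference discussed in the comments), one option is to capture the header output and split it per response. The sketch below is an assumption-laden illustration, not part of the original answer: parse_header_blocks is a made-up name, and it assumes the request was made with CURLOPT_RETURNTRANSFER in addition to CURLOPT_HEADER, CURLOPT_NOBODY and CURLOPT_FOLLOWLOCATION, so that curl_exec() returns the header blocks of all responses as one string.

```php
<?php
// Hypothetical helper: split the raw header dump returned by curl_exec()
// into one entry per response, so every redirect hop can be inspected.
function parse_header_blocks(string $raw): array
{
    $hops = [];
    // Header blocks of successive responses are separated by a blank line.
    $blocks = preg_split('/\r?\n\r?\n/', trim($raw));
    foreach ($blocks as $block) {
        $lines = preg_split('/\r?\n/', trim($block));
        $statusLine = array_shift($lines);
        // Accept status lines such as "HTTP/1.1 301 Moved Permanently" or "HTTP/2 200".
        if (!preg_match('#^HTTP/\S+\s+(\d{3})#', (string) $statusLine, $m)) {
            continue; // not a response header block
        }
        $headers = [];
        foreach ($lines as $line) {
            if (strpos($line, ':') === false) {
                continue; // skip malformed lines
            }
            [$name, $value] = explode(':', $line, 2);
            // Header names are case-insensitive, so normalise to lower case.
            $headers[strtolower(trim($name))] = trim($value);
        }
        $hops[] = ['status' => (int) $m[1], 'headers' => $headers];
    }
    return $hops;
}
```

Passing the string returned by curl_exec($ch) to this function gives one array entry per response; the last entry's content-type should correspond to what curl_getinfo($ch, CURLINFO_CONTENT_TYPE) reports, and its content-length covers the Content-Length check mentioned in the question.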

1 Comment

Thank you for this! Yes, I was also experimenting with the user agent parameter after you gave me this answer. It also works with just curl/ as the user agent. But if I use my browser's user agent, it doesn't work. Maybe that is because when I visit that page in a browser, the page opens and shows me the contents of the ZIP file.
