I was trying to build a website status checker. I found that a Go HTTP GET request never completes and hangs forever for some URLs, such as https://www.hetzner.com. The same URL works with curl.

Golang

No error is returned here. The program just hangs on http.Get:

package main

import (
    "fmt"
    "io"
    "net/http"
)

func main() {
    resp, err := http.Get("https://www.hetzner.com")
    if err != nil {
        fmt.Println("Error while retrieving site", err)
        return // resp is nil on error, so stop before touching resp.Body
    }
    defer resp.Body.Close()
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        fmt.Println("Error while reading response body", err)
        return
    }
    fmt.Println("RESPONSE", string(body))
}

CURL

I get a response when running the following command:

curl https://www.hetzner.com

What could be the reason, and how do I resolve this with Go's HTTP client?

10
  • 1
    It may be a rate limiter based on the User-Agent. Try setting the same User-Agent for curl and Go. Commented Aug 30, 2022 at 8:18
  • 1
    You won't get a timeout with the default http client; see pkg.go.dev/net/http#Client. Commented Aug 30, 2022 at 9:46
  • 2
    @DipeshKC you will get a response if a User-Agent is specified. Commented Aug 30, 2022 at 10:21
  • 1
    Go's HTTP client sets a default User-Agent, "Go-http-client/1.1". My guess is that some sites block requests from this User-Agent. Commented Aug 30, 2022 at 10:59
  • 1
    I think some sites close the TCP connection when the User-Agent starts with "Go-http-client", having seen so many requests from that User-Agent. If you use "Go-http-clien" (dropping the final "t") or any other random User-Agent, it works. Commented Aug 30, 2022 at 11:35

1 Answer

Your specific case can be fixed by setting the HTTP User-Agent header:

package main

import (
    "fmt"
    "io"
    "net/http"
)

func main() {
    client := &http.Client{}

    req, err := http.NewRequest("GET", "https://www.hetzner.com", nil)
    if err != nil {
        fmt.Println("Error while creating request", err)
        return
    }

    // Replace Go's default User-Agent with a custom one.
    req.Header.Set("User-Agent", "Golang_Spider_Bot/3.0")

    resp, err := client.Do(req)
    if err != nil {
        fmt.Println("Error while retrieving site", err)
        return // resp is nil on error, so stop before touching resp.Body
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        fmt.Println("Error while reading response body", err)
        return
    }
    fmt.Println("RESPONSE", string(body))
}

Note: many other hosts may reject requests from your server because of security rules on their side. Common triggers include:

  • An empty or bot-like User-Agent HTTP header.
  • The location of your IP address. For example, online shops in the USA don't need to handle requests from Russia.
  • The Autonomous System or CIDR range of your provider. Some ASNs are completely blackholed because of the large volume of malicious activity originating from their networks.

Note 2: Many modern websites have DDoS protection or CDN systems in front of them. If Cloudflare protects your target website, your HTTP request may be blocked even though the status code is 200. To handle this, you need to build something that can render JavaScript-based websites and add some scripts to solve a captcha.

Also, if you check a large number of websites in a short time, your DNS servers may block you, as they have built-in rate limits. In this case, you may want to take a look at massdns or similar solutions.
