1

I need help. How do I get the domain from a string?

For example: "Hi im Natsume, check out my site http://www.mysite.com/"

How do I get just mysite.com?

Output example:

http://www.mysite.com/ (if http is entered)

www.mysite.com (if http is not entered)

mysite.com (if neither http nor www is entered)

2
  • 1
    See this question Commented Jun 27, 2012 at 12:58
  • 4
    What have you tried? Have you thought about searching the string for certain defining characteristics? Commented Jun 27, 2012 at 12:59

7 Answers

1
>>> import re
>>> myString = "Hi im Natsume, check out my site http://www.mysite.com/"
>>> a = re.search(r"(?P<url>https?://[^\s]+)", myString) or re.search(r"(?P<url>www[^\s]+)", myString)
>>> a.group("url")
'http://www.mysite.com/'
>>> myString = "Hi im Natsume, check out my site www.mysite.com/"
>>> a = re.search(r"(?P<url>https?://[^\s]+)", myString) or re.search(r"(?P<url>www[^\s]+)", myString)
>>> a.group("url")
'www.mysite.com/'

6 Comments

OK, now how do I get the URL if the user didn't enter http? I mean it should accept the input whether or not http is entered, and whether or not www is entered.
@Nastume use re.search(r"(?P<url>www[^\s]+)", myString).group("url")
Hmm, it still fails if the user entered neither http nor www :(
@Nastume can you give me an example input and output for your case?
So far I can only do this: re.search(r"(?P<url>(https?|www)[^\s]+)", a).group("url"), but it still fails if neither http nor www is entered.
1

Well ... You need some way to define what you consider to be something that has a "domain". One approach might be to look up a regular expression for URL-matching, and apply that to the string. If that succeeds, you at least know that the string holds a URL, and can continue to interpret the URL in order to look for a host name, from which you can then extract the domain (possibly).
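The steps described above could be sketched roughly as follows, using only the standard library. The helper name find_host and the URL pattern are assumptions made for illustration. Stripping a leading "www." only approximates the domain and does not handle multi-part suffixes such as .co.uk.

```python
import re
from urllib.parse import urlparse

# Assumed pattern: a URL is anything that starts with http:// or https://
# and runs up to the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")

def find_host(text):
    """Return the host of the first URL in text, without a leading 'www.'.

    Returns None when the text contains no http(s) URL.
    """
    match = URL_PATTERN.search(text)
    if match is None:
        return None
    host = urlparse(match.group(0)).hostname  # e.g. 'www.mysite.com'
    if host is None:
        return None
    if host.startswith("www."):
        host = host[len("www."):]
    return host

print(find_host("Hi im Natsume, check out my site http://www.mysite.com/"))
```

This only covers the case where the scheme is present. Input without http:// needs a looser pattern, at the cost of more false positives.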

Comments

1
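s = "Hi im Natsume, check out my site http://www.mysite.com/"
# Locate the scheme; this assumes the URL starts with http://www. or https://www.
start = s.find("http://") if s.find("http://") != -1 else s.find("https://") + 1
# Skip the 11 characters of "http://www." and cut at the next space, or at the end of the string
end = s.find(" ", start + 11)
t = s[start + 11:end if end != -1 else len(s)].rstrip("/")
print(t)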

output: mysite.com

Comments

1

If you want to use a regular expression, one way could be:

>>> import re
>>> s = "Hi im Natsume, check out my site http://www.mysite.com/"
>>> re.findall(r'http://www\.([a-zA-Z0-9._-]*)/', s)
['mysite.com']

This assumes the URL starts with http://www. and ends with '/'.

3 Comments

Ah, I quite like your code. But how do I get the domain if the user didn't enter http:// or www?
In that case, you can simply do:
>>> s = "Hi im Natsume, check out my site mysite.com"
>>> [t for t in s.split() if '.com' in t]
['mysite.com']
The regex I modified: raw = re.findall(r'([a-zA-Z0-9\.]*)([a-zA-Z0-9\/]*)', url). I'm trying to make the regex find the domain whether or not http:// or www is entered, and whether the URL is at the beginning, end, or middle of the string.
1

If all the sites had the same format, you could use a regexp like this (which works in this specific case, capturing only the name between "www." and ".com"):

re.findall(r'http://www\.(\w+)\.com', url)

However, you need a more complex regexp that can parse any URL and extract the domain name.
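As a rough sketch of what a more general pattern might look like, the example below accepts an optional scheme and an optional "www." and then captures the rest of the host name. The pattern and the name find_domains are assumptions, not something taken from the answers here. The pattern is deliberately loose, so it will also match things like file names (report.txt), and it returns the whole host (blog.example.co.uk), not the registrable domain.

```python
import re

# Assumed pattern: optional scheme, optional "www.", then one or more
# dot-separated labels followed by an alphabetic TLD of at least 2 letters.
DOMAIN_PATTERN = re.compile(
    r"(?:https?://)?(?:www\.)?"
    r"((?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z]{2,})"
)

def find_domains(text):
    """Return every host-like token in text, minus any scheme and leading 'www.'."""
    return DOMAIN_PATTERN.findall(text)

for sample in (
    "Hi im Natsume, check out my site http://www.mysite.com/",
    "Hi im Natsume, check out my site www.mysite.com/",
    "Hi im Natsume, check out my site mysite.com",
):
    print(find_domains(sample))
```

Because labels may contain hyphens, a name such as my-web-site.com is captured whole.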

1 Comment

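If the domain has a structure like my-web-site.com, (\w+) cannot match the hyphens, so the pattern no longer captures the full name (on its own, (\w+)\.com finds just "site").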
0

A good way is to use a regex to extract the URL, then use tldextract (a third-party package) to get a valid domain name from the URL.

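import re
import tldextract  # third-party: pip install tldextract

text = "Hi im Natsume, check out my site http://www.example.com/"
# Find every http:// or https:// URL in the text
urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', text)
found_url = urls[0]  # raises IndexError if the text contains no URL
# Split the host into subdomain, domain, and public suffix
info = tldextract.extract(found_url)
domain_name = info.domain
suffix_name = info.suffix
final_domain_name = domain_name + "." + suffix_name
print(final_domain_name)  # example.com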

1 Comment

@user_3pij check out the mods I've made. URL is in capitals and not highlighted because it's not a method or code in this case. Once you've seen them, flag this comment for removal.
-1

How about this?

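url = 'https://www.google.com/'
# Everything after '//www.'; raises IndexError if the URL has no '//www.'
var = url.split('//www.')[1]
# Keep the part before the first '/'; raises ValueError if there is no '/'
domain = var[0:var.index('/')]
print(domain)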

1 Comment

Read the question carefully. There may be different scenarios, including one where www is not in the input string at all.
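For illustration, the split-based idea can be made tolerant of those scenarios. This is a minimal sketch that assumes the input is a single URL-like token and not a whole sentence; the function name domain_from_url is hypothetical.

```python
def domain_from_url(url):
    """Strip an optional scheme, an optional 'www.', and any path from url."""
    # Drop everything up to and including '//' when a scheme is present.
    rest = url.split("//", 1)[-1]
    # Drop a leading 'www.' if there is one.
    if rest.startswith("www."):
        rest = rest[len("www."):]
    # Keep only the part before the first '/', discarding the path.
    return rest.split("/", 1)[0]

for candidate in ("https://www.google.com/", "www.google.com", "google.com/search"):
    print(domain_from_url(candidate))
```

All three candidates print google.com. Finding the URL inside free text is a separate step that this sketch does not attempt.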
