I am trying to use Python 3 to return the BibTeX citation generated by http://www.doi2bib.org/. The URLs are predictable, so the script can work out the URL without having to interact with the web page. I have tried using selenium, bs4, etc., but can't get the text inside the box.

import urllib.request
from bs4 import BeautifulSoup

url = "http://www.doi2bib.org/#/doi/10.1007/s00425-007-0544-9"
# Name the parser explicitly so bs4 does not have to guess one.
text = BeautifulSoup(urllib.request.urlopen(url).read(), "html.parser")
print(text)

Can anyone suggest a way of returning the BibTeX citation as a string (or whatever) in Python?

1 Answer

You don't need BeautifulSoup here. The page sends an additional XHR request to the server to fill in the BibTeX citation. You can simulate that request, for example, with requests:

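import requests

bibtex_id = '10.1007/s00425-007-0544-9'

url = "http://www.doi2bib.org/#/doi/{id}".format(id=bibtex_id)
xhr_url = 'http://www.doi2bib.org/doi2bib'

with requests.Session() as session:
    # Load the page first, as a browser would, so the session keeps any cookies the site sets.
    session.get(url)

    # Replay the XHR request that fills in the citation.
    response = session.get(xhr_url, params={'id': bibtex_id})
    print(response.text)  # .text is the decoded str; .content would be raw bytes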

Prints:

@article{Burgert_2007,
    doi = {10.1007/s00425-007-0544-9},
    url = {http://dx.doi.org/10.1007/s00425-007-0544-9},
    year = 2007,
    month = {jun},
    publisher = {Springer Science $\mathplus$ Business Media},
    volume = {226},
    number = {4},
    pages = {981--987},
    author = {Ingo Burgert and Michaela Eder and Notburga Gierlinger and Peter Fratzl},
    title = {Tensile and compressive stresses in tracheids are induced by swelling based on geometrical constraints of the wood cell},
    journal = {Planta}
}
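If you want the citation back as a string from a page URL rather than printed, the same request can be wrapped in a small helper. This is only a sketch using the standard library instead of requests, under a few assumptions: that the endpoint still behaves as shown above, that it answers without the page being loaded first, and that the response is UTF-8. The names doi_from_page_url, build_xhr_url, fetch_bibtex and the opener parameter are made up for illustration.

```python
import urllib.parse
import urllib.request

# Endpoint taken from the answer above; it may have changed since.
XHR_URL = "http://www.doi2bib.org/doi2bib"


def doi_from_page_url(page_url):
    """Extract the DOI from a page URL such as http://www.doi2bib.org/#/doi/<DOI>."""
    fragment = urllib.parse.urlsplit(page_url).fragment  # e.g. "/doi/10.1007/..."
    prefix = "/doi/"
    if not fragment.startswith(prefix):
        raise ValueError("no DOI found in URL fragment: %r" % page_url)
    return fragment[len(prefix):]


def build_xhr_url(doi):
    """Build the XHR URL, passing the DOI as the 'id' query parameter."""
    return XHR_URL + "?" + urllib.parse.urlencode({"id": doi})


def fetch_bibtex(doi, opener=urllib.request.urlopen):
    """Return the BibTeX citation for a DOI as a str.

    `opener` is a hypothetical injection point so the function can be
    tried out without network access; by default it performs a real request.
    """
    with opener(build_xhr_url(doi)) as response:
        # Assumption: the server replies with UTF-8 encoded text.
        return response.read().decode("utf-8")
```

Usage would then look like fetch_bibtex(doi_from_page_url(url)), where url is the page URL from the question.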

You can also solve it with Selenium. The key trick here is to use an explicit wait so that the script pauses until the citation becomes visible:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

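driver = webdriver.Firefox()
try:
    driver.get('http://www.doi2bib.org/#/doi/10.1007/s00425-007-0544-9')

    # Wait up to 10 seconds for the <pre> holding the citation to become visible.
    locator = (By.XPATH, '//pre[@ng-show="bib"]')
    element = WebDriverWait(driver, 10).until(EC.visibility_of_element_located(locator))
    print(element.text)
finally:
    driver.quit()  # shut down the browser even if the wait times out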

Prints the same as the above solution.
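As for why the urllib attempt in the question comes back without the citation: everything after the # in the URL is a fragment, which HTTP clients keep to themselves and do not send to the server. The server therefore only sees a request for the front page, and the citation is presumably filled in afterwards by JavaScript in the browser (the ng-show attribute in the XPath above hints at that). A small standard-library sketch of the split:

```python
from urllib.parse import urldefrag

page_url = "http://www.doi2bib.org/#/doi/10.1007/s00425-007-0544-9"

# urldefrag separates the part that is requested from the client-side fragment.
requested_url, fragment = urldefrag(page_url)

print(requested_url)  # http://www.doi2bib.org/
print(fragment)       # /doi/10.1007/s00425-007-0544-9
```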

2 Comments

Thanks for that. Would you mind telling me how you could see that the additional request was sent to doi2bib.org/doi2bib? Pretty new to this.
@Nick Sure: open the browser developer tools and go to the Network tab. Then load the website and watch the requests sent to the server while the page loads. Among others, you will see the one I mentioned. Hope that helps.