
I am using Python to scrape a URL, as in the code below:

import requests
from bs4 import BeautifulSoup
import json

n_index = 10
base_link = 'http://xxx.xxx./getinfo?range=10&district_id=1&index='
for i in range(1, n_index + 1):
    link = base_link + str(i)
    r = requests.get(link)
    pid = r.json()
    print(pid)

It returns ten results, like those below:

{'product_info': [{'pid': '1', 'product_type': '2'}]}
{'product_info': [{'pid': '2', 'product_type': '2'}]}
{'product_info': [{'pid': '3', 'product_type': '2'}]}
{'product_info': [{'pid': '4', 'product_type': '2'}]}
{'product_info': [{'pid': '5', 'product_type': '2'}]}
{'product_info': [{'pid': '6', 'product_type': '2'}]}
{'product_info': [{'pid': '7', 'product_type': '2'}]}
{'product_info': [{'pid': '8', 'product_type': '2'}]}
{'product_info': [{'pid': '9', 'product_type': '2'}]}
{'product_info': [{'pid': '10', 'product_type': '2'}]}

I then want to save all ten resulting lines into a JSON file, as in the code below:

with open('sylist.json', 'w') as outfile:
    json.dump(r.json(), outfile, indent=4)

but only one result is saved into the local JSON file. Can anyone help me resolve this? Thanks a lot.

  • Use append mode instead of write: with open('sylist.json', 'a') as outfile: – Commented Jan 9, 2018 at 5:52

2 Answers


The typical approach is to write the results line by line inside the loop, without opening and closing the file on each iteration. Note that json.dump writes directly to the file and returns None, so use json.dumps to get a string you can pass to write:

with open('sylist.json', 'a+') as outfile:
    for i in range(1, n_index + 1):
        link = base_link + str(i)
        r = requests.get(link)
        outfile.write("{}\n".format(json.dumps(r.json())))
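Writing one JSON object per line ("JSON Lines") keeps each response independently parseable: you read the file back with one json.loads call per line rather than a single json.load. A minimal sketch of the round trip, using stand-in records in place of the live r.json() responses:

```python
import json

# Stand-in for the ten r.json() payloads from the question.
records = [{'product_info': [{'pid': str(i), 'product_type': '2'}]}
           for i in range(1, 11)]

# Write each record as one compact JSON object per line.
with open('sylist.json', 'w') as outfile:
    for rec in records:
        outfile.write(json.dumps(rec) + '\n')

# Read the file back: parse each line independently.
with open('sylist.json') as infile:
    loaded = [json.loads(line) for line in infile]
```

Note that a file written this way is not itself a single valid JSON document; it is a sequence of JSON documents, one per line.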


Let me extend Frank's answer a bit. You send the request inside the for loop, which means the value of pid is overwritten on every iteration. As a result, when you dump its contents to the output file after the loop, pid holds only the response from the very last iteration/request. I would suggest one of the following to address the issue:

  1. Include writing component inside the for loop (or vice-versa, as suggested in the answer by Frank AK).
  2. Instead of overwriting the content of pid each time, you may append it directly inside the for loop as follows:

    my_list = []
    for i in range(1, n_index + 1):
        link = base_link + str(i)
        r = requests.get(link)
        pid = r.json()
        my_list.append(pid)

    with open('sylist.json', 'w') as outfile:
        json.dump(my_list, outfile, indent=4)
    
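With this list-then-dump approach, the file contains a single JSON array, so one json.load call recovers every record at once. A sketch of the round trip, again with stand-in records in place of the live responses:

```python
import json

# Stand-in for the pid dicts collected in my_list during the loop.
my_list = [{'product_info': [{'pid': str(i), 'product_type': '2'}]}
           for i in range(1, 11)]

# Dump the whole list as one JSON array.
with open('sylist.json', 'w') as outfile:
    json.dump(my_list, outfile, indent=4)

# Read it back: the file is a single valid JSON document.
with open('sylist.json') as infile:
    data = json.load(infile)
```

The trade-off versus the line-by-line variant is that everything must be held in memory before the single dump, but the output is a valid JSON file that any JSON parser can read whole.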
