I'm trying to multiprocess an action inside a for x in y loop. Basically, the script makes a request to a site and loads a JSON file containing a list of URLs. Once that list is fetched, another function is called to parse each URL individually. I've been trying to multiprocess this task with multiprocessing.Process() to speed things up, since there are lots of URLs to parse. However, my approach doesn't speed anything up; it runs at the same speed as with no multiprocessing at all. It seems to get blocked when calling proc.join().
This is the code I've been working on:
import json
import requests
import multiprocessing
def ExtractData(id):
    # Fetch the line-delimited JSON index for this id and print each URL in it
    print("Processing ", id)
    result = requests.get('http://example-index.com/' + id)
    result = result.text.split('\n')[:-1]
    for entry in result:
        data = json.loads(entry)['url']
        print("data is:", data)

def ParseJsonAndCall():
    url = "https://example-site.com/info.json"
    data = json.loads(requests.get(url).text)
    t = []
    for results in data:
        print("Processing ", results['url'])
        # Spawn one process per id in the list
        p = multiprocessing.Process(target=ExtractData, args=(results['id'],))
        t.append(p)
        p.start()
    # Wait for every spawned process to finish
    for proc in t:
        proc.join()

if __name__ == '__main__':
    ParseJsonAndCall()
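For what it's worth, here's a rough Pool-based sketch I was considering instead of spawning one Process per URL (the example-index.com / example-site.com endpoints and the JSON fields are the same placeholders as above, and the pool size of 8 is just a guess), in case that's the better direction:

import json
import multiprocessing

import requests

def ExtractData(entry_id):
    # Same per-id work as above: fetch the line-delimited index and print each URL
    result = requests.get('http://example-index.com/' + entry_id)
    for line in result.text.split('\n')[:-1]:
        print("data is:", json.loads(line)['url'])

def ParseJsonAndCall():
    url = "https://example-site.com/info.json"
    data = json.loads(requests.get(url).text)
    ids = [results['id'] for results in data]
    # A fixed-size pool reuses worker processes instead of starting one per URL
    with multiprocessing.Pool(processes=8) as pool:
        pool.map(ExtractData, ids)

if __name__ == '__main__':
    ParseJsonAndCall()

pool.map() still blocks until all the work is done, so I'm not sure whether this actually fixes my problem or just hides it.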
Any help would be greatly appreciated!