How to crawl/scrape millions of URLs within a couple of hours?
I am fetching product price data from an Amazon-like site. The site doesn't provide an API or data dump, and it doesn't ban IPs for making too many requests; I have tried contacting them about it. Prices change regularly, so I need to keep track of them.
What is a fast, parallel/async way to do this?
Right now I am using PHP curl_multi with 2000 concurrent handles, and the server is powerful: 128 GB RAM, 24-core CPU, 2 TB HDD, ulimit set to unlimited. But crawling is still slow: I am doing a test run with 100k URLs and it has been running for over an hour. For each URL I scrape the page and insert the result into a DB.
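For context, here is a minimal sketch of the kind of bounded-concurrency async fetcher I have in mind if I switch to Python. The fetch and parse functions are placeholders (a real version would use an HTTP client and an actual parser), and the URLs are made up; the point is the pattern: a semaphore caps in-flight requests, and all DB inserts happen in one batch instead of one INSERT per page.

```python
import asyncio
import sqlite3

CONCURRENCY = 2000  # max in-flight requests at any moment

async def fetch(url: str) -> str:
    # Placeholder for a real HTTP request (e.g. via an async HTTP client);
    # simulated with a short sleep so the sketch runs standalone.
    await asyncio.sleep(0.001)
    return f"<html>price page for {url}</html>"

def parse_price(html: str) -> str:
    # Placeholder parser: a real one would extract the price from the page.
    return html

async def worker(sem: asyncio.Semaphore, url: str, results: list) -> None:
    # The semaphore bounds concurrency; without it, millions of URLs
    # would all be dispatched at once.
    async with sem:
        html = await fetch(url)
        results.append((url, parse_price(html)))

async def crawl(urls: list[str]) -> list[tuple[str, str]]:
    sem = asyncio.Semaphore(CONCURRENCY)
    results: list[tuple[str, str]] = []
    await asyncio.gather(*(worker(sem, u, results) for u in urls))
    return results

def main() -> int:
    # Hypothetical URL list standing in for the real product pages.
    urls = [f"https://example.com/product/{i}" for i in range(10_000)]
    results = asyncio.run(crawl(urls))
    # Batch the inserts: one executemany per run, not one INSERT per page.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE prices (url TEXT, price TEXT)")
    db.executemany("INSERT INTO prices VALUES (?, ?)", results)
    db.commit()
    return db.execute("SELECT COUNT(*) FROM prices").fetchone()[0]

if __name__ == "__main__":
    print(main())
```

The batched insert matters as much as the fetching: with per-page INSERTs, the DB round-trips can dominate the run time.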
How can I speed this up? Should I write a C++ or Python script just for this task? Are there limitations in PHP itself?
Thanks for your time.