02-10-2019, 10:40 AM
RycEric
Confirmed User
 
Join Date: Apr 2009
Posts: 1,313
Quote:
Originally Posted by freecartoonporn
I am fetching product price data from an Amazon-like site.
The site doesn't provide an API or a data dump,
and it doesn't ban IPs for too many requests either.
I have tried contacting them about it.

Prices change regularly, and I need to keep track of them.

What is fast, parallel, and async?

Right now I am using PHP curl multi with 2000 threads, and the server is very powerful:

128 GB RAM, 24-core CPU, 2 TB HDD, ulimit -a unlimited

But crawling is still slow. I am doing a test run with 100k URLs and it has been running for over an hour.

I am trying to scrape each page and insert it into the DB.

How can I speed this up?

Should I write a C++/Python script just for this task?

Are there limitations in PHP?

Thanks for your time.
Do it from a desktop instead. C# and Python are both good choices.
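
For scale: 100k URLs in a bit over an hour works out to under 28 pages per second (100,000 / 3,600 ≈ 27.8). If you try Python, here is a rough sketch of the "parallel, async" pattern using only the standard library. It is not a drop-in crawler. The names `crawl` and `fake_fetch` and the limit of 200 are my own placeholders, and `fake_fetch` only simulates network latency so the example runs offline; you would swap in a real coroutine from an async HTTP client of your choice.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Union

# Hypothetical type alias: any coroutine function that takes a URL and returns the page body.
Fetch = Callable[[str], Awaitable[str]]


async def crawl(urls: Iterable[str], fetch: Fetch, limit: int = 200) -> Dict[str, Union[str, Exception]]:
    """Fetch all URLs with at most `limit` requests in flight at once.

    Returns a dict mapping each URL to its page body, or to the exception
    that the fetch raised, so one bad URL does not abort the whole run.
    """
    sem = asyncio.Semaphore(limit)  # caps the number of concurrent fetches
    results: Dict[str, Union[str, Exception]] = {}

    async def worker(url: str) -> None:
        async with sem:
            try:
                results[url] = await fetch(url)
            except Exception as exc:  # record the failure and keep going
                results[url] = exc

    await asyncio.gather(*(worker(u) for u in urls))
    return results


async def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP request (assumption: replace with a real client)."""
    await asyncio.sleep(0.01)  # simulated network latency
    return f"<html>{url}</html>"


if __name__ == "__main__":
    # Placeholder URLs, not the site from the original post.
    urls = [f"https://example.com/item/{i}" for i in range(1000)]
    pages = asyncio.run(crawl(urls, fake_fetch, limit=200))
    print(len(pages), "pages fetched")
```

The semaphore is the main knob: raising `limit` only helps until the network, the remote site, or the DB inserts become the bottleneck, so it is worth timing the fetch and the insert separately before rewriting everything.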