03-13-2014, 05:12 AM
Barry-xlovecam
It's 42
 
Join Date: Jun 2010
Location: Global
Posts: 18,083
Quote:
Originally Posted by adultmobile
Well I just loaded ... maybe doing a porn search engine but there's bing videos for this
Well, have fun, but Amazon Elastic MapReduce might be the ticket ...

http://aws.amazon.com/datasets/41740

Quote:
Common Crawl provides the glue code required to launch Hadoop jobs on Amazon Elastic MapReduce that can run against the crawl corpus residing here in the Amazon Public Data Sets. By utilizing Amazon Elastic MapReduce to access the S3 resident data, end users can bypass costly network transfer costs.

To learn more about Amazon Elastic MapReduce please see the product detail page.

Common Crawl's Hadoop classes and other code can be found in its GitHub repository.
You might want to price out the Hadoop time -- this is "big data", and Hadoop might be a good solution.
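
For a rough idea of what pricing the Hadoop time looks like, here is a minimal sketch. It assumes each instance in the cluster is billed per hour at the EC2 rate plus an Elastic MapReduce surcharge. The function name and every number in it are placeholders I made up for illustration, not real AWS prices, so plug in the current rates from the pricing pages before trusting the result.

```python
def estimate_emr_cost(instance_count, hours, ec2_hourly_rate, emr_hourly_surcharge):
    """Rough cluster cost: each instance pays the EC2 rate plus the EMR surcharge per hour.

    All rates are supplied by the caller; nothing here is a real AWS price.
    """
    if instance_count < 0 or hours < 0:
        raise ValueError("instance_count and hours must be non-negative")
    return instance_count * hours * (ec2_hourly_rate + emr_hourly_surcharge)


if __name__ == "__main__":
    # Placeholder values only: 10 instances for 5 hours at made-up hourly rates.
    cost = estimate_emr_cost(
        instance_count=10,
        hours=5,
        ec2_hourly_rate=0.25,
        emr_hourly_surcharge=0.05,
    )
    print(f"Estimated cluster cost: ${cost:.2f}")
```

This ignores storage, request charges, and any transfer out of AWS, so treat it as a floor rather than a quote.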

Looking for needles in a haystack is more the way I think, but since maybe 20%+ of web content is adult, this "shotgun" method may work out for your ends.