03-13-2014, 05:12 AM
It's 42
Industry Role:
Join Date: Jun 2010
Location: Global
Posts: 18,083
|
Quote:
Originally Posted by adultmobile
Well I just loaded ... maybe doing a porn search engine but there's bing videos for this 
|
Well, have fun, but Amazon Elastic MapReduce might be the ticket ...
http://aws.amazon.com/datasets/41740
Quote:
Common Crawl provides the glue code required to launch Hadoop jobs on Amazon Elastic MapReduce that can run against the crawl corpus residing here in the Amazon Public Data Sets. By utilizing Amazon Elastic MapReduce to access the S3 resident data, end users can bypass costly network transfer costs.
To learn more about Amazon Elastic MapReduce please see the product detail page.
Common Crawl's Hadoop classes and other code can be found in its GitHub repository.
|
You might want to price the Hadoop time first: that crawl is "big data", and Hadoop might be a good solution for it.
Looking for needles in a haystack is more the way I think, but since maybe 20%+ of web content is adult, this "shotgun" method may work out for your ends.
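On pricing the Hadoop time, a rough back-of-envelope sketch might look like the following. Every figure in it (corpus size, per-node throughput, node count, hourly rate) is a made-up placeholder, not a real AWS price or a measured number, and the function name is just for illustration. It also assumes each node is billed for whole hours, rounded up, which may not match the actual billing terms.

```python
import math


def estimate_cluster_cost(corpus_tb, tb_per_node_hour, nodes, usd_per_node_hour):
    """Rough cost of one full pass over a corpus on a rented cluster.

    All inputs are assumptions supplied by the caller; nothing here
    queries real pricing. Returns (wall_clock_hours, total_usd).
    """
    # Total work, expressed as hours of a single node.
    node_hours = corpus_tb / tb_per_node_hour
    # Spread evenly across the cluster (assumes perfect scaling).
    wall_clock_hours = node_hours / nodes
    # Assumption: every node is billed per started hour.
    billed_hours = math.ceil(wall_clock_hours)
    total_usd = billed_hours * nodes * usd_per_node_hour
    return wall_clock_hours, total_usd


if __name__ == "__main__":
    # Placeholder inputs: 100 TB corpus, 0.5 TB per node-hour,
    # 20 nodes, 0.25 USD per node-hour.
    hours, usd = estimate_cluster_cost(100, 0.5, 20, 0.25)
    print(f"about {hours:.1f} hours, about {usd:.2f} USD")
```

Swap in the current instance price and a throughput figure from a small test run before trusting the result. The perfect-scaling assumption will make the estimate optimistic.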