Welcome to the GoFuckYourself.com - Adult Webmaster Forum forums.

Discuss what's fucking going on, and which programs are best and worst. One-time "program" announcements from "established" webmasters are allowed.

 
06-10-2010, 05:53 PM   #1
rowan
Too lazy to set a custom title
 
Join Date: Mar 2002
Location: Australia
Posts: 17,393
Remember Cuil, the "Google Killer"? They've launched the world's biggest scraper site

A couple of years ago cuil.com launched to tremendous fanfare and hype, with many saying it could be a Google Killer. People soon realised that the results returned by this new search engine were not up to par, and images displayed with results were seemingly random, often having nothing to do with the search topic or site in the results.

Now they've sunk to a new low, launching an "automated encyclopedia" which is basically a huge site scraper and Markov generator that puts out gibberish.

http://www.cpedia.com/

Here are the first few sentences of the entry for Michael Jackson...

Inglethorp herself aware of be able to count michael jackson in the clouds. [1.1]

Rawlings I reflected was very much the sort of storage michael jackson in the clouds and violently as if it trim because't he thought that would have been got silently to my bad no doubt at all in my mind. [1.2]

I listen michael jackson songs to Jolly No morphine left What jadkson martina cole's the take eongs is to do I nodded my head and he shook his. [1.3]


http://www.cpedia.com/wiki?q=michael+jackson



The concept shows promise - grab information about a topic from multiple websites and try to create a single article about it - but the current implementation is horrendous. This is pure garbage.

06-10-2010, 05:57 PM   #2
bignasty
Confirmed User
 
Join Date: Nov 2003
Location: sc
Posts: 1,421
sorry misread

06-10-2010, 05:58 PM   #3
vending_machine
Confirmed User
 
Join Date: Jun 2002
Location: Seattle
Posts: 1,062
cuil never impressed me, and the name is retarded.

06-10-2010, 06:00 PM   #4
GrouchyAdmin
Now choke yourself!
 
Join Date: Apr 2006
Posts: 12,085
Quote:
Originally Posted by rowan
The concept shows promise - grab information about a topic from multiple websites and try to create a single article about it - but the current implementation is horrendous. This is pure garbage.
It's been done.

06-10-2010, 06:07 PM   #5
sortie
Confirmed User
 
Join Date: Mar 2007
Posts: 7,771
Plus their bot sucks ass so I blocked their IPs.

06-10-2010, 06:16 PM   #6
BigDeanEvans
So Fucking Banned
 
Join Date: Apr 2006
Posts: 1,368
Sounds like fatfoo!

06-10-2010, 06:17 PM   #7
rowan
Too lazy to set a custom title
 
Join Date: Mar 2002
Location: Australia
Posts: 17,393
Quote:
Originally Posted by BigDeanEvans
Sounds like fatfoo!
That's exactly who I thought of when reading some of the "articles"

06-10-2010, 06:17 PM   #8
Brujah
Beer Money Baron
 
Join Date: Jan 2001
Location: brujah / gmail
Posts: 22,157
So that's how fatfoo does it. Makes sense.

06-10-2010, 09:35 PM   #9
k0nr4d
Confirmed User
 
Join Date: Aug 2006
Location: Poland
Posts: 9,229
I bet they put millions into that new idea for the site. They should have hit me up, I could have written a gibberish generator in a few minutes!
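
For anyone wondering what a "few minutes" gibberish generator looks like, here is a minimal word-level Markov chain sketch in Python. It is not Cpedia's code (the thread only guesses at how the site works); the function names `build_chain` and `generate` and the `order`, `length` and `seed` parameters are made up for illustration.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def generate(chain, length=30, seed=None):
    """Random-walk the chain, stopping at `length` words or at a dead end."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))  # start from a random key (needs a non-empty chain)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            break  # this word pair was never followed by anything in the input
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)


if __name__ == "__main__":
    scraped = "paste a few paragraphs of scraped text here and see what comes out the other end"
    print(generate(build_chain(scraped), length=20))
```

Feed it text scraped from several unrelated pages and the output reads locally plausible but globally meaningless, which is roughly the effect people are describing in this thread.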

06-10-2010, 09:38 PM   #10
NetHorse
Confirmed User
 
Join Date: Dec 2006
Location: Chicago
Posts: 3,526
God, it looks like the spam I get in my mailbox every morning.
__________________
┌∩┐(◣_◢)┌∩┐
ICQ # 427013273

06-10-2010, 09:43 PM   #11
Agent 488
Registered User
 
Join Date: Feb 2006
Posts: 22,511
Cpedia and its Detractors
Apr 13, 10:20 AM

Wow, the haters are out in force today. We launched our alpha version of Cpedia last week, and have gotten a little feedback. Some of it was positive, a lot of it questioned whether or not we deserved to live (consensus: no). I was a little surprised at how vituperative things were.

So, I’d like to review some of the top issues we’ve heard about.

First up, Cpedia does very badly with people who write much more on the web than people write about them. Given the 1 billion people on the web one might think this unlikely, but it happens. When we try to summarize the information mentioning these people, we run into a problem. Almost none of it is about them. It’s about random things they have opined on. Dave Parrack, Farhad Manjoo, Louis Gray, I’m talking about you.

Another complaint was that we have stolen, plagiarized, looted, thieved, etc., the information we were providing. People were shocked that all the sentences came from other sources. Yes, all the sentences come from other sources, and we have links to exactly which sources they come from. We do not have a vast array of tiny sensor bots collecting information across the globe on all topics (I’m not even quite sure how this would work for the past). We crawl the web, and use bits of web pages, citing every bit.

A third complaint was that our machines did not seem to really understand the material. People complained of rote recitation, rather than an in-depth understanding. It was ever so. As a child I was made to learn Irish. The Christian Brothers believed in a Platonic theory of learning, where all knowledge was recollection, so they would beat us with leather straps until we “remembered” our Irish vocabulary (this actually works). I, however, could never get full marks, no matter how well I remembered, because my Irish, while technically correct, had no “blas”.

Blas, for those of you not from the West of Ireland, is the polish a hurley gets from the sliothar when used by a player of unusual skill, a patina on the surface of the wood testifying to the depth of talent of the player that had used the stick. Fair enough. Cpedia does not have blas – it’s a machine.

I think a lot of commentators had a misunderstanding about what machines can do, and what Cpedia is trying to accomplish. Cpedia is not an attempt to build something that knows all current knowledge and can write a meaningful essay on any topic – that would be a stretch goal. Rather, we are trying to solve a much simpler problem. When people search the web for information, a lot of times the first few results do not contain all the information there is about the subject. Almost no one can continue through all the other pages, because they are almost all regurgitations of the same material, with perhaps a few extra nuggets. Cpedia processes all the pages about a topic, and extracts the unique ideas.

You can see the ideas that we have considered similar when you look at the sources page. There you see how we have collapsed away many other similar sentences to the sentences we show.

Because we remove all this duplication, unique ideas have more chance of coming to the top. We organize the ideas into rough clusters so that information about the same topic is close together.
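
(Aside, not part of the quoted blog post: the post does not say how Cpedia decides that two sentences are "similar". One simple way to get the collapsing behaviour it describes is to compare sentences by word overlap and fold near-duplicates into the first one seen. The sketch below assumes Jaccard similarity on word sets and an arbitrary 0.6 threshold; the names `tokens`, `jaccard` and `collapse_similar` are hypothetical.)

```python
import re


def tokens(sentence):
    """Lowercased word set used as a crude fingerprint of a sentence."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def jaccard(a, b):
    """Overlap of two sets: |a & b| / |a | b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def collapse_similar(sentences, threshold=0.6):
    """Greedily group sentences; each group is [representative, *near-duplicates]."""
    groups = []  # list of (token set of the representative, member sentences)
    for s in sentences:
        t = tokens(s)
        for rep_tokens, members in groups:
            if jaccard(t, rep_tokens) >= threshold:
                members.append(s)  # collapse into an existing group
                break
        else:
            groups.append((t, [s]))  # nothing similar enough: a new "unique idea"
    return [members for _, members in groups]
```

The first sentence of each returned group plays the role of the sentence shown on the page, and the rest are the ones that would be listed as collapsed on a sources page. A greedy pass like this depends on input order and knows nothing about meaning, so it would not help with garbage input.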

I’m sorry if people were expecting Skynet. I can understand how it would be upsetting to get psyched up for life, the universe and everything, only to get a different UI on a search engine.


Cpedia has errors. That is intentional. We have tried to be inclusive, and dredge to the bottom of the web. This is great sometimes. For instance, I was meeting a VC recently, and I was able to discover that he has a tendency to over-imbibe. He doesn’t have a Wikipedia page (not that Wikipedia is in the business of random slander) and his firm’s bio on him elides this trivia. If we did not seek out the lower reaches of the web, I would have missed his one redeeming feature.

But bottom-feeding does ensure that we will have mistakes. Some are just algorithmic. For instance, our anaphora resolution (working out who a “he” refers to) is wrong about one in 20 times. This is on par with the accuracy of Hobbs’ algorithm. We sometimes get two senses confused. For example, we have made several pages about Django – one about Django Reinhardt and one about the Python framework named after him. There is a little cross bleeding here, as the distinction is not as sharp as it might be. The notion of the identity of objects is not a settled matter – the classic example is Theseus’s Ship, but a simple one is of an axe: if you replace the handle twice, and the head once, is it still the same axe?

The promise of Cpedia is that you will find information that you might otherwise miss. It often works for me. Your mileage will vary. If you find that the page about you is completely random, the only advice I can offer is a poem my six-year-old recited at breakfast:

A wise old owl sat in an oak,
The more he heard, the less he spoke,
The less he spoke, the more he heard,
Why aren’t we all like that wise old bird.

06-10-2010, 11:31 PM   #12
NetHorse
Confirmed User
 
Join Date: Dec 2006
Location: Chicago
Posts: 3,526
Quote:
Originally Posted by Agent 488
I’m sorry if people were expecting Skynet. I can understand how it would be upsetting to get psyched up for life, the universe and everything, only to get a different UI on a search engine.

06-11-2010, 01:32 AM   #13
EliteWebmaster
Confirmed User
 
Join Date: Feb 2010
Posts: 3,983
Quote:
Originally Posted by NetHorse
God, it looks like the spam I get in my mailbox every morning.
So true

06-11-2010, 01:51 PM   #14
rowan
Too lazy to set a custom title
 
Join Date: Mar 2002
Location: Australia
Posts: 17,393
I wonder if the problem with the MJ page is because they're scraping spammy junk sites that were not written by a human.

GIGO (Garbage In, Garbage Out)