Welcome to the GoFuckYourself.com - Adult Webmaster Forum forums.

New Webmasters ask "How-To" questions here. This is where other fucking Webmasters help.

05-13-2009, 01:52 AM   #1
hakkrdan
Confirmed User
 
Join Date: Nov 2004
Location: Phoenix, AZ
Posts: 223
Identify blocks of content in NATS-powered sites?

Hi -

I'm trying to identify patterns that would let me spider the content of a NATS-powered site, specifically the intros or short descriptions on a photo set or gallery.

We'll use http://innocenthigh.com as an example because I know they're fucking rad (and no, Billy, I'm not going to scrape your site's content). For example, hit up their main intro page:

http://www.innocenthigh.com/t1/

As of this writing their most recent update is for "Bree Olsen". It's that "Bree is a really nice girl that ..." text that I'm after. According to Web Developer Tools, this textual content can be found inside of:

html > body > div > table > tbody > tr > td > table #Table_01 > tbody > tr > td > table #Table_01 > tbody > tr > td > table #Table_01 > tbody > tr > td > table > tbody > tr > td > span .student_id_story1

I think I can isolate this text depending on whether it appears in a span or a paragraph, or is simply assigned to a class. Hell, I can use any combination of those, but I know that people template the fuckall out of their sites, so even that is not a sure-fire way to identify this content.
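
For what it's worth, here is a minimal sketch of the class-based approach, using only Python's standard library. The class name student_id_story1 comes from the selector path above; the helper names (ClassTextExtractor, extract_by_class) and the sample markup are made up for illustration, and other templates will almost certainly use different class names. It also assumes tags inside the target element are properly closed, which sloppy templates won't always honor.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "area",
             "base", "col", "embed", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, target_class):
        super().__init__(convert_charrefs=True)
        self.target_class = target_class
        self.depth = 0      # > 0 while we are inside a matching element
        self.buffer = []    # text fragments of the current match
        self.results = []   # finished, whitespace-normalized matches

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if self.depth:
                self.buffer.append(" ")  # treat <br> and friends as a word break
            return
        if self.depth:
            self.depth += 1              # nested tag inside a match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:              # the matching element just closed
            text = " ".join("".join(self.buffer).split())
            if text:
                self.results.append(text)

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_by_class(html, target_class):
    """Return the text of each element whose class list contains target_class."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical markup, loosely modeled on the nesting described above.
sample = ('<table><tr><td><span class="student_id_story1">'
          'Bree is a <b>really</b> nice girl<br>that ...</span></td></tr></table>')
print(extract_by_class(sample, "student_id_story1"))
```

Matching on the class alone, rather than the full table > tbody > tr > td chain, means the match survives layout changes as long as the class name stays put. A retemplated site will still break it.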

Anyone got any tips/tricks? How about from the NATS guys themselves, do you guys check out these posts? If I can get this hammered out, I think I'll be on to something big - unfortunately in the beginning it will only support sites generated via NATS or any other system where content is easily machine-identifiable.

I guess on that note, how many people would care that I was doing this? The only way I'd be doing this is to use that same content to promote said sponsor. I would not be doing this otherwise. If you have a problem with me doing that, then you have a problem with me converting sales for you.

Thanks!
__________________
Dan
ICQ: 487641781
05-14-2009, 04:42 AM   #2
swordfih
Registered User
 
Join Date: Mar 2005
Location: a few clicks from disneyland
Posts: 70
How about RSS? I believe NATS has RSSDish?

Anyway, if you are looking to fetch contents this way it'll need constant attention. Even the most clever pattern matching can often break or not match at all.
__________________
lamp?
05-22-2009, 10:12 PM   #3
hakkrdan
Confirmed User
 
Join Date: Nov 2004
Location: Phoenix, AZ
Posts: 223
Hi -

I'm pretty sure that the text presented via RSS is not the same text that I've described in these teasers/trailers. Further, everyone is using those RSS feeds, so I'd bet that SEO value is diminished a bit.

I've been able to get some good matches down, primarily by matching something that I *think* is the target string, then qualifying it if it's more than 50 words long. In all honesty, if the content is not 50 words long I don't want it.
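
Roughly, the qualifying step looks like the sketch below. The function names (qualifies, pick_descriptions) are just for illustration, and I'm assuming "more than 50 words" means strictly more than 50 whitespace-separated tokens; adjust min_words to taste.

```python
def qualifies(text, min_words=50):
    """True if text has strictly more than min_words whitespace-separated words."""
    return len(text.split()) > min_words


def pick_descriptions(candidates, min_words=50):
    """Keep candidate strings that are long enough, dropping exact repeats."""
    seen = set()
    kept = []
    for text in candidates:
        normalized = " ".join(text.split())  # collapse runs of whitespace
        if normalized in seen:
            continue
        seen.add(normalized)
        if qualifies(normalized, min_words):
            kept.append(normalized)
    return kept


# Made-up candidates: a short caption and a longer teaser-style blurb.
short_caption = "Click here to join now"
long_teaser = " ".join(["word"] * 60)
print(pick_descriptions([short_caption, long_teaser, long_teaser]))
```

The word-count check is cheap and filters out most navigation links, captions and join buttons, which tend to be short. It says nothing about whether a long string is actually the description, so it only works as a second pass after a pattern match.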
__________________
Dan
ICQ: 487641781
 