GoFuckYourself.com - Adult Webmaster Forum - View Single Post - Identify blocks of content in NATS-powered sites?

hakkrdan · 05-13-2009, 01:52 AM

Hi -

I'm trying to identify some patterns where I can essentially spider the contents of a NATS-powered site. I'm talking about the intros or short descriptions on a photo set or gallery.

We'll use http://innocenthigh.com as an example because I know they're fucking rad (and no, Billy, I'm not going to scrape your site's content). For example, hit up their main intro page:

http://www.innocenthigh.com/t1/

As of this writing, their most recent update is for "Bree Olsen". It's that "Bree is a really nice girl that ..." text that I'm after. According to Web Developer Tools, this textual content can be found inside of:

html > body > div > table > tbody > tr > td > table #Table_01 > tbody > tr > td > table #Table_01 > tbody > tr > td > table #Table_01 > tbody > tr > td > table > tbody > tr > td > span .student_id_story1

I think I can isolate this text depending on whether it appears in a span or a paragraph, or is simply assigned to a class. Hell, I can use any combination of those, but I know that people template the fuckall out of their sites, so even that is not a sure-fire way to identify this content.
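To make the "match on the class, ignore the table nesting" idea concrete, here's a rough sketch in Python using only the standard library's html.parser. The sample markup, the placeholder text, and the names ClassTextExtractor and extract_blocks are all made up for illustration; the only thing taken from the real page is the span / student_id_story1 pairing reported by Web Developer Tools. It assumes reasonably well-formed markup, so treat it as a starting point and not a finished spider.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

# Hypothetical sample markup: only the tag/class pairing mirrors the post.
SAMPLE = """
<html><body><div><table id="Table_01"><tr><td>
<table><tr><td>
<span class="student_id_story1">Placeholder intro text
for the <b>latest</b> update.</span>
</td></tr></table>
</td></tr></table></div></body></html>
"""


class ClassTextExtractor(HTMLParser):
    """Collect the text of every <tag class="..."> element, at any depth."""

    def __init__(self, tag, class_name):
        super().__init__()
        self.tag = tag
        self.class_name = class_name
        self.depth = 0    # > 0 while we are inside a matching element
        self.blocks = []  # one cleaned-up string per matching element
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            # Nested tag inside a match: track it so we know when the match ends.
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.class_name in classes:
            self.depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace left over from the page layout.
            self.blocks.append(" ".join("".join(self._buf).split()))

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)


def extract_blocks(html, tag="span", class_name="student_id_story1"):
    parser = ClassTextExtractor(tag, class_name)
    parser.feed(html)
    parser.close()
    return parser.blocks


if __name__ == "__main__":
    for block in extract_blocks(SAMPLE):
        print(block)
```

Because the match is on tag plus class only, it doesn't care how many tables wrap the span. The weak spot is exactly the one mentioned above: if a site's template renames the class, the tag/class pair would have to be configured per site.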

Anyone got any tips/tricks? How about from the NATS guys themselves, do you guys check out these posts? If I can get this hammered out, I think I'll be on to something big - unfortunately in the beginning it will only support sites generated via NATS or any other system where content is easily machine-identifiable.

I guess on that note, how many people would care that I was doing this? The only way I'd be doing this is to use that same content to promote said sponsor. I would not be doing this otherwise. If you have a problem with me doing that, then you have a problem with me converting sales for you.

Thanks!
