#1
Confirmed User
Join Date: Nov 2004
Location: Phoenix, AZ
Posts: 223
Identify blocks of content in NATS-powered sites?
Hi -

I'm trying to identify some patterns that would let me spider the contents of a NATS-powered site. I'm talking about the intros or short descriptions on a photo set or gallery. We'll use http://innocenthigh.com as an example because I know they're fucking rad. (And no, Billy, I'm not going to scrape your site's content.)

http://www.innocenthigh.com/t1/

As of this writing, their most recent update is for "Bree Olsen". It's the "Bree is a really nice girl that ..." text that I'm after. According to Web Developer Tools, this text can be found inside of:

html > body > div > table > tbody > tr > td > table #Table_01 > tbody > tr > td > table #Table_01 > tbody > tr > td > table #Table_01 > tbody > tr > td > table > tbody > tr > td > span .student_id_story1

I think I can isolate this text depending on whether it appears in a span or a paragraph, or is simply assigned to a class. Hell, I can use any combination of those, but I know that people template the fuckall out of their sites, so even that is not a sure-fire way to identify this content.

Anyone got any tips/tricks? How about the NATS guys themselves, do you guys check out these posts?

If I can get this hammered out, I think I'll be on to something big. Unfortunately, in the beginning it will only support sites generated via NATS or any other system where content is easily machine-identifiable.

On that note, how many people would care that I was doing this? The only reason I'd be doing this is to use that same content to promote said sponsor. I would not be doing this otherwise. If you have a problem with me doing that, then you have a problem with me converting sales for you.

Thanks!
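For the class-based approach, here is a minimal sketch using only Python's standard-library `html.parser`. It assumes the teaser sits in an element whose `class` attribute contains `student_id_story1` (the class from the example page above). The names `ClassTextExtractor` and `extract_by_class` are hypothetical. It tracks nesting depth so inline tags inside the target don't end the capture early. However, it will still get confused by badly unbalanced markup (for example, unclosed `<p>` tags) inside the target element, and it breaks as soon as a site renames the class.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collects the text inside every element carrying a given CSS class.

    A depth counter tracks nesting, so inline tags inside the target element
    (e.g. <b>, <i>) don't end the capture early.
    """

    # Void elements never get an end tag, so they must not change the depth.
    VOID = {"br", "img", "hr", "input", "meta", "link", "area", "base",
            "col", "embed", "source", "track", "wbr"}

    def __init__(self, target_class):
        super().__init__(convert_charrefs=True)
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching element
        self.current = []   # text fragments of the block being captured
        self.blocks = []    # finished, whitespace-normalized blocks

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            if self.depth:
                self.current.append(" ")  # e.g. <br> separates words
            return
        if self.depth:
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in self.VOID or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            text = " ".join("".join(self.current).split())
            if text:
                self.blocks.append(text)

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_by_class(html, target_class):
    """Return the text of every element whose class list includes target_class."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Because it matches on the class token rather than the full `table > tbody > ...` path, it keeps working if the surrounding table layout changes but the class name stays the same.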
__________________
Dan ICQ: 487641781
#2
Registered User
Join Date: Mar 2005
Location: a few clicks from disneyland
Posts: 70
How about RSS? I believe NATS has RSSDish?
Anyway, if you are looking to fetch content this way, it'll need constant attention. Even the most clever pattern matching can break or fail to match at all.
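If a sponsor does expose a feed, the teaser text would typically live in each item's `description` element. Here is a minimal sketch with Python's standard library, assuming a plain RSS 2.0 document that has already been downloaded. The feed layout is an assumption, not a documented NATS format, and the function name `rss_descriptions` is hypothetical. Note that descriptions often carry escaped HTML that still needs stripping.

```python
import xml.etree.ElementTree as ET


def rss_descriptions(feed_xml):
    """Return (title, description) pairs for every <item> in an RSS 2.0 string."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):  # items are nested under <channel>
        title = (item.findtext("title") or "").strip()
        desc = (item.findtext("description") or "").strip()
        items.append((title, desc))
    return items
```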
#3
Confirmed User
Join Date: Nov 2004
Location: Phoenix, AZ
Posts: 223
Hi -
I'm pretty sure that the text presented via RSS is not the same as the teaser/trailer text I've described. Further, everyone is using those RSS feeds, so I'd bet their SEO value is diminished a bit. I've been able to get some good matches, primarily by matching something that I *think* is the target string, then qualifying it if it's more than 50 words long. In all honesty, if the content is not 50 words long, I don't want it.
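The "match, then qualify by length" step can be sketched as a simple filter. This assumes the candidates are raw snippets from whatever pattern matched, so they may still contain tags. The names `qualify` and `TAG_RE` are hypothetical, "word" here just means a whitespace-separated token, and the default threshold keeps anything with at least 50 words (whether exactly 50 counts is a judgment call).

```python
import re

# Crude tag stripper: adequate for short matched snippets, not a full HTML parser.
TAG_RE = re.compile(r"<[^>]+>")


def qualify(candidates, min_words=50):
    """Keep snippets whose visible text has at least min_words words.

    Tags are replaced with spaces and whitespace is collapsed before counting.
    Survivors are returned longest first, so the likeliest teaser is on top.
    """
    kept = []
    for raw in candidates:
        text = " ".join(TAG_RE.sub(" ", raw).split())
        n_words = len(text.split())
        if n_words >= min_words:
            kept.append((n_words, text))
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in kept]
```

Sorting by length is a cheap tiebreaker when a page yields several qualifying blocks: boilerplate like "Join now!" drops out, and a long descriptive paragraph tends to win.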