Welcome to the GoFuckYourself.com - Adult Webmaster Forum forums. |
Discuss what's fucking going on, and which programs are best and worst. One-time "program" announcements from "established" webmasters are allowed. |
#1 |
Confirmed User
Industry Role:
Join Date: Aug 2006
Posts: 5,594
|
I have a content-rich site that this is happening to more and more.
You can ban their IP addresses, but they soon pop up with another. Is there any way to lock down the server better, yet still allow the friendly SE bots? It's becoming an increasing problem. |
#2 |
Registered User
Industry Role:
Join Date: May 2009
Location: Orange County, CA
Posts: 94
|
We use Strongbox, which says it has anti-scraping technology, but to be honest, I've never dealt with it. The logic they use makes sense, though.
http://bettercgi.com/strongbox/features.html#antislurp |
#3 |
Confirmed User
Industry Role:
Join Date: Aug 2006
Posts: 5,594
|
Thanks Chris.
I forgot to mention the site is written in ASP.NET; I don't know exactly how different methods work with different server setups. Knowing nothing much about server security, I would just have a list of all known "friendly" search engine bots, and everything else gets fooked out if there are multiple session attempts. Is that how it works? I should contact my dedicated host; I'm sure they'll be able to recommend something. That's their speciality, after all. |
#4 |
Industry Role:
Join Date: Aug 2006
Location: Little Vienna
Posts: 32,235
|
Well, one of the methods I use is to delete old content and replace it with completely new content at a different location.
|
#5 | |
Confirmed User
Industry Role:
Join Date: Aug 2006
Posts: 5,594
|
Quote:
I'd like to simply shut off programmatic attempts to access my sites from unknown IP addresses, period. |
|
#6 |
Sick Fuck
Industry Role:
Join Date: Feb 2004
Location: www
Posts: 9,491
|
Is it a paysite or free? And when you say "thieves", are they duplicating your website or just downloading content?
|
#7 | |
Confirmed User
Industry Role:
Join Date: Aug 2006
Posts: 5,594
|
Quote:
In the first instance they had an autoscraper on a daily run, pulling the new entries and posting them to a *cough* **new** site, which I soon had closed down via DMCA. Now I've caught it at an earlier stage. |
|
#8 |
Confirmed User
Join Date: Feb 2006
Location: Miami, FL
Posts: 1,556
|
Just put advertisements in your player; then they are advertising for you.
__________________
I spammed in threads! |
#9 |
Confirmed User
Join Date: Feb 2002
Location: ICQ: 251425 Fr/Au/Ca
Posts: 6,863
|
Honestly, there's not much you can do.
If you can see it, you can steal it. Cookies, restrictions, CAPTCHAs, etc. can all be defeated. The best suggestions are watermarks, adverts, etc. One trick that really does stump most spiders, however, is to link to your content via JS or CSS. |
#10 | |
Sick Fuck
Industry Role:
Join Date: Feb 2004
Location: www
Posts: 9,491
|
Quote:
Banning IPs won't help much, because they might just use rotating IPs. Banning known offline user-agents could help, but that is also easy to get around (they send fake user-agent info).
If you have movies, put them in JavaScript (the agent usually can't read that) and keep the text outside it (for the SEs). The downside is surfers who have disabled JavaScript.
If you have a decent CPU on your server, then trick their "browser" into fake links with long delays (like a CGI link) or fake targets that temporarily kill and ban an agent after too many attempts. This will lag or cut off their agent temporarily. (It might be a very good idea to use robots.txt on those links, because you do not want to trick Google.)
You can also create something that their agent can't make sense of. The more garbage, the better. For instance, if you have invisible links to the "tubegirl"...
You can also structure your site in a way that has no logic and is hard for software to reconstruct.
You can also watermark your stuff with "licensed to...", but talk with your sponsors before doing that. And if they promote the same sponsor, then you should talk about that too. |
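To make the robots.txt point concrete, here is a rough sketch of keeping well-behaved crawlers out of the fake links while scrapers that ignore robots.txt still walk into them. The `/trap/` directory, the `example.com` URLs and the helper name are made up for illustration; only `urllib.robotparser` is real Python stdlib.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /trap/ is an assumed directory that holds the fake links.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def allowed_for_polite_bots(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if a crawler that honours robots.txt may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed_for_polite_bots("http://example.com/trap/a.html"))
    print(allowed_for_polite_bots("http://example.com/videos/1.html"))
```

Anything that requests a `/trap/` URL anyway has either ignored robots.txt or never read it, which is a decent hint that it is not one of the friendly bots.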
|
#11 |
Confirmed User
Industry Role:
Join Date: Dec 2002
Location: Marina Hemingway
Posts: 2,134
|
You can use trap files, like 1x1-pixel files named 6tjgTTvtfgh.jpg or something like that. If one gets downloaded, you know it's an illegal bot. Now, write a script that will block that user based on IP and session ID or have him download some malicious bullshit ...
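A minimal sketch of the blocking half of that idea, assuming the web app can call a hook on every request with the path, client IP and session ID. The class and method names are hypothetical, and blocking on either the IP or the session ID matching is my own reading of "IP and session ID".

```python
import posixpath

# Decoy file names; the first one is the example from the post above.
TRAP_FILES = {"6tjgTTvtfgh.jpg"}


class TrapBanList:
    """In-memory ban list fed by requests for trap files (hypothetical helper)."""

    def __init__(self):
        self._banned_ips = set()
        self._banned_sessions = set()

    def allow(self, path, ip, session_id):
        """Record the request and return True if it may be served."""
        if posixpath.basename(path) in TRAP_FILES:
            # No human visitor ever sees the trap link, so treat the caller as a bot.
            self._banned_ips.add(ip)
            self._banned_sessions.add(session_id)
        return ip not in self._banned_ips and session_id not in self._banned_sessions


if __name__ == "__main__":
    bans = TrapBanList()
    print(bans.allow("/gallery/1.jpg", "203.0.113.9", "s1"))
    print(bans.allow("/img/6tjgTTvtfgh.jpg", "203.0.113.9", "s1"))
    print(bans.allow("/gallery/2.jpg", "203.0.113.9", "s2"))
```

A real setup would persist the bans and expire them after a while, since scrapers tend to move to a new IP anyway.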
__________________
Asian Babes |
#12 |
Too lazy to set a custom title
Join Date: Mar 2002
Location: Australia
Posts: 17,393
|
I'm facing this problem myself, as I have a site with millions of pages. Currently, if an IP downloads too many pages within a short period of time and it's NOT on the whitelist (e.g. the IPs of Google), it gets firewalled for a period of time. It's a pretty aggressive approach, but it works (for now).
Most of the people trying to scrape don't bother trying to hide it, so their 3 fetches a second get picked up pretty quickly. |
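For anyone wanting to try the same thing, this is roughly what that logic looks like as a sliding-window counter with a whitelist and a temporary ban. It is only a sketch, not the setup described above: the class name, the thresholds and the whitelist range (a documentation-only network, not Google's real IPs) are all assumptions, and the actual firewalling step is left out.

```python
import ipaddress
import time
from collections import defaultdict, deque


class ScrapeThrottle:
    """Temporarily ban non-whitelisted IPs that fetch too many pages too quickly."""

    def __init__(self, max_hits, window_seconds, ban_seconds, whitelist=()):
        self.max_hits = max_hits
        self.window = window_seconds
        self.ban_seconds = ban_seconds
        self.whitelist = [ipaddress.ip_network(net) for net in whitelist]
        self._hits = defaultdict(deque)   # ip -> timestamps of recent requests
        self._banned_until = {}           # ip -> time at which the ban ends

    def _whitelisted(self, ip):
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in self.whitelist)

    def allow(self, ip, now=None):
        """Return True if the request should be served, False if it is blocked."""
        now = time.monotonic() if now is None else now
        if self._whitelisted(ip):
            return True
        if self._banned_until.get(ip, 0.0) > now:
            return False
        hits = self._hits[ip]
        hits.append(now)
        # Forget requests that have fallen out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) > self.max_hits:
            self._banned_until[ip] = now + self.ban_seconds
            hits.clear()
            return False
        return True


if __name__ == "__main__":
    # Placeholder whitelist: 192.0.2.0/24 is a documentation range, not a real crawler.
    throttle = ScrapeThrottle(3, 1.0, 60.0, whitelist=["192.0.2.0/24"])
    print([throttle.allow("203.0.113.5", now=0.1 * i) for i in range(5)])
```

With limits like these, a bot doing 3 fetches a second trips the ban within a second or two, while whitelisted ranges are never counted at all.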
#13 |
Sick Fuck
Industry Role:
Join Date: Feb 2004
Location: www
Posts: 9,491
|
Please post the domains of these bastards, without the http.
|
#14 |
Registered User
Industry Role:
Join Date: Feb 2006
Posts: 22,511
|
I was going to say DMCA, but it looks like you did it.
Please, no idiots come in here saying that if your site is online you are consenting to it being scraped. Fuck off. |
#15 | |
►SouthOfHeaven
Join Date: Jun 2004
Location: PlanetEarth MyBoardRank: GerbilMaster My-Penis-Size: extralarge MyWeapon: Computer
Posts: 28,609
|
Quote:
Have you given consent for Google to scrape you? Yahoo? Bing? Etc. Basically you have an apple tree in a public place. You aren't stopping anyone from picking your apples; in fact you like some people picking your apples (Google), even though they never asked to pick your apples, and furthermore they are showing your apples on their site and making money from it. Shitloads of money. You can't be too surprised when some fatty comes by and picks all your apples one day, after watching everyone else pick them and you not stopping them.
__________________
hatisblack at yahoo.com |
|
#16 |
Confirmed User
Join Date: Dec 2006
Location: Las Vegas, NV
Posts: 968
|
I'm just kind of wondering why you have a "content rich site" that is free. But that's just me, I may be missing something here.
__________________
Donovan Trent |
#17 | |
Registered User
Industry Role:
Join Date: Feb 2006
Posts: 22,511
|
Search engines link to my content. In order for someone to read it, they have to click through to my site. There is a difference.
Quote:
|
|
#18 | |
Check SIG!
Industry Role:
Join Date: Mar 2006
Location: Europe (Skype: gojkoas)
Posts: 50,945
|
Quote:
|
|
#19 |
Registered User
Industry Role:
Join Date: Feb 2006
Posts: 22,511
|
|
#20 | |
Registered User
Industry Role:
Join Date: Feb 2006
Posts: 22,511
|
I understand where you are coming from, but there is a difference between a content preview and a full scrape.
Quote:
|
|
#21 | |
►SouthOfHeaven
Join Date: Jun 2004
Location: PlanetEarth MyBoardRank: GerbilMaster My-Penis-Size: extralarge MyWeapon: Computer
Posts: 28,609
|
Quote:
Even without that, all Google is doing is cutting your page up and displaying it as separate items; the only difference is they get to show way more ads in the process than someone who just scrapes the page and repukes it up.
__________________
hatisblack at yahoo.com |
|
#22 |
Confirmed User
Join Date: Dec 2006
Location: Las Vegas, NV
Posts: 968
|
Depends on the content. I've seen plenty of content-rich sites that should be nowhere near free.
__________________
Donovan Trent |
#23 | |
Confirmed User
Join Date: Oct 2002
Posts: 3,745
|
Quote:
Throttlebox is an Apache module. The OP says he uses ASP.NET. I wonder if that means he's hosting on a Windows desktop instead of a server OS running Apache.
__________________
For historical display only. This information is not current: support@bettercgi.com ICQ 7208627 Strongbox - The next generation in site security Throttlebox - The next generation in bandwidth control Clonebox - Backup and disaster recovery on steroids |
|
#24 |
Confirmed User
Industry Role:
Join Date: Aug 2006
Posts: 5,594
|
|
#25 | |
Confirmed User
Join Date: Oct 2002
Posts: 3,745
|
Quote:
often does not. We don't send people malicious files, of course (we're not criminals), but we do use traps. It's a useful part of a multi-layered approach, but not at all sufficient on its own.
__________________
For historical display only. This information is not current: support@bettercgi.com ICQ 7208627 Strongbox - The next generation in site security Throttlebox - The next generation in bandwidth control Clonebox - Backup and disaster recovery on steroids |
|
#26 | |
Confirmed User
Join Date: Oct 2002
Posts: 3,745
|
Quote:
If you've been a webmaster for more than a few days, you know about robots.txt. By choosing not to put up a "no indexing" sign (robots.txt), you've given implied permission for Google to promote you by adding you to their index. I'd bet the people scraping (not indexing) the site don't check for robots.txt. Besides, use an ounce or so of common sense. Obviously webmasters want their porno sites listed in search engines. Duh.
__________________
For historical display only. This information is not current: support@bettercgi.com ICQ 7208627 Strongbox - The next generation in site security Throttlebox - The next generation in bandwidth control Clonebox - Backup and disaster recovery on steroids |
|
#27 |
(felis madjewicus)
Industry Role:
Join Date: Jul 2006
Location: In Mom & Dad's Basement
Posts: 20,368
|
the cat thinks most of the people in this thread have a skewed definition of content.
|
#28 |
Confirmed User
Join Date: Dec 2006
Location: Las Vegas, NV
Posts: 968
|
I guess I was just asking, based on the content being so valuable to you as to be concerned about protecting it. That's all.
__________________
Donovan Trent |
#29 |
Now choke yourself!
Industry Role:
Join Date: Apr 2006
Posts: 12,085
|
One of the ways I've dealt with it is with custom webserver-level applications: never post a direct link to the content. Use a custom hash, decode, and sendfile() the bitch. Otherwise, I've used trivial timestamping and other simple methods to break fuskers. Don't forget to disable support for HTTP TRACE.
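One way the hash-plus-timestamp idea could look, as a sketch only: the link carries an expiry time and an HMAC over the path and expiry, and the server checks both before handing the file to sendfile(). This is not necessarily the scheme used above; the key, the function names and the token layout are assumptions, and the sendfile() step itself is left out.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"  # hypothetical server-side secret, never sent to the client


def make_token(path, expires_at, key=SECRET_KEY):
    """Sign a content path together with its expiry timestamp."""
    message = f"{path}|{expires_at}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(path, expires_at, token, now=None, key=SECRET_KEY):
    """Return True only if the token matches and the link has not expired."""
    now = time.time() if now is None else now
    if now > expires_at:
        return False
    expected = make_token(path, expires_at, key)
    # Constant-time comparison; encode so arbitrary client input cannot raise.
    return hmac.compare_digest(token.encode(), expected.encode())


if __name__ == "__main__":
    expires = int(time.time()) + 300  # link stays valid for five minutes
    token = make_token("/videos/a.mp4", expires)
    print(verify("/videos/a.mp4", expires, token))
    print(verify("/videos/b.mp4", expires, token))
```

Because the expiry is part of what gets signed, a fusker can't reuse yesterday's links or guess tomorrow's by incrementing a number in the URL.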
__________________
|
#30 | ||||
►SouthOfHeaven
Join Date: Jun 2004
Location: PlanetEarth MyBoardRank: GerbilMaster My-Penis-Size: extralarge MyWeapon: Computer
Posts: 28,609
|
Quote:
Quote:
Quote:
By that theory everyone has permission. Why would it be implied for Google but not implied for others? Is it called the googlerobots.txt?
Quote:
Maybe that's what he wants to do: become so rich and well known that you will beg him to come scrape your site, just like Google.
__________________
hatisblack at yahoo.com |
||||