Hackerszone


[Perl] robots.txt webcrawler


Post by Admin Sat May 16, 2015 8:51 pm

Hello,

This is a program that takes in a URL in the format "[You must be registered and logged in to see this link.]" and crawls to every new domain that it finds, copying each domain's robots.txt file. Note that it doesn't dig very deep, because it only looks at the source of a website's front page for new domains. If it were looking at thehackerszone.forumotion.com, for example, it would not look in [You must be registered and logged in to see this link.]
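To make the front-page-only behaviour concrete, here is a minimal sketch of just the link-extraction step, run against a hard-coded HTML string instead of a live page. The sub name extract_domains and the example.com / example.org hosts are made up for illustration; this is a sketch of the idea, not the program's exact code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the distinct hosts linked from one page's HTML,
# in the order they first appear.
sub extract_domains {
    my ($html) = @_;
    my (%seen, @found);
    while ($html =~ /href="(.*?)"/gi) {
        my $link = lc $1;
        # Links with no http:// or https:// URL in them (such as relative
        # links) are skipped, so pages deeper in the same site are never queued.
        if ($link =~ m{https?://([^/?#]+)}) {
            push @found, $1 unless $seen{$1}++;
        }
    }
    return @found;
}

# Made-up front page: two links to the same host, one relative link, one new host.
my $html = join '',
    '<a href="http://www.example.com/about">About</a>',
    '<a href="/forum/index.php">Forum</a>',
    '<a href="https://www.example.com/">Home</a>',
    '<a href="https://blog.example.org">Blog</a>';

print "$_\n" for extract_domains($html);
```

With this input the sketch prints www.example.com and blog.example.org once each. The relative /forum/index.php link is ignored, which is why the crawler never looks past a site's front page.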

Theoretically, a robots.txt file tells web crawlers which portions of a website they can and can't index. Nothing actually enforces this, but it's supposed to be convention. This is useful because web administrators list things in there that they don't want showing up in a Google search, which can mean that the listed paths lead to sensitive information.
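As a rough illustration of what to look for in a saved file, the sketch below pulls out the paths named on Disallow lines. The sub name disallowed_paths and the sample robots.txt content are invented for this example; a real file may also contain Allow, Sitemap and other lines that this sketch ignores.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the paths named in Disallow lines.
sub disallowed_paths {
    my ($robots) = @_;
    my @paths;
    for my $line (split /\r?\n/, $robots) {
        $line =~ s/#.*//;    # drop trailing comments
        # Field names are matched case-insensitively; an empty Disallow
        # names no path, so it is skipped.
        if ($line =~ /^\s*disallow\s*:\s*(\S+)/i) {
            push @paths, $1;
        }
    }
    return @paths;
}

# Made-up robots.txt content, for illustration only.
my $robots = <<'END';
User-agent: *
Disallow: /admin/
Disallow: /backup/   # old site copies
Allow: /public/
Disallow:
END

print "$_\n" for disallowed_paths($robots);
```

For the sample above this prints /admin/ and /backup/, the two paths the hypothetical administrator asked crawlers to stay out of.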

I tested my program on "[You must be registered and logged in to see this link.]", which yielded:

[You must be registered and logged in to see this image.]

And if you look in one of these you'll see something like:

[You must be registered and logged in to see this image.]

Finally, here is the code:

Code:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

my (@domains, %seen);

print "Enter domain to start with: (Ex. \"www.google.com\")\n";
chomp(my $start = <STDIN>);
$start = lc $start;
push @domains, $start;
$seen{$start} = 1;

# @domains grows while we iterate, so it doubles as the crawl queue.
for (my $i = 0; $i < @domains; $i++) {
    my $domain = $domains[$i];

    # Scan the front page for links to domains we have not seen yet.
    # get() returns undef on failure, so fall back to an empty string.
    my $pageContent = lc(get("http://$domain") // '');
    while ($pageContent =~ /href="(.*?)"/g) {
        my $link = $1;
        # Capture the host part of http/https links, with or without a trailing slash.
        if ($link =~ m{https?://([^/?#]+)}) {
            push @domains, $1 unless $seen{$1}++;
        }
    }

    # Save robots.txt unmodified: the paths in it are case-sensitive.
    my $robotsContent = get("http://$domain/robots.txt");
    if ($robotsContent) {
        open(my $fh, '>:utf8', "$domain robots.txt") or die "Error: $!\n";
        print $fh $robotsContent;
        close $fh;
    }
}

https://thehackerszone.forumotion.com
