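20.7. Finding Stale Links

Problem

You want to check whether a document contains invalid links.

Solution

Use the technique outlined in Recipe 20.3 to extract each link, and then use the LWP::Simple module's head function to check that the link exists.

Discussion

Example 20.5 is an applied example of the link-extraction technique. Instead of just printing the name of the link, we call the LWP::Simple module's head function on it. A HEAD request asks for the document's headers without downloading the body; if the request fails, the link is reported as bad.

Because this program uses the get function from LWP::Simple to fetch the starting document, it expects a URL, not a filename.

Example 20.5: churl

#!/usr/bin/perl -w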
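# churl - check urls

use HTML::LinkExtor;
use LWP::Simple qw(get head);

$base_url = shift
    or die "usage: $0 <start_url>\n";

# fetch the starting page; get returns undef on failure
$content = get($base_url);
defined $content
    or die "$0: can't fetch $base_url\n";

# passing $base_url makes the parser return absolute URI objects
$parser = HTML::LinkExtor->new(undef, $base_url);
$parser->parse($content);
$parser->eof;
@links = $parser->links;

print "$base_url: \n";
foreach $linkarray (@links) {
    my @element  = @$linkarray;
    my $elt_type = shift @element;      # element (tag) name, unused here
    while (@element) {
        # the rest of the list is attribute name/value pairs
        my ($attr_name, $attr_value) = splice(@element, 0, 2);
        # check only ftp, http, https, and file URLs
        if ($attr_value->scheme =~ /\b(ftp|https?|file)\b/) {
            print " $attr_value: ", head($attr_value) ? "OK" : "BAD", "\n";
        }
    }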
}

Here's an example of a program run:

% churl http://www.wizards.com

This program has the same limitation as the HTML::LinkExtor program in Recipe 20.3.

See Also

The documentation for the CPAN modules HTML::LinkExtor, LWP::Simple, LWP::UserAgent, and HTTP::Response; Recipe 20.8
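
The OK/BAD report from churl does not say why a link failed. As a rough sketch, and not part of the original program, the HEAD request could instead be issued through LWP::UserAgent, whose head method returns an HTTP::Response object carrying a status line. The helper name link_status and the 10-second timeout are assumptions chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# link_status: hypothetical helper, not part of the original churl program.
# $ua is expected to behave like an LWP::UserAgent object: its head() method
# returns a response offering is_success() and status_line().
sub link_status {
    my ($ua, $url) = @_;
    my $response = $ua->head($url);
    return $response->is_success
        ? "OK"
        : "BAD (" . $response->status_line . ")";
}

# Check any URLs given on the command line.
if (@ARGV) {
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new(timeout => 10);   # timeout is an arbitrary choice
    print "$_: ", link_status($ua, $_), "\n" for @ARGV;
}
```

Inside churl's inner loop, the call head($attr_value) could then be swapped for link_status($ua, $attr_value), with $ua created once before the loop, so that a failing link is printed together with the server's status line.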