Some RSS 27 Dec 03

I have been playing with RSS for a few days, and since most of the RSS I have seen comes from blogs, I decided to turn my plain old XHTML diary into a whizzy, RSS-compliant, newfangled jobby. I have no reason for doing this other than possible self-promotion via my massively increased site traffic… “NOT”.
I can hear people scream “use X or Y, do not write your own”. But what would be the fun in using someone else’s RSS generator? I had a look at some of the more noteworthy blogs and noticed that there is an awful lot of commented-out text in the source of the file. This seems a bit ignorant to me because I am paying for bandwidth and every bit counts ;-). I know that’s a lame excuse but I could not help it, nor could I think of a better one. To cut a long story short, I used a very crude method to do it.
Using a couple of extra “span” tags (the script looks for the classes “blogtitle” and “blogtext”), I was able to come up with some compliant RSS from my blog. The joy of Perl.
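To give a feel for the span trick, here is a minimal sketch of the idea rather than the script itself. The class names are the ones the script looks for, but the sample markup, the example.org URL and the extract_entries helper are all made up for illustration, and I am assuming the link sits inside the title span.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical markup: the class names match the script, but the
# entry text and URL are invented for this example.
my $html = <<'HTML';
<span class="blogtitle"><a href="http://www.example.org/blog/december.html#27">Some RSS</a></span>
<span class="blogtext">I have been playing with RSS for a few days.</span>
HTML

# Returns a list of { title, link, description } hashes, one per entry.
sub extract_entries {
    my ($html) = @_;
    my (@entries, %entry);
    my @open;    # stack of class names for the spans currently open

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $attr) = @_;
            if ($tag eq 'span') {
                push @open, lc($attr->{class} || '');
            }
            elsif ($tag eq 'a' && @open && $open[-1] eq 'blogtitle') {
                # Assumption: the first link inside the title span is the entry link.
                $entry{link} = $attr->{href};
            }
        }, 'tagname,attr' ],
        text_h => [ sub {
            my ($text) = @_;
            return unless @open;
            # Append rather than assign, in case the parser delivers text in chunks.
            $entry{title}       .= $text if $open[-1] eq 'blogtitle';
            $entry{description} .= $text if $open[-1] eq 'blogtext';
        }, 'dtext' ],
        end_h => [ sub {
            my ($tag) = @_;
            return unless $tag eq 'span' && @open;
            my $class = pop @open;
            # In this sketch the closing body span ends the entry.
            if ($class eq 'blogtext') {
                push @entries, { %entry };
                %entry = ();
            }
        }, 'tagname' ],
    );
    $p->report_tags(qw(span a));
    $p->parse($html);
    $p->eof;
    return @entries;
}

for my $e (extract_entries($html)) {
    print "$e->{title} | $e->{link} | $e->{description}\n";
}
```

Each hash that comes back has the three fields an RSS item needs, so it could be handed straight to add_item in XML::RSS.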
The Script I used
The following script is quite rough around the edges but it gets the job done. If you have any questions about the Perl or why I just had to write my own, feel free to ask.
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;
use URI::URL;
use XML::RSS;
use LWP::Simple;
my $base = "/hjackson";
my $base_url = "http://www.hjackson.org";
my $PAGES = {
    "$base_url/cgi-bin/blog/december.html"  => 'htdocs/blog/december.xml',
    "$base_url/cgi-bin/blog/november.html"  => 'htdocs/blog/november.xml',
    "$base_url/cgi-bin/blog/october.html"   => 'htdocs/blog/october.xml',
    "$base_url/cgi-bin/blog/september.html" => 'htdocs/blog/september.xml',
};
my $STATE = {
    'intext'  => 0,
    'intitle' => 0,
    'inlink'  => 0,
    'inspan'  => 0,
};
my $RSS = {
    'link'        => "",
    'title'       => "",
    'description' => "",
};
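# Each $STATE flag is 0 (not seen), 1 (tag open, waiting for text) or 2 (captured).
sub start_tag {
    my ($self, $tag_name, $attr) = @_;
    if (lc($tag_name) eq 'span') {
        # Spans without a class attribute would otherwise trigger a warning.
        my $class = lc($attr->{class} || '');
        if ($class eq 'blogtitle') {
            #print "In Span $tag_name\n";
            $STATE->{intitle} = 1;
        }
        if ($class eq 'blogtext') {
            #print "In Span $tag_name\n";
            $STATE->{intext} = 1;
        }
    }
    # Only take a link once the title text has been seen.
    if (lc($tag_name) eq 'a' and $STATE->{intitle} == 2) {
        #print "href = $attr->{href}\n";
        $STATE->{inlink} = 1;
        $RSS->{link} = $attr->{href};
    }
}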
sub text {
    my ($self, $text) = @_;
    if ($STATE->{intitle} == 1) {
        #print "Title = $text\n";
        $RSS->{title} = $text;
        $STATE->{intitle} = 2;
    }
    # The link text replaces the title picked up above.
    if ($STATE->{intitle} == 2 and $STATE->{inlink} == 1) {
        $RSS->{title} = $text;
        $STATE->{inlink} = 2;
    }
    if ($STATE->{intext} == 1) {
        #print "$text\n";
        $RSS->{description} = $text;
        $STATE->{intext} = 2;
    }
    # Title, link and body have all been captured, so emit the item.
    if ($STATE->{intitle} == 2 and $STATE->{intext} == 2 and $STATE->{inlink} == 2) {
        create_rss();
    }
}
sub end_tag {
    my ($self, $tag_name, $attr) = @_;
    # Nothing to do on a closing tag yet: the state is reset in create_rss().
    if (lc($tag_name) eq 'span') {
        if ($STATE->{intitle}) {
        }
        if ($STATE->{intext}) {
        }
    }
}
my $rss;
sub create_rss {
    $rss->add_item(
        'title'     => $RSS->{title},
        'link'      => $RSS->{link},
        description => $RSS->{description},
    );
    $RSS->{title} = "";
    $RSS->{link} = "";
    $RSS->{description} = "";
    $STATE->{intext} = 0;
    $STATE->{intitle} = 0;
    # Reset the link flag too, so an entry without a link cannot reuse a stale one.
    $STATE->{inlink} = 0;
}
my ($html_page, $xml_page);
while ( ($html_page, $xml_page) = each %{ $PAGES } ) {
    my $content = get($html_page);
    # LWP::Simple::get returns undef when the fetch fails.
    defined $content or die "Cannot fetch $html_page\n";
    #print "$html_page \n$content\n";
    $rss = XML::RSS->new(version => '1.0');
    $rss->channel(
        title       => "Harry Jacksons Blog",
        'link'      => $base_url,    # a full URL rather than a bare host name
        description => "Just my Blog",
        dc => {
            date      => '2000-08-23T07:00+00:00',
            subject   => "Harrys Blog",
            creator   => 'harry@hjackson.org',
            publisher => 'harry@hjackson.org',
            rights    => 'Copyright 2003, Harry Jackson',
            language  => 'en-us',
        },
        syn => {
            updatePeriod    => "hourly",
            updateFrequency => "1",
            updateBase      => "1901-01-01T00:00+00:00",
        },
    );
    my @tags = ('span', 'a');
    my $p = HTML::Parser->new(api_version => 3);
    $p->report_tags( @tags );
    $p->handler( start => \&start_tag, "self,tagname,attr");
    $p->handler( text  => \&text,      "self,text");
    $p->handler( end   => \&end_tag,   "self,tagname,attr");
    $p->parse($content);
    $p->eof;    # flush any text the parser is still holding
    open(my $fh, '>', "$base/$xml_page")
        or die "Cannot open $base/$xml_page: $!\n";
    print $fh $rss->as_string;
    close($fh) or die "Cannot close $base/$xml_page: $!\n";
}
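A quick way to check what came out is to read a feed back in with XML::RSS and list its items. This is only a sketch: the build_feed and list_items helpers and the example.org values are invented here, and the feed is built in memory so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::RSS;

# Build a tiny RSS 1.0 feed in memory; the channel and item values are made up.
sub build_feed {
    my $rss = XML::RSS->new(version => '1.0');
    $rss->channel(
        title       => 'Example Blog',
        link        => 'http://www.example.org/',
        description => 'A feed used only for this check',
    );
    $rss->add_item(
        title       => 'Some RSS',
        link        => 'http://www.example.org/blog/december.html',
        description => 'Playing with RSS.',
    );
    return $rss->as_string;
}

# Parse a feed string and return one "title <link>" line per item.
sub list_items {
    my ($xml) = @_;
    my $rss = XML::RSS->new;
    $rss->parse($xml);
    return map { "$_->{title} <$_->{link}>" } @{ $rss->{items} };
}

print "$_\n" for list_items(build_feed());
```

To look at one of the generated files instead, calling parsefile with its path in place of parse should do the same job.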
