Blogs, SpamAssassin and Trackbacks

I disabled the trackback facility on my blog months ago because I was getting a lot of trackback spam. Around the same time I wrote a SpamAssassin plugin for Movable Type. I effectively took MT-Blacklist’s regex database, converted it into a form compatible with SpamAssassin and then wrote the plugin. Because I had disabled trackbacks at the time, I only wrote it to handle comments, and it has been doing a great job: I get virtually zero blog spam now that the database is trained.
Now that I have turned the trackback facility on again, I have trackback spam to deal with. This time I am not going to forget about it, so await an update: I will release another version that handles trackbacks as well.
As promised this is the extended entry. I have just added trackbacks to the SpamAssassin plugin. It was easier than I thought and took 2 hours to finish; now all I need is someone to test it. As soon as I have packaged it up into a tarball I will release it.

df reports wrong size

I had a weird problem the other day when I seemed to be getting inconsistencies between du and df. The two commands were in disagreement about what the disk usage was on my box.
thing:/# du -hax --max-depth=1 /
104M total
thing:/# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 3.7G 3.4G 118M 97% /
The first thing I thought was that there was a corruption on the partition table or some other terrible bug. I asked around and got no answers so I went Googling. I came up with nothing although there seemed to be plenty of people with a similar problem.
It then dawned on me what I might have done. I had originally created the Postgres database under /var/lib/postgres on the root file system. As this database got bigger and bigger I had to move it onto its own file system and mount it there. What I had forgotten to do was remove the files from the root file system after I had confirmed the move was successful. This meant that 2.4GB of disk space had not been freed on the root file system, and because the new file system was mounted on top of the old directory, the old files were hidden from view. When you use du it only adds up the sizes of the files it can see, whereas df reports the usage of the whole device. So there were no bugs in this case, just simple human error.
thingthong:/home/harry# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 3.7G 1.1G 2.6G 31% /
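A rough way to spot this kind of discrepancy is to add up the files that are visible on one file system and compare the total with what the device itself reports. The sketch below is only an illustration and not what I ran at the time: the function names are made up, it sums apparent file sizes rather than allocated blocks, and it ignores directory and file system overhead, so small differences are normal. A gap of gigabytes suggests files hidden under a mount point (or deleted files still held open).

```python
import os
import shutil


def _same_device(path, device):
    """Return True if path lives on the given device (hypothetical helper)."""
    try:
        return os.lstat(path).st_dev == device
    except OSError:
        return False


def visible_usage(root):
    """Sum the apparent size of regular files under root, staying on one file system.

    This is roughly what `du -x --apparent-size` does: other file systems
    mounted below root are skipped, and hard links are counted once.
    """
    root_dev = os.lstat(root).st_dev
    seen = set()
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories that are mount points for other file systems.
        dirnames[:] = [d for d in dirnames
                       if _same_device(os.path.join(dirpath, d), root_dev)]
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # already counted via another hard link
            seen.add(key)
            total += st.st_size
    return total


def hidden_estimate(root):
    """Estimate bytes the device reports as used but that are not visible under root."""
    return shutil.disk_usage(root).used - visible_usage(root)
```

Calling hidden_estimate("/") as root gives a figure to compare against the df output. To actually see the hidden files, unmount the file system that covers them, or bind-mount the root file system somewhere else and look under the old path there.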

Blog Spam Success

Quite a while ago I wrote a SpamAssassin plugin for Movable Type and I have been using it with great success ever since.
Just recently I had to move my blog from one machine to another and they had incompatible Bayes database types. This meant that the Bayes database needed to be started from scratch again and I was quite worried that this would take a while. Luckily for me I was off on holiday for 2 weeks which meant I had no access to a PC to watch my comments fill up with the usual spam. On return from my holidays I then used all the spam I had received to teach the Bayes filter again.
I have been quite surprised at how little spam I have received considering I have added fewer than 200 spam entries to the database. The only time I get new spam now is when I ping sites after adding a new entry, which is a good indication that these sites are being spidered for recent entries etc.

passive ftp and iptables

For those who encounter problems when trying to get passive FTP working with iptables, make sure that the following 2 modules are loaded.
ip_conntrack_ftp
ip_nat_ftp
This can be done as follows:
/sbin/modprobe ip_conntrack_ftp
/sbin/modprobe ip_nat_ftp
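To confirm the modules really are loaded, you can look for them in /proc/modules. Here is a minimal sketch of that check; the function name is made up for illustration. Bear in mind that a helper compiled directly into the kernel will not appear in /proc/modules, and that newer kernels name these modules nf_conntrack_ftp and nf_nat_ftp.

```python
def missing_modules(required, proc_modules_text):
    """Return the names in `required` that do not appear in /proc/modules output.

    Each line of /proc/modules starts with the module name, followed by
    its size, use count and other fields that are ignored here.
    """
    loaded = {line.split()[0]
              for line in proc_modules_text.splitlines()
              if line.strip()}
    return [name for name in required if name not in loaded]
```

On a live box you would pass in the contents of /proc/modules, for example open("/proc/modules").read(), and run modprobe for anything the function returns.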

Language Envy

I seem to be suffering from language envy. I have been delving into C (again) for a couple of weeks now to speed up a text indexer and I have really enjoyed it.
I am sick of Perl. Not because I don’t like it; it’s just that I am no longer doing anything interesting with it. Perhaps this is the problem and not Perl, i.e. I have no more interesting things to do that Perl is suited for.
Most of the things I need to do now require more grunt than Perl has got (I am not including XS). This is partly because everything I need to do has already been done at some point before, or CPAN has a module that does just that, and cobbling modules together gets a bit tiresome after a while. I would love to be working on a large Perl project at work but unfortunately that is not the sort of stuff we do.
Some people may regard this entry as a dig at Perl but I would disagree. Perl is brilliant once you get to know it. It allows me to do pretty much anything I want in short order. This does not mean I am not allowed to peer over the fence into the other camps and look on with envy. Perhaps the grass is greener on the other side, but I need a change. If nothing else I would come back to Perl with renewed vigour.
The following languages are all in the possible camp.
C:
It’s just lovely. Anyone who has read K&R will know what I mean. It’s such a small language and limited only by the programmer’s abilities. To quote Kim H “C is assembler on steroids”. I can remember being told that in life you can only have two of the following three items in any one thing:

  1. cheap
  2. fast
  3. reliable

I think as far as programming languages go C comes closer to having all three than any other language. I have delved into C several times and each time I have enjoyed it.
Python:
Everyone seems to be using Python these days and singing its praises. I have actually never written anything in it, which I suppose is a good enough reason to take a shot at it. It also gets a good mention in Eric Raymond’s How To Become A Hacker.
C++:
I like C++ because like C it’s also on steroids but comes with an added dose of amphetamines, plus some features I dearly love. I like OO programming. I find it intuitive and I like to use class diagrams to model applications. I am aware that C++ is not for the faint of heart but I have had reasonable success with it when I have used it. It also has the STL which is just a godsend, nothing quite like a hashmap (you can see the Perl in me now).
Ruby:
Again like Python I have never had anything to do with Ruby but it gets lots of good reviews from people I trust. It also has the distinct advantage that if you type “best programming language” into Google it comes first ;). I also love Smalltalk and I have heard that Ruby is the bastard child of Perl and Smalltalk.
Java:
I have my reservations about Java but on the occasions I have used it I found the docs to be reasonable. The problem I found with it was that I felt I was working with a slower, crippled version of C++. The only reasons I can think of for using it would be if you really wanted cross platform interoperability (which can be done in all of the above) or if you wanted a better paying job, because the job market seems to favour Java coders.
As you will probably see from the above I have left out a fair whack of possible languages I could learn. All in all I seem to keep leaning towards C/C++ so this is probably where I am going to go for a while although like most things I won’t just learn it for the sake of it. I always need to be doing something constructive in anything I am learning otherwise I find it laborious and boring.
PS (I would dearly love to learn a little Python to see if it is as fast a RAD language as Perl).

Postgres and libpq-fe.h

I have been using a Perl based indexer for some time now for the search facility on UKlug but just recently I have noticed that it’s taking a bit too long to run. This is not the first time I have optimized the indexer but this time I decided to bite the bullet and write it in C.
I have also changed the way the search engine works to tf/idf term ranking. This has added some overhead to the indexing so it would just have been slower if I had left it.
Of course interfacing with PostgreSQL meant that I had to blow the cobwebs off my libpq skills. Having used libpq a while back I was reasonably familiar with it but it still took me a while to get back into it. It does not help that the docs are a bit spartan and the examples are of little use. It was a case of fudge it and see what works.
The indexing takes place as follows:

  1. get job text
  2. parse out terms
  3. count terms in text
  4. insert into reverse index data_id,term,term_count

The actual weighting calculation takes place at runtime.
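The steps above, plus the weighting at query time, can be sketched in a few lines. This is an in-memory Python illustration only, not the C/libpq indexer itself: the function names, the dictionary standing in for the database table and the plain tf * log(N/df) weighting are all assumptions made for the example. The tokenizer deliberately keeps only ASCII letters and digits.

```python
import math
import re
from collections import Counter, defaultdict

# ASCII-only tokenizer: an assumption for this sketch.
TOKEN = re.compile(r"[a-z0-9]+")


def new_index():
    """Create an empty index: term -> {data_id: term_count}."""
    return defaultdict(dict)


def parse_terms(text):
    """Lower-case the text and split it into terms."""
    return TOKEN.findall(text.lower())


def index_document(index, data_id, text):
    """Count the terms in one job and store (data_id, term, term_count)."""
    for term, count in Counter(parse_terms(text)).items():
        index[term][data_id] = count


def search(index, total_docs, query):
    """Rank documents by summed tf * idf, computed at query time."""
    scores = defaultdict(float)
    for term in set(parse_terms(query)):
        postings = index.get(term)
        if not postings:
            continue  # term does not occur in any document
        # Terms found in fewer documents get a higher weight.
        idf = math.log(total_docs / len(postings))
        for data_id, tf in postings.items():
            scores[data_id] += tf * idf
    # Highest score first; ties are broken by data_id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Storing only the raw counts at index time keeps the inserts cheap, and the idf part can change as documents come and go without anything having to be reindexed. Note that a term appearing in every document gets an idf of zero and so contributes nothing to the ranking.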
Of course there are problems with what I am doing, the biggest of which is character encoding. Until now I have not really had to worry too much about this because most of the jobs in the database have been from either the States or the UK, so for all intents and purposes treating the text as ASCII was sufficient for my needs. I have just recently added both Dutch and German feeds to the database so it’s getting to the point where I can no longer ignore the encoding issue.
Another problem is of course the fact that a reverse index does not scale as well as other methods although at the moment it’s handling several million entries with comfort.

passwd: Critical error - immediate abort

If you are seeing the messages below, make sure that you have installed pam_cracklib.so. The problem manifests itself as an unrecognized module when you try to change a password or add a user.
PAM unable to dlopen(/lib/security/pam_cracklib.so)
PAM [dlerror: /lib/security/pam_cracklib.so: cannot open shared object file: No such file or directory]
PAM adding faulty module: /lib/security/pam_cracklib.so
If you ever get the following message and are not sure why:
passwd: Critical error - immediate abort
Make sure you have wenglish or another dictionary installed, then run
/etc/cron.daily/cracklib
and try again.

Awstats Exploit

The box that I was previously hosted on was cracked a few weeks ago. Root wasn’t gained because it was just Skids that gained entry. The normal crap was found littered in the usual places.
The actual exploit was a no brainer and I have had several attempts on the site since it happened. As usual I got the normal response from the ISP that the attack originated from, i.e. “We just have too many machines to check”, “Not our fault, fix your exploit” etc etc.
To me it’s a bit like using a hammer to smash a window. You didn’t manufacture the hammer and wouldn’t know how to; you saw your dad use the hammer so you know what it can do, and you saw someone smash a window with it. So you imitate the action. This is what the skids do, except they then think that they are elite because of it.
I suppose a better analogy to draw would be your typical smash and grab robber thinking he’s Auric Goldfinger after the event.
I suppose a lot of it is peer pressure. Always trying to compete amongst each other and go that little bit further. Then of course you get the real crackers fanning the flames so that they can get a bunch of skids doing the mundane stuff and reporting unmanaged boxes to their “uber mates” them included. I wonder how many of these skids reported their latest conquest over an unmanaged box only to go back and find that someone has battened down the hatches.
I must admit I am never going to understand the Skids culture.
Auric, you’d better watch out.