I have been using a Perl-based indexer for some time now for the search facility on UKlug, but just recently I have noticed that it’s taking a bit too long to run. This is not the first time I have optimized the indexer, but this time I decided to bite the bullet and rewrite it in C.
I have also changed the way the search engine works to use tf/idf term ranking. This has added some overhead to the indexing, so the Perl version would only have got slower if I had left it.
Of course interfacing with PostgreSQL meant that I had to blow the cobwebs off my libpq skills. Having used libpq a while back I was reasonably familiar with it, but it still took me a while to get back into it. It does not help that the docs are a bit spartan and the examples are of little use. It was a case of fudge it and see what works.
The indexing takes place as follows:
get job text
parse out terms
count terms in text
insert into reverse index data_id,term,term_count
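The steps above can be sketched roughly as follows. This is only an illustration in Python rather than the C indexer itself; the function names, the ASCII-only tokenizer and the sample job text are assumptions made for the example, and the rows are printed instead of being inserted into the database.

```python
import re
from collections import Counter

def parse_terms(text):
    """Lowercase the text and split it into ASCII word terms (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def index_rows(data_id, text):
    """Build (data_id, term, term_count) rows for one job text."""
    counts = Counter(parse_terms(text))
    return [(data_id, term, count) for term, count in sorted(counts.items())]

# Hypothetical job text with a made-up data_id of 42.
rows = index_rows(42, "Perl developer wanted. Perl and C experience.")
for row in rows:
    print(row)
```

Each printed tuple corresponds to one row that would go into the reverse index.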
The actual weighting calculation takes place at runtime.
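As a rough idea of what a runtime weighting could look like, here is a minimal Python sketch. It assumes the common tf * log(N / df) form of tf/idf with the raw term count as tf; the post does not say which variant is actually used, and the in-memory rows are a hypothetical stand-in for the reverse index table.

```python
import math

# Hypothetical stand-in for the reverse index: (data_id, term, term_count).
ROWS = [
    (1, "perl", 3), (1, "developer", 1),
    (2, "developer", 2), (2, "java", 1),
    (3, "developer", 1),
]

def tfidf_scores(rows, query_term):
    """Score each document for one term using tf * log(N / df)."""
    doc_ids = {data_id for data_id, _, _ in rows}          # N = number of documents
    matches = {data_id: count for data_id, term, count in rows
               if term == query_term}                       # df = len(matches)
    if not matches:
        return {}
    idf = math.log(len(doc_ids) / len(matches))
    return {data_id: count * idf for data_id, count in matches.items()}

print(tfidf_scores(ROWS, "perl"))
print(tfidf_scores(ROWS, "developer"))
```

A term that appears in every document, like "developer" here, gets an idf of zero, so it contributes nothing to the ranking, while a rarer term like "perl" is weighted up.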
Of course there are problems with what I am doing, the biggest of which is character encoding. So far I have not really had to worry too much about this because most of the jobs in the database have been from either the States or the UK, so for all intents and purposes treating the text as ASCII was sufficient for my needs. I have just recently added both Dutch and German feeds to the database, so it’s getting to the point where I can no longer ignore the encoding issue.
Another problem is of course the fact that a reverse index does not scale as well as other methods, although at the moment it’s handling several million entries with comfort.
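To show why ASCII-only parsing stops being good enough once German or Dutch text arrives, here is a small Python sketch. The two tokenizers and the sample word are assumptions for illustration, and it presumes the feed is UTF-8, which the post does not state.

```python
import re

def ascii_terms(text):
    """Tokenizer that only knows ASCII letters, like an ASCII-only indexer."""
    return re.findall(r"[a-z]+", text.lower())

def unicode_terms(text):
    """Tokenizer that accepts any Unicode letter, digit or underscore."""
    return re.findall(r"\w+", text.lower())

# "Geschaeftsfuehrer" written with umlauts, as bytes from an assumed UTF-8 feed.
raw = "Gesch\u00e4ftsf\u00fchrer".encode("utf-8")
text = raw.decode("utf-8")  # decode to text before tokenizing

print(ascii_terms(text))    # the word is split at each umlaut
print(unicode_terms(text))  # the word survives as one term
```

The ASCII tokenizer breaks the single word into three fragments that nobody would search for, which is the kind of damage the index would pick up from the new feeds.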
passwd: Critical error – immediate abort
If you are seeing these messages then make sure that you have installed pam_cracklib.so. The problem will manifest itself as an unrecognized module when trying to change a password or add a user.
PAM unable to dlopen(/lib/security/pam_cracklib.so)
PAM [dlerror: /lib/security/pam_cracklib.so: cannot open shared object file: No such file or directory]
PAM adding faulty module: /lib/security/pam_cracklib.so
If you ever get the following message and are not sure why:
passwd: Critical error – immediate abort
Make sure you have wenglish or another dictionary installed and run
/etc/cron.daily/cracklib
and try again.
Walsh Western Shambles
I ordered a set of non-Dell rails from Dell 7 days ago and they were ready to be delivered to me on the Thursday. I thought this was great. I got a call on Thursday from Walsh Western saying that they had tried to deliver the rails. I found this very odd because Jenny was in the house. They said they left a card saying they had been there. I phoned Jenny and she had a look for the card. It wasn’t there.
Friday: Same thing. They say they left a card and Jenny was off work the whole day but we received no card and have not heard anything from them at all.
Saturday: I called them to ask what the hell was going on and they told me that it would be delivered on Monday.
Monday: No rails, No card, No phone call, Sweet FA. I tried to phone but they all close at 17:30.
TO BE CONTINUED
This is just a bit pathetic although not as pathetic as my recent encounter with Midland Mainline.
Awstats Exploit
The box that I was previously hosted on was cracked a few weeks ago. Root wasn’t gained because it was just Skids that gained entry. The normal crap was found littered in the usual places.
The actual exploit was a no-brainer and I have had several attempts on the site since it happened. As usual I got the normal response from the ISP that the attack originated from, i.e. “We just have too many machines to check”, “Not our fault, fix your exploit”, etc.
To me it’s a bit like using a hammer to smash a window. You didn’t manufacture the hammer and wouldn’t know how to; you saw your dad use the hammer so you know what it can do, and you saw someone smash a window with it. So you imitate the action. This is what the skids do, except they then think that they are elite because of it.
I suppose a better analogy would be your typical smash and grab robber thinking he’s Auric Goldfinger after the event.
I suppose a lot of it is peer pressure. Always trying to compete amongst each other and go that little bit further. Then of course you get the real crackers fanning the flames so that they can get a bunch of skids doing the mundane stuff and reporting unmanaged boxes to their “uber mates” them included. I wonder how many of these skids reported their latest conquest over an unmanaged box only to go back and find that someone has battened down the hatches.
I must admit I am never going to understand the Skids culture.
Auric, you’d better watch out.
Edenvale University Scam
The university above is completely fictitious. They have copied Leeds Uni. Compare the following 2 sites:
Leeds University
http://www.edenvaleuniversity.net/
They have even left references to Leeds University in the source code of the page. They are selling qualifications. It would appear that some people are using them:
http://www.quivivre.com/datadean/ddcv-txt.txt
After reading this guy’s CV I noticed what appears to be another dodgy site:
http://www.wwdlc.net/
I have not seen this “Body” mentioned anywhere else yet they have listed Edenvale as one of their universities.
SCSI or SATA
There seems to be a bit of confusion for some people as to what they should be using when it comes to hard drives.
There are those that will disagree with this and say they use SATA in a server or SCSI at home. If that is you, then you probably know what you are doing, so reading this is a bit moot.
If you are a home user then use SATA. Spending megabucks on SCSI would be a waste of your money and you are quite unlikely to see the benefits of it over SATA. SATA is fast enough for everything I have needed at home, and more.
If you are running a server that is on 24/7, you expect it to remain that way for a very long time and you don’t want to be called out at 4am to replace a disk, use SCSI.
ia_archiver DOS attack
I had to send Alexa the following message tonight after suffering a Denial of Service attack from Alexa. They have been hitting me about 5 times a second for quite a while now and it does not look as if it is going to ease up.
Your ia_archiver robot is hitting my website 5 times a second.
www.uklug.co.uk
This is crazy, please stop it now. I have now added it to the robots.txt file. Don’t you know that what you are doing to my server is against the law? I can hardly use it at all. Get someone in the tech department to fire the idiot who wrote/runs that spider, they’re clueless.
I blocked them using mod_rewrite and my advice to anyone that has more than a few pages would be to do the same. They are not playing the same ball game as Google and Yahoo; they obviously don’t give a damn about the people on the receiving end of these attacks. Put the following in your .htaccess file.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*ia_archiver.* [NC]
RewriteRule ^.*$ - [F]
They recommend adding the following to your robots.txt file.
User-agent: ia_archiver
Disallow: /
I have heard that it is sometimes ignored though, so my advice is to block them for good using mod_rewrite. They are only leeching the data for their own gain anyway. At least Google forwards you some traffic.
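If you want to check that the robots.txt rule reads the way you intend, Python’s standard urllib.robotparser can interpret it. This is only a sketch of how a well-behaved parser would treat the rule; the path /jobs/123 is a made-up example, and it says nothing about whether a given crawler actually honours the file.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rule recommended for blocking ia_archiver.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URL path, used only for the check.
blocked = parser.can_fetch("ia_archiver", "/jobs/123")
others = parser.can_fetch("Googlebot", "/jobs/123")

print("ia_archiver allowed:", blocked)
print("Googlebot allowed:", others)
```

The rule only names ia_archiver, so other robots are still allowed in, which is the point of blocking by user agent rather than shutting everyone out.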
lamicrogroup rip off
I just came across www.lamicrogroup.co.uk tonight and had a look at their prices, and it’s more expensive to buy from them than it is to buy from the Dell outlet.
Much to my astonishment I then came across www.lamicrogroup.com, which has the exact same products and the exact same website except the prices are in dollars. What knocked me for six is that all the servers are half the price of the UK models.
This begs the question, are they ripping off UK customers? It certainly looks like it to me.
Photo Blog
I just found my first photo blog tonight and I have to say I was very impressed with it. They have captured some lovely images.
Rack Space and Servers
Are a pain in the ass to find. Actually I should narrow that down a bit. It’s hard to find cheap rack space and cheap servers that are in or close to London and from a company or someone you can trust. There are various offerings on the market, most of which are downright rip-offs. It’s amazing how the prices can differ between 2 suppliers in RedBus or how the prices of 2 servers can differ.
Out of curiosity I priced up a quarter rack ages ago (Aug 2003) to see what I could get and the cheapest was £3300 with a 1Mb connection and power. This also came with a £350 setup fee. I have had a look around a few forums and the cheapest I could see tonight was around £3000 with a similar setup fee. It would seem that rack space is holding its price regardless of how many people seem to be in on it.
I only need 1u or 2u to start with and the prices are quite expensive. I am currently looking at UKFSN because I know people with a box there and the guy who runs it (Jason Clifford) has a very good reputation. The proceeds also go to help the free software movement which is something I am interested in. I have seen a few cheaper than this but none by any great margin, at least not yet.
I also need to get a nice 1u, possibly 2u, rack mount server. I like Dell because parts are fairly cheap and common on eBay, but I also like HP, which are not cheap and whose parts are not as common on eBay. If the HP goes duff there is a good chance you will need to go back to the manufacturer, and when you do make sure you have visited Boots for some KY Jelly.
As for which one is more reliable, I would need to speak to several unbiased sysadmins who had been using both for years. There are good and bad reports for both online.
Given a choice between the 2 I would pick a Dell simply because there are more spare parts floating around, although like HP, if you buy direct don’t forget to visit Boots.