Yahoo and Nutch

It’s very true that you learn something new every day, and today I learned that Yahoo are using Nutch in a research capacity.

Welcome to the Yahoo! Research Labs implementation of the Nutch open source search engine (www.nutch.org). This search engine is intended as a demonstration platform for a number of search related technologies

I found it purely by chance. If you don’t believe me, have a look at Yahoo’s install of Nutch. I think it’s a smart move on their part because they get to see how it does its stuff and assess it. They may even be able to incorporate some of it into their own products.

Marketing a simple website

I have spent a fair bit of time working on another website that had some of the most horrible HTML I have ever seen. I managed to upload the site last night and it is now live. I didn’t design the site; I just converted it from some Dreamweaver mess to HTML Transitional that validates.
I have already made a few entries about this in my blog so here’s the link.
Aerospace NDT
The people at Aerospace NDT realised they were not getting enough from their website, so they contacted me to see if I could do something with it. I had a look at their site, wrote up what I thought of it and gave them some advice on what I thought could be done to improve its visibility etc. They seemed to like what I said because I got the job.
I am basically tasked with getting their site up the Google ranks, which I have already done, and quite substantially. I was very lucky, and they were unlucky, in that the single greatest change required to the site so far has been the removal of the splash screen. They were unlucky because their last developer had left them with a site that could not be seen by the search engines: there was not a single link off the splash screen. This also meant that visitors using browsers without Flash could not actually see the website.
I have made some fundamental changes to their site during the conversion from the old one, so we should see an overall increase in the Google ranks, but time will tell. I am keeping a tally for certain search terms to make sure that what we do has a positive effect on the site, so watch this space.
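To make the splash screen problem concrete, here is a rough Python sketch of what a very simple crawler sees. It collects the ordinary `<a href>` links on a page; a page with none is a dead end. The `LinkCollector` class, the `crawlable_links` helper and both sample pages are made up for illustration, and real search engine crawlers are far more involved than this.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, the links a simple crawler can follow."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawlable_links(html):
    """Return every followable link found in the given markup."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical splash page: a Flash object and nothing else.
splash = '<html><body><object data="intro.swf"></object></body></html>'
# Hypothetical replacement page with ordinary links.
home = (
    '<html><body><a href="/services.html">Services</a> '
    '<a href="/contact.html">Contact</a></body></html>'
)

print(crawlable_links(splash))
print(crawlable_links(home))
```

An empty list for the front page means a crawler that starts there has nowhere to go, which is exactly the situation described above.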

Open source tools for MARC Library records

I am no librarian but today I got to put on my glasses and tell everyone to be quiet because I was investigating open source library systems. The first one I had to look at is
Koha

Koha is the world’s first free Open Source Library System. Made in New Zealand by the Horowhenua Library Trust and Katipo Communications Ltd, the Koha system is a full catalogue, opac, circulation, member management and acquisitions package. To our knowledge Koha is used by public libraries, private collectors, university faculties, not for profit organizations, churches, schools and corporates. People from as far afield as Australia, USA, Canada, Estonia, India, Nigeria and Poland have installed Koha.
Key features

This is apparently used by a lot of people and does MARC record searches etc.
The install was very swish (my idea of swish is not some flash GUI, a simple command line install is fine for me) which gave me the warm and fuzzies. It also came with some sample data which was nice. Different ports are used for different things which was a bit confusing because I initially went to the admin screen and was wondering where all the library data was meant to go when I discovered I needed to go to a different port number to actually use the library system.
I can’t say I was too impressed with the interface. First off, it’s not very intuitive. This might be because I am not a librarian and don’t really understand what all these funny numbers are for, but I still couldn’t get used to the look and feel of it. I suppose this could be customized with a little CSS.
The other thing I tried was to load a Z39.50 MARC record into the database from one of the online servers. This failed miserably and gave some very cryptic pop-up boxes telling me I had not filled in some mandatory fields. It took me 40 minutes to realize that there are some mandatory fields on another screen that are not marked as mandatory. After filling these in it still refused to work. On hunting around the logs I noticed that when you carried out a Z39.50 search the log would be hit every second or two until you closed the search window. I can only assume this is a bug because I cannot think why you would want it to do that otherwise.
One thing in its favor is that it’s written in Perl, so if we do decide to run with it I should be able to patch or add to anything that doesn’t work or doesn’t suit our install. Tomorrow I am going to be looking at phpmylibrary, which from what I have read of it is quite nice.


Nutch and Lucene

We have been wanting a search engine at work for some time now, so I started looking at Lucene. I downloaded it and got it running and doing some basic stuff, but what we really wanted was something web based, i.e. an out-of-the-box solution.
I suggested we try Nutch, so I spent today getting it running. Nutch itself is a piece of cake to get working; what wasn’t so easy was getting Tomcat4 working with Nutch.
After much swearing and perspiration I finally managed to get it working and it is as sweet as a nut. We indexed just over 200 Word documents in a few minutes (the test machine is an old Celeron) and gave it a whirl. It is a straight out-of-the-box solution to your search engine problems. I was very impressed. I may have more to report on this next week because we might be putting it on one of the larger servers for a trial run.

ICANN & IWILL

What planet are ICANN transmitting from?
They have decided to change the policy on transferring domains, i.e. if you are unable to respond to a transfer request and deny it within 5 days, the transfer goes ahead. What does this mean and why is it bad?
I am the sole contact for all of my domains, which means that if I am on holiday and someone initiates a transfer request, I won’t respond because I am on holiday, and when I get back home my domain has been given to someone else. The same thing would happen if I was in hospital. For the non-techs out there the following is a good analogy.
You decide you would like to rent in London, so you have a look around, find yourself a nice property and sign a contract for 2 years with a first option to extend if you want. You pay your deposit and move in. It’s great: people learn where you live, they know where to find you and your little flat becomes a prime location. Having the option to always rent this flat is also great because you want to stay.
Then one day you go on holiday and someone who wanted the flat decides to move in. Under the current rules they cannot. Under the new rules, if they knock on the door and there is no reply for five days, they are able to break the lock and move in.
So when you get back someone has moved into the flat you spent so much time on and there is not a thing you can do because you didn’t answer the door.
This is absolute nonsense and I can only assume ICANN are doing it because there is some way to make some money from all the court cases which are going to appear when the fraudsters start trying to snatch domains that they shouldn’t have.
Luckily for me I use 123-reg.co.uk which posted me the following today:
Dear Customer,
On 12th November ICANN will introduce a new policy designed to make
transfers of non-UK domain names between Registrars quicker and easier.
From this date, if there is no acknowledgement from the domain
owner/admin contact within 5 days of a transfer request being made, the
transfer will automatically take place.
While a great step forward in ensuring domains can be freely
transferred by their owners, 123-Reg is concerned that this new system
could make it easier for your domain to be fraudulently transferred
away from 123-Reg. We would like to reassure you that we are taking
steps to guard against this happening to you. From the 12th, therefore,
all your non-UK domains registered with us will be automatically locked
so that only you can unlock them and initiate a transfer.
The new system will not affect your ability to manage your domain in
the usual way, and will simply mean that should you wish to change name
servers or transfer a domain away from 123-reg you will first need to
unlock it. This can be done quite simply from your 123-reg Control
Panel.
As we will be unable to accept liability if you unlock your domain and
an unauthorised transfer results, we strongly advise that you make sure
domains are kept locked at all times except when absolutely necessary
to change name servers or initiate a transfer.
Best Wishes,
The 123-Reg Team
Thank you, 123-reg, for protecting me from the idiocy of ICANN, which should now be named ICANN&IWILL.
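For anyone who prefers code to flats, the rule as described in the email above can be sketched in a few lines of Python. This is only my reading of the email, not ICANN’s wording: the function name, the outcome strings and the dates are invented for illustration, and treating "exactly five days" as the cut-off is an assumption.

```python
from datetime import date, timedelta

# Assumption: the window is exactly five days from the request date.
RESPONSE_WINDOW = timedelta(days=5)


def transfer_outcome(requested, today, owner_response=None, locked=False):
    """Return 'blocked', 'denied', 'approved' or 'pending' for a transfer request.

    owner_response is None (no reply), 'approve' or 'deny'.
    """
    if locked:
        return "blocked"  # registrar lock: nothing moves until the owner unlocks
    if owner_response == "deny":
        return "denied"
    if owner_response == "approve":
        return "approved"
    # No reply: under the new policy silence counts as consent once the window closes.
    if today - requested >= RESPONSE_WINDOW:
        return "approved"
    return "pending"


# Hypothetical dates: a request arrives and the owner is away for two weeks.
requested = date(2004, 11, 15)
back_home = requested + timedelta(days=14)

print(transfer_outcome(requested, back_home))               # unlocked domain
print(transfer_outcome(requested, back_home, locked=True))  # locked domain
```

The only difference between the two calls is the lock, which is why the registrar locking everything by default matters so much.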

A Concise History of Mathematics

I have just finished reading.
From: A Concise History of Mathematics
ISBN: 0486602559
Author: Dirk J. Struik
Edition: 4th
If there is one thing I can say without doubt, it is that this book is concise. It flies along at a blistering pace and in just over 200 pages covers several thousand years of mathematical history. If you are looking for a brief overview of the topic then this is the book.
It is also a great book for gauging your interest in the topic. It’s well written, well researched and engaging, so if you are unable to rummage your way through it then I doubt one of the larger or more in-depth treatments would suit you. This is of course coming from someone who has not yet read one of these, but I am now looking at some of the older classics that I might try next.
One thing I have to mention is the citations. You could use this book to research topics in maths based on the amount of cited literature at the end of each chapter alone.
Personally I think the amount of work that has gone into this book is vast and in stark contrast to its size. I would recommend it to any maths enthusiast or historian.

Diligent Editing of HTML

I am a fan of standards, i.e. XHTML Transitional/Strict etc. To this end I do try to make sure that I am keeping my own sites reasonably compliant. Sites I do commercially are always 100% compliant, but that’s because I insist on it and the clients have placed their trust in me.
Just recently I have had to convert a really bad site to XHTML Transitional and if you had seen the markup you would have realized how big this task was. To go through it by hand would have been an enormous task and quite frankly I would have been unable to do it at the price I quoted without the following tools:
1. Vim (Bram Moolenaar)
2. Template Toolkit TT2 (Andy Wardley)
3. HTML Tidy (Dave Raggett)
4. W3C Validator (The W3C Validator Team)
The first tool (Vim) could really be any good text editor, e.g. Emacs, ed, or any of the vi children. I just happen to use Vim, and once you have learned the basics it is a joy to use and makes editing text almost an art.
TT2, the second tool, is slightly more specialized and less well known but just as easy to use, and it deserves a big mention. TT2 is a templating system. Most people won’t really understand, or even need to know, what the advantages of this are until they have to edit a 10+ page website and someone wants to change a font on some item on all the pages. This could of course be done using server side includes or some other method, but TT makes it easy and also exposes a programmatic API, which makes its functionality and versatility as wide as the programmer’s skills. This only scratches the surface of what TT can actually do for you.
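To show the idea without TT2 itself, here is a small sketch using Python’s standard `string.Template` as a stand-in. This is not TT2 syntax and not how the site was built; the wrapper markup, page names and `render_site` helper are all invented for illustration. The point is only that the font lives in one place.

```python
from string import Template

# Shared wrapper, defined once. Changing the font here changes every page.
PAGE = Template(
    "<html><head><title>$title</title>"
    "<style>body { font-family: $font; }</style></head>"
    "<body>$body</body></html>"
)

# Hypothetical page content; a real site would keep these in separate files.
pages = {
    "index.html": {"title": "Home", "body": "<p>Welcome</p>"},
    "contact.html": {"title": "Contact", "body": "<p>Get in touch</p>"},
}


def render_site(font):
    """Render every page through the one shared wrapper."""
    return {
        name: PAGE.substitute(font=font, **fields)
        for name, fields in pages.items()
    }


site = render_site("Verdana, sans-serif")
print(site["contact.html"])
```

Calling `render_site` again with a different font regenerates every page with the change applied, which is the job a templating system does for you on a 10+ page site.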
The third tool is Dave Raggett’s HTML Tidy. This one tool is what saved me from going stark raving mad this weekend. Visually selecting an area in Vim and then running
'<,'>!tidy -asxhtml -icbq -wrap 100
was what kept me sane. This single command will take ANY HTML fragment and sanitize it for you. It adds a lot of guff that you may not want, but you can remove that and you have a sanitized version complete with CSS.
I just wanted the formatting, indenting and validation. I weeded out the CSS and I was left with a nice plain HTML document that I was then able to understand, rather than some debauchery of a mess the devil would not have started with.
Using Tidy this way is a great way to get a clear place to start when converting a messy HTML page.
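If you are not a Vim user, the same Tidy invocation can be driven from a script. The sketch below is one possible way to do it in Python and assumes the `tidy` binary is installed and on your PATH; the function names are made up, and the flags are simply the ones from the Vim filter above.

```python
import shutil
import subprocess


def tidy_command(wrap=100):
    """Build the same Tidy invocation as the Vim filter above."""
    return ["tidy", "-asxhtml", "-icbq", "-wrap", str(wrap)]


def tidy_fragment(html, wrap=100):
    """Pipe html through Tidy and return whatever it writes to stdout.

    Tidy exits with status 1 for warnings and 2 for errors, so the exit
    code is deliberately not treated as a failure here.
    """
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy is not installed or not on PATH")
    result = subprocess.run(
        tidy_command(wrap), input=html, capture_output=True, text=True
    )
    return result.stdout


# Only try the example when Tidy is actually available.
if shutil.which("tidy"):
    print(tidy_fragment("<p>messy <b>markup"))
```

As with the Vim version, the output includes a full document wrapper and generated CSS, so expect to trim it down afterwards.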
Last but not least are the W3C’s validator pages for both CSS and XHTML. After all the grunt work is over it’s time to check the pages, and using the methods above I managed to come in with:
Out of 29 pages:
20 HTML errors
2 CSS errors
This took me about 30 minutes to fix!

HTML Validation

I’m fairly lazy when it comes to validating my own site. I mean, who can be arsed making edits and then validating them all every time 😉
I know there are plenty of people who do it but I am not one of them. I normally check to make sure that it looks OK and that’s about it. I am not even that concerned about how it displays in Internet Explorer (I have minimal real visitors a month and the rest is blog spam touting Viagra). This is because I use Debian almost exclusively at work and at home, and it is a major pain in the ass to check the Windows side of things.
What I have tried to do is be quite strict with myself when I am making edits to my website. What this has resulted in is:
I checked 18 pages of my website and found 5 errors (all silly) all of which were on one page and caused by character references.
For those that have used the W3C validator this is not bad going at all. I know the purists will still think this is crap and that all HTML/XHTML should validate all the time. I believe this would be great too but unfortunately some of us have a life to lead outside the webosphere.
For those that always mean to get around to validating their websites but never do then my final word on HTML Validation is this:
“If you can’t validate religiously, at least edit diligently”
How can I say this? Well, it takes more skill to get it right first time than to correct it after you have been shown your mistakes!!!
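On the character reference errors mentioned above: a common cause of that kind of validator complaint is a bare `&` in text or in a URL that should have been written as `&amp;`. Assuming that is the sort of mistake involved (it may not be the whole story), a quick pre-check could look like the sketch below. The regular expression and the `bare_ampersands` helper are my own rough approximation, not what the W3C validator actually does.

```python
import re

# An '&' that is not the start of a named, decimal or hex character reference.
BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def bare_ampersands(html):
    """Return (line_number, line) for each line containing a suspect '&'."""
    return [
        (number, line)
        for number, line in enumerate(html.splitlines(), start=1)
        if BARE_AMP.search(line)
    ]


# Hypothetical markup: only the link on the second line has a bare '&'.
sample = (
    "<p>Fish &amp; chips</p>\n"
    '<a href="page?a=1&b=2">link</a>\n'
    "<p>&#169; &copy;</p>"
)

for number, line in bare_ampersands(sample):
    print(number, line)
```

Running something like this before uploading is one cheap way to edit diligently without validating every page every time.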

Welcome to the Collective! Resistance is futile!!!

I bet you all thought that Star Trek and the Borg were some pipe dream. Well, not any more.
I am sure people are wondering why anyone would want to wear such a contraption, but surely we would have said that about what is now the humble mobile phone earpiece just a few years ago. It’s coming:
RESISTANCE IS FUTILE!!!
YOU WILL BE ASSIMILATED!!!

Catholic teenagers are sexually frustrated

and they are seeing sex in everything, including fruity sweetie wrappers.
This made me howl. If someone had not pointed it out to me I would never have noticed it, but it would appear that the graduates from St Blasien Jesuit College, near Freiburg, are seeing sex in everything. It sounds to me like they are the perverts if they are able to see two imaginary characters on a sweetie wrapper having sex!
Sometimes some people just go too far!