Diligent Editing of HTML

I am a fan of standards, i.e. XHTML Transitional/Strict etc. To this end I do try to make sure that I am keeping my own sites reasonably compliant. Sites I do commercially are always 100% compliant, but that’s because I insist on it and the clients have placed their trust in me.
Just recently I have had to convert a really bad site to XHTML Transitional, and if you had seen the markup you would have realized how big this task was. Going through it by hand would have been an enormous job, and quite frankly I would have been unable to do it at the price I quoted without the following tools:
1. Vim (Bram Moolenaar)
2. Template Toolkit TT2 (Andy Wardley)
3. HTML Tidy (Dave Raggett)
4. W3C Validator (The W3C Validator Team)
The first tool (Vim) could really be any good text editor, e.g. Emacs, ed, or any of the vi children. I just happen to use Vim; once you have learned the basics it is a joy to use and makes editing text almost an art.
TT2, the second tool, is slightly more specialized and less well known but just as easy to use, and it deserves a big mention. TT2 is a templating system. Most people won’t really understand or even need to know what its advantages are until they have to maintain a 10+ page website and someone wants to change a font on some item on all the pages. This could of course be done using server side includes or some other method, but TT not only makes it easy, it also exposes a programmatic API, which makes its functionality and versatility as wide as the programmer’s skills. This only scratches the surface of what TT can actually do for you.
The third tool is Dave Raggett’s HTML Tidy. This one tool is what saved me from going stark raving mad this weekend. Visually selecting an area in Vim and then running
:'<,'>!tidy -asxhtml -icbq -wrap 100
was what kept me sane. This single command will take ANY HTML fragment and sanitize it for you. It adds a lot of guff that you may not want, but you can remove that and you have a sanitized version complete with CSS.
I just wanted the formatting, indenting and validation. I weeded out the CSS and I was left with a nice plain HTML document that I was then able to understand rather than some debauchery of a mess the devil would not have started with.
Using Tidy this way is a great way to get a clear place to start when converting a messy HTML page.
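For anyone who wants the same clean-up outside the editor, here is a minimal sketch in Python. It assumes the tidy binary is installed and on your PATH; the function names tidy_command and tidy_fragment are made up for this example and are not part of Tidy itself. It spells out the -icbq bundle using Tidy's long option names so it is clearer what each letter does.

```python
import shutil
import subprocess


def tidy_command(wrap=100):
    """Build the argument list for the Tidy call used above (hypothetical helper)."""
    return [
        "tidy",
        "-asxhtml",         # convert the input to XHTML
        "-indent",          # -i: indent element content
        "-clean",           # -c: replace presentational tags with CSS
        "-bare",            # -b: strip smart quotes and similar characters
        "-quiet",           # -q: suppress non-essential output
        "-wrap", str(wrap)  # wrap text at the given column
    ]


def tidy_fragment(html, wrap=100):
    """Pipe an HTML fragment through Tidy and return the cleaned markup."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy is not installed or not on PATH")
    result = subprocess.run(
        tidy_command(wrap), input=html, capture_output=True, text=True
    )
    # Tidy exits with 0 when clean, 1 when it only had warnings, 2 on errors.
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout
```

Calling tidy_fragment("<p>some <b>messy<p>markup") should hand back a complete XHTML document, guff included, which you can then weed out as described above.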
Last but not least are the W3C’s validator pages for both CSS and XHTML. After all the grunt work is over it’s time to check the pages, and using the methods above I managed to come in with:
Out of 29 pages:
20 HTML errors
2 CSS errors
This took me about 30 minutes to fix!

HTML Validation

I’m fairly lazy when it comes to validating my own site. I mean, who can be arsed making edits and then validating them all every time 😉
I know there are plenty of people who do it but I am not one of them. I normally check to make sure that it looks OK and that’s about it. I am not even that concerned about how it displays in Internet Explorer ( I have minimal real visitors a month and the rest is blog spam touting Viagra ). This is because I use Debian almost exclusively at work and at home and it is a major pain in the ass to check the Windows side of things.
What I have tried to do is be quite strict with myself when I am making edits to my website. What this has resulted in is:
I checked 18 pages of my website and found 5 errors (all silly) all of which were on one page and caused by character references.
For those that have used the W3C validator this is not bad going at all. I know the purists will still think this is crap and that all HTML/XHTML should validate all the time. I believe this would be great too but unfortunately some of us have a life to lead outside the webosphere.
For those who always mean to get around to validating their websites but never do, my final word on HTML validation is this:
“If you can’t validate religiously, at least edit diligently”
How can I say this? Well, it takes more skill to get it right first time than to correct it after you have been shown your mistakes!

Welcome to the Collective! Resistance is futile!!!

I bet you all thought that Star Trek and the Borg were some pipe dream. Well, not any more.
I am sure people are wondering why anyone would want to wear such a contraption, but surely we would have said the same about what is now the humble mobile phone earpiece just a few years ago. It’s coming:
RESISTANCE IS FUTILE!!!
YOU WILL BE ASSIMILATED!!!

Catholic teenagers are sexually frustrated

and they are seeing sex in everything, including fruity sweetie wrappers.
This made me howl. If someone had not pointed it out to me I would never have noticed it, but it would appear that the graduates from St Blasien Jesuit College, near Freiburg, are seeing sex in everything. It sounds to me like they are the perverts if they are able to see two imaginary characters on a sweetie wrapper having sex!
Sometimes some people just go too far!

Dreamweaver is shit

Or at least my perception of it has been tainted by a website I am attempting to maintain that has been bolted together using Dreamweaver. Note I did not use the word “constructed” or “built”. I prefer bolted because it’s a mess.
First:
JavaScript is everywhere, most of which is bug-ridden crap. It’s being used to load images and has replaced the humble “link” on half the website. This has meant Google cannot see half the website, which is critical from a business point of view. If the search engine cannot see your website then no one will find it!
Second:
Images everywhere. Every time a page was requested, over 40 images were requested from the server. This is mad on what appears to be a plain text website with no adverts. 25% of the images happened to be used as 1 pixel spacers. This is absolute madness!
MAD MAD MAD BLOODY MAD
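If you want to put numbers on this sort of mess yourself, a rough sketch along these lines would do. It uses only Python's standard html.parser; the PageAudit class and audit function are names invented for this example, and treating any image with a width or height of "1" as a spacer is my own assumption about how to spot them.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Count images, likely spacer images and JavaScript-only links in a page."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.spacers = 0
        self.js_links = 0
        self.plain_links = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            self.images += 1
            # Assumption: a width or height of exactly "1" marks a spacer.
            if attrs.get("width") == "1" or attrs.get("height") == "1":
                self.spacers += 1
        elif tag == "a":
            href = (attrs.get("href") or "").strip().lower()
            if href.startswith("javascript:"):
                # A crawler has no real URL to follow here.
                self.js_links += 1
            elif href:
                self.plain_links += 1


def audit(html):
    """Parse a page and return the filled-in PageAudit object."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    return parser
```

Feed it the source of a page and compare spacers against images, and js_links against plain_links, to see how much of the page is padding and how much of the navigation a search engine has any chance of following.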

Political Bias swayed by Moon

The subject heading of this entry sounds a bit mad, doesn’t it? I mean, who the hell would believe that the position of the moon could possibly affect the outcome of an election? It doesn’t, but there are those characters who are basing their election decision on the design of Mr Kerry’s and Mr Bush’s websites. Now isn’t that fscked up? For those that don’t believe me, take yourself over to Slashdot and have a look around……
Does this mean we are seeing the entrance of the designer website? I can see it now:
1. Websites by Gucci
2. Menus by Prada
3. Footers by Nike
Or, as a dialogue!
Manager
“Ohhh, love your hit counter”
Webmaster
“Yeah! We got Armani in to do it, worth every penny!”
Who the hell could possibly be that shallow?
Wait, we have Hello magazine, Cosmopolitan ( feminist trash ), FHM and Eurotrash. That answers that question: the brain dead.

George W. Bush is an Idiot

As far as I am aware he now prevents foreign users from visiting and viewing his website. I am not making this up. Unless you are on a North American IP range you are forbidden from viewing his website.
This is the most powerful man on the planet, who has more effect on foreign governments and their economies than some of the local governments do, yet if you ain’t American ( an infidel ) you are not allowed to view his website.
The reason for this dumb-ass decision is apparently that his website got cracked a few times. Do they really think that banning mass IP ranges is going to stop a real cracker? Bollix. It’s not hard to crack another PC from inside their borders and then launch from there.
All this episode has done is make him and his administration look like people who don’t care about us foreigners. But then, why should he care now? He hasn’t really given a damn before.
I hate getting involved in politics but some things are just too dumb to abstain from commenting on them.

Blog Shares

I saw the Blog Shares website several months ago, before I had a blog, and wondered what it was all about. I have just noticed that I have an entry on it.
Mad. So what makes my blog worth more money then 😉 ?

Lexicon

I have started the process of building the lexicon for my search engine. It’s actually surprising how slowly the list of words grows. This is partly due to me being quite strict in my definition of what constitutes a word. A normal search engine would need to be able to work with all sorts of arbitrary strings (I am not even considering encodings yet) but due to hardware constraints I have limited myself to Perl’s
m/\w/
If it doesn’t match this it won’t go in the lexicon. I know this is a bit harsh but unfortunately I don’t have several hundred machines in a cluster to play with like the other search engines ;). I think if I get over one million terms in the lexicon I will be doing OK.
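To give a feel for how strict that filter is, here is a small sketch of the idea in Python rather than Perl. It rests on a few assumptions of mine: that a term is a maximal run of \w characters (as written, m/\w/ only asks for one word character somewhere in the string), that terms are lowercased, and that each new term is given the next free integer ID. The name add_to_lexicon is invented for the example.

```python
import re

# ASCII-only \w: letters, digits and underscore. Assumption: a term is a
# maximal run of such characters.
WORD = re.compile(r"\w+", re.ASCII)


def add_to_lexicon(lexicon, text):
    """Add every new term found in text to the lexicon, mapping term -> ID."""
    for term in WORD.findall(text.lower()):
        if term not in lexicon:
            # Hypothetical ID scheme: next free integer.
            lexicon[term] = len(lexicon)
    return lexicon
```

Repeated words never add a new entry, so most of any new page is made up of terms the lexicon has already seen, which goes some way to explaining why the list grows so slowly.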