Robots.txt file

There appears to be some misunderstanding surrounding the usage of the robots.txt file.
The following is just a fraction of the stuff I have found while spidering websites.
Noarchive: /
The “noarchive” statement belongs in a robots meta tag, not in the robots.txt file. It’s not part of the standard.
I believe that “Crawl-delay”, or something similar, should be in the standard but it isn’t yet.
Crawl-Delay: 10
Crawl-delay: 1
Crawl-delay: 60
It is implemented by a few crawlers, but people insist on doing the following:
User-agent: *
Crawl-delay: 1
The proper way should be as follows,
User-agent: Slurp
Crawl-delay: 1
I know Yahoo’s crawler (Slurp) adheres to the Crawl-delay directive, but here we are endorsing a non-standard method. Whether this is a good or bad thing is left up to the reader to decide. Having been hammered once by MSN’s bot, I think there needs to be a delay type option in the robots.txt file.
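For what it’s worth, here is a rough sketch of how a spider written in Python might look up a per-agent Crawl-delay, using the standard library’s urllib.robotparser (its crawl_delay() method needs Python 3.6 or later). The robots.txt text and the “SomeOtherBot” name are made up for illustration, and this is only one way of doing it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a delay for Slurp only, nothing off limits for anyone else.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 1

User-agent: *
Disallow:
"""


def delay_for(agent, robots_txt=ROBOTS_TXT):
    """Return the Crawl-delay (in seconds) that applies to `agent`, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)


slurp_delay = delay_for("Slurp")         # the group that names Slurp applies
other_delay = delay_for("SomeOtherBot")  # falls back to the * group, which sets no delay
```

A spider would then sleep for that many seconds between requests whenever the value is not None.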
Then we have the people who think that they need to authorize a spider to spider their website.
Allow: /pages/
Allow: /2003/
Allow: /services/xml/
Allow: /Research/Index
Allow: /ER/Research
Allow: /
The reason for not having an Allow directive is simple. Hardly any of the internet would be indexed, because only a fraction of the websites online actually use a robots.txt file. Implementing an Allow directive would mean that websites are closed for business to the spiders unless they say otherwise. For instance, take the following directive
Allow: /index_me/
Is the spider then to assume that only that directory is available to it on the entire website? What can the spider assume about the above directive? To me it reads that only the “index_me” directory is to be indexed. What then is the point of the Disallow directive?
The Disallow directive was chosen because the internet is, for all intents and purposes, a public medium. We all opt in when we put our websites up, then we opt out of the things we don’t want indexed.
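The opt-out idea can be sketched with Python’s standard urllib.robotparser module: with no rules at all everything is fair game, and a Disallow line removes just the part you name. The example.com URLs, the “/private/” directory and the “AnyBot” name are invented for the example.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, agent, url):
    """True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical site that opts out of one directory only.
RULES = "User-agent: *\nDisallow: /private/\n"

no_rules_ok = allowed("", "AnyBot", "http://example.com/anything.html")
public_ok = allowed(RULES, "AnyBot", "http://example.com/index.html")
private_ok = allowed(RULES, "AnyBot", "http://example.com/private/notes.html")
```

Only the last of the three comes back False, which is the whole point: everything is in unless you say it is out.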
My favorites though are the following: the honest mistakes.
Diasallow: /editor/
Diasallow: /halcongris/
Disallow /katrina/
Diasallow: /reluz/
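Mistakes like these are easy to catch before a spider ever sees them. Below is a minimal sketch of a checker in Python. It treats only User-agent and Disallow as standard (as this post does), so it would also flag Allow and Crawl-delay. The function name lint_robots and the sample text are my own invention, not part of any library.

```python
# Directives from the original robots.txt standard; everything else gets flagged.
STANDARD = {"user-agent", "disallow"}


def lint_robots(text):
    """Return (line_number, message) pairs for lines a strict parser would skip."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing colon"))
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in STANDARD:
            problems.append((number, "unknown directive: " + field))
    return problems


SAMPLE = """\
User-agent: *
Diasallow: /editor/
Disallow /katrina/
Disallow: /tmp/
"""

report = lint_robots(SAMPLE)
```

For the sample above the report picks out line 2 (the misspelt directive) and line 3 (the missing colon) and leaves the two good lines alone.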

Copyright by Blunderbuss or Creative Commons

I had heard of Prof. Lessig from general browsing on the internet, so I know he’s got some clout with the online community, blogosphere, whatever you want to call it, but I had never really taken the time to find out what he does that seems to cause such a stir. He seems to have an almost religious following in some circles, so I thought that I should go and see just exactly what all the fuss is about.
I had heard of the Creative Commons before, so off I went, umbrella in hand, to University College London’s Edward Lewis Theatre and grabbed myself a seat. I immediately recognized him because I had visited his website for a general nosey before I went to the talk.
This is just the way I heard it I am sure I have probably got some of the ideas and concepts wrong 😉
I loved the way he started his talk, ie he took us back to the days when George Eastman was setting about pioneering the camera and how a law passed then enabled the camera business to flourish the way it did. He then described a few things that we take for granted, ie cultural remix (first time I had heard this phrase): the act of taking something like a song and putting your own spin on it, or having watched a movie, how we describe it to our friends and embellish it the way we see it. This goes on every day and there are no copyrights on this and there shouldn’t be.
He then moved this onto the digital age and casually pointed out that our cultural remix, which we take for granted every day, was now, in part, a digital phenomenon and no longer limited by distance. Kids today are growing up in this digital age and are making friends across the world without even meeting up, so our once limited cultural remix has set new boundaries on a global scale. The way we think, eat and speak and go about our business is now wrapped up online in this huge boiling ménagerie of digital stuff. People are expressing themselves in ways we would not have dreamed about a few years ago, ie we have a new age cultural remix going on and this is a good thing. What is not good is that we have the middle men, ie the lawyers, trying to stifle this. The lawyers and some corporations are doing this by making vast areas of our new remix illegal.
“Using DDT to kill a gnat”
(from memory, used by Prof Lessig in the talk, probably slightly misquoted)
was the way Prof Lessig described it, and this is wrong. It was quite clear that Prof Lessig believes in copyright, and so do I, but it was also clear that he does not believe in applying it blindly. The normal blunderbuss approach to copyright seems to get his goat and quite rightly so, it’s bloody stupid.
Anyway, the talk centered around the creative commons license and what it means to us and what we can use it for and why we need it.
At the moment everything written down is copyright to the author or creator of it, regardless of whether they have stuck the big C on it somewhere. This means that everything on the internet is copyright unless stated otherwise. People who want to use something they find on the internet, ie a DJ finding a sample from a song, cannot do so without permission from the owner of the copyright, so they have to get lawyers (middlemen) involved to sort out the legal stuff before they can carry on with their mixing. What the Creative Commons enables us to do is release a piece of work and mark it so that people know what they can and cannot do with it without having to get the lawyers involved, ie cutting out the middleman. I am all for this, it’s a wonderful idea.
Can I prove that it’s a wonderful idea? Yes I can. During the talk Prof Lessig played part of a track that had been released under a Creative Commons license, “My Life” by Colin Mutchler, which was then edited by Cora Beth, and the editing certainly added something to the track. It was brilliant. This is not an isolated incident either.
Anyway, I have just found some of the material from the talks online so your time would be better spent watching these flash movies than reading this.
You might also be interested in
Learning More
Creative commons website
Find CC content
George Eastman

Jobs via RSS

Some people really are missing the point when trying to use RSS to list jobs. I have noticed several sites posting the title of the job and a link but absolutely no description. I have seen others posting a few words of the description.
This is absolutely useless because it contains hardly any useful information for the person finding the job, and that is assuming they find it at all. Personally I won’t add a feed to UKlug unless there is a description with some helpful text.
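A check like this is easy to automate. The following is only a sketch of the idea in Python, not what UKlug actually runs: it assumes a plain RSS 2.0 feed without namespaces, and the thin_items name, the word-count cut-off and the sample feed are all made up.

```python
import xml.etree.ElementTree as ET

MIN_WORDS = 20  # hypothetical cut-off for a "helpful" description


def thin_items(rss_xml, min_words=MIN_WORDS):
    """Return the titles of items whose description is missing or too short."""
    root = ET.fromstring(rss_xml)
    thin = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        description = item.findtext("description", default="") or ""
        if len(description.split()) < min_words:
            thin.append(title)
    return thin


FEED = """<rss version="2.0"><channel><title>Jobs</title>
<item><title>Perl Developer</title><link>http://example.com/1</link></item>
<item><title>DBA</title><link>http://example.com/2</link>
<description>PostgreSQL DBA wanted in London</description></item>
</channel></rss>"""

flagged = thin_items(FEED, min_words=3)
```

With the tiny cut-off used here only the “Perl Developer” item is flagged, because it has a title and a link but no description at all.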

Gojobsite RSS feeds

I was using the gojobsite feeds for UKlug but for some reason the feeds seem to have gone out of date. I can only presume that the techs at gojobsite decided that it would not be worth their while to keep them.
I have sent a couple of emails to see if it would be possible to get the feed back, because out of the UK job sites gojobsite seems to have some of the better adverts and I liked using it on UKlug.
Unfortunately I have had no reply at all from them. Shame!

Spyware and Viruses

A friend asked me to have a look at her mom’s PC due to several problems that were driving her up the wall. The following is the list I was given.
1. McAfee had run out and was asking for reregistration and fees etc.
Cure: Uninstall and replace with AVG — free edition
Why: AVG is free and as good as McAfee.
2. Windows Messenger pop ups from the internet.
Cure: Disable the service in Services
Why: The Windows Messenger service should be disabled. For some reason Microsoft leaves it enabled for normal users, which is a bit silly in my book.
See Below for the cure.
3. Firewall.
Cure: Install the free edition of Zone Alarm
Why: It’s free and a great product.
Some other things that I had a look for when I was fixing the PC.
Spyware:
To check for spyware download SpyBot and it should find the most common ones. I found the following.
IE DSO Exploit
Wild Tangent
Alexa Toolbar
Media Plex
and a whole lot of tracking cookies. I know a lot of this stuff can be harmless but I consider anything that sends information over the internet without the user’s express permission to be spyware. Why AOL decided to use Wild Tangent I have no idea. It’s spyware in my book.
The IE DSO Exploit may not actually be a problem, due to a bug in SpyBot. I installed the Windows update that fixes the exploit but SpyBot was still throwing a wobbly. A quick Google showed that this is a known problem with SpyBot when Windows is properly up to date.
Service Pack 2
I downloaded this and installed it which should take care of some problems and no doubt introduce a few others.
HOW TO DISABLE WINDOWS MESSENGER
Windows XP Home
Click Start->Settings ->Control Panel
Click Performance and Maintenance
Click Administrative Tools
Double click Services
Scroll down and highlight “Messenger”
Right-click the highlighted line and choose Properties.
Click the STOP button.
Select Disabled or Manual in the Startup type drop-down list
Click OK
Windows 2000
Click Start-> Settings-> Control Panel-> Administrative Tools->Services
Scroll down and highlight “Messenger”
Right-click the highlighted line and choose Properties.
Click the STOP button.
Select Disabled or Manual in the Startup type drop-down list
Click OK
Windows XP Professional
Click Start->Settings ->Control Panel
Click Administrative Tools
Double click Services
Scroll down and highlight “Messenger”
Right-click the highlighted line and choose Properties.
Click the STOP button.
Select Disabled or Manual in the Startup type drop-down list
Click OK
Windows NT
Click Start ->Control Panel
Double Click Administrative Tools
Select Services-> Double-click on Messenger
In the Messenger Properties window, select Stop,
Then choose Disable as the Startup Type
Click OK
Windows 98 & ME
Windows Messenger Service cannot be disabled

Multiple users and X-Windows

I wanted to open an application today as the postgres user while logged in as my normal user account. I know that dropping access control to the X server can be a bit of a security risk but I also don’t like flicking between users to achieve a task. I don’t mind opening an xterm but logging in and out of X is not much fun.
Anyway, to get an application working using the insecure method we can do the following. User A is the main user and you want user B to be able to open an app in user A’s session.
A@machine:~$ xhost +
A@machine:~$ su - B
password *********
B@machine:~$ export DISPLAY=:0.0
B@machine:~$ /path/to/application/
This was easy but I don’t like using
A@machine:~$ xhost +
and allowing everyone access. This is not smart, so I decided to see if there is a more secure method that avoids it. Having a read of the xhost manual I found out that I should be able to limit access on a per host or per user basis as follows
A@machine:~$ xhost +B@
which gives me a lovely error message as seen below. I have tried various different methods but I get the same error message.
B@ being added to access control list
X Error of failed request: BadValue (integer parameter out of range for operation)
Major opcode of failed request: 109 (X_ChangeHosts)
Value in failed request: 0xfe
Serial number of failed request: 7
Current serial number in output stream: 9
so it would appear to me that there is something amiss somewhere. I googled for quite a while to see if I could find a definitive answer. No joy, they all recommended using xhost + which is not what I want to do.
Simple things like this can be such a bloody chore under Linux. I know, I know stop bitching and start patching.
Anyway. I can remember doing something similar to what I want with ssh so I had a look at the man page and found the following snippet
-X Enables X11 forwarding. This can also be specified on a per-host
basis in a configuration file.
X11 forwarding should be enabled with caution. Users with the
ability to bypass file permissions on the remote host (for the
user’s X authorization database) can access the local X11 display
through the forwarded connection. An attacker may then be able
to perform activities such as keystroke monitoring.
This meant that I could do the following.
A@machine:~$ ssh -X B@machine
password ********
B@machine:~$ /path/to/application
and I get the window displayed. Remember that you need to edit the
/etc/ssh/sshd_config
file and set X11Forwarding to yes. This is more secure than using xhost +. It is still not ideal, but it is good enough for what I want it for.

Fun With the Gimp

I have been meaning to learn some sort of image manipulation tool for quite some time. I have used the Gimp in the past when I want to scale something or convert images from one format to another but I have never had the time or inclination to sit down and learn enough of it to be reasonably confident.
I have just had a long weekend so I decided to learn it because, like it or lump it, it’s a very handy tool in any web developer’s arsenal. I mainly do backend work but there are occasions when I envy the versatility of other web developers when putting together sites. If you want to see the sort of stuff I mean, have a look at The Zen Garden. Some of the pages on that site are just divine.

Sans Serif or not to Sans Serif

I am pretty damned sure that fonts are one of the most used yet least appreciated aspects of computing I have come across.
It would be fine to interject here and say NO, surely it’s “blah de blah”. I say no…… People in general don’t give a damn about how machines work / talk / do their stuff, however they are concerned with how stuff looks.
For instance, look at the fashion industry. Clothes are expensive and the workmanship is generally crap, but who cares, it’s a “scurgly! made by whatsisname”, and it looks good.
It’s the same with computers. Look at the Mac: previously thought the underdog, now it’s almost a fashion statement to own one.
Anyway, back to fonts. I am not really that bothered about what fonts are on my machine as long as I can read the text without squinting too much, but just the other day it was noted that I was using “Sans Serif” fonts and I was lacking in “Serif” fonts.
Well… I just shit my pants, what the hell was I missing? Apparently I wasn’t missing much. Serif fonts add an extra bit to the letters (the little strokes on the ends), sans serif fonts leave them off, and I was missing the option of having them on everything I read.
To cut a long story short I needed to get some Microsoft fonts installed on my Debian box and this is the process.
First thing I needed was the Microsoft TrueType fonts. On Debian these are called
msttcorefonts
and can be installed via “aptitude” or apt whichever you prefer. I fetched these and this installed a whole pile of stuff in
/usr/share/fonts/truetype/
Next thing I needed to do is create a “fonts.scale” file as ROOT
]$ cd /usr/share/fonts/truetype ; ttmkfdir;
Sorted. The directory should now have the correct file. The next thing to do is restart either the font server or X11. I just logged out and back in and that was it. New fancy fonts that are almost identical to the ones I had previously, or at least on first inspection they are.

Remapping Keys in Linux

I never really had to do this before but I have been using enlightenment for a few months now and use the ALT-(q,w,e,a,s,d,z,x,c) to switch between 9 separate desktops. I find this quite quick but I tried xemacs the other day and I needed ALT-x but I kept switching between desktops. So I decided to use a different key.
Since the Windows key, ie the one to the left of the left ALT button, is never used, I decided to use it to switch desktops instead (I know I will need to use the ALT key in other applications so I might as well do it now before it’s ingrained into me).
Anyway the procedure goes as follows.
1. Detect the keycode of the key you want changed.
Open a terminal and type
]$ xev
Lots of text should whizz past on the terminal and then stop. Touch the key you want the keycode of and more text should whizz past…. something like
KeyRelease event, serial 25, synthetic NO, window 0xe00001,
root 0x38, subw 0x0, time 24436498, (-298,301), root:(458,320),
state 0x0, keycode 115 (keysym 0x0, NoSymbol), same_screen YES,
XLookupString gives 0 bytes: ""
We can see that the keycode is 115.
2. Create a user modmap file to be loaded on starting X
]$ cd ~; touch .Xmodmap
and put the following text in it
keycode 115 = Hyper_L
add mod3 = Hyper_L
then edit your ~/.xsession file: Add
xmodmap .Xmodmap
before the “exec enlightenment” line and that should be you sorted. If for some reason you did not have a .xsession file then you will need to add the line to start whatever window manager you are using ie in my case my .xsession looks like
xmodmap .Xmodmap &
exec enlightenment
yours might well be
xmodmap .Xmodmap &
exec sawfish
You should now be able to use the MOD3 modifying key in e16keyedit as a modifier key after restarting xwindows.

Vim Folding and Perl

I decided to get function folding working today and discovered that it is relatively simple to set up unless Vim isn’t detecting your filetype correctly which it wasn’t in my case.
To get basic Folding for Perl working add
let perl_fold=1
let perl_fold_blocks = 1
to your .vimrc file and then open a .pl file and you should see lots of blue lines running across the screen. These are where the folds have been made and you should see a line count similar to
+-- 24 lines: sub summit_sub {--------------------------------
Put the cursor on this line and press “za”. This will magically unfold the line. Pressing “za” again refolds the line.
I don’t like the perl folding defaults so I decided to run with the manual ones, but my filetype was always wrong when working on a module. It was inserting the c-style foldmarker, ie “/*}}}*/”, instead of the perl foldmarker #}}}
This was easily remedied as follows
au BufRead,BufNewFile *.pm set filetype=perl | set commentstring=#%s
I now have folding working. Now all I need to do is decide what type of folding I prefer.