I have been writing a word parser in C++ for a while now and noticed two things that had quite a distinct effect on performance. I was storing the words in a “map”, which is a standard associative container. I knew I should have been using a hash_map, but unfortunately when writing the parser I only had access to a minimal Borland library on a Windows box, so hash_map was out of the question.
When I got home, however, and back on the Linux box, I had a poke around and discovered “ext/hash_map”, which is not officially part of the C++ standard, but it's widely used and so, in my eyes, OK.
This change made a marked improvement once the word list grew to just over a few thousand words. With a word list in excess of a few hundred thousand, the improvement in performance was vast.
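To make the container swap concrete, here is a minimal sketch rather than the parser's actual code. It assumes a C++11 compiler and uses std::unordered_map, the standardized successor to ext/hash_map, in place of the extension header. The names count_words, TreeCounts and HashCounts are made up for illustration.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <unordered_map>

// Count how often each whitespace-separated word occurs in `text`,
// storing the counts in whichever associative container is requested.
// (count_words is a hypothetical helper, not part of the original parser.)
template <typename Container>
Container count_words(const std::string& text) {
    Container counts;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        ++counts[word];  // operator[] inserts a zero count on first sight
    }
    return counts;
}

// Tree-based container: lookups cost O(log n) string comparisons.
typedef std::map<std::string, std::size_t> TreeCounts;

// Hash-based container: lookups are O(1) on average.
// Assumption: std::unordered_map stands in for ext/hash_map here.
typedef std::unordered_map<std::string, std::size_t> HashCounts;
```

Calling count_words<TreeCounts>(text) and count_words<HashCounts>(text) on the same input gives the same counts, so the two are easy to time against each other. The hash version does not keep the words in sorted order, which only matters if the parser relies on iterating them alphabetically.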
Another improvement was in the way I was reading files into a single string object before parsing the string for its words. I was originally using “getline” and appending each line to the string, but this is slow even if you reserve space for the string.
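For reference, the line-by-line approach described above looks roughly like the sketch below. It is a reconstruction under assumptions, not the original code: read_by_lines and its size_hint parameter are hypothetical names. Part of the cost is that every line is copied twice, once into the temporary line buffer and once into the result.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Read a whole stream line by line, appending each line to one string.
// `size_hint` is a hypothetical parameter: an estimate of the input size
// used to reserve space up front. Works with std::ifstream or any istream.
std::string read_by_lines(std::istream& in, std::size_t size_hint = 0) {
    std::string contents;
    contents.reserve(size_hint);
    std::string line;
    while (std::getline(in, line)) {
        contents += line;   // second copy of the line's characters
        contents += '\n';   // getline strips the delimiter, so put it back
    }
    // Note: a final line without a trailing newline still gets one appended.
    return contents;
}
```

With a file this would be called as std::ifstream file("words.txt"); followed by read_by_lines(file), where words.txt is only a placeholder file name.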
If you are looking for a faster, more elegant way to read a file into a string, look no further.