My little boy is only four; he is locked into his passion for trains, but he just keeps amazing me.

Before we get to the analysis, I need to introduce two things.

I haven't been able to find out much about Dr. Fry himself on the internet, but many educational websites cite the fact that he created a list of the most popular and common English words in literature, originally in the 1950s but since updated.

To interpret the chart: each bar corresponds to a Levenshtein distance, and its height tells you how much of the Fry list differs from the target word by that amount.



He observed all the safety rules, even though his boundless enthusiasm was constantly threatening to break into uncontrollable excitement. And that's when it hit me.

Phonics concerns the systematic pronunciation of the component sounds of a word to reach the whole. I plotted the results for "one", "many" and "who", all identified as "trip-up" words, plus "galaxy" and "knowledge", identified as easily recalled words.

Or, conversely, a greater cognitive load is required to recognise it uniquely. Confusability, in various forms, is a factor we have to deal with regularly in speech recognition, which prompted my thinking.

In the case of my own son, it's "whole word". To perform the analysis, I took a set of "sample words" and calculated the Levenshtein distance between each of those words and every word in the "Fry Sight List". So, I was eventually moved to perform some kind of analysis investigating this.
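For anyone who wants to try something similar, here is a minimal Python sketch of that comparison. It is my own illustration, not the exact code behind the charts. It assumes the word list is available as a plain Python list; `FRY_SAMPLE` below is a short hypothetical stand-in made of a few very common words, not the real Fry list, and the helper names are made up for this example.

```python
def levenshtein(a: str, b: str) -> int:
    # Dynamic-programming edit distance; each insert/delete/substitute costs 1.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


# Hypothetical stand-in for the Fry Sight List; substitute the real list here.
FRY_SAMPLE = ["the", "of", "and", "a", "to", "in", "is", "you", "that", "it"]
SAMPLE_WORDS = ["one", "many", "who", "galaxy", "knowledge"]


def distances_to_list(word: str, word_list: list[str]) -> list[int]:
    """Edit distance from `word` to every entry in `word_list`, in list order."""
    return [levenshtein(word, other) for other in word_list]


results = {w: distances_to_list(w, FRY_SAMPLE) for w in SAMPLE_WORDS}
for word, dists in results.items():
    print(word, dists)
```

Each sample word ends up with one distance per list word, which is the raw material for the distributions plotted later.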

We might postulate that the more similar a word is to others, the more likely it is to be confused.

Whole-word does what it says on the tin: the reader either memorises or recognises the whole word in one step.

One of the things I have noticed with my own son, and in lots of comments from other parents of early readers, gifted and potentially hyperlexic children, is that such children astonishingly recognise long, complex words such as "galaxy" and "knowledge" with ease, yet sometimes, perhaps even often, get tripped up on short, "simple" words such as "one" and "many".

He is my startling, fragile little whirlwind.



Next we meet Mr. Levenshtein, or at least his algorithm, which calculates the minimum number of single-character edits (insertions, deletions or substitutions) needed to transform one word into another.

To put that another way, it gives a measure of word similarity: small Levenshtein distances between words mean they are more textually similar than words with large distances.
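For the curious, here is a minimal Python sketch of the standard dynamic-programming way of computing it (often called the Wagner-Fischer algorithm). This is an illustration of the idea rather than the code I used; every insertion, deletion and substitution costs one edit.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn `a` into `b`."""
    # previous[j] holds the distance between the first i-1 letters of `a`
    # and the first j letters of `b`.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (free if letters match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("one", "the"))   # 2
print(levenshtein("many", "any"))  # 1
```

So "many" and "any" are a single deletion apart, while "one" and "the" need two substitutions.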

He told passers-by all about them, whether they wanted to know or not!

It's possible to come up with lots of theories involving visual processing disorders, dyslexic conditions, motivation, laziness and so on. I compared the sample words against the full Fry list and also against the top words of the list, and plotted the distribution of Levenshtein distances obtained.
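A rough sketch of how such a distribution could be tallied, again in Python and again only an illustration: for each sample word it counts what fraction of the list sits at each distance, which is what the bar heights represent. I don't know which plotting tool suits you, so this prints a crude text histogram instead. `FULL_LIST` is a hypothetical stand-in of a few common words, not the real Fry list or its top-words subset.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    # Dynamic-programming edit distance; each insert/delete/substitute costs 1.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def distance_distribution(word: str, word_list: list[str]) -> dict[int, float]:
    """Fraction of `word_list` lying at each edit distance from `word`."""
    counts = Counter(levenshtein(word, other) for other in word_list)
    return {d: counts[d] / len(word_list) for d in sorted(counts)}


# Hypothetical stand-in: a few very common words, NOT the real Fry list.
# Substitute the full list (or its top words) here.
FULL_LIST = ["the", "of", "and", "a", "to", "in", "is", "you", "that", "it"]

for word in ["one", "many", "who", "galaxy", "knowledge"]:
    print(word)
    for distance, fraction in distance_distribution(word, FULL_LIST).items():
        # Crude text "bar": one '#' per 5% of the list at this distance.
        print(f"  {distance:2d} {'#' * round(fraction * 20):<20} {fraction:.0%}")
```

Running it once with the full list and once with just the top words gives the two sets of distributions to compare.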

My wife pinpointed it later for me: just an overwhelming sense of protection, not just for now, but probably for all that is to come. What this effectively tells us is "how similar is the target word to the most common words in the language". I don't know if this is original or even valid research, but it was fun to do. This article doesn't claim to be a valid scientific study; nonetheless, it was interesting to perform, essentially as a thought experiment.

The first thing to be aware of is that there are two broad types of reading and reading-teaching methods: phonics and "whole word" (or whole language). Since whole-word readers essentially memorise and recognise entire words, it raises the question: given that they handle complex words with ease, why do they sometimes get tripped up on short words? As adults we tend to read like this.

Compared against the top words, we see that "one", "many" and "who" cluster around Levenshtein distances of 3, 4 and 5. If these are the most common words that a child is going to encounter, then it seemed to make sense to evaluate what levels of "confusability" exist among them.

In contrast, "galaxy" typically differs by around 6 to 7 characters, and "knowledge" by even more, around the 8 to 9 mark. Also, my Sanskrit name is Nikhil, which means "complete".

The question is, what is the explanation for this, as it seems to defy logic? It doesn't necessarily tell us how similar words are through the eyes of a child.



In my anecdotal experience, the most startling early readers are "whole word" readers, because even at age 3 or 4 they can decode obscure words of 8, 10, 12 or more letters instantly.