Word frequency text profiling can be used in many ways to support teaching, learning and research. The Word Profiler compares all the words in a text with two word frequency lists and provides a visual profile of how these words are distributed in the text by printing the different frequency bands in different colours.
Words which are contained in the first list of most frequent words are left in the default text colour. Words which are found in the second word list (see below) are printed in red, and words which are not in either of the lists are printed in blue. The off-list words are listed separately, and this list will contain new or unfamiliar words, as well as genre-specific words. The analysed texts and wordlists are saved and can be viewed here.
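The banding described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Word Profiler's own code: the function names (`tokenize`, `profile`, `off_list_words`), the band labels and the simple tokenising rule are all assumptions made for the example.

```python
import re


def tokenize(text):
    """Lower-case the text and pull out alphabetic words (assumed rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def profile(text, first_list, second_list):
    """Label every word in the text with the band it falls into."""
    first, second = set(first_list), set(second_list)
    bands = []
    for word in tokenize(text):
        if word in first:
            band = "list1"      # left in the default text colour
        elif word in second:
            band = "list2"      # printed in red
        else:
            band = "off-list"   # printed in blue
        bands.append((word, band))
    return bands


def off_list_words(bands):
    """Return the distinct off-list words in alphabetical order."""
    return sorted({word for word, band in bands if band == "off-list"})


# Tiny made-up word lists, purely for demonstration.
sample = "The goose swallowed the carbuncle"
bands = profile(sample, first_list=["the"], second_list=["goose"])
print(bands)
print(off_list_words(bands))
```

Checking the first list before the second means that a word appearing in both lists is counted in the first band only.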
Sample Profiles for The Blue Carbuncle by Arthur Conan Doyle
A comparison of the profiles for The Blue Carbuncle
Total number of words parsed in this text = 7819
1. Profile a text by comparing its words with the MFWL 1-2K and MFWL 2-5K
This analysis contrasts words found in the Most Frequent Word Lists built from the Brown Corpus with Concapp for Windows. The lists are based solely on word counts produced by the Unique Words Profiler, which lists the number of instances of each word (the Brown Corpus comprises 1,015,945 words, of which 47,198 are unique). The start of the list is as follows:
Profile 1 gives the lowest number of off-list words (909).
2. Profile a text against the Most Frequent Word Families in Academic English (MFWL K1 - MFWL K2)
This analysis contrasts words found in the first (1 - 1000, K1) and second (1001 - 2000, K2) most frequent word family lists in Academic English, which were developed by Paul Nation of the School of Linguistics and Applied Language Studies at Victoria University of Wellington, New Zealand. These lists are more sophisticated than the lists created with the Brown Corpus, as they contain not only the high frequency words themselves but also derived words which may in fact not be used so frequently. For instance, the word ACCEPT is listed in the K1 first 1000 most frequent word families, and the derived words are listed as follows:
To see the differences in the off-list wordlists found in the three profiles, see the comparison of lists page. The figures show a bigger difference in the number of words in red, that is, words from the second MFW list. In fact, more words are found in the second MFW list for Profile 1 than in the K2 list used in Profile 2, although K2 contains more words (3711) than the second list in Profile 1 (3000). The combined total number of words in the two lists in Profile 1 is 5000, compared with 7816 in the two lists used in Profile 2. In spite of this, the number of off-list words in Profile 1 is about 3% lower than that in Profile 2.
3. Profile a text with the MFWL K1 + K2 and the Academic Word List
This analysis contrasts words found in the 2000 Most Frequent Word Families and the Academic Word List as compiled by Paul Nation. They reflect academic English as it is used in universities.
Academic Word List
The Academic Word List is listed in the Net Dictionary Index and contains 570 word families, comprising 3,110 words, which were selected according to their frequency of use in academic texts. The list does not include words that are in the most frequent 2000 words of English. The AWL was primarily made so that it could be used by teachers as part of a programme preparing learners for tertiary level study, or by students working alone to learn the words most needed to study at tertiary institutions.
Using the Word Frequency Profiler to provide an objective test of readability
You can use these Text Analysis functions to provide a measure of text readability by comparing texts that you use at different levels against the word frequency lists. Examples of 2 texts which have been profiled in this way can be seen here:
Both of these examples have been profiled using 2 word lists:
The first example text, "American History", is a simplified text which I wrote specifically for lower intermediate EFL students. About 90% of the words are in the first 2K MFW list, and only 4 words are not in either list. The second text, "The good language learner", was not written for language students at all, and only about 75% of the words are found in the first 2K MFW list. There are 81 words not in either list, over 14% of the total, so we can say that the second text clearly presents very much more difficulty for the learner. These percentages could therefore be used as an objective way of gauging readability.
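The percentages quoted for the two example texts could be produced by a calculation along the following lines. This is a minimal sketch under stated assumptions: the function name `coverage` and the band labels are hypothetical, and it counts every running word, whereas the page does not say whether its figures count running words or distinct words.

```python
def coverage(tokens, first_list, second_list):
    """Return the percentage of tokens falling into each band."""
    first, second = set(first_list), set(second_list)
    total = len(tokens)
    if total == 0:
        # An empty text has no coverage to report.
        return {"list1": 0.0, "list2": 0.0, "off-list": 0.0}
    in_first = sum(1 for t in tokens if t in first)
    # A token found in both lists is counted in the first band only.
    in_second = sum(1 for t in tokens if t not in first and t in second)
    off = total - in_first - in_second
    return {
        "list1": 100.0 * in_first / total,
        "list2": 100.0 * in_second / total,
        "off-list": 100.0 * off / total,
    }


# Made-up tokens and lists, purely for demonstration.
tokens = ["the", "settlers", "crossed", "the"]
print(coverage(tokens, first_list=["the"], second_list=["crossed"]))
```

A text whose `list1` percentage is high and whose `off-list` percentage is low would, on this measure, be the easier one for a learner.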
edict virtual language centre.
All Rights Reserved.