SynPhony
A Multi-lingual Synthetic Phonics Literacy Project
SynPhony is an open-source project that will assist in teaching literacy skills for alphabetic writing systems. It is a database system designed to systematically present the patterns of a language and track a user’s progress as they acquire reading skills. There are four main components to SynPhony: linguistic, user knowledge base, pedagogic, and reporting. Although English is the first language being targeted, the structure and methods being used are designed to accommodate many other languages as well. The linguistic component of the database will capture various aspects of a language, including:
- words
- pronunciation information
- part of speech categories
- grapheme phoneme correspondences
- syllable length
- definition
- morphemes
- sample sentence
- semantic domain
- frequency in print rank
- age of acquisition rank
- stories
The user’s knowledge base will keep track of what a user has learned so that the many skills relating to literacy can be covered systematically. This knowledge base will be used by programs wishing to provide educational activities that match a user’s current literacy skills. SynPhony will produce an API (application programming interface) for programs wishing to make use of the database.
Why SynPhony?
Computer games often use a haphazard selection process to present vocabulary to the user. They often have no idea of the reading ability of the user playing the game. Most educational programs are also islands unto themselves. They are rarely aware of other programs and don’t harmonize with each other; each one uses different criteria for selecting words to present to the user. Educational reading programs would benefit if they could select words that match the actual reading ability of a specific user, beyond merely targeting generalized age, educational level or other external factors.
SynPhony will offer language resources and an API to enable other programs to keep track of the reading ability of each end user, so that the vocabulary presented to them is tailored to their current reading ability whether it is presented in a game or an explicit reading program. SynPhony will facilitate the selection of vocabulary by categorizing words according to several crucial criteria, among them grapheme phoneme correspondence, age of acquisition rank, frequency in print rank, and syllable length. Each user’s knowledge base will keep track of which words have been presented and mastered. Using these criteria, vocabulary can be systematically selected so that the user can progressively learn all the important skills necessary for a successful reading career. SynPhony will offer an extensive English database of over 44,000 words which can be queried for any of these criteria. For more technical details on SynPhony click here. The database structures and algorithms are being designed to accommodate other languages as well. Speakers of other languages will be invited to contribute to their language’s database content using the principles developed for SynPhony when this function becomes available.
The Alphabetic Principle
All languages consist of arbitrary sequences of sounds. Alphabetically written languages map their sounds to letters. This mapping can have three levels of complexity: transparent, complex and opaque. Many languages have well-designed alphabets in which each sound is represented by exactly one letter. In these languages learning to read consists of the simple process of remembering how the various letters should be pronounced. Such alphabets can be called transparent alphabets. In a completely transparent alphabet the number of letters will equal the number of salient sounds, or phonemes, in the language.
However, most languages probably have varying degrees of complexity in the relationship of their letters to their sounds. A complex alphabet has more than one way to represent a particular sound in its language. For example, in English the sound /s/ can be represented using the letters “s” and “c” as in “sell” and “cell”. The level of complexity will vary for each language. For example, while Spanish is known to have a very simple writing system, it does not have a completely transparent alphabet; it displays a low level of complexity. German has more complexity than Spanish, while English has a very high degree of complexity.
The complexity of a language’s writing system can be presented on a chart. We can assign each sound to one row, and each orthographic representation of that sound can be placed as a cell on that row. This allows us to map the complexity of a writing system in a systematic manner. Here are some interactive alphabetic code charts for Spanish, German, and English. We can predict with some confidence that languages with long literary histories and no orthographic reforms, or at least no recent ones, will probably display higher degrees of complexity. The reason for this is that all spoken languages change over time, and orthographic reforms often realign the written form with the spoken form of the language or regularise the spelling.
But some writing systems have another level of complexity that goes beyond this. When the same letter(s) can be pronounced in more than one way, the writing system becomes opaque. This means that the spelling conventions are not predictable guides for how to pronounce the words. English is probably the best example of an opaque writing system. To illustrate, the letter “a” can be pronounced in several different ways, as in “mat”, “made”, “mall” and “about”. The letter by itself is no guide to how it should be pronounced. That knowledge comes from the oral knowledge of native English speakers. English is littered with this kind of mismatch, especially among the vowels, for historical reasons.
Children who speak a language which has an opaque writing system require more effort to learn to read than those whose alphabets are either transparent or have low levels of complexity. Languages with complex or opaque writing systems must develop pedagogical strategies that reflect the complexity of their writing systems. Such strategies must be based on a thorough understanding of the complexities and have methods that address this complexity in a systematic manner. Otherwise students must try to make sense of the system on their own, which can result in problems. SynPhony is a system that supports the systematic development of pedagogical strategies to address even opaque writing systems such as English.
To summarize, alphabetic writing systems can display 3 levels of complexity:
- transparent: 1 sound is represented by 1 letter
- complex: 1 sound can be represented in more than 1 way
- opaque: 1 letter can be pronounced more than 1 way
A writing system can display all three categories and to varying degrees. Some sounds might have a single way to represent them, resulting in a consistent way to spell those sounds. Other sounds in the language can have multiple representations for each sound. Each language is unique as to how many sounds and letters it has and how it maps those sounds to letters.
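The three categories above can be checked mechanically once the sound-to-letter pairs of a language have been listed. The Python sketch below is only an illustration of that idea: the function name `classify`, the informal sound labels and the handful of sample pairs are assumptions made for this example and are not taken from SynPhony’s database or API.

```python
from collections import defaultdict

def classify(pairs):
    """Sort (sound, letter) pairs into transparent sounds, complex sounds
    and opaque letters, following the three categories described above."""
    spellings = defaultdict(set)       # sound  -> letters that can spell it
    pronunciations = defaultdict(set)  # letter -> sounds it can stand for
    for sound, letter in pairs:
        spellings[sound].add(letter)
        pronunciations[letter].add(sound)

    # Transparent: the sound has one spelling, and that spelling has one sound.
    transparent = sorted(
        sound for sound, letters in spellings.items()
        if len(letters) == 1
        and all(len(pronunciations[letter]) == 1 for letter in letters)
    )
    # Complex: the sound can be spelled in more than one way.
    complex_sounds = sorted(
        sound for sound, letters in spellings.items() if len(letters) > 1
    )
    # Opaque: the letter can be pronounced in more than one way.
    opaque_letters = sorted(
        letter for letter, sounds in pronunciations.items() if len(sounds) > 1
    )
    return transparent, complex_sounds, opaque_letters

# Illustrative sample only; the sound labels are informal, not IPA.
SAMPLE = [
    ("m", "m"),  # "mat"
    ("s", "s"),  # "sell"
    ("s", "c"),  # "cell"
    ("k", "c"),  # "cake"
    ("a", "a"),  # "mat"
    ("o", "a"),  # "mall"
]

transparent, complex_sounds, opaque_letters = classify(SAMPLE)
print(transparent)     # ['m']
print(complex_sounds)  # ['s']
print(opaque_letters)  # ['a', 'c']
```

Even in this tiny sample a single language shows all three categories at once, which is the point made above.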
Grapheme Phoneme Correspondences (GPCs)
What is a GPC?
GPC is an abbreviation for “Grapheme Phoneme Correspondence”. It is an explicit mapping between each sound of a spoken word and the symbol or symbols used to represent that sound in writing. A word re-written into a GPC form is easiest to understand with some simple examples from English. Take the word “bat”. The 3 letters map to 3 sounds, or phonemes, and can be re-written like this: “b_b,a_a,t_t”. Each GPC unit is separated by a comma and consists of two items separated by an underscore: the sound comes first, followed by the letter or letters used to represent that sound in this word. For a word like “bat” this kind of work seems redundant and possibly overkill. Why bother? Take another word like “of”. This word has 2 sounds, a schwa and a voiced labio-dental fricative, commonly represented with a “v”. Using these two sounds we can re-write the GPC equivalent of the word “of” as: “schwa_o,v_f”. With such a representation we see that the relation between the pronunciation and the spelling of the word is not as straightforward as in the previous example, and the rationale for this work begins to make sense. Take another word like “ball”. This word has 3 sounds which we can re-write as: “b_b,o_a,l_ll”. Now we see that the “a” in “bat” and the “a” in “ball” are represented differently: “a_a” and “o_a” respectively. This reflects the different pronunciation of the “a” in each word. A GPC analysis can be very useful for a complex writing system, and it is essential for disambiguating the sound/symbol relationship in opaque writing systems such as English.
Some more examples of words written out in a GPC form:
- house = h_h,ow_ou,s_se
- cake = k_c,ae_a-e,k_k
- yacht = y_y,o_a,t_cht
- the = thVD_th,schwa_e
- table = t_t,ae_a,b_b,l_le
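Because the notation is so regular, it is easy to process by computer. The following Python sketch shows one way the GPC strings above could be split into units; the names `GPC` and `parse_gpc` are hypothetical and are not part of any published SynPhony API.

```python
from typing import List, NamedTuple

class GPC(NamedTuple):
    phoneme: str   # label for the sound, e.g. "schwa"
    grapheme: str  # letter(s) that spell the sound, e.g. "o"

def parse_gpc(word_gpc: str) -> List[GPC]:
    """Split a string such as 'b_b,o_a,l_ll' into (phoneme, grapheme) units."""
    units = []
    for unit in word_gpc.split(","):
        # The first underscore separates the sound from its spelling.
        phoneme, grapheme = unit.strip().split("_", 1)
        units.append(GPC(phoneme, grapheme))
    return units

bat = parse_gpc("b_b,a_a,t_t")
ball = parse_gpc("b_b,o_a,l_ll")

# The same letter "a" is paired with a different sound in each word.
print(bat[1])   # GPC(phoneme='a', grapheme='a')
print(ball[1])  # GPC(phoneme='o', grapheme='a')
```

Note that a split spelling such as “a-e” in “cake” is kept as a single grapheme by this sketch, so simply joining the graphemes back together does not always reproduce the written word.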
The graphemic part of a GPC unit is fixed and stable until the language community undertakes an orthographic reform. The choice of symbol for the phonemic part of a GPC unit is, up to a point, arbitrary. Different people have come up with various ways to represent the phonemes. The most important consideration is to be consistent. Phonemes can be represented using letters from the language’s orthography (e.g. using Cyrillic letters to represent phonemes in a language that uses a Cyrillic alphabet), an IPA (International Phonetic Alphabet) symbol, a Latin-based representation or even a number. Using a computer to search through data sets like this allows us to catalog the GPC components of a language as well as extract any pattern we like. In SynPhony, we have an English database in which we have identified about 480 different GPC units from a corpus of over 44,000 words. GPCs are spread unequally over a language. Some sounds occur frequently, others occur only occasionally. There are common spelling patterns and uncommon ones. The GPC notation method presented here can accommodate any level of complexity in the mapping of sounds to letters in a language. For more information on GPC analysis read this page.
Pedagogical sequences
A pedagogical sequence defines which GPC units will be taught and in which order. There are many established pedagogical sequences in use by many different curricula. SynPhony will not dictate a single sequence but will allow multiple pedagogical sequences to be defined per language, so that a child or a whole class could use the same sequence as is used in an existing reading curriculum. The scope of each sequence could be different as well. So not only could you specify which letter or sound you start with, you could also determine how many items you wish to teach and the order of the items in between. SynPhony will allow as much flexibility as possible in defining sequences. At each stage vocabulary would be selected that builds on the user’s existing knowledge so that they would never need to learn more than 1 new GPC unit at a time. By selecting the most productive GPC unit at each stage we can maximize the number of new words the learner is able to decode at each step. The graphic below illustrates the number of new words that become decodable at each step of such a sequence for English.
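One way to picture “selecting the most productive GPC unit at each stage” is as a greedy procedure: at every step, teach the unit that makes the most new words fully decodable. The Python sketch below illustrates this under stated assumptions. The function name `greedy_sequence`, the tie-breaking rule (prefer the unit that occurs in more words, then alphabetical order) and the four-word lexicon are inventions for this example; the source does not describe SynPhony’s actual algorithm.

```python
def greedy_sequence(lexicon):
    """Order GPC units so that each step unlocks as many new words as possible.

    lexicon maps each word to the set of GPC units needed to decode it.
    Returns a list of (unit, number_of_newly_decodable_words) pairs.
    """
    known = set()
    remaining = set().union(*lexicon.values())
    sequence = []

    def gain(unit):
        # Words that contain this unit and need nothing else that is unknown.
        trial = known | {unit}
        return sum(1 for units in lexicon.values()
                   if unit in units and units <= trial)

    def frequency(unit):
        # Tie-breaker (an assumption): how many words use this unit at all.
        return sum(1 for units in lexicon.values() if unit in units)

    while remaining:
        # sorted() makes ties resolve alphabetically, so results are repeatable.
        best = max(sorted(remaining), key=lambda u: (gain(u), frequency(u)))
        sequence.append((best, gain(best)))
        known.add(best)
        remaining.discard(best)
    return sequence

# Tiny hypothetical lexicon written in the GPC notation used on this page.
LEXICON = {
    "at":   {"a_a", "t_t"},
    "bat":  {"b_b", "a_a", "t_t"},
    "tab":  {"t_t", "a_a", "b_b"},
    "ball": {"b_b", "o_a", "l_ll"},
}

for unit, new_words in greedy_sequence(LEXICON):
    print(unit, new_words)
# a_a 0
# t_t 1
# b_b 2
# l_ll 0
# o_a 1
```

A purely greedy choice looks only one step ahead, so an established curriculum sequence may still be preferable in practice; the sketch is meant only to make the idea of “productivity” concrete.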
The name SynPhony reflects the approach synthetic phonics takes to teaching reading. All alphabetically written languages can be taught using this method. For more information on the method see:
- A page that demonstrates a synthetic phonics approach to selecting vocabulary based on GPC analysis.
- The Wikipedia article on synthetic phonics
- A summary of a 7-year longitudinal study on the effectiveness of synthetic phonics (a longer version here)
- The syntheticphonics.com website
- The www.dyslexics.org.uk website
SynPhony supported activities
SynPhony’s language resources will be available to game designers and curriculum developers to educate and entertain students as they grow in their literacy skills by presenting them with vocabulary that matches their current abilities. Activities could include reading (from words in isolation all the way to stories), spelling, listening exercises, cloze exercises and playing word-based games and puzzles. SynPhony itself will not develop all activities but offer language resources appropriate to each individual end user. We do plan to develop a tool to help authors write stories to target specific reading levels.
SynPhony’s user tracking capabilities
We hope to build extensive user tracking into SynPhony. We would like to keep track of which GPCs a user has learned, when they were introduced to them, which words have been presented to them by the system, how many times each word has been presented, how each word has been spelled by the user, and time data for many of these items.
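To make the list of tracked items more concrete, here is a minimal Python sketch of what a per-user record might look like. Every name in it (`UserKnowledge`, `WordHistory`, the method names and the fields) is hypothetical, and the example date is arbitrary; SynPhony’s real schema may differ considerably.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional

@dataclass
class WordHistory:
    times_presented: int = 0
    spellings: List[str] = field(default_factory=list)  # what the user wrote
    last_presented: Optional[datetime] = None

@dataclass
class UserKnowledge:
    # GPC unit -> when it was first introduced to this user
    gpc_introduced: Dict[str, datetime] = field(default_factory=dict)
    # word -> presentation and spelling history
    words: Dict[str, WordHistory] = field(default_factory=dict)

    def introduce_gpc(self, unit: str, when: datetime) -> None:
        # Keep the first introduction time if the unit is seen again.
        self.gpc_introduced.setdefault(unit, when)

    def record_presentation(self, word: str, when: datetime,
                            spelled_as: Optional[str] = None) -> None:
        history = self.words.setdefault(word, WordHistory())
        history.times_presented += 1
        history.last_presented = when
        if spelled_as is not None:
            history.spellings.append(spelled_as)

    def can_decode(self, word_units: List[str]) -> bool:
        # A word is decodable once every one of its GPC units has been taught.
        return all(unit in self.gpc_introduced for unit in word_units)

user = UserKnowledge()
now = datetime(2024, 1, 1, 9, 0)  # arbitrary example timestamp
for unit in ("b_b", "a_a", "t_t"):
    user.introduce_gpc(unit, now)
user.record_presentation("bat", now, spelled_as="bat")
user.record_presentation("bat", now, spelled_as="bet")

print(user.can_decode(["b_b", "a_a", "t_t"]))   # True
print(user.can_decode(["b_b", "o_a", "l_ll"]))  # False
print(user.words["bat"].times_presented)        # 2
```

A record along these lines would let an activity ask whether a candidate word is decodable for a given user before presenting it.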
SynPhony’s reporting capabilities
SynPhony would report on each child’s progress as they learn to read. This would result in a real-time window into the reading acquisition process, allowing either the computer or a teacher to adjust the content to each child’s abilities. Different parties would be able to access various kinds of reports on the reading abilities of users. The reports would use quantifiable measurements and provide graphical views of reading progress.
SynPhony as a research platform
If you could track the reading acquisition of many children in detail you could begin to study things which are normally out of reach. Some of the questions you might be able to ask with SynPhony are:
- Is one pedagogical sequence better than another for teaching a particular language?
- At what pace should new units be introduced to a learner?
- What constitutes normal reading progress in quantitative terms?
- What, if any, are the indicators that predict reading problems?
- At what stage should “sight” vocabulary be introduced into the sequence?
- Does a font influence the reading acquisition process?
- How many times does a word need to be encountered before it is learned?
SynPhony’s research abilities will depend on the kinds of information we track for each user. However, we don’t yet know what kinds of items are of interest to researchers. I invite reading researchers who are interested in using SynPhony for research to help us design the system.
