Language Processing

The knowledge of language needed to engage in complex language behavior can be separated into six distinct categories:
  • morphology : the way words are built up from small meaning-bearing units
  • syntax : the structural relationships between words
  • semantics : the study of meaning
  • phonetics and phonology : linguistic sounds
  • pragmatics : how language is used to accomplish goals
  • discourse : the study of linguistic units larger than a single utterance

In addition, one should take ambiguities into account: an input is ambiguous when multiple alternative linguistic structures can be built for it.

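To make the notion of ambiguity concrete, here is a minimal sketch in Python that lists every way a string can be split into words from a lexicon. The function name `segmentations` and the five-word `TOY_LEXICON` are invented for this illustration; they are not taken from any of the standards or lexicons mentioned on this page, and a real segmenter would use a far larger lexicon and some way of ranking the alternatives.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into words found in `lexicon`."""
    if not text:
        return [[]]  # the empty string has exactly one segmentation: no words
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            # Keep this word and segment the remainder recursively.
            for rest in segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results


# Toy lexicon, assumed for this example only.
TOY_LEXICON = {"研究", "研究生", "生命", "命", "起源"}

for words in segmentations("研究生命起源", TOY_LEXICON):
    print(" / ".join(words))
```

With this toy lexicon the same six characters yield two structures: 研究 / 生命 / 起源 ("study the origin of life") and 研究生 / 命 / 起源, which starts with the word for "graduate student". Choosing between such alternatives is what the later processing steps have to do.
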
In the field of language processing, the following topics are important:
  • word segmentation
  • POS tagging
  • phrase identification
  • parsing
  • grammar development
  • lexicon acquisition
  • corpus development

History
1992 : Segmentation Standard. Announcement of the first national standard for word segmentation (GB 13715) by the PRC government.

1993 : Lexicon. Completion and release of the first version of the CKIP lexicon (with the category set and ICG thematic roles). First version of K. Chen's parser for Chinese.

1998 : Segmentation Standard. Official announcement of CNS14366 for Taiwan.

2000 : Treebanks. Simultaneous completion and announcement of two Chinese Treebanks:
  • Penn Chinese Treebank
  • Sinica Treebank



   
© Seba - contact at seba at ulyssis dot org