Multi-language document search and retrieval system
Written by David Borel
15-05-2007
 Publication Number: US 20070106500
 Inventors: Wayne Loofbourrow (Sunnyvale, CA); David Casseres (Palo Alto, CA)
 Assignee Name: Apple Computer, Inc.
 Filing Date: December 29, 2006
 Publication Date: May 10, 2007

Abstract:
A multi-lingual indexing and search system performs tokenization and stemming in a manner which is independent of whether index entries and search terms appear as words in a dictionary. During the tokenization phase of the process, a string of text is separated into individual word tokens, and predetermined types of tokens are eliminated from further processing. The stemming phase of the process reduces words to grammatical stems by removing known word-endings associated with the various languages to be supported. Known word endings are removed from the word tokens without any effort to guarantee that the remaining stem is contained in a dictionary. In a preferred implementation, the stemming process is only applied to nouns.
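
To make the two phases described in the abstract concrete, here is a minimal Python sketch of dictionary-independent tokenization and stemming. It is an illustration only, not the patented implementation: the suffix table `KNOWN_ENDINGS`, the discarded-token set `STOP_TOKENS`, the `MIN_STEM_LENGTH` guard and all function names are hypothetical, since the abstract does not list the actual endings or token types. The noun-only restriction of the preferred implementation is not modelled.

```python
import re

# Hypothetical suffix table: the abstract does not list the endings used.
KNOWN_ENDINGS = {
    "en": ["ies", "es", "s"],
    "fr": ["aux", "s", "x"],
}

# Hypothetical "predetermined types of tokens" to discard:
# pure numbers (handled below) and a few short function words.
STOP_TOKENS = {"the", "a", "of", "le", "la", "les", "de"}

# Assumed guard so that very short words are not stripped to nothing.
MIN_STEM_LENGTH = 3


def tokenize(text):
    """Split text into lowercase word tokens and drop excluded token types."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if not t.isdigit() and t not in STOP_TOKENS]


def stem(token, endings):
    """Remove the longest matching known ending, with no dictionary lookup."""
    for ending in sorted(endings, key=len, reverse=True):
        if token.endswith(ending) and len(token) - len(ending) >= MIN_STEM_LENGTH:
            return token[: len(token) - len(ending)]
    return token


def index_terms(text, language):
    """Tokenize, then stem every surviving token for the given language."""
    endings = KNOWN_ENDINGS[language]
    return [stem(t, endings) for t in tokenize(text)]


print(index_terms("Searching 42 documents", "en"))
print(index_terms("les chevaux", "fr"))
```

Because the same function is applied to index entries and to search terms, "documents" and "document" reduce to the same stem and therefore match. The French example yields "chev", which is not a dictionary word; that is acceptable under this approach, because the stem only has to be consistent between indexing and querying.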

  

Original Document: Available




Last updated: 16-05-2007