| United States Patent | 7,890,521 |
| Grushetskyy , et al. | February 15, 2011 |
| **Please see images for: (Certificate of Correction)** |
One embodiment of the present invention provides a system that automatically generates synonyms for words from documents. During operation, this system determines co-occurrence frequencies for pairs of words in the documents. The system also determines closeness scores for pairs of words in the documents, wherein a closeness score indicates whether a pair of words are located so close to each other that the words are likely to occur in the same sentence or phrase. Finally, the system determines whether pairs of words are synonyms based on the determined co-occurrence frequencies and the determined closeness scores. While making this determination, the system can additionally consider correlations between words in a title or an anchor of a document and words in the document as well as word-form scores for pairs of words in the documents.
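The abstract's two statistics can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how the co-occurrence frequencies and closeness scores are computed or combined, so the token window, the thresholds, the decision rule (pairs that share documents often but rarely sit close together are treated as candidates), and the names `pair_statistics` and `synonym_candidates` are all assumptions made for illustration. Title, anchor, and word-form signals are left out.

```python
from collections import Counter
from itertools import combinations


def pair_statistics(documents, window=3):
    """Return (cooccur, close) counters keyed by sorted word pairs.

    cooccur[pair]: number of documents that contain both words.
    close[pair]:   number of documents in which the two words appear
                   within `window` tokens of each other (an assumed
                   stand-in for "same sentence or phrase").
    """
    cooccur = Counter()
    close = Counter()
    for text in documents:
        tokens = text.lower().split()
        # Document-level co-occurrence: both words appear somewhere.
        for pair in combinations(sorted(set(tokens)), 2):
            cooccur[pair] += 1
        # Closeness: collect pairs seen within the token window.
        near = set()
        for i, left in enumerate(tokens):
            for right in tokens[i + 1 : i + 1 + window]:
                if left != right:
                    near.add(tuple(sorted((left, right))))
        for pair in near:
            close[pair] += 1
    return cooccur, close


def synonym_candidates(documents, min_cooccur=2, max_close_ratio=0.5, window=3):
    """Assumed rule: keep pairs that share enough documents but are
    close together in at most `max_close_ratio` of those documents."""
    cooccur, close = pair_statistics(documents, window)
    candidates = []
    for pair, n_docs in cooccur.items():
        close_ratio = close[pair] / n_docs
        if n_docs >= min_cooccur and close_ratio <= max_close_ratio:
            candidates.append((pair, n_docs, close_ratio))
    return sorted(candidates)


if __name__ == "__main__":
    docs = [
        "car for sale cheap reliable used automobile",
        "automobile repair shop open today fix car",
        "used car dealer",
    ]
    for pair, n_docs, ratio in synonym_candidates(docs):
        print(pair, n_docs, ratio)
```

On a corpus this small, unrelated pairs also pass the filter, which is consistent with the abstract's mention of further signals being considered alongside these two.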
| Inventors: | Grushetskyy; Oleksandr (Cupertino, CA), Baker; Steven D. (San Francisco, CA) |
|---|---|
| Assignee: | Google Inc. (Mountain View, CA) |
| Family ID: | 43568648 |
| Appl. No.: | 12/027,559 |
| Filed: | February 7, 2008 |
| Application Number | Filing Date | Patent Number | Issue Date |
|---|---|---|---|
| 60900271 | Feb 7, 2007 | | |
| Current U.S. Class: | 707/755; 704/10; 704/231; 704/4; 707/804; 707/811; 715/258; 715/260 |
| Current CPC Class: | G06F 17/30867 (20130101); G06F 17/2795 (20130101) |
| Current International Class: | G06F 7/00 (20060101) |
| Field of Search: | 707/804,811,755; 704/4,10,231; 715/258,260 |
| 5265065 | November 1993 | Turtle |
| 5331556 | July 1994 | Black et al. |
| 5428707 | June 1995 | Gould et al. |
| 5500920 | March 1996 | Kupiec |
| 5594641 | January 1997 | Kaplan et al. |
| 5708829 | January 1998 | Kadashevich et al. |
| 5745890 | April 1998 | Burrows |
| 5794178 | August 1998 | Caid et al. |
| 5832474 | November 1998 | Lopresti et al. |
| 5915251 | June 1999 | Burrows et al. |
| 6026388 | February 2000 | Liddy et al. |
| 6128613 | October 2000 | Wong et al. |
| 6137911 | October 2000 | Zhilyaev |
| 6453315 | September 2002 | Weissman et al. |
| 6466901 | October 2002 | Loofbourrow et al. |
| 6618727 | September 2003 | Wheeler et al. |
| 6788759 | September 2004 | Op De Beek et al. |
| 7155427 | December 2006 | Prothia et al. |
| 7251637 | July 2007 | Caid et al. |
| 7257574 | August 2007 | Parikh |
| 7483829 | January 2009 | Murakami et al. |
| 7490092 | February 2009 | Sibley et al. |
| 2002/0002550 | January 2002 | Berman |
| 2002/0111792 | August 2002 | Cherny |
| 2003/0023421 | January 2003 | Finn et al. |
| 2003/0061122 | March 2003 | Berkowitz et al. |
| 2004/0064447 | April 2004 | Simske et al. |
| 2004/0122656 | June 2004 | Abir |
| 2004/0181759 | September 2004 | Murakami et al. |
| 2005/0060304 | March 2005 | Parikh |
| 2005/0108001 | May 2005 | Aarskog |
| 2005/0216443 | September 2005 | Morton et al. |
| 2006/0129538 | June 2006 | Baader et al. |
| 2007/0124293 | May 2007 | Lakowske et al. |
| 2007/0282824 | December 2007 | Ellingsworth |
| 2008/0249983 | October 2008 | Meisels et al. |
| 2009/0177463 | July 2009 | Gallagher et al. |
Christopher Landauer and Clinton Mah, "Message extraction through estimation of relevance", Proceedings of the 3rd annual ACM conference on research and development in Information retrieval, 1980, pp. 117-138. Cited by examiner.

Lamontagne et al., "Using Statistical Word Associations for the Retrieval of Strongly-Textual Cases", FLAIRS 2003, pp. 124-128. Cited by examiner.

Gregory Grefenstette, "Comparing Two Language Identification Schemes", in Proceedings of the 3rd International Conference on Statistical Analysis of Textual Data (JADT), Dec. 11-13, 1995, vol. II, pp. 263-268. Cited by examiner.

Office Action prepared for related case (U.S. Appl. No. 11/582,767), mailed from USPTO on Sep. 15, 2008. Cited by other.