Chamblon Systems Inc. - TerminologyExtractor Example 2
Example 2 - Internet RFC
Internet RFCs (Requests for Comments) are the documents that contain the specifications for the Internet. We used TerminologyExtractor to analyze RFC 1716, which is 186 pages long. It produced a list of about 2500 collocations and a list of about 2600 words.
The word list produced by TerminologyExtractor, with the frequency of each word, starts as follows:
The word list is useful. However, it is the collocations produced by TerminologyExtractor that give a real feel for what the document is about. Collocations that appear with high frequency are clearly terms, and they have to be managed as such:
It took TerminologyExtractor no more than 15 seconds to create these two lists. If this document had to be translated into several languages, how much time do you think these lists would save?
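For readers curious about what a word list and a collocation list involve, here is a minimal Python sketch of the general idea: count single words, then count runs of adjacent words. It is not TerminologyExtractor's actual method, which this page does not describe. The function names, the small stop-word list, and the sample sentence are all assumptions made up for illustration, and the sample text is not taken from RFC 1716.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only; a real tool would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for",
              "that", "it", "be", "as", "by", "on", "with"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens (hyphenated words stay whole)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def word_frequencies(tokens):
    """Count every token that is not a stop word."""
    return Counter(t for t in tokens if t not in STOP_WORDS)


def collocation_frequencies(tokens, size=2):
    """Count runs of `size` adjacent tokens that neither start nor end with a stop word.

    Simplification: sentence boundaries are ignored, so a run may span two sentences.
    """
    counts = Counter()
    for i in range(len(tokens) - size + 1):
        gram = tokens[i:i + size]
        if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
            continue
        counts[" ".join(gram)] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample text, not a quotation from any RFC.
    sample = ("The routing table is consulted for every datagram. "
              "A router updates its routing table when a routing protocol reports a change.")
    tokens = tokenize(sample)
    print(word_frequencies(tokens).most_common(5))
    print(collocation_frequencies(tokens).most_common(5))
```

Sorting both counters by frequency gives two lists of the kind shown in this example. Filtering on stop words only at the start and end of a run keeps multi-word candidates clean while still allowing longer runs (for example `size=3`) to contain a stop word in the middle.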