The method of corpus analysis

British National Corpus: a corpus is a large collection of computer-readable texts. Corpus linguistics is the study of language through corpora; it covers the processing, use and analysis of written and spoken machine-readable corpora.

The Longman Written American Corpus comprises 100 million words of American newspaper and book texts.

Longman Corpus Network

Application areas for the corpus:

– reference book publishing;
– academic linguistic research;
– language teaching;
– artificial intelligence;
– natural language processing;
– speech processing.

Types of texts are chosen according to three features:

– domain (subject field);
– time (the period in which the texts were produced);
– medium.
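As an illustration only, the selection step can be pictured as filtering a pool of candidate texts on these three features. The sketch below assumes a hypothetical metadata schema (`Text` with `domain`, `year` and `medium` fields) and a hypothetical `select_texts` helper; the four titles are invented toy data, not entries from any real corpus.

```python
from dataclasses import dataclass


@dataclass
class Text:
    """One candidate text with the three selection features as metadata (hypothetical schema)."""
    title: str
    domain: str   # subject field, e.g. "economics"
    year: int     # time of production
    medium: str   # e.g. "newspaper" or "book"


def select_texts(texts, domains, year_range, media):
    """Keep only texts whose domain, year and medium all match the sampling criteria."""
    start, end = year_range
    return [
        t for t in texts
        if t.domain in domains and start <= t.year <= end and t.medium in media
    ]


# Invented toy data, for illustration only.
candidates = [
    Text("Market report", "economics", 1995, "newspaper"),
    Text("Football review", "sport", 1995, "newspaper"),
    Text("Trade history", "economics", 1960, "book"),
    Text("Budget commentary", "economics", 1998, "book"),
]

subcorpus = select_texts(candidates, {"economics"}, (1990, 1999), {"newspaper", "book"})
print([t.title for t in subcorpus])  # ['Market report', 'Budget commentary']
```

Real corpus projects also balance how many words each category contributes; this sketch only shows the filtering idea.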

Today, corpus linguistics provides linguists with three research methods: annotation, analysis and abstraction.
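A minimal sketch of how the three methods can fit together, under one common reading of the terms: annotation adds information (here, part-of-speech tags) to each token, abstraction extracts the cases relevant to a research question (here, adjective + noun pairs), and analysis computes statistics over them (here, simple counts). The tiny `LEXICON` and the helper names are hypothetical; real corpora are tagged with trained taggers, not a hand-written lookup table.

```python
import re
from collections import Counter

# Toy part-of-speech lexicon (hypothetical); unknown words get the tag "UNK".
LEXICON = {
    "a": "DET",
    "large": "ADJ", "authentic": "ADJ",
    "corpus": "NOUN", "language": "NOUN", "data": "NOUN",
    "contains": "VERB", "is": "VERB",
}


def annotate(text):
    """Annotation: split the text into word tokens and attach a tag to each one."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [(tok, LEXICON.get(tok, "UNK")) for tok in tokens]


def abstract(tagged):
    """Abstraction: extract the cases relevant to the research question (ADJ followed by NOUN)."""
    return [
        f"{w1} {w2}"
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
        if t1 == "ADJ" and t2 == "NOUN"
    ]


def analyse(cases):
    """Analysis: count how often each extracted pattern occurs."""
    return Counter(cases)


sample = "A large corpus contains authentic language. A large corpus is authentic data."
print(analyse(abstract(annotate(sample))).most_common())
# [('large corpus', 2), ('authentic language', 1), ('authentic data', 1)]
```

Keeping the three steps as separate functions mirrors the idea that the same annotated corpus can be re-abstracted for different research questions.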

The general benefits of corpus methods:

– modern computers are capable of processing massive amounts of data;
– the material collected in large computerized corpora represents authentic rather than invented language use;
– the methods of retrieving data are objective rather than intuitive, which means that studies can be replicated by other researchers;
– specific corpora compiled from particular types of texts may be used for comparisons and for identifying frequencies, provided that the corpora are large enough.
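Because corpora differ in size, raw counts are usually normalised before comparison; a common convention is occurrences per million words. The sketch below assumes simple lowercase word tokenisation and a hypothetical `per_million` helper. The two sample "corpora" are invented and far too small for real conclusions; they only show the arithmetic (2 of 8 tokens gives 250,000 per million, 1 of 8 gives 125,000).

```python
import re
from collections import Counter


def per_million(word, text):
    """Frequency of `word` in `text`, normalised to occurrences per million tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return Counter(tokens)[word] / len(tokens) * 1_000_000


# Invented toy samples standing in for two specific corpora.
fiction = "She said that he said nothing at all"
news = "The minister said the new budget will rise"

print(per_million("said", fiction))  # 250000.0
print(per_million("said", news))     # 125000.0
```

With normalised figures the two corpora can be compared directly even when their token totals differ.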

