Teaching English in Japan: Directory



 
The definition page of this site states: "The concept of carrying out research on written or spoken texts is not restricted to corpus linguistics. Indeed, individual texts are often used for many kinds of literary and linguistic analysis - the stylistic analysis of a poem, or a conversation analysis of a tv talk show. However, the notion of a corpus as the basis for a form of empirical linguistics is different from the examination of single texts in several fundamental ways. In principle, any collection of more than one text can be called a corpus, (corpus being Latin for body, hence a corpus is any body of text). But the term corpus when used in the context of modern linguistics tends most frequently to have more specific connotations than this simple definition. The following list describes the four main characteristics of the modern corpus: Sampling and representativeness, Finite size, Machine-readable form, A standard reference." This site, a supplement to the book Corpus Linguistics (Edinburgh University Press), should be your first stop if you're asking "What is a corpus, and what's in it?"

The British National Corpus (BNC) is a 100-million-word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of current British English. VIEW (Variation in English Words and Phrases) allows quick and easy searches of this corpus for English words and phrases. You can search by exact word or phrase, by wildcard or part of speech, and by surrounding words (collocates).

The British National Corpus (BNC) is a 100-million-word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of current British English. On this Simple Search page, you can enter your search terms directly, and the results will be displayed shortly thereafter. The page is offered as a sampler of the BNC, so there are two restrictions on the output: first, you are given no more than 50 results, and second, the results include only the full sentences that contain your search items (so you cannot go back to view the surrounding context, which puts you at a disadvantage if you are searching for items such as 'be that as it may').

The Linguistic Data Consortium (LDC) supports language-related education, research and technology development by creating and sharing linguistic resources: data, tools and standards.

MICASE consists of approximately 1.7 million words from 152 speech events recorded at the University of Michigan between 1997 and 2001, covering a wide range of university spoken activities. The corpus is available for research, pedagogic applications, and general interest.

As a free trial demonstration, you can log in to the corpus system and access a 56-million-word corpus of modern English. Access is by telnet or through a Java plug-in interface.