The definition page of this site states: "The concept of carrying out research on written or spoken texts is not restricted to corpus linguistics. Indeed, individual texts are often used for many kinds of literary and linguistic analysis - the stylistic analysis of a poem, or a conversation analysis of a tv talk show. However, the notion of a corpus as the basis for a form of empirical linguistics is different from the examination of single texts in several fundamental ways. In principle, any collection of more than one text can be called a corpus, (corpus being Latin for body, hence a corpus is any body of text). But the term corpus when used in the context of modern linguistics tends most frequently to have more specific connotations than this simple definition. The following list describes the four main characteristics of the modern corpus: Sampling and representativeness, Finite size, Machine-readable form, A standard reference." This site, a supplement to the book Corpus Linguistics (Edinburgh University Press), should be your first stop if you're asking "What is a corpus, and what's in it?"
http://www.ling.lancs.ac.uk/monkey/ihe/linguistics/corpus2/2fra1.htm (added 2002-02-23)
The British National Corpus (BNC) is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of current British English, both spoken and written. VIEW (Variation in English Words and Phrases) allows quick and easy searches of this corpus for a wide range of English words and phrases. Searches can be made by exact word or phrase, by wildcard or part of speech, and by surrounding words (collocates).
http://view.byu.edu/ (added 2005-10-21)
The British National Corpus (BNC) is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of current British English, both spoken and written. On this Simple Search page, you can enter your search terms directly, and the results are displayed shortly afterwards. The page is offered online as a sampler of the BNC, so the output has two restrictions: first, you are given at most 50 results, and second, the results include only the full sentences that contain your search items (so you cannot go back to view the wider context, which puts you at a disadvantage if searching for items such as 'be that as it may').
http://sara.natcorp.ox.ac.uk/lookup.html (added 2002-02-23)
The Linguistic Data Consortium (LDC) supports language-related education, research and technology development by creating and sharing linguistic resources: data, tools and standards.
http://www.ldc.upenn.edu (added 2002-05-09)
MICASE consists of approximately 1.7 million words from 152 speech events recorded at the University of Michigan between 1997 and 2001, covering a wide range of university spoken activities. The corpus is available for research, pedagogic applications, and general interest.
http://www.hti.umich.edu/m/micase/ (added 2002-05-09)
As a free trial demonstration, you can log in to the corpus system and access a 56 million word corpus of modern English. Access is by telnet or through a Java plug-in interface.
http://titania.cobuild.collins.co.uk/direct_demo.html (added 2002-09-30)