There is a lot of information about corpora and corpus-related research
available on the World Wide Web. The list below links to some
of the sites where you can find further information about different aspects of corpus linguistics.
Do you know of a site that you think should be included in this list? Have you found
links that are not working any longer? Do you have any comments or suggestions?
Please let us know and we will update the list.
Centre for Computer Analysis of Language and Speech, Leeds.
- CECL
The Center for English Corpus Linguistics (Louvain, Belgium). Research on computer learner corpora.
- Cobuild/Bank of
(Contrastive linguistic studies and translation).
Projects "with one common denominator: the use of bilingual text corpora as empirical material".
Center for Spoken Language Understanding, Oregon, USA.
European Network in Language and Speech.
(University of Nijmegen).
University Center for Computer Corpus Research on Language.
Links to online text resources other than corpora. An extensive list can be found on the
Project Gutenberg site.
Go to the Corpora Page for a list of corpus sites.
Collections of texts free for non-commercial use. Many by Swiss and French authors.
- Electronic Newsstand
Magazines of many different kinds (not all available on-line).
Electronic Text Center
University of Virginia.
- The English Server
Arts and Humanities texts online. Arranged by topics.
Internet Corpora Index
"online resources which may serve as corpora for psychologists and other
- The Internet Public Library
"the first public library of and for the Internet community"
Collection of stories written by "kids from all over the planet!"
On Media Directory
Search by geographic location or media type.
- Oxford Text Archive (OTA)
"The OTA collects high-quality scholarly electronic texts and linguistic
corpora (and any related resources) of long-term interest and use across
the range of humanities disciplines"
- Project Gutenberg
Extensive collection of whole books. (Click here to find a list of the texts sorted by publishing year, variety of English, category of text, author and title (courtesy of
- Project Runeberg
Center for Nordic literature. About 200 titles available online.
- The Religious and Sacred Texts Page
Collection of various sacred and religious texts.
Go to the Software Page for a list of Tools for Corpora.
"used for circulation of information
relating to use of the BNC"
electronic mailing list for everyone interested in linguistic corpora.
To be added, send a subscribe message to
TALC on Teaching and Language Corpora.
- Internet Grammar of English
Online course in English grammar with interactive exercises, glossary and index.
is a lexical database for English where the words are organized
into synonym sets. Available on-line.