The Kolhapur Corpus consists of approximately one million words of Indian written English dating from 1978. The texts are drawn from 15 categories, parallel to those of the LOB and Brown Corpora.
Comprehensive information about the Kolhapur Corpus can be found in the Kolhapur Corpus manual. The corpus is available through ICAME (orthographic text only), and an application form and information about the cost can be obtained from ICAME.
Sample of the Kolhapur Corpus
0010A01 **<<*3Politics of Job Reservations*0**>> $**[begin leader comment,
0020A01 underscoring**] *3^The Bihar Government did not foresee or forestall
0030A01 the complications that_ followed its decision to_ reserve jobs for
0040A01 classes. ^The present violence in the State has raised the controversy
0050A01 over the criterion for backwardness-- whether it should be caste or
0060A01 economic conditions.*0 **[end underscoring, end leader comment**]
0070A01 $^WHY has the Bihar Government*'s decision to_ reserve jobs for
0080A01 classes led to a violent outburst? ^It is not such an original idea
0090A01 that it should have triggered demonstrations and riots or attracted