Form-frequency correspondence in adjectives: A cross-linguistic corpus approach
Adjectives remain a puzzle despite long-standing discussion in the literature, e.g., [Chomsky 1970; Dixon 1982]. Cross-linguistically, the syntactic behavior of adjectives varies substantially. Adjectives are more noun-like in some languages and more verb-like in others [Wetzer 1992, 1996]. In some languages, adjectives require a copula when used as predicates, while in others they are used as predicates without any further marking. Likewise, adjectives behave differently across languages when used as modifiers.
The aim of this study is twofold. The first aim is to explain the cross-linguistic coding pattern of adjectives with reference to the form-frequency correspondence hypothesis [Zipf 1935; Haspelmath 2008; Haspelmath et al. 2014; Haspelmath 2021]. The second aim is to test the hypothesis, using cross-linguistic corpus data from the Universal Dependencies Corpora [Nivre et al. 2017] and the BCC Mandarin Corpus [Xun et al. 2016].
According to the form-frequency correspondence hypothesis, more frequent forms are less likely to carry extra marking. For adjectives, the hypothesis yields two predictions. Firstly, the relative frequency of the attributive use of adjectives correlates negatively with the probability of their co-occurrence with a relativizer; secondly, the relative frequency of the predicative use of adjectives correlates negatively with the probability of their co-occurrence with a copula. These predictions are tested with logistic regression on an 84-language sample from the Universal Dependencies Corpora, which is a suitable database for the present study because it has cross-linguistically consistent annotation for parts of speech and their syntactic contexts. In addition, I test the hypothesis on data from the BCC Mandarin Corpus, using the frequencies of different adjectives. The results support the form-frequency correspondence hypothesis.
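To make the shape of the first prediction concrete, the following is a minimal sketch of a one-predictor logistic regression in plain Python. It is not the model used in the study, whose specification is not given here. The data points, the variable names (`rel_freq`, `marked`), and the helper `fit_logistic` are all hypothetical and serve only to illustrate what a negative relation between relative frequency and the probability of overt marking would look like.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(xs, ys, lr=0.5, steps=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent
    on the mean log-loss. Hypothetical helper for illustration only."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of log-loss w.r.t. the linear term
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Invented toy data (NOT from the Universal Dependencies or BCC corpora):
# rel_freq = relative frequency of attributive use for some unit of observation,
# marked   = 1 if the attributive adjective co-occurs with a relativizer, else 0.
rel_freq = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]
marked   = [1,    1,    1,    0,    1,    0,    1,    0,    0,    0]

b0, b1 = fit_logistic(rel_freq, marked)

# Under the hypothesis, the slope b1 should be negative: the higher the
# relative frequency, the lower the predicted probability of marking.
p_low = sigmoid(b0 + b1 * 0.10)
p_high = sigmoid(b0 + b1 * 0.90)
print(f"slope = {b1:.3f}, P(marked | 0.10) = {p_low:.3f}, P(marked | 0.90) = {p_high:.3f}")
```

The second prediction has the same form, with the relative frequency of predicative use as the predictor and the presence of a copula as the binary outcome.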