Patrice Bertail, Stephan Clémençon, and Carlos A. Fernández
The purpose of this paper is to investigate the estimation of the regularity index \(\beta>0\) of a discrete heavy-tailed r.v. \(S\), \textit{i.e.} a r.v. \(S\) valued in \(\mathbb{N}^*\) such that \(\mathbb{P}(S>n)=L(n)\cdot n^{-\beta}\) for all \(n\geq 1\), where \(L:\mathbb{R}^*_+\to \mathbb{R}_+\) is a slowly varying function. Such discrete probability laws, sometimes referred to as generalized Zipf’s laws, are commonly used to model rank-size distributions after a preliminary range segmentation in a wide variety of areas, \textit{e.g.} quantitative linguistics, social sciences or information theory. As a first go, we consider the situation where inference is based on independent copies \(S_1, \ldots, S_n\) of the generic variable \(S\). Just like the popular Hill estimator in the continuous heavy-tail situation, the estimator \(\widehat{\beta}\) we propose can be derived by means of a suitable reformulation of the regularly varying condition, replacing the survivor function of \(S\) by its empirical counterpart. Under mild assumptions, a non-asymptotic bound for the deviation between \(\widehat{\beta}\) and \(\beta\) is established, as well as limit results (consistency and asymptotic normality). Beyond the i.i.d. case, the proposed inference method is extended to the estimation of the regularity index of a regenerative \(\beta\)-null recurrent Markov chain \(X\). Since the parameter \(\beta\) can then be viewed as the tail index of the (regularly varying) distribution of the return time of the chain \(X\) to any (pseudo-)regenerative set, the estimator is in this case constructed from the successive regeneration times. Because the durations between consecutive regeneration times are asymptotically independent, we can prove that the consistency of the proposed estimator is preserved. In addition to the theoretical analysis, simulation results provide empirical evidence of the relevance of the proposed inference technique.
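As an illustration only, one possible reformulation of the regular variation condition is that \(\mathbb{P}(S>e^{k})/\mathbb{P}(S>e^{k+1})\to e^{\beta}\) as \(k\to\infty\), which suggests a plug-in estimate of the form \(\log \bar{F}_n(e^{k})-\log \bar{F}_n(e^{k+1})\), where \(\bar{F}_n\) denotes the empirical survivor function. The Python sketch below implements this plug-in idea on simulated data. The function names, the level \(k\), and the simulated law (a pure power tail with \(L\equiv 1\) on the integers) are assumptions made for the example and are not taken from the paper, whose exact estimator may differ.

```python
import math
import random


def sample_power_tail(beta, size, seed=0):
    """Draw integers S >= 1 with P(S > n) = n**(-beta) for all n >= 1.

    Hypothetical test law: S = ceil(U**(-1/beta)) with U uniform on (0, 1].
    """
    rng = random.Random(seed)
    return [math.ceil((1.0 - rng.random()) ** (-1.0 / beta)) for _ in range(size)]


def empirical_survivor(sample, x):
    """Fraction of observations strictly larger than x."""
    return sum(1 for s in sample if s > x) / len(sample)


def beta_hat(sample, k):
    """Plug-in estimate: log survivor at e^k minus log survivor at e^(k+1)."""
    upper = empirical_survivor(sample, math.exp(k))
    lower = empirical_survivor(sample, math.exp(k + 1))
    if upper == 0.0 or lower == 0.0:
        raise ValueError("level k is too high for this sample size")
    return math.log(upper) - math.log(lower)


# Simulated i.i.d. sample with true regularity index 1.5 (assumed value).
sample = sample_power_tail(beta=1.5, size=200_000, seed=0)
estimate = beta_hat(sample, k=3)
print(f"estimated regularity index at k=3: {estimate:.3f}")
```

The level \(k\) governs a bias-variance trade-off in this sketch: a larger \(k\) looks further into the tail, where the power-law approximation is better, but leaves fewer observations above \(e^{k+1}\).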