Articles
Computational Statistics (09434062)40(5)pp. 2729-2748
Estimating the prevalence of a specific disease within a given population is a common challenge in the medical field. To tackle this problem, researchers usually draw a random sample from the target population to obtain an accurate estimate of the proportion of diseased people. In practice, however, constraints such as complexity or cost may limit this approach. In these situations, alternative sampling techniques are needed to achieve the desired precision with smaller sample sizes. One such approach is Neoteric Ranked Set Sampling (NRSS), which is a variation of the Ranked Set Sampling (RSS) design. The NRSS scheme selects sample units using a rank-based method that incorporates auxiliary information to obtain a more informative sample. In this article, we focus on the problem of estimating the population proportion using NRSS. We develop an estimator for the population proportion under the NRSS design and establish some of its properties. We employ Monte Carlo simulations to compare the proposed estimator with its competitors in the Simple Random Sampling (SRS) and RSS designs. Our results demonstrate that statistical inference based on the introduced estimator can be significantly more efficient than its competitors in the RSS and SRS designs. Finally, to demonstrate the effectiveness of the proposed procedure in estimating breast cancer prevalence within the target population, we apply it to the Wisconsin Breast Cancer data. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.
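As a rough illustration of rank-based sampling for a proportion, the sketch below draws a standard balanced RSS sample (not the NRSS variant studied in the article, whose selection rule is not given in the abstract) and averages the binary disease indicator. The function names, the auxiliary variable `aux`, and the parameters `set_size` and `cycles` are assumptions made for this example only.

```python
import numpy as np

def rss_sample_indices(aux, set_size, cycles, rng):
    """Population indices of a balanced RSS sample, ranked on `aux` (hypothetical helper)."""
    chosen = []
    for _ in range(cycles):
        for rank in range(set_size):
            # Draw a fresh set of `set_size` units without replacement.
            candidates = rng.choice(len(aux), size=set_size, replace=False)
            # Rank the set using the auxiliary variable only; disease status is not needed here.
            ordered = candidates[np.argsort(aux[candidates])]
            # Only the unit holding position `rank` is selected for full measurement.
            chosen.append(ordered[rank])
    return np.array(chosen)

def rss_proportion(status, aux, set_size, cycles, rng):
    """Sample proportion of a 0/1 `status` variable over the RSS sample."""
    idx = rss_sample_indices(aux, set_size, cycles, rng)
    return float(np.mean(status[idx]))

# Illustrative use on synthetic data (not the data analysed in the article).
rng = np.random.default_rng(0)
aux = rng.normal(size=500)            # auxiliary ranking variable
status = (aux > 1.0).astype(int)      # synthetic disease indicator
estimate = rss_proportion(status, aux, set_size=3, cycles=4, rng=rng)
```

Each cycle ranks `set_size` sets but measures only one unit per set, so the final sample contains `set_size * cycles` measured units.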
This research investigates the construction of regression models for scenarios in which the response variable is inflated at specific points. To address this, we propose a comprehensive family of inflated distributions, which encompasses virtually all standard inflated distributions as special cases. The proposed family is applicable when the variable of interest is discrete, continuous, or a combination of both. We discuss parameter estimation, develop a regression model based on the introduced family, and formulate an expectation-maximization (EM) algorithm to compute the maximum likelihood estimators of the regression parameters. Additionally, we develop a general likelihood ratio test for the regression parameters. Finally, we analyse the performance of the proposed model in two simulation scenarios and on two real data sets, obtained from the US National Center for Health Statistics (NCHS) and from residents of Olmsted County aged 50 or older. © 2025 Informa UK Limited, trading as Taylor & Francis Group.
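To make the EM idea concrete, here is a minimal sketch for the simplest special case, a zero-inflated Poisson model without covariates. It is not the general family or the regression algorithm proposed in the article; the function name and starting values are assumptions chosen for illustration.

```python
import math

def fit_zero_inflated_poisson(counts, iterations=200):
    """EM sketch for a zero-inflated Poisson model without covariates.

    Returns (pi, lam): the estimated inflation probability at zero and the
    Poisson mean of the non-inflated component.
    """
    n = len(counts)
    total = sum(counts)
    n_zero = sum(1 for y in counts if y == 0)
    pi, lam = 0.5, max(total / n, 1e-8)  # crude starting values (assumption)
    for _ in range(iterations):
        # E-step: probability that an observed zero came from the inflation point.
        w = pi / (pi + (1.0 - pi) * math.exp(-lam))
        expected_inflated = n_zero * w
        # M-step: closed-form updates given the expected memberships.
        pi = expected_inflated / n
        lam = total / (n - expected_inflated)
    return pi, lam

# Illustrative data with more zeros than a plain Poisson would produce.
pi_hat, lam_hat = fit_zero_inflated_poisson([0, 0, 0, 0, 0, 0, 3, 4, 5, 4])
```

The E-step only needs to be evaluated for the zero observations, because a positive count can never come from the inflation point.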
Biometrical Journal (03233847)67(2)
The mean residual life (MRL) function plays an important role in the summary and analysis of survival data. The main advantage of this function is that it summarizes the information in units of time instead of a probability scale, which requires careful interpretation. Ranked set sampling (RSS) is a sampling technique designed for situations where obtaining precise measurements of sample units is expensive or difficult, but ranking them without referring to their exact values is cost-effective or easy. However, the practical application of RSS is hindered because each sample unit must be assigned a unique rank. To alleviate this difficulty, Frey developed a novel variation of RSS, called RSS-t, that records and utilizes the tie structure in the ranking process. In this paper, we propose several nonparametric estimators for the MRL function based on RSS-t. We then compare the proposed estimators with their counterparts in simple random sampling (SRS) and RSS, where tie information is not utilized. We also apply the proposed estimators to a real data set on patient waiting times for liver transplantation to show their applicability and efficiency in practice. Our results show that using tie information leads to improved statistical inference for the MRL function, and therefore a smaller sample size is needed to reach a predetermined precision. © 2025 Wiley-VCH GmbH.
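For reference, the MRL function at time t is the expected remaining lifetime given survival beyond t, m(t) = E[X - t | X > t]. The sketch below computes its usual empirical version from a simple random sample; it is not one of the RSS-t estimators proposed in the paper, and the function name is an assumption for this example.

```python
import numpy as np

def empirical_mrl(times, t):
    """Empirical mean residual life at time t from an SRS of lifetimes."""
    times = np.asarray(times, dtype=float)
    # Keep only the units still "alive" at time t.
    survivors = times[times > t]
    if survivors.size == 0:
        return float("nan")  # undefined when no observation exceeds t
    # Average remaining lifetime among the survivors.
    return float(np.mean(survivors - t))

# With lifetimes 2, 4, 6, 8 and t = 3, the survivors are 4, 6, 8,
# whose remaining lifetimes are 1, 3, 5.
mrl_at_3 = empirical_mrl([2, 4, 6, 8], 3)
```

The result is expressed in the same time units as the data, which is the interpretability advantage noted in the abstract.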
Journal of Statistical Planning and Inference (03783758)235
This paper focuses on drawing statistical inference based on a novel variant of maxima or minima nomination sampling (NS) designs. These sampling designs are useful for obtaining more representative sample units from the tails of the population distribution using the available auxiliary ranking information. However, one common difficulty in performing NS in practice is that the researcher cannot obtain a nominated sample unless he/she uniquely determines the sample unit with the highest or the lowest rank in each set. To overcome this problem, a variant of NS called partial nomination sampling is proposed, in which the researcher is allowed to declare that two or more units are tied in rank whenever he/she cannot identify the sample unit with the highest or the lowest rank. Based on this sampling design, two asymptotically unbiased estimators of the cumulative distribution function are developed using maximum likelihood and moment-based approaches, and their asymptotic normality is proved. Several numerical studies show that the proposed estimators have higher relative efficiencies than their counterparts in simple random sampling (SRS) when analyzing either the upper or the lower tail of the parent distribution. The procedures that we developed are then implemented on a real dataset from the Third National Health and Nutrition Examination Survey (NHANES III) to estimate the prevalence of osteoporosis among adult women aged 50 and over. It is shown that in certain circumstances, the techniques that we have developed require only one-third of the sample size needed in SRS to achieve the desired precision. This results in a considerable reduction in time and cost compared to the standard SRS method. © 2024 Elsevier B.V.
Journal of Computational and Applied Mathematics (03770427)458
In this work, we discuss a general class of estimators for the cumulative distribution function (CDF) based on the judgment post stratification (JPS) sampling scheme, which includes both empirical and kernel distribution functions. Specifically, we obtain the expectation of the estimators in this class and show that they are asymptotically more efficient than their competitors in simple random sampling (SRS), as long as the rankings are better than random guessing. We find a mild condition that is necessary and sufficient for them to be asymptotically unbiased. We also prove that, under the same condition, the estimators in this class are strongly uniformly consistent estimators of the true CDF and converge in distribution to a normal distribution as the sample size approaches infinity. We then focus on the kernel distribution function (KDF) in the JPS design and obtain the optimal bandwidth. We next carry out a comprehensive Monte Carlo simulation to compare the performance of the KDF in the JPS design with its counterpart in the SRS design for different choices of sample size, set size, ranking quality, parent distribution, and kernel function, under both perfect and imperfect ranking set-ups. We find that the JPS estimator dramatically improves the efficiency of the KDF compared to its SRS competitor across a wide range of settings. Finally, we apply the described procedure to a real dataset from a medical context to show its usefulness and applicability in practice. © 2024 Elsevier B.V.
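A kernel distribution function smooths the empirical CDF by replacing each step with an integrated kernel. The sketch below shows the equally weighted SRS version with a Gaussian kernel; the JPS estimators in the paper weight observations according to their post-strata, which is not reproduced here. The function name and the fixed `bandwidth` argument are assumptions for this example, and the bandwidth is not the optimal one derived in the paper.

```python
import math
import numpy as np

def kernel_cdf(sample, t, bandwidth):
    """Gaussian kernel distribution function estimate at point t (SRS version)."""
    sample = np.asarray(sample, dtype=float)
    # Standardized distances from each observation to the evaluation point.
    z = (t - sample) / bandwidth
    # Integrated Gaussian kernel = standard normal CDF, written via math.erf.
    phi = [0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z]
    # Equal weights 1/n, as in simple random sampling.
    return float(np.mean(phi))

# For a sample symmetric about 0, the estimate at t = 0 is 0.5.
value_at_zero = kernel_cdf([-1.0, 0.0, 1.0], 0.0, bandwidth=1.0)
```

As the bandwidth shrinks toward zero, this estimate approaches the ordinary empirical CDF.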