Articles
Computational Statistics (ISSN 0943-4062), no. 5
Estimating the prevalence of a specific disease within a given population is a common challenge in the medical field. To tackle this problem, researchers usually draw a random sample from the target population to obtain an accurate estimate of the proportion of diseased people. In practice, however, constraints such as complexity or cost may limit this approach. In these situations, alternative sampling techniques are needed to achieve the required precision with smaller sample sizes. One such approach is Neoteric Ranked Set Sampling (NRSS), a variation of the Ranked Set Sampling (RSS) design. The NRSS scheme selects sample units using a rank-based method that incorporates auxiliary information to obtain a more informative sample. In this article, we focus on estimating the population proportion using NRSS. We develop an estimator for the population proportion under the NRSS design and establish some of its properties. We employ Monte Carlo simulations to compare the proposed estimator with its competitors under Simple Random Sampling (SRS) and RSS designs. Our results demonstrate that statistical inference based on the introduced estimator can be significantly more efficient than inference based on its competitors under RSS and SRS designs. Finally, to demonstrate the effectiveness of the proposed procedure in estimating breast cancer prevalence within the target population, we apply it to the Wisconsin Breast Cancer data. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.
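The following Python sketch makes the RSS-versus-SRS comparison in this abstract concrete. It estimates a proportion under standard balanced RSS and under SRS, then compares the Monte Carlo variances of the two estimates. It does not implement the NRSS selection rule or the authors' estimator. The population model and all function names are hypothetical choices made for illustration: a latent normal score z, a binary outcome y = 1{z > threshold}, and an auxiliary ranking variable x whose correlation with z is rho.

```python
import random
import statistics

def make_population(rho, threshold, rng):
    """Hypothetical population: latent z ~ N(0, 1), binary outcome y = 1{z > threshold},
    and a cheap auxiliary x with correlation rho to z, used only for ranking."""
    def draw():
        z = rng.gauss(0.0, 1.0)
        x = rho * z + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        return x, int(z > threshold)
    return draw

def srs_proportion(draw, n):
    """SRS estimate of p: the mean of n measured binary outcomes."""
    return sum(draw()[1] for _ in range(n)) / n

def rss_proportion(draw, k, cycles):
    """Balanced RSS estimate of p with set size k (not NRSS): from the i-th set of k
    units, ranked by the auxiliary x, only the unit with the i-th smallest x is measured."""
    measured = []
    for _ in range(cycles):
        for i in range(k):
            ranked = sorted((draw() for _ in range(k)), key=lambda unit: unit[0])
            measured.append(ranked[i][1])
    return sum(measured) / len(measured)

def relative_efficiency(k=3, cycles=10, reps=2000, rho=0.9, threshold=1.0, seed=1):
    """Monte Carlo ratio Var(SRS estimate) / Var(RSS estimate) at equal measured sample size."""
    rng = random.Random(seed)
    draw = make_population(rho, threshold, rng)
    n = k * cycles
    srs = [srs_proportion(draw, n) for _ in range(reps)]
    rss = [rss_proportion(draw, k, cycles) for _ in range(reps)]
    return statistics.pvariance(srs) / statistics.pvariance(rss)

print(relative_efficiency())  # a ratio above 1 favours RSS
```

Ranking on a correlated auxiliary variable spreads the measured units across the distribution, so the ratio should typically exceed 1. The size of the gain depends on rho, k, and the threshold.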
This research investigates the construction of regression models for scenarios in which the response variable is inflated at specific points. To address this, we propose a comprehensive family of inflated distributions that encompasses virtually all standard inflated distributions as special cases. The proposed family is applicable whether the variable of interest is discrete, continuous, or a combination of the two. We discuss parameter estimation, develop a regression model based on the introduced family, and formulate an expectation-maximization (EM) algorithm to compute the maximum likelihood estimators of the proposed regression model. Additionally, we develop a general likelihood ratio test for the regression parameters. Finally, we analyse the performance of the proposed model in two simulation scenarios and on two real data sets, obtained from the US National Center for Health Statistics (NCHS) and from residents of Olmsted County aged 50 or older. © 2025 Informa UK Limited, trading as Taylor & Francis Group.
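The abstract does not give the form of the proposed family or of its EM algorithm. The sketch below is a hedged illustration of how EM handles inflation at a single point. It fits the textbook zero-inflated Poisson (ZIP) model without covariates, which is one of the standard inflated distributions. The function names, starting values, and tolerance are assumptions. The paper's regression model is not reproduced.

```python
import math
import random

def simulate_zip(n, pi, lam, rng):
    """Draw n values from a zero-inflated Poisson: a structural zero with probability pi,
    otherwise a Poisson(lam) count generated by Knuth's multiplication method."""
    out = []
    for _ in range(n):
        if rng.random() < pi:
            out.append(0)
            continue
        limit, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                break
            k += 1
        out.append(k)
    return out

def zip_em(y, tol=1e-10, max_iter=1000):
    """EM estimates (pi, lam) for a zero-inflated Poisson model without covariates."""
    n = len(y)
    pi = 0.5                                # assumed starting value
    lam = max(sum(y) / n, 1e-8)             # start lam at the sample mean
    for _ in range(max_iter):
        # E-step: posterior probability that each observed zero is a structural zero.
        p_zero = pi + (1.0 - pi) * math.exp(-lam)
        w = [pi / p_zero if yi == 0 else 0.0 for yi in y]
        # M-step: closed-form updates for the mixing weight and the Poisson mean.
        new_pi = sum(w) / n
        new_lam = sum((1.0 - wi) * yi for wi, yi in zip(w, y)) / sum(1.0 - wi for wi in w)
        done = abs(new_pi - pi) < tol and abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if done:
            break
    return pi, lam

rng = random.Random(7)
counts = simulate_zip(5000, pi=0.3, lam=2.5, rng=rng)
print(zip_em(counts))  # should be close to the simulation values (0.3, 2.5)
```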
Biometrical Journal (ISSN 0323-3847), vol. 67, no. 2
The mean residual life (MRL) function plays an important role in the summary and analysis of survival data. Its main advantage is that it summarizes information in units of time rather than on a probability scale, which requires careful interpretation. Ranked set sampling (RSS) is a sampling technique designed for situations where obtaining precise measurements of sample units is expensive or difficult, but ranking them without referring to their exact values is cost-effective or easy. However, the practical application of RSS is hindered because each sample unit must be assigned a unique rank. To alleviate this difficulty, Frey developed a novel variation of RSS, called RSS-t, that records and utilizes the tie structure in the ranking process. In this paper, we propose several nonparametric estimators for the MRL function based on RSS-t. We then compare the proposed estimators with their counterparts in simple random sampling (SRS) and in RSS, where tie information is not utilized. We also apply the proposed estimators to a real data set on patient waiting times for liver transplantation to show their applicability and efficiency in practice. Our results show that using tie information improves statistical inference for the MRL function, so a smaller sample size is needed to reach a predetermined precision. © 2025 Wiley-VCH GmbH.
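For reference, the MRL function is m(t) = E[X - t | X > t]. The sketch below shows its usual empirical estimator under SRS, which is the kind of baseline the RSS-t estimators are compared against. The RSS-t estimators themselves are not reproduced, and the function name is hypothetical.

```python
import random

def empirical_mrl(sample, t):
    """Empirical mean residual life at time t: the average of x - t over observations x > t."""
    excess = [x - t for x in sample if x > t]
    if not excess:
        raise ValueError("no observation exceeds t, so m(t) cannot be estimated")
    return sum(excess) / len(excess)

# Usage: an exponential lifetime with mean 2 is memoryless, so its true MRL is 2 at every t.
rng = random.Random(3)
lifetimes = [rng.expovariate(0.5) for _ in range(20000)]
for t in (0.0, 1.0, 3.0):
    print(t, round(empirical_mrl(lifetimes, t), 2))
```

The memoryless exponential case is a convenient sanity check because the estimate should hover near the same value at every t.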
Journal of Statistical Planning and Inference (ISSN 0378-3758), vol. 235
This paper focuses on drawing statistical inference based on a novel variant of maxima or minima nomination sampling (NS) designs. These designs are useful for obtaining more representative sample units from the tails of the population distribution using the available auxiliary ranking information. However, one common difficulty in performing NS in practice is that the researcher cannot obtain a nominated sample unless he/she uniquely identifies the sample unit with the highest or the lowest rank in each set. To overcome this problem, a variant of NS called partial nomination sampling is proposed. Under this design, the researcher may declare two or more units tied in rank whenever he/she cannot identify the sample unit with the highest or the lowest rank. Based on this sampling design, two asymptotically unbiased estimators of the cumulative distribution function are developed, one by maximum likelihood and one by a moment-based approach, and their asymptotic normality is proved. Several numerical studies show that the proposed estimators have higher relative efficiencies than their counterparts in simple random sampling (SRS) when analyzing either the upper or the lower tail of the parent distribution. The developed procedures are then applied to a real dataset from the Third National Health and Nutrition Examination Survey (NHANES III) to estimate the prevalence of osteoporosis among adult women aged 50 and over. It is shown that, in certain circumstances, the developed techniques require only one-third of the sample size needed in SRS to achieve the desired precision, which yields a considerable reduction in time and cost compared to the standard SRS method. © 2024 Elsevier B.V.
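The sketch below illustrates why nomination sampling carries information about the tails. It uses ordinary maxima or minima NS with set size k, perfect ranking, and no ties; this is not the partial NS design of the paper. Under these assumptions each nominated unit has CDF F(t)^k (maxima) or 1 - (1 - F(t))^k (minima). Inverting the empirical CDF of the nominated values therefore gives a simple plug-in estimate of F(t). The function name and the Uniform(0, 1) test population are assumptions.

```python
import random

def ns_cdf_estimate(nominated, t, k, kind="max"):
    """Plug-in CDF estimate from a nomination sample with set size k, assuming perfect ranking.

    Maxima nomination: each measured unit is the largest of k i.i.d. draws, so its CDF is F(t)**k.
    Minima nomination: its CDF is 1 - (1 - F(t))**k. Inverting the empirical CDF of the
    nominated values gives a consistent estimate of F(t).
    """
    g = sum(1 for w in nominated if w <= t) / len(nominated)
    if kind == "max":
        return g ** (1.0 / k)
    if kind == "min":
        return 1.0 - (1.0 - g) ** (1.0 / k)
    raise ValueError("kind must be 'max' or 'min'")

# Usage on a Uniform(0, 1) parent, where the true CDF is F(t) = t.
rng = random.Random(11)
k, m = 3, 5000
maxima = [max(rng.random() for _ in range(k)) for _ in range(m)]
minima = [min(rng.random() for _ in range(k)) for _ in range(m)]
print(ns_cdf_estimate(maxima, 0.9, k, "max"), ns_cdf_estimate(minima, 0.1, k, "min"))
```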
Environmental and Ecological Statistics (ISSN 1352-8505), no. 4
The volume under the receiver operating characteristic (ROC) surface (VUS) naturally generalizes a classical tool, the area under the ROC curve, from diseases with two statuses (e.g., healthy and diseased) to diseases with three statuses (e.g., healthy, intermediate, and diseased). It is used to evaluate how effectively a continuous biomarker discriminates disease status. In this work, we discuss the problem of estimating VUS using ranked set sampling (RSS). RSS is a cost-efficient alternative to simple random sampling (SRS). It applies when actually quantifying the biomarker is hard, time-consuming, costly, or tedious, but a small number of sample units can still be ordered without referring to their precise values. We develop several nonparametric estimators for the case where the SRS or RSS design is applied to each of the healthy, intermediate, and diseased subpopulations. We study the properties of the proposed estimators, including unbiasedness, variance expressions, asymptotic normality, and efficiency. Specifically, we show that the introduced estimators are at least as efficient as their SRS counterparts and often far more efficient under a large class of imperfect ranking models. Lastly, to demonstrate the applicability and efficiency of the introduced procedures in an environmental context, we apply them to a real environmental dataset, using three of its five classes. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
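Assume the common convention that the biomarker tends to increase from healthy to intermediate to diseased. Then VUS equals P(X < Y < Z) for independent measurements X, Y, Z from the three classes. A value of 1/6 corresponds to a marker with no discriminating ability, since all 3! orderings are then equally likely. The sketch below computes the natural nonparametric estimate of this probability from three SRS samples. The paper's RSS-based estimators are not reproduced, and the function names are hypothetical.

```python
import bisect
import random

def empirical_vus(healthy, intermediate, diseased):
    """Nonparametric estimate of VUS = P(X < Y < Z), assuming biomarker values tend to
    increase from the healthy (X) to the intermediate (Y) to the diseased (Z) class.

    Counts the triples (x, y, z) with x < y < z in O(n log n): for each y, multiply the
    number of healthy values strictly below it by the number of diseased values strictly above it.
    """
    xs, zs = sorted(healthy), sorted(diseased)
    hits = 0
    for y in intermediate:
        below = bisect.bisect_left(xs, y)             # healthy values strictly below y
        above = len(zs) - bisect.bisect_right(zs, y)  # diseased values strictly above y
        hits += below * above
    return hits / (len(healthy) * len(intermediate) * len(diseased))

# Usage: a useless marker (all three classes identically distributed) gives VUS near 1/6.
rng = random.Random(5)

def uniform_sample(n):
    """Hypothetical helper: n i.i.d. Uniform(0, 1) biomarker values."""
    return [rng.random() for _ in range(n)]

print(empirical_vus(uniform_sample(500), uniform_sample(500), uniform_sample(500)))
```

Strict inequalities are used, which suits a continuous biomarker where ties have probability zero.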