ProUCL Software Update and Technical Support

U.S. EPA

Bar graph charting the frequency and data range of arsenic and iron.

ProUCL is a statistical software package developed by USEPA for analysis of environmental data sets. Although ProUCL is best known for calculating upper confidence limits (UCLs) for mean values, it also supports a variety of common environmental calculations, such as estimating data quality objective (DQO)-based sample size requirements, establishing background concentration estimates, comparing background and site sample data sets for site evaluation and risk assessment, and performing basic trend analyses. Methods for analyzing data sets with non-detect (ND) values (left-censored data) are also built into the software. ProUCL supports analysis of moderately skewed data, which are very common in environmental site assessment and risk assessment.

The development of ProUCL software is supported by USEPA’s Office of Research and Development (ORD) Site Characterization and Monitoring Technical Support Center (SCMTSC). The software has been in use for more than 20 years for calculations related to site assessment and remediation work. Since 2019, Neptune has supported SCMTSC's work on ProUCL by providing software updates and maintenance, user support, and technical support.

ProUCL version 5.2 is the latest update of the ProUCL software. The primary change in this update is an adjustment to the decision logic used to recommend UCLs. Other improvements include updates to development tools and supporting libraries that make the software compatible with Windows 10 and 11, fixes for several software bugs, and revisions to the User and Technical Guides that address the software changes and improve the clarity of the User Guide.

Historically, ProUCL has emphasized achieving adequate coverage probability rather than an accurate estimate of the mean, where accuracy means an upper bound that is as close as possible to the true mean while maintaining the desired coverage. Depending on the data, some UCL estimators in ProUCL (particularly Chebyshev and H) can grossly overestimate the mean: adequate coverage is almost certainly achieved in these cases, but accuracy suffers. Although this philosophy keeps the likelihood of one decision error small (i.e., Type I error, concluding a site is not contaminated when it is), such an overestimate can result in a high likelihood of the opposite decision error (i.e., Type II error, concluding a site is contaminated when it is not). The objective should be not only to control Type I error but also to protect against large Type II error rates. Selecting the most appropriate UCL method therefore requires balancing both objectives, coverage and accuracy.
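To make the coverage-versus-accuracy trade-off concrete, the following minimal Python sketch compares two one-sided 95% UCLs on a small right-skewed sample. It is an illustration only, not ProUCL code. It assumes the common mean-and-standard-deviation form of the Chebyshev UCL, mean + sqrt(1/alpha - 1) * s / sqrt(n), and uses a normal-approximation UCL as a simplified stand-in for a less conservative estimator. The function names and the sample values are hypothetical.

```python
import math
import statistics


def normal_ucl(data, confidence=0.95):
    """One-sided UCL for the mean using the normal (CLT) approximation."""
    n = len(data)
    z = statistics.NormalDist().inv_cdf(confidence)  # about 1.645 at 95%
    return statistics.mean(data) + z * statistics.stdev(data) / math.sqrt(n)


def chebyshev_ucl(data, confidence=0.95):
    """One-sided UCL for the mean based on the Chebyshev inequality.

    Assumed form: mean + sqrt(1/alpha - 1) * s / sqrt(n).
    """
    n = len(data)
    alpha = 1.0 - confidence
    k = math.sqrt(1.0 / alpha - 1.0)  # about 4.36 at 95%
    return statistics.mean(data) + k * statistics.stdev(data) / math.sqrt(n)


# Hypothetical right-skewed concentrations (mg/kg), invented for illustration.
sample = [1.2, 0.8, 2.5, 1.9, 0.6, 14.0, 3.1, 1.1, 0.9, 5.4]

print("sample mean  :", round(statistics.mean(sample), 2))
print("normal UCL   :", round(normal_ucl(sample), 2))
print("Chebyshev UCL:", round(chebyshev_ucl(sample), 2))
```

Because the Chebyshev multiplier is much larger than the normal quantile at the same confidence level, the Chebyshev UCL always sits further above the sample mean. That extra margin is the source of both its high coverage and its tendency to overestimate.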

Statisticians at Neptune conducted an extensive review of the statistical literature and ran simulations to investigate the accuracy and coverage of various UCL estimators for slightly skewed to highly skewed lognormal data sets. They also adjusted the goodness-of-fit rules and used a machine learning technique, decision tree analysis, to determine the best recommendation logic for UCL selection. As a result, in ProUCL version 5.2 the Chebyshev UCL is no longer recommended, and the H UCL is recommended only for moderate to large sample sizes when there is high confidence that the assumption of lognormality is met to a good approximation.
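A simulation of this general kind can be sketched in a few lines of Python. The sketch below is not Neptune's study design. It is a hypothetical, simplified Monte Carlo experiment that draws lognormal samples, computes two UCLs for each sample, and records coverage (how often the UCL is at or above the true mean) and accuracy (the average ratio of the UCL to the true mean). The sample size, skewness parameter, trial count, and the two estimators (a normal-approximation UCL and an assumed mean-and-standard-deviation form of the Chebyshev UCL) are all illustrative choices.

```python
import math
import random
import statistics


def simulate(n=20, sigma=1.5, trials=2000, confidence=0.95, seed=1):
    """Estimate coverage and mean UCL/true-mean ratio for two UCL rules.

    Samples are lognormal with log-scale mean 0 and log-scale sd `sigma`,
    so the true mean is exp(sigma**2 / 2).
    """
    rng = random.Random(seed)
    true_mean = math.exp(0.5 * sigma ** 2)
    alpha = 1.0 - confidence

    # Multiplier applied to the standard error for each (assumed) UCL rule.
    multipliers = {
        "normal": statistics.NormalDist().inv_cdf(confidence),
        "chebyshev": math.sqrt(1.0 / alpha - 1.0),
    }
    hits = {name: 0 for name in multipliers}
    ratio_sum = {name: 0.0 for name in multipliers}

    for _ in range(trials):
        data = [rng.lognormvariate(0.0, sigma) for _ in range(n)]
        xbar = statistics.mean(data)
        std_err = statistics.stdev(data) / math.sqrt(n)
        for name, k in multipliers.items():
            ucl = xbar + k * std_err
            if ucl >= true_mean:
                hits[name] += 1  # the UCL covered the true mean
            ratio_sum[name] += ucl / true_mean

    return {
        name: {"coverage": hits[name] / trials,
               "mean_ratio": ratio_sum[name] / trials}
        for name in multipliers
    }


results = simulate()
for name, stats in results.items():
    print(name, stats)
```

Varying `sigma` and `n` in such a sketch shows how coverage and overestimation change with skewness and sample size, which is the kind of evidence a recommendation rule has to weigh.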

Since the beginning of the contract, Neptune has provided user support by answering questions about the ProUCL software and its use. In 2020, Neptune developed and delivered a three-part series of training webinars covering ProUCL functionality. Each training session was attended by more than 500 ProUCL users and is now available on demand on the EPA CLU-IN website.

A team of Neptune statisticians, environmental scientists, and programmers provides technical support to the SCMTSC, which supports EPA Regions and Superfund projects. This support includes performing data analyses, providing expert guidance for the application of statistical methods and models, and reviewing site-specific documents and activities. This work has been done on several Superfund and RCRA sites and has included statistical methods beyond ProUCL capabilities, such as spatial weighting and de-clustering of data, and spatial analysis of analyte trends in various media.