Risk Mitigation of Critical Data Loss

Abstract

A commercial drug manufacturer, in collaboration with E Tech Group, implemented a critical monitoring system that required environmental data to be recorded in a 21 CFR Part 11-compliant manner.

To mitigate the risk of data loss, the drug manufacturer chose Vaisala instrumentation to read and locally store environmental data. The chosen instrumentation can store environmental data on the device itself and report it to an OPC-compliant SCADA system.

The drug manufacturer needed a way to move data stored on the devices into the Rockwell Automation SCADA system whenever the SCADA system missed data. E Tech Group provided a solution that detects data loss, recovers the lost data from the devices, and moves it to the SCADA system.

The solution included a Windows service that polls the SCADA historian for gaps in data. If a gap is found, the service pulls the missing data from the device buffers and populates it into the SCADA system. This provides redundancy and verifies that all critical data resides in a single repository, preserving data integrity. The customer was able to rely solely on the existing SCADA system while using robust, self-buffering instrumentation to protect against critical data loss.

Problem Statement

The end user chose Vaisala instrumentation equipped with loggers that buffer data to protect against critical data loss. However, the buffering capability did not integrate “off the shelf” with the existing SCADA system, Rockwell Automation FactoryTalk. After a SCADA outage, the buffered data from the loggers was not automatically populated into the historian.

Background

The architecture selection allowed the customer to deviate from the more typical control system implementation, which would include PLCs and remote I/O nodes. The architecture consisted of 150+ Vaisala temperature and humidity sensors and data loggers mounted throughout the facility. The data loggers have a 10-year internal battery and enough sample storage to hold from days to years’ worth of data, depending on the number of connected sensors and the configurable sample interval. Each sensor was connected to a data logger, which buffered data and transmitted real-time data to an OPC server. The OPC server also hosted the custom solution described here.
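The dependence of buffer depth on sensor count and sample interval can be illustrated with a short calculation. This is only a sketch: it assumes a logger holds a fixed number of samples shared evenly across its connected channels, and the capacity figure used in the example is hypothetical, not a Vaisala specification.

```python
def retention_days(capacity_samples: int, channels: int, interval_s: float) -> float:
    """Estimate how many days a logger can buffer before its memory is full.

    Assumption (not from the vendor): memory holds `capacity_samples`
    samples in total, shared by all connected channels.
    """
    if capacity_samples <= 0 or channels <= 0 or interval_s <= 0:
        raise ValueError("all inputs must be positive")
    seconds_per_day = 86_400
    samples_per_day = channels * seconds_per_day / interval_s
    return capacity_samples / samples_per_day


# Hypothetical logger: 100,000 samples of memory, 2 channels
# (for example, temperature and humidity).
one_minute = retention_days(100_000, channels=2, interval_s=60)       # about 35 days
fifteen_minutes = retention_days(100_000, channels=2, interval_s=900)  # about 521 days
```

Under these assumed numbers, lengthening the sample interval from one minute to fifteen minutes extends the buffer from roughly a month to well over a year, which is the "days to years" spread described above.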

Solution

The software solution is a Windows service that actively polls the historian for gaps in critical data. This polling occurs on a configurable, timed interval. A gap in historical data can occur for a variety of reasons, including power loss, communication loss, or servers not running. Once a gap is identified, the service retrieves data from the instrument data loggers and populates the historian with the missing data. If a critical data gap is detected and the service is unable to retrieve the data or unable to populate the gap, a SCADA alarm is generated and relevant personnel are notified using WIN-911 notification software. The service is multithreaded, so it takes full advantage of the VM's available resources, reducing the time to recover from a critical data gap. This is especially useful when multiple critical data gaps have been created on different loggers, as data can be read from more than one logger simultaneously. The service is also able to fully recover from a system failure such as an unexpected shutdown of the VM. The software solution decision tree that highlights the functionality is detailed below.
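The polling and recovery behavior described above can be sketched roughly as follows. This is an illustrative sketch, not the service's actual code: the function names, the 1.5x gap tolerance, and the callables read_logger, write_historian, and raise_alarm are all hypothetical stand-ins for the logger, historian, and SCADA alarm interfaces, which are not described in this paper.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta
from typing import Callable, Iterable

Gap = tuple[datetime, datetime]


def find_gaps(timestamps: Iterable[datetime], start: datetime, end: datetime,
              interval: timedelta, tolerance: float = 1.5) -> list[Gap]:
    """Return spans in [start, end] where historian samples are further
    apart than tolerance * interval (tolerance is an assumed setting)."""
    limit = interval * tolerance
    gaps: list[Gap] = []
    previous = start
    for ts in sorted(t for t in timestamps if start <= t <= end):
        if ts - previous > limit:
            gaps.append((previous, ts))
        previous = ts
    if end - previous > limit:  # data stops before the end of the window
        gaps.append((previous, end))
    return gaps


def recover_all(gaps_by_logger: dict[str, list[Gap]],
                read_logger: Callable[[str, datetime, datetime], list],
                write_historian: Callable[[str, list], None],
                raise_alarm: Callable[[str, Gap], None],
                max_workers: int = 8) -> dict[str, int]:
    """Backfill every logger's gaps, reading several loggers concurrently.

    Returns the number of samples written per logger. Any gap that cannot
    be read or written triggers raise_alarm instead of stopping the run.
    """
    def recover_one(logger_id: str) -> int:
        written = 0
        for gap in gaps_by_logger[logger_id]:
            try:
                samples = read_logger(logger_id, gap[0], gap[1])
                if not samples:
                    raise LookupError("logger returned no samples for gap")
                write_historian(logger_id, samples)
                written += len(samples)
            except Exception:
                raise_alarm(logger_id, gap)  # unrecoverable gap: notify
        return written

    logger_ids = list(gaps_by_logger)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        counts = list(pool.map(recover_one, logger_ids))
    return dict(zip(logger_ids, counts))
```

One worker per logger keeps reads from a single device sequential while still letting several loggers be drained at once, which matches the stated benefit of multithreading when gaps exist on different loggers. A production service would also need to persist its progress so that it can resume after an unexpected shutdown; that part is omitted here.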

This service allowed the client to use their selected data loggers with their existing industry-standard historian and SCADA system. It requires no action on the part of the client. The alternative that had been offered to the client was a standalone system based around the data logger selection, which would have required separate, specialized historian and reporting software to be purchased and maintained. That would have meant separate reporting and trending tools for data from just one section of the plant, along with additional training and a considerable expansion of the virtual infrastructure. Additionally, a custom solution would still have been required to raise an HMI alarm in the event of non-recoverable data.

Conclusion

The solution has been validated, is now running, and has successfully inserted critical data that would otherwise have been lost. Currently, power or communication could be lost for 3 months and the solution would still insert the critical data for all 150+ sensors once communication is restored. This solution has allowed the end user to implement a critical monitoring system whose instrumentation connects directly to the existing SCADA system. This eliminated the need for multiple sources of critical data, as well as for a programmable logic controller with remote input/output nodes spread across the facility.