How to Ensure Useful Data Extraction from a PI Historian System 

Data is an ever-present and growing asset in every industry, yet it can often feel cloaked in mystery. Why is collecting process data important? Answering that sends you down a rabbit hole of follow-up questions: How do you get data out of your equipment? How do you decide what to collect? And once those questions are answered, how do you test, and if necessary validate, that all of that data is being collected as expected?

Understanding a holistic approach to data collection and validation from an AVEVA PI Historian System can provide a big-picture view of why process data matters. With more than 10 years of experience as a group engineering manager, I (Matt Martin) have integrated AVEVA PI Systems at sites of various sizes – here's what I've learned:

Why is the Importance of Data Increasing? 

The “why” question is the easiest one to answer. Everyone can agree that in the era of modern manufacturing, extracting data from your manufacturing processes is critical to a business’s success and ability to grow. That data serves every level of production and management:

  • Provides the operator at the plant floor level the ability to adjust and better control the manufacturing process on the fly
  • Allows engineers better insight into trends and behaviors of the overall process to improve designs from both a throughput and efficiency standpoint
  • Produces KPIs (key performance indicators) for managers and executives to assist in their decision making

Having large, representative data sets is becoming even more important with the advent of AI and, more specifically, machine learning. Giving these technologies access to your system’s data helps identify patterns, overall system behavior, and correlations that were previously nearly impossible for the human eye to uncover.

Now, How to Get the Data From the Equipment?  

Now that we’ve discussed the “easy” part that everyone can agree on, we can begin to examine “How do you get data out of your equipment?” At a high level, you will need a process historian system. E Tech Group currently recommends the AVEVA PI System for its robust, rich feature set around data collection and its ability to expose data to other systems. The concepts here apply to most historians, but from a technology standpoint I will be referencing components specific to the PI architecture.

Once you have decided on the historian platform, you need to assess your equipment and understand the data sources. It is worth clarifying what I mean by data sources, as this can mean different things to different people. In general terms, a data source is a device, piece of equipment, or system that you wish to collect data from. More specifically, the data could be coming directly from a controller, from some form of relational database, or even from local historians attached to OEM/standalone equipment.

PI has over 400 types of interfaces, connectors, and adapters. This alone could be the topic of an entire paper, so I will keep it brief. In my experience, the most commonly utilized are the PI OPC DA Interface, the PI UFL Interface, the PI RDBMS Interface, an interface specific to a hardware/software application at your site (such as a Yokogawa or DeltaV batch interface), and the OPC UA Connectors.

One note of interest on the UA connectors: AVEVA has fairly recently launched its adapter suite of products, and I would recommend seriously comparing the UA adapter with the connector, as the adapter offers better functionality in terms of redundancy. The drawbacks are that it has no configuration GUI (graphical user interface) – it is all command line based – and it does not allow auto-creation of AF (Asset Framework) elements. In my experience, the AF piece is not a large drawback because in most cases customers want full control over how their PI data is handled within the AF system.
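To make that concrete, here is a minimal sketch of the kind of read an OPC UA connector or adapter performs against a controller on each scan. It uses the open-source asyncua Python library; the endpoint URL and node IDs are hypothetical placeholders, not values from any particular system.

```python
# Minimal sketch: read a few process values from a controller's OPC UA server,
# the way a connector/adapter would on each scan. Endpoint and node IDs are
# hypothetical placeholders.
import asyncio
from asyncua import Client

OPC_ENDPOINT = "opc.tcp://bioreactor-plc.example.local:4840"  # hypothetical
NODE_IDS = {
    "Temperature": "ns=2;s=BR101.Temperature",   # hypothetical node IDs
    "pH":          "ns=2;s=BR101.pH",
    "AgitatorRPM": "ns=2;s=BR101.Agitator.Speed",
}

async def read_snapshot() -> dict:
    """Read one current value per tag."""
    async with Client(url=OPC_ENDPOINT) as client:
        snapshot = {}
        for name, node_id in NODE_IDS.items():
            node = client.get_node(node_id)
            snapshot[name] = await node.read_value()
        return snapshot

if __name__ == "__main__":
    print(asyncio.run(read_snapshot()))
```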

What Data is Important to Collect? 

Now that we have the means by which to collect data, how do you decide what to collect? This is obviously a question that will need to be answered specifically based on the manufacturing process/system being integrated, but there are some general guidelines and thoughts I can offer:

1. Data from Equipment Relevant to Product Quality

As a baseline, any data that is relevant or even potentially relevant to the quality of your product should be collected. For example, if the process in question is a bioreactor, you would want to collect temperature, pressure, pH, dissolved oxygen, agitation speed, etc. – basically, all of the measurable variables that have a direct impact on the product.

2. Data from Equipment that Affects Equipment from Item 1

Taking a step back from that, you should consider collecting data on devices that impact those variables, such as the status or position of the valve that controls the glycol or steam supply to a vessel jacket (depending on whether it is an open/close or an analog valve), as well as the jacket temperature itself. Taking another step back, you can collect the process variables of the steam or glycol loop and of the equipment controlling it.

At this point, you would have the data directly affecting your equipment, and then data all the way back to the source systems that have the potential to impact those direct control variables. After following this iterative process, you would now have a list of data pertinent to your product.
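To illustrate where this exercise lands, the output is essentially a structured tag list: point name, source address, data type, and the reason each point was included. The excerpt below is a hypothetical sketch for the bioreactor example; none of the names or addresses come from a real system.

```python
# Hypothetical excerpt of a tag list built by working outward from the product.
# All names and addresses are illustrative only.
TAG_LIST = [
    # Item 1: directly relevant to product quality
    {"pi_tag": "BR101.TEMP.PV",  "source": "ns=2;s=BR101.Temperature", "type": "float32", "reason": "product quality"},
    {"pi_tag": "BR101.PH.PV",    "source": "ns=2;s=BR101.pH",          "type": "float32", "reason": "product quality"},
    # Item 2: equipment that affects the points above
    {"pi_tag": "BR101.JKT.TEMP", "source": "ns=2;s=BR101.Jacket.Temp", "type": "float32", "reason": "affects reactor temperature"},
    {"pi_tag": "BR101.GLY.VLV",  "source": "ns=2;s=GLY01.Valve.Pos",   "type": "digital", "reason": "controls jacket glycol supply"},
]
```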

3. Data from Automation Software that Monitors Processes

The next category to evaluate is process status. This includes equipment status/step/phase data that allows you to understand what the equipment was doing at certain points in time. Again, in reference to the bioreactor example, you would want to know when SIP (sterilize-in-place) started and ended, or perhaps when a growth phase started and ended.

With this information you can do two things: first, it provides context for all of the process data you previously added; second, you can start to create baselines against which to compare future runs of the same operation.
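As a sketch of the second point: once the phase start and end times are known, you can pull the recorded values for that window out of the historian and summarize them as a baseline for future runs. The example below assumes the PI Web API is installed and reachable; the server URL and tag path are hypothetical, and the call follows the standard PI Web API points/streams pattern.

```python
# Sketch: pull recorded values for a known phase window from the PI Web API
# and summarize them as a baseline. Server URL and tag path are hypothetical;
# authentication and certificate handling are omitted for brevity.
import statistics
import requests

PIWEBAPI = "https://pi-server.example.local/piwebapi"  # hypothetical
TAG_PATH = r"\\PISERVER\BR101.TEMP.PV"                 # hypothetical tag

def phase_baseline(start_time: str, end_time: str) -> dict:
    session = requests.Session()

    # Resolve the point's WebId from its path.
    point = session.get(f"{PIWEBAPI}/points", params={"path": TAG_PATH}).json()
    web_id = point["WebId"]

    # Recorded values between the phase start and end (e.g. SIP start/end).
    recorded = session.get(
        f"{PIWEBAPI}/streams/{web_id}/recorded",
        params={"startTime": start_time, "endTime": end_time},
    ).json()

    values = [item["Value"] for item in recorded["Items"]
              if isinstance(item["Value"], (int, float))]
    return {"count": len(values),
            "mean": statistics.fmean(values),
            "stdev": statistics.pstdev(values)}

# Example: phase_baseline("2024-05-01T08:00:00Z", "2024-05-01T09:30:00Z")
```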

4. Data from the Control Platforms with the Recipes

The last category is troubleshooting or “I need to be able to prove what happened” data. Depending on the nature of the facility, the need here differs:

  • If it is a site such as food and beverage or textiles, the need leans more heavily toward troubleshooting, and we would be concerned with any data that tracks the status of equipment/devices, either to show abnormal behavior or to catch abnormal events before they happen.
  • If it is a pharmaceutical plant, the overall data added would be similar, but a more exhaustive and comprehensive list is required, as this data serves as evidence for validation protocols and for demonstrating process control to the FDA, allowing the site to continue producing product.

How do you Confirm You’re Getting Everything You Need Correctly? 

Now that we have configured the system and it has started collecting all of the data identified above, how do you test it to verify that all of that data is being collected as expected? In a perfect world, for each data point you would be able to force a unique value at a known time and then verify it shows up in the historian system as an exact match. This would guarantee that no data streams are crossed or incorrectly configured, but it is very difficult in practice due to connectivity limitations to end equipment and the downtime it requires.

One possible way to achieve this, if the controller can accept bulk writes from an Excel sheet using VBA and a DDE poke or other connectivity means, is to create a sheet with unique values for all of the analog (float, int, etc.) values. It also needs a timing mechanism to toggle all of the boolean/digital values, one by one, at unique times. Once this is prepared, the controller can be set to program mode, and the Excel sheet can first read and make a copy of all the data as it currently exists in the controller. That information is stored, and the write can begin.

When the write is complete, you can reload the original data with one more write operation, and the controller is returned to normal function. Using a tool like PI DataLink, the data that was collected by the historian can be exported and compared against the writing Excel sheet for exact matches. As stated before, this is very much a perfect-world scenario and unlikely to be achievable in the real world.
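If your connectivity allows, the same idea can be scripted without Excel. The sketch below assumes write access to the controller over OPC UA while it is safely out of normal operation; the node IDs are hypothetical, and exporting the historian data for comparison (for example with PI DataLink) is left as a separate step.

```python
# Sketch of the "force unique values" test over OPC UA, assuming the controller
# has been taken out of normal operation first. Node IDs are hypothetical.
import asyncio
from datetime import datetime, timezone
from asyncua import Client, ua

OPC_ENDPOINT = "opc.tcp://bioreactor-plc.example.local:4840"    # hypothetical
ANALOG_NODES = ["ns=2;s=BR101.Temperature", "ns=2;s=BR101.pH"]  # hypothetical

async def force_unique_values() -> list:
    """Back up current values, write a unique value to each node, log the time."""
    written = []
    async with Client(url=OPC_ENDPOINT) as client:
        nodes = [client.get_node(nid) for nid in ANALOG_NODES]

        # 1. Read and keep the original values so they can be restored afterwards.
        originals = [await node.read_value() for node in nodes]

        # 2. Write a unique, recognizable value to each point and record the time.
        for i, node in enumerate(nodes):
            unique = 1000.0 + i  # unique per point, so crossed streams stand out
            await node.write_value(ua.Variant(unique, ua.VariantType.Float))
            written.append((ANALOG_NODES[i], unique, datetime.now(timezone.utc)))

        # ... hold the values longer than the historian scan/poll rate, then
        # export that window from PI and compare for exact matches ...

        # 3. Restore the original values and return the controller to service.
        for node, original in zip(nodes, originals):
            await node.write_value(ua.Variant(float(original), ua.VariantType.Float))
    return written

if __name__ == "__main__":
    for row in asyncio.run(force_unique_values()):
        print(row)
```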

Validating Data Extraction in the Real World

So, since there is no “perfect scenario,” we have to work with what we have. The next best option is to automate bulk reads from a controller or system using Excel or another tool, and then compare that data against the data in the historian system at the time of the read for each point.
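A minimal sketch of that comparison, assuming the controller snapshot has already been collected (for example with the OPC UA read shown earlier): ask the historian for each point's value at the snapshot time and flag anything outside a tolerance. The PI Web API URL, WebId mapping, and tolerance below are hypothetical.

```python
# Sketch: compare a timestamped controller snapshot against historian values
# at the same time. URL, WebId mapping, and tolerance are hypothetical.
import requests

PIWEBAPI = "https://pi-server.example.local/piwebapi"  # hypothetical

def compare_snapshot(snapshot: dict, timestamp: str, web_ids: dict,
                     tolerance: float = 0.01) -> list:
    """snapshot maps point name -> value read from the controller at `timestamp`;
    web_ids maps the same point names to their PI Web API WebIds."""
    session = requests.Session()
    mismatches = []
    for name, controller_value in snapshot.items():
        # Value of the PI stream at the snapshot time.
        resp = session.get(
            f"{PIWEBAPI}/streams/{web_ids[name]}/value",
            params={"time": timestamp},
        ).json()
        pi_value = resp["Value"]
        if abs(float(pi_value) - float(controller_value)) > tolerance:
            mismatches.append((name, controller_value, pi_value))
    return mismatches
```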

The two drawbacks here are that the data is almost guaranteed not to be unique (you can’t prove that data point A in the controller isn’t going to data point B in the historian if both values happen to be the same), and that, because you aren’t forcing a data change and holding it steady for longer than the poll rates, it becomes more difficult to align analog data with exact matches.

In my experience, if this is the only method, then you need to combine this evidence with an export of the point configurations to prove that the source address each point is pulling from is correct against an approved or validated master tag list.
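A sketch of that configuration check: export the point attributes (for example with PI SMT or PI Builder) to a CSV and compare each point's source address against the approved master tag list. The file names and column names below are hypothetical; match them to whatever your export and master list actually contain.

```python
# Sketch: compare exported PI point configurations against an approved master
# tag list. File names and column names are hypothetical.
import csv

def load_map(path: str, key_col: str, addr_col: str) -> dict:
    with open(path, newline="") as f:
        return {row[key_col]: row[addr_col] for row in csv.DictReader(f)}

def config_mismatches(export_csv: str, master_csv: str) -> list:
    exported = load_map(export_csv, "tag", "instrumenttag")    # hypothetical columns
    approved = load_map(master_csv, "tag", "source_address")   # hypothetical columns
    issues = []
    for tag, expected in approved.items():
        actual = exported.get(tag)
        if actual is None:
            issues.append((tag, "missing from historian export"))
        elif actual.strip().lower() != expected.strip().lower():
            issues.append((tag, f"address mismatch: {actual} != {expected}"))
    return issues

# Example: config_mismatches("pi_point_export.csv", "master_tag_list.csv")
```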

Hopefully this helps shed some light on why extracting data from your manufacturing equipment is important, how to do it, and how to do it right!