An Efficient Nonlinear Programming Strategy for PCA Models with Incomplete Data Sets
Posted in Articles: Wednesday, October 6, 2010
Processing plants can produce large amounts of data that process engineers use for analysis, monitoring, or control. Principal component analysis (PCA) is well suited to analyzing large amounts of (possibly) correlated data and to reducing the dimensionality of the variable space. Failing online sensors, lost historical data, or missing experiments can lead to data sets with missing values, for which the current methods for obtaining the PCA model parameters may give questionable results because of the properties of the estimated parameters. This paper proposes a method based on nonlinear programming (NLP) techniques to obtain the parameters of PCA models in the presence of incomplete data sets. We show the relationship that exists between the nonlinear iterative partial least squares (NIPALS) algorithm and the optimality conditions of the squared residuals minimization problem, and how this leads to the modified NIPALS used for the missing value problem. Moreover, we compare the current NIPALS-based methods with the proposed NLP using a simulation example and an industrial case study, and show that the latter is better suited when there are large amounts of missing values. The solutions obtained with the NLP and the iterative algorithm (IA) are very similar. However, when using the NLP-based method, the loadings and scores are guaranteed to be orthogonal, and the scores will have zero mean. The latter is emphasized in the industrial case study. Also, with the industrial data used here we are able to show that the models obtained with the NLP were easier to interpret. Moreover, the NLP required far fewer iterations to obtain them. Copyright © 2010 John Wiley & Sons, Ltd.
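To make the iterative approach mentioned in the abstract more concrete, the following is a minimal Python sketch of a NIPALS-style PCA in which each regression simply skips the missing entries. It is not the paper's code and it is not the proposed NLP method. The function name `nipals_missing`, its parameters, the NaN encoding of missing values, and the choice of starting vector are all assumptions made for illustration, and the sketch assumes every row and column has at least one observed value.

```python
import numpy as np


def nipals_missing(X, n_components=1, max_iter=500, tol=1e-10):
    """Illustrative NIPALS-style PCA that skips missing (NaN) entries.

    Hypothetical helper, not taken from the paper. Assumes every row and
    every column of X contains at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)                      # True where a value was observed
    col_means = np.nanmean(X, axis=0)        # column means over observed values
    # Centered residuals; missing entries are set to 0 so they drop out of sums.
    E = np.where(mask, X - col_means, 0.0)
    n_rows, n_cols = X.shape
    T = np.zeros((n_rows, n_components))     # scores
    P = np.zeros((n_cols, n_components))     # loadings

    for a in range(n_components):
        # Start from the residual column with the largest sum of squares.
        t = E[:, np.argmax((E ** 2).sum(axis=0))].copy()
        for _ in range(max_iter):
            # Loadings: regress each column on t using observed rows only.
            p = (E.T @ t) / (mask.T @ (t ** 2))
            p /= np.linalg.norm(p)
            # Scores: regress each row on p using observed columns only.
            t_new = (E @ p) / (mask @ (p ** 2))
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, a] = t
        P[:, a] = p
        # Deflate the residuals, keeping missing entries at 0.
        E = np.where(mask, E - np.outer(t, p), 0.0)

    return T, P, col_means
```

With complete data this reduces to ordinary NIPALS. With missing values, the scores and loadings returned by a sketch like this are generally not exactly orthogonal and the scores need not have zero mean, which are the properties the abstract says the NLP formulation guarantees.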
Contributed by Shankar Subramaniam