Failure time prediction using adaptive logical analysis of survival curves and multiple machining signals
Abstract
This paper develops a prognostic technique called the logical analysis of survival curves (LASC). This technique is used to learn the degradation process of any physical asset and, consequently, to predict its failure time (T). It combines the reliability information obtained from a classical Kaplan–Meier non-parametric curve with that obtained from online measurements of multiple sensed signals of degradation. These signals are analyzed by the machine learning technique logical analysis of data (LAD) to exploit the instantaneous knowledge about the state of degradation of the asset under study. The experimental results of the failure time predictions for cutting tools are reported. The results show that LASC predictions are more accurate than those obtained by well-known machine learning techniques. Other advantages of the proposed technique are also discussed.
Introduction
Prognostics and health management is one of the most important tools for achieving efficient condition-based maintenance (CBM) (Guillén et al. 2016). While CBM recommends maintenance actions based on the information collected through condition monitoring, prognostics enriches CBM with the ability to predict failures before they occur, so that maintenance actions can be planned in advance. One of the most important aspects of prognosis is the estimation of the failure time (T) of working physical assets. Based on the estimated T, efficient preventive actions can be planned.
Logical analysis of data (LAD) (Boros et al. 2000) is a binary classifier that discriminates between any two classes using a set of patterns. These patterns are sets of logical conjunctions, extracted from the collected observations, that characterize the data in the most concise way. The patterns have interpretability and explanatory power, and in turn provide insight into the factors that lead to the classification.
This paper proposes a new technique called the logical analysis of survival curves (LASC), which uses LAD as a trainable stratifying agent for the well-known Kaplan–Meier (KM) survival curve estimator. The stratification of the KM curve, which is based on sensor readings acquired from the working asset, mitigates the global averaging that results from calculating survival curves based on all of the observed assets. The stratification is adjusted each time a new sensor reading is collected, and the estimation of T from the stratified survival curves is based on the sensor readings observed at that time.
The next section provides a review of previous attempts in the literature to solve the failure time prediction problem. "The proposed failure time estimation" section gives a background review of some of the algorithms used in the proposed technique and provides the details of the LASC. An experimental application is outlined in the "Experimental application" section, and its results, in comparison with other techniques, are given in the "Results" section. The results of the comparison are discussed in the "Discussion" section. An outline of the practical usage of LASC is given in the "Using LASC for decision-making" section. Finally, the "Conclusions and future work" section summarizes the findings and outlines future work on LASC.
Literature review
The algorithms for failure time estimation are divided into three approaches: statistically-based, data-driven artificial neural network-based, and physics-based algorithms (Sikorska et al. 2011; An et al. 2015; Jardine et al. 2006; Si et al. 2011; Heng et al. 2009b). Because a data-driven approach is presented in this paper, only previous literature on failure time estimation using data-driven approaches is considered. These approaches have gained popularity as the amount of available data has increased.
To avoid the assumption of a specific mathematical model of degradation for a physical asset, artificial neural networks (ANN) were introduced to CBM (Sikorska et al. 2011). ANNs have many architectures; the one that copes well with temporal data is the recurrent neural network (RNN) (Wu et al. 2018). An ANN whose training targets are an asset's survival probabilities at each time interval was proposed by Heng et al. (2009a). They used a feedforward neural network and the Kaplan–Meier estimator to model a degradation-based failure probability density function, and predicted the reliability of a pump using vibration signals. Self-organizing maps, which are unsupervised clustering neural networks, are used to extract signals of degradation; these signals are fed to feedforward (FF) ANNs, as proposed by Huang et al. (2007). Another application of ANNs in CBM is the use of an autoencoder to extract features from rotating machining signals for visualization and classification using a support vector machine (SVM) (Shao et al. 2017). The main problems facing ANNs are the following: (1) the sensitivity of performance to the structure, in terms of the number of neurons, the number of hidden layers, and the connections between layers; (2) the increase in computational requirements and in the amount of required training data as the complexity of the ANN increases; and (3) the ANN is a black-box model, which means that the model's mathematical representation does not provide clear information about the physical phenomenon that drives the classification results. Recent papers have presented rules extracted from learned neural networks (Hailesilassie 2016). These rules are a byproduct of the learned model; as such, the extracted rules do not offer the explanatory power and interpretability obtained by rule induction algorithms.
Signal decomposition and instance-based learning have been introduced in the literature in order to use the available data to predict the failure time (Pimenov et al. 2018). Such decomposition is applied over a predefined set of bases, such as wavelets, where the coefficients are further reduced using principal component analysis, as proposed by Wang et al. (2018). A semi-parametric method using subsequences of machine degradation, called shapelets, was introduced by Ye and Keogh (2009). The authors built a database of these shapelets and the corresponding remaining useful life for each shapelet. The database is built using K-means, which is the parametric part, and the subsequence matching is done using Euclidean distance, which is the non-parametric part of the method. This approach requires the choice of the length and the number of shapelets, as well as the discriminative threshold. Moreover, the K-means algorithm used in this method is a randomized algorithm whose result is highly dependent on its initialization.
In order to avoid the limitations of the prediction methods presented in the previous paragraph, a failure time prognostic technique is presented in this paper. The well-known non-parametric KM curve is used together with LAD. The KM curve is built from the failure times of similar non-repairable assets in order to calculate an aggregate survival estimate $S_{KM}(t)$. Since this curve does not consider the operating conditions that affect the degradation of each individual asset, LAD exploits the available information about the degradation of each individual asset in order to generate patterns that characterize subgroups of assets with similar degradation profiles. Updating the KM survival curve based on the LAD-generated patterns produces individualized and adjusted KM survival curves, which are called LASC. As the asset degrades over time, the generated patterns reveal specific information about the profile of the asset's degradation. The idea of combining a KM survival curve with LAD-generated patterns has been applied successfully in medicine (Kronek and Reddy 2008). The authors used the generated patterns to differentiate between the times of death of patients who received an intervention and those who did not, and consequently estimated the time of death of a new patient. The approach presented in this paper differs from the one used in medicine, in which the data used for machine learning is the data collected at or near the time of death of the two groups of patients; thus, the information gathered between intervention and death is not used for learning. In this paper, the objective is to understand the natural phenomenon of degradation based on a data-driven approach without any mathematical modeling, and to reconstruct this process based on the obtained patterns that stratify similar assets according to their degradation profiles. Ultimately, the degradation process is learned and used to predict the failure time of new assets. As such, the learning process is continuous in time, and it is based on every observation gathered from the equipment from its first use until its failure. This paper extends and generalizes this approach by tracking the degradation process through multi-dimensional signals, and develops the technique needed to deal with these signals in order to stratify the assets into groups of similar degradation profiles. The paper also considers general operating scenarios for the working asset; more specifically, it considers an asset that gives multi-state signals under multiple operating conditions. In the next section, the methodology for building the LASC is presented along with its theoretical background.
The proposed failure time estimation
This section presents an introduction to the LAD approach, the construction of a KM survival curve, and finally, the construction of the LASC and how it uses the information that is obtained from the LAD’s generated patterns to provide adjusted KM survival curves.
Background methods
Logical analysis of data (LAD)
LAD is a knowledge discovery and data analysis technique that was introduced by Peter Hammer (Crama et al. 1988). It is a supervised binary or multi-class classifier (Mohamad-ali et al. 2014). LAD finds causes that discriminate between the different classes of a certain phenomenon, for example, the discrimination between classes of faults. The classification is based on patterns, which are sets of data-driven rules found in the monitored signals. Whenever these rules are present, they are used to explain why the asset is in a certain class of the phenomenon. As such, the explanatory power of LAD resides within the generated patterns, and pattern generation algorithms are the main area of research on this approach (Boros et al. 2011). In general, three methods are used for pattern generation: enumeration-based techniques, mathematical programming algorithms, and heuristic techniques (Boros et al. 2000; Hammer et al. 2004; Ryoo and Jang 2009). The objective of these techniques and algorithms is to find the minimum number of patterns that characterize all of the observations in the dataset and form a robust theory capable of correctly classifying any new observation. As with most machine learning techniques, LAD is applied in three phases: the training phase, during which the patterns are generated; the testing phase, during which the ability of the patterns to classify new data is assessed; and the classification phase, during which a new observation is classified.
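To make the notion of a pattern concrete, the following minimal Python sketch shows how a LAD-style classification by pattern coverage could look. The feature names (feed_force, vibration_rms), cut points, and patterns are invented for illustration only; in LAD they would be produced by one of the pattern-generation algorithms cited above from the training data.

```python
# A minimal illustration of classification by LAD-style patterns.
# Each pattern is a conjunction of interval conditions on the features.

def covers(pattern, observation):
    """Return True if the observation satisfies every condition of the pattern."""
    return all(lo <= observation[feat] <= hi for feat, (lo, hi) in pattern.items())

# Hypothetical positive-class (fast-degrading) and negative-class patterns
positive_patterns = [{"feed_force": (210.0, float("inf")), "vibration_rms": (0.8, float("inf"))}]
negative_patterns = [{"feed_force": (0.0, 180.0)}]

def classify(observation):
    pos = sum(covers(p, observation) for p in positive_patterns)
    neg = sum(covers(p, observation) for p in negative_patterns)
    # A simple discriminant: the class with higher pattern coverage wins
    return "positive" if pos > neg else "negative" if neg > pos else "unclassified"

print(classify({"feed_force": 230.0, "vibration_rms": 1.1}))   # -> positive
print(classify({"feed_force": 150.0, "vibration_rms": 0.3}))   # -> negative
```

The coverage of the generated patterns is also what gives LAD its explanatory power: each positive classification can be traced back to the explicit conditions of the covering pattern.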
Kaplan–Meier survival curve
The KM curve is a non-parametric method for calculating the survival probability at time t, $S_{KM}(t)$, based on the observed failures of similar systems. The survival is calculated as follows:

$$S_{KM}(t) = \prod_{t_i \le t} \left( 1 - \frac{d_{t_i}}{Y_{t_i}} \right)$$

where $Y_{t_i}$ is the number of tools that are at risk of failure at time $t_i$, and $d_{t_i}$ is the number of tools that failed after $t_{i-1}$ and up to time $t_i$. This equation is used to estimate the base curve $S_{KM}(t)$, which aggregates all of the failure times observed in the training data without taking into account the status of the monitored signals. Hence the interest in using KM along with LAD, which takes into account the knowledge provided by the monitored signals. This merger is illustrated in the following section.
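For readers who prefer code, the short Python sketch below computes the product-limit estimate above from a list of observed failure times. It assumes, as for the cutting tools in this paper, that every unit is observed until failure (no censoring); the example failure times are invented.

```python
import numpy as np

def km_survival(failure_times):
    """Kaplan-Meier estimate S_KM(t) from observed failure times (no censoring)."""
    times = np.sort(np.asarray(failure_times, dtype=float))
    survival, s = [], 1.0
    for t_i in np.unique(times):
        d_ti = np.sum(times == t_i)          # failures at t_i
        y_ti = np.sum(times >= t_i)          # tools still at risk at t_i
        s *= 1.0 - d_ti / y_ti               # product-limit update
        survival.append((t_i, s))
    return survival

# Example: five tools failing at the given cutting times (minutes)
print(km_survival([12.0, 15.5, 15.5, 20.0, 23.5]))
```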
Proposed technique for a logical analysis of survival curves (LASC)
The proposed technique is inspired by the logical analysis of survival data (LASD), which was developed and applied successfully in the medical field by Kronek and Reddy (2008) and Reddy (2009), and in the engineering field by Ragab et al. (2016). Nevertheless, in both papers the intermediate data is not used for learning, which, in the field of data science, is considered a waste of a source of knowledge. Moreover, in Ragab et al. (2016), the data are split into two groups before and after only one threshold over the lifetime of the equipment. This leads to gathering data that is not homogeneous, since it contains observations gathered over two relatively long periods of time before and after the threshold; consequently, the explanatory power and the quality of the patterns are weakened. Both papers differ from this paper in that they characterize the phenomenon of failure or no failure, whereas the objective here is to characterize the degradation phenomenon, which is generally found in engineering applications, by generating patterns from the data gathered from the time of first use until failure. As such, the novelty of the proposed approach is in the development of a technique that we call LASC, which copes with the nature of industrial applications, specifically the degradation of physical assets. Unlike the two previously mentioned references, the proposed approach tracks the evolution of the degradation state, such as wear, throughout the lifetime of the physical asset and produces individualized and adjusted KM survival curves at each time of analysis according to the sensors' readings. This is an adaptive process that tracks the degradation phenomenon through a series of observations in time, unlike the previous work of Ragab et al. (2016), in which only a single split of the data based on a single threshold is made prior to the monitoring process in order to characterize assets with a short life versus a long life. In that respect, the approach of Ragab et al. is similar to the work of Kronek and Reddy (2008), in that neither paper learns from the information embedded in the intermediate data along the life cycle. In this paper, the observations are collected, and the corresponding machine learning is performed, at intermediate points of interest in time. This section illustrates the proposed approach.
LASC is based on the LAD approach presented in the "Logical analysis of data (LAD)" section. The idea is to update the KM curve based on the monitored signals, by characterizing the degradation of the asset with the notion of events. These events are states of the degradation that are physically recognized by tool wear experts (Shaban et al. 2017). For each time $t_k$, a positive (or negative) state is defined by whether or not the asset has reached the predefined event of degradation. The measure of degradation used in the experiments in the "Experimental application" section is the tool wear. The leftmost part of Fig. 1 shows the wear state of a subset of the tools versus time, along with the predefined wear events. Details about the choice of the wear events are given in Banjevic et al. (2001).
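The following Python sketch illustrates this event-based labeling for the training data, where wear is measured directly: at an analysis time $t_k$, each tool is labeled positive if its latest wear reading has reached the predefined wear event, and negative otherwise. The column names, the wear event value, and the wear histories are hypothetical assumptions for illustration.

```python
import pandas as pd

def label_at_time(history, t_k, wear_event):
    """history: DataFrame with columns ['tool_id', 'time', 'wear'].
    Returns a Series mapping tool_id -> 'positive'/'negative' at time t_k."""
    snapshot = (history[history["time"] <= t_k]
                .sort_values("time")
                .groupby("tool_id")["wear"]
                .last())                      # latest wear reading up to t_k
    return snapshot.apply(lambda w: "positive" if w >= wear_event else "negative")

# Example with three hypothetical tools observed up to t_k = 12
history = pd.DataFrame({
    "tool_id": [1, 1, 2, 2, 3, 3],
    "time":    [6, 12, 6, 12, 6, 12],
    "wear":    [0.10, 0.32, 0.08, 0.18, 0.12, 0.41],
})
print(label_at_time(history, t_k=12, wear_event=0.30))
```

These labels, paired with the corresponding sensor readings (e.g., cutting forces), form the training observations from which the LAD patterns are generated.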
Fig. 1 Example of an analysis at time sample $t_{12}$ and a split using event 1. Mid-top: tools having wear above event 1 at time $t_{12}$. Mid-bottom: tools having wear below event 1 at time $t_{12}$. Top-right: KM curve constructed from the failure times of the 27 tools, and the disaggregated KM curves constructed from the failure times of the tools that have experienced event 1 at time $t_{12}$, based on the generated patterns. Bottom-right: KM curves extracted for the event-free pattern at time $t_{12}$.
To track the degradation process, the tools are split into two or more classes of similar degradation profile at each time of analysis $t_k$, $k = 1, \ldots, K$. This is illustrated for the case of two classes by the vertical blue dashed line in Fig. 1. The positive class contains the tools that pass one of the pre-specified levels of degradation, indicated by the wear events on the horizontal red dashed lines in Fig. 1. The tools that pass the pre-specified level are the fast-degrading tools. The force readings and the corresponding labels, which indicate whether or not the tool has passed the wear event, are illustrated in Table 1. The positive class tools are illustrated in the top-middle of Fig. 1. The negative class contains the tools that have not passed the pre-specified level of degradation; these are the slow-degrading tools, shown in the bottom-middle of Fig. 1. Note that the pre-specified levels of degradation also change with time; they move upward because the degradation trend is monotonically non-decreasing. These levels are chosen adaptively in time to offer the best separation between the two classes of tools.
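Once the tools have been split at an analysis time, one survival curve can be built per class, giving the adjusted curves shown on the right of Fig. 1. The sketch below repeats the product-limit estimate from the "Kaplan–Meier survival curve" section and applies it to each stratum; the failure times and labels are hypothetical.

```python
import numpy as np

def km_survival(failure_times):
    """Same product-limit estimate as in the earlier sketch (no censoring)."""
    t = np.sort(np.asarray(failure_times, dtype=float))
    out, s = [], 1.0
    for t_i in np.unique(t):
        s *= 1.0 - np.sum(t == t_i) / np.sum(t >= t_i)
        out.append((t_i, s))
    return out

# Hypothetical failure times and labels assigned at the analysis time t_k
failure_times = {1: 20.0, 2: 31.5, 3: 18.0, 4: 27.0}       # tool_id -> observed T
labels_at_tk  = {1: "positive", 2: "negative", 3: "positive", 4: "negative"}

fast = [failure_times[i] for i, lab in labels_at_tk.items() if lab == "positive"]
slow = [failure_times[i] for i, lab in labels_at_tk.items() if lab == "negative"]

# One adjusted curve per stratum; the curve whose patterns cover the new tool's
# current sensor readings would be used to estimate its failure time T.
print("fast-degrading stratum:", km_survival(fast))
print("slow-degrading stratum:", km_survival(slow))
```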