Though this may be suitable in some cases, it frequently produces a heavily corrupted picture. Image processing is used to perform operations on a picture, such as image enhancement, or to extract useful data from the image. It is one kind of signal processing in which the input is a picture and the output is a set of features or characteristics associated with that image. At present, image processing techniques are widely used across different industries and form a core research area in engineering as well as in many other disciplines.
The basic step-by-step image processing workflow is discussed below. Image processing can be done by two methods: analog image processing and digital image processing. The analog technique is employed for hard copies such as photographs and printouts; image analysts apply various fundamentals of interpretation when using these techniques. The digital technique assists in analyzing digital images using a computer.
The following list of image processing projects is discussed below. The first project builds a robot for ball tracking using a Raspberry Pi. The robot uses a camera to capture images and performs image processing to track the ball.
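As a toy illustration of the ball-tracking idea: in a real build the frames would come from the Pi camera (typically via OpenCV), but the core step, thresholding a colour range and taking the centroid of the matching pixels, can be sketched with synthetic data. The function name and bounds below are our own.

```python
import numpy as np

def find_ball(frame, lo, hi):
    """Locate a coloured ball in an RGB frame by thresholding each
    channel between lo and hi and taking the centroid of the mask.

    frame : (H, W, 3) uint8 array; lo/hi : per-channel bounds.
    Returns (row, col) of the centroid, or None if nothing matches.
    """
    mask = np.all((frame >= lo) & (frame <= hi), axis=-1)
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        return None
    return float(ys.mean()), float(xs.mean())

# Toy frame: black background with a red 3x3 "ball" centred at (5, 7).
frame = np.zeros((10, 10, 3), dtype=np.uint8)
frame[4:7, 6:9] = [200, 30, 30]
pos = find_ball(frame, lo=(150, 0, 0), hi=(255, 80, 80))
```

In a live loop the centroid would be compared against the frame centre to steer the robot toward the ball.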
This project uses a Raspberry Pi with a camera module for tracking the ball, and Python code for the image analysis. The next project is very useful for monitoring public places such as offices and homes using an Android app. With it, one can capture images and monitor and record live streaming video.
The proposed system requires a power supply, a Raspberry Pi, a Pi camera, and an Android phone. Video is recorded with the help of motion-detection software whenever motion is present in the room. Another project targets the healthcare system: fake-image recognition is used to confirm whether an image is a genuine medical image or not. The project works on a noise map of the image, applies a multi-resolution failure filter, and feeds the output to classifiers such as an extreme learning machine and a support vector machine.
The noise map is formed at an edge-computing source, while the classification and filtering are completed at a core cloud-computing source, so the project works smoothly. Its bandwidth requirement is also very reasonable. The last project identifies human actions by image processing in real time; the main intention is to recognize the gestures communicated through the camera system.
There is a variety of non-invasive techniques for measuring brain activity. However, because of technical, time-resolution, real-time, and price constraints, only EEG monitoring and related techniques are employed in the BCI community; for more details refer to Wolpaw et al. Neuronal electrical activity contains a broad band of frequencies, so the monitored brain signals are filtered and denoised to extract the relevant information (see Section 3). This information is then decoded (see Section 6) and converted into device commands, either by synchronous control or, more efficiently, by self-paced (asynchronous) control, in order to detect whether a user is intending something or not (see Chapter 7 in Dornhege et al.).
For some specific BCI tasks, the raw brain signal serves both as stimulus and as control-interface feedback. Direct BCIs can be seen as a new means of communication that may allow tetraplegic individuals, or individuals with severe motor or neuromuscular diseases, e.g. amyotrophic lateral sclerosis (ALS), brainstem stroke, brain or spinal cord injury, cerebral palsy, or muscular dystrophy, to communicate.
The BCI is a communication system that does not require any peripheral muscular activity. Grand-average ERD curves were recorded during motor imagery from the left (C3) and right (C4) sensorimotor cortex; the electrodes C3 and C4 are placed according to the international 10-20 system. The ERD time courses were calculated for the selected bands in the alpha range for 16 subjects. Positive and negative deflections are expressed with respect to the baseline (second 0).
The gray bar indicates the time period of cue presentation (figure from Pfurtscheller et al.). ERD maps for a single subject were calculated for the cortical surface of a realistic head model. Direct BCIs can also be seen as a new means to extend communication for healthy subjects in many fields, such as multimedia communication, control of robots, virtual reality, and video games (Thomas; Friedman et al.). There are in general two types of BCI systems: endogenous-task and exogenous-task based systems (Dornhege et al.). The endogenous-task BCI systems, which are based on spontaneous activity, use brain signals that do not depend on external stimuli and that can be influenced by concentrating on a specific mental task.
In order to obtain an efficient task-recognition system, several concentration trials are in general realized for each subject. Concentration is a very tiring mental task, especially for disabled subjects who might have difficulties acquiring voluntary control over their brain activity, so this constraint must be reduced. The exogenous-task BCI systems, which are based on evoked activity, use brain signals that do depend on external stimuli.
Advantages of these potentials are that they are relatively well understood from a neurophysiological point of view and that they can be evoked robustly across different subjects. Moreover, feedback training is not necessary in these systems, as these potentials appear "automatically" whenever subjects concentrate on one out of several stimuli presented in random order (Hoffman et al.).
In order to improve the performance of a BCI system, it is necessary to use a good signal processing method that allows easier extraction of physiological characteristics, and also a good classifier adapted to the specificities of the system. This chapter presents a compact guide to the signal processing techniques that have received the most attention in BCIs. We then introduce some selected feature extraction and classification approaches in the context of BCI systems.
More exhaustive and excellent surveys of signal processing and classification algorithms may be found in the papers of Bashashati et al. This chapter then describes the application of two classification approaches, hidden Markov models (HMMs) and support vector machines (SVMs), in the context of exogenous-task BCI systems based on the P300 evoked potential.
The chapter ends with global conclusions and perspectives. Among the methods presented in Section 3, we give here only a brief description of the most widely applied ones. They are introduced without referencing all the published papers for the 96 BCI designs; the reader may refer to the paper by Bashashati et al.
However, we give only the original references corresponding to each proposed method. Current BCI systems fall into seven main categories, based on the neuromechanisms and recording technology they use to generate control signals (Bashashati et al.). The following list gives a short description of these electrophysiological activities used in BCI designs. The list is borrowed and adapted, with the authorization of the authors, from the paper by Bashashati et al. We omitted the references of the different approaches given in this list.
Many of these references are given in Bashashati et al. They are mostly prominent in frontal and parietal locations. After a voluntary movement, the power in the brain rhythms increases.
This phenomenon, called event-related synchronization (ERS), is dominant over the contralateral sensorimotor area and reaches a maximum some hundreds of milliseconds after movement offset. MRPs are low-frequency potentials that start shortly before movement onset. They have a bilateral distribution and present maximum amplitude at the vertex.
Close to the movement, they become contralaterally preponderant. The sensorimotor activities that do not belong to any of the preceding categories are categorized as other sensorimotor activities. These activities are usually not restricted to a particular frequency band or scalp location and usually cover different frequency ranges. An example would be features extracted from an EEG signal filtered to frequencies below 30 Hz. Such a range covers different event-related potentials (ERPs), but no specific neuromechanism is used.
VEPs are small changes in the ongoing brain signal. They are generated in response to a visual stimulus, such as flashing lights, and their properties depend on the type of visual stimulus. These potentials are most prominent in the occipital area. If a visual stimulus is presented repetitively at a sufficiently high rate, a continuous oscillatory electrical response is elicited in the visual pathways.
BCI systems based on non-movement mental tasks assume that different mental tasks produce distinguishable brain signals. It has been shown that the firing rates of neurons in the motor cortex increase when movements are executed in the preferred direction of the neurons; when the movements are away from the preferred direction, the firing rate decreases. BCI systems based on multiple neuromechanisms use a combination of two or more of the above-mentioned neuromechanisms. To extract features (see Section 4), it is necessary to first pre-process the data.
Three steps are necessary to achieve this goal: referencing, temporal filtering, and signal enhancement (Hagemann et al.).
In the case of EEG recordings from the cortex or from the scalp, these recordings are obtained using, in general, different electrodes on different positions. Since the brain activity voltage measured by a given electrode is a relative measure, the measurement may be compared to another reference brain voltage situated on another site. This results in a combination of brain activity at the given electrode, brain activity at the reference site and noise. Because of this, the reference site should be chosen such that the brain activity at that site is almost zero.
Typically, the nose, mastoids, and earlobes are used (Dien). In general, there are three referencing methods. The common-reference technique is widely used in BCIs; it uses one common reference for all electrodes. In general, the reference site is situated at a large distance from all electrodes. The activity at the reference site influences all measurements equally, so differences between electrode measurements still contain all the information needed.
The average reference subtracts the average of the activity at all electrodes from each measurement. This method is based on the principle that the activity over the whole head sums to zero at every moment; the average of all activity therefore represents an estimate of the activity at the reference site, and subtracting this average produces, in principle, a dereferenced solution. However, the relatively low density of the electrodes, and the fact that the lower part of the head is not taken into account, bring some practical problems along (Dien). The current source density (CSD) is "the rate of change of current flowing into and through the scalp" (Weber). This quantity can be derived from EEG data, and it may be interpreted as the potential difference between an electrode and a weighted average of its surrounding electrodes.
The CSD can be estimated by computing the Laplacian, i.e., the sum of the differences between an electrode and its neighbours. A problem with this estimation is that it is strictly valid only when the electrodes lie in a two-dimensional plane and are equally spaced. The brain signals are naturally contaminated by many internal and external noises.
They can be removed using simple filters. The relevant information in BCIs is found in the frequencies below 30 Hz, so all noise at higher frequencies can be removed with a low-pass filter. Specific frequency bands may also be selected using FIR bandpass filters. The choice of a suitable enhancement technique depends on several factors, such as the recording technology, the number of electrodes, and the neuromechanism of the BCI (Bashashati et al.).
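The FIR filtering just mentioned can be sketched in pure NumPy with a windowed-sinc low-pass at 30 Hz. This is an illustrative design with names of our own choosing, not a tuned one; in practice a dedicated DSP library would be used.

```python
import numpy as np

def fir_lowpass(cutoff_hz, fs, numtaps=101):
    """Windowed-sinc FIR low-pass design (Hamming window)."""
    n = np.arange(numtaps) - (numtaps - 1) / 2
    fc = cutoff_hz / fs                       # cutoff normalised to fs
    h = 2 * fc * np.sinc(2 * fc * n)          # ideal impulse response
    h *= np.hamming(numtaps)                  # taper to reduce ripple
    return h / h.sum()                        # unit DC gain

fs = 250.0                                    # a typical EEG sampling rate
t = np.arange(0, 2, 1 / fs)
# 10 Hz "signal" plus 60 Hz "noise"; the filter should keep only the former.
sig = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 60 * t)
h = fir_lowpass(30.0, fs)
filtered = np.convolve(sig, h, mode="same")
```

Inspecting the spectrum of `filtered` shows the 10 Hz component preserved and the 60 Hz component strongly attenuated.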
Among the seventeen pre-processing methods given by Bashashati et al., only the most used are described here. The proper selection of a spatial filter for any BCI is determined by the location and extent of the selected brain control signal and by the various sources of EEG or non-EEG noise. Common-average referencing involves recording, in bipolar fashion, from a number of electrodes, all referred to a single site. One then calculates the grand-mean EEG waveform by averaging across electrodes, and subtracts the result pointwise from the EEG recorded at each electrode.
Activity recorded by the reference electrode is theoretically of equal magnitude in the mean and individual-electrode waveforms. Consequently, the effect of the reference electrode should be eliminated from each recording electrode's output when the common-average waveform is subtracted (Stanny). The SL is defined as the second-order spatial derivative of the surface potential. Due to its intrinsic spatial high-pass filtering characteristics, the SL can reduce the volume-conduction effect by enhancing the high-frequency spatial components, and can therefore achieve higher spatial resolution than surface potentials.
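Common-average referencing amounts to subtracting, at every time point, the grand mean across electrodes. A minimal sketch (the function name and toy data are ours):

```python
import numpy as np

def common_average_reference(eeg):
    """Subtract the per-sample mean over all electrodes.

    eeg : (channels x samples) array.
    """
    return eeg - eeg.mean(axis=0, keepdims=True)

# Three channels, two time points.
x = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
car = common_average_reference(x)  # column means [3, 4] are removed
```

After the subtraction, the channels sum to zero at every time point, which is exactly the "activity over the whole head sums to zero" assumption described above.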
PCA (Pearson) is a linear mapping that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible.
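This transform can be sketched compactly via the SVD of the centred data (synthetic data; the names are ours):

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its first k principal
    components, obtained from the SVD of the centred data."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)   # variance ratio per component
    return Xc @ Vt[:k].T, explained[:k]

rng = np.random.default_rng(0)
# Correlated 2-D data stretched strongly along one axis.
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.3]])
Z, ratio = pca(X, 1)
```

With one direction carrying far more variance than the other, the first component captures nearly all the variability, which is the property the text describes.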
PCA reveals the internal structure of the data in the way which best explains its variance. If a multivariate dataset is visualised as a set of coordinates in a high-dimensional data space (one axis per variable), ICA supplies the user with a lower-dimensional representation. Classical automatic methods for removing artefacts can be classified into rejection methods and subtraction methods; their success crucially depends on the quality of the detection. ICA itself is a computational method for separating a multivariate signal into additive subcomponents, supposing the mutual statistical independence of the non-Gaussian source signals.
ICA is a special case of blind source separation (BSS). For more details about its advantages, its limitations, and its applications for the removal of eye-activity artefacts, refer to Jung et al. The CSP method computes a projection matrix that maximizes the differences between the classes (Guger et al.); the selection of its parameters is not very crucial. An advantage of the CSP method is that it does not require a priori selection of subject-specific frequency bands, as is necessary for band-power or frequency-estimation methods (Pfurtscheller et al.). The CSP method is, however, very sensitive to artefacts.
The reason is the sample covariance (a non-robust estimate), which is used to estimate the covariance for the calculation of the spatial filters. However, during on-line operation of the BCI, the spatial filters perform a weighted spatial averaging of the EEG, and this reduces the influence of artefacts (Guger et al.). The method also requires almost identical electrode positions for all trials and sessions, which may be difficult to accomplish (Ramoser et al.).
This allows the generation of a more robust filter in order to overcome the mentioned problems. Signal matrices or covariance matrices are decomposed using spatial factors common to multiple conditions. The spatial factors and corresponding spatial filters are then dissociated into specific and common parts, according to the common spatial subspace which exists among the data sets.
Finally, the specific signal components are extracted using the corresponding spatial filters and spatial factors (Wang et al.). Other methods are given by Bashashati et al. According to the study of Bashashati et al., only PCA has been used in both groups.
(Figure: signal enhancement methods in BCI designs; modified from Bashashati et al.) In the following, we give a brief description of the two most used methods: spatial filters and common spatial patterns. It was shown that the reference methods CAR, bipolar, large Laplacian, small Laplacian, and referenced to the ear differ in performance (McFarland et al.).
Fast and continuous feedback can also enhance the performance of the system Guger et al. In the following, we introduce only the principles of the CSP given in Guger et al. As described by Guger et al. The decomposition or filtering of the EEG leads to new time series, which are optimal for the discrimination of two populations or classes.
The patterns are designed such that the signal resulting from filtering the EEG with the CSP has maximum variance for the first population and minimum variance for the second population, and vice versa. In this way, the difference between the first and second populations is maximized, and the only information contained in these patterns is where the variance of the EEG varies most when comparing the two conditions. The projection matrix is a set of subject-specific spatial patterns, which reflect the specific activation of cortical areas during hand-movement imagination. With the projection matrix W, the decomposition of a trial X is described by Z = WX.
This mapping projects the variance of X onto the rows of Z and results in new time series. After interpolation, the patterns can be displayed as topographical maps. By construction, the variance for population 1 is largest in the first row of Z and decreases with the increasing number of the subsequent rows. The opposite is the case for a trial with population 2. This section describes briefly the common BCI features extraction methods. Concerning the design of a BCI system, some critical properties of these features must be considered Lotte et al.
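A minimal sketch of this decomposition on synthetic two-channel trials follows. The helper names and toy data are our own; practical CSP implementations add regularization and artefact handling. The filters are obtained by whitening the composite covariance and then diagonalising the whitened class-A covariance.

```python
import numpy as np

def csp(trials_a, trials_b):
    """Common spatial patterns for two classes of (channels x samples)
    trials. Returns W whose first row maximises variance for class A
    (and minimises it for class B); the last row does the opposite."""
    def mean_cov(trials):
        covs = [t @ t.T / np.trace(t @ t.T) for t in trials]
        return np.mean(covs, axis=0)

    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    d, E = np.linalg.eigh(Ca + Cb)            # whiten Ca + Cb ...
    P = np.diag(d ** -0.5) @ E.T
    w, B = np.linalg.eigh(P @ Ca @ P.T)       # ... then diagonalise class A
    order = np.argsort(w)[::-1]               # largest class-A variance first
    return B[:, order].T @ P

rng = np.random.default_rng(1)
# Class A is strong on channel 0, class B on channel 1.
mk = lambda s0, s1: [np.vstack([s0 * rng.normal(size=100),
                                s1 * rng.normal(size=100)]) for _ in range(20)]
W = csp(mk(3.0, 0.5), mk(0.5, 3.0))
Z = W @ mk(3.0, 0.5)[0]                       # filter one class-A trial
```

For a class-A trial the variance of the first row of Z is largest and that of the last row smallest, exactly the ordering described in the text.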
Several features are generally extracted from several channels and from several time segments before being concatenated into a single feature vector. In the following, we describe some of the main and specific methods. Using the notation of Section IV, a good heuristic is to start the local optimizations from many random initial points and to keep the weights yielding the minimum value of the sum of squared errors, to prevent the network from converging to a shallow local minimum.
It is advisable to scale the random initial weights so that the inputs to the logistic activation functions are of the order of unity [22, Chap. ]. In weight-decay regularization [22, Chap. ], a penalty term proportional to the squared weights is added to the error function. The network inputs and the outputs of the hidden units should be roughly comparable before the weight-decay penalty in the form given above makes sense. It may be necessary to rescale the inputs in order to achieve this.
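As a toy sketch of the weight-decay penalty for a single linear unit (a hedged illustration with our own names; in an MLP the same penalty term is simply added to the back-propagated cost):

```python
import numpy as np

def cost_and_grad(w, X, y, lam):
    """Sum-of-squared-errors cost for a linear unit plus a
    weight-decay penalty lam * ||w||^2, with its gradient."""
    err = X @ w - y
    cost = np.sum(err**2) + lam * np.sum(w**2)
    grad = 2 * X.T @ err + 2 * lam * w
    return cost, grad

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

def fit(lam, steps=500, lr=0.001):
    """Plain gradient descent; larger lam shrinks the learned weights."""
    w = np.zeros(3)
    for _ in range(steps):
        w -= lr * cost_and_grad(w, X, y, lam)[1]
    return w

w_plain, w_decayed = fit(0.0), fit(50.0)
```

With no penalty the fit recovers the generating weights; with a large penalty the weight vector is pulled toward zero, which is the regularizing effect the text refers to.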
Local linear regression (LLR) [78, 98] is a nonparametric regression method which has its roots in classical methods proposed for the smoothing of time-series data; such estimators have received renewed attention recently. The method fits a weighted linear model to the training data (x_i, y_i) in a neighborhood of the query point, where the local bandwidth h(x) is controlled by a neighborhood-size parameter. Tree classifiers, multivariate adaptive regression splines, and flexible discriminant analysis form another family of methods. The introduction of tree-based models in statistics dates back several decades, although their current popularity is largely due to a seminal book.
The cutoff values k are chosen to optimize a suitable fitting criterion. Stopping criteria are used to keep the trees reasonably sized, although the commonly employed strategy is to first grow a large tree that overfits the data and then use a separate pruning stage to improve its generalization performance. A terminal node is labeled according to the class with the largest number of training vectors in the associated hyperrectangle.
The tree classifier therefore uses the Bayes rule with the class posterior probabilities estimated by locally constant functions. The particular tree classifier described here is available as a part of the S-Plus statistical software package . This implementation uses a likelihood function to select the optimal splits . Pruning is performed by the minimal cost-complexity method. MARS  is a regression method that shares features with tree-based modeling.
The algorithm is a two-stage procedure, beginning with a forward stepwise phase which adds basis functions to the model in a deliberate attempt to overfit the data. The second stage of the algorithm is standard linear regression backward subset selection. The maximum order of variable interactions products of variables allowed in the functions Bk, as well as the maximum value of M allowed in the forward stage, are parameters that need to be tuned experimentally.
Backward model selection uses the generalized cross-validation criterion introduced in . The original MARS algorithm fits only scalar-valued functions and is therefore not well suited to discrimination tasks with more than two classes. A recent proposal called flexible discriminant analysis (FDA), with its publicly available S-Plus implementation in the StatLib program library, contains vector-valued MARS as one of its ingredients. Either the training vectors are retained as such, or some sort of training phase is utilized to distill the properties of a multitude of training vectors into each of the memorized prototypes.
In either case, the prototype classifiers are typical representatives of the nonparametric classification methods. The k closest neighbors of an input pattern vector are found among all the prototypes, and the class label is decided by the majority voting rule.
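The majority-voting rule can be sketched as follows (toy prototypes; the names are ours):

```python
import numpy as np
from collections import Counter

def knn_classify(x, prototypes, labels, k=3):
    """Classify x by majority vote among its k nearest prototypes
    (Euclidean distance)."""
    d = np.linalg.norm(prototypes - x, axis=1)
    nearest = np.argsort(d)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

protos = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
labels = ["a", "a", "a", "b", "b", "b"]
```

A query near the origin is voted into class "a", one near (1, 1) into class "b".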
A possible tie between two or more classes can be broken in several ways. The k-NN rule should even now be regarded as a sort of baseline classifier against which other statistical and neural classifiers should be compared. Its advantage is that no time is needed to train the classifier; the corresponding disadvantage is that huge amounts of memory and time are needed during the classification phase. An important improvement in memory consumption, while still keeping the classification accuracy moderate, may be achieved using some editing method.
An algorithm known as multiedit removes spurious vectors from the training set. Another algorithm known as condensing adds new vectors to the classifier when it is unable to classify a pattern correctly. In both methods, a vector set originally used as a k-NN classifier is converted to a smaller edited set to be used as a 1-NN classifier. The variations of the LVQ algorithm differ in the way the codebook vectors are updated.
Note that the zero set of s consists of the Bayes-optimal decision boundaries. The learning rules of the learning k-NN (L-k-NN) resemble those of LVQ, but at the same time the classifier still utilizes the improved classification accuracy provided by the majority voting rule. The performance of the standard k-NN classifier depends on the quality and size of the training set, and it decreases if the available computing resources limit the number of training vectors one can use.
In such a case, the learning k-NN rule is better able to utilize the available data by using the whole training set to optimize the classification based on a smaller set of prototype vectors. For the training of the k-NN classifier, three slightly different training schemes have been presented.
As in LVQ, the learning k-NN rules use a fixed number of code vectors m_ij with predetermined class labels j for classification. Once the code vectors have been tuned by moving them to positions in the input space that give a minimal error rate, the decision rule for an unknown input vector is based on the majority label among its k closest code vectors. The objective of all the learning rules is to make the correct classification of the training samples more probable.
This goal is achieved by incrementally moving some of the code vectors in the neighborhood of a training input vector toward the training sample and some away from it. With a positive sign of α(t), the movement of the code vector is directed toward the training sample, and with a negative sign, away from it. The learning rate α(t) should decrease slowly in order to make the algorithm convergent; in practice it may be sufficient to use a small constant value. A vector from an unknown class can then be classified according to its shortest distance from the class subspaces. The basis matrices U_j are recalculated after each training epoch as the dominant eigenvectors of the modified S_j.
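The incremental code-vector update described above can be sketched in the style of LVQ1 (our naming; the sign convention follows the description: toward the sample when the labels match, away otherwise):

```python
import numpy as np

def lvq1_step(codebook, code_labels, x, y, alpha):
    """One LVQ1 update: move the nearest code vector toward the sample
    if its label matches y (positive sign), away from it otherwise.
    Returns the index of the updated code vector."""
    i = int(np.argmin(np.linalg.norm(codebook - x, axis=1)))
    sign = 1.0 if code_labels[i] == y else -1.0
    codebook[i] += sign * alpha * (x - codebook[i])
    return i

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
labels = ["a", "b"]
i = lvq1_step(codebook, labels, np.array([0.2, 0.0]), "a", alpha=0.1)
```

Here the sample is nearest to code vector 0, whose label matches, so that vector moves a fraction alpha of the way toward the sample.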
The subspace dimensions i_j need to be somehow fixed. One effective iterative search algorithm and a novel weighting solution have recently been presented. In this general view, many neural networks can be seen as representatives of certain larger families of statistical techniques. However, this abstract point of view fails to identify some key features of neural networks that characterize them as a distinct methodology.
From the very beginning of neural network research  the goal was to demonstrate problem-solving without explicit programming. The neurons and networks were supposed to learn from examples and store this knowledge in a distributed way among the connection weights. The original methodology was exactly opposite to the goal-driven or top-down design of statistical classifiers in terms of explicit error functions. In neural networks, the approach has been bottom-up: starting from a very simple linear neuron that computes a weighted sum of its inputs, adding a saturating smooth nonlinearity, and constructing layers of similar parallel units, it turned out that "intelligent" behavior such as speech synthesis  emerged by simple learning rules.
The computational aspect has always been central. At least in principle, everything that the neural network does should be accomplished by a large number of simple local computations using the available input and output signals, as in real neurons, but unlike heavy numerical algorithms involving such operations as matrix inversions. Perhaps the best example of a clean-cut neural network classifier is the LeNet system [4, ] for handwritten digit recognition see Section V.
Such a computational model supports well the implementation in regular VLSI circuits. In current neural network research, these original views are clearly becoming vague, as some of the most fundamental neural networks, such as the one-hidden-layer MLP or RBF networks, have been shown to have very close connections to statistical techniques. The goal remains, however, of building much more complex artificial neural systems for demanding tasks such as speech recognition or computer vision, in which it is difficult or eventually impossible to state the exact optimization criteria for all the consequent processing stages.
Figure 9 is an attempt to assess the neural characteristics of some of the classification methods discussed earlier. The horizontal axis measures the flexibility of a classifier architecture in the sense of the richness of the discriminant function family encompassed by a particular method. High flexibility of architecture is a property often associated with neural networks. In the vertical dimension, the various classifiers are categorized on the basis of how they are designed from a training sample.
For example, the training vectors may be used as such in classification. Neural learning is characterized by simple local computations in a number of real or virtual processing elements.
Neural learning algorithms are typically of the error correction type; for some such algorithms, not even an explicit cost function exists. Typically, the training set is used several times epochs in an on-line mode. Note, however, that for some neural networks MLP, RBF the current implementations in fact often employ sophisticated optimization techniques which would justify moving them downwards in our map to the lower half plane.
The whole process of classifier design should then be based strictly on the training sample only. In addition to parameter estimation, the design of some classifiers involves the choice of various tuning parameters and model or architecture selection. To utilize the training sample efficiently, cross-validation or "rotation" can be used.
In v-fold cross-validation, the training sample is first divided into v disjoint subsets. One subset at a time is then put aside; a classifier is designed based on the union of the remaining v − 1 subsets and then tested on the subset left out. Cross-validation approximates the design of a classifier using all the training data followed by testing on an independent set, which enables defining a reasonable objective function to be optimized in classifier design.
For example, for a fixed classifier, the dimension of the pattern vector can be selected so that it minimizes the cross-validated error count.
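This procedure can be sketched with a toy nearest-mean classifier (all names and data are ours):

```python
import numpy as np

def v_fold_cv(X, y, v, train_and_test):
    """v-fold cross-validation: each subset is held out once while a
    classifier is designed on the union of the remaining v - 1 subsets.
    Returns the error rate averaged over the v held-out subsets."""
    folds = np.array_split(np.arange(len(X)), v)
    errs = []
    for i in range(v):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(v) if j != i])
        errs.append(train_and_test(X[train], y[train], X[test], y[test]))
    return float(np.mean(errs))

def nearest_mean(Xtr, ytr, Xte, yte):
    """Classify each test vector by the nearer of the two class means."""
    m0, m1 = Xtr[ytr == 0].mean(axis=0), Xtr[ytr == 1].mean(axis=0)
    pred = (np.linalg.norm(Xte - m1, axis=1)
            < np.linalg.norm(Xte - m0, axis=1)).astype(int)
    return float(np.mean(pred != yte))

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
perm = rng.permutation(60)                # shuffle so folds mix both classes
err = v_fold_cv(X[perm], y[perm], v=5, train_and_test=nearest_mean)
```

On well-separated classes like these, the cross-validated error rate is essentially zero; on real data this estimate is what one would minimize when tuning, e.g., the pattern-vector dimension.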
After optimization, one can obtain an unbiased estimate of the performance of the optimized classifier by means of the separate testing sample. Notice that the performance estimates might become biased if the testing sample were in any way used during the training of the classifier. The use of a reject class can help reduce the misclassification rate e in tasks where exceptional handling of rejected patterns is possible.
One can therefore select a rejection threshold θ. This phenomenon can also be observed in Fig. It is then quite possible that combining the opinions of several parallel systems results in improved classification performance. Such hybrid classifiers, classifier ensembles, or committees have been studied intensively in recent years. Besides improved classification performance, there are other reasons to use a committee classifier. The pattern vectors may be composed of components that originate from very diverse domains. Some may be statistical quantities such as moments, and others discrete structural descriptors such as numbers of endpoints, loops, and so on.
There may not be an obvious way to concatenate the various components into a single pattern vector suitable for any single classifier type. In some other situations, the computational burden can be reduced, either during training or in the recognition phase, if the classification is performed in several stages. Various methods exist for forming a committee of classifiers even when their output information is of different types. In the simplest case, a classifier only outputs its decision about the class of an input pattern, but sometimes some measure of the certainty of the decision is also provided.
The classifier may propose a set of classes in the order of decreasing certainty, or a measure of decision certainty may be given for all the classes. Various ways to combine classifiers with such types of output information are analyzed in . The simplest decision rule is to use a majority rule among the classifiers in the committee, possibly ignoring the opinion of some of the classifiers . Two or more classifiers using different sets of features may be combined to implement rejection of ambiguous patterns .
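The simple majority rule for a committee can be sketched as follows (illustrative only; tie handling here just follows insertion order):

```python
from collections import Counter

def majority_vote(decisions):
    """Combine individual classifier decisions by a simple majority
    rule; ties are resolved by the order of first appearance."""
    return Counter(decisions).most_common(1)[0][0]

# Three committee members, two of which agree.
votes = ["cat", "dog", "cat"]
```

Calling `majority_vote(votes)` returns the class chosen by most members; weighting or ignoring some members, as mentioned above, is a straightforward extension.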
A genetic algorithm can be applied in searching for optimal weights to combine the classifier outputs . Theoretically more advanced methods may be derived from the EM algorithm [, , ] or from the Dempster-Shafer theory of evidence [, ]. The outputs of several regression-type classifiers may be combined linearly  or nonlinearly  to reduce the variance of the posterior probability estimates.
A more general case is the reduction of variance in continuous function estimation: a set of MLPs can be combined into a committee classifier with reduced output variance and thus a smaller expected classification error. A separate confidence function may also be incorporated in each of the MLPs. Given a fixed feature extraction method, one can either use a common training set to design a number of different types of classifiers or, alternatively, use different training sets to design several versions of one type of classifier.
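A minimal sketch of such a linear combination, assuming each member outputs a vector of per-class posterior estimates. Uniform weights are used by default; as noted above, the weights could instead be optimized, e.g., by a genetic algorithm (the function name and interface are our assumptions):

```python
def committee_average(member_outputs, weights=None):
    """Linearly combine per-class posterior estimates from several
    regression-type classifiers.

    member_outputs: list of posterior vectors, one per committee member.
    weights: optional per-member weights; defaults to a plain average.
    """
    n = len(member_outputs)
    if weights is None:
        weights = [1.0 / n] * n
    n_classes = len(member_outputs[0])
    combined = [0.0] * n_classes
    for w, out in zip(weights, member_outputs):
        for c in range(n_classes):
            combined[c] += w * out[c]
    return combined

# Two MLPs estimate posteriors for three classes; averaging damps
# the disagreement (variance) between the individual estimates.
p = committee_average([[0.6, 0.3, 0.1], [0.4, 0.5, 0.1]])
print(p)
```

Averaging keeps the estimates normalized when the members' outputs are, and the combined estimate has lower variance than any single member's.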
Such comparisons need, however, to be considered with utmost caution. In recent years, a large number of papers have been published in which various neural and other classification algorithms are described and analyzed. The results of such experiments cannot generally be compared due to the use of different raw data material, preprocessing, and testing policies. One study analyzed the methods employed in experimental evaluations of neural algorithms in two major neural network journals. The bare conclusion was that the quality of the quantitative results, if presented at all, was poor.
The conclusion was that the original results were hard to reproduce and that the regenerated MLP results were outperformed by the k-NN classifier. Some larger evaluations or benchmarking studies have also been published in which a set of classification algorithms is assessed in a fair and impartial setting; some of the latest in this category include [2, 5, 6]. The profound philosophical questions involved in such comparisons are addressed in the literature.
Distribution-free bounds for the difference between the achieved and achievable error levels have been calculated for a set of classification algorithms in the cases of both finite and infinite training sets. Figure 10 shows the distribution of the cases by application type. Pattern recognition was applied in a much larger number of the applications to solve some part of the whole task; many prediction and identification problems contain recognition and classification stages similar to those used in pattern recognition applications.
As neural networks provide rather general techniques for modeling and recognition, they have found applications in many diverse engineering fields. Table II presents some neural network application areas together with some typical applications compiled from a case listing. Note that pattern recognition is needed in three of the five categories in the table: recognition, classification, and visual processing.
In the early days of neural computing, the first applications were in pattern recognition, but since then neural computing has spread to many other fields. Consequently, the large engineering fields of modeling and control, together with prediction and forecasting, made up two-thirds of the cases in the study. Still, the relative impact of neural network techniques is perhaps largest in the area of pattern recognition.
In some application types, such as optical character recognition, neural networks have already become a standard choice in commercial products. In such tasks, the main difficulty often lies in determining the nonlinear class boundaries, which is a very suitable problem for neural network classifiers. In Table III we have collected recent neural network applications in pattern recognition. The typical architecture of neural pattern recognition algorithms follows that shown earlier in the chapter. In most of the applications listed in Table III, conventional features, such as moment invariants or spectral features, are computed from the segmented objects, and neural networks are used for the final classification.
The value of using neural networks in such an application then depends on the goodness of the classifier. Although no classifier can solve the actual recognition problem if the selected features do not separate the target classes adequately, the choice of the most efficient classifier can give the few extra percent in recognition rate that make the solution sufficient in practice. The advantages of neural classifiers compared to other statistical methods were reviewed in Section IV. In the next section we review some more integral neural network pattern recognition systems, in which the feature extraction is integrated into the learning procedure.
The vast majority of neural network solutions in pattern recognition are based on carefully engineered preprocessing, feature extraction, and a neural network classifier; the most difficult parts of the recognition problem, such as invariances, are thus solved by hand before the data ever reach the network. Moreover, handcrafted feature presentations cannot reproduce the invariances and tolerance to varying conditions that are observed in biological visual systems. A possible direction for developing more capable pattern recognition systems is to include the feature extraction stage as part of the adaptively trained system.
In the pattern recognition systems considered here, a considerable part of the lower levels of the recognition problem is also solved by neural networks; examples of such systems are given in Table III.

System Solution with Constrained MLP Architecture: LeNet

The basic elements of virtually all pattern recognition systems are preprocessing, feature extraction, and classification, as elaborated in previous sections.
The methods and practices for designing the feature extraction stage to be efficient include manual selection and various data reduction techniques. In theory it is possible to integrate the feature extraction and classification into one processing block and to use supervised learning to train the whole system. However, the dimensionality of the input patterns poses a serious challenge in this approach: in a typical visual pattern recognition application, the input to the feature extraction stage is an image comprising thousands or even hundreds of thousands of pixels, and in the feature extraction stage this very high-dimensional space is mapped to a feature space of much reduced dimensionality.
A system with the original subimage as the input would have far too many free parameters to generalize correctly with any practical number of training samples. The network architecture, named LeNet, is rather similar to the Neocognitron architecture (see Section V). Figure 11 shows the basic architecture of a LeNet with two layers of feature detectors. In the Neocognitron the feature extracting neurons are trained with unsupervised competitive learning, whereas in the LeNet network back-propagation is used to train the whole network in a supervised manner.
This has the considerable advantage that the features are matched to separate the target classes, while in unsupervised feature extraction the features are independent of the target classes. The trade-off is that a rather large number of training samples are needed and the training procedure may be computationally expensive.
The task was to recognize handwritten digits that were segmented and transformed in preprocessing to fit in 16 x 16 pixel images. The network had four feature construction layers, named H1, H2, H3, and H4, and an output layer with ten units. Layers H1 and H3 corresponded to the feature map layers in Fig. 11. The main differences between the networks are in the training algorithm, the number of feature map layers, and the connection pattern of the classifier.
Thus the output of the H1 layer contained four maps, produced by scanning the input image with each of the feature detector neurons. The H3 layer had 12 different feature detecting neurons, each connected to one or two of the H2 maps by 5 x 5 receptive fields. In an earlier version of the system the H3 neurons were connected to all H2 maps, resulting in a large number of free parameters at this stage. The reduced connection patterns were determined by pruning the network with the optimal brain damage technique. The output layer was fully connected to layer H4.
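The weight-shared scanning that produces a feature map can be sketched as a plain 2-D convolution. This is an illustrative toy version in Python, not the actual LeNet implementation:

```python
def feature_map(image, kernel, stride=1):
    """Scan an image with a single weight-shared 'feature detector
    neuron' (a 2-D convolution, valid region only).

    image, kernel: 2-D lists of numbers.
    Returns the response map as a 2-D list.
    """
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(0, len(image) - kh + 1, stride):
        row = []
        for j in range(0, len(image[0]) - kw + 1, stride):
            s = sum(kernel[a][b] * image[i + a][j + b]
                    for a in range(kh) for b in range(kw))
            row.append(s)
        out.append(row)
    return out

# A 3x3 vertical-edge detector applied to a 5x5 image whose right
# half is bright responds strongly along the edge column.
img = [[0, 0, 0, 1, 1]] * 5
edge = [[-1, 0, 1]] * 3
fmap = feature_map(img, edge)
```

Because the same kernel is applied at every position, all "neurons" of the map share one small set of weights, which is exactly what keeps the number of free parameters manageable.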
The network was trained on a large database of manually labeled digits and was able to produce state-of-the-art recognition performance. The example shows that it is possible to use back-propagation-based supervised learning techniques to solve large parts of the pattern recognition problem by carefully constraining the network structure and weights according to prior knowledge about the task. A comparison of this architecture, including several variations in the number of feature maps, with other learning algorithms for handwritten digit recognition has also been reported.
The report concentrates on methods in which there is no separate handcrafted feature extraction stage; instead, the feature extraction is combined with the classification and trained together with it.

Invariant Recognition with the Neocognitron

One of the first pattern recognition systems based solely on neural network techniques was the Neocognitron paradigm, developed by Fukushima et al.
The architecture of the network was originally inspired by Hubel and Wiesel's hierarchy model of the visual cortex. According to the model, cells at the higher layers of the visual cortex tend to respond selectively to more complicated features of the stimulus patterns and, at the same time, have larger receptive fields. The basic structure of the Neocognitron is shown in the figure.
It consists of alternating feature detector and resolution reduction layers, called S and C layers, respectively. Each S layer contains several feature detector arrays called cell planes, shown as the small squares inside the layers in the figure. All neurons in a cell plane have identical synaptic connections, so that functionally a cell plane corresponds to a spatial convolution, since the neurons are linear in their weights. The S layers are trained by competitive learning, so that each plane learns to be sensitive to a different pattern. The C layers are essential to the distortion tolerance of the network.
Each cell plane in the S layer is connected by fixed weights to a similar but smaller cell plane in the successive C layer. The weights of the C cells are chosen so that one active S layer cell in its receptive field will turn the C cell on. The purpose of the C layers is to allow positional variation to the features detected by the preceding S layer.
The successive S layer is of the same size as the previous C layer, and the S cells are connected to all the C planes. Thus the next-level cell planes can detect any combinations of the previous-level features. Finally, the sizes of the cell planes decrease so that the last C plane contains only one cell, with a receptive field covering the whole input plane.
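The role of a C layer can be illustrated by a simple pooling operation over small receptive fields. Here a maximum over each field stands in for the "one active S cell turns the C cell on" rule; this is an idealized sketch, not Fukushima's exact cell model:

```python
def c_layer(s_plane, size=2):
    """Neocognitron-style C layer sketch: each C cell turns on if any
    S cell in its (size x size) receptive field is active, which
    makes the response tolerant to small positional shifts."""
    out = []
    for i in range(0, len(s_plane) - size + 1, size):
        row = []
        for j in range(0, len(s_plane[0]) - size + 1, size):
            row.append(max(s_plane[i + a][j + b]
                           for a in range(size) for b in range(size)))
        out.append(row)
    return out

# The same features shifted by one pixel yield the same C-layer output.
a = [[0, 1, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 1, 0]]
b = [[1, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 1]]
print(c_layer(a) == c_layer(b))  # -> True
```

Stacking several S/C pairs compounds this tolerance, which is the source of the network's distortion invariance.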
The features may appear in any place inside the circles shown in the figure. In later versions of the Neocognitron, a selective attention mechanism is implemented to allow segmentation and recognition of overlapping patterns, as in cursive handwriting.

Self-Organizing Feature Construction System

In this section we review a neural pattern recognition system based on self-organizing feature construction. The system is described in more detail in [35].
The basic principle of the system is to define a set of generic local primary features, which are assumed to contain the pertinent information about the objects, and then to use unsupervised learning techniques for building higher-order features from the primary features and for reducing the number of degrees of freedom in the data. The final supervised classifiers can then have a comparatively small number of free parameters and thus require only a small number of preclassified training samples. The feature extraction-classification system is composed of a pipelined block structure, in which the number of neurons and connections decreases and the connections become more adaptive in the higher layers.
The major elements of the system are the following.
Primary features: These should detect local, generic shape-related information from the image; a self-similar family of Gabor filters is used.
Self-organized features: To form complex features, the Gabor filter outputs are clustered into natural, possibly nonconvex clusters by a multilayer self-organizing map.
Classifier: Only the classifier is trained in a supervised manner, in the highly reduced feature space.
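A sketch of such a primary feature bank: a real-valued Gabor filter is a sinusoid along a chosen orientation under a Gaussian envelope. The parameter values below are illustrative, not those of the original system:

```python
import math

def gabor_kernel(size, wavelength, theta, sigma):
    """Real-valued Gabor filter: a cosine wave along orientation theta
    under an isotropic Gaussian envelope. Returns a size x size grid
    (size should be odd so the kernel has a center pixel)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel

# A bank of eight orientations at one resolution, as in the text;
# a second bank at another wavelength would give the second resolution.
bank = [gabor_kernel(9, wavelength=4.0, theta=i * math.pi / 8, sigma=2.0)
        for i in range(8)]
```

Convolving the image with all eight kernels at each of the two resolutions yields, per pixel, the two eight-dimensional primary feature vectors described below.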
Figure 13 shows the principle of the self-organizing feature construction in face recognition. At the lowest level, two banks of eight Gabor filters were used; the two banks had different spatial resolutions and eight orientations, as shown in the figure. The primary feature thus comprised two eight-dimensional vectors of filter outputs, extracted at every pixel location of the input image. The complex features were then produced by a two-layer self-organizing map. The first-level map contained 10 x 10 units, so that the eight-dimensional feature vectors of the two resolutions were separately mapped through the 10 x 10 map. The resulting two-dimensional map coordinates of the two resolutions were stacked to form a four-dimensional input vector for the second-layer map, which had its units in a one-dimensional lattice. Thus the feature extraction stage maps a neighborhood of a pixel to a feature value, such that similar details are mapped to nearby features.
A special virtue of the multilayer SOM is that the cluster shapes can also be nonconvex. Figure 14 shows an example of feature mapping, in which a face image is scanned with the feature detector and the resulting feature values are shown as gray scales. It has been shown that such feature images can be classified with very simple classifiers.
Often it is sufficient to take feature histograms of the object regions, to form translation-invariant classification features. The role of the classifier is more important in this feature construction system than with manually selected features, since the features are not directly related to the object classes.
For any given class, many of the filters, and features, are irrelevant, and the classifier must be able to pick up the combination of the relevant features. Thus the pure Euclidean distance of the feature histograms cannot be used as the basis of the classification. The most suitable classifiers are then methods that are based on hyperplanes, such as subspace classifiers and multilayer perceptron, while the distance-based methods, such as nearest neighbor classifiers and radial basis function networks, might be less effective.
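The translation-invariant feature histogram of an object region can be sketched as follows. This is a toy version; the region handling and the normalization are our assumptions:

```python
def feature_histogram(feature_image, n_features, region=None):
    """Collect a normalized feature histogram over an object region.

    feature_image: 2-D list of integer feature indices (e.g., SOM
    unit indices produced by the feature mapping stage).
    region: optional set of (row, col) pixels, e.g., the approximate
    face area; None means the whole image.
    """
    hist = [0] * n_features
    for i, row in enumerate(feature_image):
        for j, f in enumerate(row):
            if region is None or (i, j) in region:
                hist[f] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]

# Shifting the object inside the region leaves the histogram unchanged,
# which is the translation invariance mentioned in the text.
img1 = [[0, 1, 2], [0, 1, 2], [0, 0, 0]]
img2 = [[0, 0, 0], [0, 1, 2], [0, 1, 2]]
print(feature_histogram(img1, 3) == feature_histogram(img2, 3))  # -> True
```

The histogram vectors would then be fed to a hyperplane-based classifier, such as a subspace classifier or an MLP, as discussed above.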
The feature image in Fig. 14 was computed from a part of a larger image; the feature values are represented by gray levels. The circle gives the approximate face area used in computing the feature histogram, and the lower part of the figure shows the Gaussian-weighted feature histogram.

Practical Example: Recognition of Wood Surface Defects

The proposed self-organizing feature construction method has been applied to several industrial pattern recognition problems, described in detail elsewhere.
Here we give a short review of the recognition of wood surface defects. As a natural material, wood has significant variation both within and between species, making it a difficult material for automatic grading. In principle, the inspection and quality classification of wood is straightforward: the quality class of each board depends on its defects and their distribution, as dictated by the quality standard. However, the definitions of the defects are based on their biological origin, appearance, or cause, so that the visual appearance of defects in the same class varies substantially.
The Finnish standards alone define 30 different defect classes. Knots are the most common defect category and have a crucial role in the sorting of lumber. Figure 15 shows the most important knot classes on spruce boards. Figure 16 shows a schematic of a wood surface defect recognition system, in which the shape-related information is encoded by a self-organizing feature construction system into a "shape histogram," and a color histogram is collected by another multilayer SOM as an additional classification feature.
A third type of information used as a classification feature, in addition to the shape and color feature histograms, was the energy of each Gabor filter over the whole image. The image set used in the knot identification tests consisted of spruce samples imaged at a fixed resolution. Half of the samples were used for training the classifier and the other half for evaluating the results. Table IV shows the confusion matrix of the knot classification. Based on these results, an industrial machine-vision-based system for automatic wood surface inspection has been developed and reported separately.
Classification of Handwritten Digits

This section summarizes the results of a large comparison between various neural and classical statistical classification methods. The data used in the experiments consisted of handwritten digits. The collection form was designed to allow simple segmentation of the digits: each digit was written in a separate box, so that in most cases there was no connecting, touching, or overlapping of the numerals. The size of each digit was normalized, retaining the original aspect ratio, to fit a 32 x 32 pixel box. In the direction of the smaller dimension the image was centered, and the slant of the writing was then eliminated.
The whole handwritten digit corpus was divided equally to form separate training and testing sets. The former was used in computing the Karhunen-Loeve transform, which was then applied to both sets. Each classification algorithm was allowed to select a smaller input vector dimensionality from the resulting feature vectors using training-set cross-validation.
Figure 17 displays a sample of the digit images in the leftmost column. In the remaining columns, images reconstructed from an increasing number of features are shown. For clarity of visualization, the mean of the training set was first subtracted from the digit images and then added back after the reconstruction. The number of features used is shown below the images. It can be noted how rapidly the reconstruction fidelity increases, due to the optimal information-preserving property of the Karhunen-Loeve transform.
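The Karhunen-Loeve (PCA) transform and the reconstruction used for the visualization can be sketched with a standard eigendecomposition. The data below are random and merely illustrative, not the digit corpus; NumPy is assumed to be available:

```python
import numpy as np

def kl_transform(X, n_components):
    """Karhunen-Loeve transform: project mean-centered data onto the
    leading eigenvectors of the sample covariance matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (len(X) - 1)
    eigval, eigvec = np.linalg.eigh(cov)       # ascending eigenvalue order
    basis = eigvec[:, ::-1][:, :n_components]  # keep leading components
    return Xc @ basis, basis, mean

def reconstruct(features, basis, mean):
    """Map truncated feature vectors back to the input space, adding
    the mean back as done for the digit visualizations in the text."""
    return features @ basis.T + mean

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))                  # 100 samples, 8 dimensions
feats, basis, mean = kl_transform(X, n_components=3)
X_hat = reconstruct(feats, basis, mean)        # truncated reconstruction
```

With all components kept, the reconstruction is exact; truncating to the leading components gives the best linear reconstruction in the mean-square sense, which is the information-preserving property noted above.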
The maximum feature vector dimensionality was thus fixed, but due to the effects of the curse of dimensionality, cross-validation indicated a smaller input dimensionality to be optimal for some classifiers. Each classifier algorithm had its own set of cross-validated parameters. The cross-validation procedure was ten-fold: nine-tenths of the training set vectors were used in training a classifier and the remaining vectors were used to evaluate the classification accuracy.
This procedure was then repeated nine more times, until all the vectors in the training set had been used exactly nine times in training and once in evaluation. The cross-validated classification accuracy for the given set of parameter values was then calculated as the mean of the ten evaluations. By varying the parameter values, an optimal combination was found, and it was used in creating the final classifier from the whole training set. The final classification accuracy was calculated with that classifier and the original testing set.
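The ten-fold procedure can be sketched as an index-splitting routine. This is a simplified sketch; the actual study's fold assignment may differ:

```python
def ten_fold_indices(n):
    """Split indices 0..n-1 into ten folds. Each fold serves once as
    the evaluation set while the other nine folds are used for
    training, so every vector is used nine times in training and
    exactly once in evaluation."""
    folds = [list(range(k, n, 10)) for k in range(10)]
    splits = []
    for held_out in range(10):
        train = [i for k in range(10) if k != held_out for i in folds[k]]
        splits.append((train, folds[held_out]))
    return splits

splits = ten_fold_indices(100)
# Each split: 90 training indices, 10 evaluation indices; the mean
# accuracy over the ten evaluations is the cross-validated score.
```

The parameter combination with the best mean score would then be retrained on the full training set, as described above.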
The classification error percentages are collected in Table V. Figure 18 shows the error-reject curve for the LLR classifier: the rejection percentages are shown on the horizontal axis, whereas the logarithmic vertical axis displays the remaining error percentages. The threshold parameter θ is given at selected points.
The diamonds in Fig. 18 indicate the results obtained with the committee classifier using different voting strategies. In Table V the cross-validated parameters are given, and parameters selected without cross-validation are shown in brackets. Some evident conclusions can be drawn from the classification accuracies of Table V.
First, the discriminant analysis methods performed well. This can be interpreted as an indirect indication that the distribution of the data closely resembles a Gaussian in the Bayesian class border areas. Second, the MLP performs surprisingly badly without the weight decay regularization. The tree classifier and MARS also disappoint. It can be seen that the committee quite clearly outperforms all the individual classifiers.
A rejection option was also implemented by using the LLR classifier and varying the rejection threshold θ. The three diamonds in the figure display reject-error trade-off points obtained using the above-described committee classifier with voting strategies allowing for rejection.

Our point of view throughout the chapter has been that, at the present state of the art, neural techniques are closely related to more conventional feature extraction and classification algorithms, which emanate from general statistical principles such as data compression, Bayesian classification, and regression.
This helps in understanding the advantages and shortcomings of neural network models in pattern recognition tasks. Yet we argue that neural networks have indeed brought new and valuable additions and insights to PR theory, especially their large flexible architectures and their emphasis on data-driven learning algorithms for massive training sets. It is no accident that the popularity of neural networks has coincided with the growing accessibility of computing power provided by modern workstations.
We started the chapter by giving an overview of the problem and by introducing the general PR system, consisting of several consecutive processing stages, neural or nonneural. We then concentrated on the two most important stages, feature extraction and classification. These are also the system components in which neural network techniques have been used most widely and to their best advantage.
The most popular neural network approaches to these problems were given and contrasted with other existing solution methods. Several concrete applications of neural networks on PR problems were then outlined partly as a literature survey, partly by summarizing the authors' own experiences in the field.
Our original applications deal with face recognition, wood surface defect recognition, and handwritten digit recognition, in all of which neural networks have provided flexible and powerful PR methods. We hope that these case studies indicate that neural networks really work, but also that their use is not simple. As with any other engineering methodology, neural networks have to be carefully integrated into the total PR system in order to get maximal performance out of them.

References

Cheng and D.
Idan, J. Auger, N. Darbel, M. Sales, R. Chevallier, B. Dorizzi, and G., in I. Aleksander and J. Taylor, Eds. North-Holland, Brighton, England.
L. Bottou, C. Cortes, J. S. Denker, H. Drucker, I. Guyon, L. D. Jackel, Y. LeCun, U. A. Müller, E. Säckinger, P. Y. Simard, and V. Vapnik, Vol. II, pp.
D. Michie, D. J. Spiegelhalter, and C. C. Taylor, Eds., Machine Learning, Neural and Statistical Classification. Ellis Horwood Limited.
Blayo, Y. Cheneval, A. Guérin-Dugué, R. Chentouf, C. Aviles-Cruz, J. Madrenas, M. Moreno, and J.
Encyclopaedia Britannica on the Internet.
H. C. Andrews, Introduction to Mathematical Techniques in Pattern Recognition.
R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis.
J. T. Tou and R. C. Gonzalez, Pattern Recognition Principles. Addison-Wesley, Reading, MA.
T. Y. Young and T. W. Calvert, Classification, Estimation and Pattern Recognition. Elsevier Science Publishers, New York.
R. C. Gonzalez and M. G. Thomason, Syntactic Pattern Recognition.
J. Sklansky and G. N. Wassel, Pattern Classifiers and Trainable Machines.
P. A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach.