## Pavel Kordík

Pavel Kordík works as an assistant professor and researcher at the Department of Theoretical Computer Science, FIT, Czech Technical University in Prague. He obtained his master's and Ph.D. degrees at CTU in 2003 and 2007, respectively. He is the co-author of more than 40 publications. His research interests are data mining, knowledge extraction, inductive models, neural networks, evolutionary computing, optimization methods, nature-inspired continuous optimization, visualization of black-box behaviour, and ensemble techniques.

### Publications

- Aleš Pilný, Wolfgang Oertel, Pavel Kordík, Miroslav Šnorek: Correlation-based Feature Ranking in Combination with Embedded Feature Selection. 2009.
Most feature ranking and feature selection approaches can be used for categorical data only. Some of them rely on statistical measures of the data; some are tailored to a specific data mining algorithm (wrapper approach). In this paper we present new methods for feature ranking and selection obtained as a combination of the above-mentioned approaches. The data mining algorithm (GAME) is designed for numerical data, but it can be applied to categorical data as well. It incorporates feature selection mechanisms, and the new methods proposed in this paper derive a feature ranking from the final data mining model. The rank of each feature selected by the model is computed by processing correlations of outputs between neighboring neurons of the model in different ways. We used four different methods based on fuzzy logic, certainty factors and simple calculus. The performance of these four feature ranking methods was tested on artificial data sets, on the well-known Ionosphere data set and on the well-known Housing data set with continuous variables. The results indicated that the method based on the simple calculus approach was significantly worse than the other three methods. These methods produce rankings consistent with recently published studies.

- Kordík P.: GAME - Hybrid Self-Organizing Modeling System based on GMDH. Springer-Verlag, Berlin, Heidelberg, Czech Technical University in Prague, FEE, Dep. of Comp. Sci. and Computers, 2009.
In this chapter, an algorithm to construct a hybrid self-organizing neural network is proposed. It combines niching evolutionary strategies with nature-inspired and gradient-based optimization algorithms (Quasi-Newton, Conjugate Gradient, GA, PSO, ACO, etc.) to evolve a neural network with optimal topology adapted to a data set. The GAME algorithm lies somewhere between the GMDH algorithm and the NEAT algorithm. It is capable of handling irrelevant inputs and short and noisy data samples, but also complex data such as the "two intertwined spirals" problem. The self-organization of the topology allows it to produce accurate models for various tasks (classification, prediction, regression, etc.). Benchmarking against machine learning algorithms implemented in the Weka software showed that the accuracy of GAME models was superior for both regression and classification problems. The most successful configuration of the GAME algorithm does not change with the character of the problem; natural evolution selects all important parameters of the algorithm. This is a significant step towards automated data mining.

- NOVÁK D., PILNÝ A., KORDÍK P., HOLIGA Š., POŠÍK P., ČERNÝ R., BRZEZNÝ R.: Analysis of Vestibular-Ocular Reflex by Evolutionary Framework. p. 452-461, Springer, 2008.
In this paper the problem of analysing eye movements using the sinusoidal head rotation test is presented. The goal of the method is to automatically discard the effect of the fast phase (saccades) and consequently calculate the response of the vestibular system in the form of phase shift and amplitude. A comparison of threshold detection and inductive models trained on saccades is carried out. After saccade detection we are left with discontinuous signal segments. This paper presents an approach to align them to form a smooth signal with the same frequencies that were originally present in the source signal. The approach is based on a direct estimation of the signal component parameters using the evolutionary strategy with covariance matrix adaptation. The performance of the evolutionary approach is compared to a least-squares multimodal sine fit. The experimental evaluation on real-world signals revealed that threshold saccade detection in combination with the evolutionary strategy is a robust, scalable and reliable method.

- Behaviour of FeRaNGA Method for Feature Ranking During Learning Process Using Inductive Modelling. In: *Proceedings of the 2nd International Conference on Inductive Modelling*. Kiev: Ukr. INTEI, 2008.
Nowadays, feature ranking (FR) is a commonly used method for obtaining information about large data sets of various dimensionality. This knowledge can be used in the next step of data processing, improving the accuracy and speed of experiments. Our approach is based on artificial neural networks (ANN) instead of classical statistical methods. We obtain the knowledge as a by-product of the Niching Genetic Algorithm (NGA) used to create a feedforward hybrid neural network called GAME. In this paper we present the behaviour of FeRaNGA (Feature Ranking method using Niching Genetic Algorithm) during the learning process, especially in every layer of the generated GAME network. We want to determine how important the NGA configuration and processing procedure are for FR results, because the behaviour of a GA is nondeterministic and the results of FeRaNGA are therefore also not definitive. The method ranks features according to the percentage of processing elements that survived the selection process. Processing elements transform parent input features to an output. The selection process is realized by means of the NGA, where units connected to the least significant features starve and fade from the population. To obtain the best results and find the optimal configuration, the behaviour of the FeRaNGA algorithm is tested using various NGA parameters and numbers of ensemble GAME models on well-known artificial data sets.

- Ales Pilny, Pavel Kordik, Miroslav Snorek: Feature Ranking Derived from Data Mining Process. In: *Artificial Neural Networks - ICANN 2008, 18th International Conference Proceedings*, Heidelberg: Springer, http://portal.acm.org/citation.cfm?id=1429510, 2008.
Most common feature ranking methods are based on the statistical approach. This paper compares several statistical methods with a new method for feature ranking derived from the data mining process. This method ranks features according to the percentage of child units that survived the selection process. A child unit is a processing element transforming the parent input features to the output. After training, units are interconnected in the feedforward hybrid neural network called GAME. The selection process is realized by means of a niching genetic algorithm, where units connected to the least significant features starve and fade from the population. Parameters of the new feature ranking algorithm are investigated, and a comparison among different methods is presented on well-known real-world and artificial data sets.

- Pavel Kordík, Oleg Kovářík, Miroslav Šnorek: Optimization of Models: Looking for the Best Strategy. In: *Proceedings of 6th EUROSIM Congress on Modelling and Simulation*, Ljubljana, 2007. ISBN 3-901608-32-X
When the parameters of a model are being adjusted, the model is learning to mimic the behaviour of a real-world system. Optimization methods are responsible for parameter adjustment. The problem is that each real-world system is different and its model should be of different complexity. It is almost impossible to decide in advance which optimization method will perform best (that is, optimally adjust the parameters of the model). In this paper we compare the performance of several methods for nonlinear parameter optimization. Gradient-based methods such as Quasi-Newton or Conjugate Gradient are compared to several nature-inspired methods. We designed an evolutionary algorithm that selects the best optimization methods for models of various complexity. Our experiments showed that the evolution of optimization methods for particular problems is a very promising approach.

- Aleš Pilný, Pavel Kordík: Reconstruction of Eye Movements Signal using Inductive Model Detecting Saccades. vol. 1, Czech Technical University, 2007.
This article describes a method for reconstructing eye movement signals disturbed by saccades and subsequently determining the inherent frequencies in the signal. For healthy patients, the eye movement signal should contain the same frequencies as the movements generated by a special rotating chair. To determine the frequencies in eye movements, saccades have to be removed first. This is not an easy task, because saccades can have various shapes. To detect saccades, we use inductive models trained on various saccadic eye movement signals. To remove saccades and reconstruct the eye movement signal, we wrote a special script that replaces saccades with the estimated trend of the signal based on the output of the inductive model. When the reconstructed signal is transformed to the frequency domain, it is easy to decide whether the eye movement signal contains the same frequencies as the original signal of the rotating chair.

- Pavel Kordik: Regularization of Evolving Polynomial Models. In: *Proceedings of International Workshop on Inductive Modelling (IWIM 2007)*, 2007. ISBN 978-80-01-03881-9

- Drchal J., Kordík P., Koutník J.: Visualization of Diversity in Computational Intelligence Methods. In: *Proceedings of 2nd ISGI, International CODATA Symposium on Generalization of Information*, p. 20-34, CODATA Germany, 2007. ISBN 978-3-00-022382-2

- Kordik P., Saidl J., Snorek M.: Evolutionary Search for Interesting Behavior of Neural Network Ensembles. In: *2006 IEEE Congress on Evolutionary Computation*, p. 235-238, Los Alamitos: IEEE Computer Society, 2006. ISBN 0-7803-9489-5

- P. Kordík: Fully Automated Knowledge Extraction using Group of Adaptive Models Evolution. Czech Technical University in Prague, FEE, Dep. of Comp. Sci. and Computers, 2006.
Keywords like data mining (DM) and knowledge discovery (KD) have appeared in several thousand articles in recent years. Such popularity is driven mainly by the demand of private companies. They need to analyze their data effectively to obtain new, useful knowledge that can be capitalized on. This process is called knowledge discovery, and data mining is a crucial part of it. Although several methods and algorithms for data mining have been developed, there are still many gaps to fill. The problem is that real-world data are so diverse that no universal algorithm has been developed to mine all data effectively. In addition, the stages of the knowledge discovery process need the full-time assistance of an expert on data preprocessing, data mining and knowledge extraction. These problems can be solved by a KD environment capable of automatic data preprocessing, generation of regressive and predictive models and classifiers, automatic identification of interesting relationships in data (even in complex and high-dimensional data), and presentation of the discovered knowledge in a comprehensible form. In order to develop such an environment, this thesis focuses on the research of methods in the areas of data preprocessing, data mining and information visualization. The Group of Adaptive Models Evolution (GAME) is a data mining engine able to adapt itself and perform optimally on a large (but still limited) group of real-world data sets. The Fully Automated Knowledge Extraction using GAME (FAKE GAME) framework is proposed to automate the KD process and to eliminate the need for the assistance of a data mining expert. The GAME engine is the only GMDH-type algorithm capable of solving very complex problems (as demonstrated on the Spiral data benchmarking problem). It can handle irrelevant inputs and short and noisy data samples. It uses an evolutionary algorithm to find the optimal topology of models. Ensemble techniques are employed to estimate the quality and credibility of GAME models.
Within the FAKE framework we designed and implemented several modules for data preprocessing, knowledge extraction and visual knowledge discovery.

- Drchal J., Šnorek M., Kordík P.: Maintaining Diversity in Population of Evolved Models. In: *Proceedings of 40th Spring International Conference MOSIS 06, Modelling and Simulation of Systems*, Ostrava: MARQ, 2006. ISBN 80-86840-21-2
This paper deals with the creation of models by means of evolutionary algorithms, particularly with maintaining the diversity of the population using niching methods. Niching algorithms are known for their ability to search for multiple optima simultaneously. This is done by splitting the population of models into separate species. Species protect promising but not yet fully developed models. Searching for multiple optima at the same time helps to avoid premature convergence and therefore deals effectively with local optima. The efficiency of two different niching methods is compared on NEAT applied to the neuro-evolution of models.