## FAKE GAME

We are developing the open-source software FAKE GAME. The software should be able to automatically preprocess various data, generate regression and prediction models and classifiers (by means of the GAME engine), automatically identify interesting relationships in data (even in high-dimensional data), and present the discovered knowledge in a comprehensible form. It should fill gaps that are not covered by the existing open-source data mining environment WEKA and possibly integrate with the YALE environment.

- Official project site: http://fakegame.sourceforge.net/doku.php
- Coordinator: Pavel Kordík
- Partners: 2nd Medical School, Charles University in Prague, Knowledge Miner
- People: Miroslav Čepek, Jan Drchal, Pavel Kordík, Jan Koutník, Oleg Kovářík, Aleš Pilný, Tomáš Siegl

### Publications

- Kordík P.: GAME - Hybrid Self-Organizing Modeling System based on GMDH. Springer-Verlag, Berlin, Heidelberg, Czech Technical University in Prague, FEE, Dep. of Comp. Sci. and Computers, 2009
In this chapter, an algorithm for constructing a hybrid self-organizing neural network is proposed. It combines niching evolutionary strategies with nature-inspired and gradient-based optimization algorithms (Quasi-Newton, Conjugate Gradient, GA, PSO, ACO, etc.) to evolve a neural network whose optimal topology is adapted to the data set. The GAME algorithm sits somewhere between the GMDH algorithm and the NEAT algorithm. It can handle irrelevant inputs and short, noisy data samples, but also complex data such as the "two intertwined spirals" problem. The self-organization of the topology allows it to produce accurate models for various tasks (classification, prediction, regression, etc.). Benchmarking against machine learning algorithms implemented in the Weka software showed that the accuracy of GAME models was superior for both regression and classification problems. The most successful configuration of the GAME algorithm does not change with the character of the problem, because natural evolution selects all important parameters of the algorithm. This is a significant step towards automated data mining.
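
GAME is described as lying between GMDH and NEAT. For readers unfamiliar with GMDH, the sketch below shows one layer of a classic GMDH-style network in plain Python: every pair of inputs feeds a small unit fitted by least squares on training data, the units are ranked by their error on held-out validation data (the GMDH "external criterion"), and only the best few are kept. To keep it short, the units are linear, `y = a0 + a1*xi + a2*xj`, rather than the quadratic Ivakhnenko polynomials of classic GMDH, and all function names (`gmdh_layer`, `fit_unit`, etc.) are hypothetical. It is not the GAME algorithm itself.

```python
import itertools
import random


def solve3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_unit(rows, i, j, targets, ridge=1e-9):
    """Least-squares fit of y = a0 + a1*x_i + a2*x_j via the normal equations.

    A tiny ridge term on the diagonal keeps the system solvable for degenerate data.
    """
    ata = [[0.0] * 3 for _ in range(3)]
    aty = [0.0] * 3
    for row, y in zip(rows, targets):
        phi = (1.0, row[i], row[j])
        for p in range(3):
            aty[p] += phi[p] * y
            for q in range(3):
                ata[p][q] += phi[p] * phi[q]
    for p in range(3):
        ata[p][p] += ridge
    return solve3(ata, aty)


def mse(coef, rows, i, j, targets):
    """Mean squared error of one unit on the given rows."""
    preds = (coef[0] + coef[1] * r[i] + coef[2] * r[j] for r in rows)
    return sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(rows)


def gmdh_layer(train, y_train, valid, y_valid, keep=3):
    """Fit one unit per input pair on train, rank by validation MSE, keep the best."""
    units = []
    for i, j in itertools.combinations(range(len(train[0])), 2):
        coef = fit_unit(train, i, j, y_train)
        units.append((mse(coef, valid, i, j, y_valid), (i, j), coef))
    units.sort(key=lambda u: u[0])
    return units[:keep]


if __name__ == "__main__":
    rng = random.Random(0)

    def make(n):
        # Inputs 1 and 3 are irrelevant; the target depends on inputs 0 and 2 only.
        rows = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(n)]
        ys = [1 + 2 * r[0] - 3 * r[2] + rng.gauss(0, 0.05) for r in rows]
        return rows, ys

    train, y_train = make(80)
    valid, y_valid = make(40)
    for err, pair, coef in gmdh_layer(train, y_train, valid, y_valid):
        print(pair, round(err, 4))
```

Stacking layers would mean feeding the outputs of the kept units as new inputs to another `gmdh_layer` call. Trying every input pair exhaustively is what classic GMDH does; per the abstract, GAME instead evolves the network topology.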

- Ales Pilny, Pavel Kordik, Miroslav Snorek: Feature Ranking Derived from Data Mining Process. In:
*Artificial Neural Networks - ICANN 2008, 18th International Conference Proceedings*, Heidelberg: Springer, 2008. http://portal.acm.org/citation.cfm?id=1429510
Most common feature ranking methods are based on the statistical approach. This paper compares several statistical methods with a new method for feature ranking derived from the data mining process. This method ranks features according to the percentage of child units that survived the selection process. A child unit is a processing element transforming the parent input features to the output. After training, units are interconnected in a feedforward hybrid neural network called GAME. The selection process is realized by means of a niching genetic algorithm, in which units connected to the least significant features starve and fade from the population. Parameters of the new feature ranking algorithm are investigated, and a comparison among different methods is presented on well-known real-world and artificial data sets.
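
The ranking rule in this abstract (features ranked by the share of surviving child units connected to them) can be sketched in a few lines. This is a minimal illustration under stated assumptions: each surviving unit is represented only by the set of input feature indices it reads, the niching genetic algorithm that produced the survivors is not modeled, and shares are normalized by the number of survivors. The paper's exact normalization may differ, and `rank_features` is a hypothetical name.

```python
from collections import Counter


def rank_features(surviving_units, n_features):
    """Rank features by the share of surviving units connected to each.

    surviving_units: iterable of sets of feature indices (a hypothetical
    representation of the units that survived selection).
    Returns (feature_index, share) pairs, most significant first; share is in [0, 1].
    """
    units = list(surviving_units)
    counts = Counter()
    for inputs in units:
        counts.update(set(inputs))  # count each feature at most once per unit
    total = len(units) or 1  # avoid division by zero when nothing survived
    shares = [(f, counts[f] / total) for f in range(n_features)]
    # Descending share; ties are broken by ascending feature index.
    return sorted(shares, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    survivors = [{0, 2}, {0}, {0, 1}, {2}, {0, 2}]
    for feature, share in rank_features(survivors, 4):
        print(f"feature {feature}: {share:.0%}")
```

In this example feature 3 is read by no surviving unit, so it ranks last, which mirrors the abstract's picture of units connected to insignificant features starving and fading from the population.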

- Pavel Kordík, Oleg Kovářík, Miroslav Šnorek: Optimization of Models: Looking for the Best Strategy. In:
*Proceedings of 6th EUROSIM Congress on Modelling and Simulation*, Ljubljana, 2007. ISBN 3-901608-32-X
When the parameters of a model are being adjusted, the model is learning to mimic the behaviour of a real-world system. Optimization methods are responsible for this parameter adjustment. The problem is that each real-world system is different, and its model should be of different complexity. It is almost impossible to decide which optimization method will perform best (i.e. adjust the parameters of the model optimally). In this paper we compare the performance of several methods for nonlinear parameter optimization. Gradient-based methods such as Quasi-Newton or Conjugate Gradient are compared to several nature-inspired methods. We designed an evolutionary algorithm that selects the best optimization methods for models of various complexity. Our experiments showed that evolving optimization methods for particular problems is a very promising approach.
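
The evolutionary selection of optimization methods described above can be illustrated with a toy loop: each individual carries a gene naming its optimizer, that optimizer fits the parameters of a small model from a random start, and individuals that reach a lower error survive and pass their gene on, with occasional mutation to another method. Everything here is an assumption for illustration: the model is a two-parameter line fit, plain gradient descent stands in for the gradient-based methods (it is not Quasi-Newton or Conjugate Gradient), a (1+1) evolution strategy stands in for the nature-inspired methods, and all names are hypothetical.

```python
import random

# Toy model y = a*x + b fitted to noiseless data (hypothetical example).
XS = [i / 10 for i in range(-10, 11)]
YS = [3 * x - 1 for x in XS]


def error(params):
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def gradient(params):
    a, b = params
    n = len(XS)
    ga = sum(2 * (a * x + b - y) * x for x, y in zip(XS, YS)) / n
    gb = sum(2 * (a * x + b - y) for x, y in zip(XS, YS)) / n
    return ga, gb


def gradient_descent(start, rng, budget=50, step=0.5):
    # rng is unused; it is kept so that all methods share one signature.
    p = list(start)
    for _ in range(budget):
        ga, gb = gradient(p)
        p = [p[0] - step * ga, p[1] - step * gb]
    return error(p)


def one_plus_one_es(start, rng, budget=50, sigma=0.2):
    # (1+1) evolution strategy: accept a Gaussian mutant only if it is not worse.
    p, best = list(start), error(start)
    for _ in range(budget):
        cand = [v + rng.gauss(0, sigma) for v in p]
        e = error(cand)
        if e <= best:
            p, best = cand, e
    return best


METHODS = {"gradient_descent": gradient_descent, "one_plus_one_es": one_plus_one_es}


def evolve_methods(generations=5, pop_size=6, mutation=0.2, seed=0):
    """Evolve optimizer genes; lower final error means fitter. Needs generations >= 1."""
    rng = random.Random(seed)
    names = sorted(METHODS)
    population = [names[i % len(names)] for i in range(pop_size)]
    for _ in range(generations):
        scored = []
        for gene in population:
            start = (rng.uniform(-5, 5), rng.uniform(-5, 5))
            scored.append((METHODS[gene](start, rng), gene))
        scored.sort(key=lambda s: s[0])
        parents = [gene for _, gene in scored[: pop_size // 2]]
        # Offspring inherit a parent's method, or occasionally mutate to a random one.
        children = [
            rng.choice(names) if rng.random() < mutation else rng.choice(parents)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return population, scored[0]


if __name__ == "__main__":
    population, (best_error, best_method) = evolve_methods()
    print("final population:", population)
    print("best method:", best_method, "error:", best_error)
```

On this smooth, convex toy error surface gradient descent should win every selection round; the point of the sketch is the selection mechanism, since on models of other complexity the same loop could end up favoring a different method.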

- Pavel Kordik: Regularization of Evolving Polynomial Models. In:
*Proceedings of the International Workshop on Inductive Modelling (IWIM 2007)*, 2007. ISBN 978-80-01-03881-9

- Kordik P., Saidl J., Snorek M.: Evolutionary Search for Interesting Behavior of Neural Network Ensembles. In:
*2006 IEEE Congress on Evolutionary Computation*, pp. 235-238, Los Alamitos: IEEE Computer Society, 2006. ISBN 0-7803-9489-5

- P. Kordík: Fully Automated Knowledge Extraction using Group of Adaptive Models Evolution. Czech Technical University in Prague, FEE, Dep. of Comp. Sci. and Computers, 2006
Keywords such as data mining (DM) and knowledge discovery (KD) have appeared in several thousand articles in recent years. This popularity is driven mainly by the demand of private companies, which need to analyze their data effectively to obtain new, useful knowledge that can be capitalized on. This process is called knowledge discovery, and data mining is a crucial part of it. Although several methods and algorithms for data mining have been developed, there are still many gaps to fill. The problem is that real-world data are so diverse that no universal algorithm has been developed to mine all data effectively. Moreover, the stages of the knowledge discovery process require the full-time assistance of an expert in data preprocessing, data mining and knowledge extraction. These problems can be solved by a KD environment capable of automatic data preprocessing, generating regression and prediction models and classifiers, automatically identifying interesting relationships in data (even in complex and high-dimensional data) and presenting discovered knowledge in a comprehensible form. To develop such an environment, this thesis focuses on research into methods for data preprocessing, data mining and information visualization. The Group of Adaptive Models Evolution (GAME) is a data mining engine able to adapt itself and perform optimally on a large (but still limited) group of real-world data sets. The Fully Automated Knowledge Extraction using GAME (FAKE GAME) framework is proposed to automate the KD process and to eliminate the need for the assistance of a data mining expert. The GAME engine is the only GMDH-type algorithm capable of solving very complex problems (as demonstrated on the Spiral data benchmarking problem). It can handle irrelevant inputs and short, noisy data samples. It uses an evolutionary algorithm to find the optimal topology of models. Ensemble techniques are employed to estimate the quality and credibility of GAME models.
Within the FAKE framework we designed and implemented several modules for data preprocessing, knowledge extraction and for visual knowledge discovery.
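
The thesis abstract mentions that ensemble techniques are used to estimate the quality and credibility of GAME models. One common way to do this, shown here as a hedged sketch rather than the thesis's exact procedure, is to query every model in the ensemble and treat the spread of their predictions as a credibility signal: where the models agree, the prediction is more trustworthy; where they diverge (typically away from the training data), it is less so. The models below are hypothetical stand-in functions, and `max_spread` is an arbitrary illustrative threshold.

```python
from statistics import mean, pstdev


def ensemble_predict(models, x, max_spread=0.5):
    """Return (mean prediction, spread, credible) for input x.

    models: callables standing in for individually trained models (hypothetical).
    spread: population standard deviation of the members' outputs.
    credible: True when the members agree to within max_spread (assumed threshold).
    """
    outputs = [m(x) for m in models]
    spread = pstdev(outputs)
    return mean(outputs), spread, spread <= max_spread


if __name__ == "__main__":
    # Three hypothetical models that agree near x = 0 but diverge for large |x|,
    # mimicking extrapolation away from the training data.
    models = [
        lambda x: 2 * x,
        lambda x: 2 * x + 0.1 * x ** 2,
        lambda x: 2 * x - 0.1 * x ** 3,
    ]
    for x in (0.0, 1.0, 5.0):
        pred, spread, ok = ensemble_predict(models, x)
        print(f"x={x}: prediction={pred:.2f}, spread={spread:.2f}, credible={ok}")
```

The population standard deviation is only one possible disagreement measure; it is used here because it is simple and available in the standard library.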