
    Uploaded: 2012-12-26


    

    Data Mining Foundations and Intelligent Paradigms Volume 1 Clustering, Associati.


    Format: PDF | Uploaded: 2012-12-26 22:29:01
    Dawn E. Holmes and Lakhmi C. Jain (Eds.)
    Data Mining: Foundations and Intelligent Paradigms

    Intelligent Systems Reference Library, Volume 25

    Editors-in-Chief

    Prof. Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447 Warsaw, Poland. E-mail: kacprzyk@ibspan.waw.pl

    Prof. Lakhmi C. Jain, University of South Australia, Adelaide, Mawson Lakes Campus, South Australia 5095, Australia. E-mail: Lakhmi.jain@unisa.edu.au

    Further volumes of this series can be found on our homepage: springer.com

    Vol. 1. Christine L. Mumford and Lakhmi C. Jain (Eds.), Computational Intelligence: Collaboration, Fusion and Emergence, 2009. ISBN 978-3-642-01798-8
    Vol. 2. Yuehui Chen and Ajith Abraham, Tree-Structure Based Hybrid Computational Intelligence, 2009. ISBN 978-3-642-04738-1
    Vol. 3. Anthony Finn and Steve Scheding, Developments and Challenges for Autonomous Unmanned Vehicles, 2010. ISBN 978-3-642-10703-0
    Vol. 4. Lakhmi C. Jain and Chee Peng Lim (Eds.), Handbook on Decision Making: Techniques and Applications, 2010. ISBN 978-3-642-13638-2
    Vol. 5. George A. Anastassiou, Intelligent Mathematics: Computational Analysis, 2010. ISBN 978-3-642-17097-3
    Vol. 6. Ludmila Dymowa, Soft Computing in Economics and Finance, 2011. ISBN 978-3-642-17718-7
    Vol. 7. Gerasimos G. Rigatos, Modelling and Control for Intelligent Industrial Systems, 2011. ISBN 978-3-642-17874-0
    Vol. 8. Edward H.Y. Lim, James N.K. Liu, and Raymond S.T. Lee, Knowledge Seeker – Ontology Modelling for Information Search and Management, 2011. ISBN 978-3-642-17915-0
    Vol. 9. Menahem Friedman and Abraham Kandel, Calculus Light, 2011. ISBN 978-3-642-17847-4
    Vol. 10. Andreas Tolk and Lakhmi C. Jain, Intelligence-Based Systems Engineering, 2011. ISBN 978-3-642-17930-3
    Vol. 11. Samuli Niiranen and Andre Ribeiro (Eds.), Information Processing and Biological Systems, 2011. ISBN 978-3-642-19620-1
    Vol. 12. Florin Gorunescu, Data Mining, 2011. ISBN 978-3-642-19720-8
    Vol. 13. Witold Pedrycz and Shyi-Ming Chen (Eds.), Granular Computing and Intelligent Systems, 2011. ISBN 978-3-642-19819-9
    Vol. 14. George A. Anastassiou and Oktay Duman, Towards Intelligent Modeling: Statistical Approximation Theory, 2011. ISBN 978-3-642-19825-0
    Vol. 15. Antonino Freno and Edmondo Trentin, Hybrid Random Fields, 2011. ISBN 978-3-642-20307-7
    Vol. 16. Alexiei Dingli, Knowledge Annotation: Making Implicit Knowledge Explicit, 2011. ISBN 978-3-642-20322-0
    Vol. 17. Crina Grosan and Ajith Abraham, Intelligent Systems, 2011. ISBN 978-3-642-21003-7
    Vol. 18. Achim Zielesny, From Curve Fitting to Machine Learning, 2011. ISBN 978-3-642-21279-6
    Vol. 19. George A. Anastassiou, Intelligent Systems: Approximation by Artificial Neural Networks, 2011. ISBN 978-3-642-21430-1
    Vol. 20. Lech Polkowski, Approximate Reasoning by Parts, 2011. ISBN 978-3-642-22278-8
    Vol. 21. Igor Chikalov, Average Time Complexity of Decision Trees, 2011. ISBN 978-3-642-22660-1
    Vol. 22. Przemysław Różewski, Emma Kusztina, Ryszard Tadeusiewicz, and Oleg Zaikin, Intelligent Open Learning Systems, 2011. ISBN 978-3-642-22666-3
    Vol. 23. Dawn E. Holmes and Lakhmi C. Jain (Eds.), Data Mining: Foundations and Intelligent Paradigms, 2012. ISBN 978-3-642-23165-0
    Vol. 24. Dawn E. Holmes and Lakhmi C. Jain (Eds.), Data Mining: Foundations and Intelligent Paradigms, 2012. ISBN 978-3-642-23240-4
    Vol. 25. Dawn E. Holmes and Lakhmi C. Jain (Eds.), Data Mining: Foundations and Intelligent Paradigms, 2012. ISBN 978-3-642-23150-6

    Dawn E. Holmes and Lakhmi C. Jain (Eds.)
    Data Mining: Foundations and Intelligent Paradigms
    Volume 3: Medical, Health, Social, Biological and other Applications

    Prof. Dawn E. Holmes, Department of Statistics and Applied Probability, University of California, Santa Barbara, CA 93106, USA. E-mail: holmes@pstat.ucsb.edu

    Prof. Lakhmi C. Jain, Professor of Knowledge-Based Engineering, University of South Australia, Adelaide, Mawson Lakes, SA 5095, Australia. E-mail: Lakhmi.jain@unisa.edu.au

    ISBN 978-3-642-23150-6
    e-ISBN 978-3-642-23151-3
    DOI 10.1007/978-3-642-23151-3
    Intelligent Systems Reference Library, ISSN 1868-4394
    Library of Congress Control Number: 2011936705

    © 2012 Springer-Verlag Berlin Heidelberg

    This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

    The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

    Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India.
    Printed on acid-free paper
    springer.com

    Preface

    There are many invaluable books available on data mining theory and applications. However, in compiling a volume titled “DATA MINING: Foundations and Intelligent Paradigms: Volume 3: Medical, Health, Social, Biological and other Applications” we wish to introduce some of the latest developments to a broad audience of both specialists and non-specialists in this field.

    The term ‘data mining’ was introduced in the 1990s to describe an emerging field based on classical statistics, artificial intelligence and machine learning. By combining techniques from these areas and developing new ones, researchers are able to analyze large datasets innovatively and productively. Patterns found in these datasets are subsequently analyzed with a view to acquiring new knowledge. These techniques have been applied in a broad range of medical, health, social and biological areas.

    In compiling this volume we have sought to present innovative research from prestigious contributors in the field of data mining. Each chapter is self-contained and is described briefly in Chapter 1.

    This book will prove valuable to theoreticians as well as application scientists/engineers in the area of Data Mining. Postgraduate students will also find this a useful sourcebook since it shows the direction of current research.

    We have been fortunate in attracting top class researchers as contributors and wish to offer our thanks for their support in this project. We also acknowledge the expertise and time of the reviewers. Finally, we also wish to thank Springer for their support.

    Dr. Dawn E. Holmes, University of California, Santa Barbara, USA
    Dr. Lakhmi C. Jain, University of South Australia, Adelaide, Australia

    Contents

    Chapter 1. Advances in Intelligent Data Mining
    Dawn E. Holmes, Jeffrey W. Tweedale, Lakhmi C. Jain
    1 Introduction
    2 Medical Influences
    3 Health Influences
    4 Social Influences
        4.1 Information Discovery
        4.2 On-Line Communities
    5 Biological Influences
        5.1 Biological Networks
        5.2 Estimations in Gene Expression
    6 Chapters Included in the Book
    7 Conclusion
    References

    Chapter 2. Temporal Pattern Mining for Medical Applications
    Giulia Bruno, Paolo Garza
    1 Introduction
    2 Types of Temporal Data in Medical Domain
    3 Definitions
    4 Temporal Pattern Mining Algorithms
        4.1 Temporal Pattern Mining from a Set of Sequences
        4.2 Temporal Pattern Mining from a Single Sequence
    5 Medical Applications
    6 Conclusions
    References

    Chapter 3. BioKeySpotter: An Unsupervised Keyphrase Extraction Technique in the Biomedical Full-Text Collection
    Min Song, Prat Tanapaisankit
    1 Introduction
    2 Backgrounds and Related Work
    3 The Proposed Approach
    4 Evaluation
        4.1 Dataset
        4.2 Comparison Algorithms
        4.3 Experimental Results
    5 Conclusion
    References

    Chapter 4. Mining Health Claims Data for Assessing Patient Risk
    Ian Duncan
    1 What Is Health Risk?
    2 Traditional Models for Assessing Health Risk
    3 Risk Factor-Based Risk Models
    4 Data Sources
        4.1 Enrollment Data
        4.2 Claims and Coding Systems
        4.3 Interpretation of Claims Codes
    5 Clinical Identification Algorithms
    6 Sensitivity-Specificity Trade-Off
        6.1 Constructing an Identification Algorithm
        6.2 Sources of Algorithms
    7 Construction and Use of Grouper Models
        7.1 Drug Grouper Models
        7.2 Drug-Based Risk Adjustment Models
    8 Summary and Conclusions
    References

    Chapter 5. Mining Biological Networks for Similar Patterns
    Ferhat Ay, Günhan Gülsoy, Tamer Kahveci
    1 Introduction
    2 Metabolic Network Alignment with One-to-One Mappings
        2.1 Model
        2.2 Problem Formulation
        2.3 Pairwise Similarity of Entities
        2.4 Similarity of Topologies
        2.5 Combining Homology and Topology
        2.6 Extracting the Mapping of Entities
        2.7 Similarity Score of Networks
        2.8 Complexity Analysis
    3 Metabolic Network Alignment with One-to-Many Mappings
        3.1 Homological Similarity of Subnetworks
        3.2 Topological Similarity of Subnetworks
        3.3 Combining Homology and Topology
        3.4 Extracting Subnetwork Mappings
    4 Significance of Network Alignment
        4.1 Identification of Alternative Entities
        4.2 Identification of Alternative Subnetworks
        4.3 One-to-Many Mappings within and across Major Clades
    5 Summary
    6 Further Reading
    References

    Chapter 6. Estimation of Distribution Algorithms in Gene Expression Data Analysis
    Elham Salehi, Robin Gras
    1 Introduction
    2 Estimation of Distribution of Algorithms
        2.1 Model Building in EDA
        2.2 Notation
        2.3 Models with Independent Variables
        2.4 Models with Pair Wise Dependencies
        2.5 Models with Multiple Dependencies
    3 Application of EDA in Gene Expression Data Analysis
        3.1 State-of-Art of the Application of EDAs in Gene Expression Data Analysis
    4 Conclusion
    References

    Chapter 7. Gene Function Prediction and Functional Network: The Role of Gene Ontology
    Erliang Zeng, Chris Ding, Kalai Mathee, Lisa Schneper, Giri Narasimhan
    1 Introduction
        1.1 Gene Function Prediction
        1.2 Functional Gene Network Generation
        1.3 Related Work and Limitations
    2 GO-Based Gene Similarity Measures
    3 Estimating Support for PPI Data with Applications to Function Prediction
        3.1 Mixture Model of PPI Data
        3.2 Data Sets
        3.3 Function Prediction
        3.4 Evaluating the Function Prediction
        3.5 Experimental Results
        3.6 Discussion
    4 A Functional Network of Yeast Genes Using Gene Ontology Information
        4.1 Data Sets
        4.2 Constructing a Functional Gene Network
        4.3 Using Semantic Similarity (SS)
        4.4 Evaluating the Functional Gene Network
        4.5 Experimental Results
        4.6 Discussion
    5 Conclusions
    References

    Chapter 8. Mining Multiple Biological Data for Reconstructing Signal Transduction Networks
    Thanh-Phuong Nguyen, Tu-Bao Ho
    1 Introduction
    2 Background
        2.1 Signal Transduction Network
        2.2 Protein-Protein Interaction
    3 Constructing Signal Transduction Networks Using Multiple Data
        3.1 Related Work
        3.2 Materials and Methods
        3.3 Clustering and Protein-Protein Interaction Networks
        3.4 Evaluation
    4 Some Results of Yeast STN Reconstruction
    5 Outlook
    6 Summary
    References

    Chapter 9. Mining Epistatic Interactions from High-Dimensional Data Sets
    Xia Jiang, Shyam Visweswaran, Richard E. Neapolitan
    1 Introduction
    2 Background
        2.1 Epistasis
        2.2 Detecting Epistasis
        2.3 High-Dimensional Data Sets
        2.4 Barriers to Learning Epistasis
        2.5 MDR
        2.6 Bayesian Networks
    3 Discovering Epistasis Using Bayesian Networks
        3.1 A Bayesian Network Model for Epistatic Interactions
        3.2 The BNMBL Score
        3.3 Experiments
    4 Efficient Search
        4.1 Experiments
    5 Discussion, Limitations, and Future Research
    References

    Chapter 10. Knowledge Discovery in Adversarial Settings
    D.B. Skillicorn
    1 Introduction
    2 Characteristics of Adversarial Modelling
    3 Technical Implications
    4 Conclusion
    References

    Chapter 11. Analysis and Mining of Online Communities of Internet Forum Users
    Mikołaj Morzy
    1 Introduction
        1.1 What Is Web 2.0?
        1.2 New Forms of Participation – Push or Pull?
        1.3 Internet Forums as New Forms of Conversation
    2 Social-Driven Data
        2.1 What Are Social-Driven Data?
        2.2 Data from Internet Forums
    3 Internet Forums
        3.1 Crawling Internet Forums
        3.2 Statistical Analysis
        3.3 Index Analysis
        3.4 Network Analysis
    4 Related Work
    5 Conclusions
    References

    Chapter 12. Data Mining for Information Literacy
    Bettina Berendt
    1 Introduction
    2 Background
        2.1 Information Literacy
        2.2 Critical Literacy
        2.3 Educational Data Mining
    3 Towards Critical Data Literacy: A Frame for Analysis and Design
        3.1 A Frame of Analysis: Technique and Object
        3.2 On the Chances of Achieving Critical Data Literacy: Principles of Successful Learning as Description Criteria
    4 Examples: Tools and Other Approaches Supporting Data Mining for Information Literacy
        4.1 Analysing Data: Do-It-Yourself Statistics Visualization
        4.2 Analysing Language: Viewpoints and Bias in Media Reporting
        4.3 Analysing Data Mining: Building, Comparing and Re-using Own and Others’ Conceptualizations of a Domain
        4.4 Analysing Actions: Feedback and Awareness Tools
        4.5 Analysing Actions: Role Reversals in Data Collection and Analysis
    5 Summary and Conclusions
    References

    Chapter 13. Rule Extraction from Neural Networks and Support Vector Machines for Credit Scoring
    Rudy Setiono, Bart Baesens, David Martens
    1 Introduction
    2 Re-RX: Recursive Rule Extraction from Neural Networks
        2.1 Multilayer Perceptron
        2.2 Finding Optimal Network Structure by Pruning
        2.3 Recursive Rule Extraction
        2.4 Applying Re-RX for Credit Scoring
    3 ALBA: Rule Extraction from Support Vector Machines
        3.1 Support Vector Machine
        3.2 ALBA: Active Learning Based Approach to SVM Rule Extraction
        3.3 Applying ALBA for Credit Scoring
    4 Conclusion
    References

    Chapter 14. Using Self-Organizing Map for Data Mining: A Synthesis with Accounting Applications
    Andriy Andreev, Argyris Argyrou
    1 Introduction
    2 Data Pre-processing
        2.1 Types of Variables
        2.2 Distance Metrics
        2.3 Rescaling Input Variables
    3 Self-Organizing Map
        3.1 Introduction to SOM
        3.2 Formation of SOM
    4 Performance Metrics and Cluster Validity
    5 Extensions of SOM
        5.1 Non-metric Spaces
        5.2 SOM for Temporal Sequence Processing
        5.3 SOM for Cluster Analysis
        5.4 SOM for Visualizing High-Dimensional Data
    6 Financial Applications of SOM
    7 Case Study: Clustering Accounting Databases
        7.1 Data Description
        7.2 Data Pre-processing
        7.3 Experiments
        7.4 Results Presentation and Discussion
    References

    Chapter 15. Applying Data Mining Techniques to Assess Steel Plant Operation Conditions
    Khan Muhammad Badruddin, Isao Yagi, Takao Terano
    1 Introduction
    2 Brief Description of EAF
        2.1 Performance Evaluation Criteria
        2.2 Innovations in Electric Arc Furnaces
        2.3 Details of the Operation
        2.4 Understanding SCIPs and Stages of a Heat
    3 Problem Description
    4 Data Mining Process
        4.1 Data
        4.2 Data Preprocessing
        4.3 Attribute Pruning
        4.4 The Experiments
        4.5 Data Mining Techniques
    5 Results
        5.1 Discussion
    6 Concluding Remarks
    References

    Author Index

    Editors

    Dr. Dawn E. Holmes serves as Senior Lecturer in the Department of Statistics and Applied Probability and Senior Associate Dean in the Division of Undergraduate Education at UCSB. Her main research area, Bayesian Networks with Maximum Entropy, has resulted in numerous journal articles and conference presentations. Her other research interests include Machine Learning, Data Mining, Foundations of Bayesianism and Intuitionistic Mathematics. Dr. Holmes has co-edited, with Professor Lakhmi C. Jain, volumes ‘Innovations in Bayesian Networks’ and ‘Innovations in Machine Learning’. Dr. Holmes teaches a broad range of courses, including SAS programming, Bayesian Networks and Data Mining. She was awarded the Distinguished Teaching Award by Academic Senate, UCSB in 2008. As well as being Associate Editor of the International Journal of Knowledge-Based and Intelligent Information Systems, Dr. Holmes reviews extensively and is on the editorial board of several journals, including the Journal of Neurocomputing. She serves as Program Scientific Committee Member for numerous conferences, including the International Conference on Artificial Intelligence and the International Conference on Machine Learning. In 2009 Dr. Holmes accepted an invitation to join the Center for Research in Financial Mathematics and Statistics (CRFMS), UCSB. She was made a Senior Member of the IEEE in 2011.

    Professor Lakhmi C. Jain is a Director/Founder of the Knowledge-Based Intelligent Engineering Systems (KES) Centre, located in the University of South Australia. He is a fellow of the Institution of Engineers Australia. His interests focus on artificial intelligence paradigms and their applications in complex systems, art-science fusion, e-education, e-healthcare, unmanned air vehicles and intelligent agents.

    Chapter 1
    Advances in Intelligent Data Mining

    Dawn E. Holmes (1), Jeffrey W. Tweedale (2), and Lakhmi C. Jain (3)

    (1) Department of Statistics and Applied Probability, University of California Santa Barbara, Santa Barbara, CA 93106-3110, USA. holmes@pstat.ucsb.edu
    (2) Defence Science and Technology Organisation, PO Box 1500, Edinburgh, South Australia SA 5111, Australia. jeffrey.tweedale@dsto.defence.gov.au
    (3) School of Electrical and Information Engineering, University of South Australia, Adelaide, Mawson Lakes Campus, South Australia SA 5095, Australia. lakhmi.jain@unisa.edu.au

    1 Introduction

    The human body is composed of eleven sub-systems: the respiratory, digestive, muscular, immune, circulatory, nervous, skeletal, endocrine, urinary, integumentary and reproductive systems [1]. Scientists have shown how these complex systems interoperate and have even mapped the human genome. This knowledge resulted from the exploitation of significant volumes of empirical data. Medical databases are many orders of magnitude larger than text and transactional repositories. Acquisition, storage and exploitation of this data requires a disparate approach due to the m...
