Managing Large-Scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques
- Full paper submission due: Friday, June 17th, 2011
- Notification of acceptance: Friday, July 15th, 2011
- Final papers due: Friday, August 12th, 2011
- Workshop: October 23rd, 2011
8:50 – 9:00 intro
9:00 – 10:00
- Practical Experiences with Chronics Discovery in Large Telecommunications Systems. Soila P. Kavulya (CMU), Kaustubh Joshi, Matti Hiltunen, Scott Daniels (AT&T Labs Research), Rajeev Gandhi and Priya Narasimhan (CMU).
- BLR-D: Applying Bilinear Logistic Regression to Factored Diagnosis Problems. Sumit Basu (Microsoft Research), John Dunagan (Microsoft), Kevin Duh (NTT Labs) and Kiran-Kumar Muniswamy-Reddy (Harvard University).
10:00 – 10:30 coffee break
10:30 – 12:00
- Mining Temporal Invariants from Partially Ordered Logs. Ivan Beschastnikh, Yuriy Brun, Michael D. Ernst, Arvind Krishnamurthy and Thomas E. Anderson (University of Washington).
- Adaptive Event Prediction Strategy with Dynamic Time Window for Large-Scale HPC Systems. Ana Gainaru (UIUC/UPB), Franck Cappello (INRIA/UIUC), Stefan Trausan-Matu (UPB), Joshi Fullop (UIUC) and William Kramer (UIUC).
- Mining large distributed log-data in near real-time. Stefan Weigert (TU Dresden), Matti Hiltunen (AT&T Labs Research) and Christof Fetzer (TU Dresden).
12:00 – 1:30 lunch
1:30 – 2:00
- Web Analytics and the Art of Data Summarization. Archana Ganapathi and Steve Zhang (Splunk).
2:00 – 3:00 Panel: Assessing and improving the quality of program logs.
- Ari Rabkin (UC Berkeley)
- Ding Yuan (UIUC / UC San Diego)
- Wei Xu (Google)
- Steve Zhang (Splunk)
3:00 – 3:30 coffee break
3:30 – 4:00
- PAL: Propagation-aware Anomaly Localization for Cloud Hosted Distributed Applications. Hiep Nguyen, Yongmin Tan and Xiaohui Gu (North Carolina State University).
4:00 – 5:00 discussion
Paper Submission Link: https://www.easychair.org/conferences/?conf=slaml2011
Modern large-scale systems are challenging to manage. Fortunately, as these systems generate massive amounts of performance and diagnostic data, there is an opportunity to make system administration and development simpler via automated techniques to extract actionable information from the data. This workshop addresses this problem in two thrusts: (i) the analysis of raw system data logs, and (ii) the application of machine learning to systems problems. We expect the large overlap in these topics to promote a rich interchange of ideas between the areas.
Log Analysis: It is well known that raw system logs are an abundant source of information for the analysis and diagnosis of system problems and prediction of future system events. However, a lack of organization and semantic consistency between system data from various software and hardware vendors means that most of this information content is wasted. Current approaches to extracting information from the raw system data capture only a fraction of the information available and do not scale to the large systems common in business and supercomputing environments. It is thus a significant research challenge to determine how to better process and combine information from these data sources.
Machine Learning: The large scale of available data requires automated and machine-assisted analysis. Statistical machine learning techniques have recently shown great promise in meeting the challenges of scale and complexity in datacenter-scale and Internet-scale computing systems. However, applying these techniques to real systems scenarios requires careful analysis and engineering of the techniques to fit them to specific scenarios; there is also sometimes the opportunity to develop new algorithms specific to systems scenarios. This workshop thrust thus also presents a substantial research area: the exploration of new approaches to using machine learning to help us understand, measure, and diagnose complex systems.
Topics include but are not limited to:
- Reports on publicly available sources of sample system logs
- Prediction of malfunction or misuse based on system data
- Statistical analysis of system logs
- Applications of Natural-Language Processing (NLP) to system data
- Techniques for system log analysis, comparison, standardization, compression, anonymization, and visualization
- Applications of log analysis to system administration problems
- Use of machine learning techniques to address reliability, performance, power management, security, fault diagnosis, scheduling, or manageability issues
- Challenges of scale in applying machine learning to large systems
- Integration of machine learning into real-world systems and processes
- Evaluating the quality of learned models, including assessing the confidence/reliability of models and comparisons between different methods
- Peter Bodik, Microsoft Research
- Marc Casas, Lawrence Livermore National Laboratory
- Greg Bronevetsky, Lawrence Livermore National Laboratory
- Priya Narasimhan, Carnegie Mellon
- Daniel V. Klein, LoneWolf Systems
- Adam Oliner, Stanford
- Jon Stearley, Sandia Labs
- Anton Chuvakin, Security Warrior Consulting
- Srikanth Kandula, Microsoft Research
- Eno Thereska, Microsoft Research
- Kristal Curtis, University of California, Berkeley
- Archana Ganapathi, Splunk
- Wei Xu, Google
- Shivnath Babu, Duke University
- Ira Cohen, HP Labs
- Shobha Venkataraman, AT&T Research
- Ethan Miller, University of California, Santa Cruz
- John Mark Agosta, Intel Research
Submitted papers must be no longer than eight (8) 8.5"x11" or A4 pages, using a 10 point font on 12 point (single spaced) leading, with a maximum text block of 6.5 inches wide by 9 inches deep. The page limit includes everything except for references, for which there is no limit. The use of color is acceptable, but the paper should be easily readable if viewed or printed in grayscale. Authors must make a good faith effort to anonymize their submissions, and they should not identify themselves either explicitly or by implication (e.g., through the references or acknowledgments). Submissions violating the detailed formatting and anonymization rules on the Web site will not be considered for publication. There will be no extensions for reformatting.
Blind reviewing of full papers will be done by the program committee (TBD), with limited use of outside referees. Papers will be provisionally accepted subject to revision and approval by a program committee member acting as a shepherd. On acceptance, authors will be required to sign an ACM copyright release form. Your submission indicates that you agree to this. Papers will be held in full confidence during the reviewing process, but papers accompanied by nondisclosure agreement forms are not acceptable and will be rejected without review. Authors of accepted papers will be expected to supply electronic versions of their papers and encouraged to supply source code and raw data to help others replicate and better understand their results.