All tutorials will be held in Room 3405 of the Siebel Center.
As part of the MIAS Data Sciences Summer Institute, renowned faculty from the University of Illinois Department of Computer Science will offer a collection of short courses designed to introduce students, scientists, and researchers to a variety of practices and software tools designed at UIUC in support of knowledge discovery. Because the tutorials are taught by some of the most innovative scientists in the world, the content will be up-to-the-moment and keenly relevant.
Enrollment in the tutorial includes the 15-hour course (over 5 days) and the opportunity to explore collaboration with university faculty on projects of joint interest.
Tue: 10am-12, 1-4pm
Wed, Thur, Fri: 1-4pm
Kevin Chang
Information abounds on the Internet, in multiple modalities and in
structured and unstructured forms. How can we access such information
effectively? This tutorial will give a systems perspective on building
"data-aware" search services. First, to study a novel search system
concretely, we will present the anatomy of "entity search," a new
type of search that reaches data entities directly (emails, phone numbers,
locations, etc.). Second, to develop new search services, we will
introduce pertinent techniques, such as data crawling, information
extraction, image recognition, inverted indexing and searching, and
database querying. Each topic will highlight existing tools and
resources, and emphasize hands-on exercises. Overall, this tutorial
will lay the foundation for students to get started with their research
projects.
Tue, Wed, Thur, Fri: 10:30am-12, 1-3pm
Jiawei Han
We will offer a tutorial course on data mining, which introduces the concepts, algorithms, techniques, and systems of data mining, including (1) data preprocessing, (2) frequent pattern and correlation analysis, (3) cluster and outlier analysis, (4) mining sequential and complex structured data, (5) information network analysis, (6) mining data streams, (7) mining RFID, moving object and spatiotemporal data, and (8) data mining applications. The course may attract students who need to implement and/or use data mining methods and systems to analyze large amounts of data.
Mon-Fri: 1-4pm
Dan Roth
Statistical and machine learning techniques have brought significant advances in language processing and information extraction, and have allowed researchers to start dealing robustly and broadly with realistic-size problems. This short course will introduce some of the central learning frameworks and techniques that have emerged in this field and found applications in several areas of text processing. We will present the main theoretical paradigms used in natural language processing (learning-theoretic, probabilistic, and information-theoretic), the relations between them, and the main algorithmic techniques developed within these paradigms. Building on a brief theoretical introduction, we will introduce key algorithmic techniques for classification (e.g., naive Bayes and variations of Perceptron and SVM) and structured prediction in the context of NLP and information extraction tasks. We will also discuss issues such as feature extraction and training paradigms (supervised, semi-supervised, EM), and address some of the issues involved in using these techniques in real-world NLP applications.
Mon-Fri: 1-4pm
ChengXiang Zhai
This tutorial will introduce the foundation and technologies of Web search. We will first systematically review the basic concepts, models, and techniques in information retrieval, which is the foundation of all search engine technologies. We will then discuss special challenges in Web search and review new technologies developed recently for Web search.
Mon-Fri: 1-4pm
David Forsyth
This tutorial will be an intensive course in methods to interpret human activities in pictures and in video. This is a topic of wide current importance in security applications. Relevant material is currently scattered across the animation, computer vision, and tracking literature. The syllabus will include methods to find people in static images and in video; methods to recover the 3D configuration of the body from 2D image information; and methods to infer what a person is doing from this information.
This tutorial instructs students in the fundamentals of DBMS and then takes them on a state-of-the-art tour of issues and techniques in data integration. Throughout, we emphasize modeling, query processing, semantic integration, and managing uncertainty and inconsistency. Modern techniques from the database, information retrieval, and artificial intelligence communities are applied to data integration problems arising from the Web, the Deep Web, and other inconsistent sources.
We introduce the foundations of information retrieval as well as its applications and new developments in Web information access. We will start with basic models for text retrieval. Then, we will study the new challenges of the Web and new techniques in Web search, integration, and mining for finding information, integrating dynamic "deep" sources, and discovering knowledge.
This tutorial will introduce methodologies and tools both for preparing data for use with machine learning tools (including feature extraction) and for applying machine learning techniques and tools to practical problems.
Examples will be given in the textual domain, starting with free-form text and building machine-learning-based natural-language processing tools. Participants will have the opportunity to develop tools to solve three text-processing problems during the three sessions.
The tutorial will be based around our textbook, "Computer Vision: A Modern Approach," which is now used in all major departments teaching the topic. The syllabus will emphasize aspects of computer vision most relevant to information discovery and retrieval. In particular, we will examine different technologies for image feature extraction, object recognition, camera calibration, and linking information in images with text information, metadata, and information in other formats.
We will offer a tutorial course on data mining and machine learning, which introduces the concepts, algorithms, techniques, and systems of data mining and machine learning, including (1) data preprocessing, (2) frequent pattern and correlation analysis, (3) supervised learning (classification), (4) unsupervised learning (cluster analysis), (5) mining sequential and complex structured data, (6) mining data streams, text data, Web data, spatiotemporal data, biomedical data, and other forms of complex data, and (7) data mining and machine learning applications. The course may attract students from computer science and other disciplines who need to implement and/or use data mining and machine learning methods and systems to analyze large amounts of data.