
The Yahoo!-DAIS Seminar (CS591DAI)


The Yahoo!-DAIS Seminar will be held on Wednesdays from 4:00pm to 5:00pm in 3405 SC. As in other semesters, we will have a few visiting speakers who must be scheduled on a different day or at a different time because of their travel schedules. Students who take the DAIS Seminar for credit can miss up to two regular seminars. Speakers are announced on the DAIS mailing list (as are other items of interest to the DAIS community). It is quick and easy to subscribe to the DAIS mailing list.

Seminar schedules for past semesters: Fall 2013 | Spring 2013 | Fall 2012 | Spring 2012 | Fall 2011 | Spring 2011 | Fall 2010 | Spring 2010 | Fall 2009 | Summer 2009 | Spring 2009 | Fall 2008 | Spring 2008 | Fall 2007 | Spring 2007 | Fall 2006 | Spring 2006 | Fall 2005 | Spring 2005 | Fall 2004

Spring 2014 Schedule

Coordinator: Xiaolong Wang (xwang95 AT illinois DOT edu)

Date Content
Wednesday, Jan. 22, 2014
Place: SC3405
Title: Big graph search and analytics: a journey of usability and scalability
Speaker: Yinghui Wu
Abstract: Real-life graphs are messy and huge. This brings two challenges to applications of graph data analytics: how can we make real-life graphs usable and useful, and how can we scale graph data analytics to keep up with the growth of data? In this talk, I will share our experience on the journey of improving usability and scalability for big graph analytics, and in particular for the general graph search problem. (1) Query writing and result understanding are among the first daunting tasks for end users. We proposed summarization techniques to help users understand complex results and refine their search without inspecting answers one by one. (2) That said, potential matches are hard to capture using conventional similarity metrics, even for refined search. We developed transformation-based graph search to identify these matches. Specifically, we propose efficient ontology-based graph search that harvests external ontologies to interpret query semantics. (3) We further examine the challenge of automatically learning a proper ranking model that integrates a set of transformations leading to top-ranked matches. Putting these together, we propose a user-friendly graph search system that enables easy graph data access, search, and exploration. Finally, I will briefly introduce our ongoing work on network causality analysis, a real-life application of graph analytics.
Short Bio: Yinghui Wu is a research scientist in the Department of Computer Science, University of California, Santa Barbara, and a member of the Network Science Collaborative Technology Alliance (NS-CTA). His research interests mainly focus on graph databases, graph analytics, and network science, with applications in social/information network analytics and network security. He received his Ph.D. from the University of Edinburgh in 2010.
Link video
Wednesday, Jan. 29, 2014
Place: SC3405
Title: Entity Recommendation in Heterogeneous Information Networks
Speaker: Xiao Yu
Abstract: Recommender systems, which provide users with recommendations for products or services, have seen widespread implementation in various domains. In many scenarios, the entity recommendation problem exists in a heterogeneous information network environment with multi-typed relationships between users and entities. In this talk, Xiao will first explore relationship heterogeneity in information networks and introduce an entity recommendation approach that generates personalized recommendation models for different users. Motivated by this study, he will then introduce a large-scale real-world application: a personalized entity recommendation system for search engine users, which uses search engine user logs and the Freebase knowledge graph to integrate entity recommendation into users' search experience. A scalable, robust, and time-aware recommendation framework is proposed for this application. Experiments demonstrate the effectiveness of the proposed approaches in both studies.
Short Bio: Xiao Yu is a Ph.D. candidate in the Department of Computer Science at the University of Illinois at Urbana-Champaign. He is advised by Prof. Jiawei Han. Xiao is broadly interested in data mining, information retrieval, and machine learning, with a focus on entity search and recommendation in information networks, cyber-physical network analysis, and large-scale data mining algorithms and applications. Xiao has over 20 publications in major data mining and information retrieval journals and conferences, such as KDD, WSDM, SDM, and ICDE.
Link video
Wednesday, Feb. 5, 2014
Place: SC3405
Title: Towards Large Scale Open Domain Natural Language Processing
Speaker: Gourab Kundu
Abstract: Machine learning and inference methods are becoming ubiquitous: a broad range of scientific advances and technologies rely on machine learning techniques. In particular, the big data revolution depends heavily on our ability to use statistical machine learning methods to make sense of the large amounts of data we have.
Research in Natural Language Processing has both benefited from and contributed to the advancement of machine learning and inference methods. However, multiple problems still hinder the broad application of some of these methods. Domain adaptation is one of the key problems hindering widespread deployment of natural language processing tools. In this talk, I will present techniques for domain adaptation "on the fly" that allow adaptation to test domains using the same model from the training domain, thus saving time and making it possible to adapt complex pipeline systems as black boxes. For this, we formulate the prediction problem as an integer program in which task- and domain-specific knowledge is incorporated as constraints. Formulating the prediction problem as an integer program is currently widespread in NLP, in tasks such as semantic role labeling, sentiment analysis, and dependency parsing. The latter part of the talk will focus on improving the scalability of all these tools with a complex prediction stage to meet the challenges of big data. I will show how we can amortize the cost of prediction over the lifetime of any NLP tool if the prediction problem can be represented as an integer linear program. I will present exact and approximate theorems for reusing solutions of past integer programs to speed up the solution of future integer programs.
Short Bio: Gourab Kundu is a doctoral candidate in the Computer Science Department of the University of Illinois at Urbana-Champaign. He is supervised by Professor Dan Roth. He has also worked at IBM Research and Google as a summer intern. He has worked on a range of NLP problems, such as semantic role labeling, named entity recognition, and entity relation extraction. He is broadly interested in transfer learning and large-scale inference. He has published in top-tier NLP conferences and received a best student paper award at CoNLL 2011.
Link video
Wednesday, Feb. 12, 2014
Place: SC3405
Title: Big Network Analytics: Online and Active Learning Approaches
Speaker: Quanquan Gu
Abstract: We are living in the Internet Age, in which information entities and objects are interconnected, thereby forming gigantic information networks. Examples of real-world information networks include social networks, bibliographic networks, gene regulation and protein interaction networks, knowledge graphs, and the World Wide Web. It is critical to quickly process and understand these networks in order to enable data-driven applications. However, there are two main challenges in analyzing big networks. First, modern networks grow and evolve over time, so we require learning algorithms that work on the fly and adapt to variation in the networks. Second, labels for the nodes or edges in big networks are scarce, so it is urgent to optimize the process by which labels are collected. In this talk, to address these challenges, I will present several online and active learning algorithms for big network analytics that are both statistically and computationally efficient and come with provable guarantees on their performance. Empirical studies on real-world networked data validate the effectiveness of the proposed algorithms.
Short Bio: Quanquan Gu is a Ph.D. candidate in the Department of Computer Science, University of Illinois at Urbana-Champaign, supervised by Prof. Jiawei Han. He received his MS and BS degrees from Tsinghua University, China. He is the recipient of an IBM PhD Fellowship for 2013-2014. His main research interests include theory and algorithms for data mining and machine learning, with a focus on networked data.
Link video
Wednesday, Feb. 19, 2014
Place: SC3405
Title: Distributed Optimization over Graphs
Speaker: Angelia Nedich
Abstract: Recent advances in wired and wireless technology necessitate the development of theory, models, and tools to cope with new challenges posed by large-scale networks and various problems arising in current and anticipated applications over such networks. In this talk, optimization problems and algorithms for distributed multi-agent networked systems will be discussed. The distributed nature of the problem is reflected in agents having their own local (private) information while sharing a common goal: to optimize the sum of their objectives through some limited information exchange. The inherent lack of a central coordinator is compensated for by using the network to communicate certain estimates and by using appropriate local-aggregation schemes. The overall approach allows agents to achieve the desired optimization goal without sharing the explicit form of their locally known objective functions. However, the agents are willing to cooperate with each other locally to solve the problem by exchanging some estimates of relevant information. Distributed algorithms will be discussed for synchronous and asynchronous implementations, together with their basic convergence properties. Special attention will be devoted to directed graphs.
Short Bio: Angelia Nedich received her B.S. degree from the University of Montenegro (1987) and M.S. degree from the University of Belgrade (1990), both in Mathematics. She received her Ph.D. degrees from Moscow State University (1994) in Mathematics and Mathematical Physics, and from the Massachusetts Institute of Technology in Electrical Engineering and Computer Science (2002). She was with BAE Systems Advanced Information Technology from 2002 to 2006. In Fall 2006, she joined the Department of Industrial and Enterprise Systems Engineering at the University of Illinois at Urbana-Champaign, USA, as an Assistant Professor. Her general interest is in optimization, including fundamental theory, models, algorithms, and applications. Her current research is focused on large-scale convex optimization, distributed multi-agent optimization, and duality theory with applications in decentralized optimization. She received an NSF Faculty Early Career Development (CAREER) Award in 2008 in Operations Research.
Link video
Monday, Feb. 24, 2014
Place: SC3405
Title: Toward Multi-level Query Understanding – From Query Lexicon to Query Semantics
Speaker: Yanen Li
Abstract: Search technologies have significantly transformed the way people seek information and acquire knowledge from the internet. To further improve the search accuracy and usability of current-generation search engines, one of the most important research challenges is to understand the user's intent or information need underlying the query. However, understanding a query in the form of plain text is a non-trivial task. In this talk I will first introduce a framework in which a query is interpreted and represented at multiple levels. Then I will briefly overview our efforts to address key research questions ranging from query lexicon and query syntax to query semantic understanding. In the rest of the talk I will present our recent work on query auto-completion, in which we aim to predict the query representation given only a short prefix.
Short Bio: Yanen Li is a 5th-year Ph.D. student in the Department of Computer Science at the University of Illinois at Urbana-Champaign; his Ph.D. advisor is Prof. ChengXiang Zhai. His research interests include information retrieval, data mining, and medical informatics, with a special focus on systematic query understanding in web search by mining query logs. He is a winner of the Microsoft Speller Challenge 2011. Before entering UIUC, he obtained both his Bachelor's and Master's degrees from the Department of Computer Science at Huazhong University of Science and Technology, China.
Link video
Wednesday, Mar. 5, 2014
Place: SC3405
Title: MedSafe: Measurement-driven Accident Analysis for Safety-critical Medical Devices
Speaker: Homa Alemzadeh
Abstract: Medical device incidents are one of the major causes of serious injury and death in the United States. In 2011, about 1,190 recalls, 92,600 patient injuries, and 4,590 deaths were reported to the US Food and Drug Administration (FDA). The FDA recalls and adverse event reports provide valuable insights on the past failures and safety issues of medical devices and how the designs could be improved to prevent catastrophic patient impacts in the future. However, those reports are mainly composed of unstructured natural language text written by the manufacturers and volunteer reporters and are often difficult to analyze without considering domain-specific semantics and contextual factors. We present MedSafe, a framework for automated analysis of medical device reports to identify the causes of device failures and their impact on patients. We propose an ontology model based on the control-system structures that involve humans in the loop, to formalize the semantic interpretation of the reports and facilitate causal analysis of accidents. We demonstrate the effectiveness of MedSafe by showing sample results on analysis of 18,200 recall records reported for various types of medical devices during 2006-2013, and about 5,400 adverse events reported for robotic surgical systems, over the 13-year period of 2000-2012.
Short Bio: Homa Alemzadeh is a PhD candidate in electrical and computer engineering and a graduate research assistant at the Coordinated Science Laboratory at UIUC. She received her BSc and MSc degrees in computer engineering from the University of Tehran, Iran. Her research interests include measurement-based dependability evaluation and accident analysis, hardware-based techniques for improving safety and reliability, and design of medical monitoring systems.
Link video
Wednesday, Mar. 12, 2014
Place: SC3405
Title: Lost in Publications? Let Text Mining Help!
Speaker: Zhiyong Lu
Abstract: The explosion of biomedical information in the past decade or so has created new opportunities for discoveries to improve the treatment and prevention of human diseases. But the large body of knowledge, mostly captured as free text in journal articles, and the interdisciplinary nature of biomedical research also present a grand new challenge: how can scientists and health care professionals find and assimilate all the publications relevant to their research and practice? In the first part of the talk, I will present our research on text mining and its application to improved information access for the worldwide scientific community. Real-world use cases of text mining research in PubMed will be demonstrated. Next, I will present our effort on computer-assisted literature curation, with a focus on our recent experience in BioCreative, a community-based worldwide challenge event in biomedical text mining.
Short Bio: Dr. Lu is a Stadtman investigator at the National Institutes of Health, which he joined immediately after earning a PhD in Bioinformatics at the University of Colorado School of Medicine. His research group is developing computational methods for analyzing and making sense of natural language data in biomedical literature and clinical text. Several of his recent research results have been successfully integrated into, and are widely used in, PubMed and other NCBI databases. Dr. Lu is an Associate Editor for BMC Bioinformatics and serves on the editorial board of the journal Database. He is also involved in the organization of several international scientific meetings, such as the BioCreative challenge series, PSB sessions on computational drug repurposing, and the IEEE conference on health informatics.
Link video
Wednesday, Mar. 19, 2014
Place: SC3405
Title: Similarity Query Processing Techniques for Text Data
Speaker: Younghoon Kim
Abstract: With the widespread use of the internet, text-based data sources have become ubiquitous and the demand for effective support of similarity matching queries in text data continues to increase. Similarity queries are essential and useful across a diverse range of text applications. In this talk, I will first introduce the optimal and approximate exact substring matching algorithms to find the best query plan utilizing inverted variable-length gram indexes. Then, I will present efficient algorithms for top-k approximate substring matching utilizing our novel lower bounds for substring edit distance. Finally, I will briefly introduce a parallel algorithm developed for top-k approximate string joins.
Short Bio: Younghoon Kim is a postdoctoral researcher at the Department of Computer Science of UIUC, hosted by Professor Jiawei Han. He received a Ph.D. under the supervision of Professor Kyuseok Shim from Seoul National University in 2013 and a B.S. degree in Computer Science from Seoul National University in 2006. He has been working in the areas of substring query processing in databases and text mining using probabilistic modeling in social network services.
Link video
Friday, Apr. 4, 2014
Place: SC0216
Title: Big Trajectory Data: from fundamentals to performance
Speaker: Xiaofang Zhou
Abstract: Spatial trajectory data record the movement history of objects in geographical space. They can be used to find behaviours and patterns and to make predictions for individual objects as well as groups of objects. Spatiotemporal data management and query processing have been an active research topic over the last three decades, spanning a wide range of areas including databases, geographical information systems, and data mining. With more and more trajectory data available and increasing interest from business communities, we now need to revisit trajectory database research, from basic questions such as trajectory data representation in databases and trajectory similarity measures, to more advanced questions such as how we can take advantage of modern hardware platforms to support TB-level trajectory data processing. In this talk we will share our thoughts on these issues and discuss some recent work at the University of Queensland.
Short Bio: Xiaofang Zhou is a Professor of Computer Science at the University of Queensland. He received his BSc and MSc degrees in Computer Science from Nanjing University, China, and his PhD in Computer Science from the University of Queensland. Before joining UQ in 1999, he worked as a researcher at the Commonwealth Scientific and Industrial Research Organisation (CSIRO) in Australia, leading its Spatial Information Systems group. He has been working in the areas of spatial and multimedia databases, data quality, high performance query processing, Web information systems, and bioinformatics, and has co-authored over 250 research papers, many published in top journals and conferences such as SIGMOD, VLDB, ICDE, ACM Multimedia, The VLDB Journal, ACM Transactions, and IEEE Transactions. He was a Program Committee Co-chair of the 29th International Conference on Data Engineering (ICDE 2013), and a General Co-chair of the ACM Multimedia conference in 2015. He has been on the program committees of numerous international conferences, including SIGMOD, VLDB, ICDE, WWW, and ACM Multimedia. Currently he is an Associate Editor of The VLDB Journal, IEEE Transactions on Cloud Computing, World Wide Web Journal, and Distributed and Parallel Databases. He is a current member of the IEEE Technical Committee on Data Engineering (TCDE) Executive Committee, the IEEE TCDE Award Committee, and the Steering Committees of DASFAA, WISE, APWeb, and the Australasian Database Conferences. In the past he was an Associate Editor of IEEE Transactions on Knowledge and Data Engineering (2009-2013) and Information Processing Letters. Xiaofang is a specially appointed Adjunct Professor under the Chinese National Qianren Scheme, hosted by Renmin University of China (2010-2013) and by Soochow University since July 2013, where he leads the Research Center on Advanced Data Analytics (ADA).
Link video
