Department of Computer Science and Automation, Indian Institute of Science (IISc), Bangalore, India

PAST SEMINARS

Series: Department Seminar
Title: Fast and Sloppy - scaling up linear models

  • Speaker: Prof. Alexander J. Smola, Yahoo
  • Date and Time: Wednesday, November 18, 2009, 4:00 PM
  • Venue: Seminar Hall, Room No. 254

Abstract
In this talk I discuss a number of algorithms which, in combination, can be used to scale up linear models to deal with the amounts of data available at Yahoo. In particular, I will discuss issues of collaborative classification with a very large number of classes, hashing to reduce dimensionality, compressed memory representations for collaborative filtering, and algorithms to accelerate online learning on parallel computers.
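As an illustration of "hashing to reduce dimensionality", the sketch below maps an arbitrary bag of tokens into a fixed-length vector by hashing each token to a bucket. It is a minimal example of the general feature-hashing idea, not the speaker's implementation: the function name `hashed_features`, the use of MD5 as the hash, the sign bit, and the bucket count are all assumptions made for the example.

```python
import hashlib


def hashed_features(tokens, num_buckets=16):
    """Map a list of string tokens to a fixed-length vector (assumed scheme)."""
    vec = [0.0] * num_buckets
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        # First four bytes choose the bucket; the vector never grows,
        # however large the vocabulary is.
        index = int.from_bytes(digest[:4], "little") % num_buckets
        # A further byte chooses a sign, so that colliding tokens
        # tend to cancel rather than always accumulate.
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[index] += sign
    return vec


if __name__ == "__main__":
    print(hashed_features(["scaling", "linear", "models", "linear"]))
```

A linear model trained on such vectors needs only `num_buckets` weights, which is the memory saving that motivates the technique.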

Speaker Bio:
Dr. A. Smola is currently a principal research scientist with Yahoo! Prior to joining Yahoo!, he was a professor at the Australian National University (ANU) and group leader at NICTA, Australia.


 

Series: Department Seminar
Title: Migrating into the Cloud

  • Speaker: Dr. T.S. Mohan
                    Principal Researcher, ECom Research Lab,
                    Infosys Technologies Ltd.
  • Date and Time: Monday, November 16, 2009, 4:00 PM
  • Venue: CSA Seminar Hall (Room No. 254)

Abstract
While Gartner hypes Cloud Computing as the first of its top ten Strategic Technology Areas to watch in 2010, an enterprise faces big challenges in leveraging this disruptive techno-business model called the cloud. In this talk we focus on key technical issues and research problems, as well as solutions, in using, adopting, integrating with or, more specifically, migrating into existing cloud models and offerings. Cloud offerings are typically modeled at three levels: IaaS, PaaS and SaaS. We will detail what it means to migrate into each of these models, as well as the issues and challenges facing the architects who develop the migration strategy. We detail a seven-step process of cloud migration that we have proposed and share the best practices for developing the software architecture best suited to each of these cloud models. We conclude the talk by touching upon some of the key engineering and research challenges in 'making the cloud happen under the hood'.

Speaker Bio:
Dr. T S Mohan is a Principal Researcher with E&R's ECom Research Lab. His research interests include Distributed Systems, High Performance Computing, Cloud and Grid, as well as Software Architecture and Engineering. He has 22 years of varied experience in academia and industry. He holds a Master's and a PhD in Computer Science from IISc and worked there at SERC for about a decade before moving into industry, working at HP ISO and at interesting IT technology startups. He was also an entrepreneur, having run his own startup for more than six years before joining Infosys. Prof Rajaraman, Emeritus Professor, supervised his PhD thesis, entitled "Interaction Paradigms in Distributed Object Oriented Programming Languages".


 

Series: Department Seminar
Title: PerfCenter and AutoPerf: Tools and Techniques for Modeling and Measurement of the Performance of Distributed Applications

  • Speaker: Dr. Varsha Apte, Visiting Professor
  • Date and Time: Wednesday, November 11, 2009, 11:30 AM
  • Venue: Room No. 254, CSA Seminar Hall, First Floor

Abstract
In this talk, we will present the design and methodology underlying two software tools that we have developed in the last few years at IIT Bombay for performance measurement and modeling of distributed applications.

We present a tool, PerfCenter, which can be used for performance oriented deployment and configuration of a multi-tier application in a hosting center, or a data center. While there are a number of tools which aid in the process of performance analysis during the software development cycle, few tools are geared towards aiding a data center architect in making appropriate decisions during the deployment of an application. PerfCenter facilitates this process by allowing specification in terms that are natural to a data center architect. Thus, PerfCenter takes, as input, the number and specs of hosts available in a data center, the network architecture of geographically diverse data centers, the deployment of software on hosts, hosts on data centers, and the usage information of the application (scenarios, resource consumption), and provides various performance measures such as scenario response times, and resource utilizations. We describe the PerfCenter specification, and its performance analysis utilities, and illustrate its use in the deployment and configuration of a Webmail application. PerfCenter works by generating the underlying queueing network model of the distributed system and solving it either by analytical methods or discrete-event simulation. We will provide an insight into the primary challenges of solving this complex model analytically. Finally, we present some validation results, where PerfCenter model predictions were compared against measured data, which confirmed the soundness of the tool.

We also present a load generator and performance measurement tool (AutoPerf) which requires minimal input and configuration from the user, and produces a comprehensive capacity analysis as well as server-side resource usage profile of a Web-based distributed system, in an automated fashion. The tool requires only the workload and deployment description of the distributed system, and automatically sets typical parameters that load generator programs need, such as maximum number of users to be emulated, number of users for each experiment, warm-up time, etc. The tool also does all the co-ordination required to generate a critical type of measure, namely, resource usage per transaction or per user for each software server. This is a necessary input for creating a performance model of a software system.

Speaker Bio:
Varsha Apte is a faculty member in the Department of Computer Science and Engineering, IIT Bombay, where she has been since 2002. During the year 2009-10 (sabbatical leave from IITB) she is Visiting Faculty at the Computer Science and Automation Dept. at IISc Bangalore and part-time Visiting Researcher in the IBM India Research Lab, Bangalore. Prior to joining IIT Bombay, she was in the Network Design and Performance Analysis Department of AT&T Labs in Middletown, NJ. She received her Ph.D. from Duke University in 1994, and her Master's from Pune University in 1989. Her primary research interest is in performance management (modeling, analysis and control) of distributed applications.


 

Series: Department Seminar
Title: Analysis Techniques for Cyber-Physical Systems

  • Speaker: Dr. Sibin Mohan
                    Research Scientist, Computer Science department
                    University of Illinois at Urbana-Champaign (UIUC)
  • Date and Time: Tuesday, November 10, 2009, 4:00 PM
  • Venue: CSA Seminar Hall (Room No. 254)

Abstract
Embedded systems are ubiquitous. Most modern embedded systems (a) have real-time constraints and (b) interact with the physical world. They are increasing in size, complexity and scope, and are increasingly interconnected. Yet they suffer from a serious lack of analysis techniques and tools, which makes designing and verifying such systems a laborious and complicated process. With the advent of "cyber-physical systems", i.e. systems that have both computational and physical-world requirements, this problem is exacerbated.

In today's talk I will focus on analysis techniques and tools that I have developed or am working on in the following three broad areas:

  I. Virtual Integration: system integration techniques for dealing with complex systems such as Avionics
  II. Integration of Security into systems with safety-critical/real-time constraints
  III. Analysis of contemporary processor architectures (out-of-order and multicore processors) for use in real-time systems

Speaker Bio:
Sibin Mohan is a Research Scientist in the Computer Science department at the University of Illinois at Urbana-Champaign (UIUC).

Sibin completed his Bachelor of Engineering (B.E.) from Bangalore University in Computer Science and Engineering in 2001. He worked for Hewlett-Packard, Bangalore for a year before enrolling in the doctoral program at North Carolina State University in 2002. He obtained his M.S. and Ph.D. degrees in Computer Science from NC State in 2004 and 2008 respectively, where he was awarded a Teaching Fellowship from the graduate school.

Sibin's research interests include: Systems (embedded and real-time systems, cyber-physical systems, operating systems), Computer Architecture and Compilers.


 

Series: Msc(Engg) Thesis Defense
Title: Efficient Compilation of Stream Programs onto Multi-cores with Accelerators

  • Speaker: Mr. Abhishek Udupa, M.Sc Engg.
  • Faculty Advisor: Prof. Govindarajan, Prof. T. Matthew Jacob
  • Date and Time: Tuesday, November 10, 2009, 10:00 AM
  • Venue: Room No. 254, Seminar Hall [First Floor]

Abstract
Over the past two decades, microprocessor manufacturers have typically relied on wider issue widths and deeper pipelines to obtain performance improvements for single-threaded applications. However, in recent years, with power dissipation and wire delays becoming primary design constraints, this approach can no longer be used effectively to yield performance improvements. Thus processor designers and vendors are universally moving towards multi-core designs. Examples include commodity general-purpose multi-core processors, the CellBE accelerator from IBM and the Graphics Processing Units from NVIDIA and ATI. Although these many-core and multi-core architectures can provide enormous performance benefits, they are difficult to program because of the complexity of writing explicitly parallel code. The ubiquity of computationally intensive media processing applications makes it imperative to consider new programming frameworks and languages that can express parallelism in an easy, portable manner. The StreamIt programming language has been proposed to efficiently exploit parallelism at various levels on general-purpose multi-core architectures and stream processors, and to allow media processing and DSP applications to be developed in an easy and portable fashion. The StreamIt model allows programmers to specify a program as a set of filters connected by FIFO communication channels. The graphs thus specified by StreamIt programs describe task, data and pipeline parallelism, which can potentially be exploited on modern Graphics Processing Units (GPUs). GPUs have emerged as powerful commodity stream processors that support abundant parallelism in hardware.
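The "filters connected by FIFO communication channels" model can be pictured with a small Python sketch. This is not StreamIt syntax and not the compilation scheme of the thesis; the `Filter` class, its `pop_rate` parameter and the `run_pipeline` driver are hypothetical names used only to show stateless filters that consume a fixed number of items per firing from an input queue and push results to an output queue.

```python
from collections import deque


class Filter:
    """A stateless filter that pops `pop_rate` items per firing (assumed API)."""

    def __init__(self, pop_rate, work):
        self.pop_rate = pop_rate
        self.work = work  # function: list of popped items -> list of pushed items

    def can_fire(self, channel):
        return len(channel) >= self.pop_rate

    def fire(self, in_channel, out_channel):
        items = [in_channel.popleft() for _ in range(self.pop_rate)]
        out_channel.extend(self.work(items))


def run_pipeline(filters, source_items):
    """Fire filters in pipeline order until no filter has enough input."""
    # channels[i] feeds filters[i]; channels[-1] collects the final output.
    channels = [deque(source_items)] + [deque() for _ in filters]
    progress = True
    while progress:
        progress = False
        for idx, flt in enumerate(filters):
            if flt.can_fire(channels[idx]):
                flt.fire(channels[idx], channels[idx + 1])
                progress = True
    return list(channels[-1])


if __name__ == "__main__":
    pair_sum = Filter(2, lambda xs: [xs[0] + xs[1]])
    double = Filter(1, lambda xs: [2 * xs[0]])
    print(run_pipeline([pair_sum, double], [1, 2, 3, 4, 5]))
```

Because each firing of a stateless filter depends only on the items it pops, separate firings could run in parallel, which is the data parallelism the abstract refers to.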

This thesis deals with the challenges in mapping StreamIt programs to GPUs and proposes an efficient technique to software pipeline the execution of stream programs on GPUs. We formulate this problem (both scheduling and assignment of filters to processors) as an efficient Integer Linear Program (ILP), which is then solved using ILP solvers. We also describe a novel buffer layout technique for GPUs which facilitates exploiting the high memory bandwidth available in GPUs. The proposed scheduling utilizes both the scalar units in GPU, to exploit data parallelism, and multiprocessors, to exploit task and pipeline parallelism.

While the approach of software pipelining the execution of stream programs only on GPUs is efficient and performs well, it does not utilize the CPU cores to perform useful computation. Further, it does not support programs with stateful filters, which are essentially filters that are not data parallel owing to dependences between successive firings that are carried through the implicit state of the filter. The second part of the thesis aims at addressing these issues and describes a novel method to orchestrate the execution of a StreamIt program on the multiple cores of a system and GPUs in a synergistic manner. We formulate the problem of partitioning the work between the CPU cores and the GPU, taking into account the latencies for data transfers, the limited DMA bandwidth available and the required buffer layout transformations associated with the partitioning, as an integrated Integer Linear Program (ILP) which can then be solved by an ILP solver. Since solving an ILP is NP-Hard in the general case and may thus require a large amount of time, we also propose an efficient heuristic algorithm for the work partitioning between the CPU and the GPU, which provides solutions that are within 9.05% of the optimal solutions to the ILP formulation on average across the benchmark suite, while requiring 2 to 3 orders of magnitude less time than the ILP approach. Our experiments on a platform with eight CPU cores, out of which four were used, and a GeForce 8800 GTS 512 GPU show a (geometric) mean speedup of 6.84X with a maximum of 51.96X over a single-threaded CPU execution across a set of StreamIt benchmarks.


 

Series: Msc(Engg) Thesis Defense
Title: Extension of Path Probability Method to Approximate Inference over Time

  • Speaker: Mr. Vinay Jethava
  • Faculty Advisor: Prof. Chiranjib Bhattacharyya
  • Date and Time: Thursday, October 29, 2009, 11:00 AM
  • Venue: Room No. 254, CSA Seminar Hall, [First Floor]

Abstract
There has been tremendous growth in publicly available digital video footage over the past decade. This has necessitated the development of new techniques in computer vision geared towards efficient analysis, storage and retrieval of such data. Many mid-level computer vision tasks such as segmentation, object detection, tracking, etc. involve an inference problem based on the video data available. Video data has a high degree of spatial and temporal coherence. For example, pixels near a black pixel tend to have the same color, and an object in motion in the current video frame is likely to remain in motion in the next frame. This property must be intelligently leveraged in order to obtain better results.

Graphical models, such as Markov Random Fields, have emerged as a powerful tool for such inference problems. They are naturally suited for expressing the spatial dependencies present in video data. It is, however, not clear how to extend the existing techniques to the problem of inference over time. This thesis explores the Path Probability Method, a variational technique in statistical mechanics, in the context of graphical models and approximate inference problems. It extends the method to a general framework for problems involving inference in time, resulting in an algorithm, DynBP. We explore the relation of the algorithm with existing techniques, and find the algorithm competitive with existing approaches.

The main contributions of this thesis are the extended GBP algorithm, the extension of Path Probability Methods to the DynBP algorithm and the relationship between them. We have also explored some applications in computer vision involving temporal evolution with promising results.


 

Series: Department Seminar
Title: Where is the technology in an animation company

  • Speaker: Dr. Michael Henderson, DreamWorks Animation in India
  • Date and Time: Tuesday, October 27, 2009, 4:00 PM
  • Venue: Room No. 254, CSA Seminar Hall, [First Floor]

Abstract
This talk will discuss the many ways that technology is involved in the animated filmmaking process. It will cover what the different technology roles typically are, how they contribute, and who best fills those roles.

Speaker Bio:
Michael Henderson is currently the Director of Technologies at DreamWorks Animation India. Michael has worked in the animated film industry for 10 years, actively involved in Disney Feature Animation and DreamWorks’ transitions from traditional 2D to CG filmmaking, as well as the recent move to fully stereoscopic filmmaking at DreamWorks as the Technical Supervisor on Monsters vs. Aliens.


 

Series: Department Seminar
Title: Art and Technology in Animation Film Making - Behind the Scenes: Monsters vs. Aliens

  • Speaker: Dr. Mahesh Ramasubramanian, DreamWorks Animation
  • Date and Time: Tuesday, October 27, 2009, 3:00 PM
  • Venue: Room No. 254, CSA Seminar Hall, [First Floor]

Abstract
What do you get when you cross the idea of telling a film from monsters' point of view, with "Mad Magazine" style, and a "Dirty Dozen" plot? Find out more when we look behind the scenes at DreamWorks' animated movie "Monsters vs. Aliens" and get inspired by the creative and technical challenges of developing a blobby main character, simulating open water, building San Francisco, and then destroying it.

Speaker Bio:
Mahesh Ramasubramanian most recently served as visual effects supervisor on DreamWorks' "Monsters vs. Aliens: Mutant Pumpkins From Outer Space". He previously served as the digital supervisor on DreamWorks' "Monsters vs. Aliens." His credits include "Shrek," the "Shrek 4-D" Universal attraction, "Shrek 2," "Madagascar," "Over the Hedge," and "Bee Movie." Ramasubramanian is from Chennai, India, and graduated from Birla Institute of Technology and Science, Pilani, India.


 

Series: Department Seminar
Title: An Overhead and Resource Contention Aware Analytical Model for Overloaded Web Servers

  • Speaker: Dr. Varsha Apte, Visiting Professor, CSA
  • Date and Time: Wednesday, October 14, 2009, 4:00 PM
  • Venue: Room No. 254, CSA Seminar Hall, [First Floor]

Abstract
Increased response time during periods of overload on a Web server may cause impatient users to time out, causing the server to do unproductive work in processing these abandoned requests. Overhead time spent in preprocessing each request adds to the unproductive work even for requests that are not taken up for service. This causes the usable throughput, i.e. goodput, of the overloaded Web server to drop drastically, while resource utilization remains at 100%. Although this behaviour can be easily reproduced experimentally, existing analytical models of queues with abandonments are not able to do so.

We present an analytical model that captures characteristics specific to networked software servers, namely, overhead processing and contention for shared hardware resources, that is able to explain the goodput degradation typically observed in overloaded servers. We use this model to compare the performance of the LIFO and FIFO queueing disciplines during overload and show that LIFO goodput and response time are better than those of FIFO.
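The intuition behind the LIFO versus FIFO comparison can be seen in a toy discrete-time simulation. The sketch below is not the analytical model of the talk: it ignores overhead processing and resource contention, and the arrival rate, service rate and user patience are assumed values. Requests arrive twice as fast as they can be served; a user gives up after `patience` ticks, but the server still spends a tick on the abandoned request because it cannot tell the difference.

```python
from collections import deque


def simulate(discipline, ticks=100, arrivals_per_tick=2, patience=5):
    """Return the goodput: requests served before their user timed out."""
    queue = deque()  # each entry is the arrival time of one request
    good = 0
    for now in range(ticks):
        for _ in range(arrivals_per_tick):
            queue.append(now)
        # The server completes exactly one request per tick.
        arrived = queue.popleft() if discipline == "FIFO" else queue.pop()
        if now - arrived <= patience:
            good += 1  # the user was still waiting: useful work
        # otherwise the tick was spent on an abandoned request
    return good


if __name__ == "__main__":
    print("FIFO goodput:", simulate("FIFO"))
    print("LIFO goodput:", simulate("LIFO"))
```

In this toy setting the server is busy on every tick under both disciplines, yet FIFO soon serves only requests that have already been abandoned, while LIFO keeps serving fresh ones.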

Speaker Bio:
Varsha Apte is a faculty member in the Department of Computer Science and Engineering, IIT Bombay, where she has been since 2002. During the year 2009-10 (sabbatical leave from IITB) she is Visiting Faculty at the Computer Science and Automation Dept. at IISc Bangalore and part-time Visiting Researcher in the IBM India Research Lab, Bangalore. Prior to joining IIT Bombay, she was in the Network Design and Performance Analysis Department of AT&T Labs in Middletown, NJ. She received her Ph.D. from Duke University in 1994, and her Master's from Pune University in 1989. Her primary research interest is in performance management (modeling, analysis and control) of distributed applications.


 

Series: Theory Seminar
Title: Chordal Bipartite Graphs with High Boxicity

  • Speaker: Mr. Rogers Mathew, Ph.D student
  • Faculty Advisor: Dr. Sunil Chandran
  • Date and Time: Monday, October 12, 2009, 4:00 PM
  • Venue: Room No. 254, CSA Seminar Hall, [First Floor]

Abstract
The boxicity of a graph G is defined as the minimum integer k such that G is an intersection graph of axis-parallel k-dimensional boxes. Chordal bipartite graphs are bipartite graphs that do not contain an induced cycle of length greater than 4. It was conjectured by Otachi, Okamoto and Yamazaki that chordal bipartite graphs have boxicity at most 2. We disprove this conjecture by exhibiting an infinite family of chordal bipartite graphs that have unbounded boxicity.
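To make the definition concrete, the sketch below computes the intersection graph of a set of axis-parallel boxes and checks that four boxes in the plane realise the 4-cycle, so the 4-cycle has boxicity at most 2. The function name `intersection_graph` and the box coordinates are chosen for this example only.

```python
from itertools import combinations


def intersection_graph(boxes):
    """Return the edge set of the intersection graph of axis-parallel boxes.

    Each box is a list of closed intervals (lo, hi), one per dimension.
    Two boxes intersect exactly when their intervals overlap in every dimension.
    """
    edges = set()
    for u, v in combinations(range(len(boxes)), 2):
        if all(lo1 <= hi2 and lo2 <= hi1
               for (lo1, hi1), (lo2, hi2) in zip(boxes[u], boxes[v])):
            edges.add((u, v))
    return edges


if __name__ == "__main__":
    # Four bars forming a picture frame: bottom, left, top, right.
    boxes = [
        [(0, 3), (0, 1)],  # vertex 0: bottom horizontal bar
        [(0, 1), (0, 3)],  # vertex 1: left vertical bar
        [(0, 3), (2, 3)],  # vertex 2: top horizontal bar
        [(2, 3), (0, 3)],  # vertex 3: right vertical bar
    ]
    print(sorted(intersection_graph(boxes)))
```

Opposite bars never touch, so the only edges are those between consecutive bars, which is exactly the cycle 0-1-2-3-0.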


 

Series: Msc(Engg) Colloquium
Title: Discovering Rules from Disk Events for Predicting Hard Drive Failures

  • Speaker: Vipul Agrawal
  • Faculty Advisor: Dr. Chiranjib Bhattacharyya
  • Date and Time: Friday, October 09, 2009, 10:00 AM
  • Venue: Room No. 252, CSA Dept.

Abstract
The ability to accurately predict an impending hard disk failure is important for reliable storage system design. The facility provided by most hard drive manufacturers, called S.M.A.R.T. (self-monitoring, analysis and reporting technology), has been shown by current research to have poor predictive value. The problem of finding alternatives to S.M.A.R.T. for predicting disk failure is an area of active research. In this work, we present a rule discovery methodology, and show that it is possible to construct decision support systems that can detect such failures using information recorded from live disks.

Any such prediction methodology should be both highly accurate and easy to interpret. Black box models can deliver highly accurate solutions but do not provide an understanding of the events that explain their decisions. To this end we explore rule-based classifiers for predicting hard disk failures from various disk events. We show that it is possible to learn easy-to-understand rules from disk events. Our evaluation shows that our system can be tuned either to have a high failure detection rate (i.e., classify a bad disk as bad) or to have a low false alarm rate (i.e., not classify a good disk as bad).

We also propose a modification of the MLRules algorithm for classification of data with imbalanced class distributions. The existing algorithm, assuming relatively balanced class distributions and equal misclassification costs, performs poorly in classification of such datasets. The performance can be considerably improved by introducing cost-sensitive learning to the existing framework.
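One standard way to see what unequal misclassification costs change is the cost-minimising decision threshold for a probabilistic classifier. The sketch below is a generic illustration of cost-sensitive decisions, not the proposed modification of MLRules; the function names and the cost values are assumptions. Predicting "bad" is cheaper in expectation whenever p * cost_missed_failure >= (1 - p) * cost_false_alarm, where p is the estimated failure probability.

```python
def cost_sensitive_threshold(cost_false_alarm, cost_missed_failure):
    """Probability above which predicting 'bad' has the lower expected cost."""
    return cost_false_alarm / (cost_false_alarm + cost_missed_failure)


def classify(prob_failure, cost_false_alarm=1.0, cost_missed_failure=1.0):
    """Label a disk 'bad' or 'good' given its estimated failure probability."""
    threshold = cost_sensitive_threshold(cost_false_alarm, cost_missed_failure)
    return "bad" if prob_failure >= threshold else "good"


if __name__ == "__main__":
    # With equal costs a 20% failure probability is labelled good;
    # if a missed failure costs nine times a false alarm, it is labelled bad.
    print(classify(0.2))
    print(classify(0.2, cost_false_alarm=1.0, cost_missed_failure=9.0))
```

Raising the cost of a missed failure lowers the threshold, trading a higher false alarm rate for a higher failure detection rate.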


 

Series: Department Seminar
Title: Approximating Optimal Decision Trees

  • Speaker: Dr. Venkat Chakaravarthy
  • Date and Time: Thursday, October 08, 2009, 4:00 PM
  • Venue: CSA Seminar Hall

Abstract
In this talk, we shall consider the problem of constructing decision trees for entity identification from a given table. The input is a table containing information about a set of entities over a fixed set of attributes. The goal is to construct a decision tree that identifies each entity unambiguously by testing the attribute values such that the average number of tests is minimized. This well-studied problem finds applications in machine fault detection, species identification in biology and medical diagnosis. After a quick review of prior work, we will discuss an O(log n)-approximation algorithm that is based on a new greedy heuristic. We will conclude our discussion by stating some interesting open problems.
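The objective, the average number of tests needed to identify an entity, can be made concrete with a simple greedy construction. The abstract does not describe the new heuristic, so the rule used below (test the attribute whose largest branch is smallest) is only an assumed, generic greedy choice and carries no approximation guarantee here. The function name `total_tests` and the example table are likewise illustrative.

```python
from collections import defaultdict


def total_tests(entities, attributes):
    """Sum, over all entities, of the tests a greedy tree uses to identify each.

    `entities` is a list of distinct tuples; `attributes` lists column indices.
    """
    if len(entities) <= 1:
        return 0  # a single entity is already identified

    def largest_branch(attr):
        counts = defaultdict(int)
        for entity in entities:
            counts[entity[attr]] += 1
        return max(counts.values())

    # Assumed greedy rule: prefer the most balanced split.
    best = min(attributes, key=largest_branch)
    if largest_branch(best) == len(entities):
        raise ValueError("entities cannot be told apart by the attributes")

    groups = defaultdict(list)
    for entity in entities:
        groups[entity[best]].append(entity)
    # Every entity at this node pays one test, plus the cost in its subtree.
    return len(entities) + sum(total_tests(g, attributes) for g in groups.values())


if __name__ == "__main__":
    table = [(0, 0), (0, 1), (1, 0), (1, 1)]
    print(total_tests(table, [0, 1]) / len(table))  # average number of tests
```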

Joint work with Vinayaka Pandit, Sambuddha Roy and Yogish Sabharwal. A paper based on the work can be found in the proceedings of the ICALP'09 conference.

Speaker Bio:
Dr. Venkat Chakaravarthy received his PhD from the University of Wisconsin, Madison. He is currently working at IBM Research Lab in New Delhi. His area of interest is theory of computation; in particular, complexity theory and approximation algorithms. He also enjoys working on theoretical issues related to database systems. Recently, he has picked up an interest in algorithmic issues related to high performance computing.


 

Series: Department Seminar
Title: Ranking Problems in Machine Learning: Theory and Applications

  • Speaker: Dr. Shivani Agarwal
  • Date and Time: Wednesday, October 07, 2009, 4:00 PM
  • Venue: Room No. 254, CSA Seminar Hall, [First Floor]

Abstract
In the last few decades, there has been considerable progress in the understanding of binary classification (learning of binary-valued functions) and regression (learning of real-valued functions), both classical problems in machine learning. Although several questions remain to be answered, there is a well-developed theory in place for these problems, and practical successes have been demonstrated in a variety of applications.

Recently, a new class of learning problems, namely ranking problems, has begun to gain attention. In ranking, one learns a real-valued function that assigns scores to objects, but the scores themselves do not matter; instead, what is important is the relative ranking of objects induced by those scores. Ranking problems arise in a variety of domains: in information retrieval, one wants to rank documents according to relevance to some topic or query; in user-preference modeling, one wants to rank items according to a user's likes and dislikes; in computational biology, one wants to rank genes according to relevance to some disease. Ranking problems are mathematically distinct from both classification and regression, and cannot be analyzed using existing results for these problems.
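The point that only the induced ordering matters can be illustrated with a pairwise ranking error: the fraction of pairs with different relevance that the scores order wrongly. The sketch below is a common way to measure ranking quality, offered as an assumption rather than as the loss analysed in the talk; the function name and the example data are made up.

```python
from itertools import combinations


def pairwise_ranking_error(scores, relevance):
    """Fraction of differently-relevant pairs that the scores misorder.

    Ties in score count as half an error; pairs with equal relevance are skipped.
    """
    mistakes = 0.0
    pairs = 0
    for i, j in combinations(range(len(scores)), 2):
        if relevance[i] == relevance[j]:
            continue
        pairs += 1
        better, worse = (i, j) if relevance[i] > relevance[j] else (j, i)
        if scores[better] < scores[worse]:
            mistakes += 1.0
        elif scores[better] == scores[worse]:
            mistakes += 0.5
    return mistakes / pairs if pairs else 0.0


if __name__ == "__main__":
    relevance = [1, 0, 0]
    print(pairwise_ranking_error([0.9, 0.2, 0.5], relevance))
    # Rescaling the scores changes their values but not the ranking error.
    print(pairwise_ranking_error([90.0, 2.0, 5.0], relevance))
```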

In this talk, I will describe some recent results in both the theoretical understanding of ranking and its applications. In particular, I will describe generalization bounds for ranking algorithms based on the tools of uniform convergence and algorithmic stability, and some preliminary results on the sample complexity of learning ranking functions. I will conclude with some recent applications to ranking chemical structures for drug discovery.

Speaker Bio:
Dr Agarwal is currently a postdoctoral fellow in the CS Dept, MIT. Her research interests are in Machine Learning and its applications to Computational Biology.


 

Series: Department Seminar
Title: Collective Clustering in Relational Networks

  • Speaker: Dr. Indrajit Bhattacharya
  • Date and Time: Tuesday, October 06, 2009, 4:00 PM
  • Venue: Room No. 254, CSA Seminar Hall, [First Floor]

Abstract
We are commonly faced with heterogeneous data involving different types of entities connected by relational networks, such as people and topics in social networks, proteins in biological networks, and so on. Traditional approaches to clustering data from such relational networks are based on features of individual data items. I will focus on collective clustering in a relational network, where clustering decisions are made jointly over relational neighborhoods in the network. To address this task, I will present probabilistic graphical models that capture cluster relationships using latent groups over clusters. Inference in such models is a challenge, and I will present an efficient approximate method based on Gibbs Sampling. I will present experimental results demonstrating that collective clustering improves both clustering accuracy and the interpretability of the discovered clusters compared to traditional approaches.

Speaker Bio:
Dr Indrajit Bhattacharya is a research scientist at IBM research labs in Delhi. His research interest lies in machine learning techniques using probabilistic graphical models for analysis and prediction in structured and unstructured data. His research includes developing probabilistic and other models for collective relational clustering for entity resolution in relational data and word sense disambiguation from multilingual corpora. Dr Bhattacharya holds a BTech from the Dept of CS, IIT Kharagpur, and later received his PhD from the University of Maryland.


 

Series: Department Seminar
Title: Theoretical and Experimental Self-Assembly

  • Speaker: Dr. Manoj Gopalkrishnan, TIFR Mumbai
  • Date and Time: Friday, September 25, 2009, 4:00 PM
  • Venue: Room No. 254, CSA Seminar Hall, [First Floor]

Abstract
Recent work on the tile assembly model has revealed the close connections between computer science and self-assembly. I shall describe two approaches to the study of self-assembly that are inspired by the tile assembly model.

The first approach involves experiments with DNA molecules. We use DNA like a construction material, akin to the uses of brick, cement, glass, etc., to form nanostructures like hexagonal tilings, cylinders and Möbius strips.

The second approach involves a mathematical investigation of the law of mass action in chemistry. A major goal of our work is to solidify the mathematical foundations of mass action chemistry. In addition, we believe that the law of mass action is of intrinsic mathematical interest. We are led to a dynamical theory of sets of binomials over the complex numbers, with hints of connections to number theory and toric varieties.
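For readers unfamiliar with the law of mass action, it states that the rate of an elementary reaction is proportional to the product of the concentrations of its reactants. The sketch below integrates the single reaction A + B -> C with the forward Euler method; the reaction, rate constant, step size and initial concentrations are assumed for illustration and are not taken from the talk.

```python
def simulate_mass_action(a, b, c, k=1.0, dt=0.001, steps=5000):
    """Integrate A + B -> C under mass action kinetics with forward Euler.

    a, b, c are the initial concentrations; k is the rate constant.
    """
    for _ in range(steps):
        rate = k * a * b  # law of mass action: rate is proportional to [A][B]
        a -= rate * dt
        b -= rate * dt
        c += rate * dt
    return a, b, c


if __name__ == "__main__":
    a, b, c = simulate_mass_action(1.0, 2.0, 0.0)
    # Each reaction event consumes one A and one B and produces one C,
    # so a + c and b + c stay constant over time.
    print(a, b, c, a + c, b + c)
```

The right-hand sides are polynomials in the concentrations, which is where the connection to sets of binomials mentioned in the abstract comes from.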

Speaker Bio:
Manoj has been a faculty member in the School of Technology and Computer Science at the Tata Institute of Fundamental Research, Mumbai, since March 2009. He received his Ph.D. in Computer Science from the University of Southern California in December 2008. His advisor was Professor Len Adleman, and his dissertation topic was "Theoretical and Experimental Self-Assembly." Before that, he received a B.Tech. in computer science and engineering from IIT Kharagpur.


 

Series: Department Seminar
Title: Large-scale Data Management for the Sciences

  • Speaker: Dr. Tanu Malik
  • Date and Time: Wednesday, September 23, 2009, 4:00 PM
  • Venue: CSA Seminar Hall

Abstract
Traditional enterprises and novel scientific applications are accumulating petabyte-scale datasets, which makes the need for large-scale data management more pressing than ever. Geographic distribution of the datasets accompanied by complex demands on data makes large-scale data management challenging. This is especially true for sciences that model complex physical and biological phenomena using data from multiple sources.

In this talk I will address two critical problems in scientific data management: combining a large number of diverse data sources for the execution of scientific queries, and executing data-intensive scientific queries efficiently, in terms of both network and I/O, on these data sources. I will present SkyQuery, a system that federates data from several petabyte-size, autonomous and heterogeneous astronomy databases scattered worldwide. Using SkyQuery, scientists can write declarative queries that compare and merge multiple astronomical datasets. For efficient query execution and scalability, I will present River, a novel caching framework for database systems that dramatically reduces the network bandwidth requirements of data-intensive federations such as SkyQuery. Distributed applications such as River often rely on a priori knowledge of query cardinalities to make optimization decisions. In this context, I will present a black-box approach to selectivity estimation that is suitable for distributed applications. The success of SkyQuery and its adoption by the National Virtual Observatory is an example of data management systems enabling scientific endeavors.

Speaker Bio:
Tanu Malik is a Research Assistant Professor with the Cyber Center in Discovery Park at Purdue University and with the Indiana Center of Database Systems. Her research interests are in a wide variety of areas including but not limited to data federations, database caching, query execution and optimization, self-organizing database systems, and summary structures for cardinality estimation. A recurrent theme in her research is to re-examine the core principles of database technology in the light of new requirements emerging from scientific data. Her research has resulted in some innovative database technology for handling large distributed scientific data.

Tanu earned her Ph.D. and M.S. in 2007 from the Department of Computer Science at Johns Hopkins University. She earned her B.Tech. in 1999 from the Department of Civil Engineering at the Indian Institute of Technology, Kanpur.


 

Series: Msc(Engg) Colloquium
Title: Discovery of Application Workloads from Network File Traces

  • Speaker: Ms. Neeraja Yadwadkar
  • Faculty Advisor: Prof. Chiranjib Bhattacharyya
  • Date and Time: Tuesday, September 15, 2009, 4:00 PM
  • Venue: Room No. 254, CSA Seminar Hall, [First Floor]

Abstract
An understanding of I/O data access patterns of applications is useful in several situations. First, gaining insight into what applications are doing with their data at a semantic level helps in designing efficient storage systems. Second, it helps create benchmarks that mimic realistic application behavior closely. Third, it enables autonomic systems as the information obtained can be used to adapt the system in a closed loop.

All these use cases require the ability to extract the application-level semantics of I/O operations. Methods such as modifying application code to associate I/O operations with semantic tags are intrusive. It is well known that network file system traces are an important source of information that can be obtained non-intrusively and analyzed either online or offline. These traces are a sequence of primitive file system operations and their parameters. Simple counting, statistical analysis or deterministic search techniques are inadequate for discovering application-level semantics in the general case, because of the inherent variation and noise in realistic traces.

In this work, we describe a trace analysis methodology based on Profile Hidden Markov Models. We show that the methodology has powerful discriminatory capabilities that enable it to recognize applications based on the patterns in the traces, and to mark out regions in a long trace that encapsulate sets of primitive operations that represent higher-level application actions. It is robust enough to work around discrepancies between training and target traces, such as differences in length and interleaving with other operations. We demonstrate the feasibility of recognizing patterns based on a small sampling of the trace, enabling faster trace analysis. Preliminary experiments show that the method is capable of learning accurate profile models on live traces in an online setting. We present a detailed evaluation of this methodology in a UNIX environment using NFS traces of selected commonly used applications such as compilations, as well as on industrial-strength benchmarks such as TPC-C and Postmark, and discuss its capabilities and limitations in the context of the use cases mentioned above.


 

Series: Department Seminar
Title: Unsupervised Learning of Word Senses

  • Speaker: Prof. Suresh Manandhar, University of York, UK
  • Date and Time: Thursday, September 10, 2009, 4:00 PM
  • Venue: Room No. 254, CSA Seminar Hall, [First Floor]

Abstract
Unsupervised learning of lexical semantics is an emerging area within NLP that poses interesting and challenging problems. The primary advantage of unsupervised and minimally supervised methods is that annotated data is either not required or required only in small quantities. In this talk, I will present our current work on word sense induction. Sense induction is the task of discovering all the senses of a given word from raw unannotated data. Our collocational graph-based method achieves high evaluation scores while overcoming some of the limitations of existing methods. We show that graph connectivity measures can be employed to avoid the need for supervised parameter tuning. Hierarchical clustering and hierarchical random graphs can be employed for inducing concept hierarchies.

Speaker Bio:
Prof. S. Manandhar is on the faculty of the Department of Computer Science, University of York, UK. His research interests lie in natural language processing and its applications.


 

Series: Department Seminar
Title: Sherpa: Yahoo's distributed storage service

  • Speaker: Dr. P.P.S. Narayan,
                    Yahoo
  • Date and Time: Tuesday, September 08, 2009, 4:00 PM
  • Venue: Room No. 254, CSA Seminar Hall, [First Floor]

Abstract
Sherpa is the next-generation structured-record distributed storage service that addresses the growing scalability needs of Yahoo! properties. Key features of Sherpa include high scalability, elastic growth, a global footprint for local low-latency access, asynchronous replication, RESTful web service APIs, novel per-record consistency knobs, high availability, low capex and opex, and many more. Sherpa is a hosted, centrally managed, and geographically distributed service, and utilizes automated load-balancing and failover to reduce operational complexity. Over the last year, we have built the Sherpa platform from the ground up, and in May 2009 Sherpa version 1.5 was announced to be generally available within Yahoo!. In this talk, we present the technology and architecture of the Sherpa platform, the engineering challenges of building a large-scale highly-available distributed system, and finally the roadmap and future of the product.

Speaker Bio:
P.P.S. Narayan (PPSN) is the Director of Engineering of the Sherpa product. His team is responsible for the design, architecture, and engineering of Sherpa. Before joining Yahoo!, PPSN was at Bell Labs, where he was involved in database and network management research. He also led a small Bell Labs Venture team to build the next-generation LBS platform called Geopepper. PPSN has publications at reputed database conferences in the areas of transaction management, XML, and query processing.


 

Series: Department Seminar
Title: Compiling for Multicore Systems

  • Speaker: V. Krishna Nandivada
                    IBM India Research Lab, Bangalore
  • Date and Time: Friday, September 04, 2009, 4:00 PM
  • Venue: CSA Seminar Hall (Room No. 254)

Abstract
As multi-core systems gain popularity, there is a definite need for languages, tools and techniques that simplify programming high-performance machines, so that programs can exploit the hardware features to a significant degree and achieve higher throughput. Two of the main issues that we encounter while compiling for multi-core systems are the following: (a) the difference between user-perceived parallelism and parallelism based on the actual hardware, and (b) reasoning about the locality of data and computation. In this talk, we will discuss a few steps in these directions, using X10 as the reference language.

To statically establish the place locality in X10 programs, we present an analysis framework based on a static abstraction of activities (threads) and an extension to classical escape analysis to track the abstract activities to which an object can escape. Our framework takes advantage of the high level abstraction of X10 distributions to reason about place locality of array accesses in loops as well.

A simple and important example of the separation of concerns between ideal and useful parallelism can be found in chunking of parallel loops, where the programmer expresses ideal parallelism by declaring all iterations of a loop to be parallel and the implementation exploits useful parallelism by executing iterations of the loop in sequential chunks. We addressed the problem of chunking parallel loops via a semantics-preserving transformation framework that uses a combination of transformations from past work to obtain an equivalent set of parallel loops that chunk together statements from multiple iterations. In another effort to reduce the gap between ideal and useful parallelism, we designed a compiler that transforms explicitly parallel programs into SPMD programs. Our submissions based on our X10 compiler and runtime won the HPC (class II) awards at the SuperComputing conference in 2007 and 2008.
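
The idea of chunking a parallel loop can be illustrated with a minimal sketch. This is not the X10 transformation framework described in the talk; it is a hypothetical Python helper (the name chunked_parallel_for and its parameters are assumptions made for illustration) that assumes the loop iterations are independent. The caller states the ideal parallelism ("every iteration may run in parallel"), and the helper runs one sequential chunk of iterations per worker.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked_parallel_for(body, n, num_workers):
    """Run body(i) for i in range(n), one sequential chunk per worker.

    Hypothetical illustration only: assumes iterations are independent.
    Results are returned in iteration order.
    """
    if n <= 0:
        return []
    num_workers = max(1, min(num_workers, n))
    chunk = -(-n // num_workers)  # ceiling division: iterations per chunk

    def run_chunk(start):
        # Iterations inside one chunk execute sequentially.
        return [body(i) for i in range(start, min(start + chunk, n))]

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Chunks are submitted in parallel; map() preserves chunk order.
        parts = list(pool.map(run_chunk, range(0, n, chunk)))
    return [result for part in parts for result in part]


# Ten logically parallel iterations executed as three sequential chunks.
squares = chunked_parallel_for(lambda i: i * i, 10, 3)
```

Only the chunking structure is the point here: the number of tasks created is bounded by the number of workers rather than by the number of iterations.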

Speaker Bio:
V. Krishna Nandivada graduated from UCLA and currently works at IBM India Research Lab, Bangalore. He received his master's degree from the Indian Institute of Science, Bangalore, and his bachelor's degree from Regional Engineering College (now known as National Institute of Technology), Rourkela. Before joining IBM, he was associated with Hewlett Packard, Bangalore, and Sun Labs, Burlington, for different periods of time.

His interests include program analysis, programming language design, compiler optimizations, and program reasoning.


 

Copyright: CSA, IISc Phone: +91-80-22932368 Fax: +91-80-23602911