Seminars

UPCOMING SEMINARS

PAST SEMINARS
Series: Department Seminar Title: Fast and Sloppy - scaling up linear models - Speaker: Prof. Alexander J. Smola, Yahoo
- Date and Time: Wednesday, November 18, 2009, 4:00 PM
- Venue: Room No. 254, Seminar Hall
Abstract In this talk I discuss a number of algorithms which, in combination, can
be used to scale up linear models to deal with the amounts of data
available at Yahoo. In particular, I will discuss issues of collaborative
classification with a very large number of classes, hashing to reduce
dimensionality, compressed memory representations for collaborative
filtering, and algorithms to accelerate online learning on parallel
computers.
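Hashing to reduce dimensionality is commonly known as the hashing trick: each feature is mapped by a hash function to one of a fixed number of buckets, so the weight vector has a fixed size no matter how large the vocabulary grows. The sketch below is a minimal illustration under stated assumptions, not the method from the talk: features are strings, MD5 truncated to 64 bits stands in for whatever hash a production system would use, and one bit of the hash supplies a random sign so that colliding features tend to cancel rather than accumulate bias. The names `stable_hash`, `hash_features` and `score` are hypothetical.

```python
import hashlib


def stable_hash(text: str) -> int:
    """64-bit hash that is stable across runs (Python's built-in hash() is salted)."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "little")


def hash_features(tokens, num_buckets=16):
    """Map a bag of string features into a fixed-length vector (hashing trick)."""
    vec = [0.0] * num_buckets
    for tok in tokens:
        h = stable_hash(tok)
        index = h % num_buckets                  # bucket chosen from the low-order bits
        sign = 1.0 if (h >> 63) == 0 else -1.0   # top bit used as a sign hash
        vec[index] += sign
    return vec


def score(weights, x):
    """Linear model score; weights has num_buckets entries, not one per vocabulary item."""
    return sum(w * xi for w, xi in zip(weights, x))


x = hash_features(["user:42", "query:shoes", "country:in"], num_buckets=16)
weights = [0.1] * 16
print(len(x), score(weights, x))
```

The memory cost of the model is set by `num_buckets` alone; a larger value lowers the collision rate at the price of more weights.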
Speaker Bio: Dr. A. Smola is currently a principal research scientist at Yahoo!
Prior to joining Yahoo!, he was a professor at the Australian National
University (ANU) and a group leader at NICTA, Australia.
| Series: Department Seminar Title: Migrating into the Cloud - Speaker: Dr. T.S. Mohan, Principal Researcher, ECom Research Lab, Infosys Technologies Ltd.
- Date and Time: Monday, November 16, 2009, 4:00 PM
- Venue: CSA Seminar Hall (Room No. 254)
Abstract While Gartner has placed Cloud Computing at the top of its list of the top ten
strategic technology areas to watch in 2010, there are big challenges
for an enterprise in leveraging this disruptive techno-business model
called the cloud. In this talk we focus on key technical issues, research problems
and solutions in adopting, integrating with, or, more specifically, migrating
into existing Cloud models and offerings. Cloud offerings are typically modeled at
three levels: IaaS, PaaS and SaaS. We will detail what it means to migrate into
each of these models, as well as the issues and challenges facing the architects
who develop the migration strategy. We describe a seven-step Cloud Migration process
that we have proposed and share best practices for developing the
software architecture best suited to each of these cloud models. We conclude the
talk by touching upon some of the key engineering and research challenges in 'making the
cloud happen under the hood'.
Speaker Bio: Dr. T.S. Mohan is a Principal Researcher at E&R's ECom Research Lab.
His research interests include Distributed Systems, High Performance
Computing, Cloud and Grid Computing, and Software Architecture and Engineering. He has
22 years of varied experience in academia and industry. He holds a Master's
and a PhD in Computer Science from IISc, and worked at SERC there for about a
decade before moving into industry, where he worked at HP ISO and at interesting IT
technology startups. He was also an entrepreneur, having run his own startup for
more than six years before joining Infosys. Prof. Rajaraman, Emeritus Professor,
supervised his PhD thesis, entitled "Interaction Paradigms in Distributed Object
Oriented Programming Languages".
| Series: Department Seminar Title: PerfCenter and AutoPerf: Tools and Techniques for Modeling and Measurement of the Performance of Distributed Applications - Speaker: Dr. Varsha Apte, Visiting Professor
- Date and Time: Wednesday, November 11, 2009, 11:30 AM
- Venue: Room No. 254, CSA Seminar Hall, First Floor
Abstract In this talk, we will present the design and methodology underlying two
software tools that we have developed in the last few years at IIT Bombay
for performance measurement and modeling of distributed applications.
We present a tool, PerfCenter, which can be used for performance-oriented
deployment and configuration of a multi-tier application in a hosting
center or data center. While a number of tools aid in performance
analysis during the software development cycle, few tools are geared
towards helping a data center architect make appropriate decisions
during the deployment of an application. PerfCenter facilitates this
process by allowing specification in terms that are natural to a data
center architect. PerfCenter takes as input the number and specifications
of hosts available in a data center, the network architecture of
geographically diverse data centers, the deployment of software on hosts
and of hosts in data centers, and the usage information of the application
(scenarios, resource consumption), and provides performance measures such
as scenario response times and resource utilizations. We describe the
PerfCenter specification and its performance analysis utilities, and
illustrate their use in the deployment and configuration of a Webmail
application. PerfCenter works by generating the underlying queueing
network model of the distributed system and solving it either by
analytical methods or by discrete-event simulation. We will provide
insight into the primary challenges of solving this complex model
analytically. Finally, we present some validation results, in which
PerfCenter model predictions were compared against measured data; these
confirmed the soundness of the tool.
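PerfCenter's own analytical solver is not described in the abstract. As a hedged illustration of what "solving a queueing network analytically" can mean, the sketch below assumes the simplest case: an open product-form network in which each resource behaves like an M/M/1 station, so utilization follows the Utilization Law U = λD and the residence time at each resource is D / (1 - U). The function name, resource names and service demands are hypothetical.

```python
def solve_open_network(arrival_rate, demands):
    """Analytical solution of an open product-form queueing network.

    arrival_rate: requests per second entering the system.
    demands: mapping resource name -> total service demand (seconds per request).
    Returns per-resource utilizations and the mean end-to-end response time.
    """
    utilization = {}
    response = 0.0
    for resource, demand in demands.items():
        u = arrival_rate * demand            # Utilization Law: U = lambda * D
        if u >= 1.0:
            raise ValueError(f"{resource} is saturated (U = {u:.2f})")
        utilization[resource] = u
        response += demand / (1.0 - u)       # M/M/1 residence time: D / (1 - U)
    return utilization, response


# Hypothetical Webmail-like deployment; demands in seconds per request.
demands = {"web_cpu": 0.010, "db_cpu": 0.020, "db_disk": 0.015}
util, r = solve_open_network(arrival_rate=25.0, demands=demands)
print(util, round(r, 4))
```

Increasing `arrival_rate` until one utilization approaches 1 identifies the bottleneck; when the product-form assumptions (for example exponential service times, no blocking or simultaneous resource possession) do not hold, simulation is the usual fallback.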
We also present a load generator and performance measurement
tool (AutoPerf) which requires minimal input and configuration from the user,
and produces a comprehensive capacity analysis as well as a server-side
resource usage profile of a Web-based distributed system, in an automated
fashion. The tool requires only the workload and deployment description of
the distributed system, and automatically sets typical parameters that load
generator programs need, such as the maximum number of users to be emulated,
the number of users for each experiment, warm-up time, etc. The tool also does
all the coordination required to generate a critical type of measure,
namely, resource usage per transaction or per
user for each software server. This is a necessary input for
creating a performance model of a software system.
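Resource usage per transaction (the service demand) is commonly derived from measurements with the Service Demand Law, D = U / X, where U is a server's average utilization and X is the system throughput over the same measurement window. The sketch assumes those two averages are already available for each server; how AutoPerf itself collects and coordinates them is not shown, and the server names and numbers are hypothetical.

```python
def service_demand(utilization, throughput):
    """Service Demand Law: seconds of resource time consumed per transaction."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return utilization / throughput


# Hypothetical measurements taken at one load level.
measurements = {
    # server: (average CPU utilization as a fraction, transactions per second)
    "web_server": (0.60, 120.0),
    "db_server": (0.45, 120.0),
}
demands_ms = {name: 1000.0 * service_demand(u, x) for name, (u, x) in measurements.items()}
print(demands_ms)  # about 5.0 ms for web_server and 3.75 ms for db_server
```

Per-transaction demands like these are the parameters a queueing model of the system needs as input.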
Speaker Bio: Varsha Apte is a faculty member in the Department of Computer Science and
Engineering, IIT Bombay, where she has been since 2002. During 2009-10
(on sabbatical leave from IITB), she is Visiting Faculty at the Computer
Science and Automation Dept. at IISc Bangalore and a part-time Visiting
Researcher in the IBM India Research Lab, Bangalore. Prior to joining IIT
Bombay, she was in the Network Design and Performance Analysis Department of
AT&T Labs in Middletown, NJ. She received her Ph.D. from Duke University in
1994, and her Master's from Pune University in 1989. Her primary research
interest is in performance management (modeling, analysis and control) of
distributed applications.
| Series: Department Seminar Title: Analysis Techniques for Cyber-Physical Systems - Speaker: Dr. Sibin Mohan, Research Scientist, Computer Science Department, University of Illinois at Urbana-Champaign (UIUC)
- Date and Time: Tuesday, November 10, 2009, 4:00 PM
- Venue: CSA Seminar Hall (Room No. 254)
Abstract Embedded systems are ubiquitous. Most modern embedded systems (a) have real-time
constraints and (b) interact with the physical world. They are increasing in size,
complexity and scope, and are increasingly interconnected. Yet there is a serious
lack of analysis techniques and tools for them, which makes designing and
verifying such systems a laborious and complicated process. With the advent of
"cyber-physical systems", i.e. systems that have both computational and
physical-world requirements, this problem is exacerbated.
In this talk I will focus on analysis techniques and tools that I have
developed, or am developing, in the following three broad areas:
I. Virtual Integration: system integration techniques for dealing with complex
systems such as Avionics
II. Integration of Security into systems with safety-critical/real-time constraints
III. Analysis of contemporary processor architectures (out-of-order and multicore
processors) for use in real-time systems.
Speaker Bio: Sibin Mohan is a Research Scientist in the Computer Science department at the
University of Illinois at Urbana-Champaign (UIUC).
Sibin received his Bachelor of Engineering (B.E.) in Computer Science and Engineering
from Bangalore University in 2001. He worked for Hewlett-Packard, Bangalore
for a year before enrolling in the doctoral program at North Carolina State
University in 2002. He obtained his M.S. and Ph.D. degrees in Computer Science from
NC State in 2004 and 2008 respectively, where he was awarded a Teaching Fellowship
from the graduate school.
Sibin's research interests include Systems (embedded and real-time systems,
cyber-physical systems, operating systems), Computer Architecture and Compilers.
| Series: Msc(Engg) Thesis Defense Title: Efficient Compilation of Stream Programs onto Multi-cores with Accelerators - Speaker: Mr. Abhishek Udupa, M.Sc Engg.
- Faculty Advisor: Prof. Govindarajan, Prof. T. Matthew Jacob
- Date and Time: Tuesday, November 10, 2009, 10:00 AM
- Venue: Room No. 254, Seminar Hall [First Floor]
Abstract Over the past two decades, microprocessor manufacturers have typically
relied on wider issue widths and deeper pipelines to obtain
performance improvements for single-threaded applications. However, in
recent years, with power dissipation and wire delays becoming
primary design constraints, this approach can no longer be effectively
used to yield performance improvements. Thus processor designers and
vendors are universally moving towards multi-core designs. Examples
include commodity general-purpose multi-core processors, the
Cell BE accelerator from IBM, and the Graphics Processing Units from
NVIDIA and ATI. Although these many-core and multi-core architectures can
provide enormous performance benefits, it is difficult to program for
them due to the complexity of writing explicitly parallel code. The
ubiquity of computationally intensive media processing applications
makes it imperative to consider new programming frameworks and
languages that can express parallelism in an easy, portable manner.
The StreamIt programming language has been proposed to efficiently
exploit parallelism at various levels on general-purpose multi-core
architectures and stream processors, and to allow media processing and DSP
applications to be developed in an easy and portable fashion. The
StreamIt model allows programmers to specify a program as a set of
filters connected by FIFO communication channels. The graphs thus
specified by StreamIt programs describe task, data and pipeline
parallelism, which can potentially be exploited on modern Graphics
Processing Units (GPUs); GPUs have emerged as powerful commodity
stream processors that support abundant parallelism in hardware.
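Because StreamIt filters declare static pop and push rates, a steady-state schedule can be derived by requiring that, over one steady-state iteration, every FIFO channel receives exactly as many items as it delivers. The sketch below solves these balance equations for a plain pipeline (no split-joins or feedback loops); it is the standard synchronous-dataflow calculation, shown as background rather than as the thesis's ILP-based scheduler, and the `(pop, push)` representation and example rates are hypothetical.

```python
from fractions import Fraction
from math import gcd, lcm


def steady_state_repetitions(filters):
    """Smallest positive firing counts that balance every channel of a pipeline.

    filters: list of (pop_rate, push_rate) pairs, in pipeline order.
    Channel i joins filter i to filter i + 1 and is balanced when
    reps[i] * push_rate[i] == reps[i + 1] * pop_rate[i + 1].
    """
    rates = [Fraction(1)]
    for (_, push), (pop, _) in zip(filters, filters[1:]):
        rates.append(rates[-1] * push / pop)
    # Scale the rational solution to integers, then reduce to the smallest one.
    scale = lcm(*(r.denominator for r in rates))
    reps = [int(r * scale) for r in rates]
    g = gcd(*reps)
    return [r // g for r in reps]


# Hypothetical pipeline: the first filter pushes 2 items per firing,
# the middle filter pops 3 and pushes 1, and the last filter pops 2.
pipeline = [(1, 2), (3, 1), (2, 6)]
print(steady_state_repetitions(pipeline))  # [3, 2, 1]
```

One steady-state iteration here fires the filters 3, 2 and 1 times, moving 6 items over the first channel and 2 over the second; a scheduler then decides where and when each of those firings runs.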
This thesis deals with the challenges in mapping StreamIt programs to
GPUs and proposes an efficient technique to software pipeline the
execution of stream programs on GPUs. We formulate this problem (both the
scheduling and the assignment of filters to processors) as an efficient
Integer Linear Program (ILP), which is then solved using ILP solvers.
We also describe a novel buffer layout technique for GPUs which
facilitates exploiting the high memory bandwidth available in GPUs.
The proposed scheduling utilizes both the scalar units in the GPU, to
exploit data parallelism, and the multiprocessors, to exploit task and
pipeline parallelism.
While the approach of software pipelining the execution of stream
programs only on GPUs is efficient and performs well, it does not
utilize the CPU cores to perform useful computation. Further, it does
not support programs with stateful filters, which are essentially
filters that are not data parallel owing to dependences between
successive firings that are carried through the implicit state of the
filter. The second part of the thesis aims at addressing these issues
and describes a novel method to orchestrate the execution of a
StreamIt program on the multiple cores of a system and GPUs in a
synergistic manner. We formulate the problem of partitioning the work
between the CPU cores and the GPU, taking into account the latencies
for data transfers, the limited DMA bandwidth available and the
required buffer layout transformations associated with the
partitioning, as an integrated Integer Linear Program (ILP) which can
then be solved by an ILP solver. Since solving an ILP is NP-hard in
the general case and may thus require a large amount of time, we also
propose an efficient heuristic algorithm for the work partitioning
between the CPU and the GPU, which provides solutions that are within
9.05% of the optimal solutions to the ILP formulation on average
across the benchmark suite, while requiring 2 to 3 orders of magnitude
less time than the ILP approach. Our experiments on a platform with
eight CPU cores, of which four were used, and a GeForce 8800 GTS
512 GPU show a (geometric) mean speedup of 6.84X with a maximum of
51.96X over a single threaded CPU execution across a set of StreamIt
benchmarks.
| Series: Msc(Engg) Thesis Defense Title: Extension of Path Probability Method to Approximate Inference over Time - Speaker: Mr. Vinay Jethava
- Faculty Advisor: Prof. Chiranjib Bhattacharyya
- Date and Time: Thursday, October 29, 2009, 11:00 AM
- Venue: Room No. 254, CSA Seminar Hall, [First Floor]
Abstract There has been a tremendous growth in publicly available digital video
footage over the past decade. This has necessitated the development of new
techniques in computer vision geared towards efficient analysis, storage and
retrieval of such data. Many mid-level computer vision tasks, such as segmentation,
object detection and tracking, involve an inference problem based
on the available video data. Video data has a high degree of spatial and
temporal coherence. For example, pixels near a black pixel tend to have the
same color, and an object in motion in the current video frame is likely
to remain in motion in the next frame. This property must be intelligently
leveraged in order to obtain better results.
Graphical models, such as Markov Random Fields, have emerged as a
powerful tool for such inference problems. They are naturally suited for expressing
the spatial dependencies present in video data. It is, however, not
clear how to extend the existing techniques to the problem of inference
over time. This thesis explores the Path Probability Method, a variational
technique in statistical mechanics, in the context of graphical models and
approximate inference problems. It extends the method to a general framework
for problems involving inference in time, resulting in an algorithm, DynBP.
We explore the relationship between the algorithm and existing techniques, and find
the algorithm competitive with existing approaches.
The main contributions of this thesis are the extended Generalized Belief Propagation (GBP) algorithm,
the extension of the Path Probability Method to the DynBP algorithm, and
the relationship between them. We have also explored some applications in
computer vision involving temporal evolution with promising results.
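The DynBP algorithm is not specified in this abstract. As background only, the sketch below shows ordinary sum-product belief propagation on a chain of time steps, the setting in which belief propagation is exact (it reduces to the forward-backward algorithm); generalized belief propagation and the Path Probability Method extend this idea to richer structures. The two-state potentials are hypothetical.

```python
def chain_marginals(unary, pairwise):
    """Exact per-time-step marginals of a chain MRF via sum-product messages.

    unary[t][s]    : local evidence for state s at time step t.
    pairwise[a][b] : compatibility of state a at step t with state b at step t + 1.
    """
    T, S = len(unary), len(unary[0])
    fwd = [[1.0] * S for _ in range(T)]  # message into step t from the past
    bwd = [[1.0] * S for _ in range(T)]  # message into step t from the future
    for t in range(1, T):
        msg = [sum(fwd[t - 1][a] * unary[t - 1][a] * pairwise[a][b] for a in range(S))
               for b in range(S)]
        z = sum(msg)
        fwd[t] = [m / z for m in msg]    # normalise to avoid underflow
    for t in range(T - 2, -1, -1):
        msg = [sum(pairwise[a][b] * unary[t + 1][b] * bwd[t + 1][b] for b in range(S))
               for a in range(S)]
        z = sum(msg)
        bwd[t] = [m / z for m in msg]
    marginals = []
    for t in range(T):
        belief = [fwd[t][s] * unary[t][s] * bwd[t][s] for s in range(S)]
        z = sum(belief)
        marginals.append([v / z for v in belief])
    return marginals


# Hypothetical two-state example (0 = still, 1 = moving) with strong temporal coherence.
coherence = [[0.9, 0.1], [0.1, 0.9]]
evidence = [[0.2, 0.8], [0.5, 0.5], [0.5, 0.5]]
print(chain_marginals(evidence, coherence))
```

Evidence of motion in the first frame propagates forward through the coherence potential, so later frames with uninformative evidence still lean towards "moving".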
| Series: Department Seminar Title: Where is the technology in an animation company? - Speaker: Dr. Michael Henderson, DreamWorks Animation in India
- Date and Time: Tuesday, October 27, 2009, 4:00 PM
- Venue: Room No. 254, CSA Seminar Hall, [First Floor]
Abstract The talk will discuss the many ways in which technology is involved in the animated filmmaking process. It will cover the typical technology roles, how they contribute, and who best fills those roles.
Speaker Bio: Michael Henderson is currently the Director of Technologies at DreamWorks Animation India. Michael has worked in the animated film industry for 10 years and was actively involved in the transitions of Disney Feature Animation and DreamWorks from traditional 2D to CG filmmaking, as well as in the recent move to fully stereoscopic filmmaking at DreamWorks as the Technical Supervisor on Monsters vs. Aliens.
| Series: Department Seminar Title: Art and Technology in Animation Film Making - Behind the Scenes: Monsters vs. Aliens - Speaker: Dr. Mahesh Ramasubramanian, DreamWorks Animation
- Date and Time: Tuesday, October 27, 2009, 3:00 PM
- Venue: Room No. 254, CSA Seminar Hall, [First Floor]
Abstract What do you get when you cross the idea of telling a story from the monsters' point of view with a "Mad Magazine" style and a "Dirty Dozen" plot? Find out as we look behind the scenes at DreamWorks' animated movie "Monsters vs. Aliens" and get inspired by the creative and technical challenges of developing a blobby main character, simulating open water, building San Francisco, and then destroying it.
Speaker Bio: Mahesh Ramasubramanian most recently served as visual effects supervisor on DreamWorks' "Monsters vs. Aliens: Mutant Pumpkins From Outer Space". He previously served as the digital supervisor on DreamWorks' "Monsters vs. Aliens." His credits include "Shrek," the "Shrek 4-D" Universal attraction, "Shrek 2," "Madagascar," "Over the Hedge," and "Bee Movie." Ramasubramanian is from Chennai, India, and graduated from Birla Institute of Technology and Science, Pilani, India.
| Series: Department Seminar Title: An Overhead and Resource Contention Aware Analytical Model for Overloaded Web Servers - Speaker: Dr. Varsha Apte, Visiting Professor, CSA
- Date and Time: Wednesday, October 14, 2009, 4:00 PM
- Venue: Room No. 254, CSA Seminar Hall, [First Floor]
Abstract Increased response time during periods of overload on
a Web server may cause impatient users to time out, causing the
server to do unproductive work in processing these abandoned
requests. Overhead time spent in preprocessing each request adds to
the unproductive work even for requests that are not taken up for
service. This causes the usable throughput, i.e. goodput, of the
overloaded Web server to drop drastically, while resource utilization
remains at 100%. Although this behaviour can be easily reproduced
experimentally, existing analytical models of queues with abandonments
are not able to do so.
We present an analytical model that captures characteristics specific
to networked software servers, namely, overhead processing and
contention for shared hardware resources, and that is able to explain
the goodput degradation typically observed in overloaded servers.
We use this model to compare the performance of the LIFO and
FIFO queueing disciplines during overload and show that LIFO
goodput and response time are better than those of FIFO.
Speaker Bio: Varsha Apte is a faculty member in the Department of Computer
Science and Engineering, IIT Bombay, where she has been since
2002. During 2009-10 (on sabbatical leave from IITB), she is
Visiting Faculty at the Computer Science and Automation Dept. at
IISc Bangalore and a part-time Visiting Researcher in the IBM India
Research Lab, Bangalore. Prior to joining IIT Bombay, she was in
the Network Design and Performance Analysis Department of AT&T
Labs in Middletown, NJ. She received her Ph.D. from Duke
University in 1994, and her Master's from Pune University in 1989. Her
primary research interest is in performance management (modeling,
analysis and control) of distributed applications.
| Series: Theory Seminar Title: Chordal Bipartite Graphs with High Boxicity - Speaker: Mr. Rogers Mathew, Ph.D. student
- Faculty Advisor: Dr. Sunil Chandran
- Date and Time: Monday, October 12, 2009, 4:00 PM
- Venue: Room No. 254, CSA Seminar Hall, [First Floor]
Abstract The boxicity of a graph G is defined as the minimum integer k such
that G is an intersection graph of axis-parallel k-dimensional boxes.
Chordal bipartite graphs are bipartite graphs that do not contain an
induced cycle of length greater than 4. It was conjectured by Otachi,
Okamoto and Yamazaki that chordal bipartite graphs have boxicity at most 2.
We disprove this conjecture by exhibiting an infinite family of chordal
bipartite graphs that have unbounded boxicity.
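The definition of boxicity rests on a simple fact: two axis-parallel boxes intersect exactly when their projections overlap in every one of the k dimensions. The sketch below, assuming each box is given as a list of closed `(low, high)` intervals, builds the intersection graph of a given box representation. It only checks a representation; finding the minimum k is a hard problem and is not attempted. The function names are hypothetical.

```python
from itertools import combinations


def boxes_intersect(a, b):
    """Axis-parallel boxes meet iff their closed intervals overlap in every dimension."""
    return all(lo1 <= hi2 and lo2 <= hi1 for (lo1, hi1), (lo2, hi2) in zip(a, b))


def intersection_graph(boxes):
    """Edges (i, j), i < j, such that box i and box j intersect."""
    return {(i, j) for i, j in combinations(range(len(boxes)), 2)
            if boxes_intersect(boxes[i], boxes[j])}


# Each box is a list of (low, high) intervals, one per dimension (here k = 2).
c4_boxes = [
    [(0, 1), (0, 3)],  # vertex 0: tall box on the left
    [(0, 3), (2, 3)],  # vertex 1: wide box on top
    [(2, 3), (0, 3)],  # vertex 2: tall box on the right
    [(0, 3), (0, 1)],  # vertex 3: wide box at the bottom
]
print(sorted(intersection_graph(c4_boxes)))  # [(0, 1), (0, 3), (1, 2), (2, 3)]
```

The four boxes realize the 4-cycle in two dimensions. The 4-cycle is not an interval graph (interval graphs are chordal), so its boxicity is exactly 2.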
| Series: Msc(Engg) Colloquium Title: Discovering Rules from Disk Events for Predicting Hard Drive Failures - Speaker: Vipul Agrawal
- Faculty Advisor: Dr. Chiranjib Bhattacharyya
- Date and Time: Friday, October 09, 2009, 10:00 AM
- Venue: Room No. 252, CSA Dept.
Abstract The ability to accurately predict an impending hard disk failure is important for reliable storage system design. The facility provided by most hard drive manufacturers, called S.M.A.R.T. (self-monitoring, analysis and reporting technology), has been shown by current research to have poor predictive value. The problem of finding
alternatives to S.M.A.R.T. for predicting disk failure is an area of active research.
In this work, we present a rule discovery methodology, and show that it is possible to
construct decision support systems that can detect such failures using information
recorded from live disks.
Any such prediction methodology should be highly accurate and easy to
interpret. Black-box models can deliver highly accurate solutions but
do not provide an understanding of the events that explain their decisions. To
this end, we explore rule-based classifiers for predicting hard disk failures from various
disk events. We show that it is possible to learn easy-to-understand rules from disk
events. Our evaluation shows that our system can be tuned either to have a high failure
detection rate (i.e., classify a bad disk as bad) or to have a low false alarm rate (i.e., not
classify a good disk as bad).
We also propose a modification of the MLRules algorithm for classification of data with
imbalanced class distributions. The existing algorithm, which assumes relatively balanced
class distributions and equal misclassification costs, performs poorly in classifying
such datasets. The performance can be considerably improved by introducing cost-sensitive
learning into the existing framework.
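The trade-off between failure detection rate and false alarm rate can be illustrated with a cost-sensitive decision rule. Assuming a classifier outputs a calibrated failure probability p for each disk, predicting "bad" is cheaper in expectation whenever p * C_miss > (1 - p) * C_fa, i.e. when p > C_fa / (C_fa + C_miss). The sketch is generic and does not reproduce the modified MLRules algorithm; the function names, costs, scores and labels are hypothetical.

```python
def cost_threshold(cost_false_alarm, cost_missed_failure):
    """Predict 'bad' when p * C_miss > (1 - p) * C_fa, i.e. p > C_fa / (C_fa + C_miss)."""
    return cost_false_alarm / (cost_false_alarm + cost_missed_failure)


def evaluate(scores, labels, threshold):
    """Return (failure detection rate, false alarm rate); label 1 means a bad disk."""
    preds = [s > threshold for s in scores]
    bad = [p for p, y in zip(preds, labels) if y == 1]
    good = [p for p, y in zip(preds, labels) if y == 0]
    detection_rate = sum(bad) / len(bad)        # bad disks classified as bad
    false_alarm_rate = sum(good) / len(good)    # good disks classified as bad
    return detection_rate, false_alarm_rate


scores = [0.9, 0.6, 0.3, 0.2, 0.4, 0.1, 0.05, 0.35]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
print(evaluate(scores, labels, cost_threshold(1.0, 1.0)))  # equal costs
print(evaluate(scores, labels, cost_threshold(1.0, 9.0)))  # missed failures 9x costlier
```

With equal costs the threshold is 0.5; making a missed failure nine times as costly as a false alarm lowers it to 0.1, which raises the detection rate at the price of more false alarms.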
| Series: Department Seminar Title: Approximating Optimal Decision Trees - Speaker: Dr. Venkat Chakaravarthy
- Date and Time: Thursday, October 08, 2009, 4:00 PM
- Venue: CSA Seminar Hall
Abstract In this talk, we shall consider the problem of constructing
decision trees for entity identification from a given table. The
input is a table containing information about a set of entities
over a fixed set of attributes. The goal is to construct a
decision tree that identifies each entity unambiguously by
testing the attribute values such that the average number of
tests is minimized. This well-studied problem finds applications
in machine fault detection, species identification in biology and
medical diagnosis. After a quick review of prior work, we will
discuss an O(log n)-approximation algorithm that is based on a
new greedy heuristic. We will conclude our discussion by stating
some interesting open problems.
Joint work with Vinayaka Pandit, Sambuddha Roy and Yogish Sabharwal.
A paper based on the work can be found in the proceedings of the
ICALP'09 conference.
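In this problem every internal node of the tree tests one attribute, every leaf identifies exactly one entity, and the objective is the average number of tests (the average leaf depth). The sketch below uses a simple classical greedy rule, choosing the attribute whose largest resulting group is smallest; it is not the new O(log n)-approximation heuristic from the talk, whose details are not given here, and the table and function names are hypothetical.

```python
def build_tree(entities, table, attributes):
    """Greedily build an identification tree; leaves are single entities.

    table[e][a] is the value of attribute a for entity e.
    Greedy rule (a simple classical choice, not the talk's heuristic):
    test the attribute whose largest resulting group is smallest.
    """
    if len(entities) == 1:
        return entities[0]
    best, best_groups = None, None
    for a in attributes:
        groups = {}
        for e in entities:
            groups.setdefault(table[e][a], []).append(e)
        if len(groups) < 2:
            continue  # this attribute does not split the current set
        worst = max(len(g) for g in groups.values())
        if best_groups is None or worst < max(len(g) for g in best_groups.values()):
            best, best_groups = a, groups
    if best is None:
        raise ValueError(f"entities {entities} cannot be distinguished")
    return {"test": best,
            "children": {v: build_tree(g, table, attributes) for v, g in best_groups.items()}}


def leaf_depths(tree, depth=0):
    """Number of tests needed to identify each entity."""
    if not isinstance(tree, dict):
        return [depth]
    return [d for child in tree["children"].values() for d in leaf_depths(child, depth + 1)]


table = {
    "e1": {"a1": 0, "a2": 0, "a3": 1},
    "e2": {"a1": 0, "a2": 1, "a3": 0},
    "e3": {"a1": 1, "a2": 0, "a3": 0},
    "e4": {"a1": 1, "a2": 1, "a3": 0},
}
tree = build_tree(list(table), table, ["a1", "a2", "a3"])
depths = leaf_depths(tree)
print(tree["test"], sum(depths) / len(depths))  # a1 2.0
```

Attribute a3 isolates e1 immediately but leaves a group of three, so the balanced split on a1 is preferred; the average number of tests here is 2.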
Speaker Bio: Dr. Venkat Chakaravarthy received his PhD from the University of
Wisconsin, Madison. He is currently working at IBM Research Lab
in New Delhi. His area of interest is theory of computation; in
particular, complexity theory and approximation algorithms. He
also enjoys working on theoretical issues related to database
systems. Recently, he has picked up an interest in algorithmic
issues related to high performance computing.
| Series: Department Seminar Title: Ranking Problems in Machine Learning: Theory and Applications - Speaker: Dr. Shivani Agarwal
- Date and Time: Wednesday, October 07, 2009, 4:00 PM
- Venue: Room No. 254, CSA Seminar Hall, [First Floor]
Abstract In the last few decades, there has been considerable progress in the
understanding of binary classification (learning of binary-valued
functions) and regression (learning of real-valued functions), both
classical problems in machine learning. Although several questions
remain to be answered, there is a well-developed theory in place for
these problems, and practical successes have been demonstrated in a
variety of applications.
Recently, a new class of learning problems, namely ranking problems,
has begun to gain attention. In ranking, one learns a real-valued
function that assigns scores to objects, but the scores themselves
do not matter; instead, what is important is the relative ranking of
objects induced by those scores. Ranking problems arise in a variety
of domains: in information retrieval, one wants to rank documents
according to relevance to some topic or query; in user-preference
modeling, one wants to rank items according to a user's likes and
dislikes; in computational biology, one wants to rank genes
according to relevance to some disease. Ranking problems are
mathematically distinct from both classification and regression, and
cannot be analyzed using existing results for these problems.
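Because only the induced order matters, ranking quality is usually measured on pairs rather than on the scores themselves. The sketch computes the pairwise ranking error for a bipartite setting (relevant vs. irrelevant items): the fraction of relevant-irrelevant pairs that the scores order incorrectly, with ties counted as half an error, which equals one minus the AUC. The function name, scores and labels are hypothetical.

```python
from itertools import product


def pairwise_ranking_error(scores, labels):
    """Fraction of (relevant, irrelevant) pairs that the scores order wrongly.

    labels: 1 for relevant, 0 for irrelevant. Ties count as half an error.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    errors = 0.0
    for p, n in product(pos, neg):
        if p < n:
            errors += 1.0
        elif p == n:
            errors += 0.5
    return errors / (len(pos) * len(neg))


scores = [0.9, 0.4, 0.6, 0.1]
labels = [1, 1, 0, 0]
print(pairwise_ranking_error(scores, labels))                       # 0.25
print(pairwise_ranking_error([10 * s + 5 for s in scores], labels))  # 0.25
```

Any strictly increasing transformation of the scores leaves the error unchanged, which is one way to see why ranking differs from regression.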
In this talk, I will describe some recent results in both the
theoretical understanding of ranking and its applications. In
particular, I will describe generalization bounds for ranking
algorithms based on the tools of uniform convergence and algorithmic
stability, and some preliminary results on the sample complexity of
learning ranking functions. I will conclude with some recent
applications to ranking chemical structures for drug discovery.
Speaker Bio: Dr. Agarwal is currently a postdoctoral fellow in the CS Dept. at MIT.
Her research interests are in Machine Learning and its applications
to Computational Biology.
| Series: Department Seminar Title: Collective Clustering in Relational Networks - Speaker: Dr. Indrajit Bhattacharya
- Date and Time: Tuesday, October 06, 2009, 4:00 PM
- Venue: Room No. 254, CSA Seminar Hall, [First Floor]
Abstract We are commonly faced with heterogeneous data involving different
types of entities connected by relational networks, such as people and topics
in social networks, proteins in biological networks, and so on.
For clustering data from such relational networks, traditional
clustering approaches are based on features of individual data items.
I will focus on collective clustering in a relational network, where
clustering decisions are made jointly over relational neighborhoods
in the network.
To address this task, in this talk, I will present probabilistic
graphical models that capture cluster relationships using latent groups over
clusters. Inference in such models is a challenge, and I will present an
efficient approximate method based on Gibbs Sampling.
I will present experimental results demonstrating that collective
clustering improves both clustering accuracy and the interpretability of the
discovered clusters compared to traditional approaches.
Speaker Bio: Dr. Indrajit Bhattacharya is a research scientist
at IBM Research in Delhi.
His research interests lie in machine learning techniques that use
probabilistic graphical models for analysis and prediction in
structured and unstructured data.
His research includes developing probabilistic and other models for
collective relational clustering, for entity resolution in relational data,
and for word sense disambiguation from multilingual
corpora. Dr. Bhattacharya holds a BTech from the Dept. of CS, IIT
Kharagpur, and received his PhD from the University of Maryland.
| Series: Department Seminar Title: Theoretical and Experimental Self-Assembly - Speaker: Dr. Manoj Gopalkrishnan, TIFR Mumbai
- Date and Time: Friday, September 25, 2009, 4:00 PM
- Venue: Room No. 254, CSA Seminar Hall, [First Floor]
Abstract Recent work on the tile assembly model has revealed the
close connections between computer science and self-assembly. I shall
describe two approaches to the study of self-assembly that are
inspired by the tile assembly model.
The first approach involves experiments with DNA molecules. We use DNA
like a construction material (akin to the uses of brick, cement, glass,
etc.) to form nanostructures such as hexagonal tilings, cylinders and
Möbius strips.
The second approach involves a mathematical investigation of the law of
mass action in chemistry. A major goal of our work is to solidify the
mathematical foundations of mass action chemistry. In addition, we believe
that the law of mass action is of intrinsic mathematical interest. We are
led to a dynamical theory of sets of binomials over the complex numbers,
with hints of connections to number theory and toric varieties.
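The law of mass action says that a reaction proceeds at a rate proportional to the product of its reactant concentrations. The sketch integrates a single reversible reaction A + B ⇌ C with forward Euler, under hypothetical rate constants, step size and initial concentrations. At steady state the net rate k_f*a*b - k_r*c vanishes, a binomial equation of the kind the talk refers to; the sketch illustrates the law only, not the dynamical theory itself.

```python
def simulate(a, b, c, k_forward, k_reverse, dt=1e-3, steps=20000):
    """Forward-Euler integration of A + B <-> C under mass-action kinetics."""
    for _ in range(steps):
        # Net rate: forward term k_f*[A]*[B] minus reverse term k_r*[C].
        rate = k_forward * a * b - k_reverse * c
        a -= rate * dt
        b -= rate * dt
        c += rate * dt
    return a, b, c


a, b, c = simulate(1.0, 0.5, 0.0, k_forward=2.0, k_reverse=1.0)
print(round(a, 4), round(b, 4), round(c, 4))
# The steady state satisfies the binomial equation 2.0*a*b - 1.0*c = 0.
```

Each step moves equal amounts out of A and B and into C, so the totals a + c and b + c are conserved, and the trajectory settles where the binomial k_f*a*b - k_r*c is zero.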
Speaker Bio: Manoj has been a faculty member in the School of Technology and
Computer Science at the Tata Institute of Fundamental Research, Mumbai,
since March 2009. He received his Ph.D. in Computer Science from the
University of Southern California in December 2008. His advisor was
Professor Len Adleman, and his dissertation topic was "Theoretical and
Experimental Self-Assembly." Before that, he received a B.Tech. in
computer science and engineering from IIT Kharagpur.
| Series: Department Seminar Title: Large-scale Data Management for the Sciences - Speaker: Dr. Tanu Malik
- Date and Time: Wednesday, September 23, 2009, 4:00 PM
- Venue: CSA Seminar Hall
Abstract Traditional enterprises and novel scientific applications are
accumulating petabyte-scale datasets, which makes the need for large-scale
data management more pressing than ever. Geographic distribution of the
datasets accompanied by complex demands on data makes large-scale data
management challenging. This is especially true for sciences that model
complex physical and biological phenomena using data from multiple sources.
In this talk I will address two critical problems in scientific data
management: combining a large number of diverse data
sources for the execution of scientific queries, and executing data-intensive
scientific queries efficiently, in terms of both
network and I/O, on these data sources. I will present SkyQuery, a system
that federates data from several petabyte-scale,
autonomous and heterogeneous astronomy databases scattered worldwide.
Using SkyQuery, scientists can write declarative queries
that compare and merge multiple astronomical datasets. For efficient query
execution and scalability, I will present River, a novel
caching framework for database systems that dramatically reduces the
network bandwidth requirements of data-intensive federations
such as SkyQuery. Distributed applications such as River often rely on
a priori knowledge of query cardinalities to make optimization
decisions. In this context, I will present a black-box approach to
selectivity estimation that is suitable for distributed applications.
The success of SkyQuery and its adoption by the National Virtual
Observatory is an example of data management systems enabling scientific
endeavors.
Speaker Bio: Tanu Malik is a Research Assistant Professor with the Cyber Center in
Discovery Park at Purdue University and with the Indiana Center of
Database Systems. Her research interests are in a wide variety of areas
including but not limited to data federations, database caching, query
execution and optimization, self-organizing database systems, and summary
structures for cardinality estimation. A recurrent theme in her research
is to re-examine the core principles of database technology in the light
of new requirements emerging from scientific data. Her research has
resulted in some innovative database technology for handling large
distributed scientific data.
Tanu earned her Ph.D. and M.S. in 2007 from the Department of Computer
Science at Johns Hopkins University. She earned her B.Tech in 1999 from
the Department of Civil Engineering at Indian Institute of Technology,
Kanpur.
| Series: Msc(Engg) Colloquium Title: Discovery of Application Workloads from Network File Traces - Speaker: Ms. Neeraja Yadwadkar
- Faculty Advisor: Prof. Chiranjib Bhattacharyya
- Date and Time: Tuesday, September 15, 2009, 4:00 PM
- Venue: Room No. 254, CSA Seminar Hall, [First Floor]
Abstract An understanding of the I/O data access patterns of applications
is useful in several situations. First, gaining insight into what
applications are doing with their data at a semantic level helps in
designing efficient storage systems. Second, it helps create benchmarks
that closely mimic realistic application behavior. Third, it enables
autonomic systems, as the information obtained can be used to adapt the
system in a closed loop.
All these use cases require the ability to extract the application-level
semantics of I/O operations. Methods such as modifying application code
to associate I/O operations with semantic tags are intrusive. It is well
known that network file system traces are an important source of
information that can be obtained non-intrusively and analyzed either
online or offline. These traces are a sequence of primitive file system
operations and their parameters. Simple counting, statistical analysis,
or deterministic search techniques are inadequate for discovering
application-level semantics in the general case, because of the inherent
variation and noise in realistic traces.
In this work, we describe a trace analysis methodology based on Profile
Hidden Markov Models. We show that the methodology has powerful
discriminatory capabilities that enable it to recognize applications
based on the patterns in the traces, and to mark out regions in a long
trace that encapsulate sets of primitive operations representing
higher-level application actions. It is robust enough to work around
discrepancies between training and target traces, such as differences in
length and interleaving with other operations. We demonstrate the
feasibility of recognizing patterns based on a small sampling of the
trace, enabling faster trace analysis. Preliminary experiments show that
the method is capable of learning accurate profile models on live traces
in an online setting.
We present a detailed evaluation of this methodology in a UNIX
environment using NFS traces of selected commonly used applications,
such as compilations, as well as industrial-strength benchmarks such as
TPC-C and Postmark, and discuss its capabilities and limitations in the
context of the use cases mentioned above.
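As a rough illustration of likelihood-based recognition, the sketch below scores a trace of operations against per-application models with the standard forward algorithm and picks the model that gives the trace the highest likelihood. It assumes a plain discrete HMM, not the Profile HMM (with match, insert, and delete states) used in this work. The function names, operation names, model names, and probabilities are hypothetical toy values chosen only for illustration.

```python
import math


def forward_log_likelihood(obs, states, start_p, trans_p, emit_p):
    """Return log P(obs | model) for a discrete HMM using the forward algorithm.

    Assumption: a plain HMM, not a Profile HMM (no match/insert/delete states).
    Probabilities are not rescaled, so very long traces can underflow.
    """
    # alpha[s] = P(o_1..o_t, hidden state at time t = s)
    alpha = {s: start_p[s] * emit_p[s].get(obs[0], 0.0) for s in states}
    for o in obs[1:]:
        alpha = {
            s: emit_p[s].get(o, 0.0) * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    total = sum(alpha.values())
    return math.log(total) if total > 0 else float("-inf")


# Hypothetical toy models over NFS-like operation names (illustrative values only).
STATES = ("A", "B")
COMPILE_MODEL = dict(
    start_p={"A": 1.0, "B": 0.0},
    trans_p={"A": {"A": 0.2, "B": 0.8}, "B": {"A": 0.5, "B": 0.5}},
    emit_p={"A": {"lookup": 0.6, "read": 0.4}, "B": {"read": 0.3, "write": 0.7}},
)
DB_MODEL = dict(
    start_p={"A": 0.5, "B": 0.5},
    trans_p={"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}},
    emit_p={"A": {"read": 0.9, "lookup": 0.1}, "B": {"write": 0.5, "read": 0.5}},
)


def classify(trace, models):
    """Return the name of the model that assigns the trace the highest log-likelihood."""
    return max(
        models,
        key=lambda name: forward_log_likelihood(trace, STATES, **models[name]),
    )


models = {"compile": COMPILE_MODEL, "db": DB_MODEL}
print(classify(["lookup", "read", "write"], models))  # compile
print(classify(["read", "read", "read"], models))     # db
```

A realistic trace analyzer would need scaled or log-space recursions for long traces, and a profile architecture would be needed to tolerate insertions and deletions relative to a training trace. Both are outside this sketch.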
| Series: Department Seminar Title: Unsupervised Learning of Word Senses - Speaker: Prof. Suresh Manandhar, University of York, UK
- Date and Time: Thursday, September 10, 2009, 4:00 PM
- Venue: Room No. 254, CSA Seminar Hall, [First Floor]
Abstract Unsupervised learning of lexical semantics is an emerging area
within NLP that poses interesting and challenging problems. The primary
advantage of unsupervised and minimally supervised methods is that
annotated data is either not required or required only in small
quantities. In this talk, I will present our current work on word sense
induction. Sense induction is the task of discovering all the senses of a
given word from raw, unannotated data. Our collocational graph-based
method achieves high evaluation scores while overcoming some of the
limitations of existing methods. We show that graph connectivity measures
can be employed to avoid the need for supervised parameter tuning.
Hierarchical clustering and hierarchical random graphs can be employed
for inducing concept hierarchies.
Speaker Bio: Prof. S. Manandhar is on the faculty of the Department of Computer
Science, University of York, UK. His research interests lie in natural
language processing and its applications.
| Series: Department Seminar Title: "Sherpa: Yahoo's distributed storage service" - Speaker: Dr. P.P.S. Narayan, Yahoo
- Date and Time: Tuesday, September 08, 2009, 4:00 PM
- Venue: Room No. 254, CSA Seminar Hall, [First Floor]
Abstract Sherpa is the next-generation structured-record distributed storage
service that addresses the growing scalability needs of Yahoo! properties.
Key features of Sherpa include high scalability, elastic growth, a global
footprint for local low-latency access, asynchronous replication, RESTful
web service APIs, novel per-record consistency knobs, high availability,
and low capex and opex, among others. Sherpa is a hosted, centrally
managed, and geographically distributed service, and it uses automated
load balancing and failover to reduce operational complexity.
Over the last year, we have built the Sherpa platform from the ground
up, and in May 2009 Sherpa version 1.5 was announced as generally
available within Yahoo!. In this talk, we present the technology and
architecture of the Sherpa platform and the engineering challenges of
building a large-scale, highly available distributed system. Finally,
we present the roadmap and future of the product.
Speaker Bio: P.P.S. Narayan (PPSN) is the Director of Engineering for the Sherpa
product. His team is responsible for the design, architecture, and
engineering of Sherpa. Before joining Yahoo!, PPSN was at Bell Labs,
where he was involved in database and network management research. He
also led a small Bell Labs Venture team to build Geopepper, a
next-generation location-based services (LBS) platform. PPSN has
published at reputed database conferences in the areas of transaction
management, XML, and query processing.
| Series: Department Seminar Title: Compiling for Multicore Systems - Speaker: V. Krishna Nandivada, IBM India Research Lab, Bangalore
- Date and Time: Friday, September 04, 2009, 4:00 PM
- Venue: CSA Seminar Hall (Room No. 254)
Abstract As multi-core systems gain popularity, there is a clear need for languages, tools, and techniques that simplify programming high-performance machines so that programs can exploit hardware features to a significant degree and achieve higher throughput. Two of the main issues we encounter when compiling for multi-core systems are (a) the difference between user-perceived parallelism and the parallelism supported by the actual hardware, and (b) reasoning about the locality of data and computation. In this talk, we discuss a few steps in these directions, using X10 as the reference language.
To statically establish place locality in X10 programs, we present an analysis framework based on a static abstraction of activities (threads) and an extension of classical escape analysis that tracks the abstract activities to which an object can escape. The framework also takes advantage of the high-level abstraction of X10 distributions to reason about the place locality of array accesses in loops.
A simple and important example of the separation of concerns between ideal and useful parallelism is the chunking of parallel loops: the programmer expresses ideal parallelism by declaring all iterations of a loop to be parallel, and the implementation exploits useful parallelism by executing iterations of the loop in sequential chunks. We addressed the problem of chunking parallel loops via a semantics-preserving transformation framework that combines transformations from past work to obtain an equivalent set of parallel loops that chunk together statements from multiple iterations. In another effort to reduce the gap between ideal and useful parallelism, we designed a compiler that transforms explicitly parallel programs into SPMD programs. Submissions based on our X10 compiler and runtime won the HPC (Class II) awards at the Supercomputing conference in 2007 and 2008.
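The runtime effect of loop chunking can be sketched in Python under simplifying assumptions. The function name `chunked_parallel_for`, the fixed `chunk_size`, and the use of a thread pool are illustrative choices. This is not the X10 compiler's transformation framework, which rewrites loop code at compile time rather than scheduling at runtime. All `n` iterations are assumed independent (ideal parallelism). Each task then runs one contiguous chunk of iterations sequentially, so only about `n / chunk_size` tasks are created (useful parallelism).

```python
from concurrent.futures import ThreadPoolExecutor


def chunked_parallel_for(n, chunk_size, body, max_workers=4):
    """Run body(i) for every i in range(n), grouping iterations into chunks.

    Assumes the iterations are independent, as in a loop declared fully parallel.
    Returns the number of iterations executed by each chunk, in chunk order.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    def run_chunk(start):
        end = min(start + chunk_size, n)
        # Iterations inside one chunk execute sequentially on one worker.
        for i in range(start, end):
            body(i)
        return end - start

    # One task per chunk instead of one task per iteration.
    starts = range(0, n, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_chunk, starts))


out = [0] * 10
counts = chunked_parallel_for(10, 4, lambda i: out.__setitem__(i, i * i))
print(counts)  # [4, 4, 2]
print(out)     # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

The chunk size trades scheduling overhead against load balance. Larger chunks mean fewer tasks but coarser work distribution. In CPython, threads mainly illustrate the structure, because the global interpreter lock limits speedup for CPU-bound loop bodies.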
Speaker Bio: V. Krishna Nandivada graduated from UCLA and currently works at IBM India Research Lab, Bangalore. He received his master's degree from the Indian Institute of Science, Bangalore, and his bachelor's degree from Regional Engineering College (now National Institute of Technology), Rourkela. Before joining IBM, he was associated with Hewlett-Packard, Bangalore, and Sun Labs, Burlington, for various periods.
His interests include program analysis, programming language design, compiler optimizations, and program reasoning.