LA-Web 2008

6th Latin American Web Congress
October 28 - 30, 2008 Vila Velha, Espírito Santo, Brazil

Speakers

Invited Talks

Past Searches Teach Everything: Including the Future! [Slides]

Speaker: Fabrizio Silvestri, CNR, Italy

Abstract: "History Teaches Everything, Including the Future" wrote Alphonse de Lamartine in the nineteenth century. Even if history cannot really be considered a predictive science, historical information can be used successfully in many fields. In this talk I will discuss Web search engines and their query logs, which contain historical information about past usage of such systems. Query logs are nowadays considered one of the most important sources of knowledge, routinely exploited to improve both the effectiveness and the efficiency of Web search engines. The techniques I am going to review focus mainly on enhancing the efficiency of real-world distributed search systems. I will present some of the most interesting results obtained in the last five years by our group at the High Performance Computing Laboratory in Pisa, in collaboration with research labs worldwide. I will start by showing state-of-the-art caching techniques for search results and posting lists, and ways to combine them optimally. Next, I will show applications of data mining techniques aimed at speeding up query processing in both document-based and term-based distributed search engines. I will conclude by presenting two new results. The first is related to caching in similarity-based multimedia search systems. The second is about caching for efficient snippet generation on a distributed repository of documents.

Short Bio: Fabrizio Silvestri is currently a Researcher at ISTI - CNR in Pisa. He received his Ph.D. from the Computer Science Department of the University of Pisa in 2004. His research interests are mainly focused on Web Information Retrieval, with particular attention to efficiency-related problems such as caching, collection partitioning, and distributed IR in general. In his professional activities, Fabrizio Silvestri is a member of the program committees of many of the most important conferences in IR, as well as an organizer and, currently, a member of the steering committee of the workshop Large Scale and Distributed Systems for Information Retrieval (LSDS-IR).


Managing Linked Data on the Web: the LinkedMDB showcase

Speaker: Mariano Consens, University of Toronto

Abstract: The Linking Open Data community project is extending the Web by encouraging the creation of interlinks (RDF links between data items from different datasets identified using dereferenceable URIs). This emerging web of linked data is closely intertwined with the existing web, since structured data items can be embedded into web documents (e.g., RDFa or microformat encoded data), and RDF links can reference classic web pages. Abundant linked data justifies extending the capabilities of web browsers and search engines, and enables new usage scenarios, novel applications and sophisticated mashups.

This promising direction for publishing data on the web brings forward a number of challenges. While existing data management techniques can be leveraged to address the challenges, there are unique aspects to managing web-scale interlinking. In this presentation, we describe two specific challenges: achieving and managing dense interlinking, and describing data, metadata, and interlinking within and among linked web datasets. Approaches to solve these challenges are showcased in the context of LinkedMDB, the first open linked dataset for movies. LinkedMDB has a large number of interlinks (over a quarter of a million) to other open linked datasets, as well as RDF links to movie-related webpages.

Short Bio: Mariano Consens's research interests are in the areas of Data Management and the Web, with a focus on linked data, XML searching, analytics for semistructured data, and autonomic systems. He has over 40 publications, including journal publications selected from best conference papers, and several patents and patent applications.

Mariano received his PhD and MSc degrees in Computer Science from the University of Toronto. He also holds a Computer Systems Engineer degree from the Universidad de la Republica, Uruguay.

Consens has been a faculty member in Information Engineering at the MIE Department, University of Toronto, since 2003. Before that, he was research faculty at the School of Computer Science, University of Waterloo, from 1994 to 1999. In addition, he has been active in the software industry as a founder and CTO of a couple of software startups and is currently a Visiting Scientist at the IBM Center for Advanced Studies in Toronto.


Personal Information Ecosystems: Design Concerns for Net-Enabled Devices [Slides]

Speaker: Manuel A. Pérez-Quiñones, Center for Human-Computer Interaction, Department of Computer Science, Virginia Tech.

Abstract: Today, with the proliferation of affordable computing, people use multiple devices to fulfill their information needs. However, designers approach each device platform individually, without accounting for the other devices that users may also use. In many cases, the applications on all of a user's devices are designed to be functional replicas of each other, often with an emphasis on keeping their form and function consistent with one another. We argue that this emphasis on recreating consistent clones on each platform should not be the dominant concern for designers. In this presentation, we present the idea of a personal information ecosystem, an analogy to biological ecosystems, which allows us to discuss the inter-relationships among a user's devices as they fulfill that user's information needs. Using the examples of now ubiquitous web technologies on different platforms, we discuss how treating the user's ecosystem of devices as a whole as the design target leads to a better user experience and encourages designers to tackle the more important concern of seamless task migration across devices.

Short Bio: Manuel A. Pérez-Quiñones is Associate Dean in Residence and Director of the Office for Diversity Programs at the Graduate School, and an Associate Professor in the Department of Computer Science at Virginia Polytechnic Institute and State University. Pérez-Quiñones holds a D.Sc. in Computer Science from The George Washington University. His research interests include human-computer interaction, personal information management, user interface software, digital government, and educational uses of computers. He is an NSF CAREER awardee, and for 2008-2010 has been included in the IEEE Computer Society Distinguished Visitor program. He is a member of the Coalition to Diversify Computing, where he co-directs the national program Collaborative Research Experience for Undergraduates in Computer Science and Engineering. He serves on the editorial board of the Journal on Educational Resources in Computing.


Tutorials

Introduction to Web Mining [Slides]

Speaker: Ricardo Baeza-Yates, VP of Yahoo! Research

Part I: 28/10, 14:00 - 16:00
Part II: 29/10, 18:00 - 19:00

Summary: The Web continues to grow and evolve very fast, changing our daily lives. This activity represents the collaborative work of the millions of institutions and people that contribute content to the Web, as well as the one billion people that use it. In this ocean of hyperlinked data there is explicit and implicit information and knowledge. Web Mining is the task of analyzing this data and extracting information and knowledge for many different purposes. The data comes in three main flavors: content (text, images, etc.), structure (hyperlinks) and usage (navigation, queries, etc.), implying different techniques such as text, graph or sequence mining. Each case reflects the wisdom of some group of people that can be used to make the Web better, for example user-generated tags in Web 2.0 sites. In this talk we walk through the mining process and show several applications, ranging from Web site design to search engines.

Short Bio: Ricardo Baeza-Yates holds a Ph.D. in Computer Science (Univ. of Waterloo, Canada, 1989), Master's degrees in Electrical Engineering (1986) and Computer Science (1985) from the Univ. de Chile, and an Electrical Engineering degree (Ingeniero Civil Eléctrico) from the same university. He is currently Vice President of Yahoo! Research for Europe and Latin America, based in Barcelona. His research areas are information retrieval, Web mining, algorithms, and information visualization. He is co-author of a book on information retrieval (Addison-Wesley, 1999) and of a handbook on algorithms and data structures (Addison-Wesley, 1991), and co-editor of a book on information retrieval (Prentice-Hall, 1992). He has twice been president of the Sociedad Chilena de Ciencia de la Computación (Chilean Computer Science Society) and has received awards from the Organization of American States and the Instituto de Ingenieros de Chile. He was also recently president of CLEI (Centro Latinoamericano de Estudios en Informática), a member of the IEEE-CS board, and international coordinator of the applied informatics and electronics subprogram of CYTED (an Ibero-American cooperation program). In 2000 he started an Internet spin-off for searching the Chilean Web (www.todocl.cl). In 2002 he founded the Centro de Investigación de la Web (Center for Web Research, www.ciw.cl) in Chile, of which he was the first director, and in 2003 he became the first person from his scientific area to be inducted into the Chilean Academy of Sciences. In 2007 he received the J.W. Graham Medal from the Univ. of Waterloo, which is awarded to alumni for innovation in computing.


Joint Invited Talks

NeoVictorian Computing, with a Twist

Speaker: Simon Harper, Human Centred Web Laboratory, University of Manchester (UK)

Abstract: Experience in World Wide Web (Web) accessibility has taught us: to think about small bespoke solutions; to tailor interaction and requirements to the user and job at hand; to value high data interoperability; to realise that large enterprise systems become unmanageable and unable to change at the speed required by both users and technology; and finally, to value heterogeneity. Indeed, with the advent of Workflows, RPCs, RESTful services, and Cloud Computing we can also see these viewpoints becoming more common in mainstream thought. Here, we use Bernstein's concept of 'NeoVictorian Computing' as a counterfoil to Andriole's new viewpoint of 21st century software development. We extrapolate from the Web development model into corporate and enterprise systems and propose an architecture of client-based heterogeneous applications, each tailored to a specific user and their job, with highly interoperable data, controlled by workflows that are transferred with the data itself. We discount the new client-computing fad, as this really means centrally controlled, sometimes unavailable, old-style enterprise systems. We suggest that by moving toward user-centred agile systems we follow the conceptual, if not the technological, underpinnings of the Web. In this case we realise that Web developers are in a privileged position to shape and push forward this new kind of software architecture and the 'craft' based approaches which will drive it.

Short Bio: Simon Harper is Research Lead of the Human Centred Web Laboratory in the School of Computer Science at the University of Manchester (UK). He is interested in how users interact with the Web and how the Web, through its design and technology, enables users to interact with it. He believes that by understanding disabled users' interaction we enhance our understanding of all users operating in constrained modalities where the user is handicapped by both environment and technology. He sees fundamental research into users with disabilities as a natural preface to wider human factors research.

Simon is a founder of the W4A Conference on Web Accessibility (2004-2008). He was the Programme Chair for ACM ASSETS (2006), Submissions Chair for the WWW Conference (2006), and General Chair for ACM Hypertext (2007) and ACM ASSETS (2008); he is also a member of the steering committee for these two conferences. He has guest edited the New Review of Hypermedia and Multimedia; Journal of Web Engineering; Journal of Disability and Rehabilitation; and Universal Access in the Information Society. He is also a recipient of the ADDW IBM Research prize and the ACM Doug Engelbart prize.


OWL-Dogmatism Considered Harmful: The Role of Foundational Ontologies for The Semantic Web

Speaker: Giancarlo Guizzardi, Departamento de Informática, Universidade Federal do Espírito Santo (UFES)

Abstract: Conceptual Modeling is a discipline of great importance to several areas in Computer Science such as Software and Domain Engineering, Organizational Modeling, Information Systems Design, Database Design, Knowledge Management, among many others. In particular, what is termed a Domain Ontology in Computer Science is a special type of Conceptual Model. In recent years, there has been a growing interest in the development and use of domain ontologies, strongly motivated by the Semantic Web initiative. However, as we argue in this talk, an approach for ontology engineering uniquely based on the modeling languages adopted in the Semantic Web (e.g., OWL, RDF) is insufficient to address a number of representation semantic interoperability problems that arise in open and dynamic scenarios (such as, for instance, the Semantic Web itself). In this talk, we argue that these languages should be complemented by a language and methodology based on a Foundational Ontology, i.e., a domain-independent common-sense theory constructed by aggregating suitable contributions from areas such as philosophical ontology and logics, cognitive science and linguistics. Moreover, we discuss the requirements of a theoretically well-founded conceptual modeling language needed to meet the desiderata for a general conceptual modeling and ontological engineering language. Finally, we briefly present advanced conceptual modeling techniques (such as design patterns and methodological guidelines) based on principled ontological foundations and show how they can be used to solve some classical and recurrent conceptual modeling problems that (re)appear in concrete application scenarios.

Short Bio: Giancarlo Guizzardi obtained a PhD degree (Cum Laude) from the University of Twente, in The Netherlands. Since 2003 he has been a Visiting Scientist, Research Collaborator and Associated Researcher at the Laboratory for Applied Ontology (LOA), Institute for Cognitive Science in Technology (ISTC), in Trento, Italy. He is currently a member of the Computer Science Department at the Federal University of Espírito Santo, in Vitória, Brazil, where he is one of the coordinators of the Ontology and Conceptual Modeling Research Group (NEMO).

Giancarlo Guizzardi has been working for more than a decade on the development of Domain and Foundational Ontologies and their application in computer science and, primarily, in the area of Conceptual Modeling. His experience in the area has also been acquired in a number of academic and industrial projects in domains such as Software Engineering, Oil and Gas, and Medical Informatics. He is one of the initiators of the workshop series VORTE (Vocabularies, Ontologies and Rules for The Enterprise), a satellite event of the IEEE EDOC (Enterprise Computing) Conference, and WOMSDE (Workshop on Ontologies and Metamodels in Software and Data Engineering). He is the author of more than 60 publications on the subject of Ontologies, comprising journal articles, book chapters, conference papers and a book, and including the paper that won the best-paper award at the CAiSE'2004 conference. For the past five years, he has been promoting the discipline of Ontology-Driven Conceptual Modeling in different international scientific events and graduate programs as a panelist, distinguished lecturer and keynote speaker. Finally, he is currently a guest editor of journals such as Applied Ontology (IOS Press), Information Systems (Elsevier), and the International Journal of Business Process Integration and Management (InderScience).


CSCW and Collaborative Education

Speaker: Clarence Ellis, Collaboration Technology Research Group, University of Colorado (USA)

Short Bio: Dr. Clarence A. Ellis is a Professor of Computer Science, and Director of the Collaboration Technology Research Group at the University of Colorado at Boulder. At Colorado, he is a member of the Systems Software Lab, and the Institute for Cognitive Science. During 1991, he was chief architect of the FlowPath workflow product of Bull S.A. Previously he was the head of the Groupware Research Group within the Software Technology Program at MCC. For the decade prior to joining MCC, he was a research scientist at Xerox Palo Alto Research Center.

Clarence (Skip) Ellis is on the editorial board of numerous journals, and has been an active instigator and leader of a number of computer associations and functions. He has been a member of the National Science Foundation Computer Science Advisory Board; of the University of Singapore ISS International Advisory Board; of the NSF Computer Science Education Committee; and chairman of the ACM Special Interest Group on Office Information Systems (SIGOIS). His interests include groupware, coordination theory, object-oriented systems, CSCW, office systems, databases, distributed systems, software engineering, the World Wide Web (internetworking), systems design and modeling, workflow systems, and humane interfaces to computers. Dr. Ellis has also worked as a researcher and developer at Bell Telephone Laboratories, IBM, Xerox, Microelectronics and Computer Technology Corporation, Los Alamos Scientific Labs, and Argonne National Lab. His academic experience includes teaching at Stanford University, the University of Texas, MIT, Stevens Institute of Technology, and in Taiwan under an AFIPS overseas teaching fellowship. He has published several books and over 100 technical papers and reports, lectured in more than a dozen countries, and was an invited speaker on object-oriented systems at the most recent IFIP World Computer Conference.


Resource Lifecycle Management: BPM at work in the social web

Speaker: Fabio Casati, Department of Computer Science, University of Trento (Italy)

Abstract: The social web is (also) about a web of resources that are collaboratively managed, often in a fairly unstructured fashion. Examples are wikis or Google Docs. The same philosophy is spreading to "business resources", such as project schedules and project deliverables, which are often shared and edited collaboratively, though usually with a little more structure. In open source software development, in a way a precursor to Web 2.0, the "software resources" are managed similarly. Some aspects common to all these resources are that 1) they go through some lifecycle (which may or may not be structured and defined); 2) even when defined, the lifecycle varies frequently, and this variation is not an "exception" but normal business life (just imagine defining a strict lifecycle for your project deliverables and sticking to it 100%); 3) humans are in control: managing resources is not about automating the execution of their lifecycle. I certainly would not want a BPM system to manage my activity of writing a paper. However, it is about being able to initiate actions when needed (submission for review, testing of software) and to monitor the status and history of a possibly large set of resources that I am managing (e.g., a set of deliverables for my project). In this talk I will present a resource lifecycle management (RLM) system that can manage anything that can be referred to by a URI, and that can be used by any skilled web user, even without programming skills. Its key aspects are the simplicity of the model, the embracing of the inherent "unstructuredness" of the life of (Web) resources, the ability to monitor the status and history of all managed resources, the ability to make changes easily on the fly, and the balance between human and automated control. In particular, in the RLM system, the "workflow engine" is in fact the humans managing the resources, who initiate automated actions if and when needed.

The approach is inspired in part by BPM systems and in part by the philosophy of the Web. The RLM system will be hosted and available for use by anybody who wants to model the lifecycle of any resource, to automate certain (resource-specific) actions (e.g., submission for review), and to observe the evolution of a set of resources of interest. The applications are countless, but examples include project deliverables (which typically vaguely follow a lifecycle based on a project quality plan), collaborative editing on wikis, writing of code and documentation, or composing, arranging, and recording a song.

Short Bio: Fabio Casati is professor of computer science at the University of Trento. He recently joined the University of Trento after 7 years at Hewlett-Packard USA, where he was technical lead for the research program on business process intelligence. Fabio has also contributed (as architect and data modeller) to the development of several HP commercial products and solutions in the area of web services and business process management. He is co-author of a book on Web services, a member of the editorial board of ACM TWEB, and a member of the steering committees of the international conferences on Service-Oriented Computing and Business Process Management.


Flexible Information Presentation: automatic generation of texts, diagrams and movies

Speaker: Donia Scott, Computational Linguistics, Open University (UK)

Abstract: A major challenge in AI remains that of designing presentation systems that are able to tailor their output to meet different needs. Naturally, much work in this area has focussed on natural language processing, and in particular, natural language generation. I will describe some of the ways in which this goal has been achieved in the context of a currently "hot" application in the European context: electronic patient records. I will show how natural language generation technology has been used to underpin the Clinical E-Science Framework (CLEF) initiative, generating a range of descriptions of patient histories, making use of a range of media and aimed at a range of audiences.

Short Bio: Donia Scott is a Professor of Computational Linguistics at the Open University in the UK. The primary focus of her research is natural language generation, that is, designing systems that will automatically generate fluent text from a computational model of the meaning to be conveyed. She is particularly interested in the way in which linguistic phenomena above the sentence level contribute to the fluency and comprehensibility of a text; for example, the way the information to be conveyed is ordered and structured, and the way in which it is rendered on the page. She has approached this general topic from a number of different perspectives, including multilingual and multimodal generation, document structuring and paraphrasing. Although much of her work is theoretical, she does like to apply it where possible to interesting practical problems; recent examples include generating pharmaceutical leaflets and technical manuals in several languages and styles, and medical records tailored to different audiences or different needs. She also has a strong interest in natural language dialogue systems, e-learning and the semantic web as testbeds for natural language generation systems, architectures for natural language generation, and the relation between speech prosody and text formatting.