Efficient And Exact Computation Of Inclusion Dependencies For Data Integration

Author : Jana Bauckmann
language : en
Publisher: Universitätsverlag Potsdam
Release Date : 2010
Efficient And Exact Computation Of Inclusion Dependencies For Data Integration was written by Jana Bauckmann and published by Universitätsverlag Potsdam in 2010 in the Computers category.
Data obtained from foreign data sources often come with only superficial structural information, such as relation names and attribute names. Other types of metadata that are important for effective integration and meaningful querying of such data sets are missing. In particular, relationships among attributes, such as foreign keys, are crucial metadata for understanding the structure of an unknown database. Discovering such relationships is difficult because, in principle, each pair of data values must be compared for each pair of attributes in the database. A precondition for a foreign key is an inclusion dependency (IND) between the key and the foreign key attributes. We present Spider, an algorithm that efficiently finds all INDs in a given relational database. It leverages the sorting facilities of the DBMS but performs the actual comparisons outside of the database to save computation. Spider analyzes very large databases up to an order of magnitude faster than previous approaches. We also evaluate in detail the effectiveness of several heuristics to reduce the number of necessary comparisons. Furthermore, we generalize Spider to find composite INDs covering multiple attributes, and partial INDs, which are true INDs for all but a certain number of values. This last type is particularly relevant when integrating dirty data, as is often the case in the life sciences domain, which is our driving motivation.
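To make the notion of an inclusion dependency concrete, the following is a minimal sketch of the naive definition only; it is not the Spider algorithm and does not use any DBMS sorting. The column names, the helper names `unary_inds` and `is_partial_ind`, and the choice to ignore `None` values are all assumptions made for illustration.

```python
from itertools import permutations


def unary_inds(columns):
    """Return all pairs (dep, ref) such that every non-null value of
    column `dep` also occurs in column `ref` (naive set comparison)."""
    # Assumption: None stands for NULL and is ignored.
    value_sets = {
        name: {v for v in values if v is not None}
        for name, values in columns.items()
    }
    return sorted(
        (dep, ref)
        for dep, ref in permutations(value_sets, 2)
        if value_sets[dep] <= value_sets[ref]
    )


def is_partial_ind(dep_values, ref_values, max_missing):
    """True if at most `max_missing` distinct dependent values are
    absent from the referenced column (one reading of 'partial IND')."""
    missing = set(dep_values) - set(ref_values)
    return len(missing) <= max_missing


# Hypothetical toy database: column name -> list of values.
columns = {
    "orders.customer_id": [1, 2, 2, 3],
    "customers.id": [1, 2, 3, 4],
    "customers.name": ["a", "b", "c", "d"],
}

print(unary_inds(columns))
print(is_partial_ind([1, 2, 99], [1, 2, 3], max_missing=1))
```

The sketch compares every ordered pair of columns, which is exactly the quadratic cost the abstract describes as the difficulty; it only shows what is being tested, not how to test it efficiently.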
Selected Papers Of The International Workshop On Smalltalk Technologies
Author : Michael Haupt
language : en
Publisher: Universitätsverlag Potsdam
Release Date : 2010
Selected Papers Of The International Workshop On Smalltalk Technologies was written by Michael Haupt and published by Universitätsverlag Potsdam in 2010 in the Computers category.
The goal of the IWST workshop series is to create and foster a forum around advancements of or experience in Smalltalk. The workshop welcomes contributions to all aspects, theoretical as well as practical, of Smalltalk-related topics.
Proceedings Of The Ph.D. Retreat Of The HPI Research School On Service-Oriented Systems Engineering
Author : Christoph Meinel
language : en
Publisher: Universitätsverlag Potsdam
Release Date : 2011
Proceedings Of The Ph.D. Retreat Of The HPI Research School On Service-Oriented Systems Engineering was written by Christoph Meinel and published by Universitätsverlag Potsdam in 2011 in the Computers category.
Toward Bridging The Gap Between Formal Semantics And Implementation Of Triple Graph Grammars
Author : Holger Giese
language : en
Publisher: Universitätsverlag Potsdam
Release Date : 2010
Toward Bridging The Gap Between Formal Semantics And Implementation Of Triple Graph Grammars was written by Holger Giese and published by Universitätsverlag Potsdam in 2010 in the Computers category.
The correctness of model transformations is a crucial element for the model-driven engineering of high-quality software. A prerequisite for verifying model transformations at the level of the model transformation specification is that an unambiguous formal semantics exists and that the employed implementation of the model transformation language adheres to this semantics. However, for existing relational model transformation approaches it is usually not clear under which constraints particular implementations actually conform to the formal semantics. In this paper, we bridge this gap for the formal semantics of triple graph grammars (TGG) and an existing efficient implementation. Whereas the formal semantics assumes backtracking and ignores non-determinism, practical implementations do not support backtracking, require rule sets that ensure determinism, and include further optimizations. Therefore, we capture how the considered TGG implementation realizes the transformation by means of operational rules, define required criteria, and show conformance to the formal semantics if these criteria are fulfilled. We further outline how static analysis can be employed to guarantee these criteria.
Proceedings Of The Fall 2010 Future SOC Lab Day
Author : Christoph Meinel
language : en
Publisher: Universitätsverlag Potsdam
Release Date : 2011
Proceedings Of The Fall 2010 Future SOC Lab Day was written by Christoph Meinel and published by Universitätsverlag Potsdam in 2011 in the Computers category.
In cooperation with industry partners, the Hasso-Plattner-Institut (HPI) is establishing the "HPI Future SOC Lab", which provides a complete infrastructure of highly complex on-demand systems on the latest massively parallel (multi-/many-core) hardware not yet available on the market, with enormous main memory capacities, together with software designed for it. The HPI Future SOC Lab has prototype 4-way and 8-way Intel 64-bit server systems from Fujitsu and Hewlett-Packard with 32 and 64 cores, respectively, and 1 to 2 TB of main memory. High-performance storage systems from EMC2 and virtualization solutions from VMware are also in use. SAP provides its latest Business by Design (ByD) software, and complex real-world enterprise data are also available for research purposes. Interested researchers from university and non-university research institutions can use the HPI Future SOC Lab to study future highly complex IT systems, develop new ideas, data structures, and algorithms, and pursue them through to practical testing. This technical report presents first results of the research projects started in connection with the opening of the Future SOC Lab in June 2010. Selected projects presented their results on 27 October 2010 at the Future SOC Lab Day event.
Advancing The Discovery Of Unique Column Combinations
Author : Ziawasch Abedjan
language : en
Publisher: Universitätsverlag Potsdam
Release Date : 2011
Advancing The Discovery Of Unique Column Combinations was written by Ziawasch Abedjan and published by Universitätsverlag Potsdam in 2011 in the Computers category.
Unique column combinations of a relational database table are sets of columns that contain only unique values. Discovering such combinations is a fundamental research problem and has many different data management and knowledge discovery applications. Existing discovery algorithms are either brute force or have a high memory load and can thus be applied only to small datasets or samples. In this paper, the well-known GORDIAN algorithm and "Apriori-based" algorithms are compared and analyzed for further optimization. We greatly improve the Apriori algorithms through efficient candidate generation and statistics-based pruning methods. A hybrid solution HCAGORDIAN combines the advantages of GORDIAN and our new algorithm HCA, and it significantly outperforms all previous work in many situations.
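As an illustration of what is being searched for, here is a minimal sketch of a plain level-wise (bottom-up) search for minimal unique column combinations. It is neither GORDIAN nor HCA, and it omits the candidate generation and statistics-based pruning the abstract mentions; the function names and the toy table are assumptions.

```python
from itertools import combinations


def is_unique(rows, cols):
    """True if no two rows agree on all columns in `cols`."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True


def minimal_uniques(rows, num_cols):
    """Enumerate column index combinations by increasing size and keep
    those that are unique and have no unique proper subset."""
    found = []
    for size in range(1, num_cols + 1):
        for cols in combinations(range(num_cols), size):
            # A superset of a known unique is unique but not minimal.
            if any(set(u) <= set(cols) for u in found):
                continue
            if is_unique(rows, cols):
                found.append(cols)
    return found


# Hypothetical toy table: (name, city, group).
rows = [
    ("alice", "berlin", 1),
    ("alice", "potsdam", 2),
    ("bob", "berlin", 2),
]

print(minimal_uniques(rows, 3))
```

The number of candidates grows exponentially with the number of columns, which is why brute-force approaches of this kind are limited to small tables.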
Covering Or Complete
Author : Jana Bauckmann
language : en
Publisher: Universitätsverlag Potsdam
Release Date : 2012
Covering Or Complete was written by Jana Bauckmann and published by Universitätsverlag Potsdam in 2012 in the Computers category.
Data dependencies, or integrity constraints, are used to improve the quality of a database schema, to optimize queries, and to ensure consistency in a database. In recent years, conditional dependencies have been introduced to analyze and improve data quality. In short, a conditional dependency is a dependency with a limited scope defined by conditions over one or more attributes. Only the matching part of the instance must adhere to the dependency. In this paper we focus on conditional inclusion dependencies (CINDs). We generalize the definition of CINDs, distinguishing covering and completeness conditions. We present a new use case for such CINDs showing their value for solving complex data quality tasks. Further, we define quality measures for conditions inspired by precision and recall. We propose efficient algorithms that identify covering and completeness conditions conforming to given quality thresholds. Our algorithms choose not only the condition values but also the condition attributes automatically. Finally, we show that our approach efficiently provides meaningful and helpful results for our use case.
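The abstract does not give the definitions of its quality measures, so the following is only a sketch of how precision- and recall-style scores for a condition could be computed. The function `condition_quality`, the row layout, and the two formulas are assumptions for illustration and are not taken from the report.

```python
def condition_quality(rows, condition, included):
    """Score a condition against an inclusion test.

    Assumed measures (not the report's definitions):
    precision = share of condition-matching rows that are included,
    recall    = share of included rows that match the condition.
    """
    selected = [r for r in rows if condition(r)]
    covered = [r for r in rows if included(r)]
    hits = [r for r in selected if included(r)]
    precision = len(hits) / len(selected) if selected else 0.0
    recall = len(hits) / len(covered) if covered else 0.0
    return precision, recall


# Hypothetical data: `ref` should point into the referenced key set.
referenced_keys = {"p1", "p2"}
rows = [
    {"type": "person", "ref": "p1"},
    {"type": "person", "ref": "p2"},
    {"type": "person", "ref": "x9"},
    {"type": "place", "ref": "p1"},
    {"type": "place", "ref": "z3"},
]

precision, recall = condition_quality(
    rows,
    condition=lambda r: r["type"] == "person",
    included=lambda r: r["ref"] in referenced_keys,
)
print(precision, recall)
```

Under these assumed measures, a threshold on each score would decide whether a candidate condition is kept, in the spirit of the quality thresholds the abstract mentions.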
Business Process Model Abstraction
Author : Sergey Smirnov
language : en
Publisher: Universitätsverlag Potsdam
Release Date : 2010
Business Process Model Abstraction was written by Sergey Smirnov and published by Universitätsverlag Potsdam in 2010 in the Computers category.
Business process management aims at capturing, understanding, and improving work in organizations. The central artifacts are process models, which serve different purposes. Detailed process models are used to analyze concrete working procedures, while high-level models show, for instance, handovers between departments. To provide different views on process models, business process model abstraction has emerged. While several approaches have been proposed, a number of abstraction use cases that are both relevant for industry and scientifically challenging are yet to be addressed. In this paper we systematically develop, classify, and consolidate different use cases for business process model abstraction. The reported work is based on a study with BPM users in the health insurance sector and validated with a BPM consultancy company and a large BPM vendor. The identified fifteen abstraction use cases reflect the industry demand. The related work on business process model abstraction is evaluated against the use cases, which leads to a research agenda.
Pattern Matching For An Object Oriented And Dynamically Typed Programming Language
Author : Felix Geller
language : en
Publisher: Universitätsverlag Potsdam
Release Date : 2010
Pattern Matching For An Object Oriented And Dynamically Typed Programming Language was written by Felix Geller and published by Universitätsverlag Potsdam in 2010 in the Computers category.
Pattern matching is a well-established concept in the functional programming community. It provides the means for concisely identifying and destructuring values of interest. This enables a clean separation of data structures and respective functionality, as well as dispatching functionality based on more than a single value. Unfortunately, expressive pattern matching facilities are seldom incorporated in present object-oriented programming languages. We present a seamless integration of pattern matching facilities in an object-oriented and dynamically typed programming language: Newspeak. We describe language extensions to improve the practicability and integrate our additions with the existing programming environment for Newspeak. This report is based on the first author’s master’s thesis.
State Propagation In Abstracted Business Processes
Author : Sergey Smirnov
language : en
Publisher: Universitätsverlag Potsdam
Release Date : 2011
State Propagation In Abstracted Business Processes was written by Sergey Smirnov and published by Universitätsverlag Potsdam in 2011 in the Computers category.
Business process models are abstractions of concrete operational procedures that occur in the daily business of organizations. To cope with the complexity of these models, business process model abstraction has been introduced recently. Its goal is to derive from a detailed process model several abstract models that provide a high-level understanding of the process. While techniques for constructing abstract models are reported in the literature, little is known about the relationships between process instances and abstract models. In this paper we show how the state of an abstract activity can be calculated from the states of related, detailed process activities as they happen. The approach uses activity state propagation. With state uniqueness and state transition correctness, we introduce formal properties that improve the understanding of state propagation. Algorithms to check these properties are devised. Finally, we use behavioral profiles to identify and classify behavioral inconsistencies in abstract process models that might occur once activity state propagation is used.
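The abstract does not spell out the propagation rules, so the following is only a sketch of the general idea of deriving an abstract activity's state from the states of its detailed activities. The three state names and the rule in `propagate_state` are hypothetical and are not the rules defined in the report.

```python
def propagate_state(detailed_states):
    """Derive the state of an abstract activity from the states of the
    detailed activities it aggregates (hypothetical rule set)."""
    states = set(detailed_states)
    if not states:
        raise ValueError("an abstract activity needs at least one detailed activity")
    unknown = states - {"init", "running", "terminated"}
    if unknown:
        raise ValueError(f"unknown states: {sorted(unknown)}")
    if states == {"init"}:
        return "init"          # nothing has started yet
    if states == {"terminated"}:
        return "terminated"    # everything has finished
    return "running"           # any mixture counts as in progress


print(propagate_state(["init", "init"]))
print(propagate_state(["terminated", "init"]))
print(propagate_state(["terminated", "terminated"]))
```

Because this rule maps every combination of detailed states to exactly one abstract state, it illustrates the kind of property the abstract calls state uniqueness, under the assumptions stated above.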