\chapter{Introduction}
The introduction presents the problem, outlines the motivation and
objectives as well as the planned approach and the intended
results, and should span roughly 1--2 pages.\\

\section{Motivation}
As estimated in 2007 \cite{deepweb}, publicly available databases contained
up to 500 times more data than the static web, and roughly 70\,\% of all
websites were backed by relational databases at that time.
Since hardware has become cheaper yet more powerful, open source tools have
become more widespread, and the web has grown more dynamic
and interactive, these numbers have likely increased further since then.
This makes the publication of available data in a structured,
machine-processable form
and its retrieval with suitable software (ontology-based data access, OBDA)
an interesting topic.
This vision emerged as early as TODO and was termed the ``semantic web''
by Tim Berners-Lee \cite{thesemanticweb}.
The automatic translation of relational databases to RDF
or similar representations of structured information is
an integral part of the success of the semantic web \cite{deepweb}.
This automatic translation process is commonly called ``bootstrapping''.

Early work on the development of bootstrapping systems includes TODO.
Today, the pure translation process is relatively well understood,
with approaches ranging from the rather simple direct mapping \cite{dirm} to TODO.
However, handling the complexity introduced by these approaches
and using sophisticated tools to perform the various related tasks
has meanwhile become a significant challenge in its own right \cite{eng}.
Besides the parametrization of the tools in use, this includes managing
the several kinds of artifacts that accrue during the process, which may be needed in
different versions and formats for different tools and output formats,
while also taking changing input data into account \cite{eng}.
Skjæveland and others therefore suggested an approach using a
declarative description of the data to be mapped, which concentrates in one place
all the information needed to coordinate the bootstrapping process
and to drive the entire tool chain \cite{eng}.

\section{Approach}
This thesis describes the development of a specification language to serialize
the declarative specification of the bootstrapping process and of
software that in turn bootstraps this specification from a relational database schema.
Named after the tasks they accomplish,
the specification language was called ``OBDA Specification Language'' (``OSL'')
and the software bootstrapping the specification was called ``db2osl''.

Using a declarative specification makes the entire bootstrapping process a
two-step procedure: First, the OBDA specification is derived from the
database schema using \myprog{}.
This specification describes the actual bootstrapping process in a very general way,
so it only has to be recreated when the database schema changes.
The second step is to use the OBDA specification to coordinate and drive the
actual bootstrapping process.
The development of software that uses the OBDA specification
to perform this second step is currently ongoing work.
It will be parameterizable to support different output
formats, tools, tool versions and application ranges.
\\\\TODO: illustration of overall process

\section{Requirements and goals}
The final system shall fit cleanly into existing bootstrapping systems
while being easy to use, relieving its users of the burden of dealing with \osl{} specifications
manually instead of adding even more complexity to the process.
To achieve the former goal, existing tools, languages and conventions were
used wherever possible.
To fit into the environment used in the OPTIQUE project TODO, of which it is ultimately
a part, \name{Java} was used for the bootstrapping software.
Care was taken to design it to be modular and flexible, making it
usable not only as a whole but also as a collection of independent components,
possibly serving as the basis for a program library in the future.
To further support this aim TODO and to make the software more easily
understandable and extensible, it was documented carefully and thoroughly.

As the software will be maintained by various people after its development and will
likely be subject to changes, general code quality was also a concern.
Following good object-oriented software development practice TODO,
real-world artifacts like database schemas, database tables, columns, keys,
OBDA specifications et cetera were modeled as software objects and provided with a
carefully chosen set of operations to manipulate them and make them collaborate.
Sparse, informative comments were inserted at places of higher complexity and to
expose logical subdivisions, but care was taken to prefer self-explanatory code
(``speaking code'') over excessive comments.
Complex, misleading and hard-to-use interfaces were avoided wherever possible.
External software libraries were chosen to be stable, specific,
well structured, publicly available and ideally in wide use.