\chapter{Introduction}
\label{intro}
\label{motivation}
As estimated in 2007 \cite{deepweb}, publicly available databases contained
up to 500 times more data than the static web, and roughly 70 \% of all
websites were backed by relational databases at that time.
As hardware has become cheaper yet more powerful, open source tools have
become more widespread, and the web has become more dynamic
and interactive, it is likely that these numbers have increased further since then.
This makes the publication of available data in a structured, machine-processable
form and its retrieval with suitable software an interesting topic.
The most important formalism for representing structured data without the need for
a fixed (database) schema is the ontology, and thus this approach is known
as ``Ontology-based data access'' (``OBDA'').

The vision of a machine-processable web emerged as early as 1989 \cite{web}
and was given the name ``semantic web''
by Tim Berners-Lee in 1999 \cite{weavingweb}.
The automatic translation of relational databases to \name{RDF}
\cite{rdf} or similar representations of structured information is
an integral part of the success of the semantic web \cite{deepweb}.
This automatic translation process is commonly called ``bootstrapping''.

Today, the pure bootstrapping process is a relatively well-understood topic,
with approaches ranging from the rather simple direct mapping \cite{dirm} to TODO.
On the other hand, handling the complexity introduced by these approaches
and using sophisticated tools to perform various related tasks
has meanwhile become a significant challenge in its own right \cite{eng}.
Besides the parametrization of the tools in use, this includes the management of
the various kinds of artifacts that accrue during the process, possibly needed in
different versions and formats for use with different tools and output formats,
while also taking changing input data into account \cite{eng}.
Skjæveland et al.\ therefore suggested an approach using a
declarative description of the data to be mapped, concentrating in one place
all the information needed to coordinate the bootstrapping process
and to drive the entire tool chain \cite{eng}.

\label{approach}
This thesis describes the development of a specification language to serialize
the declarative specification of the bootstrapping process
(see Section~\fullref{motivation}) and of
software that in turn bootstraps this specification from a relational database schema.
Named after the tasks they accomplish,
the specification language was called ``OBDA Specification Language'' (``OSL'')
and the software bootstrapping the specification was called ``db2osl''.

Furthermore, this thesis suggests a scheme for generating the IRIs that
occur in OBDA specifications, identifying their parts \cite{eng}.
So far, IRI generation has only been illustrated by example, which leaves room
for improvement: a simple and straightforward approach can be used to generate
IRIs for all constituents of OBDA specifications without introducing
name clashes in corner cases.
This approach is described in Section~\fullref{iris}.

Using a declarative specification makes the entire bootstrapping process a
two-step procedure, illustrated in Figure~\ref{intro_fig_bootstrapping}:
First, the OBDA specification is derived from the
database schema using \myprog{}.
It specifies the actual bootstrapping process in a very general way,
so it only has to be recreated when the database schema changes.
Second, the OBDA specification is used to coordinate and drive the
actual bootstrapping process.
Software that uses the OBDA specification
to perform this second step is currently under development.
It will be parameterizable to support different output
formats, tools, tool versions and application areas.
\begin{figure}\begin{center}
	\includegraphics[scale=0.9]{Images/bootstrapping_illustration.pdf}
	\caption[Illustration of the overall
		bootstrapping process]{Illustration of the overall
		bootstrapping process using a declarative OBDA specification}
	\label{intro_fig_bootstrapping}
\end{center}\end{figure}
\section{Requirements and goals}
The final system shall fit cleanly into existing bootstrapping systems
while being easy to use, relieving its users of the burden of dealing with \osl{} specifications
manually instead of adding even more complexity to the process.
To achieve these goals, existing tools, languages and conventions were
used wherever possible.
For example, the \osllong{} was defined to be a subset of \name{OWL}.
This facilitates meeting the objective of a powerful, easy-to-use, flawless and
well-documented language that can be extended and handled by existing tools.

To fit into the environment of the \name{OPTIQUE} project \cite{optique2},
of which it is ultimately a part, the bootstrapping software was written in \name{Java}.
Care was taken to design it to be modular and flexible, making it
usable not only as a whole but also as a collection of independent components,
possibly serving as the basis for a program library in the future.
To achieve this aim and to make the software easier to
understand and extend, it was documented carefully and thoroughly.

As the software will be maintained by various people after its development and will
likely be subject to changes, general code quality was also a concern.
Following good object-oriented software development practice \cite{str3},
real-world artifacts like database schemata, database tables, columns, keys,
and OBDA specifications were modeled as software objects, each provided with a
carefully chosen set of operations to manipulate them and make them collaborate.
This approach and other measures aimed at producing clean code are described in more
detail in Section~\fullref{code}, while
the resulting structure of the software is discussed in Section~\fullref{arch}.