\section{Ontology bootstrapping using direct mapping}
\label{dirm}
TODO: more, individuals $\leftarrow$ data
TODO: no alternative approach

As its name suggests, direct mapping is a relatively simple
and straightforward approach.
Direct mapping is a \name{W3C} recommendation, which defines the
production of an \name{RDF} graph TODO -- which is called the
\emph{direct graph} -- from a relational database, that is, from its
schema and the data it contains \cite{dirm}.
In fact, the main definition of the direct graph, excluding the
definitions of rather trivial subcomponents, fits on a single screen.

The direct graph contains all data held in the source database, but it
does not contain additional schema information such as uniqueness or
non-null constraints on columns \cite{dirm}.

\subsection{Overview of the direct graph}
\label{dirm_overview}
The structure of the direct graph is illustrated in
Figure~\ref{dirm_fig_direct_graph}.
Its basic components are, for each row, the \emph{row type triple},
its \emph{literal triple}s and its \emph{reference triple}s.
Here, the row type triple encodes which table the respective row
belongs to, the literal triples encode the data in non-foreign-key
columns and the reference triples encode the data in foreign key
columns.
These triples are then combined step by step into the direct graph:
the triples of each row form its row graph, the row graphs of
each table form the table graph, and the union of all table graphs
constitutes the final direct graph.
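
The composition described above can be sketched in \name{Python} as follows.
This is an illustration under simplifying assumptions, not the normative
definition from \cite{dirm}: the database is a hypothetical in-memory
dictionary, every table is assumed to have a primary key, every foreign key is
assumed to reference the primary key of its parent table, and IRIs are kept
relative to an implicit base IRI. The row IRI scheme (table name, `/', then
\texttt{column=value} pairs separated by `;') and the use of
\texttt{urllib.parse.quote} for percent-encoding are likewise assumptions of
this sketch. Following the description above, literal triples are only
generated for columns that are not part of a foreign key, and NULL values
yield no triples.

```python
from typing import NamedTuple
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Literal(NamedTuple):
    """Wraps a literal value so it cannot be confused with an IRI string."""
    value: object


def enc(name):
    # Assumed approximation of percent-encoding: escape all but unreserved chars.
    return quote(str(name), safe="")


def row_iri(table, columns, values):
    # Assumed row IRI scheme, e.g. "People/ID=7" (relative to a base IRI).
    pairs = ";".join(f"{enc(c)}={enc(v)}" for c, v in zip(columns, values))
    return f"{enc(table)}/{pairs}"


def row_graph(db, table, row):
    """Row type triple, literal triples and reference triples of one row."""
    spec = db[table]
    subject = row_iri(table, spec["pk"], [row[c] for c in spec["pk"]])
    triples = {(subject, RDF_TYPE, enc(table))}           # row type triple
    fk_columns = {c for cols, _, _ in spec["fks"] for c in cols}
    for column, value in row.items():                     # literal triples
        if column not in fk_columns and value is not None:
            predicate = f"{enc(table)}#{enc(column)}"
            triples.add((subject, predicate, Literal(value)))
    for cols, parent, parent_pk in spec["fks"]:          # reference triples
        values = [row[c] for c in cols]
        if None not in values:  # a NULL foreign key yields no reference triple
            predicate = f"{enc(table)}#ref-" + ";".join(enc(c) for c in cols)
            triples.add((subject, predicate, row_iri(parent, parent_pk, values)))
    return triples


def table_graph(db, table):
    """Union of the row graphs of all rows of one table."""
    return set().union(*(row_graph(db, table, r) for r in db[table]["rows"]))


def direct_graph(db):
    """Union of all table graphs."""
    return set().union(*(table_graph(db, t) for t in db))


# Hypothetical example database.
db = {
    "Addresses": {"pk": ["ID"], "fks": [],
                  "rows": [{"ID": 18, "city": "Cambridge"}]},
    "People": {"pk": ["ID"], "fks": [(["addr"], "Addresses", ["ID"])],
               "rows": [{"ID": 7, "fname": "Bob", "addr": 18},
                        {"ID": 8, "fname": "Sue", "addr": None}]},
}

for triple in sorted(direct_graph(db), key=str):
    print(triple)
```

Since the graphs are plain sets, the step-by-step union mirrors the
``is part of'' relation between row graphs, table graphs and the direct graph.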

\begin{figure}[H]\begin{center}
 \includegraphics[scale=0.86]{Images/direct_graph.pdf}
 \caption[Constitution of the \emph{direct graph}]{
 Constitution of the \emph{direct graph}.
 ``$\rightarrow$'' means ``is part of''.}
 \label{dirm_fig_direct_graph}
\end{center}\end{figure}

Carefully assigning IRIs to the \name{RDF} entities is an essential
part of the approach, since otherwise name clashes can occur.
Even so, there seems to be a corner case which was not considered
in the specification.
For details on IRI generation in direct mapping, see
Section~\ref{dirm_iris}.

\subsection{Data representation in direct mapping}
Since the result of a direct mapping is an \name{RDF} graph, the
means to represent data are limited to valid \name{RDF} vocabulary.
This poses no problem for IRIs and for triples that only involve IRIs,
namely row type triples and reference triples.

To encode the non-foreign-key data content of the source database,
that is, the objects of the literal triples, direct mapping relies on
\name{R2RML}, a language for expressing mappings from the relational
data model to the \name{RDF} data model \cite{r2rml}.
\name{R2RML} mappings are themselves expressed as \name{RDF} statements.
The data value contained in a direct mapping literal triple is
defined to be the \emph{R2RML natural RDF literal} representation of
the value \cite{dirm}, which, as the name suggests, is a single
\name{RDF} literal \cite{r2rml}.
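
The following sketch illustrates what natural \name{RDF} literals look like
for a few common value types. It is a simplification under stated
assumptions: \name{Python} values stand in for \name{SQL} values (as a
hypothetical database driver might return them), the function name
\texttt{natural\_literal} is made up, only some \name{SQL} types are covered,
and canonical lexical forms are only approximated. The type correspondence
(integers to \texttt{xsd:integer}, booleans to \texttt{xsd:boolean}, dates
and timestamps to \texttt{xsd:date} and \texttt{xsd:dateTime}, binary strings
to \texttt{xsd:hexBinary}, character strings to plain literals) follows the
natural mapping of \name{SQL} values in \cite{r2rml}.

```python
import datetime

XSD = "http://www.w3.org/2001/XMLSchema#"


def natural_literal(value):
    """Return (lexical form, datatype IRI) for a non-NULL value.

    A datatype of None stands for a plain literal (character strings).
    Simplified sketch: Python types stand in for SQL types.
    """
    if isinstance(value, bool):  # must precede int: bool subclasses int
        return ("true" if value else "false", XSD + "boolean")
    if isinstance(value, int):
        return (str(value), XSD + "integer")
    if isinstance(value, datetime.datetime):  # must precede date
        return (value.isoformat(), XSD + "dateTime")
    if isinstance(value, datetime.date):
        return (value.isoformat(), XSD + "date")
    if isinstance(value, bytes):
        return (value.hex().upper(), XSD + "hexBinary")
    return (str(value), None)  # e.g. CHAR/VARCHAR: plain literal


print(natural_literal(7))
print(natural_literal("Bob"))
print(natural_literal(datetime.date(2012, 9, 27)))
```

In every case the result is a single literal, consisting of a lexical form
and, for non-string values, a datatype IRI.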

\subsection{IRI generation in direct mapping}
\label{dirm_iris}
As all \name{RDF} triples generated by the direct mapping are simply
united to constitute the final \emph{direct graph}
(see Section~\fullref{dirm_overview} for details),
a sensible assignment of IRIs is vital to the functioning of the approach.
By design, IRIs for different kinds of entities are structured
differently. On the one hand, this prevents name clashes between
entities of different kinds; on the other hand, it implies that
whenever a clash does occur, all entities sharing the conflicting IRI
are of the same kind, so that the clash is likely to produce ambiguous
information and thus a loss of data.

The relatively simple scheme by which IRIs are assigned in direct
mapping is as follows \cite{dirm}:
\begin{itemize}
 \item Table IRIs correspond to the table name.
 \item Literal property IRIs consist of the table name and the
 column name, separated by a hash character (`\#').
 \item Reference property IRIs consist of the child table name, the
 string ``\#ref-'' and the child table column names of the
 respective foreign key, separated by a semicolon (`;').
 \item All contained names are included in their percent-encoded
 form TODO.
\end{itemize}
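
The rules listed above can be sketched as a handful of string functions.
This is an illustration only: the function names and the optional
\texttt{base} parameter (standing for a base IRI that the generated IRIs are
prefixed with) are hypothetical, and the exact set of characters that
\cite{dirm} percent-encodes is approximated by \name{Python}'s
\texttt{urllib.parse.quote} with an empty \texttt{safe} argument, which leaves
only letters, digits and the characters `\texttt{-}', `\texttt{.}',
`\texttt{\_}' and `\texttt{\textasciitilde}' unescaped.

```python
from urllib.parse import quote


def pct(name):
    # Approximation (assumption): escape everything except unreserved
    # characters, so '#', ';', '/' and spaces inside names are escaped.
    return quote(name, safe="")


def table_iri(table, base=""):
    return base + pct(table)


def literal_property_iri(table, column, base=""):
    return base + pct(table) + "#" + pct(column)


def reference_property_iri(child_table, child_columns, base=""):
    # Child table name, "#ref-", then the foreign key's child columns
    # in order, separated by ';'.
    return (base + pct(child_table) + "#ref-"
            + ";".join(pct(c) for c in child_columns))


base = "http://example.org/db/"  # hypothetical base IRI
print(table_iri("People", base))
print(literal_property_iri("People", "fname", base))
print(reference_property_iri("People", ["deptName", "deptCity"], base))
```

Escaping `\#' and `;' inside names keeps the separators in the generated
IRIs unambiguous.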

The encoding of reference property IRIs can lead to name clashes in
cases where multiple foreign keys of the same table consist of exactly
the same columns, which is allowed, for example, in \name{SQL} TODO.
Since all of these foreign keys are then represented by the same
property, it is no longer possible to tell which reference triple
belongs to which foreign key.
To remove this flaw, the parent table name and the parent table column
names of the respective foreign key must also be included in the IRI.
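
The clash and the proposed remedy can be illustrated with a short sketch.
It uses a hypothetical table \texttt{Orders} whose column \texttt{cust} is
the child column of two foreign keys, one referencing \texttt{Customers(ID)}
and one referencing \texttt{Persons(ID)}. The concrete syntax of the extended
IRI, with the parent table and its columns appended after an `@' in the form
\texttt{Parent(col1;col2)}, is an assumption made only for this illustration;
the text above merely requires that the parent table name and the parent
table column names be included. Percent-encoding is again approximated by
\texttt{urllib.parse.quote}.

```python
from urllib.parse import quote


def pct(name):
    # Approximated percent-encoding (assumption); '@', '(', ')' and ';'
    # inside names are escaped, so they can serve as separators below.
    return quote(name, safe="")


def reference_property_iri(child_table, child_columns):
    # Scheme as specified: child table and child columns only.
    return pct(child_table) + "#ref-" + ";".join(pct(c) for c in child_columns)


def extended_reference_property_iri(child_table, child_columns,
                                    parent_table, parent_columns):
    # Hypothetical extension: additionally encode the parent table and the
    # parent columns, e.g. "Orders#ref-cust@Customers(ID)".
    parent_part = pct(parent_table) + "(" + ";".join(
        pct(c) for c in parent_columns) + ")"
    return reference_property_iri(child_table, child_columns) + "@" + parent_part


# Two foreign keys on the same child column, referencing different tables.
foreign_keys = [(["cust"], "Customers", ["ID"]), (["cust"], "Persons", ["ID"])]

plain = {reference_property_iri("Orders", cols) for cols, _, _ in foreign_keys}
extended = {extended_reference_property_iri("Orders", cols, parent, pcols)
            for cols, parent, pcols in foreign_keys}
print(plain)     # a single IRI: both foreign keys clash
print(extended)  # two distinct IRIs
```

Because the separators `@', `(', `)' and `;' are escaped whenever they occur
inside a name, two foreign keys only receive the same extended IRI if they
agree in child table, child columns, parent table and parent columns.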