\section{Ontology bootstrapping using direct mapping}

As its name suggests, the direct mapping approach is relatively simple
and straightforward.
Direct mapping is currently a \name{W3C} recommendation that defines the
production of an \name{RDF} graph, % TODO
called the \emph{direct graph}, from a relational database \cite{dirm}.
In fact, the main definition of the direct graph, excluding the
definitions of rather trivial subcomponents, fits on one computer screen.

The direct graph contains all data held in the source database, but it
does not contain additional schema information such as uniqueness or
non-null constraints on columns \cite{dirm}.

\subsection{Overview of the direct graph}
\label{dirm_overview}

The constitution of the direct graph is illustrated in
Figure~\ref{dirm_fig_direct_graph}.
Its basic components are, for each row, the \emph{row type triple},
the \emph{literal triple}s and the \emph{reference triple}s of that row.
Here, the row type triple encodes which table the respective row
belongs to, the literal triples encode the data in non-foreign-key
columns and the reference triples encode the data in foreign key
columns.

These triples are then united step by step to form the direct graph:
the triples of each row form the row graph, the row graphs of
each table form the table graph, and all table graphs united
constitute the final direct graph.
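
The stepwise union described above can also be written compactly.
The symbols are notation introduced here for illustration and are not
taken from \cite{dirm}: for a row $r$, let $t_r$ denote its row type
triple, $L_r$ the set of its literal triples and $R_r$ the set of its
reference triples.
The row graph $G_r$, the table graph $G_T$ of a table $T$ and the
direct graph $G$ of a database with table set $\mathcal{T}$ are then
\[
 G_r = \{t_r\} \cup L_r \cup R_r, \qquad
 G_T = \bigcup_{r \in T} G_r, \qquad
 G = \bigcup_{T \in \mathcal{T}} G_T .
\]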

\begin{figure}
 \centering
 \includegraphics[scale=0.86]{Images/direct_graph.pdf}
 \caption[Constitution of the \emph{direct graph}]{
 Constitution of the \emph{direct graph}.
 ``$\rightarrow$'' means ``is part of''.}
 \label{dirm_fig_direct_graph}
\end{figure}

Carefully assigning IRIs to the \name{RDF} entities is an essential
part of the approach, since name clashes can occur otherwise.
Indeed, there appears to be a corner case that the recommendation does
not consider.
For details on IRI generation in direct mapping, see the subsection on
IRI generation below.

\subsection{Data representation in direct mapping}
Since the result of a direct mapping is an \name{RDF} graph, the
means to represent data are limited to valid \name{RDF} constructs.
This poses no problem for IRIs and for triples that only involve IRIs
(row type triples and reference triples).

To encode the non-foreign-key data content of the source database,
that is, the literal triples, direct mapping relies on \name{R2RML},
a language for expressing mappings from the relational data model to
the \name{RDF} data model \cite{r2rml}.
\name{R2RML} mappings are themselves expressed as \name{RDF} statements.
The data value contained in a direct mapping literal triple is
defined to be the \emph{R2RML natural RDF literal} representation of
the value \cite{dirm}, which, as the name suggests, is a single
\name{RDF} literal \cite{r2rml}.

\subsection{IRI generation in direct mapping}

As all \name{RDF} triples generated by the direct mapping are simply
united to constitute the final \emph{direct graph}
(see Section~\fullref{dirm_overview} for details),
a sensible assignment of IRIs is vital to the functioning of the approach.
By design, IRIs for different kinds of entities are structured
differently.
This prevents name clashes between entities of different kinds, but it
also implies that, whenever a clash does occur, all entities sharing the
conflicting IRI are of the same kind, which carries a high risk of
producing ambiguous information and thus losing data.

The relatively simple way IRIs are assigned in direct mapping is
as follows \cite{dirm}:
\begin{itemize}
 \item Table IRIs correspond to the table name.
 \item Literal property IRIs consist of the table name and the
 column name, separated by a hash character (`\#').
 \item Reference property IRIs consist of the name of the referencing
 table (the table that holds the foreign key), the string
 ``\#ref-'' and the names of the referencing columns of the
 respective foreign key, separated by semicolons (`;').
 \item All contained names are included in their percent-encoded
 form. % TODO
\end{itemize}

The encoding of reference property IRIs can lead to name clashes in the
corner case that multiple foreign keys exist on exactly the
same columns, which is allowed, for example, in \name{MySQL}. % TODO
To remove this flaw, the name of the referenced table and the names of
the referenced columns of the respective foreign key must also be
included in the IRI.
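
As an illustration, consider a hypothetical table \texttt{orders} that
holds two foreign keys on the same column \texttt{cust}, one pointing to
a table \texttt{customers} and one to a table \texttt{archive}
(all of these names are assumptions made for this example and do not
stem from \cite{dirm}).
Writing $\mathrm{enc}(x)$ for the percent-encoded form of a name $x$ and
$\cdot$ for string concatenation, both foreign keys receive the
reference property IRI
\[
 \mathrm{enc}(\texttt{orders}) \cdot \texttt{\#ref-} \cdot
 \mathrm{enc}(\texttt{cust}) \;=\; \texttt{orders\#ref-cust},
\]
because neither \texttt{customers} nor \texttt{archive} enters the
construction.
The two reference properties are therefore indistinguishable in the
direct graph, whereas an IRI that also carried the name of the other
table involved would tell them apart.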