\newcommand{\ind}{\hspace*{30pt}}
\newcommand{\code}[1]{\texttt{#1}}
\newcommand{\codepar}[1]{\begin{itemize}\item[]\code{#1}\end{itemize}\vspace{12pt}}
-\newcommand{\fullref}[1]{\ref{#1} -- \nameref{#1}}
+\newcommand{\fullref}[1]{\ref{#1}~--~\nameref{#1}}
% Documentclass etc.
\documentclass[\myfontsize,a4paper,twoside=semi]{scrreprt}
language=english]{cs-cover/uni-stuttgart-cs-cover}
% Appendix
-\usepackage{appendix}
+\usepackage[title,titletoc]{appendix}
% Links
%\usepackage{url}
% Appendix
\include{appendix}
-\addcontentsline{toc}{chapter}{Appendix}
% Bibliography bibtex (add pagebackref=true to hyperref options if desired)
%\bibliographystyle{alpha}
\newcommand{\ind}{\hspace*{30pt}}
\newcommand{\code}[1]{\texttt{#1}}
\newcommand{\codepar}[1]{\begin{itemize}\item[]\code{#1}\end{itemize}\vspace{12pt}}
-\newcommand{\fullref}[1]{\ref{#1} -- \nameref{#1}}
+\newcommand{\fullref}[1]{\ref{#1}~--~\nameref{#1}}
% Documentclass etc.
\documentclass[\myfontsize,a4paper,twoside=semi]{scrreprt}
\bibliography{bibliography}
%% Appendix
-%\usepackage{appendix}
+%\usepackage[title,titletoc]{appendix}
% Links
%\usepackage{url}
%% Appendix
%\include{appendix}
-%\addcontentsline{toc}{chapter}{Appendix}
% Bibliography bibtex (add pagebackref=true to hyperref options if desired)
%\bibliographystyle{alpha}
-\chapter*{Appendix}
-This is the place for documentation material such as tables, measurement logs,
-computer logs, design drawings, short program listings
-and the like.
+\begin{appendices}
+\appendixpage
+\chapter{Details on the \myprog{} implementation}
+\section{Package contents (\myprog{})}
+\label{app_pkgs}
+\KOMAoption{fontsize}{\smallerfontsize{}}
+The following table lists the contents of each of \myprog{}'s packages:
+
+\begin{table}[H]
+ \begin{multicols}{2}\begin{itemize} \KOMAoption{fontsize}{\smallfontsize{}}
+ \item \code{bootstrapping}
+ \begin{itemize}
+ \item \code{Bootstrapping}
+ \item \code{DirectMappingURIBuilder}
+ \item \code{URIBuilder}
+ \end{itemize}
+ \item \code{cli}
+ \begin{itemize}
+ \item \code{CLIDatabaseInteraction}
+ \end{itemize}
+ \item \code{database}
+ \begin{itemize}
+ \item \code{Column}
+ \item \code{ColumnSet}
+ \item \code{DatabaseException}
+ \item \code{DBSchema}
+ \item \code{ForeignKey}
+ \item \code{Key}
+ \item \code{PrimaryKey}
+ \item \code{ReadableColumn}
+ \item \code{ReadableColumnSet}
+ \item \code{ReadableForeignKey}
+ \item \code{ReadableKey}
+ \item \code{ReadablePrimaryKey}
+ \item \code{RetrieveDBSchema}
+ \item \code{Table}
+ \item \code{TableSchema}
+ \end{itemize}
+ \item \code{helpers}
+ \begin{itemize}
+ \item \code{Helpers}
+ \item \code{MapValueIterable}
+ \item \code{MapValueIterator}
+ \item \code{ReadOnlyIterable}
+ \item \code{ReadOnlyIterator}
+ \item \code{SQLType}
+ \item \code{UserAbortException}
+ \end{itemize}
+ \newpage
+ \item \code{log}
+ \begin{itemize}
+ \item \code{ConsoleDiagnosticOutputHandler}
+ \item \code{GlobalLogger}
+ \end{itemize}
+ \item \code{main}
+ \begin{itemize}
+ \item \code{Main}
+ \end{itemize}
+ \item \code{osl}
+ \begin{itemize}
+ \item \code{OSLSpecification}
+ \end{itemize}
+ \item \code{output}
+ \begin{itemize}
+ \item \code{ObjectSpecPrinter}
+ \item \code{OSLSpecPrinter}
+ \item \code{SpecPrinter}
+ \end{itemize}
+ \item \code{settings}
+ \begin{itemize}
+ \item \code{Job}
+ \end{itemize}
+ \item \code{specification}
+ \begin{itemize}
+ \item \code{AttributeMap}
+ \item \code{EntityMap}
+ \item \code{IdentifierMap}
+ \item \code{InvalidSpecificationException}
+ \item \code{OBDAMap}
+ \item \code{OBDASpecification}
+ \item \code{RelationMap}
+ \item \code{SubtypeMap}
+ \item \code{TranslationTable}
+ \end{itemize}
+ \item \code{test}
+ \begin{itemize}
+ \item \code{CreateTestDBSchema}
+ \item \code{GetSomeDBSchema}
+ \end{itemize}
+ \end{itemize}\end{multicols} \KOMAoption{fontsize}{\myfontsize{}}
+ \caption{Assignment of classes to packages in \myprog{}}
+ \label{app_tbl_classes}
+\end{table}
+
+\KOMAoption{fontsize}{\myfontsize{}}
+
+\end{appendices}
\chapter{Background and related work}
\section{Background}
+\subsection{Basic concepts}
+\label{back_basic}
+TODO: r2rml, rdf, rdfs, owl, xml, iris, baseiri, end with :
+
\subsection{Ontology-based data access (OBDA)}
-\label{obda}
+\label{back_obda}
TODO: References
Storing data in relational databases is a very common practice, since the
though some additional preparation might be necessary in these cases \cite{eng}.
\subsection{OBDA specifications}
+\label{back_obdaspecs}
+TODO: more, maybe shorten introduction
+
As mentioned in Section~\fullref{motivation}, the sole bootstrapping of
\name{RDF} triples \cite{rdf} or other forms of structured information
from relational database schemata is a relatively well understood topic.
(OBDA) approach to address the data access problem [...] [aiming] at
solutions that reduce the cost of data access dramatically'' \cite{optique}.
Thus, the \name{OPTIQUE} project tries to reach exactly the benefits a
-well-developed OBDA system can provide (explained in Section~\ref{obda}):
+well-developed OBDA system can provide (explained in
+Section~\ref{back_obda}):
an easy end-user access to data without knowing about its structuring
while taking advantage of automatic translations \cite{optique2}.
In doing so, ascertained shortcomings of existing OBDA systems were addressed:
\emph{usability} (for example the need to use formal query languages),
\emph{costly prerequisites} (consider, for example, the disadvantages
-of materialized OBDA described in Section~\ref{obda}) and
+of materialized OBDA described in Section~\ref{back_obda}) and
\emph{efficiency} (which was perceived as being insufficiently addressed
in previous approaches) \cite{optique}.
}
@misc{dirm,
- shorthand = {W3CR12},
- author = {RDB2RDF Working Group},
+ shorthand = {W3CR12a},
+ author = {W3C RDB2RDF Working Group},
title = {A Direct Mapping of Relational Data to RDF},
year = 2012,
howpublished = {\url{https://www.w3.org/TR/rdb-direct-mapping/}},
year={2003}
}
-@inproceedings{crompton,
- title={TODO: Keynote talk at the W3C Workshop on Sem. Web in Oil \& Gas Industry},
+@misc{crompton,
+ title={Keynote talk at the W3C Workshop on Sem. Web in Oil \& Gas Industry},
author={Crompton, J.},
url={http://www.w3.org/2008/12/ogws-slides/Crompton.pdf},
year={2008}
}
@inproceedings{r2o,
- title={TODO: R\textsubscript{2}O, an Extensible and Semantically Based Database-to-ontology Mapping Language},
- author={Barrasa and Corcho and G{\'o}mez-P{\'e}rez},
+ title={R\textsubscript{2}O, an Extensible and Semantically Based Database-to-ontology Mapping Language},
+ author={Barrasa Rodr{\'\i}guez, Jes{\'u}s and Corcho, {\'O}scar and G{\'o}mez-P{\'e}rez, Asunci{\'o}n},
+ booktitle={SWDB'04: 2nd Workshop on Semantic Web and Databases},
+ pages={1069--1070},
+ year={2004},
+ publisher={Springer-Verlag}
+}
+
+@incollection{npd,
+ title={Publishing the Norwegian Petroleum Directorate's FactPages as Semantic Web Data},
+ author={Skj{\ae}veland, Martin G and Lian, Espen H and Horrocks, Ian},
+ booktitle={The Semantic Web -- ISWC 2013},
+ pages={162--177},
+ year={2013},
+ publisher={Springer}
+}
+
+@article{survey,
+ title={Survey of directly mapping SQL databases to the Semantic Web},
+ author={Sequeda, Juan F. and Tirmizi, Syed Hamid and Corcho, Oscar and Miranker, Daniel P.},
+ journal={The Knowledge Engineering Review},
+ volume={26},
+ number={04},
+ pages={445--486},
+ year={2011},
+ publisher={Cambridge University Press}
+}
+
+@inproceedings{ondirm,
+ author = {Sequeda, Juan F. and Arenas, Marcelo and Miranker, Daniel P.},
+ title = {On Directly Mapping Relational Databases to RDF and OWL},
+ booktitle = {Proceedings of the 21st International Conference on World Wide Web},
+ year = {2012},
+ isbn = {978-1-4503-1229-5},
+ location = {Lyon, France},
+ pages = {649--658},
+ numpages = {10},
+ doi = {10.1145/2187836.2187924},
+ acmid = {2187924},
+ publisher = {ACM},
+ address = {New York, NY, USA},
+ keywords = {OWL, RDB2RDF, RDF, SPARQL, SQL, direct mapping, relational databases, semantic web},
+}
+
+@inproceedings{autodirm,
+ title={A Completely Automatic Direct Mapping of Relational Databases to RDF and OWL},
+ author={Sequeda, Juan F. and Arenas, Marcelo and Miranker, Daniel P.},
+ booktitle={International Semantic Web Conference (Posters \& Demos)},
+ publisher={Citeseer},
+ year={2011}
+}
+
+@inproceedings{sac,
+ title={Migrating data-intensive web sites into the semantic web},
+ author={Stojanovic, Ljiljana and Stojanovic, Nenad and Volz, Raphael},
+ booktitle={Proceedings of the 2002 ACM symposium on Applied computing},
+ pages={1100--1107},
+ year={2002},
+ organization={ACM}
+}
+
+@article{linked,
+ title={Linked data -- the story so far},
+ author={Bizer, Christian and Heath, Tom and Berners-Lee, Tim},
+ journal={Semantic Services, Interoperability and Web Applications: Emerging Concepts},
+ pages={205--227},
+ year={2009}
+}
+
+@article{benefits,
+ title={Benefits of Publishing the Norwegian Petroleum Directorate's FactPages as Linked Open Data},
+ author={Skj{\ae}veland, Martin G and Lian, Espen H},
+ journal={Norsk informatikkonferanse (NIK 2013). Tapir},
+ year={2013}
+}
+
+@misc{r2rml,
+ shorthand = {W3CR12b},
+ author = {W3C RDB2RDF Working Group},
+ title = {R2RML: RDB to RDF Mapping Language},
+ year = 2012,
+ howpublished = {\url{https://www.w3.org/TR/r2rml/}},
+ note = {[Accessed: 2016-05-20]}
}
\section{Ontology bootstrapping using direct mapping}
\label{dirm}
+TODO: more, individuals <- data
+TODO: no alternative approach
As its name suggests, the direct mapping approach is a relatively simple
and straightforward approach.
\item Table IRIs correspond to the table name.
\item Literal property IRIs consist of the table name and the
column name, separated by a hash character (`\#').
- \item Reference property IRIs consist of the parent table name, the
- string ``\#ref-'' and the parent table column names of the
+ \item Reference property IRIs consist of the child table name, the
+ string ``\#ref-'' and the child table column names of the
respective foreign key, separated by a semicolon (`;').
\item All contained names are included in their percent-encoded
form TODO.
\end{itemize}
-The encoding of reference property IRIs can lead to name clashes in the
-corner case that multiple foreign keys exist which contain exactly the
-same columns -- which is allowed for example in \name{MYSQL} TODO.
-To remove this flaw, the child table name and the child table column
-names of the respective foreign key also must be included in the IRI.
+The encoding of reference property IRIs can lead to name clashes in
+cases where multiple foreign keys exist which contain exactly the
+same columns -- which is allowed for example in \name{SQL} TODO.
+To remove this flaw, the parent table name and the parent table column
+names of the respective foreign key must also be included in the IRI.
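The clash described above can be reproduced with a few lines of Python. This is a minimal sketch and not part of the thesis software: the function names, the hypothetical tables \code{Person}, \code{Home} and \code{Office}, and the use of \code{urllib.parse.quote} for percent-encoding are assumptions made for illustration only.

```python
from typing import Sequence
from urllib.parse import quote


def pct(name: str) -> str:
    """Percent-encode a name; reserved characters such as '/' are encoded too."""
    return quote(name, safe="")


def table_iri(table: str) -> str:
    # Table IRIs correspond to the (percent-encoded) table name.
    return pct(table)


def literal_property_iri(table: str, column: str) -> str:
    # Table name and column name, separated by '#'.
    return f"{pct(table)}#{pct(column)}"


def reference_property_iri(child_table: str, child_columns: Sequence[str]) -> str:
    # Child table name, "#ref-", and the child column names joined by ';'.
    return f"{pct(child_table)}#ref-" + ";".join(pct(c) for c in child_columns)


# Hypothetical schema: two foreign keys on Person(addr),
# one referencing Home(id), the other referencing Office(id).
fk_to_home = reference_property_iri("Person", ["addr"])
fk_to_office = reference_property_iri("Person", ["addr"])

# The referenced (parent) table is not part of the IRI, so both keys collide.
print(fk_to_home == fk_to_office)
```

Because only the child table and its columns enter the IRI, the two distinct foreign keys cannot be told apart, which is exactly the case the parent table and column names are meant to resolve.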
-\section{Generating unique IRIs for OBDA specification entities}
+\section{Generating unique IRIs for OBDA specification map fields}
\label{iris}
+As explained in Section~\fullref{back_basic}, IRIs play a central role in
+diverse topics related to ontology-based data access.
+They provide the means to uniquely identify entities, which of course is
+a necessity for data retrieval.
+As also explained in Section~\ref{back_basic}, every URI is also an IRI, so
+although Skjæveland et al. use the term ``URI'' in the introduction of
+their approach of using OBDA specifications for ontology bootstrapping --
+and that term is also used in Section~\ref{bootstrap_spec}, which describes
+this approach -- in this section the general term ``IRI'' is used, marking
+that the introduced concepts are valid for all types of IRIs.
+
+When dealing with ontology bootstrapping using OBDA specifications, it is
+important to differentiate between the three types of IRIs occurring
+in this matter, which will be underlined by the following unambiguous naming:
+\begin{itemize}
+ \item \emph{Data IRIs} identify entities in the bootstrapped ontology
+ \item \emph{OBDA IRIs} are used as values for the fields of
+ OBDA specification entities
+ \item \emph{OSL IRIs} identify components in serialized OBDA specifications,
+ using the \oslboth{} introduced in Chapter~\ref{osl} for serialization
+\end{itemize}
+
+Skjæveland et al. do not define or assume a particular scheme for IRI generation
+in their introduction of OBDA specifications \cite{eng}.
+Instead, the IRI generation strategy is only hinted at by giving examples
+of entities having IRIs.
+The exemplified scheme was used for the implementation of the \myprog{} software
+bootstrapping OBDA specifications from relational database schemata, which is
+described in this thesis
+(see Section~\fullref{program} and Section~\fullref{impl}).
+The direct mapping approach for ontology bootstrapping described in
+Section~\ref{dirm}, on the other hand, introduces a scheme for
+IRI generation \cite{dirm}, but with this scheme, name clashes can occur,
+as explained in Section~\ref{dirm_iris}.
+The \oslboth{}, finally, defines a proper scheme for OSL IRIs,
+as is explained in Section~TODO.
+
+In the following, an enhanced scheme for the generation of OBDA IRIs
+is proposed, which resembles the previously mentioned scheme used for
+OSL IRIs and which also may serve as a blueprint for
+other IRI generation strategies.
+
+\subsection{Requirements for the IRI scheme}
+\label{iris_req}
+As explained in Section~\fullref{back_basic}, the main requirement on an IRI
+generation scheme is uniqueness of the IRIs: no two entities may ever be
+assigned the same IRI, regardless of their kind, of how low the probability
+of a name clash (IRI collision) is or of the conditions leading
+to a name clash.
+Additionally, IRI uniqueness shall be independent of the base IRIs:
+a base IRI shall be arbitrarily selectable for each generation process
+without introducing name clashes, even with IRIs having other base IRIs.
+
+As to OBDA specification entities, the following kinds of IRIs
+have to be available, including IRI patterns:
+\begin{itemize}
+ \item Entity map OWL class IRIs
+ \item Identifier map IRI patterns
+ \item Attribute map OWL property IRIs
+ \item Attribute map IRI patterns
+ \item Relation map OWL property IRIs
+ \item Subtype map (IRI) prefixes
+ \item Subtype map (IRI) suffixes
+ \item Subtype map OWL superclass IRIs
+\end{itemize}
+
+As Subtype map OWL superclass IRIs are IRIs of data entities
+already existing in the target ontology by some means
+(see Section~\fullref{bootstrap_spec_using}), they do not have to be
+generated and thus are ignored in the following.
+Exactly the same holds for Attribute map IRI patterns.
+Furthermore, this approach creates Subtype map IRI prefixes already leading to
+unique IRIs for Subtype map subclasses and so Subtype map IRI suffixes are
+ignored in the following.
+Since an IRI generation scheme cannot avoid collisions with existing IRIs
+outside its reach and these collisions can easily be prevented,
+for example, by giving them another base IRI (see Section~\fullref{back_basic}),
+this case is excluded from the requirement that no two IRIs must collide
+under any circumstances.
+However, the user shall be able to choose such externally generated IRIs
+from an infinite set of IRIs, while being sure that no name clashes
+will occur.
+
+In summary, the IRI generation scheme must be able to generate
+Entity map OWL class IRIs, Identifier map IRI patterns,
+Attribute map OWL property IRIs, Relation map OWL property IRIs and
+Subtype map IRI prefixes that, regardless of the
+chosen base IRIs, do not clash with one another, while leaving an
+infinite set of predictable IRIs that do not clash with any of the
+generated IRIs.
+
+\subsection{Avoiding name clashes in the IRI scheme}
+\label{iris_clashes}
+Generating unique Entity map OWL class IRIs ignoring base IRIs is not much of
+a problem, assuming database table names are distinct, which is guaranteed in a
+common database system like SQL \cite{sql}.
+Including the table name in an Entity map OWL class IRI is sufficient to
+prevent it from colliding with other IRIs with the same base IRI.
+However, when taking two different base IRIs into account that are used for
+two IRIs created according to this scheme, things get more complicated.\\
+Consider, for example, a database table named ``\code{Persons}'' and a
+table named \\ ``\code{Persons\_\_TABLE\_\_Persons}''.
+Generating an IRI according to the scheme
+``<base:>\code{TABLE\_\_} <table name>'' for each of these tables, using the
+base IRI ``\code{TABLE\_\_Persons\_\_}'' for the first one and the
+empty base IRI for the second one, both tables will get the IRI\\
+``\code{TABLE\_\_Persons\_\_TABLE\_\_Persons}'', although
+the table name was included in the IRI in both cases.
+The problem is that the ``\code{TABLE\_\_}'' string occurring in the
+table name cannot be discriminated from the ``\code{TABLE\_\_}'' string
+added in the course of IRI generation or the ``\code{TABLE\_\_}'' string
+occurring in the base IRI.
+To solve the problem, a marker has to be included in the IRI which definitely
+indicates the beginning of the table name. In addition, this marker will
+uniquely identify Entity map OWL class IRIs.
+For both aims to be achieved, an escape symbol must be used, which makes
+the marker unique at least outside the base IRI part, by escaping the marker
+whenever it occurs in the table name.
+
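The collision from the example above can be checked directly. The following is a minimal sketch of the unescaped scheme ``<base:>\code{TABLE\_\_}<table name>''; the function name \code{naive\_entity\_iri} is an assumption introduced for illustration only.

```python
def naive_entity_iri(base: str, table: str) -> str:
    """Build an Entity map OWL class IRI without any escaping."""
    return f"{base}TABLE__{table}"


# Table "Persons" with the base IRI "TABLE__Persons__" ...
first = naive_entity_iri("TABLE__Persons__", "Persons")
# ... and table "Persons__TABLE__Persons" with the empty base IRI.
second = naive_entity_iri("", "Persons__TABLE__Persons")

print(first)
print(second)
print(first == second)
```

Both calls produce the same string although the table names differ, because nothing in the result indicates where the base IRI ends and the table name begins.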
+Regarding Identifier map IRI patterns, the IRI resulting from the expansion of
+the pattern will contain the column names of the primary key represented by
+the respective Identifier map \cite{eng}.
+Furthermore, the table name of the database table containing that primary
+key has to be included in the IRI pattern, since
+two distinct tables may have primary keys with equally named columns.
+This will make the IRI pattern a unique Identifier map IRI pattern,
+since a database table can be assumed to only have one primary key,
+as is the case in common database systems like SQL \cite{sql}.
+The fact that primary key values are unique for each dataset ensures that
+unique Identifier map IRI patterns expand to unique IRIs.
+Moreover, it has to be ensured that IRIs resulting from the expansion of
+Identifier map IRI patterns do not collide with IRIs of other kinds.\\
+Taking arbitrary and particularly varying base IRIs into account,
+a definite marker has to be included in the IRI pattern and other occurrences
+of this marker in the IRI pattern have to be escaped. This uniquely
+identifies Identifier map IRI patterns and unambiguously
+distinguishes the table name from the rest of the IRI.
+
+Concerning Attribute map OWL property IRIs, they will be unique among their
+kind when they include the column name of the database column they
+represent besides the table name of the table containing it,
+since database table names can be assumed to be distinct and
+column names can be assumed to be unique within a table, which is
+guaranteed in a common database system like SQL \cite{sql}.
+Furthermore, Attribute map OWL property IRIs have to be prevented from
+colliding with IRIs of other kinds.\\
+Taking arbitrary and particularly varying base IRIs into account,
+definite markers have to be included in the IRI and other occurrences
+of these markers in the IRI have to be escaped. This uniquely
+identifies Attribute map OWL property IRIs and unambiguously
+distinguishes the table name and the column name from the rest of the IRI
+and from one another.
+
+Regarding Relation map OWL property IRIs, including the table name and
+the column names of both the foreign key represented by the Relation map
+(or the containing table, respectively) and the referenced key (or its
+containing table, respectively) in the IRI will make it a unique
+Relation map OWL property IRI.
+Note that including only the table name and the column names of the foreign
+key (or its containing table, respectively) would not be sufficient, since
+several distinct foreign keys covering exactly the same columns can exist
+in a table (this is what the IRI generation scheme of the direct mapping
+approach misses). The same applies of course to the referenced table and
+its columns -- several foreign keys can reference them.
+Moreover, these Relation map OWL property IRIs have to be
+prevented from colliding with IRIs of other kinds.\\
+Taking arbitrary and particularly varying base IRIs into account,
+definite markers have to be included in the IRI and other occurrences
+of these markers in the IRI have to be escaped. This uniquely
+identifies Relation map OWL property IRIs and in particular their parts
+providing the table and column names.
+
+Concerning Subtype map IRI prefixes, they must include the column name of
+the database column containing the values to be declared belonging to
+the subclass. Furthermore, since another database table could contain a
+column of the same name, the IRI must include the table name of the
+database table containing the column.
+This will make Subtype map IRI prefixes unique among their kind.
+Note that a Subtype map IRI prefix, similarly to an IRI pattern,
+does not specify the final IRI but is subject to expansion.
+This expansion can yield the same IRI for different data records, which,
+however, is not considered a collision, since this behavior is
+intentional -- every two data records having the same value in
+the respective column, and only those, will get the same IRI.
+Additionally, it has to be ensured that IRIs resulting from such an
+expansion do not collide with IRIs of other kinds.\\
+Taking arbitrary and particularly varying base IRIs into account,
+definite markers have to be included in the IRI prefix and other occurrences
+of these markers in the IRI prefix have to be escaped. This uniquely
+identifies Subtype map IRI prefixes and unambiguously
+distinguishes the table name and the column name from the rest of the IRI
+and from one another.
+
+For an example in which awkwardly -- or maliciously -- chosen base
+IRIs introduce name clashes, see the paragraph about Entity map
+OWL class IRIs at the beginning of this section.
+
+\subsection{The proposed IRI generation scheme}
+\label{iris_scheme}
+This section introduces an IRI generation scheme meeting the requirements
+formulated in Section~\ref{iris_req}.
+
+In this section, the following strings are subsumed under the term
+\emph{marking strings}:\\
+``\code{TABLE\_\_}'', ``\code{TBL\_\_}'', ``\code{PROP\_\_}'',
+``\code{REF\_\_}'' and ``\code{SUBTYPE\_\_}''.\\
+The string built by escaping (prefixing) all occurrences of marking strings or
+`\textasciitilde' characters in a string $s$ with a `\textasciitilde'
+character will be called the \emph{IRI-safe version} of $s$.
+
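The definition of the IRI-safe version can be sketched as a single left-to-right substitution. This is an illustrative sketch under the definition given above, not the code of \myprog{}; the names \code{MARKING\_STRINGS} and \code{iri\_safe} are assumptions.

```python
import re

# The marking strings named in the text, plus the escape character itself.
MARKING_STRINGS = ("TABLE__", "TBL__", "PROP__", "REF__", "SUBTYPE__")
_TO_ESCAPE = re.compile("|".join(re.escape(m) for m in MARKING_STRINGS + ("~",)))


def iri_safe(s: str) -> str:
    """Prefix every marking string and every '~' in s with a '~'."""
    return _TO_ESCAPE.sub(lambda match: "~" + match.group(0), s)


print(iri_safe("Persons"))
print(iri_safe("Persons__TABLE__Persons"))
print(iri_safe("a~b"))
```

A name without marking strings or `~` characters is returned unchanged, so ordinary table and column names are not affected by the escaping.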
+The IRI generation scheme is presented in Table~\ref{bootstrap_tab_iris}.
+Here, \\ \emph{<base:>} refers to the base IRI to be used for
+the generated IRI (see Section~\fullref{back_basic}),\\
+\emph{<cl. tbl name>} refers to the table name of the database table
+in question (see Section~\ref{iris_clashes}) in its IRI-safe version,\\
+\emph{<cl. name 1st pk col>} refers to the name of the first primary key
+column in question (see Section~\ref{iris_clashes}) in its IRI-safe version,\\
+\emph{<\code{/}...>} refers to the continuation of the previous pattern
+using the remaining primary key or foreign key columns,\\
+\emph{<cl. col name>} refers to the name of the column in question
+(see Section~\ref{iris_clashes}) in its IRI-safe version,\\
+\emph{<cl. src tbl>} refers to the table name of the database table
+containing the respective foreign key (see Section~\ref{iris_clashes})
+in its IRI-safe version,\\
+\emph{<cl. 1st src col>} refers to the name of the first foreign key
+column of the respective foreign key (see Section~\ref{iris_clashes})
+in its IRI-safe version,\\
+\emph{<cl. tgt tbl>} refers to the table name of the table referenced by
+the respective foreign key (see Section~\ref{iris_clashes})
+in its IRI-safe version and\\
+\emph{<cl. 1st tgt col>} refers to the name of the first column referenced
+by the respective foreign key (see Section~\ref{iris_clashes})
+in its IRI-safe version.\\
+
+\begin{table}[H]\begin{centering}
+ \begin{tabular}{p{6.3cm}|p{9.7cm}}
+ \textbf{IRI type} & \textbf{Proposed IRI} \\ \hline
+ Entity map OWL class IRI & <base:>\code{TABLE\_\_}<cl. tbl name>\\
+ Identifier map IRI pattern & <base:>\code{TBL\_\_}<cl. tbl name>\code{/}<cl. name 1st pk col>\newline \ind{} \code{/\{\$1\}/}<\code{/}...>\\
+ Attribute map OWL property IRI & <base:>\code{PROP\_\_}<cl. tbl name>\code{\_\_}<cl. col name>\\
+ Relation map OWL property IRI & <base:>\code{REF\_\_}<cl. src tbl>\code{/}<cl. 1st src col>\newline \ind{} <\code{/}...>\code{/}<cl. tgt tbl>\code{/}<cl. 1st tgt col><\code{/}...>\\
+ Subtype map IRI prefixes & <base:>\code{SUBTYPE\_\_}<cl. tbl name>\code{\_\_}\newline \ind{} <cl. col name>\code{/}\\
+% % Slanted and with underscores and without "clean":
+% Entity map OWL class IRI & \textit{<base>}\code{:TABLE\_\_}\textit{<table\_name>}\\
+% Identifier map IRI pattern & \textit{<base>}\code{:TBL\_\_}\textit{<table\_name>}\code{/}\textit{<name\_1st\_pk\_col>}\newline \ind{} \code{/\{\$1\}/}\textit{<\code{/}...>}\\
+% Attribute map OWL property IRI & \textit{<base>}\code{:PROP\_\_}\textit{<table\_name>}\code{\_\_}\textit{<col\_name>}\\
+% Relation map OWL property IRI & \textit{<base>}\code{:REF\_\_}\textit{<src\_table>}\code{/}\textit{<1st\_fk\_src\_col>}\newline \ind{} \textit{<\code{/}...>}\code{/}\textit{<tgt\_table>}\code{/}\textit{<1st\_fk\_tgt\_col>}\textit{<\code{/}...>}\\
+% Subtype map IRI prefixes & \textit{<base>}\code{:SUBTYPE\_\_}\textit{<table\_name>}\code{\_\_}\textit{<col\_name>}\code{/}\\
+ \end{tabular}
+ \caption{Proposed IRIs to be used in OBDA specification map fields}
+ \label{bootstrap_tab_iris}
+\end{centering}\end{table}
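Four rows of Table~\ref{bootstrap_tab_iris} can be sketched in Python as follows. All function names and the example base IRI \code{http://example.org/} are assumptions made for illustration; the Identifier map IRI pattern is left out because its placeholder syntax is not restated here. The base IRI is treated as an arbitrary string prefix.

```python
import re
from typing import Sequence

_TO_ESCAPE = re.compile(r"TABLE__|TBL__|PROP__|REF__|SUBTYPE__|~")


def iri_safe(s: str) -> str:
    """Prefix every marking string and every '~' in s with a '~'."""
    return _TO_ESCAPE.sub(lambda match: "~" + match.group(0), s)


def entity_class_iri(base: str, table: str) -> str:
    return f"{base}TABLE__{iri_safe(table)}"


def attribute_property_iri(base: str, table: str, column: str) -> str:
    return f"{base}PROP__{iri_safe(table)}__{iri_safe(column)}"


def relation_property_iri(base: str, src_table: str, src_cols: Sequence[str],
                          tgt_table: str, tgt_cols: Sequence[str]) -> str:
    # Source table, source columns, target table, target columns, joined by '/'.
    parts = [src_table, *src_cols, tgt_table, *tgt_cols]
    return f"{base}REF__" + "/".join(iri_safe(p) for p in parts)


def subtype_prefix(base: str, table: str, column: str) -> str:
    return f"{base}SUBTYPE__{iri_safe(table)}__{iri_safe(column)}/"


base = "http://example.org/"  # hypothetical base IRI
print(entity_class_iri(base, "Persons"))
print(relation_property_iri(base, "Orders", ["customer_id"], "Customers", ["id"]))

# The two tables from the collision example now receive different IRIs.
a = entity_class_iri("TABLE__Persons__", "Persons")
b = entity_class_iri("", "Persons__TABLE__Persons")
print(a != b)
```

With escaping in place, the marking string inside the second table name is written as \code{\textasciitilde TABLE\_\_}, so the two IRIs from the earlier collision example no longer coincide.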
+
+It is easily verified that the proposed IRI scheme is correct
+regarding the requirements described in Section~\ref{iris_req}:
+it provides unique IRIs for all types of IRIs it can create,
+regardless of the chosen base IRI
+(see proof in Section~\ref{iris_proof}).
+Furthermore, the IRI scheme is expressive: ignoring the base
+IRI part, the kind of entity identified by the IRI can be
+determined by the beginning of the IRI.
+In addition, it is regular in that
+the name of the containing table always occurs before the
+name of the first database column.
+
+Taking the information in Section~\ref{iris_clashes} into account,
+it is trivial to observe that the suggested IRI scheme, leaving
+out the demand of IRI-safe versions, is still correct, given that
+all IRIs are generated using the same base IRI.
+
+Note that it is in any case necessary that the beginnings
+of all kinds of IRIs be mutually different:
+if, for example, an Identifier map IRI pattern also would
+commence with ``<base:>\code{TABLE\_\_}'', a table named
+``\code{PERSONS/\{17\}}'' -- which is a valid table name for
+example in SQL \cite{sql} -- possibly could get an Entity map OWL
+class IRI assigned which clashes with the IRI resulting from the
+expansion of the IRI pattern
+``<base:>\code{TABLE\_\_PERSONS/\{\$1\}}''.
+
+\subsection{Proof of correctness of the proposed IRI scheme}
+\label{iris_proof}
+As described in Section~\ref{iris_req}, the previously described
+IRI scheme is required to generate several types of IRIs without
+introducing name clashes, that is, two equal IRIs for two distinct
+entities, independently of the chosen base IRIs.
+Additionally, the user shall be able to choose, from an infinite set,
+additional IRIs that are guaranteed not to collide with IRIs generated
+with the scheme.
+
+In this proof, as in Section~\ref{iris_scheme}, the strings
+``\code{TABLE\_\_}'', ``\code{TBL\_\_}'', ``\code{PROP\_\_}'',
+``\code{REF\_\_}'' and ``\code{SUBTYPE\_\_}'' are called
+\emph{marking strings}.\\
+Strings prefixed by `\textasciitilde' are referred to as
+\emph{escaped}, while strings not prefixed by `\textasciitilde'
+are referred to as \emph{unescaped}.
+
+In the following, it is proven that the IRIs of each type do not
+clash, neither among themselves nor with IRIs of other types.
+Since all generated IRIs begin with a marking string, every IRI \emph{not}
+beginning with a marking string, thus an infinite quantity,
+is sure not to collide with any of the generated IRIs, and so
+the correctness with regard to the stated requirements is then proven.
+
+Each Entity map OWL class IRI (including its base IRI) is of the form
+$\alpha$\code{TABLE\_\_}$\beta$, with $\alpha$ not ending with `\textasciitilde'
+and $\alpha$ and $\beta$ not containing any unescaped marking strings.\\
+Thus, $\beta$ is the table name, making the IRI unique among all
+other Entity map OWL class IRIs (see considerations in
+Section~\ref{iris_clashes}).
+Because the IRI does not contain any other unescaped marking string,
+it cannot collide with any IRI of another type and thus is indeed unique.
+
+The proof for Identifier map IRI patterns, Attribute map OWL property IRIs,
+Relation map OWL property IRIs and Subtype map IRI prefixes is exactly
+analogous.
+
+\hfill $\Box$
\ No newline at end of file
\section{Ontology bootstrapping using OBDA specifications}
\label{bootstrap_spec}
+TODO: only mapping, no duplication, r2rml
\subsection{Structure of OBDA specifications}
+\label{bootstrap_spec_struc}
An OBDA specification consists of several so-called ``maps'', which are
data records containing data and references to each other describing
parts of the OBDA specification in statically defined fields \cite{eng}.
For different aspects of the specification, there are different map types,
while usually several maps exist for each type.
Namely, these are \emph{Entity maps} describing database tables,
-\emph{Attribute maps} describing database columns,
\emph{Identifier maps} describing database primary keys,
+\emph{Attribute maps} describing database columns,
\emph{Relation maps} describing database foreign keys,
\emph{Subtype maps} describing ``is-a'' relationships in the data and
\emph{Translation tables} describing desired translations of data.
The fields of the several types of maps and their interconnection via
references are shown in Figure~\ref{spec_fig_structure}.
+Here, each field specifies, in this order, the field label, the field's
+long name, the bootstrapping steps in which the field is used and the
+field's short name. Fields storing a set of values have both their short
+and their long name suffixed with ``\code{...}''.
+Note that each reference between two fields is denoted with a short field
+name contained in the source of the reference, specifying the field
+in which the reference is stored.
+What the values of the fields of the several types of maps express
+exactly and how they can be used is described in
+Section~\ref{bootstrap_spec_using}.
+For a full description of both the structure of OBDA specifications
+and their application, as well as the general idea behind them,
+refer to \cite{eng}.
+How OBDA specifications can in turn be automatically bootstrapped,
+excluding Subtype maps and Translation tables, is
+described in Section~\ref{bootstrap_bootstrap}.
\begin{figure}[H]\begin{center}
\includegraphics[scale=1.0]{Images/specification_structure.pdf}
\label{spec_fig_structure}
\end{center}\end{figure}
-While ,
-Subtype maps and Translation tables.
+Entity maps, Identifier maps, Attribute maps and Relation maps
+directly relate to database concepts and each of them describes
+exactly one database table, primary key, column or foreign key,
+respectively, and vice versa.
+Subtype maps and Translation tables, on the other hand,
+represent concepts of the bootstrapping process or data to be added to
+the target ontology and are somewhat harder to obtain:
+Subtype maps represent ``is-a'' relationships in the target ontology
+to be determined from the source data \cite{eng} -- heuristically or
+semi-automatically --, while Translation tables allow for the
+transformation of data values,
+for example from \code{TRUE} to \code{true} or from \code{No} to
+\code{false} \cite{eng}.
+Therefore, they also have to be determined heuristically or
+semi-automatically from the source data --
+considering the database schema only is not sufficient \cite{eng}.
+Note that, because of this, special care has to be taken to keep the maps
+synchronized with the data in case of Subtype maps and Translation tables.
+
+The structural description of OBDA specifications in \cite{eng}
+does not propose a serialization format in which OBDA specifications can
+be stored or read and written by software and human agents.
+How this can be done is a subject of this thesis, which introduces the
+\oslboth{}, designed exactly for this purpose, in Chapter~\ref{osl}.
\subsection{Using OBDA specifications}
+\label{bootstrap_spec_using}
+TODO: dirm
+TODO: r2rml
+
+As described in Section~\fullref{back_obdaspecs}, using OBDA
+specifications provides several benefits for ontology
+bootstrapping.
+Principally, information about the bootstrapping process is collected in
+one place and can be used to manage the tools involved.
+This includes the availability of the URIs to be used in the constructed
+ontology from a central place, which is a great advantage, since URIs
+are central to an ontology TODO.
+Additionally, all information on the database schema of the source
+database is available.
+Using Translation tables, all this information can be made
+subject to transformations that normalize or correct the data
+without changing the database \cite{eng}.
+
+Using a single specification language like the \oslboth{}
+to store OBDA specifications
+can significantly reduce the expense of converting between different
+data formats:
+assuming that there are $n$ different formats to be handled with no means
+provided to convert between them, the conversion costs decrease from
+$\mathcal{O}(n^2)$ to $\mathcal{O}(n)$ when a single central
+language is introduced TODO.
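+This estimate can be made concrete by a rough count, assuming that one
+dedicated converter is needed per ordered pair of formats and that the
+central language is introduced in addition to the $n$ existing formats:
+without a central language, every format has to be converted into every
+other one, whereas with a central language each format only needs one
+converter to and one converter from that language:
+\[
+  n\,(n-1) \in \mathcal{O}(n^2)
+  \qquad\mbox{versus}\qquad
+  2\,n \in \mathcal{O}(n).
+\]
+For $n = 5$ formats, for example, this amounts to $20$ converters
+compared to $10$.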
+
+Besides the structure of OBDA specifications, described in
+Section~\ref{bootstrap_spec_struc}, Skjæveland et al. introduce a set
+of formal rules defining the bootstrapping process and the mapping of the
+source data to the generated ontology \cite{eng}.
+Using an OBDA specification, tools implementing these rules can
+easily and in a well-defined manner bootstrap an ontology and
+a mapping from the source data onto this ontology (or duplicate that
+source data, if the materialized OBDA approach is used, see
+Section~\ref{back_obda}).
+The mapping rules produce RDF triples which can be interpreted by R2RML
+to establish the mapping (see Section~\fullref{back_basic}).
+They are accompanied by SQL statements specifying the
+queries over the source database used to link the ontology to the data.
+If the data source is not a relational database but another form of
+structured data, like \name{CSV} files, a database schema can be
+bootstrapped first by applying additional ``database rules'' \cite{eng}.
+Afterwards, the process can continue as if the data source were a
+database, so this case is not considered further in the following.
+
+The rest of this section describes the information contained
+in the various types of OBDA specification maps and how it is used during
+bootstrapping, based on the description in \cite{eng}.
+Keep in mind that ``bootstrapping'' here refers to the ontology
+bootstrapping process specified by the OBDA specification, yielding a
+target ontology and mappings
+relating it to the source database -- it does not refer to the
+bootstrapping of the OBDA specification itself.
+The text is meant to give a brief explanatory overview of the bootstrapping
+process using OBDA specifications.
+Thus, it focuses on the information they provide and how it is used to
+link the bootstrapped ontology to the source data, leaving out the
+SQL statements used to retrieve the datasets.
+How exactly the ontology is created is also left out, since this would
+involve introducing too many technical details and the topic can be
+understood from the description of the mapping alone.
+For a detailed description of the bootstrapping process, including ontology
+creation and all formal rules to be applied, see \cite{eng}.
+For an explanation of how an OBDA specification containing Entity maps,
+Identifier maps, Attribute maps and Relation maps can be bootstrapped from
+a relational database schema, see Section~\ref{bootstrap_bootstrap}.
+For details on URI generation, refer to Section~\fullref{iris}.
+
+\subsubsection{Entity maps}
+Entity maps provide information about the tables contained in the
+source database or in the intermediate database schema to be constructed,
+if the data source is not a database but some other source of structured
+information (see Section~\fullref{back_obda}).
+The information provided by an Entity map includes the table name,
+a label describing the table and a (more detailed) description
+of the table.
+Furthermore, each Entity map references an Identifier map representing
+its primary key and a set of Attribute maps representing its columns.
+Finally, an Entity map provides an OWL class URI identifying the
+represented table uniquely in the resulting ontology.
+As the name suggests, this URI is given to an OWL class which serves as
+OWL type (or more precisely: \code{rdf:type}) for all OWL individuals
+representing the datasets from the respective table in the target ontology
+(see Paragraph~``\nameref{iden}'').
+
+Suppose, for example, that an Entity map for a table ``\code{persons}''
+provides the OWL class URI ``\code{mydb:persons}''.
+Then, in the target ontology, all OWL individuals representing rows in the
+``\code{persons}'' table will be of \code{rdf:type}
+``\code{mydb:persons}''.
+If a data record in the ``\code{persons}'' table has the identifying URI
+pattern ``\code{mydb:person/\{pno\}}'' (see Paragraph~``\nameref{iden}''),
+this type information will be expressed by the following RDF triple:\\
+\code{mydb:person/\{pno\} rdf:type mydb:persons}.
+
+\subsubsection{Identifier maps}
+\label{iden}
+Identifier maps describe database primary keys contained in the
+source database or in the intermediate database schema to be constructed,
+if the data source is not a database (see Section~\ref{back_obda}).
+Each Identifier map contains a reference back to the respective
+Entity map, representing the database table the primary key belongs to.
+Furthermore, each Identifier map references a set of Attribute maps,
+representing the database columns the primary key consists of
+(see next paragraph).
+Finally, an Identifier map provides a URI pattern, allowing OWL
+individuals in the bootstrapped target ontology that represent
+datasets to be identified.
+A URI pattern contains placeholders like ``\code{\{\$1\}}'' for
+all primary key columns, which are replaced with the respective column
+names, surrounded by curly braces, during the bootstrapping process,
+to yield a valid R2RML template \cite{r2rml}.
+Since the column name substituted in is still a placeholder,
+the result of this substitution is still a URI pattern and not a URI.
+Since such a URI pattern uniquely identifies a dataset from a given
+database table -- when data values are substituted in --,
+it will be called \emph{identifying URI pattern} in the following.
+
+Consider the URI pattern ``\code{mydb:person/\{\$1\}}''.
+Replacing the placeholder ``\code{\{\$1\}}'' with the column name
+``\code{pno}'' in curly braces yields the following identifying URI
+pattern: \\ ``\code{mydb:person/\{pno\}}''.
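+
+The substitution can be sketched in Java as follows.
+This is only an illustrative sketch and not the \myprog{} implementation:
+the class and method names are hypothetical, and numbering the placeholders
+by the position of the primary key column (\code{\{\$1\}}, \code{\{\$2\}} and
+so on) is an assumption generalizing the single-column example above.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of db2osl. */
final class UriPatternSketch {
    private UriPatternSketch() {}

    /**
     * Replaces each placeholder {$i} (assumed to be 1-based) by the name of
     * the i-th primary key column in curly braces.
     */
    static String toIdentifyingPattern(String uriPattern, List<String> keyColumns) {
        String result = uriPattern;
        for (int i = 0; i < keyColumns.size(); i++) {
            // String.replace substitutes literally, so "$" needs no escaping.
            result = result.replace("{$" + (i + 1) + "}", "{" + keyColumns.get(i) + "}");
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints: mydb:person/{pno}
        System.out.println(toIdentifyingPattern("mydb:person/{$1}", Arrays.asList("pno")));
    }
}
```

+The result still contains a placeholder, now named after the column, and
+is therefore a template rather than a concrete URI.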
+
+\subsubsection{Attribute maps}
+Attribute maps provide information about database columns contained in the
+source database.
+Each Attribute map carries the column name, information on whether having a
+value in this column is mandatory for a dataset (in SQL terms: whether it
+has a \code{NOT NULL} constraint), a label describing the column as well as
+an extended description of the column.
+A database column is represented as a relation in the final ontology, thus,
+in OWL terms, as an \code{owl:DataProperty} or an \code{owl:ObjectProperty}.
+The Attribute map provides the URI for this OWL property.
+
+Additionally, it specifies the datatype of values in the
+represented column in the following manner:
+Three fields are provided for this purpose, \code{SQL datatype},
+\code{RDF language} and \code{XSD datatype}.
+If the \code{XSD datatype} field is nonempty, its value is specified to be
+the datatype for values in the column the Attribute map represents
+(note that OWL only knows XSD datatypes TODO).
+Otherwise, if the value of the \code{SQL datatype} is a standard SQL type,
+it will be mapped to an XSD datatype and the resulting type is
+specified as datatype for values in the respective column.
+If neither of the above is the case and the \code{RDF language} field is
+nonempty, values in the respective column will be interpreted as strings
+with the value of the \code{RDF language} field applied as RDF language tag
+(TODO).
+If neither of the above is the case, values in the respective column will be
+interpreted as strings without an RDF language tag.
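+
+The order of precedence described above can be sketched in Java as follows.
+This is a minimal sketch under stated assumptions, not the \myprog{}
+implementation: the class and method names are hypothetical, the
+SQL-to-XSD table is only a small illustrative subset, and the returned
+strings merely describe the outcome of each case.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for illustration; not part of db2osl. */
final class DatatypeSketch {
    // Illustrative subset only; the complete SQL-to-XSD mapping is not given here.
    private static final Map<String, String> SQL_TO_XSD = new HashMap<>();
    static {
        SQL_TO_XSD.put("INTEGER", "xsd:integer");
        SQL_TO_XSD.put("BOOLEAN", "xsd:boolean");
        SQL_TO_XSD.put("DATE", "xsd:date");
    }

    private DatatypeSketch() {}

    private static boolean isSet(String s) {
        return s != null && !s.isEmpty();
    }

    /** Returns a short description of how values of the column are typed. */
    static String resolve(String xsdDatatype, String sqlDatatype, String rdfLanguage) {
        if (isSet(xsdDatatype)) {
            return xsdDatatype;                 // 1. an explicit XSD datatype wins
        }
        if (isSet(sqlDatatype)) {
            String mapped = SQL_TO_XSD.get(sqlDatatype.toUpperCase(Locale.ROOT));
            if (mapped != null) {
                return mapped;                  // 2. standard SQL type mapped to XSD
            }
        }
        if (isSet(rdfLanguage)) {
            return "string@" + rdfLanguage;     // 3. string with RDF language tag
        }
        return "string";                        // 4. string without language tag
    }

    public static void main(String[] args) {
        System.out.println(resolve("xsd:decimal", "INTEGER", "en")); // xsd:decimal
        System.out.println(resolve("", "integer", "en"));            // xsd:integer
        System.out.println(resolve("", "MYTYPE", "en"));             // string@en
        System.out.println(resolve("", "", ""));                     // string
    }
}
```

+Note that, following the rules above, an RDF language tag only takes effect
+if the SQL datatype cannot be mapped to an XSD datatype.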
+
+Finally, an Attribute map specifies whether the column shall be represented
+as an \\ \code{owl:DataProperty} (for non-foreign-key columns) or as an
+\code{owl:ObjectProperty} (for foreign key columns).
+The field specifying that -- \code{Property type} -- also allows, as a
+further distinction for object properties, specifying whether the property's
+target URI is the column name placed in a URI pattern
+provided by the Attribute map,
+similarly to URI patterns in Identifier maps (see the example at the end of this
+paragraph), or simply the column name, possibly with a
+translation from a Translation table, specified by the Attribute map,
+applied.
+The former option is useful for example when using custom property URIs
+to express relations between source data and the target ontology.
+If an \code{owl:DataProperty} is generated, it always has the column name
+as target URI, without the use of a URI pattern.
+Note that it is sufficient to have the column name be the target of the
+property, since only a \emph{mapping} to the source data is generated.
+
+Consider an Attribute map representing a column named ``\code{name}'' to
+be mapped to an \\ \code{owl:DataProperty}.
+Suppose the OWL property URI the Attribute map specifies is\\
+``\code{mynamespace:lastName}'' and each dataset containing the
+\code{name} column has the identifying URI pattern
+``\code{mydb:person/\{pno\}}'' (see Paragraph~``\nameref{iden}'').
+Then, during the bootstrapping process
+the following RDF triple will be produced:\\
+\code{mydb:person/\{pno\} mynamespace:lastName "\{name\}"}.\\
+This triple can easily be interpreted by R2RML, which on request then
+retrieves the queried \code{name} from the data source.
+
+Consider, as a more elaborate example,
+an Attribute map representing a column named
+``\code{company}'' that is to be mapped to an \code{owl:ObjectProperty} using
+the URI pattern ``\code{http://otherdb/\{\$1\}}''.
+Suppose the OWL property URI the Attribute map specifies is
+``\code{mynamespace:hasSameOwnerAs}'', there is no datatype specified
+and each dataset containing the
+\code{company} column has the identifying URI pattern
+``\code{mydb:company/\{cmpno\}}''
+(see Paragraph~``\nameref{iden}'').
+Then, the following RDF triple will be produced during the bootstrapping
+process:\\
+\code{mydb:company/\{cmpno\} mynamespace:hasSameOwnerAs
+ http://otherdb/\{company\}}.\\
+Note that this only makes sense if R2RML can expand
+``\code{http://otherdb/\{company\}}'' to a valid subject
+for each value in the \code{company} column of the database,
+and if all rows in the respective database table are
+indeed entities having the same owner as the company specified in the
+\code{company} column.
+
+\subsubsection{Relation maps}
+Relation maps represent foreign keys contained in the
+source database.
+Each Relation map references the Entity maps representing the foreign key's
+child table and parent table, respectively.
+Furthermore, it provides the column names of both the foreign key columns
+and the referenced columns and specifies an OWL property URI.
+Relation maps allow the relations expressed in the source data via
+foreign key relationships to be included into the bootstrapped ontology.
+This happens in a simple and straightforward manner:
+for each foreign key relationship, exactly one triple is generated which
+contains the two identifying
+URI patterns representing the source and the target dataset of the foreign
+key, respectively, and the OWL property URI specified by the Relation map.
+
+Suppose, for example, that a Relation map expresses a relation between
+datasets with the identifying URI patterns (see Paragraph~``\nameref{iden}'')
+``\code{mydb:persons/pno/\{pno\}}'' and
+``\code{mydb:companies/cmpno/\{cmpno\}}'', respectively, and specifies
+the OWL property URI ``\code{mynamespace:isEmployedAt}''.
+This will result in the following RDF triple being generated during the
+bootstrapping process:\\
+\code{mydb:persons/pno/\{pno\} mynamespace:isEmployedAt
+ mydb:companies/cmpno/\{cmpno\}}
+
+\subsubsection{Subtype maps}
+Subtype maps provide a means to automatically add subclass-superclass
+relationships to the target ontology during the bootstrapping process.
+They specify an Entity map and a column name defining a table and a
+column, respectively, that exist in the source database and contain the
+values to be declared as belonging to the subclass.
+Furthermore, they store a prefix, a suffix and possibly a reference to
+a Translation table which are used to generate a URI for that subclass.
+Finally, they provide the URI of the superclass.
+This can be, for instance, some OWL class being created during the
+bootstrapping process or already existing in some imported ontology.
+The URI generated for the subclass contains the data value of the
+respective database column, so every distinct value gets its own (sub)class.
+During bootstrapping, an RDF triple declaring the value to belong to that
+subclass is generated for each data value of the respective table column in
+the source database.
+The restriction to the desired value is achieved by a condition
+in the SQL statement accompanying the respective triple, not by
+limiting the triple itself to a specific data value.
+The actual subclass-superclass relationship is expressed during the
+creation of the ontology.
+Note the difference from the previously described mapping rules, which
+produce triples independent of the data values in the source database.
+
+Consider a Subtype map specifying a table column containing the values
+``\code{Purchase}'' and ``\code{Sales}'' with the datasets having the
+identifying URI pattern (see Paragraph~``\nameref{iden}'')
+``\code{mydb:managers/mno/\{mno\}}''.
+Suppose the Subtype map specifies the prefix\\
+``\code{mydb:manager/of\_department/}'', no suffix, no translation table
+and the supertype URI ``\code{mynamespace:persons}''.
+This will result in the generation of the following triples during the
+bootstrapping process:\\
+\code{mydb:managers/mno/\{mno\} rdf:type mydb:manager/of\_department/Purchase}\\
+\code{mydb:managers/mno/\{mno\} rdf:type mydb:manager/of\_department/Sales},\\
+while ``\code{mydb:manager/of\_department/Purchase}'' \\ and
+``\code{mydb:manager/of\_department/Sales}'' will be subclasses of class\\
+``\code{mynamespace:persons}'' in the target ontology.
+The accompanying SQL statement will ensure that, despite the use of the
+R2RML template ``\code{mydb:managers/mno/\{mno\}}'', not \emph{every}
+manager will be declared as the manager of \emph{every} department.
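+
+How the triples of this example come about can be sketched in Java as
+follows.
+The sketch is illustrative only: the class and method names are
+hypothetical, the distinct column values are passed in directly instead of
+being read from a database, and both the optional Translation table and
+the accompanying SQL statements are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of db2osl. */
final class SubtypeSketch {
    private SubtypeSketch() {}

    /** Builds the subclass URI from prefix, data value and suffix. */
    static String subclassUri(String prefix, String value, String suffix) {
        return prefix + value + suffix;
    }

    /** Generates one rdf:type triple per distinct value of the column. */
    static List<String> typeTriples(String identifyingPattern, String prefix,
                                    String suffix, List<String> distinctValues) {
        List<String> triples = new ArrayList<>();
        for (String value : distinctValues) {
            triples.add(identifyingPattern + " rdf:type "
                    + subclassUri(prefix, value, suffix));
        }
        return triples;
    }

    public static void main(String[] args) {
        // Values taken from the example above; prints the two triples shown there.
        for (String triple : typeTriples("mydb:managers/mno/{mno}",
                "mydb:manager/of_department/", "",
                Arrays.asList("Purchase", "Sales"))) {
            System.out.println(triple);
        }
    }
}
```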
+
+\subsubsection{Translation tables}
+Translation tables allow URIs or other strings to be transformed in
+arbitrary ways, by simply mapping each string to be translated to a
+target string.
+
+They are not reflected in the target ontology in any form but are used
+only during the bootstrapping process.
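+
+Conceptually, a Translation table behaves like a simple lookup table, as
+the following Java sketch illustrates.
+The class and its methods are hypothetical and not taken from \myprog{};
+passing strings without an entry through unchanged is an assumption made
+for this sketch only.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of a Translation table; not part of db2osl. */
final class TranslationTableSketch {
    private final Map<String, String> entries = new HashMap<>();

    /** Adds one translation and returns this table to allow chaining. */
    TranslationTableSketch add(String source, String target) {
        entries.put(source, target);
        return this;
    }

    /** Returns the target string, or the input itself if no entry exists (assumption). */
    String translate(String value) {
        return entries.getOrDefault(value, value);
    }

    public static void main(String[] args) {
        TranslationTableSketch table = new TranslationTableSketch()
                .add("TRUE", "true")
                .add("No", "false");
        System.out.println(table.translate("TRUE"));  // true
        System.out.println(table.translate("No"));    // false
        System.out.println(table.translate("maybe")); // maybe (no entry)
    }
}
```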
\subsection{Bootstrapping OBDA specifications}
+\label{bootstrap_bootstrap}
+How OBDA specifications can in turn be bootstrapped from database schemata
+is a subject of this thesis and is explained in this section.
+For the description of the software developed to automate this, see
+Chapter~\fullref{program}.
+The description in this section assumes an SQL database as data source.
+However, ontology-based data access and OBDA specifications are not limited
+to SQL databases, as mentioned in Section~\fullref{back_obda} and in
+Section~\fullref{bootstrap_spec_using}.
+Furthermore, this section refers to OBDA specifications without assuming
+any specific format in which they are represented.
+How OBDA specifications are represented internally by the \myprog{}
+software is described in Section~\fullref{fine}.
+For the description of a format to serialize OBDA specifications
+-- the output format of \myprog{} --,
+refer to Chapter~\fullref{osl}.
+
+Subtype maps and Translation tables are not considered in this approach,
+since they cannot be bootstrapped from schema information only but have to
+be determined from the input data (see Section~\fullref{bootstrap_spec_struc}).
+Thus, the bootstrapped OBDA specification does not contain maps of these
+types. Including them is a significant challenge in its own right and,
+since heuristics or user decisions would be necessary,
+would require at least some human supervision of the process.
+Apart from that, the bootstrapping is an easy and straightforward task
+which can be carried out fully automatically TODO.
+
+Recall the structure of an OBDA specification explained in
+Section~\fullref{bootstrap_spec_struc}.
+The map types considered in this approach are Entity maps, Attribute maps,
+Identifier maps and Relation maps.
+The assignment of values to their fields is summarized in
+Table~\ref{bootstrap_tab_mapping}, only hinting at how maps are
+generated. Both the generation of the maps and the assignment of values
+to their fields are described in the rest of this section, with one
+exception:
+since the generation of URIs (or IRIs) in the context of OBDA specifications
+is an essential topic which requires some conceptual effort, it is
+described separately in Section~\ref{iris}.
+
+\begin{table}[H]\begin{centering}
+ \begin{tabular}{p{3.2cm}|p{4.0cm}|p{8.3cm}}
+ \textbf{Map type} & \textbf{Field name} & \textbf{Value} \\ \hline
+ Entity map & Table name & SQL table name \\
+ Entity map & Label & $<$empty$>$ \\
+ Entity map & \emph{Identifier map} & Identifier map for table \\
+ Entity map & \emph{Attribute maps...} & Attribute maps for table columns \\
+ Entity map & OWL class URI & URI(table) \\
+ Entity map & Description & SQL table description \\ \hline
+ Identifier map & \emph{Entity map} & Entity map for corresponding table \\
+ Identifier map & \emph{Attribute maps...} & Attribute maps for primary key columns \\
+ Identifier map & URI pattern & URIpattern(table) \\ \hline
+ Attribute map & Column name & SQL column name \\
+ Attribute map & SQL datatype & SQL datatype of column \\
+ Attribute map & Mandatory & SQL \code{NOT NULL} property of column\newline(\code{true} or \code{false}) \\
+ Attribute map & Label & $<$empty$>$ \\
+ Attribute map & OWL property URI & $<$empty$>$ for foreign key columns,\newline else URI(table, column) \\
+ Attribute map & Property type & ``\code{ObjectProperty}'' for foreign key columns,\newline else ``\code{DataProperty}'' \\
+ Attribute map & \emph{Translation} & $<$empty$>$ \\
+ Attribute map & URI pattern & $<$empty$>$ \\
+ Attribute map & RDF language & $<$empty$>$ \\
+ Attribute map & XSD datatype & $<$empty$>$ \\
+ Attribute map & Description & SQL column description \\ \hline
+ Relation map & \emph{Source entity map} & Entity map for foreign key child table \\
+ Relation map & Source column & Foreign key child columns\newline(SQL column names) \\
+ Relation map & \emph{Target entity map} & Entity map for foreign key parent table \\
+ Relation map & Target column & Foreign key parent columns\newline (SQL column names) \\
+ Relation map & OWL property URI & URI(table, foreignKey) \\
+ \end{tabular}
+ \caption{Assignment of values to fields of OBDA specification maps}
+ \label{bootstrap_tab_mapping}
+\end{centering}\end{table}
+
+\subsubsection{Entity maps}
+Exactly one Entity map and one Identifier map are generated per table
+contained in the source database.
+The generated Identifier map is referenced by the Entity map's
+\code{Identifier map} field.
+Similarly, exactly one Attribute map is generated per table column
+and these Attribute maps are referenced by the Entity map's
+\code{Attribute maps...} field.
+The Entity map's \code{Table name} field is set to the SQL name of
+the table, the \code{Label} field remains empty.
+A URI identifying the table is generated and stored in the
+Entity map's \code{OWL class URI} field.
+The SQL table description is copied into the Entity map's
+\code{Description} field.
+
+\subsubsection{Identifier maps}
+An Identifier map represents exactly one primary key in the source
+database and is referenced by the Entity map representing the table
+containing the primary key constraint.
+In addition, it references this Entity map in its
+\code{Entity map} field, so that the two maps reference each other.
+The Attribute maps representing the columns constituting the primary
+key are referenced by the Identifier map's \code{Attribute maps...}
+field.
+A URI pattern, allowing datasets (that is, rows in the source database)
+to be identified in the target ontology, is generated and put in
+the Identifier map's \code{URI pattern} field.
+
+\subsubsection{Attribute maps}
+An Attribute map represents exactly one column in the source database
+and is referenced by the Entity map representing the table
+containing the column.
+The Attribute map's \code{Column name} field is set to the SQL column
+name of the column, the \code{SQL datatype} field is set
+to its SQL datatype.
+The \code{Mandatory} field is set to \code{true} if the column has
+the SQL \code{NOT NULL} constraint, otherwise to \code{false}.
+If the column is part of a foreign key, the \code{OWL property URI}
+field remains empty. Otherwise,
+a URI identifying the column is generated and stored in the
+Attribute map's \code{OWL property URI} field.
+The \code{Property type} field is set to ``\code{ObjectProperty}''
+if the column is part of a foreign key, otherwise to
+``\code{DataProperty}''.
+The SQL column description is copied into the Attribute map's
+\code{Description} field.
+The remaining fields, \code{Label}, \code{Translation},
+\code{URI pattern}, \code{RDF language} and \code{XSD datatype},
+remain empty.
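+
+The field assignment for Attribute maps can be sketched in Java as follows.
+This is a simplified sketch and not the \myprog{} implementation:
+the class and method names are hypothetical, the fields are stored in a
+plain map instead of a dedicated class, and \code{placeholderUri} merely
+stands in for the URI generation described in Section~\ref{iris}.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of db2osl. */
final class AttributeMapBootstrapSketch {
    private AttributeMapBootstrapSketch() {}

    /** Stand-in for URI(table, column); the real URI scheme is not modeled here. */
    static String placeholderUri(String table, String column) {
        return "URI(" + table + ", " + column + ")";
    }

    /** Assigns the Attribute map fields from the schema information of one column. */
    static Map<String, String> bootstrap(String table, String column, String sqlType,
                                         boolean notNull, boolean partOfForeignKey,
                                         String description) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("Column name", column);
        fields.put("SQL datatype", sqlType);
        fields.put("Mandatory", String.valueOf(notNull));
        fields.put("Label", "");
        // Foreign key columns get no property URI and become object properties.
        fields.put("OWL property URI",
                partOfForeignKey ? "" : placeholderUri(table, column));
        fields.put("Property type",
                partOfForeignKey ? "ObjectProperty" : "DataProperty");
        fields.put("Translation", "");
        fields.put("URI pattern", "");
        fields.put("RDF language", "");
        fields.put("XSD datatype", "");
        fields.put("Description", description);
        return fields;
    }

    public static void main(String[] args) {
        bootstrap("persons", "name", "VARCHAR", true, false, "Last name")
                .forEach((field, value) -> System.out.println(field + " = " + value));
    }
}
```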
+
+\subsubsection{Relation maps}
+A Relation map represents exactly one foreign key in the source database.
+It contains fields storing the parent and child table of the foreign
+key: the \code{Source entity map} field, referencing the
+Entity map representing the child table of the foreign key, and
+the \code{Target entity map} field, referencing the
+Entity map representing the parent table of the foreign key.
+The SQL column names of the foreign key columns (thus, column names
+in the child table) are copied into the \code{Source column}
+field of the Relation map, and the SQL column names of the referenced
+columns in the parent table are copied into its \code{Target column}
+field.
+Note that, in contrast to an Identifier map representing a primary key,
+a Relation map is not referenced by any Entity map (or any other map).
of the respective method's behavior.
Consider the following method from \file{CLIDatabaseInteraction.java}:
\codepar{public static void promptAbortRetrieveDBSchemaAndWait\\
- \ind(final FutureTask<DBSchema> retriever) throws SQLException}
+ \ind{}(final FutureTask<DBSchema> retriever) throws SQLException}
It could have been called \code{promptAbortRetrieveDBSchema} only, with the
waiting mentioned in a comment.
\p The \name{OWL} individuals described by the \osl{} document representing
the certain types of OBDA maps
-must have the IRIs specified in table \ref{spec_tbl_indv_iris}
+must have the IRIs specified in Table~\ref{spec_tbl_indv_iris}
(for base IRIs, see Paragraph~\ref{spec_base}).
Here, \textit{$<$class URI$>$} refers\\
to the \texttt{OWL class URI} field of the respective entity map for entity maps,\\
\begin{table}[]\begin{center}
\begin{tabular}{l|l}
\textbf{Map type} & \textbf{\name{OWL} IRI} \\ \hline
- Entity map & \textit{$<$class URI$>$}\texttt{\_\_ENTITY\_MAP} \\
- Attribute map & \textit{$<$property URI$>$}\texttt{\_\_ATTRIBUTE\_MAP} \\
- Identifier map & \textit{$<$class URI$>$}\texttt{\_\_IDENTIFIER\_MAP} \\
- Relation map & \textit{$<$property URI$>$}\texttt{\_\_RELATION\_MAP} \\
- Subtype map & \textit{$<$class URI$>$}\texttt{\_\_SUBTYPE\_MAP} \\
+ Entity map & $<$class URI$>$\texttt{\_\_ENTITY\_MAP} \\
+ Attribute map & $<$property URI$>$\texttt{\_\_ATTRIBUTE\_MAP} \\
+ Identifier map & $<$class URI$>$\texttt{\_\_IDENTIFIER\_MAP} \\
+ Relation map & $<$property URI$>$\texttt{\_\_RELATION\_MAP} \\
+ Subtype map & $<$class URI$>$\texttt{\_\_SUBTYPE\_MAP} \\
Translation table of attribute map &
- \textit{$<$property URI$>$}\texttt{\_\_ATTRIBUTE\_MAP\_\_TRANSLATION\_TABLE} \\
+ $<$property URI$>$\texttt{\_\_ATTRIBUTE\_MAP\_\_TRANSLATION\_TABLE} \\
Translation table of subtype map &
- \textit{$<$class URI$>$}\texttt{\_\_SUBTYPE\_MAP\_\_TRANSLATION\_TABLE} \\
+ $<$class URI$>$\texttt{\_\_SUBTYPE\_MAP\_\_TRANSLATION\_TABLE} \\
+% % Slanted:
+% Entity map & \textit{$<$class URI$>$}\texttt{\_\_ENTITY\_MAP} \\
+% Attribute map & \textit{$<$property URI$>$}\texttt{\_\_ATTRIBUTE\_MAP} \\
+% Identifier map & \textit{$<$class URI$>$}\texttt{\_\_IDENTIFIER\_MAP} \\
+% Relation map & \textit{$<$property URI$>$}\texttt{\_\_RELATION\_MAP} \\
+% Subtype map & \textit{$<$class URI$>$}\texttt{\_\_SUBTYPE\_MAP} \\
+% Translation table of attribute map &
+% \textit{$<$property URI$>$}\texttt{\_\_ATTRIBUTE\_MAP\_\_TRANSLATION\_TABLE} \\
+% Translation table of subtype map &
+% \textit{$<$class URI$>$}\texttt{\_\_SUBTYPE\_MAP\_\_TRANSLATION\_TABLE} \\
\end{tabular}
\caption{\name{OWL} individual IRIs in \osl{}}
\label{spec_tbl_indv_iris}
\p The \name{OWL} individuals described by the \osl{} document representing
the certain types of OBDA maps
-must be of the \name{OWL} types specified in table \ref{spec_tbl_types}
+must be of the \name{OWL} types specified in Table~\ref{spec_tbl_types}
(for base IRIs, see Paragraph~\ref{spec_base}).
%\vspace{\spacebeforetable{}}
\p The \name{OWL} properties described by the \osl{} document representing
the fields of the certain OBDA maps
-must have the IRIs specified in table \ref{spec_tbl_prop_iris}
+must have the IRIs specified in Table~\ref{spec_tbl_prop_iris}
(for base IRIs, see Paragraph~\ref{spec_base}).
%\vspace{\spacebeforetable{}}
-\chapter{The db2osl software}
+\chapter{The \myprog{} software}
\label{program}
+TODO: uris
+
Besides the conception of the \oslboth{}, the design
and implementation of the \myprog{} software was an important part of this work.
The program itself and its creation process are described in the following sections:
-\section{Architecture}
+\section{Architecture of \myprog{}}
\label{arch}
-\subsection{Libraries used}
-\subsection{Coarse structuring}
+\subsection{Libraries used in \myprog{}}
+\subsection{Coarse structuring of \myprog{}}
\label{coarse}
TODO: overall description, modularity, extendability, ex: easy to add new in-/output formats
TODO: mapping profiles (maybe better in next subsection)
TODO: Java, OPTIQUE
-\subsubsection{Package structuring}
+\subsubsection{Package structuring of \myprog{}}
The $45$ classes of \myprog{} were assigned to $11$ packages, each containing
classes responsible for the same area of operation or taking over similar roles.
Care was taken to divide the classes sensibly, producing meaningful packages
Since this doesn't have any functional implications \cite{java}, but is rather an
implementation detail, this is further explained in Section~\fullref{code_packages}.
-The packages are introduced and described in table \ref{arch_tbl_packages}.
-The lists of classes each package contains are given in table \ref{arch_tbl_classes}
-in the next Section~\fullref{fine}.
-For a detailed package description, refer to Appendix TODO.
+The packages are introduced and described in Table~\ref{arch_tbl_packages}.
+The lists of classes each package contains are given in Table~\ref{app_tbl_classes}
+in Appendix~\fullref{app_pkgs}.
\begin{table}[H]
\begin{tabular}{p{3cm}|p{13cm}} %\KOMAoption{fontsize}{\smallerfontsize{}}
%Package \code{helpers} depends on package \code{database}, which provides the \code{static}
%method \code{getSQLTypeName}.
-\subsection{Fine structuring}
+\subsection{Fine structuring of \myprog{}}
\label{fine}
+TODO: OBDA spec rep
+
While the packages in \myprog{} are introduced and described in Section~\fullref{coarse},
the classes that comprise them are addressed in this section.
-For a detailed class index, refer to Appendix TODO.
+For a list of the classes contained in each package, refer to Appendix~\ref{app_pkgs}.
+
TODO: total classes etc.
\subsubsection{Package contents}
\label{package_details}
-Table \ref{arch_tbl_classes} lists the classes each package contains.
+Table~\ref{app_tbl_classes} lists the classes each package contains.
The packages \code{cli}, \code{main}, \code{osl} and \code{settings} contain only
one class each, while by far the most extensive package is \code{database},
containing $15$ classes.
-\begin{table}[H]
- \begin{multicols}{2}\begin{itemize} %\KOMAoption{fontsize}{\smallerfontsize{}}
- \item \code{bootstrapping}
- \begin{itemize}
- \item \code{Bootstrapping}
- \item \code{DirectMappingURIBuilder}
- \item \code{URIBuilder}
- \end{itemize}
- \item \code{cli}
- \begin{itemize}
- \item \code{CLIDatabaseInteraction}
- \end{itemize}
- \item \code{database}
- \begin{itemize}
- \item \code{Column}
- \item \code{ColumnSet}
- \item \code{DatabaseException}
- \item \code{DBSchema}
- \item \code{ForeignKey}
- \item \code{Key}
- \item \code{PrimaryKey}
- \item \code{ReadableColumn}
- \item \code{ReadableColumnSet}
- \item \code{ReadableForeignKey}
- \item \code{ReadableKey}
- \item \code{ReadablePrimaryKey}
- \item \code{RetrieveDBSchema}
- \item \code{Table}
- \item \code{TableSchema}
- \end{itemize}
- \item \code{helpers}
- \begin{itemize}
- \item \code{Helpers}
- \item \code{MapValueIterable}
- \item \code{MapValueIterator}
- \item \code{ReadOnlyIterable}
- \item \code{ReadOnlyIterator}
- \item \code{SQLType}
- \item \code{UserAbortException}
- \end{itemize}
- \newpage
- \item \code{log}
- \begin{itemize}
- \item \code{ConsoleDiagnosticOutputHandler}
- \item \code{GlobalLogger}
- \end{itemize}
- \item \code{main}
- \begin{itemize}
- \item \code{Main}
- \end{itemize}
- \item \code{osl}
- \begin{itemize}
- \item \code{OSLSpecification}
- \end{itemize}
- \item \code{output}
- \begin{itemize}
- \item \code{ObjectSpecPrinter}
- \item \code{OSLSpecPrinter}
- \item \code{SpecPrinter}
- \end{itemize}
- \item \code{settings}
- \begin{itemize}
- \item \code{Job}
- \end{itemize}
- \item \code{specification}
- \begin{itemize}
- \item \code{AttributeMap}
- \item \code{EntityMap}
- \item \code{IdentifierMap}
- \item \code{InvalidSpecificationException}
- \item \code{OBDAMap}
- \item \code{OBDASpecification}
- \item \code{RelationMap}
- \item \code{SubtypeMap}
- \item \code{TranslationTable}
- \end{itemize}
- \item \code{test}
- \begin{itemize}
- \item \code{CreateTestDBSchema}
- \item \code{GetSomeDBSchema}
- \end{itemize}
- \end{itemize}\end{multicols} %\KOMAoption{fontsize}{\myfontsize{}}
- \caption{Class attachment to packages in \myprog{}}
- \label{arch_tbl_classes}
-\end{table}
-
\subsubsection{Class organization}
\label{hierarchies}
Organizing classes in a structured, obvious manner such that classes have well-defined
a program automatically deriving an OBDA specification
from a relational database schema,
which can then be used by other tools to drive the actual bootstrapping process.
-Its functionality is described in the following section,
-leaving out self-evident features, and is then listed completely
-in the section after that.
+Its functionality is described in this section,
+leaving out self-evident features, and is then listed completely.
How this functionality is exposed to users is described in Section~\fullref{interface}.
The bootstrapping process using direct mapping as the core functionality
of the software is described in Section~\ref{dirm}.
TODO: reference to OBDA topics
-\subsection{Function description}
The database schema is retrieved by connecting to an \name{SQL} database
and querying its schema information.
Parsing \name{SQL} scripts or \name{SQL} dumps is currently not supported.
Finally, a help text can be displayed that describes the usage of \myprog{}, including
descriptions of all command-line arguments.
-\subsection{Function summary}
The functionality of the \myprog{} software can be summarized as follows:
\begin{itemize}
while not addressing any real difficulties.
The command-line arguments \myprog{} currently supports are
-described in table \ref{if_tbl_arguments_desc};
-their default values are listed in table \ref{if_tbl_arguments_def}.
+described in Table~\ref{if_tbl_arguments_desc};
+their default values are listed in Table~\ref{if_tbl_arguments_def}.
There is currently no switch to set the output format, since the only supported
output format, besides \osl{}, is a low-level output format for debugging purposes.
Because of this and since the change that has to be made in the source code to enable it
\codepar{db2osl -d mydb myserver.org | sha256sum >oldsum\\
cp oldsum newsum\\
while diff oldsum newsum; do\ \ \# while checksums are the same\\
- \ind sleep 3600\ \ \# wait 1 hour\\
- \ind db2osl -d mydb myserver.org | sha256sum >newsum\\
+ \ind{} sleep 3600\ \ \# wait 1 hour\\
+ \ind{} db2osl -d mydb myserver.org | sha256sum >newsum\\
done\\
rm oldsum newsum\\
\# notify web admin via e-mail:\\
that bootstraps all databases on a server:
\codepar{regex=\textquotesingle(?!\$).*\textquotesingle\ \ \# accept all nonempty database names first\\
while db2osl -d "\$regex" -o spec myserver.org; do\\
- \ind dbname="\`{} sed -ne \textquotesingle/xmlns:ont/ \{ s|.*/||; s|\#"||p \}\textquotesingle\ spec \`{}"\\
- \ind mv spec "\$dbname".osl\\
- \ind \# don't use this database a second time:\\
- \ind regex="\`{} printf \%s "\$regex" | sed -e "s,\textbackslash\textbackslash\textbackslash\textbackslash\$,\$|\$dbname\$," \`{}"\\
+ \ind{} dbname="\`{} sed -ne \textquotesingle/xmlns:ont/ \{ s|.*/||; s|\#"||p \}\textquotesingle\ spec \`{}"\\
+ \ind{} mv spec "\$dbname".osl\\
+ \ind{} \# don't use this database a second time:\\
+ \ind{} regex="\`{} printf \%s "\$regex" | sed -e "s,\textbackslash\textbackslash\textbackslash\textbackslash\$,\$|\$dbname\$," \`{}"\\
done}
Since the programming language used to implement \myprog{} is \name{Java},
\section{Numbers and statistics}
\label{stats}
-
+TODO: consequences
The following numbers and statistics can be stated about \myprog{}:
\begin{table}[H]\begin{centering}