Commit | Line | Data |
---|---|---|
c31df1ed | 1 | \chapter{Background and related work} |
c31df1ed | 2 | |
002fa020 | 3 | \section{Background} |
45d598e9 PM |
4 | \subsection{Basic concepts} |
5 | \label{back_basic} | |
6 | TODO: r2rml, rdf, rdfs, owl, xml, iris, baseiri, end with : | |
7 | ||
b96bb723 | 8 | \subsection{Ontology-based data access (OBDA)} |
45d598e9 | 9 | \label{back_obda} |
b96bb723 PM |
10 | TODO: References |
11 | ||
12 | Storing data in relational databases is a very common practice, since the | |
13 | notion of a relational database is comprehensible and widely known, while the | |
14 | required software is widely available both commercially and as open-source | |
15 | software. | |
16 | Thus, it is easy for a domain expert to set up and populate a database. | |
17 | Furthermore, relational databases provide significant advantages concerning | |
18 | performance, data consistency and integrity, integration abilities, support | |
19 | and general prominence. | |
20 | These properties certainly played a major role in the success of databases | |
21 | and in the extensive research devoted to them, to the degree that they are | |
22 | now regarded as the principal strengths of databases. | |
23 | Many -- if not all -- of these strengths trace back to the relatively fixed | |
24 | and rigid schema databases embody: a well-defined database schema imposing | |
25 | strong and clear-cut constraints on the contained data. | |
26 | ||
27 | However, this principle also induces notable disadvantages. | |
28 | The database schema, although theoretically changeable, constitutes a | |
29 | significant burden on TODO the representation of data of dynamic | |
30 | environments, incomplete data or changing requirements, | |
31 | especially when dealing with large amounts of data. | |
32 | The resulting representation of data in unintuitive, suboptimal schemata | |
33 | makes the use of long and complex query constructs inevitable, so that | |
34 | more elaborate queries quickly become unmanageable for non-experts | |
35 | and time-consuming and error-prone even for experts. | |
36 | Ontologies, on the other hand, are much more flexible regarding incomplete | |
37 | data or changing requirements or environments and allow for much more | |
38 | intuitive and abstract query systems, while still being a quite | |
39 | comprehensible formalism | |
40 | (see Section~\ref{ontologies} for publications describing | |
41 | ontologies and the semantic web). | |
42 | Besides, ontologies provide support for different data records referencing | |
43 | the same entity, while databases do not \cite{eng}, and for the deduction of | |
44 | implicit information \cite{eng}, which common database systems support, if at | |
45 | all, neither out of the box nor in an easy manner. | |
46 | Often, however, relational databases are preferred for their advantages | |
47 | (although the availability of cheaper yet more powerful hardware in some | |
48 | cases offsets these) or simply by mistake. | |
49 | Besides, in some cases, the migration to ontology-based systems, | |
50 | even if beneficial, is too costly to be seriously considered. | |
51 | %, often due to the large amounts of data that would have to be converted. | |
52 | ||
53 | Ontology-based data access often provides a solution to this conflict of | |
54 | interests: | |
55 | by adding an ontology-based front-end that processes the queries and is | |
56 | sensibly mapped to the data representation, | |
57 | the querying facilities of ontology-based systems are introduced, | |
58 | and changes in the data representation can most often be carried out | |
59 | without breaking existing queries; only the mapping has to be changed | |
60 | -- in a one-time effort -- and existing queries have to be modified only | |
61 | when the mapping changes the way it presents existing structures to the | |
62 | user. | |
63 | These mappings can in turn be created with computer assistance or, in | |
64 | simple cases or up to a certain degree of completeness and accuracy, | |
65 | bootstrapped completely. | |
66 | ||
67 | Moreover, in cases where it is too costly or for other reasons infeasible | |
68 | to carry out a complete data duplication, the data can remain in the | |
69 | underlying database as is and the query front-end merely acts as an | |
70 | interface transforming the query into a database query \cite{eng} by | |
71 | making use of a backward-chaining technique called \emph{query rewriting} | |
72 | \cite{eng}. | |
73 | This approach is called \emph{virtual OBDA} or \emph{virtual RDF view} | |
74 | \cite{eng} and is illustrated in Figure~\ref{back_subfig_virtual}. | |
75 | The opposite approach of duplicating all data is called | |
76 | \emph{materialized OBDA} or \emph{materialized RDF view} and is | |
77 | illustrated in Figure~\ref{back_subfig_materialized}. | |
78 | Virtual OBDA provides limited abilities compared to materialized OBDA | |
79 | in that it does not allow for decoupling from the source data by, for | |
80 | example, adding inferred information or applying elaborate transformations, | |
81 | and does not support ``fragments of OWL for which query | |
82 | rewriting is not a complete deduction method'' \cite{eng}. | |
83 | However, the response time of systems is hard to predict solely from the | |
84 | architectural approach used, so if it is critical, several systems should | |
85 | be prototyped and evaluated upfront on what are expected to be | |
86 | typical queries \cite{eng}. | |
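
The contrast between the two architectures can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any system cited here: the tables manager and engineer, the ex: IRIs, the subclass axioms and all function names are hypothetical, and the rewriting covers only instance queries over a subclass hierarchy.

```python
import sqlite3

# Hypothetical mapping: each mapped class is backed by one SQL query
# returning the keys of its instances.
CLASS_MAPPING = {
    "ex:Manager": "SELECT id FROM manager",
    "ex:Engineer": "SELECT id FROM engineer",
}
# Hypothetical ontology: subclass -> superclass axioms.
SUBCLASS_OF = {"ex:Manager": "ex:Employee", "ex:Engineer": "ex:Employee"}


def iri(key):
    """Build a (hypothetical) subject IRI from a primary key."""
    return f"ex:person/{key}"


def entailing_classes(cls):
    """Backward chaining: cls plus all of its (transitive) subclasses."""
    found = {cls}
    changed = True
    while changed:
        changed = False
        for sub, sup in SUBCLASS_OF.items():
            if sup in found and sub not in found:
                found.add(sub)
                changed = True
    return found


def rewrite(cls):
    """Rewrite 'all instances of cls' into one SQL query over the source."""
    parts = [CLASS_MAPPING[c] for c in sorted(entailing_classes(cls))
             if c in CLASS_MAPPING]
    return " UNION ".join(parts)


def query_virtual(conn, cls):
    """Virtual view: answer from the live database at query time."""
    return sorted(iri(key) for (key,) in conn.execute(rewrite(cls)))


def materialize(conn):
    """Materialized view: copy the data as triples and add inferred types."""
    triples = set()
    for cls, sql in CLASS_MAPPING.items():
        for (key,) in conn.execute(sql):
            current = cls
            while current is not None:  # forward chaining up the hierarchy
                triples.add((iri(key), "rdf:type", current))
                current = SUBCLASS_OF.get(current)
    return triples


def query_materialized(triples, cls):
    """Answer an instance query from the stored triples only."""
    return sorted(s for s, p, o in triples if p == "rdf:type" and o == cls)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE manager (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE engineer (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO manager VALUES (1)")
conn.execute("INSERT INTO engineer VALUES (2)")

snapshot = materialize(conn)
before_virtual = query_virtual(conn, "ex:Employee")
before_materialized = query_materialized(snapshot, "ex:Employee")

# A later change in the source is visible only through the virtual view.
conn.execute("INSERT INTO engineer VALUES (3)")
after_virtual = query_virtual(conn, "ex:Employee")
after_materialized = query_materialized(snapshot, "ex:Employee")
```

In this sketch both views return the same two employees at first. After the additional insert, only the virtual view reports the third person, while the snapshot holds inferred type triples that the source database never stores.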
87 | ||
88 | \begin{figure}[H]\begin{center} | |
89 | \begin{subfigure}[b]{0.8\textwidth} | |
90 | \includegraphics[scale=1.2]{Images/bootstrapping_materialized.pdf} | |
91 | \caption{OBDA system architecture with materialized \name{RDF} view} | |
92 | \label{back_subfig_materialized} | |
93 | \end{subfigure}\\~\\ | |
94 | \begin{subfigure}[b]{0.8\textwidth} | |
95 | \includegraphics[scale=1.2]{Images/bootstrapping_virtual.pdf} | |
96 | \caption{OBDA system architecture with a virtual \name{RDF} view} | |
97 | \label{back_subfig_virtual} | |
98 | \end{subfigure} | |
99 | \caption[The two basic OBDA system architectures] | |
100 | {The two basic OBDA system architectures: | |
101 | materialized and virtual \name{RDF} views (from \cite{eng})} | |
102 | \label{back_fig_bootstrapping} | |
103 | \end{center}\end{figure} | |
104 | ||
105 | Finally, it should be mentioned that ontology-based data access is not | |
106 | limited to databases. | |
107 | Although this is the most common scenario and the only one this thesis | |
108 | deals with, ontology-based data access also works with other sources of | |
109 | structured information, like \name{ODS}, \name{XLS} or \name{CSV} files, | |
110 | though some additional preparation might be necessary in these cases \cite{eng}. | |
111 | ||
112 | \subsection{OBDA specifications} | |
45d598e9 PM |
113 | \label{back_obdaspecs} |
114 | TODO: more, maybe shorten introduction | |
115 | ||
b96bb723 | 116 | As mentioned in Section~\fullref{motivation}, the sole bootstrapping of |
002fa020 PM |
117 | \name{RDF} triples \cite{rdf} or other forms of structured information |
118 | from relational database schemata is a relatively well understood topic. | |
b96bb723 PM |
119 | This is outlined more comprehensively in Section~\fullref{related}. |
120 | ||
121 | Nonetheless, bootstrapping remains an elaborate process involving complex | |
122 | tools to be invoked -- possibly in different versions and configurations | |
123 | and processing different formats -- and working on changing input data | |
124 | \cite{eng}. | |
125 | This is why Skjæveland and others proposed the introduction of OBDA | |
126 | specifications, which centralize the task of driving these tools and | |
127 | gather in one place all the information describing the desired mapping | |
128 | between the source database and the target ontology \cite{eng}. | |
129 | As described in Section~\fullref{related}, their approach is the | |
130 | foundation of this thesis, which describes and specifies a format for | |
131 | storing and exchanging such OBDA specifications -- the \oslboth{} -- and | |
132 | introduces a tool that in turn automatically bootstraps OBDA specifications | |
133 | from relational database schemata -- the \myprog{} software. | |
c31df1ed | 134 | |
b96bb723 PM |
135 | The bootstrapping process using OBDA specifications and \myprog{} is |
136 | illustrated in Figure~\ref{intro_fig_bootstrapping} | |
137 | in Section~\fullref{approach}. | |
c31df1ed | 138 | |
b96bb723 PM |
139 | \subsection{The \name{OPTIQUE} project} |
140 | The problems addressed in Section~\fullref{motivation} are a big issue | |
141 | inter alia TODO in the oil and gas industry: | |
142 | $30 \%$ to $70 \%$ of the working time of engineers is spent on collecting | |
143 | data or assessing its quality \cite{crompton}. | |
144 | This led to the launch of the \name{OPTIQUE} project in TODO which | |
145 | ``advocates for a next generation of the well known Ontology-Based Data Access | |
146 | (OBDA) approach to address the data access problem [...] [aiming] at | |
147 | solutions that reduce the cost of data access dramatically'' \cite{optique}. | |
148 | Thus, the \name{OPTIQUE} project tries to reach exactly the benefits a | |
45d598e9 PM |
149 | well-developed OBDA system can provide (explained in |
150 | Section~\ref{back_obda}): | |
b96bb723 PM |
151 | easy end-user access to data without knowledge of its structure |
152 | while taking advantage of automatic translations \cite{optique2}. | |
153 | In doing so, ascertained shortcomings of existing OBDA systems were addressed: | |
154 | \emph{usability} (for example the need to use formal query languages), | |
155 | \emph{costly prerequisites} (consider, for example, the disadvantages | |
45d598e9 | 156 | of materialized OBDA described in Section~\ref{back_obda}) and |
b96bb723 PM |
157 | \emph{efficiency} (which was perceived as being insufficiently addressed |
158 | in previous approaches) \cite{optique}. | |
c31df1ed | 159 | |
c31df1ed | 160 | \section{Related work} |
b96bb723 PM |
161 | \label{related} |
162 | \subsection{Ontologies and the semantic web -- publications} | |
163 | \label{ontologies} | |
164 | ||
165 | \subsection{OBDA specifications -- publications} | |
166 | A publication that forms the foundation of the work presented in this thesis is | |
167 | the summarizing and benchmarking work on OBDA specifications by Skjæveland et al. | |
168 | \cite{eng}, the group that developed them in their present form. | |
169 | ||
170 | \subsection{OBDA systems -- publications} | |
171 | ||
172 | \subsection{General ontology bootstrapping -- publications} | |
173 | Skjæveland, Lian and Horrocks \cite{npd} provided an exemplifying description of | |
174 | the transformation of the \emph{NPD FactPages}, an enormous collection of data | |
175 | related to oil drilling on the Norwegian continental shelf, provided by | |
176 | the Norwegian Petroleum Directorate (NPD). | |
177 | ||
178 | Sequeda et al. \cite{survey} provided an overview of different | |
179 | direct mapping approaches. | |
180 | ||
181 | Sequeda, Arenas and Miranker \cite{ondirm,autodirm} | |
182 | formally described the direct mapping of relational databases to \name{RDF} | |
183 | and \name{OWL}. | |
184 | ||
185 | Stojanovic, Stojanovic and Volz \cite{sac} published a formal description of | |
186 | the mapping of relational databases onto ontology-based structures, describing | |
187 | concepts preceding and/or supplementing \name{OWL} and using \name{F-LOGIC} TODO | |
188 | as target language. | |
189 | ||
190 | \subsection{The \name{OPTIQUE} project -- publications} | |
191 | Calvanese et al. \cite{optique2} presented the \name{OPTIQUE} project including | |
192 | its underlying OBDA system and showed limitations of current OBDA systems. | |
193 | ||
194 | Kharlamov et al. \cite{optique} described the first version of the \name{OPTIQUE} | |
195 | system, customized for use with the \emph{NPD FactPages} of the | |
196 | Norwegian Petroleum Directorate (NPD). | |
197 | ||
198 | Skjæveland and Lian \cite{benefits} summarized the benefits of and the | |
199 | procedure for converting the \emph{NPD FactPages} to linked data \cite{linked} | |
200 | and discussed associated terms like linked data, URIs, \name{RDF} and | |
201 | \name{SPARQL}. | |
202 | ||
203 | \subsection{Alternative approaches -- publications} | |
204 | Barrasa, Corcho and P{\'e}rez \cite{r2o} proposed a declarative mapping | |
205 | language -- \name{r2o} -- able to express a mapping between a relational | |
206 | database and ontologies represented in the \name{OWL} and \name{RDF} formats. | |
207 | This approach, however, aims at connecting existing databases | |
208 | and existing ontologies. | |
209 | ||
210 | TODO: R2RML, SQL2SW |