\section{Generating unique IRIs for OBDA specification map fields}
\label{iris}
As explained in Section~\fullref{back_basic}, IRIs play a central role in
many aspects of ontology-based data access.
They provide the means to uniquely identify entities, which is a
prerequisite for data retrieval.
As also explained in Section~\ref{back_basic}, every URI is also an IRI.
Skjæveland et al. use the term ``URI'' when introducing their approach of
using OBDA specifications for ontology bootstrapping, and that term is also
used in Section~\ref{bootstrap_spec}, which describes this approach.
In this section, however, the more general term ``IRI'' is used to indicate
that the concepts introduced here apply to all kinds of IRIs.

When dealing with ontology bootstrapping using OBDA specifications, it is
important to distinguish between the three kinds of IRIs occurring in this
context, which are given the following unambiguous names:
\begin{itemize}
\item \emph{Data IRIs} identify entities in the bootstrapped ontology
\item \emph{OBDA IRIs} are used as values for the fields of
OBDA specification entities
\item \emph{OSL IRIs} identify components in serialized OBDA specifications,
using the \oslboth{} introduced in Chapter~\ref{osl} for serialization
\end{itemize}

Skjæveland et al. do not define or assume a particular scheme for IRI generation
in their introduction of OBDA specifications \cite{eng}.
Instead, the IRI generation strategy is only hinted at by examples
of entities having IRIs.
The exemplified scheme was used for the implementation of the \myprog{}
software described in this thesis, which bootstraps OBDA specifications from
relational database schemata
(see Section~\fullref{program} and Section~\fullref{impl}).
The direct mapping approach for ontology bootstrapping described in
Section~\ref{dirm}, on the other hand, introduces a scheme for
IRI generation \cite{dirm}, but with this scheme, name clashes can occur,
as explained in Section~\ref{dirm_iris}.
Finally, the \oslboth{} defines a proper scheme for OSL IRIs,
as is explained in Section~TODO.

In the following, an enhanced scheme for the generation of OBDA IRIs
is proposed. It resembles the previously mentioned scheme used for
OSL IRIs and may also serve as a blueprint for
other IRI generation strategies.

\subsection{Requirements for the IRI scheme}
\label{iris_req}
As explained in Section~\fullref{back_basic}, the main requirement on an IRI
generation scheme is uniqueness of the IRIs: it must be impossible for two
entities to be assigned the same IRI, regardless of their kind, of how low the
probability of a name clash (IRI collision) is, or of the conditions leading
to a name clash.
Additionally, IRI uniqueness shall be independent of the base IRIs;
that is, a base IRI shall be freely selectable for each generation process
without introducing name clashes, even with IRIs having other base IRIs.

For OBDA specification entities, the following kinds of IRIs,
including IRI patterns, have to be available:
\begin{itemize}
\item Entity map OWL class IRIs
\item Identifier map IRI patterns
\item Attribute map OWL property IRIs
\item Attribute map IRI patterns
\item Relation map OWL property IRIs
\item Subtype map (IRI) prefixes
\item Subtype map (IRI) suffixes
\item Subtype map OWL superclass IRIs
\end{itemize}

Since Subtype map OWL superclass IRIs are IRIs of data entities
that already exist in the target ontology by some other means
(see Section~\fullref{bootstrap_spec_using}), they do not have to be
generated and are thus ignored in the following.
Exactly the same holds for Attribute map IRI patterns.
Furthermore, the approach presented here creates Subtype map IRI prefixes
that already lead to unique IRIs for Subtype map subclasses, so Subtype map
IRI suffixes are ignored in the following as well.
An IRI generation scheme cannot avoid collisions with existing IRIs outside
its reach, but such collisions can easily be prevented, for example, by
using a different base IRI (see Section~\fullref{back_basic}).
This case is therefore excluded from the requirement that no two IRIs may
collide under any circumstances.
However, the user shall be able to choose such externally generated IRIs
from an infinite set of IRIs while being sure that no name clashes
will occur.

In summary, the IRI generation scheme must be able to generate
Entity map OWL class IRIs, Identifier map IRI patterns,
Attribute map OWL property IRIs, Relation map OWL property IRIs and
Subtype map IRI prefixes that, regardless of the chosen base IRIs,
do not clash with one another, while leaving an infinite set of predictable
IRIs that do not clash with any of the generated IRIs.

\subsection{Avoiding name clashes in the IRI scheme}
\label{iris_clashes}
Generating unique Entity map OWL class IRIs while ignoring base IRIs is not
much of a problem, assuming that database table names are distinct, which is
guaranteed in common database systems based on SQL \cite{sql}.
Including the table name in an Entity map OWL class IRI is sufficient to
prevent it from colliding with other IRIs with the same base IRI.
However, when two IRIs created according to this scheme use two different
base IRIs, things get more complicated.\\
Consider, for example, a database table named ``\code{Persons}'' and a
table named \\ ``\code{Persons\_\_TABLE\_\_Persons}''.
Generating an IRI according to the scheme
``<base:>\code{TABLE\_\_}<table name>'' for each of these tables, using the
base IRI ``\code{TABLE\_\_Persons\_\_}'' for the first one and the
empty base IRI for the second one, both tables get the IRI\\
``\code{TABLE\_\_Persons\_\_TABLE\_\_Persons}'', although
the table name was included in the IRI in both cases.
The problem is that the ``\code{TABLE\_\_}'' string occurring in the
table name cannot be distinguished from the ``\code{TABLE\_\_}'' string
added during IRI generation or from the ``\code{TABLE\_\_}'' string
occurring in the base IRI.
To solve the problem, a marker has to be included in the IRI that definitely
indicates the beginning of the table name. In addition, this marker
uniquely identifies Entity map OWL class IRIs.
To achieve both aims, an escape symbol must be used that makes the marker
unique, at least outside the base IRI part, by escaping the marker
wherever it occurs in the table name.
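
The clash described above can be reproduced with a few lines of Python.
The following minimal sketch assumes that a base IRI is simply prepended as a
string; the function \code{naive\_entity\_iri} is hypothetical and implements
only the naive scheme ``<base:>\code{TABLE\_\_}<table name>'' without any
escaping.

```python
def naive_entity_iri(base: str, table: str) -> str:
    """Naive scheme (hypothetical): base IRI + "TABLE__" + raw table name."""
    return base + "TABLE__" + table


# The two (base IRI, table name) pairs from the example above.
first = naive_entity_iri("TABLE__Persons__", "Persons")
second = naive_entity_iri("", "Persons__TABLE__Persons")

print(first)            # TABLE__Persons__TABLE__Persons
print(first == second)  # True: two distinct tables get the same IRI
```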

Regarding Identifier map IRI patterns, the IRI resulting from the expansion of
the pattern will contain the column names of the primary key represented by
the respective Identifier map \cite{eng}.
Furthermore, the name of the database table containing that primary
key has to be included in the IRI pattern, since
two distinct tables may have primary keys with equally named columns.
This makes the IRI pattern a unique Identifier map IRI pattern,
since a database table can be assumed to have only one primary key,
as is the case in common database systems based on SQL \cite{sql}.
Since primary key values are unique for each data record,
unique Identifier map IRI patterns expand to unique IRIs.
Moreover, it has to be ensured that IRIs resulting from the expansion of
Identifier map IRI patterns do not collide with IRIs of other kinds.\\
Taking arbitrary and, in particular, varying base IRIs into account,
a definite marker has to be included in the IRI pattern, and other occurrences
of this marker in the IRI pattern have to be escaped. This uniquely
identifies Identifier map IRI patterns and unambiguously
distinguishes the table name from the rest of the IRI.
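
The role of the table name can be illustrated with a small Python sketch.
It assumes that expanding an IRI pattern means replacing each placeholder such
as ``\code{\{\$1\}}'' by the corresponding primary key value; the function
\code{expand} and the tables ``\code{Persons}'' and ``\code{Orders}'' are
hypothetical, and escaping is omitted.

```python
import re


def expand(pattern: str, *values: str) -> str:
    """Replace the placeholders {$1}, {$2}, ... by the given key values.

    A single regex pass is used, so placeholder-like text inside a value
    is not substituted again.
    """
    return re.sub(
        r"\{\$(\d+)\}", lambda m: values[int(m.group(1)) - 1], pattern
    )


# Two hypothetical tables whose primary keys both consist of a column "id".
# Without the table name, both patterns coincide, and so do their expansions.
persons_bad, orders_bad = "id/{$1}/", "id/{$1}/"
print(expand(persons_bad, "7") == expand(orders_bad, "7"))  # True: clash

# With the table name included, as in the proposed scheme, they differ.
persons, orders = "TBL__Persons/id/{$1}/", "TBL__Orders/id/{$1}/"
print(expand(persons, "7"))                         # TBL__Persons/id/7/
print(expand(persons, "7") == expand(orders, "7"))  # False
```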

Attribute map OWL property IRIs are unique among their kind if they include
the name of the database column they represent as well as the name of the
table containing it, since database table names can be assumed to be distinct
and column names can be assumed to be unique within a table, which is
guaranteed in common database systems based on SQL \cite{sql}.
Furthermore, Attribute map OWL property IRIs have to be prevented from
colliding with IRIs of other kinds.\\
Taking arbitrary and, in particular, varying base IRIs into account,
definite markers have to be included in the IRI, and other occurrences
of these markers in the IRI have to be escaped. This uniquely
identifies Attribute map OWL property IRIs and unambiguously
distinguishes the table name and the column name from the rest of the IRI
and from one another.

Regarding Relation map OWL property IRIs, including in the IRI the name of
the table containing the foreign key represented by the Relation map together
with the foreign key's column names, as well as the name of the referenced
table together with the referenced key's column names, makes it a unique
Relation map OWL property IRI.
Note that including only the table name and the column names on the foreign
key side would not be sufficient, since
several distinct foreign keys covering exactly the same columns can exist
in a table (this is what the IRI generation scheme of the direct mapping
approach misses). The same applies, of course, to the referenced table and
its columns, since several foreign keys can reference them.
Moreover, these Relation map OWL property IRIs have to be
prevented from colliding with IRIs of other kinds.\\
Taking arbitrary and, in particular, varying base IRIs into account,
definite markers have to be included in the IRI, and other occurrences
of these markers in the IRI have to be escaped. This uniquely
identifies Relation map OWL property IRIs and, in particular, their parts
providing the table and column names.
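
The following Python sketch illustrates why the referenced side has to be
included. The \code{ForeignKey} record and the tables ``\code{Orders}'',
``\code{Customers}'' and ``\code{Suppliers}'' are hypothetical; marking strings
and escaping are left out to focus on the choice of name components.

```python
from typing import NamedTuple, Tuple


class ForeignKey(NamedTuple):
    """Hypothetical description of a foreign key and the key it references."""
    src_table: str
    src_cols: Tuple[str, ...]
    tgt_table: str
    tgt_cols: Tuple[str, ...]


def source_only_name(fk: ForeignKey) -> str:
    # Uses only the foreign key side: table name and column names.
    return "/".join((fk.src_table,) + fk.src_cols)


def full_name(fk: ForeignKey) -> str:
    # Uses both the foreign key side and the referenced side.
    return "/".join(
        (fk.src_table,) + fk.src_cols + (fk.tgt_table,) + fk.tgt_cols
    )


# Two distinct foreign keys covering exactly the same column of "Orders".
to_customers = ForeignKey("Orders", ("party",), "Customers", ("id",))
to_suppliers = ForeignKey("Orders", ("party",), "Suppliers", ("id",))

print(source_only_name(to_customers) == source_only_name(to_suppliers))  # True
print(full_name(to_customers))  # Orders/party/Customers/id
print(full_name(to_customers) == full_name(to_suppliers))  # False
```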

Subtype map IRI prefixes must include the name of the database column
whose values determine the subclass each data record is declared to belong
to. Furthermore, since another database table could contain a
column of the same name, the IRI prefix must include the name of the
database table containing the column.
This makes Subtype map IRI prefixes unique among their kind.
Note that a Subtype map IRI prefix, similarly to an IRI pattern,
does not specify the final IRI but is subject to expansion.
This expansion can yield the same IRI for different data records, which,
however, is not considered a collision, since this behavior is
intentional: any two data records having the same value in
the respective column, and only those, get the same IRI.
Additionally, it has to be ensured that IRIs resulting from such an
expansion do not collide with IRIs of other kinds.\\
Taking arbitrary and, in particular, varying base IRIs into account,
definite markers have to be included in the IRI prefix, and other occurrences
of these markers in the IRI prefix have to be escaped. This uniquely
identifies Subtype map IRI prefixes and unambiguously
distinguishes the table name and the column name from the rest of the IRI
and from one another.
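
The intended behavior of the expansion can be sketched in Python, assuming
that a subclass IRI is obtained by appending the column value to the Subtype
map IRI prefix (suffixes are ignored here, as stated in
Section~\ref{iris_req}). The function \code{subclass\_iri}, the table
``\code{Vehicles}'', its column ``\code{kind}'' and the rows are hypothetical.

```python
from collections import defaultdict


def subclass_iri(prefix: str, value: str) -> str:
    """Expand a Subtype map IRI prefix with a column value (no suffix)."""
    return prefix + value


prefix = "SUBTYPE__Vehicles__kind/"  # hypothetical table and column
rows = [(1, "Car"), (2, "Truck"), (3, "Car")]  # (primary key, value of "kind")

# Group the records by the subclass IRI they are declared to belong to.
members = defaultdict(list)
for pk, kind in rows:
    members[subclass_iri(prefix, kind)].append(pk)

print(dict(members))
# {'SUBTYPE__Vehicles__kind/Car': [1, 3], 'SUBTYPE__Vehicles__kind/Truck': [2]}
```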

For an example of how awkwardly (or maliciously) chosen base
IRIs can introduce name clashes, see the paragraph about Entity map
OWL class IRIs at the beginning of this section.

\subsection{The proposed IRI generation scheme}
\label{iris_scheme}
This section introduces an IRI generation scheme meeting the requirements
formulated in Section~\ref{iris_req}.

In this section, the following strings are subsumed under the term
\emph{marking strings}:\\
``\code{TABLE\_\_}'', ``\code{TBL\_\_}'', ``\code{PROP\_\_}'',
``\code{REF\_\_}'' and ``\code{SUBTYPE\_\_}''.\\
The string obtained by escaping (prefixing) every occurrence of a marking
string or of a `\textasciitilde' character in a string $s$ with a
`\textasciitilde' character is called the \emph{IRI-safe version} of $s$.
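
A minimal Python sketch of this definition follows. It relies on the fact
that no two marking strings can overlap, so a single left-to-right pass over
the string finds every occurrence; the names \code{MARKING\_STRINGS} and
\code{iri\_safe} are hypothetical.

```python
import re

# The five marking strings; "~" serves as the escape character.
MARKING_STRINGS = ("TABLE__", "TBL__", "PROP__", "REF__", "SUBTYPE__")

# Matches a literal "~" or any marking string.
_ESCAPABLE = re.compile(
    "|".join(re.escape(t) for t in ("~",) + MARKING_STRINGS)
)


def iri_safe(s: str) -> str:
    """Return the IRI-safe version of s: every "~" and every occurrence
    of a marking string is prefixed with "~"."""
    return _ESCAPABLE.sub(lambda m: "~" + m.group(0), s)


print(iri_safe("Persons__TABLE__Persons"))  # Persons__~TABLE__Persons
print(iri_safe("a~b"))                      # a~~b
print(iri_safe("~TABLE__"))                 # ~~~TABLE__
```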

The IRI generation scheme is presented in Table~\ref{bootstrap_tab_iris}.
Here,\\
\emph{<base:>} refers to the base IRI to be used for
the generated IRI (see Section~\fullref{back_basic}),\\
\emph{<cl. tbl name>} refers to the IRI-safe version of the name of the
database table concerned (see Section~\ref{iris_clashes}),\\
\emph{<cl. name 1st pk col>} refers to the IRI-safe version of the name of
the first primary key column concerned (see Section~\ref{iris_clashes}),\\
\emph{<\code{/}...>} refers to the continuation of the previous pattern
using the remaining primary key, foreign key or referenced key columns,\\
\emph{<cl. col name>} refers to the IRI-safe version of the name of the
column in question (see Section~\ref{iris_clashes}),\\
\emph{<cl. src tbl>} refers to the IRI-safe version of the name of the
database table containing the respective foreign key
(see Section~\ref{iris_clashes}),\\
\emph{<cl. 1st src col>} refers to the IRI-safe version of the name of the
first column of the respective foreign key (see Section~\ref{iris_clashes}),\\
\emph{<cl. tgt tbl>} refers to the IRI-safe version of the name of the
table referenced by the respective foreign key
(see Section~\ref{iris_clashes}), and\\
\emph{<cl. 1st tgt col>} refers to the IRI-safe version of the name of the
first column referenced by the respective foreign key
(see Section~\ref{iris_clashes}).

\begin{table}[H]\begin{centering}
\begin{tabular}{p{6.3cm}|p{9.7cm}}
\textbf{IRI type} & \textbf{Proposed IRI} \\ \hline
Entity map OWL class IRI & <base:>\code{TABLE\_\_}<cl. tbl name>\\
Identifier map IRI pattern & <base:>\code{TBL\_\_}<cl. tbl name>\code{/}<cl. name 1st pk col>\newline \ind{} \code{/\{\$1\}/}<\code{/}...>\\
Attribute map OWL property IRI & <base:>\code{PROP\_\_}<cl. tbl name>\code{\_\_}<cl. col name>\\
Relation map OWL property IRI & <base:>\code{REF\_\_}<cl. src tbl>\code{/}<cl. 1st src col>\newline \ind{} <\code{/}...>\code{/}<cl. tgt tbl>\code{/}<cl. 1st tgt col><\code{/}...>\\
Subtype map IRI prefixes & <base:>\code{SUBTYPE\_\_}<cl. tbl name>\code{\_\_}\newline \ind{} <cl. col name>\code{/}\\
% % Slanted and with underscores and without "clean":
% Entity map OWL class IRI & \textit{<base>}\code{:TABLE\_\_}\textit{<table\_name>}\\
% Identifier map IRI pattern & \textit{<base>}\code{:TBL\_\_}\textit{<table\_name>}\code{/}\textit{<name\_1st\_pk\_col>}\newline \ind{} \code{/\{\$1\}/}\textit{<\code{/}...>}\\
% Attribute map OWL property IRI & \textit{<base>}\code{:PROP\_\_}\textit{<table\_name>}\code{\_\_}\textit{<col\_name>}\\
% Relation map OWL property IRI & \textit{<base>}\code{:REF\_\_}\textit{<src\_table>}\code{/}\textit{<1st\_fk\_src\_col>}\newline \ind{} \textit{<\code{/}...>}\code{/}\textit{<tgt\_table>}\code{/}\textit{<1st\_fk\_tgt\_col>}\textit{<\code{/}...>}\\
% Subtype map IRI prefixes & \textit{<base>}\code{:SUBTYPE\_\_}\textit{<table\_name>}\code{\_\_}\textit{<col\_name>}\code{/}\\
\end{tabular}
\caption{Proposed IRIs to be used in OBDA specification map fields}
\label{bootstrap_tab_iris}
\end{centering}\end{table}
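
The rows of Table~\ref{bootstrap_tab_iris} can be turned into a small Python
sketch. It assumes that ``<base:>'' denotes plain string concatenation with
the base IRI and, since the table only hints at the continuation
``<\code{/}...>'', that each further primary key column of an Identifier map
IRI pattern is appended as ``<cl. col name>\code{/\{\$2\}/}'',
``<cl. col name>\code{/\{\$3\}/}'', and so on. All function names and the
base IRI \code{http://example.org/} are hypothetical.

```python
import re
from typing import Sequence

MARKING_STRINGS = ("TABLE__", "TBL__", "PROP__", "REF__", "SUBTYPE__")
_ESCAPABLE = re.compile(
    "|".join(re.escape(t) for t in ("~",) + MARKING_STRINGS)
)


def iri_safe(s: str) -> str:
    """Prefix every "~" and every marking string in s with "~"."""
    return _ESCAPABLE.sub(lambda m: "~" + m.group(0), s)


def entity_class_iri(base: str, table: str) -> str:
    return f"{base}TABLE__{iri_safe(table)}"


def identifier_pattern(base: str, table: str, pk_cols: Sequence[str]) -> str:
    # Assumption: key column i contributes "<col>/{$i}/" to the pattern.
    cols = "".join(
        f"{iri_safe(c)}/{{${i}}}/" for i, c in enumerate(pk_cols, start=1)
    )
    return f"{base}TBL__{iri_safe(table)}/{cols}"


def attribute_property_iri(base: str, table: str, column: str) -> str:
    return f"{base}PROP__{iri_safe(table)}__{iri_safe(column)}"


def relation_property_iri(base: str, src_table: str, src_cols: Sequence[str],
                          tgt_table: str, tgt_cols: Sequence[str]) -> str:
    src = "/".join(iri_safe(x) for x in [src_table, *src_cols])
    tgt = "/".join(iri_safe(x) for x in [tgt_table, *tgt_cols])
    return f"{base}REF__{src}/{tgt}"


def subtype_prefix(base: str, table: str, column: str) -> str:
    return f"{base}SUBTYPE__{iri_safe(table)}__{iri_safe(column)}/"


base = "http://example.org/"  # hypothetical base IRI
print(entity_class_iri(base, "Persons"))
# http://example.org/TABLE__Persons
print(identifier_pattern(base, "Persons", ["id"]))
# http://example.org/TBL__Persons/id/{$1}/
print(attribute_property_iri(base, "Persons", "name"))
# http://example.org/PROP__Persons__name
print(relation_property_iri(base, "Orders", ["party"], "Customers", ["id"]))
# http://example.org/REF__Orders/party/Customers/id
print(subtype_prefix(base, "Vehicles", "kind"))
# http://example.org/SUBTYPE__Vehicles__kind/

# The clashing example of the naive scheme now yields two different IRIs.
print(entity_class_iri("TABLE__Persons__", "Persons"))
# TABLE__Persons__TABLE__Persons
print(entity_class_iri("", "Persons__TABLE__Persons"))
# TABLE__Persons__~TABLE__Persons
```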

It is easily verified that the proposed IRI scheme is correct
with regard to the requirements described in Section~\ref{iris_req}:
it provides unique IRIs for all kinds of IRIs it is able to create,
regardless of the chosen base IRI
(see the proof in Section~\ref{iris_proof}).
Furthermore, the IRI scheme is expressive: ignoring the base
IRI part, the kind of entity identified by an IRI can be
determined from the beginning of the IRI.
In addition, it is regular in that
the name of the containing table always occurs before the
name of the first database column.
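
Under the assumptions used in the proof in Section~\ref{iris_proof} (base
IRIs contain no unescaped marking string and do not end with
`\textasciitilde', and a marking string counts as unescaped unless it is
directly preceded by `\textasciitilde'), the kind of a generated IRI can be
read off its first unescaped marking string. The following Python sketch with
the hypothetical function \code{kind\_of} shows this.

```python
import re

KINDS = {
    "TABLE__": "Entity map OWL class IRI",
    "TBL__": "Identifier map IRI pattern",
    "PROP__": "Attribute map OWL property IRI",
    "REF__": "Relation map OWL property IRI",
    "SUBTYPE__": "Subtype map IRI prefix",
}
# A marking string counts as unescaped if it is not directly preceded by "~".
_UNESCAPED = re.compile("(?<!~)(" + "|".join(map(re.escape, KINDS)) + ")")


def kind_of(iri: str):
    """Return the kind of IRI indicated by its first unescaped marking
    string, or None if it contains no unescaped marking string."""
    m = _UNESCAPED.search(iri)
    return KINDS[m.group(1)] if m else None


print(kind_of("http://example.org/PROP__Persons__name"))
# Attribute map OWL property IRI
print(kind_of("http://example.org/TABLE__Persons__~TABLE__Persons"))
# Entity map OWL class IRI
print(kind_of("http://example.org/Persons"))
# None
```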

Taking the considerations in Section~\ref{iris_clashes} into account,
it is easy to see that the suggested IRI scheme remains correct without
the use of IRI-safe versions, provided that
all IRIs are generated using the same base IRI.
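
For IRIs of different kinds, this observation rests on the marking strings
being prefix-free: with a shared base IRI $b$, two IRIs $b\,m_1x$ and
$b\,m_2y$ with distinct marking strings $m_1$ and $m_2$ can only be equal if
one of $m_1$, $m_2$ is a prefix of the other. The short Python check below,
with the hypothetical name \code{prefix\_free}, confirms that this cannot
happen for the five marking strings.

```python
from itertools import permutations

MARKING_STRINGS = ("TABLE__", "TBL__", "PROP__", "REF__", "SUBTYPE__")

# True if no marking string is a prefix of a different marking string.
prefix_free = all(
    not m2.startswith(m1) for m1, m2 in permutations(MARKING_STRINGS, 2)
)
print(prefix_free)  # True
```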

Note that, in any case, the beginnings of all kinds of IRIs must be
mutually different:
if, for example, an Identifier map IRI pattern also
began with ``<base:>\code{TABLE\_\_}'', a table named
``\code{PERSONS/\{17\}}'' (which is a valid table name, for
example, in SQL \cite{sql}) could get an Entity map OWL
class IRI assigned that clashes with the IRI resulting from the
expansion of the IRI pattern
``<base:>\code{TABLE\_\_PERSONS/\{\$1\}}''.
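
The clash can be reproduced with the following Python sketch. It assumes
that the pattern expansion replaces ``\code{\{\$1\}}'' by the primary key
value and that the key of the hypothetical table is string-valued with the
value ``\code{\{17\}}''; the base IRI and the function \code{expand} are
hypothetical. Escaping does not help here, since ``\code{PERSONS/\{17\}}''
contains neither a marking string nor `\textasciitilde'.

```python
import re


def expand(pattern: str, *values: str) -> str:
    """Replace {$1}, {$2}, ... by the given values in a single pass."""
    return re.sub(
        r"\{\$(\d+)\}", lambda m: values[int(m.group(1)) - 1], pattern
    )


base = "http://example.org/"  # hypothetical base IRI

# Entity map OWL class IRI of a table literally named "PERSONS/{17}".
entity_iri = base + "TABLE__" + "PERSONS/{17}"

# Rejected variant: an Identifier map IRI pattern that also starts with
# "TABLE__", expanded for the primary key value "{17}".
pattern_iri = expand(base + "TABLE__PERSONS/{$1}", "{17}")

print(entity_iri == pattern_iri)  # True: a name clash
```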

\subsection{Proof of correctness of the proposed IRI scheme}
\label{iris_proof}
As described in Section~\ref{iris_req}, the IRI scheme described above
is required to generate several kinds of IRIs without
introducing name clashes, i.e., two equal IRIs for two distinct
entities, independently of the chosen base IRIs.
Additionally, the user shall be able to choose further IRIs
from an infinite set while being sure that they do not collide with
IRIs generated by the scheme.

In this proof, as in Section~\ref{iris_scheme}, the strings
``\code{TABLE\_\_}'', ``\code{TBL\_\_}'', ``\code{PROP\_\_}'',
``\code{REF\_\_}'' and ``\code{SUBTYPE\_\_}'' are called
\emph{marking strings}.\\
Strings prefixed by `\textasciitilde' are referred to as
\emph{escaped}, while strings not prefixed by `\textasciitilde'
are referred to as \emph{unescaped}.

In the following, it is proven that IRIs of each kind clash neither among
themselves nor with IRIs of other kinds.
Since every generated IRI contains an unescaped marking string, every IRI
\emph{not} containing an unescaped marking string (an infinite set) is
guaranteed not to collide with any of the generated IRIs, so correctness
with respect to the stated requirements follows once uniqueness is proven.

Each Entity map OWL class IRI (including its base IRI) is of the form
$\alpha$\code{TABLE\_\_}$\beta$, where $\alpha$ does not end with
`\textasciitilde', neither $\alpha$ nor $\beta$ contains any unescaped marking
string, and $\beta$ is the IRI-safe version of the table name.\\
Since $\alpha$ does not end with `\textasciitilde', the \code{TABLE\_\_}
following $\alpha$ is the first unescaped marking string of the IRI.
Hence, $\alpha$ and $\beta$, and with $\beta$ the table name, are uniquely
determined by the IRI, which makes the IRI unique among all
other Entity map OWL class IRIs (see the considerations in
Section~\ref{iris_clashes}).
Because the first unescaped marking string of every IRI of another kind
is a marking string other than \code{TABLE\_\_}, the IRI cannot collide
with any IRI of another kind and thus is indeed unique.
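
The argument can also be read as a decoding procedure: if $\alpha$ and the
table name can always be recovered from an Entity map OWL class IRI, two
different pairs cannot yield the same IRI. The following Python sketch, with
the hypothetical functions \code{iri\_safe}, \code{unescape} and
\code{decode\_entity\_iri}, splits an IRI at its first unescaped marking
string (one not directly preceded by `\textasciitilde') and undoes the
escaping; it assumes base IRIs satisfying the conditions on $\alpha$ stated
above.

```python
import re

MARKING_STRINGS = ("TABLE__", "TBL__", "PROP__", "REF__", "SUBTYPE__")
_MARK = "|".join(re.escape(m) for m in MARKING_STRINGS)
_ESCAPABLE = re.compile(f"~|{_MARK}")        # what iri_safe escapes
_ESCAPED = re.compile(f"~(~|{_MARK})")       # one escaped unit
_UNESCAPED = re.compile(f"(?<!~)({_MARK})")  # marking not preceded by "~"


def iri_safe(s: str) -> str:
    """Prefix every "~" and every marking string in s with "~"."""
    return _ESCAPABLE.sub(lambda m: "~" + m.group(0), s)


def unescape(s: str) -> str:
    """Inverse of iri_safe: drop the leading "~" of every escaped unit."""
    return _ESCAPED.sub(lambda m: m.group(1), s)


def decode_entity_iri(iri: str) -> tuple:
    """Split alpha + "TABLE__" + beta at the first unescaped marking string
    and return alpha together with the original table name."""
    m = _UNESCAPED.search(iri)
    if m is None or m.group(1) != "TABLE__":
        raise ValueError("not an Entity map OWL class IRI")
    return iri[:m.start()], unescape(iri[m.end():])


# Round trip for a few admissible (alpha, table name) pairs.
pairs = [
    ("http://example.org/", "Persons"),
    ("http://example.org/", "Persons__TABLE__Persons"),
    ("http://example.org/~u/", "a~b"),
    ("x~~TABLE__y/", "P"),  # alpha contains only an escaped marking string
]
for alpha, table in pairs:
    iri = alpha + "TABLE__" + iri_safe(table)
    print(decode_entity_iri(iri) == (alpha, table))  # True for every pair
```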

The proofs for Identifier map IRI patterns, Attribute map OWL property IRIs,
Relation map OWL property IRIs and Subtype map IRI prefixes are
analogous.

\hfill $\Box$