\section{Code style}
\label{code}
TODO: Conventions, ex.: iterators
As the final system will hopefully have a long life cycle TODO
and will be used and refined by many people, high code quality was an important aim.
Beyond architectural issues, this also involves cleanliness at the lower level,
such as the design of classes and the implementation of methods.
Common software development principles were followed TODO and
the unfamiliar reader was constantly kept in mind
to yield clean, readable and extensible code.

\subsection{Comments}
\label{comments}
Comments were used in places where ambiguities or misinterpretations could arise,
yet care was taken to tackle such problems at their roots and solve them
wherever possible instead of just masking the ambiguity with comments.
This approach is further explained in section \fullref{speaking} and
rendered many uses of comments unnecessary.

In fact, the number of plain (i.e. non-\name{Javadoc}) comments was
consciously minimized to enforce speaking code and avoid redundancy.
An exception to this was the highlighting of subdivisions.
In class and method implementations, comments like
\codepar{//********************** Constructors **********************\textbackslash\textbackslash}

were deliberately used to ease navigation inside source files,
but also to enhance readability: parts of method
implementations, for example, were visually separated this way.
An alternative would have been to use separate methods for these code
pieces, thereby adhering strictly to the so-called
``Composed Method Pattern'' \cite{composed},
as was done in other cases.
However, adhering to this pattern too rigidly would have introduced additional
artifacts with either long or non-speaking names,
would have interrupted the reading flow and would also have increased complexity,
because these methods would have been callable from at least everywhere
in the source file.
Consequently, having longer methods in some places that are visually separated
into smaller units that are in fact independent of each other was considered
an elegant solution, although, surprisingly, this technique does not seem to be
proposed very often in the literature.
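The following minimal sketch illustrates how such divider comments can structure a slightly longer method without splitting it into helper methods. The class \code{BannerCommentSketch} and the method \code{parseColumnNames} are hypothetical and not taken from the program; only the comment style follows the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example; not part of the program described in the text.
final class BannerCommentSketch {
    /** Parses comma-separated column names, dropping blanks and duplicates. */
    static List<String> parseColumnNames(String input) {
        //********************** Splitting **********************
        String[] parts = input.split(",");

        //********************** Cleaning ***********************
        List<String> names = new ArrayList<>();
        for (String part : parts) {
            String name = part.trim();
            // Skip empty entries and names that were already collected
            if (!name.isEmpty() && !names.contains(name)) {
                names.add(name);
            }
        }

        //********************** Result *************************
        return names;
    }
}
```

Each divider marks a unit that could have become a separate private method; keeping the units inline avoids additional names that would be visible throughout the source file.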

Wherever possible, the appropriate \name{Javadoc} comments were used in preference to
plain comments, for example to specify parameters, return types, exceptions
and links to other parts of the documentation.
This proved all the more useful because \name{Doxygen} supports all
of the \name{Javadoc} comments used TODO (but not vice versa TODO).
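As an illustration, a \name{Javadoc} comment covering parameters, return value, exceptions and a link might look as follows. This is only a sketch: the class \code{JavadocSketch} and the method \code{indexOfColumn} are assumptions made for this example, while the tags themselves are standard \name{Javadoc} tags.

```java
import java.util.List;

// Hypothetical example; not part of the program described in the text.
final class JavadocSketch {
    /**
     * Returns the position of a column name within a list of names.
     *
     * @param names the column names to search; must not be null
     * @param wanted the name to look for
     * @return the zero-based index of the first occurrence of {@code wanted}
     * @throws IllegalArgumentException if {@code wanted} does not occur in {@code names}
     * @see List#indexOf(Object)
     */
    static int indexOfColumn(List<String> names, String wanted) {
        int index = names.indexOf(wanted);
        if (index < 0) {
            throw new IllegalArgumentException("no such column: " + wanted);
        }
        return index;
    }
}
```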
\subsection{``Speaking code''}
\label{speaking}
As mentioned in section \fullref{comments}, an attempt was made to design the code to
``speak for itself'' as much as possible instead of making its readers depend on
explanatory comments.
Besides reducing code size by making comments unnecessary, this enforced
clean code that is accessible to unfamiliar readers and amenable to unpredictable changes.
This is especially important since, as described in section \fullref{arch},
\myprog{} was designed not only to be a standalone program but also to offer
components suitable for reuse.
%TODO: understandability <- code size

The following topics were identified as needing attention to obtain what can be
regarded as ``speaking code'':
\begin{itemize}
 \item Meaningful typing
 \item Method names
 \item Variable names
 \item Intuitive control flow
 \item Limited nesting
 \item Compact code units
 \item Usage of well-known structures
\end{itemize}

~\\The rest of this section describes these topics in some detail.
In addition, an intuitive architecture and suitable, well-designed libraries
contribute to the clarity of the code.

\subsubsection{Meaningful typing}
Meaningful typing includes the direct mapping of entities of the modeled world
to code entities \cite{str4} as well as an expressive naming scheme
for the resulting types.
Furthermore, inheritance should be used to express commonalities, to avoid
code duplication and to separate implementations from interfaces \cite{str4}.

All real-world artifacts to be modeled, such as database schemata, tables, table schemata,
columns, keys and OBDA Specifications with their respective map types, were directly
translated into classes with simple, self-explanatory names like \code{Table},
\code{TableSchema} and \code{Key}.
Package affiliation provided the correct context to understand these names unambiguously.

\subsubsection{Method names}
Assigning expressive names to methods is an essential part of
producing speaking code, since methods encapsulate operations and as such
are important ``building blocks'' for other methods \cite{str4} and ultimately
for the whole program.
Furthermore, method names often occur in interfaces and are therefore not limited
to a local scope, nor can they be changed easily without affecting callers
\cite{java}.

Care was taken that method names reflect all important aspects of the respective
method's behavior.

Consider the following method in \file{CLIDatabaseInteraction.java}:
\codepar{public static void promptAbortRetrieveDBSchemaAndWait\\
 \ind(final FutureTask<DBSchema> retriever) throws SQLException}

It could have been called just \code{promptAbortRetrieveDBSchema}, with the
waiting mentioned in a comment.
However, the waiting is such an important part of its behavior that this
would not have been enough, so the waiting was included in the method name.
Since the method is called in one place only, lengthening the method
name by 7 characters, or about 26 \%, is not a problem.
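The naming idea can be sketched with a much simpler method. The class \code{NamingSketch} and the method \code{startAndWait} are hypothetical and do not reproduce the behavior of \code{promptAbortRetrieveDBSchemaAndWait}; the sketch only shows how a name suffix can announce that the caller will block.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

// Hypothetical example; not part of the program described in the text.
final class NamingSketch {
    /** Starts the task on a new thread and blocks until its result is available. */
    static <T> T startAndWait(FutureTask<T> task)
            throws InterruptedException, ExecutionException {
        new Thread(task).start();
        // get() blocks the calling thread; the "AndWait" suffix makes this visible to callers
        return task.get();
    }
}
```

A caller reading \code{startAndWait(task)} can tell from the name alone that the call does not return before the task has finished.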

\subsubsection{Variable names}
To keep implementation code readable, care was taken to name variables
meaningfully yet concisely. Where this was not possible, expressiveness was preferred
over conciseness.

For example, in the implementation of the database schema retrieval,
variables containing data obtained directly from querying the database
and thus subject to further processing were consistently prefixed
with ``\code{recvd}'', although in most cases this would technically not have
been necessary.

\subsubsection{Intuitive control flow}

\subsubsection{Limited nesting}

\subsubsection{Compact code units}

\subsubsection{Usage of well-known structures}

\subsection{Robustness against incorrect use}
Care was taken to produce code that is robust against incorrect use, making it
suitable for the expected environment of sporadic updates by unfamiliar and
potentially even inexperienced programmers, who moreover are likely to focus
on the concepts of bootstrapping rather than on the details of the present code.
In fact, carefully avoiding technical artifacts that programmers have to keep in mind
and that prevent them from focusing on the actual program logic
is an important principle of writing clean code \cite{str4}.

In modern object-oriented programming languages, the main instruments
for achieving this are, of course, the type system and exceptions.
In particular, static type information should be used to reflect data
abstraction and the ``kind'' of data an object represents,
while dynamic type information should only be used implicitly,
through dynamically dispatched method invocations \cite{str3}.
Exceptions, on the other hand, should be used wherever errors
and error handling are concerned, separating error handling noticeably from other code and
enforcing the treatment of errors \cite{str4}, which in many cases prevents the programmer
from using corrupted information.

An example of both mechanisms, static type information and exceptions, acting
in combination while fitting cleanly into the context of dynamic dispatching
is provided by the following methods from \file{Column.java}:
\codepar{public Boolean isNonNull()\\public Boolean isUnique()}

Their return type is the \name{Java} class \code{Boolean}, not the primitive type
\code{boolean}, because the information they return is not always known.
At an early stage of the program, they returned \code{boolean} and were
accompanied by two methods,
\code{public boolean knownIsNonNull()} and \code{public boolean knownIsUnique()},
which told the caller whether the respective information was known and thus whether the
value returned by \code{isNonNull()} or \code{isUnique()}, respectively,
was reliable.

They were then changed to return the \name{Java} class \code{Boolean} and to return
\code{null} if the respective information is not known.
This eliminated any possibility of using unreliable data in favor of generating
exceptions instead, in this case a \code{NullPointerException}, which is thrown
automatically by the \name{Java Runtime Environment} if the programmer forgets the
null check and tries to obtain a definite value from one of these methods
while the correct value is not known.
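A minimal sketch of this behavior is given below. \code{ColumnSketch} is a hypothetical, heavily reduced stand-in for the actual \code{Column} class, and \code{TriStateDemo} exists only for this example; the sketch assumes nothing about the real implementation beyond the return type described above.

```java
// Hypothetical, reduced stand-in for the Column class named in the text.
final class ColumnSketch {
    private final Boolean unique; // null means "not known"

    ColumnSketch(Boolean unique) {
        this.unique = unique;
    }

    /** Returns TRUE or FALSE if the information is known, null otherwise. */
    public Boolean isUnique() {
        return unique;
    }
}

final class TriStateDemo {
    /** Forgets the null check: auto-unboxing throws a NullPointerException if unknown. */
    static boolean carelessIsUnique(ColumnSketch column) {
        boolean unique = column.isUnique();
        return unique;
    }

    /** Handles all three states explicitly. */
    static String describe(ColumnSketch column) {
        Boolean unique = column.isUnique();
        if (unique == null) {
            return "unknown";
        }
        return unique ? "unique" : "not unique";
    }
}
```

With the earlier \code{boolean}-based design, the careless variant would silently have returned an unreliable value; here it fails immediately at the point of misuse.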

Since the change, comparing two unknown values, that is, two null references,
also yields the desired result, \code{true},
even if the programmer forgets that objects are being compared.
However, when two return values of one of the methods are compared with each other,
as opposed to comparing one such return value against a constant,
errors could occur if the programmer mistakenly writes \code{col1.isUnique() == col2.isUnique()}
instead of \code{col1.isUnique().booleanValue() == col2.isUnique().booleanValue()}.
In this case, since the two \code{Boolean} objects are compared for identity \cite{java},
the former comparison can return \code{false} even when the two boolean values are in fact
the same.
However, since this case was considered much less common than the cases in which the earlier
solution would have let programming mistakes produce undetected errors, the
\code{Boolean}-based solution was preferred.
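One way to sidestep the identity pitfall, shown here only as a sketch and not as the approach taken in the program, is to compare the values with the standard library method \code{Objects.equals}, which also treats two null references as equal. The class \code{TriStateComparison} and the method \code{sameValue} are hypothetical.

```java
import java.util.Objects;

// Hypothetical helper; not part of the program described in the text.
final class TriStateComparison {
    /**
     * Compares two possibly unknown values by value rather than by identity.
     * Returns true if both are null or both hold the same boolean value.
     */
    static boolean sameValue(Boolean first, Boolean second) {
        return Objects.equals(first, second);
    }
}
```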

TODO: summary

\subsection{Classes}
\label{code_classes}
Following the object-oriented programming paradigm, classes were used heavily
to abstract from implementation details and to yield intuitively usable objects with
a set of useful operations \cite{obj}.

\subsubsection{Identification of classes}
To identify potential classes, entities from the problem domain were, where reasonable,
represented directly as \name{Java} classes.
The approach of choosing ``the program that most directly models the aspects of the
real world that we are interested in'' to yield clean code,
as described and recommended by Stroustrup \cite{str3}, proved to be extremely useful
and effective.
As a consequence, the code declares classes like \code{Column}, \code{ColumnSet},
\code{ForeignKey}, \code{Table}, \code{TableSchema} and \code{SQLType}.
As described in section \fullref{speaking}, class names were chosen to be concise
but nevertheless expressive TODO.
\name{Java} packages were used to help attain this aim,
which is why the previously mentioned class names are unambiguous.
For details about package use, see section \fullref{code_packages}.

Care was taken not to introduce unnecessary classes, which would have complicated the
code structure and increased the number of source files and program entities.
In particular, artificial classes with little or no relation to real-world
objects could be avoided in most cases.
On the other hand, avoiding such artificial classes entirely is, of course,
usually not the cleanest solution.

Section \fullref{hierarchies} describes how the classes of \myprog{} are organized
into class hierarchies.

\subsubsection{Const correctness}
\label{const}
Specifying in the code which objects may be altered and which shall remain constant,
thus allowing for additional static checks that prevent undesired modifications,
is commonly referred to as ``const correctness'' TODO.
TODO: powerful, preventing errors, clarity

Unfortunately, \name{Java} lacks a keyword like \name{C++}'s \code{const},
making it harder to achieve const correctness \cite{final}.
It only provides the similar keyword \code{final}, which is much less expressive and
does not allow for similarly effective error prevention \cite{final}.
In particular, because \code{final} is not part of an object's type information,
it is not possible to declare methods that return read-only objects TODO;
placing a \code{final} before the method's return type would declare the
method \code{final}. Similarly, there is no way to express that a method must not change
the state of its object parameters. A method like \code{public void f(final Object obj)}
is only prevented from assigning a new value to its parameter \code{obj} \cite{java}
(which, if allowed, would not affect the caller anyway \cite{java}).
Methods that change the state of \code{obj}, however, may be called on it without
restrictions \cite{java}.
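The limitation can be demonstrated with a few lines. The class \code{FinalParameterDemo} and the method \code{appendMark} are hypothetical names chosen for this sketch; \code{StringBuilder} merely serves as a convenient mutable class from the standard library.

```java
// Hypothetical example; not part of the program described in the text.
final class FinalParameterDemo {
    /** Modifies the state of its argument although the parameter is final. */
    static void appendMark(final StringBuilder text) {
        // text = new StringBuilder(); // would not compile: a final parameter cannot be reassigned
        text.append("!"); // allowed: final does not protect the state of the object
    }
}
```

After the call, the caller's object has changed, which is exactly what a \code{const} parameter in \name{C++} would have ruled out at compile time.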

Several possibilities were considered to address this problem:
\begin{itemize}
 \item Not implementing const correctness, but stating the access rules in
 comments only
 \item Giving the methods which modify object states special names
 like\\\code{setName\textendash\textendash USE\_WITH\_CARE}
 \item Delegating changes of objects to special ``editor'' objects to be
 obtained when an object shall be altered TODO
 \item Deriving classes offering the modifying methods from the read-only
 classes
\end{itemize}

Not implementing const correctness at all would, of course, have been the simplest
possibility, producing the shortest and most readable code.
However, careless manipulation of objects could have introduced subtle,
hard-to-spot errors, which in many cases would have surfaced only under additional
conditions and in other places, for example when inserting a \code{Column}
into a \code{ColumnSet}, so this option was not seriously considered.

Using intentionally awkward, conspicuous names was not seriously considered either,
since it would have cluttered the code for the sole purpose of warning
programmers of possible errors, without attempting to prevent them by technical means.

So the introduction of new classes was considered the most effective and cleanest
solution, either in the form of ``editor'' classes or of derived classes offering the
modifying methods directly. Again, as in the identification of classes,
the most direct solution was considered the best, so the latter form of introducing
additional classes was chosen: classes like \code{ReadableColumn},
\code{ReadableColumnSet} et cetera were introduced, which offer only the read-only
functionality and usually occur in interfaces.
Their counterparts, which include the modifying methods, were derived from them, and the
implications of modifications were explained in their documentation, while the
issue and the approach as such were also mentioned in the documentation of the
\code{Readable...} classes.
The \code{Readable...} classes can be converted to their fully functional
counterparts only via downcasting, which gives programmers a strong hint
that the resulting objects are to be used with care.
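The chosen approach can be sketched as follows. The two classes below are hypothetical, heavily reduced stand-ins that only borrow the names \code{ReadableColumn} and \code{Column} from the text; their members, and the helper class \code{ConstCorrectnessDemo}, are assumptions made for this example.

```java
// Hypothetical, reduced stand-ins for the classes named in the text.
class ReadableColumn {
    protected String name;

    ReadableColumn(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }
}

class Column extends ReadableColumn {
    Column(String name) {
        super(name);
    }

    /** Modifying method, only available on the derived class. */
    public void setName(String name) {
        this.name = name;
    }
}

final class ConstCorrectnessDemo {
    /** Read-only access: the static type offers no modifying methods. */
    static String describe(ReadableColumn column) {
        // column.setName("x"); // would not compile: ReadableColumn declares no setName
        return "column " + column.getName();
    }

    /** Modification requires an explicit, conspicuous downcast. */
    static void rename(ReadableColumn column, String newName) {
        ((Column) column).setName(newName);
    }
}
```

Methods that accept a \code{ReadableColumn} thus document, and have the compiler check, that they do not modify the column unless they cast deliberately.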

\subsubsection{Java interfaces}
\label{code_interfaces}
In \name{Java} programming, it is quite common and often recommended that every
class has at least one \code{interface} it \code{implements},
specifying the operations the class provides. TODO
If no obvious \code{interface} exists for a class or the desired
interface name is already taken by some other entity,
the interface is often given a name like \code{ITableSchema}
or \code{TableSchemaInterface}.

However, for a special-purpose program with a relatively fixed set of classes
mostly representing real-world artifacts from the problem domain,
this approach was considered overly cluttering, introducing artificial
code entities for no benefit.
In particular, as explained in section TODO, all program classes either
stand alone TODO or belong to a class hierarchy derived from at least one
interface.
So, except for the standalone classes, an interface existed anyway, either
``naturally'' (as in the case of \code{Key}, for example) or because of
the chosen way of implementing const correctness.
In some cases, these were interfaces declared in the program code, while
in others, \name{Java} interfaces like \code{Set} were implemented
(an obvious choice, of course, for \code{ColumnSet}).
Introducing artificial interfaces for the standalone classes was considered
unnecessary at the least, if not messy.

\subsection{Packages}
\label{code_packages}
As mentioned in section \fullref{code_classes}, class names were chosen to be
concise but nevertheless expressive.
This was only possible through the use of \name{Java} \code{package}s,
which also helped structure the program.

For the current, relatively limited extent of the program, which
comprises $45$ (\code{public}) classes, a flat package structure was
considered ideal, because it is simple and does not bury source files deep
in subdirectories (in \name{Java}, the directory structure of the source tree
is required to reflect the package structure TODO).
Since, in addition, every class belongs to a package,
each source file is found exactly one directory below the root
program source directory, which in many cases eases handling.

The following $11$ packages exist in the program
(their purpose and more details about the package structure are
described in section \fullref{coarse}):
\begin{multicols}{3}\begin{itemize}
 \item \code{boostrapping}
 \item \code{cli}
 \item \code{database}
 \item \code{helpers}
 \item \code{log}
 \item \code{main}
 \item \code{osl}
 \item \code{output}
 \item \code{settings}
 \item \code{specification}
 \item \code{test}
\end{itemize}\end{multicols}

For the description of the packages, their interaction and considerations on
their structuring, see section \fullref{coarse}.
For a detailed package description, refer to Appendix TODO.

Each package is also documented in the source code, namely in a file
\file{package-info.java} residing in the respective package directory.
This is a common scheme supported by the \name{Eclipse} IDE as well as the
documentation generation systems \name{javadoc} and \name{doxygen} TODO
(all of which were used in the creation of the program,
as described in section \fullref{tools}).