\section{Code style}
\label{code}
TODO: Conventions, ex.: iterators\\
As the final system will hopefully have a long life cycle
and will be used and refined by many people, high code quality was an important aim.
Beyond architectural issues, this also involves cleanliness on the lower level,
such as the design of classes and the implementation of methods.
Common software development principles were followed and
the unfamiliar reader was constantly taken into account
to yield clean, readable and extensible code.

%Scarce, informative comments were inserted at places of higher complexity and to
%expose logical subdivisions, but care was taken to use ``speaking code'' in favor
%of rampant comments.
%Complex, misleading and hard-to-use interfaces were avoided wherever possible.
%External software libraries employed were chosen to be stable, specific,
%well structured, publicly available and ideally in wide use.

\subsection{Comments}
\label{comments}
Comments were used in places where ambiguities or misinterpretations could arise,
yet care was taken to tackle such problems at their roots and solve them
wherever possible instead of merely papering over the ambiguity with comments.
This approach is further explained in Section~\fullref{speaking} and
rendered many uses of comments unnecessary.

In fact, the number of (plain, i.e. non-\name{Javadoc}) comments was
consciously minimized, to enforce speaking code and avoid redundancy.
An exception to this was the highlighting of subdivisions.
In class and method implementations, comments like
\codepar{//********************** Constructors **********************\textbackslash\textbackslash}

were deliberately used to ease navigation inside source files,
but also to enhance readability: parts of method
implementations, for example, were visually separated this way.
An alternative would have been to use separate methods for these code
pieces, thereby adhering strictly to the so-called
``Composed Method Pattern'' \cite{composed},
as was done in other cases.
However, adhering to this pattern too rigidly would have introduced additional
artifacts with either long or non-speaking names,
would have interrupted the reading flow and would also have increased complexity,
because these methods would have been callable at least from everywhere
in the source file.
Consequently, having longer methods in some places that are visually separated
into smaller units that are in fact independent of each other was considered
an elegant solution, although, surprisingly, this technique does not seem to be
proposed very often in the literature.

Wherever possible, the appropriate \name{Javadoc} comments were used in preference to
plain comments, for example to specify parameters, return values, exceptions
and links to other parts of the documentation.
This proved even more useful because \name{Doxygen} supports all
of the \name{Javadoc} comments used \cite{doxygen} (but not vice versa \cite{javadoc}).

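To illustrate the kind of \name{Javadoc} comment meant here, the following minimal sketch documents a parameter, a return value, an exception and a link to other documentation. The class \code{SchemaNames} and its method \code{normalize} are hypothetical and serve only as an illustration; they are not taken from \myprog{}.

```java
/** Hypothetical helper class, used only to illustrate Javadoc tags. */
final class SchemaNames {
    /**
     * Normalizes a raw identifier as it might be reported by a database.
     *
     * @param rawName the identifier to normalize; must not be {@code null}
     * @return the identifier without surrounding whitespace, in lower case
     * @throws IllegalArgumentException if {@code rawName} contains only whitespace
     * @see String#trim()
     */
    static String normalize(String rawName) {
        String trimmed = rawName.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("identifier must not be blank");
        }
        return trimmed.toLowerCase(java.util.Locale.ROOT);
    }
}
```

The tags \code{@param}, \code{@return}, \code{@throws} and \code{@see} used in this sketch are understood by both \name{Javadoc} and \name{Doxygen}.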
\subsection{``Speaking code''}
\label{speaking}
As mentioned in Section~\fullref{comments}, the code was designed to
``speak for itself'' as much as possible instead of making its readers depend on
comments to provide an understanding.
In doing so, besides reducing code size due to the omitted comments,
clean code amenable to unfamiliar readers and unpredictable changes was enforced.
This is especially important since, as described in Section~\fullref{arch},
\myprog{} was designed not only to be a standalone program but also to offer
components suitable for reuse.

TODO: understandability <- code size

The following topics were identified as needing to be addressed to obtain what can be
regarded as ``speaking code'':
\begin{itemize}
 \item Meaningful typing
 \item Method names
 \item Variable names
 \item Intuitive control flow
 \item Limited nesting
 \item Usage of well-known structures
\end{itemize}

~\\The rest of this section describes these topics in some detail.
Besides, an intuitive architecture and suitable, well-designed libraries
also contributed to the clarity of the code (TODO: move).

\subsubsection{Meaningful typing}
Meaningful typing includes the direct mapping of entities of the modeled world
to code entities \cite{str4} as well as an expressive naming scheme
for the resulting types.
Furthermore, inheritance should be used to express commonalities, to avoid
code duplication and to separate implementations from interfaces \cite{str4}.

All real-world artifacts to be modeled, like database schemata, tables, table schemata,
columns, keys and OBDA specifications with their particular map types, were directly
translated into classes having simple, predictable names like \code{Table},
\code{TableSchema} and \code{Key}.
Package affiliation provided the correct context to understand these names unambiguously.

\subsubsection{Method names}
Assigning expressive names to methods is an essential part of
producing speaking code, since methods encapsulate operations and as such
are important ``building blocks'' for other methods \cite{str4} and ultimately
the whole program.
Furthermore, method names often occur in interfaces and are therefore not limited
to a local scope, nor can they easily be changed without affecting callers
\cite{java}.

Accordingly, care was taken that method names reflect all important aspects
of the respective method's behavior.
Consider the following method from \file{CLIDatabaseInteraction.java}:
\codepar{public static void promptAbortRetrieveDBSchemaAndWait\\
 \ind{}(final FutureTask<DBSchema> retriever) throws SQLException}

It could have been called just \code{promptAbortRetrieveDBSchema}, with the
waiting mentioned in a comment.
However, the waiting (blocking) is such an important part of its behavior that this
was not considered sufficient, so the waiting was included in the method name.
Since the method is called in one place only, lengthening the method
name by 7 characters, or about 26~\%, is not a problem.

\subsubsection{Variable names}
To keep implementation code readable, care was taken to give variables
meaningful yet concise names. Where this was not possible, expressiveness was preferred
over conciseness.

For example, in the implementation of the database schema retrieval,
variables containing data directly obtained from querying the database
and thus being subject to further processing were consistently prefixed
with ``\code{recvd}'', although in most cases this would technically not have
been necessary.

\subsubsection{Intuitive control flow}
To adhere consistently to the maxim of speaking code and further increase readability,
an effort was made to keep control flow intuitive.
\code{do-while} loops, for example, are unintuitive: they complicate matters due to
the additional, unconditional loop iteration their reader has to keep in mind.
Even worse, \name{Java}'s syntax delays the occurrence of their most important
control statement -- the loop condition -- until after the loop body.
Usually, \code{do-while} loops can be avoided by properly setting the variables
influencing the loop condition immediately before the loop and using a \code{while}
loop.
Consequently, \code{do-while} loops were avoided -- the code of \myprog{} does not
contain a single \code{do-while} loop.
TODO: references

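The rewriting of a \code{do-while} loop described above can be sketched as follows. The class \code{RetryLoops} and its methods are hypothetical and do not stem from \myprog{}; both variants count how many attempts are needed until an operation that fails a given number of times succeeds.

```java
/** Hypothetical example, not taken from the program described here. */
final class RetryLoops {
    /** do-while variant: the loop condition only becomes visible after the body. */
    static int attemptsDoWhile(int failuresBeforeSuccess) {
        int attempts = 0;
        boolean succeeded;
        do {
            attempts++;
            succeeded = attempts > failuresBeforeSuccess;
        } while (!succeeded);
        return attempts;
    }

    /**
     * while variant: the variable influencing the condition is set immediately
     * before the loop, so the first iteration still always runs.
     */
    static int attemptsWhile(int failuresBeforeSuccess) {
        int attempts = 0;
        boolean succeeded = false;
        while (!succeeded) {
            attempts++;
            succeeded = attempts > failuresBeforeSuccess;
        }
        return attempts;
    }
}
```

Both methods return the same result for every input; the second one merely states its loop condition up front.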
Another counterproductive technique is avoiding the advanced loop control
statements \code{break}, \code{continue} and \code{return} and steering
a loop's control flow solely through its loop condition, often drawing on additional
boolean variables like \code{loopDone} or \code{loopContinued}.
This approach is an essential part of the ``structured programming'' paradigm
\cite{struc} and its purpose is to enforce that a loop is always
left regularly, namely when the check of the loop condition fails, which is meant to ease
code verification \cite{struc}.
A related topic is the general avoidance of the \code{return} statement (except at
the end of a method) for similar reasons \cite{struc}.
However, neither is needed \cite{clean} and, as always, the introduction
of artificial technical constructs impairs readability and the ability of the code
to ``speak for itself''.

Consequently, control flow was not distorted for technical reasons
and care was taken to produce straightforward loops, using advanced control
statements to be concise and intuitive, and carefully designed methods that benefit
from well-placed \code{return} statements.

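The difference between the two styles can be illustrated by a small search routine. The following is a sketch only; the class \code{LoopControl} and its methods are hypothetical, and the flag name \code{loopDone} is borrowed from the discussion above.

```java
import java.util.List;

/** Hypothetical example, not taken from the program described here. */
final class LoopControl {
    /** Flag-driven variant: the loop is left only through its condition. */
    static int indexOfFlagged(List<String> names, String wanted) {
        int result = -1;
        boolean loopDone = false;
        int i = 0;
        while (!loopDone && i < names.size()) {
            if (names.get(i).equals(wanted)) {
                result = i;
                loopDone = true;
            } else {
                i++;
            }
        }
        return result;
    }

    /** Direct variant: a well-placed return leaves the loop and the method at once. */
    static int indexOfDirect(List<String> names, String wanted) {
        for (int i = 0; i < names.size(); i++) {
            if (names.get(i).equals(wanted)) {
                return i;
            }
        }
        return -1;
    }
}
```

Both variants compute the same index, but the direct one needs neither the auxiliary flag nor the separate result variable.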
\subsubsection{Limited nesting}
A topic related to intuitive control flow is limited code nesting.
Most introductions of new nesting levels greatly increase complexity,
since the associated conditions for the respective code to be reached
combine with the previous ones in often inscrutable ways.
Besides being aware of the execution condition for the code currently being
read, the reader is forced either to remember the sub-conditions introduced
with each nesting level, as well as the current nesting level,
or to jump back to the introduction of one or more nestings to figure out the
relevant execution condition again.

Naturally, such code is far from being readable and expressive.
Thus, overly deep nesting was avoided by rearranging code or using control
statements like \code{return} instead of opening a new \code{if} block.
The deepest and most complicated nesting in \myprog{} has level $5$
(with normal, non-nested method code having level $0$), with
one of these nestings being dedicated to a big enclosing \code{while} loop,
one to a \code{try-catch} block and the remaining three to \code{if} blocks
with no \code{else} parts and trivial one-expression conditions.
Additionally, in this case all of the nested blocks contained only
a few lines of code, making the whole construction fit easily on one screen,
so this was considered acceptable.
In a few other places, similar but less complicated nesting up to level $5$ occurs.
%These were similar to the above but with two enclosing loops and one or
%even two \code{try-catch} blocks.
TODO: references

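How a \code{return} statement can replace the opening of a new \code{if} block is sketched below. The class \code{Nesting} and both methods are hypothetical and only illustrate the rearrangement; they are not part of \myprog{}.

```java
/** Hypothetical example, not taken from the program described here. */
final class Nesting {
    /** Nested variant: the result line sits at nesting level 3. */
    static String qualifiedNameNested(String table, String column) {
        String result = "invalid";
        if (table != null) {
            if (!table.isEmpty()) {
                if (column != null) {
                    result = table + "." + column;
                }
            }
        }
        return result;
    }

    /** Flat variant: early returns keep every statement at level 0 or 1. */
    static String qualifiedNameFlat(String table, String column) {
        if (table == null || table.isEmpty()) {
            return "invalid";
        }
        if (column == null) {
            return "invalid";
        }
        return table + "." + column;
    }
}
```

In the flat variant, the condition under which each line is reached can be read off the lines directly above it, without tracking enclosing blocks.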
\subsubsection{Usage of well-known structures}
Constructs familiar to programmers are of great benefit
to expressiveness.
Implementations based on such well-known constructs and patterns
are much more likely to be understood instantly by programmers and are therefore
much better at ``speaking for themselves''.

Examples in \myprog{} are the (extensively used) iterator concept,
const correctness (see Paragraph~``\nameref{const}''
in Section~\fullref{code_classes} TODO), exceptions, predicates \cite{str4},
run-time type information \cite{str4}, helper functions \cite{str4}
and well-known interfaces from the \name{Java} API like \code{Set} or
\code{Collection}, as well as common \name{Java} constructs, like
classes performing a single action (e.g. \code{OSLSpecPrinter}), and
naming schemes, like \code{get...}/\code{set...}/\code{is...}.

\subsection{Robustness against incorrect use}
Care was taken to produce code that is robust against incorrect use, making it
suitable for the expected environment of sporadic updates by unfamiliar and
potentially even inexperienced programmers, who are moreover likely to focus
on the concepts of bootstrapping rather than on details of the present code.
In fact, carefully avoiding the introduction of technical artifacts that have to be
kept in mind and thereby prevent programmers from focusing on the actual program logic
is an important principle of writing clean code \cite{str4}.

In modern object-oriented programming languages, the main instruments
for achieving this are the type system and exceptions.
In particular, static type information should be used to reflect data
abstraction and the ``kind'' of data an object represents,
while dynamic type information should only be used implicitly,
through dynamically dispatched method invocations \cite{str3}.
Exceptions, on the other hand, should be used wherever errors
and error handling are concerned, separating error handling noticeably from other code and
enforcing the treatment of errors \cite{str4}, which in many cases prevents the programmer
from using corrupted information.

An example of both mechanisms, static type information and exceptions, acting
in combination, while cleanly fitting into the context of dynamic dispatching,
are the following methods from \file{Column.java}:
\codepar{public Boolean isNonNull()\\public Boolean isUnique()}

Their return type is the \name{Java} class \code{Boolean}, not the primitive type
\code{boolean}, because the information they return is not always known.
In an early stage of the program, they returned \code{boolean} and were
accompanied by two methods
\code{public boolean knownIsNonNull()} and \code{public boolean knownIsUnique()},
telling the caller whether the respective information was known and thus whether the
value returned by \code{isNonNull()} or \code{isUnique()}, respectively,
was reliable.

They were then changed to return the \name{Java} class \code{Boolean} and to return
null pointers in case the respective information is not known.
This eliminated any possibility of using unreliable data in favor of generating
exceptions instead, in this case a \code{NullPointerException}, which is thrown
automatically by the \name{Java} Runtime Environment \cite{java} if the programmer
forgets the null check and tries to get a definite value from one of these methods
when the correct value is currently not known.

Since the change, comparing two unknown values -- that is, two null pointers --
also yields the desired result, \code{true},
even when the programmer forgets that objects are being dealt with.
However, when comparing two return values of one of the methods in general
-- as opposed to comparing one such return value against a constant --,
errors could occur if the programmer mistakenly writes \code{col1.isUnique() == col2.isUnique()}
instead of \code{col1.isUnique().booleanValue() == col2.isUnique().booleanValue()}.
In this case, since the two \code{Boolean} objects are compared for identity \cite{java},
the former comparison can return \code{false}, even when the two boolean values are in fact
the same.
However, since this case was considered much less common than cases in which the earlier
solution could lead incautious programmers to introduce subtle errors, the new solution
was preferred.
Besides, wrapper classes like \code{Boolean}, \code{Integer}, \code{Long}
and \code{Float} are an integral part of the \name{Java} language \cite{java},
so \name{Java} programmers were expected to be able to use them properly.
Ultimately, since the new solution effectively prevents errors while
abstaining from introducing new artifacts, it was considered fair and clean.

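The behavior described above can be reproduced with a minimal sketch. The class \code{TriStateColumn} is a hypothetical stand-in for \code{Column} and models only the \code{isUnique()} method; the helper methods and the use of \code{java.util.Objects.equals} as a null-safe comparison by value are assumptions made for this illustration and are not claimed to be used in \myprog{}.

```java
import java.util.Objects;

/** Hypothetical stand-in for Column; null means "not known". */
final class TriStateColumn {
    private final Boolean unique;

    TriStateColumn(Boolean unique) {
        this.unique = unique;
    }

    public Boolean isUnique() {
        return unique;
    }

    /** Compares by value and treats two unknown values as equal. */
    static boolean sameUniqueness(TriStateColumn a, TriStateColumn b) {
        return Objects.equals(a.isUnique(), b.isUnique());
    }

    /** Returns true if asking for a definite value of an unknown property fails. */
    static boolean failsFastOnUnknown(TriStateColumn column) {
        try {
            // Unboxing a null Boolean throws a NullPointerException.
            column.isUnique().booleanValue();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

An unknown value thus cannot silently be used as \code{true} or \code{false}, while a comparison by value sidesteps the identity pitfall of \code{==}.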
TODO: summary

\subsection{Use of classes}
\label{code_classes}
Following the object-oriented programming paradigm \cite{obj}, classes were heavily used
to abstract from implementation details and to yield intuitively usable objects with
a set of useful operations.

\subsubsection{Identification of classes}
To identify potential classes, entities from the problem domain were -- if reasonable --
directly represented as \name{Java} classes.
The approach of choosing ``the program that most directly models the aspects of the
real world that we are interested in'' to yield clean code,
as described and recommended by Stroustrup \cite{str3}, proved to be extremely useful
and effective.
As a consequence, the code declares classes like \code{Column}, \code{ColumnSet},
\code{ForeignKey}, \code{Table}, \code{TableSchema} and \code{SQLType}.
As described in Section~\fullref{speaking}, class names were chosen to be concise
but nevertheless expressive.
\name{Java} packages were used to help attain this aim,
which is why the previously mentioned class names are unambiguous.
For details about package use, see Section~\fullref{code_packages}.

Care was taken not to introduce unnecessary classes, which would have complicated
the code structure and increased the number of source files and program entities.
In particular, artificial classes having little or no relation to real-world
objects could most often be avoided.
On the other hand, avoiding such artificial classes entirely is usually
not the cleanest solution either.

Section~\fullref{hierarchies} describes how the classes of \myprog{} are organized
into class hierarchies.

\subsubsection{Const correctness}
\label{const}
Specifying in the code which objects may be altered and which shall remain constant,
thus allowing for additional static checks preventing undesired modifications,
is commonly referred to as ``const correctness'' TODO.
TODO: powerful, preventing errors, clarity

Unfortunately, \name{Java} lacks a keyword like \name{C++}'s \code{const},
making it harder to achieve const correctness \cite{final}.
It only provides the similar keyword \code{final}, which is much less expressive and
does not allow for similarly effective error prevention \cite{final}.
In particular, because \code{final} is not part of an object's type information,
it is not possible to declare methods that return read-only objects \cite{final} --
placing a \code{final} before the method's return type would declare the
method \code{final} \cite{java}.
Similarly, there is no way to express that a method must not change
the state of its object parameters. A method like \code{public void f(final Object obj)}
is only prevented from assigning a new value to its parameter \code{obj} \cite{java}
(which, if allowed, would not affect the caller anyway \cite{java}).
Methods changing the object's state, on the other hand,
may be called on \code{obj} without restrictions.

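The limited effect of \code{final} on a parameter can be seen in the following minimal sketch. The class \code{FinalDemo} and its method are hypothetical and only serve to illustrate the language rule.

```java
import java.util.List;

/** Hypothetical example, not taken from the program described here. */
final class FinalDemo {
    /** The parameter is final, yet the object it refers to is still modified. */
    static void addMarker(final List<String> names) {
        // names = new java.util.ArrayList<>();  // would not compile: names is final
        names.add("marker");                     // compiles: the object's state changes
    }
}
```

After a call to \code{addMarker}, the caller's list contains one additional element, although the parameter was declared \code{final}.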
Several possibilities were considered to address this problem:
\begin{itemize}
 \item Not implementing const correctness, but stating the access rules in
 comments only
 \item Not implementing const correctness, but giving the methods which modify
 object states special names like
 \code{setName\textendash\textendash USE\_WITH\_CARE}
 \item Implementing const correctness by delegating changes of objects
 to special ``editor'' objects to be
 obtained when an object shall be modified
 \item Implementing const correctness by deriving classes offering
 the modifying methods from read-only classes
\end{itemize}

Not implementing const correctness at all would of course have been the simplest
possibility, producing the shortest and most readable code. However, since
incautious manipulation of objects could have introduced subtle,
hard-to-spot errors, which in many cases would have surfaced only under additional
conditions and in other places, for example when inserting a \code{Column}
into a \code{ColumnSet}, this option was not seriously considered.

Not implementing const correctness but using intentionally awkward,
conspicuous names was not seriously considered either,
since it would have cluttered the code for the sole purpose of warning
programmers of possible errors, without attempting to prevent them technically.

So the introduction of new classes was considered the most effective and cleanest
solution, either in the form of ``editor'' classes or of derived classes offering the
modifying methods directly. Again -- as during the identification of classes --,
the most direct solution was considered the best, so the latter form of introducing
additional classes was chosen, and classes like \code{ReadableColumn},
\code{ReadableColumnSet} et cetera were introduced, which offer only the read-only
functionality and usually occur in interfaces.
Their counterparts, which include the modifying methods, were derived from them and the
implications of modifications were explained in their documentation, while the
issue and the approach as such were also mentioned in the documentation of the
\code{Readable...} classes.
The \code{Readable...} classes can be converted to their fully functional
counterparts via downcasting (only), thereby giving a strong hint to
programmers that the resulting objects are to be used with care.

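The chosen approach can be sketched as follows. The names \code{ReadableCol}, \code{EditableCol} and \code{ConstDemo} are hypothetical, and modeling the read-only type as a \name{Java} \code{interface} is an assumption made for this illustration; the actual declarations of \code{ReadableColumn} and \code{Column} in \myprog{} may differ.

```java
/** Hypothetical read-only view, standing in for a Readable... type. */
interface ReadableCol {
    String getName();
}

/** Hypothetical fully functional counterpart, derived from the read-only type. */
final class EditableCol implements ReadableCol {
    private String name;

    EditableCol(String name) {
        this.name = name;
    }

    @Override
    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

final class ConstDemo {
    /** Receives the read-only type, so modifications are rejected statically. */
    static String describe(ReadableCol col) {
        // col.setName("x");  // would not compile: ReadableCol has no setName
        return "column " + col.getName();
    }

    /** Modification requires an explicit, conspicuous downcast. */
    static void rename(ReadableCol col, String newName) {
        ((EditableCol) col).setName(newName);
    }
}
```

Code that only receives the read-only type cannot modify the object by accident, and the explicit cast in \code{rename} marks the one place where modification is intended.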
\subsubsection{Java interfaces}
\label{code_interfaces}
In \name{Java} programming, it is quite common and often recommended \cite{gof}
that every class has at least one \code{interface} it \code{implements},
specifying the operations the class provides.
If no obvious \code{interface} exists for a class or the desired
interface name is already given to some other entity,
the interface is often given a name like \code{ITableSchema}
or \code{TableSchemaInterface}.

However, for a special-purpose program with a relatively fixed set of classes
mostly representing real-world artifacts from the problem domain,
this approach was considered overly cluttering, introducing artificial
code entities for no benefit.
In particular, as explained in Section~\fullref{fine}, all program classes either
stand alone or belong to a class hierarchy derived from at least one
interface.
So, except for the standalone classes, an interface existed anyway, either
``naturally'' (as in the case of \code{Key}, for example) or because of
the chosen way to implement const correctness.
In some cases, these were interfaces declared in the program code, while
in others, \name{Java} interfaces like \code{Set} were implemented
(an obvious choice, of course, for \code{ColumnSet}).
Introducing artificial interfaces for the standalone classes was considered
at the very least unnecessary, if not messy.

\subsection{Use of packages}
\label{code_packages}
As mentioned in Section~\fullref{code_classes}, class names were chosen to be
concise but nevertheless expressive.
This was only possible through the use of \name{Java} \code{package}s,
which also helped structure the program.

For the current, relatively limited extent of the program, which currently
comprises $45$ (\code{public}) classes, a flat package structure was
considered ideal, because it is simple and does not bury source files deep
in subdirectories (in \name{Java}, the directory structure of the source tree
is required to reflect the package structure \cite{java}).
Because every class also belongs to a package,
each source file is to be found exactly one directory below the root
program source directory, which in many cases eases their handling.

For the description of the packages, their interaction and considerations on
their structuring, see Section~\fullref{coarse}.
For a detailed package description, refer to Appendix TODO.

Each package is also documented in the source code, namely in a file
\file{package-info.java} residing in the respective package directory.
This is a common scheme supported by the \name{Eclipse} IDE as well as the
documentation generation systems \name{Javadoc} and \name{Doxygen}
(all of which were used in the creation of the program,
as described in Section~\fullref{tools}).