
\section{Code style}
\label{code}
TODO: Conventions, ex.: iterators\\
As the final system will hopefully have a long life cycle
and will be used and refined by many people, high code quality was an important aim.
Beyond architectural issues, this also involves cleanliness at the lower level,
such as the design of classes and the implementation of methods.
Common software development principles were followed and
the unfamiliar reader was constantly taken into account
to yield clean, readable and extensible code.

%Scarce, informative comments were inserted at places of higher complexity and to
%expose logical subdivisions, but care was taken to use ``speaking code'' in favor
%of rampant comments.
%Complex, misleading and hard-to-use interfaces were avoided wherever possible.
%External software libraries employed were chosen to be stable, specific,
%well structured, publicly available and ideally in wide use.

\subsection{Comments}
\label{comments}
Comments were used at places where ambiguities or misinterpretations could arise,
yet care was taken to tackle such problems at their roots and solve them
wherever possible instead of merely covering up the ambiguity with comments.
This approach is further explained in Section~\fullref{speaking} and
rendered many uses of comments unnecessary.

In fact, the number of (plain, i.e. non-\name{Javadoc}) comments was
consciously minimized, to enforce speaking code and avoid redundancy.
An exception to this was the highlighting of subdivisions.
In class and method implementations, comments like
\codepar{//********************** Constructors **********************\textbackslash\textbackslash}

were deliberately used to ease navigation inside source files,
but also to enhance readability: parts of method
implementations, for example, were visually separated this way.
An alternative would have been to use separate methods for these code
pieces, thereby sticking strictly to the so-called
``Composed Method Pattern'' \cite{composed},
as was done in other cases.
However, sticking to this pattern too rigidly would have introduced additional
artifacts with either long or non-speaking names,
would have interrupted the reading flow and would also have increased complexity,
because these methods would have been callable at least from everywhere
in the source file.
Consequently, having longer methods in some places that are visually separated
into smaller units which are in fact independent of each other was considered
an elegant solution, although, surprisingly, this technique does not seem to be
proposed very often in the literature.

Wherever possible, the appropriate \name{Javadoc} comments were used in preference to
plain comments, for example to specify parameters, return values, exceptions
and links to other parts of the documentation.
This proved even more useful because \name{Doxygen} supports all
of the \name{Javadoc} comments used \cite{doxygen} (but not vice versa \cite{javadoc}).
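
As a brief illustration of the kind of \name{Javadoc} comment meant here, a documented method might look as follows.
This is only a sketch: the class \code{ColumnLookup} and its method are hypothetical and are not taken from the source code of \myprog{}.

```java
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical helper class, used here only to illustrate Javadoc tags. */
final class ColumnLookup {
    private ColumnLookup() {}

    /**
     * Returns the position of a column name within a list of column names.
     *
     * @param columnNames the column names to search, in table order
     * @param name the column name to look for
     * @return the zero-based index of the first occurrence of {@code name}
     * @throws NoSuchElementException if {@code name} is not contained in the list
     * @see List#indexOf(Object)
     */
    static int indexOfColumn(final List<String> columnNames, final String name) {
        final int index = columnNames.indexOf(name);
        if (index < 0) {
            throw new NoSuchElementException("No such column: " + name);
        }
        return index;
    }
}
```

The tags specify the parameters, the return value, the exception and a link to related documentation, so no plain comment is needed to convey this information.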

\subsection{``Speaking code''}
\label{speaking}
As mentioned in Section~\fullref{comments}, an attempt was made to design the code to
``speak for itself'' as much as possible instead of making its readers depend on
comments to gain an understanding.
In doing so, besides reducing code size due to the missing comments,
clean code amenable to unfamiliar readers and unpredictable changes was enforced.
This is especially important since, as described in Section~\fullref{arch},
\myprog{} was designed not only to be a standalone program but also to offer
components suitable for reuse.

TODO: understandability <- code size

The following topics were identified as essential to achieving what can be
regarded as ``speaking code'':
\begin{itemize}
	\item Meaningful typing
	\item Method names
	\item Variable names
	\item Intuitive control flow
	\item Limited nesting
	\item Usage of well-known structures
\end{itemize}

~\\The rest of this section describes these topics in some detail.
Besides, an intuitive architecture and suitable, well-designed libraries
also contributed to the clarity of the code (TODO: move).

\subsubsection{Meaningful typing}
Meaningful typing includes the direct mapping of entities of the modeled world
to code entities \cite{str4} as well as an expressive naming scheme
for the obtained types.
Furthermore, inheritance should be used to express commonalities, to avoid
code duplication and to separate implementations from interfaces \cite{str4}.

All real-world artifacts to be modeled, like database schemata, tables, table schemata,
columns, keys and OBDA specifications with their respective map types, were directly
translated into classes having simple, predictable names like \code{Table},
\code{TableSchema} and \code{Key}.
Package affiliation provided the correct context to understand these names unambiguously.

\subsubsection{Method names}
Assigning expressive names to methods is an essential part of
producing speaking code, since methods encapsulate operations and as such
are important ``building blocks'' for other methods \cite{str4} and ultimately
the whole program.
Furthermore, method names often occur in interfaces and therefore are not limited
to a local scope, nor can they easily be changed without affecting callers
\cite{java}.

Ultimately, care was taken that method names reflect all important aspects
of the respective method's behavior.
Consider the following method from \file{CLIDatabaseInteraction.java}:
\codepar{public static void promptAbortRetrieveDBSchemaAndWait\\
	\ind{}(final FutureTask<DBSchema> retriever) throws SQLException}

It could have been called just \code{promptAbortRetrieveDBSchema}, with the
waiting mentioned in a comment.
However, the waiting (blocking) is such an important part of its behavior that this
was not considered sufficient, so the waiting was included in the method name.
Since the method is called in one place only, lengthening the method
name by 7 characters, or about 26 \%, is not a problem.

\subsubsection{Variable names}
To keep implementation code readable, care was taken to name variables
meaningfully yet concisely. If this was not possible, expressiveness was preferred
over conciseness.

For example, in the implementation of the database schema retrieval,
variables containing data directly obtained from querying the database
and thus being subject to further processing were consistently prefixed
with ``\code{recvd}'', although in most cases this would technically not have
been necessary.

\subsubsection{Intuitive control flow}
To stick consistently to the maxim of speaking code and further increase readability,
an effort was made to keep control flow intuitive.
\code{do-while} loops, for example, are unintuitive: they complicate matters due to
the additional, unconditional loop iteration their reader has to keep in mind.
Even worse, \name{Java}'s syntax delays the occurrence of their most important
control statement -- the loop condition -- until after the loop body.
Usually, \code{do-while} loops can be avoided by properly setting the variables
influencing the loop condition immediately before the loop and using a \code{while}
loop.
Consequently, \code{do-while} loops were avoided -- the code of \myprog{} does not
contain a single \code{do-while} loop.
TODO: references
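
The replacement of a \code{do-while} loop described above can be sketched as follows.
The class \code{LoopStyles} and both methods are hypothetical and only serve as an illustration; they are not part of \myprog{}.
Both variants assume that the iterator supplies at least one element.

```java
import java.util.Iterator;

/** Hypothetical example class contrasting a do-while loop with a while loop. */
final class LoopStyles {
    private LoopStyles() {}

    /** do-while variant: the loop condition only appears after the body. */
    static String firstNonEmptyDoWhile(final Iterator<String> lines) {
        String line;
        do {
            line = lines.next();
        } while (line.isEmpty() && lines.hasNext());
        return line;
    }

    /** while variant: the variable is set before the loop, the condition comes first. */
    static String firstNonEmptyWhile(final Iterator<String> lines) {
        String line = lines.next();
        while (line.isEmpty() && lines.hasNext()) {
            line = lines.next();
        }
        return line;
    }
}
```

Both methods return the first non-empty string, or the last string if all are empty; in the second variant the reader encounters the loop condition before the loop body.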

Another counterproductive technique is to avoid the advanced loop control
statements \code{break}, \code{continue} and \code{return} and to steer
a loop's control flow solely through its loop condition, often drawing on additional
boolean variables like \code{loopDone} or \code{loopContinued}.
This approach is an essential part of the ``structured programming (paradigm)''
\cite{struc} and its purpose is to enforce that a loop is always
left regularly, by unsuccessfully checking the loop condition, which is intended to ease
code verification \cite{struc}.
A related topic is the general avoidance of the \code{return} statement (except at
the end of a method) for similar considerations \cite{struc}.
However, neither is needed \cite{clean} and, as always, the introduction
of artificial technical constructs impairs readability and the ability of the code
to ``speak for itself''.

Consequently, control flow was not distorted for technical considerations.
Care was taken to write straightforward loops that use advanced control
statements to be concise and intuitive, and to design methods that benefit
from well-placed \code{return} statements.
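
The difference between the two styles can be sketched with a simple linear search.
The class \code{SearchStyles} and its methods are hypothetical illustrations, not code from \myprog{}.

```java
/** Hypothetical example class contrasting two ways of leaving a loop. */
final class SearchStyles {
    private SearchStyles() {}

    /** Structured-programming style: the loop is left only via its condition. */
    static int indexOfWithFlag(final int[] values, final int wanted) {
        int result = -1;
        boolean loopDone = false;
        int i = 0;
        while (!loopDone && i < values.length) {
            if (values[i] == wanted) {
                result = i;
                loopDone = true;
            } else {
                i++;
            }
        }
        return result;
    }

    /** Direct style: the method returns as soon as the answer is known. */
    static int indexOfWithReturn(final int[] values, final int wanted) {
        for (int i = 0; i < values.length; i++) {
            if (values[i] == wanted) {
                return i;
            }
        }
        return -1;
    }
}
```

Both methods compute the same result, but the second one needs neither the flag nor the separate result variable.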

\subsubsection{Limited nesting}
A topic related to intuitive control flow is limited code nesting.
Most introductions of new nesting levels greatly increase complexity,
since the associated conditions for the respective code to be reached
combine with the previous ones in often inscrutable ways.
Besides being aware of the execution condition for the code currently being
read, the reader is forced either to remember the sub-conditions introduced
with each nesting level, as well as the current nesting level,
or to jump back to the introduction of one or more nestings to figure out the
relevant execution condition again.

Naturally, such code is far from being readable and expressive.
Thus, overly deep nesting was avoided by rearranging code or using control
statements like \code{return} instead of opening a new \code{if} block.
The deepest and most complicated nesting in \myprog{} has level $5$
(with normal, non-nested method code having level $0$), with
one of these nestings being dedicated to a big enclosing \code{while} loop,
one to a \code{try-catch} block and the remaining three to \code{if} blocks
with no \code{else} parts and trivial one-expression conditions.
Additionally, in this case all of the nesting blocks contained only
a few lines of code, making the whole construction easily fit on one screen,
so this was considered acceptable.
In a few other places, similar but less complicated nesting of up to level $5$ occurs.
%These were similar to the above but with two enclosing loops and one or
%even two \code{try-catch} blocks.
TODO: references
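
How an early \code{return} can replace a new \code{if} block is sketched below.
The class \code{NestingStyles} and its methods are hypothetical and were written for this illustration only.

```java
/** Hypothetical example class contrasting nested and flat control flow. */
final class NestingStyles {
    private NestingStyles() {}

    /** Nested variant: every condition adds a nesting level. */
    static String describeNested(final String name, final Integer length) {
        String result = "invalid";
        if (name != null) {
            if (!name.isEmpty()) {
                if (length != null) {
                    result = name + "(" + length + ")";
                } else {
                    result = name;
                }
            }
        }
        return result;
    }

    /** Flat variant: special cases leave the method early, nesting stays at level 1. */
    static String describeFlat(final String name, final Integer length) {
        if (name == null || name.isEmpty()) {
            return "invalid";
        }
        if (length == null) {
            return name;
        }
        return name + "(" + length + ")";
    }
}
```

In the flat variant, each line can be understood knowing only that all preceding guard conditions were false, so no stack of enclosing conditions has to be remembered.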

\subsubsection{Usage of well-known structures}
Regarding expressiveness, great benefit can be drawn from constructs
familiar to programmers.
Implementations based on such well-known constructs and patterns
are much more likely to be instantly understood by programmers and therefore
are much better able to ``speak for themselves''.

Examples in \myprog{} are the (extensively used) iterator concept,
const correctness (see Paragraph~``\nameref{const}''
in Section~\fullref{code_classes} TODO), exceptions, predicates \cite{str4},
run-time type information \cite{str4}, helper functions \cite{str4}
and well-known interfaces from the \name{Java} API like \code{Set} or
\code{Collection}, as well as common \name{Java} constructs, like
classes performing a single action (e.g. \code{OSLSpecPrinter}), and
naming schemes, like \code{get...}/\code{set...}/\code{is...}.

\subsection{Robustness against incorrect use}
Care was taken to produce code that is robust against incorrect use, making it
suitable for the expected environment of sporadic updates by unfamiliar and
potentially even inexperienced programmers, who moreover are likely to focus
on the concepts of bootstrapping rather than on details of the present code.
In fact, carefully avoiding the introduction of technical artifacts that programmers
have to keep in mind and that prevent them from focusing on the actual program logic
is an important principle of writing clean code \cite{str4}.

In modern object-oriented programming languages, the main instruments
for achieving this are, of course, the type system and exceptions.
In particular, static type information should be used to reflect data
abstraction and the ``kind'' of data an object represents,
while dynamic type information should only be used implicitly,
through dynamically dispatched method invocations \cite{str3}.
Exceptions, on the other hand, should be used at any place related to errors
and error handling, separating error handling noticeably from other code and
enforcing the treatment of errors \cite{str4}, which in many cases prevents
the programmer from using corrupted information.

An example of both mechanisms, static type information and exceptions, acting
in combination, while cleanly fitting into the context of dynamic dispatching,
is provided by the following methods from \file{Column.java}:
\codepar{public Boolean isNonNull()\\public Boolean isUnique()}

Their return type is the \name{Java} class \code{Boolean}, not the primitive type
\code{boolean}, because the information they return is not always known.
In an early stage of the program, they returned \code{boolean} and were
accompanied by two methods
\code{public boolean knownIsNonNull()} and \code{public boolean knownIsUnique()},
telling the caller whether the respective information was known and thus the
value returned by \code{isNonNull()} or \code{isUnique()}, respectively,
was reliable.

They were then changed to return the \name{Java} class \code{Boolean} and to return
null pointers in case the respective information is not known.
This eliminated any possibility of using unreliable data in favor of generating
exceptions instead, in this case a \code{NullPointerException}, which is thrown
automatically by the \name{Java} Runtime Environment \cite{java} if the programmer
forgets the null check and tries to get a definite value from one of these methods
when the correct value currently is not known.

Since the change, comparing two unknown values (that is, two null pointers)
also yields the desired result, \code{true},
even when the programmer forgets that he is dealing with objects.
However, when comparing two return values of one of the methods in general
-- as opposed to comparing one such return value against a constant --,
errors could occur if the programmer mistakenly writes \code{col1.isUnique() == col2.isUnique()}
instead of \code{col1.isUnique().booleanValue() == col2.isUnique().booleanValue()}.
In this case, since the two \code{Boolean} objects are compared for identity \cite{java},
the former comparison can return \code{false} even when the two boolean values are in fact
the same.
However, since this case was considered much less common than cases in which the other
solution could lead incautious programmers to introduce subtle errors, the new solution was preferred.
Besides, wrapper classes like \code{Boolean}, \code{Integer}, \code{Long}
and \code{Float} are an integral part of the \name{Java} language \cite{java},
so \name{Java} programmers were expected to be able to use them properly.
Ultimately, since the new solution effectively prevents errors while
abstaining from introducing new artifacts, it was considered fair and clean.
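
The behavior described above can be sketched with a heavily simplified, hypothetical stand-in for the \code{Column} class.
The class \code{SimpleColumn} and its helper methods are assumptions made for this illustration; in particular, the use of \code{Objects.equals} is merely one null-safe way of comparing by value and is not claimed to be the approach taken in \myprog{}.

```java
import java.util.Objects;

/** Hypothetical, heavily simplified stand-in for the Column class. */
final class SimpleColumn {
    private final Boolean unique; // null means "not known"

    SimpleColumn(final Boolean unique) {
        this.unique = unique;
    }

    /** Returns TRUE, FALSE, or null if the information is not known. */
    Boolean isUnique() {
        return unique;
    }

    /** Careless use: auto-unboxing an unknown value throws NullPointerException. */
    static boolean carelessIsUnique(final SimpleColumn column) {
        final boolean value = column.isUnique(); // unboxes the Boolean
        return value;
    }

    /** Null-safe comparison by value rather than by object identity. */
    static boolean sameUniqueness(final SimpleColumn a, final SimpleColumn b) {
        return Objects.equals(a.isUnique(), b.isUnique());
    }
}
```

A caller who forgets the null check thus gets an exception instead of an unreliable value, while two unknown values still compare as equal.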

TODO: summary

\subsection{Use of classes}
\label{code_classes}
Following the object-oriented programming paradigm \cite{obj}, classes were heavily used
to abstract from implementation details and to yield intuitively usable objects with
a set of useful operations.

\subsubsection{Identification of classes}
To identify potential classes, entities from the problem domain were -- if reasonable --
directly represented as \name{Java} classes.
The approach of choosing ``the program that most directly models the aspects of the
real world that we are interested in'' to yield clean code,
as described and recommended by Stroustrup \cite{str3}, proved to be extremely useful
and effective.
As a consequence, the code declares classes like \code{Column}, \code{ColumnSet},
\code{ForeignKey}, \code{Table}, \code{TableSchema} and \code{SQLType}.
As described in Section~\fullref{speaking}, class names were chosen to be concise
but nevertheless expressive.
\name{Java} packages were used to help attain this aim,
which is why the previously mentioned class names are unambiguous.
For details about package use, see Section~\fullref{code_packages}.

Care was taken not to introduce unnecessary classes, which would have complicated
the code structure and increased the number of source files and program entities.
Artificial classes in particular, having little or no reference to real-world
objects, could most often be avoided.
On the other hand, of course, avoiding such artificial classes entirely
is usually not the cleanest solution.

Section~\fullref{hierarchies} describes how the classes of \myprog{} are organized
into class hierarchies.

\subsubsection{Const correctness}
\label{const}
Specifying in the code which objects may be altered and which shall remain constant,
thus allowing for additional static checks preventing undesired modifications,
is commonly referred to as ``const correctness'' TODO.
TODO: powerful, preventing errors, clarity

Unfortunately, \name{Java} lacks a keyword like \name{C++}'s \code{const},
making it harder to achieve const correctness \cite{final}.
It only provides the similar keyword \code{final}, which is much less expressive and
does not allow for similarly effective error prevention \cite{final}.
In particular, because \code{final} is not part of an object's type information,
it is not possible to declare methods that return read-only objects \cite{final} --
placing a \code{final} before the method's return type would declare the
method \code{final} \cite{java}.
Similarly, there is no way to express that a method must not change
the state of its object parameters. A method like \code{public void f(final Object obj)}
is only obliged not to assign a new value to its parameter \code{obj} \cite{java}
(which, if allowed, would not affect the caller anyway \cite{java}).
Methods changing the state of \code{obj}, on the other hand,
may be called on it without restrictions.
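
This limitation of \code{final} can be sketched as follows.
The class \code{FinalParameterDemo} and its method are hypothetical and serve only as an illustration.

```java
import java.util.List;

/** Hypothetical example class showing what a final parameter does and does not prevent. */
final class FinalParameterDemo {
    private FinalParameterDemo() {}

    /** final forbids reassigning the parameter, not modifying the referenced object. */
    static void appendMarker(final List<String> names) {
        // names = null;        // would not compile: names is final
        names.add("modified");  // compiles: the caller's list is changed
    }
}
```

After the call, the list passed by the caller contains the additional element, although the parameter was declared \code{final}.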

Several possibilities were considered to address this problem:
\begin{itemize}
	\item Not implementing const correctness, but stating the access rules in
	comments only
	\item Not implementing const correctness, but giving the methods which modify
	object states special names like
	\code{setName\textendash\textendash USE\_WITH\_CARE}
	\item Implementing const correctness by delegating changes of objects
	to special ``editor'' objects to be
	obtained when an object shall be modified
	\item Implementing const correctness by deriving classes offering
	the modifying methods from read-only classes
\end{itemize}

Not implementing const correctness at all would of course have been the simplest
option, producing the shortest and most readable code.
However, incautious manipulation of objects could have introduced subtle,
hard-to-spot errors, which in many cases would have surfaced only under additional
conditions and at other places, for example when inserting a \code{Column}
into a \code{ColumnSet}, so this option was not seriously considered.

Not implementing const correctness but using intentionally awkward,
conspicuous names was not seriously considered either,
since it would have cluttered the code for the sole purpose of warning
programmers of possible errors, without attempting to prevent them technically.

So the introduction of new classes was considered the most effective and cleanest
solution, either in the form of ``editor'' classes or derived classes offering the
modifying methods directly. Again -- as during the identification of classes --,
the most direct solution was considered the best, so the latter form of introducing
additional classes was chosen and classes like \code{ReadableColumn},
\code{ReadableColumnSet} et cetera were introduced, which offer only the read-only
functionality and usually occur in interfaces.
Their counterparts including modifying methods were in turn derived from them and the
implications of modifications were explained in their documentation, while the
issue and the approach as such were also mentioned in the documentation of the
\code{Readable...} classes.
The \code{Readable...} classes can be converted to their fully functional
counterparts only via downcasting, thereby giving a strong hint to
programmers that the resulting objects are to be used with care.
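
The chosen approach can be sketched as follows.
The types \code{ReadableColumnSketch}, \code{ColumnSketch} and \code{ColumnReport} are hypothetical simplifications written for this illustration; they are not the actual classes of \myprog{}, and whether the real \code{Readable...} types are classes or interfaces is left open here.

```java
/** Hypothetical, simplified read-only view of a column. */
class ReadableColumnSketch {
    protected String name;

    ReadableColumnSketch(final String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

/** Fully functional counterpart, adding the modifying methods. */
class ColumnSketch extends ReadableColumnSketch {
    ColumnSketch(final String name) {
        super(name);
    }

    void setName(final String newName) {
        this.name = newName;
    }
}

/** Hypothetical client that only needs read access. */
final class ColumnReport {
    private ColumnReport() {}

    /** Accepts the read-only type, so setName cannot be called without a cast. */
    static String describe(final ReadableColumnSketch column) {
        // column.setName("x");  // would not compile
        return "column " + column.getName();
    }
}
```

A caller who really needs to modify such an object has to write an explicit downcast such as \code{((ColumnSketch) column).setName(...)}, which makes the intention visible in the code.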

\subsubsection{Java interfaces}
\label{code_interfaces}
In \name{Java} programming, it is quite common and often recommended \cite{gof}
that every class has at least one \code{interface} it \code{implements},
specifying the operations the class provides.
If no obvious \code{interface} exists for a class or the desired
interface name is already given to some other entity,
the interface is often given a name like \code{ITableSchema}
or \code{TableSchemaInterface}.

However, for a special-purpose program with a relatively fixed set of classes
mostly representing real-world artifacts from the problem domain,
this approach was considered overly cluttering, introducing artificial
code entities for no benefit.
In particular, as explained in Section~\fullref{fine}, all program classes either
stand alone or belong to a class hierarchy derived from at least one
interface.
So, except for the standalone classes, an interface existed anyway, either
``naturally'' (as in the case of \code{Key}, for example) or because of
the chosen way to implement const correctness.
In some cases, these were interfaces declared in the program code, while
in others, \name{Java} interfaces like \code{Set} were implemented
(an obvious choice, of course, for \code{ColumnSet}).
Introducing artificial interfaces for the standalone classes was considered
at best unnecessary, if not messy.

\subsection{Use of packages}
\label{code_packages}
As mentioned in Section~\fullref{code_classes}, class names were chosen to be
concise but nevertheless expressive.
This was only possible through the use of \name{Java} \code{package}s,
which also helped structure the program.

For the current, relatively limited extent of the program, which
comprises $45$ (\code{public}) classes, a flat package structure was
considered ideal, because it is simple and does not bury source files deep
in subdirectories (in \name{Java}, the directory structure of the source tree
is required to reflect the package structure \cite{java}).
Because, in addition, every class belongs to a package,
each source file is found exactly one directory below the root
program source directory, which in many cases eases its handling.

For the description of the packages, their interaction and considerations on
their structuring, see Section~\fullref{coarse}.
For a detailed package description, refer to Appendix TODO.

Each package is also documented in the source code, namely in a file
\file{package-info.java} residing in the respective package directory.
This is a common scheme supported by the \name{Eclipse} IDE as well as the
documentation generation systems \name{Javadoc} and \name{Doxygen}
(all of which were used in the creation of the program,
as described in Section~\fullref{tools}).