\section{Code style}

TODO: Conventions, ex.: iterators
As the final system will hopefully have a long life cycle TODO
and will be used and refined by many people, high code quality was an important aim.
Beyond architectural issues, this also involves cleanness at the lower level,
such as the design of classes and the implementation of methods.
Common software development principles were followed TODO and
the unfamiliar reader was constantly kept in mind
to yield clean, usable and readable code.

\subsection{Comments}
\label{comments}

Comments were used at places where ambiguities or misinterpretations could arise,
yet care was taken to tackle such problems at their roots and solve them
wherever possible instead of merely eliminating the ambiguity with comments.

Consider the following method in \file{}:
\codepar{public static void promptAbortRetrieveDBSchemaAndWait\\
 \ind(final FutureTask<DBSchema> retriever) throws SQLException}

It could have been called just \code{promptAbortRetrieveDBSchema}, with the
waiting mentioned in a comment.
However, the waiting is such an important part of its behavior that this
would not have been enough, so the waiting was included in the method name.
Since the method is called in one place only, lengthening the method
name by 7 characters, or about 26 \%, is not a problem.

More generally, ``speaking code'' was used wherever possible,
as described in section \fullref{speaking},
which rendered many uses of comments unnecessary.
In fact, the number of plain (i.e. non-Javadoc) comments was consciously minimized
to enforce speaking code and avoid redundancy.
This technique is known TODO.

An exception to this, of course, is the highlighting of subdivisions.
In class and method implementations, comments like
\codepar{//********************** Constructors **********************TODO}

were deliberately used to ease navigation inside source files for unfamiliar
readers, but also to enhance readability: independent parts of method
implementations, for example, were visually separated this way.
An alternative would have been to use separate methods for these code
pieces, as was done in other cases, but this would have introduced
additional artifacts with either long or non-speaking names.
Additionally, it would have increased complexity, because these methods
would have been callable at least from everywhere in the source file,
and it would have interrupted the reading flow.
This technique is known TODO, while TODO

Wherever possible, appropriate Javadoc comments were used in preference to
plain comments, for example to specify parameters, return values, exceptions
and links to other parts of the documentation.

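
A Javadoc comment of the kind described here might look as follows. This is only a sketch: the class \code{SchemaNames} and its method \code{normalize} are hypothetical and do not occur in the program; they merely show the tags for parameters, return values, exceptions and links.

```java
import java.util.Locale;
import java.util.Objects;

/** Hypothetical helper class, used only to illustrate Javadoc tags. */
final class SchemaNames {
    private SchemaNames() {}

    /**
     * Normalizes a table name for case-insensitive lookup.
     *
     * @param tableName the table name as reported by the database
     * @return the trimmed, lower-cased table name
     * @throws NullPointerException if {@code tableName} is {@code null}
     * @throws IllegalArgumentException if {@code tableName} is blank
     * @see String#toLowerCase(Locale)
     */
    static String normalize(final String tableName) {
        Objects.requireNonNull(tableName, "tableName");
        final String trimmed = tableName.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("table name must not be blank");
        }
        return trimmed.toLowerCase(Locale.ROOT);
    }
}
```

The method body itself needs no plain comments, because the contract is stated once in the Javadoc block and the statements are short enough to speak for themselves.
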
\subsection{Speaking code}
\label{speaking}

As mentioned in section \fullref{comments}, the use of ``speaking code'' as
introduced TODO
renders many uses of comments unnecessary.
In particular, the following aspects are commonly considered when referring to
the term ``speaking code'' TODO:

\begin{itemize}
 \item Variable names
 \item Control flow
\end{itemize}

\subsubsection{Variable names}
A very important part of speaking code

\subsection{Robustness against incorrect use}
Care was taken to produce code that is robust against incorrect use, making it
suitable for the expected environment of sporadic updates by unfamiliar and
potentially even inexperienced programmers who very likely focus
on the concepts of bootstrapping rather than on details of the present code.

In fact, carefully avoiding technical artifacts that have to be kept in mind
and thus prevent programmers from focusing on the actual program logic
is an important principle of writing clean code TODO.

In modern programming languages, the main instruments for achieving
this are, of course, the type system and exceptions.
In particular, static type information should be used to reflect data
abstraction and the ``kind'' of data an object represents,
while dynamic type information should only be used implicitly,
through dynamically dispatched method invocations \cite{str4}.
Exceptions, on the other hand, should be used in any place related to errors
and error handling, separating error handling noticeably from other code and
enforcing the treatment of errors, which in many cases prevents the programmer
from using corrupted information.

An example of both mechanisms, static type information and exceptions, acting
in combination while fitting cleanly into the context of dynamic dispatching,
is given by the following methods from \file{}:
\codepar{public Boolean isNonNull()\\public Boolean isUnique()}

Their return type is the Java class \code{Boolean}, not the primitive type
\code{boolean}, because the information they return is not always known.
In an early stage of the program, they returned \code{boolean} and were
accompanied TODO by two methods,
\code{public boolean knownIsNonNull()} and \code{public boolean knownIsUnique()},
which told the caller whether the respective information was known and thus whether the
value returned by \code{isNonNull()} or \code{isUnique()}, respectively,
was reliable.

They were then changed to return the Java class \code{Boolean} and to return
a null pointer in case the respective information is not known.
This eliminated any possibility of using unreliable data in favor of generating
an exception instead, in this case a \code{NullPointerException}, which is thrown
automatically by the Java Runtime Environment if the programmer forgets the
null check and tries to get a definite value from one of these methods
while the correct value is not known.

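
The effect can be sketched as follows. The class \code{ColumnSketch} is a hypothetical stand-in with a single tri-state property and is not the program's actual column class; only the return type \code{Boolean} and the use of null for ``unknown'' are taken from the description above.

```java
/** Hypothetical stand-in for a column whose uniqueness may be unknown. */
final class ColumnSketch {
    private final Boolean unique; // null means "not known"

    ColumnSketch(final Boolean unique) {
        this.unique = unique;
    }

    /** Returns TRUE or FALSE if known, or null if the information is not known. */
    Boolean isUnique() {
        return unique;
    }
}

final class UnknownValueDemo {
    private UnknownValueDemo() {}

    /** Performs the null check a careful caller is expected to do. */
    static String describe(final ColumnSketch column) {
        final Boolean unique = column.isUnique();
        if (unique == null) {
            return "unknown";
        }
        return unique ? "unique" : "not unique";
    }

    /** Forgets the null check: unboxing a null Boolean throws NullPointerException. */
    static boolean careless(final ColumnSketch column) {
        final boolean unique = column.isUnique(); // auto-unboxing happens here
        return unique;
    }
}
```

With the earlier \code{boolean} return type, the careless variant would have silently continued with an unreliable value; with \code{Boolean}, it fails at the point of misuse.
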
Since the change, comparing two unknown values (thus, two null pointers)
also yields the desired result, \code{true},
even when the programmer forgets that objects are being compared.
However, when comparing two return values of one of the methods in general
(as opposed to comparing one such return value against a constant),
errors could occur if the programmer writes \code{col1.isUnique() == col2.isUnique()}
instead of \code{col1.isUnique().booleanValue() == col2.isUnique().booleanValue()}.
\\TODO: Java rules.

TODO: more, summary

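
The background of this pitfall is that \code{==} applied to two operands of type \code{Boolean} compares object references, not truth values. The following sketch contrasts two value-based comparisons; the class \code{TriStateComparison} and its method names are hypothetical, and treating two unknown values as equal is an assumption made for this illustration.

```java
import java.util.Objects;

final class TriStateComparison {
    private TriStateComparison() {}

    /** Null-safe value comparison: two unknown values (null) compare as equal. */
    static boolean sameValue(final Boolean a, final Boolean b) {
        return Objects.equals(a, b);
    }

    /** Compares the primitive values; throws NullPointerException if either value is unknown. */
    static boolean sameKnownValue(final Boolean a, final Boolean b) {
        return a.booleanValue() == b.booleanValue();
    }
}
```

Which of the two variants is appropriate depends on whether an unknown value should count as a legitimate operand or as an error at the point of comparison.
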
\subsection{Classes}
\label{classes}

Following the object-oriented programming paradigm, classes were used heavily
to abstract from implementation details and to yield intuitively usable objects with
a set of useful operations \cite{obj}.

\subsubsection{Identification of classes}
To identify potential classes, entities from the problem domain were, where reasonable,
represented directly as Java classes.
The approach of choosing ``the program that most directly models the aspects of the
real world that we are interested in'' to yield clean code,
as described and recommended by Stroustrup \cite{str3}, proved to be extremely useful
and effective.
As a consequence, the code declares classes like \code{Column}, \code{ColumnSet},
\code{ForeignKey}, \code{Table}, \code{TableSchema} and \code{SQLType}.
As described in section \fullref{speaking}, class names were chosen to be concise
but nevertheless expressive TODO.
Java packages were used to help attain this aim, which is why the previously mentioned
class names are unambiguous (for details about package use see section \fullref{packages}).

Care was taken not to introduce unnecessary classes, which would have complicated the
code structure and increased the number of source files and program entities.
In particular, artificial classes with little or no reference to real-world
objects could be avoided in most cases.
On the other hand, of course, avoiding such artificial classes entirely
is usually not the cleanest solution either.

\subsubsection{Const correctness}
Specifying in the code which objects may be altered and which shall remain constant, thus
allowing for additional static checks that prevent undesired modifications,
is commonly referred to as ``const correctness'' TODO.

Unfortunately, Java lacks a keyword like C++'s \code{const}, making it harder to
achieve const correctness.
It only offers the similar keyword \code{final}, which is much less expressive and
does not allow for similarly effective error prevention \cite{final}.
In particular, because \code{final} is not part of an object's type information,
it is not possible to declare methods that return read-only objects TODO:
placing a \code{final} before the method's return type would declare the
method \code{final}. Similarly, there is no way to express that a method must not change
the state of its object parameters. A method like \code{public void f(final Object obj)}
is only prevented from assigning a new value to its parameter \code{obj} \cite{java}
(which, if allowed, would not affect the caller anyway \cite{java}).
Methods changing the state of \code{obj}, however, may be called on it without
restrictions \cite{java}.

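
The limitation of \code{final} parameters can be illustrated with a minimal sketch. The class \code{FinalParameterDemo} and its method are hypothetical; \code{StringBuilder} merely serves as a convenient mutable class from the Java standard library.

```java
final class FinalParameterDemo {
    private FinalParameterDemo() {}

    /** 'final' only forbids reassigning the parameter; the object itself stays mutable. */
    static void appendSuffix(final StringBuilder name) {
        // name = new StringBuilder();  // would not compile: final parameter cannot be assigned
        name.append("_changed");        // compiles: state-changing call on a final parameter
    }
}
```

After a call to \code{appendSuffix}, the caller's \code{StringBuilder} object carries the appended suffix, although the parameter was declared \code{final}.
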
Several possibilities were considered to address this problem:

\begin{itemize}
 \item Not implementing const correctness, but stating the access rules in
 comments only
 \item Giving the methods which modify object states special names
 like\\\code{setName\textendash\textendash USE\_WITH\_CARE}
 \item Delegating changes of objects to special ``editor'' objects to be
 obtained when an object shall be altered
 \item Deriving classes offering the modifying methods from the read-only
 classes
\end{itemize}

Not implementing const correctness at all would, of course, have been the simplest
possibility, producing the shortest and most readable code. However,
incautious manipulation of objects could have introduced subtle,
hard-to-spot errors which in many cases would have surfaced only under additional
conditions and in other places, for example when inserting a \code{Column}
into a \code{ColumnSet}, so this option was not seriously considered.

Using intentionally awkward, conspicuous names was not seriously considered either,
since it would have cluttered the code for the sole purpose of warning
programmers of possible errors, without attempting to prevent them technically.

So the introduction of new classes was considered the most effective and cleanest
solution, either in the form of ``editor'' classes or of derived classes offering the
modifying methods directly. Again, as in the identification of classes,
the most direct solution was considered the best, so the latter form of introducing
additional classes was chosen, and classes like \code{ReadableColumn},
\code{ReadableColumnSet} et cetera were introduced, which offer only the read-only
functionality and usually occur in interfaces.
Their counterparts, which add the modifying methods, were derived from them and the
implications of modifications were explained in their documentation, while the
issue and the approach as such were also mentioned in the documentation of the
\code{Readable...} classes.
Objects of the \code{Readable...} classes can be converted to their fully functional
counterparts via downcasting only, which gives programmers a strong hint
that the resulting objects are to be used with care.

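
The chosen approach can be sketched as follows. The class names \code{ReadableColumn} and \code{Column} are taken from the text, but the members \code{getName} and \code{setName}, the field layout and the helper class \code{ReadOnlyDemo} are assumptions made for illustration only and need not match the program's actual code.

```java
/** Read-only base class (sketch); offers no modifying methods. */
class ReadableColumn {
    protected String name;

    ReadableColumn(final String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

/** Fully functional counterpart, adding the modifying methods. */
class Column extends ReadableColumn {
    Column(final String name) {
        super(name);
    }

    void setName(final String name) {
        this.name = name;
    }
}

final class ReadOnlyDemo {
    private ReadOnlyDemo() {}

    /** Accepts the read-only type, so no modifying method is visible here. */
    static String describe(final ReadableColumn column) {
        // column.setName("x");  // would not compile: ReadableColumn has no setName
        return "column " + column.getName();
    }

    /** Modification requires an explicit, conspicuous downcast. */
    static void rename(final ReadableColumn column, final String newName) {
        ((Column) column).setName(newName);
    }
}
```

A method that only receives a \code{ReadableColumn} thus cannot modify it by accident; the cast in \code{rename} marks the one place where the read-only promise is deliberately given up.
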
\subsubsection{Java interfaces}
In Java programming, it is quite common and often recommended that every
class has at least one \code{interface} it \code{implements},
specifying the operations the class provides. TODO
If no obvious \code{interface} exists for a class or the desired
interface name is already given to some other entity,
the interface is often given a name like \code{ITableSchema}
or \code{TableSchemaInterface}.

However, for a special-purpose program with a relatively fixed set of classes
mostly representing real-world artifacts from the problem domain,
this approach was considered overly cluttering, introducing artificial
code entities for no benefit.
In particular, as explained in section TODO, all program classes either
stand alone TODO or belong to a class hierarchy derived from at least one
interface.
So, except for the standalone classes, an interface existed anyway, either
``naturally'' (as in the case of \code{Key}, for example) or because of
the chosen way to implement const correctness.
In some cases, these were interfaces declared in the program code, while
in others, Java interfaces like \code{Set} were implemented
(an obvious choice, of course, for \code{ColumnSet}).
Introducing artificial interfaces for the standalone classes was considered
unnecessary at best, if not messy.

\subsection{Packages}
\label{packages}

As mentioned in section \fullref{classes}, class names were chosen to be
concise but nevertheless expressive.
This was only possible through the use of Java \code{package}s,
which also helped structure the program.

For the current, relatively limited extent of the program, which
comprises $45$ (\code{public}) classes, a flat package structure was
considered ideal, because it is simple and does not stash source files deep
in subdirectories (in Java, the directory structure of the source tree
is required to reflect the package structure TODO).
Because every class also belongs to a package,
each source file is found exactly one directory below the root
program source directory, which in many cases eases their handling.

The following $11$ packages exist in the program
(their purpose and more details about the package structure are
described in section TODO):
\begin{itemize}
 \item \code{bootstrapping}
 \item \code{cli}
 \item \code{database}
 \item \code{helpers}
 \item \code{log}
 \item \code{main}
 \item \code{osl}
 \item \code{output}
 \item \code{settings}
 \item \code{specification}
 \item \code{test}
\end{itemize}

TODO: two columns

Each package is also documented in the source code, namely in a file
\file{} residing in the respective package directory.
This is a common scheme supported by the \name{Eclipse} IDE as well as the
documentation generation systems \name{javadoc} and \name{doxygen} TODO
(all of which were used in the creation of the program,
as described in section TODO).