\section{Code style}
\label{code}
TODO: Conventions, ex.: iterators
As the final system hopefully will have a long life cycle TODO
and will be used and refined by many people, high code quality was an important aim.
Beyond architectural issues, this also involves cleanness on the lower level,
such as the design of classes and the implementation of methods.
Common software development principles were followed TODO and
the unfamiliar reader was constantly taken into account
to yield clean, usable and readable code.

\subsection{Comments}
\label{comments}
Comments were used in places where ambiguities or misinterpretations could arise,
yet care was taken to tackle such problems at their roots and solve them
wherever possible instead of merely eliminating the ambiguity with comments.

Consider the following method in \file{CLIDatabaseInteraction.java}:
\codepar{public static void promptAbortRetrieveDBSchemaAndWait\\
 \ind(final FutureTask<DBSchema> retriever) throws SQLException}

It could have been called just \code{promptAbortRetrieveDBSchema}, with the
waiting mentioned in a comment.
However, the waiting is such an important part of its behavior that this
would not have been enough, so the waiting was included in the method name.
Since the method is called in one place only, lengthening the method
name by 7 characters, or about 26 \%, is not a problem.

More generally, ``speaking code'' was used wherever possible,
as described in section \fullref{speaking},
which rendered many uses of comments unnecessary.
In fact, the number of (plain, i.e. non-\name{Javadoc}) comments was
consciously minimized, to enforce speaking code and avoid redundancy.
This technique is known TODO.

An exception to this, of course, is the highlighting of subdivisions.
In class and method implementations, comments like
\codepar{//********************** Constructors **********************\textbackslash\textbackslash}

were deliberately used to ease navigation inside source files for unfamiliar
readers, but also to enhance readability: independent parts of method
implementations, for example, were visually separated this way.
An alternative would have been to use separate methods for these pieces
of code, as was done in other cases, thereby sticking strictly to the so-called
``Composed Method Pattern'' \cite{composed}.
However, sticking to this pattern too rigidly would have introduced additional
artifacts with either long or non-speaking names,
would have interrupted the reading flow and would also have increased complexity,
because these methods would have been callable at least from everywhere
in the source file.

Wherever possible, the appropriate \name{Javadoc} comments were used in favor of
plain comments, for example to specify parameters, return types, exceptions
and links to other parts of the documentation.
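
To illustrate the kind of information the \name{Javadoc} tags mentioned above carry, the following is a minimal sketch.
The class \code{JavadocSketch} and the method \code{indexOfColumn} are hypothetical and not taken from the program; they only show how parameters, the return value, an exception and a link can be documented.

```java
import java.util.List;

/** Hypothetical helper class, used only to illustrate Javadoc tags. */
final class JavadocSketch {
    /**
     * Returns the position of a column name within a list of column names.
     *
     * @param columnNames the names to search; must not be {@code null}
     * @param name the column name to look for
     * @return the zero-based index of the first occurrence of {@code name}
     * @throws IllegalArgumentException if {@code name} does not occur in
     *         {@code columnNames}
     * @see List#indexOf(Object)
     */
    static int indexOfColumn(List<String> columnNames, String name) {
        int index = columnNames.indexOf(name);
        if (index < 0) {
            // Report the error instead of handing a meaningless index to the caller.
            throw new IllegalArgumentException("unknown column: " + name);
        }
        return index;
    }
}
```

The tags state the contract of the method at its declaration, so no plain comment is needed at the call sites.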

\subsection{Speaking code}
\label{speaking}
As mentioned in section \fullref{comments}, the use of ``speaking code'' as
introduced TODO
renders many uses of comments unnecessary.
In particular, the following aspects are commonly considered when referring to
the term ``speaking code'' TODO:

\begin{itemize}
 \item Variable names
 \item Control flow
\end{itemize}

\subsubsection{Variable names}
A very important part of speaking code

\subsection{Robustness against incorrect use}
Care was taken to produce code that is robust against incorrect use, making it
suitable for the expected environment of sporadic updates by unfamiliar and
potentially even inexperienced programmers who very likely focus
on the concepts of bootstrapping rather than on details of the present code.

In fact, carefully avoiding the introduction of technical artifacts that
programmers have to keep in mind and that prevent them from focusing on the
actual program logic is an important principle of writing clean code \cite{str4}.

In modern programming languages, the main instruments for achieving
this are, of course, the type system and exceptions.
In particular, static type information should be used to reflect data
abstraction and the ``kind'' of data an object represents,
while dynamic type information should only be used implicitly,
through dynamically dispatched method invocations \cite{str4}.
Exceptions, on the other hand, should be used in any place related to errors
and error handling, separating error handling noticeably from other code and
enforcing the treatment of errors, which in many cases prevents the programmer
from using corrupted information.

An example of both mechanisms, static type information and exceptions, acting
in combination, while cleanly fitting into the context of dynamic dispatching,
are the following methods from \file{Column.java}:
\codepar{public Boolean isNonNull()\\public Boolean isUnique()}

Their return type is the \name{Java} class \code{Boolean}, not the primitive type
\code{boolean}, because the information they return is not always known.
In an early stage of the program, they returned \code{boolean} and were
accompanied by two methods,
\code{public boolean knownIsNonNull()} and \code{public boolean knownIsUnique()},
telling the caller whether the respective information was known and thus whether the
value returned by \code{isNonNull()} or \code{isUnique()}, respectively,
was reliable.

They were then changed to return the \name{Java} class \code{Boolean} and to return
a null reference in case the respective information is not known.
This eliminated any possibility of using unreliable data in favor of generating
exceptions instead, in this case a \code{NullPointerException}, which is thrown
automatically by the \name{Java Runtime Environment} if the programmer forgets the
null check and tries to get a definite value from one of these methods
when the correct value is currently not known.
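
The behavior described above can be sketched as follows.
The class \code{NullableFlagSketch} is a hypothetical stand-in and not the actual \code{Column} class; only the method name \code{isUnique()} and its return type are taken from the text, everything else is an assumption made for illustration.

```java
/** Hypothetical stand-in for a column whose uniqueness may be unknown. */
final class NullableFlagSketch {
    // null means "not known"; TRUE and FALSE are definite values.
    private final Boolean unique;

    NullableFlagSketch(Boolean unique) {
        this.unique = unique;
    }

    /** Returns TRUE, FALSE, or null if the information is not known. */
    Boolean isUnique() {
        return unique;
    }

    /** Careless caller: forgets the null check. */
    static boolean carelessRead(NullableFlagSketch column) {
        // Auto-unboxing calls booleanValue() implicitly and therefore
        // throws a NullPointerException if isUnique() returned null.
        boolean value = column.isUnique();
        return value;
    }

    /** Careful caller: treats the unknown case explicitly. */
    static String describe(NullableFlagSketch column) {
        Boolean value = column.isUnique();
        if (value == null) {
            return "unknown";
        }
        return value ? "unique" : "not unique";
    }
}
```

A careless caller thus fails loudly with an exception instead of silently continuing with an unreliable value.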

Since the change, comparing two unknown values -- that is, two null references --
also yields the desired result, \code{true},
even if the programmer forgets that objects are being compared.
However, when comparing two return values of one of the methods in general
-- as opposed to comparing one such return value against a constant --,
errors can occur if the programmer mistakenly writes \code{col1.isUnique() == col2.isUnique()}
instead of \code{col1.isUnique().booleanValue() == col2.isUnique().booleanValue()}.
In this case, since the two \code{Boolean} objects are compared for identity \cite{java},
the former comparison can return \code{false} even when the two boolean values are in fact
the same.
However, since this case was considered much less common than cases in which the other
solution could lead to mistakes that go undetected, the \code{Boolean} solution was preferred.
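
The difference between the two comparisons can be made concrete with a small sketch.
The class \code{BooleanComparisonSketch} and its methods are hypothetical.
The deprecated constructor \code{new Boolean(...)} is used here only to force two distinct objects; values obtained through autoboxing or \code{Boolean.valueOf} are normally shared instances, which is why the mistake often goes unnoticed.
The null-tolerant variant using \code{Objects.equals} is an addition for illustration and is not mentioned in the text above.

```java
import java.util.Objects;

/** Hypothetical helper class contrasting reference and value comparison. */
final class BooleanComparisonSketch {
    /** Reference comparison, as in col1.isUnique() == col2.isUnique(). */
    static boolean sameReference(Boolean a, Boolean b) {
        return a == b;
    }

    /** Value comparison; throws NullPointerException if either argument is null. */
    static boolean sameValue(Boolean a, Boolean b) {
        return a.booleanValue() == b.booleanValue();
    }

    /** Value comparison that also accepts two unknown (null) values as equal. */
    static boolean sameValueOrBothUnknown(Boolean a, Boolean b) {
        return Objects.equals(a, b);
    }

    /** Creates two distinct Boolean objects that both hold the value true. */
    @SuppressWarnings({"deprecation", "removal"})
    static Boolean[] distinctTrueObjects() {
        return new Boolean[] { new Boolean(true), new Boolean(true) };
    }
}
```

For the two objects returned by \code{distinctTrueObjects()}, \code{sameReference} yields \code{false} while \code{sameValue} and \code{sameValueOrBothUnknown} yield \code{true}.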

TODO: more (?), summary

\subsection{Classes}
\label{code_classes}
Following the object-oriented programming paradigm, classes were heavily used
to abstract from implementation details and to yield intuitively usable objects with
a set of useful operations \cite{obj}.

\subsubsection{Identification of classes}
To identify potential classes, entities from the problem domain were -- where reasonable --
directly represented as \name{Java} classes.
The approach of choosing ``the program that most directly models the aspects of the
real world that we are interested in'' to yield clean code,
as described and recommended by Stroustrup \cite{str3}, proved to be extremely useful
and effective.
As a consequence, the code declares classes like \code{Column}, \code{ColumnSet},
\code{ForeignKey}, \code{Table}, \code{TableSchema} and \code{SQLType}.
As described in section \fullref{speaking}, class names were chosen to be concise
but nevertheless expressive TODO.
\name{Java} packages were used to help attain this aim,
which is why the previously mentioned class names are unambiguous
(for details about package use, see section \fullref{code_packages}; for the description
of the packages themselves and their structuring, see section \fullref{coarse}).

Care was taken not to introduce unnecessary classes, which would have complicated the
code structure and increased the number of source files and program entities.
In particular, artificial classes with little or no reference to real-world
objects could be avoided in most cases.
On the other hand, of course, avoiding such artificial classes entirely
is usually not the cleanest solution either.

Section \fullref{hierarchies} describes how the classes of \myprog{} are organized
into class hierarchies.

\subsubsection{Const correctness}
\label{const}
Specifying in the code which objects may be altered and which shall remain constant,
thus allowing for additional static checks that prevent undesired modifications,
is commonly referred to as ``const correctness'' TODO.

Unfortunately, \name{Java} lacks a keyword like \name{C++}'s \code{const},
making it harder to achieve const correctness \cite{final}.
It only provides the similar keyword \code{final}, which is much less expressive and
does not allow for similarly effective error prevention \cite{final}.
In particular, because \code{final} is not part of an object's type information,
it is not possible to declare methods that return read-only objects TODO --
placing a \code{final} before the method's return type would declare the
method \code{final}. Similarly, there is no way to express that a method must not change
the state of its object parameters. A method like \code{public void f(final Object obj)}
is only prevented from assigning a new value to its parameter \code{obj} \cite{java}
(which, if allowed, would not affect the caller anyway \cite{java}).
Methods changing the object's state, however, may be called on \code{obj} without
restrictions \cite{java}.
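
The limited effect of \code{final} on a parameter can be seen in the following minimal sketch.
The class \code{FinalParameterSketch} and the method \code{appendMarker} are hypothetical and serve only as an illustration.

```java
import java.util.List;

/** Hypothetical helper class showing what a final parameter does and does not prevent. */
final class FinalParameterSketch {
    static void appendMarker(final List<String> names) {
        // names = new java.util.ArrayList<>(); // would not compile: names is final
        // The following call compiles, because final protects only the
        // parameter variable, not the state of the object it refers to.
        names.add("changed");
    }
}
```

After a call to \code{appendMarker}, the list passed by the caller contains the additional element, although the parameter was declared \code{final}.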

Several possibilities were considered to address this problem:
\begin{itemize}
 \item Not implementing const correctness, but stating the access rules in
  comments only
 \item Giving the methods which modify object states special names
  like\\\code{setName\textendash\textendash USE\_WITH\_CARE}
 \item Delegating changes of objects to special ``editor'' objects to be
  obtained when an object shall be altered TODO
 \item Deriving classes offering the modifying methods from the read-only
  classes
\end{itemize}

Not implementing const correctness at all would of course have been the simplest
possibility, producing the shortest and most readable code. However, since
incautious manipulation of objects could have introduced subtle,
hard-to-spot errors, which in many cases would have surfaced only under additional
conditions and in other places, for example when inserting a \code{Column}
into a \code{ColumnSet}, this option was not seriously considered.

Using intentionally unwieldy, conspicuous names was not seriously considered either,
since it would have cluttered the code for the sole purpose of warning
programmers of possible errors, without attempting to prevent them technically.

So the introduction of new classes was considered the most effective and cleanest
solution, either in the form of ``editor'' classes or derived classes offering the
modifying methods directly. Again -- as in the identification of classes --,
the most direct solution was considered the best, so the latter form of introducing
additional classes was chosen and classes like \code{ReadableColumn},
\code{ReadableColumnSet} et cetera were introduced, which offer only the read-only
functionality and usually occur in interfaces.
Their counterparts including the modifying methods were derived from them and the
implications of modifications were explained in their documentation, while the
issue and the approach as such were also mentioned in the documentation of the
\code{Readable...} classes.
The \code{Readable...} classes can be converted to their fully functional
counterparts via downcasting (only), thereby giving a strong hint to
programmers that the resulting objects are to be used with care.
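
The chosen approach can be sketched roughly as follows.
The names \code{ReadableColumnSketch}, \code{MutableColumnSketch} and \code{ConstCorrectnessSketch} as well as the members \code{getName} and \code{setName} are assumptions made for this illustration; the actual \code{Readable...} types of the program may look different.

```java
/** Read-only view (hypothetical name and members, for illustration only). */
interface ReadableColumnSketch {
    String getName();
}

/** Fully functional counterpart, derived from the read-only type. */
final class MutableColumnSketch implements ReadableColumnSketch {
    private String name;

    MutableColumnSketch(String name) {
        this.name = name;
    }

    @Override
    public String getName() {
        return name;
    }

    /** Modifying method, deliberately absent from the read-only type. */
    void setName(String name) {
        this.name = name;
    }
}

final class ConstCorrectnessSketch {
    /** Accepts only the read-only view, so it cannot modify the column by accident. */
    static String describe(ReadableColumnSketch column) {
        // column.setName("x"); // would not compile: setName is not part of the read-only type
        return "column " + column.getName();
    }

    /** Modification requires an explicit, conspicuous downcast. */
    static void rename(ReadableColumnSketch column, String newName) {
        ((MutableColumnSketch) column).setName(newName);
    }
}
```

The compiler rejects accidental modifications through the read-only type, while the explicit cast in \code{rename} marks the one place where the protection is knowingly given up.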

\subsubsection{Java interfaces}
\label{code_interfaces}
In \name{Java} programming, it is quite common and often recommended that every
class has at least one \code{interface} it \code{implements},
specifying the operations the class provides. TODO
If no obvious \code{interface} exists for a class or the desired
interface name is already given to some other entity,
the interface is often given a name like \code{ITableSchema}
or \code{TableSchemaInterface}.

However, for a special-purpose program with a relatively fixed set of classes
mostly representing real-world artifacts from the problem domain,
this approach was considered overly cluttering, introducing artificial
code entities for no benefit.
In particular, as explained in section TODO, all program classes either
stand alone TODO or belong to a class hierarchy derived from at least one
interface.
So, except for the standalone classes, an interface existed anyway, either
``naturally'' (as in the case of \code{Key}, for example) or because of
the chosen way to implement const correctness.
In some cases, these were interfaces declared in the program code, while
in other cases, \name{Java} interfaces like \code{Set} were implemented
(an obvious choice, of course, for \code{ColumnSet}).
Introducing artificial interfaces for the standalone classes was considered
unnecessary at least, if not messy.

\subsection{Packages}
\label{code_packages}
As mentioned in section \fullref{code_classes}, class names were chosen to be
concise but nevertheless expressive.
This was only possible through the use of \name{Java} \code{package}s,
which also helped structure the program.

For the current, relatively limited extent of the program, which
comprises $45$ (\code{public}) classes, a flat package structure was
considered ideal, because it is simple and does not bury source files deep
in subdirectories (in \name{Java}, the directory structure of the source tree
is required to reflect the package structure TODO).
Because every class also belongs to a package,
each source file is found exactly one directory below the root
program source directory, which in many cases eases handling.

The following $11$ packages exist in the program
(their purpose and more details about the package structure are
described in section \fullref{coarse}):
\begin{multicols}{3}\begin{itemize}
 \item \code{bootstrapping}
 \item \code{cli}
 \item \code{database}
 \item \code{helpers}
 \item \code{log}
 \item \code{main}
 \item \code{osl}
 \item \code{output}
 \item \code{settings}
 \item \code{specification}
 \item \code{test}
\end{itemize}\end{multicols}

Each package is also documented in the source code, specifically in a file
\file{package-info.java} residing in the respective package directory.
This is a common scheme supported by the \name{Eclipse} IDE as well as the
documentation generation systems \name{javadoc} and \name{doxygen} TODO
(all of which were used in the creation of the program,
as described in section TODO).
281This is a common scheme supported by the \name{Eclipse} IDE as well as the
282documentation generation systems \name{javadoc} and \name{doxygen} TODO
283(all of which were used in the creation of the program,
284as described in section TODO).