Commit | Line | Data |
---|---|---|
1 | \section{Code style} | |
2 | \label{code} | |
3 | TODO: Conventions, ex.: iterators\\ | |
4 | Because the final system will hopefully have a long life cycle | |
5 | and will be used and refined by many people, high code quality was an important aim. | |
6 | Beyond architectural issues, this also involves cleanliness at the lower level, | |
7 | such as the design of classes and the implementation of methods. | |
8 | Common software development principles were followed and | |
9 | the unfamiliar reader was constantly taken into account | |
10 | to yield clean, readable and extensible code. | |
11 | ||
12 | %Scarce, informative comments were inserted at places of higher complexity and to | |
13 | %expose logical subdivisions, but care was taken to use ``speaking code'' in favor | |
14 | %of rampant comments. | |
15 | %Complex, misleading and hard-to-use interfaces were avoided wherever possible. | |
16 | %External software libraries employed were chosen to be stable, specific, | |
17 | %well structured, publicly available and ideally in wide use. | |
18 | ||
19 | \subsection{Comments} | |
20 | \label{comments} | |
21 | Comments were used at places where ambiguities or misinterpretations could arise, | |
22 | yet care was taken to tackle such problems at their roots and solve them | |
23 | wherever possible instead of merely masking the ambiguity with comments. | |
24 | This approach is further explained in Section~\fullref{speaking} and | |
25 | rendered many uses of comments unnecessary. | |
26 | ||
27 | In fact, the number of plain (i.e. non-\name{Javadoc}) comments was | |
28 | consciously minimized to enforce speaking code and avoid redundancy. | |
29 | An exception to this was the highlighting of subdivisions. | |
30 | In class and method implementations, comments like | |
31 | \codepar{//********************** Constructors **********************\textbackslash\textbackslash} | |
32 | ||
33 | were deliberately used to ease navigation inside source files, | |
34 | but also to enhance readability: parts of method | |
35 | implementations, for example, were visually separated this way. | |
36 | An alternative would have been to use separate methods for these pieces of | |
37 | code, thereby adhering strictly to the so-called | |
38 | ``Composed Method Pattern'' \cite{composed}, | |
39 | as was done in other cases. | |
40 | However, adhering to this pattern too rigidly would have introduced additional | |
41 | artifacts with either long or non-speaking names, | |
42 | would have interrupted the reading flow and would also have increased complexity, | |
43 | because these methods would have been callable at least from everywhere | |
44 | in the source file. | |
45 | Consequently, having longer methods in some places that are visually separated | |
46 | into smaller units that are in fact independent of each other was considered | |
47 | an elegant solution, although, surprisingly, this technique does not seem to be | |
48 | proposed very often in the literature. | |
49 | ||
50 | Wherever possible, the appropriate \name{Javadoc} comments were used in preference to | |
51 | plain comments, for example to specify parameters, return types, exceptions | |
52 | and links to other parts of the documentation. | |
53 | This proved even more useful because \name{Doxygen} supports all | |
54 | of the \name{Javadoc} comments used \cite{doxygen} (but not vice versa \cite{javadoc}). | |
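A minimal sketch of such a \name{Javadoc} comment is given below. The class \code{SchemaCache} and its methods are hypothetical and serve only to illustrate the common block tags; they are not taken from \myprog{}.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical example class; not part of the program described here. */
class SchemaCache {
    private final Map<String, Integer> columnCounts = new HashMap<>();

    /** Registers a table together with its number of columns. */
    void register(String tableName, int columnCount) {
        columnCounts.put(tableName, columnCount);
    }

    /**
     * Returns the number of columns registered for a table.
     *
     * @param tableName the name of the table to look up
     * @return the number of columns of the table
     * @throws IllegalArgumentException if no table with this name is registered
     * @see #register(String, int)
     */
    int columnCount(String tableName) {
        Integer count = columnCounts.get(tableName);
        if (count == null) {
            throw new IllegalArgumentException("unknown table: " + tableName);
        }
        return count;
    }
}
```

The tags state the contract (parameter, return value, exception, cross-reference) in a form that documentation generators can process, so no plain comment has to repeat it.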
55 | ||
56 | \subsection{``Speaking code''} | |
57 | \label{speaking} | |
58 | As mentioned in Section~\fullref{comments}, an attempt was made to design the code to | |
59 | ``speak for itself'' as much as possible instead of making its readers depend on | |
60 | comments to understand it. | |
61 | Besides reducing code size through the absence of such comments, | |
62 | this enforced clean code that is amenable to unfamiliar readers and unpredictable changes. | |
63 | This is especially important since, as described in Section~\fullref{arch}, | |
64 | \myprog{} was designed not only to be a standalone program but also to offer | |
65 | components suitable for reuse. | |
66 | ||
67 | TODO: understandability <- code size | |
68 | ||
69 | The following topics were identified as needing attention in order to obtain what can be | |
70 | regarded as ``speaking code'': | |
71 | \begin{itemize} | |
72 | \item Meaningful typing | |
73 | \item Method names | |
74 | \item Variable names | |
75 | \item Intuitive control flow | |
76 | \item Limited nesting | |
77 | \item Usage of well-known structures | |
78 | \end{itemize} | |
79 | ||
80 | ~\\The rest of this section describes these topics in some detail. | |
81 | In addition, an intuitive architecture and suitable, well-designed libraries | |
82 | also contributed to the clarity of the code (TODO: move). | |
83 | ||
84 | \subsubsection{Meaningful typing} | |
85 | Meaningful typing includes the direct mapping of entities of the modeled world | |
86 | to code entities \cite{str4} as well as an expressive naming scheme | |
87 | for the obtained types. | |
88 | Furthermore, inheritance should be used to express commonalities, to avoid | |
89 | code duplication and to separate implementations from interfaces \cite{str4}. | |
90 | ||
91 | All real-world artifacts to be modeled, such as database schemata, tables, table schemata, | |
92 | columns, keys and OBDA specifications with their particular map types, were directly | |
93 | translated into classes with simple, predictable names like \code{Table}, | |
94 | \code{TableSchema} and \code{Key}. | |
95 | Package affiliation provided the correct context to understand these names unambiguously. | |
96 | ||
97 | \subsubsection{Method names} | |
98 | Assigning expressive names to methods is an essential part of | |
99 | producing speaking code, since methods encapsulate operations and as such | |
100 | are important ``building blocks'' for other methods \cite{str4} and ultimately | |
101 | the whole program. | |
102 | Furthermore, method names often occur in interfaces and are therefore not limited | |
103 | to a local scope, nor can they easily be changed without affecting callers | |
104 | \cite{java}. | |
105 | ||
106 | Ultimately, care was taken that method names reflect all important aspects | |
107 | of the respective method's behavior. | |
108 | Consider the following method from \file{CLIDatabaseInteraction.java}: | |
109 | \codepar{public static void promptAbortRetrieveDBSchemaAndWait\\ | |
110 | \ind{}(final FutureTask<DBSchema> retriever) throws SQLException} | |
111 | ||
112 | It could have been called just \code{promptAbortRetrieveDBSchema}, with the | |
113 | waiting mentioned in a comment. | |
114 | However, the waiting (blocking) is such an important part of its behavior that this | |
115 | was not considered sufficient, so the waiting was included in the method name. | |
116 | Since the method is called in one place only, lengthening the method | |
117 | name by 7 characters or about 26 \% is not a problem. | |
118 | ||
119 | \subsubsection{Variable names} | |
120 | To keep implementation code readable, care was taken to name variables | |
121 | meaningfully yet concisely. Where this was not possible, expressiveness was preferred | |
122 | over conciseness. | |
123 | ||
124 | For example, in the implementation of the database schema retrieval, | |
125 | variables containing data directly obtained from querying the database | |
126 | and thus being subject to further processing were consistently prefixed | |
127 | with ``\code{recvd}'', although in most cases this would technically not have | |
128 | been necessary. | |
129 | ||
130 | \subsubsection{Intuitive control flow} | |
131 | To adhere consistently to the maxim of speaking code and further increase readability, | |
132 | an effort was made to keep control flow intuitive. | |
133 | \code{do-while} loops, for example, are unintuitive: they complicate matters because of | |
134 | the additional, unconditional iteration the reader has to keep in mind. | |
135 | Even worse, \name{Java}'s syntax delays the occurrence of their most important | |
136 | control statement -- the loop condition -- until after the loop body. | |
137 | Usually, \code{do-while} loops can be avoided by properly setting the variables | |
138 | that influence the loop condition immediately before the loop and using a \code{while} | |
139 | loop. | |
140 | Consequently, \code{do-while} loops were avoided -- the code of \myprog{} does not | |
141 | contain a single \code{do-while} loop. | |
142 | TODO: references | |
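The rewrite described above can be sketched as follows. The class \code{RetryLoops} and its parameters are hypothetical and not taken from \myprog{}; the two variants agree only under the stated assumption that \code{maxAttempts} is at least 1.

```java
/** Hypothetical illustration of replacing a do-while loop by a while loop. */
class RetryLoops {
    /** do-while variant: the loop condition appears only after the body. */
    static int attemptsDoWhile(int failuresBeforeSuccess, int maxAttempts) {
        int attempts = 0;
        boolean succeeded;
        do {
            attempts++;
            succeeded = attempts > failuresBeforeSuccess;
        } while (!succeeded && attempts < maxAttempts);
        return attempts;
    }

    /** while variant: the flag is set before the loop, so the condition comes first. */
    static int attemptsWhile(int failuresBeforeSuccess, int maxAttempts) {
        int attempts = 0;
        boolean succeeded = false; // initialized so that the first check passes
        while (!succeeded && attempts < maxAttempts) {
            attempts++;
            succeeded = attempts > failuresBeforeSuccess;
        }
        return attempts;
    }
}
```

In the second variant the reader meets the loop condition before the body and does not have to keep an unconditional first iteration in mind.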
143 | ||
144 | Another counterproductive technique is to avoid the advanced loop control | |
145 | statements \code{break}, \code{continue} and \code{return} and to direct | |
146 | a loop's control flow solely through its loop condition, often drawing on additional | |
147 | boolean variables like \code{loopDone} or \code{loopContinued}. | |
148 | This approach is an essential part of the ``structured programming (paradigm)'' | |
149 | \cite{struc}; its purpose is to ensure that a loop is always | |
150 | left regularly, through a failed check of the loop condition, which is intended to ease | |
151 | code verification \cite{struc}. | |
152 | A related topic is the general avoidance of the \code{return} statement (except at | |
153 | the end of a method) for similar reasons \cite{struc}. | |
154 | However, neither is needed \cite{clean} and, as always, the introduction | |
155 | of artificial technical constructs impairs readability and the ability of the code | |
156 | to ``speak for itself''. | |
157 | ||
158 | Consequently, control flow was not distorted for technical reasons, | |
159 | and care was taken to produce straightforward loops that use advanced control | |
160 | statements to be concise and intuitive, as well as carefully designed methods that benefit | |
161 | from well-placed \code{return} statements. | |
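The contrast between the two styles can be illustrated with a small search routine. This is a sketch with hypothetical names (\code{FirstMatch}, \code{indexOfNegative}), not code from \myprog{}.

```java
import java.util.List;

/** Hypothetical comparison of flag-driven and early-return loop control. */
class FirstMatch {
    /** Strict structured style: a single exit, steered by an auxiliary flag. */
    static int indexOfNegativeWithFlag(List<Integer> values) {
        int result = -1;
        boolean loopDone = false;
        for (int i = 0; i < values.size() && !loopDone; i++) {
            if (values.get(i) < 0) {
                result = i;
                loopDone = true;
            }
        }
        return result;
    }

    /** Same search with an early return: no auxiliary state is needed. */
    static int indexOfNegative(List<Integer> values) {
        for (int i = 0; i < values.size(); i++) {
            if (values.get(i) < 0) {
                return i;
            }
        }
        return -1; // no negative element found
    }
}
```

Both methods return the same result; the second one states the intent directly and leaves the loop condition free of technical bookkeeping.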
162 | ||
163 | \subsubsection{Limited nesting} | |
164 | A topic related to intuitive control flow is limited code nesting. | |
165 | Most introductions of new nesting levels greatly increase complexity, | |
166 | since the associated conditions for the respective code to be reached | |
167 | combine with the previous ones in often inscrutable ways. | |
168 | Besides being aware of the execution condition of the code currently being | |
169 | read, the reader is forced either to remember the sub-conditions introduced | |
170 | with each nesting level, as well as the current nesting level, | |
171 | or to jump back to the introduction of one or more nestings to figure out the | |
172 | relevant execution condition again. | |
173 | ||
174 | Naturally, such code is far from readable and expressive. | |
175 | Thus, overly deep nesting was avoided by rearranging code or by using control | |
176 | statements like \code{return} instead of opening a new \code{if} block. | |
177 | The deepest and most complicated nesting in \myprog{} has level $5$ | |
178 | (with normal, non-nested method code having level $0$), with | |
179 | one of these levels being dedicated to a big enclosing \code{while} loop, | |
180 | one to a \code{try-catch} block and the remaining three to \code{if} blocks | |
181 | with no \code{else} parts and trivial one-expression conditions. | |
182 | Additionally, in this case all of the nested blocks contained only | |
183 | a few lines of code, so that the whole construction easily fits on one screen; | |
184 | this was therefore considered acceptable. | |
185 | In a few other places, similar but less complicated nesting of up to level $5$ occurs. | |
186 | %These were similar to the above but with two enclosing loops and one or | |
187 | %even two \code{try-catch} blocks. | |
188 | TODO: references | |
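How a \code{return} statement can replace a new \code{if} block is sketched below. The class \code{GuardClauses} and its methods are hypothetical and only illustrate the technique.

```java
/** Hypothetical illustration of reducing nesting with early returns. */
class GuardClauses {
    /** Nested variant: each condition adds a level the reader must keep in mind. */
    static String describeNested(String name, Integer length) {
        String result = "invalid";
        if (name != null) {
            if (!name.isEmpty()) {
                if (length != null) {
                    result = name + "(" + length + ")";
                }
            }
        }
        return result;
    }

    /** Flat variant: early returns dispose of the special cases one by one. */
    static String describeFlat(String name, Integer length) {
        if (name == null || name.isEmpty()) {
            return "invalid";
        }
        if (length == null) {
            return "invalid";
        }
        return name + "(" + length + ")";
    }
}
```

In the flat variant, the code after each guard can be read without remembering the conditions of enclosing blocks.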
189 | ||
190 | \subsubsection{Usage of well-known structures} | |
191 | Expressiveness benefits greatly from constructs that are familiar | |
192 | to programmers. | |
193 | Implementations based on such well-known constructs and patterns | |
194 | are much more likely to be understood instantly by programmers and therefore | |
195 | are far better able to ``speak for themselves''. | |
196 | ||
197 | Examples in \myprog{} are the (extensively used) iterator concept, | |
198 | const correctness (see Paragraph~``\nameref{const}'' | |
199 | in Section~\fullref{code_classes} TODO), exceptions, predicates \cite{str4}, | |
200 | run-time type information \cite{str4}, helper functions \cite{str4} | |
201 | and well-known interfaces from the \name{Java} API like \code{Set} or | |
202 | \code{Collection}, as well as common \name{Java} constructs, like | |
203 | classes performing a single action (e.g. \code{OSLSpecPrinter}), and | |
204 | naming schemes, like \code{get...}/\code{set...}/\code{is...}. | |
205 | ||
206 | \subsection{Robustness against incorrect use} | |
207 | Care was taken to produce code that is robust against incorrect use, making it | |
208 | suitable for the expected environment of sporadic updates by unfamiliar and | |
209 | potentially even inexperienced programmers, who moreover focus | |
210 | on the concepts of bootstrapping rather than on details of the present code anyway. | |
211 | In fact, carefully avoiding the introduction of technical artifacts that have to be kept in mind | |
212 | and that prevent programmers from focusing on the actual program logic | |
213 | is an important principle of writing clean code \cite{str4}. | |
214 | ||
215 | In modern object-oriented programming languages, the main instruments | |
216 | for achieving this are, of course, the type system and exceptions. | |
217 | In particular, static type information should be used to reflect data | |
218 | abstraction and the ``kind'' of data an object represents, | |
219 | while dynamic type information should only be used implicitly, | |
220 | through dynamically dispatched method invocations \cite{str3}. | |
221 | Exceptions, on the other hand, should be used at any place related to errors | |
222 | and error handling, separating error handling noticeably from other code and | |
223 | enforcing the treatment of errors \cite{str4}, which in many cases prevents the programmer | |
224 | from using corrupted information. | |
225 | ||
226 | An example of both mechanisms, static type information and exceptions, acting | |
227 | in combination, while cleanly fitting into the context of dynamic dispatching, | |
228 | is provided by the following methods from \file{Column.java}: | |
229 | \codepar{public Boolean isNonNull()\\public Boolean isUnique()} | |
230 | ||
231 | Their return type is the \name{Java} class \code{Boolean}, not the plain type | |
232 | \code{boolean}, because the information they return is not always known. | |
233 | In an early stage of the program, they returned \code{boolean} and were | |
234 | accompanied by two methods | |
235 | \code{public boolean knownIsNonNull()} and \code{public boolean knownIsUnique()}, | |
236 | telling the caller whether the respective information was known and thus the | |
237 | value returned by \code{isNonNull()} or \code{isUnique()}, respectively, | |
238 | was reliable. | |
239 | ||
240 | They were then changed to return the \name{Java} class \code{Boolean} and to return | |
241 | null references in case the respective information is not known. | |
242 | This eliminated the possibility of using unreliable data unnoticed; instead, | |
243 | an exception is generated, in this case a \code{NullPointerException}, which is thrown | |
244 | automatically by the \name{Java} Runtime Environment \cite{java} if the programmer | |
245 | forgets the null check and tries to get a definite value from one of these methods | |
246 | when the correct value is currently not known. | |
247 | ||
248 | Since the change, comparing two unknown values -- that is, two null references -- | |
249 | also yields the desired result, \code{true}, | |
250 | even if the programmer forgets that objects are being compared. | |
251 | However, when comparing two return values of one of the methods in general | |
252 | -- as opposed to comparing one such return value against a constant --, | |
253 | errors could occur if the programmer mistakenly writes \code{col1.isUnique() == col2.isUnique()} | |
254 | instead of \code{col1.isUnique().booleanValue() == col2.isUnique().booleanValue()}. | |
255 | In this case, since the two \code{Boolean} objects are compared for identity \cite{java}, | |
256 | the former comparison can return \code{false} even when the two boolean values are in fact | |
257 | the same. | |
258 | However, since this case was considered much less common than cases in which the other | |
259 | solution could lead incautious programmers to introduce subtle errors, the new solution was preferred. | |
260 | Besides, wrapper classes like \code{Boolean}, \code{Integer}, \code{Long} | |
261 | and \code{Float} are an integral part of the \name{Java} language \cite{java}, | |
262 | so \name{Java} programmers were expected to be able to use them properly. | |
263 | Ultimately, since the new solution effectively prevents errors while | |
264 | abstaining from introducing new artifacts, it was considered fair and clean. | |
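The behavior discussed above can be reproduced with a small stand-in class. \code{ColumnSketch} is hypothetical and much simpler than the actual \code{Column} class; the helper based on \code{java.util.Objects.equals} is an additional assumption shown as one possible null-safe comparison, not something the program is known to use.

```java
import java.util.Objects;

/** Hypothetical stand-in for a column whose uniqueness may be unknown. */
class ColumnSketch {
    private final Boolean unique; // null means "not known"

    ColumnSketch(Boolean unique) {
        this.unique = unique;
    }

    /** Returns TRUE, FALSE, or null if the information is not known. */
    Boolean isUnique() {
        return unique;
    }

    /** Auto-unboxing throws a NullPointerException if the value is unknown. */
    static boolean definitelyUnique(ColumnSketch col) {
        boolean value = col.isUnique(); // implicit booleanValue() call
        return value;
    }

    /** Null-safe comparison by value rather than by object identity. */
    static boolean sameUniqueness(ColumnSketch a, ColumnSketch b) {
        return Objects.equals(a.isUnique(), b.isUnique());
    }
}
```

A caller who forgets the null check thus gets an exception instead of a silently wrong \code{boolean}, and comparing through \code{equals} semantics avoids relying on object identity.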
265 | ||
266 | TODO: summary | |
267 | ||
268 | \subsection{Use of classes} | |
269 | \label{code_classes} | |
270 | Following the object-oriented programming paradigm \cite{obj}, classes were heavily used | |
271 | to abstract from implementation details and to yield intuitively usable objects with | |
272 | a set of useful operations. | |
273 | ||
274 | \subsubsection{Identification of classes} | |
275 | To identify potential classes, entities from the problem domain were -- if reasonable -- | |
276 | directly represented as \name{Java} classes. | |
277 | The approach of choosing ``the program that most directly models the aspects of the | |
278 | real world that we are interested in'' to yield clean code, | |
279 | as described and recommended by Stroustrup \cite{str3}, proved to be extremely useful | |
280 | and effective. | |
281 | As a consequence, the code declares classes like \code{Column}, \code{ColumnSet}, | |
282 | \code{ForeignKey}, \code{Table}, \code{TableSchema} and \code{SQLType}. | |
283 | As described in Section~\fullref{speaking}, class names were chosen to be concise | |
284 | but nevertheless expressive. | |
285 | \name{Java} packages were used to help attain this aim, | |
286 | which is why the previously mentioned class names are unambiguous. | |
287 | For details about package use, see Section~\fullref{code_packages}. | |
288 | ||
289 | Care was taken not to introduce unnecessary classes, which would have complicated | |
290 | the code structure and increased the number of source files and program entities. | |
291 | In particular, artificial classes with little or no reference to real-world | |
292 | objects could most often be avoided. | |
293 | On the other hand, avoiding such artificial classes entirely is usually not | |
294 | the cleanest solution either. | |
295 | ||
296 | Section~\fullref{hierarchies} describes how the classes of \myprog{} are organized | |
297 | into class hierarchies. | |
298 | ||
299 | \subsubsection{Const correctness} | |
300 | \label{const} | |
301 | Specifying in the code which objects may be altered and which shall remain constant, | |
302 | thus allowing for additional static checks preventing undesired modifications, | |
303 | is commonly referred to as ``const correctness'' TODO. | |
304 | TODO: powerful, preventing errors, clarity | |
305 | ||
306 | Unfortunately, \name{Java} lacks a keyword like \name{C++}'s \code{const}, | |
307 | making it harder to achieve const correctness \cite{final}. | |
308 | It only provides the similar keyword \code{final}, which is much less expressive and | |
309 | does not allow for similarly effective error prevention \cite{final}. | |
310 | In particular, because \code{final} is not part of an object's type information, | |
311 | it is not possible to declare methods that return read-only objects \cite{final} -- | |
312 | placing \code{final} before the method's return type would declare the | |
313 | method \code{final} \cite{java}. | |
314 | Similarly, there is no way to express that a method must not change | |
315 | the state of its object parameters. A method like \code{public void f(final Object obj)} | |
316 | is only prevented from assigning a new value to its parameter \code{obj} \cite{java} | |
317 | (which, if allowed, would not affect the caller anyway \cite{java}). | |
318 | Methods that change the state of \code{obj}, on the other hand, | |
319 | may be called on it without restriction. | |
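The limitation of \code{final} parameters can be seen in a few lines. The class \code{FinalParameterDemo} is a hypothetical illustration, not code from \myprog{}.

```java
import java.util.List;

/** Hypothetical illustration of what a final parameter does and does not prevent. */
class FinalParameterDemo {
    /** final forbids reassigning the parameter, but not mutating the object. */
    static void appendMarker(final List<String> names) {
        // names = new java.util.ArrayList<>(); // would not compile: names is final
        names.add("modified");                  // compiles: the referenced object changes
    }
}
```

After a call to \code{appendMarker}, the caller's list contains the additional element, although the parameter was declared \code{final}.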
320 | ||
321 | Several possibilities were considered to address this problem: | |
322 | \begin{itemize} | |
323 | \item Not implementing const correctness, but stating the access rules in | |
324 | comments only | |
325 | \item Not implementing const correctness, but giving the methods which modify | |
326 | object states special names like | |
327 | \code{setName\textendash\textendash USE\_WITH\_CARE} | |
328 | \item Implementing const correctness by delegating changes of objects | |
329 | to special ``editor'' objects to be | |
330 | obtained when an object shall be modified | |
331 | \item Implementing const correctness by deriving classes offering | |
332 | the modifying methods from read-only classes | |
333 | \end{itemize} | |
334 | ||
335 | Not implementing const correctness at all would of course have been the simplest | |
336 | possibility, producing the shortest and most readable code. However, | |
337 | incautious manipulation of objects could have introduced subtle, | |
338 | hard-to-spot errors that in many cases would have surfaced only under additional | |
339 | conditions and in other places, for example when inserting a \code{Column} | |
340 | into a \code{ColumnSet}, so this option was not seriously considered. | |
341 | ||
342 | Using intentionally awkward, conspicuous names instead of implementing const correctness | |
343 | was not seriously considered either, | |
344 | since it would have cluttered the code for the sole purpose of warning | |
345 | programmers of possible errors, without attempting to prevent them technically. | |
346 | ||
347 | The introduction of new classes was therefore considered the most effective and cleanest | |
348 | solution, either in the form of ``editor'' classes or of derived classes offering the | |
349 | modifying methods directly. Again -- as during the identification of classes --, | |
350 | the most direct solution was considered the best, so the latter form of introducing | |
351 | additional classes was chosen, and classes like \code{ReadableColumn}, | |
352 | \code{ReadableColumnSet} et cetera were introduced, which offer only the read-only | |
353 | functionality and usually occur in interfaces. | |
354 | Their counterparts, which additionally include the modifying methods, were derived from them, and the | |
355 | implications of modifications were explained in their documentation, while the | |
356 | issue and the approach as such were also mentioned in the documentation of the | |
357 | \code{Readable...} classes. | |
358 | The \code{Readable...} classes can be converted to their fully functional | |
359 | counterparts only via downcasting, which gives programmers a strong hint | |
360 | that the resulting objects are to be used with care. | |
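The chosen approach can be sketched as follows. All class names ending in \code{Sketch} as well as \code{ConstCorrectnessDemo} are hypothetical simplifications; the actual \code{Readable...} classes of \myprog{} and their members may look different.

```java
/** Read-only type; hypothetical simplification of the Readable... classes. */
class ReadableColumnSketch {
    protected String name;

    ReadableColumnSketch(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

/** Fully functional counterpart that adds the modifying methods. */
class MutableColumnSketch extends ReadableColumnSketch {
    MutableColumnSketch(String name) {
        super(name);
    }

    void setName(String newName) {
        this.name = newName;
    }
}

class ConstCorrectnessDemo {
    /** Accepts the read-only type: no modifying method is visible here. */
    static String describe(ReadableColumnSketch column) {
        return "column " + column.getName();
    }

    /** Modification requires an explicit, conspicuous downcast. */
    static void rename(ReadableColumnSketch column, String newName) {
        ((MutableColumnSketch) column).setName(newName);
    }
}
```

A method that declares a parameter of the read-only type cannot modify the object by accident, because the compiler rejects calls to \code{setName} unless the downcast is written out.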
361 | ||
362 | \subsubsection{Java interfaces} | |
363 | \label{code_interfaces} | |
364 | In \name{Java} programming, it is quite common and often recommended \cite{gof} | |
365 | that every class has at least one \code{interface} it \code{implements}, | |
366 | specifying the operations the class provides. | |
367 | If no obvious \code{interface} exists for a class or the desired | |
368 | interface name is already given to some other entity, | |
369 | the interface is often given names like \code{ITableSchema} | |
370 | or \code{TableSchemaInterface}. | |
371 | ||
372 | However, for a special-purpose program with a relatively fixed set of classes | |
373 | mostly representing real-world artifacts from the problem domain, | |
374 | this approach was considered to add clutter, introducing artificial | |
375 | code entities for no benefit. | |
376 | In particular, as explained in Section~\fullref{fine}, all program classes either | |
377 | stand alone or belong to a class hierarchy derived from at least one | |
378 | interface. | |
379 | So, except for the standalone classes, an interface existed anyway, either | |
380 | ``naturally'' (as in the case of \code{Key}, for example) or because of | |
381 | the chosen way to implement const correctness. | |
382 | In some cases, these were interfaces declared in the program code, while | |
383 | in others, \name{Java} interfaces like \code{Set} were implemented | |
384 | (an obvious choice, of course, for \code{ColumnSet}). | |
385 | Introducing artificial interfaces for the standalone classes was considered | |
386 | at least unnecessary, if not messy. | |
387 | ||
388 | \subsection{Use of packages} | |
389 | \label{code_packages} | |
390 | As mentioned in Section~\fullref{code_classes}, class names were chosen to be | |
391 | concise but nevertheless expressive. | |
392 | This was possible only through the use of \name{Java} \code{package}s, | |
393 | which also helped structure the program. | |
394 | ||
395 | For the relatively limited extent of the program, which currently | |
396 | comprises $45$ (\code{public}) classes, a flat package structure was | |
397 | considered ideal, because it is simple and does not stash source files deep | |
398 | in subdirectories (in \name{Java}, the directory structure of the source tree | |
399 | is required to reflect the package structure \cite{java}). | |
400 | Because every class also belongs to a package, | |
401 | each source file is to be found exactly one directory below the root | |
402 | program source directory, which in many cases eases their handling. | |
403 | ||
404 | For the description of the packages, their interaction and considerations on | |
405 | their structuring, see Section~\fullref{coarse}. | |
406 | For a detailed package description, refer to Appendix TODO. | |
407 | ||
408 | Each package is also documented in the source code, namely in a file | |
409 | \file{package-info.java} residing in the respective package directory. | |
410 | This is a common scheme supported by the \name{Eclipse} IDE as well as the | |
411 | documentation generation systems \name{Javadoc} and \name{Doxygen} | |
412 | (all of which were used in the creation of the program, | |
413 | as described in Section~\fullref{tools}). |