% program_code.tex

\section{Code style}
\label{code}
TODO: Conventions, ex.: iterators
As the final system will hopefully have a long life cycle TODO
and will be used and refined by many people, high code quality was an important aim.
Beyond architectural issues, this also involves cleanliness at the lower level,
such as the design of classes and the implementation of methods.
Common software development principles were followed TODO, and
the unfamiliar reader was constantly kept in mind
to yield clean, readable and extensible code.

\subsection{Comments}
\label{comments}
Comments were used where ambiguities or misinterpretations could arise,
yet care was taken to tackle such problems at their roots and solve them
wherever possible instead of merely papering over the ambiguity with comments.
This approach is further explained in section \fullref{speaking} and
rendered many uses of comments unnecessary.

In fact, the number of plain (i.e.\ non-\name{Javadoc}) comments was
consciously minimized to enforce speaking code and avoid redundancy.
An exception to this was the highlighting of subdivisions.
In class and method implementations, comments like
\codepar{//********************** Constructors **********************\textbackslash\textbackslash}

were deliberately used to ease navigation inside source files,
but also to enhance readability: parts of method
implementations, for example, were visually separated this way.
An alternative would have been to use separate methods for these pieces of
code, thereby sticking strictly to the so-called
``Composed Method Pattern'' \cite{composed},
as was done in other cases.
However, sticking to this pattern too rigidly would have introduced additional
artifacts with either long or non-speaking names,
would have interrupted the reading flow and would also have increased complexity,
because these methods would have been callable at least from everywhere
in the source file.
Consequently, having longer methods in some places that are visually separated
into smaller units that are in fact independent of each other was considered
an elegant solution, although, surprisingly, this technique does not seem to be
proposed very often in the literature.
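
The following minimal sketch illustrates the technique of visually separated
method parts. The class \code{SchemaReportSketch} and its method \code{describe}
are hypothetical and not taken from \myprog{}; only the style of the divider
comments follows the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical example: one longer method whose independent steps are
// separated by divider comments instead of being split into helper methods.
final class SchemaReportSketch {

    static String describe(List<String> tableNames) {
        //********************** Validation **********************\\
        if (tableNames == null) {
            throw new IllegalArgumentException("tableNames must not be null");
        }

        //********************** Normalization **********************\\
        List<String> normalized = new ArrayList<>();
        for (String name : tableNames) {
            normalized.add(name.trim().toLowerCase(Locale.ROOT));
        }

        //********************** Formatting **********************\\
        return normalized.size() + " table(s): " + String.join(", ", normalized);
    }
}
```

Each divided part could be extracted into its own method, but it would then be
callable from the whole source file, which is the drawback discussed above.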

Wherever possible, the appropriate \name{Javadoc} comments were used in preference to
plain comments, for example to specify parameters, return values, exceptions
and links to other parts of the documentation.
This proved even more useful because \name{Doxygen} supports all
of the \name{Javadoc} comments used TODO (but not vice versa TODO).
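
As an illustration, the sketch below documents parameters, a return value, an
exception and a link with the standard \name{Javadoc} tags \code{@param},
\code{@return}, \code{@throws} and \code{@see}. The class
\code{TableRegistrySketch} is a hypothetical example and not part of \myprog{}.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class, used only to illustrate the Javadoc tags named above.
final class TableRegistrySketch {

    private final Map<String, Integer> columnCounts = new HashMap<>();

    /**
     * Registers a table together with its number of columns.
     *
     * @param tableName the name of the table, must not be {@code null}
     * @param columns the number of columns of the table
     */
    void register(String tableName, int columns) {
        columnCounts.put(tableName, columns);
    }

    /**
     * Returns the number of columns of a registered table.
     *
     * @param tableName the name of the table to look up
     * @return the number of columns of the table
     * @throws IllegalArgumentException if no table with this name was registered
     * @see #register(String, int)
     */
    int columnCount(String tableName) {
        Integer count = columnCounts.get(tableName);
        if (count == null) {
            throw new IllegalArgumentException("unknown table: " + tableName);
        }
        return count;
    }
}
```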

\subsection{``Speaking code''}
\label{speaking}
As mentioned in section \fullref{comments}, the code was designed to
``speak for itself'' as much as possible instead of making its readers depend on
comments to understand it.
Besides reducing code size through the omitted comments,
this enforced clean code that is accessible to unfamiliar readers and amenable to unpredictable changes.
This is especially important since, as described in section \fullref{arch},
\myprog{} was designed not only to be a standalone program but also to offer
components suitable for reuse.
%TODO: understandability <- code size

The following topics were identified as needing to be addressed to obtain what can be
regarded as ``speaking code'':
\begin{itemize}
        \item Meaningful typing
        \item Method names
        \item Variable names
        \item Intuitive control flow
        \item Limited nesting
        \item Compact code units
        \item Usage of well-known structures
\end{itemize}

~\\The rest of this section describes these topics in some detail.
In addition, an intuitive architecture and suitable, well-designed libraries
also contribute to the clarity of the code.

\subsubsection{Meaningful typing}
Meaningful typing includes the direct mapping of entities of the modeled world
to code entities \cite{str4} as well as an expressive naming scheme
for the resulting types.
Furthermore, inheritance should be used to express commonalities, to avoid
code duplication and to separate implementations from interfaces \cite{str4}.

All real-world artifacts to be modeled, such as database schemata, tables, table schemata,
columns, keys and OBDA specifications with their particular map types, were directly
translated into classes with simple, self-explanatory names like \code{Table},
\code{TableSchema} and \code{Key}.
Package affiliation provided the correct context to understand these names unambiguously.

\subsubsection{Method names}
Assigning expressive names to methods is an essential part of
producing speaking code, since methods encapsulate operations and as such
are important ``building blocks'' for other methods \cite{str4} and ultimately
for the whole program.
Furthermore, method names often occur in interfaces and are therefore not limited
to a local scope, nor can they easily be changed without affecting callers
\cite{java}.

Care was taken that method names reflect all important aspects of the respective
method's behavior.

Consider the following method in \file{CLIDatabaseInteraction.java}:
\codepar{public static void promptAbortRetrieveDBSchemaAndWait\\
        \ind(final FutureTask<DBSchema> retriever) throws SQLException}

It could have been called just \code{promptAbortRetrieveDBSchema}, with the
waiting mentioned in a comment.
However, the waiting is such an important part of its behavior that this
would not have been enough, so the waiting was included in the method name.
Since the method is called in one place only, lengthening the method
name by 7 characters, or about 26\,\%, is not a problem.

\subsubsection{Variable names}
To keep implementation code readable, care was taken to give variables
meaningful yet concise names. Where this was not possible, expressiveness was preferred
over conciseness.

For example, in the implementation of the database schema retrieval,
variables containing data obtained directly from querying the database,
and thus subject to further processing, were consistently prefixed
with ``\code{recvd}'', although in most cases this would not technically have
been necessary.
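
The naming convention can be sketched as follows. The class
\code{RecvdNamingSketch}, the method \code{normalizedTableName} and the key
\code{TABLE\_NAME} are assumptions made for this illustration; a plain
\code{Map} stands in for a row returned by a database query.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of the "recvd" prefix for unprocessed database data.
final class RecvdNamingSketch {

    // The map is a stand-in for one row returned by a metadata query.
    static String normalizedTableName(Map<String, String> row) {
        // Raw value exactly as received from the database; still to be processed.
        String recvdTableName = row.get("TABLE_NAME");

        // Processed value, intended for use in the rest of the program.
        String tableName = recvdTableName.trim().toUpperCase(Locale.ROOT);
        return tableName;
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("TABLE_NAME", " orders ");
        System.out.println(normalizedTableName(row));
    }
}
```

The prefix makes it visible at every point of use whether a value has already
been processed.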

\subsubsection{Intuitive control flow}

\subsubsection{Limited nesting}

\subsubsection{Compact code units}

\subsubsection{Usage of well-known structures}

\subsection{Robustness against incorrect use}
Care was taken to produce code that is robust against incorrect use, making it
suitable for the expected environment of sporadic updates by unfamiliar and
potentially even inexperienced programmers, who moreover are likely to focus
on the concepts of bootstrapping rather than on details of the present code.
In fact, carefully avoiding the introduction of technical artifacts that have to be kept in mind
and that prevent programmers from focusing on the actual program logic
is an important principle of writing clean code \cite{str4}.

In modern object-oriented programming languages, the main instruments
for achieving this are the type system and exceptions.
In particular, static type information should be used to reflect data
abstraction and the ``kind'' of data an object represents,
while dynamic type information should only be used implicitly,
through dynamically dispatched method invocations \cite{str3}.
Exceptions, on the other hand, should be used wherever errors
and error handling are concerned, separating error handling visibly from other code and
enforcing the treatment of errors \cite{str4}, which in many cases prevents the programmer from using
corrupted information.

An example of both mechanisms, static type information and exceptions, acting
in combination while fitting cleanly into the context of dynamic dispatching,
is given by the following methods from \file{Column.java}:
\codepar{public Boolean isNonNull()\\public Boolean isUnique()}

Their return type is the \name{Java} class \code{Boolean}, not the primitive type
\code{boolean}, because the information they return is not always known.
In an early stage of the program, they returned \code{boolean} and were
accompanied by two methods,
\code{public boolean knownIsNonNull()} and \code{public boolean knownIsUnique()},
telling the caller whether the respective information was known and thus whether the
value returned by \code{isNonNull()} or \code{isUnique()}, respectively,
was reliable.

They were then changed to return the \name{Java} class \code{Boolean} and to return
\code{null} if the respective information is not known.
This eliminated any possibility of using unreliable data in favor of generating
exceptions instead, in this case a \code{NullPointerException}, which is thrown
automatically by the \name{Java Runtime Environment} if the programmer forgets the
null check and tries to get a definite value from one of these methods
when the correct value is currently not known.
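
The behavior described above can be reproduced with the following minimal
sketch. \code{TriStateColumnSketch} is a hypothetical stand-in and not the
actual \code{Column} class; only the return type \code{Boolean} and the use of
\code{null} for ``not known'' are taken from the text.

```java
// Hypothetical stand-in for a column whose uniqueness may be unknown.
final class TriStateColumnSketch {

    // null means "not known"; TRUE and FALSE are definite answers.
    private final Boolean unique;

    TriStateColumnSketch(Boolean unique) {
        this.unique = unique;
    }

    Boolean isUnique() {
        return unique;
    }

    // Correct use: check for null before relying on the value.
    static String describe(TriStateColumnSketch column) {
        Boolean unique = column.isUnique();
        if (unique == null) {
            return "uniqueness unknown";
        }
        return unique ? "unique" : "not unique";
    }
}
```

If a caller skips the null check and writes, for example,
\code{boolean u = column.isUnique();}, the unboxing of \code{null} throws a
\code{NullPointerException} instead of silently yielding an unreliable value.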

Since the change, comparing two unknown values (that is, two null pointers)
also yields the desired result, \code{true},
even when the programmer forgets that he is dealing with objects.
However, when comparing two return values of one of the methods in general
(as opposed to comparing one such return value against a constant),
errors could occur if the programmer mistakenly writes \code{col1.isUnique() == col2.isUnique()}
instead of \code{col1.isUnique().booleanValue() == col2.isUnique().booleanValue()}.
In this case, since the two \code{Boolean} objects are compared for identity \cite{java},
the former comparison can return \code{false} even when the two boolean values are in fact
the same.
However, since this case was considered much less common than cases in which the other
solution could let programmers' mistakes go undetected, the \code{Boolean} solution was preferred.
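
A comparison that is independent of object identity and also tolerates unknown
values can be sketched with \code{java.util.Objects.equals}. This is an
alternative to the \code{booleanValue()} form mentioned above, not the
approach taken in \myprog{}; the class \code{BooleanComparisonSketch} is
hypothetical. Note that values produced by autoboxing or by
\code{Boolean.valueOf} are the shared instances \code{Boolean.TRUE} and
\code{Boolean.FALSE}, so an identity mismatch only arises for instances
created separately, for example through the \code{Boolean} constructor.

```java
import java.util.Objects;

// Hypothetical helper comparing two possibly unknown (null) Boolean values.
final class BooleanComparisonSketch {

    // Compares by value, never by identity:
    // two unknown values are equal, a known and an unknown value are not.
    static boolean sameValue(Boolean first, Boolean second) {
        return Objects.equals(first, second);
    }
}
```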

TODO: summary

\subsection{Classes}
\label{code_classes}
Following the object-oriented programming paradigm, classes were used extensively
to abstract from implementation details and to yield intuitively usable objects with
a set of useful operations \cite{obj}.

\subsubsection{Identification of classes}
To identify potential classes, entities from the problem domain were, where reasonable,
directly represented as \name{Java} classes.
The approach of choosing ``the program that most directly models the aspects of the
real world that we are interested in'' to yield clean code,
as described and recommended by Stroustrup \cite{str3}, proved to be extremely useful
and effective.
As a consequence, the code declares classes like \code{Column}, \code{ColumnSet},
\code{ForeignKey}, \code{Table}, \code{TableSchema} and \code{SQLType}.
As described in section \fullref{speaking}, class names were chosen to be concise
but nevertheless expressive TODO.
\name{Java} packages were used to help achieve this aim,
which is why the previously mentioned class names are unambiguous.
For details about package use, see section \fullref{code_packages}.

Care was taken not to introduce unnecessary classes, which would have complicated the
code structure and increased the number of source files and program entities.
In particular, artificial classes with little or no relation to real-world
objects could be avoided in most cases.
On the other hand, avoiding such artificial classes entirely
is usually not the cleanest solution either.

Section \fullref{hierarchies} describes how the classes of \myprog{} are organized
into class hierarchies.

\subsubsection{Const correctness}
\label{const}
Specifying in the code which objects may be altered and which shall remain constant,
thus allowing for additional static checks that prevent undesired modifications,
is commonly referred to as ``const correctness'' TODO.
TODO: powerful, preventing errors, clarity

Unfortunately, \name{Java} lacks a keyword like \name{C++}'s \code{const},
making it harder to achieve const correctness \cite{final}.
It only provides the similar keyword \code{final}, which is much less expressive and
does not allow for similarly effective error prevention \cite{final}.
In particular, because \code{final} is not part of an object's type information,
it is not possible to declare methods that return read-only objects TODO:
placing \code{final} before the method's return type would declare the
method itself \code{final}. Similarly, there is no way to express that a method must not change
the state of its object parameters. A method like \code{public void f(final Object obj)}
is only prevented from assigning a new value to its parameter \code{obj} \cite{java}
(which, if allowed, would not affect the caller anyway \cite{java}).
Methods that change the state of \code{obj}, however, may be called on it without
restrictions \cite{java}.
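
The limitation of \code{final} parameters can be demonstrated with a minimal
sketch. The class \code{FinalParameterSketch} and its method are hypothetical;
a \code{List} serves as an arbitrary mutable object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: final protects the reference, not the object's state.
final class FinalParameterSketch {

    static void addDefaultColumn(final List<String> columns) {
        // columns = new ArrayList<>();  // would not compile: the parameter is final
        columns.add("id");               // compiles: the state of the object is changed
    }

    public static void main(String[] args) {
        List<String> columns = new ArrayList<>();
        addDefaultColumn(columns);
        System.out.println(columns.size()); // the caller observes the modification
    }
}
```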

Several possibilities were considered to address this problem:
\begin{itemize}
        \item Not implementing const correctness, but stating the access rules in
        comments only
        \item Giving the methods that modify object states special names
        like\\\code{setName\textendash\textendash USE\_WITH\_CARE}
        \item Delegating changes of objects to special ``editor'' objects that have to be
        obtained when an object is to be altered TODO
        \item Deriving classes that offer the modifying methods from the read-only
        classes
\end{itemize}

Not implementing const correctness at all would of course have been the simplest
option, producing the shortest and most readable code. However,
incautious manipulation of objects could have introduced subtle,
hard-to-spot errors, which in many cases would have surfaced only under additional
conditions and in other places, for example when inserting a \code{Column}
into a \code{ColumnSet}. This option was therefore not seriously considered.

Using intentionally awkward, conspicuous names was not seriously considered either,
since it would have cluttered the code merely to warn
programmers of possible errors, without attempting to prevent them technically.

The introduction of new classes was therefore considered the cleanest and most effective
solution, either in the form of ``editor'' classes or of derived classes offering the
modifying methods directly. Again, as in the identification of classes,
the most direct solution was considered the best, so the latter form of introducing
additional classes was chosen. Classes like \code{ReadableColumn},
\code{ReadableColumnSet} et cetera were introduced, which offer only the read-only
functionality and usually occur in interfaces.
Their counterparts, which include the modifying methods, were derived from them. The
implications of modifications were explained in the counterparts' documentation, while the
issue and the approach as such were also mentioned in the documentation of the
\code{Readable...} classes.
Objects of the \code{Readable...} classes can be converted to their fully functional
counterparts only via downcasting, which gives programmers a strong hint
that the resulting objects are to be used with care.
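
The chosen approach can be sketched as follows. All names in the sketch are
hypothetical, and the read-only type is assumed to be a \name{Java}
\code{interface}; the actual declarations in \myprog{} may differ.

```java
// Read-only type; hypothetical counterpart of a Readable... type.
interface ReadableColumnSketch {
    String getName();
}

// Fully functional counterpart, derived from the read-only type.
final class EditableColumnSketch implements ReadableColumnSketch {

    private String name;

    EditableColumnSketch(String name) {
        this.name = name;
    }

    @Override
    public String getName() {
        return name;
    }

    void setName(String name) {
        this.name = name;
    }
}

final class ReadableUsageSketch {

    // Code that only reads receives the read-only type and cannot modify the object.
    static String label(ReadableColumnSketch column) {
        return "column " + column.getName();
    }

    // Modifying requires an explicit downcast, which makes the intent visible.
    static void rename(ReadableColumnSketch column, String newName) {
        ((EditableColumnSketch) column).setName(newName);
    }
}
```

A call such as \code{column.setName(...)} on a \code{ReadableColumnSketch}
reference is rejected by the compiler, so accidental modifications are caught
statically, while the downcast remains available for deliberate ones.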

\subsubsection{Java interfaces}
\label{code_interfaces}
In \name{Java} programming, it is quite common and often recommended that every
class has at least one \code{interface} it \code{implements},
specifying the operations the class provides. TODO
If no obvious \code{interface} exists for a class or the desired
interface name is already taken by some other entity,
the interface is often given a name like \code{ITableSchema}
or \code{TableSchemaInterface}.

However, for a special-purpose program with a relatively fixed set of classes
mostly representing real-world artifacts from the problem domain,
this approach was considered to add clutter, introducing artificial
code entities for no benefit.
In particular, as explained in section TODO, all program classes either
stand alone TODO or belong to a class hierarchy derived from at least one
interface.
So, except for the standalone classes, an interface existed anyway, either
``naturally'' (as in the case of \code{Key}, for example) or because of
the chosen way to implement const correctness.
In some cases, these were interfaces declared in the program code, while
in other cases, \name{Java} library interfaces like \code{Set} were implemented
(an obvious choice, of course, for \code{ColumnSet}).
Introducing artificial interfaces for the standalone classes was considered
at the very least unnecessary, if not messy.
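
How \code{ColumnSet} implements \code{Set} is not shown here. As a minimal
sketch of a class that reuses a library interface instead of a hand-written
one, the hypothetical class below extends \code{java.util.AbstractSet}, which
is one common way to implement \code{Set}; storing column names as strings is
an assumption made for brevity.

```java
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical set of column names that is usable wherever a Set is expected.
final class ColumnNameSetSketch extends AbstractSet<String> {

    // Backing collection; keeps insertion order and rejects duplicates.
    private final Set<String> names = new LinkedHashSet<>();

    @Override
    public boolean add(String name) {
        return names.add(name);
    }

    @Override
    public Iterator<String> iterator() {
        return names.iterator();
    }

    @Override
    public int size() {
        return names.size();
    }
}
```

Since the class is a \code{Set}, callers can work with it through the
well-known library interface and no additional interface has to be declared.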

\subsection{Packages}
\label{code_packages}
As mentioned in section \fullref{code_classes}, class names were chosen to be
concise but nevertheless expressive.
This was only possible through the use of \name{Java} \code{package}s,
which also helped to structure the program.

For the current, relatively limited extent of the program, which
comprises $45$ (\code{public}) classes, a flat package structure was
considered ideal, because it is simple and does not bury source files deep
in subdirectories (in \name{Java}, the directory structure of the source tree
is required to reflect the package structure TODO).
Since every class also belongs to a package,
each source file is found exactly one directory below the root
program source directory, which in many cases eases its handling.

The following $11$ packages exist in the program
(their purpose and more details about the package structure are
described in section \fullref{coarse}):
\begin{multicols}{3}\begin{itemize}
        \item \code{boostrapping}
        \item \code{cli}
        \item \code{database}
        \item \code{helpers}
        \item \code{log}
        \item \code{main}
        \item \code{osl}
        \item \code{output}
        \item \code{settings}
        \item \code{specification}
        \item \code{test}
\end{itemize}\end{multicols}

For the description of the packages, their interaction and considerations on
their structuring, see section \fullref{coarse}.
For a detailed package description, refer to Appendix TODO.

Each package is also documented in the source code, namely in a file
\file{package-info.java} residing in the respective package directory.
This is a common scheme supported by the \name{Eclipse} IDE as well as the
documentation generation systems \name{javadoc} and \name{doxygen} TODO
(all of which were used in the creation of the program,
as described in section \fullref{tools}).