\documentclass[USenglish]{ifimaster}
\usepackage{import}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc,url}
\urlstyle{sf}
\usepackage{babel,textcomp,csquotes,ifimasterforside,varioref,graphicx}
\usepackage[style=numeric-comp,backend=bibtex]{biblatex}
\usepackage{amsthm}
\usepackage{todonotes}
\usepackage{verbatim}
\usepackage{perpage} %the perpage package
\MakePerPage{footnote} %the perpage package command

\theoremstyle{plain}
\newtheorem*{wordDef}{Definition}

\newcommand{\definition}[1]{\begin{wordDef}#1\end{wordDef}}
\newcommand{\see}[1]{(see section \ref{#1})}
\newcommand{\explanation}[3]{\noindent\textbf{\textit{#1}}\\*\emph{When:}
#2\\*\emph{How:} #3\\*[-7px]}
\newcommand{\type}[1]{\texttt{#1}}
\newcommand{\typeref}[1]{\footnote{\type{#1}}}
\newcommand{\typewithref}[2]{\type{#2}\typeref{#1.#2}}
\newcommand{\method}[1]{\type{#1}}
\newcommand{\methodref}[2]{\footnote{\type{#1}\method{\##2()}}}
\newcommand{\methodwithref}[2]{\method{#2}\footnote{\type{#1}\method{\##2()}}}


\title{Refactoring}
\subtitle{An unfinished essay}
\author{Erlend Kristiansen}

\bibliography{bibliography/master-thesis-erlenkr-bibliography}

\begin{document}
\ififorside
\frontmatter{}


\chapter*{Abstract}
Empty document.

\tableofcontents{}
\listoffigures{}
\listoftables{}

\chapter*{Preface}

To make it clear from the beginning: the discussions in this report must be
seen in the context of object-oriented programming languages, and Java in
particular, since that is the language in which most of the examples will be
given. Although the techniques discussed may be applicable to languages from
other paradigms, they will not be the subject of this report.

\mainmatter

\chapter{What is Refactoring?}

This question is best answered by first defining the concept of a
\emph{refactoring} and what it is to \emph{refactor}, and then discussing which
aspects of programming make people want to refactor their code.

\section{Defining refactoring}
Martin Fowler, in his masterpiece on refactoring~\cite{refactoring}, defines a
refactoring like this:
\begin{quote}
  \emph{Refactoring} (noun): a change made to the \todo{what does he mean by
  internal?} internal structure of software to make it easier to understand and
  cheaper to modify without changing its observable
  behavior.~\cite{refactoring} % page 53
\end{quote}
This definition assigns additional meaning to the word \emph{refactoring},
beyond the composition of the prefix \emph{re-}, usually meaning something like
``again'' or ``anew'', and the word \emph{factoring}, which can mean to
determine the \emph{factors} of something, where a \emph{factor} would be close
to the mathematical definition of something that divides a quantity without
leaving a remainder. Fowler is mixing the \emph{motivation} behind refactoring
into his definition. Instead, it could be made clean, only considering the
mechanical and behavioral aspects of refactoring. That is, to factor the program
again, putting it together in a different way than before, while preserving the
behavior of the program. An alternative definition could then be:

\definition{A refactoring is a transformation
done to a program without altering its external behavior.}

From this we can conclude that a refactoring primarily changes how the
\emph{code} of a program is perceived by the \emph{programmer}, and not the
\emph{behavior} experienced by any user of the program. Although the logical
meaning is preserved, such changes could potentially alter the program's
behavior when it comes to performance gains or penalties. So any logic depending
on the performance of a program could make the program behave differently after
a refactoring.

In the extreme case one could argue that such a thing as \emph{software
obfuscation} is to refactor. If we were to define it as a refactoring, it could
be defined as a composite refactoring \see{intro_composite}, consisting of, for
instance, a series of rename refactorings. (But it could of course be much more
complex, and the mechanics of it would not exactly be carved in stone.) To
perform some serious obfuscation one would also take advantage of techniques not
found among established refactorings, such as removing whitespace. This might
not even generate a different syntax tree for languages not sensitive to
whitespace, placing it in the gray area of which kinds of transformations are to
be considered refactorings.

Finally, to \emph{refactor} is (quoting Martin Fowler)
\begin{quote}
  \ldots to restructure software by applying a series of refactorings without
  changing its observable behavior.~\cite{refactoring} % page 54, definition
\end{quote}

\section{The etymology of `refactoring'}
It is a little difficult to pinpoint the exact origin of the word
``refactoring'', as it seems to have evolved as part of a colloquial
terminology, more than as a scientific term. There is no authoritative source
for a formal definition of it.

According to Martin Fowler~\cite{etymology-refactoring}, there may also be more
than one origin of the word. The most well-known source, when it comes to the
origin of \emph{refactoring}, is the Smalltalk\footnote{\emph{Smalltalk},
object-oriented, dynamically typed, reflective programming language.}\todo{find
reference to Smalltalk website or similar?} community and their renowned
\emph{Refactoring
Browser}\footnote{\url{http://st-www.cs.illinois.edu/users/brant/Refactory/RefactoringBrowser.html}}
described in the article \emph{A Refactoring Tool for
Smalltalk}~\cite{refactoringBrowser1997}, published in 1997.
Allegedly~\cite{etymology-refactoring}, the metaphor of factoring programs was
also present in the Forth\footnote{\emph{Forth} -- stack-based, extensible
programming language, without type-checking. See \url{http://www.forth.org}}
community, and the word ``refactoring'' is mentioned in a book by Leo Brodie,
called \emph{Thinking Forth}~\cite{brodie1984}, first published in
1984\footnote{\emph{Thinking Forth} was first published in 1984 by the
\emph{Forth Interest Group}.  Then it was reprinted in 1994 with minor
typographical corrections, before it was transcribed into an electronic edition
typeset in \LaTeX\ and published under a Creative Commons license in 2004. The
edition cited here is the 2004 edition, but the content should essentially be as
in 1984.}. The exact word is only printed in one place\footnote{p. 232}, but the
term \emph{factoring} is prominent in the book, which also contains a whole
chapter dedicated to (re)factoring, and how to keep the (Forth) code clean and
maintainable.
\begin{quote}
  \ldots good factoring technique is perhaps the most important skill for a
  Forth programmer.~\cite{brodie1984}
\end{quote}
Brodie also expresses what \emph{factoring} means to him:
\begin{quote}
  Factoring means organizing code into useful fragments. To make a fragment
  useful, you often must separate reusable parts from non-reusable parts. The
  reusable parts become new definitions. The non-reusable parts become arguments
  or parameters to the definitions.~\cite{brodie1984}
\end{quote}

Fowler claims that the usage of the word \emph{refactoring} did not pass between
the \emph{Forth} and \emph{Smalltalk} communities, but that it emerged
independently in each of the communities.

\todo{more history?}

\section{Motivation -- Why people refactor}
To get a grasp of what refactoring is all about, we can try to answer this
question: \emph{Why do people refactor?} Possible answers could include: ``To
remove duplication'' or ``to break up long methods''.  Practitioners of the art
of Design Patterns~\cite{dp} could say that they do it to introduce a
long-needed pattern into their program's design.  So it is safe to say that
people's intentions are to make their programs \emph{better} in some sense. But
which aspects of the programs are being improved?

As already mentioned, people often refactor to get rid of duplication. They move
identical or similar code into methods, and maybe push those up or down in
their class hierarchies. They make template methods for overlapping
algorithms or functionality, and so on.  It's all about gathering what belongs
together and putting it all in one place.  And the result? The code is easier to
maintain. When the implicit coupling between the code snippets is removed, the
location of a bug is limited to only one place, and new functionality needs to
be added in only this one place, instead of in a number of places people might
not even remember.

The same people find out that their program contains a lot of long and
hard-to-grasp methods. Then what do they do? They begin dividing their methods
into smaller ones, using the \emph{Extract Method}
refactoring~\cite{refactoring}.  Then they may discover something about their
program that they weren't aware of before, revealing bugs they didn't know about
or couldn't find due to the complex structure of their program. \todo{Proof?}
Making the methods smaller and giving good names to the new ones clarifies the
algorithms and enhances the \emph{understandability} of the program
\see{magic_number_seven}. This makes simple refactoring an excellent method for
exploring unknown program code, or code that you had forgotten that you wrote!

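
As a minimal sketch of what \emph{Extract Method} does to a long method,
consider the following Java fragment. The class and method names are
hypothetical and chosen only for illustration; they are not taken from any of
the cited sources. Both variants compute the same result, but in the second one
each step of the algorithm has been given a name.

\begin{verbatim}
import java.util.Arrays;
import java.util.List;

public class ExtractMethodSketch {

    // Before: one method sums, applies a discount and formats.
    static String receiptBefore(List<Double> prices) {
        double total = 0;
        for (double price : prices) {
            total += price;
        }
        if (total > 100) {
            total = total * 0.9;
        }
        return "Total: " + Math.round(total);
    }

    // After: the body reads as a list of named steps.
    static String receiptAfter(List<Double> prices) {
        return format(applyDiscount(sum(prices)));
    }

    static double sum(List<Double> prices) {
        double total = 0;
        for (double price : prices) {
            total += price;
        }
        return total;
    }

    static double applyDiscount(double total) {
        return total > 100 ? total * 0.9 : total;
    }

    static String format(double total) {
        return "Total: " + Math.round(total);
    }

    public static void main(String[] args) {
        List<Double> prices = Arrays.asList(60.0, 70.0);
        // Both variants print the same line: behavior is preserved.
        System.out.println(receiptBefore(prices));
        System.out.println(receiptAfter(prices));
    }
}
\end{verbatim}

A reader of \type{receiptAfter} can follow the algorithm from the method names
alone, and a bug in the discount rule can only be located in
\type{applyDiscount}.
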

The word \emph{simple} came up in the last paragraph. In fact, most basic
refactorings are simple. Their true power is revealed only when they are
combined into larger, higher-level refactorings, called \emph{composite
refactorings} \see{intro_composite}. Often the goal of such a series of
refactorings is a design pattern. Thus the \emph{design} can be evolved
throughout the lifetime of a program, as opposed to being designed up front.
It's all about being structured and taking small steps to improve a program's
design.

Many refactorings are aimed at lowering the coupling between different classes
and different layers of logic. \todo{which refactorings?} Say for instance that
the coupling between the user interface and the business logic of a program is
lowered.  Then the business logic of the program could much more easily be the
target of automated tests, increasing the productivity in the software
development process. It is also easier to distribute (e.g. between computers)
the different components of a program if they are sufficiently decoupled.

Another effect of refactoring is that with the increased separation of concerns
coming out of many refactorings, the \emph{performance} becomes easier to
improve.  When profiling programs, the problem parts are narrowed down to
smaller parts of the code, which are easier to tune, and optimization can be
performed only where needed and in a more effective way.

Last, but not least, and this is probably the best reason to refactor, is
refactoring to \emph{facilitate a program change}. If one has managed to keep
one's code clean and tidy, and the code is not bloated with design patterns that
are never going to be needed, then some refactoring might be needed to
introduce a design pattern that is appropriate for the change that is going to
happen.

Refactoring program code, with a goal in mind, can give the code itself
more value, in the form of robustness to bugs, understandability and
maintainability. The first is an obvious advantage, but the latter two are also
very important for software development. By incorporating
refactoring in the development process, bugs are found faster, new functionality
is added more easily and code is easier to understand by the next person exposed
to it, who might as well be the person who wrote it. The consequence of this
is that refactoring can increase the average productivity of the development
process, and thus also add to the monetary value of a business in the long run.
This last point should also open the eyes of some nearsighted managers who
seldom see beyond the next milestone.

\section{The magical number seven}\label{magic_number_seven}
\emph{The magical number seven, plus or minus two: some limits on our capacity
for processing information}~\cite{miller1956} is an article by George A. Miller
that was published in the journal \emph{Psychological Review} in 1956. It
presents evidence that the number of objects a human being can hold in working
memory is roughly seven, plus or minus two. This number varies a bit depending
on the nature and complexity of the objects, but is according to Miller
``\ldots never changing so much as to be unrecognizable.''

Miller's article culminates in the section called \emph{Recoding}, a term he
borrows from communication theory. The central result in this section is that by
recoding information, the amount of information that a human can process at a
time is increased. By \emph{recoding}, Miller means to group
objects together in chunks and give each chunk a new name that it can be
remembered by. By organizing objects into patterns of ever growing depth, one
can memorize and process a much larger amount of data than if it were to be
represented as its basic pieces. This grouping and renaming is analogous to how
many refactorings work, by grouping pieces of code and giving them a new name.
Examples are the central \emph{Extract Method} and \emph{Extract Class}
refactorings~\cite{refactoring}.

\begin{quote}
  \ldots recoding is an extremely powerful weapon for increasing the amount of
  information that we can deal with.~\cite{miller1956}
\end{quote}
An example from the article addresses the problem of memorizing a sequence of
binary digits. Let us say we have the following sequence\footnote{The example
  presented here is slightly modified (and shortened) from what is presented in
  the original article~\cite{miller1956}, but it is essentially the same.} of
16 binary digits: ``1010001001110011''. Most of us will have a hard time
memorizing this sequence by only reading it once or twice. Imagine if we instead
translate it to this sequence: ``A273''. If you have a background in computer
science, it will be obvious that the latter sequence is the first sequence
recoded to be represented by base-16 digits. Most people should be able to
memorize this last sequence by only looking at it once.

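
The recoding in this example is mechanical enough to be written down as a few
lines of Java. The following is only an illustrative sketch; the class name
\type{RecodingSketch} and the method \type{recode} are made up for this
purpose. Each group of four binary digits is treated as one chunk and given a
new, shorter name, namely its hexadecimal digit.

\begin{verbatim}
public class RecodingSketch {

    // Recode binary digits into hexadecimal digits by grouping
    // four bits into one chunk and naming each chunk.
    static String recode(String bits) {
        if (bits.length() % 4 != 0) {
            throw new IllegalArgumentException(
                "length must be a multiple of four");
        }
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < bits.length(); i += 4) {
            // Parse one chunk of four bits as a number from 0 to 15.
            int chunk = Integer.parseInt(bits.substring(i, i + 4), 2);
            char digit = Character.forDigit(chunk, 16);
            hex.append(Character.toUpperCase(digit));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // The sequence from the text; prints A273.
        System.out.println(recode("1010001001110011"));
    }
}
\end{verbatim}

For the sequence above, the four chunks 1010, 0010, 0111 and 0011 are renamed
A, 2, 7 and 3, so sixteen items to remember become four.
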

Another result from the Miller article is that when the amount of information a
human must interpret increases, it is crucial that the translation from one code
to another be almost automatic for the subject to be able to remember the
translation, before he or she is presented with new information to recode. Thus
learning and understanding how to best organize certain kinds of data is
essential to efficiently handle that kind of data in the future. This is much
like when children learn to read. First they must learn how to recognize
letters. Then they can learn distinct words, and later read sequences of words
that form whole sentences. Eventually, most of them will be able to read whole
books and briefly retell the important parts of their content. This suggests
that the use of design patterns~\cite{dp} is a good idea when reasoning about
computer programs. With extensive use of design patterns when creating complex
program structures, one does not always have to read whole classes of code to
comprehend how they function; it may be sufficient to only see the name of a
class to almost fully understand its responsibilities.

\begin{quote}
  Our language is tremendously useful for repackaging material into a few chunks
  rich in information.~\cite{miller1956}
\end{quote}
Without further evidence, these results at least indicate that refactoring
source code into smaller units with higher cohesion and, when needed,
introducing appropriate design patterns, should aid in the cause of creating
computer programs that are easier to maintain and have code that is easier (and
better) understood.

\section{Notable contributions to the refactoring literature}
\todo{Update with more contributions}
\begin{description}
  \item[1992] William F. Opdyke submits his doctoral dissertation called
    \emph{Refactoring Object-Oriented Frameworks}~\cite{opdyke1992}. This
    work defines a set of refactorings that are behavior preserving given that
    their preconditions are met. The dissertation is focused on the automation
    of refactorings.
  \item[1999] Martin Fowler et al.: \emph{Refactoring: Improving the Design of
    Existing Code}~\cite{refactoring}. This is maybe the most influential text
    on refactoring. It bears similarities to Opdyke's thesis~\cite{opdyke1992}
    in the way that it provides a catalog of refactorings. But Fowler's book is
    more about the craft of refactoring, as he focuses on establishing a
    vocabulary for refactoring, together with the mechanics of different
    refactorings and when to perform them. His methodology is also founded on
    the principles of test-driven development.
  \item[todo] \emph{Refactoring to Patterns}\todo{include}
\end{description}

\section{Tool support}
\todo{write, section vs. subsection}

\section{Relation to design patterns}
\todo{write, section vs. subsection, refactoring to patterns?}
\begin{comment}

\section{Classification of refactorings}
% only interesting refactorings
% with 2 detailed examples? One for structured and one for intra-method?
% Is replacing Bubblesort with Quick Sort considered a refactoring?

\subsection{Structural refactorings}

\subsubsection{Basic refactorings}

% Composing Methods
\explanation{Extract Method}{You have a code fragment that can be grouped
together.}{Turn the fragment into a method whose name explains the purpose of
the method.}

\explanation{Inline Method}{A method's body is just as clear as its name.}{Put
the method's body into the body of its callers and remove the method.}

\explanation{Inline Temp}{You have a temp that is assigned to once with a simple
expression, and the temp is getting in the way of other refactorings.}{Replace
all references to that temp with the expression.}

% Moving Features Between Objects
\explanation{Move Method}{A method is, or will be, using or used by more
features of another class than the class on which it is defined.}{Create a new
method with a similar body in the class it uses most. Either turn the old method
into a simple delegation, or remove it altogether.}

\explanation{Move Field}{A field is, or will be, used by another class more than
the class on which it is defined.}{Create a new field in the target class, and
change all its users.}

% Organizing Data
\explanation{Replace Magic Number with Symbolic Constant}{You have a literal
number with a particular meaning.}{Create a constant, name it after the meaning,
and replace the number with it.}

\explanation{Encapsulate Field}{There is a public field.}{Make it private and
provide accessors.}

\explanation{Replace Type Code with Class}{A class has a numeric type code that
does not affect its behavior.}{Replace the number with a new class.}

\explanation{Replace Type Code with Subclasses}{You have an immutable type code
that affects the behavior of a class.}{Replace the type code with subclasses.}

\explanation{Replace Type Code with State/Strategy}{You have a type code that
affects the behavior of a class, but you cannot use subclassing.}{Replace the
type code with a state object.}

% Simplifying Conditional Expressions
\explanation{Consolidate Duplicate Conditional Fragments}{The same fragment of
code is in all branches of a conditional expression.}{Move it outside of the
expression.}

\explanation{Remove Control Flag}{You have a variable that is acting as a
control flag for a series of boolean expressions.}{Use a break or return
instead.}

\explanation{Replace Nested Conditional with Guard Clauses}{A method has
conditional behavior that does not make clear the normal path of
execution.}{Use guard clauses for all special cases.}

\explanation{Introduce Null Object}{You have repeated checks for a null
value.}{Replace the null value with a null object.}

\explanation{Introduce Assertion}{A section of code assumes something about the
state of the program.}{Make the assumption explicit with an assertion.}

% Making Method Calls Simpler
\explanation{Rename Method}{The name of a method does not reveal its
purpose.}{Change the name of the method.}

\explanation{Add Parameter}{A method needs more information from its
caller.}{Add a parameter for an object that can pass on this information.}

\explanation{Remove Parameter}{A parameter is no longer used by the method
body.}{Remove it.}

%\explanation{Parameterize Method}{Several methods do similar things but with
%different values contained in the method.}{Create one method that uses a
%parameter for the different values.}

\explanation{Preserve Whole Object}{You are getting several values from an
object and passing these values as parameters in a method call.}{Send the whole
object instead.}

\explanation{Remove Setting Method}{A field should be set at creation time and
never altered.}{Remove any setting method for that field.}

\explanation{Hide Method}{A method is not used by any other class.}{Make the
method private.}

\explanation{Replace Constructor with Factory Method}{You want to do more than
simple construction when you create an object.}{Replace the constructor with a
factory method.}

% Dealing with Generalization
\explanation{Pull Up Field}{Two subclasses have the same field.}{Move the field
to the superclass.}

\explanation{Pull Up Method}{You have methods with identical results on
subclasses.}{Move them to the superclass.}

\explanation{Push Down Method}{Behavior on a superclass is relevant only for
some of its subclasses.}{Move it to those subclasses.}

\explanation{Push Down Field}{A field is used only by some subclasses.}{Move the
field to those subclasses.}

\explanation{Extract Interface}{Several clients use the same subset of a class's
interface, or two classes have part of their interfaces in common.}{Extract the
subset into an interface.}

\explanation{Replace Inheritance with Delegation}{A subclass uses only part of a
superclass's interface or does not want to inherit data.}{Create a field for the
superclass, adjust methods to delegate to the superclass, and remove the
subclassing.}

\explanation{Replace Delegation with Inheritance}{You're using delegation and
are often writing many simple delegations for the entire interface.}{Make the
delegating class a subclass of the delegate.}

\subsubsection{Composite refactorings}

% Composing Methods
% \explanation{Replace Method with Method Object}{}{}

% Moving Features Between Objects
\explanation{Extract Class}{You have one class doing work that should be done by
two.}{Create a new class and move the relevant fields and methods from the old
class into the new class.}

\explanation{Inline Class}{A class isn't doing very much.}{Move all its features
into another class and delete it.}

\explanation{Hide Delegate}{A client is calling a delegate class of an
object.}{Create methods on the server to hide the delegate.}

\explanation{Remove Middle Man}{A class is doing too much simple
delegation.}{Get the client to call the delegate directly.}

% Organizing Data
\explanation{Replace Data Value with Object}{You have a data item that needs
additional data or behavior.}{Turn the data item into an object.}

\explanation{Change Value to Reference}{You have a class with many equal
instances that you want to replace with a single object.}{Turn the object into a
reference object.}

\explanation{Encapsulate Collection}{A method returns a collection.}{Make it
return a read-only view and provide add/remove methods.}

% \explanation{Replace Array with Object}{}{}

\explanation{Replace Subclass with Fields}{You have subclasses that vary only in
methods that return constant data.}{Change the methods to superclass fields and
eliminate the subclasses.}

% Simplifying Conditional Expressions
\explanation{Decompose Conditional}{You have a complicated conditional
(if-then-else) statement.}{Extract methods from the condition, then part, and
else part.}

\explanation{Consolidate Conditional Expression}{You have a sequence of
conditional tests with the same result.}{Combine them into a single conditional
expression and extract it.}

\explanation{Replace Conditional with Polymorphism}{You have a conditional that
chooses different behavior depending on the type of an object.}{Move each leg
of the conditional to an overriding method in a subclass. Make the original
method abstract.}

% Making Method Calls Simpler
\explanation{Replace Parameter with Method}{An object invokes a method, then
passes the result as a parameter for a method. The receiver can also invoke this
method.}{Remove the parameter and let the receiver invoke the method.}

\explanation{Introduce Parameter Object}{You have a group of parameters that
naturally go together.}{Replace them with an object.}

% Dealing with Generalization
\explanation{Extract Subclass}{A class has features that are used only in some
instances.}{Create a subclass for that subset of features.}

\explanation{Extract Superclass}{You have two classes with similar
features.}{Create a superclass and move the common features to the
superclass.}

\explanation{Collapse Hierarchy}{A superclass and subclass are not very
different.}{Merge them together.}

\explanation{Form Template Method}{You have two methods in subclasses that
perform similar steps in the same order, yet the steps are different.}{Get the
steps into methods with the same signature, so that the original methods become
the same. Then you can pull them up.}


\subsection{Functional refactorings}

\explanation{Substitute Algorithm}{You want to replace an algorithm with one
that is clearer.}{Replace the body of the method with the new algorithm.}

\end{comment}

\section{The impact on software quality}

\subsection{What is meant by quality?}
The term \emph{software quality} has many meanings. It all depends on the
context we put it in. If we look at it with the eyes of a software developer, it
usually means that the software is easily maintainable and testable, or in other
words, that it is \emph{well designed}. This often correlates with the
management scale, where \emph{keeping the schedule} and \emph{customer
satisfaction} are at the center. From the customer's point of view, in addition
to good usability, \emph{performance} and \emph{lack of bugs} are always
appreciated, measurements that are also shared by the software developer. (In
addition, such things as good documentation could be measured, but this is out
of the scope of this document.)

\subsection{The impact on performance}
\begin{quote}
  Refactoring certainly will make software go more slowly, but it also makes the
  software more amenable to performance tuning.~\cite{refactoring} % page 69
\end{quote}
There is a common belief that refactoring compromises performance, due to an
increased degree of indirection and because polymorphism is assumed to be slower
than conditionals.

In a survey, Demeyer~\cite{demeyer2002} disproves this view in the case of
polymorphism. He conducts an experiment on what he calls ``Transform Self Type
Checks'', where you introduce a new polymorphic method and a new class hierarchy
to get rid of a class's type checking of a ``type attribute''. He uses this kind
of transformation to represent other ways of replacing conditionals with
polymorphism as well. The experiment is performed on the C++ programming
language and with three different compilers and platforms. \todo{But is the
result better?} Demeyer concludes that, with compiler optimization turned on,
polymorphism beats medium- to large-sized if-statements and does as well as
case-statements.  (In accordance with his hypothesis, due to similarities
between the way C++ handles polymorphism and case-statements.)
\begin{quote}
  The interesting thing about performance is that if you analyze most programs,
  you find that they waste most of their time in a small fraction of the
  code.~\cite{refactoring}
\end{quote}
So, although an increased number of method calls could potentially slow down
programs, one should avoid premature optimization and sacrificing good design,
leaving the performance tuning until after profiling\footnote{For an example of
  a Java profiler, check out VisualVM: \url{http://visualvm.java.net/}} the
software and having isolated the actual problem areas.



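
To illustrate the kind of transformation Demeyer measures, the following Java
sketch shows a class that checks its own type attribute, next to a polymorphic
version of the same behavior. Note the assumptions: Demeyer's experiment was
performed on C++, and the shape classes used here are hypothetical examples that
are not taken from his paper. The sketch only shows the structure of the
transformation and makes no claim about performance.

\begin{verbatim}
public class SelfTypeCheckSketch {

    // Before: the class branches on its own type attribute.
    static class TypedShape {
        static final int CIRCLE = 0;
        static final int SQUARE = 1;

        final int type;
        final double size;

        TypedShape(int type, double size) {
            this.type = type;
            this.size = size;
        }

        double area() {
            switch (type) {
                case CIRCLE: return Math.PI * size * size;
                case SQUARE: return size * size;
                default: throw new IllegalStateException("unknown type");
            }
        }
    }

    // After: one class per type; dynamic dispatch replaces the check.
    interface Shape {
        double area();
    }

    static class Circle implements Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    static class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    public static void main(String[] args) {
        TypedShape before = new TypedShape(TypedShape.SQUARE, 2.0);
        Shape after = new Square(2.0);
        // Same observable behavior: both lines print 4.0.
        System.out.println(before.area());
        System.out.println(after.area());
    }
}
\end{verbatim}

Adding a new kind of shape to the second version means adding a class, while in
the first version every method that switches on \type{type} has to be revisited.
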
\section{Correctness of refactorings}
\todo{Volker's example?}

\section{Composite refactorings} \label{intro_composite}
\todo{motivation, examples, manual vs automated?, what about refactoring in a
very large code base?}

\section{Software metrics}


%\part{The project}
%\chapter{Planning the project}
%\part{Conclusion}
%\chapter{Results}



\chapter{\ldots}
\todo{write}
\section{The problem statement}
\section{Choosing the language}
\section{Choosing the tool}

\chapter{Refactorings in Eclipse JDT: Design, Shortcomings and Wishful
Thinking}\label{ch:jdt_refactorings}

This chapter will deal with some of the design behind refactoring support in
Eclipse, and the JDT in particular. Then follows a section about the
shortcomings of the refactoring API in terms of composition of refactorings. The
chapter will be concluded with a section describing some of the ways the
implementation of refactorings in the JDT could have worked to facilitate
composition of refactorings.

\section{Design}
The refactoring world of Eclipse can in general be separated into two parts: the
language-independent part and the part written for a specific programming
language -- the language that is the target of the supported refactorings.
\todo{What about the language specific part?}

 608 \subsection{The Language Toolkit}
 609 The Language Toolkit, or LTK for short, is the framework that is used to
 610 implement refactorings in Eclipse. It is language independent and provides the
 611 abstractions of a refactoring and the change it generates, in the form of the
 612 classes \typewithref{org.eclipse.ltk.core.refactoring}{Refactoring} and
 613 \typewithref{org.eclipse.ltk.core.refactoring}{Change}. (There are also parts of
 614 the LTK that are concerned with user interaction, but they will not be discussed
 615 here, since they are of little value to us and our use of the framework.)
 616
 617 \subsubsection{The Refactoring Class}
 618 The abstract class \type{Refactoring} is the core of the LTK framework. Every
 619 refactoring that is going to be supported by the LTK has to end up creating an
 620 instance of one of its subclasses. The main responsibilities of subclasses of
 621 \type{Refactoring} are to implement template methods for condition checking
 622 (\methodwithref{org.eclipse.ltk.core.refactoring.Refactoring}{checkInitialConditions}
 623 and
 624 \methodwithref{org.eclipse.ltk.core.refactoring.Refactoring}{checkFinalConditions}),
 625 in addition to the
 626 \methodwithref{org.eclipse.ltk.core.refactoring.Refactoring}{createChange}
 627 method that creates and returns an instance of the \type{Change} class.
 628
 629 If a refactoring is to allow others to participate in it when it is
 630 executed, it has to be a processor-based
 631 refactoring\typeref{org.eclipse.ltk.core.refactoring.participants.ProcessorBasedRefactoring}.
 632 It then delegates to its given
 633 \typewithref{org.eclipse.ltk.core.refactoring.participants}{RefactoringProcessor}
 634 for condition checking and change creation.
 635
 636 \subsubsection{The Change Class}
 637 This class is the base class for objects that are responsible for performing the
 638 actual workspace transformations in a refactoring. The main responsibilities of
 639 its subclasses are to implement the
 640 \methodwithref{org.eclipse.ltk.core.refactoring.Change}{perform} and
 641 \methodwithref{org.eclipse.ltk.core.refactoring.Change}{isValid} methods. The
 642 \method{isValid} method verifies that the change object is valid and thus can be
 643 executed by calling its \method{perform} method. The \method{perform} method
 644 performs the desired change and returns an undo change that can be executed to
 645 reverse the effect of the transformation done by its originating change object.
 646
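The contract between \method{isValid} and \method{perform} can be illustrated
with a minimal, self-contained sketch. The types \type{SketchChange} and
\type{RenameEntryChange} below are hypothetical and far simpler than the LTK
classes: a map plays the role of the workspace, and the method signatures are
reduced to the bare minimum.

\begin{verbatim}
import java.util.HashMap;
import java.util.Map;

class ChangeSketch {
    // Hypothetical, reduced stand-in for the LTK Change class.
    abstract static class SketchChange {
        // Tells whether the change can (still) be applied.
        abstract boolean isValid();

        // Applies the change and returns a change that reverses it.
        abstract SketchChange perform();
    }

    // Renames an entry in a map that plays the role of the workspace.
    static class RenameEntryChange extends SketchChange {
        private final Map<String, String> workspace;
        private final String from;
        private final String to;

        RenameEntryChange(Map<String, String> workspace,
                          String from, String to) {
            this.workspace = workspace;
            this.from = from;
            this.to = to;
        }

        @Override
        boolean isValid() {
            return workspace.containsKey(from) && !workspace.containsKey(to);
        }

        @Override
        SketchChange perform() {
            if (!isValid()) {
                throw new IllegalStateException("change is not valid");
            }
            workspace.put(to, workspace.remove(from));
            // The undo change is the rename in the opposite direction.
            return new RenameEntryChange(workspace, to, from);
        }
    }

    public static void main(String[] args) {
        Map<String, String> workspace = new HashMap<String, String>();
        workspace.put("A.java", "class A {}");
        SketchChange undo =
            new RenameEntryChange(workspace, "A.java", "B.java").perform();
        System.out.println(workspace.keySet());
        undo.perform(); // restores the original entry
        System.out.println(workspace.keySet());
    }
}
\end{verbatim}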
 647 \subsubsection{Executing a Refactoring}\label{executing_refactoring}
 648 The life cycle of a refactoring generally follows two steps after creation:
 649 condition checking and change creation. By letting the refactoring object be
 650 handled by a
 651 \typewithref{org.eclipse.ltk.core.refactoring}{CheckConditionsOperation} that
 652 in turn is handled by a
 653 \typewithref{org.eclipse.ltk.core.refactoring}{CreateChangeOperation}, it is
 654 assured that the change creation process is managed in a proper manner.
 655
 656 The actual execution of a change object has to follow a detailed life cycle.
 657 This life cycle is honored if the \type{CreateChangeOperation} is handled by a
 658 \typewithref{org.eclipse.ltk.core.refactoring}{PerformChangeOperation}. If an
 659 undo manager\typeref{org.eclipse.ltk.core.refactoring.IUndoManager} is also set
 660 for the \type{PerformChangeOperation}, the undo change is added to the undo
 661 history.
 662
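The order of the steps can be summarized in a simplified model. The sketch
below does not use the LTK; the nested interfaces \type{Refactoring} and
\type{Change} and the method \method{execute} are hypothetical stand-ins, and
a plain stack plays the role of the undo history. It only shows that the
conditions are checked before a change is created, and that performing the
change yields the undo change that is recorded.

\begin{verbatim}
import java.util.ArrayDeque;
import java.util.Deque;

class LifeCycleSketch {
    // Hypothetical stand-ins; the real LTK types have richer interfaces.
    interface Change { Change perform(); }

    interface Refactoring {
        boolean checkInitialConditions();
        boolean checkFinalConditions();
        Change createChange();
    }

    // Returns true if the refactoring was performed.
    static boolean execute(Refactoring refactoring,
                           Deque<Change> undoHistory) {
        // Step 1: condition checking. No change is created on failure.
        if (!refactoring.checkInitialConditions()
                || !refactoring.checkFinalConditions()) {
            return false;
        }
        // Step 2: change creation.
        Change change = refactoring.createChange();
        // Step 3: perform the change and record its undo change.
        undoHistory.push(change.perform());
        return true;
    }

    // A change that replaces the content of a buffer.
    static Change replaceChange(final StringBuilder buffer,
                                final String text) {
        return new Change() {
            public Change perform() {
                String oldText = buffer.toString();
                buffer.setLength(0);
                buffer.append(text);
                return replaceChange(buffer, oldText);
            }
        };
    }

    // A tiny example refactoring built on top of replaceChange.
    static Refactoring replaceContent(final StringBuilder buffer,
                                      final String text) {
        return new Refactoring() {
            public boolean checkInitialConditions() {
                return buffer.length() > 0;
            }
            public boolean checkFinalConditions() {
                return !text.isEmpty();
            }
            public Change createChange() {
                return replaceChange(buffer, text);
            }
        };
    }

    public static void main(String[] args) {
        StringBuilder source = new StringBuilder("int x;");
        Deque<Change> undoHistory = new ArrayDeque<Change>();
        execute(replaceContent(source, "int y;"), undoHistory);
        System.out.println(source);
        undoHistory.pop().perform(); // undo the refactoring
        System.out.println(source);
    }
}
\end{verbatim}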
 663 \section{Shortcomings}
 664 This section is introduced naturally with a conclusion: The JDT refactoring
 665 implementation does not facilitate composition of refactorings.
 666 \todo{refine}This section will try to explain why, and also identify other
 667 shortcomings of both the usability and the readability of the JDT refactoring
 668 source code.
 669
 670 I will begin at the end and work my way toward the composition part of this
 671 section.
 672
 673 \subsection{Absence of Generics in Eclipse Source Code}
 674 This section concerns not only the JDT refactoring API, but also large
 675 quantities of the Eclipse source code. The code shows a striking absence of the
 676 Java language feature of generics. It is hard to read a class' interface when
 677 methods return objects or take parameters of raw types such as \type{List} or
 678 \type{Map}. This sometimes results in having to read a lot of source code to
 679 understand what is going on, instead of relying on the available interfaces. In
 680 addition, it results in a lot of ugly code, making the use of typecasting more
 681 of a rule than an exception.
 682
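As a small illustration of the readability problem, the following sketch
contrasts a method that returns a raw \type{List} with one that returns a
parameterized list. The class \type{RawVersusGeneric} and its methods are made
up for this example and do not occur in the Eclipse source code.

\begin{verbatim}
import java.util.ArrayList;
import java.util.List;

class RawVersusGeneric {
    // Raw type: the signature does not reveal what the list contains.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List rawNames() {
        List names = new ArrayList();
        names.add("foo");
        return names;
    }

    // Parameterized type: the element type is part of the interface.
    static List<String> typedNames() {
        List<String> names = new ArrayList<String>();
        names.add("foo");
        return names;
    }

    public static void main(String[] args) {
        // The caller has to know the element type and cast to it.
        String first = (String) rawNames().get(0);
        // No cast is needed; the compiler checks the element type.
        String second = typedNames().get(0);
        System.out.println(first.equals(second));
    }
}
\end{verbatim}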
 683 \subsection{Composite Refactorings Will Not Appear as Atomic Actions}
 684
 685 \subsubsection{Missing Flexibility from JDT Refactorings}
 686 The JDT refactorings are not made with composition of refactorings in mind. When
 687 a JDT refactoring is executed, it assumes that all conditions for it to be
 688 applied successfully can be found by reading source files that have been
 689 persisted to disk. The refactorings can only operate on the actual source material, and not
 690 (in-memory) copies thereof. This constitutes a major disadvantage when trying to
 691 compose refactorings, since if an exception occurs in the middle of a sequence of
 692 refactorings, it can leave the project in a state where the composite
 693 refactoring was executed only partly. It makes it hard to discard the changes
 694 done without monitoring and consulting the undo manager, an approach that is not
 695 bulletproof.
 696
 697 \subsubsection{Broken Undo History}
 698 When designing a composed refactoring that is to be performed as a sequence of
 699 refactorings, you would like it to appear as a single change to the workspace.
 700 This implies that you would also like to be able to undo all the changes done by
 701 the refactoring in a single step. This is not the way it appears when a sequence
 702 of JDT refactorings is executed. It leaves the undo history filled up with
 703 individual undo actions corresponding to every single JDT refactoring in the
 704 sequence. This problem is not trivial to handle in Eclipse. (See section
 705 \ref{hacking_undo_history}.)
 706
 707 \section{Wishful Thinking}
 708
 709
 710
 711 \chapter{Composite Refactorings in Eclipse}
 712
 713 \section{A Simple Ad Hoc Model}
 714 As pointed out in chapter \ref{ch:jdt_refactorings}, the Eclipse JDT refactoring
 715 model is not very well suited for making composite refactorings. Therefore a
 716 simple model using changer objects (of type \type{RefaktorChanger}) is used as
 717 an abstraction layer on top of the existing Eclipse refactorings.
 718
 719 \section{The Extract and Move Method Refactoring}
 720 %The Extract and Move Method Refactoring is implemented mainly using these
 721 %classes:
 722 %\begin{itemize}
 723 %  \item \type{ExtractAndMoveMethodChanger}
 724 %  \item \type{ExtractAndMoveMethodPrefixesExtractor}
 725 %  \item \type{Prefix}
 726 %  \item \type{PrefixSet}
 727 %\end{itemize}
 728
 729 \subsection{The Building Blocks}
 730 This is a composite refactoring, and hence is built up using several primitive
 731 refactorings. These basic building blocks are, as its name implies, the Extract
 732 Method Refactoring \cite{refactoring} and the Move Method Refactoring
 733 \cite{refactoring}. In Eclipse, the implementations of these refactorings are
 734 found in the classes
 735 \typewithref{org.eclipse.jdt.internal.corext.refactoring.code}{ExtractMethodRefactoring}
 736 and
 737 \typewithref{org.eclipse.jdt.internal.corext.refactoring.structure}{MoveInstanceMethodProcessor},
 738 where the last class is designed to be used together with the processor-based
 739 \typewithref{org.eclipse.ltk.core.refactoring.participants}{MoveRefactoring}.
 740
 741 \subsubsection{The ExtractMethodRefactoring Class}
 742 This class is quite simple in its use. The only parameters it requires for
 743 construction are a compilation
 744 unit\typeref{org.eclipse.jdt.core.ICompilationUnit}, the offset into the source
 745 code where the extraction shall start, and the length of the source to be
 746 extracted. Then you have to set the method name for the new method, together with
 747 the access modifier to be used and some less interesting parameters.
 748
 749 \subsubsection{The MoveInstanceMethodProcessor Class}
 750 For the Move Method the processor requires a little more advanced input than
 751 the class for the Extract Method. For construction it requires a method
 752 handle\typeref{org.eclipse.jdt.core.IMethod} from the Java Model for the method
 753 that is to be moved. Then the target for the move has to be supplied as the
 754 variable binding from a chosen variable declaration. In addition to this, one
 755 has to set some parameters regarding setters/getters and delegation.
 756
 757 To make a whole refactoring from the processor, one has to construct a
 758 \type{MoveRefactoring} from it.
 759
 760 \subsection{The ExtractAndMoveMethodChanger Class}
 761 The \typewithref{no.uio.ifi.refaktor.changers}{ExtractAndMoveMethodChanger}
 762 class, which is a subclass of the class
 763 \typewithref{no.uio.ifi.refaktor.changers}{RefaktorChanger}, is the class
 764 responsible for composing the \type{ExtractMethodRefactoring} and the
 765 \type{MoveRefactoring}. Its constructor takes a project
 766 handle\typeref{org.eclipse.core.resources.IProject}, the method name for the new
 767 method and a \typewithref{no.uio.ifi.refaktor.utils}{SmartTextSelection}.
 768
 769 A \type{SmartTextSelection} is basically a text
 770 selection\typeref{org.eclipse.jface.text.ITextSelection} object that enforces
 771 the providing of the underlying document during creation. I.e. its
 772 \methodwithref{no.uio.ifi.refaktor.utils.SmartTextSelection}{getDocument} method
 773 will never return \type{null}.
 774
 775 Before extracting the new method, the possible targets for the move operation are
 776 found with the help of an
 777 \typewithref{no.uio.ifi.refaktor.extractors}{ExtractAndMoveMethodPrefixesExtractor}.
 778 The possible targets are computed from the prefixes that the extractor returns
 779 from its
 780 \methodwithref{no.uio.ifi.refaktor.extractors.ExtractAndMoveMethodPrefixesExtractor}{getSafePrefixes}
 781 method. The changer then chooses the most suitable target by finding the most
 782 frequently occurring prefix among the safe ones. The target is the type of the
 783 first part of the prefix.
 784
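The selection of the most frequent prefix could look roughly like the
following sketch. It represents prefixes as plain dotted strings rather than
\type{Prefix} objects, and both the class \type{MostFrequentPrefix} and the
tie-breaking rule (the prefix seen first wins) are assumptions made for the
illustration. Resolving the type of the first part requires bindings and is
left out.

\begin{verbatim}
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MostFrequentPrefix {
    // Returns the prefix that occurs most often, or null if there is none.
    // Assumption of this sketch: on a tie, the prefix seen first wins.
    static String mostFrequent(List<String> safePrefixes) {
        // LinkedHashMap keeps the order in which prefixes were first seen.
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String prefix : safePrefixes) {
            Integer old = counts.get(prefix);
            counts.put(prefix, old == null ? 1 : old + 1);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    // The first part of a dotted prefix, e.g. "a" for "a.b".
    static String firstPart(String prefix) {
        int dot = prefix.indexOf('.');
        return dot < 0 ? prefix : prefix.substring(0, dot);
    }

    public static void main(String[] args) {
        List<String> safe = Arrays.asList("a.b", "c", "a.b");
        String chosen = mostFrequent(safe);
        System.out.println(chosen + " -> " + firstPart(chosen));
    }
}
\end{verbatim}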
 785 After finding a suitable target, the \type{ExtractAndMoveMethodChanger} first
 786 creates an \type{ExtractMethodRefactoring} and performs it as explained in
 787 section \ref{executing_refactoring} about the execution of refactorings. Then it
 788 creates and performs the \type{MoveRefactoring} in the same way, based on the
 789 changes done by the Extract Method refactoring.
 790
 791 \subsection{The ExtractAndMoveMethodPrefixesExtractor Class}
 792 This extractor extracts properties needed for building the Extract and Move
 793 Method refactoring. It searches through the given selection to find safe
 794 prefixes, and those prefixes form a base that can be used to compute possible
 795 targets for the move part of the refactoring.  It finds both the candidates, in
 796 the form of prefixes, and the non-candidates, called unfixes. All prefixes (and
 797 unfixes) are represented by a
 798 \typewithref{no.uio.ifi.refaktor.extractors}{Prefix}, and they are collected
 799 into prefix sets\typeref{no.uio.ifi.refaktor.extractors.PrefixSet}.
 800
 801 The prefixes and unfixes are found by property
 802 collectors\typeref{no.uio.ifi.refaktor.extractors.collectors.PropertyCollector}.
 803 A property collector follows the visitor pattern \cite{dp} and is of the
 804 \typewithref{org.eclipse.jdt.core.dom}{ASTVisitor} type.  An \type{ASTVisitor}
 805 visits nodes in an abstract syntax tree that forms the Java document object
 806 model. The tree consists of nodes of type
 807 \typewithref{org.eclipse.jdt.core.dom}{ASTNode}.
 808
 809 \subsubsection{The PrefixesCollector}
 810 The \typewithref{no.uio.ifi.refaktor.extractors.collectors}{PrefixesCollector}
 811 is of type \type{PropertyCollector}. It visits expression
 812 statements\typeref{org.eclipse.jdt.core.dom.ExpressionStatement} and creates
 813 prefixes from their expressions in the case of method invocations. The prefixes
 814 found are registered with a prefix set, together with all their sub-prefixes.
 815 \todo{Rewrite in the case of changes to the way prefixes are found}
 816
 817 \subsubsection{The UnfixesCollector}
 818 The \typewithref{no.uio.ifi.refaktor.extractors.collectors}{UnfixesCollector}
 819 finds unfixes within the selection. An unfix is a name that is assigned to
 820 within the selection. The reason that this cannot be allowed is that the result
 821 would be an assignment to the \type{this} keyword, which is not valid in Java.
 822
 823 \subsubsection{Computing Safe Prefixes}
 824 A safe prefix is a prefix that does not enclose an unfix. A prefix encloses
 825 an unfix if the unfix is in the set of its sub-prefixes. As an example,
 826 \texttt{``a.b''} encloses \texttt{``a''}, as does \texttt{``a''}. The safe
 827 prefixes are unified in a \type{PrefixSet} and can be fetched by calling the
 828 \method{getSafePrefixes} method of the
 829 \type{ExtractAndMoveMethodPrefixesExtractor}.
 830
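The definition above can be expressed as a small sketch on plain strings. The
class \type{SafePrefixes} and its methods are hypothetical and do not mirror
the actual \type{Prefix} and \type{PrefixSet} classes; a prefix is assumed to
be a dot-separated name that counts among its own sub-prefixes.

\begin{verbatim}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SafePrefixes {
    // All sub-prefixes of a dotted name, including the name itself:
    // "a.b.c" yields "a", "a.b" and "a.b.c".
    static List<String> subPrefixes(String prefix) {
        List<String> result = new ArrayList<String>();
        int dot = prefix.indexOf('.');
        while (dot >= 0) {
            result.add(prefix.substring(0, dot));
            dot = prefix.indexOf('.', dot + 1);
        }
        result.add(prefix);
        return result;
    }

    // A prefix encloses an unfix if the unfix is among its sub-prefixes.
    static boolean encloses(String prefix, String unfix) {
        return subPrefixes(prefix).contains(unfix);
    }

    // Keeps only the prefixes that do not enclose any unfix.
    static Set<String> safePrefixes(Collection<String> prefixes,
                                    Collection<String> unfixes) {
        Set<String> safe = new LinkedHashSet<String>();
        for (String prefix : prefixes) {
            boolean enclosesUnfix = false;
            for (String unfix : unfixes) {
                if (encloses(prefix, unfix)) {
                    enclosesUnfix = true;
                    break;
                }
            }
            if (!enclosesUnfix) {
                safe.add(prefix);
            }
        }
        return safe;
    }

    public static void main(String[] args) {
        List<String> prefixes = Arrays.asList("a", "a.b", "c", "c.d");
        List<String> unfixes = Arrays.asList("a");
        System.out.println(safePrefixes(prefixes, unfixes));
    }
}
\end{verbatim}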
 831 \subsection{The Prefix Class}
 832 \todo{?}
 833 \subsection{The PrefixSet Class}
 834
 835 \subsection{Hacking the Refactoring Undo
 836 History}\label{hacking_undo_history}
 837 \todo{Where to put this section?}
 838
 839 As an attempt to make multiple subsequent changes to the workspace appear as a
 840 single action (i.e. make the undo changes appear as such), I tried to alter
 841 the undo changes\typeref{org.eclipse.ltk.core.refactoring.Change} in the history
 842 of the refactorings.
 843
 844 My first impulse was to remove the, in this case, last two undo changes from the
 845 undo manager\typeref{org.eclipse.ltk.core.refactoring.IUndoManager} for the
 846 Eclipse refactorings, and then add them to a composite
 847 change\typeref{org.eclipse.ltk.core.refactoring.CompositeChange} that could be
 848 added back to the manager. The interface of the undo manager does not offer a
 849 way to remove/pop the last added undo change, so a possible solution could be to
 850 decorate \cite{dp} the undo manager, to intercept and collect the undo changes
 851 before delegating to the \method{addUndo}
 852 method\methodref{org.eclipse.ltk.core.refactoring.IUndoManager}{addUndo} of the
 853 manager. Instead of giving it the intended undo change, a null change could be
 854 given to prevent it from making any changes if run. Then one could let the
 855 collected undo changes form a composite change to be added to the manager.
 856
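The idea can be sketched with a reduced model of the undo manager. The
interfaces \type{Change} and \type{UndoManager} below are hypothetical
stand-ins with only the parts needed here, and reversing the collected undo
changes before composing them is an assumption of the sketch.

\begin{verbatim}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class UndoDecoratorSketch {
    // Hypothetical, reduced stand-ins for the LTK types.
    interface Change { String describe(); }
    interface UndoManager { void addUndo(String name, Change undo); }

    // A change that does nothing when run.
    static final class NullChange implements Change {
        public String describe() { return "null change"; }
    }

    static final class CompositeChange implements Change {
        final List<Change> children;
        CompositeChange(List<Change> children) {
            this.children = new ArrayList<Change>(children);
        }
        public String describe() {
            return "composite of " + children.size() + " changes";
        }
    }

    // Plain manager that just records what it is given.
    static final class ListUndoManager implements UndoManager {
        final List<Change> history = new ArrayList<Change>();
        public void addUndo(String name, Change undo) {
            history.add(undo);
        }
    }

    // Decorator: keeps the real undo changes, forwards null changes.
    static final class CollectingUndoManager implements UndoManager {
        private final UndoManager delegate;
        private final List<Change> collected = new ArrayList<Change>();

        CollectingUndoManager(UndoManager delegate) {
            this.delegate = delegate;
        }

        public void addUndo(String name, Change undo) {
            collected.add(undo);
            delegate.addUndo(name, new NullChange());
        }

        // Registers everything collected so far as one composite change.
        void addCollectedAsComposite(String name) {
            List<Change> reversed = new ArrayList<Change>(collected);
            // Assumption: undo changes run in reverse order.
            Collections.reverse(reversed);
            delegate.addUndo(name, new CompositeChange(reversed));
            collected.clear();
        }
    }

    public static void main(String[] args) {
        ListUndoManager manager = new ListUndoManager();
        CollectingUndoManager decorator = new CollectingUndoManager(manager);
        decorator.addUndo("Extract Method", () -> "undo extract");
        decorator.addUndo("Move Method", () -> "undo move");
        decorator.addCollectedAsComposite("Extract and Move Method");
        for (Change change : manager.history) {
            System.out.println(change.describe());
        }
    }
}
\end{verbatim}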
 857 There is a technical challenge with this approach, and it relates to the undo
 858 manager, and the concrete implementation
 859 UndoManager2\typeref{org.eclipse.ltk.internal.core.refactoring.UndoManager2}.
 860 This implementation is designed in a way that it is not possible to just add an
 861 undo change, you have to do it in the context of an active
 862 operation\typeref{org.eclipse.core.commands.operations.TriggeredOperations}.
 863 One could imagine that it might be possible to trick the undo manager into
 864 believing that you are doing a real change, by executing a refactoring that
 865 returns a kind of null change that in turn returns our composite change of undo
 866 refactorings when it is performed.
 867
 868 Apart from the technical problems with this solution, there is a functional
 869 problem: If it all had worked out as planned, this would leave the undo history
 870 in a dirty state, with multiple empty undo operations corresponding to each of
 871 the sequentially executed refactoring operations, followed by a composite undo
 872 change corresponding to an empty change of the workspace for rounding off our
 873 composite refactoring. The solution to this particular problem could be to
 874 intercept the registration of the intermediate changes in the undo manager, and
 875 only register the last empty change.
 876
 877 Unfortunately, not everything works as desired with this solution. The grouping
 878 of the undo changes into the composite change does not make the undo operation
 879 appear as an atomic operation. The undo operation is still split up into
 880 separate undo actions, corresponding to the change done by its originating
 881 refactoring. And in addition, the undo actions have to be performed separately in
 882 all the editors involved. This makes it no solution at all, but a step toward
 883 something worse.
 884
 885 There might be a solution to this problem, but it remains to be found. The
 886 design of the refactoring undo management is partly to blame for this, as it
 887 is too complex to be easily manipulated.
 888
 889
 890 \backmatter{}
 891 \printbibliography
 892 \listoftodos
 893 \end{document}