Thesis: adding sentence + citation about advices

[ifi-stolz-refaktor.git] / thesis / master-thesis-erlenkr.tex
diff --git a/thesis/master-thesis-erlenkr.tex b/thesis/master-thesis-erlenkr.tex

index 0cf2808719403f3a7775cdddfdbc45bfc5c53337..eef744548d3331c2018697dcb21039035dc937b5 100644 (file)
--- a/thesis/master-thesis-erlenkr.tex
+++ b/thesis/master-thesis-erlenkr.tex
@@ -12,7 +12,7 @@
  \usepackage{babel,textcomp,csquotes,ifimasterforside,varioref}
  \usepackage[hidelinks]{hyperref}
  \usepackage{cleveref}
-\usepackage[style=numeric-comp,backend=bibtex]{biblatex}
+\usepackage[style=alphabetic,backend=bibtex]{biblatex}
  \usepackage{amsthm}
  \usepackage{graphicx}
  % use 'disable' before printing:
@@ -212,20 +212,20 @@ Allegedly\citing{etymology-refactoring}, the metaphor of factoring programs was
  also present in the Forth\footnote{\emph{Forth} -- stack-based, extensible 
  programming language, without type-checking. See \url{http://www.forth.org}} 
  community, and the word ``refactoring'' is mentioned in a book by Leo Brodie, 
-called \emph{Thinking Forth}\citing{brodie1984}, first published in 
+called \emph{Thinking Forth}\citing{brodie2004}, first published in 
  1984\footnote{\emph{Thinking Forth} was first published in 1984 by the 
  \emph{Forth Interest Group}.  Then it was reprinted in 1994 with minor 
  typographical corrections, before it was transcribed into an electronic edition 
  typeset in \LaTeX\ and published under a Creative Commons licence in 2004. The 
  edition cited here is the 2004 edition, but the content should essentially be as 
-in 1984.}. The exact word is only printed one place~\cite[p.~232]{brodie1984}, 
+in 1984.}. The exact word is only printed one place~\cite[p.~232]{brodie2004}, 
  but the term \emph{factoring} is prominent in the book, that also contains a 
  whole chapter dedicated to (re)factoring, and how to keep the (Forth) code clean 
  and maintainable.
  
  \begin{quote}
    \ldots good factoring technique is perhaps the most important skill for a 
-  Forth programmer.~\cite[p.~172]{brodie1984}
+  Forth programmer.~\cite[p.~172]{brodie2004}
  \end{quote}
  
  \noindent Brodie also express what \emph{factoring} means to him:
@@ -234,7 +234,7 @@ and maintainable.
    Factoring means organizing code into useful fragments. To make a fragment 
    useful, you often must separate reusable parts from non-reusable parts. The  
    reusable parts become new definitions. The non-reusable parts become arguments 
-  or parameters to the definitions.~\cite[p.~172]{brodie1984}
+  or parameters to the definitions.~\cite[p.~172]{brodie2004}
  \end{quote}
  
  Fowler claims that the usage of the word \emph{refactoring} did not pass between 
@@ -892,8 +892,8 @@ examples of refactorings in existing tools that break programs. I will now
  present an example of an \ExtractMethod refactoring followed by a \MoveMethod 
  refactoring that breaks a program in both the \emph{Eclipse} and \emph{IntelliJ} 
  IDEs\footnote{The NetBeans IDE handles this particular situation without 
-  altering ther program's beavior, mainly because its Move Method refactoring 
-  implementation is a bit rancid in other ways \see{toolSupport}.}. The 
+  altering the program's beavior, mainly because its Move Method refactoring 
+  implementation is a bit flawed in other ways \see{toolSupport}.}. The 
    following piece of code shows the target for the composed refactoring:
  
  \begin{minted}[linenos,samepage]{java}
@@ -1011,8 +1011,10 @@ Unit tests, as they are known from the different xUnit frameworks around, are
  only suitable to test the \emph{result} of isolated operations. They can not 
  easily (if at all) observe the \emph{history} of a program.
  
+This problem is still open.
  
-\todoin{Write \ldots}
+\todoin{Write?}
+\begin{comment}
  
  Assuming a sequential (non-concurrent) program:
  
@@ -1045,16 +1047,17 @@ tracematch (C c, X x) {
  %  { assert x1 != x2; }
  %}
  %\end{minted}
+\end{comment}
  
  \section{The project}
-The aim of this project will be to investigate the relationship between a 
+The aim of this master project will be to investigate the relationship between a 
  composite refactoring composed of the \ExtractMethod and \MoveMethod 
  refactorings, and its impact on one or more software metrics.
  
-The composition of \ExtractMethod and \MoveMethod springs naturally out of the 
-need to move procedures closer to the data they manipulate. This composed 
-refactoring is not well described in the literature, but it is implemented in at 
-least one tool called 
+The composition of the \ExtractMethod and \MoveMethod refactorings springs 
+naturally out of the need to move procedures closer to the data they manipulate.  
+This composed refactoring is not well described in the literature, but it is 
+implemented in at least one tool called 
  \emph{CodeRush}\footnote{\url{https://help.devexpress.com/\#CodeRush/CustomDocument3519}}, 
  that is an extension for \emph{MS Visual 
  Studio}\footnote{\url{http://www.visualstudio.com/}}. In CodeRush it is called 
@@ -1075,8 +1078,6 @@ before and after the execution. To be able to execute the refactoring
  automatically I have to make it analyze code to determine the best selections to 
  extract into new methods.
  
-\section{Software metrics}
-\todoin{Is this the appropriate place to have this section?}
  
  %\part{The project}
  %\chapter{Planning the project}
@@ -1085,22 +1086,29 @@ extract into new methods.
  
  
  
-\chapter{\ldots}
-\todoin{write}
+\chapter{The Project}
+
  \section{The problem statement}
+\todoin{write/move}
+
  \section{Choosing the target language}
-Choosing which programming language to use as the target for manipulation is not 
-a very difficult task. The language has to be an object-oriented programming 
-language, and it must have existing tool support for refactoring. The 
-\emph{Java} programming language\footnote{\url{https://www.java.com/}} is the 
-dominating language when it comes to examples in the literature of refactoring, 
-and is thus a natural choice. Java is perhaps, currently the most influential 
-programming language in the world, with its \emph{Java Virtual Machine} that 
-runs on all of the most popular architectures and also supports\footnote{They 
-compile to java bytecode.} dozens of other programming languages, with 
-\emph{Scala}, \emph{Clojure} and \emph{Groovy} as the most prominent ones. Java 
-is currently the language that every other programming language is compared 
-against. It is also the primary language of the author of this thesis.
+Choosing which programming language the code that shall be manipulated shall be 
+written in, is not a very difficult task. We choose to limit the possible 
+languages to the object-oriented programming languages, since most of the 
+terminology and literature regarding refactoring comes from the world of 
+object-oriented programming. In addition, the language must have existing tool 
+support for refactoring.
+
+The \emph{Java} programming language\footnote{\url{https://www.java.com/}} is 
+the dominating language when it comes to example code in the literature of 
+refactoring, and is thus a natural choice. Java is perhaps, currently the most 
+influential programming language in the world, with its \emph{Java Virtual 
+Machine} that runs on all of the most popular architectures and also supports 
+dozens of other programming languages\footnote{They compile to java bytecode.}, 
+with \emph{Scala}, \emph{Clojure} and \emph{Groovy} as the most prominent ones.  
+Java is currently the language that every other programming language is compared 
+against. It is also the primary programming language for the author of this 
+thesis.
  
  \section{Choosing the tools}
  When choosing a tool for manipulating Java, there are certain criterias that 
@@ -1133,6 +1141,82 @@ stand out as a favorite, and that is the \emph{Eclipse IDE}. This is the most
  popular\citing{javaReport2011} among them and seems to be de facto standard IDE 
  for Java development regardless of platform.
  
+\section{Organizing the project}
+All the parts of this master project is under version control with 
+\emph{Git}\footnote{\url{http://git-scm.com/}}.
+
+The software written is organized as some Eclipse plugins. Writing a plugin is 
+the natural way to utilize the API of Eclipse. This also makes it possible to 
+provide a user interface to manually run operations on selections in program 
+source code or whole projects/packages.
+
+When writing a plugin in Eclipse, one has access to resources such as the 
+current workspace, the open editor and the current selection.
+
+\section{Continuous integration}
+The continuous integration server 
+\emph{Jenkins}\footnote{\url{http://jenkins-ci.org/}} has been set up for the 
+project\footnote{A work mostly done by the supervisor.}. It is used as a way to 
+run tests and perform code coverage analysis. 
+
+To be able to build the Eclipse plugins and run tests for them with Jenkins, the 
+component assembly project 
+\emph{Buckminster}\footnote{\url{http://www.eclipse.org/buckminster/}} is used, 
+through its plugin for Jenkins. Buckminster provides for a way to specify the 
+resources needed for building a project and where and how to find them.  
+Buckminster also handles the setup of a target environment to run the tests in.  
+All this is needed because the code to build depends on an Eclipse installation 
+with various plugins.
+
+\subsection{Problems with AspectJ}
+The Buckminster build worked fine until introducing AspectJ into the project.  
+When building projects using AspectJ, there are some additional steps that needs 
+to be performed. First of all, the aspects themselves must be compiled. Then the 
+aspects needs to be woven with the classes they affect. This demands a process 
+that does multiple passes over the source code.
+
+When using AspectJ with Eclipse, the specialized compilation and the weaving can 
+be handled by the \emph{AspectJ Development 
+Tools}\footnote{\url{https://www.eclipse.org/ajdt/}}. This works all fine, but 
+it complicates things when trying to build a project depending on Eclipse 
+plugins outside of Eclipse. There is supposed to be a way to specify a compiler 
+adapter for javac, together with the file extensions for the file types it shall 
+operate. The AspectJ compiler adapter is called 
+\typewithref{org.aspectj.tools.ant.taskdefs}{Ajc11CompilerAdapter}, and it works 
+with files that has the extensions \code{*.java} and \code{*.aj}. I tried to 
+setup this in the build properties file for the project containing the aspects, 
+but to no avail. The project containing the aspects does not seem to be built at 
+all, and the projects that depends on it complains that they cannot find certain 
+classes.
+
+I then managed to write an \emph{Ant}\footnote{\url{https://ant.apache.org/}} 
+build file that utilizes the AspectJ compiler adapter, for the 
+\code{no.uio.ifi.refaktor} plugin. The problem was then that it could no longer 
+take advantage of the environment set up by Buckminster. The solution to this 
+particular problem was of a ``hacky'' nature. It involves exporting the plugin 
+dependencies for the project to an Ant build file, and copy the exported path 
+into the existing build script. But then the Ant script needs to know where the 
+local Eclipse installation is located. This is no problem when building on a 
+local machine, but to utilize the setup done by Buckminster is a problem still 
+unsolved. To get the classpath for the build setup correctly, and here comes the 
+most ``hacky'' part of the solution, the Ant script has a target for copying the 
+classpath elements into a directory relative to the project directory and 
+checking it into Git. When no \code{ECLIPSE\_HOME} property is set while running 
+Ant, the script uses the copied plugins instead of the ones provided by the 
+Eclipse installation when building the project. This obviously creates some 
+problems with maintaining the list of dependencies in the Ant file, as well as 
+remembering to copy the plugins every time the list of dependencies change.
+
+The Ant script described above is run by Jenkins before the Buckminster setup 
+and build. When setup like this, the Buckminster build succeeds for the projects 
+not using AspectJ, and the tests are run as normal. This is all good, but it 
+feels a little scary, since the reason for Buckminster not working with AspectJ 
+is still unknown.
+
+The problems with building with AspectJ on the Jenkins server lasted for a 
+while, before they were solved. This is reflected in the ``Test Result Trend'' 
+and ``Code Coverage Trend'' reported by Jenkins.
+
  
  \chapter{Refactorings in Eclipse JDT: Design, Shortcomings and Wishful 
  Thinking}\label{ch:jdt_refactorings}
@@ -1151,13 +1235,20 @@ language -- the language that is the target of the supported refactorings.
  \todo{What about the language specific part?}
  
  \subsection{The Language Toolkit}
-The Language Toolkit, or LTK for short, is the framework that is used to 
-implement refactorings in Eclipse. It is language independent and provides the 
-abstractions of a refactoring and the change it generates, in the form of the 
-classes \typewithref{org.eclipse.ltk.core.refactoring}{Refactoring} and 
-\typewithref{org.eclipse.ltk.core.refactoring}{Change}. (There is also parts of 
-the LTK that is concerned with user interaction, but they will not be discussed 
-here, since they are of little value to us and our use of the framework.)
+The Language Toolkit\footnote{The content of this section is a mixture of 
+  written material from 
+  \url{https://www.eclipse.org/articles/Article-LTK/ltk.html} and 
+  \url{http://www.eclipse.org/articles/article.php?file=Article-Unleashing-the-Power-of-Refactoring/index.html}, 
+the LTK source code and my own memory.}, or LTK for short, is the framework that 
+is used to implement refactorings in Eclipse.  It is language independent and 
+provides the abstractions of a refactoring and the change it generates, in the 
+form of the classes \typewithref{org.eclipse.ltk.core.refactoring}{Refactoring} 
+and \typewithref{org.eclipse.ltk.core.refactoring}{Change}.
+
+There are also parts of the LTK that is concerned with user interaction, but 
+they will not be discussed here, since they are of little value to us and our 
+use of the framework. We are primarily interested in the parts that can be 
+automated.
  
  \subsubsection{The Refactoring Class}
  The abstract class \type{Refactoring} is the core of the LTK framework. Every 
@@ -1176,7 +1267,11 @@ executed, the refactoring has to be a processor-based
  refactoring\typeref{org.eclipse.ltk.core.refactoring.participants.ProcessorBasedRefactoring}.  
  It then delegates to its given 
  \typewithref{org.eclipse.ltk.core.refactoring.participants}{RefactoringProcessor} 
-for condition checking and change creation.
+for condition checking and change creation. Participating in a refactoring can 
+be useful in cases where the changes done to programming source code affects 
+other related resources in the workspace. This can be names or paths in 
+configuration files, or maybe one would like to perform additional logging of 
+changes done in the workspace.
  
  \subsubsection{The Change Class}
  This class is the base class for objects that is responsible for performing the 
@@ -1230,12 +1325,12 @@ of a rule than an exception.
  \subsubsection{Missing Flexibility from JDT Refactorings}
  The JDT refactorings are not made with composition of refactorings in mind. When 
  a JDT refactoring is executed, it assumes that all conditions for it to be 
-applied successfully can be found by reading source files that has been 
+applied successfully can be found by reading source files that have been 
  persisted to disk. They can only operate on the actual source material, and not 
  (in-memory) copies thereof. This constitutes a major disadvantage when trying to 
-compose refactorings, since if an exception occur in the middle of a sequence of 
-refactorings, it can leave the project in a state where the composite 
-refactoring was executed only partly. It makes it hard to discard the changes 
+compose refactorings, since if an exception occurs in the middle of a sequence 
+of refactorings, it can leave the project in a state where the composite 
+refactoring was only partially executed. It makes it hard to discard the changes 
  done without monitoring and consulting the undo manager, an approach that is not 
  bullet proof.
  
@@ -1321,7 +1416,7 @@ parameters regarding setters/getters, as well as delegation.
  To make a working refactoring from the processor, one have to create a 
  \type{MoveRefactoring} with it.
  
-\subsection{The ExtractAndMoveMethodChanger Class}
+\subsection{The ExtractAndMoveMethodChanger}
  
  The \typewithref{no.uio.ifi.refaktor.changers}{ExtractAndMoveMethodChanger} 
  class is a subclass of the class 
@@ -1344,19 +1439,21 @@ The analysis and precondition checking is done by the
  First is check whether the selection is a valid selection or not, with respect 
  to statement boundaries and that it actually contains any selections. Then it 
  checks the legality of both extracting the selection and also moving it to 
-another class. If the selection is approved as legal, it is analyzed to find the 
-presumably best target to move the extracted method to.
+another class. This checking of is performed by a range of checkers 
+\see{checkers}.  If the selection is approved as legal, it is analyzed to find 
+the presumably best target to move the extracted method to.
  
  For finding the best suitable target the analyzer is using a 
  \typewithref{no.uio.ifi.refaktor.analyze.collectors}{PrefixesCollector} that 
  collects all the possible candidates for the refactoring. All the non-candidates 
  is found by an 
  \typewithref{no.uio.ifi.refaktor.analyze.collectors}{UnfixesCollector} that 
-collects all the targets that will give some kind of error if used. All prefixes 
-(and unfixes) are represented by a 
+collects all the targets that will give some kind of error if used.  (For 
+details about the property collectors, se \myref{propertyCollectors}.) All 
+prefixes (and unfixes) are represented by a 
  \typewithref{no.uio.ifi.refaktor.extractors}{Prefix}, and they are collected 
  into sets of prefixes. The safe prefixes is found by subtracting from the set of 
-candidate prefixes the prefixes that is enclosing any of the unfixes. A prefix 
+candidate prefixes the prefixes that is enclosing any of the unfixes.  A prefix 
  is enclosing an unfix if the unfix is in the set of its sub-prefixes.  As an 
  example, \texttt{``a.b''} is enclosing \texttt{``a''}, as is \texttt{``a''}. The 
  safe prefixes is unified in a \type{PrefixSet}. If a prefix has only one 
@@ -1365,6 +1462,10 @@ target. This occurs in statements such as \texttt{``a.foo()''}. For such
  statements it bares no meaning to extract and move them. It only generates an 
  extra method and the calling of it. 
  
+The most suitable target for the refactoring is found by finding the prefix with 
+the most occurrences. If two prefixes have the same occurrence count, but they 
+differ in length, the longest of them is chosen.
+
  \todoin{Clean up sections/subsections.}
  
  \subsubsection{The \type{ExtractAndMoveMethodExecutor}}
@@ -1424,124 +1525,13 @@ old binding from the parameter information object, as well as the same name that
  is provided by the parameter information object.
  
  
-\subsection{Finding the IMethod}\label{postExtractExecution}
-\todoin{Rename section. Write.}
-
-\subsection{Property collectors}
-The prefixes and unfixes are found by property 
-collectors\typeref{no.uio.ifi.refaktor.extractors.collectors.PropertyCollector}.  
-A property collector follows the visitor pattern\citing{designPatterns} and is 
-of the \typewithref{org.eclipse.jdt.core.dom}{ASTVisitor} type.  An 
-\type{ASTVisitor} visits nodes in an abstract syntax tree that forms the Java 
-document object model. The tree consists of nodes of type 
-\typewithref{org.eclipse.jdt.core.do}{ASTNode}.
-
-\subsubsection{The PrefixesCollector}
-The \typewithref{no.uio.ifi.refaktor.extractors.collectors}{PrefixesCollector} 
-finds prefixes that makes up tha basis for calculating move targets for the 
-Extract and Move Method refactoring. It visits expression 
-statements\typeref{org.eclipse.jdt.core.dom.ExpressionStatement} and creates 
-prefixes from its expressions in the case of method invocations. The prefixes 
-found is registered with a prefix set, together with all its sub-prefixes.
-\todo{Rewrite in the case of changes to the way prefixes are found}
-
-\subsubsection{The UnfixesCollector}\label{unfixes}
-The \typewithref{no.uio.ifi.refaktor.extractors.collectors}{UnfixesCollector} 
-finds unfixes within a selection. That is prefixes that cannot be used as a 
-basis for finding a move target in a refactoring.
-
-An unfix can be a name that is assigned to within a selection. The reason that 
-this cannot be allowed, is that the result would be an assignment to the 
-\type{this} keyword, which is not valid in Java \see{eclipse_bug_420726}.
-
-Prefixes that originates from variable declarations within the same selection 
-are also considered unfixes. This is because when a method is moved, it needs to 
-be called through a variable. If this variable is also within the method that is 
-to be moved, this obviously cannot be done.
-
-Also considered as unfixes are variable references that are of types that is not 
-suitable for moving a methods to. This can be either because it is not 
-physically possible to move the method to the desired class or that it will 
-cause compilation errors by doing so.
-
-If the type binding for a name is not resolved it is considered and unfix. The 
-same applies to types that is only found in compiled code, so they have no 
-underlying source that is accessible to us. (E.g. the \type{java.lang.String} 
-class.)
-
-Interfaces types are not suitable as targets. This is simply because interfaces 
-in java cannot contain methods with bodies. (This thesis does not deal with 
-features of Java versions later than Java 7. Java 8 has interfaces with default 
-implementations of methods.) Neither are local types allowed. This accounts for 
-both local and anonymous classes. Anonymous classes are effectively the same as 
-interface types with respect to unfixes. Local classes could in theory be used 
-as targets, but this is not possible due to limitations of the implementation of 
-the Extract and Move Method refactoring. The problem is that the refactoring is 
-done in two steps, so the intermediate state between the two refactorings would 
-not be legal Java code. In the case of local classes, the problem is that, in 
-the intermediate step, a selection referencing a local class would need to take 
-the local class as a parameter if it were to be extracted to a new method. This 
-new method would need to live in the scope of the declaring class of the 
-originating method. The local class would then not be in the scope of the 
-extracted method, thus bringing the source code into an illegal state. One could 
-imagine that the method was extracted and moved in one operation, without an 
-intermediate state. Then it would make sense to include variables with types of 
-local classes in the set of legal targets, since the local classes would then be 
-in the scopes of the method calls. If this makes any difference for software 
-metrics that measure coupling would be a different discussion.
-
-\begin{listing}
-\begin{multicols}{2}
-\begin{minted}[]{java}
-// Before
-void declaresLocalClass() {
-  class LocalClass {
-    void foo() {}
-    void bar() {}
-  }
-
-  LocalClass inst =
-    new LocalClass();
-  inst.foo();
-  inst.bar();
-}
-\end{minted}
-
-\columnbreak
-
-\begin{minted}[]{java}
-// After Extract Method
-void declaresLocalClass() {
-  class LocalClass {
-    void foo() {}
-    void bar() {}
-  }
-
-  LocalClass inst =
-    new LocalClass();
-  fooBar(inst);
-}
+\subsection{The 
+SearchBasedExtractAndMoveMethodChanger}\label{searchBasedExtractAndMoveMethodChanger}
+\todoin{Write\ldots}
  
-// Intermediate step
-void fooBar(LocalClass inst) {
-  inst.foo();
-  inst.bar();
-}
-\end{minted}
-\end{multicols}
-\caption{When Extract and Move Method tries to use a variable with a local type 
-as the move target, an intermediate step is taken that is not allowed. Here: 
-\type{LocalClass} is not in the scope of \method{fooBar} in its intermediate 
-location.}
-\label{lst:extractMethod_LocalClass}
-\end{listing}
+\subsection{Finding the IMethod}\label{postExtractExecution}
+\todoin{Rename section. Write??}
  
-The last class of names that are considered unfixes is names used in null tests.  
-These are tests that reads like this: if \texttt{<name>} equals \var{null} then 
-do something. If allowing variables used in those kinds of expressions as 
-targets for moving methods, we would end up with code containing boolean 
-expressions like \texttt{this == null}, which would not be meaningful, since 
-\var{this} would never be \var{null}.
  
  \subsection{The Prefix Class}
  This class exists mainly for holding data about a prefix, such as the expression 
@@ -1632,7 +1622,7 @@ it is to complex to be easily manipulated.
  
  \chapter{Analyzing Source Code in Eclipse}
  
-\section{The Java model}
+\section{The Java model}\label{javaModel}
  The Java model of Eclipse is its internal representation of a Java project. It 
  is light-weight, and has only limited possibilities for manipulating source 
  code. It is typically used as a basis for the Package Explorer in Eclipse.
@@ -1642,7 +1632,7 @@ means that the underlying element of a handle does not need to actually exist.
  Hence the user of a handle must always check that it exist by calling the 
  \method{exists} method of the handle.
  
-The handles with descriptions is listed in \myref{tab:javaModelTable}.
+The handles with descriptions is listed in \myref{tab:javaModel}.
  
  \begin{table}[h]
    \centering
@@ -1674,7 +1664,7 @@ The handles with descriptions is listed in \myref{tab:javaModelTable}.
    \end{tabularx}
    \caption{The elements of the Java Model. {\footnotesize Taken from 
      \url{http://www.vogella.com/tutorials/EclipseJDT/article.html}}}
-  \label{tab:javaModelTable}
+  \label{tab:javaModel}
  \end{table}
  
  The hierarchy of the Java Model is shown in \myref{fig:javaModel}.
@@ -1774,7 +1764,67 @@ from how source manipulation programs would do it, except for some properties of
  code that is analyzed in the parser, and that they may be differing in what 
  kinds of properties they analyze.  Thus the process of translation source code 
  into a structure that is suitable for analyzing, can be seen as a kind of 
-interrupted compilation process.
+interrupted compilation process \see{fig:interruptedCompilationProcess}.
+
+\begin{figure}[h]
+  \centering
+  \tikzset{
+    base/.style={anchor=north, align=center, rectangle, minimum height=1.4cm},
+    basewithshadow/.style={base, drop shadow, fill=white},
+    outlined/.style={basewithshadow, draw, rounded corners, minimum 
+    width=0.4cm},
+    primary/.style={outlined, font=\bfseries},
+    dashedbox/.style={outlined, dashed},
+    arrowpath/.style={black, align=center, font=\small},
+    processarrow/.style={arrowpath, ->, >=angle 90, shorten >=1pt},
+  }
+  \begin{tikzpicture}[node distance=1.3cm and 3cm, scale=1, every 
+    node/.style={transform shape}]
+    \node[base](AuxNode1){\small source code};
+    \node[primary, right=of AuxNode1, xshift=-2.5cm](Scanner){Scanner};
+    \node[primary, right=of Scanner, xshift=0.5cm](Parser){Parser};
+    \node[dashedbox, below=of Parser](SemanticAnalyzer){Semantic\\Analyzer};
+    \node[dashedbox, left=of SemanticAnalyzer](SourceCodeOptimizer){Source 
+    Code\\Optimizer};
+    \node[dashedbox, below=of SourceCodeOptimizer
+    ](CodeGenerator){Code\\Generator};
+    \node[dashedbox, right=of CodeGenerator](TargetCodeOptimizer){Target 
+    Code\\Optimizer};
+    \node[base, right=of TargetCodeOptimizer](AuxNode2){};
+
+    \draw[processarrow](AuxNode1) -- (Scanner);
+
+    \path[arrowpath] (Scanner) -- node [sloped](tokens){tokens}(Parser);
+    \draw[processarrow](Scanner) -- (tokens) -- (Parser);
+
+    \path[arrowpath] (Parser) -- node (syntax){syntax 
+    tree}(SemanticAnalyzer);
+    \draw[processarrow](Parser) -- (syntax) -- (SemanticAnalyzer);
+
+    \path[arrowpath] (SemanticAnalyzer) -- node 
+    [sloped](annotated){annotated\\tree}(SourceCodeOptimizer);
+    \draw[processarrow, dashed](SemanticAnalyzer) -- (annotated) -- 
+    (SourceCodeOptimizer);
+
+    \path[arrowpath] (SourceCodeOptimizer) -- node 
+    (intermediate){intermediate code}(CodeGenerator);
+    \draw[processarrow, dashed](SourceCodeOptimizer) -- (intermediate) --
+    (CodeGenerator);
+
+    \path[arrowpath] (CodeGenerator) -- node [sloped](target1){target 
+    code}(TargetCodeOptimizer);
+    \draw[processarrow, dashed](CodeGenerator) -- (target1) --
+    (TargetCodeOptimizer);
+
+    \path[arrowpath](TargetCodeOptimizer) -- node [sloped](target2){target 
+    code}(AuxNode2);
+    \draw[processarrow, dashed](TargetCodeOptimizer) -- (target2) (AuxNode2);
+  \end{tikzpicture}
+  \caption{Interrupted compilation process. {\footnotesize (Full compilation 
+    process borrowed from \emph{Compiler construction: principles and practice} 
+    by Kenneth C.  Louden\citing{louden1997}.)}}
+  \label{fig:interruptedCompilationProcess}
+\end{figure}
  
  The process starts with a \emph{scanner}, or lexer. The job of the scanner is to 
  read the source code and divide it into tokens for the parser. Therefore, it is 
@@ -1930,7 +1980,7 @@ There is an overview of some of the structure of an Eclipse AST in
  \end{figure}
  \todoin{Add more to the AST format tree? \myref{fig:astEclipse}}
  
-\section{The ASTVisitor}
+\section{The ASTVisitor}\label{astVisitor}
  So far, the only thing that has been adressed is how the the data that is going 
  to be the basis for our analysis is structured. Another aspect of it is how we 
  are going to traverse the AST to gather the information we need, so we can 
@@ -2050,22 +2100,447 @@ easy way to provide access to the nodes in the tree.
  The version of the visitor pattern implemented for the AST nodes in Eclipse also 
  provides an elegant way to traverse the tree. It does so by following the 
  convention that every node in the tree first let the visitor visit itself, 
-before it also makes all their children accept the visitor. This then makes for 
-an prefix traversal of the AST.
-\todoin{Elaborate\ldots}
+before it also makes all its children accept the visitor. The children are only 
+visited if the visit method of their parent returns \var{true}. This pattern 
+then makes for a prefix traversal of the AST. If postfix traversal is desired, 
+the visitors also has \method{endVisit} methods for each node type, that is 
+called after the \method{visit} method for a node. In addition to these visit 
+methods, there are also the methods \method{preVisit(ASTNode)}, 
+\method{postVisit(ASTNode)} and \method{preVisit2(ASTNode)}. The 
+\method{preVisit} method is called before the type-specific \method{visit} 
+method. The \method{postVisit} method is called after the type-specific 
+\method{endVisit}. The type specific \method{visit} is only called if 
+\method{preVisit2} returns \var{true}. Overriding the \method{preVisit2} is also 
+altering the behavior of \method{preVisit}, since the default implementation is 
+responsible for calling it.
+
+An example of a trivial \type{ASTVisitor} is shown in 
+\myref{lst:astVisitorExample}.
  
-\section{Illegal selections}
+\begin{listing}
+\begin{minted}{java}
+public class CollectNamesVisitor extends ASTVisitor {
+    Collection<Name> names = new LinkedList<Name>();
  
-\subsection{Not all branches end in return}
+    @Override
+    public boolean visit(QualifiedName node) {
+      names.add(node);
+      return false;
+    }
  
-\subsection{Ambiguous return statement}
-This problem occurs when there is either more than one assignment to a local 
-variable that is used outside of the selection, or there is only one, but there 
-are also return statements in the selection.
+    @Override
+    public boolean visit(SimpleName node) {
+        names.add(node);
+        return true;
+    }
+} 
+\end{minted}
+\caption{An \type{ASTVisitor} that visits all the names in a subtree and adds 
+them to a collection, except those names that are children of any 
+\type{QualifiedName}.}
+\label{lst:astVisitorExample}
+\end{listing}
+
+\section{Property collectors}\label{propertyCollectors}
+The prefixes and unfixes are found by property 
+collectors\typeref{no.uio.ifi.refaktor.extractors.collectors.PropertyCollector}.  
+A property collector is of the \type{ASTVisitor} type, and thus visits nodes of 
+type \type{ASTNode} of the abstract syntax tree \see{astVisitor}.
+
+\subsection{The PrefixesCollector}
+The \typewithref{no.uio.ifi.refaktor.extractors.collectors}{PrefixesCollector} 
+finds prefixes that makes up the basis for calculating move targets for the 
+Extract and Move Method refactoring. It visits expression 
+statements\typeref{org.eclipse.jdt.core.dom.ExpressionStatement} and creates 
+prefixes from its expressions in the case of method invocations. The prefixes 
+found is registered with a prefix set, together with all its sub-prefixes.
+
+\subsection{The UnfixesCollector}\label{unfixes}
+The \typewithref{no.uio.ifi.refaktor.extractors.collectors}{UnfixesCollector} 
+finds unfixes within a selection. That is prefixes that cannot be used as a 
+basis for finding a move target in a refactoring.
+
+An unfix can be a name that is assigned to within a selection. The reason that 
+this cannot be allowed, is that the result would be an assignment to the 
+\type{this} keyword, which is not valid in Java \see{eclipse_bug_420726}.
+
+Prefixes that originates from variable declarations within the same selection 
+are also considered unfixes. This is because when a method is moved, it needs to 
+be called through a variable. If this variable is also within the method that is 
+to be moved, this obviously cannot be done.
+
+Also considered as unfixes are variable references that are of types that is not 
+suitable for moving a methods to. This can be either because it is not 
+physically possible to move the method to the desired class or that it will 
+cause compilation errors by doing so.
+
+If the type binding for a name is not resolved it is considered and unfix. The 
+same applies to types that is only found in compiled code, so they have no 
+underlying source that is accessible to us. (E.g. the \type{java.lang.String} 
+class.)
+
+Interfaces types are not suitable as targets. This is simply because interfaces 
+in java cannot contain methods with bodies. (This thesis does not deal with 
+features of Java versions later than Java 7. Java 8 has interfaces with default 
+implementations of methods.) Neither are local types allowed. This accounts for 
+both local and anonymous classes. Anonymous classes are effectively the same as 
+interface types with respect to unfixes. Local classes could in theory be used 
+as targets, but this is not possible due to limitations of the implementation of 
+the Extract and Move Method refactoring. The problem is that the refactoring is 
+done in two steps, so the intermediate state between the two refactorings would 
+not be legal Java code. In the case of local classes, the problem is that, in 
+the intermediate step, a selection referencing a local class would need to take 
+the local class as a parameter if it were to be extracted to a new method. This 
+new method would need to live in the scope of the declaring class of the 
+originating method. The local class would then not be in the scope of the 
+extracted method, thus bringing the source code into an illegal state. One could 
+imagine that the method was extracted and moved in one operation, without an 
+intermediate state. Then it would make sense to include variables with types of 
+local classes in the set of legal targets, since the local classes would then be 
+in the scopes of the method calls. If this makes any difference for software 
+metrics that measure coupling would be a different discussion.
+
+\begin{listing}
+\begin{multicols}{2}
+\begin{minted}[]{java}
+// Before
+void declaresLocalClass() {
+  class LocalClass {
+    void foo() {}
+    void bar() {}
+  }
+
+  LocalClass inst =
+    new LocalClass();
+  inst.foo();
+  inst.bar();
+}
+\end{minted}
+
+\columnbreak
+
+\begin{minted}[]{java}
+// After Extract Method
+void declaresLocalClass() {
+  class LocalClass {
+    void foo() {}
+    void bar() {}
+  }
+
+  LocalClass inst =
+    new LocalClass();
+  fooBar(inst);
+}
+
+// Intermediate step
+void fooBar(LocalClass inst) {
+  inst.foo();
+  inst.bar();
+}
+\end{minted}
+\end{multicols}
+\caption{When Extract and Move Method tries to use a variable with a local type 
+as the move target, an intermediate step is taken that is not allowed. Here: 
+\type{LocalClass} is not in the scope of \method{fooBar} in its intermediate 
+location.}
+\label{lst:extractMethod_LocalClass}
+\end{listing}
+
+The last class of names that are considered unfixes is names used in null tests.  
+These are tests that reads like this: if \texttt{<name>} equals \var{null} then 
+do something. If allowing variables used in those kinds of expressions as 
+targets for moving methods, we would end up with code containing boolean 
+expressions like \texttt{this == null}, which would not be meaningful, since 
+\var{this} would never be \var{null}.
+
+
+\subsection{The ContainsReturnStatementCollector}
+The 
+\typewithref{no.uio.ifi.refaktor.analyze.collectors}{ContainsReturnStatementCollector} 
+is a very simple property collector. It only visits the return statements within 
+a selection, and can report whether it encountered a return statement or not.
+
+\subsection{The LastStatementCollector}
+The \typewithref{no.uio.ifi.refaktor.analyze.collectors}{LastStatementCollector} 
+collects the last statement of a selection. It does so by only visiting the top 
+level statements of the selection, and compares the textual end offset of each 
+encuntered statement with the end offset of the previous statement found.
+
+\section{Checkers}\label{checkers}
+The checkers are a range of classes that checks that selections complies with 
+certian criterias. If a 
+\typewithref{no.uio.ifi.refaktor.analyze.analyzers}{Checker} fails, it throws a 
+\type{CheckerException}. The checkers are managed by the 
+\type{LegalStatementsChecker}, which does not, in fact, implement the 
+\type{Checker} interface. It does, however, run all the checkers registered with 
+it, and reports that all statements are considered legal if no 
+\type{CheckerException} is thrown. Many of the checkers either extends the 
+\type{PropertyCollector} or utilizes one or more property collectors to verify 
+some criterias. The checkers registered with the \type{LegalStatementsChecker} 
+are described next. They are run in the order presented below.
+
+\subsection{The EnclosingInstanceReferenceChecker}
+The purpose of this checker is to verify that the names in a selection is not 
+referencing any enclosing instances. This is for making sure that all references 
+is legal in a method that is to be moved. Theoretically, some situations could 
+be easily solved my passing a reference to the referenced class with the moved 
+method (e.g. when calling public methods), but the dependency on the 
+\type{MoveInstanceMethodProcessor} prevents this.
+
+The 
+\typewithref{no.uio.ifi.refaktor.analyze.analyzers}{EnclosingInstanceReferenceChecker} 
+is a modified version of the 
+\typewithref{org.eclipse.jdt.internal.corext.refactoring.structure.MoveInstanceMethodProcessor}{EnclosingInstanceReferenceFinder} 
+from the \type{MoveInstanceMethodProcessor}. Wherever the 
+\type{EnclosingInstanceReferenceFinder} would create a fatal error status, the 
+checker throws a \type{CheckerException}.
+
+It works by first finding all of the enclosing types of a selection. Thereafter 
+it visits all its simple names to check that they are not references to 
+variables or methods declared in any of the enclosing types. In addition the 
+checker visits \var{this}-expressions to verify that no such expressions is 
+qualified with any name.
+
+\subsection{The ReturnStatementsChecker}\label{returnStatementsChecker}
+\todoin{Write\ldots/change implementation/use control flow graph?}
+
+\subsection{The AmbiguousReturnValueChecker}
+This checker verifies that there are no \emph{ambiguous return statements} in a 
+selection. The problem with ambiguous return statements arise when a selection 
+is chosen to be extracted into a new method, but it needs to return more than 
+one value from that method.  This problem occurs in two situations.  The first 
+situation arise when there is more than one local variable that is both assigned 
+to within a selection and also referenced after the selection. The other 
+situation occur when there is only one such assignment, but there is also one or 
+more return statements in the selection.
+
+First the checker needs to collect some data. Those data are the binding keys 
+for all simple names that are assigned to within the selection, including 
+variable declarations, but excluding fields. The checker also collects whether 
+there exists a return statement in the selection or not. No further checks of 
+return statements are needed, since, at this point, the selection is already 
+checked for illegal return statements \see{returnStatementsChecker}.
+
+After the binding keys of the assignees are collected, the checker searches the 
+part of the enclosing method that is after the selection for references whose 
+binding keys are among the the collected keys. If more than one unique referral 
+is found, or only one referral is found, but the selection also contains a 
+return statement, we have a situation with an ambiguous return value, and an 
+exception is thrown.
+
+%\todoin{Explain why we do not need to consider variables assigned inside 
+%local/anonymous classes. (The referenced variables need to be final and so 
+%on\ldots)}
+
+\subsection{The IllegalStatementsChecker}
+This checker is designed to check for illegal statements.
+
+Any use of the \var{super} keyword is prohibited, since its meaning is altered 
+when moving a method to another class.
+
+For a \emph{break} statement, there is two situations to consider: A break 
+statement with or without a label. If the break statement has a label, it is 
+checked that whole of the labeled statement is inside the selection. Since a 
+label does not have any binding information, we have to search upwards in the 
+AST to find the \type{LabeledStatement} that corresponds to the label from the 
+break statement, and check that it is contained in the selection. If the break 
+statement does not have a label attached to it, it is checked that its innermost 
+enclosing loop or switch statement also is inside the selection.
+
+The situation for a \emph{continue} statement is the same as for a break 
+statement, except that it is not allowed inside switch statements.
+
+Regarding \emph{assignments}, two types of assignments is allowed: Assignment to 
+a non-final variable and assignment to an array access. All other assignments is 
+regarded illegal.
+
+\todoin{Finish\ldots}
+
+
+\chapter{Benchmarking}
+\todoin{Better name than ``benchmarking''?}
+This part of the master project is located in the Eclipse project 
+\code{no.uio.ifi.refaktor.benchmark}. The purpose of it is to run the equivalent 
+of the \type{SearchBasedExtractAndMoveMethodChanger} 
+\see{searchBasedExtractAndMoveMethodChanger} over a larger software project, 
+both to test its roubustness but also its effect on different software metrics.
+
+\section{The benchmark setup}
+The benchmark itself is set up as a \emph{JUnit} test case. This is a convenient 
+setup, and utilizes the \emph{JUnit Plugin Test Launcher}. This provides us a 
+with a fully functional Eclipse workbench. Most importantly, this gives us 
+access to the Java Model of Eclipse \see{javaModel}.
+
+\subsection{The ProjectImporter}
+The Java project that is going to be used as the data for the benchmark, must be 
+imported into the JUnit workspace. This is done by the 
+\typewithref{no.uio.ifi.refaktor.benchmark}{ProjectImporter}. The importer 
+require the absolute path to the project description file. It is named 
+\code{.project} and is located at the root of the project directory.
+
+The project description is loaded to find the name of the project to be 
+imported. The project that shall be the destination for the import is created in 
+the workspace, on the base of the name from the description. Then an import 
+operation is created, based on both the source and destination information. The 
+import operation is run to perform the import.
+
+I have found no simple API call to accomplish what the importer does, which 
+tells me that it may not be too many people performing this particular action. 
+The solution to the problem was found on \emph{Stack 
+Overflow}\footnote{\url{https://stackoverflow.com/questions/12401297}}. It 
+contains enough dirty details to be considered unconvenient to use, if not 
+wrapping it in a class like my \type{ProjectImporter}. One would probably have 
+to delve into the source code for the import wizard to find out how the import 
+operation works, if no one had already done it.
+
+\section{Statistics}
+Statistics for the analysis and changes is captured by the 
+\typewithref{no.uio.ifi.refaktor.aspects}{StatisticsAspect}. This an 
+\emph{aspect} written in \emph{AspectJ}.
+
+\subsection{AspectJ}
+\emph{AspectJ}\footnote{\url{http://eclipse.org/aspectj/}} is an extension to 
+the Java language, and facilitates combining aspect-oriented programming with 
+the object-oriented programming in Java.
+
+Aspect-oriented programming is a programming paradigm that is meant to isolate 
+so-called \emph{cross-cutting concerns} into their own modules. These 
+cross-cutting concerns are functionalities that spans over multiple classes, but 
+may not belong naturally in any of them. It can be functionality that does not 
+concern the business logic of an application, and thus may be a burden when 
+entangled with parts of the source code it does not really belong. Examples 
+include logging, debugging, optimization and security.
+
+Aspects are interacting with other modules by defining advices. The concept of 
+an \emph{advice} is known from both aspect-oriented and functional 
+programming\citing{wikiAdvice2014}. It is a function that modifies another 
+function when the latter is run. An advice in AspectJ is somewhat similar to a 
+method in Java. It is meant to alter the behavior of other methods, and contains 
+a body that is executed when it is applied.
+
+An advice can be applied at a defined \emph{pointcut}. A pointcut picks out one 
+or more \emph{join points}. A join point is a well-defined point in the 
+execution of a program. It can occur when calling a method defined for a 
+particular class, when calling all methods with the same name, 
+accessing/assigning to a particular field of a given class and so on. An advice 
+can be declared to run both before, after returning from a pointcut, when there 
+is thrown an exception in the pointcut or after the pointcut either returns or 
+throws an exception.  In addition to picking out join points, a pointcut can 
+also bind variables from its context, so they can be accessed in the body of an 
+advice. An example of a pointcut and an advice is found in 
+\myref{lst:aspectjExample}.
+
+\begin{listing}[h]
+\begin{minted}{java}
+pointcut methodAnalyze(
+  SearchBasedExtractAndMoveMethodAnalyzer analyzer) :
+    call(* SearchBasedExtractAndMoveMethodAnalyzer.analyze()) 
+      && target(analyzer);
+
+after(SearchBasedExtractAndMoveMethodAnalyzer analyzer) : 
+    methodAnalyze(analyzer) {
+  statistics.methodCount++;
+  debugPrintMethodAnalysisProgress(analyzer.method);
+}
+\end{minted}
+\caption{An example of a pointcut named \method{methodAnalyze}, 
+and an advice defined to be applied after it has occurred.}
+\label{lst:aspectjExample}
+\end{listing}
+
+\subsection{The Statistics class}
+The statistics aspect stores statistical information in an object of type 
+\type{Statistics}. As of now, the aspect needs to be initialized at the point in 
+time where it is desired that it starts its data gathering. At any point in time 
+the statistics aspect can be queried for a snapshot of the current statistics.
+
+The \type{Statistics} class also include functionality for generating a report 
+of its gathered statistics. The report can be given either as a string or it can 
+be written to a file.
+
+\subsection{Advices}
+The statistics aspect contains advices for gathering statistical data from 
+different parts of the benchmarking process. It captures statistics from both 
+the analysis part and the execution part of the composite \ExtractAndMoveMethod 
+refactoring.
+
+For the analysis part, there are advices to count the number of text selections 
+analyzed and the number of methods, types, compilation units and packages 
+analyzed. There are also advices that counts for how many of the methods there 
+is found a selection that is a candidate for the refactoring, and for how many 
+ethods there is not.
+
+There exists advices for counting both the successful and unsuccessful 
+executions of all the refactorings. Both for the \ExtractMethod and \MoveMethod 
+refactorings in isolation, as well as for the combination of them.
+
+\section{Optimizations}
+When looking for optimizations to make for the benchmarking process, I used the 
+\emph{VisualVM}\footnote{\url{http://visualvm.java.net/}} for the Java Virtual 
+Machine to both profile the application and also to make memory dumps of its 
+heap.
+
+\subsection{Caching}
+When profiling the benchmark process before making any optimizations, it early 
+became apparent that the parsing of source code was a place to direct attention 
+towards. This discovery was done when only \emph{analyzing} source code, before 
+trying to do any \emph{manipulation} of it. Caching of the parsed ASTs seemed 
+like the best way to save some time, as expected. With only a simple cache of 
+the most recently used AST, the analysis time was speeded up by a factor of 
+around 
+20.  This number depends a little upon which type of system the analysis was 
+run.
+
+The caching is managed by a cache manager, that now, by default, utilizes the 
+not so well known feature of Java called a \emph{soft reference}. Soft 
+references are best explained in the context of weak references. A \emph{weak 
+reference} is a reference to an object instance that is only guaranteed to 
+persist as long as there is a \emph{strong reference} or a soft reference 
+referring the same object. If no such reference is found, its referred object is 
+garbage collected. A strong reference is basically the same as a regular Java 
+reference. A soft reference has the same guarantees as a week reference when it 
+comes to its relation to strong references, but it is not necessarily garbage 
+collected whenever there exists no strong references to it. A soft reference 
+\emph{may} reside in memory as long as the JVM has enough free memory in the 
+heap. A soft reference will therefore usually perform better than a weak 
+reference when used for simple caching and similar tasks. The way to use a 
+soft/weak reference is to as it for its referent. The return value then has to 
+be tested to check that it is not \var{null}. For the basic usage of soft 
+references, see \myref{lst:softReferenceExample}. For a more thorough 
+explanation of weak references in general, see\citing{weakRef2006}.
+
+\begin{listing}[h]
+\begin{minted}{java}
+// Strong reference
+Object strongRef = new Object();
+
+// Soft reference
+SoftReference<Object> softRef =
+    new SoftReference<Object>(new Object());
+
+// Using the soft reference
+Object obj = softRef.get();
+if (obj != null) {
+    // Use object here
+}
+\end{minted}
+\caption{Showing the basic usage of soft references. Weak references is used the 
+  same way. {\footnotesize (The references are part of the \code{java.lang.ref} 
+package.)}}
+\label{lst:softReferenceExample}
+\end{listing}
  
-\todoin{Explain why we do not need to consider variables assigned inside 
-local/anonymous classes. (The referenced variables need to be final and so 
-on\ldots)}
+The cache based on soft references has no limit for how many ASTs it caches. It 
+is generally not advisable to keep references to ASTs for prolonged periods of
+time, since they are expensive structures to hold on to. For regular plugin
+development, Eclipse recommends not creating more than one AST at a time to 
+limit memory consumption. Since the benchmarking has nothing to do with user 
+experience, and throughput is everything, these advices are intentionally 
+ignored. This means that during the benchmarking process, the target Eclipse 
+application may very well work close to its memory limit for the heap space for 
+long periods during the benchmark.
+
+\subsection{Memento}
  
  \chapter{Eclipse Bugs Found}
  \todoin{Add other things and change headline?}