%________________________________________________________
\section{Introduction}
\label{Note:INTRO}

Based on the official ALICE documents
\cite{Note:RefPPR,Note:RefComputingTDR}, the computing model of the
experiment can be described as follows:

\begin{itemize}
\item Tier 0 provides permanent storage of the raw data, distributes
  them to Tier 1 and performs the calibration and alignment tasks as
  well as the first reconstruction pass. The calibration procedure
  will also be addressed by PROOF clusters such as the CERN Analysis
  Facility (CAF) \cite{Note:RefCAF}.

\item Tier 1s outside CERN collectively provide permanent storage of a
  copy of the raw data. All Tier 1s perform the subsequent
  reconstruction passes and the scheduled analysis tasks.

\item Tier 2s generate and reconstruct the simulated Monte Carlo data
  and perform the chaotic analysis submitted by the physicists.

\end{itemize}

The experience of past experiments shows that the typical data
analysis (chaotic analysis) will consume a large fraction of the total
computing resources. The time needed to analyze and reconstruct events
depends mainly on the analysis and reconstruction algorithms. In
particular, the GRID user data analysis has been developed and tested
with two approaches: the asynchronous (batch) analysis and the
synchronous (interactive) analysis.

In this note we describe the distributed framework and the steps
needed to analyze data. We also provide some practical examples for
the users, based on the new analysis framework that has been adopted
by the collaboration \cite{Note:RefAnalysisFramework}. Before going
into detail on the different analysis tasks, we would like to address
the general steps a user needs to take before submitting an analysis
job:

\begin{itemize}
\item Code validation: In order to validate the code, a user should
  copy a few AliESDs.root files locally and try to analyze them by
  following the instructions listed in section \ref{Note:LOCAL} (a
  minimal sketch of this step is given after this list).

\item Interactive analysis: After the user is satisfied with both the
  sanity of the code and the corresponding results, the next step is
  to increase the statistics by submitting an interactive job that
  analyzes ESDs stored on the GRID. This task is performed in such a
  way as to simulate the behavior of a GRID worker node. If this step
  is successful, there is a high probability that the batch job will
  be executed properly. Detailed instructions on how to perform this
  task are listed in section \ref{Note:INTERACTIVE}.

\item Finally, if the user is satisfied with the results from the
  previous step, a batch job can be launched that will take advantage
  of the whole GRID infrastructure in order to analyze files stored in
  different storage elements. This step is covered in detail in
  section \ref{Note:BATCH}.

\end{itemize}
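
As an illustration of the first (code validation) step, the following
fragment sketches how a few locally copied AliESDs.root files could be
chained and looped over in a ROOT session with the AliRoot libraries
loaded. The file paths are placeholders, and the macro assumes that
the AliESDEvent class and the standard esdTree are available; the
authoritative recipe is the one given in section \ref{Note:LOCAL}.

\begin{verbatim}
// Minimal local validation sketch (placeholder paths, AliRoot assumed).
void localValidation()
{
  // Chain the locally copied ESD files.
  TChain *chain = new TChain("esdTree");
  chain->Add("./run1/AliESDs.root");
  chain->Add("./run2/AliESDs.root");

  // Connect the ESD event object to the branches of the chain.
  AliESDEvent *esd = new AliESDEvent();
  esd->ReadFromTree(chain);

  for (Long64_t iEvent = 0; iEvent < chain->GetEntries(); iEvent++) {
    chain->GetEntry(iEvent);
    // Place the user analysis code here, e.g. a simple track loop.
    printf("Event %lld: %d tracks\n",
           iEvent, esd->GetNumberOfTracks());
  }
}
\end{verbatim}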

It should be pointed out that what we describe in this note involves
the usage of the whole metadata machinery of the ALICE experiment:
that is, both the file/run level metadata
\cite{Note:RefFileCatalogMetadataNote} and the \tag\
\cite{Note:RefEventTagNote}. The latter is used extensively because,
apart from providing an event filtering mechanism to the users and
thus reducing the overall analysis time significantly
\cite{Note:RefEventTagNote}, it also provides a transparent way to
retrieve the desired input data collection in the proper format (= a
chain of ESD files) which can be directly analyzed. On the other hand,
if the \tag\ is not used, then apart from not being able to use the
event filtering, the user also has to create the input data collection
(= a chain of ESD files) manually.
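
To make the difference concrete, the fragment below contrasts the two
ways of obtaining the input chain: building it by hand from explicitly
known file paths, and querying the \tag\ through the AliTagAnalysis
interface described later in this note, which returns the chain of
selected ESD files transparently. The file paths and the tag location
are placeholders, and the run- and event-level selection criteria are
the ones documented in the following sections.

\begin{verbatim}
// (a) Manual creation of the input data collection: every ESD file
//     path has to be known and added by hand.
TChain *chain = new TChain("esdTree");
chain->Add("/data/run1/AliESDs.root");
chain->Add("/data/run2/AliESDs.root");

// (b) Sketch of the tag-based approach: the chain of ESD files that
//     satisfy the selection criteria is returned transparently.
AliTagAnalysis *tagAna = new AliTagAnalysis();
tagAna->ChainLocalTags("/data/Tags");             // placeholder tag location
AliRunTagCuts   *runCuts = new AliRunTagCuts();   // run-level selection
AliEventTagCuts *evCuts  = new AliEventTagCuts(); // event-level selection
TChain *tagChain = tagAna->QueryTags(runCuts, evCuts);
\end{verbatim}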