1 /**************************************************************************
2  * Copyright(c) 1998-1999, ALICE Experiment at CERN, All rights reserved. *
3  *                                                                        *
4  * Author: The ALICE Off-line Project.                                    *
5  * Contributors are mentioned in the code where appropriate.              *
6  *                                                                        *
7  * Permission to use, copy, modify and distribute this software and its   *
8  * documentation strictly for non-commercial purposes is hereby granted   *
9  * without fee, provided that the above copyright notice appears in all   *
10  * copies and that both the copyright notice and this permission notice   *
11  * appear in the supporting documentation. The authors make no claims     *
12  * about the suitability of this software for any purpose. It is          *
13  * provided "as is" without express or implied warranty.                  *
14  **************************************************************************/
15
16 /*
17 $Log$
18 Revision 1.74  2007/12/17 03:23:32  jgrosseo
19 several bugfixes
20 added "empty preprocessor" as placeholder for Acorde in FDR
21
22 Revision 1.73  2007/12/14 19:31:36  acolla
23 Sending email to DCS experts is temporarily commented
24
25 Revision 1.72  2007/12/13 15:44:28  acolla
26 Run type added in mail sent to detector expert (eases understanding)
27
28 Revision 1.71  2007/12/12 14:56:14  jgrosseo
29 sending shuttle_ignore to ML also in case of 0 events
30
31 Revision 1.70  2007/12/12 13:45:35  acolla
32 Monalisa started in Collect() function. Alive message to monitor is sent at each Collect and every minute during preprocessor processing.
33
34 Revision 1.69  2007/12/12 10:06:29  acolla
35 in AliShuttle.cxx: SHUTTLE logbook is updated in case of invalid run times:
36
37 time_start==0 && time_end==0
38
39 logbook is NOT updated if time_start != 0 && time_end == 0, because it may mean that the run is still ongoing.
40
41 Revision 1.68  2007/12/11 10:15:17  acolla
42 Added marking SHUTTLE=DONE for invalid runs
43 (invalid start time or end time) and runs with totalEvents < 1
44
45 Revision 1.67  2007/12/07 19:14:36  acolla
46 in AliShuttleTrigger:
47
48 Added automatic collection of new runs on a regular time basis (settable from the configuration)
49
50 in AliShuttleConfig: new members
51
52 - triggerWait: time to wait for DIM trigger (s) before starting automatic collection of new runs
53 - mode: run mode (test, prod) -> used to build log folder (logs or logs_PROD)
54
55 in AliShuttle:
56
57 - logs now stored in logs/#RUN/DET_#RUN.log
58
59 Revision 1.66  2007/12/05 10:45:19  jgrosseo
60 changed order of arguments to TMonaLisaWriter
61
62 Revision 1.65  2007/11/26 16:58:37  acolla
63 Monalisa configuration added: host and table name
64
65 Revision 1.64  2007/11/13 16:15:47  acolla
66 DCS map is stored in a file in the temp folder where the detector is processed.
67 If the preprocessor fails, the temp folder is not removed. This will help the debugging of the problem.
68
69 Revision 1.63  2007/11/02 10:53:16  acolla
70 Protection added to AliShuttle::CopyFileLocally
71
72 Revision 1.62  2007/10/31 18:23:13  acolla
73 Further development on the Shuttle:
74
75 - Shuttle now connects to the Grid as alidaq. The OCDB and Reference folders
76 are now built from /alice/data, e.g.:
77 /alice/data/2007/LHC07a/OCDB
78
79 the year and LHC period are taken from the Shuttle.
80 Raw metadata files are stored by GRP to:
81 /alice/data/2007/LHC07a/<runNb>/Raw/RunMetadata.root
82
83 - Shuttle sends a mail to DCS experts each time DP retrieval fails.
84
85 Revision 1.61  2007/10/30 20:33:51  acolla
86 Improved managing of temporary folders, which weren't correctly handled.
87 Resolved bug introduced in StoreReferenceFile, which caused SPD preprocessor fail.
88
89 Revision 1.60  2007/10/29 18:06:16  acolla
90
91 New function StoreRunMetadataFile added to preprocessor and Shuttle interface
92 This function can be used by GRP only. It stores raw data tags merged file to the
93 raw data folder (e.g. /alice/data/2008/LHC08a/000099999/Raw).
94
95 KNOWN ISSUES:
96
97 1. Shuttle cannot write to /alice/data/ because it belongs to alidaq. Tag file is stored in /alice/simulation/... for the time being.
98 2. Due to a bug in TAlien::Mkdir, the creation of a folder in recursive mode (-p option) does not work. The problem
99 has been corrected in the root package on the Shuttle machine.
100
101 Revision 1.59  2007/10/05 12:40:55  acolla
102
103 Result error code added to AliDCSClient data members (it was "lost" with the new implementation of TMap* GetAliasValues and GetDPValues).
104
105 Revision 1.58  2007/09/28 15:27:40  acolla
106
107 AliDCSClient "multiSplit" option added in the DCS configuration
108 in AliDCSMessage: variable MAX_BODY_SIZE set to 500000
109
110 Revision 1.57  2007/09/27 16:53:13  acolla
111 Detectors can have more than one AMANDA server. SHUTTLE queries the servers sequentially,
112 merges the dcs aliases/DPs in one TMap and sends it to the preprocessor.
113
114 Revision 1.56  2007/09/14 16:46:14  jgrosseo
115 1) Connect and Close are called before and after each query, so one can
116 keep the same AliDCSClient object.
117 2) The splitting of a query is moved to GetDPValues/GetAliasValues.
118 3) Splitting interval can be specified in constructor
119
120 Revision 1.55  2007/08/06 12:26:40  acolla
121 Function Bool_t GetHLTStatus added to preprocessor. It returns the status of HLT
122 read from the run logbook.
123
124 Revision 1.54  2007/07/12 09:51:25  jgrosseo
125 removed duplicated log message in GetFile
126
127 Revision 1.53  2007/07/12 09:26:28  jgrosseo
128 updating hlt fxs base path
129
130 Revision 1.52  2007/07/12 08:06:45  jgrosseo
131 adding log messages in getfile... functions
132 adding not implemented copy constructor in alishuttleconfigholder
133
134 Revision 1.51  2007/07/03 17:24:52  acolla
135 root moved to v5-16-00. TFileMerger->Cp moved to TFile::Cp.
136
137 Revision 1.50  2007/07/02 17:19:32  acolla
138 preprocessor is run in a temp directory that is removed when process is finished.
139
140 Revision 1.49  2007/06/29 10:45:06  acolla
141 Number of columns in MySql Shuttle logbook increased by one (HLT added)
142
143 Revision 1.48  2007/06/21 13:06:19  acolla
144 GetFileSources returns dummy list with 1 source if system=DCS (better than
145 returning error as it was)
146
147 Revision 1.47  2007/06/19 17:28:56  acolla
148 HLT updated; missing map bug removed.
149
150 Revision 1.46  2007/06/09 13:01:09  jgrosseo
151 Switching to retrieval of several DCS DPs at a time (multiDPrequest)
152
153 Revision 1.45  2007/05/30 06:35:20  jgrosseo
154 Adding functionality to the Shuttle/TestShuttle:
155 o) Function to retrieve list of sources from a given system (GetFileSources with id=0)
156 o) Function to retrieve list of IDs for a given source      (GetFileIDs)
157 These functions are needed for dealing with the tag files that are saved for the GRP preprocessor
158 Example code has been added to the TestProcessor in TestShuttle
159
160 Revision 1.44  2007/05/11 16:09:32  acolla
161 Reference files for ITS, MUON and PHOS are now stored in OfflineDetName/OnlineDetName/run_...
162 example: ITS/SPD/100_filename.root
163
164 Revision 1.43  2007/05/10 09:59:51  acolla
165 Various bug fixes in StoreRefFilesToGrid; Cleaning of reference storage before processing detector (CleanReferenceStorage)
166
167 Revision 1.42  2007/05/03 08:01:39  jgrosseo
168 typo in last commit :-(
169
170 Revision 1.41  2007/05/03 08:00:48  jgrosseo
171 fixing log message when pp want to skip dcs value retrieval
172
173 Revision 1.40  2007/04/27 07:06:48  jgrosseo
174 GetFileSources returns empty list in case of no files, but successful query
175 No mails sent in testmode
176
177 Revision 1.39  2007/04/17 12:43:57  acolla
178 Correction in StoreOCDB; change of text in mail to detector expert
179
180 Revision 1.38  2007/04/12 08:26:18  jgrosseo
181 updated comment
182
183 Revision 1.37  2007/04/10 16:53:14  jgrosseo
184 redirecting sub detector stdout, stderr to sub detector log file
185
186 Revision 1.35  2007/04/04 16:26:38  acolla
187 1. Re-organization of function calls in TestPreprocessor to make it more meaningful.
188 2. Added missing dependency in test preprocessors.
189 3. in AliShuttle.cxx: processing time and memory consumption info on a single line.
190
191 Revision 1.34  2007/04/04 10:33:36  jgrosseo
192 1) Storing of files to the Grid is now done _after_ your preprocessors succeeded. This is transparent, which means that you can still use the same functions (Store, StoreReferenceData) to store files to the Grid. However, the Shuttle first stores them locally and transfers them after the preprocessor finished. The return code of these two functions has changed from UInt_t to Bool_t which gives you the success of the storing.
193 In case of an error with the Grid, the Shuttle will retry the storing later, the preprocessor does not need to be run again.
194
195 2) The meaning of the return code of the preprocessor has changed. 0 is now success and any other value means failure. This value is stored in the log and you can use it to keep details about the error condition.
196
197 3) New function StoreReferenceFile to _directly_ store a file (without opening it) to the reference storage.
198
199 4) The memory usage of the preprocessor is monitored. If it exceeds 2 GB it is terminated.
200
201 5) New function AliPreprocessor::ProcessDCS(). If you do not need to have DCS data in all cases, you can skip the processing by implementing this function and returning kFALSE under certain conditions, e.g. for a certain run type.
202 If you always need DCS data (like before), you do not need to implement it.
203
204 6) The run type has been added to the monitoring page
205
206 Revision 1.33  2007/04/03 13:56:01  acolla
207 Grid Storage at the end of preprocessing. Added virtual method to disable DCS query according to the
208 run type.
209
210 Revision 1.32  2007/02/28 10:41:56  acolla
211 Run type field added in SHUTTLE framework. Run type is read from "run type" logbook and retrieved by
212 AliPreprocessor::GetRunType() function.
213 Added some ldap definition files.
214
215 Revision 1.30  2007/02/13 11:23:21  acolla
216 Moved getters and setters of Shuttle's main OCDB/Reference, local
217 OCDB/Reference, temp and log folders to AliShuttleInterface
218
219 Revision 1.27  2007/01/30 17:52:42  jgrosseo
220 adding monalisa monitoring
221
222 Revision 1.26  2007/01/23 19:20:03  acolla
223 Removed old ldif files, added TOF, MCH ldif files. Added some options in
224 AliShuttleConfig::Print. Added in Ali Shuttle: SetShuttleTempDir and
225 SetShuttleLogDir
226
227 Revision 1.25  2007/01/15 19:13:52  acolla
228 Moved some AliInfo to AliDebug in SendMail function
229
230 Revision 1.21  2006/12/07 08:51:26  jgrosseo
231 update (alberto):
232 table, db names in ldap configuration
233 added GRP preprocessor
234 DCS data can also be retrieved by data point
235
236 Revision 1.20  2006/11/16 16:16:48  jgrosseo
237 introducing strict run ordering flag
238 removed giving preprocessor name to preprocessor, they have to know their name themselves ;-)
239
240 Revision 1.19  2006/11/06 14:23:04  jgrosseo
241 major update (Alberto)
242 o) reading of run parameters from the logbook
243 o) online offline naming conversion
244 o) standalone DCSclient package
245
246 Revision 1.18  2006/10/20 15:22:59  jgrosseo
247 o) Adding time out to the execution of the preprocessors: The Shuttle forks and the parent process monitors the child
248 o) Merging Collect, CollectAll, CollectNew function
249 o) Removing implementation of empty copy constructors (declaration still there!)
250
251 Revision 1.17  2006/10/05 16:20:55  jgrosseo
252 adapting to new CDB classes
253
254 Revision 1.16  2006/10/05 15:46:26  jgrosseo
255 applying to the new interface
256
257 Revision 1.15  2006/10/02 16:38:39  jgrosseo
258 update (alberto):
259 fixed memory leaks
260 storing of objects that failed to be stored to the grid before
261 interfacing of shuttle status table in daq system
262
263 Revision 1.14  2006/08/29 09:16:05  jgrosseo
264 small update
265
266 Revision 1.13  2006/08/15 10:50:00  jgrosseo
267 effc++ corrections (alberto)
268
269 Revision 1.12  2006/08/08 14:19:29  jgrosseo
270 Update to shuttle classes (Alberto)
271
272 - Possibility to set the full object's path in the Preprocessor's and
273 Shuttle's  Store functions
274 - Possibility to extend the object's run validity in the same classes
275 ("startValidity" and "validityInfinite" parameters)
276 - Implementation of the StoreReferenceData function to store reference
277 data in a dedicated CDB storage.
278
279 Revision 1.11  2006/07/21 07:37:20  jgrosseo
280 last run is stored after each run
281
282 Revision 1.10  2006/07/20 09:54:40  jgrosseo
283 introducing status management: The processing per subdetector is divided into several steps,
284 after each step the status is stored on disk. If the system crashes in any of the steps the Shuttle
285 can keep track of the number of failures and skips further processing after a certain threshold is
286 exceeded. These thresholds can be configured in LDAP.
287
288 Revision 1.9  2006/07/19 10:09:55  jgrosseo
289 new configuration, access to DAQ FES (Alberto)
290
291 Revision 1.8  2006/07/11 12:44:36  jgrosseo
292 adding parameters for extended validity range of data produced by preprocessor
293
294 Revision 1.7  2006/07/10 14:37:09  jgrosseo
295 small fix + todo comment
296
297 Revision 1.6  2006/07/10 13:01:41  jgrosseo
298 enhanced storing of last successfully processed run (alberto)
299
300 Revision 1.5  2006/07/04 14:59:57  jgrosseo
301 revision of AliDCSValue: Removed wrapper classes, reduced storage size per value by factor 2
302
303 Revision 1.4  2006/06/12 09:11:16  jgrosseo
304 coding conventions (Alberto)
305
306 Revision 1.3  2006/06/06 14:26:40  jgrosseo
307 o) removed files that were moved to STEER
308 o) shuttle updated to follow the new interface (Alberto)
309
310 Revision 1.2  2006/03/07 07:52:34  hristov
311 New version (B.Yordanov)
312
313 Revision 1.6  2005/11/19 17:19:14  byordano
314 RetrieveDATEEntries and RetrieveConditionsData added
315
316 Revision 1.5  2005/11/19 11:09:27  byordano
317 AliShuttle declaration added
318
319 Revision 1.4  2005/11/17 17:47:34  byordano
320 TList changed to TObjArray
321
322 Revision 1.3  2005/11/17 14:43:23  byordano
323 import to local CVS
324
325 Revision 1.1.1.1  2005/10/28 07:33:58  hristov
326 Initial import as subdirectory in AliRoot
327
328 Revision 1.2  2005/09/13 08:41:15  byordano
329 default startTime endTime added
330
331 Revision 1.4  2005/08/30 09:13:02  byordano
332 some docs added
333
334 Revision 1.3  2005/08/29 21:15:47  byordano
335 some docs added
336
337 */
338
339 //
340 // This class is the main manager of the Shuttle.
341 // It organizes the data retrieval from DCS and calls the
342 // interface methods of AliPreprocessor.
343 // For every detector in the configuration (see AliShuttleConfig),
344 // data for its set of aliases is retrieved. If an AliPreprocessor
345 // is registered for this detector, it is used
346 // according to the schema (see AliPreprocessor).
347 // If no AliPreprocessor is registered, the retrieved
348 // data is stored automatically to the underlying AliCDBStorage.
349 // The alias name is used as detSpec.
350 //
351
352 #include "AliShuttle.h"
353
354 #include "AliCDBManager.h"
355 #include "AliCDBStorage.h"
356 #include "AliCDBId.h"
357 #include "AliCDBRunRange.h"
358 #include "AliCDBPath.h"
359 #include "AliCDBEntry.h"
360 #include "AliShuttleConfig.h"
361 #include "DCSClient/AliDCSClient.h"
362 #include "AliLog.h"
363 #include "AliPreprocessor.h"
364 #include "AliShuttleStatus.h"
365 #include "AliShuttleLogbookEntry.h"
366
367 #include <TSystem.h>
368 #include <TObject.h>
369 #include <TString.h>
370 #include <TTimeStamp.h>
371 #include <TObjString.h>
372 #include <TSQLServer.h>
373 #include <TSQLResult.h>
374 #include <TSQLRow.h>
375 #include <TMutex.h>
376 #include <TSystemDirectory.h>
377 #include <TSystemFile.h>
378 #include <TFile.h>
379 #include <TGrid.h>
380 #include <TGridResult.h>
381
382 #include <TMonaLisaWriter.h>
383
384 #include <fstream>
385
386 #include <sys/types.h>
387 #include <sys/wait.h>
388
389 ClassImp(AliShuttle)
390
391 //______________________________________________________________________________________________
392 AliShuttle::AliShuttle(const AliShuttleConfig* config,
393                 UInt_t timeout, Int_t retries):
394 fConfig(config),
395 fTimeout(timeout), fRetries(retries),
396 fPreprocessorMap(),
397 fLogbookEntry(0),
398 fCurrentDetector(),
399 fStatusEntry(0),
400 fMonitoringMutex(0),
401 fLastActionTime(0),
402 fLastAction(),
403 fMonaLisa(0),
404 fTestMode(kNone),
405 fReadTestMode(kFALSE),
406 fOutputRedirected(kFALSE)
407 {
408         //
409         // config: AliShuttleConfig used
410         // timeout: timeout used for AliDCSClient connection
411         // retries: the number of retries in case of connection error.
412         //
413
414         if (!fConfig->IsValid()) AliFatal("********** !!!!! Invalid configuration !!!!! **********");
415         for(int iSys=0;iSys<4;iSys++) {
416                 fServer[iSys]=0;
417                 if (iSys < 3)
418                         fFXSlist[iSys].SetOwner(kTRUE);
419         }
420         fPreprocessorMap.SetOwner(kTRUE);
421
422         for (UInt_t iDet=0; iDet<NDetectors(); iDet++)
423                 fFirstUnprocessed[iDet] = kFALSE;
424
425         fMonitoringMutex = new TMutex();
426 }
427
428 //______________________________________________________________________________________________
429 AliShuttle::~AliShuttle()
430 {
431         //
432         // destructor
433         //
434
435         fPreprocessorMap.DeleteAll();
436         for(int iSys=0;iSys<4;iSys++)
437                 if(fServer[iSys]) {
438                         fServer[iSys]->Close();
439                         delete fServer[iSys];
440                         fServer[iSys] = 0;
441                 }
442
443         if (fStatusEntry){
444                 delete fStatusEntry;
445                 fStatusEntry = 0;
446         }
447         
448         if (fMonitoringMutex) 
449         {
450                 delete fMonitoringMutex;
451                 fMonitoringMutex = 0;
452         }
453 }
454
455 //______________________________________________________________________________________________
456 void AliShuttle::RegisterPreprocessor(AliPreprocessor* preprocessor)
457 {
458         //
459         // Registers a new AliPreprocessor.
460         // GetName() is used as the identifier of the preprocessor.
461         // The preprocessor is registered only if no other preprocessor
462         // with the same identifier (GetName()) is registered already.
463         //
464
465         const char* detName = preprocessor->GetName();
466         if(GetDetPos(detName) < 0)
467                 AliFatal(Form("********** !!!!! Invalid detector name: %s !!!!! **********", detName));
468
469         if (fPreprocessorMap.GetValue(detName)) {
470                 AliWarning(Form("AliPreprocessor %s is already registered!", detName));
471                 return;
472         }
473
474         fPreprocessorMap.Add(new TObjString(detName), preprocessor);
475 }
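
// Illustration only, not part of AliShuttle: a minimal ROOT-free sketch of the
// registration rule used by RegisterPreprocessor above (known detector name,
// one preprocessor per name). PreprocessorRegistry, Register and Size are
// hypothetical names, and an int stands in for the preprocessor object. Unlike
// the real method, which calls AliFatal on an unknown detector name, this
// sketch simply returns false.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical registry keyed by detector name.
class PreprocessorRegistry {
public:
    explicit PreprocessorRegistry(const std::set<std::string>& knownDetectors)
        : fKnown(knownDetectors) {}

    // Returns true if the preprocessor was registered, false if the detector
    // name is unknown or a preprocessor with that name already exists.
    bool Register(const std::string& detName, int preprocessorId)
    {
        assert(!detName.empty()); // assumption of this sketch: names are non-empty

        if (fKnown.count(detName) == 0) return false;

        // std::map::insert leaves an existing entry untouched and reports
        // through .second whether a new entry was actually added.
        return fMap.insert(std::make_pair(detName, preprocessorId)).second;
    }

    std::size_t Size() const { return fMap.size(); }

private:
    std::set<std::string> fKnown;     // valid detector names
    std::map<std::string, int> fMap;  // detector name -> preprocessor
};
```

// With a registry that knows "TPC", a first Register("TPC", 1) succeeds and a
// second Register("TPC", 2) is rejected, leaving the first entry in place.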
476 //______________________________________________________________________________________________
477 Bool_t AliShuttle::Store(const AliCDBPath& path, TObject* object,
478                 AliCDBMetaData* metaData, Int_t validityStart, Bool_t validityInfinite)
479 {
480         // Stores a CDB object in the storage for offline reconstruction. Objects that are not needed for
481         // offline reconstruction, but should be stored anyway (e.g. for debugging) should NOT be stored
482         // using this function. Use StoreReferenceData instead!
483         // It calls StoreLocally function which temporarily stores the data locally; when the preprocessor
484         // finishes the data are transferred to the main storage (Grid).
485
486         return StoreLocally(fgkLocalCDB, path, object, metaData, validityStart, validityInfinite);
487 }
488
489 //______________________________________________________________________________________________
490 Bool_t AliShuttle::StoreReferenceData(const AliCDBPath& path, TObject* object, AliCDBMetaData* metaData)
491 {
492         // Stores a CDB object in the storage for reference data. These objects will not be available during
493         // offline reconstruction. Use this function for reference data only!
494         // It calls StoreLocally function which temporarily stores the data locally; when the preprocessor
495         // finishes the data are transferred to the main storage (Grid).
496
497         return StoreLocally(fgkLocalRefStorage, path, object, metaData);
498 }
499
500 //______________________________________________________________________________________________
501 Bool_t AliShuttle::StoreLocally(const TString& localUri,
502                         const AliCDBPath& path, TObject* object, AliCDBMetaData* metaData,
503                         Int_t validityStart, Bool_t validityInfinite)
504 {
505         // Stores the object temporarily in local storage. Parameters are passed by the Store and StoreReferenceData functions.
506         // When the preprocessor finishes, the data are transferred to the main storage (Grid).
507         // The parameters are:
508         //   1) Uri of the backup storage (Local)
509         //   2) the object's path.
510         //   3) the object to be stored
511         //   4) the metaData to be associated with the object
512         //   5) the validity start run number w.r.t. the current run,
513         //      if the data is valid only for this run leave the default 0
514         //   6) specifies if the calibration data is valid for infinity (this means until updated),
515         //      typical for calibration runs, the default is kFALSE
516         //
517         // returns kFALSE on failure, kTRUE otherwise
518
519         if (fTestMode & kErrorStorage)
520         {
521                 Log(fCurrentDetector, "StoreLocally - In TESTMODE - Simulating error while storing locally");
522                 return kFALSE;
523         }
524         
525         const char* cdbType = (localUri == fgkLocalCDB) ? "CDB" : "Reference";
526
527         Int_t firstRun = GetCurrentRun() - validityStart;
528         if(firstRun < 0) {
529                 AliWarning("First valid run happens to be less than 0! Setting it to 0.");
530                 firstRun=0;
531         }
532
533         Int_t lastRun = -1;
534         if(validityInfinite) {
535                 lastRun = AliCDBRunRange::Infinity();
536         } else {
537                 lastRun = GetCurrentRun();
538         }
539
540         // Version is set to current run, it will be used later to transfer data to Grid
541         AliCDBId id(path, firstRun, lastRun, GetCurrentRun(), -1);
542
543         if(! dynamic_cast<TObjString*> (metaData->GetProperty("RunUsed(TObjString)"))){
544                 TObjString runUsed = Form("%d", GetCurrentRun());
545                 metaData->SetProperty("RunUsed(TObjString)", runUsed.Clone());
546         }
547
548         Bool_t result = kFALSE;
549
550         if (!(AliCDBManager::Instance()->GetStorage(localUri))) {
551                 Log("SHUTTLE", Form("StoreLocally - Cannot activate local %s storage", cdbType));
552         } else {
553                 result = AliCDBManager::Instance()->GetStorage(localUri)
554                                         ->Put(object, id, metaData);
555         }
556
557         if(!result) {
558
559                 Log(fCurrentDetector, Form("StoreLocally - Can't store object <%s>!", id.ToString().Data()));
560         }
561
562         return result;
563 }
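
// Illustration only, not part of AliShuttle: a minimal ROOT-free sketch of how
// StoreLocally above derives the run validity range of a stored object.
// RunRange, ComputeRunRange and kInfinity are hypothetical names; kInfinity
// merely stands in for AliCDBRunRange::Infinity(), whose actual value is not
// shown in this file.

```cpp
#include <cassert>
#include <limits>

// Hypothetical stand-in for AliCDBRunRange::Infinity(); the real value may differ.
const int kInfinity = std::numeric_limits<int>::max();

// First and last run for which a stored object is valid.
struct RunRange {
    int first;
    int last;
};

// Mirrors the logic in StoreLocally: validity starts validityStart runs before
// the current run (clamped at 0) and ends either at the current run or at
// "infinity", i.e. until the object is updated.
RunRange ComputeRunRange(int currentRun, int validityStart, bool validityInfinite)
{
    assert(currentRun >= 0); // assumption of this sketch: run numbers are non-negative

    int first = currentRun - validityStart;
    if (first < 0) first = 0;

    RunRange range;
    range.first = first;
    range.last = validityInfinite ? kInfinity : currentRun;
    return range;
}
```

// For run 100 with validityStart = 3 and validityInfinite = true, the range
// starts at run 97 and is open-ended; with the defaults (0, false) it covers
// run 100 only.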
564
565 //______________________________________________________________________________________________
566 Bool_t AliShuttle::StoreOCDB()
567 {
568         //
569         // Called when preprocessor ends successfully or when previous storage attempt failed (kStoreError status)
570         // Calls underlying StoreOCDB(const char*) function twice, for OCDB and Reference storage.
571         // Then calls StoreRefFilesToGrid to store reference files. 
572         //
573         
574         if (fTestMode & kErrorGrid)
575         {
576                 Log("SHUTTLE", "StoreOCDB - In TESTMODE - Simulating error while storing in the Grid");
577                 Log(fCurrentDetector, "StoreOCDB - In TESTMODE - Simulating error while storing in the Grid");
578                 return kFALSE;
579         }
580         
581         Log("SHUTTLE","StoreOCDB - Storing OCDB data ...");
582         Bool_t resultCDB = StoreOCDB(fgkMainCDB);
583
584         Log("SHUTTLE","StoreOCDB - Storing reference data ...");
585         Bool_t resultRef = StoreOCDB(fgkMainRefStorage);
586         
587         Log("SHUTTLE","StoreOCDB - Storing reference files ...");
588         Bool_t resultRefFiles = CopyFilesToGrid("reference");
589         
590         Bool_t resultMetadata = kTRUE;
591         if(fCurrentDetector == "GRP") 
592         {
593                 Log("SHUTTLE","StoreOCDB - Storing Run Metadata file ...");
594                 resultMetadata = CopyFilesToGrid("metadata");
595         }
596         
597         return resultCDB && resultRef && resultRefFiles && resultMetadata;
598 }
599
600 //______________________________________________________________________________________________
601 Bool_t AliShuttle::StoreOCDB(const TString& gridURI)
602 {
603         //
604         // Called by StoreOCDB(), performs actual storage to the main OCDB and reference storages (Grid)
605         //
606
607         TObjArray* gridIds=0;
608
609         Bool_t result = kTRUE;
610
611         const char* type = 0;
612         TString localURI;
613         if(gridURI == fgkMainCDB) {
614                 type = "OCDB";
615                 localURI = fgkLocalCDB;
616         } else if(gridURI == fgkMainRefStorage) {
617                 type = "reference";
618                 localURI = fgkLocalRefStorage;
619         } else {
620                 AliError(Form("Invalid storage URI: %s", gridURI.Data()));
621                 return kFALSE;
622         }
623
624         AliCDBManager* man = AliCDBManager::Instance();
625
626         AliCDBStorage *gridSto = man->GetStorage(gridURI);
627         if(!gridSto) {
628                 Log("SHUTTLE",
629                         Form("StoreOCDB - cannot activate main %s storage", type));
630                 return kFALSE;
631         }
632
633         gridIds = gridSto->GetQueryCDBList();
634
635         // get objects previously stored in local CDB
636         AliCDBStorage *localSto = man->GetStorage(localURI);
637         if(!localSto) {
638                 Log("SHUTTLE",
639                         Form("StoreOCDB - cannot activate local %s storage", type));
640                 return kFALSE;
641         }
642         AliCDBPath aPath(GetOfflineDetName(fCurrentDetector.Data()),"*","*");
643         // Local objects were stored with current run as Grid version!
644         TList* localEntries = localSto->GetAll(aPath.GetPath(), GetCurrentRun(), GetCurrentRun());
645         localEntries->SetOwner(1);
646
647         // loop on local stored objects
648         TIter localIter(localEntries);
649         AliCDBEntry *aLocEntry = 0;
650         while((aLocEntry = dynamic_cast<AliCDBEntry*> (localIter.Next()))){
651                 aLocEntry->SetOwner(1);
652                 AliCDBId aLocId = aLocEntry->GetId();
653                 aLocEntry->SetVersion(-1);
654                 aLocEntry->SetSubVersion(-1);
655
656                 // If local object is valid up to infinity we store it only if it is
657                 // the first unprocessed run!
658                 if (aLocId.GetLastRun() == AliCDBRunRange::Infinity() &&
659                         !fFirstUnprocessed[GetDetPos(fCurrentDetector)])
660                 {
661                         Log("SHUTTLE", Form("StoreOCDB - %s: object %s has validity infinite but "
662                                                 "there are previous unprocessed runs!",
663                                                 fCurrentDetector.Data(), aLocId.GetPath().Data()));
664                         result = kFALSE;
665                         continue;
666                 }
667
668                 // loop on Grid valid Id's
669                 Bool_t store = kTRUE;
670                 TIter gridIter(gridIds);
671                 AliCDBId* aGridId = 0;
672                 while((aGridId = dynamic_cast<AliCDBId*> (gridIter.Next()))){
673                         if(aGridId->GetPath() != aLocId.GetPath()) continue;
674                         // skip all objects valid up to infinity
675                         if(aGridId->GetLastRun() == AliCDBRunRange::Infinity()) continue;
676                         // if we get here, it means there's already some more recent object stored on Grid!
677                         store = kFALSE;
678                         break;
679                 }
680
681                 // Put the entry on the Grid only if no conflicting object was found there
682                 Bool_t storeOk = store && gridSto->Put(aLocEntry);
683                 if(!store || storeOk){
684
685                         if (!store)
686                         {
687                                 Log(fCurrentDetector.Data(),
688                                         Form("StoreOCDB - A more recent object already exists in %s storage: <%s>",
689                                                 type, aGridId->ToString().Data()));
690                         } else {
691                                 Log("SHUTTLE",
692                                         Form("StoreOCDB - Object <%s> successfully put into %s storage",
693                                                 aLocId.ToString().Data(), type));
694                                 Log(fCurrentDetector.Data(),
695                                         Form("StoreOCDB - Object <%s> successfully put into %s storage",
696                                                 aLocId.ToString().Data(), type));
697                         }
698
699                         // removing local filename...
700                         TString filename;
701                         localSto->IdToFilename(aLocId, filename);
702                         Log("SHUTTLE", Form("StoreOCDB - Removing local file %s", filename.Data()));
703                         RemoveFile(filename.Data());
704                         continue;
705                 } else  {
706                         Log("SHUTTLE",
707                                 Form("StoreOCDB - Grid %s storage of object <%s> failed",
708                                         type, aLocId.ToString().Data()));
709                         Log(fCurrentDetector.Data(),
710                                 Form("StoreOCDB - Grid %s storage of object <%s> failed",
711                                         type, aLocId.ToString().Data()));
712                         result = kFALSE;
713                 }
714         }
715         delete localEntries; // the list owns its entries, so deleting it frees them as well
716
717         return result;
718 }
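
// Illustration only, not part of AliShuttle: a minimal ROOT-free sketch of the
// two checks StoreOCDB(const TString&) above applies to each local entry
// before it is considered storable on the Grid. ObjectId, StoreDecision,
// DecideStore and kInfinity are hypothetical names; kInfinity stands in for
// AliCDBRunRange::Infinity(), whose actual value is not shown in this file.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical stand-in for AliCDBRunRange::Infinity(); the real value may differ.
const int kInfinity = std::numeric_limits<int>::max();

// Reduced object identifier: CDB path plus run validity range.
struct ObjectId {
    std::string path;
    int firstRun;
    int lastRun;
};

enum StoreDecision {
    kStore,                // entry may be transferred to the Grid
    kSkipAlreadyOnGrid,    // Grid list already holds a finite-validity object for this path
    kDeferUnprocessedRuns  // open-ended entry, but earlier runs are still unprocessed
};

StoreDecision DecideStore(const ObjectId& local,
                          const std::vector<ObjectId>& gridIds,
                          bool firstUnprocessed)
{
    assert(local.firstRun <= local.lastRun); // assumption of this sketch

    // An object valid up to infinity is only accepted for the first unprocessed run.
    if (local.lastRun == kInfinity && !firstUnprocessed)
        return kDeferUnprocessedRuns;

    // Look for an object with the same path in the Grid list,
    // ignoring Grid objects that are themselves valid up to infinity.
    for (std::size_t i = 0; i < gridIds.size(); ++i) {
        if (gridIds[i].path != local.path) continue;
        if (gridIds[i].lastRun == kInfinity) continue;
        return kSkipAlreadyOnGrid;
    }

    return kStore;
}
```

// As in the loop above, a Grid object with a different path or with infinite
// validity never blocks a store; only a same-path object with finite validity
// does.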
719
//______________________________________________________________________________________________
Bool_t AliShuttle::CleanReferenceStorage(const char* detector)
{
	// Deletes the reference files of the current run from the local
	// reference directory of the given subdetector

	AliCDBManager* man = AliCDBManager::Instance();
	AliCDBStorage* sto = man->GetStorage(fgkLocalRefStorage);
	TString localBaseFolder = sto->GetBaseFolder();

	TString targetDir = GetRefFilePrefix(localBaseFolder.Data(), detector);

	Log("SHUTTLE", Form("CleanReferenceStorage - Cleaning %s", targetDir.Data()));

	// reference files of the current run are named <RUN#>_<gridFileName>
	TString begin;
	begin.Form("%d_", GetCurrentRun());

	TSystemDirectory* baseDir = new TSystemDirectory("/", targetDir);
	if (!baseDir)
		return kTRUE;

	TList* dirList = baseDir->GetListOfFiles();
	delete baseDir;

	if (!dirList) return kTRUE;

	// fewer than 3 entries: nothing besides "." and ".." to delete
	if (dirList->GetEntries() < 3)
	{
		delete dirList;
		return kTRUE;
	}

	Int_t nDirs = 0, nDel = 0;
	TIter dirIter(dirList);
	TSystemFile* entry = 0;

	Bool_t success = kTRUE;

	while ((entry = dynamic_cast<TSystemFile*> (dirIter.Next())))
	{
		if (entry->IsDirectory())
			continue;

		TString fileName(entry->GetName());
		if (!fileName.BeginsWith(begin))
			continue;

		nDirs++;

		// delete file; the entry name is relative to targetDir
		TString fullPath;
		fullPath.Form("%s/%s", targetDir.Data(), fileName.Data());
		Int_t result = gSystem->Unlink(fullPath.Data());

		if (result)
		{
			Log("SHUTTLE", Form("CleanReferenceStorage - Could not delete file %s!", fullPath.Data()));
			success = kFALSE;
		} else {
			nDel++;
		}
	}

	if (nDirs > 0)
		Log("SHUTTLE", Form("CleanReferenceStorage - %d (over %d) reference files in folder %s were deleted.",
			nDel, nDirs, targetDir.Data()));

	delete dirList;
	return success;
}

//______________________________________________________________________________________________
Bool_t AliShuttle::StoreReferenceFile(const char* detector, const char* localFile, const char* gridFileName)
{
	//
	// Stores reference file directly (without opening it). This function stores the file locally.
	//
	// The file is stored under the following location:
	// <base folder of local reference storage>/<DET>/<RUN#>_<gridFileName>
	// where <gridFileName> is the third parameter given to the function
	//

	if (fTestMode & kErrorStorage)
	{
		Log(fCurrentDetector, "StoreReferenceFile - In TESTMODE - Simulating error while storing locally");
		return kFALSE;
	}

	AliCDBManager* man = AliCDBManager::Instance();
	AliCDBStorage* sto = man->GetStorage(fgkLocalRefStorage);

	TString localBaseFolder = sto->GetBaseFolder();

	TString target = GetRefFilePrefix(localBaseFolder.Data(), detector);
	target.Append(Form("/%d_%s", GetCurrentRun(), gridFileName));

	return CopyFileLocally(localFile, target);
}

//______________________________________________________________________________________________
Bool_t AliShuttle::StoreRunMetadataFile(const char* localFile, const char* gridFileName)
{
	//
	// Stores the run metadata file locally, mirroring the run folder layout
	// of the Grid. The file is transferred to the Grid later by CopyFilesToGrid.
	//
	// Only GRP can call this function.

	if (fTestMode & kErrorStorage)
	{
		Log(fCurrentDetector, "StoreRunMetaDataFile - In TESTMODE - Simulating error while storing locally");
		return kFALSE;
	}

	AliCDBManager* man = AliCDBManager::Instance();
	AliCDBStorage* sto = man->GetStorage(fgkLocalRefStorage);

	TString localBaseFolder = sto->GetBaseFolder();

	// Build Run level folder
	// folder = /alice/data/year/lhcPeriod/runNb/Raw

	TString lhcPeriod = GetLHCPeriod();
	if (lhcPeriod.Length() == 0)
	{
		Log("SHUTTLE","StoreRunMetaDataFile - LHCPeriod not found in logbook!");
		return kFALSE;
	}

	TString target = Form("%s/GRP/RunMetadata/alice/data/%d/%s/%09d/Raw/%s",
				localBaseFolder.Data(), GetCurrentYear(),
				lhcPeriod.Data(), GetCurrentRun(), gridFileName);

	return CopyFileLocally(localFile, target);
}

//______________________________________________________________________________________________
Bool_t AliShuttle::CopyFileLocally(const char* localFile, const TString& target)
{
	//
	// Stores file locally. Called by StoreReferenceFile and StoreRunMetadataFile
	// Files are temporarily stored in the local reference storage. When the preprocessor
	// finishes, the Shuttle calls CopyFilesToGrid to transfer the files to AliEn
	// (in reference or run level folders)
	//

	TString targetDir(target(0, target.Last('/')));

	// try to open the target directory; create it if it does not exist
	void* dir = gSystem->OpenDirectory(targetDir.Data());
	if (dir == NULL) {
		if (gSystem->mkdir(targetDir.Data(), kTRUE)) {
			Log("SHUTTLE", Form("StoreFileLocally - Can't open directory <%s>", targetDir.Data()));
			return kFALSE;
		}

	} else {
		gSystem->FreeDirectory(dir);
	}

	Int_t result = 0;

	// GetPathInfo returns 0 if the path exists
	result = gSystem->GetPathInfo(localFile, 0, (Long64_t*) 0, 0, 0);
	if (result)
	{
		Log("SHUTTLE", Form("StoreFileLocally - %s does not exist", localFile));
		return kFALSE;
	}

	result = gSystem->GetPathInfo(target, 0, (Long64_t*) 0, 0, 0);
	if (!result)
	{
		Log("SHUTTLE", Form("StoreFileLocally - target file %s already exists, removing...", target.Data()));
		if (gSystem->Unlink(target.Data()))
		{
			Log("SHUTTLE", Form("StoreFileLocally - Could not remove existing target file %s!", target.Data()));
			return kFALSE;
		}
	}

	result = gSystem->CopyFile(localFile, target);

	if (result == 0)
	{
		Log("SHUTTLE", Form("StoreFileLocally - File %s stored locally to %s", localFile, target.Data()));
		return kTRUE;
	}
	else
	{
		Log("SHUTTLE", Form("StoreFileLocally - Could not store file %s to %s! Error code = %d",
				localFile, target.Data(), result));
		return kFALSE;
	}
}

//______________________________________________________________________________________________
Bool_t AliShuttle::CopyFilesToGrid(const char* type)
{
	//
	// Transfers local files to the Grid. Local files can be reference files
	// or run metadata file (from GRP only).
	//
	// According to the type (reference, metadata) the files are stored under the following location:
	// reference --> <base folder of reference storage>/<DET>/<RUN#>_<gridFileName>
	// metadata --> <run data folder>/<MetadataFileName>
	//

	AliCDBManager* man = AliCDBManager::Instance();
	AliCDBStorage* sto = man->GetStorage(fgkLocalRefStorage);
	if (!sto)
		return kFALSE;
	TString localBaseFolder = sto->GetBaseFolder();

	TString dir;
	TString alienDir;
	TString begin;

	if (strcmp(type, "reference") == 0)
	{
		dir = GetRefFilePrefix(localBaseFolder.Data(), fCurrentDetector.Data());
		AliCDBStorage* gridSto = man->GetStorage(fgkMainRefStorage);
		if (!gridSto)
			return kFALSE;
		TString gridBaseFolder = gridSto->GetBaseFolder();
		alienDir = GetRefFilePrefix(gridBaseFolder.Data(), fCurrentDetector.Data());
		begin = Form("%d_", GetCurrentRun());
	}
	else if (strcmp(type, "metadata") == 0)
	{
		TString lhcPeriod = GetLHCPeriod();

		if (lhcPeriod.Length() == 0)
		{
			Log("SHUTTLE","CopyFilesToGrid - LHCPeriod not found in logbook!");
			return kFALSE;
		}

		dir = Form("%s/GRP/RunMetadata/alice/data/%d/%s/%09d/Raw",
				localBaseFolder.Data(), GetCurrentYear(),
				lhcPeriod.Data(), GetCurrentRun());
		// the Grid folder is the local path starting from /alice/data/
		alienDir = dir(dir.Index("/alice/data/"), dir.Length());

		begin = "";
	}
	else
	{
		Log("SHUTTLE", "CopyFilesToGrid - Unexpected: type label must be reference or metadata!");
		return kFALSE;
	}

	TSystemDirectory* baseDir = new TSystemDirectory("/", dir);
	if (!baseDir)
		return kTRUE;

	TList* dirList = baseDir->GetListOfFiles();
	delete baseDir;

	if (!dirList) return kTRUE;

	// fewer than 3 entries: nothing besides "." and ".." to transfer
	if (dirList->GetEntries() < 3)
	{
		delete dirList;
		return kTRUE;
	}

	if (!gGrid)
	{
		Log("SHUTTLE", "CopyFilesToGrid - Connection to Grid failed: Cannot continue!");
		delete dirList;
		return kFALSE;
	}

	Int_t nDirs = 0, nTransfer = 0;
	TIter dirIter(dirList);
	TSystemFile* entry = 0;

	Bool_t success = kTRUE;
	Bool_t first = kTRUE;

	while ((entry = dynamic_cast<TSystemFile*> (dirIter.Next())))
	{
		if (entry->IsDirectory())
			continue;

		TString fileName(entry->GetName());
		if (!fileName.BeginsWith(begin))
			continue;

		nDirs++;

		if (first)
		{
			first = kFALSE;
			// check that folder exists, otherwise create it
			TGridResult* result = gGrid->Ls(alienDir.Data(), "a");

			if (!result)
			{
				delete dirList;
				return kFALSE;
			}

			// TODO: It looks like element 0 is always 0!!
			Bool_t folderExists = (result->GetFileName(1) != 0);
			// the result list returned by Ls is owned by the caller
			delete result;

			if (!folderExists)
			{
				// TODO It does not work currently! Bug in TAliEn::Mkdir
				// TODO Manually fixed in local root v5-16-00
				if (!gGrid->Mkdir(alienDir.Data(),"-p",0))
				{
					Log("SHUTTLE", Form("CopyFilesToGrid - Cannot create directory %s",
							alienDir.Data()));
					delete dirList;
					return kFALSE;
				} else {
					Log("SHUTTLE",Form("CopyFilesToGrid - Folder %s created", alienDir.Data()));
				}

			} else {
				Log("SHUTTLE",Form("CopyFilesToGrid - Folder %s found", alienDir.Data()));
			}
		}

		TString fullLocalPath;
		fullLocalPath.Form("%s/%s", dir.Data(), fileName.Data());

		TString fullGridPath;
		fullGridPath.Form("alien://%s/%s", alienDir.Data(), fileName.Data());

		Bool_t result = TFile::Cp(fullLocalPath, fullGridPath);

		if (result)
		{
			Log("SHUTTLE", Form("CopyFilesToGrid - Copying local file %s to %s succeeded!",
						fullLocalPath.Data(), fullGridPath.Data()));
			RemoveFile(fullLocalPath);
			nTransfer++;
		}
		else
		{
			Log("SHUTTLE", Form("CopyFilesToGrid - Copying local file %s to %s FAILED!",
						fullLocalPath.Data(), fullGridPath.Data()));
			success = kFALSE;
		}
	}

	Log("SHUTTLE", Form("CopyFilesToGrid - %d (over %d) files in folder %s copied to Grid.",
						nTransfer, nDirs, dir.Data()));

	delete dirList;
	return success;
}

//______________________________________________________________________________________________
const char* AliShuttle::GetRefFilePrefix(const char* base, const char* detector)
{
	//
	// Get folder name of reference files
	//
	// The returned pointer refers to a static buffer: it stays valid after the
	// function returns, but is overwritten by the next call.
	//

	TString offDetStr(GetOfflineDetName(detector));
	static TString dir;
	if (offDetStr == "ITS" || offDetStr == "MUON" || offDetStr == "PHOS")
	{
		dir.Form("%s/%s/%s", base, offDetStr.Data(), detector);
	} else {
		dir.Form("%s/%s", base, offDetStr.Data());
	}

	return dir.Data();
}

//______________________________________________________________________________________________
void AliShuttle::CleanLocalStorage(const TString& uri)
{
	//
	// Called in case the preprocessor is declared failed. Remove remaining objects from the local storages.
	//

	const char* type = 0;
	if(uri == fgkLocalCDB) {
		type = "OCDB";
	} else if(uri == fgkLocalRefStorage) {
		type = "Reference";
	} else {
		AliError(Form("Invalid storage URI: %s", uri.Data()));
		return;
	}

	AliCDBManager* man = AliCDBManager::Instance();

	// open local storage
	AliCDBStorage *localSto = man->GetStorage(uri);
	if(!localSto) {
		Log("SHUTTLE",
			Form("CleanLocalStorage - cannot activate local %s storage", type));
		return;
	}

	TString filename(Form("%s/%s/*/Run*_v%d_s*.root",
		localSto->GetBaseFolder().Data(), GetOfflineDetName(fCurrentDetector.Data()), GetCurrentRun()));

	AliDebug(2, Form("filename = %s", filename.Data()));

	Log("SHUTTLE", Form("Removing remaining local files for run %d and detector %s ...",
		GetCurrentRun(), fCurrentDetector.Data()));

	RemoveFile(filename.Data());

}

//______________________________________________________________________________________________
void AliShuttle::RemoveFile(const char* filename)
{
	//
	// removes local file
	//

	TString command(Form("rm -f %s", filename));

	Int_t result = gSystem->Exec(command.Data());
	if(result != 0)
	{
		Log("SHUTTLE", Form("RemoveFile - %s: Cannot remove file %s!",
			fCurrentDetector.Data(), filename));
	}
}

//______________________________________________________________________________________________
AliShuttleStatus* AliShuttle::ReadShuttleStatus()
{
	//
	// Reads the AliShuttleStatus from the CDB
	//

	if (fStatusEntry){
		delete fStatusEntry;
		fStatusEntry = 0;
	}

	fStatusEntry = AliCDBManager::Instance()->GetStorage(GetLocalCDB())
		->Get(Form("/SHUTTLE/STATUS/%s", fCurrentDetector.Data()), GetCurrentRun());

	if (!fStatusEntry) return 0;
	fStatusEntry->SetOwner(1);

	AliShuttleStatus* status = dynamic_cast<AliShuttleStatus*> (fStatusEntry->GetObject());
	if (!status) {
		AliError("Invalid object stored to CDB!");
		return 0;
	}

	return status;
}

//______________________________________________________________________________________________
Bool_t AliShuttle::WriteShuttleStatus(AliShuttleStatus* status)
{
	//
	// writes the status for one subdetector
	//

	if (fStatusEntry){
		delete fStatusEntry;
		fStatusEntry = 0;
	}

	Int_t run = GetCurrentRun();

	AliCDBId id(AliCDBPath("SHUTTLE", "STATUS", fCurrentDetector), run, run);

	fStatusEntry = new AliCDBEntry(status, id, new AliCDBMetaData);
	fStatusEntry->SetOwner(1);

	UInt_t result = AliCDBManager::Instance()->GetStorage(fgkLocalCDB)->Put(fStatusEntry);

	if (!result) {
		Log("SHUTTLE", Form("WriteShuttleStatus - Failed for %s, run %d",
						fCurrentDetector.Data(), run));
		return kFALSE;
	}

	SendMLInfo();

	return kTRUE;
}

//______________________________________________________________________________________________
void AliShuttle::UpdateShuttleStatus(AliShuttleStatus::Status newStatus, Bool_t increaseCount)
{
	//
	// changes the AliShuttleStatus for the given detector and run to the given status
	//

	if (!fStatusEntry){
		AliError("UNEXPECTED: fStatusEntry empty");
		return;
	}

	AliShuttleStatus* status = dynamic_cast<AliShuttleStatus*> (fStatusEntry->GetObject());

	if (!status){
		Log("SHUTTLE", "UpdateShuttleStatus - UNEXPECTED: status could not be read from current CDB entry");
		return;
	}

	TString actionStr = Form("UpdateShuttleStatus - %s: Changing state from %s to %s",
				fCurrentDetector.Data(),
				status->GetStatusName(),
				status->GetStatusName(newStatus));
	Log("SHUTTLE", actionStr);
	SetLastAction(actionStr);

	status->SetStatus(newStatus);
	if (increaseCount) status->IncreaseCount();

	AliCDBManager::Instance()->GetStorage(fgkLocalCDB)->Put(fStatusEntry);

	SendMLInfo();
}

//______________________________________________________________________________________________
void AliShuttle::SendMLInfo()
{
	//
	// sends ML information about the current status of the current detector being processed
	//

	AliShuttleStatus* status = dynamic_cast<AliShuttleStatus*> (fStatusEntry->GetObject());

	if (!status){
		Log("SHUTTLE", "SendMLInfo - UNEXPECTED: status could not be read from current CDB entry");
		return;
	}

	TMonaLisaText  mlStatus(Form("%s_status", fCurrentDetector.Data()), status->GetStatusName());
	TMonaLisaValue mlRetryCount(Form("%s_count", fCurrentDetector.Data()), status->GetCount());

	TList mlList;
	mlList.Add(&mlStatus);
	mlList.Add(&mlRetryCount);

	TString mlID;
	mlID.Form("%d", GetCurrentRun());
	fMonaLisa->SendParameters(&mlList, mlID);
}

//______________________________________________________________________________________________
Bool_t AliShuttle::ContinueProcessing()
{
	// this function reads the AliShuttleStatus information from CDB and
	// checks if the processing should be continued
	// if yes it returns kTRUE and updates the AliShuttleStatus with nextStatus

	if (!fConfig->HostProcessDetector(fCurrentDetector)) return kFALSE;

	AliPreprocessor* aPreprocessor =
		dynamic_cast<AliPreprocessor*> (fPreprocessorMap.GetValue(fCurrentDetector));
	if (!aPreprocessor)
	{
		Log("SHUTTLE", Form("ContinueProcessing - %s: no preprocessor registered", fCurrentDetector.Data()));
		return kFALSE;
	}

	AliShuttleLogbookEntry::Status entryStatus =
		fLogbookEntry->GetDetectorStatus(fCurrentDetector);

	if(entryStatus != AliShuttleLogbookEntry::kUnprocessed) {
		Log("SHUTTLE", Form("ContinueProcessing - %s is %s",
				fCurrentDetector.Data(),
				fLogbookEntry->GetDetectorStatusName(entryStatus)));
		return kFALSE;
	}

	// if we get here, according to Shuttle logbook subdetector is in UNPROCESSED state

	// check if current run is first unprocessed run for current detector
	if (fConfig->StrictRunOrder(fCurrentDetector) &&
		!fFirstUnprocessed[GetDetPos(fCurrentDetector)])
	{
		if (fTestMode == kNone)
		{
			Log("SHUTTLE", Form("ContinueProcessing - %s requires strict run ordering"
					" but this is not the first unprocessed run!",
					fCurrentDetector.Data()));
			return kFALSE;
		}
		else
		{
			Log("SHUTTLE", Form("ContinueProcessing - In TESTMODE - "
					"Although %s requires strict run ordering "
					"and this is not the first unprocessed run, "
					"the SHUTTLE continues",
					fCurrentDetector.Data()));
		}
	}

	AliShuttleStatus* status = ReadShuttleStatus();
	if (!status) {
		// first time
		Log("SHUTTLE", Form("ContinueProcessing - %s: Processing first time",
				fCurrentDetector.Data()));
		status = new AliShuttleStatus(AliShuttleStatus::kStarted);
		return WriteShuttleStatus(status);
	}

	// The following two cases shouldn't happen if Shuttle Logbook was correctly updated.
	// If it happens it may mean Logbook updating failed... let's do it now!
	if (status->GetStatus() == AliShuttleStatus::kDone ||
	    status->GetStatus() == AliShuttleStatus::kFailed){
		Log("SHUTTLE", Form("ContinueProcessing - %s is already %s. Updating Shuttle Logbook",
					fCurrentDetector.Data(),
					status->GetStatusName(status->GetStatus())));
		UpdateShuttleLogbook(fCurrentDetector.Data(),
					status->GetStatusName(status->GetStatus()));
		return kFALSE;
	}

	if (status->GetStatus() == AliShuttleStatus::kStoreError) {
		Log("SHUTTLE",
			Form("ContinueProcessing - %s: Grid storage of one or more "
				"objects failed. Trying again now",
				fCurrentDetector.Data()));
		UpdateShuttleStatus(AliShuttleStatus::kStoreStarted);
		if (StoreOCDB()){
			Log("SHUTTLE", Form("ContinueProcessing - %s: all objects "
				"successfully stored into main storage",
				fCurrentDetector.Data()));
		} else {
			Log("SHUTTLE",
				Form("ContinueProcessing - %s: Grid storage failed again",
					fCurrentDetector.Data()));
			UpdateShuttleStatus(AliShuttleStatus::kStoreError);
		}
		return kFALSE;
	}

	// if we get here, there is a restart
	Bool_t cont = kFALSE;

	// abort conditions
	if (status->GetCount() >= fConfig->GetMaxRetries()) {
		Log("SHUTTLE", Form("ContinueProcessing - %s failed %d times in status %s - "
				"Updating Shuttle Logbook", fCurrentDetector.Data(),
				status->GetCount(), status->GetStatusName()));
		UpdateShuttleLogbook(fCurrentDetector.Data(), "FAILED");
		UpdateShuttleStatus(AliShuttleStatus::kFailed);

		// there may still be objects in local OCDB and reference storage
		// and FXS databases may be not updated: do it now!

		// TODO Currently disabled, we want to keep files in case of failure!
		// CleanLocalStorage(fgkLocalCDB);
		// CleanLocalStorage(fgkLocalRefStorage);
		// UpdateTableFailCase();

		// Send mail to detector expert!
		Log("SHUTTLE", Form("ContinueProcessing - Sending mail to %s expert...",
					fCurrentDetector.Data()));
		if (!SendMail())
			Log("SHUTTLE", Form("ContinueProcessing - Could not send mail to %s expert",
					fCurrentDetector.Data()));

	} else {
		Log("SHUTTLE", Form("ContinueProcessing - %s: restarting. "
				"Aborted before with %s. Retry number %d.", fCurrentDetector.Data(),
				status->GetStatusName(), status->GetCount()));
		// failures in the DCS stage do not count as retries
		Bool_t increaseCount = kTRUE;
		if (status->GetStatus() == AliShuttleStatus::kDCSError ||
			status->GetStatus() == AliShuttleStatus::kDCSStarted)
				increaseCount = kFALSE;

		UpdateShuttleStatus(AliShuttleStatus::kStarted, increaseCount);
		cont = kTRUE;
	}

	return cont;
}

1426 //______________________________________________________________________________________________
1427 Bool_t AliShuttle::Process(AliShuttleLogbookEntry* entry)
1428 {
1429         //
1430         // Makes data retrieval for all detectors in the configuration.
1431         // entry: Shuttle logbook entry, contains run paramenters and status of detectors
1432         // (Unprocessed, Inactive, Failed or Done).
1433         // Returns kFALSE in case of error occured and kTRUE otherwise
1434         //
1435
1436         if (!entry) return kFALSE;
1437
1438         fLogbookEntry = entry;
1439
1440         Log("SHUTTLE", Form("\t\t\t^*^*^*^*^*^*^*^*^*^*^*^* run %d: START ^*^*^*^*^*^*^*^*^*^*^*^*",
1441                                         GetCurrentRun()));
1442
1443         // Send the information to ML
1444         TMonaLisaText  mlStatus("SHUTTLE_status", "Processing");
1445         TMonaLisaText  mlRunType("SHUTTLE_runtype", Form("%s (%s)", entry->GetRunType(), entry->GetRunParameter("log")));
1446
1447         TList mlList;
1448         mlList.Add(&mlStatus);
1449         mlList.Add(&mlRunType);
1450
1451         TString mlID;
1452         mlID.Form("%d", GetCurrentRun());
1453         fMonaLisa->SendParameters(&mlList, mlID);
1454
1455         if (fLogbookEntry->IsDone())
1456         {
1457                 Log("SHUTTLE","Process - Shuttle is already DONE. Updating logbook");
1458                 UpdateShuttleLogbook("shuttle_done");
1459                 fLogbookEntry = 0;
1460                 return kTRUE;
1461         }
1462
1463         // read test mode if flag is set
1464         if (fReadTestMode)
1465         {
1466                 fTestMode = kNone;
1467                 TString logEntry(entry->GetRunParameter("log"));
1468                 //printf("log entry = %s\n", logEntry.Data());
1469                 TString searchStr("Testmode: ");
1470                 Int_t pos = logEntry.Index(searchStr.Data());
1471                 //printf("%d\n", pos);
1472                 if (pos >= 0)
1473                 {
1474                         TSubString subStr = logEntry(pos + searchStr.Length(), logEntry.Length());
1475                         //printf("%s\n", subStr.String().Data());
1476                         TString newStr(subStr.Data());
1477                         TObjArray* token = newStr.Tokenize(' ');
1478                         if (token)
1479                         {
1480                                 //token->Print();
1481                                 TObjString* tmpStr = dynamic_cast<TObjString*> (token->First());
1482                                 if (tmpStr)
1483                                 {
1484                                         Int_t testMode = tmpStr->String().Atoi();
1485                                         if (testMode > 0)
1486                                         {
1487                                                 Log("SHUTTLE", Form("Process - Enabling test mode %d", testMode));
1488                                                 SetTestMode((TestMode) testMode);
1489                                         }
1490                                 }
1491                                 delete token;          
1492                         }
1493                 }
1494         }
1495                 
1496         fLogbookEntry->Print("all");
1497
1498         // Initialization
1499         Bool_t hasError = kFALSE;
1500
1501         // Set the CDB and Reference folders according to the year and LHC period
1502         TString lhcPeriod(GetLHCPeriod());
1503         if (lhcPeriod.Length() == 0) 
1504         {
1505                 Log("SHUTTLE","Process - LHCPeriod not found in logbook!");
1506                 return kFALSE; 
1507         }       
1508         
1509         if (fgkMainCDB.Length() == 0)
1510                 fgkMainCDB = Form("alien://folder=/alice/data/%d/%s/OCDB?user=alidaq?cacheFold=/tmp/OCDBCache", 
1511                                         GetCurrentYear(), lhcPeriod.Data());
1512         
1513         if (fgkMainRefStorage.Length() == 0)
1514                 fgkMainRefStorage = Form("alien://folder=/alice/data/%d/%s/Reference?user=alidaq?cacheFold=/tmp/OCDBCache", 
1515                                         GetCurrentYear(), lhcPeriod.Data());
1516         
1517         // Loop on detectors in the configuration
1518         TIter iter(fConfig->GetDetectors());
1519         TObjString* aDetector = 0;
1520
1521         Bool_t first = kTRUE;
1522
1523         while ((aDetector = (TObjString*) iter.Next()))
1524         {
1525                 fCurrentDetector = aDetector->String();
1526
1527                 if (ContinueProcessing() == kFALSE) continue;
1528                 
1529                 if (first)
1530                 {
1531                   // call QueryCDB only when needed and only once
1532                   AliCDBStorage *mainCDBSto = AliCDBManager::Instance()->GetStorage(fgkMainCDB);
1533                   if(mainCDBSto) mainCDBSto->QueryCDB(GetCurrentRun());
1534                   AliCDBStorage *mainRefSto = AliCDBManager::Instance()->GetStorage(fgkMainRefStorage);
1535                   if(mainRefSto) mainRefSto->QueryCDB(GetCurrentRun());
1536                   first = kFALSE;
1537                 }
1538
1539                 Log("SHUTTLE", Form("\t\t\t****** run %d - %s: START  ******",
1540                                                 GetCurrentRun(), aDetector->GetName()));
1541
1542                 for(Int_t iSys=0;iSys<3;iSys++) fFXSCalled[iSys]=kFALSE;
1543
1544                 Log(fCurrentDetector.Data(), "Process - Starting processing");
1545
1546                 Int_t pid = fork();
1547
1548                 if (pid < 0)
1549                 {
1550                         Log("SHUTTLE", "Process - ERROR: Forking failed");
1551                 }
1552                 else if (pid > 0)
1553                 {
1554                         // parent
1555                         Log("SHUTTLE", Form("Process - In parent process of %d - %s: Starting monitoring",
1556                                                         GetCurrentRun(), aDetector->GetName()));
1557
1558                         Long_t begin = time(0);
1559
1560                         int status; // to be used with waitpid, on purpose an int (not Int_t)!
1561                         while (waitpid(pid, &status, WNOHANG) == 0)
1562                         {
1563                                 Long_t expiredTime = time(0) - begin;
1564
1565                                 if (expiredTime > fConfig->GetPPTimeOut())
1566                                 {
1567                                         TString tmp;
1568                                         tmp.Form("Process - Process of %s timed out. "
1569                                                         "Run time: %ld seconds. Killing...",
1570                                                         fCurrentDetector.Data(), expiredTime);
1571                                         Log("SHUTTLE", tmp);
1572                                         Log(fCurrentDetector, tmp);
1573
1574                                         kill(pid, 9);
1575
1576                                         UpdateShuttleStatus(AliShuttleStatus::kPPTimeOut);
1577                                         hasError = kTRUE;
1578
1579                                         gSystem->Sleep(1000);
1580                                 }
1581                                 else
1582                                 {
1583                                         gSystem->Sleep(1000);
1584                                         
1585                                         TString checkStr;
1586                                         checkStr.Form("ps -o vsize --pid %d | tail -n 1", pid);
1587                                         FILE* pipe = gSystem->OpenPipe(checkStr, "r");
1588                                         if (!pipe)
1589                                         {
1590                                                 Log("SHUTTLE", Form("Process - Error: "
1591                                                         "Could not open pipe to %s", checkStr.Data()));
1592                                                 continue;
1593                                         }
1594                                                 
1595                                         char buffer[100];
1596                                         if (!fgets(buffer, 100, pipe))
1597                                         {
1598                                                 Log("SHUTTLE", "Process - Error: ps did not return anything");
1599                                                 gSystem->ClosePipe(pipe);
1600                                                 continue;
1601                                         }
1602                                         gSystem->ClosePipe(pipe);
1603                                         
1604                                         //Log("SHUTTLE", Form("ps returned %s", buffer));
1605                                         
1606                                         Int_t mem = 0;
1607                                         if ((sscanf(buffer, "%d\n", &mem) != 1) || !mem)
1608                                         {
1609                                                 Log("SHUTTLE", "Process - Error: Could not parse output of ps");
1610                                                 continue;
1611                                         }
1612                                         
1613                                         if (expiredTime % 60 == 0)
1614                                         {
1615                                                 Log("SHUTTLE", Form("Process - %s: Checking process. "
1616                                                         "Run time: %ld seconds - Memory consumption: %d KB",
1617                                                         fCurrentDetector.Data(), expiredTime, mem));
1618                                                 SendAlive();
1619                                         }
1620                                         
1621                                         if (mem > fConfig->GetPPMaxMem())
1622                                         {
1623                                                 TString tmp;
1624                                                 tmp.Form("Process - Process exceeds maximum allowed memory "
1625                                                         "(%d KB > %d KB). Killing...",
1626                                                         mem, fConfig->GetPPMaxMem());
1627                                                 Log("SHUTTLE", tmp);
1628                                                 Log(fCurrentDetector, tmp);
1629         
1630                                                 kill(pid, 9);
1631         
1632                                                 UpdateShuttleStatus(AliShuttleStatus::kPPOutOfMemory);
1633                                                 hasError = kTRUE;
1634         
1635                                                 gSystem->Sleep(1000);
1636                                         }
1637                                 }
1638                         }
1639
1640                         Log("SHUTTLE", Form("Process - In parent process of %d - %s: Client has terminated.",
1641                                                                 GetCurrentRun(), aDetector->GetName()));
1642
1643                         if (WIFEXITED(status))
1644                         {
1645                                 Int_t returnCode = WEXITSTATUS(status);
1646
1647                                 Log("SHUTTLE", Form("Process - %s: the return code is %d", fCurrentDetector.Data(),
1648                                                                                 returnCode));
1649
1650                                 if (returnCode == 0) hasError = kTRUE; // the client exits with its Bool_t success flag, so 0 means failure
1651                         }
1652                 }
1653                 else if (pid == 0)
1654                 {
1655                         // client
1656                         Log("SHUTTLE", Form("Process - In client process of %d - %s", GetCurrentRun(),
1657                                 aDetector->GetName()));
1658
1659                         Log("SHUTTLE", Form("Process - Redirecting output to %s log",fCurrentDetector.Data()));
1660
1661                         if ((freopen(GetLogFileName(fCurrentDetector), "a", stdout)) == 0)
1662                         {
1663                                 Log("SHUTTLE", "Process - Could not freopen stdout");
1664                         }
1665                         else
1666                         {
1667                                 fOutputRedirected = kTRUE;
1668                                 if ((dup2(fileno(stdout), fileno(stderr))) < 0)
1669                                         Log("SHUTTLE", "Process - Could not redirect stderr");
1670                                 
1671                         }
1672                         
1673                         TString wd = gSystem->WorkingDirectory();
1674                         TString tmpDir = Form("%s/%s_%d_process", GetShuttleTempDir(), 
1675                                 fCurrentDetector.Data(), GetCurrentRun());
1676                         
1677                         Int_t result = gSystem->GetPathInfo(tmpDir.Data(), 0, (Long64_t*) 0, 0, 0);
1678                         if (!result) // temp dir already exists!
1679                         {
1680                                 Log(fCurrentDetector.Data(), 
1681                                         Form("Process - %s dir already exists! Removing...", tmpDir.Data()));
1682                                 gSystem->Exec(Form("rm -rf %s",tmpDir.Data()));         
1683                         } 
1684                         
1685                         if (gSystem->mkdir(tmpDir.Data(), 1))
1686                         {
1687                                 Log(fCurrentDetector.Data(), "Process - could not make temp directory!!");
1688                                 gSystem->Exit(1);
1689                         }
1690                         
1691                         if (!gSystem->ChangeDirectory(tmpDir.Data())) 
1692                         {
1693                                 Log(fCurrentDetector.Data(), "Process - could not change directory!!");
1694                                 gSystem->Exit(1);                       
1695                         }
1696                         
1697                         Bool_t success = ProcessCurrentDetector();
1698                         
1699                         gSystem->ChangeDirectory(wd.Data());
1700                                                 
1701                         if (success) // Preprocessor finished successfully!
1702                         { 
1703                                 // remove temporary folder
1704                                 // temporarily commented out (JF)
1705                                 //gSystem->Exec(Form("rm -rf %s",tmpDir.Data()));
1706                                 
1707                                 // Update time_processed field in FXS DB
1708                                 if (UpdateTable() == kFALSE)
1709                                         Log("SHUTTLE", Form("Process - %s: Could not update FXS databases!", 
1710                                                         fCurrentDetector.Data()));
1711
1712                                 // Transfer the data from local storage to main storage (Grid)
1713                                 UpdateShuttleStatus(AliShuttleStatus::kStoreStarted);
1714                                 if (StoreOCDB() == kFALSE)
1715                                 {
1716                                         Log("SHUTTLE", 
1717                                                 Form("\t\t\t****** run %d - %s: STORAGE ERROR ******",
1718                                                         GetCurrentRun(), aDetector->GetName()));
1719                                         UpdateShuttleStatus(AliShuttleStatus::kStoreError);
1720                                         success = kFALSE;
1721                                 } else {
1722                                         Log("SHUTTLE", 
1723                                                 Form("\t\t\t****** run %d - %s: DONE ******",
1724                                                         GetCurrentRun(), aDetector->GetName()));
1725                                         UpdateShuttleStatus(AliShuttleStatus::kDone);
1726                                         UpdateShuttleLogbook(fCurrentDetector, "DONE");
1727                                 }
1728                         } else 
1729                         {
1730                                 Log("SHUTTLE", 
1731                                         Form("\t\t\t****** run %d - %s: PP ERROR ******",
1732                                                 GetCurrentRun(), aDetector->GetName()));
1733                         }
1734
1735                         for (UInt_t iSys=0; iSys<3; iSys++)
1736                         {
1737                                 if (fFXSCalled[iSys]) fFXSlist[iSys].Clear();
1738                         }
1739
1740                         Log("SHUTTLE", Form("Process - Client process of %d - %s is exiting now with %d.",
1741                                                         GetCurrentRun(), aDetector->GetName(), success));
1742
1743                         // the client exits here
1744                         gSystem->Exit(success);
1745
1746                         AliError("We should never get here!!!");
1747                 }
1748         }
1749
1750         Log("SHUTTLE", Form("\t\t\t^*^*^*^*^*^*^*^*^*^*^*^* run %d: FINISH ^*^*^*^*^*^*^*^*^*^*^*^*",
1751                                                         GetCurrentRun()));
1752
1753         //check if shuttle is done for this run, if so update logbook
1754         TObjArray checkEntryArray;
1755         checkEntryArray.SetOwner(1);
1756         TString whereClause = Form("where run=%d", GetCurrentRun());
1757         if (!QueryShuttleLogbook(whereClause.Data(), checkEntryArray) || 
1758                         checkEntryArray.GetEntries() == 0) {
1759                 Log("SHUTTLE", Form("Process - Warning: Cannot check status of run %d on Shuttle logbook!",
1760                                                 GetCurrentRun()));
1761                 return hasError == kFALSE;
1762         }
1763
1764         AliShuttleLogbookEntry* checkEntry = dynamic_cast<AliShuttleLogbookEntry*>
1765                                                 (checkEntryArray.At(0));
1766
1767         if (checkEntry)
1768         {
1769                 if (checkEntry->IsDone())
1770                 {
1771                         Log("SHUTTLE","Process - Shuttle is DONE. Updating logbook");
1772                         UpdateShuttleLogbook("shuttle_done");
1773                 }
1774                 else
1775                 {
1776                         for (UInt_t iDet=0; iDet<NDetectors(); iDet++)
1777                         {
1778                                 if (checkEntry->GetDetectorStatus(iDet) == AliShuttleLogbookEntry::kUnprocessed)
1779                                 {
1780                                         AliDebug(2, Form("Run %d: setting %s as \"not first time unprocessed\"",
1781                                                         checkEntry->GetRun(), GetDetName(iDet)));
1782                                         fFirstUnprocessed[iDet] = kFALSE;
1783                                 }
1784                         }
1785                 }
1786         }
1787
1788         fLogbookEntry = 0;
1789
1790         return hasError == kFALSE;
1791 }
1792
1793 //______________________________________________________________________________________________
1794 Bool_t AliShuttle::ProcessCurrentDetector()
1795 {
1796         //
1797         // Retrieves data for a single detector (fCurrentDetector).
1798         // There should be a configuration for this detector.
1799
1800         Log("SHUTTLE", Form("ProcessCurrentDetector - Retrieving values for %s, run %d", 
1801                                                 fCurrentDetector.Data(), GetCurrentRun()));
1802
1803         TString wd = gSystem->WorkingDirectory();
1804         
1805         if (!CleanReferenceStorage(fCurrentDetector.Data()))
1806                 return kFALSE;
1807         
1808         gSystem->ChangeDirectory(wd.Data());
1809         
1810         TMap* dcsMap = new TMap();
1811
1812         // call preprocessor
1813         AliPreprocessor* aPreprocessor =
1814                 dynamic_cast<AliPreprocessor*> (fPreprocessorMap.GetValue(fCurrentDetector));
1815
1816         aPreprocessor->Initialize(GetCurrentRun(), GetCurrentStartTime(), GetCurrentEndTime());
1817
1818         Bool_t processDCS = aPreprocessor->ProcessDCS();
1819
1820         if (!processDCS)
1821         {
1822                 Log(fCurrentDetector, "ProcessCurrentDetector -"
1823                         " The preprocessor requested to skip the retrieval of DCS values");
1824         }
1825         else if (fTestMode & kSkipDCS)
1826         {
1827                 Log(fCurrentDetector, "ProcessCurrentDetector - In TESTMODE: Skipping DCS processing");
1828         } 
1829         else if (fTestMode & kErrorDCS)
1830         {
1831                 Log(fCurrentDetector, "ProcessCurrentDetector - In TESTMODE: Simulating DCS error");
1832                 UpdateShuttleStatus(AliShuttleStatus::kDCSStarted);
1833                 UpdateShuttleStatus(AliShuttleStatus::kDCSError);
1834                 delete dcsMap;
1835                 return kFALSE;
1836         } else {
1837
1838                 UpdateShuttleStatus(AliShuttleStatus::kDCSStarted);
1839
1840                 // Query DCS archive
1841                 Int_t nServers = fConfig->GetNServers(fCurrentDetector);
1842                 
1843                 for (int iServ=0; iServ<nServers; iServ++)
1844                 {
1845                 
1846                         TString host(fConfig->GetDCSHost(fCurrentDetector, iServ));
1847                         Int_t port = fConfig->GetDCSPort(fCurrentDetector, iServ);
1848                         Int_t multiSplit = fConfig->GetMultiSplit(fCurrentDetector, iServ);
1849
1850                         Log(fCurrentDetector, Form("ProcessCurrentDetector -"
1851                                         " Querying DCS Amanda server %s:%d (%d of %d)", 
1852                                         host.Data(), port, iServ+1, nServers));
1853                         
1854                         TMap* aliasMap = 0;
1855                         TMap* dpMap = 0;
1856         
1857                         if (fConfig->GetDCSAliases(fCurrentDetector, iServ)->GetEntries() > 0)
1858                         {
1859                                 aliasMap = GetValueSet(host, port, 
1860                                                 fConfig->GetDCSAliases(fCurrentDetector, iServ), 
1861                                                 kAlias, multiSplit);
1862                                 if (!aliasMap)
1863                                 {
1864                                         Log(fCurrentDetector, 
1865                                                 Form("ProcessCurrentDetector -"
1866                                                         " Error retrieving DCS aliases from server %s."
1867                                                         " Sending mail to DCS experts!", host.Data()));
1868                                         UpdateShuttleStatus(AliShuttleStatus::kDCSError);
1869                                         
1870                                         //if (!SendMailToDCS())
1871                                         //      Log("SHUTTLE", Form("ProcessCurrentDetector - Could not send mail to DCS experts!"));
1872
1873                                         delete dcsMap;
1874                                         return kFALSE;
1875                                 }
1876                         }
1877                         
1878                         if (fConfig->GetDCSDataPoints(fCurrentDetector, iServ)->GetEntries() > 0)
1879                         {
1880                                 dpMap = GetValueSet(host, port, 
1881                                                 fConfig->GetDCSDataPoints(fCurrentDetector, iServ), 
1882                                                 kDP, multiSplit);
1883                                 if (!dpMap)
1884                                 {
1885                                         Log(fCurrentDetector, 
1886                                                 Form("ProcessCurrentDetector -"
1887                                                         " Error retrieving DCS data points from server %s."
1888                                                         " Sending mail to DCS experts!", host.Data()));
1889                                         UpdateShuttleStatus(AliShuttleStatus::kDCSError);
1890                                         
1891                                         //if (!SendMailToDCS())
1892                                         //      Log("SHUTTLE", Form("ProcessCurrentDetector - Could not send mail to DCS experts!"));
1893                                         
1894                                         if (aliasMap) delete aliasMap;
1895                                         delete dcsMap;
1896                                         return kFALSE;
1897                                 }                               
1898                         }
1899                         
1900                         // merge aliasMap and dpMap into dcsMap
1901                         if(aliasMap) {
1902                                 TIter iter(aliasMap);
1903                                 TObjString* key = 0;
1904                                 while ((key = (TObjString*) iter.Next()))
1905                                         dcsMap->Add(key, aliasMap->GetValue(key->String()));
1906                                 
1907                                 aliasMap->SetOwner(kFALSE);
1908                                 delete aliasMap;
1909                         }       
1910                         
1911                         if(dpMap) {
1912                                 TIter iter(dpMap);
1913                                 TObjString* key = 0;
1914                                 while ((key = (TObjString*) iter.Next()))
1915                                         dcsMap->Add(key, dpMap->GetValue(key->String()));
1916                                 
1917                                 dpMap->SetOwner(kFALSE);
1918                                 delete dpMap;
1919                         }
1920                 }
1921         }
1922         
1923         // save map into file, to help debugging in case of preprocessor error
1924         TFile* f = TFile::Open("DCSMap.root","recreate");
1925         f->cd();
1926         dcsMap->Write("DCSMap", TObject::kSingleKey);
1927         f->Close();
1928         delete f;
1929         
1930         // DCS Archive DB processing successful. Call Preprocessor!
1931         UpdateShuttleStatus(AliShuttleStatus::kPPStarted);
1932
1933         UInt_t returnValue = aPreprocessor->Process(dcsMap);
1934
1935         if (returnValue > 0) // Preprocessor error!
1936         {
1937                 Log(fCurrentDetector, Form("ProcessCurrentDetector - "
1938                                 "Preprocessor failed. Process returned %d.", returnValue));
1939                 UpdateShuttleStatus(AliShuttleStatus::kPPError);
1940                 dcsMap->DeleteAll();
1941                 delete dcsMap;
1942                 return kFALSE;
1943         }
1944         
1945         // preprocessor ok!
1946         UpdateShuttleStatus(AliShuttleStatus::kPPDone);
1947         Log(fCurrentDetector, Form("ProcessCurrentDetector - %s preprocessor returned success",
1948                                 fCurrentDetector.Data()));
1949
1950         dcsMap->DeleteAll();
1951         delete dcsMap;
1952
1953         return kTRUE;
1954 }
1955
1956 //______________________________________________________________________________________________
1957 void AliShuttle::CountOpenRuns()
1958 {
1959         // Queries DAQ's Shuttle logbook and sends the number of open runs to ML
1960         
1961         // check the connection; connect if necessary
1962         if (!Connect(3)) 
1963                 return;
1964
1965         TString sqlQuery;
1966         sqlQuery = Form("select count(*) from %s where shuttle_done=0", fConfig->GetShuttlelbTable());
1967         
1968         TSQLResult* aResult = fServer[3]->Query(sqlQuery);
1969         if (!aResult) {
1970                 AliError(Form("Can't execute query <%s>!", sqlQuery.Data()));
1971                 return;
1972         }
1973
1974         AliDebug(2,Form("Query = %s", sqlQuery.Data()));
1975         
1976         if (aResult->GetRowCount() == 0) {
1977                 AliError(Form("No result for query %s received", sqlQuery.Data()));
1978                 delete aResult; return; // free the result before the early return
1979         }
1980
1981         if (aResult->GetFieldCount() != 1) {
1982                 AliError(Form("Invalid field count for query %s received", sqlQuery.Data()));
1983                 delete aResult; return;
1984         }
1985
1986         TSQLRow* aRow = aResult->Next();
1987         if (!aRow) {
1988                 AliError(Form("Could not receive result of query %s", sqlQuery.Data()));
1989                 delete aResult; return;
1990         }
1991         
1992         TString result(aRow->GetField(0), aRow->GetFieldLength(0));
1993         Int_t count = result.Atoi();
1994         
1995         Log("SHUTTLE", Form("%d unprocessed runs", count));
1996         
1997         delete aRow;
1998         delete aResult;
1999
2000         TMonaLisaValue mlStatus("SHUTTLE_openruns", count);
2001
2002         TList mlList;
2003         mlList.Add(&mlStatus);
2004
2005         fMonaLisa->SendParameters(&mlList, "__PROCESSINGINFO__");
2006 }
2007
2008 //______________________________________________________________________________________________
2009 Bool_t AliShuttle::QueryShuttleLogbook(const char* whereClause,
2010                 TObjArray& entries)
2011 {
2012         // Queries DAQ's Shuttle logbook and fills the detector status objects.
2013         // Calls QueryRunParameters to query the DAQ logbook for run parameters.
2014         //
2015
2016         entries.SetOwner(1);
2017
2018         // check the connection; connect if necessary
2019         if (!Connect(3)) return kFALSE;
2020
2021         TString sqlQuery;
2022         sqlQuery = Form("select * from %s %s order by run", fConfig->GetShuttlelbTable(), whereClause);
2023
2024         TSQLResult* aResult = fServer[3]->Query(sqlQuery);
2025         if (!aResult) {
2026                 AliError(Form("Can't execute query <%s>!", sqlQuery.Data()));
2027                 return kFALSE;
2028         }
2029
2030         AliDebug(2,Form("Query = %s", sqlQuery.Data()));
2031
2032         if(aResult->GetRowCount() == 0) {
2033                 Log("SHUTTLE", "No entries in Shuttle Logbook match request");
2034                 delete aResult;
2035                 return kTRUE;
2036         }
2037
2038         // TODO Check field count!
2039         const UInt_t nCols = 23;
2040         if (aResult->GetFieldCount() != (Int_t) nCols) {
2041                 Log("SHUTTLE", "Invalid SQL result field number!");
2042                 delete aResult;
2043                 return kFALSE;
2044         }
2045
2046         TSQLRow* aRow;
2047         while ((aRow = aResult->Next())) {
2048                 TString runString(aRow->GetField(0), aRow->GetFieldLength(0));
2049                 Int_t run = runString.Atoi();
2050
2051                 AliShuttleLogbookEntry *entry = QueryRunParameters(run);
2052                 if (!entry)
2053                         { delete aRow; continue; } // free the row before skipping this run
2054
2055                 // loop on detectors
2056                 for(UInt_t ii = 0; ii < nCols; ii++)
2057                         entry->SetDetectorStatus(aResult->GetFieldName(ii), aRow->GetField(ii));
2058
2059                 entries.AddLast(entry);
2060                 delete aRow;
2061         }
2062
2063         delete aResult;
2064         return kTRUE;
2065 }
2066
2067 //______________________________________________________________________________________________
2068 AliShuttleLogbookEntry* AliShuttle::QueryRunParameters(Int_t run)
2069 {
2070         //
2071         // Retrieves the run parameters written in the DAQ logbook and stores them in an AliShuttleLogbookEntry object
2072         //
2073
2074         // check the connection; connect if necessary
2075         if (!Connect(3))
2076                 return 0;
2077
2078         TString sqlQuery;
2079         sqlQuery.Form("select * from %s where run=%d", fConfig->GetDAQlbTable(), run);
2080
2081         TSQLResult* aResult = fServer[3]->Query(sqlQuery);
2082         if (!aResult) {
2083                 Log("SHUTTLE", Form("Can't execute query <%s>!", sqlQuery.Data()));
2084                 return 0;
2085         }
2086
2087         if (aResult->GetRowCount() == 0) {
2088                 Log("SHUTTLE", Form("QueryRunParameters - No entry in DAQ Logbook for run %d. Skipping", run));
2089                 delete aResult;
2090                 return 0;
2091         }
2092
2093         if (aResult->GetRowCount() > 1) {
2094                 Log("SHUTTLE", Form("QueryRunParameters - UNEXPECTED: "
2095                                 "more than one entry in DAQ Logbook for run %d!", run));
2096                 delete aResult;
2097                 return 0;
2098         }
2099
2100         TSQLRow* aRow = aResult->Next();
2101         if (!aRow)
2102         {
2103                 Log("SHUTTLE", Form("QueryRunParameters - Could not retrieve row for run %d. Skipping", run));
2104                 delete aResult;
2105                 return 0;
2106         }
2107
2108         AliShuttleLogbookEntry* entry = new AliShuttleLogbookEntry(run);
2109
2110         for (Int_t ii = 0; ii < aResult->GetFieldCount(); ii++)
2111                 entry->SetRunParameter(aResult->GetFieldName(ii), aRow->GetField(ii));
2112
2113         UInt_t startTime = entry->GetStartTime();
2114         UInt_t endTime = entry->GetEndTime();
2115
2116 //      if (!startTime || !endTime || startTime > endTime) 
2117 //      {
2118 //              Log("SHUTTLE",
2119 //                      Form("QueryRunParameters - Invalid parameters for Run %d: startTime = %d, endTime = %d. Skipping!",
2120 //                              run, startTime, endTime));              
2121 //              
2122 //              Log("SHUTTLE", Form("Marking SHUTTLE done for run %d", run));
2123 //              fLogbookEntry = entry;  
2124 //              if (!UpdateShuttleLogbook("shuttle_done"))
2125 //              {
2126 //                      AliError(Form("Could not update logbook for run %d !", run));
2127 //              }
2128 //              fLogbookEntry = 0;
2129 //                              
2130 //              delete entry;
2131 //              delete aRow;
2132 //              delete aResult;
2133 //              return 0;
2134 //      }
2135
2136         if (!startTime) 
2137         {
2138                 Log("SHUTTLE",
2139                         Form("QueryRunParameters - Invalid parameters for Run %d: " 
2140                                 "startTime = %d, endTime = %d. Skipping!",
2141                                         run, startTime, endTime));              
2142                 
2143                 Log("SHUTTLE", Form("Marking SHUTTLE done for run %d", run));
2144                 fLogbookEntry = entry;  
2145                 if (!UpdateShuttleLogbook("shuttle_ignored"))
2146                 {
2147                         AliError(Form("Could not update logbook for run %d !", run));
2148                 }
2149                 fLogbookEntry = 0;
2150                                 
2151                 delete entry;
2152                 delete aRow;
2153                 delete aResult;
2154                 return 0;
2155         }
2156         
2157         if (startTime && !endTime) 
2158         {
2159                 // TODO Here we don't mark SHUTTLE done, because this may mean 
2160                 //the run is still ongoing!!            
2161                 Log("SHUTTLE",
2162                         Form("QueryRunParameters - Invalid parameters for Run %d: "
2163                              "startTime = %d, endTime = %d. Skipping (Shuttle won't be marked as DONE)!",
2164                                         run, startTime, endTime));              
2165                 
2166                 //Log("SHUTTLE", Form("Marking SHUTTLE done for run %d", run));
2167                 //fLogbookEntry = entry;        
2168                 //if (!UpdateShuttleLogbook("shuttle_done"))
2169                 //{
2170                 //      AliError(Form("Could not update logbook for run %d !", run));
2171                 //}
2172                 //fLogbookEntry = 0;
2173                                 
2174                 delete entry;
2175                 delete aRow;
2176                 delete aResult;
2177                 return 0;
2178         }
2179                         
2180         if (startTime && endTime && (startTime > endTime)) 
2181         {
2182                 Log("SHUTTLE",
2183                         Form("QueryRunParameters - Invalid parameters for Run %d: "
2184                                 "startTime = %d, endTime = %d. Skipping!",
2185                                         run, startTime, endTime));              
2186                 
2187                 Log("SHUTTLE", Form("Marking SHUTTLE done for run %d", run));
2188                 fLogbookEntry = entry;  
2189                 if (!UpdateShuttleLogbook("shuttle_ignored"))
2190                 {
2191                         AliError(Form("Could not update logbook for run %d !", run));
2192                 }
2193                 fLogbookEntry = 0;
2194                                 
2195                 delete entry;
2196                 delete aRow;
2197                 delete aResult;
2198                 return 0;
2199         }
2200                         
2201         TString totEventsStr = entry->GetRunParameter("totalEvents");  
2202         Int_t totEvents = totEventsStr.Atoi();
2203         if (totEvents < 1) 
2204         {
2205                 Log("SHUTTLE",
2206                         Form("QueryRunParameters - Run %d has 0 events - Skipping!", run));             
2207                 
2208                 Log("SHUTTLE", Form("Marking SHUTTLE done for run %d", run));           
2209                 fLogbookEntry = entry;  
2210                 if (!UpdateShuttleLogbook("shuttle_ignored"))
2211                 {
2212                         AliError(Form("Could not update logbook for run %d !", run));
2213                 }
2214                 fLogbookEntry = 0;
2215                                 
2216                 delete entry;
2217                 delete aRow;
2218                 delete aResult;
2219                 return 0;
2220         }
2221
2222         delete aRow;
2223         delete aResult;
2224
2225         return entry;
2226 }
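
// Illustration only, not part of AliShuttle: the run-validity checks in
// QueryRunParameters() above can be summarised as a small decision function.
// RunAction and ClassifyRun are hypothetical names; the sketch assumes plain
// integer times and event counts instead of an AliShuttleLogbookEntry.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical outcome of the checks: process the run, mark it as ignored,
// or leave it untouched because it may still be ongoing.
enum class RunAction { kProcess, kIgnore, kWait };

// Same order of checks as in QueryRunParameters():
//  - no start time                 -> ignore
//  - start time but no end time    -> wait (run may still be ongoing)
//  - start time after end time     -> ignore
//  - fewer than one recorded event -> ignore
inline RunAction ClassifyRun(std::uint32_t startTime, std::uint32_t endTime, int totalEvents)
{
    if (startTime == 0) return RunAction::kIgnore;
    if (endTime == 0) return RunAction::kWait;
    if (startTime > endTime) return RunAction::kIgnore;
    if (totalEvents < 1) return RunAction::kIgnore;
    return RunAction::kProcess;
}
```

// For example, ClassifyRun(100, 0, 10) yields RunAction::kWait, which matches
// the branch above that skips the run without marking the Shuttle as done.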
2227
2228 //______________________________________________________________________________________________
2229 TMap* AliShuttle::GetValueSet(const char* host, Int_t port, const TSeqCollection* entries,
2230                               DCSType type, Int_t multiSplit)
2231 {
2232         // Retrieve all "entry" data points from the DCS server
2233         // host, port: TSocket connection parameters
2234         // entries: list of alias or data point names
2235         // type: kAlias or kDP
2236         // returns TMap of values, 0 when failure
2237         
2238         AliDCSClient client(host, port, fTimeout, fRetries, multiSplit);
2239
2240         TMap* result = 0;
2241         if (type == kAlias)
2242         {
2243                 result = client.GetAliasValues(entries, GetCurrentStartTime(), 
2244                         GetCurrentEndTime());
2245         } 
2246         else if (type == kDP)
2247         {
2248                 result = client.GetDPValues(entries, GetCurrentStartTime(), 
2249                         GetCurrentEndTime());
2250         }
2251
2252         if (result == 0)
2253         {
2254                 Log(fCurrentDetector.Data(), Form("GetValueSet - Can't get entries! Reason: %s",
2255                         client.GetErrorString(client.GetResultErrorCode())));
2256                 if (client.GetResultErrorCode() == AliDCSClient::fgkServerError)        
2257                         Log(fCurrentDetector.Data(), Form("GetValueSet - Server error code: %s",
2258                                 client.GetServerError().Data()));
2259
2260                 return 0;
2261         }
2262                 
2263         return result;
2264 }
2265
2266 //______________________________________________________________________________________________
2267 const char* AliShuttle::GetFile(Int_t system, const char* detector,
2268                 const char* id, const char* source)
2269 {
2270         // Get calibration file from file exchange servers
2271         // First queries the FXS database for the file name, using the run, detector, id and source info,
2272         // then calls RetrieveFile(filename) for actual copy to local disk
2273         // run: current run being processed (given by Logbook entry fLogbookEntry)
2274         // detector: the Preprocessor name
2275         // id: provided as a parameter by the Preprocessor
2276         // source: provided by the Preprocessor through GetFileSources function
2277
2278         // check if test mode should simulate a FXS error
2279         if (fTestMode & kErrorFXSFiles)
2280         {
2281                 Log(detector, Form("GetFile - In TESTMODE - Simulating error while connecting to %s FXS", GetSystemName(system)));
2282                 return 0;
2283         }
2284         
2285         // check connection; connect if needed
2286         if (!Connect(system))
2287         {
2288                 Log(detector, Form("GetFile - Couldn't connect to %s FXS database", GetSystemName(system)));
2289                 return 0;
2290         }
2291
2292         // Query preparation
2293         TString sourceName(source);
2294         Int_t nFields = 3;
2295         TString sqlQueryStart = Form("select filePath,size,fileChecksum from %s where",
2296                                                                 fConfig->GetFXSdbTable(system));
2297         TString whereClause = Form("run=%d and detector=\"%s\" and fileId=\"%s\"",
2298                                                                 GetCurrentRun(), detector, id);
2299
2300         if (system == kDAQ)
2301         {
2302                 whereClause += Form(" and DAQsource=\"%s\"", source);
2303         }
2304         else if (system == kDCS)
2305         {
2306                 sourceName="none";
2307         }
2308         else if (system == kHLT)
2309         {
2310                 whereClause += Form(" and DDLnumbers=\"%s\"", source);
2311                 nFields = 3;
2312         }
2313
2314         TString sqlQuery = Form("%s %s", sqlQueryStart.Data(), whereClause.Data());
2315
2316         AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));
2317
2318         // Query execution
2319         TSQLResult* aResult = 0;
2320         aResult = dynamic_cast<TSQLResult*> (fServer[system]->Query(sqlQuery));
2321         if (!aResult) {
2322                 Log(detector, Form("GetFile - Can't execute SQL query to %s database for: id = %s, source = %s",
2323                                 GetSystemName(system), id, sourceName.Data()));
2324                 return 0;
2325         }
2326
2327         if(aResult->GetRowCount() == 0)
2328         {
2329                 Log(detector,
2330                         Form("GetFile - No entry in %s FXS db for: id = %s, source = %s",
2331                                 GetSystemName(system), id, sourceName.Data()));
2332                 delete aResult;
2333                 return 0;
2334         }
2335
2336         if (aResult->GetRowCount() > 1) {
2337                 Log(detector,
2338                         Form("GetFile - More than one entry in %s FXS db for: id = %s, source = %s",
2339                                 GetSystemName(system), id, sourceName.Data()));
2340                 delete aResult;
2341                 return 0;
2342         }
2343
2344         if (aResult->GetFieldCount() != nFields) {
2345                 Log(detector,
2346                         Form("GetFile - Wrong field count in %s FXS db for: id = %s, source = %s",
2347                                 GetSystemName(system), id, sourceName.Data()));
2348                 delete aResult;
2349                 return 0;
2350         }
2351
2352         TSQLRow* aRow = dynamic_cast<TSQLRow*> (aResult->Next());
2353
2354         if (!aRow){
2355                 Log(detector, Form("GetFile - Empty result set in %s FXS db from query: id = %s, source = %s",
2356                                 GetSystemName(system), id, sourceName.Data()));
2357                 delete aResult;
2358                 return 0;
2359         }
2360
2361         TString filePath(aRow->GetField(0), aRow->GetFieldLength(0));
2362         TString fileSize(aRow->GetField(1), aRow->GetFieldLength(1));
2363         TString fileChecksum(aRow->GetField(2), aRow->GetFieldLength(2));
2364
2365         delete aResult;
2366         delete aRow;
2367
2368         AliDebug(2, Form("filePath = %s; size = %s, fileChecksum = %s",
2369                                 filePath.Data(), fileSize.Data(), fileChecksum.Data()));
2370
2371         // retrieved file is renamed to make it unique
2372         TString localFileName = Form("%s/%s_%d_process/%s_%s_%d_%s_%s.shuttle",
2373                                         GetShuttleTempDir(), detector, GetCurrentRun(),
2374                                         GetSystemName(system), detector, GetCurrentRun(), 
2375                                         id, sourceName.Data());
2376
2377
2378         // file retrieval from FXS
2379         UInt_t nRetries = 0;
2380         UInt_t maxRetries = 3;
2381         Bool_t result = kFALSE;
2382
2383         // copy the file, retrying up to maxRetries times; RetrieveFile returns kTRUE on success
2384         while(nRetries++ < maxRetries) {
2385                 AliDebug(2, Form("Trying to copy file. Attempt # %u", nRetries));
2386                 result = RetrieveFile(system, filePath.Data(), localFileName.Data());
2387                 if(!result)
2388                 {
2389                         Log(detector, Form("GetFile - Copy of file %s from %s FXS failed",
2390                                         filePath.Data(), GetSystemName(system)));
2391                         continue;
2392                 } 
2393
2394                 if (fileChecksum.Length()>0)
2395                 {
2396                         // compare md5sum of local file with the one stored in the FXS DB
2397                         Int_t md5Comp = gSystem->Exec(Form("md5sum %s | grep %s > /dev/null 2>&1",
2398                                                 localFileName.Data(), fileChecksum.Data()));
2399
2400                         if (md5Comp != 0)
2401                         {
2402                                 Log(detector, Form("GetFile - md5sum of file %s does not match the local copy!",
2403                                                         filePath.Data()));
2404                                 result = kFALSE;
2405                                 continue;
2406                         }
2407                 } else {
2408                         Log(fCurrentDetector, Form("GetFile - md5sum of file %s not set in %s database, skipping comparison",
2409                                                         filePath.Data(), GetSystemName(system)));
2410                 }
2411                 if (result) break;
2412         }
2413
2414         if(!result) return 0;
2415
2416         fFXSCalled[system]=kTRUE;
2417         TObjString *fileParams = new TObjString(Form("%s#!?!#%s", id, sourceName.Data()));
2418         fFXSlist[system].Add(fileParams);
2419
2420         static TString staticLocalFileName;
2421         staticLocalFileName.Form("%s", localFileName.Data());
2422         
2423         Log(fCurrentDetector, Form("GetFile - Retrieved file with id %s and "
2424                         "source %s from %s to %s", id, source, 
2425                         GetSystemName(system), localFileName.Data()));
2426                         
2427         return staticLocalFileName.Data();
2428 }
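
// Illustration only, not part of AliShuttle: the copy-and-verify loop in
// GetFile() above follows a generic "retry until fetched and verified"
// pattern. A minimal sketch, assuming the copy step and the checksum
// comparison are passed in as callables; FetchWithRetry is a hypothetical
// helper name.

```cpp
#include <cassert>
#include <functional>

// Calls fetch() up to maxAttempts times. An attempt succeeds only if fetch()
// returns true and verify() then returns true as well; a failed fetch or a
// failed verification both lead to the next attempt.
inline bool FetchWithRetry(unsigned maxAttempts,
                           const std::function<bool()>& fetch,
                           const std::function<bool()>& verify)
{
    assert(fetch && verify); // both callables must be set
    for (unsigned attempt = 1; attempt <= maxAttempts; ++attempt) {
        if (!fetch()) continue;    // copy failed: try again
        if (verify()) return true; // copy succeeded and checksum matched
    }
    return false; // all attempts used up
}
```

// In GetFile() the fetch step corresponds to RetrieveFile() and the verify
// step to the md5sum comparison (skipped when no checksum is stored).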
2429
2430 //______________________________________________________________________________________________
2431 Bool_t AliShuttle::RetrieveFile(UInt_t system, const char* fxsFileName, const char* localFileName)
2432 {
2433         //
2434         // Copies file from FXS to local Shuttle machine
2435         //
2436
2437         // check temp directory: trying to cd to temp; if it does not exist, create it
2438         AliDebug(2, Form("Copy file %s from %s FXS into %s",
2439                         fxsFileName, GetSystemName(system), localFileName));
2440                         
2441         TString tmpDir(localFileName);
2442         
2443         tmpDir = tmpDir(0,tmpDir.Last('/'));
2444
2445         Int_t noDir = gSystem->GetPathInfo(tmpDir.Data(), 0, (Long64_t*) 0, 0, 0);
2446         if (noDir) // temp dir does not exist!
2447         {
2448                 if (gSystem->mkdir(tmpDir.Data(), 1))
2449                 {
2450                         Log(fCurrentDetector.Data(), "RetrieveFile - could not make temp directory!!");
2451                         return kFALSE;
2452                 }
2453         }
2454
2455         TString baseFXSFolder;
2456         if (system == kDAQ)
2457         {
2458                 baseFXSFolder = "FES/";
2459         }
2460         else if (system == kDCS)
2461         {
2462                 baseFXSFolder = "";
2463         }
2464         else if (system == kHLT)
2465         {
2466                 baseFXSFolder = "/opt/FXS/";
2467         }
2468
2469
2470         TString command = Form("scp -oPort=%d -2 %s@%s:%s%s %s",
2471                 fConfig->GetFXSPort(system),
2472                 fConfig->GetFXSUser(system),
2473                 fConfig->GetFXSHost(system),
2474                 baseFXSFolder.Data(),
2475                 fxsFileName,
2476                 localFileName);
2477
2478         AliDebug(2, Form("%s",command.Data()));
2479
2480         Bool_t result = (gSystem->Exec(command.Data()) == 0);
2481
2482         return result;
2483 }
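
// Illustration only, not part of AliShuttle: RetrieveFile() above passes the
// file names to the shell unquoted, so a name containing spaces or shell
// metacharacters would break the command. A minimal sketch of quoting for a
// POSIX shell; ShellQuote and BuildScpCommand are hypothetical helpers, and
// the -2 option of the original command is left out here. Note that the
// remote part of an scp argument may be interpreted again by the remote
// shell, so local quoting alone does not make it safe.

```cpp
#include <cassert>
#include <string>

// Wraps s in single quotes; an embedded single quote is written as '\''
// (close the quote, add an escaped quote, reopen the quote).
inline std::string ShellQuote(const std::string& s)
{
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Builds "scp -oPort=<port> '<user>@<host>:<remotePath>' '<localPath>'".
inline std::string BuildScpCommand(int port, const std::string& user, const std::string& host,
                                   const std::string& remotePath, const std::string& localPath)
{
    return "scp -oPort=" + std::to_string(port) + " " +
           ShellQuote(user + "@" + host + ":" + remotePath) + " " +
           ShellQuote(localPath);
}
```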
2484
2485 //______________________________________________________________________________________________
2486 TList* AliShuttle::GetFileSources(Int_t system, const char* detector, const char* id)
2487 {
2488         //
2489         // Get sources producing the condition file Id from file exchange servers
2490         // if id is NULL all sources are returned (distinct)
2491         //
2492
2493         Log(detector, Form("GetFileSources - Retrieving sources with id %s from %s", id ? id : "(any)", GetSystemName(system)));
2494         
2495         // check if test mode should simulate a FXS error
2496         if (fTestMode & kErrorFXSSources)
2497         {
2498                 Log(detector, Form("GetFileSources - In TESTMODE - Simulating error while connecting to %s FXS", GetSystemName(system)));
2499                 return 0;
2500         }
2501
2502         if (system == kDCS)
2503         {
2504                 Log(detector, "GetFileSources - WARNING: DCS system has only one source of data!");
2505                 TList *list = new TList();
2506                 list->SetOwner(1);
2507                 list->Add(new TObjString(" "));
2508                 return list;
2509         }
2510
2511         // check connection; connect if needed
2512         if (!Connect(system))
2513         {
2514                 Log(detector, Form("GetFileSources - Couldn't connect to %s FXS database", GetSystemName(system)));
2515                 return NULL;
2516         }
2517
2518         TString sourceName;
2519         if (system == kDAQ)
2520         {
2521                 sourceName = "DAQsource";
2522         } else if (system == kHLT)
2523         {
2524                 sourceName = "DDLnumbers";
2525         }
2526
2527         TString sqlQueryStart = Form("select distinct %s from %s where", sourceName.Data(), fConfig->GetFXSdbTable(system));
2528         TString whereClause = Form("run=%d and detector=\"%s\"",
2529                                 GetCurrentRun(), detector);
2530         if (id)
2531                 whereClause += Form(" and fileId=\"%s\"", id);
2532         TString sqlQuery = Form("%s %s", sqlQueryStart.Data(), whereClause.Data());
2533
2534         AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));
2535
2536         // Query execution
2537         TSQLResult* aResult;
2538         aResult = fServer[system]->Query(sqlQuery);
2539         if (!aResult) {
2540                 Log(detector, Form("GetFileSources - Can't execute SQL query to %s database for id: %s",
2541                                 GetSystemName(system), id ? id : "(any)"));
2542                 return 0;
2543         }
2544
2545         TList *list = new TList();
2546         list->SetOwner(1);
2547         
2548         if (aResult->GetRowCount() == 0)
2549         {
2550                 Log(detector,
2551                         Form("GetFileSources - No entry in %s FXS table for id: %s", GetSystemName(system), id ? id : "(any)"));
2552                 delete aResult;
2553                 return list;
2554         }
2555
2556         Log(detector, Form("GetFileSources - Found %d sources", aResult->GetRowCount()));
2557
2558         TSQLRow* aRow;
2559         while ((aRow = aResult->Next()))
2560         {
2561
2562                 TString source(aRow->GetField(0), aRow->GetFieldLength(0));
2563                 AliDebug(2, Form("%s = %s", sourceName.Data(), source.Data()));
2564                 list->Add(new TObjString(source));
2565                 delete aRow;
2566         }
2567
2568         delete aResult;
2569
2570         return list;
2571 }
2572
2573 //______________________________________________________________________________________________
2574 TList* AliShuttle::GetFileIDs(Int_t system, const char* detector, const char* source)
2575 {
2576         //
2577         // Get all ids of condition files produced by a given source from file exchange servers
2578         //
2579         
2580         Log(detector, Form("GetFileIDs - Retrieving ids with source %s from %s", source ? source : "(any)", GetSystemName(system)));
2581
2582         // check if test mode should simulate a FXS error
2583         if (fTestMode & kErrorFXSSources)
2584         {
2585                 Log(detector, Form("GetFileIDs - In TESTMODE - Simulating error while connecting to %s FXS", GetSystemName(system)));
2586                 return 0;
2587         }
2588
2589         // check connection; connect if needed
2590         if (!Connect(system))
2591         {
2592                 Log(detector, Form("GetFileIDs - Couldn't connect to %s FXS database", GetSystemName(system)));
2593                 return NULL;
2594         }
2595
2596         TString sourceName;
2597         if (system == kDAQ)
2598         {
2599                 sourceName = "DAQsource";
2600         } else if (system == kHLT)
2601         {
2602                 sourceName = "DDLnumbers";
2603         }
2604
2605         TString sqlQueryStart = Form("select fileId from %s where", fConfig->GetFXSdbTable(system));
2606         TString whereClause = Form("run=%d and detector=\"%s\"",
2607                                 GetCurrentRun(), detector);
2608         if (sourceName.Length() > 0 && source)
2609                 whereClause += Form(" and %s=\"%s\"", sourceName.Data(), source);
2610         TString sqlQuery = Form("%s %s", sqlQueryStart.Data(), whereClause.Data());
2611
2612         AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));
2613
2614         // Query execution
2615         TSQLResult* aResult;
2616         aResult = fServer[system]->Query(sqlQuery);
2617         if (!aResult) {
2618                 Log(detector, Form("GetFileIDs - Can't execute SQL query to %s database for source: %s",
2619                                 GetSystemName(system), source ? source : "(any)"));
2620                 return 0;
2621         }
2622
2623         TList *list = new TList();
2624         list->SetOwner(1);
2625         
2626         if (aResult->GetRowCount() == 0)
2627         {
2628                 Log(detector,
2629                         Form("GetFileIDs - No entry in %s FXS table for source: %s", GetSystemName(system), source ? source : "(any)"));
2630                 delete aResult;
2631                 return list;
2632         }
2633
2634         Log(detector, Form("GetFileIDs - Found %d ids", aResult->GetRowCount()));
2635
2636         TSQLRow* aRow;
2637
2638         while ((aRow = aResult->Next()))
2639         {
2640
2641                 TString id(aRow->GetField(0), aRow->GetFieldLength(0));
2642                 AliDebug(2, Form("fileId = %s", id.Data()));
2643                 list->Add(new TObjString(id));
2644                 delete aRow;
2645         }
2646
2647         delete aResult;
2648
2649         return list;
2650 }
2651
2652 //______________________________________________________________________________________________
2653 Bool_t AliShuttle::Connect(Int_t system)
2654 {
2655         // Connects to the MySQL server of the given system's FXS database (system < 3) or of the DAQ logbook (system == 3)
2656         // DAQ Logbook, Shuttle Logbook and DAQ FXS db are on the same host
2657         //
2658
2659         // check connection: if already connected return
2660         if(fServer[system] && fServer[system]->IsConnected()) return kTRUE;
2661
2662         TString dbHost, dbUser, dbPass, dbName;
2663
2664         if (system < 3) // FXS db servers
2665         {
2666                 dbHost = Form("mysql://%s:%d", fConfig->GetFXSdbHost(system), fConfig->GetFXSdbPort(system));
2667                 dbUser = fConfig->GetFXSdbUser(system);
2668                 dbPass = fConfig->GetFXSdbPass(system);
2669                 dbName =   fConfig->GetFXSdbName(system);
2670         } else { // Run & Shuttle logbook servers
2671         // TODO Will the Shuttle logbook server be the same as the Run logbook server ???
2672                 dbHost = Form("mysql://%s:%d", fConfig->GetDAQlbHost(), fConfig->GetDAQlbPort());
2673                 dbUser = fConfig->GetDAQlbUser();
2674                 dbPass = fConfig->GetDAQlbPass();
2675                 dbName =   fConfig->GetDAQlbDB();
2676         }
2677
2678         fServer[system] = TSQLServer::Connect(dbHost.Data(), dbUser.Data(), dbPass.Data());
2679         if (!fServer[system] || !fServer[system]->IsConnected()) {
2680                 if(system < 3)
2681                 {
2682                 AliError(Form("Can't establish connection to FXS database for %s",
2683                                         AliShuttleInterface::GetSystemName(system)));
2684                 } else {
2685                 AliError("Can't establish connection to Run logbook.");
2686                 }
2687                 if(fServer[system]) { delete fServer[system]; fServer[system] = 0; } // avoid a dangling pointer on the next Connect() call
2688                 return kFALSE;
2689         }
2690
2691         // Get tables
2692         TSQLResult* aResult=0;
2693         switch(system){
2694                 case kDAQ:
2695                         aResult = fServer[kDAQ]->GetTables(dbName.Data());
2696                         break;
2697                 case kDCS:
2698                         aResult = fServer[kDCS]->GetTables(dbName.Data());
2699                         break;
2700                 case kHLT:
2701                         aResult = fServer[kHLT]->GetTables(dbName.Data());
2702                         break;
2703                 default:
2704                         aResult = fServer[3]->GetTables(dbName.Data());
2705                         break;
2706         }
2707
2708         delete aResult;
2709         return kTRUE;
2710 }
2711
2712 //______________________________________________________________________________________________
2713 Bool_t AliShuttle::UpdateTable()
2714 {
2715         //
2716         // Update FXS table filling time_processed field in all rows corresponding to current run and detector
2717         //
2718
2719         Bool_t result = kTRUE;
2720
2721         for (UInt_t system=0; system<3; system++)
2722         {
2723                 if(!fFXSCalled[system]) continue;
2724
2725                 // check connection; connect if needed
2726                 if (!Connect(system))
2727                 {
2728                         Log(fCurrentDetector, Form("UpdateTable - Couldn't connect to %s FXS database", GetSystemName(system)));
2729                         result = kFALSE;
2730                         continue;
2731                 }
2732
2733                 TTimeStamp now; // now
2734
2735                 // Loop on FXS list entries
2736                 TIter iter(&fFXSlist[system]);
2737                 TObjString *aFXSentry=0;
2738                 while ((aFXSentry = dynamic_cast<TObjString*> (iter.Next())))
2739                 {
2740                         TString aFXSentrystr = aFXSentry->String();
2741                         TObjArray *aFXSarray = aFXSentrystr.Tokenize("#!?!#");
2742                         if (!aFXSarray || aFXSarray->GetEntries() != 2 )
2743                         {
2744                                 Log(fCurrentDetector, Form("UpdateTable - error updating %s FXS entry. Check string: <%s>",
2745                                         GetSystemName(system), aFXSentrystr.Data()));
2746                                 if(aFXSarray) delete aFXSarray;
2747                                 result = kFALSE;
2748                                 continue;
2749                         }
2750                         const char* fileId = ((TObjString*) aFXSarray->At(0))->GetName();
2751                         const char* source = ((TObjString*) aFXSarray->At(1))->GetName();
2752
2753                         TString whereClause;
2754                         if (system == kDAQ)
2755                         {
2756                                 whereClause = Form("where run=%d and detector=\"%s\" and fileId=\"%s\" and DAQsource=\"%s\";",
2757                                                         GetCurrentRun(), fCurrentDetector.Data(), fileId, source);
2758                         }
2759                         else if (system == kDCS)
2760                         {
2761                                 whereClause = Form("where run=%d and detector=\"%s\" and fileId=\"%s\";",
2762                                                         GetCurrentRun(), fCurrentDetector.Data(), fileId);
2763                         }
2764                         else if (system == kHLT)
2765                         {
2766                                 whereClause = Form("where run=%d and detector=\"%s\" and fileId=\"%s\" and DDLnumbers=\"%s\";",
2767                                                         GetCurrentRun(), fCurrentDetector.Data(), fileId, source);
2768                         }
2769
2770                         delete aFXSarray;
2771
2772                         TString sqlQuery = Form("update %s set time_processed=%ld %s", fConfig->GetFXSdbTable(system),
2773                                                                 (Long_t) now.GetSec(), whereClause.Data());
2774
2775                         AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));
2776
2777                         // Query execution
2778                         TSQLResult* aResult;
2779                         aResult = dynamic_cast<TSQLResult*> (fServer[system]->Query(sqlQuery));
2780                         if (!aResult)
2781                         {
2782                                 Log(fCurrentDetector, Form("UpdateTable - %s db: can't execute SQL query <%s>",
2783                                                                 GetSystemName(system), sqlQuery.Data()));
2784                                 result = kFALSE;
2785                                 continue;
2786                         }
2787                         delete aResult;
2788                 }
2789         }
2790
2791         return result;
2792 }
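
// Illustration only, not part of AliShuttle: TString::Tokenize() treats its
// argument as a set of separator characters, so "#!?!#" in UpdateTable()
// above splits at any '#', '!' or '?'. An id or source containing one of
// these characters would yield more than two tokens and be reported as an
// error. A minimal sketch of splitting at the whole separator string instead,
// using std::string; SplitFxsEntry is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Splits "id#!?!#source" at the first occurrence of the complete separator.
// Returns false, leaving out untouched, if the separator is missing.
inline bool SplitFxsEntry(const std::string& entry,
                          std::pair<std::string, std::string>& out)
{
    static const std::string kSep = "#!?!#";
    const std::string::size_type pos = entry.find(kSep);
    if (pos == std::string::npos) return false;
    out.first = entry.substr(0, pos);            // file id
    out.second = entry.substr(pos + kSep.size()); // source
    return true;
}
```

// With this approach an entry such as "a!b#!?!#none" still splits into the
// id "a!b" and the source "none".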
2793
2794 //______________________________________________________________________________________________
2795 Bool_t AliShuttle::UpdateTableFailCase()
2796 {
2797         // Updates the FXS tables, filling the time_processed field in all rows of the current run and detector.
2798         // Called when the preprocessor is declared failed for the current run, because
2799         // the fields are otherwise updated only in case of success.
2800
2801         Bool_t result = kTRUE;
2802
2803         for (UInt_t system=0; system<3; system++)
2804         {
2805                 // check connection; connect if needed
2806                 if (!Connect(system))
2807                 {
2808                         Log(fCurrentDetector, Form("UpdateTableFailCase - Couldn't connect to %s FXS database",
2809                                                         GetSystemName(system)));
2810                         result = kFALSE;
2811                         continue;
2812                 }
2813
2814                 TTimeStamp now; // now
2815
2816                 // Update all rows of the current run and detector (no per-file selection)
2817
2818                 TString whereClause = Form("where run=%d and detector=\"%s\";",
2819                                                 GetCurrentRun(), fCurrentDetector.Data());
2820
2821
2822                 TString sqlQuery = Form("update %s set time_processed=%d %s", fConfig->GetFXSdbTable(system),
2823                                                         now.GetSec(), whereClause.Data());
2824
2825                 AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));
2826
2827                 // Query execution
2828                 TSQLResult* aResult;
2829                 aResult = dynamic_cast<TSQLResult*> (fServer[system]->Query(sqlQuery));
2830                 if (!aResult)
2831                 {
2832                         Log(fCurrentDetector, Form("UpdateTableFailCase - %s db: can't execute SQL query <%s>",
2833                                                         GetSystemName(system), sqlQuery.Data()));
2834                         result = kFALSE;
2835                         continue;
2836                 }
2837                 delete aResult;
2838         }
2839
2840         return result;
2841 }
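
// Illustration only, not part of AliShuttle: a minimal standalone sketch of how the
// statement sent by UpdateTableFailCase() is composed, written with the standard library
// instead of ROOT's Form(). The helper name BuildFailCaseQuery and the table name used in
// the usage note are hypothetical.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: composes "update <table> set time_processed=<t> where run=<r>
// and detector="<d>";", the same layout as the Form() calls in UpdateTableFailCase().
std::string BuildFailCaseQuery(const std::string& table, long timeProcessed,
                               int run, const std::string& detector)
{
        std::ostringstream query;
        query << "update " << table
              << " set time_processed=" << timeProcessed
              << " where run=" << run
              << " and detector=\"" << detector << "\";";
        return query.str();
}
```

// Usage, with a made-up table name: BuildFailCaseQuery("fxs_table", now, 12345, "PHOS").
// Streaming the values avoids any mismatch between a format specifier and the argument type.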

//______________________________________________________________________________________________
Bool_t AliShuttle::UpdateShuttleLogbook(const char* detector, const char* status)
{
        //
        // Update the Shuttle logbook, filling either a detector column or the shuttle_done column.
        // Example usage: UpdateShuttleLogbook("PHOS", "DONE") or UpdateShuttleLogbook("shuttle_done")
        //

        // check the connection; connect if necessary
        if(!Connect(3)){
                Log("SHUTTLE", "UpdateShuttleLogbook - Couldn't connect to DAQ Logbook.");
                return kFALSE;
        }

        TString detName(detector);
        TString setClause;
        if (detName == "shuttle_done" || detName == "shuttle_ignored")
        {
                // both cases set shuttle_done=1; only "shuttle_done" reports "Done" to ML
                setClause = "set shuttle_done=1";

                if (detName == "shuttle_done")
                {
                        // Send the information to ML
                        TMonaLisaText  mlStatus("SHUTTLE_status", "Done");

                        TList mlList;
                        mlList.Add(&mlStatus);

                        TString mlID;
                        mlID.Form("%d", GetCurrentRun());
                        fMonaLisa->SendParameters(&mlList, mlID);
                }
        } else {
                // a detector status must contain "done" or "failed" (case-insensitive)
                TString statusStr(status);
                if(statusStr.Contains("done", TString::kIgnoreCase) ||
                   statusStr.Contains("failed", TString::kIgnoreCase)){
                        setClause = Form("set %s=\"%s\"", detector, status);
                } else {
                        Log("SHUTTLE",
                                Form("UpdateShuttleLogbook - Invalid status <%s> for detector %s",
                                        status, detector));
                        return kFALSE;
                }
        }

        TString whereClause = Form("where run=%d", GetCurrentRun());

        TString sqlQuery = Form("update %s %s %s",
                                        fConfig->GetShuttlelbTable(), setClause.Data(), whereClause.Data());

        AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));

        // Query execution
        TSQLResult* aResult;
        aResult = dynamic_cast<TSQLResult*> (fServer[3]->Query(sqlQuery));
        if (!aResult) {
                Log("SHUTTLE", Form("UpdateShuttleLogbook - Can't execute query <%s>", sqlQuery.Data()));
                return kFALSE;
        }
        delete aResult;

        return kTRUE;
}
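
// Illustration only, not part of AliShuttle: a minimal sketch of the status check that
// UpdateShuttleLogbook() applies to detector columns, i.e. a case-insensitive search for
// "done" or "failed". It uses std::string in place of TString::Contains; the function
// name IsValidDetectorStatus is hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: true if the status contains "done" or "failed", ignoring case.
bool IsValidDetectorStatus(const std::string& status)
{
        std::string lower(status);
        // lower-case a copy so that plain substring searches become case-insensitive
        std::transform(lower.begin(), lower.end(), lower.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        return lower.find("done") != std::string::npos ||
               lower.find("failed") != std::string::npos;
}
```

// Note that this is a substring test, as in the original code: any string that merely
// contains one of the two words is accepted.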

//______________________________________________________________________________________________
Int_t AliShuttle::GetCurrentRun() const
{
        //
        // Get current run from logbook entry (-1 if there is no logbook entry)
        //

        return fLogbookEntry ? fLogbookEntry->GetRun() : -1;
}

//______________________________________________________________________________________________
UInt_t AliShuttle::GetCurrentStartTime() const
{
        //
        // Get current start time from logbook entry (0 if there is no logbook entry)
        //

        return fLogbookEntry ? fLogbookEntry->GetStartTime() : 0;
}

//______________________________________________________________________________________________
UInt_t AliShuttle::GetCurrentEndTime() const
{
        //
        // Get current end time from logbook entry (0 if there is no logbook entry)
        //

        return fLogbookEntry ? fLogbookEntry->GetEndTime() : 0;
}

//______________________________________________________________________________________________
UInt_t AliShuttle::GetCurrentYear() const
{
        //
        // Get current year from the start time of the logbook entry (0 if there is no logbook entry)
        //

        if (!fLogbookEntry) return 0;

        // GetDate() gives the date as YYYYMMDD; the first four digits are the year
        TTimeStamp startTime(GetCurrentStartTime());
        TString year =  Form("%d",startTime.GetDate());
        year = year(0,4);

        return year.Atoi();
}
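
// Illustration only, not part of AliShuttle: GetCurrentYear() takes the first four
// characters of a date printed as YYYYMMDD. Assuming the date is a positive integer in that
// packed form, an integer division gives the same year without a string round trip. The
// function name YearFromPackedDate is hypothetical.

```cpp
// Hypothetical helper: extracts the year from a date packed as YYYYMMDD.
unsigned int YearFromPackedDate(unsigned int yyyymmdd)
{
        // dropping the last four decimal digits (MMDD) leaves the year
        return yyyymmdd / 10000;
}
```

// The two approaches agree only for four-digit years, which is the case for run start times.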

//______________________________________________________________________________________________
const char* AliShuttle::GetLHCPeriod() const
{
        //
        // Get current LHC period from logbook entry
        //

        if (!fLogbookEntry) return 0;

        return fLogbookEntry->GetRunParameter("LHCperiod");
}

//______________________________________________________________________________________________
void AliShuttle::Log(const char* detector, const char* message)
{
        //
        // Write a message to the log output and append it to the log file of the given detector.
        // The log directory of the current run is created if it does not exist yet.
        //

        TString logRunDir = GetShuttleLogDir();
        if (GetCurrentRun() >=0)
                logRunDir += Form("/%d", GetCurrentRun());

        void* dir = gSystem->OpenDirectory(logRunDir.Data());
        if (dir == NULL) {
                if (gSystem->mkdir(logRunDir.Data(), kTRUE)) {
                        AliError(Form("Can't open directory <%s>", logRunDir.Data()));
                        return;
                }

        } else {
                gSystem->FreeDirectory(dir);
        }

        TString toLog = Form("%s (%d): %s - ", TTimeStamp(time(0)).AsString("s"), getpid(), detector);
        if (GetCurrentRun() >= 0)
                toLog += Form("run %d - ", GetCurrentRun());
        toLog += Form("%s", message);

        AliInfo(toLog.Data());

        // if the log output is already redirected to the log file, stop here;
        // SHUTTLE messages are always written to the SHUTTLE log file
        if (fOutputRedirected && strcmp(detector, "SHUTTLE") != 0)
                return;

        TString fileName = GetLogFileName(detector);

        gSystem->ExpandPathName(fileName);

        ofstream logFile;
        logFile.open(fileName, ofstream::out | ofstream::app);

        if (!logFile.is_open()) {
                AliError(Form("Could not open file %s", fileName.Data()));
                return;
        }

        logFile << toLog.Data() << "\n";

        logFile.close();
}
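
// Illustration only, not part of AliShuttle: a minimal sketch of the line layout that Log()
// builds, "<time> (<pid>): <detector> - [run <n> - ]<message>", using the standard library.
// The function name FormatLogLine is hypothetical, and the time stamp is passed in as a
// ready-made string rather than produced by TTimeStamp.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats one log line; the run part is omitted when run < 0,
// which is what GetCurrentRun() returns when there is no logbook entry.
std::string FormatLogLine(const std::string& timeStamp, int pid,
                          const std::string& detector, int run,
                          const std::string& message)
{
        std::ostringstream line;
        line << timeStamp << " (" << pid << "): " << detector << " - ";
        if (run >= 0)
                line << "run " << run << " - ";
        line << message;
        return line.str();
}
```

// Keeping the formatting separate from the file handling makes the layout easy to check
// without touching the log directory.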

//______________________________________________________________________________________________
TString AliShuttle::GetLogFileName(const char* detector) const
{
        //
        // Return the name of the log file for a given subdetector. If a run is being
        // processed, the file is located in the log directory of that run.
        //

        TString fileName;

        if (GetCurrentRun() >= 0)
        {
                fileName.Form("%s/%d/%s_%d.log", GetShuttleLogDir(), GetCurrentRun(),
                        detector, GetCurrentRun());
        } else {
                fileName.Form("%s/%s.log", GetShuttleLogDir(), detector);
        }

        return fileName;
}

//______________________________________________________________________________________________
void AliShuttle::SendAlive()
{
        // Send an alive message to MonALISA (ML)

        TMonaLisaText mlStatus("SHUTTLE_status", "Alive");

        TList mlList;
        mlList.Add(&mlStatus);

        fMonaLisa->SendParameters(&mlList, "__PROCESSINGINFO__");
}

//______________________________________________________________________________________________
Bool_t AliShuttle::Collect(Int_t run)
{
        //
        // Collects conditions data for all UNPROCESSED runs written to the DAQ LogBook if run = -1 (default).
        // If a dedicated run is given, only this run is processed.
        //
        // In operational mode, this is the Shuttle function triggered by the EOR signal.
        //

        if (run == -1)
                Log("SHUTTLE","Collect - Shuttle called. Collecting conditions data for unprocessed runs");
        else
                Log("SHUTTLE", Form("Collect - Shuttle called. Collecting conditions data for run %d", run));

        SetLastAction("Starting");

        // create ML instance
        if (!fMonaLisa)
                fMonaLisa = new TMonaLisaWriter(fConfig->GetMonitorHost(), fConfig->GetMonitorTable());

        SendAlive();
        CountOpenRuns();

        TString whereClause("where shuttle_done=0");
        if (run != -1)
                whereClause += Form(" and run=%d", run);

        TObjArray shuttleLogbookEntries;
        if (!QueryShuttleLogbook(whereClause, shuttleLogbookEntries))
        {
                Log("SHUTTLE", "Collect - Can't retrieve entries from Shuttle logbook");
                return kFALSE;
        }

        if (shuttleLogbookEntries.GetEntries() == 0)
        {
                if (run == -1)
                        Log("SHUTTLE","Collect - Found no UNPROCESSED runs in Shuttle logbook");
                else
                        Log("SHUTTLE", Form("Collect - Run %d is already DONE "