Monalisa started in Collect() function. Alive message to monitor is sent at each...
[u/mrichter/AliRoot.git] / SHUTTLE / AliShuttle.cxx
1 /**************************************************************************
2  * Copyright(c) 1998-1999, ALICE Experiment at CERN, All rights reserved. *
3  *                                                                        *
4  * Author: The ALICE Off-line Project.                                    *
5  * Contributors are mentioned in the code where appropriate.              *
6  *                                                                        *
7  * Permission to use, copy, modify and distribute this software and its   *
8  * documentation strictly for non-commercial purposes is hereby granted   *
9  * without fee, provided that the above copyright notice appears in all   *
10  * copies and that both the copyright notice and this permission notice   *
11  * appear in the supporting documentation. The authors make no claims     *
12  * about the suitability of this software for any purpose. It is          *
13  * provided "as is" without express or implied warranty.                  *
14  **************************************************************************/
15
16 /*
17 $Log$
18 Revision 1.69  2007/12/12 10:06:29  acolla
19 in AliShuttle.cxx: SHUTTLE logbook is updated in case of invalid run times:
20
21 time_start==0 && time_end==0
22
23 logbook is NOT updated if time_start != 0 && time_end == 0, because it may mean that the run is still ongoing.
24
25 Revision 1.68  2007/12/11 10:15:17  acolla
26 Added marking SHUTTLE=DONE for invalid runs
27 (invalid start time or end time) and runs with totalEvents < 1
28
29 Revision 1.67  2007/12/07 19:14:36  acolla
30 in AliShuttleTrigger:
31
32 Added automatic collection of new runs on a regular time basis (settable from the configuration)
33
34 in AliShuttleConfig: new members
35
36 - triggerWait: time to wait for DIM trigger (s) before starting automatic collection of new runs
37 - mode: run mode (test, prod) -> used to build log folder (logs or logs_PROD)
38
39 in AliShuttle:
40
41 - logs now stored in logs/#RUN/DET_#RUN.log
42
43 Revision 1.66  2007/12/05 10:45:19  jgrosseo
44 changed order of arguments to TMonaLisaWriter
45
46 Revision 1.65  2007/11/26 16:58:37  acolla
47 Monalisa configuration added: host and table name
48
49 Revision 1.64  2007/11/13 16:15:47  acolla
50 DCS map is stored in a file in the temp folder where the detector is processed.
51 If the preprocessor fails, the temp folder is not removed. This will help the debugging of the problem.
52
53 Revision 1.63  2007/11/02 10:53:16  acolla
54 Protection added to AliShuttle::CopyFileLocally
55
56 Revision 1.62  2007/10/31 18:23:13  acolla
57 Further development on the Shuttle:
58
59 - Shuttle now connects to the Grid as alidaq. The OCDB and Reference folders
60 are now built from /alice/data, e.g.:
61 /alice/data/2007/LHC07a/OCDB
62
63 the year and LHC period are taken from the Shuttle.
64 Raw metadata files are stored by GRP to:
65 /alice/data/2007/LHC07a/<runNb>/Raw/RunMetadata.root
66
67 - Shuttle sends a mail to DCS experts each time DP retrieval fails.
68
69 Revision 1.61  2007/10/30 20:33:51  acolla
70 Improved managing of temporary folders, which weren't correctly handled.
71 Resolved bug introduced in StoreReferenceFile, which caused SPD preprocessor fail.
72
73 Revision 1.60  2007/10/29 18:06:16  acolla
74
75 New function StoreRunMetadataFile added to preprocessor and Shuttle interface
76 This function can be used by GRP only. It stores raw data tags merged file to the
77 raw data folder (e.g. /alice/data/2008/LHC08a/000099999/Raw).
78
79 KNOWN ISSUES:
80
81 1. Shuttle cannot write to /alice/data/ because it belongs to alidaq. Tag file is stored in /alice/simulation/... for the time being.
82 2. Due to a bug in TAlien::Mkdir, the creation of a folder in recursive mode (-p option) does not work. The problem
83 has been corrected in the root package on the Shuttle machine.
84
85 Revision 1.59  2007/10/05 12:40:55  acolla
86
87 Result error code added to AliDCSClient data members (it was "lost" with the new implementation of TMap* GetAliasValues and GetDPValues).
88
89 Revision 1.58  2007/09/28 15:27:40  acolla
90
91 AliDCSClient "multiSplit" option added in the DCS configuration
92 in AliDCSMessage: variable MAX_BODY_SIZE set to 500000
93
94 Revision 1.57  2007/09/27 16:53:13  acolla
95 Detectors can have more than one AMANDA server. SHUTTLE queries the servers sequentially,
96 merges the dcs aliases/DPs in one TMap and sends it to the preprocessor.
97
98 Revision 1.56  2007/09/14 16:46:14  jgrosseo
99 1) Connect and Close are called before and after each query, so one can
100 keep the same AliDCSClient object.
101 2) The splitting of a query is moved to GetDPValues/GetAliasValues.
102 3) Splitting interval can be specified in constructor
103
104 Revision 1.55  2007/08/06 12:26:40  acolla
105 Function Bool_t GetHLTStatus added to preprocessor. It returns the status of HLT
106 read from the run logbook.
107
108 Revision 1.54  2007/07/12 09:51:25  jgrosseo
109 removed duplicated log message in GetFile
110
111 Revision 1.53  2007/07/12 09:26:28  jgrosseo
112 updating hlt fxs base path
113
114 Revision 1.52  2007/07/12 08:06:45  jgrosseo
115 adding log messages in getfile... functions
116 adding not implemented copy constructor in alishuttleconfigholder
117
118 Revision 1.51  2007/07/03 17:24:52  acolla
119 root moved to v5-16-00. TFileMerger->Cp moved to TFile::Cp.
120
121 Revision 1.50  2007/07/02 17:19:32  acolla
122 preprocessor is run in a temp directory that is removed when process is finished.
123
124 Revision 1.49  2007/06/29 10:45:06  acolla
125 Number of columns in MySql Shuttle logbook increased by one (HLT added)
126
127 Revision 1.48  2007/06/21 13:06:19  acolla
128 GetFileSources returns dummy list with 1 source if system=DCS (better than
129 returning error as it was)
130
131 Revision 1.47  2007/06/19 17:28:56  acolla
132 HLT updated; missing map bug removed.
133
134 Revision 1.46  2007/06/09 13:01:09  jgrosseo
135 Switching to retrieval of several DCS DPs at a time (multiDPrequest)
136
137 Revision 1.45  2007/05/30 06:35:20  jgrosseo
138 Adding functionality to the Shuttle/TestShuttle:
139 o) Function to retrieve list of sources from a given system (GetFileSources with id=0)
140 o) Function to retrieve list of IDs for a given source      (GetFileIDs)
141 These functions are needed for dealing with the tag files that are saved for the GRP preprocessor
142 Example code has been added to the TestProcessor in TestShuttle
143
144 Revision 1.44  2007/05/11 16:09:32  acolla
145 Reference files for ITS, MUON and PHOS are now stored in OfflineDetName/OnlineDetName/run_...
146 example: ITS/SPD/100_filename.root
147
148 Revision 1.43  2007/05/10 09:59:51  acolla
149 Various bug fixes in StoreRefFilesToGrid; Cleaning of reference storage before processing detector (CleanReferenceStorage)
150
151 Revision 1.42  2007/05/03 08:01:39  jgrosseo
152 typo in last commit :-(
153
154 Revision 1.41  2007/05/03 08:00:48  jgrosseo
155 fixing log message when pp want to skip dcs value retrieval
156
157 Revision 1.40  2007/04/27 07:06:48  jgrosseo
158 GetFileSources returns empty list in case of no files, but successful query
159 No mails sent in testmode
160
161 Revision 1.39  2007/04/17 12:43:57  acolla
162 Correction in StoreOCDB; change of text in mail to detector expert
163
164 Revision 1.38  2007/04/12 08:26:18  jgrosseo
165 updated comment
166
167 Revision 1.37  2007/04/10 16:53:14  jgrosseo
168 redirecting sub detector stdout, stderr to sub detector log file
169
170 Revision 1.35  2007/04/04 16:26:38  acolla
171 1. Re-organization of function calls in TestPreprocessor to make it more meaningful.
172 2. Added missing dependency in test preprocessors.
173 3. in AliShuttle.cxx: processing time and memory consumption info on a single line.
174
175 Revision 1.34  2007/04/04 10:33:36  jgrosseo
176 1) Storing of files to the Grid is now done _after_ your preprocessors succeeded. This is transparent, which means that you can still use the same functions (Store, StoreReferenceData) to store files to the Grid. However, the Shuttle first stores them locally and transfers them after the preprocessor finished. The return code of these two functions has changed from UInt_t to Bool_t which gives you the success of the storing.
177 In case of an error with the Grid, the Shuttle will retry the storing later, the preprocessor does not need to be run again.
178
179 2) The meaning of the return code of the preprocessor has changed. 0 is now success and any other value means failure. This value is stored in the log and you can use it to keep details about the error condition.
180
181 3) New function StoreReferenceFile to _directly_ store a file (without opening it) to the reference storage.
182
183 4) The memory usage of the preprocessor is monitored. If it exceeds 2 GB it is terminated.
184
185 5) New function AliPreprocessor::ProcessDCS(). If you do not need to have DCS data in all cases, you can skip the processing by implementing this function and returning kFALSE under certain conditions, e.g. for a certain run type.
186 If you always need DCS data (like before), you do not need to implement it.
187
188 6) The run type has been added to the monitoring page
189
190 Revision 1.33  2007/04/03 13:56:01  acolla
191 Grid Storage at the end of preprocessing. Added virtual method to disable DCS query according to the
192 run type.
193
194 Revision 1.32  2007/02/28 10:41:56  acolla
195 Run type field added in SHUTTLE framework. Run type is read from "run type" logbook and retrieved by
196 AliPreprocessor::GetRunType() function.
197 Added some ldap definition files.
198
199 Revision 1.30  2007/02/13 11:23:21  acolla
200 Moved getters and setters of Shuttle's main OCDB/Reference, local
201 OCDB/Reference, temp and log folders to AliShuttleInterface
202
203 Revision 1.27  2007/01/30 17:52:42  jgrosseo
204 adding monalisa monitoring
205
206 Revision 1.26  2007/01/23 19:20:03  acolla
207 Removed old ldif files, added TOF, MCH ldif files. Added some options in
208 AliShuttleConfig::Print. Added in Ali Shuttle: SetShuttleTempDir and
209 SetShuttleLogDir
210
211 Revision 1.25  2007/01/15 19:13:52  acolla
212 Moved some AliInfo to AliDebug in SendMail function
213
214 Revision 1.21  2006/12/07 08:51:26  jgrosseo
215 update (alberto):
216 table, db names in ldap configuration
217 added GRP preprocessor
218 DCS data can also be retrieved by data point
219
220 Revision 1.20  2006/11/16 16:16:48  jgrosseo
221 introducing strict run ordering flag
222 removed giving preprocessor name to preprocessor, they have to know their name themselves ;-)
223
224 Revision 1.19  2006/11/06 14:23:04  jgrosseo
225 major update (Alberto)
226 o) reading of run parameters from the logbook
227 o) online offline naming conversion
228 o) standalone DCSclient package
229
230 Revision 1.18  2006/10/20 15:22:59  jgrosseo
231 o) Adding time out to the execution of the preprocessors: The Shuttle forks and the parent process monitors the child
232 o) Merging Collect, CollectAll, CollectNew function
233 o) Removing implementation of empty copy constructors (declaration still there!)
234
235 Revision 1.17  2006/10/05 16:20:55  jgrosseo
236 adapting to new CDB classes
237
238 Revision 1.16  2006/10/05 15:46:26  jgrosseo
239 applying to the new interface
240
241 Revision 1.15  2006/10/02 16:38:39  jgrosseo
242 update (alberto):
243 fixed memory leaks
244 storing of objects that failed to be stored to the grid before
245 interfacing of shuttle status table in daq system
246
247 Revision 1.14  2006/08/29 09:16:05  jgrosseo
248 small update
249
250 Revision 1.13  2006/08/15 10:50:00  jgrosseo
251 effc++ corrections (alberto)
252
253 Revision 1.12  2006/08/08 14:19:29  jgrosseo
254 Update to shuttle classes (Alberto)
255
256 - Possibility to set the full object's path in the Preprocessor's and
257 Shuttle's  Store functions
258 - Possibility to extend the object's run validity in the same classes
259 ("startValidity" and "validityInfinite" parameters)
260 - Implementation of the StoreReferenceData function to store reference
261 data in a dedicated CDB storage.
262
263 Revision 1.11  2006/07/21 07:37:20  jgrosseo
264 last run is stored after each run
265
266 Revision 1.10  2006/07/20 09:54:40  jgrosseo
267 introducing status management: The processing per subdetector is divided into several steps,
268 after each step the status is stored on disk. If the system crashes in any of the steps the Shuttle
269 can keep track of the number of failures and skips further processing after a certain threshold is
270 exceeded. These thresholds can be configured in LDAP.
271
272 Revision 1.9  2006/07/19 10:09:55  jgrosseo
273 new configuration, access to DAQ FES (Alberto)
274
275 Revision 1.8  2006/07/11 12:44:36  jgrosseo
276 adding parameters for extended validity range of data produced by preprocessor
277
278 Revision 1.7  2006/07/10 14:37:09  jgrosseo
279 small fix + todo comment
280
281 Revision 1.6  2006/07/10 13:01:41  jgrosseo
282 enhanced storing of last successfully processed run (alberto)
283
284 Revision 1.5  2006/07/04 14:59:57  jgrosseo
285 revision of AliDCSValue: Removed wrapper classes, reduced storage size per value by factor 2
286
287 Revision 1.4  2006/06/12 09:11:16  jgrosseo
288 coding conventions (Alberto)
289
290 Revision 1.3  2006/06/06 14:26:40  jgrosseo
291 o) removed files that were moved to STEER
292 o) shuttle updated to follow the new interface (Alberto)
293
294 Revision 1.2  2006/03/07 07:52:34  hristov
295 New version (B.Yordanov)
296
297 Revision 1.6  2005/11/19 17:19:14  byordano
298 RetrieveDATEEntries and RetrieveConditionsData added
299
300 Revision 1.5  2005/11/19 11:09:27  byordano
301 AliShuttle declaration added
302
303 Revision 1.4  2005/11/17 17:47:34  byordano
304 TList changed to TObjArray
305
306 Revision 1.3  2005/11/17 14:43:23  byordano
307 import to local CVS
308
309 Revision 1.1.1.1  2005/10/28 07:33:58  hristov
310 Initial import as subdirectory in AliRoot
311
312 Revision 1.2  2005/09/13 08:41:15  byordano
313 default startTime endTime added
314
315 Revision 1.4  2005/08/30 09:13:02  byordano
316 some docs added
317
318 Revision 1.3  2005/08/29 21:15:47  byordano
319 some docs added
320
321 */
322
323 //
324 // This class is the main manager for AliShuttle.
325 // It organizes the data retrieval from DCS and calls the
326 // interface methods of AliPreprocessor.
327 // For every detector in the configuration (see AliShuttleConfig),
328 // data for its set of aliases is retrieved. If an AliPreprocessor
329 // is registered for this detector, it is used
330 // according to the schema (see AliPreprocessor).
331 // If no AliPreprocessor is registered, the retrieved
332 // data is stored automatically to the underlying AliCDBStorage.
333 // The alias name is used as detSpec.
334 //
335
336 #include "AliShuttle.h"
337
338 #include "AliCDBManager.h"
339 #include "AliCDBStorage.h"
340 #include "AliCDBId.h"
341 #include "AliCDBRunRange.h"
342 #include "AliCDBPath.h"
343 #include "AliCDBEntry.h"
344 #include "AliShuttleConfig.h"
345 #include "DCSClient/AliDCSClient.h"
346 #include "AliLog.h"
347 #include "AliPreprocessor.h"
348 #include "AliShuttleStatus.h"
349 #include "AliShuttleLogbookEntry.h"
350
351 #include <TSystem.h>
352 #include <TObject.h>
353 #include <TString.h>
354 #include <TTimeStamp.h>
355 #include <TObjString.h>
356 #include <TSQLServer.h>
357 #include <TSQLResult.h>
358 #include <TSQLRow.h>
359 #include <TMutex.h>
360 #include <TSystemDirectory.h>
361 #include <TSystemFile.h>
362 #include <TFile.h>
363 #include <TGrid.h>
364 #include <TGridResult.h>
365
366 #include <TMonaLisaWriter.h>
367
368 #include <fstream>
369
370 #include <sys/types.h>
371 #include <sys/wait.h>
372
373 ClassImp(AliShuttle)
374
375 //______________________________________________________________________________________________
376 AliShuttle::AliShuttle(const AliShuttleConfig* config,
377                 UInt_t timeout, Int_t retries):
378 fConfig(config),
379 fTimeout(timeout), fRetries(retries),
380 fPreprocessorMap(),
381 fLogbookEntry(0),
382 fCurrentDetector(),
383 fStatusEntry(0),
384 fMonitoringMutex(0),
385 fLastActionTime(0),
386 fLastAction(),
387 fMonaLisa(0),
388 fTestMode(kNone),
389 fReadTestMode(kFALSE),
390 fOutputRedirected(kFALSE)
391 {
392         //
393         // config: AliShuttleConfig used
394         // timeout: timeout used for AliDCSClient connection
395         // retries: the number of retries in case of connection error.
396         //
397
398         if (!fConfig->IsValid()) AliFatal("********** !!!!! Invalid configuration !!!!! **********");
399         for(int iSys=0;iSys<4;iSys++) {
400                 fServer[iSys]=0;
401                 if (iSys < 3)
402                         fFXSlist[iSys].SetOwner(kTRUE);
403         }
404         fPreprocessorMap.SetOwner(kTRUE);
405
406         for (UInt_t iDet=0; iDet<NDetectors(); iDet++)
407                 fFirstUnprocessed[iDet] = kFALSE;
408
409         fMonitoringMutex = new TMutex();
410 }
411
412 //______________________________________________________________________________________________
413 AliShuttle::~AliShuttle()
414 {
415         //
416         // destructor
417         //
418
419         fPreprocessorMap.DeleteAll();
420         for(int iSys=0;iSys<4;iSys++)
421                 if(fServer[iSys]) {
422                         fServer[iSys]->Close();
423                         delete fServer[iSys];
424                         fServer[iSys] = 0;
425                 }
426
427         if (fStatusEntry){
428                 delete fStatusEntry;
429                 fStatusEntry = 0;
430         }
431         
432         if (fMonitoringMutex) 
433         {
434                 delete fMonitoringMutex;
435                 fMonitoringMutex = 0;
436         }
437 }
438
439 //______________________________________________________________________________________________
440 void AliShuttle::RegisterPreprocessor(AliPreprocessor* preprocessor)
441 {
442         //
443         // Registers a new AliPreprocessor.
444         // It uses GetName() as the identifier of the preprocessor.
445         // The preprocessor is registered only if there isn't any other
446         // with the same identifier (GetName()).
447         //
448
449         const char* detName = preprocessor->GetName();
450         if(GetDetPos(detName) < 0)
451                 AliFatal(Form("********** !!!!! Invalid detector name: %s !!!!! **********", detName));
452
453         if (fPreprocessorMap.GetValue(detName)) {
454                 AliWarning(Form("AliPreprocessor %s is already registered!", detName));
455                 return;
456         }
457
458         fPreprocessorMap.Add(new TObjString(detName), preprocessor);
459 }
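
// Illustration (not part of AliShuttle): a minimal, ROOT-free sketch of the
// "first registration wins" rule that RegisterPreprocessor applies above.
// The names Preprocessor, Registry, Register and Find are hypothetical
// stand-ins; the real code keeps AliPreprocessor objects in a TMap keyed
// by TObjString and owns them, which this sketch does not.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for AliPreprocessor: only the name matters here.
struct Preprocessor {
    std::string name;
};

class Registry {
public:
    // Returns true if the preprocessor was added, false if one with the
    // same name was already registered (the existing entry is kept).
    bool Register(const Preprocessor* p) {
        assert(p != nullptr);
        return fMap.emplace(p->name, p).second;
    }

    // Returns the registered preprocessor, or nullptr if there is none.
    const Preprocessor* Find(const std::string& name) const {
        auto it = fMap.find(name);
        return it == fMap.end() ? nullptr : it->second;
    }

private:
    std::map<std::string, const Preprocessor*> fMap;  // non-owning in this sketch
};
```

// A second Register() call with an already known name leaves the map
// unchanged, which corresponds to the AliWarning + return branch above.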
460 //______________________________________________________________________________________________
461 Bool_t AliShuttle::Store(const AliCDBPath& path, TObject* object,
462                 AliCDBMetaData* metaData, Int_t validityStart, Bool_t validityInfinite)
463 {
464         // Stores a CDB object in the storage for offline reconstruction. Objects that are not needed for
465         // offline reconstruction, but should be stored anyway (e.g. for debugging) should NOT be stored
466         // using this function. Use StoreReferenceData instead!
467         // It calls StoreLocally function which temporarily stores the data locally; when the preprocessor
468         // finishes the data are transferred to the main storage (Grid).
469
470         return StoreLocally(fgkLocalCDB, path, object, metaData, validityStart, validityInfinite);
471 }
472
473 //______________________________________________________________________________________________
474 Bool_t AliShuttle::StoreReferenceData(const AliCDBPath& path, TObject* object, AliCDBMetaData* metaData)
475 {
476         // Stores a CDB object in the storage for reference data. These objects will not be available during
477         // offline reconstruction. Use this function for reference data only!
478         // It calls StoreLocally function which temporarily stores the data locally; when the preprocessor
479         // finishes the data are transferred to the main storage (Grid).
480
481         return StoreLocally(fgkLocalRefStorage, path, object, metaData);
482 }
483
484 //______________________________________________________________________________________________
485 Bool_t AliShuttle::StoreLocally(const TString& localUri,
486                         const AliCDBPath& path, TObject* object, AliCDBMetaData* metaData,
487                         Int_t validityStart, Bool_t validityInfinite)
488 {
489         // Stores the object temporarily in local storage. Parameters are passed by the Store and StoreReferenceData functions.
490         // When the preprocessor finishes, the data are transferred to the main storage (Grid).
491         // The parameters are:
492         //   1) Uri of the backup storage (Local)
493         //   2) the object's path.
494         //   3) the object to be stored
495         //   4) the metaData to be associated with the object
496         //   5) the validity start run number w.r.t. the current run,
497         //      if the data is valid only for this run leave the default 0
498         //   6) specifies if the calibration data is valid for infinity (this means until updated),
499         //      typical for calibration runs, the default is kFALSE
500         //
501         // returns kFALSE on failure, kTRUE otherwise
502
503         if (fTestMode & kErrorStorage)
504         {
505                 Log(fCurrentDetector, "StoreLocally - In TESTMODE - Simulating error while storing locally");
506                 return kFALSE;
507         }
508         
509         const char* cdbType = (localUri == fgkLocalCDB) ? "CDB" : "Reference";
510
511         Int_t firstRun = GetCurrentRun() - validityStart;
512         if(firstRun < 0) {
513                 AliWarning("First valid run happens to be less than 0! Setting it to 0.");
514                 firstRun=0;
515         }
516
517         Int_t lastRun = -1;
518         if(validityInfinite) {
519                 lastRun = AliCDBRunRange::Infinity();
520         } else {
521                 lastRun = GetCurrentRun();
522         }
523
524         // Version is set to current run, it will be used later to transfer data to Grid
525         AliCDBId id(path, firstRun, lastRun, GetCurrentRun(), -1);
526
527         if(! dynamic_cast<TObjString*> (metaData->GetProperty("RunUsed(TObjString)"))){
528                 TObjString runUsed = Form("%d", GetCurrentRun());
529                 metaData->SetProperty("RunUsed(TObjString)", runUsed.Clone());
530         }
531
532         Bool_t result = kFALSE;
533
534         if (!(AliCDBManager::Instance()->GetStorage(localUri))) {
535                 Log("SHUTTLE", Form("StoreLocally - Cannot activate local %s storage", cdbType));
536         } else {
537                 result = AliCDBManager::Instance()->GetStorage(localUri)
538                                         ->Put(object, id, metaData);
539         }
540
541         if(!result) {
542
543                 Log(fCurrentDetector, Form("StoreLocally - Can't store object <%s>!", id.ToString().Data()));
544         }
545
546         return result;
547 }
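
// Illustration (not part of AliShuttle): a minimal, ROOT-free sketch of how
// StoreLocally above derives the run validity range from validityStart and
// validityInfinite. RunRange, ComputeValidity and kRunInfinity are
// hypothetical names; kRunInfinity only stands in for
// AliCDBRunRange::Infinity(), whose actual value is defined by AliRoot.

```cpp
#include <cassert>
#include <limits>

// Assumed placeholder for AliCDBRunRange::Infinity().
constexpr int kRunInfinity = std::numeric_limits<int>::max();

struct RunRange {
    int first;  // first run for which the object is valid
    int last;   // last run for which the object is valid
};

// validityStart is counted backwards from the current run; a result below
// zero is clamped to run 0, as in the AliWarning branch above.
RunRange ComputeValidity(int currentRun, int validityStart, bool validityInfinite) {
    assert(currentRun >= 0);
    int first = currentRun - validityStart;
    if (first < 0) first = 0;
    const int last = validityInfinite ? kRunInfinity : currentRun;
    return {first, last};
}
```

// With the defaults (validityStart = 0, validityInfinite = kFALSE) the
// object is valid for the current run only.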
548
549 //______________________________________________________________________________________________
550 Bool_t AliShuttle::StoreOCDB()
551 {
552         //
553         // Called when preprocessor ends successfully or when previous storage attempt failed (kStoreError status)
554         // Calls underlying StoreOCDB(const char*) function twice, for OCDB and Reference storage.
555         // Then calls StoreRefFilesToGrid to store reference files. 
556         //
557         
558         if (fTestMode & kErrorGrid)
559         {
560                 Log("SHUTTLE", "StoreOCDB - In TESTMODE - Simulating error while storing in the Grid");
561                 Log(fCurrentDetector, "StoreOCDB - In TESTMODE - Simulating error while storing in the Grid");
562                 return kFALSE;
563         }
564         
565         Log("SHUTTLE","StoreOCDB - Storing OCDB data ...");
566         Bool_t resultCDB = StoreOCDB(fgkMainCDB);
567
568         Log("SHUTTLE","StoreOCDB - Storing reference data ...");
569         Bool_t resultRef = StoreOCDB(fgkMainRefStorage);
570         
571         Log("SHUTTLE","StoreOCDB - Storing reference files ...");
572         Bool_t resultRefFiles = CopyFilesToGrid("reference");
573         
574         Bool_t resultMetadata = kTRUE;
575         if(fCurrentDetector == "GRP") 
576         {
577                 Log("SHUTTLE","StoreOCDB - Storing Run Metadata file ...");
578                 resultMetadata = CopyFilesToGrid("metadata");
579         }
580         
581         return resultCDB && resultRef && resultRefFiles && resultMetadata;
582 }
583
584 //______________________________________________________________________________________________
585 Bool_t AliShuttle::StoreOCDB(const TString& gridURI)
586 {
587         //
588         // Called by StoreOCDB(), performs actual storage to the main OCDB and reference storages (Grid)
589         //
590
591         TObjArray* gridIds=0;
592
593         Bool_t result = kTRUE;
594
595         const char* type = 0;
596         TString localURI;
597         if(gridURI == fgkMainCDB) {
598                 type = "OCDB";
599                 localURI = fgkLocalCDB;
600         } else if(gridURI == fgkMainRefStorage) {
601                 type = "reference";
602                 localURI = fgkLocalRefStorage;
603         } else {
604                 AliError(Form("Invalid storage URI: %s", gridURI.Data()));
605                 return kFALSE;
606         }
607
608         AliCDBManager* man = AliCDBManager::Instance();
609
610         AliCDBStorage *gridSto = man->GetStorage(gridURI);
611         if(!gridSto) {
612                 Log("SHUTTLE",
613                         Form("StoreOCDB - cannot activate main %s storage", type));
614                 return kFALSE;
615         }
616
617         gridIds = gridSto->GetQueryCDBList();
618
619         // get objects previously stored in local CDB
620         AliCDBStorage *localSto = man->GetStorage(localURI);
621         if(!localSto) {
622                 Log("SHUTTLE",
623                         Form("StoreOCDB - cannot activate local %s storage", type));
624                 return kFALSE;
625         }
626         AliCDBPath aPath(GetOfflineDetName(fCurrentDetector.Data()),"*","*");
627         // Local objects were stored with current run as Grid version!
628         TList* localEntries = localSto->GetAll(aPath.GetPath(), GetCurrentRun(), GetCurrentRun());
629         localEntries->SetOwner(1);
630
631         // loop on local stored objects
632         TIter localIter(localEntries);
633         AliCDBEntry *aLocEntry = 0;
634         while((aLocEntry = dynamic_cast<AliCDBEntry*> (localIter.Next()))){
635                 aLocEntry->SetOwner(1);
636                 AliCDBId aLocId = aLocEntry->GetId();
637                 aLocEntry->SetVersion(-1);
638                 aLocEntry->SetSubVersion(-1);
639
640                 // If local object is valid up to infinity we store it only if it is
641                 // the first unprocessed run!
642                 if (aLocId.GetLastRun() == AliCDBRunRange::Infinity() &&
643                         !fFirstUnprocessed[GetDetPos(fCurrentDetector)])
644                 {
645                         Log("SHUTTLE", Form("StoreOCDB - %s: object %s has validity infinite but "
646                                                 "there are previous unprocessed runs!",
647                                                 fCurrentDetector.Data(), aLocId.GetPath().Data()));
648                         continue;
649                 }
650
651                 // loop on Grid valid Id's
652                 Bool_t store = kTRUE;
653                 TIter gridIter(gridIds);
654                 AliCDBId* aGridId = 0;
655                 while((aGridId = dynamic_cast<AliCDBId*> (gridIter.Next()))){
656                         if(aGridId->GetPath() != aLocId.GetPath()) continue;
657                         // skip all objects valid up to infinity
658                         if(aGridId->GetLastRun() == AliCDBRunRange::Infinity()) continue;
659                         // if we get here, it means there's already some more recent object stored on Grid!
660                         store = kFALSE;
661                         break;
662                 }
663
664                 // Upload only if no more recent object was found on the Grid
665                 Bool_t storeOk = store && gridSto->Put(aLocEntry);
666                 if(!store || storeOk){
667
668                         if (!store)
669                         {
670                                 Log(fCurrentDetector.Data(),
671                                         Form("StoreOCDB - A more recent object already exists in %s storage: <%s>",
672                                                 type, aGridId->ToString().Data()));
673                         } else {
674                                 Log("SHUTTLE",
675                                         Form("StoreOCDB - Object <%s> successfully put into %s storage",
676                                                 aLocId.ToString().Data(), type));
677                                 Log(fCurrentDetector.Data(),
678                                         Form("StoreOCDB - Object <%s> successfully put into %s storage",
679                                                 aLocId.ToString().Data(), type));
680                         }
681
682                         // removing local filename...
683                         TString filename;
684                         localSto->IdToFilename(aLocId, filename);
685                         Log("SHUTTLE", Form("StoreOCDB - Removing local file %s", filename.Data()));
686                         RemoveFile(filename.Data());
687                         continue;
688                 } else  {
689                         Log("SHUTTLE",
690                                 Form("StoreOCDB - Grid %s storage of object <%s> failed",
691                                         type, aLocId.ToString().Data()));
692                         Log(fCurrentDetector.Data(),
693                                 Form("StoreOCDB - Grid %s storage of object <%s> failed",
694                                         type, aLocId.ToString().Data()));
695                         result = kFALSE;
696                 }
697         }
698         delete localEntries; // the list owns its entries, so they are deleted with it
699
700         return result;
701 }
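
// Illustration (not part of AliShuttle): a minimal, ROOT-free sketch of the
// checks that the loop in StoreOCDB(const TString&) above applies to each
// locally stored object. ObjectId, UploadDecision, DecideUpload and
// kRunInfinity are hypothetical names; kRunInfinity only stands in for
// AliCDBRunRange::Infinity(), and gridIds plays the role of the list
// returned by GetQueryCDBList().

```cpp
#include <cassert>
#include <limits>
#include <string>
#include <vector>

// Assumed placeholder for AliCDBRunRange::Infinity().
constexpr int kRunInfinity = std::numeric_limits<int>::max();

struct ObjectId {
    std::string path;  // e.g. "ITS/Calib/Something"
    int lastRun;       // last run of the validity range
};

enum class UploadDecision {
    kKeepLocal,   // infinite validity, but earlier runs are still unprocessed
    kSuperseded,  // the Grid already holds an object with finite validity
    kUpload       // nothing prevents the upload
};

UploadDecision DecideUpload(const ObjectId& local,
                            const std::vector<ObjectId>& gridIds,
                            bool firstUnprocessed) {
    assert(!local.path.empty());

    // Objects valid up to infinity are uploaded only for the first unprocessed run.
    if (local.lastRun == kRunInfinity && !firstUnprocessed)
        return UploadDecision::kKeepLocal;

    for (const ObjectId& grid : gridIds) {
        if (grid.path != local.path) continue;         // different object
        if (grid.lastRun == kRunInfinity) continue;    // infinite-validity entries are ignored
        return UploadDecision::kSuperseded;            // same path, finite validity
    }
    return UploadDecision::kUpload;
}
```

// In the function above, kKeepLocal corresponds to the early "continue",
// kSuperseded to store == kFALSE, and kUpload to the call to gridSto->Put().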
702
703 //______________________________________________________________________________________________
704 Bool_t AliShuttle::CleanReferenceStorage(const char* detector)
705 {
706         // clears the directory used to store reference files of a given subdetector
707   
708         AliCDBManager* man = AliCDBManager::Instance();
709         AliCDBStorage* sto = man->GetStorage(fgkLocalRefStorage);
710         TString localBaseFolder = sto->GetBaseFolder();
711
712         TString targetDir = GetRefFilePrefix(localBaseFolder.Data(), detector);
713         
714         Log("SHUTTLE", Form("CleanReferenceStorage - Cleaning %s", targetDir.Data()));
715
716         TString begin;
717         begin.Form("%d_", GetCurrentRun());
718         
719         TSystemDirectory* baseDir = new TSystemDirectory("/", targetDir);
720         if (!baseDir)
721                 return kTRUE;
722                 
723         TList* dirList = baseDir->GetListOfFiles();
724         delete baseDir;
725         
726         if (!dirList) return kTRUE;
727                         
728         if (dirList->GetEntries() < 3) 
729         {
730                 delete dirList;
731                 return kTRUE;
732         }
733                                 
734         Int_t nDirs = 0, nDel = 0;
735         TIter dirIter(dirList);
736         TSystemFile* entry = 0;
737
738         Bool_t success = kTRUE;
739         
740         while ((entry = dynamic_cast<TSystemFile*> (dirIter.Next())))
741         {                                       
742                 if (entry->IsDirectory())
743                         continue;
744                 
745                 TString fileName(entry->GetName());
746                 if (!fileName.BeginsWith(begin))
747                         continue;
748                         
749                 nDirs++;
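750                 // delete file; the listed name is relative to targetDir, so build the full path
751                 TString fullPath(Form("%s/%s", targetDir.Data(), fileName.Data()));
752                 Int_t result = gSystem->Unlink(fullPath.Data());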
753                 
754                 if (result)
755                 {
756                         Log("SHUTTLE", Form("CleanReferenceStorage - Could not delete file %s!", fileName.Data()));
757                         success = kFALSE;
758                 } else {
759                         nDel++;
760                 }
761         }
762
763         if(nDirs > 0)
764                 Log("SHUTTLE", Form("CleanReferenceStorage - %d (over %d) reference files in folder %s were deleted.", 
765                         nDel, nDirs, targetDir.Data()));
766
767                 
768         delete dirList;
769         return success;
796 }
797
798 //______________________________________________________________________________________________
799 Bool_t AliShuttle::StoreReferenceFile(const char* detector, const char* localFile, const char* gridFileName)
800 {
801         //
802         // Stores reference file directly (without opening it). This function stores the file locally.
803         //
804         // The file is stored under the following location: 
805         // <base folder of local reference storage>/<DET>/<RUN#>_<gridFileName>
806         // where <gridFileName> is the second parameter given to the function
807         // 
808         
809         if (fTestMode & kErrorStorage)
810         {
811                 Log(fCurrentDetector, "StoreReferenceFile - In TESTMODE - Simulating error while storing locally");
812                 return kFALSE;
813         }
814         
815         AliCDBManager* man = AliCDBManager::Instance();
816         AliCDBStorage* sto = man->GetStorage(fgkLocalRefStorage);
817         
818         TString localBaseFolder = sto->GetBaseFolder();
819         
820         TString target = GetRefFilePrefix(localBaseFolder.Data(), detector);    
821         target.Append(Form("/%d_%s", GetCurrentRun(), gridFileName));
822         
823         return CopyFileLocally(localFile, target);
824 }
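
// Illustrative sketch, not part of the original file: how the local target path
// documented in StoreReferenceFile, <base>/<DET>/<RUN#>_<gridFileName>, is put together.
// BuildRefTargetSketch is a hypothetical helper; it uses std::string instead of TString
// and assumes the detector has no extra sub-folder (GetRefFilePrefix adds one for
// ITS, MUON and PHOS).
#include <string>

namespace {
std::string BuildRefTargetSketch(const std::string& base, const std::string& det,
                                 int run, const std::string& gridFileName)
{
        // same layout as target.Append(Form("/%d_%s", GetCurrentRun(), gridFileName))
        return base + "/" + det + "/" + std::to_string(run) + "_" + gridFileName;
}
}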
825
826 //______________________________________________________________________________________________
827 Bool_t AliShuttle::StoreRunMetadataFile(const char* localFile, const char* gridFileName)
828 {
829         //
830         // Stores Run metadata file to the Grid, in the run folder
831         //
832         // Only GRP can call this function.
833         
834         if (fTestMode & kErrorStorage)
835         {
836                 Log(fCurrentDetector, "StoreRunMetadataFile - In TESTMODE - Simulating error while storing locally");
837                 return kFALSE;
838         }
839         
840         AliCDBManager* man = AliCDBManager::Instance();
841         AliCDBStorage* sto = man->GetStorage(fgkLocalRefStorage);
842         
843         TString localBaseFolder = sto->GetBaseFolder();
844         
845         // Build Run level folder
846         // folder = /alice/data/year/lhcPeriod/runNb/Raw
847         
848                 
849         TString lhcPeriod = GetLHCPeriod();     
850         if (lhcPeriod.Length() == 0) 
851         {
852                 Log("SHUTTLE","StoreRunMetadataFile - LHCPeriod not found in logbook!");
853                 return kFALSE;
854         }
855         
856         TString target = Form("%s/GRP/RunMetadata/alice/data/%d/%s/%09d/Raw/%s", 
857                                 localBaseFolder.Data(), GetCurrentYear(), 
858                                 lhcPeriod.Data(), GetCurrentRun(), gridFileName);
859                                         
860         return CopyFileLocally(localFile, target);
861 }
862
863 //______________________________________________________________________________________________
864 Bool_t AliShuttle::CopyFileLocally(const char* localFile, const TString& target)
865 {
866         //
867         // Stores file locally. Called by StoreReferenceFile and StoreRunMetadataFile
868         // Files are temporarily stored in the local reference storage. When the preprocessor 
869         // finishes, the Shuttle calls CopyFilesToGrid to transfer the files to AliEn 
870         // (in reference or run level folders)
871         //
872         
873         TString targetDir(target(0, target.Last('/')));
874         
875         // open the target directory; create it if it does not exist
876         void* dir = gSystem->OpenDirectory(targetDir.Data());
877         if (dir == NULL) {
878                 if (gSystem->mkdir(targetDir.Data(), kTRUE)) {
879                         Log("SHUTTLE", Form("CopyFileLocally - Can't create directory <%s>", targetDir.Data()));
880                         return kFALSE;
881                 }
882
883         } else {
884                 gSystem->FreeDirectory(dir);
885         }
886         
887         Int_t result = 0;
888         
889         result = gSystem->GetPathInfo(localFile, 0, (Long64_t*) 0, 0, 0);
890         if (result)
891         {
892                 Log("SHUTTLE", Form("CopyFileLocally - %s does not exist", localFile));
893                 return kFALSE;
894         }
895
896         result = gSystem->GetPathInfo(target, 0, (Long64_t*) 0, 0, 0);
897         if (!result)
898         {
899                 Log("SHUTTLE", Form("CopyFileLocally - target file %s already exists, removing...", target.Data()));
900                 if (gSystem->Unlink(target.Data()))
901                 {
902                         Log("SHUTTLE", Form("CopyFileLocally - Could not remove existing target file %s!", target.Data()));
903                         return kFALSE;
904                 }
905         }       
906         
907         result = gSystem->CopyFile(localFile, target);
908
909         if (result == 0)
910         {
911                 Log("SHUTTLE", Form("CopyFileLocally - File %s stored locally to %s", localFile, target.Data()));
912                 return kTRUE;
913         }
914         else
915         {
916                 Log("SHUTTLE", Form("CopyFileLocally - Could not store file %s to %s! Error code = %d", 
917                                 localFile, target.Data(), result));
918                 return kFALSE;
919         }       
920
921
922
923 }
924
925 //______________________________________________________________________________________________
926 Bool_t AliShuttle::CopyFilesToGrid(const char* type)
927 {
928         //
929         // Transfers local files to the Grid. Local files can be reference files 
930         // or run metadata file (from GRP only).
931         //
932         // According to the type (ref, metadata) the files are stored under the following location: 
933         // ref --> <base folder of reference storage>/<DET>/<RUN#>_<gridFileName>
934         // metadata --> <run data folder>/<MetadataFileName>
935         //
936                 
937         AliCDBManager* man = AliCDBManager::Instance();
938         AliCDBStorage* sto = man->GetStorage(fgkLocalRefStorage);
939         if (!sto)
940                 return kFALSE;
941         TString localBaseFolder = sto->GetBaseFolder();
942         
943         TString dir;
944         TString alienDir;
945         TString begin;
946         
947         if (strcmp(type, "reference") == 0) 
948         {
949                 dir = GetRefFilePrefix(localBaseFolder.Data(), fCurrentDetector.Data());
950                 AliCDBStorage* gridSto = man->GetStorage(fgkMainRefStorage);
951                 if (!gridSto)
952                         return kFALSE;
953                 TString gridBaseFolder = gridSto->GetBaseFolder();
954                 alienDir = GetRefFilePrefix(gridBaseFolder.Data(), fCurrentDetector.Data());
955                 begin = Form("%d_", GetCurrentRun());
956         } 
957         else if (strcmp(type, "metadata") == 0)
958         {
959                         
960                 TString lhcPeriod = GetLHCPeriod();
961         
962                 if (lhcPeriod.Length() == 0) 
963                 {
964                         Log("SHUTTLE","CopyFilesToGrid - LHCPeriod not found in logbook!");
965                         return kFALSE;
966                 }
967                 
968                 dir = Form("%s/GRP/RunMetadata/alice/data/%d/%s/%09d/Raw", 
969                                 localBaseFolder.Data(), GetCurrentYear(), 
970                                 lhcPeriod.Data(), GetCurrentRun());
971                 alienDir = dir(dir.Index("/alice/data/"), dir.Length());
972                 
973                 begin = "";
974         }
975         else 
976         {
977                 Log("SHUTTLE", "CopyFilesToGrid - Unexpected: type label must be reference or metadata!");
978                 return kFALSE;
979         }
980                 
981         TSystemDirectory* baseDir = new TSystemDirectory("/", dir);
982         if (!baseDir)
983                 return kTRUE;
984                 
985         TList* dirList = baseDir->GetListOfFiles();
986         delete baseDir;
987         
988         if (!dirList) return kTRUE;
989                 
990         if (dirList->GetEntries() < 3) 
991         {
992                 delete dirList;
993                 return kTRUE;
994         }
995                         
996         if (!gGrid)
997         { 
998                 Log("SHUTTLE", "CopyFilesToGrid - Connection to Grid failed: Cannot continue!");
999                 delete dirList;
1000                 return kFALSE;
1001         }
1002         
1003         Int_t nDirs = 0, nTransfer = 0;
1004         TIter dirIter(dirList);
1005         TSystemFile* entry = 0;
1006
1007         Bool_t success = kTRUE;
1008         Bool_t first = kTRUE;
1009         
1010         while ((entry = dynamic_cast<TSystemFile*> (dirIter.Next())))
1011         {                       
1012                 if (entry->IsDirectory())
1013                         continue;
1014                         
1015                 TString fileName(entry->GetName());
1016                 if (!fileName.BeginsWith(begin))
1017                         continue;
1018                         
1019                 nDirs++;
1020                         
1021                 if (first)
1022                 {
1023                         first = kFALSE;
1024                         // check that folder exists, otherwise create it
1025                         TGridResult* result = gGrid->Ls(alienDir.Data(), "a");
1026                         
1027                         if (!result)
1028                         {
1029                                 delete dirList;
1030                                 return kFALSE;
1031                         }
1032                         
1033                         if (!result->GetFileName(1)) // TODO: It looks like element 0 is always 0!!
1034                         {
1035                                 // TODO It does not work currently! Bug in TAliEn::Mkdir
1036                                 // TODO Manually fixed in local root v5-16-00
1037                                 if (!gGrid->Mkdir(alienDir.Data(),"-p",0))
1038                                 {
1039                                         Log("SHUTTLE", Form("CopyFilesToGrid - Cannot create directory %s",
1040                                                         alienDir.Data()));
1041                                         delete dirList;
1042                                         return kFALSE;
1043                                 } else {
1044                                         Log("SHUTTLE",Form("CopyFilesToGrid - Folder %s created", alienDir.Data()));
1045                                 }
1046                                 
1047                         } else {
1048                                 Log("SHUTTLE",Form("CopyFilesToGrid - Folder %s found", alienDir.Data()));
1049                         }
1050                 }
1051                         
1052                 TString fullLocalPath;
1053                 fullLocalPath.Form("%s/%s", dir.Data(), fileName.Data());
1054                 
1055                 TString fullGridPath;
1056                 fullGridPath.Form("alien://%s/%s", alienDir.Data(), fileName.Data());
1057
1058                 Bool_t result = TFile::Cp(fullLocalPath, fullGridPath);
1059                 
1060                 if (result)
1061                 {
1062                         Log("SHUTTLE", Form("CopyFilesToGrid - Copying local file %s to %s succeeded!", 
1063                                                 fullLocalPath.Data(), fullGridPath.Data()));
1064                         RemoveFile(fullLocalPath);
1065                         nTransfer++;
1066                 }
1067                 else
1068                 {
1069                         Log("SHUTTLE", Form("CopyFilesToGrid - Copying local file %s to %s FAILED!", 
1070                                                 fullLocalPath.Data(), fullGridPath.Data()));
1071                         success = kFALSE;
1072                 }
1073         }
1074
1075         Log("SHUTTLE", Form("CopyFilesToGrid - %d (over %d) files in folder %s copied to Grid.", 
1076                                                 nTransfer, nDirs, dir.Data()));
1077
1078                 
1079         delete dirList;
1080         return success;
1081 }
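
// Illustrative sketch, not part of the original file: how the "metadata" branch of
// CopyFilesToGrid relates the local run folder to the Grid folder. The Grid folder is
// the local path with everything before "/alice/data/" cut off. Both helpers are
// hypothetical and use std::string instead of TString; base folder, year, period and
// run number are caller-supplied assumptions.
#include <cstdio>
#include <string>

namespace {
// local run-level folder, mirroring the Form(...) call in CopyFilesToGrid
std::string LocalRunFolderSketch(const std::string& base, int year,
                                 const std::string& period, int run)
{
        char runStr[16];
        std::snprintf(runStr, sizeof(runStr), "%09d", run); // run number zero-padded to 9 digits
        return base + "/GRP/RunMetadata/alice/data/" + std::to_string(year)
                + "/" + period + "/" + runStr + "/Raw";
}

// Grid folder: substring of the local folder starting at "/alice/data/";
// returns an empty string if the marker is not found
std::string GridRunFolderSketch(const std::string& localDir)
{
        const std::string::size_type pos = localDir.find("/alice/data/");
        return pos == std::string::npos ? std::string() : localDir.substr(pos);
}
}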
1082
1083 //______________________________________________________________________________________________
1084 const char* AliShuttle::GetRefFilePrefix(const char* base, const char* detector)
1085 {
1086         //
1087         // Get folder name of reference files 
1088         //
1089
1090         TString offDetStr(GetOfflineDetName(detector));
1091         static TString dir; // static so that the pointer returned by dir.Data() stays valid after return
1092         if (offDetStr == "ITS" || offDetStr == "MUON" || offDetStr == "PHOS")
1093         {
1094                 dir.Form("%s/%s/%s", base, offDetStr.Data(), detector);
1095         } else {
1096                 dir.Form("%s/%s", base, offDetStr.Data());
1097         }
1098         
1099         return dir.Data();
1100         
1101
1102 }
1103
1104 //______________________________________________________________________________________________
1105 void AliShuttle::CleanLocalStorage(const TString& uri)
1106 {
1107         //
1108         // Called in case the preprocessor is declared failed. Remove remaining objects from the local storages.
1109         //
1110
1111         const char* type = 0;
1112         if(uri == fgkLocalCDB) {
1113                 type = "OCDB";
1114         } else if(uri == fgkLocalRefStorage) {
1115                 type = "Reference";
1116         } else {
1117                 AliError(Form("Invalid storage URI: %s", uri.Data()));
1118                 return;
1119         }
1120
1121         AliCDBManager* man = AliCDBManager::Instance();
1122
1123         // open local storage
1124         AliCDBStorage *localSto = man->GetStorage(uri);
1125         if(!localSto) {
1126                 Log("SHUTTLE",
1127                         Form("CleanLocalStorage - cannot activate local %s storage", type));
1128                 return;
1129         }
1130
1131         TString filename(Form("%s/%s/*/Run*_v%d_s*.root",
1132                 localSto->GetBaseFolder().Data(), GetOfflineDetName(fCurrentDetector.Data()), GetCurrentRun()));
1133
1134         AliDebug(2, Form("filename = %s", filename.Data()));
1135
1136         Log("SHUTTLE", Form("Removing remaining local files for run %d and detector %s ...",
1137                 GetCurrentRun(), fCurrentDetector.Data()));
1138
1139         RemoveFile(filename.Data());
1140
1141 }
1142
1143 //______________________________________________________________________________________________
1144 void AliShuttle::RemoveFile(const char* filename)
1145 {
1146         //
1147         // removes local file
1148         //
1149
1150         TString command(Form("rm -f %s", filename));
1151
1152         Int_t result = gSystem->Exec(command.Data());
1153         if(result != 0)
1154         {
1155                 Log("SHUTTLE", Form("RemoveFile - %s: Cannot remove file %s!",
1156                         fCurrentDetector.Data(), filename));
1157         }
1158 }
1159
1160 //______________________________________________________________________________________________
1161 AliShuttleStatus* AliShuttle::ReadShuttleStatus()
1162 {
1163         //
1164         // Reads the AliShuttleStatus from the CDB
1165         //
1166
1167         if (fStatusEntry){
1168                 delete fStatusEntry;
1169                 fStatusEntry = 0;
1170         }
1171
1172         fStatusEntry = AliCDBManager::Instance()->GetStorage(GetLocalCDB())
1173                 ->Get(Form("/SHUTTLE/STATUS/%s", fCurrentDetector.Data()), GetCurrentRun());
1174
1175         if (!fStatusEntry) return 0;
1176         fStatusEntry->SetOwner(1);
1177
1178         AliShuttleStatus* status = dynamic_cast<AliShuttleStatus*> (fStatusEntry->GetObject());
1179         if (!status) {
1180                 AliError("Invalid object stored to CDB!");
1181                 return 0;
1182         }
1183
1184         return status;
1185 }
1186
1187 //______________________________________________________________________________________________
1188 Bool_t AliShuttle::WriteShuttleStatus(AliShuttleStatus* status)
1189 {
1190         //
1191         // writes the status for one subdetector
1192         //
1193
1194         if (fStatusEntry){
1195                 delete fStatusEntry;
1196                 fStatusEntry = 0;
1197         }
1198
1199         Int_t run = GetCurrentRun();
1200
1201         AliCDBId id(AliCDBPath("SHUTTLE", "STATUS", fCurrentDetector), run, run);
1202
1203         fStatusEntry = new AliCDBEntry(status, id, new AliCDBMetaData);
1204         fStatusEntry->SetOwner(1);
1205
1206         UInt_t result = AliCDBManager::Instance()->GetStorage(fgkLocalCDB)->Put(fStatusEntry);
1207
1208         if (!result) {
1209                 Log("SHUTTLE", Form("WriteShuttleStatus - Failed for %s, run %d",
1210                                                 fCurrentDetector.Data(), run));
1211                 return kFALSE;
1212         }
1213         
1214         SendMLInfo();
1215
1216         return kTRUE;
1217 }
1218
1219 //______________________________________________________________________________________________
1220 void AliShuttle::UpdateShuttleStatus(AliShuttleStatus::Status newStatus, Bool_t increaseCount)
1221 {
1222         //
1223         // changes the AliShuttleStatus for the given detector and run to the given status
1224         //
1225
1226         if (!fStatusEntry){
1227                 AliError("UNEXPECTED: fStatusEntry empty");
1228                 return;
1229         }
1230
1231         AliShuttleStatus* status = dynamic_cast<AliShuttleStatus*> (fStatusEntry->GetObject());
1232
1233         if (!status){
1234                 Log("SHUTTLE", "UpdateShuttleStatus - UNEXPECTED: status could not be read from current CDB entry");
1235                 return;
1236         }
1237
1238         TString actionStr = Form("UpdateShuttleStatus - %s: Changing state from %s to %s",
1239                                 fCurrentDetector.Data(),
1240                                 status->GetStatusName(),
1241                                 status->GetStatusName(newStatus));
1242         Log("SHUTTLE", actionStr);
1243         SetLastAction(actionStr);
1244
1245         status->SetStatus(newStatus);
1246         if (increaseCount) status->IncreaseCount();
1247
1248         AliCDBManager::Instance()->GetStorage(fgkLocalCDB)->Put(fStatusEntry);
1249
1250         SendMLInfo();
1251 }
1252
1253 //______________________________________________________________________________________________
1254 void AliShuttle::SendMLInfo()
1255 {
1256         //
1257         // sends ML information about the current status of the current detector being processed
1258         //
1259         
1260         AliShuttleStatus* status = dynamic_cast<AliShuttleStatus*> (fStatusEntry->GetObject());
1261         
1262         if (!status){
1263                 Log("SHUTTLE", "SendMLInfo - UNEXPECTED: status could not be read from current CDB entry");
1264                 return;
1265         }
1266         
1267         TMonaLisaText  mlStatus(Form("%s_status", fCurrentDetector.Data()), status->GetStatusName());
1268         TMonaLisaValue mlRetryCount(Form("%s_count", fCurrentDetector.Data()), status->GetCount());
1269
1270         TList mlList;
1271         mlList.Add(&mlStatus);
1272         mlList.Add(&mlRetryCount);
1273
1274         TString mlID;
1275         mlID.Form("%d", GetCurrentRun());
1276         fMonaLisa->SendParameters(&mlList, mlID);
1277 }
1278
1279 //______________________________________________________________________________________________
1280 Bool_t AliShuttle::ContinueProcessing()
1281 {
1282         // this function reads the AliShuttleStatus information from CDB and
1283         // checks if the processing should be continued
1284         // if yes it returns kTRUE and updates the AliShuttleStatus with nextStatus
1285
1286         if (!fConfig->HostProcessDetector(fCurrentDetector)) return kFALSE;
1287
1288         AliPreprocessor* aPreprocessor =
1289                 dynamic_cast<AliPreprocessor*> (fPreprocessorMap.GetValue(fCurrentDetector));
1290         if (!aPreprocessor)
1291         {
1292                 Log("SHUTTLE", Form("ContinueProcessing - %s: no preprocessor registered", fCurrentDetector.Data()));
1293                 return kFALSE;
1294         }
1295
1296         AliShuttleLogbookEntry::Status entryStatus =
1297                 fLogbookEntry->GetDetectorStatus(fCurrentDetector);
1298
1299         if(entryStatus != AliShuttleLogbookEntry::kUnprocessed) {
1300                 Log("SHUTTLE", Form("ContinueProcessing - %s is %s",
1301                                 fCurrentDetector.Data(),
1302                                 fLogbookEntry->GetDetectorStatusName(entryStatus)));
1303                 return kFALSE;
1304         }
1305
1306         // if we get here, according to Shuttle logbook subdetector is in UNPROCESSED state
1307
1308         // check if current run is first unprocessed run for current detector
1309         if (fConfig->StrictRunOrder(fCurrentDetector) &&
1310                 !fFirstUnprocessed[GetDetPos(fCurrentDetector)])
1311         {
1312                 if (fTestMode == kNone)
1313                 {
1314                         Log("SHUTTLE", Form("ContinueProcessing - %s requires strict run ordering"
1315                                         " but this is not the first unprocessed run!", fCurrentDetector.Data()));
1316                         return kFALSE;
1317                 }
1318                 else
1319                 {
1320                         Log("SHUTTLE", Form("ContinueProcessing - In TESTMODE - "
1321                                         "Although %s requires strict run ordering "
1322                                         "and this is not the first unprocessed run, "
1323                                         "the SHUTTLE continues", fCurrentDetector.Data()));
1324                 }
1325         }
1326
1327         AliShuttleStatus* status = ReadShuttleStatus();
1328         if (!status) {
1329                 // first time
1330                 Log("SHUTTLE", Form("ContinueProcessing - %s: Processing first time",
1331                                 fCurrentDetector.Data()));
1332                 status = new AliShuttleStatus(AliShuttleStatus::kStarted);
1333                 return WriteShuttleStatus(status);
1334         }
1335
1336         // The following two cases shouldn't happen if Shuttle Logbook was correctly updated.
1337         // If it happens it may mean Logbook updating failed... let's do it now!
1338         if (status->GetStatus() == AliShuttleStatus::kDone ||
1339             status->GetStatus() == AliShuttleStatus::kFailed){
1340                 Log("SHUTTLE", Form("ContinueProcessing - %s is already %s. Updating Shuttle Logbook",
1341                                         fCurrentDetector.Data(),
1342                                         status->GetStatusName(status->GetStatus())));
1343                 UpdateShuttleLogbook(fCurrentDetector.Data(),
1344                                         status->GetStatusName(status->GetStatus()));
1345                 return kFALSE;
1346         }
1347
1348         if (status->GetStatus() == AliShuttleStatus::kStoreError) {
1349                 Log("SHUTTLE",
1350                         Form("ContinueProcessing - %s: Grid storage of one or more "
1351                                 "objects failed. Trying again now",
1352                                 fCurrentDetector.Data()));
1353                 UpdateShuttleStatus(AliShuttleStatus::kStoreStarted);
1354                 if (StoreOCDB()){
1355                         Log("SHUTTLE", Form("ContinueProcessing - %s: all objects "
1356                                 "successfully stored into main storage",
1357                                 fCurrentDetector.Data()));
1358                 } else {
1359                         Log("SHUTTLE",
1360                                 Form("ContinueProcessing - %s: Grid storage failed again",
1361                                         fCurrentDetector.Data()));
1362                         UpdateShuttleStatus(AliShuttleStatus::kStoreError);
1363                 }
1364                 return kFALSE;
1365         }
1366
1367         // if we get here, there is a restart
1368         Bool_t cont = kFALSE;
1369
1370         // abort conditions
1371         if (status->GetCount() >= fConfig->GetMaxRetries()) {
1372                 Log("SHUTTLE", Form("ContinueProcessing - %s failed %d times in status %s - "
1373                                 "Updating Shuttle Logbook", fCurrentDetector.Data(),
1374                                 status->GetCount(), status->GetStatusName()));
1375                 UpdateShuttleLogbook(fCurrentDetector.Data(), "FAILED");
1376                 UpdateShuttleStatus(AliShuttleStatus::kFailed);
1377
1378                 // there may still be objects in local OCDB and reference storage
1379                 // and FXS databases may not have been updated: do it now!
1380                 
1381                 // TODO Currently disabled, we want to keep files in case of failure!
1382                 // CleanLocalStorage(fgkLocalCDB);
1383                 // CleanLocalStorage(fgkLocalRefStorage);
1384                 // UpdateTableFailCase();
1385                 
1386                 // Send mail to detector expert!
1387                 Log("SHUTTLE", Form("ContinueProcessing - Sending mail to %s expert...", 
1388                                         fCurrentDetector.Data()));
1389                 if (!SendMail())
1390                         Log("SHUTTLE", Form("ContinueProcessing - Could not send mail to %s expert",
1391                                         fCurrentDetector.Data()));
1392
1393         } else {
1394                 Log("SHUTTLE", Form("ContinueProcessing - %s: restarting. "
1395                                 "Aborted before with %s. Retry number %d.", fCurrentDetector.Data(),
1396                                 status->GetStatusName(), status->GetCount()));
1397                 Bool_t increaseCount = kTRUE;
1398                 if (status->GetStatus() == AliShuttleStatus::kDCSError || 
1399                         status->GetStatus() == AliShuttleStatus::kDCSStarted)
1400                                 increaseCount = kFALSE;
1401                                 
1402                 UpdateShuttleStatus(AliShuttleStatus::kStarted, increaseCount);
1403                 cont = kTRUE;
1404         }
1405
1406         return cont;
1407 }
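
// Illustrative sketch, not part of the original file: the restart decision taken by
// ContinueProcessing once a stored status has been read, reduced to a pure function.
// The enum and struct are simplified, hypothetical stand-ins for AliShuttleStatus. The
// first-time case (no stored status) and the logbook, mail and storage side effects
// are left out.
namespace {
enum StatusSketch { kStartedS, kDCSStartedS, kDCSErrorS, kPPErrorS, kStoreErrorS, kDoneS, kFailedS };

struct RestartDecisionSketch {
        bool restart;        // run the preprocessor again
        bool increaseCount;  // count this attempt as a retry
};

RestartDecisionSketch DecideRestartSketch(StatusSketch status, int count, int maxRetries)
{
        RestartDecisionSketch d = { false, false };
        if (status == kDoneS || status == kFailedS) return d; // only the logbook is updated
        if (status == kStoreErrorS) return d;                 // only the Grid storage is retried
        if (count >= maxRetries) return d;                    // detector is marked FAILED
        d.restart = true;
        // DCS problems do not count as a retry of the preprocessor
        d.increaseCount = !(status == kDCSErrorS || status == kDCSStartedS);
        return d;
}
}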
1408
1409 //______________________________________________________________________________________________
1410 Bool_t AliShuttle::Process(AliShuttleLogbookEntry* entry)
1411 {
1412         //
1413         // Makes data retrieval for all detectors in the configuration.
1414         // entry: Shuttle logbook entry, contains run parameters and status of detectors
1415         // (Unprocessed, Inactive, Failed or Done).
1416         // Returns kFALSE if an error occurred and kTRUE otherwise
1417         //
1418
1419         if (!entry) return kFALSE;
1420
1421         fLogbookEntry = entry;
1422
1423         Log("SHUTTLE", Form("\t\t\t^*^*^*^*^*^*^*^*^*^*^*^* run %d: START ^*^*^*^*^*^*^*^*^*^*^*^*",
1424                                         GetCurrentRun()));
1425
1426         // Send the information to ML
1427         TMonaLisaText  mlStatus("SHUTTLE_status", "Processing");
1428         TMonaLisaText  mlRunType("SHUTTLE_runtype", Form("%s (%s)", entry->GetRunType(), entry->GetRunParameter("log")));
1429
1430         TList mlList;
1431         mlList.Add(&mlStatus);
1432         mlList.Add(&mlRunType);
1433
1434         TString mlID;
1435         mlID.Form("%d", GetCurrentRun());
1436         fMonaLisa->SendParameters(&mlList, mlID);
1437
1438         if (fLogbookEntry->IsDone())
1439         {
1440                 Log("SHUTTLE","Process - Shuttle is already DONE. Updating logbook");
1441                 UpdateShuttleLogbook("shuttle_done");
1442                 fLogbookEntry = 0;
1443                 return kTRUE;
1444         }
1445
1446         // read test mode if flag is set
1447         if (fReadTestMode)
1448         {
1449                 fTestMode = kNone;
1450                 TString logEntry(entry->GetRunParameter("log"));
1451                 //printf("log entry = %s\n", logEntry.Data());
1452                 TString searchStr("Testmode: ");
1453                 Int_t pos = logEntry.Index(searchStr.Data());
1454                 //printf("%d\n", pos);
1455                 if (pos >= 0)
1456                 {
1457                         TSubString subStr = logEntry(pos + searchStr.Length(), logEntry.Length());
1458                         //printf("%s\n", subStr.String().Data());
1459                         TString newStr(subStr.Data());
1460                         TObjArray* token = newStr.Tokenize(' ');
1461                         if (token)
1462                         {
1463                                 //token->Print();
1464                                 TObjString* tmpStr = dynamic_cast<TObjString*> (token->First());
1465                                 if (tmpStr)
1466                                 {
1467                                         Int_t testMode = tmpStr->String().Atoi();
1468                                         if (testMode > 0)
1469                                         {
1470                                                 Log("SHUTTLE", Form("Process - Enabling test mode %d", testMode));
1471                                                 SetTestMode((TestMode) testMode);
1472                                         }
1473                                 }
1474                                 delete token;          
1475                         }
1476                 }
1477         }
1478                 
1479         fLogbookEntry->Print("all");
1480
1481         // Initialization
1482         Bool_t hasError = kFALSE;
1483
1484         // Set the CDB and Reference folders according to the year and LHC period
1485         TString lhcPeriod(GetLHCPeriod());
1486         if (lhcPeriod.Length() == 0) 
1487         {
1488                 Log("SHUTTLE","Process - LHCPeriod not found in logbook!");
1489                 return kFALSE;
1490         }       
1491         
1492         if (fgkMainCDB.Length() == 0)
1493                 fgkMainCDB = Form("alien://folder=/alice/data/%d/%s/OCDB?user=alidaq?cacheFold=/tmp/OCDBCache", 
1494                                         GetCurrentYear(), lhcPeriod.Data());
1495         
1496         if (fgkMainRefStorage.Length() == 0)
1497                 fgkMainRefStorage = Form("alien://folder=/alice/data/%d/%s/Reference?user=alidaq?cacheFold=/tmp/OCDBCache", 
1498                                         GetCurrentYear(), lhcPeriod.Data());
1499         
1500         AliCDBStorage *mainCDBSto = AliCDBManager::Instance()->GetStorage(fgkMainCDB);
1501         if(mainCDBSto) mainCDBSto->QueryCDB(GetCurrentRun());
1502         AliCDBStorage *mainRefSto = AliCDBManager::Instance()->GetStorage(fgkMainRefStorage);
1503         if(mainRefSto) mainRefSto->QueryCDB(GetCurrentRun());
1504
1505         // Loop on detectors in the configuration
1506         TIter iter(fConfig->GetDetectors());
1507         TObjString* aDetector = 0;
1508
1509         while ((aDetector = (TObjString*) iter.Next()))
1510         {
1511                 fCurrentDetector = aDetector->String();
1512
1513                 if (ContinueProcessing() == kFALSE) continue;
1514
1515                 Log("SHUTTLE", Form("\t\t\t****** run %d - %s: START  ******",
1516                                                 GetCurrentRun(), aDetector->GetName()));
1517
1518                 for(Int_t iSys=0;iSys<3;iSys++) fFXSCalled[iSys]=kFALSE;
1519
1520                 Log(fCurrentDetector.Data(), "Process - Starting processing");
1521
1522                 Int_t pid = fork();
1523
1524                 if (pid < 0)
1525                 {
1526                         Log("SHUTTLE", "Process - ERROR: Forking failed");
1527                 }
1528                 else if (pid > 0)
1529                 {
1530                         // parent
1531                         Log("SHUTTLE", Form("Process - In parent process of %d - %s: Starting monitoring",
1532                                                         GetCurrentRun(), aDetector->GetName()));
1533
1534                         Long_t begin = time(0);
1535
1536                         int status; // to be used with waitpid, on purpose an int (not Int_t)!
1537                         while (waitpid(pid, &status, WNOHANG) == 0)
1538                         {
1539                                 Long_t expiredTime = time(0) - begin;
1540
1541                                 if (expiredTime > fConfig->GetPPTimeOut())
1542                                 {
1543                                         TString tmp;
1544                                         tmp.Form("Process - Process of %s time out. "
1545                                                         "Run time: %ld seconds. Killing...",
1546                                                         fCurrentDetector.Data(), expiredTime);
1547                                         Log("SHUTTLE", tmp);
1548                                         Log(fCurrentDetector, tmp);
1549
1550                                         kill(pid, 9);
1551
1552                                         UpdateShuttleStatus(AliShuttleStatus::kPPTimeOut);
1553                                         hasError = kTRUE;
1554
1555                                         gSystem->Sleep(1000);
1556                                 }
1557                                 else
1558                                 {
1559                                         gSystem->Sleep(1000);
1560                                         
1561                                         TString checkStr;
1562                                         checkStr.Form("ps -o vsize --pid %d | tail -n 1", pid);
1563                                         FILE* pipe = gSystem->OpenPipe(checkStr, "r");
1564                                         if (!pipe)
1565                                         {
1566                                                 Log("SHUTTLE", Form("Process - Error: "
1567                                                         "Could not open pipe to %s", checkStr.Data()));
1568                                                 continue;
1569                                         }
1570                                                 
1571                                         char buffer[100];
1572                                         if (!fgets(buffer, 100, pipe))
1573                                         {
1574                                                 Log("SHUTTLE", "Process - Error: ps did not return anything");
1575                                                 gSystem->ClosePipe(pipe);
1576                                                 continue;
1577                                         }
1578                                         gSystem->ClosePipe(pipe);
1579                                         
1580                                         //Log("SHUTTLE", Form("ps returned %s", buffer));
1581                                         
1582                                         Int_t mem = 0;
1583                                         if ((sscanf(buffer, "%d\n", &mem) != 1) || !mem)
1584                                         {
1585                                                 Log("SHUTTLE", "Process - Error: Could not parse output of ps");
1586                                                 continue;
1587                                         }
1588                                         
1589                                         if (expiredTime % 60 == 0)
1590                                         {
1591                                                 Log("SHUTTLE", Form("Process - %s: Checking process. "
1592                                                         "Run time: %ld seconds - Memory consumption: %d KB",
1593                                                         fCurrentDetector.Data(), expiredTime, mem));
1594                                                 SendAlive();
1595                                         }
1596                                         
1597                                         if (mem > fConfig->GetPPMaxMem())
1598                                         {
1599                                                 TString tmp;
1600                                                 tmp.Form("Process - Process exceeds maximum allowed memory "
1601                                                         "(%d KB > %d KB). Killing...",
1602                                                         mem, fConfig->GetPPMaxMem());
1603                                                 Log("SHUTTLE", tmp);
1604                                                 Log(fCurrentDetector, tmp);
1605         
1606                                                 kill(pid, 9);
1607         
1608                                                 UpdateShuttleStatus(AliShuttleStatus::kPPOutOfMemory);
1609                                                 hasError = kTRUE;
1610         
1611                                                 gSystem->Sleep(1000);
1612                                         }
1613                                 }
1614                         }
1615
1616                         Log("SHUTTLE", Form("Process - In parent process of %d - %s: Client has terminated.",
1617                                                                 GetCurrentRun(), aDetector->GetName()));
1618
1619                         if (WIFEXITED(status))
1620                         {
1621                                 Int_t returnCode = WEXITSTATUS(status);
1622
1623                                 Log("SHUTTLE", Form("Process - %s: the return code is %d", fCurrentDetector.Data(),
1624                                                                                 returnCode));
1625
1626                                 if (returnCode == 0) hasError = kTRUE; // client exits with 'success' (kTRUE = 1), so 0 means failure
1627                         }
1628                 }
1629                 else if (pid == 0)
1630                 {
1631                         // client
1632                         Log("SHUTTLE", Form("Process - In client process of %d - %s", GetCurrentRun(),
1633                                 aDetector->GetName()));
1634
1635                         Log("SHUTTLE", Form("Process - Redirecting output to %s log",fCurrentDetector.Data()));
1636
1637                         if ((freopen(GetLogFileName(fCurrentDetector), "a", stdout)) == 0)
1638                         {
1639                                 Log("SHUTTLE", "Process - Could not freopen stdout");
1640                         }
1641                         else
1642                         {
1643                                 fOutputRedirected = kTRUE;
1644                                 if ((dup2(fileno(stdout), fileno(stderr))) < 0)
1645                                         Log("SHUTTLE", "Process - Could not redirect stderr");
1646                                 
1647                         }
1648                         
1649                         TString wd = gSystem->WorkingDirectory();
1650                         TString tmpDir = Form("%s/%s_%d_process", GetShuttleTempDir(), 
1651                                 fCurrentDetector.Data(), GetCurrentRun());
1652                         
1653                         Int_t result = gSystem->GetPathInfo(tmpDir.Data(), 0, (Long64_t*) 0, 0, 0);
1654                         if (!result) // temp dir already exists!
1655                         {
1656                                 Log(fCurrentDetector.Data(), 
1657                                         Form("Process - %s dir already exists! Removing...", tmpDir.Data()));
1658                                 gSystem->Exec(Form("rm -rf %s",tmpDir.Data()));         
1659                         } 
1660                         
1661                         if (gSystem->mkdir(tmpDir.Data(), 1))
1662                         {
1663                                 Log(fCurrentDetector.Data(), "Process - could not make temp directory!!");
1664                                 gSystem->Exit(0); // the parent treats exit code 0 as failure
1665                         }
1666                         
1667                         if (!gSystem->ChangeDirectory(tmpDir.Data())) 
1668                         {
1669                                 Log(fCurrentDetector.Data(), "Process - could not change directory!!");
1670                                 gSystem->Exit(0); // the parent treats exit code 0 as failure
1671                         }
1672                         
1673                         Bool_t success = ProcessCurrentDetector();
1674                         
1675                         gSystem->ChangeDirectory(wd.Data());
1676                                                 
1677                         if (success) // Preprocessor finished successfully!
1678                         { 
1679                                 // remove temporary folder
1680                                 gSystem->Exec(Form("rm -rf %s",tmpDir.Data()));
1681                                 
1682                                 // Update time_processed field in FXS DB
1683                                 if (UpdateTable() == kFALSE)
1684                                         Log("SHUTTLE", Form("Process - %s: Could not update FXS databases!", 
1685                                                         fCurrentDetector.Data()));
1686
1687                                 // Transfer the data from local storage to main storage (Grid)
1688                                 UpdateShuttleStatus(AliShuttleStatus::kStoreStarted);
1689                                 if (StoreOCDB() == kFALSE)
1690                                 {
1691                                         Log("SHUTTLE", 
1692                                                 Form("\t\t\t****** run %d - %s: STORAGE ERROR ******",
1693                                                         GetCurrentRun(), aDetector->GetName()));
1694                                         UpdateShuttleStatus(AliShuttleStatus::kStoreError);
1695                                         success = kFALSE;
1696                                 } else {
1697                                         Log("SHUTTLE", 
1698                                                 Form("\t\t\t****** run %d - %s: DONE ******",
1699                                                         GetCurrentRun(), aDetector->GetName()));
1700                                         UpdateShuttleStatus(AliShuttleStatus::kDone);
1701                                         UpdateShuttleLogbook(fCurrentDetector, "DONE");
1702                                 }
1703                         } else 
1704                         {
1705                                 Log("SHUTTLE", 
1706                                         Form("\t\t\t****** run %d - %s: PP ERROR ******",
1707                                                 GetCurrentRun(), aDetector->GetName()));
1708                         }
1709
1710                         for (UInt_t iSys=0; iSys<3; iSys++)
1711                         {
1712                                 if (fFXSCalled[iSys]) fFXSlist[iSys].Clear();
1713                         }
1714
1715                         Log("SHUTTLE", Form("Process - Client process of %d - %s is exiting now with %d.",
1716                                                         GetCurrentRun(), aDetector->GetName(), success));
1717
1718                         // the client exits here
1719                         gSystem->Exit(success);
1720
1721                         AliError("We should never get here!!!");
1722                 }
1723         }
1724
1725         Log("SHUTTLE", Form("\t\t\t^*^*^*^*^*^*^*^*^*^*^*^* run %d: FINISH ^*^*^*^*^*^*^*^*^*^*^*^*",
1726                                                         GetCurrentRun()));
1727
1728         //check if shuttle is done for this run, if so update logbook
1729         TObjArray checkEntryArray;
1730         checkEntryArray.SetOwner(1);
1731         TString whereClause = Form("where run=%d", GetCurrentRun());
1732         if (!QueryShuttleLogbook(whereClause.Data(), checkEntryArray) || 
1733                         checkEntryArray.GetEntries() == 0) {
1734                 Log("SHUTTLE", Form("Process - Warning: Cannot check status of run %d on Shuttle logbook!",
1735                                                 GetCurrentRun()));
1736                 return hasError == kFALSE;
1737         }
1738
1739         AliShuttleLogbookEntry* checkEntry = dynamic_cast<AliShuttleLogbookEntry*>
1740                                                 (checkEntryArray.At(0));
1741
1742         if (checkEntry)
1743         {
1744                 if (checkEntry->IsDone())
1745                 {
1746                         Log("SHUTTLE","Process - Shuttle is DONE. Updating logbook");
1747                         UpdateShuttleLogbook("shuttle_done");
1748                 }
1749                 else
1750                 {
1751                         for (UInt_t iDet=0; iDet<NDetectors(); iDet++)
1752                         {
1753                                 if (checkEntry->GetDetectorStatus(iDet) == AliShuttleLogbookEntry::kUnprocessed)
1754                                 {
1755                                         AliDebug(2, Form("Run %d: setting %s as \"not first time unprocessed\"",
1756                                                         checkEntry->GetRun(), GetDetName(iDet)));
1757                                         fFirstUnprocessed[iDet] = kFALSE;
1758                                 }
1759                         }
1760                 }
1761         }
1762
1763         fLogbookEntry = 0;
1764
1765         return hasError == kFALSE;
1766 }
1767
1768 //______________________________________________________________________________________________
1769 Bool_t AliShuttle::ProcessCurrentDetector()
1770 {
1771         //
1772         // Makes data retrieval just for a specific detector (fCurrentDetector).
1773         // There should be a configuration for this detector.
1774
1775         Log("SHUTTLE", Form("ProcessCurrentDetector - Retrieving values for %s, run %d", 
1776                                                 fCurrentDetector.Data(), GetCurrentRun()));
1777
1778         TString wd = gSystem->WorkingDirectory();
1779         
1780         if (!CleanReferenceStorage(fCurrentDetector.Data()))
1781                 return kFALSE;
1782         
1783         gSystem->ChangeDirectory(wd.Data());
1784         
1785         TMap* dcsMap = new TMap();
1786
1787         // call preprocessor
1788         AliPreprocessor* aPreprocessor =
1789                 dynamic_cast<AliPreprocessor*> (fPreprocessorMap.GetValue(fCurrentDetector));
1790
1791         aPreprocessor->Initialize(GetCurrentRun(), GetCurrentStartTime(), GetCurrentEndTime());
1792
1793         Bool_t processDCS = aPreprocessor->ProcessDCS();
1794
1795         if (!processDCS)
1796         {
1797                 Log(fCurrentDetector, "ProcessCurrentDetector -"
1798                         " The preprocessor requested to skip the retrieval of DCS values");
1799         }
1800         else if (fTestMode & kSkipDCS)
1801         {
1802                 Log(fCurrentDetector, "ProcessCurrentDetector - In TESTMODE: Skipping DCS processing");
1803         } 
1804         else if (fTestMode & kErrorDCS)
1805         {
1806                 Log(fCurrentDetector, "ProcessCurrentDetector - In TESTMODE: Simulating DCS error");
1807                 UpdateShuttleStatus(AliShuttleStatus::kDCSStarted);
1808                 UpdateShuttleStatus(AliShuttleStatus::kDCSError);
1809                 delete dcsMap;
1810                 return kFALSE;
1811         } else {
1812
1813                 UpdateShuttleStatus(AliShuttleStatus::kDCSStarted);
1814
1815                 // Query DCS archive
1816                 Int_t nServers = fConfig->GetNServers(fCurrentDetector);
1817                 
1818                 for (int iServ=0; iServ<nServers; iServ++)
1819                 {
1820                 
1821                         TString host(fConfig->GetDCSHost(fCurrentDetector, iServ));
1822                         Int_t port = fConfig->GetDCSPort(fCurrentDetector, iServ);
1823                         Int_t multiSplit = fConfig->GetMultiSplit(fCurrentDetector, iServ);
1824
1825                         Log(fCurrentDetector, Form("ProcessCurrentDetector -"
1826                                         " Querying DCS Amanda server %s:%d (%d of %d)", 
1827                                         host.Data(), port, iServ+1, nServers));
1828                         
1829                         TMap* aliasMap = 0;
1830                         TMap* dpMap = 0;
1831         
1832                         if (fConfig->GetDCSAliases(fCurrentDetector, iServ)->GetEntries() > 0)
1833                         {
1834                                 aliasMap = GetValueSet(host, port, 
1835                                                 fConfig->GetDCSAliases(fCurrentDetector, iServ), 
1836                                                 kAlias, multiSplit);
1837                                 if (!aliasMap)
1838                                 {
1839                                         Log(fCurrentDetector, 
1840                                                 Form("ProcessCurrentDetector -"
1841                                                         " Error retrieving DCS aliases from server %s."
1842                                                         " Sending mail to DCS experts!", host.Data()));
1843                                         UpdateShuttleStatus(AliShuttleStatus::kDCSError);
1844                                         
1845                                         if (!SendMailToDCS())
1846                                                 Log("SHUTTLE", Form("ProcessCurrentDetector - Could not send mail to DCS experts!"));
1847
1848                                         delete dcsMap;
1849                                         return kFALSE;
1850                                 }
1851                         }
1852                         
1853                         if (fConfig->GetDCSDataPoints(fCurrentDetector, iServ)->GetEntries() > 0)
1854                         {
1855                                 dpMap = GetValueSet(host, port, 
1856                                                 fConfig->GetDCSDataPoints(fCurrentDetector, iServ), 
1857                                                 kDP, multiSplit);
1858                                 if (!dpMap)
1859                                 {
1860                                         Log(fCurrentDetector, 
1861                                                 Form("ProcessCurrentDetector -"
1862                                                         " Error retrieving DCS data points from server %s."
1863                                                         " Sending mail to DCS experts!", host.Data()));
1864                                         UpdateShuttleStatus(AliShuttleStatus::kDCSError);
1865                                         
1866                                         if (!SendMailToDCS())
1867                                                 Log("SHUTTLE", Form("ProcessCurrentDetector - Could not send mail to DCS experts!"));
1868                                         
1869                                         if (aliasMap) delete aliasMap;
1870                                         delete dcsMap;
1871                                         return kFALSE;
1872                                 }                               
1873                         }
1874                         
1875                         // merge aliasMap and dpMap into dcsMap
1876                         if(aliasMap) {
1877                                 TIter iter(aliasMap);
1878                                 TObjString* key = 0;
1879                                 while ((key = (TObjString*) iter.Next()))
1880                                         dcsMap->Add(key, aliasMap->GetValue(key->String()));
1881                                 
1882                                 aliasMap->SetOwner(kFALSE);
1883                                 delete aliasMap;
1884                         }       
1885                         
1886                         if(dpMap) {
1887                                 TIter iter(dpMap);
1888                                 TObjString* key = 0;
1889                                 while ((key = (TObjString*) iter.Next()))
1890                                         dcsMap->Add(key, dpMap->GetValue(key->String()));
1891                                 
1892                                 dpMap->SetOwner(kFALSE);
1893                                 delete dpMap;
1894                         }
1895                 }
1896         }
1897         
1898         // save map into file, to help debugging in case of preprocessor error
1899         TFile* f = TFile::Open("DCSMap.root","recreate");
1900         f->cd();
1901         dcsMap->Write("DCSMap", TObject::kSingleKey);
1902         f->Close();
1903         delete f;
1904         
1905         // DCS Archive DB processing successful. Call Preprocessor!
1906         UpdateShuttleStatus(AliShuttleStatus::kPPStarted);
1907
1908         UInt_t returnValue = aPreprocessor->Process(dcsMap);
1909
1910         if (returnValue > 0) // Preprocessor error!
1911         {
1912                 Log(fCurrentDetector, Form("ProcessCurrentDetector - "
1913                                 "Preprocessor failed. Process returned %d.", returnValue));
1914                 UpdateShuttleStatus(AliShuttleStatus::kPPError);
1915                 dcsMap->DeleteAll();
1916                 delete dcsMap;
1917                 return kFALSE;
1918         }
1919         
1920         // preprocessor ok!
1921         UpdateShuttleStatus(AliShuttleStatus::kPPDone);
1922         Log(fCurrentDetector, Form("ProcessCurrentDetector - %s preprocessor returned success",
1923                                 fCurrentDetector.Data()));
1924
1925         dcsMap->DeleteAll();
1926         delete dcsMap;
1927
1928         return kTRUE;
1929 }
1930
1931 //______________________________________________________________________________________________
1932 Bool_t AliShuttle::QueryShuttleLogbook(const char* whereClause,
1933                 TObjArray& entries)
1934 {
1935         // Queries the DAQ's Shuttle logbook and fills the detector status objects.
1936         // Calls QueryRunParameters to query the DAQ logbook for run parameters.
1937         //
1938
1939         entries.SetOwner(1);
1940
1941         // check the connection and connect if needed
1942         if(!Connect(3)) return kFALSE;
1943
1944         TString sqlQuery;
1945         sqlQuery = Form("select * from %s %s order by run", fConfig->GetShuttlelbTable(), whereClause);
1946
1947         TSQLResult* aResult = fServer[3]->Query(sqlQuery);
1948         if (!aResult) {
1949                 AliError(Form("Can't execute query <%s>!", sqlQuery.Data()));
1950                 return kFALSE;
1951         }
1952
1953         AliDebug(2,Form("Query = %s", sqlQuery.Data()));
1954
1955         if(aResult->GetRowCount() == 0) {
1956                 Log("SHUTTLE", "No entries in Shuttle Logbook match request");
1957                 delete aResult;
1958                 return kTRUE;
1959         }
1960
1961         // TODO Check field count!
1962         const UInt_t nCols = 23;
1963         if (aResult->GetFieldCount() != (Int_t) nCols) {
1964                 Log("SHUTTLE", "Invalid SQL result field number!");
1965                 delete aResult;
1966                 return kFALSE;
1967         }
1968
1969         TSQLRow* aRow;
1970         while ((aRow = aResult->Next())) {
1971                 TString runString(aRow->GetField(0), aRow->GetFieldLength(0));
1972                 Int_t run = runString.Atoi();
1973
1974                 AliShuttleLogbookEntry *entry = QueryRunParameters(run);
1975                 if (!entry)
1976                         { delete aRow; continue; } // free the row before skipping this run
1977
1978                 // loop on detectors
1979                 for(UInt_t ii = 0; ii < nCols; ii++)
1980                         entry->SetDetectorStatus(aResult->GetFieldName(ii), aRow->GetField(ii));
1981
1982                 entries.AddLast(entry);
1983                 delete aRow;
1984         }
1985
1986         delete aResult;
1987         return kTRUE;
1988 }
1989
1990 //______________________________________________________________________________________________
1991 AliShuttleLogbookEntry* AliShuttle::QueryRunParameters(Int_t run)
1992 {
1993         //
1994         // Retrieves the run parameters written in the DAQ logbook and sets them in an AliShuttleLogbookEntry object
1995         //
1996
1997         // check the connection and connect if needed
1998         if (!Connect(3))
1999                 return 0;
2000
2001         TString sqlQuery;
2002         sqlQuery.Form("select * from %s where run=%d", fConfig->GetDAQlbTable(), run);
2003
2004         TSQLResult* aResult = fServer[3]->Query(sqlQuery);
2005         if (!aResult) {
2006                 Log("SHUTTLE", Form("Can't execute query <%s>!", sqlQuery.Data()));
2007                 return 0;
2008         }
2009
2010         if (aResult->GetRowCount() == 0) {
2011                 Log("SHUTTLE", Form("QueryRunParameters - No entry in DAQ Logbook for run %d. Skipping", run));
2012                 delete aResult;
2013                 return 0;
2014         }
2015
2016         if (aResult->GetRowCount() > 1) {
2017                 Log("SHUTTLE", Form("QueryRunParameters - UNEXPECTED: "
2018                                 "more than one entry in DAQ Logbook for run %d!", run));
2019                 delete aResult;
2020                 return 0;
2021         }
2022
2023         TSQLRow* aRow = aResult->Next();
2024         if (!aRow)
2025         {
2026                 Log("SHUTTLE", Form("QueryRunParameters - Could not retrieve row for run %d. Skipping", run));
2027                 delete aResult;
2028                 return 0;
2029         }
2030
2031         AliShuttleLogbookEntry* entry = new AliShuttleLogbookEntry(run);
2032
2033         for (Int_t ii = 0; ii < aResult->GetFieldCount(); ii++)
2034                 entry->SetRunParameter(aResult->GetFieldName(ii), aRow->GetField(ii));
2035
2036         UInt_t startTime = entry->GetStartTime();
2037         UInt_t endTime = entry->GetEndTime();
2038
2039 //      if (!startTime || !endTime || startTime > endTime) 
2040 //      {
2041 //              Log("SHUTTLE",
2042 //                      Form("QueryRunParameters - Invalid parameters for Run %d: startTime = %d, endTime = %d. Skipping!",
2043 //                              run, startTime, endTime));              
2044 //              
2045 //              Log("SHUTTLE", Form("Marking SHUTTLE done for run %d", run));
2046 //              fLogbookEntry = entry;  
2047 //              if (!UpdateShuttleLogbook("shuttle_done"))
2048 //              {
2049 //                      AliError(Form("Could not update logbook for run %d !", run));
2050 //              }
2051 //              fLogbookEntry = 0;
2052 //                              
2053 //              delete entry;
2054 //              delete aRow;
2055 //              delete aResult;
2056 //              return 0;
2057 //      }
2058
2059         if (!startTime) 
2060         {
2061                 Log("SHUTTLE",
2062                         Form("QueryRunParameters - Invalid parameters for Run %d: " 
2063                                 "startTime = %u, endTime = %u. Skipping!",
2064                                         run, startTime, endTime));              
2065                 
2066                 Log("SHUTTLE", Form("Marking SHUTTLE done for run %d", run));
2067                 fLogbookEntry = entry;  
2068                 if (!UpdateShuttleLogbook("shuttle_ignored"))
2069                 {
2070                         AliError(Form("Could not update logbook for run %d !", run));
2071                 }
2072                 fLogbookEntry = 0;
2073                                 
2074                 delete entry;
2075                 delete aRow;
2076                 delete aResult;
2077                 return 0;
2078         }
2079         
2080         if (startTime && !endTime) 
2081         {
2082                 // TODO Here we don't mark SHUTTLE done, because this may mean 
2083                 //the run is still ongoing!!            
2084                 Log("SHUTTLE",
2085                         Form("QueryRunParameters - Invalid parameters for Run %d: "
2086                              "startTime = %u, endTime = %u. Skipping (Shuttle won't be marked as DONE)!",
2087                                         run, startTime, endTime));              
2088                 
2089                 //Log("SHUTTLE", Form("Marking SHUTTLE done for run %d", run));
2090                 //fLogbookEntry = entry;        
2091                 //if (!UpdateShuttleLogbook("shuttle_done"))
2092                 //{
2093                 //      AliError(Form("Could not update logbook for run %d !", run));
2094                 //}
2095                 //fLogbookEntry = 0;
2096                                 
2097                 delete entry;
2098                 delete aRow;
2099                 delete aResult;
2100                 return 0;
2101         }
2102                         
2103         if (startTime && endTime && (startTime > endTime)) 
2104         {
2105                 Log("SHUTTLE",
2106                         Form("QueryRunParameters - Invalid parameters for Run %d: "
2107                                 "startTime = %d, endTime = %d. Skipping!",
2108                                         run, startTime, endTime));              
2109                 
2110                 Log("SHUTTLE", Form("Marking SHUTTLE done for run %d", run));
2111                 fLogbookEntry = entry;  
2112                 if (!UpdateShuttleLogbook("shuttle_ignored"))
2113                 {
2114                         AliError(Form("Could not update logbook for run %d !", run));
2115                 }
2116                 fLogbookEntry = 0;
2117                                 
2118                 delete entry;
2119                 delete aRow;
2120                 delete aResult;
2121                 return 0;
2122         }
2123                         
2124         TString totEventsStr = entry->GetRunParameter("totalEvents");  
2125         Int_t totEvents = totEventsStr.Atoi();
2126         if (totEvents < 1) 
2127         {
2128                 Log("SHUTTLE",
2129                         Form("QueryRunParameters - Run %d has 0 events - Skipping!", run));             
2130                 
2131                 Log("SHUTTLE", Form("Marking SHUTTLE done for run %d", run));           
2132                 fLogbookEntry = entry;  
2133                 if (!UpdateShuttleLogbook("shuttle_done"))
2134                 {
2135                         AliError(Form("Could not update logbook for run %d !", run));
2136                 }
2137                 fLogbookEntry = 0;
2138                                 
2139                 delete entry;
2140                 delete aRow;
2141                 delete aResult;
2142                 return 0;
2143         }
2144
2145         delete aRow;
2146         delete aResult;
2147
2148         return entry;
2149 }
2150
2151 //______________________________________________________________________________________________
2152 TMap* AliShuttle::GetValueSet(const char* host, Int_t port, const TSeqCollection* entries,
2153                               DCSType type, Int_t multiSplit)
2154 {
2155         // Retrieve all "entry" data points from the DCS server
2156         // host, port: TSocket connection parameters
2157         // entries: list of alias or data point names
2158         // type: kAlias or kDP
2159         // returns TMap of values, 0 when failure
2160         
2161         AliDCSClient client(host, port, fTimeout, fRetries, multiSplit);
2162
2163         TMap* result = 0;
2164         if (type == kAlias)
2165         {
2166                 result = client.GetAliasValues(entries, GetCurrentStartTime(), 
2167                         GetCurrentEndTime());
2168         } 
2169         else if (type == kDP)
2170         {
2171                 result = client.GetDPValues(entries, GetCurrentStartTime(), 
2172                         GetCurrentEndTime());
2173         }
2174
2175         if (result == 0)
2176         {
2177                 Log(fCurrentDetector.Data(), Form("GetValueSet - Can't get entries! Reason: %s",
2178                         client.GetErrorString(client.GetResultErrorCode())));
2179                 if (client.GetResultErrorCode() == AliDCSClient::fgkServerError)        
2180                         Log(fCurrentDetector.Data(), Form("GetValueSet - Server error code: %s",
2181                                 client.GetServerError().Data()));
2182
2183                 return 0;
2184         }
2185                 
2186         return result;
2187 }
2188
2189 //______________________________________________________________________________________________
2190 const char* AliShuttle::GetFile(Int_t system, const char* detector,
2191                 const char* id, const char* source)
2192 {
2193         // Get calibration file from file exchange servers
2194         // First queries the FXS database for the file name, using the run, detector, id and source info
2195         // then calls RetrieveFile(filename) for actual copy to local disk
2196         // run: current run being processed (given by Logbook entry fLogbookEntry)
2197         // detector: the Preprocessor name
2198         // id: provided as a parameter by the Preprocessor
2199         // source: provided by the Preprocessor through GetFileSources function
2200
2201         // check if test mode should simulate a FXS error
2202         if (fTestMode & kErrorFXSFiles)
2203         {
2204                 Log(detector, Form("GetFile - In TESTMODE - Simulating error while connecting to %s FXS", GetSystemName(system)));
2205                 return 0;
2206         }
2207         
2208         // check connection; connect if needed
2209         if (!Connect(system))
2210         {
2211                 Log(detector, Form("GetFile - Couldn't connect to %s FXS database", GetSystemName(system)));
2212                 return 0;
2213         }
2214
2215         // Query preparation
2216         TString sourceName(source);
2217         Int_t nFields = 3;
2218         TString sqlQueryStart = Form("select filePath,size,fileChecksum from %s where",
2219                                                                 fConfig->GetFXSdbTable(system));
2220         TString whereClause = Form("run=%d and detector=\"%s\" and fileId=\"%s\"",
2221                                                                 GetCurrentRun(), detector, id);
2222
2223         if (system == kDAQ)
2224         {
2225                 whereClause += Form(" and DAQsource=\"%s\"", source);
2226         }
2227         else if (system == kDCS)
2228         {
2229                 sourceName="none";
2230         }
2231         else if (system == kHLT)
2232         {
2233                 whereClause += Form(" and DDLnumbers=\"%s\"", source);
2234                 nFields = 3;
2235         }
2236
2237         TString sqlQuery = Form("%s %s", sqlQueryStart.Data(), whereClause.Data());
2238
2239         AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));
2240
2241         // Query execution
2242         TSQLResult* aResult = 0;
2243         aResult = dynamic_cast<TSQLResult*> (fServer[system]->Query(sqlQuery));
2244         if (!aResult) {
2245                 Log(detector, Form("GetFile - Can't execute SQL query to %s database for: id = %s, source = %s",
2246                                 GetSystemName(system), id, sourceName.Data()));
2247                 return 0;
2248         }
2249
2250         if(aResult->GetRowCount() == 0)
2251         {
2252                 Log(detector,
2253                         Form("GetFile - No entry in %s FXS db for: id = %s, source = %s",
2254                                 GetSystemName(system), id, sourceName.Data()));
2255                 delete aResult;
2256                 return 0;
2257         }
2258
2259         if (aResult->GetRowCount() > 1) {
2260                 Log(detector,
2261                         Form("GetFile - More than one entry in %s FXS db for: id = %s, source = %s",
2262                                 GetSystemName(system), id, sourceName.Data()));
2263                 delete aResult;
2264                 return 0;
2265         }
2266
2267         if (aResult->GetFieldCount() != nFields) {
2268                 Log(detector,
2269                         Form("GetFile - Wrong field count in %s FXS db for: id = %s, source = %s",
2270                                 GetSystemName(system), id, sourceName.Data()));
2271                 delete aResult;
2272                 return 0;
2273         }
2274
2275         TSQLRow* aRow = dynamic_cast<TSQLRow*> (aResult->Next());
2276
2277         if (!aRow){
2278                 Log(detector, Form("GetFile - Empty set result in %s FXS db from query: id = %s, source = %s",
2279                                 GetSystemName(system), id, sourceName.Data()));
2280                 delete aResult;
2281                 return 0;
2282         }
2283
2284         TString filePath(aRow->GetField(0), aRow->GetFieldLength(0));
2285         TString fileSize(aRow->GetField(1), aRow->GetFieldLength(1));
2286         TString fileChecksum(aRow->GetField(2), aRow->GetFieldLength(2));
2287
2288         delete aResult;
2289         delete aRow;
2290
2291         AliDebug(2, Form("filePath = %s; size = %s, fileChecksum = %s",
2292                                 filePath.Data(), fileSize.Data(), fileChecksum.Data()));
2293
2294         // retrieved file is renamed to make it unique
2295         TString localFileName = Form("%s/%s_%d_process/%s_%s_%d_%s_%s.shuttle",
2296                                         GetShuttleTempDir(), detector, GetCurrentRun(),
2297                                         GetSystemName(system), detector, GetCurrentRun(), 
2298                                         id, sourceName.Data());
2299
2300
2301         // file retrieval from FXS
2302         UInt_t nRetries = 0;
2303         UInt_t maxRetries = 3;
2304         Bool_t result = kFALSE;
2305
2306         // copy!! if successful TSystem::Exec returns 0
2307         while(nRetries++ < maxRetries) {
2308                 AliDebug(2, Form("Trying to copy file. Retry # %d", nRetries));
2309                 result = RetrieveFile(system, filePath.Data(), localFileName.Data());
2310                 if(!result)
2311                 {
2312                         Log(detector, Form("GetFile - Copy of file %s from %s FXS failed",
2313                                         filePath.Data(), GetSystemName(system)));
2314                         continue;
2315                 } 
2316
2317                 if (fileChecksum.Length()>0)
2318                 {
2319                         // compare md5sum of local file with the one stored in the FXS DB
2320                         Int_t md5Comp = gSystem->Exec(Form("md5sum %s | grep %s > /dev/null 2>&1",
2321                                                 localFileName.Data(), fileChecksum.Data()));
2322
2323                         if (md5Comp != 0)
2324                         {
2325                                 Log(detector, Form("GetFile - md5sum of file %s does not match the local copy!",
2326                                                         filePath.Data()));
2327                                 result = kFALSE;
2328                                 continue;
2329                         }
2330                 } else {
2331                         Log(fCurrentDetector, Form("GetFile - md5sum of file %s not set in %s database, skipping comparison",
2332                                                         filePath.Data(), GetSystemName(system)));
2333                 }
2334                 if (result) break;
2335         }
2336
2337         if(!result) return 0;
2338
2339         fFXSCalled[system]=kTRUE;
2340         TObjString *fileParams = new TObjString(Form("%s#!?!#%s", id, sourceName.Data()));
2341         fFXSlist[system].Add(fileParams);
2342
2343         static TString staticLocalFileName;
2344         staticLocalFileName.Form("%s", localFileName.Data());
2345         
2346         Log(fCurrentDetector, Form("GetFile - Retrieved file with id %s and "
2347                         "source %s from %s to %s", id, source, 
2348                         GetSystemName(system), localFileName.Data()));
2349                         
2350         return staticLocalFileName.Data();
2351 }
2352
2353 //______________________________________________________________________________________________
2354 Bool_t AliShuttle::RetrieveFile(UInt_t system, const char* fxsFileName, const char* localFileName)
2355 {
2356         //
2357         // Copies file from FXS to local Shuttle machine
2358         //
2359
2360         // check temp directory: trying to cd to temp; if it does not exist, create it
2361         AliDebug(2, Form("Copy file %s from %s FXS into %s",
2362                         fxsFileName, GetSystemName(system), localFileName));
2363                         
2364         TString tmpDir(localFileName);
2365         
2366         tmpDir = tmpDir(0,tmpDir.Last('/'));
2367
2368         Int_t noDir = gSystem->GetPathInfo(tmpDir.Data(), 0, (Long64_t*) 0, 0, 0);
2369         if (noDir) // temp dir does not exist!
2370         {
2371                 if (gSystem->mkdir(tmpDir.Data(), 1))
2372                 {
2373                         Log(fCurrentDetector.Data(), "RetrieveFile - could not make temp directory!!");
2374                         return kFALSE;
2375                 }
2376         }
2377
2378         TString baseFXSFolder;
2379         if (system == kDAQ)
2380         {
2381                 baseFXSFolder = "FES/";
2382         }
2383         else if (system == kDCS)
2384         {
2385                 baseFXSFolder = "";
2386         }
2387         else if (system == kHLT)
2388         {
2389                 baseFXSFolder = "/opt/FXS/";
2390         }
2391
2392
2393         TString command = Form("scp -oPort=%d -2 %s@%s:%s%s %s",
2394                 fConfig->GetFXSPort(system),
2395                 fConfig->GetFXSUser(system),
2396                 fConfig->GetFXSHost(system),
2397                 baseFXSFolder.Data(),
2398                 fxsFileName,
2399                 localFileName);
2400
2401         AliDebug(2, Form("%s",command.Data()));
2402
2403         Bool_t result = (gSystem->Exec(command.Data()) == 0);
2404
2405         return result;
2406 }
2407
2408 //______________________________________________________________________________________________
2409 TList* AliShuttle::GetFileSources(Int_t system, const char* detector, const char* id)
2410 {
2411         //
2412         // Get sources producing the condition file Id from file exchange servers
2413         // if id is NULL all sources are returned (distinct)
2414         //
2415
2416         Log(detector, Form("GetFileSources - Retrieving sources with id %s from %s", id, GetSystemName(system)));
2417         
2418         // check if test mode should simulate a FXS error
2419         if (fTestMode & kErrorFXSSources)
2420         {
2421                 Log(detector, Form("GetFileSources - In TESTMODE - Simulating error while connecting to %s FXS", GetSystemName(system)));
2422                 return 0;
2423         }
2424
2425         if (system == kDCS)
2426         {
2427                 Log(detector, "GetFileSources - WARNING: DCS system has only one source of data!");
2428                 TList *list = new TList();
2429                 list->SetOwner(1);
2430                 list->Add(new TObjString(" "));
2431                 return list;
2432         }
2433
2434         // check connection; connect if needed
2435         if (!Connect(system))
2436         {
2437                 Log(detector, Form("GetFileSources - Couldn't connect to %s FXS database", GetSystemName(system)));
2438                 return NULL;
2439         }
2440
2441         TString sourceName;
2442         if (system == kDAQ)
2443         {
2444                 sourceName = "DAQsource";
2445         } else if (system == kHLT)
2446         {
2447                 sourceName = "DDLnumbers";
2448         }
2449
2450         TString sqlQueryStart = Form("select distinct %s from %s where", sourceName.Data(), fConfig->GetFXSdbTable(system));
2451         TString whereClause = Form("run=%d and detector=\"%s\"",
2452                                 GetCurrentRun(), detector);
2453         if (id)
2454                 whereClause += Form(" and fileId=\"%s\"", id);
2455         TString sqlQuery = Form("%s %s", sqlQueryStart.Data(), whereClause.Data());
2456
2457         AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));
2458
2459         // Query execution
2460         TSQLResult* aResult;
2461         aResult = fServer[system]->Query(sqlQuery);
2462         if (!aResult) {
2463                 Log(detector, Form("GetFileSources - Can't execute SQL query to %s database for id: %s",
2464                                 GetSystemName(system), id));
2465                 return 0;
2466         }
2467
2468         TList *list = new TList();
2469         list->SetOwner(1);
2470         
2471         if (aResult->GetRowCount() == 0)
2472         {
2473                 Log(detector,
2474                         Form("GetFileSources - No entry in %s FXS table for id: %s", GetSystemName(system), id));
2475                 delete aResult;
2476                 return list;
2477         }
2478
2479         Log(detector, Form("GetFileSources - Found %d sources", aResult->GetRowCount()));
2480
2481         TSQLRow* aRow;
2482         while ((aRow = aResult->Next()))
2483         {
2484
2485                 TString source(aRow->GetField(0), aRow->GetFieldLength(0));
2486                 AliDebug(2, Form("%s = %s", sourceName.Data(), source.Data()));
2487                 list->Add(new TObjString(source));
2488                 delete aRow;
2489         }
2490
2491         delete aResult;
2492
2493         return list;
2494 }
2495
2496 //______________________________________________________________________________________________
2497 TList* AliShuttle::GetFileIDs(Int_t system, const char* detector, const char* source)
2498 {
2499         //
2500         // Get all ids of condition files produced by a given source from file exchange servers
2501         //
2502         
2503         Log(detector, Form("GetFileIDs - Retrieving ids with source %s from %s", source, GetSystemName(system)));
2504
2505         // check if test mode should simulate a FXS error
2506         if (fTestMode & kErrorFXSSources)
2507         {
2508                 Log(detector, Form("GetFileIDs - In TESTMODE - Simulating error while connecting to %s FXS", GetSystemName(system)));
2509                 return 0;
2510         }
2511
2512         // check connection; connect if needed
2513         if (!Connect(system))
2514         {
2515                 Log(detector, Form("GetFileIDs - Couldn't connect to %s FXS database", GetSystemName(system)));
2516                 return NULL;
2517         }
2518
2519         TString sourceName;
2520         if (system == kDAQ)
2521         {
2522                 sourceName = "DAQsource";
2523         } else if (system == kHLT)
2524         {
2525                 sourceName = "DDLnumbers";
2526         }
2527
2528         TString sqlQueryStart = Form("select fileId from %s where", fConfig->GetFXSdbTable(system));
2529         TString whereClause = Form("run=%d and detector=\"%s\"",
2530                                 GetCurrentRun(), detector);
2531         if (sourceName.Length() > 0 && source)
2532                 whereClause += Form(" and %s=\"%s\"", sourceName.Data(), source);
2533         TString sqlQuery = Form("%s %s", sqlQueryStart.Data(), whereClause.Data());
2534
2535         AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));
2536
2537         // Query execution
2538         TSQLResult* aResult;
2539         aResult = fServer[system]->Query(sqlQuery);
2540         if (!aResult) {
2541                 Log(detector, Form("GetFileIDs - Can't execute SQL query to %s database for source: %s",
2542                                 GetSystemName(system), source));
2543                 return 0;
2544         }
2545
2546         TList *list = new TList();
2547         list->SetOwner(1);
2548         
2549         if (aResult->GetRowCount() == 0)
2550         {
2551                 Log(detector,
2552                         Form("GetFileIDs - No entry in %s FXS table for source: %s", GetSystemName(system), source));
2553                 delete aResult;
2554                 return list;
2555         }
2556
2557         Log(detector, Form("GetFileIDs - Found %d ids", aResult->GetRowCount()));
2558
2559         TSQLRow* aRow;
2560
2561         while ((aRow = aResult->Next()))
2562         {
2563
2564                 TString id(aRow->GetField(0), aRow->GetFieldLength(0));
2565                 AliDebug(2, Form("fileId = %s", id.Data()));
2566                 list->Add(new TObjString(id));
2567                 delete aRow;
2568         }
2569
2570         delete aResult;
2571
2572         return list;
2573 }
2574
2575 //______________________________________________________________________________________________
2576 Bool_t AliShuttle::Connect(Int_t system)
2577 {
2578         // Connect to the MySQL server hosting the given system's FXS database (or the logbook for system 3)
2579         // DAQ Logbook, Shuttle Logbook and DAQ FXS db are on the same host
2580         //
2581
2582         // check connection: if already connected return
2583         if(fServer[system] && fServer[system]->IsConnected()) return kTRUE;
2584
2585         TString dbHost, dbUser, dbPass, dbName;
2586
2587         if (system < 3) // FXS db servers
2588         {
2589                 dbHost = Form("mysql://%s:%d", fConfig->GetFXSdbHost(system), fConfig->GetFXSdbPort(system));
2590                 dbUser = fConfig->GetFXSdbUser(system);
2591                 dbPass = fConfig->GetFXSdbPass(system);
2592                 dbName =   fConfig->GetFXSdbName(system);
2593         } else { // Run & Shuttle logbook servers
2594         // TODO Will the Shuttle logbook server be the same as the Run logbook server ???
2595                 dbHost = Form("mysql://%s:%d", fConfig->GetDAQlbHost(), fConfig->GetDAQlbPort());
2596                 dbUser = fConfig->GetDAQlbUser();
2597                 dbPass = fConfig->GetDAQlbPass();
2598                 dbName =   fConfig->GetDAQlbDB();
2599         }
2600
2601         fServer[system] = TSQLServer::Connect(dbHost.Data(), dbUser.Data(), dbPass.Data());
2602         if (!fServer[system] || !fServer[system]->IsConnected()) {
2603                 if(system < 3)
2604                 {
2605                 AliError(Form("Can't establish connection to FXS database for %s",
2606                                         AliShuttleInterface::GetSystemName(system)));
2607                 } else {
2608                 AliError("Can't establish connection to Run logbook.");
2609                 }
2610                 if(fServer[system]) { delete fServer[system]; fServer[system] = 0; } // reset to avoid a dangling pointer on the next Connect()
2611                 return kFALSE;
2612         }
2613
2614         // Get tables
2615         TSQLResult* aResult=0;
2616         switch(system){
2617                 case kDAQ:
2618                         aResult = fServer[kDAQ]->GetTables(dbName.Data());
2619                         break;
2620                 case kDCS:
2621                         aResult = fServer[kDCS]->GetTables(dbName.Data());
2622                         break;
2623                 case kHLT:
2624                         aResult = fServer[kHLT]->GetTables(dbName.Data());
2625                         break;
2626                 default:
2627                         aResult = fServer[3]->GetTables(dbName.Data());
2628                         break;
2629         }
2630
2631         delete aResult;
2632         return kTRUE;
2633 }
2634
2635 //______________________________________________________________________________________________
2636 Bool_t AliShuttle::UpdateTable()
2637 {
2638         //
2639         // Update FXS table filling time_processed field in all rows corresponding to current run and detector
2640         //
2641
2642         Bool_t result = kTRUE;
2643
2644         for (UInt_t system=0; system<3; system++)
2645         {
2646                 if(!fFXSCalled[system]) continue;
2647
2648                 // check connection; connect if needed
2649                 if (!Connect(system))
2650                 {
2651                         Log(fCurrentDetector, Form("UpdateTable - Couldn't connect to %s FXS database", GetSystemName(system)));
2652                         result = kFALSE;
2653                         continue;
2654                 }
2655
2656                 TTimeStamp now; // now
2657
2658                 // Loop on FXS list entries
2659                 TIter iter(&fFXSlist[system]);
2660                 TObjString *aFXSentry=0;
2661                 while ((aFXSentry = dynamic_cast<TObjString*> (iter.Next())))
2662                 {
2663                         TString aFXSentrystr = aFXSentry->String();
2664                         TObjArray *aFXSarray = aFXSentrystr.Tokenize("#!?!#");
2665                         if (!aFXSarray || aFXSarray->GetEntries() != 2 )
2666                         {
2667                                 Log(fCurrentDetector, Form("UpdateTable - error updating %s FXS entry. Check string: <%s>",
2668                                         GetSystemName(system), aFXSentrystr.Data()));
2669                                 if(aFXSarray) delete aFXSarray;
2670                                 result = kFALSE;
2671                                 continue;
2672                         }
2673                         const char* fileId = ((TObjString*) aFXSarray->At(0))->GetName();
2674                         const char* source = ((TObjString*) aFXSarray->At(1))->GetName();
2675
2676                         TString whereClause;
2677                         if (system == kDAQ)
2678                         {
2679                                 whereClause = Form("where run=%d and detector=\"%s\" and fileId=\"%s\" and DAQsource=\"%s\";",
2680                                                         GetCurrentRun(), fCurrentDetector.Data(), fileId, source);
2681                         }
2682                         else if (system == kDCS)
2683                         {
2684                                 whereClause = Form("where run=%d and detector=\"%s\" and fileId=\"%s\";",
2685                                                         GetCurrentRun(), fCurrentDetector.Data(), fileId);
2686                         }
2687                         else if (system == kHLT)
2688                         {
2689                                 whereClause = Form("where run=%d and detector=\"%s\" and fileId=\"%s\" and DDLnumbers=\"%s\";",
2690                                                         GetCurrentRun(), fCurrentDetector.Data(), fileId, source);
2691                         }
2692
2693                         delete aFXSarray;
2694
2695                         TString sqlQuery = Form("update %s set time_processed=%d %s", fConfig->GetFXSdbTable(system),
2696                                                                 now.GetSec(), whereClause.Data());
2697
2698                         AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));
2699
2700                         // Query execution
2701                         TSQLResult* aResult;
2702                         aResult = dynamic_cast<TSQLResult*> (fServer[system]->Query(sqlQuery));
2703                         if (!aResult)
2704                         {
2705                                 Log(fCurrentDetector, Form("UpdateTable - %s db: can't execute SQL query <%s>",
2706                                                                 GetSystemName(system), sqlQuery.Data()));
2707                                 result = kFALSE;
2708                                 continue;
2709                         }
2710                         delete aResult;
2711                 }
2712         }
2713
2714         return result;
2715 }
2716
2717 //______________________________________________________________________________________________
2718 Bool_t AliShuttle::UpdateTableFailCase()
2719 {
2720         // Update FXS table filling time_processed field in all rows corresponding to current run and detector.
2721         // This is called when the preprocessor is declared failed for the current run, because
2722         // UpdateTable() updates these fields only in case of success
2723
2724         Bool_t result = kTRUE;
2725
2726         for (UInt_t system=0; system<3; system++)
2727         {
2728                 // check connection; connect if needed
2729                 if (!Connect(system))
2730                 {
2731                         Log(fCurrentDetector, Form("UpdateTableFailCase - Couldn't connect to %s FXS database",
2732                                                         GetSystemName(system)));
2733                         result = kFALSE;
2734                         continue;
2735                 }
2736
2737                 TTimeStamp now; // now
2738
2739                 // Update all rows for the current run and detector (no per-file loop in the fail case)
2740
2741                 TString whereClause = Form("where run=%d and detector=\"%s\";",
2742                                                 GetCurrentRun(), fCurrentDetector.Data());
2743
2744
2745                 TString sqlQuery = Form("update %s set time_processed=%d %s", fConfig->GetFXSdbTable(system),
2746                                                         now.GetSec(), whereClause.Data());
2747
2748                 AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));
2749
2750                 // Query execution
2751                 TSQLResult* aResult;
2752                 aResult = dynamic_cast<TSQLResult*> (fServer[system]->Query(sqlQuery));
2753                 if (!aResult)
2754                 {
2755                         Log(fCurrentDetector, Form("UpdateTableFailCase - %s db: can't execute SQL query <%s>",
2756                                                         GetSystemName(system), sqlQuery.Data()));
2757                         result = kFALSE;
2758                         continue;
2759                 }
2760                 delete aResult;
2761         }
2762
2763         return result;
2764 }
2765
2766 //______________________________________________________________________________________________
2767 Bool_t AliShuttle::UpdateShuttleLogbook(const char* detector, const char* status)
2768 {
2769         //
2770         // Update Shuttle logbook filling detector or shuttle_done column
2771         // ex. of usage: UpdateShuttleLogbook("PHOS", "DONE") or UpdateShuttleLogbook("shuttle_done")
2772         //
2773
2774         // check connection; connect if needed
2775         if(!Connect(3)){
2776                 Log("SHUTTLE", "UpdateShuttleLogbook - Couldn't connect to DAQ Logbook.");
2777                 return kFALSE;
2778         }
2779
2780         TString detName(detector);
2781         TString setClause;
2782         if (detName == "shuttle_done" || detName == "shuttle_ignored")
2783         {
2784                 setClause = "set shuttle_done=1";
2785
2786                 if (detName == "shuttle_done")
2787                 {
2788                         // Send the information to ML
2789                         TMonaLisaText  mlStatus("SHUTTLE_status", "Done");
2790
2791                         TList mlList;
2792                         mlList.Add(&mlStatus);
2793                 
2794                         TString mlID;
2795                         mlID.Form("%d", GetCurrentRun());
2796                         fMonaLisa->SendParameters(&mlList, mlID);
2797                 }
2798         } else {
2799                 TString statusStr(status);
2800                 if(statusStr.Contains("done", TString::kIgnoreCase) ||
2801                    statusStr.Contains("failed", TString::kIgnoreCase)){
2802                         setClause = Form("set %s=\"%s\"", detector, status);
2803                 } else {
2804                         Log("SHUTTLE",
2805                                 Form("UpdateShuttleLogbook - Invalid status <%s> for detector %s",
2806                                         status, detector));
2807                         return kFALSE;
2808                 }
2809         }
2810
2811         TString whereClause = Form("where run=%d", GetCurrentRun());
2812
2813         TString sqlQuery = Form("update %s %s %s",
2814                                         fConfig->GetShuttlelbTable(), setClause.Data(), whereClause.Data());
2815
2816         AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));
2817
2818         // Query execution
2819         TSQLResult* aResult;
2820         aResult = dynamic_cast<TSQLResult*> (fServer[3]->Query(sqlQuery));
2821         if (!aResult) {
2822                 Log("SHUTTLE", Form("UpdateShuttleLogbook - Can't execute query <%s>", sqlQuery.Data()));
2823                 return kFALSE;
2824         }
2825         delete aResult;
2826
2827         return kTRUE;
2828 }
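
// Illustration only, not part of AliShuttle: a minimal stand-alone sketch of how
// UpdateShuttleLogbook assembles its SQL statement, written with std::string so it
// can be read and tested without ROOT. The helper name BuildLogbookUpdate and the
// table name used in the usage note are assumptions made for this example; the real
// code takes the table name from fConfig->GetShuttlelbTable() and the run from
// GetCurrentRun().

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper mirroring only the clause logic of UpdateShuttleLogbook.
// Returns an empty string when the status is rejected.
std::string BuildLogbookUpdate(const std::string& table, const std::string& detector,
                               const std::string& status, int run)
{
    std::string setClause;
    if (detector == "shuttle_done" || detector == "shuttle_ignored") {
        // Both keywords mark the whole run as done.
        setClause = "set shuttle_done=1";
    } else {
        // Case-insensitive check: the status must contain "done" or "failed".
        std::string lower(status);
        std::transform(lower.begin(), lower.end(), lower.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        if (lower.find("done") == std::string::npos &&
            lower.find("failed") == std::string::npos) {
            return std::string();  // invalid status: the caller should log and give up
        }
        setClause = "set " + detector + "=\"" + status + "\"";
    }
    return "update " + table + " " + setClause + " where run=" + std::to_string(run);
}
```

// With an assumed table name "logbook_shuttle", BuildLogbookUpdate("logbook_shuttle",
// "PHOS", "DONE", 1234) yields: update logbook_shuttle set PHOS="DONE" where run=1234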
2829
2830 //______________________________________________________________________________________________
2831 Int_t AliShuttle::GetCurrentRun() const
2832 {
2833         //
2834         // Get current run from logbook entry
2835         //
2836
2837         return fLogbookEntry ? fLogbookEntry->GetRun() : -1;
2838 }
2839
2840 //______________________________________________________________________________________________
2841 UInt_t AliShuttle::GetCurrentStartTime() const
2842 {
2843         //
2844         // Get current start time from logbook entry
2845         //
2846
2847         return fLogbookEntry ? fLogbookEntry->GetStartTime() : 0;
2848 }
2849
2850 //______________________________________________________________________________________________
2851 UInt_t AliShuttle::GetCurrentEndTime() const
2852 {
2853         //
2854         // Get current end time from logbook entry
2855         //
2856
2857         return fLogbookEntry ? fLogbookEntry->GetEndTime() : 0;
2858 }
2859
2860 //______________________________________________________________________________________________
2861 UInt_t AliShuttle::GetCurrentYear() const
2862 {
2863         //
2864         // Get current year from logbook entry
2865         //
2866
2867         if (!fLogbookEntry) return 0;
2868         
2869         TTimeStamp startTime(GetCurrentStartTime());
2870         TString year =  Form("%d",startTime.GetDate());
2871         year = year(0,4);
2872         
2873         return year.Atoi();
2874 }
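
// Illustration only, not part of AliShuttle: GetCurrentYear formats the date as a
// YYYYMMDD number and keeps the first four characters. A possible alternative,
// sketched here under the assumption that the start time is a Unix timestamp read
// in UTC, computes the year directly with POSIX gmtime_r. The helper name
// YearFromUnixTime is hypothetical.

```cpp
#include <ctime>

// Hypothetical helper: UTC calendar year of a Unix timestamp, or 0 on failure.
unsigned int YearFromUnixTime(std::time_t t)
{
    std::tm utc{};
    if (!gmtime_r(&t, &utc)) return 0;  // conversion failed
    return static_cast<unsigned int>(utc.tm_year + 1900);  // tm_year counts from 1900
}
```

// Unlike the string-based version, this sketch does no formatting or substring
// extraction; whether that matters for the Shuttle is a design choice left open here.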
2875
2876 //______________________________________________________________________________________________
2877 const char* AliShuttle::GetLHCPeriod() const
2878 {
2879         //
2880         // Get current LHC period from logbook entry
2881         //
2882
2883         if (!fLogbookEntry) return 0;
2884                 
2885         return fLogbookEntry->GetRunParameter("LHCperiod");
2886 }
2887
2888 //______________________________________________________________________________________________
2889 void AliShuttle::Log(const char* detector, const char* message)
2890 {
2891         //
2892         // Fill log string with a message
2893         //
2894
2895         TString logRunDir = GetShuttleLogDir();
2896         if (GetCurrentRun() >=0)
2897                 logRunDir += Form("/%d", GetCurrentRun());
2898         
2899         void* dir = gSystem->OpenDirectory(logRunDir.Data());
2900         if (dir == NULL) {
2901                 if (gSystem->mkdir(logRunDir.Data(), kTRUE)) {
2902                         AliError(Form("Can't create directory <%s>", logRunDir.Data()));
2903                         return;
2904                 }
2905
2906         } else {
2907                 gSystem->FreeDirectory(dir);
2908         }
2909
2910         TString toLog = Form("%s (%d): %s - ", TTimeStamp(time(0)).AsString("s"), getpid(), detector);
2911         if (GetCurrentRun() >= 0) 
2912                 toLog += Form("run %d - ", GetCurrentRun());
2913         toLog += Form("%s", message);
2914
2915         AliInfo(toLog.Data());
2916         
2917         // if the log output is already redirected to the file, return here
2918         if (fOutputRedirected && strcmp(detector, "SHUTTLE") != 0)
2919                 return;
2920
2921         TString fileName = GetLogFileName(detector);
2922         
2923         gSystem->ExpandPathName(fileName);
2924
2925         ofstream logFile;
2926         logFile.open(fileName, ofstream::out | ofstream::app);
2927
2928         if (!logFile.is_open()) {
2929                 AliError(Form("Could not open file %s", fileName.Data()));
2930                 return;
2931         }
2932
2933         logFile << toLog.Data() << "\n";
2934
2935         logFile.close();
2936 }
2937
2938 //______________________________________________________________________________________________
2939 TString AliShuttle::GetLogFileName(const char* detector) const
2940 {
2941         // 
2942         // returns the name of the log file for a given sub detector
2943         //
2944         
2945         TString fileName;
2946         
2947         if (GetCurrentRun() >= 0) 
2948         {
2949                 fileName.Form("%s/%d/%s_%d.log", GetShuttleLogDir(), GetCurrentRun(), 
2950                         detector, GetCurrentRun());
2951         } else {
2952                 fileName.Form("%s/%s.log", GetShuttleLogDir(), detector);
2953         }
2954
2955         return fileName;
2956 }
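
// Illustration only, not part of AliShuttle: a stand-alone sketch of the naming
// scheme used by GetLogFileName, written with std::string so it can be checked
// without ROOT. The helper name LogFileName and the directory used in the usage
// note are assumptions; the real code takes the directory from GetShuttleLogDir()
// and the run from GetCurrentRun().

```cpp
#include <string>

// Hypothetical helper: per-run logs live in <logDir>/<run>/<detector>_<run>.log,
// logs written outside a run (run < 0) live in <logDir>/<detector>.log.
std::string LogFileName(const std::string& logDir, const std::string& detector, int run)
{
    if (run >= 0) {
        const std::string runStr = std::to_string(run);
        return logDir + "/" + runStr + "/" + detector + "_" + runStr + ".log";
    }
    return logDir + "/" + detector + ".log";
}
```

// For example, with an assumed log directory "/tmp/log", LogFileName("/tmp/log",
// "SHUTTLE", 1234) gives /tmp/log/1234/SHUTTLE_1234.log.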
2957
2958 //______________________________________________________________________________________________
2959 void AliShuttle::SendAlive()
2960 {
2961         // sends alive message to ML
2962         
2963         TMonaLisaText mlStatus("SHUTTLE_status", "Alive");
2964
2965         TList mlList;
2966         mlList.Add(&mlStatus);
2967
2968         fMonaLisa->SendParameters(&mlList, "__PROCESSINGINFO__");
2969 }
2970
2971 //______________________________________________________________________________________________
2972 Bool_t AliShuttle::Collect(Int_t run)
2973 {
2974         //
2975         // Collects conditions data for all UNPROCESSED runs written to the DAQ LogBook if run = -1 (default).
2976         // If a specific run is given, only that run is processed.
2977         //
2978         // In operational mode, this is the Shuttle function triggered by the EOR signal.
2979         //
2980
2981         if (run == -1)
2982                 Log("SHUTTLE","Collect - Shuttle called. Collecting conditions data for unprocessed runs");
2983         else
2984                 Log("SHUTTLE", Form("Collect - Shuttle called. Collecting conditions data for run %d", run));
2985
2986         SetLastAction("Starting");
2987
2988         // create ML instance
2989         if (!fMonaLisa)
2990                 fMonaLisa = new TMonaLisaWriter(fConfig->GetMonitorHost(), fConfig->GetMonitorTable());
2991                 
2992
2993         SendAlive();
2994
2995         TString whereClause("where shuttle_done=0");
2996         if (run != -1)
2997                 whereClause += Form(" and run=%d", run);
2998
2999         TObjArray shuttleLogbookEntries;
3000         if (!QueryShuttleLogbook(whereClause, shuttleLogbookEntries))
3001         {
3002                 Log("SHUTTLE", "Collect - Can't retrieve entries from Shuttle logbook");
3003                 return kFALSE;
3004         }
3005
3006         if (shuttleLogbookEntries.GetEntries() == 0)
3007         {
3008                 if (run == -1)
3009                         Log("SHUTTLE","Collect - Found no UNPROCESSED runs in Shuttle logbook");
3010                 else
3011                         Log("SHUTTLE", Form("Collect - Run %d is already DONE "
3012                                                 "or it does not exist in Shuttle logbook", run));
3013                 return kTRUE;
3014         }
3015
3016         for (UInt_t iDet=0; iDet<NDetectors(); iDet++)
3017                 fFirstUnprocessed[iDet] = kTRUE;
3018
3019         if (run != -1)
3020         {
3021                 // query Shuttle logbook for earlier runs, check if some detectors are unprocessed,
3022                 // flag them into fFirstUnprocessed array
3023                 TString earlierRunsClause(Form("where shuttle_done=0 and run < %d", run)); // avoids shadowing the outer whereClause
3024                 TObjArray tmpLogbookEntries;
3025                 if (!QueryShuttleLogbook(earlierRunsClause, tmpLogbookEntries))
3026                 {
3027                         Log("SHUTTLE", "Collect - Can't retrieve entries from Shuttle logbook");
3028                         return kFALSE;
3029                 }
3030
3031                 TIter iter(&tmpLogbookEntries);
3032                 AliShuttleLogbookEntry* anEntry = 0;
3033                 while ((anEntry = dynamic_cast<AliShuttleLogbookEntry*> (iter.Next())))
3034                 {
3035                         for (UInt_t iDet=0; iDet<NDetectors(); iDet++)
3036                         {
3037                                 if (anEntry->GetDetectorStatus(iDet) == AliShuttleLogbookEntry::kUnprocessed)
3038                                 {
3039                                         AliDebug(2, Form("Run %d: setting %s as \"not first time unprocessed\"",
3040                                                         anEntry->GetRun(), GetDetName(iDet)));
3041                                         fFirstUnprocessed[iDet] = kFALSE;
3042                                 }
3043                         }
3044
3045                 }
3046
3047         }
3048
3049         if (!RetrieveConditionsData(shuttleLogbookEntries))
3050         {
3051                 Log("SHUTTLE", "Collect - Processing of at least one run failed");
3052                 return kFALSE;
3053         }
3054
3055         Log("SHUTTLE", "Collect - Requested run(s) successfully processed");
3056         return kTRUE;
3057 }
3058
3059 //______________________________________________________________________________________________
3060 Bool_t AliShuttle::RetrieveConditionsData(const TObjArray& dateEntries)
3061 {
3062         //
3063         // Retrieve conditions data for all runs that aren't processed yet
3064         //
3065
3066         Bool_t hasError = kFALSE;
3067
3068         TIter iter(&dateEntries);
3069         AliShuttleLogbookEntry* anEntry;
3070
3071         while ((anEntry = (AliShuttleLogbookEntry*) iter.Next())){
3072                 if (!Process(anEntry)){
3073                         hasError = kTRUE;
3074                 }
3075
3076                 // clean SHUTTLE temp directory
3077                 //TString filename = Form("%s/*.shuttle", GetShuttleTempDir());
3078                 //RemoveFile(filename.Data());
3079         }
3080
3081         return hasError == kFALSE;
3082 }