fixed bug in storerefstorage
1 /**************************************************************************
2  * Copyright(c) 1998-1999, ALICE Experiment at CERN, All rights reserved. *
3  *                                                                        *
4  * Author: The ALICE Off-line Project.                                    *
5  * Contributors are mentioned in the code where appropriate.              *
6  *                                                                        *
7  * Permission to use, copy, modify and distribute this software and its   *
8  * documentation strictly for non-commercial purposes is hereby granted   *
9  * without fee, provided that the above copyright notice appears in all   *
10  * copies and that both the copyright notice and this permission notice   *
11  * appear in the supporting documentation. The authors make no claims     *
12  * about the suitability of this software for any purpose. It is          *
13  * provided "as is" without express or implied warranty.                  *
14  **************************************************************************/
15
16 /*
17 $Log$
18 Revision 1.35  2007/04/04 16:26:38  acolla
19 1. Re-organization of function calls in TestPreprocessor to make it more meaningful.
20 2. Added missing dependency in test preprocessors.
21 3. in AliShuttle.cxx: processing time and memory consumption info on a single line.
22
23 Revision 1.34  2007/04/04 10:33:36  jgrosseo
24 1) Storing of files to the Grid is now done _after_ your preprocessors succeeded. This is transparent, which means that you can still use the same functions (Store, StoreReferenceData) to store files to the Grid. However, the Shuttle first stores them locally and transfers them after the preprocessor finished. The return code of these two functions has changed from UInt_t to Bool_t which gives you the success of the storing.
25 In case of an error with the Grid, the Shuttle will retry the storing later, the preprocessor does not need to be run again.
26
27 2) The meaning of the return code of the preprocessor has changed. 0 is now success and any other value means failure. This value is stored in the log and you can use it to keep details about the error condition.
28
29 3) New function StoreReferenceFile to _directly_ store a file (without opening it) to the reference storage.
30
31 4) The memory usage of the preprocessor is monitored. If it exceeds 2 GB it is terminated.
32
33 5) New function AliPreprocessor::ProcessDCS(). If you do not need to have DCS data in all cases, you can skip the processing by implementing this function and returning kFALSE under certain conditions, e.g. for a certain run type.
34 If you always need DCS data (like before), you do not need to implement it.
35
36 6) The run type has been added to the monitoring page
37
38 Revision 1.33  2007/04/03 13:56:01  acolla
39 Grid Storage at the end of preprocessing. Added virtual method to disable DCS query according to the
40 run type.
41
42 Revision 1.32  2007/02/28 10:41:56  acolla
43 Run type field added in SHUTTLE framework. Run type is read from "run type" logbook and retrieved by
44 AliPreprocessor::GetRunType() function.
45 Added some ldap definition files.
46
47 Revision 1.30  2007/02/13 11:23:21  acolla
48 Moved getters and setters of Shuttle's main OCDB/Reference, local
49 OCDB/Reference, temp and log folders to AliShuttleInterface
50
51 Revision 1.27  2007/01/30 17:52:42  jgrosseo
52 adding monalisa monitoring
53
54 Revision 1.26  2007/01/23 19:20:03  acolla
55 Removed old ldif files, added TOF, MCH ldif files. Added some options in
56 AliShuttleConfig::Print. Added in Ali Shuttle: SetShuttleTempDir and
57 SetShuttleLogDir
58
59 Revision 1.25  2007/01/15 19:13:52  acolla
60 Moved some AliInfo to AliDebug in SendMail function
61
62 Revision 1.21  2006/12/07 08:51:26  jgrosseo
63 update (alberto):
64 table, db names in ldap configuration
65 added GRP preprocessor
66 DCS data can also be retrieved by data point
67
68 Revision 1.20  2006/11/16 16:16:48  jgrosseo
69 introducing strict run ordering flag
70 removed giving preprocessor name to preprocessor, they have to know their name themselves ;-)
71
72 Revision 1.19  2006/11/06 14:23:04  jgrosseo
73 major update (Alberto)
74 o) reading of run parameters from the logbook
75 o) online offline naming conversion
76 o) standalone DCSclient package
77
78 Revision 1.18  2006/10/20 15:22:59  jgrosseo
79 o) Adding time out to the execution of the preprocessors: The Shuttle forks and the parent process monitors the child
80 o) Merging Collect, CollectAll, CollectNew function
81 o) Removing implementation of empty copy constructors (declaration still there!)
82
83 Revision 1.17  2006/10/05 16:20:55  jgrosseo
84 adapting to new CDB classes
85
86 Revision 1.16  2006/10/05 15:46:26  jgrosseo
87 applying to the new interface
88
89 Revision 1.15  2006/10/02 16:38:39  jgrosseo
90 update (alberto):
91 fixed memory leaks
92 storing of objects that failed to be stored to the grid before
93 interfacing of shuttle status table in daq system
94
95 Revision 1.14  2006/08/29 09:16:05  jgrosseo
96 small update
97
98 Revision 1.13  2006/08/15 10:50:00  jgrosseo
99 effc++ corrections (alberto)
100
101 Revision 1.12  2006/08/08 14:19:29  jgrosseo
102 Update to shuttle classes (Alberto)
103
104 - Possibility to set the full object's path in the Preprocessor's and
105 Shuttle's  Store functions
106 - Possibility to extend the object's run validity in the same classes
107 ("startValidity" and "validityInfinite" parameters)
108 - Implementation of the StoreReferenceData function to store reference
109 data in a dedicated CDB storage.
110
111 Revision 1.11  2006/07/21 07:37:20  jgrosseo
112 last run is stored after each run
113
114 Revision 1.10  2006/07/20 09:54:40  jgrosseo
115 introducing status management: The processing per subdetector is divided into several steps,
116 after each step the status is stored on disk. If the system crashes in any of the steps the Shuttle
117 can keep track of the number of failures and skips further processing after a certain threshold is
118 exceeded. These thresholds can be configured in LDAP.
119
120 Revision 1.9  2006/07/19 10:09:55  jgrosseo
121 new configuration, access to DAQ FES (Alberto)
122
123 Revision 1.8  2006/07/11 12:44:36  jgrosseo
124 adding parameters for extended validity range of data produced by preprocessor
125
126 Revision 1.7  2006/07/10 14:37:09  jgrosseo
127 small fix + todo comment
128
129 Revision 1.6  2006/07/10 13:01:41  jgrosseo
130 enhanced storing of last successfully processed run (alberto)
131
132 Revision 1.5  2006/07/04 14:59:57  jgrosseo
133 revision of AliDCSValue: Removed wrapper classes, reduced storage size per value by factor 2
134
135 Revision 1.4  2006/06/12 09:11:16  jgrosseo
136 coding conventions (Alberto)
137
138 Revision 1.3  2006/06/06 14:26:40  jgrosseo
139 o) removed files that were moved to STEER
140 o) shuttle updated to follow the new interface (Alberto)
141
142 Revision 1.2  2006/03/07 07:52:34  hristov
143 New version (B.Yordanov)
144
145 Revision 1.6  2005/11/19 17:19:14  byordano
146 RetrieveDATEEntries and RetrieveConditionsData added
147
148 Revision 1.5  2005/11/19 11:09:27  byordano
149 AliShuttle declaration added
150
151 Revision 1.4  2005/11/17 17:47:34  byordano
152 TList changed to TObjArray
153
154 Revision 1.3  2005/11/17 14:43:23  byordano
155 import to local CVS
156
157 Revision 1.1.1.1  2005/10/28 07:33:58  hristov
158 Initial import as subdirectory in AliRoot
159
160 Revision 1.2  2005/09/13 08:41:15  byordano
161 default startTime endTime added
162
163 Revision 1.4  2005/08/30 09:13:02  byordano
164 some docs added
165
166 Revision 1.3  2005/08/29 21:15:47  byordano
167 some docs added
168
169 */
170
171 //
172 // This class is the main manager for AliShuttle.
173 // It organizes the data retrieval from DCS and calls the
174 // interface methods of AliPreprocessor.
175 // For every detector in AliShuttleConfig (see AliShuttleConfig),
176 // data for its set of aliases is retrieved. If an AliPreprocessor
177 // is registered for this detector, it is used
178 // according to the schema (see AliPreprocessor).
179 // If no AliPreprocessor is registered, the retrieved
180 // data is stored automatically to the underlying AliCDBStorage.
181 // The alias name is used as detSpec.
182 //
183
184 #include "AliShuttle.h"
185
186 #include "AliCDBManager.h"
187 #include "AliCDBStorage.h"
188 #include "AliCDBId.h"
189 #include "AliCDBRunRange.h"
190 #include "AliCDBPath.h"
191 #include "AliCDBEntry.h"
192 #include "AliShuttleConfig.h"
193 #include "DCSClient/AliDCSClient.h"
194 #include "AliLog.h"
195 #include "AliPreprocessor.h"
196 #include "AliShuttleStatus.h"
197 #include "AliShuttleLogbookEntry.h"
198
199 #include <TSystem.h>
200 #include <TObject.h>
201 #include <TString.h>
202 #include <TTimeStamp.h>
203 #include <TObjString.h>
204 #include <TSQLServer.h>
205 #include <TSQLResult.h>
206 #include <TSQLRow.h>
207 #include <TMutex.h>
208 #include <TSystemDirectory.h>
209 #include <TSystemFile.h>
210 #include <TFileMerger.h>
211 #include <TGrid.h>
212 #include <TGridResult.h>
213
214 #include <TMonaLisaWriter.h>
215
216 #include <fstream>
217
218 #include <sys/types.h>
219 #include <sys/wait.h>
220
221 ClassImp(AliShuttle)
222
223 //______________________________________________________________________________________________
224 AliShuttle::AliShuttle(const AliShuttleConfig* config,
225                 UInt_t timeout, Int_t retries):
226 fConfig(config),
227 fTimeout(timeout), fRetries(retries),
228 fPreprocessorMap(),
229 fLogbookEntry(0),
230 fCurrentDetector(),
231 fStatusEntry(0),
232 fMonitoringMutex(0),
233 fLastActionTime(0),
234 fLastAction(),
235 fMonaLisa(0),
236 fTestMode(kNone),
237 fReadTestMode(kFALSE)
238 {
239         //
240         // config: AliShuttleConfig used
241         // timeout: timeout used for AliDCSClient connection
242         // retries: the number of retries in case of connection error.
243         //
244
245         if (!fConfig->IsValid()) AliFatal("********** !!!!! Invalid configuration !!!!! **********");
246         for(int iSys=0;iSys<4;iSys++) {
247                 fServer[iSys]=0;
248                 if (iSys < 3)
249                         fFXSlist[iSys].SetOwner(kTRUE);
250         }
251         fPreprocessorMap.SetOwner(kTRUE);
252
253         for (UInt_t iDet=0; iDet<NDetectors(); iDet++)
254                 fFirstUnprocessed[iDet] = kFALSE;
255
256         fMonitoringMutex = new TMutex();
257 }
258
259 //______________________________________________________________________________________________
260 AliShuttle::~AliShuttle()
261 {
262         //
263         // destructor
264         //
265
266         fPreprocessorMap.DeleteAll();
267         for(int iSys=0;iSys<4;iSys++)
268                 if(fServer[iSys]) {
269                         fServer[iSys]->Close();
270                         delete fServer[iSys];
271                         fServer[iSys] = 0;
272                 }
273
274         if (fStatusEntry){
275                 delete fStatusEntry;
276                 fStatusEntry = 0;
277         }
278         
279         if (fMonitoringMutex) 
280         {
281                 delete fMonitoringMutex;
282                 fMonitoringMutex = 0;
283         }
284 }
285
286 //______________________________________________________________________________________________
287 void AliShuttle::RegisterPreprocessor(AliPreprocessor* preprocessor)
288 {
289         //
290         // Registers new AliPreprocessor.
291         // It uses GetName() as the identifier of the preprocessor.
292         // The preprocessor is registered only if no other preprocessor
293         // with the same identifier (GetName()) is already registered.
294         //
295
296         const char* detName = preprocessor->GetName();
297         if(GetDetPos(detName) < 0)
298                 AliFatal(Form("********** !!!!! Invalid detector name: %s !!!!! **********", detName));
299
300         if (fPreprocessorMap.GetValue(detName)) {
301                 AliWarning(Form("AliPreprocessor %s is already registered!", detName));
302                 return;
303         }
304
305         fPreprocessorMap.Add(new TObjString(detName), preprocessor);
306 }
307 //______________________________________________________________________________________________
308 Bool_t AliShuttle::Store(const AliCDBPath& path, TObject* object,
309                 AliCDBMetaData* metaData, Int_t validityStart, Bool_t validityInfinite)
310 {
311         // Stores a CDB object in the storage for offline reconstruction. Objects that are not needed for
312         // offline reconstruction, but should be stored anyway (e.g. for debugging) should NOT be stored
313         // using this function. Use StoreReferenceData instead!
314         // It calls StoreLocally function which temporarily stores the data locally; when the preprocessor
315         // finishes the data are transferred to the main storage (Grid).
316
317         return StoreLocally(fgkLocalCDB, path, object, metaData, validityStart, validityInfinite);
318 }
319
320 //______________________________________________________________________________________________
321 Bool_t AliShuttle::StoreReferenceData(const AliCDBPath& path, TObject* object, AliCDBMetaData* metaData)
322 {
323         // Stores a CDB object in the storage for reference data. These objects will not be available during
324         // offline reconstruction. Use this function for reference data only!
325         // It calls StoreLocally function which temporarily stores the data locally; when the preprocessor
326         // finishes the data are transferred to the main storage (Grid).
327
328         return StoreLocally(fgkLocalRefStorage, path, object, metaData);
329 }
330
331 //______________________________________________________________________________________________
332 Bool_t AliShuttle::StoreLocally(const TString& localUri,
333                         const AliCDBPath& path, TObject* object, AliCDBMetaData* metaData,
334                         Int_t validityStart, Bool_t validityInfinite)
335 {
336         // Stores the object temporarily in local storage. Parameters are passed by the Store and StoreReferenceData functions.
337         // When the preprocessor finishes, the data are transferred to the main storage (Grid).
338         // The parameters are:
339         //   1) Uri of the backup storage (Local)
340         //   2) the object's path.
341         //   3) the object to be stored
342         //   4) the metaData to be associated with the object
343         //   5) the validity start run number w.r.t. the current run,
344         //      if the data is valid only for this run leave the default 0
345         //   6) specifies if the calibration data is valid for infinity (this means until updated),
346         //      typical for calibration runs, the default is kFALSE
347         //
348         // returns kFALSE on failure, kTRUE otherwise
349
350         if (fTestMode & kErrorStorage)
351         {
352                 Log(fCurrentDetector, "StoreLocally - In TESTMODE - Simulating error while storing locally");
353                 return kFALSE;
354         }
355         
356         const char* cdbType = (localUri == fgkLocalCDB) ? "CDB" : "Reference";
357
358         Int_t firstRun = GetCurrentRun() - validityStart;
359         if(firstRun < 0) {
360                 AliWarning("First valid run happens to be less than 0! Setting it to 0.");
361                 firstRun=0;
362         }
363
364         Int_t lastRun = -1;
365         if(validityInfinite) {
366                 lastRun = AliCDBRunRange::Infinity();
367         } else {
368                 lastRun = GetCurrentRun();
369         }
370
371         // Version is set to current run, it will be used later to transfer data to Grid
372         AliCDBId id(path, firstRun, lastRun, GetCurrentRun(), -1);
373
374         if(! dynamic_cast<TObjString*> (metaData->GetProperty("RunUsed(TObjString)"))){
375                 TObjString runUsed = Form("%d", GetCurrentRun());
376                 metaData->SetProperty("RunUsed(TObjString)", runUsed.Clone());
377         }
378
379         Bool_t result = kFALSE;
380
381         if (!(AliCDBManager::Instance()->GetStorage(localUri))) {
382                 Log("SHUTTLE", Form("StoreLocally - Cannot activate local %s storage", cdbType));
383         } else {
384                 result = AliCDBManager::Instance()->GetStorage(localUri)
385                                         ->Put(object, id, metaData);
386         }
387
388         if(!result) {
389
390                 Log(fCurrentDetector, Form("StoreLocally - Can't store object <%s>!", id.ToString().Data()));
391         }
392
393         return result;
394 }
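/*
 A minimal sketch of the run-range arithmetic used by StoreLocally, written
 without ROOT so that it can be compiled on its own. ValidityRange, RunRange
 and kInfiniteRun are hypothetical names introduced for this illustration;
 kInfiniteRun only stands in for AliCDBRunRange::Infinity(), whose actual
 value is not shown in this file.

```cpp
#include <limits>

// Hypothetical stand-in for AliCDBRunRange::Infinity(); the real value is
// defined by the CDB classes and is an assumption here.
const int kInfiniteRun = std::numeric_limits<int>::max();

// Hypothetical helper type: first and last run of the validity range.
struct RunRange {
    int first;
    int last;
};

// Mirrors the range logic in StoreLocally: the range starts validityStart
// runs before the current run (clamped at 0) and ends either at the current
// run or at "infinity".
inline RunRange ValidityRange(int currentRun, int validityStart, bool validityInfinite)
{
    RunRange range;
    range.first = currentRun - validityStart;
    if (range.first < 0) range.first = 0;
    range.last = validityInfinite ? kInfiniteRun : currentRun;
    return range;
}
```

 For example, ValidityRange(100, 5, true) gives first = 95 with an open-ended
 last run, while ValidityRange(3, 10, false) clamps first to 0 and keeps
 last = 3.
*/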
395
396 //______________________________________________________________________________________________
397 Bool_t AliShuttle::StoreOCDB()
398 {
399         //
400         // Called when preprocessor ends successfully or when previous storage attempt failed (kStoreError status)
401         // Calls underlying StoreOCDB(const char*) function twice, for OCDB and Reference storage.
402         // Then calls StoreRefFilesToGrid to store reference files. 
403         //
404         
405         if (fTestMode & kErrorGrid)
406         {
407                 Log("SHUTTLE", "StoreOCDB - In TESTMODE - Simulating error while storing in the Grid");
408                 Log(fCurrentDetector, "StoreOCDB - In TESTMODE - Simulating error while storing in the Grid");
409                 return kFALSE;
410         }
411         
412         AliInfo("Storing OCDB data ...");
413         Bool_t resultCDB = StoreOCDB(fgkMainCDB);
414
415         AliInfo("Storing reference data ...");
416         Bool_t resultRef = StoreOCDB(fgkMainRefStorage);
417         
418         AliInfo("Storing reference files ...");
419         Bool_t resultRefFiles = StoreRefFilesToGrid();
420         
421         return resultCDB && resultRef && resultRefFiles;
422 }
423
424 //______________________________________________________________________________________________
425 Bool_t AliShuttle::StoreOCDB(const TString& gridURI)
426 {
427         //
428         // Called by StoreOCDB(), performs actual storage to the main OCDB and reference storages (Grid)
429         //
430
431         TObjArray* gridIds=0;
432
433         Bool_t result = kTRUE;
434
435         const char* type = 0;
436         TString localURI;
437         if(gridURI == fgkMainCDB) {
438                 type = "OCDB";
439                 localURI = fgkLocalCDB;
440         } else if(gridURI == fgkMainRefStorage) {
441                 type = "reference";
442                 localURI = fgkLocalRefStorage;
443         } else {
444                 AliError(Form("Invalid storage URI: %s", gridURI.Data()));
445                 return kFALSE;
446         }
447
448         AliCDBManager* man = AliCDBManager::Instance();
449
450         AliCDBStorage *gridSto = man->GetStorage(gridURI);
451         if(!gridSto) {
452                 Log("SHUTTLE",
453                         Form("StoreOCDB - cannot activate main %s storage", type));
454                 return kFALSE;
455         }
456
457         gridIds = gridSto->GetQueryCDBList();
458
459         // get objects previously stored in local CDB
460         AliCDBStorage *localSto = man->GetStorage(localURI);
461         if(!localSto) {
462                 Log("SHUTTLE",
463                         Form("StoreOCDB - cannot activate local %s storage", type));
464                 return kFALSE;
465         }
466         AliCDBPath aPath(GetOfflineDetName(fCurrentDetector.Data()),"*","*");
467         // Local objects were stored with current run as Grid version!
468         TList* localEntries = localSto->GetAll(aPath.GetPath(), GetCurrentRun(), GetCurrentRun());
469         localEntries->SetOwner(1);
470
471         // loop on local stored objects
472         TIter localIter(localEntries);
473         AliCDBEntry *aLocEntry = 0;
474         while((aLocEntry = dynamic_cast<AliCDBEntry*> (localIter.Next()))){
475                 aLocEntry->SetOwner(1);
476                 AliCDBId aLocId = aLocEntry->GetId();
477                 aLocEntry->SetVersion(-1);
478                 aLocEntry->SetSubVersion(-1);
479
480                 // If local object is valid up to infinity we store it only if it is
481                 // the first unprocessed run!
482                 if (aLocId.GetLastRun() == AliCDBRunRange::Infinity() &&
483                         !fFirstUnprocessed[GetDetPos(fCurrentDetector)])
484                 {
485                         Log("SHUTTLE", Form("StoreOCDB - %s: object %s has validity infinite but "
486                                                 "there are previous unprocessed runs!",
487                                                 fCurrentDetector.Data(), aLocId.GetPath().Data()));
488                         continue;
489                 }
490
491                 // loop on Grid valid Id's
492                 Bool_t store = kTRUE;
493                 TIter gridIter(gridIds);
494                 AliCDBId* aGridId = 0;
495                 while((aGridId = dynamic_cast<AliCDBId*> (gridIter.Next()))){
496                         if(aGridId->GetPath() != aLocId.GetPath()) continue;
497                         // skip all objects valid up to infinity
498                         if(aGridId->GetLastRun() == AliCDBRunRange::Infinity()) continue;
499                         // if we get here, it means there's already some more recent object stored on Grid!
500                         store = kFALSE;
501                         break;
502                 }
503
504                 // Put the object on the Grid only if no more recent object exists there
505                 Bool_t storeOk = store ? gridSto->Put(aLocEntry) : kFALSE;
506                 if(!store || storeOk){
507
508                         if (!store)
509                         {
510                                 Log(fCurrentDetector.Data(),
511                                         Form("StoreOCDB - A more recent object already exists in %s storage: <%s>",
512                                                 type, aGridId->ToString().Data()));
513                         } else {
514                                 Log("SHUTTLE",
515                                         Form("StoreOCDB - Object <%s> successfully put into %s storage",
516                                                 aLocId.ToString().Data(), type));
517                         }
518
519                         // removing local filename...
520                         TString filename;
521                         localSto->IdToFilename(aLocId, filename);
522                         AliInfo(Form("Removing local file %s", filename.Data()));
523                         RemoveFile(filename.Data());
524                         continue;
525                 } else  {
526                         Log("SHUTTLE",
527                                 Form("StoreOCDB - Grid %s storage of object <%s> failed",
528                                         type, aLocId.ToString().Data()));
529                         result = kFALSE;
530                 }
531         }
532         localEntries->Clear();
533
534         return result;
535 }
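/*
 The two checks that StoreOCDB(const TString&) applies to each local object
 can be sketched in isolation as follows. This is an illustration only:
 ObjectId, TransferDecision, DecideTransfer and kRunInfinity are hypothetical
 names, and kRunInfinity merely stands in for AliCDBRunRange::Infinity().
 Note that in the function above the first case keeps the local file for a
 later attempt, while the second case removes it.

```cpp
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical stand-in for AliCDBRunRange::Infinity().
const int kRunInfinity = std::numeric_limits<int>::max();

// Hypothetical, reduced view of an object id: path and last valid run.
struct ObjectId {
    std::string path;
    int lastRun;
};

enum TransferDecision { kTransfer, kSkipInfinite, kSkipNewerOnGrid };

inline TransferDecision DecideTransfer(const ObjectId& local, bool firstUnprocessed,
                                       const std::vector<ObjectId>& gridIds)
{
    // Infinite validity is accepted only for the first unprocessed run
    if (local.lastRun == kRunInfinity && !firstUnprocessed) return kSkipInfinite;

    for (std::size_t i = 0; i < gridIds.size(); ++i) {
        if (gridIds[i].path != local.path) continue;       // different object
        if (gridIds[i].lastRun == kRunInfinity) continue;  // open-ended entries do not block
        return kSkipNewerOnGrid;  // same path with finite validity blocks the transfer
    }
    return kTransfer;
}
```

 A caller would evaluate DecideTransfer once per local entry and attempt the
 Grid transfer only when the result is kTransfer.
*/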
536
537 //______________________________________________________________________________________________
538 Bool_t AliShuttle::StoreReferenceFile(const char* detector, const char* localFile, const char* gridFileName)
539 {
540         //
541         // Stores reference file directly (without opening it). This function stores the file locally
542         // renaming it to #runNumber_gridFileName.
543         //
544         
545         if (fTestMode & kErrorStorage)
546         {
547                 Log(fCurrentDetector, "StoreReferenceFile - In TESTMODE - Simulating error while storing locally");
548                 return kFALSE;
549         }
550         
551         AliCDBManager* man = AliCDBManager::Instance();
552         AliCDBStorage* sto = man->GetStorage(fgkLocalRefStorage);
553         if (!sto) return kFALSE;
554         TString localBaseFolder = sto->GetBaseFolder();
555         
556         TString targetDir;
557         targetDir.Form("%s/%s", localBaseFolder.Data(), detector);
558         
559         TString target;
560         target.Form("%s/%d_%s", targetDir.Data(), GetCurrentRun(), gridFileName);
561         
562         Int_t result = gSystem->GetPathInfo(targetDir, 0, (Long64_t*) 0, 0, 0);
563         if (result)
564         {
565                 result = gSystem->mkdir(targetDir, kTRUE);
566                 if (result != 0)
567                 {
568                         Log("SHUTTLE", Form("StoreReferenceFile - Error creating base directory %s", targetDir.Data()));
569                         return kFALSE;
570                 }
571         }
572                 
573         result = gSystem->CopyFile(localFile, target);
574
575         if (result == 0)
576         {
577                 Log("SHUTTLE", Form("StoreReferenceFile - Stored file %s locally to %s", localFile, target.Data()));
578                 return kTRUE;
579         }
580         else
581         {
582                 Log("SHUTTLE", Form("StoreReferenceFile - Storing file %s locally to %s failed", localFile, target.Data()));
583                 return kFALSE;
584         }       
585 }
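/*
 The naming scheme used by StoreReferenceFile for the local copy,
 <baseFolder>/<detector>/<run>_<gridFileName>, can be reproduced with the
 standard library alone. LocalRefTarget is a hypothetical helper written for
 this illustration; the folder and file names in the usage note are made up.

```cpp
#include <sstream>
#include <string>

// Builds <baseFolder>/<detector>/<run>_<gridFileName>
inline std::string LocalRefTarget(const std::string& baseFolder, const std::string& detector,
                                  int run, const std::string& gridFileName)
{
    std::ostringstream target;
    target << baseFolder << "/" << detector << "/" << run << "_" << gridFileName;
    return target.str();
}
```

 With the made-up arguments ("/tmp/ref", "TPC", 1234, "pedestals.root") the
 helper returns "/tmp/ref/TPC/1234_pedestals.root". The run-number prefix is
 what StoreRefFilesToGrid later uses to pick the files of the current run.
*/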
586
587 //______________________________________________________________________________________________
588 Bool_t AliShuttle::StoreRefFilesToGrid()
589 {
590         //
591         // Transfers the reference file to the Grid.
592         // The final full path of the file is:
593         // gridBaseReferenceFolder/DET/#runNumber_gridFileName
594         //
595         
596         AliCDBManager* man = AliCDBManager::Instance();
597         AliCDBStorage* sto = man->GetStorage(fgkLocalRefStorage);
598         if (!sto)
599                 return kFALSE;
600         TString localBaseFolder = sto->GetBaseFolder();
601                 
602         TString dir;
603         dir.Form("%s/%s", localBaseFolder.Data(), GetOfflineDetName(fCurrentDetector));
604         
605         AliCDBStorage* gridSto = man->GetStorage(fgkMainRefStorage);
606         if (!gridSto)
607                 return kFALSE;
608         TString gridBaseFolder = gridSto->GetBaseFolder();
609         TString alienDir;
610         alienDir.Form("%s%s", gridBaseFolder.Data(), GetOfflineDetName(fCurrentDetector));
611         
612         if (!gGrid) 
613                 return kFALSE;
614         
615         TString begin;
616         begin.Form("%d_", GetCurrentRun());
617         
618         TSystemDirectory* baseDir = new TSystemDirectory("/", dir);
619         if (!baseDir)
620                 return kTRUE;
621                 
622         TList* dirList            = baseDir->GetListOfFiles();
623         if (!dirList)
624         {
625                 delete baseDir;
626                 return kTRUE;
627         }
628                 
629         Int_t nDirs               = dirList->GetEntries();
630         
631         Bool_t success = kTRUE;
632         Bool_t first = kTRUE;
633         
634         for (Int_t iDir=0; iDir<nDirs; ++iDir)
635         {
636                 TSystemFile* entry = dynamic_cast<TSystemFile*> (dirList->At(iDir));
637                 if (!entry)
638                         continue;
639                         
640                 if (entry->IsDirectory())
641                         continue;
642                         
643                 TString fileName(entry->GetName());
644                 if (!fileName.BeginsWith(begin))
645                         continue;
646                         
647                 if (first)
648                 {
649                         first = kFALSE;
650                         // check that DET folder exists, otherwise create it
651                         TGridResult* result = gGrid->Ls(alienDir.Data(), "a");
652                         if (!result) {
653                                 delete baseDir;
654                                 return kFALSE;
655                         }
656                         if (!result->GetFileName(0)) 
657                         {
658                                 if (!gGrid->Mkdir(alienDir.Data(),"",0))
659                                 {
660                                         Log("SHUTTLE", Form("StoreRefFilesToGrid - Cannot create directory %s",
661                                                         alienDir.Data()));
662                                         delete baseDir;
663                                         return kFALSE;
664                                 }
665                                 
666                         }
667                 }
668                         
669                 TString fullLocalPath;
670                 fullLocalPath.Form("%s/%s", dir.Data(), fileName.Data());
671                 
672                 TString fullGridPath;
673                 fullGridPath.Form("alien://%s/%s", alienDir.Data(), fileName.Data());
674
675                 Log("SHUTTLE", Form("StoreRefFilesToGrid - Copying local file %s to %s", fullLocalPath.Data(), fullGridPath.Data()));
676                 
677                 TFileMerger fileMerger;
678                 Bool_t result = fileMerger.Cp(fullLocalPath, fullGridPath);
679                 
680                 if (result)
681                 {
682                         Log("SHUTTLE", Form("StoreRefFilesToGrid - Copying local file %s to %s succeeded", fullLocalPath.Data(), fullGridPath.Data()));
683                         RemoveFile(fullLocalPath);
684                 }
685                 else
686                 {
687                         Log("SHUTTLE", Form("StoreRefFilesToGrid - Copying local file %s to %s failed", fullLocalPath.Data(), fullGridPath.Data()));
688                         success = kFALSE;
689                 }
690         }
691         
692         delete baseDir;
693         
694         return success;
695 }
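/*
 StoreRefFilesToGrid selects the files to transfer by their "<run>_" prefix.
 A small sketch of that filter, assuming the directory listing is already
 available as a list of file names. FilesOfRun is a hypothetical helper, not
 part of the Shuttle classes.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Keeps only the file names that start with "<run>_"
inline std::vector<std::string> FilesOfRun(const std::vector<std::string>& names, int run)
{
    std::ostringstream prefixStream;
    prefixStream << run << "_";
    const std::string prefix = prefixStream.str();

    std::vector<std::string> selected;
    for (std::size_t i = 0; i < names.size(); ++i) {
        // compare() against the leading characters; shorter names never match
        if (names[i].compare(0, prefix.size(), prefix) == 0) selected.push_back(names[i]);
    }
    return selected;
}
```

 Because the underscore is part of the prefix, files of run 1234 are not
 picked up when run 123 is processed.
*/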
696
697 //______________________________________________________________________________________________
698 void AliShuttle::CleanLocalStorage(const TString& uri)
699 {
700         //
701         // Called when the preprocessor is declared failed. Removes the remaining objects from the local storage.
702         //
703
704         const char* type = 0;
705         if(uri == fgkLocalCDB) {
706                 type = "OCDB";
707         } else if(uri == fgkLocalRefStorage) {
708                 type = "reference";
709         } else {
710                 AliError(Form("Invalid storage URI: %s", uri.Data()));
711                 return;
712         }
713
714         AliCDBManager* man = AliCDBManager::Instance();
715
716         // open local storage
717         AliCDBStorage *localSto = man->GetStorage(uri);
718         if(!localSto) {
719                 Log("SHUTTLE",
720                         Form("CleanLocalStorage - cannot activate local %s storage", type));
721                 return;
722         }
723
724         TString filename(Form("%s/%s/*/Run*_v%d_s*.root",
725                 localSto->GetBaseFolder().Data(), fCurrentDetector.Data(), GetCurrentRun()));
726
727         AliInfo(Form("filename = %s", filename.Data()));
728
729         AliInfo(Form("Removing remaining local files from run %d and detector %s ...",
730                 GetCurrentRun(), fCurrentDetector.Data()));
731
732         RemoveFile(filename.Data());
733
734 }
735
736 //______________________________________________________________________________________________
737 void AliShuttle::RemoveFile(const char* filename)
738 {
739         //
740         // removes local file
741         //
742
743         TString command(Form("rm -f %s", filename));
744
745         Int_t result = gSystem->Exec(command.Data());
746         if(result != 0)
747         {
748                 Log("SHUTTLE", Form("RemoveFile - %s: Cannot remove file %s!",
749                         fCurrentDetector.Data(), filename));
750         }
751 }
752
753 //______________________________________________________________________________________________
754 AliShuttleStatus* AliShuttle::ReadShuttleStatus()
755 {
756         //
757         // Reads the AliShuttleStatus from the CDB
758         //
759
760         if (fStatusEntry){
761                 delete fStatusEntry;
762                 fStatusEntry = 0;
763         }
764
765         fStatusEntry = AliCDBManager::Instance()->GetStorage(GetLocalCDB())
766                 ->Get(Form("/SHUTTLE/STATUS/%s", fCurrentDetector.Data()), GetCurrentRun());
767
768         if (!fStatusEntry) return 0;
769         fStatusEntry->SetOwner(1);
770
771         AliShuttleStatus* status = dynamic_cast<AliShuttleStatus*> (fStatusEntry->GetObject());
772         if (!status) {
773                 AliError("Invalid object stored to CDB!");
774                 return 0;
775         }
776
777         return status;
778 }
779
780 //______________________________________________________________________________________________
781 Bool_t AliShuttle::WriteShuttleStatus(AliShuttleStatus* status)
782 {
783         //
784         // writes the status for one subdetector
785         //
786
787         if (fStatusEntry){
788                 delete fStatusEntry;
789                 fStatusEntry = 0;
790         }
791
792         Int_t run = GetCurrentRun();
793
794         AliCDBId id(AliCDBPath("SHUTTLE", "STATUS", fCurrentDetector), run, run);
795
796         fStatusEntry = new AliCDBEntry(status, id, new AliCDBMetaData);
797         fStatusEntry->SetOwner(1);
798
799         UInt_t result = AliCDBManager::Instance()->GetStorage(fgkLocalCDB)->Put(fStatusEntry);
800
801         if (!result) {
802                 Log("SHUTTLE", Form("WriteShuttleStatus - Failed for %s, run %d",
803                                                 fCurrentDetector.Data(), run));
804                 return kFALSE;
805         }
806         
807         SendMLInfo();
808
809         return kTRUE;
810 }
811
812 //______________________________________________________________________________________________
813 void AliShuttle::UpdateShuttleStatus(AliShuttleStatus::Status newStatus, Bool_t increaseCount)
814 {
815         //
816         // changes the AliShuttleStatus for the given detector and run to the given status
817         //
818
819         if (!fStatusEntry){
820                 AliError("UNEXPECTED: fStatusEntry empty");
821                 return;
822         }
823
824         AliShuttleStatus* status = dynamic_cast<AliShuttleStatus*> (fStatusEntry->GetObject());
825
826         if (!status){
827                 Log("SHUTTLE", "UNEXPECTED: status could not be read from current CDB entry");
828                 return;
829         }
830
831         TString actionStr = Form("UpdateShuttleStatus - %s: Changing state from %s to %s",
832                                 fCurrentDetector.Data(),
833                                 status->GetStatusName(),
834                                 status->GetStatusName(newStatus));
835         Log("SHUTTLE", actionStr);
836         SetLastAction(actionStr);
837
838         status->SetStatus(newStatus);
839         if (increaseCount) status->IncreaseCount();
840
841         AliCDBManager::Instance()->GetStorage(fgkLocalCDB)->Put(fStatusEntry);
842
843         SendMLInfo();
844 }
845
846 //______________________________________________________________________________________________
847 void AliShuttle::SendMLInfo()
848 {
849         //
850         // sends ML information about the current status of the current detector being processed
851         //
852         
853         AliShuttleStatus* status = dynamic_cast<AliShuttleStatus*> (fStatusEntry->GetObject());
854         
855         if (!status){
856                 Log("SHUTTLE", "SendMLInfo - UNEXPECTED: status could not be read from current CDB entry");
857                 return;
858         }
859         
860         TMonaLisaText  mlStatus(Form("%s_status", fCurrentDetector.Data()), status->GetStatusName());
861         TMonaLisaValue mlRetryCount(Form("%s_count", fCurrentDetector.Data()), status->GetCount());
862
863         TList mlList;
864         mlList.Add(&mlStatus);
865         mlList.Add(&mlRetryCount);
866
867         fMonaLisa->SendParameters(&mlList);
868 }
869
870 //______________________________________________________________________________________________
871 Bool_t AliShuttle::ContinueProcessing()
872 {
873         // this function reads the AliShuttleStatus information from CDB and
874         // checks if the processing should be continued
875         // if yes, it returns kTRUE and updates the AliShuttleStatus (set or reset to kStarted)
876
877         if (!fConfig->HostProcessDetector(fCurrentDetector)) return kFALSE;
878
879         AliPreprocessor* aPreprocessor =
880                 dynamic_cast<AliPreprocessor*> (fPreprocessorMap.GetValue(fCurrentDetector));
881         if (!aPreprocessor)
882         {
883                 AliInfo(Form("%s: no preprocessor registered", fCurrentDetector.Data()));
884                 return kFALSE;
885         }
886
887         AliShuttleLogbookEntry::Status entryStatus =
888                 fLogbookEntry->GetDetectorStatus(fCurrentDetector);
889
890         if(entryStatus != AliShuttleLogbookEntry::kUnprocessed) {
891                 AliInfo(Form("ContinueProcessing - %s is %s",
892                                 fCurrentDetector.Data(),
893                                 fLogbookEntry->GetDetectorStatusName(entryStatus)));
894                 return kFALSE;
895         }
896
897         // if we get here, according to Shuttle logbook subdetector is in UNPROCESSED state
898
899         // check if current run is first unprocessed run for current detector
900         if (fConfig->StrictRunOrder(fCurrentDetector) &&
901                 !fFirstUnprocessed[GetDetPos(fCurrentDetector)])
902         {
903                         Log("SHUTTLE", Form("ContinueProcessing - %s requires strict run ordering but this is not the first unprocessed run!", fCurrentDetector.Data()));
904                 return kFALSE;
905         }
906
907         AliShuttleStatus* status = ReadShuttleStatus();
908         if (!status) {
909                 // first time
910                 Log("SHUTTLE", Form("ContinueProcessing - %s: Processing first time",
911                                 fCurrentDetector.Data()));
912                 status = new AliShuttleStatus(AliShuttleStatus::kStarted);
913                 return WriteShuttleStatus(status);
914         }
915
916         // The following two cases shouldn't happen if Shuttle Logbook was correctly updated.
917         // If it happens it may mean Logbook updating failed... let's do it now!
918         if (status->GetStatus() == AliShuttleStatus::kDone ||
919             status->GetStatus() == AliShuttleStatus::kFailed){
920                 Log("SHUTTLE", Form("ContinueProcessing - %s is already %s. Updating Shuttle Logbook",
921                                         fCurrentDetector.Data(),
922                                         status->GetStatusName(status->GetStatus())));
923                 UpdateShuttleLogbook(fCurrentDetector.Data(),
924                                         status->GetStatusName(status->GetStatus()));
925                 return kFALSE;
926         }
927
928         if (status->GetStatus() == AliShuttleStatus::kStoreError) {
929                 Log("SHUTTLE",
930                         Form("ContinueProcessing - %s: Grid storage of one or more objects failed. Trying again now",
931                                 fCurrentDetector.Data()));
932                 UpdateShuttleStatus(AliShuttleStatus::kStoreStarted);
933                 if (StoreOCDB()){
934                         Log("SHUTTLE", Form("ContinueProcessing - %s: all objects successfully stored into main storage",
935                                 fCurrentDetector.Data()));
936                         UpdateShuttleStatus(AliShuttleStatus::kDone);
937                         UpdateShuttleLogbook(fCurrentDetector.Data(), "DONE");
938                 } else {
939                         Log("SHUTTLE",
940                                 Form("ContinueProcessing - %s: Grid storage failed again",
941                                         fCurrentDetector.Data()));
942                         UpdateShuttleStatus(AliShuttleStatus::kStoreError);
943                 }
944                 return kFALSE;
945         }
946
947         // if we get here, there is a restart
948         Bool_t cont = kFALSE;
949
950         // abort conditions
951         if (status->GetCount() >= fConfig->GetMaxRetries()) {
952                 Log("SHUTTLE", Form("ContinueProcessing - %s failed %d times in status %s - "
953                                 "Updating Shuttle Logbook", fCurrentDetector.Data(),
954                                 status->GetCount(), status->GetStatusName()));
955                 UpdateShuttleLogbook(fCurrentDetector.Data(), "FAILED");
956                 UpdateShuttleStatus(AliShuttleStatus::kFailed);
957
958                 // there may still be objects in local OCDB and reference storage
959                 // and FXS databases may not be updated: do it now!
960                 
961                 // TODO Currently disabled, we want to keep files in case of failure!
962                 // CleanLocalStorage(fgkLocalCDB);
963                 // CleanLocalStorage(fgkLocalRefStorage);
964                 // UpdateTableFailCase();
965                 
966                 // Send mail to detector expert!
967                 AliInfo(Form("Sending mail to %s expert...", fCurrentDetector.Data()));
968                 if (!SendMail())
969                         Log("SHUTTLE", Form("ContinueProcessing - Could not send mail to %s expert",
970                                         fCurrentDetector.Data()));
971
972         } else {
973                 Log("SHUTTLE", Form("ContinueProcessing - %s: restarting. "
974                                 "Aborted before with %s. Retry number %d.", fCurrentDetector.Data(),
975                                 status->GetStatusName(), status->GetCount()));
976                 Bool_t increaseCount = kTRUE;
977                 if (status->GetStatus() == AliShuttleStatus::kDCSError || status->GetStatus() == AliShuttleStatus::kDCSStarted)
978                         increaseCount = kFALSE;
979                 UpdateShuttleStatus(AliShuttleStatus::kStarted, increaseCount);
980                 cont = kTRUE;
981         }
982
983         return cont;
984 }
985
986 //______________________________________________________________________________________________
987 Bool_t AliShuttle::Process(AliShuttleLogbookEntry* entry)
988 {
989         //
990         // Makes data retrieval for all detectors in the configuration.
991         // entry: Shuttle logbook entry, contains run parameters and status of detectors
992         // (Unprocessed, Inactive, Failed or Done).
993         // Returns kFALSE if an error occurred and kTRUE otherwise
994         //
995
996         if (!entry) return kFALSE;
997
998         fLogbookEntry = entry;
999
1000         AliInfo(Form("\n\n \t\t\t^*^*^*^*^*^*^*^*^*^*^*^* run %d: START ^*^*^*^*^*^*^*^*^*^*^*^* \n",
1001                                         GetCurrentRun()));
1002
1003         // create ML instance that monitors this run
1004         fMonaLisa = new TMonaLisaWriter(Form("%d", GetCurrentRun()), "SHUTTLE", "aliendb1.cern.ch");
1005         // disable monitoring of other parameters that come e.g. from TFile
1006         gMonitoringWriter = 0;
1007
1008         // Send the information to ML
1009         TMonaLisaText  mlStatus("SHUTTLE_status", "Processing");
1010         TMonaLisaText  mlRunType("SHUTTLE_runtype", Form("%s (%s)", entry->GetRunType(), entry->GetRunParameter("log")));
1011
1012         TList mlList;
1013         mlList.Add(&mlStatus);
1014         mlList.Add(&mlRunType);
1015
1016         fMonaLisa->SendParameters(&mlList);
1017
1018         if (fLogbookEntry->IsDone())
1019         {
1020                 Log("SHUTTLE","Process - Shuttle is already DONE. Updating logbook");
1021                 UpdateShuttleLogbook("shuttle_done");
1022                 delete fMonaLisa; fMonaLisa = 0; fLogbookEntry = 0; // do not leak the ML writer on early return
1023                 return kTRUE;
1024         }
1025
1026         // read test mode if flag is set
1027         if (fReadTestMode)
1028         {
1029                 fTestMode = kNone;
1030                 TString logEntry(entry->GetRunParameter("log"));
1031                 //printf("log entry = %s\n", logEntry.Data());
1032                 TString searchStr("Testmode: ");
1033                 Int_t pos = logEntry.Index(searchStr.Data());
1034                 //printf("%d\n", pos);
1035                 if (pos >= 0)
1036                 {
1037                         TSubString subStr = logEntry(pos + searchStr.Length(), logEntry.Length());
1038                         //printf("%s\n", subStr.String().Data());
1039                         TString newStr(subStr.Data());
1040                         TObjArray* token = newStr.Tokenize(' ');
1041                         if (token)
1042                         {
1043                                 //token->Print();
1044                                 TObjString* tmpStr = dynamic_cast<TObjString*> (token->First());
1045                                 if (tmpStr)
1046                                 {
1047                                         Int_t testMode = tmpStr->String().Atoi();
1048                                         if (testMode > 0)
1049                                         {
1050                                                 Log("SHUTTLE", Form("Enabling test mode %d", testMode));
1051                                                 SetTestMode((TestMode) testMode);
1052                                         }
1053                                 }
1054                                 delete token;          
1055                         }
1056                 }
1057         }
1058         
1059         Log("SHUTTLE", Form("The test mode flag is %d", (Int_t) fTestMode));
1060         
1061         fLogbookEntry->Print("all");
1062
1063         // Initialization
1064         Bool_t hasError = kFALSE;
1065
1066         AliCDBStorage *mainCDBSto = AliCDBManager::Instance()->GetStorage(fgkMainCDB);
1067         if(mainCDBSto) mainCDBSto->QueryCDB(GetCurrentRun());
1068         AliCDBStorage *mainRefSto = AliCDBManager::Instance()->GetStorage(fgkMainRefStorage);
1069         if(mainRefSto) mainRefSto->QueryCDB(GetCurrentRun());
1070
1071         // Loop on detectors in the configuration
1072         TIter iter(fConfig->GetDetectors());
1073         TObjString* aDetector = 0;
1074
1075         while ((aDetector = (TObjString*) iter.Next()))
1076         {
1077                 fCurrentDetector = aDetector->String();
1078
1079                 if (ContinueProcessing() == kFALSE) continue;
1080
1081                 AliInfo(Form("\n\n \t\t\t****** run %d - %s: START  ******",
1082                                                 GetCurrentRun(), aDetector->GetName()));
1083
1084                 for(Int_t iSys=0;iSys<3;iSys++) fFXSCalled[iSys]=kFALSE;
1085
1086                 Log(fCurrentDetector.Data(), "Starting processing");
1087
1088                 Int_t pid = fork();
1089
1090                 if (pid < 0)
1091                 {
1092                         Log("SHUTTLE", "ERROR: Forking failed");
1093                 }
1094                 else if (pid > 0)
1095                 {
1096                         // parent
1097                         AliInfo(Form("In parent process of %d - %s: Starting monitoring",
1098                                                         GetCurrentRun(), aDetector->GetName()));
1099
1100                         Long_t begin = time(0);
1101
1102                         int status; // to be used with waitpid, on purpose an int (not Int_t)!
1103                         while (waitpid(pid, &status, WNOHANG) == 0)
1104                         {
1105                                 Long_t expiredTime = time(0) - begin;
1106
1107                                 if (expiredTime > fConfig->GetPPTimeOut())
1108                                 {
1109                                         TString tmp;
1110                                         tmp.Form("Process of %s timed out. Run time: %ld seconds. Killing...",
1111                                                                 fCurrentDetector.Data(), expiredTime);
1112                                         Log("SHUTTLE", tmp);
1113                                         Log(fCurrentDetector, tmp);
1114
1115                                         kill(pid, 9);
1116
1117                                         UpdateShuttleStatus(AliShuttleStatus::kPPTimeOut);
1118                                         hasError = kTRUE;
1119
1120                                         gSystem->Sleep(1000);
1121                                 }
1122                                 else
1123                                 {
1124                                         gSystem->Sleep(1000);
1125                                         
1126                                         TString checkStr;
1127                                         checkStr.Form("ps -o vsize --pid %d | tail -n 1", pid);
1128                                         FILE* pipe = gSystem->OpenPipe(checkStr, "r");
1129                                         if (!pipe)
1130                                         {
1131                                                 Log("SHUTTLE", Form("Error: Could not open pipe to %s", checkStr.Data()));
1132                                                 continue;
1133                                         }
1134                                                 
1135                                         char buffer[100];
1136                                         if (!fgets(buffer, 100, pipe))
1137                                         {
1138                                                 Log("SHUTTLE", "Error: ps did not return anything");
1139                                                 gSystem->ClosePipe(pipe);
1140                                                 continue;
1141                                         }
1142                                         gSystem->ClosePipe(pipe);
1143                                         
1144                                         //Log("SHUTTLE", Form("ps returned %s", buffer));
1145                                         
1146                                         Int_t mem = 0;
1147                                         if ((sscanf(buffer, "%d\n", &mem) != 1) || !mem)
1148                                         {
1149                                                 Log("SHUTTLE", "Error: Could not parse output of ps");
1150                                                 continue;
1151                                         }
1152                                         
1153                                         if (expiredTime % 60 == 0)
1154                                                 Log("SHUTTLE", Form("%s: Checking process. Run time: %ld seconds - Memory consumption: %d KB",
1155                                                                 fCurrentDetector.Data(), expiredTime, mem));
1156                                         
1157                                         if (mem > fConfig->GetPPMaxMem())
1158                                         {
1159                                                 TString tmp;
1160                                                 tmp.Form("Process exceeds maximum allowed memory (%d KB > %d KB). Killing...",
1161                                                         mem, fConfig->GetPPMaxMem());
1162                                                 Log("SHUTTLE", tmp);
1163                                                 Log(fCurrentDetector, tmp);
1164         
1165                                                 kill(pid, 9);
1166         
1167                                                 UpdateShuttleStatus(AliShuttleStatus::kPPOutOfMemory);
1168                                                 hasError = kTRUE;
1169         
1170                                                 gSystem->Sleep(1000);
1171                                         }
1172                                 }
1173                         }
1174
1175                         AliInfo(Form("In parent process of %d - %s: Client has terminated.",
1176                                                                 GetCurrentRun(), aDetector->GetName()));
1177
1178                         if (WIFEXITED(status))
1179                         {
1180                                 Int_t returnCode = WEXITSTATUS(status);
1181
1182                                 Log("SHUTTLE", Form("%s: the return code is %d", fCurrentDetector.Data(),
1183                                                                                 returnCode));
1184
1185                                 if (returnCode == 0) hasError = kTRUE; // the client exits with its Bool_t success flag, so 0 means failure
1186                         }
1187                 }
1188                 else if (pid == 0)
1189                 {
1190                         // client
1191                         AliInfo(Form("In client process of %d - %s", GetCurrentRun(), aDetector->GetName()));
1192
1193                         Bool_t success = ProcessCurrentDetector();
1194                         if (success) // Preprocessor finished successfully!
1195                         { 
1196                                 // Update time_processed field in FXS DB
1197                                 if (UpdateTable() == kFALSE)
1198                                         Log("SHUTTLE", Form("Process - %s: Could not update FXS databases!", fCurrentDetector.Data()));
1199
1200                                 // Transfer the data from local storage to main storage (Grid)
1201                                 UpdateShuttleStatus(AliShuttleStatus::kStoreStarted);
1202                                 if (StoreOCDB() == kFALSE)
1203                                 {
1204                                         AliInfo(Form("\n \t\t\t****** run %d - %s: STORAGE ERROR ****** \n\n",
1205                                                         GetCurrentRun(), aDetector->GetName()));
1206                                         UpdateShuttleStatus(AliShuttleStatus::kStoreError);
1207                                         success = kFALSE;
1208                                 } else {
1209                                         AliInfo(Form("\n \t\t\t****** run %d - %s: DONE ****** \n\n",
1210                                                         GetCurrentRun(), aDetector->GetName()));
1211                                         UpdateShuttleStatus(AliShuttleStatus::kDone);
1212                                         UpdateShuttleLogbook(fCurrentDetector, "DONE");
1213                                 }
1214                         }
1215
1216                         for (UInt_t iSys=0; iSys<3; iSys++)
1217                         {
1218                                 if (fFXSCalled[iSys]) fFXSlist[iSys].Clear();
1219                         }
1220
1221                         AliInfo(Form("Client process of %d - %s is exiting now with %d.",
1222                                                         GetCurrentRun(), aDetector->GetName(), success));
1223
1224                         // the client exits here
1225                         gSystem->Exit(success);
1226
1227                         AliError("We should never get here!!!");
1228                 }
1229         }
1230
1231         AliInfo(Form("\n\n \t\t\t^*^*^*^*^*^*^*^*^*^*^*^* run %d: FINISH ^*^*^*^*^*^*^*^*^*^*^*^* \n",
1232                                                         GetCurrentRun()));
1233
1234         //check if shuttle is done for this run, if so update logbook
1235         TObjArray checkEntryArray;
1236         checkEntryArray.SetOwner(1);
1237         TString whereClause = Form("where run=%d", GetCurrentRun());
1238         if (!QueryShuttleLogbook(whereClause.Data(), checkEntryArray) || checkEntryArray.GetEntries() == 0) {
1239                 Log("SHUTTLE", Form("Process - Warning: Cannot check status of run %d on Shuttle logbook!", GetCurrentRun()));
1240                 delete fMonaLisa; fMonaLisa = 0; fLogbookEntry = 0; // same cleanup as on the normal exit path
1241                 return hasError == kFALSE;
1242         }
1243
1244         AliShuttleLogbookEntry* checkEntry = dynamic_cast<AliShuttleLogbookEntry*>
1245                                                 (checkEntryArray.At(0));
1246
1247         if (checkEntry)
1248         {
1249                 if (checkEntry->IsDone())
1250                 {
1251                         Log("SHUTTLE","Process - Shuttle is DONE. Updating logbook");
1252                         UpdateShuttleLogbook("shuttle_done");
1253                 }
1254                 else
1255                 {
1256                         for (UInt_t iDet=0; iDet<NDetectors(); iDet++)
1257                         {
1258                                 if (checkEntry->GetDetectorStatus(iDet) == AliShuttleLogbookEntry::kUnprocessed)
1259                                 {
1260                                         AliDebug(2, Form("Run %d: setting %s as \"not first time unprocessed\"",
1261                                                         checkEntry->GetRun(), GetDetName(iDet)));
1262                                         fFirstUnprocessed[iDet] = kFALSE;
1263                                 }
1264                         }
1265                 }
1266         }
1267
1268         // remove ML instance
1269         delete fMonaLisa;
1270         fMonaLisa = 0;
1271
1272         fLogbookEntry = 0;
1273
1274         return hasError == kFALSE;
1275 }
1276
1277 //______________________________________________________________________________________________
1278 Bool_t AliShuttle::ProcessCurrentDetector()
1279 {
1280         //
1281         // Makes data retrieval just for a specific detector (fCurrentDetector).
1282         // There should be a configuration for this detector.
1283
1284         AliInfo(Form("Retrieving values for %s, run %d", fCurrentDetector.Data(), GetCurrentRun()));
1285
1286         TMap dcsMap;
1287         dcsMap.SetOwner(1);
1288
1289         Bool_t aDCSError = kFALSE;
1290
1291         // call preprocessor
1292         AliPreprocessor* aPreprocessor =
1293                 dynamic_cast<AliPreprocessor*> (fPreprocessorMap.GetValue(fCurrentDetector));
1294
1295         aPreprocessor->Initialize(GetCurrentRun(), GetCurrentStartTime(), GetCurrentEndTime());
1296
1297         Bool_t processDCS = aPreprocessor->ProcessDCS();
1298
1299         if (!processDCS || (fTestMode & kSkipDCS))
1300         {
1301                 Log(fCurrentDetector, processDCS ? "In TESTMODE - Skipping DCS processing!" : "Preprocessor does not require DCS data - Skipping DCS processing!");
1302         } 
1303         else if (fTestMode & kErrorDCS)
1304         {
1305                 Log(fCurrentDetector, "In TESTMODE - Simulating DCS error");
1306                 UpdateShuttleStatus(AliShuttleStatus::kDCSStarted);
1307                 UpdateShuttleStatus(AliShuttleStatus::kDCSError);
1308                 return kFALSE;
1309         } else {
1310
1311                 UpdateShuttleStatus(AliShuttleStatus::kDCSStarted);
1312
1313                 TString host(fConfig->GetDCSHost(fCurrentDetector));
1314                 Int_t port = fConfig->GetDCSPort(fCurrentDetector);
1315
1316                 // Retrieval of Aliases
1317                 TObjString* anAlias = 0;
1318                 Int_t iAlias = 1;
1319                 Int_t nTotAliases= ((TMap*)fConfig->GetDCSAliases(fCurrentDetector))->GetEntries();
1320                 TIter iterAliases(fConfig->GetDCSAliases(fCurrentDetector));
1321                 while ((anAlias = (TObjString*) iterAliases.Next()))
1322                 {
1323                         TObjArray *valueSet = new TObjArray();
1324                         valueSet->SetOwner(1);
1325
1326                         if (((iAlias-1) % 500) == 0 || iAlias == nTotAliases)
1327                                 AliInfo(Form("Querying DCS archive: alias %s (%d of %d)",
1328                                                 anAlias->GetName(), iAlias, nTotAliases));
1329                         aDCSError = (GetValueSet(host, port, anAlias->String(), valueSet, kAlias) == 0);
1330                         iAlias++; // count every alias, not only the ones that get logged
1331                         if(!aDCSError)
1332                         {
1333                                 dcsMap.Add(anAlias->Clone(), valueSet);
1334                         } else {
1335                                 Log(fCurrentDetector,
1336                                         Form("ProcessCurrentDetector - Error while retrieving alias %s",
1337                                                 anAlias->GetName()));
1338                                 UpdateShuttleStatus(AliShuttleStatus::kDCSError);
1339                                 dcsMap.DeleteAll();
1340                                 return kFALSE;
1341                         }
1342                 }
1343
1344                 // Retrieval of Data Points
1345                 TObjString* aDP = 0;
1346                 Int_t iDP = 1;
1347                 Int_t nTotDPs= ((TMap*)fConfig->GetDCSDataPoints(fCurrentDetector))->GetEntries();
1348                 TIter iterDP(fConfig->GetDCSDataPoints(fCurrentDetector));
1349                 while ((aDP = (TObjString*) iterDP.Next()))
1350                 {
1351                         TObjArray *valueSet = new TObjArray();
1352                         valueSet->SetOwner(1);
1353                         if (((iDP-1) % 500) == 0 || iDP == nTotDPs)
1354                                 AliInfo(Form("Querying DCS archive: DP %s (%d of %d)",
1355                                                 aDP->GetName(), iDP, nTotDPs));
1356                         aDCSError = (GetValueSet(host, port, aDP->String(), valueSet, kDP) == 0);
1357                         iDP++; // count every data point, not only the ones that get logged
1358                         if(!aDCSError)
1359                         {
1360                                 dcsMap.Add(aDP->Clone(), valueSet);
1361                         } else {
1362                                 Log(fCurrentDetector,
1363                                         Form("ProcessCurrentDetector - Error while retrieving data point %s",
1364                                                 aDP->GetName()));
1365                                 UpdateShuttleStatus(AliShuttleStatus::kDCSError);
1366                                 dcsMap.DeleteAll();
1367                                 return kFALSE;
1368                         }
1369                 }
1370         }
1371
1372         // DCS Archive DB processing successful. Call Preprocessor!
1373         UpdateShuttleStatus(AliShuttleStatus::kPPStarted);
1374
1375         UInt_t returnValue = aPreprocessor->Process(&dcsMap);
1376
1377         if (returnValue > 0) // Preprocessor error!
1378         {
1379                 Log(fCurrentDetector, Form("Preprocessor failed. Process returned %d.", returnValue));
1380                 UpdateShuttleStatus(AliShuttleStatus::kPPError);
1381                 dcsMap.DeleteAll();
1382                 return kFALSE;
1383         }
1384         
1385         // preprocessor ok!
1386         UpdateShuttleStatus(AliShuttleStatus::kPPDone);
1387         Log(fCurrentDetector, Form("ProcessCurrentDetector - %s preprocessor returned success",
1388                                 fCurrentDetector.Data()));
1389
1390         dcsMap.DeleteAll();
1391
1392         return kTRUE;
1393 }
1394
1395 //______________________________________________________________________________________________
1396 Bool_t AliShuttle::QueryShuttleLogbook(const char* whereClause,
1397                 TObjArray& entries)
1398 {
1399         // Queries DAQ's Shuttle logbook and fills the detector status object.
1400         // Calls QueryRunParameters to query the DAQ logbook for run parameters.
1401         //
1402
1403         entries.SetOwner(1);
1404
1405         // check connection; connect if necessary
1406         if(!Connect(3)) return kFALSE;
1407
1408         TString sqlQuery;
1409         sqlQuery = Form("select * from %s %s order by run", fConfig->GetShuttlelbTable(), whereClause);
1410
1411         TSQLResult* aResult = fServer[3]->Query(sqlQuery);
1412         if (!aResult) {
1413                 AliError(Form("Can't execute query <%s>!", sqlQuery.Data()));
1414                 return kFALSE;
1415         }
1416
1417         AliDebug(2,Form("Query = %s", sqlQuery.Data()));
1418
1419         if(aResult->GetRowCount() == 0) {
1420                 AliInfo("No entries in Shuttle Logbook match request");
1421                 delete aResult;
1422                 return kTRUE;
1423         }
1424
1425         // TODO Check field count!
1426         const UInt_t nCols = 22;
1427         if (aResult->GetFieldCount() != (Int_t) nCols) {
1428                 AliError("Invalid SQL result field number!");
1429                 delete aResult;
1430                 return kFALSE;
1431         }
1432
1433         TSQLRow* aRow;
1434         while ((aRow = aResult->Next())) {
1435                 TString runString(aRow->GetField(0), aRow->GetFieldLength(0));
1436                 Int_t run = runString.Atoi();
1437
1438                 AliShuttleLogbookEntry *entry = QueryRunParameters(run);
1439                 if (!entry) { delete aRow; continue; } // free the row before skipping this run
1440
1441
1442                 // loop on detectors
1443                 for(UInt_t ii = 0; ii < nCols; ii++)
1444                         entry->SetDetectorStatus(aResult->GetFieldName(ii), aRow->GetField(ii));
1445
1446                 entries.AddLast(entry);
1447                 delete aRow;
1448         }
1449
1450         delete aResult;
1451         return kTRUE;
1452 }
1453
1454 //______________________________________________________________________________________________
1455 AliShuttleLogbookEntry* AliShuttle::QueryRunParameters(Int_t run)
1456 {
1457         //
1458         // Retrieves the run parameters written in the DAQ logbook and sets them in an AliShuttleLogbookEntry object
1459         //
1460
1461         // check connection; connect if necessary
1462         if (!Connect(3))
1463                 return 0;
1464
1465         TString sqlQuery;
1466         sqlQuery.Form("select * from %s where run=%d", fConfig->GetDAQlbTable(), run);
1467
1468         TSQLResult* aResult = fServer[3]->Query(sqlQuery);
1469         if (!aResult) {
1470                 AliError(Form("Can't execute query <%s>!", sqlQuery.Data()));
1471                 return 0;
1472         }
1473
1474         if (aResult->GetRowCount() == 0) {
1475                 Log("SHUTTLE", Form("QueryRunParameters - No entry in DAQ Logbook for run %d. Skipping", run));
1476                 delete aResult;
1477                 return 0;
1478         }
1479
1480         if (aResult->GetRowCount() > 1) {
1481                 AliError(Form("More than one entry in DAQ Logbook for run %d. Skipping", run));
1482                 delete aResult;
1483                 return 0;
1484         }
1485
1486         TSQLRow* aRow = aResult->Next();
1487         if (!aRow)
1488         {
1489                 AliError(Form("Could not retrieve row for run %d. Skipping", run));
1490                 delete aResult;
1491                 return 0;
1492         }
1493
1494         AliShuttleLogbookEntry* entry = new AliShuttleLogbookEntry(run);
1495
1496         for (Int_t ii = 0; ii < aResult->GetFieldCount(); ii++)
1497                 entry->SetRunParameter(aResult->GetFieldName(ii), aRow->GetField(ii));
1498
1499         UInt_t startTime = entry->GetStartTime();
1500         UInt_t endTime = entry->GetEndTime();
1501
1502         if (!startTime || !endTime || startTime > endTime) {
1503                 Log("SHUTTLE",
1504                         Form("QueryRunParameters - Invalid parameters for Run %d: startTime = %u, endTime = %u",
1505                                 run, startTime, endTime));
1506                 delete entry;
1507                 delete aRow;
1508                 delete aResult;
1509                 return 0;
1510         }
1511
1512         delete aRow;
1513         delete aResult;
1514
1515         return entry;
1516 }
1517
1518 //______________________________________________________________________________________________
1519 Bool_t AliShuttle::GetValueSet(const char* host, Int_t port, const char* entry,
1520                                 TObjArray* valueSet, DCSType type)
1521 {
1522         // Retrieve all "entry" data points from the DCS server
1523         // host, port: TSocket connection parameters
1524         // entry: name of the alias or data point
1525         // valueSet: array of retrieved AliDCSValue's
1526         // type: kAlias or kDP
1527
1528         AliDCSClient client(host, port, fTimeout, fRetries);
1529         if (!client.IsConnected())
1530         {
1531                 return kFALSE;
1532         }
1533
1534         Int_t result=0;
1535
1536         if (type == kAlias)
1537         {
1538                 result = client.GetAliasValues(entry,
1539                         GetCurrentStartTime(), GetCurrentEndTime(), valueSet);
1540         } else
1541         if (type == kDP)
1542         {
1543                 result = client.GetDPValues(entry,
1544                         GetCurrentStartTime(), GetCurrentEndTime(), valueSet);
1545         }
1546
1547         if (result < 0)
1548         {
1549                 Log(fCurrentDetector.Data(), Form("GetValueSet - Can't get '%s'! Reason: %s",
1550                         entry, AliDCSClient::GetErrorString(result)));
1551
1552                 if (result == AliDCSClient::fgkServerError)
1553                 {
1554                         Log(fCurrentDetector.Data(), Form("GetValueSet - Server error: %s",
1555                                 client.GetServerError().Data()));
1556                 }
1557
1558                 return kFALSE;
1559         }
1560
1561         return kTRUE;
1562 }
1563
1564 //______________________________________________________________________________________________
1565 const char* AliShuttle::GetFile(Int_t system, const char* detector,
1566                 const char* id, const char* source)
1567 {
1568         // Get calibration file from file exchange servers
1569         // First queries the FXS database for the file name, using the run, detector, id and source info
1570         // then calls RetrieveFile(filename) for actual copy to local disk
1571         // run: current run being processed (given by Logbook entry fLogbookEntry)
1572         // detector: the Preprocessor name
1573         // id: provided as a parameter by the Preprocessor
1574         // source: provided by the Preprocessor through GetFileSources function
1575
1576         // check if test mode should simulate a FXS error
1577         if (fTestMode & kErrorFXSFiles)
1578         {
1579                 Log(detector, Form("GetFile - In TESTMODE - Simulating error while connecting to %s FXS", GetSystemName(system)));
1580                 return 0;
1581         }
1582         
1583         // check connection; connect if necessary
1584         if (!Connect(system))
1585         {
1586                 Log(detector, Form("GetFile - Couldn't connect to %s FXS database", GetSystemName(system)));
1587                 return 0;
1588         }
1589
1590         // Query preparation
1591         TString sourceName(source);
1592         Int_t nFields = 3;
1593         TString sqlQueryStart = Form("select filePath,size,fileChecksum from %s where",
1594                                                                 fConfig->GetFXSdbTable(system));
1595         TString whereClause = Form("run=%d and detector=\"%s\" and fileId=\"%s\"",
1596                                                                 GetCurrentRun(), detector, id);
1597
1598         if (system == kDAQ)
1599         {
1600                 whereClause += Form(" and DAQsource=\"%s\"", source);
1601         }
1602         else if (system == kDCS)
1603         {
1604                 sourceName="none";
1605         }
1606         else if (system == kHLT)
1607         {
1608                 whereClause += Form(" and DDLnumbers=\"%s\"", source);
1609                 nFields = 3;
1610         }
1611
1612         TString sqlQuery = Form("%s %s", sqlQueryStart.Data(), whereClause.Data());
1613
1614         AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));
1615
1616         // Query execution
1617         TSQLResult* aResult = 0;
1618         aResult = dynamic_cast<TSQLResult*> (fServer[system]->Query(sqlQuery));
1619         if (!aResult) {
1620                 Log(detector, Form("GetFile - Can't execute SQL query to %s database for: id = %s, source = %s",
1621                                 GetSystemName(system), id, sourceName.Data()));
1622                 return 0;
1623         }
1624
1625         if(aResult->GetRowCount() == 0)
1626         {
1627                 Log(detector,
1628                         Form("GetFile - No entry in %s FXS db for: id = %s, source = %s",
1629                                 GetSystemName(system), id, sourceName.Data()));
1630                 delete aResult;
1631                 return 0;
1632         }
1633
1634         if (aResult->GetRowCount() > 1) {
1635                 Log(detector,
1636                         Form("GetFile - More than one entry in %s FXS db for: id = %s, source = %s",
1637                                 GetSystemName(system), id, sourceName.Data()));
1638                 delete aResult;
1639                 return 0;
1640         }
1641
1642         if (aResult->GetFieldCount() != nFields) {
1643                 Log(detector,
1644                         Form("GetFile - Wrong field count in %s FXS db for: id = %s, source = %s",
1645                                 GetSystemName(system), id, sourceName.Data()));
1646                 delete aResult;
1647                 return 0;
1648         }
1649
1650         TSQLRow* aRow = dynamic_cast<TSQLRow*> (aResult->Next());
1651
1652         if (!aRow){
1653                 Log(detector, Form("GetFile - Empty set result in %s FXS db from query: id = %s, source = %s",
1654                                 GetSystemName(system), id, sourceName.Data()));
1655                 delete aResult;
1656                 return 0;
1657         }
1658
1659         TString filePath(aRow->GetField(0), aRow->GetFieldLength(0));
1660         TString fileSize(aRow->GetField(1), aRow->GetFieldLength(1));
1661         TString fileChecksum(aRow->GetField(2), aRow->GetFieldLength(2));
1662
1663         delete aResult;
1664         delete aRow;
1665
1666         AliDebug(2, Form("filePath = %s; size = %s, fileChecksum = %s",
1667                                 filePath.Data(), fileSize.Data(), fileChecksum.Data()));
1668
1669         // retrieved file is renamed to make it unique
1670         TString localFileName = Form("%s_%s_%d_%s_%s.shuttle",
1671                                         GetSystemName(system), detector, GetCurrentRun(), id, sourceName.Data());
1672
1673
1674         // file retrieval from FXS
1675         UInt_t nRetries = 0;
1676         UInt_t maxRetries = 3;
1677         Bool_t result = kFALSE;
1678
1679         // copy!! if successful TSystem::Exec returns 0
1680         while(nRetries++ < maxRetries) {
1681                 AliDebug(2, Form("Trying to copy file. Retry # %d", nRetries));
1682                 result = RetrieveFile(system, filePath.Data(), localFileName.Data());
1683                 if(!result)
1684                 {
1685                         Log(detector, Form("GetFile - Copy of file %s from %s FXS failed",
1686                                         filePath.Data(), GetSystemName(system)));
1687                         continue;
1688                 } else {
1689                         AliInfo(Form("File %s copied from %s FXS into %s/%s",
1690                                                 filePath.Data(), GetSystemName(system),
1691                                                 GetShuttleTempDir(), localFileName.Data()));
1692                 }
1693
1694                 if (fileChecksum.Length()>0)
1695                 {
1696                         // compare md5sum of local file with the one stored in the FXS DB
1697                         Int_t md5Comp = gSystem->Exec(Form("md5sum %s/%s | grep %s > /dev/null 2>&1",
1698                                                 GetShuttleTempDir(), localFileName.Data(), fileChecksum.Data()));
1699
1700                         if (md5Comp != 0)
1701                         {
1702                                 Log(detector, Form("GetFile - md5sum of local copy does not match checksum stored for file %s!",
1703                                                         filePath.Data()));
1704                                 result = kFALSE;
1705                                 continue;
1706                         }
1707                 } else {
1708                         Log(fCurrentDetector, Form("GetFile - md5sum of file %s not set in %s database, skipping comparison",
1709                                                         filePath.Data(), GetSystemName(system)));
1710                 }
1711                 if (result) break;
1712         }
1713
1714         if(!result) return 0;
1715
1716         fFXSCalled[system]=kTRUE;
1717         TObjString *fileParams = new TObjString(Form("%s#!?!#%s", id, sourceName.Data()));
1718         fFXSlist[system].Add(fileParams);
1719
1720         static TString fullLocalFileName;
1721         fullLocalFileName = TString::Format("%s/%s", GetShuttleTempDir(), localFileName.Data());
1722
1723         AliInfo(Form("fullLocalFileName = %s", fullLocalFileName.Data()));
1724
1725         return fullLocalFileName.Data();
1726
1727 }
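
// Aside (illustrative sketch, not part of AliShuttle): the retry loop in GetFile above
// accepts a copy only when both the transfer and the checksum comparison succeed.
// The same pattern in plain C++ might look as follows. fetchWithRetries is a
// hypothetical helper; its fetch/verify callbacks stand in for RetrieveFile and
// the md5sum comparison.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper: tries `fetch` up to maxTries times and accepts an
// attempt only if `verify` succeeds as well.
// Returns the number of attempts used, or 0 if every attempt failed.
unsigned fetchWithRetries(unsigned maxTries,
                          const std::function<bool()>& fetch,
                          const std::function<bool()>& verify)
{
    assert(maxTries > 0);
    for (unsigned attempt = 1; attempt <= maxTries; ++attempt) {
        if (!fetch()) continue;   // copy failed: try again
        if (!verify()) continue;  // checksum mismatch: try again
        return attempt;           // copied and verified
    }
    return 0;                     // all attempts failed
}
```

// Returning the attempt count instead of a plain flag is a design choice of this
// sketch only; it makes it easy to log how many retries were needed.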
1728
1729 //______________________________________________________________________________________________
1730 Bool_t AliShuttle::RetrieveFile(UInt_t system, const char* fxsFileName, const char* localFileName)
1731 {
1732         //
1733         // Copies file from FXS to local Shuttle machine
1734         //
1735
1736         // check temp directory: trying to cd to temp; if it does not exist, create it
1737         AliDebug(2, Form("Copy file %s from %s FXS into %s/%s",
1738                         fxsFileName, GetSystemName(system), GetShuttleTempDir(), localFileName));
1739
1740         void* dir = gSystem->OpenDirectory(GetShuttleTempDir());
1741         if (dir == NULL) {
1742                 if (gSystem->mkdir(GetShuttleTempDir(), kTRUE)) {
1743                         AliError(Form("Can't open directory <%s>", GetShuttleTempDir()));
1744                         return kFALSE;
1745                 }
1746
1747         } else {
1748                 gSystem->FreeDirectory(dir);
1749         }
1750
1751         TString baseFXSFolder;
1752         if (system == kDAQ)
1753         {
1754                 baseFXSFolder = "FES/";
1755         }
1756         else if (system == kDCS)
1757         {
1758                 baseFXSFolder = "";
1759         }
1760         else if (system == kHLT)
1761         {
1762                 baseFXSFolder = "~/";
1763         }
1764
1765
1766         TString command = Form("scp -oPort=%d -2 %s@%s:%s%s %s/%s",
1767                 fConfig->GetFXSPort(system),
1768                 fConfig->GetFXSUser(system),
1769                 fConfig->GetFXSHost(system),
1770                 baseFXSFolder.Data(),
1771                 fxsFileName,
1772                 GetShuttleTempDir(),
1773                 localFileName);
1774
1775         AliDebug(2, Form("%s",command.Data()));
1776
1777         Bool_t result = (gSystem->Exec(command.Data()) == 0);
1778
1779         return result;
1780 }
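
// Aside (illustrative sketch, not part of AliShuttle): how the scp command line in
// RetrieveFile above is assembled, rewritten in plain C++. FxsEndpoint and
// buildScpCommand are hypothetical names; in the real code the values come from
// fConfig and GetShuttleTempDir().

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for the per-system values read from fConfig.
struct FxsEndpoint {
    int port;
    std::string user;
    std::string host;
    std::string baseFolder;  // "FES/" for DAQ, "" for DCS, "~/" for HLT
};

// Builds the same command shape as RetrieveFile:
//   scp -oPort=<port> -2 <user>@<host>:<baseFolder><remoteFile> <tempDir>/<localFile>
std::string buildScpCommand(const FxsEndpoint& fxs, const std::string& remoteFile,
                            const std::string& tempDir, const std::string& localFile)
{
    assert(fxs.port > 0);
    std::ostringstream cmd;
    cmd << "scp -oPort=" << fxs.port << " -2 "
        << fxs.user << '@' << fxs.host << ':' << fxs.baseFolder << remoteFile
        << ' ' << tempDir << '/' << localFile;
    return cmd.str();
}
```

// The sketch does no shell quoting, just like the original; file names containing
// spaces or shell metacharacters would need extra care.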
1781
1782 //______________________________________________________________________________________________
1783 TList* AliShuttle::GetFileSources(Int_t system, const char* detector, const char* id)
1784 {
1785         //
1786         // Get sources producing the condition file Id from file exchange servers
1787         //
1788         
1789         // check if test mode should simulate a FXS error
1790         if (fTestMode & kErrorFXSSources)
1791         {
1792                 Log(detector, Form("GetFileSources - In TESTMODE - Simulating error while connecting to %s FXS", GetSystemName(system)));
1793                 return 0;
1794         }
1795
1796
1797         if (system == kDCS)
1798         {
1799                 AliError("DCS system has only one source of data!");
1800                 return NULL;
1801         }
1802
1803         // check connection; connect if necessary
1804         if (!Connect(system))
1805         {
1806                 Log(detector, Form("GetFileSources - Couldn't connect to %s FXS database", GetSystemName(system)));
1807                 return NULL;
1808         }
1809
1810         TString sourceName;
1811         if (system == kDAQ)
1812         {
1813                 sourceName = "DAQsource";
1814         } else if (system == kHLT)
1815         {
1816                 sourceName = "DDLnumbers";
1817         }
1818
1819         TString sqlQueryStart = Form("select %s from %s where", sourceName.Data(), fConfig->GetFXSdbTable(system));
1820         TString whereClause = Form("run=%d and detector=\"%s\" and fileId=\"%s\"",
1821                                 GetCurrentRun(), detector, id);
1822         TString sqlQuery = Form("%s %s", sqlQueryStart.Data(), whereClause.Data());
1823
1824         AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));
1825
1826         // Query execution
1827         TSQLResult* aResult;
1828         aResult = fServer[system]->Query(sqlQuery);
1829         if (!aResult) {
1830                 Log(detector, Form("GetFileSources - Can't execute SQL query to %s database for id: %s",
1831                                 GetSystemName(system), id));
1832                 return 0;
1833         }
1834
1835         if (aResult->GetRowCount() == 0)
1836         {
1837                 Log(detector,
1838                         Form("GetFileSources - No entry in %s FXS table for id: %s", GetSystemName(system), id));
1839                 delete aResult;
1840                 return 0;
1841         }
1842
1843         TSQLRow* aRow;
1844         TList *list = new TList();
1845         list->SetOwner(1);
1846
1847         while ((aRow = aResult->Next()))
1848         {
1849
1850                 TString source(aRow->GetField(0), aRow->GetFieldLength(0));
1851                 AliDebug(2, Form("%s = %s", sourceName.Data(), source.Data()));
1852                 list->Add(new TObjString(source));
1853                 delete aRow;
1854         }
1855
1856         delete aResult;
1857
1858         return list;
1859 }
1860
1861 //______________________________________________________________________________________________
1862 Bool_t AliShuttle::Connect(Int_t system)
1863 {
1864         // Connects to the MySQL server hosting the given system's FXS database
1865         // DAQ Logbook, Shuttle Logbook and DAQ FXS db are on the same host
1866         //
1867
1868         // check connection: if already connected return
1869         if(fServer[system] && fServer[system]->IsConnected()) return kTRUE;
1870
1871         TString dbHost, dbUser, dbPass, dbName;
1872
1873         if (system < 3) // FXS db servers
1874         {
1875                 dbHost = Form("mysql://%s:%d", fConfig->GetFXSdbHost(system), fConfig->GetFXSdbPort(system));
1876                 dbUser = fConfig->GetFXSdbUser(system);
1877                 dbPass = fConfig->GetFXSdbPass(system);
1878                 dbName =   fConfig->GetFXSdbName(system);
1879         } else { // Run & Shuttle logbook servers
1880         // TODO Will the Shuttle logbook server be the same as the Run logbook server ???
1881                 dbHost = Form("mysql://%s:%d", fConfig->GetDAQlbHost(), fConfig->GetDAQlbPort());
1882                 dbUser = fConfig->GetDAQlbUser();
1883                 dbPass = fConfig->GetDAQlbPass();
1884                 dbName =   fConfig->GetDAQlbDB();
1885         }
1886
1887         fServer[system] = TSQLServer::Connect(dbHost.Data(), dbUser.Data(), dbPass.Data());
1888         if (!fServer[system] || !fServer[system]->IsConnected()) {
1889                 if(system < 3)
1890                 {
1891                 AliError(Form("Can't establish connection to FXS database for %s",
1892                                         AliShuttleInterface::GetSystemName(system)));
1893                 } else {
1894                 AliError("Can't establish connection to Run logbook.");
1895                 }
1896                 if(fServer[system]) { delete fServer[system]; fServer[system] = 0; } // avoid a dangling pointer on the next Connect()
1897                 return kFALSE;
1898         }
1899
1900         // Get tables
1901         TSQLResult* aResult=0;
1902         switch(system){
1903                 case kDAQ:
1904                         aResult = fServer[kDAQ]->GetTables(dbName.Data());
1905                         break;
1906                 case kDCS:
1907                         aResult = fServer[kDCS]->GetTables(dbName.Data());
1908                         break;
1909                 case kHLT:
1910                         aResult = fServer[kHLT]->GetTables(dbName.Data());
1911                         break;
1912                 default:
1913                         aResult = fServer[3]->GetTables(dbName.Data());
1914                         break;
1915         }
1916
1917         delete aResult;
1918         return kTRUE;
1919 }
1920
1921 //______________________________________________________________________________________________
1922 Bool_t AliShuttle::UpdateTable()
1923 {
1924         //
1925         // Update FXS table filling time_processed field in all rows corresponding to current run and detector
1926         //
1927
1928         Bool_t result = kTRUE;
1929
1930         for (UInt_t system=0; system<3; system++)
1931         {
1932                 if(!fFXSCalled[system]) continue;
1933
1934                 // check connection; connect if necessary
1935                 if (!Connect(system))
1936                 {
1937                         Log(fCurrentDetector, Form("UpdateTable - Couldn't connect to %s FXS database", GetSystemName(system)));
1938                         result = kFALSE;
1939                         continue;
1940                 }
1941
1942                 TTimeStamp now; // now
1943
1944                 // Loop on FXS list entries
1945                 TIter iter(&fFXSlist[system]);
1946                 TObjString *aFXSentry=0;
1947                 while ((aFXSentry = dynamic_cast<TObjString*> (iter.Next())))
1948                 {
1949                         TString aFXSentrystr = aFXSentry->String();
1950                         TObjArray *aFXSarray = aFXSentrystr.Tokenize("#!?!#");
1951                         if (!aFXSarray || aFXSarray->GetEntries() != 2 )
1952                         {
1953                                 Log(fCurrentDetector, Form("UpdateTable - error updating %s FXS entry. Check string: <%s>",
1954                                         GetSystemName(system), aFXSentrystr.Data()));
1955                                 if(aFXSarray) delete aFXSarray;
1956                                 result = kFALSE;
1957                                 continue;
1958                         }
1959                         const char* fileId = ((TObjString*) aFXSarray->At(0))->GetName();
1960                         const char* source = ((TObjString*) aFXSarray->At(1))->GetName();
1961
1962                         TString whereClause;
1963                         if (system == kDAQ)
1964                         {
1965                                 whereClause = Form("where run=%d and detector=\"%s\" and fileId=\"%s\" and DAQsource=\"%s\";",
1966                                                         GetCurrentRun(), fCurrentDetector.Data(), fileId, source);
1967                         }
1968                         else if (system == kDCS)
1969                         {
1970                                 whereClause = Form("where run=%d and detector=\"%s\" and fileId=\"%s\";",
1971                                                         GetCurrentRun(), fCurrentDetector.Data(), fileId);
1972                         }
1973                         else if (system == kHLT)
1974                         {
1975                                 whereClause = Form("where run=%d and detector=\"%s\" and fileId=\"%s\" and DDLnumbers=\"%s\";",
1976                                                         GetCurrentRun(), fCurrentDetector.Data(), fileId, source);
1977                         }
1978
1979                         delete aFXSarray;
1980
1981                         TString sqlQuery = Form("update %s set time_processed=%d %s", fConfig->GetFXSdbTable(system),
1982                                                                 now.GetSec(), whereClause.Data());
1983
1984                         AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));
1985
1986                         // Query execution
1987                         TSQLResult* aResult;
1988                         aResult = dynamic_cast<TSQLResult*> (fServer[system]->Query(sqlQuery));
1989                         if (!aResult)
1990                         {
1991                                 Log(fCurrentDetector, Form("UpdateTable - %s db: can't execute SQL query <%s>",
1992                                                                 GetSystemName(system), sqlQuery.Data()));
1993                                 result = kFALSE;
1994                                 continue;
1995                         }
1996                         delete aResult;
1997                 }
1998         }
1999
2000         return result;
2001 }
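
// Aside (illustrative sketch, not part of AliShuttle): UpdateTable above splits the
// "id#!?!#source" strings with TString::Tokenize, which treats its argument as a
// set of delimiter characters. An id or source containing '#', '!' or '?' would
// therefore yield more than two tokens and be rejected. Splitting on the full
// separator string avoids this; splitFxsEntry below is a hypothetical helper in
// plain C++ showing the idea.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: splits "id#!?!#source" at the first occurrence of the
// complete separator. Returns false if the separator is missing.
bool splitFxsEntry(const std::string& entry,
                   std::pair<std::string, std::string>& out)
{
    static const std::string kSeparator = "#!?!#";
    const std::string::size_type pos = entry.find(kSeparator);
    if (pos == std::string::npos) return false;
    out.first = entry.substr(0, pos);                    // file id
    out.second = entry.substr(pos + kSeparator.size());  // source
    return true;
}
```

// With this helper an entry such as "a!b#!?!#none" is split into "a!b" and "none".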
2002
2003 //______________________________________________________________________________________________
2004 Bool_t AliShuttle::UpdateTableFailCase()
2005 {
2006         // Update the FXS tables, filling the time_processed field in all rows corresponding to the current
2007         // run and detector. This is called when the preprocessor is declared failed for the current run,
2008         // because otherwise the fields are updated only in case of success.
2009
2010         Bool_t result = kTRUE;
2011
2012         for (UInt_t system=0; system<3; system++)
2013         {
2014                 // check connection; connect if necessary
2015                 if (!Connect(system))
2016                 {
2017                         Log(fCurrentDetector, Form("UpdateTableFailCase - Couldn't connect to %s FXS database",
2018                                                         GetSystemName(system)));
2019                         result = kFALSE;
2020                         continue;
2021                 }
2022
2023                 TTimeStamp now; // now
2024
2025                 // No loop on the FXS list here: every row of the current run and detector is updated
2026
2027                 TString whereClause = Form("where run=%d and detector=\"%s\";",
2028                                                 GetCurrentRun(), fCurrentDetector.Data());
2029
2030
2031                 TString sqlQuery = Form("update %s set time_processed=%d %s", fConfig->GetFXSdbTable(system),
2032                                                         now.GetSec(), whereClause.Data());
2033
2034                 AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));
2035
2036                 // Query execution
2037                 TSQLResult* aResult;
2038                 aResult = dynamic_cast<TSQLResult*> (fServer[system]->Query(sqlQuery));
2039                 if (!aResult)
2040                 {
2041                         Log(fCurrentDetector, Form("UpdateTableFailCase - %s db: can't execute SQL query <%s>",
2042                                                         GetSystemName(system), sqlQuery.Data()));
2043                         result = kFALSE;
2044                         continue;
2045                 }
2046                 delete aResult;
2047         }
2048
2049         return result;
2050 }
2051
2052 //______________________________________________________________________________________________
2053 Bool_t AliShuttle::UpdateShuttleLogbook(const char* detector, const char* status)
2054 {
2055         //
2056         // Update Shuttle logbook filling detector or shuttle_done column
2057         // ex. of usage: UpdateShuttleLogbook("PHOS", "DONE") or UpdateShuttleLogbook("shuttle_done")
2058         //
2059
2060         // check connection; connect if necessary
2061         if(!Connect(3)){
2062                 Log("SHUTTLE", "UpdateShuttleLogbook - Couldn't connect to DAQ Logbook.");
2063                 return kFALSE;
2064         }
2065
2066         TString detName(detector);
2067         TString setClause;
2068         if(detName == "shuttle_done")
2069         {
2070                 setClause = "set shuttle_done=1";
2071
2072                 // Send the information to ML
2073                 TMonaLisaText  mlStatus("SHUTTLE_status", "Done");
2074
2075                 TList mlList;
2076                 mlList.Add(&mlStatus);
2077
2078                 fMonaLisa->SendParameters(&mlList);
2079         } else {
2080                 TString statusStr(status);
2081                 if(statusStr.Contains("done", TString::kIgnoreCase) ||
2082                    statusStr.Contains("failed", TString::kIgnoreCase)){
2083                         setClause = Form("set %s=\"%s\"", detector, status);
2084                 } else {
2085                         Log("SHUTTLE",
2086                                 Form("UpdateShuttleLogbook - Invalid status <%s> for detector %s",
2087                                         status, detector));
2088                         return kFALSE;
2089                 }
2090         }
2091
2092         TString whereClause = Form("where run=%d", GetCurrentRun());
2093
2094         TString sqlQuery = Form("update %s %s %s",
2095                                         fConfig->GetShuttlelbTable(), setClause.Data(), whereClause.Data());
2096
2097         AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));
2098
2099         // Query execution
2100         TSQLResult* aResult;
2101         aResult = dynamic_cast<TSQLResult*> (fServer[3]->Query(sqlQuery));
2102         if (!aResult) {
2103                 Log("SHUTTLE", Form("UpdateShuttleLogbook - Can't execute query <%s>", sqlQuery.Data()));
2104                 return kFALSE;
2105         }
2106         delete aResult;
2107
2108         return kTRUE;
2109 }
2110
2111 //______________________________________________________________________________________________
2112 Int_t AliShuttle::GetCurrentRun() const
2113 {
2114         //
2115         // Get current run from logbook entry
2116         //
2117
2118         return fLogbookEntry ? fLogbookEntry->GetRun() : -1;
2119 }
2120
2121 //______________________________________________________________________________________________
2122 UInt_t AliShuttle::GetCurrentStartTime() const
2123 {
2124         //
2125         // Get current start time from logbook entry
2126         //
2127
2128         return fLogbookEntry ? fLogbookEntry->GetStartTime() : 0;
2129 }
2130
2131 //______________________________________________________________________________________________
2132 UInt_t AliShuttle::GetCurrentEndTime() const
2133 {
2134         //
2135         // Get current end time from logbook entry
2136         //
2137
2138         return fLogbookEntry ? fLogbookEntry->GetEndTime() : 0;
2139 }
2140
2141 //______________________________________________________________________________________________
2142 void AliShuttle::Log(const char* detector, const char* message)
2143 {
2144         //
2145         // Append a time-stamped message to the detector's log file (and echo it via AliInfo)
2146         //
2147
2148         void* dir = gSystem->OpenDirectory(GetShuttleLogDir());
2149         if (dir == NULL) {
2150                 if (gSystem->mkdir(GetShuttleLogDir(), kTRUE)) {
2151                         AliError(Form("Can't open directory <%s>", GetShuttleLogDir()));
2152                         return;
2153                 }
2154
2155         } else {
2156                 gSystem->FreeDirectory(dir);
2157         }
2158
2159         TString toLog = Form("%s (%d): %s - ", TTimeStamp(time(0)).AsString("s"), getpid(), detector);
2160         if (GetCurrentRun() >= 0) 
2161                 toLog += Form("run %d - ", GetCurrentRun());
2162         toLog += Form("%s", message);
2163
2164         AliInfo(toLog.Data());
2165
2166         TString fileName;
2167         if (GetCurrentRun() >= 0) 
2168                 fileName.Form("%s/%s_%d.log", GetShuttleLogDir(), detector, GetCurrentRun());
2169         else
2170                 fileName.Form("%s/%s.log", GetShuttleLogDir(), detector);
2171         
2172         gSystem->ExpandPathName(fileName);
2173
2174         ofstream logFile;
2175         logFile.open(fileName, ofstream::out | ofstream::app);
2176
2177         if (!logFile.is_open()) {
2178                 AliError(Form("Could not open file %s", fileName.Data()));
2179                 return;
2180         }
2181
2182         logFile << toLog.Data() << "\n";
2183
2184         logFile.close();
2185 }
2186
2187 //______________________________________________________________________________________________
2188 Bool_t AliShuttle::Collect(Int_t run)
2189 {
2190         //
2191         // Collects conditions data for all UNPROCESSED runs written to the DAQ LogBook if run = -1 (default).
2192         // If a specific run is given, only that run is processed.
2193         //
2194         // In operational mode, this is the Shuttle function triggered by the EOR signal.
2195         //
2196
2197         if (run == -1)
2198                 Log("SHUTTLE","Collect - Shuttle called. Collecting conditions data for unprocessed runs");
2199         else
2200                 Log("SHUTTLE", Form("Collect - Shuttle called. Collecting conditions data for run %d", run));
2201
2202         SetLastAction("Starting");
2203
2204         TString whereClause("where shuttle_done=0");
2205         if (run != -1)
2206                 whereClause += Form(" and run=%d", run);
2207
2208         TObjArray shuttleLogbookEntries;
2209         if (!QueryShuttleLogbook(whereClause, shuttleLogbookEntries))
2210         {
2211                 Log("SHUTTLE", "Collect - Can't retrieve entries from Shuttle logbook");
2212                 return kFALSE;
2213         }
2214
2215         if (shuttleLogbookEntries.GetEntries() == 0)
2216         {
2217                 if (run == -1)
2218                         Log("SHUTTLE","Collect - Found no UNPROCESSED runs in Shuttle logbook");
2219                 else
2220                         Log("SHUTTLE", Form("Collect - Run %d is already DONE "
2221                                                 "or it does not exist in Shuttle logbook", run));
2222                 return kTRUE;
2223         }
2224
2225         for (UInt_t iDet=0; iDet<NDetectors(); iDet++)
2226                 fFirstUnprocessed[iDet] = kTRUE;
2227
2228         if (run != -1)
2229         {
2230                 // query Shuttle logbook for earlier runs, check if some detectors are unprocessed,
2231                 // flag them into fFirstUnprocessed array
2232                 TString whereClause(Form("where shuttle_done=0 and run < %d", run));
2233                 TObjArray tmpLogbookEntries;
2234                 if (!QueryShuttleLogbook(whereClause, tmpLogbookEntries))
2235                 {
2236                         Log("SHUTTLE", "Collect - Can't retrieve entries from Shuttle logbook");
2237                         return kFALSE;
2238                 }
2239
2240                 TIter iter(&tmpLogbookEntries);
2241                 AliShuttleLogbookEntry* anEntry = 0;
2242                 while ((anEntry = dynamic_cast<AliShuttleLogbookEntry*> (iter.Next())))
2243                 {
2244                         for (UInt_t iDet=0; iDet<NDetectors(); iDet++)
2245                         {
2246                                 if (anEntry->GetDetectorStatus(iDet) == AliShuttleLogbookEntry::kUnprocessed)
2247                                 {
2248                                         AliDebug(2, Form("Run %d: setting %s as \"not first time unprocessed\"",
2249                                                         anEntry->GetRun(), GetDetName(iDet)));
2250                                         fFirstUnprocessed[iDet] = kFALSE;
2251                                 }
2252                         }
2253
2254                 }
2255
2256         }
2257
2258         if (!RetrieveConditionsData(shuttleLogbookEntries))
2259         {
2260                 Log("SHUTTLE", "Collect - Process of at least one run failed");
2261                 return kFALSE;
2262         }
2263
2264         Log("SHUTTLE", "Collect - Requested run(s) successfully processed");
2265         return kTRUE;
2266 }
2267
2268 //______________________________________________________________________________________________
2269 Bool_t AliShuttle::RetrieveConditionsData(const TObjArray& dateEntries)
2270 {
2271         //
2272         // Retrieve conditions data for all runs that aren't processed yet
2273         //
2274
2275         Bool_t hasError = kFALSE;
2276
2277         TIter iter(&dateEntries);
2278         AliShuttleLogbookEntry* anEntry;
2279
2280         while ((anEntry = (AliShuttleLogbookEntry*) iter.Next())){
2281                 if (!Process(anEntry)){
2282                         hasError = kTRUE;
2283                 }
2284
2285                 // clean SHUTTLE temp directory
2286                 TString filename = Form("%s/*.shuttle", GetShuttleTempDir());
2287                 RemoveFile(filename.Data());
2288         }
2289
2290         return hasError == kFALSE;
2291 }
2292
2293 //______________________________________________________________________________________________
2294 ULong_t AliShuttle::GetTimeOfLastAction() const
2295 {
2296         //
2297         // Gets time of last action
2298         //
2299
2300         ULong_t tmp;
2301
2302         fMonitoringMutex->Lock();
2303
2304         tmp = fLastActionTime;
2305
2306         fMonitoringMutex->UnLock();
2307
2308         return tmp;
2309 }
2310
2311 //______________________________________________________________________________________________
2312 const TString AliShuttle::GetLastAction() const
2313 {
2314         //
2315         // returns a string description of the last action
2316         //
2317
2318         TString tmp;
2319
2320         fMonitoringMutex->Lock();
2321         
2322         tmp = fLastAction;
2323         
2324         fMonitoringMutex->UnLock();
2325
2326         return tmp;
2327 }
2328
2329 //______________________________________________________________________________________________
2330 void AliShuttle::SetLastAction(const char* action)
2331 {
2332         //
2333         // updates the monitoring variables
2334         //
2335
2336         fMonitoringMutex->Lock();
2337
2338         fLastAction = action;
2339         fLastActionTime = time(0);
2340         
2341         fMonitoringMutex->UnLock();
2342 }
2343
2344 //______________________________________________________________________________________________
2345 const char* AliShuttle::GetRunParameter(const char* param)
2346 {
2347         //
2348         // returns run parameter read from DAQ logbook
2349         //
2350
2351         if(!fLogbookEntry) {
2352                 AliError("No logbook entry!");
2353                 return 0;
2354         }
2355
2356         return fLogbookEntry->GetRunParameter(param);
2357 }
2358
2359 //______________________________________________________________________________________________
2360 AliCDBEntry* AliShuttle::GetFromOCDB(const char* detector, const AliCDBPath& path)
2361 {
2362         //
2363         // returns object from OCDB valid for current run
2364         //
2365
2366         if (fTestMode & kErrorOCDB)
2367         {
2368                 Log(detector, "GetFromOCDB - In TESTMODE - Simulating error with OCDB");
2369                 return 0;
2370         }
2371         
2372         AliCDBStorage *sto = AliCDBManager::Instance()->GetStorage(fgkMainCDB);
2373         if (!sto)
2374         {
2375                 Log(detector, "GetFromOCDB - Cannot activate main OCDB for query!");
2376                 return 0;
2377         }
2378
2379         return dynamic_cast<AliCDBEntry*> (sto->Get(path, GetCurrentRun()));
2380 }
2381
2382 //______________________________________________________________________________________________
2383 Bool_t AliShuttle::SendMail()
2384 {
2385         //
2386         // sends a mail to the subdetector expert in case of preprocessor error
2387         //
2388         
2389         if (fTestMode != kNone)
2390                 return kTRUE;
2391
2392         void* dir = gSystem->OpenDirectory(GetShuttleLogDir());
2393         if (dir == NULL)
2394         {
2395                 if (gSystem->mkdir(GetShuttleLogDir(), kTRUE))
2396                 {
2397                         AliError(Form("Can't open directory <%s>", GetShuttleLogDir()));
2398                         return kFALSE;
2399                 }
2400
2401         } else {
2402                 gSystem->FreeDirectory(dir);
2403         }
2404
2405         TString bodyFileName;
2406         bodyFileName.Form("%s/mail.body", GetShuttleLogDir());
2407         gSystem->ExpandPathName(bodyFileName);
2408
2409         ofstream mailBody;
2410         mailBody.open(bodyFileName, ofstream::out);
2411
2412         if (!mailBody.is_open())
2413         {
2414                 AliError(Form("Could not open mail body file %s", bodyFileName.Data()));
2415                 return kFALSE;
2416         }
2417
2418         TString to="";
2419         TIter iterExperts(fConfig->GetResponsibles(fCurrentDetector));
2420         TObjString *anExpert=0;
2421         while ((anExpert = (TObjString*) iterExperts.Next()))
2422         {
2423                 to += Form("%s,", anExpert->GetName());
2424         }
2425         if (to.Length() > 0) to.Remove(to.Length()-1); // drop the trailing comma; skip if no experts are listed
2426         AliDebug(2, Form("to: %s",to.Data()));
2427
2428         // TODO this will be removed...
2429         if (to.Contains("not_yet_set")) {
2430                 AliInfo("List of detector responsibles not yet set!");
2431                 return kFALSE;
2432         }
2433
2434         TString cc="alberto.colla@cern.ch";
2435
2436         TString subject = Form("%s Shuttle preprocessor error in run %d !",
2437                                 fCurrentDetector.Data(), GetCurrentRun());
2438         AliDebug(2, Form("subject: %s", subject.Data()));
2439
2440         TString body = Form("Dear %s expert(s), \n\n", fCurrentDetector.Data());
2441         body += Form("SHUTTLE just detected that your preprocessor "
2442                         "exited with ERROR state in run %d!!\n\n", GetCurrentRun());
2443         body += Form("Please check %s status on the web page asap!\n\n", fCurrentDetector.Data());
2444         body += Form("The last 10 lines of %s log file are following:\n\n", fCurrentDetector.Data());
2445
2446         AliDebug(2, Form("Body begin: %s", body.Data()));
2447
2448         mailBody << body.Data();
2449         mailBody.close();
2450         mailBody.open(bodyFileName, ofstream::out | ofstream::app);
2451
2452         TString logFileName = Form("%s/%s_%d.log", GetShuttleLogDir(), fCurrentDetector.Data(), GetCurrentRun());
2453         TString tailCommand = Form("tail -n 10 %s >> %s", logFileName.Data(), bodyFileName.Data());
2454         if (gSystem->Exec(tailCommand.Data()))
2455         {
2456                 mailBody << Form("%s log file not found ...\n\n", fCurrentDetector.Data());
2457         }
2458
2459         TString endBody = Form("------------------------------------------------------\n\n");
2460         endBody += Form("In case of problems please contact the SHUTTLE core team.\n\n");
2461         endBody += "Please do not answer this message directly, it is automatically generated.\n\n";
2462         endBody += "Sincerely yours,\n\n \t\t\tthe SHUTTLE\n";
2463
2464         AliDebug(2, Form("Body end: %s", endBody.Data()));
2465
2466         mailBody << endBody.Data();
2467
2468         mailBody.close();
2469
2470         // send mail!
2471         TString mailCommand = Form("mail -s \"%s\" -c %s %s < %s",
2472                                                 subject.Data(),
2473                                                 cc.Data(),
2474                                                 to.Data(),
2475                                                 bodyFileName.Data());
2476         AliDebug(2, Form("mail command: %s", mailCommand.Data()));
2477
2478         Int_t result = gSystem->Exec(mailCommand.Data()); // Exec returns the command's exit status
2479
2480         return result == 0;
2481 }
2482
2483 //______________________________________________________________________________________________
2484 const char* AliShuttle::GetRunType()
2485 {
2486         //
2487         // returns run type read from "run type" logbook
2488         //
2489
2490         if(!fLogbookEntry) {
2491                 AliError("No logbook entry!");
2492                 return 0;
2493         }
2494
2495         return fLogbookEntry->GetRunType();
2496 }
2497
2498 //______________________________________________________________________________________________
2499 void AliShuttle::SetShuttleTempDir(const char* tmpDir)
2500 {
2501         //
2502         // sets Shuttle temp directory
2503         //
2504
2505         fgkShuttleTempDir = gSystem->ExpandPathName(tmpDir);
2506 }
2507
2508 //______________________________________________________________________________________________
2509 void AliShuttle::SetShuttleLogDir(const char* logDir)
2510 {
2511         //
2512         // sets Shuttle log directory
2513         //
2514
2515         fgkShuttleLogDir = gSystem->ExpandPathName(logDir);
2516 }