git.uio.no Git - u/mrichter/AliRoot.git/blame_incremental - SHUTTLE/AliShuttle.cxx
New function StoreRunMetadataFile added to preprocessor and Shuttle interface
1/**************************************************************************
2 * Copyright(c) 1998-1999, ALICE Experiment at CERN, All rights reserved. *
3 * *
4 * Author: The ALICE Off-line Project. *
5 * Contributors are mentioned in the code where appropriate. *
6 * *
7 * Permission to use, copy, modify and distribute this software and its *
8 * documentation strictly for non-commercial purposes is hereby granted *
9 * without fee, provided that the above copyright notice appears in all *
10 * copies and that both the copyright notice and this permission notice *
11 * appear in the supporting documentation. The authors make no claims *
12 * about the suitability of this software for any purpose. It is *
13 * provided "as is" without express or implied warranty. *
14 **************************************************************************/
15
16/*
17$Log$
18Revision 1.58 2007/09/28 15:27:40 acolla
19
20AliDCSClient "multiSplit" option added in the DCS configuration
21in AliDCSMessage: variable MAX_BODY_SIZE set to 500000
22
23Revision 1.57 2007/09/27 16:53:13 acolla
24Detectors can have more than one AMANDA server. SHUTTLE queries the servers sequentially,
25merges the dcs aliases/DPs in one TMap and sends it to the preprocessor.
26
27Revision 1.56 2007/09/14 16:46:14 jgrosseo
281) Connect and Close are called before and after each query, so one can
29keep the same AliDCSClient object.
302) The splitting of a query is moved to GetDPValues/GetAliasValues.
313) Splitting interval can be specified in constructor
32
33Revision 1.55 2007/08/06 12:26:40 acolla
34Function Bool_t GetHLTStatus added to preprocessor. It returns the status of HLT
35read from the run logbook.
36
37Revision 1.54 2007/07/12 09:51:25 jgrosseo
38removed duplicated log message in GetFile
39
40Revision 1.53 2007/07/12 09:26:28 jgrosseo
41updating hlt fxs base path
42
43Revision 1.52 2007/07/12 08:06:45 jgrosseo
44adding log messages in getfile... functions
45adding not implemented copy constructor in alishuttleconfigholder
46
47Revision 1.51 2007/07/03 17:24:52 acolla
48root moved to v5-16-00. TFileMerger->Cp moved to TFile::Cp.
49
50Revision 1.50 2007/07/02 17:19:32 acolla
51preprocessor is run in a temp directory that is removed when process is finished.
52
53Revision 1.49 2007/06/29 10:45:06 acolla
54Number of columns in MySql Shuttle logbook increased by one (HLT added)
55
56Revision 1.48 2007/06/21 13:06:19 acolla
57GetFileSources returns dummy list with 1 source if system=DCS (better than
58returning error as it was)
59
60Revision 1.47 2007/06/19 17:28:56 acolla
61HLT updated; missing map bug removed.
62
63Revision 1.46 2007/06/09 13:01:09 jgrosseo
64Switching to retrieval of several DCS DPs at a time (multiDPrequest)
65
66Revision 1.45 2007/05/30 06:35:20 jgrosseo
67Adding functionality to the Shuttle/TestShuttle:
68o) Function to retrieve list of sources from a given system (GetFileSources with id=0)
69o) Function to retrieve list of IDs for a given source (GetFileIDs)
70These functions are needed for dealing with the tag files that are saved for the GRP preprocessor
71Example code has been added to the TestProcessor in TestShuttle
72
73Revision 1.44 2007/05/11 16:09:32 acolla
74Reference files for ITS, MUON and PHOS are now stored in OfflineDetName/OnlineDetName/run_...
75example: ITS/SPD/100_filename.root
76
77Revision 1.43 2007/05/10 09:59:51 acolla
78Various bug fixes in StoreRefFilesToGrid; Cleaning of reference storage before processing detector (CleanReferenceStorage)
79
80Revision 1.42 2007/05/03 08:01:39 jgrosseo
81typo in last commit :-(
82
83Revision 1.41 2007/05/03 08:00:48 jgrosseo
84fixing log message when pp want to skip dcs value retrieval
85
86Revision 1.40 2007/04/27 07:06:48 jgrosseo
87GetFileSources returns empty list in case of no files, but successful query
88No mails sent in testmode
89
90Revision 1.39 2007/04/17 12:43:57 acolla
91Correction in StoreOCDB; change of text in mail to detector expert
92
93Revision 1.38 2007/04/12 08:26:18 jgrosseo
94updated comment
95
96Revision 1.37 2007/04/10 16:53:14 jgrosseo
97redirecting sub detector stdout, stderr to sub detector log file
98
99Revision 1.35 2007/04/04 16:26:38 acolla
1001. Re-organization of function calls in TestPreprocessor to make it more meaningful.
1012. Added missing dependency in test preprocessors.
1023. in AliShuttle.cxx: processing time and memory consumption info on a single line.
103
104Revision 1.34 2007/04/04 10:33:36 jgrosseo
1051) Storing of files to the Grid is now done _after_ your preprocessors succeeded. This is transparent, which means that you can still use the same functions (Store, StoreReferenceData) to store files to the Grid. However, the Shuttle first stores them locally and transfers them after the preprocessor finished. The return code of these two functions has changed from UInt_t to Bool_t which gives you the success of the storing.
106In case of an error with the Grid, the Shuttle will retry the storing later, the preprocessor does not need to be run again.
107
1082) The meaning of the return code of the preprocessor has changed. 0 is now success and any other value means failure. This value is stored in the log and you can use it to keep details about the error condition.
109
1103) New function StoreReferenceFile to _directly_ store a file (without opening it) to the reference storage.
111
1124) The memory usage of the preprocessor is monitored. If it exceeds 2 GB it is terminated.
113
5) New function AliPreprocessor::ProcessDCS(). If you do not need to have DCS data in all cases, you can skip the processing by implementing this function and returning kFALSE under certain conditions, e.g. for a certain run type.
If you always need DCS data (like before), you do not need to implement it.
116
1176) The run type has been added to the monitoring page
118
119Revision 1.33 2007/04/03 13:56:01 acolla
120Grid Storage at the end of preprocessing. Added virtual method to disable DCS query according to the
121run type.
122
123Revision 1.32 2007/02/28 10:41:56 acolla
124Run type field added in SHUTTLE framework. Run type is read from "run type" logbook and retrieved by
125AliPreprocessor::GetRunType() function.
126Added some ldap definition files.
127
128Revision 1.30 2007/02/13 11:23:21 acolla
129Moved getters and setters of Shuttle's main OCDB/Reference, local
130OCDB/Reference, temp and log folders to AliShuttleInterface
131
132Revision 1.27 2007/01/30 17:52:42 jgrosseo
133adding monalisa monitoring
134
135Revision 1.26 2007/01/23 19:20:03 acolla
136Removed old ldif files, added TOF, MCH ldif files. Added some options in
137AliShuttleConfig::Print. Added in Ali Shuttle: SetShuttleTempDir and
138SetShuttleLogDir
139
140Revision 1.25 2007/01/15 19:13:52 acolla
141Moved some AliInfo to AliDebug in SendMail function
142
143Revision 1.21 2006/12/07 08:51:26 jgrosseo
144update (alberto):
145table, db names in ldap configuration
146added GRP preprocessor
147DCS data can also be retrieved by data point
148
149Revision 1.20 2006/11/16 16:16:48 jgrosseo
150introducing strict run ordering flag
151removed giving preprocessor name to preprocessor, they have to know their name themselves ;-)
152
153Revision 1.19 2006/11/06 14:23:04 jgrosseo
154major update (Alberto)
155o) reading of run parameters from the logbook
156o) online offline naming conversion
157o) standalone DCSclient package
158
159Revision 1.18 2006/10/20 15:22:59 jgrosseo
160o) Adding time out to the execution of the preprocessors: The Shuttle forks and the parent process monitors the child
161o) Merging Collect, CollectAll, CollectNew function
162o) Removing implementation of empty copy constructors (declaration still there!)
163
164Revision 1.17 2006/10/05 16:20:55 jgrosseo
165adapting to new CDB classes
166
167Revision 1.16 2006/10/05 15:46:26 jgrosseo
168applying to the new interface
169
170Revision 1.15 2006/10/02 16:38:39 jgrosseo
171update (alberto):
172fixed memory leaks
173storing of objects that failed to be stored to the grid before
174interfacing of shuttle status table in daq system
175
176Revision 1.14 2006/08/29 09:16:05 jgrosseo
177small update
178
179Revision 1.13 2006/08/15 10:50:00 jgrosseo
180effc++ corrections (alberto)
181
182Revision 1.12 2006/08/08 14:19:29 jgrosseo
183Update to shuttle classes (Alberto)
184
185- Possibility to set the full object's path in the Preprocessor's and
186Shuttle's Store functions
187- Possibility to extend the object's run validity in the same classes
188("startValidity" and "validityInfinite" parameters)
189- Implementation of the StoreReferenceData function to store reference
190data in a dedicated CDB storage.
191
192Revision 1.11 2006/07/21 07:37:20 jgrosseo
193last run is stored after each run
194
195Revision 1.10 2006/07/20 09:54:40 jgrosseo
196introducing status management: The processing per subdetector is divided into several steps,
197after each step the status is stored on disk. If the system crashes in any of the steps the Shuttle
198can keep track of the number of failures and skips further processing after a certain threshold is
199exceeded. These thresholds can be configured in LDAP.
200
201Revision 1.9 2006/07/19 10:09:55 jgrosseo
new configuration, access to DAQ FES (Alberto)
203
204Revision 1.8 2006/07/11 12:44:36 jgrosseo
205adding parameters for extended validity range of data produced by preprocessor
206
207Revision 1.7 2006/07/10 14:37:09 jgrosseo
208small fix + todo comment
209
210Revision 1.6 2006/07/10 13:01:41 jgrosseo
enhanced storing of last successfully processed run (alberto)
212
213Revision 1.5 2006/07/04 14:59:57 jgrosseo
214revision of AliDCSValue: Removed wrapper classes, reduced storage size per value by factor 2
215
216Revision 1.4 2006/06/12 09:11:16 jgrosseo
217coding conventions (Alberto)
218
219Revision 1.3 2006/06/06 14:26:40 jgrosseo
220o) removed files that were moved to STEER
221o) shuttle updated to follow the new interface (Alberto)
222
223Revision 1.2 2006/03/07 07:52:34 hristov
224New version (B.Yordanov)
225
226Revision 1.6 2005/11/19 17:19:14 byordano
227RetrieveDATEEntries and RetrieveConditionsData added
228
229Revision 1.5 2005/11/19 11:09:27 byordano
230AliShuttle declaration added
231
232Revision 1.4 2005/11/17 17:47:34 byordano
233TList changed to TObjArray
234
235Revision 1.3 2005/11/17 14:43:23 byordano
236import to local CVS
237
238Revision 1.1.1.1 2005/10/28 07:33:58 hristov
239Initial import as subdirectory in AliRoot
240
241Revision 1.2 2005/09/13 08:41:15 byordano
242default startTime endTime added
243
244Revision 1.4 2005/08/30 09:13:02 byordano
245some docs added
246
247Revision 1.3 2005/08/29 21:15:47 byordano
248some docs added
249
250*/
251
//
// This class is the main manager for AliShuttle.
// It organizes the data retrieval from DCS and calls the
// interface methods of AliPreprocessor.
// For every detector in the configuration (see AliShuttleConfig),
// data for its set of aliases is retrieved. If an AliPreprocessor
// is registered for this detector, it is used
// according to the schema (see AliPreprocessor).
// If no AliPreprocessor is registered, the retrieved
// data is stored automatically to the underlying AliCDBStorage.
// The alias name is used as detSpec.
//
264
265#include "AliShuttle.h"
266
267#include "AliCDBManager.h"
268#include "AliCDBStorage.h"
269#include "AliCDBId.h"
270#include "AliCDBRunRange.h"
271#include "AliCDBPath.h"
272#include "AliCDBEntry.h"
273#include "AliShuttleConfig.h"
274#include "DCSClient/AliDCSClient.h"
275#include "AliLog.h"
276#include "AliPreprocessor.h"
277#include "AliShuttleStatus.h"
278#include "AliShuttleLogbookEntry.h"
279
280#include <TSystem.h>
281#include <TObject.h>
282#include <TString.h>
283#include <TTimeStamp.h>
284#include <TObjString.h>
285#include <TSQLServer.h>
286#include <TSQLResult.h>
287#include <TSQLRow.h>
288#include <TMutex.h>
289#include <TSystemDirectory.h>
290#include <TSystemFile.h>
291#include <TFile.h>
292#include <TGrid.h>
293#include <TGridResult.h>
294
295#include <TMonaLisaWriter.h>
296
297#include <fstream>
298
299#include <sys/types.h>
300#include <sys/wait.h>
301
302ClassImp(AliShuttle)
303
304//______________________________________________________________________________________________
305AliShuttle::AliShuttle(const AliShuttleConfig* config,
306 UInt_t timeout, Int_t retries):
307fConfig(config),
308fTimeout(timeout), fRetries(retries),
309fPreprocessorMap(),
310fLogbookEntry(0),
311fCurrentDetector(),
312fStatusEntry(0),
313fMonitoringMutex(0),
314fLastActionTime(0),
315fLastAction(),
316fMonaLisa(0),
317fTestMode(kNone),
318fReadTestMode(kFALSE),
319fOutputRedirected(kFALSE)
320{
321 //
322 // config: AliShuttleConfig used
323 // timeout: timeout used for AliDCSClient connection
324 // retries: the number of retries in case of connection error.
325 //
326
327 if (!fConfig->IsValid()) AliFatal("********** !!!!! Invalid configuration !!!!! **********");
328 for(int iSys=0;iSys<4;iSys++) {
329 fServer[iSys]=0;
330 if (iSys < 3)
331 fFXSlist[iSys].SetOwner(kTRUE);
332 }
333 fPreprocessorMap.SetOwner(kTRUE);
334
335 for (UInt_t iDet=0; iDet<NDetectors(); iDet++)
336 fFirstUnprocessed[iDet] = kFALSE;
337
338 fMonitoringMutex = new TMutex();
339}
340
341//______________________________________________________________________________________________
342AliShuttle::~AliShuttle()
343{
344 //
345 // destructor
346 //
347
348 fPreprocessorMap.DeleteAll();
349 for(int iSys=0;iSys<4;iSys++)
350 if(fServer[iSys]) {
351 fServer[iSys]->Close();
352 delete fServer[iSys];
353 fServer[iSys] = 0;
354 }
355
356 if (fStatusEntry){
357 delete fStatusEntry;
358 fStatusEntry = 0;
359 }
360
361 if (fMonitoringMutex)
362 {
363 delete fMonitoringMutex;
364 fMonitoringMutex = 0;
365 }
366}
367
368//______________________________________________________________________________________________
369void AliShuttle::RegisterPreprocessor(AliPreprocessor* preprocessor)
370{
	//
	// Registers a new AliPreprocessor.
	// GetName() is used as the identifier of the preprocessor.
	// The preprocessor is registered only if no other preprocessor
	// with the same identifier (GetName()) is registered already.
	//
377
378 const char* detName = preprocessor->GetName();
379 if(GetDetPos(detName) < 0)
380 AliFatal(Form("********** !!!!! Invalid detector name: %s !!!!! **********", detName));
381
382 if (fPreprocessorMap.GetValue(detName)) {
383 AliWarning(Form("AliPreprocessor %s is already registered!", detName));
384 return;
385 }
386
387 fPreprocessorMap.Add(new TObjString(detName), preprocessor);
388}
389//______________________________________________________________________________________________
390Bool_t AliShuttle::Store(const AliCDBPath& path, TObject* object,
391 AliCDBMetaData* metaData, Int_t validityStart, Bool_t validityInfinite)
392{
393 // Stores a CDB object in the storage for offline reconstruction. Objects that are not needed for
394 // offline reconstruction, but should be stored anyway (e.g. for debugging) should NOT be stored
395 // using this function. Use StoreReferenceData instead!
396 // It calls StoreLocally function which temporarily stores the data locally; when the preprocessor
397 // finishes the data are transferred to the main storage (Grid).
398
399 return StoreLocally(fgkLocalCDB, path, object, metaData, validityStart, validityInfinite);
400}
401
402//______________________________________________________________________________________________
403Bool_t AliShuttle::StoreReferenceData(const AliCDBPath& path, TObject* object, AliCDBMetaData* metaData)
404{
	// Stores a CDB object in the storage for reference data. These objects will not be available during
	// offline reconstruction. Use this function for reference data only!
	// It calls the StoreLocally function, which temporarily stores the data locally; when the preprocessor
	// finishes, the data are transferred to the main storage (Grid).
409
410 return StoreLocally(fgkLocalRefStorage, path, object, metaData);
411}
412
413//______________________________________________________________________________________________
414Bool_t AliShuttle::StoreLocally(const TString& localUri,
415 const AliCDBPath& path, TObject* object, AliCDBMetaData* metaData,
416 Int_t validityStart, Bool_t validityInfinite)
417{
	// Stores the object temporarily in local storage. Parameters are passed by the Store and StoreReferenceData functions.
	// When the preprocessor finishes, the data are transferred to the main storage (Grid).
	// The parameters are:
	// 1) URI of the backup storage (local)
	// 2) the object's path
	// 3) the object to be stored
	// 4) the metaData to be associated with the object
	// 5) the validity start run number w.r.t. the current run;
	//    if the data is valid only for this run, leave the default 0
	// 6) specifies if the calibration data is valid for infinity (this means until updated),
	//    typical for calibration runs; the default is kFALSE
	//
	// Returns kFALSE on failure, kTRUE otherwise
431
432 if (fTestMode & kErrorStorage)
433 {
434 Log(fCurrentDetector, "StoreLocally - In TESTMODE - Simulating error while storing locally");
435 return kFALSE;
436 }
437
438 const char* cdbType = (localUri == fgkLocalCDB) ? "CDB" : "Reference";
439
440 Int_t firstRun = GetCurrentRun() - validityStart;
441 if(firstRun < 0) {
442 AliWarning("First valid run happens to be less than 0! Setting it to 0.");
443 firstRun=0;
444 }
445
446 Int_t lastRun = -1;
447 if(validityInfinite) {
448 lastRun = AliCDBRunRange::Infinity();
449 } else {
450 lastRun = GetCurrentRun();
451 }
452
453 // Version is set to current run, it will be used later to transfer data to Grid
454 AliCDBId id(path, firstRun, lastRun, GetCurrentRun(), -1);
455
456 if(! dynamic_cast<TObjString*> (metaData->GetProperty("RunUsed(TObjString)"))){
457 TObjString runUsed = Form("%d", GetCurrentRun());
458 metaData->SetProperty("RunUsed(TObjString)", runUsed.Clone());
459 }
460
461 Bool_t result = kFALSE;
462
463 if (!(AliCDBManager::Instance()->GetStorage(localUri))) {
464 Log("SHUTTLE", Form("StoreLocally - Cannot activate local %s storage", cdbType));
465 } else {
466 result = AliCDBManager::Instance()->GetStorage(localUri)
467 ->Put(object, id, metaData);
468 }
469
470 if(!result) {
471
472 Log(fCurrentDetector, Form("StoreLocally - Can't store object <%s>!", id.ToString().Data()));
473 }
474
475 return result;
476}
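
// Illustrative sketch, not part of AliShuttle: the run-range rule applied by StoreLocally,
// written with the standard library only so it can be tried without ROOT. The names RunRange,
// ComputeRunRange and kInfinityStandIn are hypothetical; kInfinityStandIn only stands in for
// AliCDBRunRange::Infinity(), whose actual value is not shown in this file.

```cpp
#include <cassert>
#include <limits>

// Hypothetical stand-in for AliCDBRunRange::Infinity().
const int kInfinityStandIn = std::numeric_limits<int>::max();

struct RunRange {
    int firstRun;
    int lastRun;
};

// Mirrors the logic in StoreLocally: the first valid run is
// currentRun - validityStart, clamped at 0; the last valid run is
// either "infinity" or the current run.
RunRange ComputeRunRange(int currentRun, int validityStart, bool validityInfinite)
{
    assert(currentRun >= 0);

    RunRange range;
    range.firstRun = currentRun - validityStart;
    if (range.firstRun < 0) range.firstRun = 0;
    range.lastRun = validityInfinite ? kInfinityStandIn : currentRun;
    return range;
}
```

// With the defaults (validityStart = 0, validityInfinite = false) the range collapses to the
// current run only, which matches the "valid only for this run" case described above.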
477
478//______________________________________________________________________________________________
479Bool_t AliShuttle::StoreOCDB()
480{
481 //
482 // Called when preprocessor ends successfully or when previous storage attempt failed (kStoreError status)
483 // Calls underlying StoreOCDB(const char*) function twice, for OCDB and Reference storage.
484 // Then calls StoreRefFilesToGrid to store reference files.
485 //
486
487 if (fTestMode & kErrorGrid)
488 {
489 Log("SHUTTLE", "StoreOCDB - In TESTMODE - Simulating error while storing in the Grid");
490 Log(fCurrentDetector, "StoreOCDB - In TESTMODE - Simulating error while storing in the Grid");
491 return kFALSE;
492 }
493
494 Log("SHUTTLE","Storing OCDB data ...");
495 Bool_t resultCDB = StoreOCDB(fgkMainCDB);
496
497 Log("SHUTTLE","Storing reference data ...");
498 Bool_t resultRef = StoreOCDB(fgkMainRefStorage);
499
500 Log("SHUTTLE","Storing reference files ...");
501 Bool_t resultRefFiles = StoreRefFilesToGrid();
502
503 return resultCDB && resultRef && resultRefFiles;
504}
505
506//______________________________________________________________________________________________
507Bool_t AliShuttle::StoreOCDB(const TString& gridURI)
508{
509 //
510 // Called by StoreOCDB(), performs actual storage to the main OCDB and reference storages (Grid)
511 //
512
513 TObjArray* gridIds=0;
514
515 Bool_t result = kTRUE;
516
517 const char* type = 0;
518 TString localURI;
519 if(gridURI == fgkMainCDB) {
520 type = "OCDB";
521 localURI = fgkLocalCDB;
522 } else if(gridURI == fgkMainRefStorage) {
523 type = "reference";
524 localURI = fgkLocalRefStorage;
525 } else {
526 AliError(Form("Invalid storage URI: %s", gridURI.Data()));
527 return kFALSE;
528 }
529
530 AliCDBManager* man = AliCDBManager::Instance();
531
532 AliCDBStorage *gridSto = man->GetStorage(gridURI);
533 if(!gridSto) {
534 Log("SHUTTLE",
535 Form("StoreOCDB - cannot activate main %s storage", type));
536 return kFALSE;
537 }
538
539 gridIds = gridSto->GetQueryCDBList();
540
541 // get objects previously stored in local CDB
542 AliCDBStorage *localSto = man->GetStorage(localURI);
543 if(!localSto) {
544 Log("SHUTTLE",
545 Form("StoreOCDB - cannot activate local %s storage", type));
546 return kFALSE;
547 }
548 AliCDBPath aPath(GetOfflineDetName(fCurrentDetector.Data()),"*","*");
549 // Local objects were stored with current run as Grid version!
550 TList* localEntries = localSto->GetAll(aPath.GetPath(), GetCurrentRun(), GetCurrentRun());
551 localEntries->SetOwner(1);
552
553 // loop on local stored objects
554 TIter localIter(localEntries);
555 AliCDBEntry *aLocEntry = 0;
556 while((aLocEntry = dynamic_cast<AliCDBEntry*> (localIter.Next()))){
557 aLocEntry->SetOwner(1);
558 AliCDBId aLocId = aLocEntry->GetId();
559 aLocEntry->SetVersion(-1);
560 aLocEntry->SetSubVersion(-1);
561
562 // If local object is valid up to infinity we store it only if it is
563 // the first unprocessed run!
564 if (aLocId.GetLastRun() == AliCDBRunRange::Infinity() &&
565 !fFirstUnprocessed[GetDetPos(fCurrentDetector)])
566 {
567 Log("SHUTTLE", Form("StoreOCDB - %s: object %s has validity infinite but "
568 "there are previous unprocessed runs!",
569 fCurrentDetector.Data(), aLocId.GetPath().Data()));
570 continue;
571 }
572
573 // loop on Grid valid Id's
574 Bool_t store = kTRUE;
575 TIter gridIter(gridIds);
576 AliCDBId* aGridId = 0;
577 while((aGridId = dynamic_cast<AliCDBId*> (gridIter.Next()))){
578 if(aGridId->GetPath() != aLocId.GetPath()) continue;
579 // skip all objects valid up to infinity
580 if(aGridId->GetLastRun() == AliCDBRunRange::Infinity()) continue;
581 // if we get here, it means there's already some more recent object stored on Grid!
582 store = kFALSE;
583 break;
584 }
585
		// Put the object on the Grid only if no more recent object was found there
		Bool_t storeOk = store ? gridSto->Put(aLocEntry) : kFALSE;
		if(!store || storeOk){
589
590 if (!store)
591 {
592 Log(fCurrentDetector.Data(),
593 Form("StoreOCDB - A more recent object already exists in %s storage: <%s>",
594 type, aGridId->ToString().Data()));
595 } else {
596 Log("SHUTTLE",
597 Form("StoreOCDB - Object <%s> successfully put into %s storage",
598 aLocId.ToString().Data(), type));
599 Log(fCurrentDetector.Data(),
600 Form("StoreOCDB - Object <%s> successfully put into %s storage",
601 aLocId.ToString().Data(), type));
602 }
603
604 // removing local filename...
605 TString filename;
606 localSto->IdToFilename(aLocId, filename);
607 AliInfo(Form("Removing local file %s", filename.Data()));
608 RemoveFile(filename.Data());
609 continue;
610 } else {
611 Log("SHUTTLE",
612 Form("StoreOCDB - Grid %s storage of object <%s> failed",
613 type, aLocId.ToString().Data()));
614 Log(fCurrentDetector.Data(),
615 Form("StoreOCDB - Grid %s storage of object <%s> failed",
616 type, aLocId.ToString().Data()));
617 result = kFALSE;
618 }
619 }
620 localEntries->Clear();
621
622 return result;
623}
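
// Illustrative sketch, not part of AliShuttle: the per-object decision that the loop in
// StoreOCDB(const TString&) appears to make, reduced to plain C++. SimpleId, UploadDecision,
// DecideUpload and kInfinityStandIn are hypothetical names; SimpleId keeps only the two
// AliCDBId fields the decision depends on (path and last valid run).

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical stand-in for AliCDBRunRange::Infinity().
const int kInfinityStandIn = std::numeric_limits<int>::max();

// Simplified stand-in for AliCDBId.
struct SimpleId {
    std::string path;
    int lastRun;
};

enum UploadDecision {
    kUpload,               // the local object can be put on the Grid
    kSkipUnprocessedRuns,  // infinite validity, but earlier runs are still unprocessed
    kSkipNewerOnGrid       // a more recent object with the same path is already on the Grid
};

UploadDecision DecideUpload(const SimpleId& local, bool firstUnprocessed,
                            const std::vector<SimpleId>& gridIds)
{
    // An object valid up to infinity is uploaded only for the first unprocessed run.
    if (local.lastRun == kInfinityStandIn && !firstUnprocessed)
        return kSkipUnprocessedRuns;

    // A Grid object with the same path and a finite validity counts as more recent;
    // Grid objects valid up to infinity are ignored.
    for (std::size_t i = 0; i < gridIds.size(); ++i) {
        if (gridIds[i].path != local.path) continue;
        if (gridIds[i].lastRun == kInfinityStandIn) continue;
        return kSkipNewerOnGrid;
    }
    return kUpload;
}
```

// In the function above, the kSkipNewerOnGrid case still removes the local file, whereas the
// kSkipUnprocessedRuns case leaves it in place for a later attempt.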
624
625//______________________________________________________________________________________________
626Bool_t AliShuttle::CleanReferenceStorage(const char* detector)
627{
628 // clears the directory used to store reference files of a given subdetector
629
630 AliCDBManager* man = AliCDBManager::Instance();
631 AliCDBStorage* sto = man->GetStorage(fgkLocalRefStorage);
632 TString localBaseFolder = sto->GetBaseFolder();
633
634 TString targetDir = GetRefFilePrefix(localBaseFolder.Data(), detector);
635
636 Log("SHUTTLE", Form("Cleaning %s", targetDir.Data()));
637
638 TString begin;
639 begin.Form("%d_", GetCurrentRun());
640
641 TSystemDirectory* baseDir = new TSystemDirectory("/", targetDir);
642 if (!baseDir)
643 return kTRUE;
644
645 TList* dirList = baseDir->GetListOfFiles();
646 delete baseDir;
647
648 if (!dirList) return kTRUE;
649
650 if (dirList->GetEntries() < 3)
651 {
652 delete dirList;
653 return kTRUE;
654 }
655
656 Int_t nDirs = 0, nDel = 0;
657 TIter dirIter(dirList);
658 TSystemFile* entry = 0;
659
660 Bool_t success = kTRUE;
661
662 while ((entry = dynamic_cast<TSystemFile*> (dirIter.Next())))
663 {
664 if (entry->IsDirectory())
665 continue;
666
667 TString fileName(entry->GetName());
668 if (!fileName.BeginsWith(begin))
669 continue;
670
671 nDirs++;
672
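		// delete file (the entry name is relative to targetDir, so build the full path)
		TString fullPath;
		fullPath.Form("%s/%s", targetDir.Data(), fileName.Data());
		Int_t result = gSystem->Unlink(fullPath.Data());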
675
676 if (result)
677 {
678 Log("SHUTTLE", Form("Could not delete file %s!", fileName.Data()));
679 success = kFALSE;
680 } else {
681 nDel++;
682 }
683 }
684
685 if(nDirs > 0)
686 Log("SHUTTLE", Form("CleanReferenceStorage - %d (over %d) reference files in folder %s were deleted.",
687 nDel, nDirs, targetDir.Data()));
688
689
690 delete dirList;
691 return success;
692
693
694
695
696
697
698 Int_t result = gSystem->GetPathInfo(targetDir, 0, (Long64_t*) 0, 0, 0);
699 if (result == 0)
700 {
701 // delete directory
702 result = gSystem->Exec(Form("rm -r %s", targetDir.Data()));
703 if (result != 0)
704 {
705 Log("SHUTTLE", Form("StoreReferenceFile - Could not clear directory %s", targetDir.Data()));
706 return kFALSE;
707 }
708 }
709
710 result = gSystem->mkdir(targetDir, kTRUE);
711 if (result != 0)
712 {
713 Log("SHUTTLE", Form("StoreReferenceFile - Error creating base directory %s", targetDir.Data()));
714 return kFALSE;
715 }
716
717 return kTRUE;
718}
719
720//______________________________________________________________________________________________
721Bool_t AliShuttle::StoreReferenceFile(const char* detector, const char* localFile, const char* gridFileName)
722{
	//
	// Stores a reference file directly (without opening it). This function stores the file locally.
	//
	// The file is stored under the following location:
	// <base folder of local reference storage>/<DET>/<RUN#>_<gridFileName>
	// where <gridFileName> is the third parameter given to the function
	//
730
731 if (fTestMode & kErrorStorage)
732 {
733 Log(fCurrentDetector, "StoreReferenceFile - In TESTMODE - Simulating error while storing locally");
734 return kFALSE;
735 }
736
737 AliCDBManager* man = AliCDBManager::Instance();
738 AliCDBStorage* sto = man->GetStorage(fgkLocalRefStorage);
739
740 TString localBaseFolder = sto->GetBaseFolder();
741
742 TString targetDir = GetRefFilePrefix(localBaseFolder.Data(), detector);
743
	// try to open the folder; create it if it does not exist
745 void* dir = gSystem->OpenDirectory(targetDir.Data());
746 if (dir == NULL) {
747 if (gSystem->mkdir(targetDir.Data(), kTRUE)) {
748 Log("SHUTTLE", Form("Can't open directory <%s>", targetDir.Data()));
749 return kFALSE;
750 }
751
752 } else {
753 gSystem->FreeDirectory(dir);
754 }
755
756 TString target;
757 target.Form("%s/%d_%s", targetDir.Data(), GetCurrentRun(), gridFileName);
758
759 Int_t result = gSystem->GetPathInfo(localFile, 0, (Long64_t*) 0, 0, 0);
760 if (result)
761 {
762 Log("SHUTTLE", Form("StoreReferenceFile - %s does not exist", localFile));
763 return kFALSE;
764 }
765
766 result = gSystem->CopyFile(localFile, target);
767
768 if (result == 0)
769 {
770 Log("SHUTTLE", Form("StoreReferenceFile - File %s stored locally to %s", localFile, target.Data()));
771 return kTRUE;
772 }
773 else
774 {
775 Log("SHUTTLE", Form("StoreReferenceFile - Could not store file %s to %s!. Error code = %d",
776 localFile, target.Data(), result));
777 return kFALSE;
778 }
779}
780
781//______________________________________________________________________________________________
782Bool_t AliShuttle::StoreRefFilesToGrid()
783{
784 //
785 // Transfers the reference file to the Grid.
786 //
787 // The files are stored under the following location:
788 // <base folder of reference storage>/<DET>/<RUN#>_<gridFileName>
789 //
790
791 AliCDBManager* man = AliCDBManager::Instance();
792 AliCDBStorage* sto = man->GetStorage(fgkLocalRefStorage);
793 if (!sto)
794 return kFALSE;
795 TString localBaseFolder = sto->GetBaseFolder();
796
797 TString dir = GetRefFilePrefix(localBaseFolder.Data(), fCurrentDetector.Data());
798
799 AliCDBStorage* gridSto = man->GetStorage(fgkMainRefStorage);
800 if (!gridSto)
801 return kFALSE;
802
803 TString gridBaseFolder = gridSto->GetBaseFolder();
804
805 TString alienDir = GetRefFilePrefix(gridBaseFolder.Data(), fCurrentDetector.Data());
806
807 TString begin;
808 begin.Form("%d_", GetCurrentRun());
809
810 TSystemDirectory* baseDir = new TSystemDirectory("/", dir);
811 if (!baseDir)
812 return kTRUE;
813
814 TList* dirList = baseDir->GetListOfFiles();
815 delete baseDir;
816
817 if (!dirList) return kTRUE;
818
819 if (dirList->GetEntries() < 3)
820 {
821 delete dirList;
822 return kTRUE;
823 }
824
825 if (!gGrid)
826 {
827 Log("SHUTTLE", "Connection to Grid failed: Cannot continue!");
828 delete dirList;
829 return kFALSE;
830 }
831
832 Int_t nDirs = 0, nTransfer = 0;
833 TIter dirIter(dirList);
834 TSystemFile* entry = 0;
835
836 Bool_t success = kTRUE;
837 Bool_t first = kTRUE;
838
839 while ((entry = dynamic_cast<TSystemFile*> (dirIter.Next())))
840 {
841 if (entry->IsDirectory())
842 continue;
843
844 TString fileName(entry->GetName());
845 if (!fileName.BeginsWith(begin))
846 continue;
847
848 nDirs++;
849
850 if (first)
851 {
852 first = kFALSE;
853 // check that DET folder exists, otherwise create it
854 TGridResult* result = gGrid->Ls(alienDir.Data(), "a");
855
856 if (!result)
857 {
858 delete dirList;
859 return kFALSE;
860 }
861
862 if (!result->GetFileName(1)) // TODO: It looks like element 0 is always 0!!
863 {
864 if (!gGrid->Mkdir(alienDir.Data(),"",0))
865 {
866 Log("SHUTTLE", Form("StoreRefFilesToGrid - Cannot create directory %s",
867 alienDir.Data()));
868 delete dirList;
869 return kFALSE;
870 } else {
871 Log("SHUTTLE",Form("Folder %s created", alienDir.Data()));
872 }
873
874 } else {
875 Log("SHUTTLE",Form("Folder %s found", alienDir.Data()));
876 }
877 }
878
879 TString fullLocalPath;
880 fullLocalPath.Form("%s/%s", dir.Data(), fileName.Data());
881
882 TString fullGridPath;
883 fullGridPath.Form("alien://%s/%s", alienDir.Data(), fileName.Data());
884
885 Bool_t result = TFile::Cp(fullLocalPath, fullGridPath);
886
887 if (result)
888 {
889 Log("SHUTTLE", Form("StoreRefFilesToGrid - Copying local file %s to %s succeeded!", fullLocalPath.Data(), fullGridPath.Data()));
890 RemoveFile(fullLocalPath);
891 nTransfer++;
892 }
893 else
894 {
895 Log("SHUTTLE", Form("StoreRefFilesToGrid - Copying local file %s to %s FAILED!", fullLocalPath.Data(), fullGridPath.Data()));
896 success = kFALSE;
897 }
898 }
899
900 Log("SHUTTLE", Form("StoreRefFilesToGrid - %d (over %d) reference files in folder %s copied to Grid.", nTransfer, nDirs, dir.Data()));
901
902
903 delete dirList;
904 return success;
905}
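
// Illustrative sketch, not part of AliShuttle: how the "<RUN#>_" prefix selects the reference
// files of one run, as done with TString::BeginsWith in StoreRefFilesToGrid and
// CleanReferenceStorage. RunPrefix and SelectRunFiles are hypothetical helper names, and the
// directory listing is passed in as a plain vector instead of a TSystemDirectory.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Builds the "<RUN#>_" prefix that is put in front of reference file names.
std::string RunPrefix(int run)
{
    std::ostringstream os;
    os << run << '_';
    return os.str();
}

// Returns the file names that start with the prefix of the given run.
std::vector<std::string> SelectRunFiles(const std::vector<std::string>& names, int run)
{
    const std::string prefix = RunPrefix(run);
    std::vector<std::string> selected;
    for (std::size_t i = 0; i < names.size(); ++i) {
        // compare() on the leading prefix.size() characters acts as "begins with"
        if (names[i].compare(0, prefix.size(), prefix) == 0)
            selected.push_back(names[i]);
    }
    return selected;
}
```

// The trailing underscore matters: without it, the prefix for run 100 would also match
// files that belong to run 1000.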
906
907//______________________________________________________________________________________________
const char* AliShuttle::GetRefFilePrefix(const char* base, const char* detector)
{
	//
	// Get folder name of reference files
	//

	TString offDetStr(GetOfflineDetName(detector));
	// static, so that the returned pointer stays valid after the function returns;
	// callers must copy the result before the next call
	static TString dir;
	if (offDetStr == "ITS" || offDetStr == "MUON" || offDetStr == "PHOS")
	{
		dir.Form("%s/%s/%s", base, offDetStr.Data(), detector);
	} else {
		dir.Form("%s/%s", base, offDetStr.Data());
	}

	return dir.Data();
}
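
// Illustrative sketch, not part of AliShuttle: the folder layout produced by GetRefFilePrefix,
// written with std::string. RefFilePrefix is a hypothetical name, and the offline detector name
// is passed in explicitly because the mapping done by GetOfflineDetName is not reproduced here.
// The result is returned by value, so it owns its characters.

```cpp
#include <cassert>
#include <string>

// offlineName plays the role of GetOfflineDetName(detector).
// ITS, MUON and PHOS get an extra sub-folder named after the online detector.
std::string RefFilePrefix(const std::string& base, const std::string& offlineName,
                          const std::string& detector)
{
    if (offlineName == "ITS" || offlineName == "MUON" || offlineName == "PHOS")
        return base + "/" + offlineName + "/" + detector;
    return base + "/" + offlineName;
}
```

// For example, base "/ref" with offline name "ITS" and online detector "SPD" gives
// "/ref/ITS/SPD", in line with the ITS/SPD/100_filename.root example in the revision log.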
927//______________________________________________________________________________________________
928void AliShuttle::CleanLocalStorage(const TString& uri)
929{
930 //
931 // Called in case the preprocessor is declared failed. Remove remaining objects from the local storages.
932 //
933
934 const char* type = 0;
935 if(uri == fgkLocalCDB) {
936 type = "OCDB";
937 } else if(uri == fgkLocalRefStorage) {
938 type = "Reference";
939 } else {
940 AliError(Form("Invalid storage URI: %s", uri.Data()));
941 return;
942 }
943
944 AliCDBManager* man = AliCDBManager::Instance();
945
946 // open local storage
947 AliCDBStorage *localSto = man->GetStorage(uri);
948 if(!localSto) {
949 Log("SHUTTLE",
950 Form("CleanLocalStorage - cannot activate local %s storage", type));
951 return;
952 }
953
954 TString filename(Form("%s/%s/*/Run*_v%d_s*.root",
955 localSto->GetBaseFolder().Data(), GetOfflineDetName(fCurrentDetector.Data()), GetCurrentRun()));
956
957 AliInfo(Form("filename = %s", filename.Data()));
958
959 AliInfo(Form("Removing remaining local files from run %d and detector %s ...",
960 GetCurrentRun(), fCurrentDetector.Data()));
961
962 RemoveFile(filename.Data());
963
964}
965
966//______________________________________________________________________________________________
967void AliShuttle::RemoveFile(const char* filename)
968{
969 //
970 // removes local file
971 //
972
973 TString command(Form("rm -f %s", filename));
974
975 Int_t result = gSystem->Exec(command.Data());
976 if(result != 0)
977 {
978 Log("SHUTTLE", Form("RemoveFile - %s: Cannot remove file %s!",
979 fCurrentDetector.Data(), filename));
980 }
981}
982
983//______________________________________________________________________________________________
984AliShuttleStatus* AliShuttle::ReadShuttleStatus()
985{
986 //
987 // Reads the AliShuttleStatus from the CDB
988 //
989
990 if (fStatusEntry){
991 delete fStatusEntry;
992 fStatusEntry = 0;
993 }
994
995 fStatusEntry = AliCDBManager::Instance()->GetStorage(GetLocalCDB())
996 ->Get(Form("/SHUTTLE/STATUS/%s", fCurrentDetector.Data()), GetCurrentRun());
997
998 if (!fStatusEntry) return 0;
999 fStatusEntry->SetOwner(1);
1000
1001 AliShuttleStatus* status = dynamic_cast<AliShuttleStatus*> (fStatusEntry->GetObject());
1002 if (!status) {
1003 AliError("Invalid object stored to CDB!");
1004 return 0;
1005 }
1006
1007 return status;
1008}
1009
1010//______________________________________________________________________________________________
1011Bool_t AliShuttle::WriteShuttleStatus(AliShuttleStatus* status)
1012{
1013 //
1014 // writes the status for one subdetector
1015 //
1016
1017 if (fStatusEntry){
1018 delete fStatusEntry;
1019 fStatusEntry = 0;
1020 }
1021
1022 Int_t run = GetCurrentRun();
1023
1024 AliCDBId id(AliCDBPath("SHUTTLE", "STATUS", fCurrentDetector), run, run);
1025
1026 fStatusEntry = new AliCDBEntry(status, id, new AliCDBMetaData);
1027 fStatusEntry->SetOwner(1);
1028
1029 UInt_t result = AliCDBManager::Instance()->GetStorage(fgkLocalCDB)->Put(fStatusEntry);
1030
1031 if (!result) {
1032 Log("SHUTTLE", Form("WriteShuttleStatus - Failed for %s, run %d",
1033 fCurrentDetector.Data(), run));
1034 return kFALSE;
1035 }
1036
1037 SendMLInfo();
1038
1039 return kTRUE;
1040}
1041
1042//______________________________________________________________________________________________
1043void AliShuttle::UpdateShuttleStatus(AliShuttleStatus::Status newStatus, Bool_t increaseCount)
1044{
1045 //
1046 // changes the AliShuttleStatus for the given detector and run to the given status
1047 //
1048
1049 if (!fStatusEntry){
1050 AliError("UNEXPECTED: fStatusEntry empty");
1051 return;
1052 }
1053
1054 AliShuttleStatus* status = dynamic_cast<AliShuttleStatus*> (fStatusEntry->GetObject());
1055
1056 if (!status){
1057 Log("SHUTTLE", "UNEXPECTED: status could not be read from current CDB entry");
1058 return;
1059 }
1060
1061 TString actionStr = Form("UpdateShuttleStatus - %s: Changing state from %s to %s",
1062 fCurrentDetector.Data(),
1063 status->GetStatusName(),
1064 status->GetStatusName(newStatus));
1065 Log("SHUTTLE", actionStr);
1066 SetLastAction(actionStr);
1067
1068 status->SetStatus(newStatus);
1069 if (increaseCount) status->IncreaseCount();
1070
1071 AliCDBManager::Instance()->GetStorage(fgkLocalCDB)->Put(fStatusEntry);
1072
1073 SendMLInfo();
1074}
1075
1076//______________________________________________________________________________________________
1077void AliShuttle::SendMLInfo()
1078{
1079 //
1080 // sends ML information about the current status of the current detector being processed
1081 //
1082
1083 AliShuttleStatus* status = dynamic_cast<AliShuttleStatus*> (fStatusEntry->GetObject());
1084
1085 if (!status){
1086 Log("SHUTTLE", "SendMLInfo - UNEXPECTED: status could not be read from current CDB entry");
1087 return;
1088 }
1089
1090 TMonaLisaText mlStatus(Form("%s_status", fCurrentDetector.Data()), status->GetStatusName());
1091 TMonaLisaValue mlRetryCount(Form("%s_count", fCurrentDetector.Data()), status->GetCount());
1092
1093 TList mlList;
1094 mlList.Add(&mlStatus);
1095 mlList.Add(&mlRetryCount);
1096
1097 fMonaLisa->SendParameters(&mlList);
1098}
1099
1100//______________________________________________________________________________________________
1101Bool_t AliShuttle::ContinueProcessing()
1102{
1103 // this function reads the AliShuttleStatus information from CDB and
1104 // checks if the processing should be continued
1105 // if yes it returns kTRUE and updates the AliShuttleStatus with nextStatus
1106
1107 if (!fConfig->HostProcessDetector(fCurrentDetector)) return kFALSE;
1108
1109 AliPreprocessor* aPreprocessor =
1110 dynamic_cast<AliPreprocessor*> (fPreprocessorMap.GetValue(fCurrentDetector));
1111 if (!aPreprocessor)
1112 {
1113 AliInfo(Form("%s: no preprocessor registered", fCurrentDetector.Data()));
1114 return kFALSE;
1115 }
1116
1117 AliShuttleLogbookEntry::Status entryStatus =
1118 fLogbookEntry->GetDetectorStatus(fCurrentDetector);
1119
1120 if(entryStatus != AliShuttleLogbookEntry::kUnprocessed) {
1121 AliInfo(Form("ContinueProcessing - %s is %s",
1122 fCurrentDetector.Data(),
1123 fLogbookEntry->GetDetectorStatusName(entryStatus)));
1124 return kFALSE;
1125 }
1126
1127 // if we get here, according to Shuttle logbook subdetector is in UNPROCESSED state
1128
1129 // check if current run is first unprocessed run for current detector
1130 if (fConfig->StrictRunOrder(fCurrentDetector) &&
1131 !fFirstUnprocessed[GetDetPos(fCurrentDetector)])
1132 {
1133 if (fTestMode == kNone)
1134 {
1135 Log("SHUTTLE", Form("ContinueProcessing - %s requires strict run ordering but this is not the first unprocessed run!", fCurrentDetector.Data()));
1136 return kFALSE;
1137 }
1138 else
1139 {
1140 Log("SHUTTLE", Form("ContinueProcessing - In TESTMODE - Although %s requires strict run ordering and this is not the first unprocessed run, the SHUTTLE continues", fCurrentDetector.Data()));
1141 }
1142 }
1143
1144 AliShuttleStatus* status = ReadShuttleStatus();
1145 if (!status) {
1146 // first time
1147 Log("SHUTTLE", Form("ContinueProcessing - %s: Processing first time",
1148 fCurrentDetector.Data()));
1149 status = new AliShuttleStatus(AliShuttleStatus::kStarted);
1150 return WriteShuttleStatus(status);
1151 }
1152
1153 // The following two cases shouldn't happen if Shuttle Logbook was correctly updated.
1154 // If it happens it may mean Logbook updating failed... let's do it now!
1155 if (status->GetStatus() == AliShuttleStatus::kDone ||
1156 status->GetStatus() == AliShuttleStatus::kFailed){
1157 Log("SHUTTLE", Form("ContinueProcessing - %s is already %s. Updating Shuttle Logbook",
1158 fCurrentDetector.Data(),
1159 status->GetStatusName(status->GetStatus())));
1160 UpdateShuttleLogbook(fCurrentDetector.Data(),
1161 status->GetStatusName(status->GetStatus()));
1162 return kFALSE;
1163 }
1164
1165 if (status->GetStatus() == AliShuttleStatus::kStoreError) {
1166 Log("SHUTTLE",
1167 Form("ContinueProcessing - %s: Grid storage of one or more objects failed. Trying again now",
1168 fCurrentDetector.Data()));
1169 UpdateShuttleStatus(AliShuttleStatus::kStoreStarted);
1170 if (StoreOCDB()){
1171 Log("SHUTTLE", Form("ContinueProcessing - %s: all objects successfully stored into main storage",
1172 fCurrentDetector.Data()));
1173 UpdateShuttleStatus(AliShuttleStatus::kDone);
1174 UpdateShuttleLogbook(fCurrentDetector.Data(), "DONE");
1175 } else {
1176 Log("SHUTTLE",
1177 Form("ContinueProcessing - %s: Grid storage failed again",
1178 fCurrentDetector.Data()));
1179 UpdateShuttleStatus(AliShuttleStatus::kStoreError);
1180 }
1181 return kFALSE;
1182 }
1183
1184 // if we get here, there is a restart
1185 Bool_t cont = kFALSE;
1186
1187 // abort conditions
1188 if (status->GetCount() >= fConfig->GetMaxRetries()) {
1189 Log("SHUTTLE", Form("ContinueProcessing - %s failed %d times in status %s - "
1190 "Updating Shuttle Logbook", fCurrentDetector.Data(),
1191 status->GetCount(), status->GetStatusName()));
1192 UpdateShuttleLogbook(fCurrentDetector.Data(), "FAILED");
1193 UpdateShuttleStatus(AliShuttleStatus::kFailed);
1194
1195 // there may still be objects in local OCDB and reference storage
1196 // and FXS databases may be not updated: do it now!
1197
1198 // TODO Currently disabled, we want to keep files in case of failure!
1199 // CleanLocalStorage(fgkLocalCDB);
1200 // CleanLocalStorage(fgkLocalRefStorage);
1201 // UpdateTableFailCase();
1202
1203 // Send mail to detector expert!
1204 AliInfo(Form("Sending mail to %s expert...", fCurrentDetector.Data()));
1205 if (!SendMail())
1206 Log("SHUTTLE", Form("ContinueProcessing - Could not send mail to %s expert",
1207 fCurrentDetector.Data()));
1208
1209 } else {
1210 Log("SHUTTLE", Form("ContinueProcessing - %s: restarting. "
1211 "Aborted before with %s. Retry number %d.", fCurrentDetector.Data(),
1212 status->GetStatusName(), status->GetCount()));
1213 Bool_t increaseCount = kTRUE;
1214 if (status->GetStatus() == AliShuttleStatus::kDCSError || status->GetStatus() == AliShuttleStatus::kDCSStarted)
1215 increaseCount = kFALSE;
1216 UpdateShuttleStatus(AliShuttleStatus::kStarted, increaseCount);
1217 cont = kTRUE;
1218 }
1219
1220 return cont;
1221}
1222
1223//______________________________________________________________________________________________
1224Bool_t AliShuttle::Process(AliShuttleLogbookEntry* entry)
1225{
1226 //
1227 // Makes data retrieval for all detectors in the configuration.
1228 // entry: Shuttle logbook entry, contains run parameters and status of detectors
1229 // (Unprocessed, Inactive, Failed or Done).
1230 // Returns kFALSE if an error occurred and kTRUE otherwise
1231 //
1232
1233 if (!entry) return kFALSE;
1234
1235 fLogbookEntry = entry;
1236
1237 AliInfo(Form("\n\n \t\t\t^*^*^*^*^*^*^*^*^*^*^*^* run %d: START ^*^*^*^*^*^*^*^*^*^*^*^* \n",
1238 GetCurrentRun()));
1239
1240 // create ML instance that monitors this run
1241 fMonaLisa = new TMonaLisaWriter(Form("%d", GetCurrentRun()), "SHUTTLE", "aliendb1.cern.ch");
1242 // disable monitoring of other parameters that come e.g. from TFile
1243 gMonitoringWriter = 0;
1244
1245 // Send the information to ML
1246 TMonaLisaText mlStatus("SHUTTLE_status", "Processing");
1247 TMonaLisaText mlRunType("SHUTTLE_runtype", Form("%s (%s)", entry->GetRunType(), entry->GetRunParameter("log")));
1248
1249 TList mlList;
1250 mlList.Add(&mlStatus);
1251 mlList.Add(&mlRunType);
1252
1253 fMonaLisa->SendParameters(&mlList);
1254
1255 if (fLogbookEntry->IsDone())
1256 {
1257 Log("SHUTTLE","Process - Shuttle is already DONE. Updating logbook");
1258 UpdateShuttleLogbook("shuttle_done");
1259 delete fMonaLisa; fMonaLisa = 0; fLogbookEntry = 0; // release the ML writer created above before returning
1260 return kTRUE;
1261 }
1262
1263 // read test mode if flag is set
1264 if (fReadTestMode)
1265 {
1266 fTestMode = kNone;
1267 TString logEntry(entry->GetRunParameter("log"));
1268 //printf("log entry = %s\n", logEntry.Data());
1269 TString searchStr("Testmode: ");
1270 Int_t pos = logEntry.Index(searchStr.Data());
1271 //printf("%d\n", pos);
1272 if (pos >= 0)
1273 {
1274 TSubString subStr = logEntry(pos + searchStr.Length(), logEntry.Length());
1275 //printf("%s\n", subStr.String().Data());
1276 TString newStr(subStr.Data());
1277 TObjArray* token = newStr.Tokenize(' ');
1278 if (token)
1279 {
1280 //token->Print();
1281 TObjString* tmpStr = dynamic_cast<TObjString*> (token->First());
1282 if (tmpStr)
1283 {
1284 Int_t testMode = tmpStr->String().Atoi();
1285 if (testMode > 0)
1286 {
1287 Log("SHUTTLE", Form("Enabling test mode %d", testMode));
1288 SetTestMode((TestMode) testMode);
1289 }
1290 }
1291 delete token;
1292 }
1293 }
1294 }
1295
1296 Log("SHUTTLE", Form("The test mode flag is %d", (Int_t) fTestMode));
1297
1298 fLogbookEntry->Print("all");
1299
1300 // Initialization
1301 Bool_t hasError = kFALSE;
1302
1303 AliCDBStorage *mainCDBSto = AliCDBManager::Instance()->GetStorage(fgkMainCDB);
1304 if(mainCDBSto) mainCDBSto->QueryCDB(GetCurrentRun());
1305 AliCDBStorage *mainRefSto = AliCDBManager::Instance()->GetStorage(fgkMainRefStorage);
1306 if(mainRefSto) mainRefSto->QueryCDB(GetCurrentRun());
1307
1308 // Loop on detectors in the configuration
1309 TIter iter(fConfig->GetDetectors());
1310 TObjString* aDetector = 0;
1311
1312 while ((aDetector = (TObjString*) iter.Next()))
1313 {
1314 fCurrentDetector = aDetector->String();
1315
1316 if (ContinueProcessing() == kFALSE) continue;
1317
1318 AliInfo(Form("\n\n \t\t\t****** run %d - %s: START ******",
1319 GetCurrentRun(), aDetector->GetName()));
1320
1321 for(Int_t iSys=0;iSys<3;iSys++) fFXSCalled[iSys]=kFALSE;
1322
1323 Log(fCurrentDetector.Data(), "Starting processing");
1324
1325 Int_t pid = fork();
1326
1327 if (pid < 0)
1328 {
1329 Log("SHUTTLE", "ERROR: Forking failed");
1330 }
1331 else if (pid > 0)
1332 {
1333 // parent
1334 AliInfo(Form("In parent process of %d - %s: Starting monitoring",
1335 GetCurrentRun(), aDetector->GetName()));
1336
1337 Long_t begin = time(0);
1338
1339 int status; // to be used with waitpid, on purpose an int (not Int_t)!
1340 while (waitpid(pid, &status, WNOHANG) == 0)
1341 {
1342 Long_t expiredTime = time(0) - begin;
1343
1344 if (expiredTime > fConfig->GetPPTimeOut())
1345 {
1346 TString tmp;
1347 tmp.Form("Process of %s timed out. Run time: %ld seconds. Killing...",
1348 fCurrentDetector.Data(), expiredTime);
1349 Log("SHUTTLE", tmp);
1350 Log(fCurrentDetector, tmp);
1351
1352 kill(pid, 9);
1353
1354 UpdateShuttleStatus(AliShuttleStatus::kPPTimeOut);
1355 hasError = kTRUE;
1356
1357 gSystem->Sleep(1000);
1358 }
1359 else
1360 {
1361 gSystem->Sleep(1000);
1362
1363 TString checkStr;
1364 checkStr.Form("ps -o vsize --pid %d | tail -n 1", pid);
1365 FILE* pipe = gSystem->OpenPipe(checkStr, "r");
1366 if (!pipe)
1367 {
1368 Log("SHUTTLE", Form("Error: Could not open pipe to %s", checkStr.Data()));
1369 continue;
1370 }
1371
1372 char buffer[100];
1373 if (!fgets(buffer, 100, pipe))
1374 {
1375 Log("SHUTTLE", "Error: ps did not return anything");
1376 gSystem->ClosePipe(pipe);
1377 continue;
1378 }
1379 gSystem->ClosePipe(pipe);
1380
1381 //Log("SHUTTLE", Form("ps returned %s", buffer));
1382
1383 Int_t mem = 0;
1384 if ((sscanf(buffer, "%d\n", &mem) != 1) || !mem)
1385 {
1386 Log("SHUTTLE", "Error: Could not parse output of ps");
1387 continue;
1388 }
1389
1390 if (expiredTime % 60 == 0)
1391 Log("SHUTTLE", Form("%s: Checking process. Run time: %ld seconds - Memory consumption: %d KB",
1392 fCurrentDetector.Data(), expiredTime, mem));
1393
1394 if (mem > fConfig->GetPPMaxMem())
1395 {
1396 TString tmp;
1397 tmp.Form("Process exceeds maximum allowed memory (%d KB > %d KB). Killing...",
1398 mem, fConfig->GetPPMaxMem());
1399 Log("SHUTTLE", tmp);
1400 Log(fCurrentDetector, tmp);
1401
1402 kill(pid, 9);
1403
1404 UpdateShuttleStatus(AliShuttleStatus::kPPOutOfMemory);
1405 hasError = kTRUE;
1406
1407 gSystem->Sleep(1000);
1408 }
1409 }
1410 }
1411
1412 AliInfo(Form("In parent process of %d - %s: Client has terminated.",
1413 GetCurrentRun(), aDetector->GetName()));
1414
1415 if (WIFEXITED(status))
1416 {
1417 Int_t returnCode = WEXITSTATUS(status);
1418
1419 Log("SHUTTLE", Form("%s: the return code is %d", fCurrentDetector.Data(),
1420 returnCode));
1421
1422 if (returnCode == 0) hasError = kTRUE; // the client exits with its success flag (see gSystem->Exit(success)), so 0 means failure
1423 }
1424 }
1425 else if (pid == 0)
1426 {
1427 // client
1428 AliInfo(Form("In client process of %d - %s", GetCurrentRun(), aDetector->GetName()));
1429
1430 AliInfo("Redirecting output...");
1431
1432 if ((freopen(GetLogFileName(fCurrentDetector), "a", stdout)) == 0)
1433 {
1434 Log("SHUTTLE", "Could not freopen stdout");
1435 }
1436 else
1437 {
1438 fOutputRedirected = kTRUE;
1439 if ((dup2(fileno(stdout), fileno(stderr))) < 0)
1440 Log("SHUTTLE", "Could not redirect stderr");
1441
1442 }
1443
1444 TString wd = gSystem->WorkingDirectory();
1445 TString tmpDir = Form("%s/%s_process",GetShuttleTempDir(),fCurrentDetector.Data());
1446
1447 gSystem->mkdir(tmpDir.Data());
1448 gSystem->ChangeDirectory(tmpDir.Data());
1449
1450 Bool_t success = ProcessCurrentDetector();
1451
1452 gSystem->ChangeDirectory(wd.Data());
1453
1454 gSystem->Exec(Form("rm -rf %s",tmpDir.Data()));
1455
1456 if (success) // Preprocessor finished successfully!
1457 {
1458 // Update time_processed field in FXS DB
1459 if (UpdateTable() == kFALSE)
1460 Log("SHUTTLE", Form("Process - %s: Could not update FXS databases!",
1461 fCurrentDetector.Data()));
1462
1463 // Transfer the data from local storage to main storage (Grid)
1464 UpdateShuttleStatus(AliShuttleStatus::kStoreStarted);
1465 if (StoreOCDB() == kFALSE)
1466 {
1467 AliInfo(Form("\n \t\t\t****** run %d - %s: STORAGE ERROR ****** \n\n",
1468 GetCurrentRun(), aDetector->GetName()));
1469 UpdateShuttleStatus(AliShuttleStatus::kStoreError);
1470 success = kFALSE;
1471 } else {
1472 AliInfo(Form("\n \t\t\t****** run %d - %s: DONE ****** \n\n",
1473 GetCurrentRun(), aDetector->GetName()));
1474 UpdateShuttleStatus(AliShuttleStatus::kDone);
1475 UpdateShuttleLogbook(fCurrentDetector, "DONE");
1476 }
1477 }
1478
1479 for (UInt_t iSys=0; iSys<3; iSys++)
1480 {
1481 if (fFXSCalled[iSys]) fFXSlist[iSys].Clear();
1482 }
1483
1484 AliInfo(Form("Client process of %d - %s is exiting now with %d.",
1485 GetCurrentRun(), aDetector->GetName(), success));
1486
1487 // the client exits here
1488 gSystem->Exit(success);
1489
1490 AliError("We should never get here!!!");
1491 }
1492 }
1493
1494 AliInfo(Form("\n\n \t\t\t^*^*^*^*^*^*^*^*^*^*^*^* run %d: FINISH ^*^*^*^*^*^*^*^*^*^*^*^* \n",
1495 GetCurrentRun()));
1496
1497 //check if shuttle is done for this run, if so update logbook
1498 TObjArray checkEntryArray;
1499 checkEntryArray.SetOwner(1);
1500 TString whereClause = Form("where run=%d", GetCurrentRun());
1501 if (!QueryShuttleLogbook(whereClause.Data(), checkEntryArray) || checkEntryArray.GetEntries() == 0) {
1502 Log("SHUTTLE", Form("Process - Warning: Cannot check status of run %d on Shuttle logbook!",
1503 GetCurrentRun()));
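1504 delete fMonaLisa; fMonaLisa = 0; fLogbookEntry = 0; return hasError == kFALSE; // same cleanup as at the regular end of Process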
1505 }
1506
1507 AliShuttleLogbookEntry* checkEntry = dynamic_cast<AliShuttleLogbookEntry*>
1508 (checkEntryArray.At(0));
1509
1510 if (checkEntry)
1511 {
1512 if (checkEntry->IsDone())
1513 {
1514 Log("SHUTTLE","Process - Shuttle is DONE. Updating logbook");
1515 UpdateShuttleLogbook("shuttle_done");
1516 }
1517 else
1518 {
1519 for (UInt_t iDet=0; iDet<NDetectors(); iDet++)
1520 {
1521 if (checkEntry->GetDetectorStatus(iDet) == AliShuttleLogbookEntry::kUnprocessed)
1522 {
1523 AliDebug(2, Form("Run %d: setting %s as \"not first time unprocessed\"",
1524 checkEntry->GetRun(), GetDetName(iDet)));
1525 fFirstUnprocessed[iDet] = kFALSE;
1526 }
1527 }
1528 }
1529 }
1530
1531 // remove ML instance
1532 delete fMonaLisa;
1533 fMonaLisa = 0;
1534
1535 fLogbookEntry = 0;
1536
1537 return hasError == kFALSE;
1538}
1539
1540//______________________________________________________________________________________________
1541Bool_t AliShuttle::ProcessCurrentDetector()
1542{
1543 //
1544 // Makes data retrieval just for a specific detector (fCurrentDetector).
1545 // There should be a configuration for this detector.
1546
1547 Log("SHUTTLE", Form("ProcessCurrentDetector - Retrieving values for %s, run %d",
1548 fCurrentDetector.Data(), GetCurrentRun()));
1549
1550 if (!CleanReferenceStorage(fCurrentDetector.Data()))
1551 return kFALSE;
1552
1553 TMap* dcsMap = new TMap();
1554
1555 // call preprocessor
1556 AliPreprocessor* aPreprocessor =
1557 dynamic_cast<AliPreprocessor*> (fPreprocessorMap.GetValue(fCurrentDetector));
1558
1559 aPreprocessor->Initialize(GetCurrentRun(), GetCurrentStartTime(), GetCurrentEndTime());
1560
1561 Bool_t processDCS = aPreprocessor->ProcessDCS();
1562
1563 if (!processDCS)
1564 {
1565 Log(fCurrentDetector, "ProcessCurrentDetector -"
1566 " The preprocessor requested to skip the retrieval of DCS values");
1567 }
1568 else if (fTestMode & kSkipDCS)
1569 {
1570 Log(fCurrentDetector, "ProcessCurrentDetector - In TESTMODE: Skipping DCS processing");
1571 }
1572 else if (fTestMode & kErrorDCS)
1573 {
1574 Log(fCurrentDetector, "ProcessCurrentDetector - In TESTMODE: Simulating DCS error");
1575 UpdateShuttleStatus(AliShuttleStatus::kDCSStarted);
1576 UpdateShuttleStatus(AliShuttleStatus::kDCSError);
1577 delete dcsMap;
1578 return kFALSE;
1579 } else {
1580
1581 UpdateShuttleStatus(AliShuttleStatus::kDCSStarted);
1582
1583 // Query DCS archive
1584 Int_t nServers = fConfig->GetNServers(fCurrentDetector);
1585
1586 for (int iServ=0; iServ<nServers; iServ++)
1587 {
1588
1589 TString host(fConfig->GetDCSHost(fCurrentDetector, iServ));
1590 Int_t port = fConfig->GetDCSPort(fCurrentDetector, iServ);
1591 Int_t multiSplit = fConfig->GetMultiSplit(fCurrentDetector, iServ);
1592
1593 Log(fCurrentDetector, Form("ProcessCurrentDetector -"
1594 " Querying DCS Amanda server %s:%d (%d of %d)",
1595 host.Data(), port, iServ+1, nServers));
1596
1597 TMap* aliasMap = 0;
1598 TMap* dpMap = 0;
1599
1600 if (fConfig->GetDCSAliases(fCurrentDetector, iServ)->GetEntries() > 0)
1601 {
1602 aliasMap = GetValueSet(host, port,
1603 fConfig->GetDCSAliases(fCurrentDetector, iServ),
1604 kAlias, multiSplit);
1605 if (!aliasMap)
1606 {
1607 Log(fCurrentDetector,
1608 Form("ProcessCurrentDetector -"
1609 " Error retrieving DCS aliases from server %s",
1610 host.Data()));
1611 UpdateShuttleStatus(AliShuttleStatus::kDCSError);
1612 delete dcsMap;
1613 return kFALSE;
1614 }
1615 }
1616
1617 if (fConfig->GetDCSDataPoints(fCurrentDetector, iServ)->GetEntries() > 0)
1618 {
1619 dpMap = GetValueSet(host, port,
1620 fConfig->GetDCSDataPoints(fCurrentDetector, iServ),
1621 kDP, multiSplit);
1622 if (!dpMap)
1623 {
1624 Log(fCurrentDetector,
1625 Form("ProcessCurrentDetector -"
1626 " Error retrieving DCS data points from server %s",
1627 host.Data()));
1628 UpdateShuttleStatus(AliShuttleStatus::kDCSError);
1629 if (aliasMap) delete aliasMap;
1630 delete dcsMap;
1631 return kFALSE;
1632 }
1633 }
1634
1635 // merge aliasMap and dpMap into dcsMap
1636 if(aliasMap) {
1637 TIter iter(aliasMap);
1638 TObjString* key = 0;
1639 while ((key = (TObjString*) iter.Next()))
1640 dcsMap->Add(key, aliasMap->GetValue(key->String()));
1641
1642 aliasMap->SetOwner(kFALSE);
1643 delete aliasMap;
1644 }
1645
1646 if(dpMap) {
1647 TIter iter(dpMap);
1648 TObjString* key = 0;
1649 while ((key = (TObjString*) iter.Next()))
1650 dcsMap->Add(key, dpMap->GetValue(key->String()));
1651
1652 dpMap->SetOwner(kFALSE);
1653 delete dpMap;
1654 }
1655 }
1656 }
1657
1658 // DCS Archive DB processing successful. Call Preprocessor!
1659 UpdateShuttleStatus(AliShuttleStatus::kPPStarted);
1660
1661 UInt_t returnValue = aPreprocessor->Process(dcsMap);
1662
1663 if (returnValue > 0) // Preprocessor error!
1664 {
1665 Log(fCurrentDetector, Form("Preprocessor failed. Process returned %d.", returnValue));
1666 UpdateShuttleStatus(AliShuttleStatus::kPPError);
1667 dcsMap->DeleteAll();
1668 delete dcsMap;
1669 return kFALSE;
1670 }
1671
1672 // preprocessor ok!
1673 UpdateShuttleStatus(AliShuttleStatus::kPPDone);
1674 Log(fCurrentDetector, Form("ProcessCurrentDetector - %s preprocessor returned success",
1675 fCurrentDetector.Data()));
1676
1677 dcsMap->DeleteAll();
1678 delete dcsMap;
1679
1680 return kTRUE;
1681}
1682
1683//______________________________________________________________________________________________
1684Bool_t AliShuttle::QueryShuttleLogbook(const char* whereClause,
1685 TObjArray& entries)
1686{
1687 // Queries DAQ's Shuttle logbook and fills the detector status object.
1688 // Calls QueryRunParameters to query the DAQ logbook for run parameters.
1689 //
1690
1691 entries.SetOwner(1);
1692
1693 // check connection, in case connect
1694 if(!Connect(3)) return kFALSE;
1695
1696 TString sqlQuery;
1697 sqlQuery = Form("select * from %s %s order by run", fConfig->GetShuttlelbTable(), whereClause);
1698
1699 TSQLResult* aResult = fServer[3]->Query(sqlQuery);
1700 if (!aResult) {
1701 AliError(Form("Can't execute query <%s>!", sqlQuery.Data()));
1702 return kFALSE;
1703 }
1704
1705 AliDebug(2,Form("Query = %s", sqlQuery.Data()));
1706
1707 if(aResult->GetRowCount() == 0) {
1708 AliInfo("No entries in Shuttle Logbook match request");
1709 delete aResult;
1710 return kTRUE;
1711 }
1712
1713 // TODO Check field count!
1714 const UInt_t nCols = 23;
1715 if (aResult->GetFieldCount() != (Int_t) nCols) {
1716 AliError("Invalid SQL result field number!");
1717 delete aResult;
1718 return kFALSE;
1719 }
1720
1721 TSQLRow* aRow;
1722 while ((aRow = aResult->Next())) {
1723 TString runString(aRow->GetField(0), aRow->GetFieldLength(0));
1724 Int_t run = runString.Atoi();
1725
1726 AliShuttleLogbookEntry *entry = QueryRunParameters(run);
1727 if (!entry)
1728 { delete aRow; continue; } // free the row before skipping it
1729
1730 // loop on detectors
1731 for(UInt_t ii = 0; ii < nCols; ii++)
1732 entry->SetDetectorStatus(aResult->GetFieldName(ii), aRow->GetField(ii));
1733
1734 entries.AddLast(entry);
1735 delete aRow;
1736 }
1737
1738 delete aResult;
1739 return kTRUE;
1740}
1741
1742//______________________________________________________________________________________________
1743AliShuttleLogbookEntry* AliShuttle::QueryRunParameters(Int_t run)
1744{
1745 //
1746 // Retrieves the run parameters written in the DAQ logbook and sets them in an AliShuttleLogbookEntry object
1747 //
1748
1749 // check connection, in case connect
1750 if (!Connect(3))
1751 return 0;
1752
1753 TString sqlQuery;
1754 sqlQuery.Form("select * from %s where run=%d", fConfig->GetDAQlbTable(), run);
1755
1756 TSQLResult* aResult = fServer[3]->Query(sqlQuery);
1757 if (!aResult) {
1758 AliError(Form("Can't execute query <%s>!", sqlQuery.Data()));
1759 return 0;
1760 }
1761
1762 if (aResult->GetRowCount() == 0) {
1763 Log("SHUTTLE", Form("QueryRunParameters - No entry in DAQ Logbook for run %d. Skipping", run));
1764 delete aResult;
1765 return 0;
1766 }
1767
1768 if (aResult->GetRowCount() > 1) {
1769 AliError(Form("More than one entry in DAQ Logbook for run %d. Skipping", run));
1770 delete aResult;
1771 return 0;
1772 }
1773
1774 TSQLRow* aRow = aResult->Next();
1775 if (!aRow)
1776 {
1777 AliError(Form("Could not retrieve row for run %d. Skipping", run));
1778 delete aResult;
1779 return 0;
1780 }
1781
1782 AliShuttleLogbookEntry* entry = new AliShuttleLogbookEntry(run);
1783
1784 for (Int_t ii = 0; ii < aResult->GetFieldCount(); ii++)
1785 entry->SetRunParameter(aResult->GetFieldName(ii), aRow->GetField(ii));
1786
1787 UInt_t startTime = entry->GetStartTime();
1788 UInt_t endTime = entry->GetEndTime();
1789
1790 if (!startTime || !endTime || startTime > endTime) {
1791 Log("SHUTTLE",
1792 Form("QueryRunParameters - Invalid parameters for Run %d: startTime = %u, endTime = %u",
1793 run, startTime, endTime));
1794 delete entry;
1795 delete aRow;
1796 delete aResult;
1797 return 0;
1798 }
1799
1800 delete aRow;
1801 delete aResult;
1802
1803 return entry;
1804}
1805
1806//______________________________________________________________________________________________
1807TMap* AliShuttle::GetValueSet(const char* host, Int_t port, const TSeqCollection* entries,
1808 DCSType type, Int_t multiSplit)
1809{
1810 // Retrieve all "entry" data points from the DCS server
1811 // host, port: TSocket connection parameters
1812 // entries: list of the names of the aliases or data points
1813 // type: kAlias or kDP
1814 // returns TMap of values, 0 when failure
1815
1816 AliDCSClient client(host, port, fTimeout, fRetries, multiSplit);
1817
1818 TMap* result = 0;
1819 if (type == kAlias)
1820 {
1821 result = client.GetAliasValues(entries, GetCurrentStartTime(),
1822 GetCurrentEndTime());
1823 }
1824 else if (type == kDP)
1825 {
1826 result = client.GetDPValues(entries, GetCurrentStartTime(),
1827 GetCurrentEndTime());
1828 }
1829
1830 if (result == 0)
1831 {
1832 Log(fCurrentDetector.Data(), Form("GetValueSet - Can't get entries! Reason: %s",
1833 client.GetErrorString(client.GetResultErrorCode())));
1834 if (client.GetResultErrorCode() == AliDCSClient::fgkServerError)
1835 Log(fCurrentDetector.Data(), Form("GetValueSet - Server error code: %s",
1836 client.GetServerError().Data()));
1837
1838 return 0;
1839 }
1840
1841 return result;
1842}
1843
1844//______________________________________________________________________________________________
1845const char* AliShuttle::GetFile(Int_t system, const char* detector,
1846 const char* id, const char* source)
1847{
1848 // Get calibration file from file exchange servers
1849 // First queries the FXS database for the file name, using the run, detector, id and source info
1850 // then calls RetrieveFile(filename) for actual copy to local disk
1851 // run: current run being processed (given by Logbook entry fLogbookEntry)
1852 // detector: the Preprocessor name
1853 // id: provided as a parameter by the Preprocessor
1854 // source: provided by the Preprocessor through GetFileSources function
1855
1856 // check if test mode should simulate a FXS error
1857 if (fTestMode & kErrorFXSFiles)
1858 {
1859 Log(detector, Form("GetFile - In TESTMODE - Simulating error while connecting to %s FXS", GetSystemName(system)));
1860 return 0;
1861 }
1862
1863 // check connection, in case connect
1864 if (!Connect(system))
1865 {
1866 Log(detector, Form("GetFile - Couldn't connect to %s FXS database", GetSystemName(system)));
1867 return 0;
1868 }
1869
1870 // Query preparation
1871 TString sourceName(source);
1872 Int_t nFields = 3;
1873 TString sqlQueryStart = Form("select filePath,size,fileChecksum from %s where",
1874 fConfig->GetFXSdbTable(system));
1875 TString whereClause = Form("run=%d and detector=\"%s\" and fileId=\"%s\"",
1876 GetCurrentRun(), detector, id);
1877
1878 if (system == kDAQ)
1879 {
1880 whereClause += Form(" and DAQsource=\"%s\"", source);
1881 }
1882 else if (system == kDCS)
1883 {
1884 sourceName="none";
1885 }
1886 else if (system == kHLT)
1887 {
1888 whereClause += Form(" and DDLnumbers=\"%s\"", source);
1889 nFields = 3;
1890 }
1891
1892 TString sqlQuery = Form("%s %s", sqlQueryStart.Data(), whereClause.Data());
1893
1894 AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));
1895
1896 // Query execution
1897 TSQLResult* aResult = 0;
1898 aResult = dynamic_cast<TSQLResult*> (fServer[system]->Query(sqlQuery));
1899 if (!aResult) {
1900 Log(detector, Form("GetFile - Can't execute SQL query to %s database for: id = %s, source = %s",
1901 GetSystemName(system), id, sourceName.Data()));
1902 return 0;
1903 }
1904
1905 if(aResult->GetRowCount() == 0)
1906 {
1907 Log(detector,
1908 Form("GetFile - No entry in %s FXS db for: id = %s, source = %s",
1909 GetSystemName(system), id, sourceName.Data()));
1910 delete aResult;
1911 return 0;
1912 }
1913
1914 if (aResult->GetRowCount() > 1) {
1915 Log(detector,
1916 Form("GetFile - More than one entry in %s FXS db for: id = %s, source = %s",
1917 GetSystemName(system), id, sourceName.Data()));
1918 delete aResult;
1919 return 0;
1920 }
1921
1922 if (aResult->GetFieldCount() != nFields) {
1923 Log(detector,
1924 Form("GetFile - Wrong field count in %s FXS db for: id = %s, source = %s",
1925 GetSystemName(system), id, sourceName.Data()));
1926 delete aResult;
1927 return 0;
1928 }
1929
1930 TSQLRow* aRow = dynamic_cast<TSQLRow*> (aResult->Next());
1931
1932 if (!aRow){
1933 Log(detector, Form("GetFile - Empty set result in %s FXS db from query: id = %s, source = %s",
1934 GetSystemName(system), id, sourceName.Data()));
1935 delete aResult;
1936 return 0;
1937 }
1938
1939 TString filePath(aRow->GetField(0), aRow->GetFieldLength(0));
1940 TString fileSize(aRow->GetField(1), aRow->GetFieldLength(1));
1941 TString fileChecksum(aRow->GetField(2), aRow->GetFieldLength(2));
1942
1943 delete aResult;
1944 delete aRow;
1945
1946 AliDebug(2, Form("filePath = %s; size = %s, fileChecksum = %s",
1947 filePath.Data(), fileSize.Data(), fileChecksum.Data()));
1948
1949 // retrieved file is renamed to make it unique
1950 TString localFileName = Form("%s_%s_%d_%s_%s.shuttle",
1951 GetSystemName(system), detector, GetCurrentRun(), id, sourceName.Data());
1952
1953
1954 // file retrieval from FXS
1955 UInt_t nRetries = 0;
1956 UInt_t maxRetries = 3;
1957 Bool_t result = kFALSE;
1958
1959 // copy!! if successful TSystem::Exec returns 0
1960 while(nRetries++ < maxRetries) {
1961 AliDebug(2, Form("Trying to copy file. Attempt # %u of %u", nRetries, maxRetries));
1962 result = RetrieveFile(system, filePath.Data(), localFileName.Data());
1963 if(!result)
1964 {
1965 Log(detector, Form("GetFileName - Copy of file %s from %s FXS failed",
1966 filePath.Data(), GetSystemName(system)));
1967 continue;
1968 }
1969
1970 if (fileChecksum.Length()>0)
1971 {
1972 // compare md5sum of local file with the one stored in the FXS DB
1973 Int_t md5Comp = gSystem->Exec(Form("md5sum %s/%s |grep %s > /dev/null 2>&1",
1974 GetShuttleTempDir(), localFileName.Data(), fileChecksum.Data()));
1975
1976 if (md5Comp != 0)
1977 {
1978 Log(detector, Form("GetFileName - md5sum of file %s does not match with local copy!",
1979 filePath.Data()));
1980 result = kFALSE;
1981 continue;
1982 }
1983 } else {
1984 Log(fCurrentDetector, Form("GetFile - md5sum of file %s not set in %s database, skipping comparison",
1985 filePath.Data(), GetSystemName(system)));
1986 }
1987 if (result) break;
1988 }
1989
1990 if(!result) return 0;
1991
1992 fFXSCalled[system]=kTRUE;
1993 TObjString *fileParams = new TObjString(Form("%s#!?!#%s", id, sourceName.Data()));
1994 fFXSlist[system].Add(fileParams);
1995
1996 static TString fullLocalFileName;
1997 fullLocalFileName.Form("%s/%s", GetShuttleTempDir(), localFileName.Data());
1998
1999 Log(fCurrentDetector, Form("GetFile - Retrieved file with id %s and source %s from %s to %s", id, source, GetSystemName(system), fullLocalFileName.Data()));
2000
2001 return fullLocalFileName.Data();
2002}
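// Illustration only (not part of AliShuttle): the copy loop above retries the
// transfer and accepts an attempt only if the checksum comparison also passes.
// The block below is a minimal standalone sketch of that pattern, assuming C++11.
// The helper name FetchWithRetries and its two callbacks are hypothetical
// stand-ins for RetrieveFile and the md5sum check.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper: call `fetch` up to maxRetries times and accept the
// first attempt for which `verify` also succeeds.
inline bool FetchWithRetries(unsigned maxRetries,
                             const std::function<bool()>& fetch,
                             const std::function<bool()>& verify)
{
  for (unsigned attempt = 1; attempt <= maxRetries; ++attempt) {
    if (!fetch()) continue;   // copy failed: try again
    if (!verify()) continue;  // checksum mismatch: try again
    return true;              // copied and verified
  }
  return false;               // every attempt failed
}
```

// With maxRetries = 3 the helper returns false only after three failed attempts,
// which mirrors the "if(!result) return 0;" exit of the loop above.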
2003
2004//______________________________________________________________________________________________
2005Bool_t AliShuttle::RetrieveFile(UInt_t system, const char* fxsFileName, const char* localFileName)
2006{
2007 //
2008 // Copies file from FXS to local Shuttle machine
2009 //
2010
2011 // check temp directory: try to open it; if it does not exist, create it
2012 AliDebug(2, Form("Copy file %s from %s FXS into %s/%s",
2013 fxsFileName, GetSystemName(system), GetShuttleTempDir(), localFileName));
2014
2015 void* dir = gSystem->OpenDirectory(GetShuttleTempDir());
2016 if (dir == NULL) {
2017 if (gSystem->mkdir(GetShuttleTempDir(), kTRUE)) {
2018 AliError(Form("Can't open directory <%s>", GetShuttleTempDir()));
2019 return kFALSE;
2020 }
2021
2022 } else {
2023 gSystem->FreeDirectory(dir);
2024 }
2025
2026 TString baseFXSFolder;
2027 if (system == kDAQ)
2028 {
2029 baseFXSFolder = "FES/";
2030 }
2031 else if (system == kDCS)
2032 {
2033 baseFXSFolder = "";
2034 }
2035 else if (system == kHLT)
2036 {
2037 baseFXSFolder = "/opt/FXS/";
2038 }
2039
2040
2041 TString command = Form("scp -oPort=%d -2 %s@%s:%s%s %s/%s",
2042 fConfig->GetFXSPort(system),
2043 fConfig->GetFXSUser(system),
2044 fConfig->GetFXSHost(system),
2045 baseFXSFolder.Data(),
2046 fxsFileName,
2047 GetShuttleTempDir(),
2048 localFileName);
2049
2050 AliDebug(2, Form("%s",command.Data()));
2051
2052 Bool_t result = (gSystem->Exec(command.Data()) == 0);
2053
2054 return result;
2055}
2056
2057//______________________________________________________________________________________________
2058TList* AliShuttle::GetFileSources(Int_t system, const char* detector, const char* id)
2059{
2060 //
2061 // Get sources producing the condition file Id from file exchange servers
2062 // if id is NULL all sources are returned (distinct)
2063 //
2064
2065 Log(detector, Form("GetFileSources - Retrieving sources with id %s from %s", id ? id : "(any)", GetSystemName(system)));
2066
2067 // check if test mode should simulate a FXS error
2068 if (fTestMode & kErrorFXSSources)
2069 {
2070 Log(detector, Form("GetFileSources - In TESTMODE - Simulating error while connecting to %s FXS", GetSystemName(system)));
2071 return 0;
2072 }
2073
2074 if (system == kDCS)
2075 {
2076 AliWarning("DCS system has only one source of data!");
2077 TList *list = new TList();
2078 list->SetOwner(1);
2079 list->Add(new TObjString(" "));
2080 return list;
2081 }
2082
2083 // check connection; connect if necessary
2084 if (!Connect(system))
2085 {
2086 Log(detector, Form("GetFileSources - Couldn't connect to %s FXS database", GetSystemName(system)));
2087 return NULL;
2088 }
2089
2090 TString sourceName;
2091 if (system == kDAQ)
2092 {
2093 sourceName = "DAQsource";
2094 } else if (system == kHLT)
2095 {
2096 sourceName = "DDLnumbers";
2097 }
2098
2099 TString sqlQueryStart = Form("select distinct %s from %s where", sourceName.Data(), fConfig->GetFXSdbTable(system));
2100 TString whereClause = Form("run=%d and detector=\"%s\"",
2101 GetCurrentRun(), detector);
2102 if (id)
2103 whereClause += Form(" and fileId=\"%s\"", id);
2104 TString sqlQuery = Form("%s %s", sqlQueryStart.Data(), whereClause.Data());
2105
2106 AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));
2107
2108 // Query execution
2109 TSQLResult* aResult;
2110 aResult = fServer[system]->Query(sqlQuery);
2111 if (!aResult) {
2112 Log(detector, Form("GetFileSources - Can't execute SQL query to %s database for id: %s",
2113 GetSystemName(system), id ? id : "(any)"));
2114 return 0;
2115 }
2116
2117 TList *list = new TList();
2118 list->SetOwner(1);
2119
2120 if (aResult->GetRowCount() == 0)
2121 {
2122 Log(detector,
2123 Form("GetFileSources - No entry in %s FXS table for id: %s", GetSystemName(system), id ? id : "(any)"));
2124 delete aResult;
2125 return list;
2126 }
2127
2128 Log(detector, Form("GetFileSources - Found %d sources", aResult->GetRowCount()));
2129
2130 TSQLRow* aRow;
2131 while ((aRow = aResult->Next()))
2132 {
2133
2134 TString source(aRow->GetField(0), aRow->GetFieldLength(0));
2135 AliDebug(2, Form("%s = %s", sourceName.Data(), source.Data()));
2136 list->Add(new TObjString(source));
2137 delete aRow;
2138 }
2139
2140 delete aResult;
2141
2142 return list;
2143}
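// Illustration only (not part of AliShuttle): GetFileSources appends the fileId
// condition only when an id is given. The sketch below reproduces that query
// assembly with the standard library. BuildSourcesQuery and its parameters are
// hypothetical names, and, like the original, it interpolates values without
// escaping, so it assumes trusted input.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: build the "select distinct <source column>" query,
// adding the fileId filter only when id is non-null.
inline std::string BuildSourcesQuery(const std::string& table,
                                     const std::string& sourceColumn,
                                     int run,
                                     const std::string& detector,
                                     const char* id)
{
  std::ostringstream query;
  query << "select distinct " << sourceColumn << " from " << table
        << " where run=" << run
        << " and detector=\"" << detector << "\"";
  if (id) query << " and fileId=\"" << id << "\"";  // optional filter
  return query.str();
}
```

// Passing a null id yields the query that returns all distinct sources of the
// run and detector.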
2144
2145//______________________________________________________________________________________________
2146TList* AliShuttle::GetFileIDs(Int_t system, const char* detector, const char* source)
2147{
2148 //
2149 // Get all ids of condition files produced by a given source from file exchange servers
2150 //
2151
2152 Log(detector, Form("GetFileIDs - Retrieving ids with source %s from %s", source ? source : "(any)", GetSystemName(system)));
2153
2154 // check if test mode should simulate a FXS error
2155 if (fTestMode & kErrorFXSSources)
2156 {
2157 Log(detector, Form("GetFileIDs - In TESTMODE - Simulating error while connecting to %s FXS", GetSystemName(system)));
2158 return 0;
2159 }
2160
2161 // check connection; connect if necessary
2162 if (!Connect(system))
2163 {
2164 Log(detector, Form("GetFileIDs - Couldn't connect to %s FXS database", GetSystemName(system)));
2165 return NULL;
2166 }
2167
2168 TString sourceName;
2169 if (system == kDAQ)
2170 {
2171 sourceName = "DAQsource";
2172 } else if (system == kHLT)
2173 {
2174 sourceName = "DDLnumbers";
2175 }
2176
2177 TString sqlQueryStart = Form("select fileId from %s where", fConfig->GetFXSdbTable(system));
2178 TString whereClause = Form("run=%d and detector=\"%s\"",
2179 GetCurrentRun(), detector);
2180 if (sourceName.Length() > 0 && source)
2181 whereClause += Form(" and %s=\"%s\"", sourceName.Data(), source);
2182 TString sqlQuery = Form("%s %s", sqlQueryStart.Data(), whereClause.Data());
2183
2184 AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));
2185
2186 // Query execution
2187 TSQLResult* aResult;
2188 aResult = fServer[system]->Query(sqlQuery);
2189 if (!aResult) {
2190 Log(detector, Form("GetFileIDs - Can't execute SQL query to %s database for source: %s",
2191 GetSystemName(system), source ? source : "(any)"));
2192 return 0;
2193 }
2194
2195 TList *list = new TList();
2196 list->SetOwner(1);
2197
2198 if (aResult->GetRowCount() == 0)
2199 {
2200 Log(detector,
2201 Form("GetFileIDs - No entry in %s FXS table for source: %s", GetSystemName(system), source ? source : "(any)"));
2202 delete aResult;
2203 return list;
2204 }
2205
2206 Log(detector, Form("GetFileIDs - Found %d ids", aResult->GetRowCount()));
2207
2208 TSQLRow* aRow;
2209
2210 while ((aRow = aResult->Next()))
2211 {
2212
2213 TString id(aRow->GetField(0), aRow->GetFieldLength(0));
2214 AliDebug(2, Form("fileId = %s", id.Data()));
2215 list->Add(new TObjString(id));
2216 delete aRow;
2217 }
2218
2219 delete aResult;
2220
2221 return list;
2222}
2223
2224//______________________________________________________________________________________________
2225Bool_t AliShuttle::Connect(Int_t system)
2226{
2227 // Connect to the MySQL server of the given system: the FXS database for system < 3,
2228 // the Run/Shuttle logbook otherwise. DAQ Logbook, Shuttle Logbook and DAQ FXS db are on the same host
2229 //
2230
2231 // check connection: if already connected return
2232 if(fServer[system] && fServer[system]->IsConnected()) return kTRUE;
2233
2234 TString dbHost, dbUser, dbPass, dbName;
2235
2236 if (system < 3) // FXS db servers
2237 {
2238 dbHost = Form("mysql://%s:%d", fConfig->GetFXSdbHost(system), fConfig->GetFXSdbPort(system));
2239 dbUser = fConfig->GetFXSdbUser(system);
2240 dbPass = fConfig->GetFXSdbPass(system);
2241 dbName = fConfig->GetFXSdbName(system);
2242 } else { // Run & Shuttle logbook servers
2243 // TODO Will the Shuttle logbook server be the same as the Run logbook server ???
2244 dbHost = Form("mysql://%s:%d", fConfig->GetDAQlbHost(), fConfig->GetDAQlbPort());
2245 dbUser = fConfig->GetDAQlbUser();
2246 dbPass = fConfig->GetDAQlbPass();
2247 dbName = fConfig->GetDAQlbDB();
2248 }
2249
2250 fServer[system] = TSQLServer::Connect(dbHost.Data(), dbUser.Data(), dbPass.Data());
2251 if (!fServer[system] || !fServer[system]->IsConnected()) {
2252 if(system < 3)
2253 {
2254 AliError(Form("Can't establish connection to FXS database for %s",
2255 AliShuttleInterface::GetSystemName(system)));
2256 } else {
2257 AliError("Can't establish connection to Run logbook.");
2258 }
2259 if(fServer[system]) { delete fServer[system]; fServer[system] = 0; } // avoid a dangling pointer on the next Connect()
2260 return kFALSE;
2261 }
2262
2263 // Get tables
2264 TSQLResult* aResult=0;
2265 switch(system){
2266 case kDAQ:
2267 aResult = fServer[kDAQ]->GetTables(dbName.Data());
2268 break;
2269 case kDCS:
2270 aResult = fServer[kDCS]->GetTables(dbName.Data());
2271 break;
2272 case kHLT:
2273 aResult = fServer[kHLT]->GetTables(dbName.Data());
2274 break;
2275 default:
2276 aResult = fServer[3]->GetTables(dbName.Data());
2277 break;
2278 }
2279
2280 delete aResult;
2281 return kTRUE;
2282}
2283
2284//______________________________________________________________________________________________
2285Bool_t AliShuttle::UpdateTable()
2286{
2287 //
2288 // Update FXS table filling time_processed field in all rows corresponding to current run and detector
2289 //
2290
2291 Bool_t result = kTRUE;
2292
2293 for (UInt_t system=0; system<3; system++)
2294 {
2295 if(!fFXSCalled[system]) continue;
2296
2297 // check connection; connect if necessary
2298 if (!Connect(system))
2299 {
2300 Log(fCurrentDetector, Form("UpdateTable - Couldn't connect to %s FXS database", GetSystemName(system)));
2301 result = kFALSE;
2302 continue;
2303 }
2304
2305 TTimeStamp now; // now
2306
2307 // Loop on FXS list entries
2308 TIter iter(&fFXSlist[system]);
2309 TObjString *aFXSentry=0;
2310 while ((aFXSentry = dynamic_cast<TObjString*> (iter.Next())))
2311 {
2312 TString aFXSentrystr = aFXSentry->String();
2313 TObjArray *aFXSarray = aFXSentrystr.Tokenize("#!?!#");
2314 if (!aFXSarray || aFXSarray->GetEntries() != 2 )
2315 {
2316 Log(fCurrentDetector, Form("UpdateTable - error updating %s FXS entry. Check string: <%s>",
2317 GetSystemName(system), aFXSentrystr.Data()));
2318 if(aFXSarray) delete aFXSarray;
2319 result = kFALSE;
2320 continue;
2321 }
2322 const char* fileId = ((TObjString*) aFXSarray->At(0))->GetName();
2323 const char* source = ((TObjString*) aFXSarray->At(1))->GetName();
2324
2325 TString whereClause;
2326 if (system == kDAQ)
2327 {
2328 whereClause = Form("where run=%d and detector=\"%s\" and fileId=\"%s\" and DAQsource=\"%s\";",
2329 GetCurrentRun(), fCurrentDetector.Data(), fileId, source);
2330 }
2331 else if (system == kDCS)
2332 {
2333 whereClause = Form("where run=%d and detector=\"%s\" and fileId=\"%s\";",
2334 GetCurrentRun(), fCurrentDetector.Data(), fileId);
2335 }
2336 else if (system == kHLT)
2337 {
2338 whereClause = Form("where run=%d and detector=\"%s\" and fileId=\"%s\" and DDLnumbers=\"%s\";",
2339 GetCurrentRun(), fCurrentDetector.Data(), fileId, source);
2340 }
2341
2342 delete aFXSarray;
2343
2344 TString sqlQuery = Form("update %s set time_processed=%ld %s", fConfig->GetFXSdbTable(system),
2345 (Long_t) now.GetSec(), whereClause.Data());
2346
2347 AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));
2348
2349 // Query execution
2350 TSQLResult* aResult;
2351 aResult = dynamic_cast<TSQLResult*> (fServer[system]->Query(sqlQuery));
2352 if (!aResult)
2353 {
2354 Log(fCurrentDetector, Form("UpdateTable - %s db: can't execute SQL query <%s>",
2355 GetSystemName(system), sqlQuery.Data()));
2356 result = kFALSE;
2357 continue;
2358 }
2359 delete aResult;
2360 }
2361 }
2362
2363 return result;
2364}
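// Illustration only (not part of AliShuttle): as far as the ROOT documentation
// describes it, TString::Tokenize treats its argument as a set of delimiter
// characters, so an id or source containing '#', '!' or '?' would yield more
// than two tokens and be rejected by the check above. The sketch below splits
// on the whole "#!?!#" marker instead. SplitFxsEntry is a hypothetical helper
// and the sample values in the usage note are made up.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: split "<id>#!?!#<source>" at the first occurrence of
// the full separator string. Returns false if the separator is missing.
inline bool SplitFxsEntry(const std::string& entry,
                          std::pair<std::string, std::string>& out)
{
  static const std::string kSeparator = "#!?!#";
  const std::string::size_type pos = entry.find(kSeparator);
  if (pos == std::string::npos) return false;
  out.first  = entry.substr(0, pos);                   // file id
  out.second = entry.substr(pos + kSeparator.size());  // source name
  return true;
}
```

// Example: "PEDESTALS#!?!#LDC1" splits into id "PEDESTALS" and source "LDC1";
// an id such as "A!B" survives intact because only the full marker separates.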
2365
2366//______________________________________________________________________________________________
2367Bool_t AliShuttle::UpdateTableFailCase()
2368{
2369 // Update FXS table filling time_processed field in all rows corresponding to current run and detector
2370 // this is called in case the preprocessor is declared failed for the current run, because
2371 // the fields are updated only in case of success
2372
2373 Bool_t result = kTRUE;
2374
2375 for (UInt_t system=0; system<3; system++)
2376 {
2377 // check connection; connect if necessary
2378 if (!Connect(system))
2379 {
2380 Log(fCurrentDetector, Form("UpdateTableFailCase - Couldn't connect to %s FXS database",
2381 GetSystemName(system)));
2382 result = kFALSE;
2383 continue;
2384 }
2385
2386 TTimeStamp now; // now
2387
2388 // Update all rows of the current run and detector (no per-file loop here)
2389
2390 TString whereClause = Form("where run=%d and detector=\"%s\";",
2391 GetCurrentRun(), fCurrentDetector.Data());
2392
2393
2394 TString sqlQuery = Form("update %s set time_processed=%ld %s", fConfig->GetFXSdbTable(system),
2395 (Long_t) now.GetSec(), whereClause.Data());
2396
2397 AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));
2398
2399 // Query execution
2400 TSQLResult* aResult;
2401 aResult = dynamic_cast<TSQLResult*> (fServer[system]->Query(sqlQuery));
2402 if (!aResult)
2403 {
2404 Log(fCurrentDetector, Form("UpdateTableFailCase - %s db: can't execute SQL query <%s>",
2405 GetSystemName(system), sqlQuery.Data()));
2406 result = kFALSE;
2407 continue;
2408 }
2409 delete aResult;
2410 }
2411
2412 return result;
2413}
2414
2415//______________________________________________________________________________________________
2416Bool_t AliShuttle::UpdateShuttleLogbook(const char* detector, const char* status)
2417{
2418 //
2419 // Update Shuttle logbook filling detector or shuttle_done column
2420 // ex. of usage: UpdateShuttleLogbook("PHOS", "DONE") or UpdateShuttleLogbook("shuttle_done")
2421 //
2422
2423 // check connection; connect if necessary
2424 if(!Connect(3)){
2425 Log("SHUTTLE", "UpdateShuttleLogbook - Couldn't connect to DAQ Logbook.");
2426 return kFALSE;
2427 }
2428
2429 TString detName(detector);
2430 TString setClause;
2431 if(detName == "shuttle_done")
2432 {
2433 setClause = "set shuttle_done=1";
2434
2435 // Send the information to ML
2436 TMonaLisaText mlStatus("SHUTTLE_status", "Done");
2437
2438 TList mlList;
2439 mlList.Add(&mlStatus);
2440
2441 fMonaLisa->SendParameters(&mlList);
2442 } else {
2443 TString statusStr(status);
2444 if(statusStr.Contains("done", TString::kIgnoreCase) ||
2445 statusStr.Contains("failed", TString::kIgnoreCase)){
2446 setClause = Form("set %s=\"%s\"", detector, status);
2447 } else {
2448 Log("SHUTTLE",
2449 Form("UpdateShuttleLogbook - Invalid status <%s> for detector %s",
2450 status, detector));
2451 return kFALSE;
2452 }
2453 }
2454
2455 TString whereClause = Form("where run=%d", GetCurrentRun());
2456
2457 TString sqlQuery = Form("update %s %s %s",
2458 fConfig->GetShuttlelbTable(), setClause.Data(), whereClause.Data());
2459
2460 AliDebug(2, Form("SQL query: \n%s",sqlQuery.Data()));
2461
2462 // Query execution
2463 TSQLResult* aResult;
2464 aResult = dynamic_cast<TSQLResult*> (fServer[3]->Query(sqlQuery));
2465 if (!aResult) {
2466 Log("SHUTTLE", Form("UpdateShuttleLogbook - Can't execute query <%s>", sqlQuery.Data()));
2467 return kFALSE;
2468 }
2469 delete aResult;
2470
2471 return kTRUE;
2472}
2473
2474//______________________________________________________________________________________________
2475Int_t AliShuttle::GetCurrentRun() const
2476{
2477 //
2478 // Get current run from logbook entry
2479 //
2480
2481 return fLogbookEntry ? fLogbookEntry->GetRun() : -1;
2482}
2483
2484//______________________________________________________________________________________________
2485UInt_t AliShuttle::GetCurrentStartTime() const
2486{
2487 //
2488 // get current start time
2489 //
2490
2491 return fLogbookEntry ? fLogbookEntry->GetStartTime() : 0;
2492}
2493
2494//______________________________________________________________________________________________
2495UInt_t AliShuttle::GetCurrentEndTime() const
2496{
2497 //
2498 // get current end time from logbook entry
2499 //
2500
2501 return fLogbookEntry ? fLogbookEntry->GetEndTime() : 0;
2502}
2503
2504//______________________________________________________________________________________________
2505void AliShuttle::Log(const char* detector, const char* message)
2506{
2507 //
2508 // Fill log string with a message
2509 //
2510
2511 void* dir = gSystem->OpenDirectory(GetShuttleLogDir());
2512 if (dir == NULL) {
2513 if (gSystem->mkdir(GetShuttleLogDir(), kTRUE)) {
2514 AliError(Form("Can't open directory <%s>", GetShuttleLogDir()));
2515 return;
2516 }
2517
2518 } else {
2519 gSystem->FreeDirectory(dir);
2520 }
2521
2522 TString toLog = Form("%s (%d): %s - ", TTimeStamp(time(0)).AsString("s"), getpid(), detector);
2523 if (GetCurrentRun() >= 0)
2524 toLog += Form("run %d - ", GetCurrentRun());
2525 toLog += Form("%s", message);
2526
2527 AliInfo(toLog.Data());
2528
2529 // if we redirect the log output already to the file, leave here
2530 if (fOutputRedirected && strcmp(detector, "SHUTTLE") != 0)
2531 return;
2532
2533 TString fileName = GetLogFileName(detector);
2534
2535 gSystem->ExpandPathName(fileName);
2536
2537 ofstream logFile;
2538 logFile.open(fileName, ofstream::out | ofstream::app);
2539
2540 if (!logFile.is_open()) {
2541 AliError(Form("Could not open file %s", fileName.Data()));
2542 return;
2543 }
2544
2545 logFile << toLog.Data() << "\n";
2546
2547 logFile.close();
2548}
2549
2550//______________________________________________________________________________________________
2551TString AliShuttle::GetLogFileName(const char* detector) const
2552{
2553 //
2554 // returns the name of the log file for a given sub detector
2555 //
2556
2557 TString fileName;
2558
2559 if (GetCurrentRun() >= 0)
2560 fileName.Form("%s/%s_%d.log", GetShuttleLogDir(), detector, GetCurrentRun());
2561 else
2562 fileName.Form("%s/%s.log", GetShuttleLogDir(), detector);
2563
2564 return fileName;
2565}
2566
2567//______________________________________________________________________________________________
2568Bool_t AliShuttle::Collect(Int_t run)
2569{
2570 //
2571 // Collects conditions data for all UNPROCESSED runs written to the DAQ LogBook if run = -1 (default).
2572 // If a specific run is given, only that run is processed.
2573 //
2574 // In operational mode, this is the Shuttle function triggered by the EOR signal.
2575 //
2576
2577 if (run == -1)
2578 Log("SHUTTLE","Collect - Shuttle called. Collecting conditions data for unprocessed runs");
2579 else
2580 Log("SHUTTLE", Form("Collect - Shuttle called. Collecting conditions data for run %d", run));
2581
2582 SetLastAction("Starting");
2583
2584 TString whereClause("where shuttle_done=0");
2585 if (run != -1)
2586 whereClause += Form(" and run=%d", run);
2587
2588 TObjArray shuttleLogbookEntries;
2589 if (!QueryShuttleLogbook(whereClause, shuttleLogbookEntries))
2590 {
2591 Log("SHUTTLE", "Collect - Can't retrieve entries from Shuttle logbook");
2592 return kFALSE;
2593 }
2594
2595 if (shuttleLogbookEntries.GetEntries() == 0)
2596 {
2597 if (run == -1)
2598 Log("SHUTTLE","Collect - Found no UNPROCESSED runs in Shuttle logbook");
2599 else
2600 Log("SHUTTLE", Form("Collect - Run %d is already DONE "
2601 "or it does not exist in Shuttle logbook", run));
2602 return kTRUE;
2603 }
2604
2605 for (UInt_t iDet=0; iDet<NDetectors(); iDet++)
2606 fFirstUnprocessed[iDet] = kTRUE;
2607
2608 if (run != -1)
2609 {
2610 // query Shuttle logbook for earlier runs, check if some detectors are unprocessed,
2611 // flag them into fFirstUnprocessed array
2612 TString whereClause(Form("where shuttle_done=0 and run < %d", run));
2613 TObjArray tmpLogbookEntries;
2614 if (!QueryShuttleLogbook(whereClause, tmpLogbookEntries))
2615 {
2616 Log("SHUTTLE", "Collect - Can't retrieve entries from Shuttle logbook");
2617 return kFALSE;
2618 }
2619
2620 TIter iter(&tmpLogbookEntries);
2621 AliShuttleLogbookEntry* anEntry = 0;
2622 while ((anEntry = dynamic_cast<AliShuttleLogbookEntry*> (iter.Next())))
2623 {
2624 for (UInt_t iDet=0; iDet<NDetectors(); iDet++)
2625 {
2626 if (anEntry->GetDetectorStatus(iDet) == AliShuttleLogbookEntry::kUnprocessed)
2627 {
2628 AliDebug(2, Form("Run %d: setting %s as \"not first time unprocessed\"",
2629 anEntry->GetRun(), GetDetName(iDet)));
2630 fFirstUnprocessed[iDet] = kFALSE;
2631 }
2632 }
2633
2634 }
2635
2636 }
2637
2638 if (!RetrieveConditionsData(shuttleLogbookEntries))
2639 {
2640 Log("SHUTTLE", "Collect - Process of at least one run failed");
2641 return kFALSE;
2642 }
2643
2644 Log("SHUTTLE", "Collect - Requested run(s) successfully processed");
2645 return kTRUE;
2646}
2647
2648//______________________________________________________________________________________________
2649Bool_t AliShuttle::RetrieveConditionsData(const TObjArray& dateEntries)
2650{
2651 //
2652 // Retrieve conditions data for all runs that aren't processed yet
2653 //
2654
2655 Bool_t hasError = kFALSE;
2656
2657 TIter iter(&dateEntries);
2658 AliShuttleLogbookEntry* anEntry;
2659
2660 while ((anEntry = (AliShuttleLogbookEntry*) iter.Next())){
2661 if (!Process(anEntry)){
2662 hasError = kTRUE;
2663 }
2664
2665 // clean SHUTTLE temp directory
2666 TString filename = Form("%s/*.shuttle", GetShuttleTempDir());
2667 RemoveFile(filename.Data());
2668 }
2669
2670 return hasError == kFALSE;
2671}
2672
2673//______________________________________________________________________________________________
2674ULong_t AliShuttle::GetTimeOfLastAction() const
2675{
2676 //
2677 // Gets time of last action
2678 //
2679
2680 ULong_t tmp;
2681
2682 fMonitoringMutex->Lock();
2683
2684 tmp = fLastActionTime;
2685
2686 fMonitoringMutex->UnLock();
2687
2688 return tmp;
2689}
2690
2691//______________________________________________________________________________________________
2692const TString AliShuttle::GetLastAction() const
2693{
2694 //
2695 // returns a string description of the last action
2696 //
2697
2698 TString tmp;
2699
2700 fMonitoringMutex->Lock();
2701
2702 tmp = fLastAction;
2703
2704 fMonitoringMutex->UnLock();
2705
2706 return tmp;
2707}
2708
2709//______________________________________________________________________________________________
2710void AliShuttle::SetLastAction(const char* action)
2711{
2712 //
2713 // updates the monitoring variables
2714 //
2715
2716 fMonitoringMutex->Lock();
2717
2718 fLastAction = action;
2719 fLastActionTime = time(0);
2720
2721 fMonitoringMutex->UnLock();
2722}
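// Illustration only (not part of AliShuttle): the three accessors above guard
// the monitoring variables with explicit Lock()/UnLock() calls. The sketch
// below shows the same idea with a scoped lock, assuming C++11 is available.
// The class name LastActionMonitor and its methods are hypothetical.

```cpp
#include <cassert>
#include <ctime>
#include <mutex>
#include <string>

// Hypothetical stand-in for fLastAction / fLastActionTime / fMonitoringMutex.
class LastActionMonitor {
public:
  void Set(const std::string& action) {
    std::lock_guard<std::mutex> lock(fMutex);  // released when leaving scope
    fAction = action;
    fTime = std::time(nullptr);
  }
  std::string Get() const {
    std::lock_guard<std::mutex> lock(fMutex);
    return fAction;  // copy taken while the lock is held
  }
  std::time_t Time() const {
    std::lock_guard<std::mutex> lock(fMutex);
    return fTime;
  }
private:
  mutable std::mutex fMutex;  // mutable so const getters can lock it
  std::string fAction;
  std::time_t fTime = 0;
};
```

// The scoped lock releases the mutex on every return path, so an early return
// cannot leave it locked.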
2723
2724//______________________________________________________________________________________________
2725const char* AliShuttle::GetRunParameter(const char* param)
2726{
2727 //
2728 // returns run parameter read from DAQ logbook
2729 //
2730
2731 if(!fLogbookEntry) {
2732 AliError("No logbook entry!");
2733 return 0;
2734 }
2735
2736 return fLogbookEntry->GetRunParameter(param);
2737}
2738
2739//______________________________________________________________________________________________
2740AliCDBEntry* AliShuttle::GetFromOCDB(const char* detector, const AliCDBPath& path)
2741{
2742 //
2743 // returns object from OCDB valid for current run
2744 //
2745
2746 if (fTestMode & kErrorOCDB)
2747 {
2748 Log(detector, "GetFromOCDB - In TESTMODE - Simulating error with OCDB");
2749 return 0;
2750 }
2751
2752 AliCDBStorage *sto = AliCDBManager::Instance()->GetStorage(fgkMainCDB);
2753 if (!sto)
2754 {
2755 Log(detector, "GetFromOCDB - Cannot activate main OCDB for query!");
2756 return 0;
2757 }
2758
2759 return dynamic_cast<AliCDBEntry*> (sto->Get(path, GetCurrentRun()));
2760}
2761
2762//______________________________________________________________________________________________
2763Bool_t AliShuttle::SendMail()
2764{
2765 //
2766 // sends a mail to the subdetector expert in case of preprocessor error
2767 //
2768
2769 if (fTestMode != kNone)
2770 return kTRUE;
2771
2772 void* dir = gSystem->OpenDirectory(GetShuttleLogDir());
2773 if (dir == NULL)
2774 {
2775 if (gSystem->mkdir(GetShuttleLogDir(), kTRUE))
2776 {
2777 AliError(Form("Can't open directory <%s>", GetShuttleLogDir()));
2778 return kFALSE;
2779 }
2780
2781 } else {
2782 gSystem->FreeDirectory(dir);
2783 }
2784
2785 TString bodyFileName;
2786 bodyFileName.Form("%s/mail.body", GetShuttleLogDir());
2787 gSystem->ExpandPathName(bodyFileName);
2788
2789 ofstream mailBody;
2790 mailBody.open(bodyFileName, ofstream::out);
2791
2792 if (!mailBody.is_open())
2793 {
2794 AliError(Form("Could not open mail body file %s", bodyFileName.Data()));
2795 return kFALSE;
2796 }
2797
2798 TString to="";
2799 TIter iterExperts(fConfig->GetResponsibles(fCurrentDetector));
2800 TObjString *anExpert=0;
2801 while ((anExpert = (TObjString*) iterExperts.Next()))
2802 {
2803 to += Form("%s,", anExpert->GetName());
2804 }
2805 if (to.IsNull()) {
2806 AliInfo("List of detector responsibles not yet set!");
2807 return kFALSE;
2808 }
2809
2810 to.Remove(to.Length()-1); // strip the trailing comma
2811 AliDebug(2, Form("to: %s",to.Data()));
2812
2813 TString cc="alberto.colla@cern.ch";
2814
2815 TString subject = Form("%s Shuttle preprocessor FAILED in run %d !",
2816 fCurrentDetector.Data(), GetCurrentRun());
2817 AliDebug(2, Form("subject: %s", subject.Data()));
2818
2819 TString body = Form("Dear %s expert(s), \n\n", fCurrentDetector.Data());
2820 body += Form("SHUTTLE just detected that your preprocessor "
2821 "failed processing run %d!!\n\n", GetCurrentRun());
2822 body += Form("Please check %s status on the SHUTTLE monitoring page: \n\n", fCurrentDetector.Data());
2823 body += Form("\thttp://pcalimonitor.cern.ch:8889/shuttle.jsp?time=168 \n\n");
2824 body += Form("Find the %s log for the current run on \n\n"
2825 "\thttp://pcalishuttle01.cern.ch:8880/logs/%s_%d.log \n\n",
2826 fCurrentDetector.Data(), fCurrentDetector.Data(), GetCurrentRun());
2827 body += Form("The last 10 lines of the %s log file follow:\n\n", fCurrentDetector.Data());
2828
2829 AliDebug(2, Form("Body begin: %s", body.Data()));
2830
2831 mailBody << body.Data();
2832 mailBody.close();
2833 mailBody.open(bodyFileName, ofstream::out | ofstream::app);
2834
2835 TString logFileName = Form("%s/%s_%d.log", GetShuttleLogDir(), fCurrentDetector.Data(), GetCurrentRun());
2836 TString tailCommand = Form("tail -n 10 %s >> %s", logFileName.Data(), bodyFileName.Data());
2837 if (gSystem->Exec(tailCommand.Data()))
2838 {
2839 mailBody << Form("%s log file not found ...\n\n", fCurrentDetector.Data());
2840 }
2841
2842 TString endBody = Form("------------------------------------------------------\n\n");
2843 endBody += Form("In case of problems please contact the SHUTTLE core team.\n\n");
2844 endBody += "Please do not answer this message directly, it is automatically generated.\n\n";
2845 endBody += "Greetings,\n\n \t\t\tthe SHUTTLE\n";
2846
2847 AliDebug(2, Form("Body end: %s", endBody.Data()));
2848
2849 mailBody << endBody.Data();
2850
2851 mailBody.close();
2852
2853 // send mail!
2854 TString mailCommand = Form("mail -s \"%s\" -c %s %s < %s",
2855 subject.Data(),
2856 cc.Data(),
2857 to.Data(),
2858 bodyFileName.Data());
2859 AliDebug(2, Form("mail command: %s", mailCommand.Data()));
2860
2861 Int_t result = gSystem->Exec(mailCommand.Data());
2862
2863 return result == 0;
2864}
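// Illustration only (not part of AliShuttle): SendMail builds the recipient
// list by appending "name," for every expert and then removing the last
// character. The sketch below joins the addresses so that no trailing comma is
// produced and an empty list yields an empty string. JoinRecipients is a
// hypothetical helper and the addresses in the usage note are placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: comma-separate the addresses, writing the separator
// before every element except the first.
inline std::string JoinRecipients(const std::vector<std::string>& addresses)
{
  std::string joined;
  for (std::size_t i = 0; i < addresses.size(); ++i) {
    if (i > 0) joined += ',';
    joined += addresses[i];
  }
  return joined;
}
```

// Example: joining "a@example.org" and "b@example.org" gives
// "a@example.org,b@example.org"; an empty list gives "", which the caller can
// test before building the mail command.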
2865
2866//______________________________________________________________________________________________
2867const char* AliShuttle::GetRunType()
2868{
2869 //
2870 // returns run type read from "run type" logbook
2871 //
2872
2873 if(!fLogbookEntry) {
2874 AliError("No logbook entry!");
2875 return 0;
2876 }
2877
2878 return fLogbookEntry->GetRunType();
2879}
2880
2881//______________________________________________________________________________________________
2882Bool_t AliShuttle::GetHLTStatus()
2883{
2884 // Return HLT status (ON=1 OFF=0)
2885 // Converts the HLT status from the status string read in the run logbook (not just a bool)
2886
2887 if(!fLogbookEntry) {
2888 AliError("No logbook entry!");
2889 return 0;
2890 }
2891
2892 // TODO implement when HLTStatus is inserted in run logbook
2893 //TString hltStatus = fLogbookEntry->GetRunParameter("HLTStatus");
2894 //if(hltStatus == "OFF") {return kFALSE};
2895
2896 return kTRUE;
2897}
2898
2899//______________________________________________________________________________________________
2900void AliShuttle::SetShuttleTempDir(const char* tmpDir)
2901{
2902 //
2903 // sets Shuttle temp directory
2904 //
2905
2906 fgkShuttleTempDir = gSystem->ExpandPathName(tmpDir);
2907}
2908
2909//______________________________________________________________________________________________
2910void AliShuttle::SetShuttleLogDir(const char* logDir)
2911{
2912 //
2913 // sets Shuttle log directory
2914 //
2915
2916 fgkShuttleLogDir = gSystem->ExpandPathName(logDir);
2917}