Discussion:
Need help running Condor with Globus Toolkit
itpals
2005-09-02 15:24:32 UTC
I am having difficulty running Condor with the Globus Toolkit. Need
advice urgently, please!

Here is the sequence of steps I took:

Step 1: Start the database

/etc/init.d/postgresql start

Step 2:

globusroot>globus-start-container

Step 3:

condor>condor_master

Step 4:

condor>grid-proxy-init
condor>globus-personal-gatekeeper -start
condor>condor_submit /usr/local/condor/testjobs/globusjob.submit

Step 5:

condor_q
condor_q -globus


The response I get to "condor_q" is:

-- Submitter: pc-p31972.somedomain.com : <192.168.2.140:33105> : pc-p31972.somedomain.com
 ID      OWNER    SUBMITTED     RUN_TIME    ST  PRI  SIZE  CMD
 1.0     condor   9/2 16:11     0+00:00:00  I   0    0.0   date

However, I'm not sure what to do next. If I run the command "condor_q
globu" (or any similar command of the form "condor_q
globusANYCHARACTERS", where ANYCHARACTERS is any string of random
characters), I get a response of the form:

-- Submitter: pc-p31972.somedomain.com : <192.168.2.140:33105> : pc-p31972.somedomain.com
 ID      OWNER    STATUS       MANAGER  HOST                     EXECUTABLE
 1.0     condor   UNSUBMITTED  fork     pc-p31972.somedomain.co  /bin/date


**********************************************************
Kindly advise how to SUBMIT the above jobs over Globus
**********************************************************




By the way, the log file shows the following:
9/2 20:19:19 [6685] Resources down for more than 900 secs -- killing GAHP
9/2 20:19:19 [6685] GAHP command 'RESULTS' failed
9/2 20:19:19 [6685] ERROR "Gahp Server (pid=6686) died due to signal 9" at line 359 in file gahp-client.C
9/2 20:19:19 [6843] Resources down for 658 seconds!
9/2 20:19:35 [7274] Resources down for 238 seconds!
9/2 20:20:19 [6843] Resources down for 718 seconds!
9/2 20:20:34 ******************************************************
9/2 20:20:34 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
9/2 20:20:34 ** /usr/local/condor/sbin/condor_gridmanager
9/2 20:20:34 ** $CondorVersion: 6.6.10 Jun 13 2005 $
9/2 20:20:34 ** $CondorPlatform: I386-LINUX_RH80 $
9/2 20:20:34 ** PID = 7633
9/2 20:20:34 ******************************************************
9/2 20:20:34 Using config file: /home/condor/condor_config
9/2 20:20:34 Using local config files: /usr/local/condor/var/condor_config.local
9/2 20:20:34 DaemonCore: Command Socket at <192.168.2.140:38494>
9/2 20:20:34 [7633] GAHP server pid = 7634
9/2 20:20:35 [7274] Resources down for 298 seconds!
9/2 20:20:37 [7633] DaemonCore: Command received via UDP from host <192.168.2.140:32795>
9/2 20:20:37 [7633] DaemonCore: received command 60000 (DC_RAISESIGNAL), calling handler (HandleSigCommand())
9/2 20:20:37 [7274] resource pc-p31972.somedomain.com:2119 is still down
9/2 20:20:37 [7633] Found job 8.0 --- inserting
9/2 20:20:37 [7633] (8.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:20:37 [7633] (8.0) proxy not cached yet, waiting...
9/2 20:20:37 [7633] (8.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:20:37 [7633] resource pc-p31972.somedomain.com:2119 is now down
9/2 20:20:37 [7633] (8.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:21:19 [6843] Resources down for 778 seconds!
9/2 20:21:35 [7274] Resources down for 358 seconds!
9/2 20:21:35 [7633] Resources down for 58 seconds!
9/2 20:22:19 [6843] Resources down for 838 seconds!
9/2 20:22:35 [7274] Resources down for 418 seconds!
9/2 20:22:35 [7633] Resources down for 118 seconds!

<stuff deleted>

9/2 20:35:34 Using config file: /home/condor/condor_config
9/2 20:35:34 Using local config files: /usr/local/condor/var/condor_config.local
9/2 20:35:34 DaemonCore: Command Socket at <192.168.2.140:38875>
9/2 20:35:34 [7916] GAHP server pid = 7917
9/2 20:35:35 [7633] Resources down for 898 seconds!
9/2 20:35:37 [7916] DaemonCore: Command received via UDP from host <192.168.2.140:32797>
9/2 20:35:37 [7916] DaemonCore: received command 60000 (DC_RAISESIGNAL), calling handler (HandleSigCommand())
9/2 20:35:37 [7633] resource pc-p31972.somedomain.com:2119 is still down
9/2 20:35:37 [7916] Found job 6.0 --- inserting
9/2 20:35:37 [7916] Found job 7.0 --- inserting
9/2 20:35:37 [7916] (7.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:35:37 [7916] (7.0) proxy not cached yet, waiting...
9/2 20:35:37 [7916] (6.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:35:37 [7916] (6.0) proxy not cached yet, waiting...
9/2 20:35:37 [7916] (7.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:35:37 [7916] (6.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:35:37 [7916] resource pc-p31972.somedomain.com:2119 is now down
9/2 20:35:37 [7916] (7.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:35:37 [7916] (6.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:36:19 [7682] Resources down for 778 seconds!
9/2 20:36:29 [7881] Resources down for 718 seconds!
9/2 20:36:29 [7883] Resources down for 718 seconds!
9/2 20:36:35 [7633] Resources down for more than 900 secs -- killing GAHP
9/2 20:36:35 [7633] GAHP command 'RESULTS' failed
9/2 20:36:35 [7633] ERROR "Gahp Server (pid=7634) died due to signal 9" at line 359 in file gahp-client.C
9/2 20:36:35 [7916] Resources down for 58 seconds!
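
The log above is easier to read once it is condensed. The following is a minimal Python sketch, not part of Condor or Globus, that pulls three things out of gridmanager log text in the format shown: the longest "Resources down" time reported by each gridmanager PID, the PIDs that gave up and killed their GAHP server, and the host:port resources reported as down. It also includes a plain TCP probe to see whether anything accepts connections at such an address. The names `summarize_gridmanager_log`, `port_accepts_connections`, and `SAMPLE` are made up for illustration, and the regular expressions assume the exact message wording seen in this log.

```python
import re
import socket
from collections import defaultdict

# Patterns assume the message wording shown in the log excerpt above.
DOWN_RE = re.compile(r"\[(\d+)\] Resources down for (\d+) seconds!")
KILL_RE = re.compile(r"\[(\d+)\] Resources down for more than \d+ secs -- killing")
RESOURCE_RE = re.compile(r"\[\d+\] resource (\S+):(\d+) is (?:still|now)")

# A few lines copied from the log, used only as sample input.
SAMPLE = """\
9/2 20:35:35 [7633] Resources down for 898 seconds!
9/2 20:35:37 [7633] resource pc-p31972.somedomain.com:2119 is still
down
9/2 20:36:35 [7633] Resources down for more than 900 secs -- killing
GAHP
9/2 20:36:35 [7916] Resources down for 58 seconds!
"""


def summarize_gridmanager_log(text):
    """Return (max_down_by_pid, killed_pids, down_resources) for log text."""
    max_down = defaultdict(int)  # gridmanager PID -> longest downtime seen
    killed = []                  # PIDs that killed their GAHP server
    resources = set()            # (host, port) pairs reported as down
    for line in text.splitlines():
        match = KILL_RE.search(line)
        if match:
            killed.append(int(match.group(1)))
            continue
        match = DOWN_RE.search(line)
        if match:
            pid, secs = int(match.group(1)), int(match.group(2))
            max_down[pid] = max(max_down[pid], secs)
            continue
        match = RESOURCE_RE.search(line)
        if match:
            resources.add((match.group(1), int(match.group(2))))
    return dict(max_down), killed, sorted(resources)


def port_accepts_connections(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_accepts_connections("pc-p31972.somedomain.com", 2119)` returning False would suggest that nothing is listening where the gridmanager is looking; in that case it may be worth comparing that address with the contact string printed by `globus-personal-gatekeeper -start`. A successful connect only shows that some process is listening there, not that it is a working gatekeeper.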
Keith Thompson
2005-09-03 18:48:11 UTC
Post by itpals
I am having difficulty running Condor with the Globus Toolkit. Need
advice urgently, please!
You might try the condor-users mailing list.

<http://www.cs.wisc.edu/condor/mail-lists/>

"The condor-users list is meant to be a forum for Condor users to
learn from each other, and is not meant to be the official support
channel for Condor. Our support pages have more information about the
official support channels."
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.