Sun N1 Grid Engine Guides
Tips & Tricks
Get qhost information only for the machines in a particular hostgroup:
$ qconf -shgrp_resolved @hostgroup | xargs qhost -h
Get all hostnames of a particular architecture on one line:
qhost -l arch=sol-amd64 | awk '{print $1}' | grep -v HOSTNAME | grep -v '\-\-\-' | xargs
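The two `grep -v` filters and `xargs` can also be folded into a single awk program. This is a minimal sketch, assuming the usual qhost layout of a `HOSTNAME ...` header line followed by a dashed rule; `host_names` is a hypothetical helper name, not part of Grid Engine.

```shell
# host_names (hypothetical helper): read qhost output on stdin and print the
# host names space-separated on one line, skipping blank lines, the
# HOSTNAME header and the dashed rule.
host_names() {
  awk 'NF && $1 != "HOSTNAME" && $1 !~ /^-+$/ {
         printf "%s%s", sep, $1   # sep is empty before the first name
         sep = " "
       }
       END { print "" }'
}

# Usage: qhost -l arch=sol-amd64 | host_names
```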
Now that we can get the hostnames on one line, we can quote and comma-separate each hostname for use in an SQL query:
for i in `qhost -l arch=sol-amd64 | awk '{print $1}' | grep -v HOSTNAME | grep -v '\-\-\-'` ; do
echo "'$i',"
done
Note that this still places an extra comma after the last hostname, so be sure to remove it before placing the list in your query. It also puts each hostname on a separate line.
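To get the whole list on a single line with no trailing comma, the one-line output of the `xargs` pipeline above can be post-processed with sed: wrap each name in single quotes, then turn the separating spaces into commas. A sketch, assuming host names contain no spaces or quote characters; `sql_quote_list` is a hypothetical helper name.

```shell
# sql_quote_list (hypothetical helper): turn one line such as "h1 h2 h3"
# on stdin into 'h1','h2','h3' for an SQL IN (...) clause.
sql_quote_list() {
  # 1st expression: quote every run of non-space characters.
  # 2nd expression: replace each run of spaces with a single comma.
  sed -e "s/[^ ][^ ]*/'&'/g" -e 's/  */,/g'
}

# Usage:
# qhost -l arch=sol-amd64 | awk '{print $1}' | grep -v HOSTNAME | grep -v '\-\-\-' | xargs | sql_quote_list
```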
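Script to determine how many jobs each machine in a hostgroup executed over the past day. It takes two optional arguments: the hostgroup (default @allhosts) and the number of days to look back (default 1).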
#!/bin/ksh
tmpdir=/tmp
hostgroup=$1
days=$2
: ${hostgroup:="@allhosts"}
: ${days:="1"}
hosts=$tmpdir/mu_hosts$$
jobcount=$tmpdir/mu_jobcount$$
qstat=$tmpdir/mu_qstat$$
state=$tmpdir/mu_state$$
slots=$tmpdir/mu_slots$$
arch=$tmpdir/mu_arch$$
alert=$tmpdir/mu_alert$$
tmpfile=$tmpdir/mu_tmpfile$$
tmpfile2=$tmpdir/mu_tmpfile2$$
# NOTE: literal newline is necessary since \n only works for GNU version of sed
qconf -shgrp_resolved $hostgroup | sed 's/ /\
/g' | sort > $hosts
qacct -d $days -j _* | grep hostname | sort | uniq -c | awk '{print $3, $1}' > $jobcount
qstat -f -q all.q@$hostgroup | grep all.q | cut -f 2- -d '@' | sort -k 1 > $qstat
cat $qstat | awk '{print $1, $5}' > $arch
cat $qstat | awk '{print $1, $6}' > $state
cat $qstat | awk '{print $1, $3}' > $slots
qstat -j | grep all.q | sed 's/.*all.q@//g' | sed 's/" dropped because it is//g' | sort -k 1 > $alert
printf "%-20s %-5s %-12s %-3s %-4s %s\n" Hostname Jobs Arch QS Slots Alert
echo "============================================================="
join -a 1 -o 1.1 2.2 -e 0 $hosts $jobcount > $tmpfile
join -a 1 -o 1.1 1.2 2.2 -e "-NA-" $tmpfile $arch > $tmpfile2
join -a 1 -o 1.1 1.2 1.3 2.2 -e - $tmpfile2 $state > $tmpfile
join -a 1 -o 1.1 1.2 1.3 1.4 2.2 -e "0/0" $tmpfile $slots | sed 's/$/|/g' > $tmpfile2
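# Sort by job count, highest first (Solaris sort syntax; GNU sort also accepts sort -gr -k 2).
# substr($5,1,...) strips the trailing "|" marker from the slots column; everything after "|" is the alert text.
join -a 1 -1 1 -2 1 $tmpfile2 $alert | sort -k 2nr | awk '{printf "%-20s %-5s %-12s %-3s %-4s %s\n", $1, $2, $3, $4, substr($5,1,length($5)-1), substr($0, index($0, "|")+1)}'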
rm -f $hosts $jobcount $qstat $state $slots $arch $alert $tmpfile $tmpfile2
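The report is built from a chain of left outer joins: `join -a 1` keeps every host from the first file, and `-e` (which takes effect together with `-o`) fills in a default for hosts missing from the second. The sketch below reproduces the first step (job counts per host) on made-up sample data, so the behaviour can be checked without a cluster. The `hostname` lines only mimic what the script greps out of `qacct -j`; the host names and the `demo_left_join` helper are hypothetical, and a POSIX shell is assumed.

```shell
# demo_left_join (hypothetical helper): replays the script's first join on
# sample data; no Grid Engine commands are run.
demo_left_join() {
  workdir=$(mktemp -d) || return 1

  # Host list, one per line and sorted (join needs sorted input), like $hosts.
  printf 'node03\nnode01\nnode02\n' | sort > "$workdir/hosts"

  # Stand-in for the "hostname" lines of qacct -j output, one per finished job.
  printf 'hostname     node03\nhostname     node01\nhostname     node03\n' > "$workdir/qacct"

  # Count jobs per host and emit "host count" in host order, like $jobcount.
  grep hostname "$workdir/qacct" | sort | uniq -c | awk '{print $3, $1}' > "$workdir/jobcount"

  # Left outer join: every host is kept, hosts without jobs get 0.
  join -a 1 -o 1.1 2.2 -e 0 "$workdir/hosts" "$workdir/jobcount"

  rm -rf "$workdir"
}

demo_left_join
# node01 1
# node02 0
# node03 2
```

The later joins in the script work the same way, each adding one column (arch, queue state, slots) with its own `-e` default.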
ARCo Query To View Job Queue Over Time
The following shows how to use stored functions to graph the number of jobs waiting in the queue over a period of time (7 days). Below is an image of the resulting graph. Two stored functions are required.
The first function returns the number of jobs in the queue at a given timestamp, that is, jobs submitted at or before that time and started at or after it:
CREATE OR REPLACE FUNCTION jobs_in_queue("time" "timestamp")
RETURNS int8 AS
$BODY$SELECT COUNT(*)
FROM view_job_times
WHERE $1
BETWEEN submission_time
AND start_time$BODY$
LANGUAGE 'sql' VOLATILE;
ALTER FUNCTION jobs_in_queue("time" "timestamp") OWNER TO postgres;
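The second function calls the first one once per minute over the past 7 days. You can adjust the intervals (the 7-day window and the 1-minute step) to improve performance. It returns rows of the composite type job_queue_length, so that type must exist before the function is created; it can be defined as:
CREATE TYPE job_queue_length AS ("time" timestamp, jobs int8);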
CREATE OR REPLACE FUNCTION jobs_in_queue_over_time()
RETURNS SETOF job_queue_length AS
$BODY$DECLARE
jql job_queue_length;
start_time timestamp;
end_time timestamp;
BEGIN
start_time := date_trunc('minute', current_timestamp - interval '7 days');
end_time := current_timestamp;
WHILE start_time < end_time LOOP
jql.time := start_time;
jql.jobs := jobs_in_queue(start_time);
start_time := start_time + interval '1 minute';
RETURN NEXT jql;
END LOOP;
RETURN;
END$BODY$
LANGUAGE 'plpgsql' VOLATILE;
ALTER FUNCTION jobs_in_queue_over_time() OWNER TO postgres;
Finally, the ARCo query is:
SELECT time, jobs FROM jobs_in_queue_over_time();
-- JesseSuen - 17 Aug 2006