Working with Oracle/Sun Grid Engine
Dan Morrill
CIS 210 February 2013
Sun/Oracle Grid Engine is:
- A quick and easy way to set up a multi-cluster system using existing hardware
- As Oracle describes it, Oracle Grid Engine is the most widely deployed workload management solution in the industry and offers unmatched scalability. On top of a rich set of advanced scheduling capabilities and the flexibility to adapt to any computing environment and application workload, it offers comprehensive support for the cloud computing model.
How to Install
- Based on the guide at webappl.blogspot.com:
  http://webappl.blogspot.com/2011/05/install-sun-grid-engine-sge-on-ubuntu.html
Install SGE on master node:
- On the master node, install:
    mpiuser@ub0:~$ sudo apt-get install gridengine-client gridengine-common gridengine-master gridengine-qmon gridengine-exec
    # Remove gridengine-exec from the list if the master node is not supposed to run jobs.
    # During the installation, set the cluster CELL name (such as 'default').
Install SGE on other nodes:
- On each of the other nodes, install:
    mpiuser@ub1:~$ sudo apt-get install gridengine-client gridengine-exec
- Set the CELL name to the same value as on the master node.
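The package lists on the two install slides differ only by node role. A small helper can capture that choice; this is a sketch, and the function name sge_packages and its role names (master, master-only, exec) are hypothetical, not part of Ubuntu's gridengine packaging.

```shell
#!/usr/bin/env bash
# sge_packages ROLE  (hypothetical helper)
# Print the Ubuntu gridengine packages to install for a node role,
# using the package names from the slides.
#   master      - master node that also runs jobs
#   master-only - master node that does not run jobs (no gridengine-exec)
#   exec        - compute (slave) node
sge_packages() {
    case "$1" in
        master)      echo "gridengine-client gridengine-common gridengine-master gridengine-qmon gridengine-exec" ;;
        master-only) echo "gridengine-client gridengine-common gridengine-master gridengine-qmon" ;;
        exec)        echo "gridengine-client gridengine-exec" ;;
        *)           echo "usage: sge_packages master|master-only|exec" >&2; return 1 ;;
    esac
}

# Usage (the unquoted substitution is split into words on purpose):
#   mpiuser@ub0:~$ sudo apt-get install $(sge_packages master)
#   mpiuser@ub1:~$ sudo apt-get install $(sge_packages exec)
```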
Set SGE_ROOT and SGE_CELL
- Set the SGE_ROOT and SGE_CELL environment variables:
  - $SGE_ROOT is the installation path of SGE
  - $SGE_CELL is the cell name, which is 'default' on our machines
- Add the following two lines to /etc/profile and /etc/bash.bashrc:
    export SGE_ROOT=/var/lib/gridengine   # this is the path on our machines
    export SGE_CELL=default
- Load the settings into the current shell: source /etc/profile
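Editing /etc/profile and /etc/bash.bashrc by hand is easy to get wrong when it is repeated on several nodes. The hypothetical helper add_sge_env below appends the two export lines only when they are missing, so it can be re-run safely. Its default path and cell are the values from our machines and are assumptions for any other setup.

```shell
#!/usr/bin/env bash
# add_sge_env FILE [ROOT] [CELL]  (hypothetical helper)
# Append the SGE_ROOT/SGE_CELL exports to FILE unless an identical line is
# already present, so running it twice does not duplicate anything.
add_sge_env() {
    local file=$1 root=${2:-/var/lib/gridengine} cell=${3:-default} line
    for line in "export SGE_ROOT=$root" "export SGE_CELL=$cell"; do
        # -q: quiet, -x: whole-line match, -F: fixed string (no regex)
        grep -qxF "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
    done
}

# Usage (as root on every node):
#   add_sge_env /etc/profile
#   add_sge_env /etc/bash.bashrc
#   source /etc/profile && echo "$SGE_ROOT/$SGE_CELL"
```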
Configure SGE with qmon
- (This section is adapted from a note by Junjun Mao.)
- Start qmon as the superuser:
    mpiuser@ub0:~$ sudo qmon
    # On our machine, qmon failed to start because of missing fonts '-adobe-helvetica-…'
    # To solve the font problem:
    mpiuser@ub0:~$ sudo apt-get install xfs xfstt
    mpiuser@ub0:~$ sudo apt-get install t1-xfree86-nonfree ttf-xfree86-nonfree ttf-xfree86-nonfree-syriac xfonts-75dpi xfonts-100dpi
    mpiuser@ub0:~$ sudo reboot   # after the reboot, the problem was gone
Configure hosts
- "Host Configuration" -> "Administration Host" -> add the master node and any other administrative nodes
- "Host Configuration" -> "Submit Host" -> add the master node and any other submit nodes
- "Host Configuration" -> "Execution Host" -> add the slave nodes
- Click "Done" to finish
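The same host registration can be sketched from the command line with qconf, which helps when many nodes are involved. This assumes qconf's -ah (add administrative host), -as (add submit host) and -Ae (add execution host from a file) options as found in SGE 6.x. The function name and the minimal host_conf(5) field set written for execution hosts are assumptions, so compare them with the output of qconf -se on an existing host before relying on this.

```shell
#!/usr/bin/env bash
# register_sge_hosts "ADMIN_HOSTS" "SUBMIT_HOSTS" "EXEC_HOSTS"  (hypothetical helper)
# Command-line counterpart of the qmon "Host Configuration" steps.
# Each argument is a space-separated list of host names (split on purpose).
register_sge_hosts() {
    local h f
    for h in $1; do qconf -ah "$h"; done   # administration hosts
    for h in $2; do qconf -as "$h"; done   # submit hosts
    for h in $3; do                        # execution hosts
        # qconf -ae opens an editor, so write a host_conf(5) file and load it with -Ae.
        f=$(mktemp) || return 1
        printf '%s\n' "hostname $h" "load_scaling NONE" "complex_values NONE" \
            "user_lists NONE" "xuser_lists NONE" "projects NONE" "xprojects NONE" \
            "usage_scaling NONE" "report_variables NONE" > "$f"
        qconf -Ae "$f"
        rm -f "$f"
    done
}

# Usage on the master (ub0):
#   register_sge_hosts "ub0" "ub0 ub1" "ub1 ub2 ub3"
#   qconf -sel   # the execution hosts should now be listed
```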
Configure the user
- Add or delete the users allowed to access SGE here. In this example, a user is added to an existing userset, which will later be allowed to submit jobs. Everything else keeps its default value.
- "User Configuration" -> "Userset" -> highlight the userset "arusers" and click "Modify" -> enter the user name in the "User/Group" field
- Click "Done" to finish
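From the command line, qconf -au USER LISTNAME adds a user to an access list, which corresponds to the "Userset" dialog above. The wrapper name allow_sge_users is hypothetical; arusers is the userset from the slide.

```shell
#!/usr/bin/env bash
# allow_sge_users USERSET USER...  (hypothetical helper)
# Add each USER to the access list USERSET, as "Userset" -> "Modify" does in qmon.
allow_sge_users() {
    local list=$1 u
    shift
    for u in "$@"; do
        qconf -au "$u" "$list"
    done
}

# Usage:
#   allow_sge_users arusers mpiuser
#   qconf -su arusers   # show the userset and check its entries
```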
Configure the queue
- Host Configuration defines which computing resources are available, and User Configuration defines who has access to them; Queue Control defines how hosts and users are connected.
Queue Control
- "Queue Control" -> "Hosts" -> confirm that the execution hosts are listed there.
- "Queue Control" -> "Cluster Queues" -> click "Add" -> name the queue and add the execution nodes to the Hostlist; then:
  - "User Access" -> allow access to the userset arusers;
  - "General Configuration" -> "Slots" field -> set it to the number of CPU cores on each slave node (the value applies per queue instance, i.e. per host; a larger number than the actual core count is accepted but oversubscribes the cores).
- "Queue Control" -> "Queue Instances" -> manually assign hosts to queues and control the state (active, suspended, ...) of each host.
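For a queue that already exists (for example one created in qmon as above), the same settings can be sketched with qconf -aattr (add a value to a list attribute) and qconf -mattr (replace an attribute value). The function name, the queue name all.q and the slot count in the usage line are example assumptions; check the result with qconf -sq before submitting jobs.

```shell
#!/usr/bin/env bash
# configure_sge_queue QUEUE SLOTS USERSET HOST...  (hypothetical helper)
# Rough command-line counterpart of the "Cluster Queues" settings on the slide.
configure_sge_queue() {
    local queue=$1 slots=$2 userset=$3 h
    shift 3
    for h in "$@"; do
        qconf -aattr queue hostlist "$h" "$queue"      # add an execution host to the hostlist
    done
    qconf -mattr queue user_lists "$userset" "$queue"  # allow access only to the userset
    qconf -mattr queue slots "$slots" "$queue"         # slots per queue instance (per host)
}

# Usage:
#   configure_sge_queue all.q 4 arusers ub1 ub2 ub3
# Queue instance state (the "Queue Instances" dialog) is controlled with qmod:
#   qmod -d all.q@ub1    # disable       qmod -e all.q@ub1     # enable
#   qmod -s all.q@ub1    # suspend       qmod -us all.q@ub1    # unsuspend
```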
Configure parallel environment
- "Queue Control" -> "Cluster Queues" -> select a queue that will run parallel jobs -> click "Modify" -> "Parallel Environment" -> click the "PE" icon below the right and left arrows -> click "Add" -> name the PE and set:
    slots = 999
    start_proc_args = $SGE_ROOT/mpi/startmpi.sh $pe_hostfile
    stop_proc_args = $SGE_ROOT/mpi/stopmpi.sh
    allocation_rule = $fill_up
    "Control slaves" checked
- Make sure the configured PE is moved from "Available PE" to "Referenced PE".
- Confirm and close all configuration windows, then open "Queue Control" -> "Cluster Queues" -> "Parallel Environment" again; the named PE should show up.
- Once a PE has been created and linked to a queue, it can also be edited from "Queue Control" -> "PE".
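The PE can also be written as an sge_pe(5) file and loaded with qconf -Ap, which makes it easy to recreate. This sketch uses the values from the slide; the fields the slide does not mention (user_lists, job_is_first_task, urgency_slots, accounting_summary) are filled with assumed values, the function name is hypothetical, and it assumes startmpi.sh and stopmpi.sh actually exist under $SGE_ROOT/mpi on your installation.

```shell
#!/usr/bin/env bash
# write_sge_pe FILE NAME  (hypothetical helper)
# Write a parallel environment definition for "qconf -Ap FILE".
# $SGE_ROOT is expanded now, when the file is written; $pe_hostfile and
# $fill_up are written literally because Grid Engine interprets them itself.
write_sge_pe() {
    local file=$1 name=$2 root=${SGE_ROOT:-/var/lib/gridengine}
    cat > "$file" <<EOF
pe_name            $name
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    $root/mpi/startmpi.sh \$pe_hostfile
stop_proc_args     $root/mpi/stopmpi.sh
allocation_rule    \$fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE
EOF
}

# Usage (PE name "mpi" and queue "all.q" are examples):
#   write_sge_pe /tmp/mpi.pe mpi && qconf -Ap /tmp/mpi.pe
#   qconf -aattr queue pe_list mpi all.q   # like moving it to "Referenced PE"
#   qconf -sp mpi                          # show the PE
```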
Check whether SGE hosts are running properly
    mpiuser@ub0:~$ qhost        # should list system information from all nodes
    mpiuser@ub0:~$ qconf -sel   # should list the hostnames of the execution nodes
    mpiuser@ub0:~$ qconf -sql   # should list the queues
    mpiuser@ub0:~$ ps aux | grep sge_qmaster | grep -v grep   # check the master daemon
    mpiuser@ub0:~$ ps aux | grep sge_execd | grep -v grep     # check the execution daemon
    mpiuser@ub1:~$ ps aux | grep sge_execd | grep -v grep     # check the execution daemon
    # If the sge_qmaster or sge_execd daemon is not running, try starting it as a service:
    # mpiuser@ub0:~$ sudo service gridengine-master start
    # mpiuser@ub1:~$ sudo service gridengine-exec start
    # …
    # Reboot the node(s) if sge_qmaster or sge_execd still fails to start
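The ps | grep checks above can be wrapped into one report per node. This sketch reads /proc/PID/comm directly instead of calling ps or pgrep, so it is Linux-only and compares against command names truncated to 15 characters (sge_qmaster and sge_execd fit). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# is_running NAME  (hypothetical helper)
# Succeed if some process has exactly NAME as its command name (like pgrep -x).
is_running() {
    local name=$1 comm f
    for f in /proc/[0-9]*/comm; do
        # A process may exit between the glob and the read; skip it quietly.
        read -r comm 2>/dev/null < "$f" || continue
        [ "$comm" = "$name" ] && return 0
    done
    return 1
}

# check_sge_daemons NAME...  (hypothetical helper)
# Print the state of each daemon; return non-zero if any of them is missing.
check_sge_daemons() {
    local d rc=0
    for d in "$@"; do
        if is_running "$d"; then
            echo "$d: running"
        else
            echo "$d: NOT running"
            rc=1
        fi
    done
    return "$rc"
}

# Usage:
#   mpiuser@ub0:~$ check_sge_daemons sge_qmaster sge_execd
#   mpiuser@ub1:~$ check_sge_daemons sge_execd || sudo service gridengine-exec start
```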
Run a test script
- Create a script named 'test' with the following content:
    #!/bin/bash
    ### Request bash (the Bourne-again shell) as the shell for the job
    #$ -S /bin/bash
    ### Use the current directory as the working directory
    #$ -cwd
    ### Name the job:
    #$ -N test
    echo "Running environment:"
    env
    echo "============================="
    ### end of script
Job Submission
- Submit the job: qsub test
  # a job ID is returned if the submission succeeds
- Query the job status: qstat
- If the job runs successfully, two output files are produced in the current working directory: test.oXXX (the standard output) and test.eXXX (the standard error), where test is the job name and XXX is the job ID.
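Submitting, polling and locating the output files can be combined in one helper. This is a sketch that assumes qsub's -terse option (print only the job ID), that qstat -j ID exits non-zero once the job has left the system, and that the job script uses "#$ -cwd" so the output lands in the current directory. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# submit_and_wait SCRIPT [POLL_SECONDS]  (hypothetical helper)
# Submit SCRIPT, wait until the job is gone from the queue, then print the
# names of its standard output and standard error files.
submit_and_wait() {
    local script=$1 poll=${2:-5} id name
    # -terse makes qsub print only the job ID instead of "Your job ... has been submitted".
    id=$(qsub -terse "$script") || return 1
    while qstat -j "$id" >/dev/null 2>&1; do
        sleep "$poll"
    done
    # Take the job name from a "#$ -N" line; without one, SGE uses the script's name.
    name=$(sed -n 's/^#\$ -N[[:space:]]*//p' "$script" | head -n 1)
    name=${name:-$(basename "$script")}
    echo "$name.o$id"
    echo "$name.e$id"
}

# Usage:
#   mpiuser@ub0:~$ for f in $(submit_and_wait test); do cat "$f"; done
```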
Always check your logs
- Check the log messages if an error occurs:
    mpiuser@ub0:~$ less /var/spool/gridengine/qmaster/messages     # master node
    mpiuser@ub0:~$ less /var/spool/gridengine/execd/ub0/messages   # exec node
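The messages files mix informational lines with errors, so filtering helps when they grow large. This sketch assumes the pipe-separated layout used by SGE 6.x (date and time|thread|host|level|text, where level is one of I, W, E or C); check a few lines of your own file before relying on it. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# sge_log_problems FILE  (hypothetical helper)
# Print only the error (E) and critical (C) entries of a Grid Engine messages file,
# assuming the level letter is the fourth "|"-separated field.
sge_log_problems() {
    awk -F'|' '$4 == "E" || $4 == "C"' "$1"
}

# Usage:
#   mpiuser@ub0:~$ sge_log_problems /var/spool/gridengine/qmaster/messages | tail -n 20
#   mpiuser@ub0:~$ sge_log_problems /var/spool/gridengine/execd/ub0/messages
```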
Possible Errors
- Question: My output file contains the warning "Warning: no access to tty (Bad file descriptor). Thus no job control in this shell."
- Answer: This warning appears when the job is run by tcsh or csh. It is safe to ignore. Alternatively, use qsub -S /bin/bash to run your program in a different shell, or add the line '#$ -S /bin/bash' to the job script.
Possible Errors
- Question: The master host fails to respond properly. The error message is "error: commlib error: access denied (client IP resolved to host name 'ub0…'. This is not identical to clients host name 'ub0') error: unable to contact qmaster using port 6444 on host 'ub0'"
- Answer: Reboot the master node, or install SGE from source code on the master node (these solutions are not confirmed yet). The error can also occur when the gethostname utility (full path '/usr/lib/gridengine/gethostname' on our machines) returns a different hostname from the command 'hostname -f'. If this is the case (e.g., a host with multiple network interfaces), create a file named 'host_aliases' under '$SGE_ROOT/$SGE_CELL/common' and populate it as follows:
    # cat host_aliases
    ub0 ub0.my.com ub0-grid
    ub1 ub1.my.com ub1-grid
    ub2 ub2.my.com ub2-grid
    ub3 ub3.my.com ub3-grid
  Then restart the gridengine daemons (see the sge_host_aliases man page for details). Check the aliases:
    mpiuser@ub0:~$ /usr/lib/gridengine/gethostname -aname ub0-grid
    mpiuser@ub0:~$ /usr/lib/gridengine/gethostname -aname ub0
    # both commands should return ub0
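With more than a handful of nodes, the host_aliases lines can be generated rather than typed. This sketch reproduces the layout of the example above (short name first, then the fully qualified name, then a suffixed alias); the first name on each line is the one Grid Engine uses for the host. The function name is hypothetical, and my.com and -grid are just the example values from the slide.

```shell
#!/usr/bin/env bash
# make_host_aliases DOMAIN SUFFIX HOST...  (hypothetical helper)
# Print one host_aliases line per HOST:  <host> <host>.<DOMAIN> <host><SUFFIX>
make_host_aliases() {
    local domain=$1 suffix=$2 h
    shift 2
    for h in "$@"; do
        printf '%s %s.%s %s%s\n' "$h" "$h" "$domain" "$h" "$suffix"
    done
}

# Usage (as root, then restart the gridengine daemons):
#   make_host_aliases my.com -grid ub0 ub1 ub2 ub3 > "$SGE_ROOT/$SGE_CELL/common/host_aliases"
```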
Sources
- http://manpages.ubuntu.com/manpages/jaunty/man5/sge_conf.5.html
- http://webappl.blogspot.com/2011/05/install-sun-grid-engine-sge-on-ubuntu.html
- http://pka.engr.ccny.cuny.edu/~jmao/node/49
- http://webappl.blogspot.com/2011/05/setting-up-mpich2-cluster-with-ubuntu.html
