CESMInstall WIKI CESM - hydra
These are the installation notes for CESM on <CODE>hydra</CODE>, the HPC cluster of [http://www.cima.fcen.uba.ar/ CIMA]/[http://www.cima.fcen.uba.ar/UMI/ IFAECI].

Notes and process carried out on 13-17 May 2024 by Lluís Fita (UBA/CIMA/IFAECI, CABA, Argentina), with the assistance of Ass. Prof. Pedro DiNezio (U. Colorado, Boulder) and Dr Nicolás J. Cosentino (UBA/CIMA/IFAECI, CABA, Argentina), and the non-anonymous help of the [https://bb.cgd.ucar.edu/cesm/ CESM forum] and Stack Overflow.

= intel compilation = |

Configuration of the compilers is done via the file <CODE>cime/config/cesm/machines/config_compilers.xml</CODE>.

Some modifications were introduced to make sure that the compilation uses hydra's Intel configuration (following this [https://bb.cgd.ucar.edu/cesm/threads/cesm-2-2-netcdf-issue.5922/ post]):

<PRE style="shell">
$ cp cime/config/cesm/machines/config_compilers.xml cime/config/cesm/machines/config_compilers_orig.xml
$ diff cime/config/cesm/machines/config_compilers.xml cime/config/cesm/machines/config_compilers_orig.xml
1636,1690d1635
< <compiler MACH="hydra" COMPILER="intel">
< <CFLAGS>
< <base> -qno-opt-dynamic-align -fp-model precise -std=gnu99 </base>
< <append compile_threaded="TRUE"> -qopenmp </append>
< <append DEBUG="FALSE"> -O2 -debug minimal </append>
< <append DEBUG="TRUE"> -O0 -g </append>
< </CFLAGS>
< <CPPDEFS>
< <!-- http://software.intel.com/en-us/articles/intel-composer-xe/ -->
< <append> -DFORTRANUNDERSCORE -DCPRINTEL</append>
< </CPPDEFS>
< <CXX_LINKER>FORTRAN</CXX_LINKER>
< <FC_AUTO_R8>
< <base> -r8 </base>
< </FC_AUTO_R8>
< <FFLAGS>
< <base> -qno-opt-dynamic-align -convert big_endian -assume byterecl -ftz -traceback -assume realloc_lhs -fp-model source </base>
< <append compile_threaded="TRUE"> -qopenmp </append>
< <append DEBUG="TRUE"> -O0 -g -check uninit -check bounds -check pointers -fpe0 -check noarg_temp_created </append>
< <append DEBUG="FALSE"> -O2 -debug minimal </append>
< </FFLAGS>
< <FFLAGS_NOOPT>
< <base> -O0 </base>
< </FFLAGS_NOOPT>
< <NETCDF_C_PATH>/opt/netcdf/netcdf-4/intel/2021.4.0</NETCDF_C_PATH>
< <NETCDF_FORTRAN_PATH>/opt/netcdf/netcdf-4/intel/2021.4.0/lib</NETCDF_FORTRAN_PATH>
< <FIXEDFLAGS>
< <base> -fixed </base>
< </FIXEDFLAGS>
< <FREEFLAGS>
< <base> -free </base>
< </FREEFLAGS>
< <LDFLAGS>
< <append compile_threaded="TRUE"> -qopenmp </append>
< </LDFLAGS>
< <MPICC> /opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpicc </MPICC>
< <MPICXX> /opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpicxx </MPICXX>
(...)
< <SCXX> /opt/intel/oneapi/compiler/2021.4.0/linux/bin/intel64/icpc </SCXX>
< <SFC> /opt/intel/oneapi/compiler/2021.4.0/linux/bin/intel64/ifort </SFC>
< <SLIBS>
< <append MPILIB="mpich"> -mkl=cluster </append>
< <append MPILIB="mpich2"> -mkl=cluster </append>
< <append MPILIB="mvapich"> -mkl=cluster </append>
< <append MPILIB="mvapich2"> -mkl=cluster </append>
< <append MPILIB="mpt"> -mkl=cluster </append>
< <append MPILIB="openmpi"> -mkl=cluster </append>
< <append MPILIB="impi"> -mkl=cluster </append>
< <append MPILIB="mpi-serial"> -mkl </append>
< <append>-L$(NETCDF_C_PATH)/lib -L$(NETCDF_FORTRAN_PATH)/lib -lnetcdff -lnetcdf -L$ENV{MKLROOT} -lmkl_rt </append>
< </SLIBS>
< <SUPPORTS_CXX>TRUE</SUPPORTS_CXX>
< </compiler>
<
</PRE>

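Since the hydra entry hard-codes absolute paths for NetCDF, MPICH and the Intel compilers, a quick check that they all exist can save a failed build. The following is a minimal sketch, assuming it is run from the CESM root; the helper name <CODE>check_paths</CODE> is hypothetical, and the <CODE>sed</CODE> range assumes that the hydra entry is the compiler element with <CODE>MACH="hydra"</CODE>, as in the diff above:

<PRE style="shell">
# check_paths (hypothetical helper): print each path given as argument,
# flagging the missing ones; returns 1 if any of them does not exist.
check_paths() {
  missing=0
  for p in "$@"; do
    if [ -e "$p" ]; then
      echo "ok      $p"
    else
      echo "MISSING $p"
      missing=1
    fi
  done
  return "$missing"
}

# Assumption: run from the CESM root. Only the lines between the hydra
# compiler entry and the next closing compiler tag are scanned.
cfg=cime/config/cesm/machines/config_compilers.xml
if [ -f "$cfg" ]; then
  check_paths $(sed -n '/<compiler MACH="hydra"/,/<\/compiler>/p' "$cfg" |
                grep -o '/opt/[^ <]*' | sort -u) ||
    echo "Fix the missing paths above before running create_newcase"
fi
</PRE>

Paths that depend on the environment at build time, such as <CODE>$ENV{MKLROOT}</CODE>, are not checked by this sketch.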
Configuration of the batch system is done with the file <CODE>cime/config/cesm/machines/config_batch.xml</CODE>.

A new configuration has been added for hydra's PBS queue system:
<PRE style="shell">
$ cp cime/config/cesm/machines/config_batch.xml cime/config/cesm/machines/config_batch_orig.xml
$ diff cime/config/cesm/machines/config_batch.xml cime/config/cesm/machines/config_batch_orig.xml
698,728d697
< <batch_system MACH="hydra" type="pbs" >
< <batch_query args="-f" >qstat</batch_query>
< <batch_submit>qsub </batch_submit>
< <batch_cancel>qdel</batch_cancel>
< <batch_env>-v</batch_env>
< <batch_directive>#PBS</batch_directive>
< <jobid_pattern>^(\S+)$</jobid_pattern>
< <depend_string> -W depend=afterok:jobid</depend_string>
< <depend_allow_string> -W depend=afterany:jobid</depend_allow_string>
< <depend_separator>:</depend_separator>
< <walltime_format>%H:%M:%S</walltime_format>
< <batch_mail_flag>-M</batch_mail_flag>
< <batch_mail_type_flag>-m</batch_mail_type_flag>
< <batch_mail_type>, bea, b, e, a</batch_mail_type>
< <submit_args>
< <arg flag="-q" name="$JOB_QUEUE"/>
< <arg flag="-l walltime=" name="$JOB_WALLCLOCK_TIME"/>
< <arg flag="-A" name="$PROJECT"/>
< </submit_args>
< <directives>
< <directive>-N {{ job_id }}</directive>
< <directive default="n"> -r {{ rerunnable }} </directive>
< <!-- <directive> -j oe {{ job_id }} </directive> -->
< <directive> -j oe </directive>
< <directive> -V </directive>
< </directives>
< <queues>
< <queue walltimemin="" walltimemax="168:00:00" nodemin="0" nodemax="5" default="true">larga</queue>
< </queues>
< </batch_system>
<
</PRE>

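The <CODE>jobid_pattern</CODE> and <CODE>depend_string</CODE> entries tell CIME how to read the job ID printed by <CODE>qsub</CODE> and how to chain <CODE>case.st_archive</CODE> after <CODE>case.run</CODE> (the <CODE>-W depend=afterok:</CODE> option that appears in the <CODE>preview_run</CODE> output). A minimal sketch of that logic in plain shell, useful to test the chaining by hand; the function names are hypothetical and this is not CIME's own code:

<PRE style="shell">
# parse_jobid (hypothetical): mimic jobid_pattern ^(\S+)$, i.e. accept the
# qsub output only if it is a single token without blanks (e.g. 43629.hydra).
parse_jobid() {
  case $1 in
    ''|*[[:space:]]*) return 1 ;;
    *) printf '%s\n' "$1" ;;
  esac
}

# depend_args (hypothetical): expand depend_string, replacing "jobid".
depend_args() {
  printf '%s\n' "-W depend=afterok:$1"
}

# Manual chaining on hydra (commented out: needs qsub and a case folder):
# run_id=$(parse_jobid "$(qsub .case.run)")
# qsub $(depend_args "$run_id") case.st_archive
</PRE>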
=== Work-flow ===

== Case creation ==
Before compiling, make sure that all the optional additional components have already been compiled (see next section). After that, <CODE>cime/scripts/create_newcase</CODE> is used to create a new case (experiment), which is then set up, compiled and run. The model is compiled for each new experiment, since the user might activate different components each time, and compiling only the required components makes the simulation more efficient.

<PRE style="shell">
(...)
$ cd cime/scripts/
$ ./create_newcase --case /share/cesm/expriments/b.day1.0 --res f19_g17 --compset B1850 --mach hydra >& run_create_newcase.log
</PRE>

=== Errors found ===
Be aware that the error messages are not very informative (at least for now).

Python error:
<PRE>
/usr/bin/env: ‘python’: No such file or directory
</PRE>

Fixed by forcing python3 in the first line (shebang) of the script:
<PRE style="shell">
$ cp create_newcase create_newcase_orig
$ diff create_newcase create_newcase_orig
1c1
< #!/usr/bin/env python3
---
> #!/usr/bin/env python
</PRE>

<PRE>
(...)
/home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/cime/config/cesm/machines/config_machines.xml fails to validate' from dir '/home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/cime/scripts'
</PRE>

The machine configuration for hydra lacked a module system entry (hydra does not use <CODE>modules</CODE>), so it was added to <CODE>cime/config/cesm/machines/config_machines.xml</CODE>:
<PRE>
(...)
</mpirun>
<module_system type="none"/>
(...)
</PRE>
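Since CIME only reports that the file "fails to validate", it may help to run the schema validation directly after each edit, which usually points to the offending line. A minimal sketch, assuming <CODE>xmllint</CODE> (from libxml2) is installed and that the schema is <CODE>cime/config/xml_schemas/config_machines.xsd</CODE> (check the path in your CIME version); <CODE>validate_xml</CODE> is a hypothetical helper:

<PRE style="shell">
# validate_xml (hypothetical): validate an XML file against an XSD schema
# with xmllint; if xmllint is not installed, only warn and return success.
validate_xml() {
  xml=$1
  xsd=$2
  if ! command -v xmllint >/dev/null 2>&1; then
    echo "xmllint not available; skipping validation of $xml" >&2
    return 0
  fi
  xmllint --noout --schema "$xsd" "$xml"
}

# Assumed schema location (check it in your CIME version); run from the CESM root
machines=cime/config/cesm/machines/config_machines.xml
schema=cime/config/xml_schemas/config_machines.xsd
if [ -f "$machines" ] && [ -f "$schema" ]; then
  validate_xml "$machines" "$schema"
fi
</PRE>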

Queue-system related error:
<PRE>
Batch_system_type is pbs
job is case.run USER_REQUESTED_WALLTIME None USER_REQUESTED_QUEUE None WALLTIME_FORMAT %H:%M:%S
WARNING: No queue on this system met the requirements for this job. Falling back to defaults
ERROR: No queues found
</PRE>
No batch (queue) system had been defined for hydra. It has now been added to <CODE>cime/config/cesm/machines/config_batch.xml</CODE>, as described in the batch-system configuration above.

== Case setup ==
Now we are ready to set up the case.

Go to the case folder (referred to here as <CODE>$ROOTCASE</CODE>) and run <CODE>case.setup</CODE>:
<PRE style="shell">
$ cd /share/cesm/expriments/b.day1.0
$ ./case.setup >& run_case-setup.log
</PRE>

Look for errors at the end of the log:
<PRE style="shell">
$ tail run_case-setup.log
You can now run './preview_run' to get more info on how your case will be run
</PRE>
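As the error messages are not always informative, it is handy to run each step with its output logged and, on failure, immediately see the end of the log. A minimal sketch of such a wrapper (the name <CODE>run_step</CODE> is hypothetical), assuming it is run from <CODE>$ROOTCASE</CODE>:

<PRE style="shell">
# run_step (hypothetical): run a command, send its output to a log file and,
# if it fails, print the last 20 lines of that log and return its exit code.
run_step() {
  log=$1
  shift
  if "$@" > "$log" 2>&1; then
    status=0
    echo "OK: $* (log: $log)"
  else
    status=$?
    echo "FAILED ($status): $*; last lines of $log:" >&2
    tail -n 20 "$log" >&2
  fi
  return "$status"
}

# Usage from the case folder; the second step only runs if the first worked.
if [ -x ./case.setup ]; then
  run_step run_case-setup.log ./case.setup &&
    run_step run_preview_run.log ./preview_run
fi
</PRE>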

<CODE>preview_run</CODE> then shows how the case will be run:
<PRE style="shell">
$ ./preview_run >& run_preview_run.log
$ cat run_preview_run.log
CASE INFO:
nodes: 6
total tasks: 768
tasks per node: 128
thread count: 1

BATCH INFO:
FOR JOB: case.run
ENV:
Setting Environment OMP_NUM_THREADS=1

SUBMIT CMD:
qsub -q larga -l walltime=168:00:00 -A none -q larga -l walltime=168:00:00 -A none -v ARGS_FOR_SCRIPT='--resubmit' .case.run

MPIRUN (job=case.run):
mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1

FOR JOB: case.st_archive
ENV:
Setting Environment OMP_NUM_THREADS=1

SUBMIT CMD:
qsub -q larga -l walltime=00:20:00 -A none -q larga -l walltime=00:20:00 -A none -W depend=afterok:0 -v ARGS_FOR_SCRIPT='--resubmit' case.st_archive
</PRE>

=== Errors ===
The first error is related to python:
<PRE style="shell">
$ ./case.setup
/usr/bin/env: ‘python’: No such file or directory
</PRE>

Fixed by modifying the script to use python3:
<PRE style="shell">
$ cp case.setup case.setup_orig
$ diff case.setup case.setup_orig
1c1
< #!/usr/bin/env python3
---
> #!/usr/bin/env python
</PRE>

This happens in every python script, so there are two options. Either the IT team fixes it (as root) by creating a <CODE>python</CODE> symlink that points to python3 (see [https://stackoverflow.com/questions/3655306/ubuntu-usr-bin-env-python-no-such-file-or-directory here]):
<PRE style="shell">
# ln -s /usr/bin/python3 /usr/bin/python
</PRE>

Or the first line (shebang) is changed in every script, going back to the <CODE>cime/scripts</CODE> folder: <CODE>create_clone, create_newcase, create_test, query_config, query_testlists</CODE>

Then all the python scripts within <CODE>cime/scripts/Tools</CODE> are modified, after making a backup:
<PRE style="shell">
$ cp -R Tools Tools_orig
</PRE>
Scripts modified: <CODE>Tools/archive_metadata, Tools/bld_diff, Tools/bless_test_results, Tools/case.build, Tools/case.cmpgen_namelists, Tools/case_diff, Tools/case.qstatus, Tools/case.setup, Tools/case.submit, Tools/check_case, Tools/check_input_data, Tools/check_lockedfiles, Tools/cime_bisect, Tools/code_checker, Tools/compare_namelists, Tools/compare_test_results, Tools/component_compare_baseline, Tools/component_compare_copy, Tools/component_compare_test, Tools/component_generate_baseline, Tools/cs.status, Tools/e3sm_check_env, Tools/generate_cylc_workflow.py, Tools/get_case_env, Tools/get_standard_makefile_args, Tools/getTiming, Tools/jenkins_generic_job, Tools/list_e3sm_tests, Tools/mvsource, Tools/normalize_cases, Tools/pelayout, Tools/preview_namelists, Tools/preview_run, Tools/save_provenance, Tools/simple_compare, Tools/testreporter.py, Tools/wait_for_tests, Tools/xmlchange, Tools/xmlquery</CODE>

Also inside the folder <CODE>Tools/xmlconvertors</CODE>: <CODE>Tools/xmlconvertors/config_pes_converter.py, Tools/xmlconvertors/grid_xml_converter.py, Tools/xmlconvertors/convert-grid-v1-to-v2</CODE>
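A third option, which needs neither root access nor editing every script, is a user-level <CODE>python</CODE> that points to python3, placed first in <CODE>PATH</CODE>: since the scripts start with <CODE>#!/usr/bin/env python</CODE>, <CODE>env</CODE> runs the first <CODE>python</CODE> it finds in <CODE>PATH</CODE>. This was not the approach followed here; it is a sketch assuming a directory you own (e.g. <CODE>$HOME/bin</CODE>) and that the modified <CODE>PATH</CODE> reaches the batch jobs (the hydra batch configuration passes <CODE>-V</CODE>, which exports the submission environment to the job). It does not help with scripts that explicitly ask for python2, such as the CISM one discussed below. The helper name is hypothetical:

<PRE style="shell">
# make_python_shim (hypothetical): create DIR/python -> python3 so that
# "#!/usr/bin/env python" works without root access.
make_python_shim() {
  dir=$1
  py3=$(command -v python3) || { echo "python3 not found in PATH" >&2; return 1; }
  mkdir -p "$dir" || return 1
  ln -sf "$py3" "$dir/python"
}

# Example (assumed location $HOME/bin); to make it permanent, add the
# PATH line to your shell start-up file (e.g. ~/.bashrc).
# make_python_shim "$HOME/bin" && export PATH="$HOME/bin:$PATH"
</PRE>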

== Case build ==
Compile the code for the case. Before that, clean any previous build, just in case:

<PRE style="shell">
$ ./case.build --clean
$ ./case.build >& run_case-build.log
</PRE>

Look for errors at the end of the build log and, after a successful compilation, verify the presence of all the required input data with <CODE>check_input_data</CODE> (with <CODE>--download</CODE>, missing files are downloaded):
<!-- ./create_newcase --case /share/cesm/expriments/b.day1.0.002 --res f19_f19 --compset F1850 --mach hydra --run-unsupported -->

<PRE style="shell">
$ tail run_case-build.log
(...)
siac built in 1.117004 seconds
sesp built in 1.145180 seconds
cam built in 1.169075 seconds
Component glc build complete with 2 warnings
cism built in 233.087502 seconds
Building cesm from /home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/cime/src/drivers/mct/cime_config/buildexe with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.bldlog.240516-151609
Time spent not building: 6.916887 sec
Time spent building: 260.266792 sec
MODEL BUILD HAS FINISHED SUCCESSFULLY
$ ./check_input_data --download >& run_check_input_data.log
(...)

Model cpl missing file wav2ocn_smapname = '/share/cesm/inputdata/cpl/gridmaps/ww3a/map_ww3a_TO_gx1v7_splice_170214.nc'
Trying to download file: 'cpl/gridmaps/ww3a/map_ww3a_TO_gx1v7_splice_170214.nc' to path '/share/cesm/inputdata/cpl/gridmaps/ww3a/map_ww3a_TO_gx1v7_splice_170214.nc' using WGET protocol.
SUCCESS
</PRE>

=== Errors ===
There are errors related to the use of python:

<PRE style="shell">
$ tail run_case-build.log
(...)
Building lnd with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/lnd.bldlog.240515-143236
clm built in 0.009455 seconds
ERROR: BUILD FAIL: clm.buildlib failed, cat /home/lluis.fita/cesm/scratch/b.day1.0/bld/lnd.bldlog.240515-143236
</PRE>

Looking into the log:
<PRE style="shell">
$ cat /home/lluis.fita/cesm/scratch/b.day1.0/bld/lnd.bldlog.240515-143236
/usr/bin/env: ‘python’: No such file or directory
</PRE>

<PRE>
(...)
ERROR: Command /home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/components/pop/bld/build-namelist failed rc=2
out=ERROR in MARBL_diags_to_tavg.py
err=/usr/bin/env: ‘python’: No such file or directory
ERROR: env CASEROOT=/share/cesm/expriments/b.day1.0 CASEBUILD=/share/cesm/expriments/b.day1.0/Buildconf OCN_GRID=gx1v7OCN_TAVG_TRACER_BUDGET=FALSE POPROOT=/home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/components/pop /home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/components/pop/input_templates/ocn.ecosys.tavg.csh 4 .false. .true. failed: 256
</PRE>
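Since every component writes its own build log, the failing ones can be found in one go by scanning all of them for typical error strings. A minimal sketch; the helper name and the search patterns are assumptions (this is not a CIME tool). It reads both plain and gzip-compressed logs, and the build folder can be obtained from the case with <CODE>./xmlquery EXEROOT --value</CODE>:

<PRE style="shell">
# scan_bldlogs (hypothetical): print file:line:text for every line that looks
# like an error in the *.bldlog.* files of a build folder (EXEROOT).
scan_bldlogs() {
  bld=$1
  for log in "$bld"/*.bldlog.*; do
    [ -f "$log" ] || continue          # the glob did not match anything
    case $log in
      *.gz) gzip -dc "$log" ;;
      *)    cat "$log" ;;
    esac | grep -n -i -E 'error|not found|no such file' | sed "s|^|$log:|"
  done
  return 0
}

# Usage from the case folder (on hydra, EXEROOT is .../scratch/b.day1.0/bld)
if [ -x ./xmlquery ]; then
  scan_bldlogs "$(./xmlquery EXEROOT --value)"
fi
</PRE>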

Several python scripts had to be edited to make sure they use python3. To find them, the following commands were run from <CODE>$ROOTCESM</CODE>:
<PRE style="shell">
$ head [place]/* | grep -B 2 'env python' | grep -v directory >& py.log
$ cat py.log
</PRE>
where <CODE>[place]</CODE> takes, one after the other, each depth level of the CESM folder structure: <CODE>./, *, */*, */*/*, */*/*/*, */*/*/*/*, */*/*/*/*/*, ...</CODE>

For example:
<PRE style="shell">
$ head ./* | grep -B 2 'env python' | grep -v directory >& py.log
head: error reading './cime': Is a directory
head: error reading './cime_config': Is a directory
head: error reading './components': Is a directory
head: error reading './components_orig': Is a directory
head: error reading './doc': Is a directory
head: error reading './manage_externals': Is a directory
$ cat py.log

==> ./describe_version <==
#!/usr/bin/env python3
</PRE>

Or, even better:
<PRE>
$ grep -i python [place]/* | grep env | grep -v orig | grep -v python3 >& py.log
</PRE>

Files modified: <CODE>components/pop/MARBL_scripts/add_cocco_to_init.py, components/pop/MARBL_scripts/MARBL_diags_to_tavg.py, cime/src/components/stub_comps_mct/siac/cime_config/buildlib, cime/src/components/stub_comps_mct/siac/cime_config/buildlib_cmake, cime/src/components/stub_comps_mct/siac/cime_config/buildnml</CODE>

More python scripts modified in <CODE>cime/src/build_scripts</CODE>: <CODE>cime/src/build_scripts/buildlib.cprnc, cime/src/build_scripts/buildlib.csm_share, cime/src/build_scripts/buildlib.gptl, cime/src/build_scripts/buildlib.kokkos, cime/src/build_scripts/buildlib.mct, cime/src/build_scripts/buildlib.mpi-serial, cime/src/build_scripts/buildlib.pio</CODE>

More python scripts modified: <CODE>components/clm/bld/namelist_files/createMkSrfEntry.py, components/clm/run_sys_tests, components/cam/cime_config/buildcpp, components/cam/cime_config/buildlib, components/cam/cime_config/buildnml, components/cam/manage_externals/checkout_externals, components/cdeps/cime_config/buildlib, components/cice/cime_config/buildcpp, components/cice/cime_config/buildlib, components/cice/cime_config/buildnml, components/cism/manage_externals/checkout_externals, components/clm/cime_config/buildlib, components/clm/cime_config/buildnml, components/clm/manage_externals/checkout_externals, components/clm/python/run_ctsm_py_tests, components/mosart/cime_config/buildlib, components/mosart/cime_config/buildnml, components/pop/cime_config/phys_cycle_postrun, components/pop/cime_config/phys_cycle_prerun, components/pop/cime_config/buildcpp, components/pop/cime_config/buildlib, components/pop/cime_config/buildnml, components/rtm/cime_config/buildlib, components/rtm/cime_config/buildnml, components/ww3/cime_config/buildlib, components/ww3/cime_config/buildnml</CODE>

Also: <CODE>components/clm/python/ctsm/test/test_sys_lilac_build_ctsm.py, components/clm/python/ctsm/test/test_unit_lilac_build_ctsm.py, components/clm/python/ctsm/test/test_unit_lilac_make_runtime_inputs.py, components/clm/python/ctsm/test/test_unit_machine.py, components/clm/python/ctsm/test/test_unit_path_utils.py, components/clm/python/ctsm/test/test_unit_run_sys_tests.py, components/clm/python/ctsm/test/test_unit_utils.py, components/clm/src/fates/tools/FatesPFTIndexSwapper.py, components/clm/src/fates/tools/modify_fates_paramfile.py, components/clm/src/fates/tools/ncvarsort.py, components/pop/externals/CVMix/bld/cvmix_setup</CODE>

Another set of modified files: <CODE>cime/config/cesm/machines/template.case.test, cime/config/cesm/machines/template.st_archive, cime/config/ufs/machines/template.case.run, cime/config/ufs/machines/template.case.test, cime/config/ufs/machines/template.st_archive, cime/scripts/lib/CIME/BuildTools/configure.py, cime/scripts/lib/CIME/case/case_submit.py</CODE>
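Instead of searching level by level and editing each file by hand, the shebang change can be sketched as a single recursive pass over the CESM tree. This is not what was run here, and it rests on some assumptions: GNU <CODE>sed -i</CODE> (as on hydra's Linux); only first lines that are exactly <CODE>#!/usr/bin/env python</CODE> or <CODE>#!/usr/bin/env python2</CODE> are rewritten; a copy of each file is kept as <CODE>*_orig</CODE> (the naming used above); calls to <CODE>python</CODE> inside the scripts are not touched; and converted python2 scripts are assumed to run under python3. The function name is hypothetical:

<PRE style="shell">
# fix_python_shebangs (hypothetical): switch "python"/"python2" shebangs to
# python3 below ROOT, skipping .git folders and *_orig backups.
fix_python_shebangs() {
  root=${1:-.}
  find "$root" -type f ! -name '*_orig' ! -path '*_orig/*' ! -path '*/.git/*' |
  while IFS= read -r f; do
    # first line only; NUL bytes of binary files are dropped
    first=$(head -n 1 "$f" 2>/dev/null | tr -d '\000')
    case $first in
      '#!/usr/bin/env python'|'#!/usr/bin/env python2')
        [ -e "${f}_orig" ] || cp -p "$f" "${f}_orig"   # keep the original
        sed -i '1s|^#!/usr/bin/env python2*$|#!/usr/bin/env python3|' "$f"
        echo "$f"                                       # report the change
        ;;
    esac
  done
}

# Usage from $ROOTCESM; the list of modified files is saved for reference.
# fix_python_shebangs . > py3_modified.log
</PRE>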

There is also an error when building the CISM component. The output of <CODE>case.build</CODE>:
<PRE style="shell">
(...)
- Building clm library
Building lnd with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/lnd.bldlog.240515-155359
Component lnd build complete with 6 warnings
clm built in 184.188484 seconds
- Building atm Library
Building atm with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/atm.bldlog.240515-155359
- Building ice Library
Building ice with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/ice.bldlog.240515-155359
- Building ocn Library
Building ocn with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/ocn.bldlog.240515-155359
- Building rof Library
Building rof with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/rof.bldlog.240515-155359
- Building glc Library
Building glc with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/glc.bldlog.240515-155359
- Building wav Library
Building wav with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/wav.bldlog.240515-155359
- Building iac Library
Building iac with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/iac.bldlog.240515-155359
- Building esp Library
Building esp with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/esp.bldlog.240515-155359
sesp built in 1.537159 seconds
siac built in 1.544745 seconds
ifort: error #10236: File not found: '/home/lluis.fita/cesm/scratch/b.day1.0/bld/glc/fortran_autogen_srcs/glide_io.F90'

ifort: error #10236: File not found: '/home/lluis.fita/cesm/scratch/b.day1.0/bld/glc/fortran_autogen_srcs/glint_io.F90'
cism built in 35.435328 seconds
mosart built in 42.410334 seconds
ww built in 54.594975 seconds
Component ice build complete with 1 warnings
cice built in 81.223358 seconds
Component ocn build complete with 13 warnings
pop built in 145.569954 seconds
Component atm build complete with 14 warnings
cam built in 229.021996 seconds
ERROR: BUILD FAIL: cism.buildlib failed, cat /home/lluis.fita/cesm/scratch/b.day1.0/bld/glc.bldlog.240515-155359
</PRE>

These files, which should be generated automatically, are not being written:
<PRE>
bld/glc/fortran_autogen_srcs/glide_io.F90
bld/glc/fortran_autogen_srcs/glint_io.F90
</PRE>

In the case directory there is the file <CODE>Buildconf/cismIOconf/cism.buildIO.csh</CODE>, with the following content:
<PRE>
(...)
# create new _io.F90 file using CISM's python script
# ---------------------------------------------------------------------------
$PYTHON generate_ncvars.py $file_varsdef ncdf_template.F90.in
</PRE>
The script relies on the environment variable <CODE>PYTHON</CODE> to locate the python interpreter. In the root directory of CESM, the file <CODE>components/cism/bld/cismIO/README.cismIO</CODE> explains:
<PRE>
This directory and its scripts are intended to allow the user to change IO fields from the CISM code. The CISM IO files, *_io.F90, are
auto-generated and typically difficult to modify. However, the corresponding variable definition files, *_vars.def, are easily modified
and the IO files can be re-generated by running the cism.buildIO.csh script contained in this directory.

Usage of this script requires that the user has defined an enviroment variable PYTHON pointing to a local version of python, After that,
the user simply runs the enclosed cism.buildIO.csh script, which runs a python script on each
(...)
</PRE>
Therefore, <CODE>PYTHON</CODE> has to be defined, either within the <CODE>cime/config/cesm/machines/config_*.xml</CODE> files or manually. Defining it manually before running the build does not work:
<PRE style="shell">
$ export PYTHON=/usr/bin/python3
$ echo $PYTHON
/usr/bin/python3
$ ./case.build --clean
$ ./case.build
</PRE>
Therefore, it must be defined within <CODE>Buildconf/cismIOconf/cism.buildIO.csh</CODE> itself (i.e. in the template of this shell script). Moreover, looking in <CODE>$ROOTCESM</CODE> at <CODE>components/cism/source_cism/utils/build/generate_ncvars.py</CODE>, it turns out that the script uses python2, which hydra does not support:
<PRE>
#!/usr/bin/env python2
</PRE>

Its first line was changed to:
<PRE>
#!/usr/bin/env python3
</PRE>

Two other files also need to be modified: <CODE>components/cism/source_cism/utils/build/autogenerate-in-build-dir, components/cism/source_cism/utils/build/autogen-for-glint-and-glad-in-build-dir</CODE>; every call to <CODE>python</CODE> in them is replaced by <CODE>python3</CODE>:

<PRE style="shell">
$ cat components/cism/source_cism/utils/build/autogenerate-in-build-dir | grep python
# Call python script with source file arguments
python3 -V
python3 $CISM_SOURCE_DIR/utils/build/generate_ncvars.py $GLIDE_VARS_PATH $NCDF_TEMPL_PATH
$ cat components/cism/source_cism/utils/build/autogen-for-glint-and-glad-in-build-dir | grep python
# Call python script with source file arguments
python3 -V
python3 $CISM_SOURCE_DIR/utils/build/generate_ncvars.py $GLINT_VARS_PATH $NCDF_TEMPL_PATH
python3 $CISM_SOURCE_DIR/utils/build/generate_ncvars.py $GLINT_MBAL_PATH $NCDF_TEMPL_PATH
python3 $CISM_SOURCE_DIR/utils/build/generate_ncvars.py $GLAD_VARS_PATH $NCDF_TEMPL_PATH
python3 $CISM_SOURCE_DIR/utils/build/generate_ncvars.py $GLAD_MBAL_PATH $NCDF_TEMPL_PATH
</PRE>

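The replacement inside these two CISM scripts can be sketched with <CODE>sed</CODE>, changing only the lines that start with a call to <CODE>python</CODE> (optionally indented), so that comments such as <CODE># Call python script</CODE> and lines already using <CODE>python3</CODE> are left untouched. A minimal sketch, assuming GNU <CODE>sed -i</CODE> and that it is run from <CODE>$ROOTCESM</CODE>; the function name is hypothetical:

<PRE style="shell">
# use_python3_calls (hypothetical): in each FILE, turn lines starting with
# "python " (possibly indented) into "python3 ", keeping FILE_orig as backup.
use_python3_calls() {
  for f in "$@"; do
    [ -f "$f" ] || { echo "skipping missing $f" >&2; continue; }
    [ -e "${f}_orig" ] || cp -p "$f" "${f}_orig"
    sed -i 's/^\([[:space:]]*\)python\([[:space:]]\)/\1python3\2/' "$f"
  done
}

cism_build=components/cism/source_cism/utils/build
if [ -d "$cism_build" ]; then
  use_python3_calls "$cism_build/autogenerate-in-build-dir" \
                    "$cism_build/autogen-for-glint-and-glad-in-build-dir"
fi
</PRE>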
== Case submission ==

<!-- Manually
$ source /opt/load-libs.sh 1
$ /opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpirun -np 48 /home/lluis.fita/cesm/scratch/b.day1.0.002/bld/cesm.exe -->

Submitting the case:
<PRE>
$ ./case.submit >& run_case-submit.log
</PRE>

Check that the case is running:
<PRE style="shell">
$ qstat -u $USER

hydra:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
43629.hydra lluis.fita larga run.b.day1.0 721 1 1 -- 168:00:00 R 00:00:51
43630.hydra lluis.fita larga st_archive.b.day -- 1 1 -- 00:20:00 H --
</PRE>

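To follow a job without reading the whole table, its state column (<CODE>S</CODE>: e.g. <CODE>R</CODE> running, <CODE>H</CODE> held, <CODE>C</CODE> completed, as in the outputs on this page) can be extracted with <CODE>awk</CODE>. A sketch assuming the column layout of hydra's <CODE>qstat -u</CODE> output shown above, where the state is the 10th field; the helper name is hypothetical:

<PRE style="shell">
# job_state (hypothetical): read "qstat -u $USER" output on stdin and print
# the S (state) column of the given job ID; the state is the 10th field.
job_state() {
  awk -v id="$1" '$1 == id { print $10 }'
}

# On hydra:
# qstat -u "$USER" | job_state 43629.hydra
</PRE>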
=== Errors ===

==== mpirun not found ====

The submission fails with:
<PRE style="shell">
$ cat run.b.day1.0.o43627
(...)
run command is mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1
ERROR: RUN FAIL: Command 'mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed
See log file for details: /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43627.hydra.240516-165514
$ cat /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43627.hydra.240516-165514
/bin/sh: 1: mpirun: not found
</PRE>

The <CODE>mpirun</CODE> executable was redefined with its full path in <CODE>cime/config/cesm/machines/config_machines.xml</CODE>:
<PRE>
<executable>/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpirun</executable>
</PRE>
If the case has already been created, it can be changed directly in the case directory, in the file <CODE>env_mach_specific.xml</CODE> (before submitting).

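Before submitting, one can check that the <CODE>mpirun</CODE> set in the case is actually reachable, since otherwise the failure only shows up once the job starts. A minimal sketch, run from <CODE>$ROOTCASE</CODE>; it assumes that the first <CODE>executable</CODE> entry in <CODE>env_mach_specific.xml</CODE> is the <CODE>mpirun</CODE> one (as for hydra, which has no module commands), and the helper name is hypothetical. Note that the check runs on the login node, whose environment may differ from that of the compute nodes:

<PRE style="shell">
# check_mpirun (hypothetical): extract the first <executable> value from
# env_mach_specific.xml and check it can be run: absolute paths must be
# executable files, bare names must be found in PATH.
check_mpirun() {
  xml=${1:-env_mach_specific.xml}
  exe=$(sed -n 's|.*<executable>[[:space:]]*\([^<[:space:]]*\)[[:space:]]*</executable>.*|\1|p' "$xml" | head -n 1)
  if [ -z "$exe" ]; then
    echo "no executable entry found in $xml" >&2
    return 1
  fi
  case $exe in
    */*) [ -x "$exe" ] || { echo "$exe: not executable" >&2; return 1; } ;;
    *)   found=$(command -v "$exe") || { echo "$exe: not found in PATH" >&2; return 1; }
         exe=$found ;;
  esac
  echo "mpirun executable: $exe"
}

if [ -f env_mach_specific.xml ]; then
  check_mpirun
fi
</PRE>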
==== hydra's compilation environment ====

The simulation finished too early:
<PRE style="shell">
$ qstat -u $USER

hydra:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
43629.hydra lluis.fita larga run.b.day1.0 721 1 1 -- 168:00:00 C --
43630.hydra lluis.fita larga st_archive.b.day -- 1 1 -- 00:20:00 C --
</PRE>

Looking in the logs:
<PRE style="shell">
$ cat run.b.day1.0.o43629
Generating namelists for /share/cesm/expriments/b.day1.0
Creating component namelists
(...)
-------------------------------------------------------------------------
run command is /opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1
tm_poll: got event 1 return 0
tm_poll: INIT nodes 1
tm_poll: INIT daddy jobid 43629.hydra
tm_poll: INIT daddy node 0
tm_poll: INIT daddy tid 1
new_task: jobid=43629.hydra node=0 task=1
new_task: jobid=43629.hydra node=0 task=2
ERROR: RUN FAIL: Command '/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed
See log file for details: /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43629.hydra.240516-170533

$ cat /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43629.hydra.240516-170533
(...)
Invalid character . in PBS_JOBID
tm_poll: got event 1 return 0
tm_poll: INIT nodes 1
tm_poll: INIT daddy jobid 43629.hydra
tm_poll: INIT daddy node -1
tm_poll: INIT daddy tid 0
new_task: jobid=43629.hydra node=-1 task=0
new_task: called with TM_ERROR_NODE
new_task: jobid=43629.hydra node=0 task=1

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 1006 RUNNING AT node43
= EXIT CODE: 9
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
tm_poll: got event 2 return 0
new_task: jobid=43629.hydra node=0 task=2
tm_poll: got event 3 return 0
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
</PRE>

+ | And also: |
||
+ | <PRE style="shell"> |
||
+ | $ cat CaseStatus |
||
+ | 2024-05-16 11:52:51: case.setup starting |
||
+ | --------------------------------------------------- |
||
+ | 2024-05-16 11:52:52: case.setup success |
||
+ | --------------------------------------------------- |
||
+ | 2024-05-16 11:55:32: case.build starting |
||
+ | --------------------------------------------------- |
||
+ | 2024-05-16 12:04:41: case.build error |
||
+ | ERROR: BUILD FAIL: cism.buildlib failed, cat /home/lluis.fita/cesm/scratch/b.day1.0/bld/glc.bldlog.240516-115532 |
||
+ | --------------------------------------------------- |
||
+ | 2024-05-16 15:12:59: case.build starting |
||
+ | --------------------------------------------------- |
||
+ | 2024-05-16 15:13:16: case.build error |
||
+ | ERROR: BUILD FAIL: cism.buildlib failed, cat /home/lluis.fita/cesm/scratch/b.day1.0/bld/glc.bldlog.240516-151259 |
||
+ | --------------------------------------------------- |
||
+ | 2024-05-16 15:16:09: case.build starting |
||
+ | --------------------------------------------------- |
||
+ | CESM version is release-cesm2.2.2 |
||
+ | Processing externals description file : Externals.cfg (/home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox) |
||
+ | (...) |
||
+ | Checking local status of required & optional components: cam, chem_proc, carma, cosp2, clubb, silhs, pumas, atmos_phys, atmos_cubed_sphere, cice, cdeps, fox, cime, cmeps, cism, source_cism, clm, fates, fms, mom, mosart, pop, cvmix, marbl, rtm, ww3, |
||
+ | M ./cime |
||
+ | modified sandbox, on cime5.8.32.9 |
||
+ | e-o ./cime/src/drivers/nuopc/ |
||
+ | -, not checked out --> |
||
+ | M ./components/cam |
||
+ | modified sandbox, on cam_cesm2_2_rel_09 |
||
+ | ./components/cam/chem_proc |
||
+ | clean sandbox, on tools/proc_atm/chem_proc/release_tags/chem_proc5_0_04 |
||
+ | ./components/cam/src/atmos_phys |
||
+ | clean sandbox, on version0_00_007 |
||
+ | ./components/cam/src/dynamics/fv3/atmos_cubed_sphere |
||
+ | clean sandbox, on fv3_cesm.04 |
||
+ | ./components/cam/src/physics/carma/base |
||
+ | clean sandbox, on carma/release_tags/carma3_49_rel |
||
+ | ./components/cam/src/physics/clubb |
||
+ | clean sandbox, on clubb_release_b76a124_20200220_c20200320 |
||
+ | ./components/cam/src/physics/cosp2/src |
||
+ | clean sandbox, on v2.1.4cesm |
||
+ | ./components/cam/src/physics/pumas |
||
+ | clean sandbox, on pumas_cam-release_v1.3 |
||
+ | ./components/cam/src/physics/silhs |
||
+ | clean sandbox, on silhs_clubb_release_b76a124_20200220_c20200320 |
||
+ | M ./components/cdeps |
||
+ | modified sandbox, on d808b7c6f78a2d5dcfeb1da0d1a452a9b66e08c8 |
||
+ | ./components/cdeps/fox |
||
+ | clean sandbox, on 7b9488446b193192dd3f0378541e71099cb4e8a8 |
||
+ | M ./components/cice |
||
+ | modified sandbox, on cice5-cesm2.2.2-20231220 |
||
+ | M ./components/cism |
||
+ | modified sandbox, on cism2_1_69_b |
||
+ | M ./components/cism/source_cism |
||
+ | modified sandbox, on release-cesm2.2.2-f1a88d6derecho |
||
+ | M ./components/clm |
||
+ | modified sandbox, on release-cesm2.2.04 |
||
+ | M ./components/clm/src/fates |
||
+ | modified sandbox, on sci.1.30.0_api.8.0.0 |
||
+ | e-o ./components/mom |
||
+ | -, not checked out --> mi_20200908 |
||
+ | M ./components/mosart |
||
+ | modified sandbox, on mosart1_0_37_1 |
||
+ | M ./components/pop |
||
+ | modified sandbox, on pop2_cesm2_2_rel_n01 |
||
+ | M ./components/pop/externals/CVMix |
||
+ | modified sandbox, on v0.98-beta |
||
+ | ./components/pop/externals/MARBL |
||
+ | clean sandbox, on cesm2.2-n00 |
||
+ | M ./components/rtm |
||
+ | modified sandbox, on rtm1_0_72 |
||
+ | M ./components/ww3 |
||
+ | modified sandbox, on ww3_200710 |
||
+ | e-o ./libraries/FMS |
||
+ | -, not checked out --> fi_20200609_cesm2.2_231205 |
||
+ | 2024-05-16 15:20:36: case.build success |
||
+ | --------------------------------------------------- |
||
+ | 2024-05-16 16:55:05: case.submit starting |
||
+ | --------------------------------------------------- |
||
+ | 2024-05-16 16:55:12: case.submit error |
||
+ | ERROR: Command: 'qsub -q larga -l walltime=00:20:00 -A none -q larga -l walltime=00:20:00 -A none -W depend=afterok:43627.hydra -v ARGS_FOR_SCRIPT='--resubmit' case.st_archive' failed with error 'qsub: submit error (Invalid request)' from dir '/share/cesm/expriments/b.day1.0' |
||
+ | --------------------------------------------------- |
||
+ | 2024-05-16 16:55:14: case.run starting |
||
+ | --------------------------------------------------- |
||
+ | 2024-05-16 16:55:41: model execution starting |
||
+ | --------------------------------------------------- |
||
+ | 2024-05-16 16:55:41: model execution error |
||
+ | ERROR: Command: 'mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed with error '' from dir '/home/lluis.fita/cesm/scratch/b.day1.0/run' |
||
+ | --------------------------------------------------- |
||
+ | 2024-05-16 16:55:41: case.run error |
||
+ | ERROR: RUN FAIL: Command 'mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed |
||
+ | See log file for details: /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43627.hydra.240516-165514 |
||
+ | --------------------------------------------------- |
||
+ | 2024-05-16 17:05:25: case.submit starting |
||
+ | --------------------------------------------------- |
||
+ | 2024-05-16 17:05:33: case.submit error |
||
+ | ERROR: Command: 'qsub -q larga -l walltime=00:20:00 -A none -q larga -l walltime=00:20:00 -A none -W depend=afterok:43629.hydra -v ARGS_FOR_SCRIPT='--resubmit' case.st_archive' failed with error 'qsub: submit error (Invalid request)' from dir '/share/cesm/expriments/b.day1.0' |
||
+ | --------------------------------------------------- |
||
+ | 2024-05-16 17:05:33: case.run starting |
||
+ | --------------------------------------------------- |
||
+ | 2024-05-16 17:05:43: model execution starting |
||
+ | --------------------------------------------------- |
||
+ | 2024-05-16 17:06:34: model execution error |
||
+ | ERROR: Command: '/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed with error '' from dir '/home/lluis.fita/cesm/scratch/b.day1.0/run' |
||
+ | --------------------------------------------------- |
||
+ | 2024-05-16 17:06:34: case.run error |
||
+ | ERROR: RUN FAIL: Command '/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed |
||
+ | See log file for details: /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43629.hydra.240516-170533 |
||
+ | --------------------------------------------------- |
||
+ | </PRE> |
||
+ | |||
+ | I suspect that it is related to the compilation environment. hydra does not have <CODE>module</CODE>; it sets up the compilation environment via the shell script <CODE>/opt/load-libs.sh</CODE>. There must be a way to systematically introduce it in <CODE>config_batch.xml</CODE> so that it is executed in all the PBS jobs. A new post has been created in the CESM forum [https://bb.cgd.ucar.edu/cesm/threads/introducing-a-system-instruction-in-config_batch-xml.9646/#post-55530 #post-55530] |
||
+ | |||
+ | From another attempt: |
||
+ | <PRE style="shell"> |
||
+ | $ cat /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43633.hydra.240517-155125 | more |
||
+ | /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe: error while loading shared libraries: libnetcdf.so.19: cannot open shared object file: No such file or directory |
||
+ | /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe: error while loading shared libraries: libnetcdf.so.19: cannot open shared object file: No such file or directory |
||
+ | /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe: error while loading shared libraries: libnetcdf.so.19: cannot open shared object file: No such file or directory |
||
+ | /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe: error while loading shared libraries: libnetcdf.so.19: cannot open shared object file: No such file or directory |
||
+ | /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe: error while loading shared libraries: libnetcdf.so.19: cannot open shared object file: No such file or directory |
||
+ | (...) |
||
+ | </PRE> |
||
+ | |||
+ | It can be set by using the variable <CODE>PRERUN_SCRIPT</CODE> in the case directory's <CODE>env_run.xml</CODE> file (for more details, search for it [https://docs.cesm.ucar.edu/models/cesm2/settings/current/drv_input.html here]) |
||
+ | |||
+ | Trying different options: |
||
+ | <PRE> |
||
+ | <entry id="PRERUN_SCRIPT"> |
||
+ | <type>char</type> |
||
+ | <desc>External script to be run before model completion</desc> |
||
+ | <values> |
||
+ | <value>source /opt/load-libs.sh 1</value> |
||
+ | </values> |
||
+ | $ ./case.submit |
||
+ | $ cat run.b.day1.0.o43637 |
||
+ | ERROR: External script source /opt/load-libs.sh 1 not found |
||
+ | </PRE> |
||
+ | |||
+ | Looking inside /opt/load-libs.sh, the environment script that it uses is set directly instead: |
||
+ | <PRE> |
||
+ | <entry id="PRERUN_SCRIPT"> |
||
+ | <type>char</type> |
||
+ | <desc>External script to be run before model completion</desc> |
||
+ | <values> |
||
+ | <value>/opt/env_scripts/load_intel-2021.4.0_mpich-3.4.2.sh</value> |
||
+ | </values> |
||
+ | $ ./case.submit |
||
+ | $ cat run.b.day1.0.o43639 |
||
+ | Running /opt/env_scripts/load_intel-2021.4.0_mpich-3.4.2.sh |
||
+ | /bin/sh: 1: Syntax error: Bad fd number |
||
+ | </PRE> |
||
+ | |||
+ | <PRE> |
||
+ | <entry id="PRERUN_SCRIPT"> |
||
+ | <type>char</type> |
||
+ | <desc>External script to be run before model completion</desc> |
||
+ | <values> |
||
+ | <value>'source /opt/load-libs.sh 1'</value> |
||
+ | </values> |
||
+ | $ ./case.submit |
||
+ | $ cat run.b.day1.0.o43641 |
||
+ | ERROR: External script 'source /opt/load-libs.sh 1' not found |
||
+ | </PRE> |
||
+ | |||
+ | A simple shell script with the required content is created in <CODE>/home/lluis.fita/intel_env.csh</CODE>: |
||
+ | <PRE> |
||
+ | #!/bin/sh |
||
+ | export PATH="/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin:$PATH" |
||
+ | export PATH="/opt/netcdf/netcdf-4/intel/2021.4.0/bin:$PATH" |
||
+ | export PATH="/opt/hdf5/hdf5-1.10.5/intel/2021.4.0/bin:$PATH" |
||
+ | export PATH="/opt/jasper/jasper-version-2.0.33/intel/2021.4.0/bin:$PATH" |
||
+ | |||
+ | export LD_LIBRARY_PATH=/opt/jasper/jasper-version-2.0.33/intel/2021.4.0/lib:$LD_LIBRARY_PATH |
||
+ | export LD_LIBRARY_PATH=/opt/zlib/zlib-1.2.11/intel/2021.4.0/lib:$LD_LIBRARY_PATH |
||
+ | export LD_LIBRARY_PATH=/opt/hdf5/hdf5-1.10.5/intel/2021.4.0/lib:$LD_LIBRARY_PATH |
||
+ | export LD_LIBRARY_PATH=/opt/netcdf/netcdf-4/intel/2021.4.0/lib:$LD_LIBRARY_PATH |
||
+ | </PRE> |
||
+ | |||
+ | And then: |
||
+ | <PRE> |
||
+ | <entry id="PRERUN_SCRIPT"> |
||
+ | <type>char</type> |
||
+ | <desc>External script to be run before model completion</desc> |
||
+ | <values> |
||
+ | <value>/home/lluis.fita/intel_env.csh</value> |
||
+ | </values> |
||
+ | $ ./case.submit |
||
+ | $ cat run.b.day1.0.o43647 |
||
+ | Running /home/lluis.fita/intel_env.csh |
||
+ | /bin/sh: 1: Syntax error: Bad fd number |
||
+ | </PRE> |
||
+ | |||
+ | whereas running the script from a terminal does not give any error: |
||
+ | <PRE style="shell"> |
||
+ | $ /bin/sh /home/lluis.fita/intel_env.csh |
||
+ | |||
</PRE> |
</PRE> |
||
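The script produces no error when run from a terminal, but it also leaves the calling shell unchanged. A script executed as a separate process (for instance with `/bin/sh intel_env.csh`) can only modify its own environment, so its exported PATH and LD_LIBRARY_PATH are lost when it exits. Sourcing it with the POSIX `.` command, in contrast, changes the calling shell. This suggests that, even without the syntax error, a script run as a separate step before the model would probably not make libnetcdf.so.19 visible to the later mpirun call. The minimal demonstration below uses a throw-away script in a temporary directory; the variable name DEMO_LIB_DIR is made up for illustration.

```shell
# Demonstrate executing vs sourcing a script that exports a variable.
# Everything lives in a temporary directory; DEMO_LIB_DIR is a hypothetical name.
tmpdir=$(mktemp -d)
cat > "$tmpdir/env_demo.sh" <<'EOF'
export DEMO_LIB_DIR=/opt/netcdf/netcdf-4/intel/2021.4.0/lib
EOF

unset DEMO_LIB_DIR

# 1) Executing: the export happens in a child process and is lost on exit.
/bin/sh "$tmpdir/env_demo.sh"
echo "after executing: '${DEMO_LIB_DIR:-}'"

# 2) Sourcing: the export happens in the current shell and persists.
. "$tmpdir/env_demo.sh"
echo "after sourcing:  '${DEMO_LIB_DIR:-}'"

rm -rf "$tmpdir"
```

The first echo prints an empty value and the second prints the library path.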
Latest revision as of 17:39, 17 May 2024
These are the CESM installation notes for CIMA/IFAECI's HPC called hydra.
Notes and process carried out on 13-17 May 2024 by Lluís Fita (UBA/CIMA/IFAECI, CABA, Argentina) with the assistance of Ass. Prof. Pedro DiNezio (U. Colorado, Boulder) and Dr Nicolás J. Cosentino (UBA/CIMA/IFAECI, CABA, Argentina), and the non-anonymous help of the CESM forum and Stack Overflow.
intel compilation
CIMA's hydra intel configuration is done via the following instruction:
$ source /opt/load-libs.sh 1
The following libraries, compiled with Intel 2021.4.0 compilers, were loaded:
* MPICH 3.4.2
* NetCDF 4
* HDF5 1.10.5
* JASPER 2.0.33
Which creates the following environment:
declare -x ACL_BOARD_VENDOR_PATH="/opt/Intel/OpenCLFPGA/oneAPI/Boards" declare -x ADVISOR_2021_DIR="/opt/intel/oneapi/advisor/2021.4.0" declare -x APM="/opt/intel/oneapi/advisor/2021.4.0/perfmodels" declare -x CCL_CONFIGURATION="cpu_gpu_dpcpp" declare -x CCL_ROOT="/opt/intel/oneapi/ccl/2021.4.0" declare -x CLASSPATH="/opt/intel/oneapi/mpi/2021.4.0//lib/mpi.jar:/opt/intel/oneapi/dal/2021.4.0/lib/onedal.jar" declare -x CLCK_ROOT="/opt/intel/oneapi/clck/2021.4.0" declare -x CMAKE_PREFIX_PATH="/opt/intel/oneapi/vpl/2021.6.0:/opt/intel/oneapi/tbb/2021.4.0/env/..:/opt/intel/oneapi/dal/2021.4.0" declare -x CMPLR_ROOT="/opt/intel/oneapi/compiler/2021.4.0" declare -x CPATH="/opt/intel/oneapi/vpl/2021.6.0/include:/opt/intel/oneapi/tbb/2021.4.0/env/../include:/opt/intel/oneapi/mpi/2021.4.0//include:/opt/intel/oneapi/mkl/2021.4.0/include:/opt/intel/oneapi/ipp/2021.4.0/include:/opt/intel/oneapi/ippcp/2021.4.0/include:/opt/intel/oneapi/ipp/2021.4.0/include:/opt/intel/oneapi/dpl/2021.5.0/linux/include:/opt/intel/oneapi/dpcpp-ct/2021.4.0/include:/opt/intel/oneapi/dnnl/2021.4.0/cpu_dpcpp_gpu_dpcpp/lib:/opt/intel/oneapi/dev-utilities/2021.4.0/include:/opt/intel/oneapi/dal/2021.4.0/include:/opt/intel/oneapi/compiler/2021.4.0/linux/include:/opt/intel/oneapi/ccl/2021.4.0/include/cpu_gpu_dpcpp" declare -x CPLUS_INCLUDE_PATH="/opt/intel/oneapi/clck/2021.4.0/include" declare -x DAALROOT="/opt/intel/oneapi/dal/2021.4.0" declare -x DALROOT="/opt/intel/oneapi/dal/2021.4.0" declare -x DAL_MAJOR_BINARY="1" declare -x DAL_MINOR_BINARY="1" declare -x DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/1624/bus" declare -x DNNLROOT="/opt/intel/oneapi/dnnl/2021.4.0/cpu_dpcpp_gpu_dpcpp" declare -x DPCT_BUNDLE_ROOT="/opt/intel/oneapi/dpcpp-ct/2021.4.0" declare -x DPL_ROOT="/opt/intel/oneapi/dpl/2021.5.0" declare -x FI_PROVIDER_PATH="/opt/intel/oneapi/mpi/2021.4.0//libfabric/lib/prov:/usr/lib64/libfabric" declare -x FPGA_VARS_ARGS="1" declare -x 
FPGA_VARS_DIR="/opt/intel/oneapi/compiler/2021.4.0/linux/lib/oclfpga" declare -x GDB_INFO="/opt/intel/oneapi/debugger/10.2.4/documentation/info/" declare -x HOME="/home/lluis.fita" declare -x INFOPATH="/opt/intel/oneapi/debugger/10.2.4/gdb/intel64/lib" declare -x INSPECTOR_2021_DIR="/opt/intel/oneapi/inspector/2021.4.0" declare -x INTELFPGAOCLSDKROOT="/opt/intel/oneapi/compiler/2021.4.0/linux/lib/oclfpga" declare -x INTEL_LICENSE_FILE="/opt/intel/licenses:/home/lluis.fita/intel/licenses:/opt/intel/oneapi/clck/2021.4.0/licensing:/opt/intel/licenses:/home/lluis.fita/intel/licenses:/Users/Shared/Library/Application Support/Intel/Licenses" declare -x INTEL_PYTHONHOME="/opt/intel/oneapi/debugger/10.2.4/dep" declare -x IPPCP_TARGET_ARCH="intel64" declare -x IPPCRYPTOROOT="/opt/intel/oneapi/ippcp/2021.4.0" declare -x IPPROOT="/opt/intel/oneapi/ipp/2021.4.0" declare -x IPP_TARGET_ARCH="intel64" declare -x I_MPI_ROOT="/opt/intel/oneapi/mpi/2021.4.0" declare -x LANG="en_US.UTF-8" declare -x LANGUAGE="en_US:en" declare -x 
LD_LIBRARY_PATH="/opt/netcdf/netcdf-4/intel/2021.4.0/lib:/opt/hdf5/hdf5-1.10.5/intel/2021.4.0/lib:/opt/zlib/zlib-1.2.11/intel/2021.4.0/lib:/opt/jasper/jasper-version-2.0.33/intel/2021.4.0/lib:/opt/intel/oneapi/vpl/2021.6.0/lib:/opt/intel/oneapi/tbb/2021.4.0/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mpi/2021.4.0//libfabric/lib:/opt/intel/oneapi/mpi/2021.4.0//lib/release:/opt/intel/oneapi/mpi/2021.4.0//lib:/opt/intel/oneapi/mkl/2021.4.0/lib/intel64:/opt/intel/oneapi/itac/2021.4.0/slib:/opt/intel/oneapi/ipp/2021.4.0/lib/intel64:/opt/intel/oneapi/ippcp/2021.4.0/lib/intel64:/opt/intel/oneapi/ipp/2021.4.0/lib/intel64:/opt/intel/oneapi/dnnl/2021.4.0/cpu_dpcpp_gpu_dpcpp/lib:/opt/intel/oneapi/debugger/10.2.4/gdb/intel64/lib:/opt/intel/oneapi/debugger/10.2.4/libipt/intel64/lib:/opt/intel/oneapi/debugger/10.2.4/dep/lib:/opt/intel/oneapi/dal/2021.4.0/lib/intel64:/opt/intel/oneapi/compiler/2021.4.0/linux/lib:/opt/intel/oneapi/compiler/2021.4.0/linux/lib/x64:/opt/intel/oneapi/compiler/2021.4.0/linux/lib/emu:/opt/intel/oneapi/compiler/2021.4.0/linux/lib/oclfpga/host/linux64/lib:/opt/intel/oneapi/compiler/2021.4.0/linux/lib/oclfpga/linux64/lib:/opt/intel/oneapi/compiler/2021.4.0/linux/compiler/lib/intel64_lin:/opt/intel/oneapi/ccl/2021.4.0/lib/cpu_gpu_dpcpp" declare -x LIBRARY_PATH="/opt/intel/oneapi/vpl/2021.6.0/lib:/opt/intel/oneapi/tbb/2021.4.0/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mpi/2021.4.0//libfabric/lib:/opt/intel/oneapi/mpi/2021.4.0//lib/release:/opt/intel/oneapi/mpi/2021.4.0//lib:/opt/intel/oneapi/mkl/2021.4.0/lib/intel64:/opt/intel/oneapi/ipp/2021.4.0/lib/intel64:/opt/intel/oneapi/ippcp/2021.4.0/lib/intel64:/opt/intel/oneapi/ipp/2021.4.0/lib/intel64:/opt/intel/oneapi/dnnl/2021.4.0/cpu_dpcpp_gpu_dpcpp/lib:/opt/intel/oneapi/dal/2021.4.0/lib/intel64:/opt/intel/oneapi/compiler/2021.4.0/linux/compiler/lib/intel64_lin:/opt/intel/oneapi/compiler/2021.4.0/linux/lib:/opt/intel/oneapi/clck/2021.4.0/lib/intel64:/opt/intel/oneapi/ccl/2021.4.0/lib/cpu_gpu_dpcpp" declare -x 
LOGNAME="lluis.fita" declare -x LS_COLORS="rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:" declare -x MANPATH="/opt/intel/oneapi/mpi/2021.4.0/man:/opt/intel/oneapi/itac/2021.4.0/man:/opt/intel/oneapi/debugger/10.2.4/documentation/man:/opt/intel/oneapi/compiler/2021.4.0/documentation/en/man/common:/opt/intel/oneapi/clck/2021.4.0/man::" declare -x MKLROOT="/opt/intel/oneapi/mkl/2021.4.0" declare -x MOTD_SHOWN="pam" declare -x NLSPATH="/opt/intel/oneapi/mkl/2021.4.0/lib/intel64/locale/%l_%t/%N" declare -x 
OCL_ICD_FILENAMES="libintelocl_emu.so:libalteracl.so:/opt/intel/oneapi/compiler/2021.4.0/linux/lib/x64/libintelocl.so" declare -x OLDPWD declare -x ONEAPI_ROOT="/opt/intel/oneapi" declare -x PATH="/opt/jasper/jasper-version-2.0.33/intel/2021.4.0/bin:/opt/hdf5/hdf5-1.10.5/intel/2021.4.0/bin:/opt/netcdf/netcdf-4/intel/2021.4.0/bin:/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin:/opt/intel/oneapi/vtune/2021.7.1/bin64:/opt/intel/oneapi/vpl/2021.6.0/bin:/opt/intel/oneapi/mpi/2021.4.0//libfabric/bin:/opt/intel/oneapi/mpi/2021.4.0//bin:/opt/intel/oneapi/mkl/2021.4.0/bin/intel64:/opt/intel/oneapi/itac/2021.4.0/bin:/opt/intel/oneapi/inspector/2021.4.0/bin64:/opt/intel/oneapi/dpcpp-ct/2021.4.0/bin:/opt/intel/oneapi/dev-utilities/2021.4.0/bin:/opt/intel/oneapi/debugger/10.2.4/gdb/intel64/bin:/opt/intel/oneapi/compiler/2021.4.0/linux/lib/oclfpga/llvm/aocl-bin:/opt/intel/oneapi/compiler/2021.4.0/linux/lib/oclfpga/bin:/opt/intel/oneapi/compiler/2021.4.0/linux/bin/intel64:/opt/intel/oneapi/compiler/2021.4.0/linux/bin:/opt/intel/oneapi/clck/2021.4.0/bin/intel64:/opt/intel/oneapi/advisor/2021.4.0/bin64:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/snap/bin" declare -x PKG_CONFIG_PATH="/opt/intel/oneapi/vtune/2021.7.1/include/pkgconfig/lib64:/opt/intel/oneapi/vpl/2021.6.0/lib/pkgconfig:/opt/intel/oneapi/tbb/2021.4.0/env/../lib/pkgconfig:/opt/intel/oneapi/mpi/2021.4.0/lib/pkgconfig:/opt/intel/oneapi/mkl/2021.4.0/lib/pkgconfig:/opt/intel/oneapi/ippcp/2021.4.0/lib/pkgconfig:/opt/intel/oneapi/inspector/2021.4.0/include/pkgconfig/lib64:/opt/intel/oneapi/dpl/2021.5.0/lib/pkgconfig:/opt/intel/oneapi/dal/2021.4.0/lib/pkgconfig:/opt/intel/oneapi/compiler/2021.4.0/lib/pkgconfig:/opt/intel/oneapi/ccl/2021.4.0/lib/pkgconfig:/opt/intel/oneapi/advisor/2021.4.0/include/pkgconfig/lib64:" declare -x PWD="/home/lluis.fita" declare -x PYTHONPATH="/opt/intel/oneapi/advisor/2021.4.0/pythonapi" declare -x SETVARS_COMPLETED="1" declare -x 
SETVARS_VARS_PATH="/opt/intel/oneapi/vtune/latest/env/vars.sh" declare -x SHELL="/bin/bash" declare -x SHLVL="1" declare -x SSH_CLIENT="157.92.36.32 41330 22" declare -x SSH_CONNECTION="157.92.36.32 41330 157.92.28.248 22" declare -x SSH_TTY="/dev/pts/0" declare -x TBBROOT="/opt/intel/oneapi/tbb/2021.4.0/env/.." declare -x TERM="xterm-256color" declare -x USER="lluis.fita" declare -x VTUNE_PROFILER_2021_DIR="/opt/intel/oneapi/vtune/2021.7.1" declare -x VT_ADD_LIBS="-ldwarf -lelf -lvtunwind -lm -lpthread" declare -x VT_LIB_DIR="/opt/intel/oneapi/itac/2021.4.0/lib" declare -x VT_MPI="impi4" declare -x VT_ROOT="/opt/intel/oneapi/itac/2021.4.0" declare -x VT_SLIB_DIR="/opt/intel/oneapi/itac/2021.4.0/slib" declare -x XDG_DATA_DIRS="/usr/local/share:/usr/share:/var/lib/snapd/desktop" declare -x XDG_RUNTIME_DIR="/run/user/1624" declare -x XDG_SESSION_CLASS="user" declare -x XDG_SESSION_ID="1848" declare -x XDG_SESSION_TYPE="tty" declare -x http_proxy="http://proxy1.cima.fcen.uba.ar:3128/"
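The LD_LIBRARY_PATH above includes /opt/netcdf/netcdf-4/intel/2021.4.0/lib. That is presumably the directory holding libnetcdf.so.19, the library that cesm.exe reports as missing ("cannot open shared object file") when the batch job launches it. The sketch below is a small POSIX sh check of which LD_LIBRARY_PATH entry, if any, provides a given library in the current environment, for example at the top of a job script. The helper name find_in_ld_path is hypothetical. It only searches LD_LIBRARY_PATH, not the system loader cache or RPATH entries.

```shell
# find_in_ld_path LIBNAME (hypothetical helper): print the first directory
# listed in LD_LIBRARY_PATH that contains LIBNAME; return 1 if none does.
# Only LD_LIBRARY_PATH is searched (not /etc/ld.so.cache or RPATH).
find_in_ld_path() {
    lib=$1
    # Split LD_LIBRARY_PATH on ':' and restore IFS afterwards.
    old_ifs=$IFS
    IFS=:
    for dir in ${LD_LIBRARY_PATH:-}; do
        # Empty entries (e.g. from '::') are skipped.
        if [ -n "$dir" ] && [ -e "$dir/$lib" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Example (on hydra, after 'source /opt/load-libs.sh 1'):
#   find_in_ld_path libnetcdf.so.19 || echo "libnetcdf.so.19 not reachable"
```

To get the dynamic loader's own answer on a compute node, run `ldd` on /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe and look for libraries marked "not found".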
Downloading CESM2
Following the CESM2 Quick Start Guide, the CESM2 GitHub instructions and this tutorial.
Cloning the code for version 2.2.2:
$ mkdir -p CESM/v2.2.2/intel
$ cd CESM/v2.2.2/intel
$ git clone -b release-cesm2.2.2 git@github.com:ESCOMP/CESM.git my_cesm_sandbox
This will create a directory my_cesm_sandbox/ in your current working directory.
We got the following:
$ cd my_cesm_sandbox/
$ ls
ChangeLog CODE_OF_CONDUCT.md Externals.cfg manage_externals ChangeLog_template describe_version Externals_cime.cfg README.rst cime_config doc LICENSE.txt
Checking out the external components and verifying that the installation was fine:
$ ./manage_externals/checkout_externals
$ ls
ChangeLog CODE_OF_CONDUCT.md Externals.cfg README.rst ChangeLog_template components Externals_cime.cfg cime describe_version LICENSE.txt cime_config doc manage_externals
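Comparing both ls listings, checkout_externals adds the cime and components directories. The sketch below is a minimal re-runnable check that they are present and non-empty. The function name check_sandbox is hypothetical. It does not verify that each external is at the expected tag; `./manage_externals/checkout_externals -S` reports that status.

```shell
# check_sandbox DIR (hypothetical helper): succeed only if the externals
# fetched by checkout_externals (cime/ and components/) exist and are non-empty.
check_sandbox() {
    sandbox=$1
    status=0
    for sub in cime components; do
        # 'ls -A' prints nothing for an empty directory.
        if [ ! -d "$sandbox/$sub" ] || [ -z "$(ls -A "$sandbox/$sub")" ]; then
            echo "missing or empty: $sandbox/$sub" >&2
            status=1
        fi
    done
    return "$status"
}

# Example:
#   check_sandbox "$HOME/CESM/v2.2.2/intel/my_cesm_sandbox" && echo "externals present"
```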
Defining the compilation / installation
Machine
We need to create the configuration for hydra. For that purpose we use the example configuration for "centos7-linux" from the file cime/config/cesm/machines/config_machines.xml as a template. This is the resulting entry:
<machine MACH="hydra">
  <DESC> Example port to CIMA's hydra </DESC>
  <NODENAME_REGEX>node</NODENAME_REGEX>
  <OS>LINUX Debian</OS>
  <PROXY> https://howto.get.out </PROXY>
  <COMPILERS>gnu</COMPILERS>
  <MPILIBS>mpich</MPILIBS>
  <PROJECT>none</PROJECT>
  <SAVE_TIMING_DIR> </SAVE_TIMING_DIR>
  <CIME_OUTPUT_ROOT>$ENV{HOME}/cesm/scratch</CIME_OUTPUT_ROOT>
  <DIN_LOC_ROOT>/share/cesm/inputdata</DIN_LOC_ROOT>
  <DIN_LOC_ROOT_CLMFORC>/share/cesm/inputdata/lmwg</DIN_LOC_ROOT_CLMFORC>
  <DOUT_S_ROOT>/share/cesm/expriments/$CASE</DOUT_S_ROOT>
  <BASELINE_ROOT>$ENV{HOME}/cesm/cesm_baselines</BASELINE_ROOT>
  <CCSM_CPRNC>$ENV{HOME}/cesm/tools/cime/tools/cprnc/cprnc</CCSM_CPRNC>
  <GMAKE>make</GMAKE>
  <GMAKE_J>8</GMAKE_J>
  <BATCH_SYSTEM>pbs</BATCH_SYSTEM>
  <SUPPORTED_BY>soporte@cima.fcen.uba.ar</SUPPORTED_BY>
  <MAX_TASKS_PER_NODE>128</MAX_TASKS_PER_NODE>
  <MAX_MPITASKS_PER_NODE>128</MAX_MPITASKS_PER_NODE>
  <PROJECT_REQUIRED>FALSE</PROJECT_REQUIRED>
  <mpirun mpilib="impi">
    <executable>mpiexec</executable>
    <arguments>
      <arg name="ntasks"> -np {{ total_tasks }} </arg>
    </arguments>
  </mpirun>
  <environment_variables>
    <env name="OMP_STACKSIZE">256M</env>
  </environment_variables>
  <resource_limits>
    <resource name="RLIMIT_STACK">-1</resource>
  </resource_limits>
</machine>
Creation of hydra's CESM main folders:
$ mkdir /share/cesm/expriments/
$ mkdir /share/cesm/inputdata
Compilers
Configuration of the compilers is done via the file cime/config/cesm/machines/config_compilers.xml.
Some modifications are introduced to make sure that the compilation uses hydra's Intel configuration (following this post):
$ cp cime/config/cesm/machines/config_compilers.xml cime/config/cesm/machines/config_compilers_orig.xml
$ diff cime/config/cesm/machines/config_compilers.xml cime/config/cesm/machines/config_compilers_orig.xml
1636,1690d1635
< <compiler MACH="hydra" COMPILER="intel">
< <CFLAGS>
< <base> -qno-opt-dynamic-align -fp-model precise -std=gnu99 </base>
< <append compile_threaded="TRUE"> -qopenmp </append>
< <append DEBUG="FALSE"> -O2 -debug minimal </append>
< <append DEBUG="TRUE"> -O0 -g </append>
< </CFLAGS>
< <CPPDEFS>
< <!-- http://software.intel.com/en-us/articles/intel-composer-xe/ -->
< <append> -DFORTRANUNDERSCORE -DCPRINTEL</append>
< </CPPDEFS>
< <CXX_LINKER>FORTRAN</CXX_LINKER>
< <FC_AUTO_R8>
< <base> -r8 </base>
< </FC_AUTO_R8>
< <FFLAGS>
< <base> -qno-opt-dynamic-align -convert big_endian -assume byterecl -ftz -traceback -assume realloc_lhs -fp-model source </base>
< <append compile_threaded="TRUE"> -qopenmp </append>
< <append DEBUG="TRUE"> -O0 -g -check uninit -check bounds -check pointers -fpe0 -check noarg_temp_created </append>
< <append DEBUG="FALSE"> -O2 -debug minimal </append>
< </FFLAGS>
< <FFLAGS_NOOPT>
< <base> -O0 </base>
< </FFLAGS_NOOPT>
< <NETCDF_C_PATH>/opt/netcdf/netcdf-4/intel/2021.4.0</NETCDF_C_PATH>
< <NETCDF_FORTRAN_PATH>/opt/netcdf/netcdf-4/intel/2021.4.0/lib</NETCDF_FORTRAN_PATH>
< <FIXEDFLAGS>
< <base> -fixed </base>
< </FIXEDFLAGS>
< <FREEFLAGS>
< <base> -free </base>
< </FREEFLAGS>
< <LDFLAGS>
< <append compile_threaded="TRUE"> -qopenmp </append>
< </LDFLAGS>
< <MPICC> /opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpicc </MPICC>
< <MPICXX> /opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpicxx </MPICXX>
< <MPIFC> /opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpif90 </MPIFC>
< <SCC> /opt/intel/oneapi/compiler/2021.4.0/linux/bin/intel64/icc </SCC>
< <SCXX> /opt/intel/oneapi/compiler/2021.4.0/linux/bin/intel64/icpc </SCXX>
< <SFC> /opt/intel/oneapi/compiler/2021.4.0/linux/bin/intel64/ifort </SFC>
< <SLIBS>
< <append MPILIB="mpich"> -mkl=cluster </append>
< <append MPILIB="mpich2"> -mkl=cluster </append>
< <append MPILIB="mvapich"> -mkl=cluster </append>
< <append MPILIB="mvapich2"> -mkl=cluster </append>
< <append MPILIB="mpt"> -mkl=cluster </append>
< <append MPILIB="openmpi"> -mkl=cluster </append>
< <append MPILIB="impi"> -mkl=cluster </append>
< <append MPILIB="mpi-serial"> -mkl </append>
< <append>-L$(NETCDF_C_PATH)/lib -L$(NETCDF_FORTRAN_PATH)/lib -lnetcdff -lnetcdf -L$ENV{MKLROOT} -lmkl_rt </append>
< </SLIBS>
< <SUPPORTS_CXX>TRUE</SUPPORTS_CXX>
< </compiler>
<
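NETCDF_FORTRAN_PATH in the diff above already ends in /lib. As a result, the SLIBS entry -L$(NETCDF_FORTRAN_PATH)/lib expands to /opt/netcdf/netcdf-4/intel/2021.4.0/lib/lib. Linking may still succeed, because -L$(NETCDF_C_PATH)/lib points at the library directory itself. Still, a quick sanity check of the -L directories can catch this kind of slip. Below is a sketch with the two paths copied from the diff; the helper name check_libdirs is hypothetical.

```shell
# check_libdirs DIR... (hypothetical helper): report every -L directory that
# does not exist; return 1 if at least one is missing.
check_libdirs() {
    missing=0
    for d in "$@"; do
        if [ ! -d "$d" ]; then
            echo "missing -L directory: $d" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Values from hydra's config_compilers.xml entry.
NETCDF_C_PATH=/opt/netcdf/netcdf-4/intel/2021.4.0
NETCDF_FORTRAN_PATH=/opt/netcdf/netcdf-4/intel/2021.4.0/lib

# SLIBS uses -L$(NETCDF_C_PATH)/lib -L$(NETCDF_FORTRAN_PATH)/lib
check_libdirs "$NETCDF_C_PATH/lib" "$NETCDF_FORTRAN_PATH/lib" \
    || echo "check NETCDF_FORTRAN_PATH: it may already include /lib"
```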
PIO
Configuration of the Parallel I/O library (PIO), used for the input/output of the model, is done with the file cime/config/cesm/machines/config_pio.xml.
For now it is left as provided.
Batch
Configuration of the batch system is done with the file cime/config/cesm/machines/config_batch.xml.
A new configuration has been added for hydra's PBS queue system:
$ cp cime/config/cesm/machines/config_batch.xml cime/config/cesm/machines/config_batch_orig.xml
$ diff cime/config/cesm/machines/config_batch.xml cime/config/cesm/machines/config_batch_orig.xml
698,728d697
< <batch_system MACH="hydra" type="pbs" >
< <batch_query args="-f" >qstat</batch_query>
< <batch_submit>qsub </batch_submit>
< <batch_cancel>qdel</batch_cancel>
< <batch_env>-v</batch_env>
< <batch_directive>#PBS</batch_directive>
< <jobid_pattern>^(\S+)$</jobid_pattern>
< <depend_string> -W depend=afterok:jobid</depend_string>
< <depend_allow_string> -W depend=afterany:jobid</depend_allow_string>
< <depend_separator>:</depend_separator>
< <walltime_format>%H:%M:%S</walltime_format>
< <batch_mail_flag>-M</batch_mail_flag>
< <batch_mail_type_flag>-m</batch_mail_type_flag>
< <batch_mail_type>, bea, b, e, a</batch_mail_type>
< <submit_args>
< <arg flag="-q" name="$JOB_QUEUE"/>
< <arg flag="-l walltime=" name="$JOB_WALLCLOCK_TIME"/>
< <arg flag="-A" name="$PROJECT"/>
< </submit_args>
< <directives>
< <directive>-N {{ job_id }}</directive>
< <directive default="n"> -r {{ rerunnable }} </directive>
< <!-- <directive> -j oe {{ job_id }} </directive> -->
< <directive> -j oe </directive>
< <directive> -V </directive>
< </directives>
< <queues>
< <queue walltimemin="" walltimemax="168:00:00" nodemin="0" nodemax="5" default="true">larga</queue>
< </queues>
< </batch_system>
<
Work-flow
Configuration of the work-flow is done via the file cime/config/cesm/machines/config_workflow.xml.
For now it is left as it is.
Case creation
Prior to the compilation, we need to make sure that all the optional additional components have already been compiled (see next section). After that, we use cime/scripts/create_newcase to create a case, in which the model is then compiled and run. The model is compiled for each new experiment, since the user might activate different components each time, and compiling only the required components makes the simulation more efficient.
./create_newcase --case [CaseName] --res [Resolution] --compset [Compset] --mach hydra
Where:
- [CaseName]: convention for the name of the experiment, following the CESM Naming conventions web page
- [Resolution]: available resolutions to use, from CESM grids (not working); alternatively, use: python3 cime/scripts/query_config --grids --full
- [Compset]: components activated
- hydra: HPC to use
$ source /opt/load-libs.sh 1
$ cd cime/scripts/
$ ./create_newcase --case /share/cesm/expriments/b.day1.0 --res f19_g17 --compset B1850 --mach hydra >& run_create_newcase.log
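Since the output of create_newcase is redirected to run_create_newcase.log, problems have to be looked for in that file. The minimal sketch below lists the lines starting with ERROR or WARNING, which are the prefixes of the CIME messages quoted in these notes, and fails if an ERROR is present. The function name check_log is hypothetical. Messages without those prefixes, such as the python error below, would not be caught.

```shell
# check_log FILE (hypothetical helper): print lines starting with ERROR or
# WARNING, with line numbers; return 1 if any ERROR line is found.
check_log() {
    log=$1
    # Listing nothing is not a failure.
    grep -nE '^(ERROR|WARNING)' "$log" || true
    if grep -q '^ERROR' "$log"; then
        return 1
    fi
    return 0
}

# Example:
#   check_log run_create_newcase.log || echo "create_newcase reported errors"
```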
Errors found
Be aware that error messages do not seem to be very informative (for now).
Python error
/usr/bin/env: ‘python’: No such file or directory
Fixed by forcing python3:
$ cp create_newcase create_newcase_orig
$ diff create_newcase create_newcase_orig
1c1
< #!/usr/bin/env python3
---
> #!/usr/bin/env python
Checking for the correct execution in the file run_create_newcase.log, an error message is detected:
xmllint not found,...
This seems to be related to the absence of the package libxml2-utils, which provides xmllint on Debian.
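create_newcase relies on external tools that are not always installed, such as xmllint here and python3 above. A quick check before starting can save a round trip. Below is a minimal sketch; the function name require_tools and the list of tools in the example are assumptions, not an official CIME requirement list.

```shell
# require_tools TOOL... (hypothetical helper): print each tool that is not
# on PATH; return 1 if any is missing.
require_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found: $tool" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Tools that the steps in these notes needed (list is an assumption):
#   require_tools python3 xmllint make git
# On Debian, a missing xmllint is installed (as root) with:
#   apt-get install libxml2-utils
```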
New error:
Compset specific settings: name is RUN_STARTDATE and value is 0001-01-01
ERROR: Command: '/usr/bin/xmllint --xinclude --noout --schema /home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/cime/config/xml_schemas/config_machines.xsd /home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/cime/config/cesm/machines/config_machines.xml' failed with error '/home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/cime/config/cesm/machines/config_machines.xml:2776: element machine: Schemas validity error : Element 'machine': Missing child element(s). Expected is one of ( mpirun, module_system ).
/home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/cime/config/cesm/machines/config_machines.xml fails to validate' from dir '/home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/cime/scripts'
The machine configuration for hydra lacked the statement that it does not have modules, so it was added:
(...)
</mpirun>
<module_system type="none"/>
(...)
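After editing config_machines.xml, the schema check that CIME ran can be repeated by hand before calling create_newcase again. That check is the xmllint command quoted in the error message above. The sketch below wraps the same command; the function name validate_machines is hypothetical, and its argument is assumed to be the path of my_cesm_sandbox.

```shell
# validate_machines SANDBOX (hypothetical helper): run the xmllint schema
# validation that CIME performs on config_machines.xml, with the same
# arguments as in the error message.
validate_machines() {
    sandbox=$1
    xmllint --xinclude --noout \
        --schema "$sandbox/cime/config/xml_schemas/config_machines.xsd" \
        "$sandbox/cime/config/cesm/machines/config_machines.xml"
}

# Example:
#   validate_machines "$HOME/CESM/v2.2.2/intel/my_cesm_sandbox" && echo "config_machines.xml is valid"
```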
Queue-system related error:
Batch_system_type is pbs
job is case.run
USER_REQUESTED_WALLTIME None
USER_REQUESTED_QUEUE None
WALLTIME_FORMAT %H:%M:%S
WARNING: No queue on this system met the requirements for this job. Falling back to defaults
ERROR: No queues found
No queue system was created for hydra. Now it has been added into cime/config/cesm/machines/config_batch.xml
Case setup
Now we are ready to set up the case.
Go to the case folder ($ROOTCASE, as it is referred to here):
$ cd /share/cesm/expriments/b.day1.0
$ ./case.setup >& run_case-setup.log
Looking for errors:
$ tail run_case-setup.log
You can now run './preview_run' to get more info on how your case will be run
Running preview_run gives the following configuration:
$ ./preview_run >& run_preview_run.log
$ cat run_preview_run.log
CASE INFO:
nodes: 6
total tasks: 768
tasks per node: 128
thread count: 1
BATCH INFO:
FOR JOB: case.run
ENV:
Setting Environment OMP_NUM_THREADS=1
SUBMIT CMD:
qsub -q larga -l walltime=168:00:00 -A none -q larga -l walltime=168:00:00 -A none -v ARGS_FOR_SCRIPT='--resubmit' .case.run
MPIRUN (job=case.run):
mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1
FOR JOB: case.st_archive
ENV:
Setting Environment OMP_NUM_THREADS=1
SUBMIT CMD:
qsub -q larga -l walltime=00:20:00 -A none -q larga -l walltime=00:20:00 -A none -W depend=afterok:0 -v ARGS_FOR_SCRIPT='--resubmit' case.st_archive
errors
The first error is related to python:
$ ./case.setup
/usr/bin/env: ‘python’: No such file or directory
Modifying it to use python3:
$ cp case.setup case.setup_orig
$ diff case.setup case.setup_orig
1c1
< #!/usr/bin/env python3
---
> #!/usr/bin/env python
This happens in every script. So either the IT team (as root) fixes it by creating a python symlink that points to python3 (see here):
# ln -s /usr/bin/python3 /usr/bin/python
or I go back to the cime/scripts folder and change it everywhere: create_clone, create_newcase, create_test, query_config, query_testlists
Then modify all the python scripts within cime/scripts/Tools, keeping a copy of the original folder:
$ cp -R Tools Tools_orig
Scripts being modified: Tools/archive_metadata, Tools/bld_diff, Tools/bless_test_results, Tools/case.build, Tools/case.cmpgen_namelists, Tools/case_diff, Tools/case.qstatus, Tools/case.setup, Tools/case.submit, Tools/check_case, Tools/check_input_data, Tools/check_lockedfiles, Tools/cime_bisect, Tools/code_checker, Tools/compare_namelists, Tools/compare_test_results, Tools/component_compare_baseline, Tools/component_compare_copy, Tools/component_compare_test, Tools/component_generate_baseline, Tools/cs.status, Tools/e3sm_check_env, Tools/generate_cylc_workflow.py, Tools/get_case_env, Tools/get_standard_makefile_args, Tools/getTiming, Tools/jenkins_generic_job, Tools/list_e3sm_tests, Tools/mvsource, Tools/normalize_cases, Tools/pelayout, Tools/preview_namelists, Tools/preview_run, Tools/save_provenance, Tools/simple_compare, Tools/testreporter.py, Tools/wait_for_tests, Tools/xmlchange, Tools/xmlquery
Also inside the folder Tools/xmlconvertors: Tools/xmlconvertors/config_pes_converter.py, Tools/xmlconvertors/grid_xml_converter.py, Tools/xmlconvertors/convert-grid-v1-to-v2
Case Build
Compile the code for the case, cleaning any previous build first, just in case:
$ ./case.build --clean
$ ./case.build >& run_case-build.log
Looking for errors:
$ tail run_case-build.log
(...)
siac built in 1.117004 seconds
sesp built in 1.145180 seconds
cam built in 1.169075 seconds
Component glc build complete with 2 warnings
cism built in 233.087502 seconds
Building cesm from /home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/cime/src/drivers/mct/cime_config/buildexe with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.bldlog.240516-151609
Time spent not building: 6.916887 sec
Time spent building: 260.266792 sec
MODEL BUILD HAS FINISHED SUCCESSFULLY
After a successful compilation, verify that all the required input data are present (downloading any missing files) with:
$ ./check_input_data --download >& run_check_input_data.log
(...)
Model cpl missing file wav2ocn_smapname = '/share/cesm/inputdata/cpl/gridmaps/ww3a/map_ww3a_TO_gx1v7_splice_170214.nc'
Trying to download file: 'cpl/gridmaps/ww3a/map_ww3a_TO_gx1v7_splice_170214.nc' to path '/share/cesm/inputdata/cpl/gridmaps/ww3a/map_ww3a_TO_gx1v7_splice_170214.nc' using WGET protocol.
SUCCESS
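Because tail only shows the end of each log, it can help to scan the whole CIME log for ERROR lines. check_logs is a hypothetical helper. It assumes, as in the outputs on this page, that CIME reports failures with the word ERROR:

```shell
#!/bin/sh
# Hypothetical helper: print every line containing "ERROR" (as file:line:text)
# in the given CIME logs, e.g. run_create_newcase.log, run_case-setup.log or
# run_case-build.log. Returns 1 if any ERROR line was found, 0 otherwise.
check_logs() {
  found=0
  for log in "$@"; do
    if grep -Hn 'ERROR' "$log"; then
      found=1
    fi
  done
  return "$found"
}

# Usage, from the case directory:
#   check_logs run_case-setup.log run_case-build.log
```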
Errors
There are errors related to the use of python:
$ tail run_case-build.log
(...)
Building lnd with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/lnd.bldlog.240515-143236
clm built in 0.009455 seconds
ERROR: BUILD FAIL: clm.buildlib failed, cat /home/lluis.fita/cesm/scratch/b.day1.0/bld/lnd.bldlog.240515-143236
Looking into it:
$ cat /home/lluis.fita/cesm/scratch/b.day1.0/bld/lnd.bldlog.240515-143236
/usr/bin/env: ‘python’: No such file or directory
(...)
ERROR: Command /home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/components/pop/bld/build-namelist failed rc=2
out=ERROR in MARBL_diags_to_tavg.py
err=/usr/bin/env: ‘python’: No such file or directory
ERROR: env CASEROOT=/share/cesm/expriments/b.day1.0 CASEBUILD=/share/cesm/expriments/b.day1.0/Buildconf OCN_GRID=gx1v7 OCN_TAVG_TRACER_BUDGET=FALSE POPROOT=/home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/components/pop /home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/components/pop/input_templates/ocn.ecosys.tavg.csh 4 .false. .true. failed: 256
Several python scripts have to be edited so that python3 is used. To find them, I ran the following from $ROOTCESM:
$ head [place]/* | grep -B 2 'env python' | grep -v directory >& py.log
$ cat py.log
where [place] is an increasingly deep position within the CESM folder structure: ./, *, */*, */*/*, */*/*/*, */*/*/*/*, */*/*/*/*/*, ...
For example:
$ head ./* | grep -B 2 'env python' | grep -v directory >& py.log
head: error reading './cime': Is a directory
head: error reading './cime_config': Is a directory
head: error reading './components': Is a directory
head: error reading './components_orig': Is a directory
head: error reading './doc': Is a directory
head: error reading './manage_externals': Is a directory
$ cat py.log
==> ./describe_version <==
#!/usr/bin/env python3
Or even better:
$ grep -i python [place]/* | grep env | grep -v orig | grep -v python3 >& py.log
Files found this way and modified: components/pop/MARBL_scripts/add_cocco_to_init.py, components/pop/MARBL_scripts/MARBL_diags_to_tavg.py, cime/src/components/stub_comps_mct/siac/cime_config/buildlib, cime/src/components/stub_comps_mct/siac/cime_config/buildlib_cmake, cime/src/components/stub_comps_mct/siac/cime_config/buildnml
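The [place] loop above can be replaced by a single recursive search. This sketch assumes the offending scripts start with exactly #!/usr/bin/env python or #!/usr/bin/env python2; other spellings are not caught. find_py2_shebangs is a hypothetical name. The *_orig backups and any .git folders are skipped:

```shell
#!/bin/sh
# Hypothetical helper: list the files under a directory whose first line is a
# 'python' or 'python2' env shebang, i.e. the scripts that fail on hydra with
#   /usr/bin/env: 'python': No such file or directory
# Anything named *_orig (files or folders) and .git folders are pruned.
find_py2_shebangs() {
  find "${1:-.}" \( -name '*_orig' -o -name .git \) -prune -o -type f -print |
    while IFS= read -r f; do
      first=$(head -n 1 "$f" 2>/dev/null)
      case "$first" in
        '#!/usr/bin/env python' | '#!/usr/bin/env python2') printf '%s\n' "$f" ;;
      esac
    done
}

# Usage, from $ROOTCESM:
#   find_py2_shebangs . > py.log
```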
Modifying more python scripts in cime/src/build_scripts: cime/src/build_scripts/buildlib.cprnc, cime/src/build_scripts/buildlib.csm_share, cime/src/build_scripts/buildlib.gptl, cime/src/build_scripts/buildlib.kokkos, cime/src/build_scripts/buildlib.mct, cime/src/build_scripts/buildlib.mpi-serial, cime/src/build_scripts/buildlib.pio
Modifying more python scripts: components/clm/bld/namelist_files/createMkSrfEntry.py, components/clm/run_sys_tests, components/cam/cime_config/buildcpp, components/cam/cime_config/buildlib, components/cam/cime_config/buildnml, components/cam/manage_externals/checkout_externals, components/cdeps/cime_config/buildlib, components/cice/cime_config/buildcpp, components/cice/cime_config/buildlib, components/cice/cime_config/buildnml, components/cism/manage_externals/checkout_externals, components/clm/cime_config/buildlib, components/clm/cime_config/buildnml, components/clm/manage_externals/checkout_externals, components/clm/python/run_ctsm_py_tests, components/mosart/cime_config/buildlib, components/mosart/cime_config/buildnml, components/pop/cime_config/phys_cycle_postrun, components/pop/cime_config/phys_cycle_prerun, components/pop/cime_config/buildcpp, components/pop/cime_config/buildlib, components/pop/cime_config/buildnml, components/rtm/cime_config/buildlib, components/rtm/cime_config/buildnml, components/ww3/cime_config/buildlib, components/ww3/cime_config/buildnml
Also: components/clm/python/ctsm/test/test_sys_lilac_build_ctsm.py, components/clm/python/ctsm/test/test_unit_lilac_build_ctsm.py, components/clm/python/ctsm/test/test_unit_lilac_make_runtime_inputs.py, components/clm/python/ctsm/test/test_unit_machine.py, components/clm/python/ctsm/test/test_unit_path_utils.py, components/clm/python/ctsm/test/test_unit_run_sys_tests.py, components/clm/python/ctsm/test/test_unit_utils.py, components/clm/src/fates/tools/FatesPFTIndexSwapper.py, components/clm/src/fates/tools/modify_fates_paramfile.py, components/clm/src/fates/tools/ncvarsort.py, components/pop/externals/CVMix/bld/cvmix_setup
Another set of modified files: cime/config/cesm/machines/template.case.test, cime/config/cesm/machines/template.st_archive, cime/config/ufs/machines/template.case.run, cime/config/ufs/machines/template.case.test, cime/config/ufs/machines/template.st_archive, cime/scripts/lib/CIME/BuildTools/configure.py, cime/scripts/lib/CIME/case/case_submit.py
There is an error when building the CISM component. The output of case.build:
(...)
- Building clm library
Building lnd with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/lnd.bldlog.240515-155359
Component lnd build complete with 6 warnings
clm built in 184.188484 seconds
- Building atm Library
Building atm with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/atm.bldlog.240515-155359
- Building ice Library
Building ice with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/ice.bldlog.240515-155359
- Building ocn Library
Building ocn with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/ocn.bldlog.240515-155359
- Building rof Library
Building rof with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/rof.bldlog.240515-155359
- Building glc Library
Building glc with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/glc.bldlog.240515-155359
- Building wav Library
Building wav with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/wav.bldlog.240515-155359
- Building iac Library
Building iac with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/iac.bldlog.240515-155359
- Building esp Library
Building esp with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/esp.bldlog.240515-155359
sesp built in 1.537159 seconds
siac built in 1.544745 seconds
ifort: error #10236: File not found: '/home/lluis.fita/cesm/scratch/b.day1.0/bld/glc/fortran_autogen_srcs/glide_io.F90'
ifort: error #10236: File not found: '/home/lluis.fita/cesm/scratch/b.day1.0/bld/glc/fortran_autogen_srcs/glint_io.F90'
cism built in 35.435328 seconds
mosart built in 42.410334 seconds
ww built in 54.594975 seconds
Component ice build complete with 1 warnings
cice built in 81.223358 seconds
Component ocn build complete with 13 warnings
pop built in 145.569954 seconds
Component atm build complete with 14 warnings
cam built in 229.021996 seconds
ERROR: BUILD FAIL: cism.buildlib failed, cat /home/lluis.fita/cesm/scratch/b.day1.0/bld/glc.bldlog.240515-155359
These files, which should be created automatically, are not being written:
bld/glc/fortran_autogen_srcs/glide_io.F90
bld/glc/fortran_autogen_srcs/glint_io.F90
The case directory contains the file Buildconf/cismIOconf/cism.buildIO.csh, with the following content:
(...)
# create new _io.F90 file using CISM's python script
# ---------------------------------------------------------------------------
$PYTHON generate_ncvars.py $file_varsdef ncdf_template.F90.in
It relies on the environment variable PYTHON, which holds the python executable. In the root directory of CESM, the file components/cism/bld/cismIO/README.cismIO explains:
This directory and its scripts are intended to allow the user to change IO fields from the CISM code. The CISM IO files, *_io.F90, are auto-generated and typically difficult to modify. However, the corresponding variable definition files, *_vars.def, are easily modified and the IO files can be re-generated by running the cism.buildIO.csh script contained in this directory. Usage of this script requires that the user has defined an enviroment variable PYTHON pointing to a local version of python, After that, the user simply runs the enclosed cism.buildIO.csh script, which runs a python script on each (...)
Therefore, I need to find where PYTHON is defined within the cime/config/cesm/machines/config_*.xml files, or define it manually. Defining it before the execution does not work:
$ export PYTHON=/usr/bin/python3
$ echo $PYTHON
/usr/bin/python3
$ ./case.build --clean
$ ./case.build
Therefore, it has to be defined within Buildconf/cismIOconf/cism.buildIO.csh (so it must be set in the template of this shell script!). Looking in $ROOTCESM at components/cism/source_cism/utils/build/generate_ncvars.py shows that it uses python2, which hydra does not support:
#!/usr/bin/env python2
Its first line is changed to:
#!/usr/bin/env python3
Two other files also need to be modified: components/cism/source_cism/utils/build/autogenerate-in-build-dir and components/cism/source_cism/utils/build/autogen-for-glint-and-glad-in-build-dir. Every call to python in them is replaced by python3:
$ cat components/cism/source_cism/utils/build/autogenerate-in-build-dir | grep python
# Call python script with source file arguments
python3 -V
python3 $CISM_SOURCE_DIR/utils/build/generate_ncvars.py $GLIDE_VARS_PATH $NCDF_TEMPL_PATH
$ cat components/cism/source_cism/utils/build/autogen-for-glint-and-glad-in-build-dir | grep python
# Call python script with source file arguments
python3 -V
python3 $CISM_SOURCE_DIR/utils/build/generate_ncvars.py $GLINT_VARS_PATH $NCDF_TEMPL_PATH
python3 $CISM_SOURCE_DIR/utils/build/generate_ncvars.py $GLINT_MBAL_PATH $NCDF_TEMPL_PATH
python3 $CISM_SOURCE_DIR/utils/build/generate_ncvars.py $GLAD_VARS_PATH $NCDF_TEMPL_PATH
python3 $CISM_SOURCE_DIR/utils/build/generate_ncvars.py $GLAD_MBAL_PATH $NCDF_TEMPL_PATH
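The same replacement can be scripted. This sketch assumes GNU sed. call_python3 is a hypothetical helper that only rewrites python when it appears as a bare command word. Comment lines and existing python3 calls are left as they are:

```shell
#!/bin/sh
# Hypothetical helper (assumes GNU sed): in the given shell scripts, replace
# the bare 'python' command by 'python3' on every non-comment line. Lines such
# as "# Call python script ..." and calls already using python3 are untouched.
# A copy of each original file is kept as <file>_orig.
call_python3() {
  for f in "$@"; do
    [ -e "${f}_orig" ] || cp -p "$f" "${f}_orig"
    sed -i -E '/^[[:space:]]*#/!s/(^|[[:space:]])python([[:space:]]|$)/\1python3\2/g' "$f"
  done
}

# Usage, from $ROOTCESM:
#   call_python3 components/cism/source_cism/utils/build/autogenerate-in-build-dir \
#                components/cism/source_cism/utils/build/autogen-for-glint-and-glad-in-build-dir
```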
Case submission
Submitting the case:
$ ./case.submit >& run_case-submit.log
Checking that it is running:
$ qstat -u $USER
hydra:
                                                                                  Req'd       Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID  NDS   TSK   Memory      Time    S   Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
43629.hydra             lluis.fita  larga    run.b.day1.0        721     1      1       --  168:00:00 R  00:00:51
43630.hydra             lluis.fita  larga    st_archive.b.day     --     1      1       --   00:20:00 H        --
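To follow the jobs without reading the whole table, this sketch prints only the job id, the job name and the state column of a qstat listing like the one above. job_states is hypothetical. It assumes job ids of the form <number>.hydra, as shown:

```shell
#!/bin/sh
# Hypothetical helper: read a Torque/PBS 'qstat -u' listing on stdin and print
# "job-id job-name state" for each job line. Job lines are recognised by an id
# of the form <number>.hydra (assumption based on the output on this page);
# the state is the 10th column (e.g. R running, Q queued, H held, C completed).
job_states() {
  awk '$1 ~ /^[0-9]+\.hydra$/ { print $1, $4, $10 }'
}

# Usage:
#   qstat -u "$USER" | job_states
```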
Errors
mpirun not found
Submission error:
$ cat run.b.day1.0.o43627
(...)
run command is mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1
ERROR: RUN FAIL: Command 'mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed
See log file for details: /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43627.hydra.240516-165514
$ cat /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43627.hydra.240516-165514
/bin/sh: 1: mpirun: not found
mpirun was redefined with its full path in cime/config/cesm/machines/config_machines.xml as:
<executable>/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpirun</executable>
If the case has already been created, the path can still be changed directly in the file env_mach_specific.xml of the case directory, as long as this is done before submitting.
hydra's compilation environment
The simulation finished too early:
$ qstat -u $USER
hydra:
                                                                                  Req'd       Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID  NDS   TSK   Memory      Time    S   Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
43629.hydra             lluis.fita  larga    run.b.day1.0        721     1      1       --  168:00:00 C        --
43630.hydra             lluis.fita  larga    st_archive.b.day     --     1      1       --   00:20:00 C        --
Looking in the logs:
$ cat run.b.day1.0.o43629
Generating namelists for /share/cesm/expriments/b.day1.0
Creating component namelists
(...)
-------------------------------------------------------------------------
run command is /opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1
tm_poll: got event 1 return 0
tm_poll: INIT nodes 1
tm_poll: INIT daddy jobid 43629.hydra
tm_poll: INIT daddy node 0
tm_poll: INIT daddy tid 1
new_task: jobid=43629.hydra node=0 task=1
new_task: jobid=43629.hydra node=0 task=2
ERROR: RUN FAIL: Command '/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed
See log file for details: /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43629.hydra.240516-170533
$ cat /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43629.hydra.240516-170533
(...)
Invalid character . in PBS_JOBID
tm_poll: got event 1 return 0
tm_poll: INIT nodes 1
tm_poll: INIT daddy jobid 43629.hydra
tm_poll: INIT daddy node -1
tm_poll: INIT daddy tid 0
new_task: jobid=43629.hydra node=-1 task=0
new_task: called with TM_ERROR_NODE
new_task: jobid=43629.hydra node=0 task=1
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 1006 RUNNING AT node43
= EXIT CODE: 9
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
tm_poll: got event 2 return 0
new_task: jobid=43629.hydra node=0 task=2
tm_poll: got event 3 return 0
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
And also:
$ cat CaseStatus
2024-05-16 11:52:51: case.setup starting
---------------------------------------------------
2024-05-16 11:52:52: case.setup success
---------------------------------------------------
2024-05-16 11:55:32: case.build starting
---------------------------------------------------
2024-05-16 12:04:41: case.build error
ERROR: BUILD FAIL: cism.buildlib failed, cat /home/lluis.fita/cesm/scratch/b.day1.0/bld/glc.bldlog.240516-115532
---------------------------------------------------
2024-05-16 15:12:59: case.build starting
---------------------------------------------------
2024-05-16 15:13:16: case.build error
ERROR: BUILD FAIL: cism.buildlib failed, cat /home/lluis.fita/cesm/scratch/b.day1.0/bld/glc.bldlog.240516-151259
---------------------------------------------------
2024-05-16 15:16:09: case.build starting
---------------------------------------------------
CESM version is release-cesm2.2.2
Processing externals description file : Externals.cfg (/home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox)
(...)
Checking local status of required & optional components: cam, chem_proc, carma, cosp2, clubb, silhs, pumas, atmos_phys, atmos_cubed_sphere, cice, cdeps, fox, cime, cmeps, cism, source_cism, clm, fates, fms, mom, mosart, pop, cvmix, marbl, rtm, ww3,
M   ./cime modified sandbox, on cime5.8.32.9
e-o ./cime/src/drivers/nuopc/ -, not checked out -->
M   ./components/cam modified sandbox, on cam_cesm2_2_rel_09
    ./components/cam/chem_proc clean sandbox, on tools/proc_atm/chem_proc/release_tags/chem_proc5_0_04
    ./components/cam/src/atmos_phys clean sandbox, on version0_00_007
    ./components/cam/src/dynamics/fv3/atmos_cubed_sphere clean sandbox, on fv3_cesm.04
    ./components/cam/src/physics/carma/base clean sandbox, on carma/release_tags/carma3_49_rel
    ./components/cam/src/physics/clubb clean sandbox, on clubb_release_b76a124_20200220_c20200320
    ./components/cam/src/physics/cosp2/src clean sandbox, on v2.1.4cesm
    ./components/cam/src/physics/pumas clean sandbox, on pumas_cam-release_v1.3
    ./components/cam/src/physics/silhs clean sandbox, on silhs_clubb_release_b76a124_20200220_c20200320
M   ./components/cdeps modified sandbox, on d808b7c6f78a2d5dcfeb1da0d1a452a9b66e08c8
    ./components/cdeps/fox clean sandbox, on 7b9488446b193192dd3f0378541e71099cb4e8a8
M   ./components/cice modified sandbox, on cice5-cesm2.2.2-20231220
M   ./components/cism modified sandbox, on cism2_1_69_b
M   ./components/cism/source_cism modified sandbox, on release-cesm2.2.2-f1a88d6derecho
M   ./components/clm modified sandbox, on release-cesm2.2.04
M   ./components/clm/src/fates modified sandbox, on sci.1.30.0_api.8.0.0
e-o ./components/mom -, not checked out --> mi_20200908
M   ./components/mosart modified sandbox, on mosart1_0_37_1
M   ./components/pop modified sandbox, on pop2_cesm2_2_rel_n01
M   ./components/pop/externals/CVMix modified sandbox, on v0.98-beta
    ./components/pop/externals/MARBL clean sandbox, on cesm2.2-n00
M   ./components/rtm modified sandbox, on rtm1_0_72
M   ./components/ww3 modified sandbox, on ww3_200710
e-o ./libraries/FMS -, not checked out --> fi_20200609_cesm2.2_231205
2024-05-16 15:20:36: case.build success
---------------------------------------------------
2024-05-16 16:55:05: case.submit starting
---------------------------------------------------
2024-05-16 16:55:12: case.submit error
ERROR: Command: 'qsub -q larga -l walltime=00:20:00 -A none -q larga -l walltime=00:20:00 -A none -W depend=afterok:43627.hydra -v ARGS_FOR_SCRIPT='--resubmit' case.st_archive' failed with error 'qsub: submit error (Invalid request)' from dir '/share/cesm/expriments/b.day1.0'
---------------------------------------------------
2024-05-16 16:55:14: case.run starting
---------------------------------------------------
2024-05-16 16:55:41: model execution starting
---------------------------------------------------
2024-05-16 16:55:41: model execution error
ERROR: Command: 'mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed with error '' from dir '/home/lluis.fita/cesm/scratch/b.day1.0/run'
---------------------------------------------------
2024-05-16 16:55:41: case.run error
ERROR: RUN FAIL: Command 'mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed
See log file for details: /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43627.hydra.240516-165514
---------------------------------------------------
2024-05-16 17:05:25: case.submit starting
---------------------------------------------------
2024-05-16 17:05:33: case.submit error
ERROR: Command: 'qsub -q larga -l walltime=00:20:00 -A none -q larga -l walltime=00:20:00 -A none -W depend=afterok:43629.hydra -v ARGS_FOR_SCRIPT='--resubmit' case.st_archive' failed with error 'qsub: submit error (Invalid request)' from dir '/share/cesm/expriments/b.day1.0'
---------------------------------------------------
2024-05-16 17:05:33: case.run starting
---------------------------------------------------
2024-05-16 17:05:43: model execution starting
---------------------------------------------------
2024-05-16 17:06:34: model execution error
ERROR: Command: '/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed with error '' from dir '/home/lluis.fita/cesm/scratch/b.day1.0/run'
---------------------------------------------------
2024-05-16 17:06:34: case.run error
ERROR: RUN FAIL: Command '/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed
See log file for details: /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43629.hydra.240516-170533
---------------------------------------------------
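The CaseStatus file grows with every attempt. This sketch extracts only the failing steps and their ERROR lines. casestatus_errors is hypothetical. It assumes the layout shown above, where a '<timestamp>: <step> error' line is followed by the ERROR message:

```shell
#!/bin/sh
# Hypothetical helper: print each failing step of a CIME CaseStatus file
# ("<date> <time>: <step> error") together with the ERROR line that follows
# it (indented). Assumes the layout of the CaseStatus output shown above.
casestatus_errors() {
  awk '
    /^[0-9-]+ [0-9:]+: .* error *$/ { print; want = 1; next }
    want && /^ERROR/                { print "    " $0 }
                                    { want = 0 }
  ' "${1:-CaseStatus}"
}

# Usage, from the case directory:
#   casestatus_errors CaseStatus
```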
I suspect that it is related to the compilation environment. hydra does not have modules; it sets up the compilation environment with the shell script /opt/load-libs.sh. There must be a way to introduce it systematically in config_batch.xml so that it is executed in every PBS job. A new post was created in the CESM forum #post-55530
From another attempt:
$ cat /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43633.hydra.240517-155125 | more
/home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe: error while loading shared libraries: libnetcdf.so.19: cannot open shared object file: No such file or directory
/home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe: error while loading shared libraries: libnetcdf.so.19: cannot open shared object file: No such file or directory
/home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe: error while loading shared libraries: libnetcdf.so.19: cannot open shared object file: No such file or directory
/home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe: error while loading shared libraries: libnetcdf.so.19: cannot open shared object file: No such file or directory
/home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe: error while loading shared libraries: libnetcdf.so.19: cannot open shared object file: No such file or directory
(...)
The case was set up to use the variable PRERUN_SCRIPT in the file env_run.xml of the case directory (for more details, search for it here).
Trying different options:
<entry id="PRERUN_SCRIPT">
<type>char</type>
<desc>External script to be run before model completion</desc>
<values>
<value>source /opt/load-libs.sh 1</value>
</values>
$ ./case.submit
$ cat run.b.day1.0.o43637
ERROR: External script source /opt/load-libs.sh 1 not found
Looking inside /opt/load-libs.sh
<entry id="PRERUN_SCRIPT">
<type>char</type>
<desc>External script to be run before model completion</desc>
<values>
<value>/opt/env_scripts/load_intel-2021.4.0_mpich-3.4.2.sh</value>
</values>
$ ./case.submit
$ cat run.b.day1.0.o43639
Running /opt/env_scripts/load_intel-2021.4.0_mpich-3.4.2.sh
/bin/sh: 1: Syntax error: Bad fd number
<entry id="PRERUN_SCRIPT">
<type>char</type>
<desc>External script to be run before model completion</desc>
<values>
<value>'source /opt/load-libs.sh 1'</value>
</values>
$ ./case.submit
$ cat run.b.day1.0.o43641
ERROR: External script 'source /opt/load-libs.sh 1' not found
A simple shell script with the required content is created in /home/lluis.fita/intel_env.csh:
#!/bin/sh
export PATH="/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin:$PATH"
export PATH="/opt/netcdf/netcdf-4/intel/2021.4.0/bin:$PATH"
export PATH="/opt/hdf5/hdf5-1.10.5/intel/2021.4.0/bin:$PATH"
export PATH="/opt/jasper/jasper-version-2.0.33/intel/2021.4.0/bin:$PATH"
export LD_LIBRARY_PATH=/opt/jasper/jasper-version-2.0.33/intel/2021.4.0/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/opt/zlib/zlib-1.2.11/intel/2021.4.0/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/opt/hdf5/hdf5-1.10.5/intel/2021.4.0/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/opt/netcdf/netcdf-4/intel/2021.4.0/lib:$LD_LIBRARY_PATH
And then:
<entry id="PRERUN_SCRIPT">
<type>char</type>
<desc>External script to be run before model completion</desc>
<values>
<value>/home/lluis/intel_env.sh</value>
</values>
$ ./case.submit
$ cat run.b.day1.0.o43647
Running /home/lluis.fita/intel_env.csh
/bin/sh: 1: Syntax error: Bad fd number
whereas running the script from the terminal does not give any error:
$ /bin/sh /home/lluis.fita/intel_env.csh
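However the environment ends up being loaded, ldd can show whether it makes libnetcdf.so.19 (and the other libraries) resolvable for cesm.exe, because ldd reports unresolved libraries as "not found". This is a minimal sketch. missing_libs is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: list the shared libraries that the dynamic loader
# cannot resolve for an executable in the current environment (ldd prints
# "<lib> => not found" for them). An empty output means all were found.
missing_libs() {
  ldd "$1" 2>/dev/null | awk '/=> not found/ { print $1 }'
}

# Usage on hydra, in a fresh shell, before and after loading the environment:
#   missing_libs /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe
#   . /home/lluis.fita/intel_env.csh
#   missing_libs /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe
```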
Optional additional components
ESM
The High Performance Modeling Infrastructure (ESM) is needed. In May 2024 the latest version was 8.6.0, which is the one installed here.
Getting the code on hydra, in folder ESM:
$ wget https://github.com/esmf-org/esmf/archive/refs/tags/v8.6.0.tar.gz
Compilation follows this documentation, with the detailed procedure in specific-compilation-instructions.
GNU compilation
Installation location:
$ mkdir -p v860/gnu
$ cd v860/gnu
$ tar xvfz ../v8.6.0.tar.gz
$ cd esmf-8.6.0/
$ ls
build build_config cmake LICENSE makefile README.md scripts src
Start by defining the location of the code:
$ export ESMF_DIR=$PWD
Run make info to get information about the local setup:
$ make info >& run_make_info.log
Launching the makefile in parallel:
$ make -j8 lib >& run_make.log
$ tail run_make.log
(...)
ESMF library built successfully on Mon 13 May 2024 12:58:46 PM -03
To verify, build and run the unit and system tests with: make check
or the more extensive: make all_tests
Verify that the build worked fine:
$ make all_tests >& run_make_tests.log
$ cat run_make_tests.log | grep failed
The following unit test files failed to build, failed to execute or crashed during execution:
Found 7800 non-exhaustive single processor unit tests, 7744 passed and 56 failed.
Found 8 single processor system tests, 8 passed and 0 failed.
Found 44 single processor examples, 44 passed and 0 failed.
Found 8 single processor system tests, 8 passed and 0 failed.
Found 44 single processor examples, 44 passed and 0 failed.
Found 7800 non-exhaustive single processor unit tests, 7744 passed and 56 failed.
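The summary lines of run_make_tests.log appear twice. This sketch keeps each summary once and puts the number of failures first. summarize_esmf_tests is hypothetical. It assumes each "Found ..." summary sits on its own line of the log:

```shell
#!/bin/sh
# Hypothetical helper: reduce the "Found N <kind>, P passed and F failed."
# summary lines of an ESMF test log to one line per kind, printed as
# "F failed of N: <kind>". Duplicated summaries are printed only once.
summarize_esmf_tests() {
  grep -E '^Found [0-9]+ .*, [0-9]+ passed and [0-9]+ failed\.$' "${1:-run_make_tests.log}" |
    sort -u |
    sed -E 's/^Found ([0-9]+) (.*), ([0-9]+) passed and ([0-9]+) failed\.$/\4 failed of \1: \2/'
}

# Usage, from esmf-8.6.0/:
#   summarize_esmf_tests run_make_tests.log
```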
Installing:
$ make install >& run_make_install.log
$ tail run_make_install.log
(...)
ESMF installation complete.
intel compilation
Loading hydra's intel compilation environment:
$ source /opt/load-libs.sh 1
Installation location:
$ mkdir v860/intel
$ cd v860/intel
$ tar xvfz ../v8.6.0.tar.gz
$ cd esmf-8.6.0/
$ ls
build build_config cmake LICENSE makefile README.md scripts src
Porting and validating CIME on a new platform
http://esmci.github.io/cime/versions/master/html/users_guide/porting-cime.html
Downloading the Input data
All input data will be downloaded into /share/cesm
Input datasets are needed to run the model. CESM input data are available through a separate Subversion input data repository.
- Change the header of check_input_data so that it runs with Python 2.7.x:
sed -i -e 's!/usr/bin/env python!/share/anaconda2/bin/python!' ./cime/scripts/Tools/check_input_data