CESMInstall WIKI CESM - hydra

These are the installation notes of CESM on <CODE>hydra</CODE>, the HPC of [http://www.cima.fcen.uba.ar/ CIMA]/[http://www.cima.fcen.uba.ar/UMI/ IFAECI].

Notes and process carried out on 13-17 May 2024 by Lluís Fita (UBA/CIMA/IFAECI, CABA, Argentina), with the assistance of Ass. Prof. Pedro DiNezio (U. Colorado, Boulder) and Dr Nicolás J. Cosentino (UBA/CIMA/IFAECI, CABA, Argentina), and the non-anonymous help of the [https://bb.cgd.ucar.edu/cesm/ CESM forum] and stackoverflow.
   
 
= intel compilation =
 
 
Configuration of the compilers is done via the file <CODE>cime/config/cesm/machines/config_compilers.xml</CODE>.

Some modifications are introduced to make sure that the whole compilation uses hydra's Intel configuration (following this [https://bb.cgd.ucar.edu/cesm/threads/cesm-2-2-netcdf-issue.5922/ post]):
 
<PRE style="shell">
$ cp cime/config/cesm/machines/config_compilers.xml cime/config/cesm/machines/config_compilers_orig.xml
$ diff cime/config/cesm/machines/config_compilers.xml cime/config/cesm/machines/config_compilers_orig.xml
1636,1690d1635
< <compiler MACH="hydra" COMPILER="intel">
< <CFLAGS>
< <base> -qno-opt-dynamic-align -fp-model precise -std=gnu99 </base>
< <append compile_threaded="TRUE"> -qopenmp </append>
< <append DEBUG="FALSE"> -O2 -debug minimal </append>
< <append DEBUG="TRUE"> -O0 -g </append>
< </CFLAGS>
< <CPPDEFS>
< <!-- http://software.intel.com/en-us/articles/intel-composer-xe/ -->
< <append> -DFORTRANUNDERSCORE -DCPRINTEL</append>
< </CPPDEFS>
< <CXX_LINKER>FORTRAN</CXX_LINKER>
< <FC_AUTO_R8>
< <base> -r8 </base>
< </FC_AUTO_R8>
< <FFLAGS>
< <base> -qno-opt-dynamic-align -convert big_endian -assume byterecl -ftz -traceback -assume realloc_lhs -fp-model source </base>
< <append compile_threaded="TRUE"> -qopenmp </append>
< <append DEBUG="TRUE"> -O0 -g -check uninit -check bounds -check pointers -fpe0 -check noarg_temp_created </append>
< <append DEBUG="FALSE"> -O2 -debug minimal </append>
< </FFLAGS>
< <FFLAGS_NOOPT>
< <base> -O0 </base>
< </FFLAGS_NOOPT>
< <NETCDF_C_PATH>/opt/netcdf/netcdf-4/intel/2021.4.0</NETCDF_C_PATH>
< <NETCDF_FORTRAN_PATH>/opt/netcdf/netcdf-4/intel/2021.4.0/lib</NETCDF_FORTRAN_PATH>
< <FIXEDFLAGS>
< <base> -fixed </base>
< </FIXEDFLAGS>
< <FREEFLAGS>
< <base> -free </base>
< </FREEFLAGS>
< <LDFLAGS>
< <append compile_threaded="TRUE"> -qopenmp </append>
< </LDFLAGS>
< <MPICC> /opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpicc </MPICC>
< <MPICXX> /opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpicxx </MPICXX>
(...)
< <SCXX> /opt/intel/oneapi/compiler/2021.4.0/linux/bin/intel64/icpc </SCXX>
< <SFC> /opt/intel/oneapi/compiler/2021.4.0/linux/bin/intel64/ifort </SFC>
< <SLIBS>
< <append MPILIB="mpich"> -mkl=cluster </append>
< <append MPILIB="mpich2"> -mkl=cluster </append>
< <append MPILIB="mvapich"> -mkl=cluster </append>
< <append MPILIB="mvapich2"> -mkl=cluster </append>
< <append MPILIB="mpt"> -mkl=cluster </append>
< <append MPILIB="openmpi"> -mkl=cluster </append>
< <append MPILIB="impi"> -mkl=cluster </append>
< <append MPILIB="mpi-serial"> -mkl </append>
< <append>-L$(NETCDF_C_PATH)/lib -L$(NETCDF_FORTRAN_PATH)/lib -lnetcdff -lnetcdf -L$ENV{MKLROOT} -lmkl_rt </append>
< </SLIBS>
< <SUPPORTS_CXX>TRUE</SUPPORTS_CXX>
< </compiler>
<
</PRE>
   
<PRE style="shell">
$ cd /share/cesm/expriments/b.day1.0
$ ./case.setup >& run_case-setup.log
</PRE>

Looking for errors:
<PRE style="Shell">
$ tail run_case-setup.log
You can now run './preview_run' to get more info on how your case will be run
</PRE>

Which provides the following configuration:
<PRE style="shell">
$ ./preview_run >& run_preview_run.log
$ cat run_preview_run.log
CASE INFO:
nodes: 6
total tasks: 768
tasks per node: 128
thread count: 1

BATCH INFO:
FOR JOB: case.run
ENV:
Setting Environment OMP_NUM_THREADS=1

SUBMIT CMD:
qsub -q larga -l walltime=168:00:00 -A none -q larga -l walltime=168:00:00 -A none -v ARGS_FOR_SCRIPT='--resubmit' .case.run

MPIRUN (job=case.run):
mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1

FOR JOB: case.st_archive
ENV:
Setting Environment OMP_NUM_THREADS=1

SUBMIT CMD:
qsub -q larga -l walltime=00:20:00 -A none -q larga -l walltime=00:20:00 -A none -W depend=afterok:0 -v ARGS_FOR_SCRIPT='--resubmit' case.st_archive
</PRE>
 
Scripts being modified: <CODE>Tools/archive_metadata, Tools/bld_diff, Tools/bless_test_results, Tools/case.build, Tools/case.cmpgen_namelists, Tools/case_diff, Tools/case.qstatus, Tools/case.setup, Tools/case.submit, Tools/check_case, Tools/check_input_data, Tools/check_lockedfiles, Tools/cime_bisect, Tools/code_checker, Tools/compare_namelists, Tools/compare_test_results, Tools/component_compare_baseline, Tools/component_compare_copy, Tools/component_compare_test, Tools/component_generate_baseline, Tools/cs.status, Tools/e3sm_check_env, Tools/generate_cylc_workflow.py, Tools/get_case_env, Tools/get_standard_makefile_args, Tools/getTiming, Tools/jenkins_generic_job, Tools/list_e3sm_tests, Tools/mvsource, Tools/normalize_cases, Tools/pelayout, Tools/preview_namelists, Tools/preview_run, Tools/save_provenance, Tools/simple_compare, Tools/testreporter.py, Tools/wait_for_tests, Tools/xmlchange, Tools/xmlquery</CODE>

Also inside folder <CODE>Tools/xmlconvertors</CODE>: <CODE>Tools/xmlconvertors/config_pes_converter.py, Tools/xmlconvertors/grid_xml_converter.py, Tools/xmlconvertors/convert-grid-v1-to-v2</CODE>

== Case Build ==

Compiling the code for the case. Before building, any previous build is cleaned, just in case:

<PRE style="shell">
$ ./case.build --clean
$ ./case.build >& run_case-build.log
</PRE>

Looking for errors:

After successful compilation, verify the presence of all the required input data with:
<!-- ./create_newcase --case /share/cesm/expriments/b.day1.0.002 --res f19_f19 --compset F1850 --mach hydra --run-unsupported -->

<PRE style="shell">
$ tail run_case-build.log
(...)
siac built in 1.117004 seconds
sesp built in 1.145180 seconds
cam built in 1.169075 seconds
Component glc build complete with 2 warnings
cism built in 233.087502 seconds
Building cesm from /home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/cime/src/drivers/mct/cime_config/buildexe with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.bldlog.240516-151609
Time spent not building: 6.916887 sec
Time spent building: 260.266792 sec
MODEL BUILD HAS FINISHED SUCCESSFULLY
$ ./check_input_data --download >& run_check_input_data.log
(...)

Model cpl missing file wav2ocn_smapname = '/share/cesm/inputdata/cpl/gridmaps/ww3a/map_ww3a_TO_gx1v7_splice_170214.nc'
Trying to download file: 'cpl/gridmaps/ww3a/map_ww3a_TO_gx1v7_splice_170214.nc' to path '/share/cesm/inputdata/cpl/gridmaps/ww3a/map_ww3a_TO_gx1v7_splice_170214.nc' using WGET protocol.
SUCCESS
</PRE>
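
The inspection of the build log can also be scripted. This is only a minimal sketch: it assumes the log name <CODE>run_case-build.log</CODE> used in these notes and relies on the two strings seen in the logs of this page (<CODE>MODEL BUILD HAS FINISHED SUCCESSFULLY</CODE> and <CODE>BUILD FAIL</CODE>). The function name <CODE>build_ok</CODE> is hypothetical and not part of CESM.

```shell
#!/bin/sh
# build_ok LOGFILE  (hypothetical helper, not part of CESM/CIME)
# Returns 0 if the log reports a finished build, 1 if it reports a failure
# or no success line, and 2 if the log cannot be read.
build_ok() {
    log="$1"
    [ -r "$log" ] || return 2
    # Any 'BUILD FAIL' line (as printed by case.build in these notes) is an error
    if grep -q 'BUILD FAIL' "$log"; then
        return 1
    fi
    # Success line as printed at the end of a complete build
    grep -q 'MODEL BUILD HAS FINISHED SUCCESSFULLY' "$log"
}
```

It could be used, for instance, as <CODE>build_ok run_case-build.log && ./check_input_data --download</CODE>, so that the input data is only checked after a clean build.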

=== Errors ===
There are errors related to the use of python:

<PRE style="shell">
$ tail run_case-build.log
(...)
Building lnd with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/lnd.bldlog.240515-143236
clm built in 0.009455 seconds
ERROR: BUILD FAIL: clm.buildlib failed, cat /home/lluis.fita/cesm/scratch/b.day1.0/bld/lnd.bldlog.240515-143236
</PRE>

Looking into:
<PRE style="shell">
cat /home/lluis.fita/cesm/scratch/b.day1.0/bld/lnd.bldlog.240515-143236
/usr/bin/env: ‘python’: No such file or directory
</PRE>

<PRE>
(...)
ERROR: Command /home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/components/pop/bld/build-namelist failed rc=2
out=ERROR in MARBL_diags_to_tavg.py
err=/usr/bin/env: ‘python’: No such file or directory
ERROR: env CASEROOT=/share/cesm/expriments/b.day1.0 CASEBUILD=/share/cesm/expriments/b.day1.0/Buildconf OCN_GRID=gx1v7OCN_TAVG_TRACER_BUDGET=FALSE POPROOT=/home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/components/pop /home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/components/pop/input_templates/ocn.ecosys.tavg.csh 4 .false. .true. failed: 256
</PRE>

Various python scripts are edited to make sure that python3 is used. To find them, I executed the following code from <CODE>$ROOTCESM</CODE>:
<PRE style="shell">
$ head [place]/* | grep -B 2 'env python' | grep -v directory >& py.log
$ cat py.log
</PRE>
where <CODE>[place]</CODE> is an incrementally deeper position within the folder structure of CESM: <CODE>./, *, */*, */*/*, */*/*/*, */*/*/*/*, */*/*/*/*/*, ...</CODE>

For example:
<PRE style="shell">
$ head ./* | grep -B 2 'env python' | grep -v directory >& py.log
head: error reading './cime': Is a directory
head: error reading './cime_config': Is a directory
head: error reading './components': Is a directory
head: error reading './components_orig': Is a directory
head: error reading './doc': Is a directory
head: error reading './manage_externals': Is a directory
$ cat py.log

==> ./describe_version <==
#!/usr/bin/env python3
</PRE>

Or even better:
<PRE>
$ grep -i python [place]/* | grep env | grep -v orig | grep -v python3 >& py.log
</PRE>

Scripts being modified: <CODE>components/pop/MARBL_scripts/add_cocco_to_init.py, components/pop/MARBL_scripts/MARBL_diags_to_tavg.py, cime/src/components/stub_comps_mct/siac/cime_config/buildlib, cime/src/components/stub_comps_mct/siac/cime_config/buildlib_cmake, cime/src/components/stub_comps_mct/siac/cime_config/buildnml</CODE>

Modifying more python scripts in <CODE>cime/src/build_scripts</CODE>: <CODE>cime/src/build_scripts/buildlib.cprnc, cime/src/build_scripts/buildlib.csm_share, cime/src/build_scripts/buildlib.gptl, cime/src/build_scripts/buildlib.kokkos, cime/src/build_scripts/buildlib.mct, cime/src/build_scripts/buildlib.mpi-serial, cime/src/build_scripts/buildlib.pio</CODE>

Modifying more python scripts: <CODE>components/clm/bld/namelist_files/createMkSrfEntry.py, components/clm/run_sys_tests, components/cam/cime_config/buildcpp, components/cam/cime_config/buildlib, components/cam/cime_config/buildnml, components/cam/manage_externals/checkout_externals, components/cdeps/cime_config/buildlib, components/cice/cime_config/buildcpp, components/cice/cime_config/buildlib, components/cice/cime_config/buildnml, components/cism/manage_externals/checkout_externals, components/clm/cime_config/buildlib, components/clm/cime_config/buildnml, components/clm/manage_externals/checkout_externals, components/clm/python/run_ctsm_py_tests, components/mosart/cime_config/buildlib, components/mosart/cime_config/buildnml, components/pop/cime_config/phys_cycle_postrun, components/pop/cime_config/phys_cycle_prerun, components/pop/cime_config/buildcpp, components/pop/cime_config/buildlib, components/pop/cime_config/buildnml, components/rtm/cime_config/buildlib, components/rtm/cime_config/buildnml, components/ww3/cime_config/buildlib, components/ww3/cime_config/buildnml</CODE>

Also: <CODE>components/clm/python/ctsm/test/test_sys_lilac_build_ctsm.py, components/clm/python/ctsm/test/test_unit_lilac_build_ctsm.py, components/clm/python/ctsm/test/test_unit_lilac_make_runtime_inputs.py, components/clm/python/ctsm/test/test_unit_machine.py, components/clm/python/ctsm/test/test_unit_path_utils.py, components/clm/python/ctsm/test/test_unit_run_sys_tests.py, components/clm/python/ctsm/test/test_unit_utils.py, components/clm/src/fates/tools/FatesPFTIndexSwapper.py, components/clm/src/fates/tools/modify_fates_paramfile.py, components/clm/src/fates/tools/ncvarsort.py, components/pop/externals/CVMix/bld/cvmix_setup</CODE>

Another set of modified files: <CODE>cime/config/cesm/machines/template.case.test, cime/config/cesm/machines/template.st_archive, cime/config/ufs/machines/template.case.run, cime/config/ufs/machines/template.case.test, cime/config/ufs/machines/template.st_archive, cime/scripts/lib/CIME/BuildTools/configure.py, cime/scripts/lib/CIME/case/case_submit.py</CODE>
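
The manual edition of the shebang lines could probably be automated. The following is only a sketch and not what was actually run for these notes: it assumes GNU <CODE>grep</CODE> and GNU <CODE>sed</CODE>, only touches first lines of the form <CODE>#!/usr/bin/env python</CODE> or <CODE>#!/usr/bin/env python2</CODE>, and keeps a copy of each modified file with the suffix <CODE>.orig</CODE>. The function name <CODE>fix_python_shebangs</CODE> is hypothetical.

```shell
#!/bin/sh
# fix_python_shebangs DIR  (hypothetical helper; assumes GNU grep and GNU sed)
# Rewrites '#!/usr/bin/env python' or '#!/usr/bin/env python2' on the first
# line of the files under DIR to '#!/usr/bin/env python3'.
fix_python_shebangs() {
    root="$1"
    # List candidate files; the pattern does not match 'python3',
    # and backups from a previous run ('*.orig') are skipped.
    grep -rlE --exclude='*.orig' '^#! ?/usr/bin/env python2?[[:space:]]*$' "$root" |
    while IFS= read -r f; do
        # '1s' restricts the substitution to the first line;
        # '-i.orig' edits in place and keeps the original as FILE.orig
        sed -i.orig -E '1s|^#! ?/usr/bin/env python2?[[:space:]]*$|#!/usr/bin/env python3|' "$f"
        echo "patched: $f"
    done
}
```

Calling <CODE>fix_python_shebangs components/pop</CODE> from <CODE>$ROOTCESM</CODE> would print one <CODE>patched:</CODE> line per candidate file. Scripts that call <CODE>python</CODE> inside their body (not in the shebang) are not covered and still need manual edition.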

There is an error during the building of the CISM component. The output of <CODE>case.build</CODE>:
<PRE style="shell">
(...)
- Building clm library
Building lnd with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/lnd.bldlog.240515-155359
Component lnd build complete with 6 warnings
clm built in 184.188484 seconds
- Building atm Library
Building atm with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/atm.bldlog.240515-155359
- Building ice Library
Building ice with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/ice.bldlog.240515-155359
- Building ocn Library
Building ocn with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/ocn.bldlog.240515-155359
- Building rof Library
Building rof with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/rof.bldlog.240515-155359
- Building glc Library
Building glc with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/glc.bldlog.240515-155359
- Building wav Library
Building wav with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/wav.bldlog.240515-155359
- Building iac Library
Building iac with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/iac.bldlog.240515-155359
- Building esp Library
Building esp with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/esp.bldlog.240515-155359
sesp built in 1.537159 seconds
siac built in 1.544745 seconds
ifort: error #10236: File not found: '/home/lluis.fita/cesm/scratch/b.day1.0/bld/glc/fortran_autogen_srcs/glide_io.F90'

ifort: error #10236: File not found: '/home/lluis.fita/cesm/scratch/b.day1.0/bld/glc/fortran_autogen_srcs/glint_io.F90'
cism built in 35.435328 seconds
mosart built in 42.410334 seconds
ww built in 54.594975 seconds
Component ice build complete with 1 warnings
cice built in 81.223358 seconds
Component ocn build complete with 13 warnings
pop built in 145.569954 seconds
Component atm build complete with 14 warnings
cam built in 229.021996 seconds
ERROR: BUILD FAIL: cism.buildlib failed, cat /home/lluis.fita/cesm/scratch/b.day1.0/bld/glc.bldlog.240515-155359
</PRE>

These files, which should be created automatically, are not being written:
<PRE>
bld/glc/fortran_autogen_srcs/glide_io.F90
bld/glc/fortran_autogen_srcs/glint_io.F90
</PRE>

In the case directory one has the file <CODE>Buildconf/cismIOconf/cism.buildIO.csh</CODE>, with the following content:
<PRE>
(...)
# create new _io.F90 file using CISM's python script
# ---------------------------------------------------------------------------
$PYTHON generate_ncvars.py $file_varsdef ncdf_template.F90.in
</PRE>
The script looks for the environment variable <CODE>PYTHON</CODE>, which holds the python executable. In the root directory of CESM, the file <CODE>components/cism/bld/cismIO/README.cismIO</CODE> explains:
<PRE>
This directory and its scripts are intended to allow the user to change IO fields from the CISM code. The CISM IO files, *_io.F90, are
auto-generated and typically difficult to modify. However, the corresponding variable definition files, *_vars.def, are easily modified
and the IO files can be re-generated by running the cism.buildIO.csh script contained in this directory.

Usage of this script requires that the user has defined an enviroment variable PYTHON pointing to a local version of python, After that,
the user simply runs the enclosed cism.buildIO.csh script, which runs a python script on each
(...)
</PRE>
Therefore, I need either to find where <CODE>PYTHON</CODE> is defined within the <CODE>cime/config/cesm/machines/config_*.xml</CODE> files, or to define it manually. Defining it in the shell prior to the execution does not work:
<PRE style="shell">
$ export PYTHON=/usr/bin/python3
$ echo $PYTHON
/usr/bin/python3
$ ./case.build --clean
$ ./case.build
</PRE>
Therefore, it has to be defined within <CODE>Buildconf/cismIOconf/cism.buildIO.csh</CODE> (it must be defined in the template of this shell script!). Looking from <CODE>$ROOTCESM</CODE> into <CODE>components/cism/source_cism/utils/build/generate_ncvars.py</CODE>, it is using python2, which hydra does not support:
<PRE>
#!/usr/bin/env python2
</PRE>

The code is modified to:
<PRE>
#!/usr/bin/env python3
</PRE>

Two other files also need to be modified: <CODE>components/cism/source_cism/utils/build/autogenerate-in-build-dir, components/cism/source_cism/utils/build/autogen-for-glint-and-glad-in-build-dir</CODE>. Every call to <CODE>python</CODE> in them is replaced by <CODE>python3</CODE>:

<PRE style="Shell">
$ cat components/cism/source_cism/utils/build/autogenerate-in-build-dir | grep python
# Call python script with source file arguments
python3 -V
python3 $CISM_SOURCE_DIR/utils/build/generate_ncvars.py $GLIDE_VARS_PATH $NCDF_TEMPL_PATH
$ cat components/cism/source_cism/utils/build/autogen-for-glint-and-glad-in-build-dir | grep python
# Call python script with source file arguments
python3 -V
python3 $CISM_SOURCE_DIR/utils/build/generate_ncvars.py $GLINT_VARS_PATH $NCDF_TEMPL_PATH
python3 $CISM_SOURCE_DIR/utils/build/generate_ncvars.py $GLINT_MBAL_PATH $NCDF_TEMPL_PATH
python3 $CISM_SOURCE_DIR/utils/build/generate_ncvars.py $GLAD_VARS_PATH $NCDF_TEMPL_PATH
python3 $CISM_SOURCE_DIR/utils/build/generate_ncvars.py $GLAD_MBAL_PATH $NCDF_TEMPL_PATH
</PRE>
   
 
== Case send ==
 

<!-- Manually
$ source /opt/load-libs.sh 1
$ /opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpirun -np 48 /home/lluis.fita/cesm/scratch/b.day1.0.002/bld/cesm.exe -->

Submitting the case:
<PRE>
$ ./case.submit >& run_case-submit.log
</PRE>

Checking that it is running:
<PRE style="shell">
$ qstat -u $USER

hydra:
                                                                                  Req'd     Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID NDS   TSK    Memory    Time      S Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
43629.hydra             lluis.fita  larga    run.b.day1.0     721    1     1      --        168:00:00 R 00:00:51
43630.hydra             lluis.fita  larga    st_archive.b.day --     1     1      --        00:20:00  H --
</PRE>

=== Errors ===

==== mpirun not found ====

The submission fails with:
<PRE style="shell">
$ cat run.b.day1.0.o43627
(...)
run command is mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1
ERROR: RUN FAIL: Command 'mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed
See log file for details: /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43627.hydra.240516-165514
$ cat /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43627.hydra.240516-165514
/bin/sh: 1: mpirun: not found
</PRE>

Redefined <CODE>mpirun</CODE> in <CODE>cime/config/cesm/machines/config_machines.xml</CODE> as:
<PRE>
<executable>/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpirun</executable>
</PRE>
If it is too late for that (the case already exists), it can be changed directly, prior to submission, in the file <CODE>env_mach_specific.xml</CODE> of the case directory.

==== hydra's compilation environment ====

Simulation finished too early ...
<PRE style="shell">
$ qstat -u $USER

hydra:
                                                                                  Req'd     Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID NDS   TSK    Memory    Time      S Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
43629.hydra             lluis.fita  larga    run.b.day1.0     721    1     1      --        168:00:00 C --
43630.hydra             lluis.fita  larga    st_archive.b.day --     1     1      --        00:20:00  C --
</PRE>

Looking in the logs:
<PRE style="shell">
$ cat run.b.day1.0.o43629
Generating namelists for /share/cesm/expriments/b.day1.0
Creating component namelists
(...)
-------------------------------------------------------------------------
run command is /opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1
tm_poll: got event 1 return 0
tm_poll: INIT nodes 1
tm_poll: INIT daddy jobid 43629.hydra
tm_poll: INIT daddy node 0
tm_poll: INIT daddy tid 1
new_task: jobid=43629.hydra node=0 task=1
new_task: jobid=43629.hydra node=0 task=2
ERROR: RUN FAIL: Command '/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed
See log file for details: /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43629.hydra.240516-170533

$ cat /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43629.hydra.240516-170533
(...)
Invalid character . in PBS_JOBID
tm_poll: got event 1 return 0
tm_poll: INIT nodes 1
tm_poll: INIT daddy jobid 43629.hydra
tm_poll: INIT daddy node -1
tm_poll: INIT daddy tid 0
new_task: jobid=43629.hydra node=-1 task=0
new_task: called with TM_ERROR_NODE
new_task: jobid=43629.hydra node=0 task=1

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 1006 RUNNING AT node43
= EXIT CODE: 9
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
tm_poll: got event 2 return 0
new_task: jobid=43629.hydra node=0 task=2
tm_poll: got event 3 return 0
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
</PRE>

And also:
<PRE style="shell">
$ cat CaseStatus
2024-05-16 11:52:51: case.setup starting
---------------------------------------------------
2024-05-16 11:52:52: case.setup success
---------------------------------------------------
2024-05-16 11:55:32: case.build starting
---------------------------------------------------
2024-05-16 12:04:41: case.build error
ERROR: BUILD FAIL: cism.buildlib failed, cat /home/lluis.fita/cesm/scratch/b.day1.0/bld/glc.bldlog.240516-115532
---------------------------------------------------
2024-05-16 15:12:59: case.build starting
---------------------------------------------------
2024-05-16 15:13:16: case.build error
ERROR: BUILD FAIL: cism.buildlib failed, cat /home/lluis.fita/cesm/scratch/b.day1.0/bld/glc.bldlog.240516-151259
---------------------------------------------------
2024-05-16 15:16:09: case.build starting
---------------------------------------------------
CESM version is release-cesm2.2.2
Processing externals description file : Externals.cfg (/home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox)
(...)
Checking local status of required & optional components: cam, chem_proc, carma, cosp2, clubb, silhs, pumas, atmos_phys, atmos_cubed_sphere, cice, cdeps, fox, cime, cmeps, cism, source_cism, clm, fates, fms, mom, mosart, pop, cvmix, marbl, rtm, ww3,
M ./cime
modified sandbox, on cime5.8.32.9
e-o ./cime/src/drivers/nuopc/
-, not checked out -->
M ./components/cam
modified sandbox, on cam_cesm2_2_rel_09
./components/cam/chem_proc
clean sandbox, on tools/proc_atm/chem_proc/release_tags/chem_proc5_0_04
./components/cam/src/atmos_phys
clean sandbox, on version0_00_007
./components/cam/src/dynamics/fv3/atmos_cubed_sphere
clean sandbox, on fv3_cesm.04
./components/cam/src/physics/carma/base
clean sandbox, on carma/release_tags/carma3_49_rel
./components/cam/src/physics/clubb
clean sandbox, on clubb_release_b76a124_20200220_c20200320
./components/cam/src/physics/cosp2/src
clean sandbox, on v2.1.4cesm
./components/cam/src/physics/pumas
clean sandbox, on pumas_cam-release_v1.3
./components/cam/src/physics/silhs
clean sandbox, on silhs_clubb_release_b76a124_20200220_c20200320
M ./components/cdeps
modified sandbox, on d808b7c6f78a2d5dcfeb1da0d1a452a9b66e08c8
./components/cdeps/fox
clean sandbox, on 7b9488446b193192dd3f0378541e71099cb4e8a8
M ./components/cice
modified sandbox, on cice5-cesm2.2.2-20231220
M ./components/cism
modified sandbox, on cism2_1_69_b
M ./components/cism/source_cism
modified sandbox, on release-cesm2.2.2-f1a88d6derecho
M ./components/clm
modified sandbox, on release-cesm2.2.04
M ./components/clm/src/fates
modified sandbox, on sci.1.30.0_api.8.0.0
e-o ./components/mom
-, not checked out --> mi_20200908
M ./components/mosart
modified sandbox, on mosart1_0_37_1
M ./components/pop
modified sandbox, on pop2_cesm2_2_rel_n01
M ./components/pop/externals/CVMix
modified sandbox, on v0.98-beta
./components/pop/externals/MARBL
clean sandbox, on cesm2.2-n00
M ./components/rtm
modified sandbox, on rtm1_0_72
M ./components/ww3
modified sandbox, on ww3_200710
e-o ./libraries/FMS
-, not checked out --> fi_20200609_cesm2.2_231205
2024-05-16 15:20:36: case.build success
---------------------------------------------------
2024-05-16 16:55:05: case.submit starting
---------------------------------------------------
2024-05-16 16:55:12: case.submit error
ERROR: Command: 'qsub -q larga -l walltime=00:20:00 -A none -q larga -l walltime=00:20:00 -A none -W depend=afterok:43627.hydra -v ARGS_FOR_SCRIPT='--resubmit' case.st_archive' failed with error 'qsub: submit error (Invalid request)' from dir '/share/cesm/expriments/b.day1.0'
---------------------------------------------------
2024-05-16 16:55:14: case.run starting
---------------------------------------------------
2024-05-16 16:55:41: model execution starting
---------------------------------------------------
2024-05-16 16:55:41: model execution error
ERROR: Command: 'mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed with error '' from dir '/home/lluis.fita/cesm/scratch/b.day1.0/run'
---------------------------------------------------
2024-05-16 16:55:41: case.run error
ERROR: RUN FAIL: Command 'mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed
See log file for details: /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43627.hydra.240516-165514
---------------------------------------------------
2024-05-16 17:05:25: case.submit starting
---------------------------------------------------
2024-05-16 17:05:33: case.submit error
ERROR: Command: 'qsub -q larga -l walltime=00:20:00 -A none -q larga -l walltime=00:20:00 -A none -W depend=afterok:43629.hydra -v ARGS_FOR_SCRIPT='--resubmit' case.st_archive' failed with error 'qsub: submit error (Invalid request)' from dir '/share/cesm/expriments/b.day1.0'
---------------------------------------------------
2024-05-16 17:05:33: case.run starting
---------------------------------------------------
2024-05-16 17:05:43: model execution starting
---------------------------------------------------
2024-05-16 17:06:34: model execution error
ERROR: Command: '/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed with error '' from dir '/home/lluis.fita/cesm/scratch/b.day1.0/run'
---------------------------------------------------
2024-05-16 17:06:34: case.run error
ERROR: RUN FAIL: Command '/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpirun -np 768 /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed
See log file for details: /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43629.hydra.240516-170533
---------------------------------------------------
</PRE>

I suspect that it is related to the compilation environment. hydra does not have <CODE>module</CODE>; it sets up the compilation environment via the shell script <CODE>/opt/load-libs.sh</CODE>. There must be a way to introduce it systematically in <CODE>config_batch.xml</CODE>, so that it is executed in all the PBS jobs. A new post was created in the CESM forum: [https://bb.cgd.ucar.edu/cesm/threads/introducing-a-system-instruction-in-config_batch-xml.9646/#post-55530 #post-55530]
  +
  +
From another attempt
  +
<PRE style="shell">
  +
$ cat /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43633.hydra.240517-155125 | more
  +
/home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe: error while loading
  +
shared libraries: libnetcdf.so.19: cannot open shared object file: No suc
  +
h file or directory
  +
/home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe: error while loading
  +
shared libraries: libnetcdf.so.19: cannot open shared object file: No suc
  +
h file or directory
  +
/home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe: error while loading
  +
shared libraries: libnetcdf.so.19: cannot open shared object file: No suc
  +
h file or directory
  +
/home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe: error while loading
  +
shared libraries: libnetcdf.so.19: cannot open shared object file: No suc
  +
h file or directory
  +
/home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe: error while loading
  +
shared libraries: libnetcdf.so.19: cannot open shared object file: No suc
  +
h file or directory
  +
(...)
  +
</PRE>
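The message means that the dynamic loader on the compute nodes cannot find <CODE>libnetcdf.so.19</CODE> in any directory it searches. As a minimal sketch (the helper name <CODE>find_lib</CODE> is hypothetical, not part of CESM or CIME), the following function reports which <CODE>LD_LIBRARY_PATH</CODE> entry, if any, contains a given library. It only inspects <CODE>LD_LIBRARY_PATH</CODE>, not the system loader cache, so a miss here is a hint rather than proof:

```shell
# find_lib NAME: print the first LD_LIBRARY_PATH directory that contains NAME.
# Returns 1 when no listed directory contains it.
find_lib() {
    name=$1
    old_ifs=$IFS
    IFS=:                      # split LD_LIBRARY_PATH on colons
    for dir in $LD_LIBRARY_PATH; do
        if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Example call (library name taken from the log above)
find_lib libnetcdf.so.19 || echo "libnetcdf.so.19 not found on LD_LIBRARY_PATH"
```

Running the same call inside a PBS job would show whether the job inherits the paths that <CODE>/opt/load-libs.sh</CODE> sets in the interactive session.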
  +
  +
The next attempt uses the variable <CODE>PRERUN_SCRIPT</CODE> in the file <CODE>env_run.xml</CODE> of the case directory (for more details, search for it [https://docs.cesm.ucar.edu/models/cesm2/settings/current/drv_input.html here]).
  +
  +
Trying different options:
  +
<PRE>
  +
<entry id="PRERUN_SCRIPT">
  +
<type>char</type>
  +
<desc>External script to be run before model completion</desc>
  +
<values>
  +
<value>source /opt/load-libs.sh 1</value>
  +
</values>
  +
$ ./case.submit
  +
$ cat run.b.day1.0.o43637
  +
ERROR: External script source /opt/load-libs.sh 1 not found
  +
</PRE>
  +
  +
Looking inside <CODE>/opt/load-libs.sh</CODE> and trying the script found there directly:
  +
<PRE>
  +
<entry id="PRERUN_SCRIPT">
  +
<type>char</type>
  +
<desc>External script to be run before model completion</desc>
  +
<values>
  +
<value>/opt/env_scripts/load_intel-2021.4.0_mpich-3.4.2.sh</value>
  +
</values>
  +
$ ./case.submit
  +
$ cat run.b.day1.0.o43639
  +
Running /opt/env_scripts/load_intel-2021.4.0_mpich-3.4.2.sh
  +
/bin/sh: 1: Syntax error: Bad fd number
  +
</PRE>
  +
  +
<PRE>
  +
<entry id="PRERUN_SCRIPT">
  +
<type>char</type>
  +
<desc>External script to be run before model completion</desc>
  +
<values>
  +
<value>'source /opt/load-libs.sh 1'</value>
  +
</values>
  +
$ ./case.submit
  +
$ cat run.b.day1.0.o43641
  +
ERROR: External script 'source /opt/load-libs.sh 1' not found
  +
</PRE>
  +
  +
Creating a simple shell script with the required content in <CODE>/home/lluis.fita/intel_env.csh</CODE> (a plain <CODE>sh</CODE> script, despite the <CODE>.csh</CODE> extension):
  +
<PRE>
  +
#!/bin/sh
  +
export PATH="/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin:$PATH"
  +
export PATH="/opt/netcdf/netcdf-4/intel/2021.4.0/bin:$PATH"
  +
export PATH="/opt/hdf5/hdf5-1.10.5/intel/2021.4.0/bin:$PATH"
  +
export PATH="/opt/jasper/jasper-version-2.0.33/intel/2021.4.0/bin:$PATH"
  +
  +
export LD_LIBRARY_PATH=/opt/jasper/jasper-version-2.0.33/intel/2021.4.0/lib:$LD_LIBRARY_PATH
  +
export LD_LIBRARY_PATH=/opt/zlib/zlib-1.2.11/intel/2021.4.0/lib:$LD_LIBRARY_PATH
  +
export LD_LIBRARY_PATH=/opt/hdf5/hdf5-1.10.5/intel/2021.4.0/lib:$LD_LIBRARY_PATH
  +
export LD_LIBRARY_PATH=/opt/netcdf/netcdf-4/intel/2021.4.0/lib:$LD_LIBRARY_PATH
  +
</PRE>
  +
  +
And then
  +
<PRE>
  +
<entry id="PRERUN_SCRIPT">
  +
<type>char</type>
  +
<desc>External script to be run before model completion</desc>
  +
<values>
  +
<value>/home/lluis.fita/intel_env.csh</value>
  +
</values>
  +
$ ./case.submit
  +
$ cat run.b.day1.0.o43647
  +
Running /home/lluis.fita/intel_env.csh
  +
/bin/sh: 1: Syntax error: Bad fd number
  +
</PRE>
  +
  +
However, running the script directly from the terminal does not give any error:
  +
<PRE style="shell">
  +
$ /bin/sh /home/lluis.fita/intel_env.csh
  +
  +
</PRE>
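A separate point, independent of the <CODE>Bad fd number</CODE> message: a script that is executed (rather than sourced) runs in a child process, so its <CODE>export</CODE> statements do not reach the process that later launches <CODE>mpirun</CODE>. Whether CIME executes or sources <CODE>PRERUN_SCRIPT</CODE> has not been verified here; the sketch below only illustrates the general shell behaviour, using a throwaway script and a hypothetical variable <CODE>DEMO_LIB_PATH</CODE>:

```shell
# Throwaway script that exports one variable (DEMO_LIB_PATH is illustrative only)
tmp_script=$(mktemp)
cat > "$tmp_script" <<'EOF'
#!/bin/sh
export DEMO_LIB_PATH=/opt/example/lib
EOF

unset DEMO_LIB_PATH

sh "$tmp_script"                       # executed: runs in a child process
after_exec=${DEMO_LIB_PATH:-unset}

. "$tmp_script"                        # sourced: runs in the current shell
after_source=${DEMO_LIB_PATH:-unset}

echo "after executing: $after_exec"    # prints: after executing: unset
echo "after sourcing:  $after_source"  # prints: after sourcing:  /opt/example/lib
rm -f "$tmp_script"
```

If the same applies to the pre-run hook, an environment script would need to be sourced by the job itself, or the variables exported before submission, for the model executable to see them.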
   
 
= Optional additional components =
 

Latest revision as of 17:39, 17 May 2024

These are the installation notes for CESM on hydra, the HPC of CIMA/IFAECI.

Notes and process carried out on 13-17 May 2024 by Lluís Fita (UBA/CIMA/IFAECI, CABA, Argentina), with the assistance of Ass. Prof. Pedro DiNezio (U. Colorado, Boulder) and Dr Nicolás J. Cosentino (UBA/CIMA/IFAECI, CABA, Argentina), and the non-anonymous help of the CESM forum and Stack Overflow.


intel compilation

The Intel environment on CIMA's hydra is loaded with the following instruction:

$ source /opt/load-libs.sh 1

The following libraries, compiled with Intel 2021.4.0 compilers, were loaded:
* MPICH 3.4.2
* NetCDF 4
* HDF5 1.10.5
* JASPER 2.0.33

Which creates the following environment:

declare -x ACL_BOARD_VENDOR_PATH="/opt/Intel/OpenCLFPGA/oneAPI/Boards"
declare -x ADVISOR_2021_DIR="/opt/intel/oneapi/advisor/2021.4.0"
declare -x APM="/opt/intel/oneapi/advisor/2021.4.0/perfmodels"
declare -x CCL_CONFIGURATION="cpu_gpu_dpcpp"
declare -x CCL_ROOT="/opt/intel/oneapi/ccl/2021.4.0"
declare -x CLASSPATH="/opt/intel/oneapi/mpi/2021.4.0//lib/mpi.jar:/opt/intel/oneapi/dal/2021.4.0/lib/onedal.jar"
declare -x CLCK_ROOT="/opt/intel/oneapi/clck/2021.4.0"
declare -x CMAKE_PREFIX_PATH="/opt/intel/oneapi/vpl/2021.6.0:/opt/intel/oneapi/tbb/2021.4.0/env/..:/opt/intel/oneapi/dal/2021.4.0"
declare -x CMPLR_ROOT="/opt/intel/oneapi/compiler/2021.4.0"
declare -x CPATH="/opt/intel/oneapi/vpl/2021.6.0/include:/opt/intel/oneapi/tbb/2021.4.0/env/../include:/opt/intel/oneapi/mpi/2021.4.0//include:/opt/intel/oneapi/mkl/2021.4.0/include:/opt/intel/oneapi/ipp/2021.4.0/include:/opt/intel/oneapi/ippcp/2021.4.0/include:/opt/intel/oneapi/ipp/2021.4.0/include:/opt/intel/oneapi/dpl/2021.5.0/linux/include:/opt/intel/oneapi/dpcpp-ct/2021.4.0/include:/opt/intel/oneapi/dnnl/2021.4.0/cpu_dpcpp_gpu_dpcpp/lib:/opt/intel/oneapi/dev-utilities/2021.4.0/include:/opt/intel/oneapi/dal/2021.4.0/include:/opt/intel/oneapi/compiler/2021.4.0/linux/include:/opt/intel/oneapi/ccl/2021.4.0/include/cpu_gpu_dpcpp"
declare -x CPLUS_INCLUDE_PATH="/opt/intel/oneapi/clck/2021.4.0/include"
declare -x DAALROOT="/opt/intel/oneapi/dal/2021.4.0"
declare -x DALROOT="/opt/intel/oneapi/dal/2021.4.0"
declare -x DAL_MAJOR_BINARY="1"
declare -x DAL_MINOR_BINARY="1"
declare -x DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/1624/bus"
declare -x DNNLROOT="/opt/intel/oneapi/dnnl/2021.4.0/cpu_dpcpp_gpu_dpcpp"
declare -x DPCT_BUNDLE_ROOT="/opt/intel/oneapi/dpcpp-ct/2021.4.0"
declare -x DPL_ROOT="/opt/intel/oneapi/dpl/2021.5.0"
declare -x FI_PROVIDER_PATH="/opt/intel/oneapi/mpi/2021.4.0//libfabric/lib/prov:/usr/lib64/libfabric"
declare -x FPGA_VARS_ARGS="1"
declare -x FPGA_VARS_DIR="/opt/intel/oneapi/compiler/2021.4.0/linux/lib/oclfpga"
declare -x GDB_INFO="/opt/intel/oneapi/debugger/10.2.4/documentation/info/"
declare -x HOME="/home/lluis.fita"
declare -x INFOPATH="/opt/intel/oneapi/debugger/10.2.4/gdb/intel64/lib"
declare -x INSPECTOR_2021_DIR="/opt/intel/oneapi/inspector/2021.4.0"
declare -x INTELFPGAOCLSDKROOT="/opt/intel/oneapi/compiler/2021.4.0/linux/lib/oclfpga"
declare -x INTEL_LICENSE_FILE="/opt/intel/licenses:/home/lluis.fita/intel/licenses:/opt/intel/oneapi/clck/2021.4.0/licensing:/opt/intel/licenses:/home/lluis.fita/intel/licenses:/Users/Shared/Library/Application Support/Intel/Licenses"
declare -x INTEL_PYTHONHOME="/opt/intel/oneapi/debugger/10.2.4/dep"
declare -x IPPCP_TARGET_ARCH="intel64"
declare -x IPPCRYPTOROOT="/opt/intel/oneapi/ippcp/2021.4.0"
declare -x IPPROOT="/opt/intel/oneapi/ipp/2021.4.0"
declare -x IPP_TARGET_ARCH="intel64"
declare -x I_MPI_ROOT="/opt/intel/oneapi/mpi/2021.4.0"
declare -x LANG="en_US.UTF-8"
declare -x LANGUAGE="en_US:en"
declare -x LD_LIBRARY_PATH="/opt/netcdf/netcdf-4/intel/2021.4.0/lib:/opt/hdf5/hdf5-1.10.5/intel/2021.4.0/lib:/opt/zlib/zlib-1.2.11/intel/2021.4.0/lib:/opt/jasper/jasper-version-2.0.33/intel/2021.4.0/lib:/opt/intel/oneapi/vpl/2021.6.0/lib:/opt/intel/oneapi/tbb/2021.4.0/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mpi/2021.4.0//libfabric/lib:/opt/intel/oneapi/mpi/2021.4.0//lib/release:/opt/intel/oneapi/mpi/2021.4.0//lib:/opt/intel/oneapi/mkl/2021.4.0/lib/intel64:/opt/intel/oneapi/itac/2021.4.0/slib:/opt/intel/oneapi/ipp/2021.4.0/lib/intel64:/opt/intel/oneapi/ippcp/2021.4.0/lib/intel64:/opt/intel/oneapi/ipp/2021.4.0/lib/intel64:/opt/intel/oneapi/dnnl/2021.4.0/cpu_dpcpp_gpu_dpcpp/lib:/opt/intel/oneapi/debugger/10.2.4/gdb/intel64/lib:/opt/intel/oneapi/debugger/10.2.4/libipt/intel64/lib:/opt/intel/oneapi/debugger/10.2.4/dep/lib:/opt/intel/oneapi/dal/2021.4.0/lib/intel64:/opt/intel/oneapi/compiler/2021.4.0/linux/lib:/opt/intel/oneapi/compiler/2021.4.0/linux/lib/x64:/opt/intel/oneapi/compiler/2021.4.0/linux/lib/emu:/opt/intel/oneapi/compiler/2021.4.0/linux/lib/oclfpga/host/linux64/lib:/opt/intel/oneapi/compiler/2021.4.0/linux/lib/oclfpga/linux64/lib:/opt/intel/oneapi/compiler/2021.4.0/linux/compiler/lib/intel64_lin:/opt/intel/oneapi/ccl/2021.4.0/lib/cpu_gpu_dpcpp"
declare -x LIBRARY_PATH="/opt/intel/oneapi/vpl/2021.6.0/lib:/opt/intel/oneapi/tbb/2021.4.0/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mpi/2021.4.0//libfabric/lib:/opt/intel/oneapi/mpi/2021.4.0//lib/release:/opt/intel/oneapi/mpi/2021.4.0//lib:/opt/intel/oneapi/mkl/2021.4.0/lib/intel64:/opt/intel/oneapi/ipp/2021.4.0/lib/intel64:/opt/intel/oneapi/ippcp/2021.4.0/lib/intel64:/opt/intel/oneapi/ipp/2021.4.0/lib/intel64:/opt/intel/oneapi/dnnl/2021.4.0/cpu_dpcpp_gpu_dpcpp/lib:/opt/intel/oneapi/dal/2021.4.0/lib/intel64:/opt/intel/oneapi/compiler/2021.4.0/linux/compiler/lib/intel64_lin:/opt/intel/oneapi/compiler/2021.4.0/linux/lib:/opt/intel/oneapi/clck/2021.4.0/lib/intel64:/opt/intel/oneapi/ccl/2021.4.0/lib/cpu_gpu_dpcpp"
declare -x LOGNAME="lluis.fita"
declare -x LS_COLORS="rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:"
declare -x MANPATH="/opt/intel/oneapi/mpi/2021.4.0/man:/opt/intel/oneapi/itac/2021.4.0/man:/opt/intel/oneapi/debugger/10.2.4/documentation/man:/opt/intel/oneapi/compiler/2021.4.0/documentation/en/man/common:/opt/intel/oneapi/clck/2021.4.0/man::"
declare -x MKLROOT="/opt/intel/oneapi/mkl/2021.4.0"
declare -x MOTD_SHOWN="pam"
declare -x NLSPATH="/opt/intel/oneapi/mkl/2021.4.0/lib/intel64/locale/%l_%t/%N"
declare -x OCL_ICD_FILENAMES="libintelocl_emu.so:libalteracl.so:/opt/intel/oneapi/compiler/2021.4.0/linux/lib/x64/libintelocl.so"
declare -x OLDPWD
declare -x ONEAPI_ROOT="/opt/intel/oneapi"
declare -x PATH="/opt/jasper/jasper-version-2.0.33/intel/2021.4.0/bin:/opt/hdf5/hdf5-1.10.5/intel/2021.4.0/bin:/opt/netcdf/netcdf-4/intel/2021.4.0/bin:/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin:/opt/intel/oneapi/vtune/2021.7.1/bin64:/opt/intel/oneapi/vpl/2021.6.0/bin:/opt/intel/oneapi/mpi/2021.4.0//libfabric/bin:/opt/intel/oneapi/mpi/2021.4.0//bin:/opt/intel/oneapi/mkl/2021.4.0/bin/intel64:/opt/intel/oneapi/itac/2021.4.0/bin:/opt/intel/oneapi/inspector/2021.4.0/bin64:/opt/intel/oneapi/dpcpp-ct/2021.4.0/bin:/opt/intel/oneapi/dev-utilities/2021.4.0/bin:/opt/intel/oneapi/debugger/10.2.4/gdb/intel64/bin:/opt/intel/oneapi/compiler/2021.4.0/linux/lib/oclfpga/llvm/aocl-bin:/opt/intel/oneapi/compiler/2021.4.0/linux/lib/oclfpga/bin:/opt/intel/oneapi/compiler/2021.4.0/linux/bin/intel64:/opt/intel/oneapi/compiler/2021.4.0/linux/bin:/opt/intel/oneapi/clck/2021.4.0/bin/intel64:/opt/intel/oneapi/advisor/2021.4.0/bin64:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/snap/bin"
declare -x PKG_CONFIG_PATH="/opt/intel/oneapi/vtune/2021.7.1/include/pkgconfig/lib64:/opt/intel/oneapi/vpl/2021.6.0/lib/pkgconfig:/opt/intel/oneapi/tbb/2021.4.0/env/../lib/pkgconfig:/opt/intel/oneapi/mpi/2021.4.0/lib/pkgconfig:/opt/intel/oneapi/mkl/2021.4.0/lib/pkgconfig:/opt/intel/oneapi/ippcp/2021.4.0/lib/pkgconfig:/opt/intel/oneapi/inspector/2021.4.0/include/pkgconfig/lib64:/opt/intel/oneapi/dpl/2021.5.0/lib/pkgconfig:/opt/intel/oneapi/dal/2021.4.0/lib/pkgconfig:/opt/intel/oneapi/compiler/2021.4.0/lib/pkgconfig:/opt/intel/oneapi/ccl/2021.4.0/lib/pkgconfig:/opt/intel/oneapi/advisor/2021.4.0/include/pkgconfig/lib64:"
declare -x PWD="/home/lluis.fita"
declare -x PYTHONPATH="/opt/intel/oneapi/advisor/2021.4.0/pythonapi"
declare -x SETVARS_COMPLETED="1"
declare -x SETVARS_VARS_PATH="/opt/intel/oneapi/vtune/latest/env/vars.sh"
declare -x SHELL="/bin/bash"
declare -x SHLVL="1"
declare -x SSH_CLIENT="157.92.36.32 41330 22"
declare -x SSH_CONNECTION="157.92.36.32 41330 157.92.28.248 22"
declare -x SSH_TTY="/dev/pts/0"
declare -x TBBROOT="/opt/intel/oneapi/tbb/2021.4.0/env/.."
declare -x TERM="xterm-256color"
declare -x USER="lluis.fita"
declare -x VTUNE_PROFILER_2021_DIR="/opt/intel/oneapi/vtune/2021.7.1"
declare -x VT_ADD_LIBS="-ldwarf -lelf -lvtunwind -lm -lpthread"
declare -x VT_LIB_DIR="/opt/intel/oneapi/itac/2021.4.0/lib"
declare -x VT_MPI="impi4"
declare -x VT_ROOT="/opt/intel/oneapi/itac/2021.4.0"
declare -x VT_SLIB_DIR="/opt/intel/oneapi/itac/2021.4.0/slib"
declare -x XDG_DATA_DIRS="/usr/local/share:/usr/share:/var/lib/snapd/desktop"
declare -x XDG_RUNTIME_DIR="/run/user/1624"
declare -x XDG_SESSION_CLASS="user"
declare -x XDG_SESSION_ID="1848"
declare -x XDG_SESSION_TYPE="tty"
declare -x http_proxy="http://proxy1.cima.fcen.uba.ar:3128/"
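
To see only what the loader script changes, rather than the whole environment, the exported variables can be compared before and after sourcing it. A minimal sketch, assuming a POSIX-like shell with mktemp and comm available; env_delta is a hypothetical helper name, not something provided by hydra or CESM:

```shell
# env_delta SCRIPT [ARGS...]: print the exported variables that are new or
# changed after sourcing SCRIPT. The sourcing happens in a subshell, so the
# calling shell is left untouched.
env_delta() {
    script=$1
    shift                                   # remaining arguments go to the script
    before=$(mktemp)
    after=$(mktemp)
    (
        export -p | sort > "$before"
        . "$script" "$@" > /dev/null 2>&1
        export -p | sort > "$after"
    )
    comm -13 "$before" "$after"             # lines present only in the "after" list
    rm -f "$before" "$after"
}
```

On hydra this could be called as env_delta /opt/load-libs.sh 1. Variables holding multi-line values may confuse the line-based comparison.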

Downloading CESM2

Following the CESM2 Quick Start Guide, the CESM2 GitHub instructions and a tutorial.

Cloning the code for version 2.2.2:

$ mkdir -p CESM/v2.2.2/intel
$ cd CESM/v2.2.2/intel
$ git clone -b release-cesm2.2.2 git@github.com:ESCOMP/CESM.git my_cesm_sandbox

This will create a directory my_cesm_sandbox/ in your current working directory.

We got the following

$ cd my_cesm_sandbox/
$ ls
ChangeLog           CODE_OF_CONDUCT.md  Externals.cfg       manage_externals
ChangeLog_template  describe_version    Externals_cime.cfg  README.rst
cime_config         doc                 LICENSE.txt

Checking out the external components and verifying that the download was fine:

$ ./manage_externals/checkout_externals
$ ls 
ChangeLog           CODE_OF_CONDUCT.md  Externals.cfg       README.rst
ChangeLog_template  components          Externals_cime.cfg
cime                describe_version    LICENSE.txt
cime_config         doc                 manage_externals
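
The listing above shows that the checkout added the cime/ and components/ directories. As a small sketch, that check can be scripted; check_sandbox is a hypothetical helper, and the list of expected directories is only taken from the listing above, not from any official CESM checklist:

```shell
# check_sandbox [DIR]: verify that the directories seen after checkout_externals
# exist under DIR (default: current directory). Returns 1 if any is missing.
check_sandbox() {
    root=${1:-.}
    missing=0
    for d in cime components manage_externals; do
        if [ ! -d "$root/$d" ]; then
            echo "missing: $root/$d" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

From inside my_cesm_sandbox/ it would be called simply as check_sandbox.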


Defining the compilation / installation

Machine

We need to create the configuration for hydra. For that purpose we use the example configuration for "centos7-linux" from the file cime/config/cesm/machines/config_machines.xml. This is the resulting entry:
  <machine MACH="hydra">
    <DESC>
      Example port to CIMA's hydra
    </DESC>
    <NODENAME_REGEX>node</NODENAME_REGEX>
    <OS>LINUX Debian</OS>
    <PROXY> https://howto.get.out </PROXY>
    <COMPILERS>gnu</COMPILERS>
    <MPILIBS>mpich</MPILIBS>
    <PROJECT>none</PROJECT>
    <SAVE_TIMING_DIR> </SAVE_TIMING_DIR>
    <CIME_OUTPUT_ROOT>$ENV{HOME}/cesm/scratch</CIME_OUTPUT_ROOT>
    <DIN_LOC_ROOT>/share/cesm/inputdata</DIN_LOC_ROOT>
    <DIN_LOC_ROOT_CLMFORC>/share/cesm/inputdata/lmwg</DIN_LOC_ROOT_CLMFORC>
    <DOUT_S_ROOT>/share/cesm/expriments/$CASE</DOUT_S_ROOT>
    <BASELINE_ROOT>$ENV{HOME}/cesm/cesm_baselines</BASELINE_ROOT>
    <CCSM_CPRNC>$ENV{HOME}/cesm/tools/cime/tools/cprnc/cprnc</CCSM_CPRNC>
    <GMAKE>make</GMAKE>
    <GMAKE_J>8</GMAKE_J>
    <BATCH_SYSTEM>pbs</BATCH_SYSTEM>
    <SUPPORTED_BY>soporte@cima.fcen.uba.ar</SUPPORTED_BY>
    <MAX_TASKS_PER_NODE>128</MAX_TASKS_PER_NODE>
    <MAX_MPITASKS_PER_NODE>128</MAX_MPITASKS_PER_NODE>
    <PROJECT_REQUIRED>FALSE</PROJECT_REQUIRED>
    <mpirun mpilib="impi">
      <executable>mpiexec</executable>
      <arguments>
        <arg name="ntasks"> -np {{ total_tasks }} </arg>
      </arguments>
    </mpirun>
    <environment_variables>
      <env name="OMP_STACKSIZE">256M</env>
    </environment_variables>
    <resource_limits>
      <resource name="RLIMIT_STACK">-1</resource>
    </resource_limits>
  </machine>

Creating the main CESM folders on hydra:

$ mkdir /share/cesm/expriments/
$ mkdir /share/cesm/inputdata

Compilers

Configuration of the compilers is done via the file cime/config/cesm/machines/config_compilers.xml.

Some modifications are introduced to make sure that the compilation goes through hydra's Intel configuration (following a CESM forum post):

$ cp cime/config/cesm/machines/config_compilers.xml cime/config/cesm/machines/config_compilers_orig.xml
$ diff cime/config/cesm/machines/config_compilers.xml cime/config/cesm/machines/config_compilers_orig.xml
1636,1690d1635
< <compiler MACH="hydra" COMPILER="intel">
<   <CFLAGS>
<     <base>  -qno-opt-dynamic-align -fp-model precise -std=gnu99 </base>
<     <append compile_threaded="TRUE"> -qopenmp </append>
<     <append DEBUG="FALSE"> -O2 -debug minimal </append>
<     <append DEBUG="TRUE"> -O0 -g </append>
<   </CFLAGS>
<   <CPPDEFS>
<     <!-- http://software.intel.com/en-us/articles/intel-composer-xe/ -->
<     <append> -DFORTRANUNDERSCORE -DCPRINTEL</append>
<   </CPPDEFS>
<   <CXX_LINKER>FORTRAN</CXX_LINKER>
<   <FC_AUTO_R8>
<     <base> -r8 </base>
<   </FC_AUTO_R8>
<   <FFLAGS>
<     <base> -qno-opt-dynamic-align  -convert big_endian -assume byterecl -ftz -traceback -assume realloc_lhs -fp-model source  </base>
<     <append compile_threaded="TRUE"> -qopenmp </append>
<     <append DEBUG="TRUE"> -O0 -g -check uninit -check bounds -check pointers -fpe0 -check noarg_temp_created </append>
<     <append DEBUG="FALSE"> -O2 -debug minimal </append>
<   </FFLAGS>
<   <FFLAGS_NOOPT>
<     <base> -O0 </base>
<   </FFLAGS_NOOPT>
<   <NETCDF_C_PATH>/opt/netcdf/netcdf-4/intel/2021.4.0</NETCDF_C_PATH>
<   <NETCDF_FORTRAN_PATH>/opt/netcdf/netcdf-4/intel/2021.4.0/lib</NETCDF_FORTRAN_PATH>
<   <FIXEDFLAGS>
<     <base> -fixed  </base>
<   </FIXEDFLAGS>
<   <FREEFLAGS>
<     <base> -free </base>
<   </FREEFLAGS>
<   <LDFLAGS>
<     <append compile_threaded="TRUE"> -qopenmp </append>
<   </LDFLAGS>
<   <MPICC> /opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpicc  </MPICC>
<   <MPICXX> /opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpicxx </MPICXX>
<   <MPIFC> /opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpif90 </MPIFC>
<   <SCC> /opt/intel/oneapi/compiler/2021.4.0/linux/bin/intel64/icc </SCC>
<   <SCXX> /opt/intel/oneapi/compiler/2021.4.0/linux/bin/intel64/icpc </SCXX>
<   <SFC> /opt/intel/oneapi/compiler/2021.4.0/linux/bin/intel64/ifort </SFC>
<   <SLIBS>
<     <append MPILIB="mpich"> -mkl=cluster </append>
<     <append MPILIB="mpich2"> -mkl=cluster </append>
<     <append MPILIB="mvapich"> -mkl=cluster </append>
<     <append MPILIB="mvapich2"> -mkl=cluster </append>
<     <append MPILIB="mpt"> -mkl=cluster </append>
<     <append MPILIB="openmpi"> -mkl=cluster </append>
<     <append MPILIB="impi"> -mkl=cluster </append>
<     <append MPILIB="mpi-serial"> -mkl </append>
<     <append>-L$(NETCDF_C_PATH)/lib -L$(NETCDF_FORTRAN_PATH)/lib -lnetcdff -lnetcdf -L$ENV{MKLROOT} -lmkl_rt </append>
<   </SLIBS>
<   <SUPPORTS_CXX>TRUE</SUPPORTS_CXX>
< </compiler>
< 

PIO

Configuration of the Parallel I/O Library (PIO) used for the input/output of the model is done with the file cime/config/cesm/machines/config_pio.xml.

For now it is left as provided.

Batch

Configuration of the batch system is done with the file cime/config/cesm/machines/config_batch.xml

A new configuration has been added for hydra's PBS queue system:

$ cp cime/config/cesm/machines/config_batch.xml cime/config/cesm/machines/config_batch_orig.xml
$ diff cime/config/cesm/machines/config_batch.xml cime/config/cesm/machines/config_batch_orig.xml
698,728d697
<   <batch_system MACH="hydra" type="pbs" >
<     <batch_query args="-f" >qstat</batch_query>
<     <batch_submit>qsub </batch_submit>
<     <batch_cancel>qdel</batch_cancel>
<     <batch_env>-v</batch_env>
<     <batch_directive>#PBS</batch_directive>
<     <jobid_pattern>^(\S+)$</jobid_pattern>
<     <depend_string> -W depend=afterok:jobid</depend_string>
<     <depend_allow_string> -W depend=afterany:jobid</depend_allow_string>
<     <depend_separator>:</depend_separator>
<     <walltime_format>%H:%M:%S</walltime_format>
<     <batch_mail_flag>-M</batch_mail_flag>
<     <batch_mail_type_flag>-m</batch_mail_type_flag>
<     <batch_mail_type>, bea, b, e, a</batch_mail_type>
<     <submit_args>
<       <arg flag="-q" name="$JOB_QUEUE"/>
<       <arg flag="-l walltime=" name="$JOB_WALLCLOCK_TIME"/>
<       <arg flag="-A" name="$PROJECT"/>
<     </submit_args>
<     <directives>
<       <directive>-N {{ job_id }}</directive>
<       <directive default="n"> -r {{ rerunnable }} </directive>
<       <!-- <directive> -j oe {{ job_id }} </directive> -->
<       <directive> -j oe </directive>
<       <directive> -V </directive>
<     </directives>
<     <queues>
<       <queue walltimemin="" walltimemax="168:00:00" nodemin="0" nodemax="5" default="true">larga</queue>
<     </queues>
<   </batch_system>
< 

Work-flow

Configuration of the work-flow is done via the file cime/config/cesm/machines/config_workflow.xml

For now it is left as it is.


Case creation

Prior to the compilation, we need to make sure that all the optional additional components have already been compiled (see next section). After that, we use cime/scripts/create_newcase to create the case from which the model is compiled and run. The model is compiled for each new experiment, since the user might activate different components each time, and compiling only the required components makes the simulation more efficient.

./create_newcase --case [CaseName] --res [Resolution] --compset [Compset] --mach hydra 

Where:

  • [CaseName]: name of the experiment, following the CESM Naming conventions web-page
  • [Resolution]: one of the available CESM grids (the link is not working; alternatively we can use: python3 cime/scripts/query_config --grids --full)
  • [Compset]: components activated
  • hydra: HPC to use

$ source /opt/load-libs.sh 1
$ cd cime/scripts/
$ ./create_newcase --case /share/cesm/expriments/b.day1.0 --res f19_g17 --compset B1850 --mach hydra >& run_create_newcase.log
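
Since the output is redirected to a log file, failures are easy to miss. The following is a small sketch for scanning such a log; scan_log is a hypothetical helper, and the pattern only covers lines that start with ERROR: or WARNING: as in the CIME messages quoted in these notes, so other failures (for example a missing external tool) still need a look at the log itself:

```shell
# scan_log FILE: print (with line numbers) the lines starting with ERROR: or
# WARNING:. Returns 1 when any such line is found, 0 otherwise.
scan_log() {
    log=$1
    # -n prefixes each hit with its line number; -E enables the alternation.
    if grep -n -E '^(ERROR|WARNING):' "$log"; then
        return 1
    fi
    echo "no ERROR or WARNING lines in $log"
}
```

For the command above it would be called as scan_log run_create_newcase.log.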

Errors found

Note that the error messages are not very informative (for now).

Python error

/usr/bin/env: ‘python’: No such file or directory

Fixed by imposing python3

$ cp create_newcase create_newcase_orig
$ diff create_newcase create_newcase_orig
1c1
< #!/usr/bin/env python3
---
> #!/usr/bin/env python

Checking the file run_create_newcase.log for correct execution, an error message is detected:

xmllint not found,...

This seems to be related to the absence of the package libxml2-utils, which provides xmllint.

New error:

Compset specific settings: name is RUN_STARTDATE and value is 0001-01-01
ERROR: Command: '/usr/bin/xmllint --xinclude --noout --schema /home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/cime/config/xml_schemas/co
nfig_machines.xsd /home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/cime/config/cesm/machines/config_machines.xml' failed with error '/hom
e/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/cime/config/cesm/machines/config_machines.xml:2776: element machine: Schemas validity error 
: Element 'machine': Missing child element(s). Expected is one of ( mpirun, module_system ).
/home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/cime/config/cesm/machines/config_machines.xml fails to validate' from dir '/home/lluis.f
ita/CESM/v2.2.2/intel/my_cesm_sandbox/cime/scripts'

The machine configuration for hydra did not state that the system has no modules, so this was added:

(...)
    </mpirun>
    <module_system type="none"/>
(...)

Queue-system related error:

Batch_system_type is pbs
job is case.run USER_REQUESTED_WALLTIME None USER_REQUESTED_QUEUE None WALLTIME_FORMAT %H:%M:%S
WARNING: No queue on this system met the requirements for this job. Falling back to defaults
ERROR: No queues found

No queue had been defined for hydra. It has now been added to cime/config/cesm/machines/config_batch.xml

Case setup

Now we are ready to set-up the case.

Going to the folder with the case (the $CASEROOT folder, as CIME calls it):

$ cd /share/cesm/expriments/b.day1.0
$ ./case.setup >& run_case-setup.log

Looking for errors:

$ tail run_case-setup.log
You can now run './preview_run' to get more info on how your case will be run

Which provides the following configuration:

$ ./preview_run >& run_preview_run.log
$ cat run_preview_run.log 
CASE INFO:
  nodes: 6
  total tasks: 768
  tasks per node: 128
  thread count: 1

BATCH INFO:
  FOR JOB: case.run
    ENV:
      Setting Environment OMP_NUM_THREADS=1

    SUBMIT CMD:
      qsub -q larga -l walltime=168:00:00 -A none -q larga -l walltime=168:00:00 -A none -v ARGS_FOR_SCRIPT='--resubmit' .case.run

    MPIRUN (job=case.run):
      mpirun  -np 768  /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe   >> cesm.log.$LID 2>&1 

  FOR JOB: case.st_archive
    ENV:
      Setting Environment OMP_NUM_THREADS=1

    SUBMIT CMD:
      qsub -q larga -l walltime=00:20:00 -A none -q larga -l walltime=00:20:00 -A none  -W depend=afterok:0 -v ARGS_FOR_SCRIPT='--resubmit' case.st_archive
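
The node count in the preview follows from the task layout: the total number of tasks divided by the tasks per node, rounded up. A minimal sketch of that arithmetic (nodes_needed is a hypothetical helper, not a CIME tool):

```shell
# nodes_needed TASKS PER_NODE: ceiling of TASKS / PER_NODE using integer arithmetic
nodes_needed() {
    tasks=$1
    per_node=$2
    echo $(( (tasks + per_node - 1) / per_node ))
}

nodes_needed 768 128   # prints 6
```

Note that the larga queue defined in config_batch.xml above declares nodemax="5", while this layout asks for 6 nodes; it may be worth checking whether the PE layout or the queue limit needs adjusting.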

errors

First error related to python...

$ ./case.setup 
/usr/bin/env: ‘python’: No such file or directory

Modifying it to use python3

$ cp case.setup case.setup_orig
$ diff case.setup case.setup_orig
1c1
< #!/usr/bin/env python3
---
> #!/usr/bin/env python

This happens everywhere, so either I ask my IT team (as root) to fix it by creating a python symlink that points to python3 (see here)

# ln -s /usr/bin/python3 /usr/bin/python

Or I go back to the cime/scripts folder and change it everywhere: create_clone, create_newcase, create_test, query_config, query_testlists

The same modification is applied to all the python scripts within cime/scripts/Tools:

$ cp -R Tools Tools_orig

Scripts being modified: Tools/archive_metadata, Tools/bld_diff, Tools/bless_test_results, Tools/case.build, Tools/case.cmpgen_namelists, Tools/case_diff, Tools/case.qstatus, Tools/case.setup, Tools/case.submit, Tools/check_case, Tools/check_input_data, Tools/check_lockedfiles, Tools/cime_bisect, Tools/code_checker, Tools/compare_namelists, Tools/compare_test_results, Tools/component_compare_baseline, Tools/component_compare_copy, Tools/component_compare_test, Tools/component_generate_baseline, Tools/cs.status, Tools/e3sm_check_env, Tools/generate_cylc_workflow.py, Tools/get_case_env, Tools/get_standard_makefile_args, Tools/getTiming, Tools/jenkins_generic_job, Tools/list_e3sm_tests, Tools/mvsource, Tools/normalize_cases, Tools/pelayout, Tools/preview_namelists, Tools/preview_run, Tools/save_provenance, Tools/simple_compare, Tools/testreporter.py, Tools/wait_for_tests, Tools/xmlchange, Tools/xmlquery

Also inside folder Tools/xmlconvertors: Tools/xmlconvertors/config_pes_converter.py, Tools/xmlconvertors/grid_xml_converter.py, Tools/xmlconvertors/convert-grid-v1-to-v2
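
A third option, which needs neither root access nor edits to the CESM scripts, is a user-level python link placed in a directory that comes first in PATH, since /usr/bin/env resolves python through PATH. This is only a sketch of that idea and was not tried on hydra; make_python_shim and the directory used are hypothetical, and the link only helps in batch jobs if they inherit the same PATH:

```shell
# make_python_shim BINDIR [TARGET]: create BINDIR/python as a symlink to TARGET
# (default: the python3 found on PATH).
make_python_shim() {
    bindir=$1
    target=${2:-$(command -v python3)}
    if [ -z "$target" ]; then
        echo "python3 not found on PATH" >&2
        return 1
    fi
    mkdir -p "$bindir"
    ln -sf "$target" "$bindir/python"
    echo "add to PATH: export PATH=\"$bindir:\$PATH\""
}
```

For example: make_python_shim "$HOME/bin", followed by the export line it prints.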

Case Build

Compiling the code for the case. Before building, we clean any previous build, just in case:

$ ./case.build --clean
$ ./case.build >& run_case-build.log

Looking for errors in the build log and, after successful compilation, verifying the presence of all the required input data:

$ tail run_case-build.log
(...)
siac built in 1.117004 seconds
sesp built in 1.145180 seconds
cam built in 1.169075 seconds
Component glc build complete with 2 warnings
cism built in 233.087502 seconds
Building cesm from /home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/cime/src/drivers/mct/cime_config/buildexe with output to /home/lluis.
fita/cesm/scratch/b.day1.0/bld/cesm.bldlog.240516-151609 
Time spent not building: 6.916887 sec
Time spent building: 260.266792 sec
MODEL BUILD HAS FINISHED SUCCESSFULLY
$ ./check_input_data --download >& run_check_input_data.log
(...)

Model cpl missing file wav2ocn_smapname = '/share/cesm/inputdata/cpl/gridmaps/ww3a/map_ww3a_TO_gx1v7_splice_170214.nc'
Trying to download file: 'cpl/gridmaps/ww3a/map_ww3a_TO_gx1v7_splice_170214.nc' to path '/share/cesm/inputdata/cpl/gridmaps/ww3a/map_ww3a_T
O_gx1v7_splice_170214.nc' using WGET protocol.
SUCCESS

Errors

There are errors related to the use of python

$ tail run_case-build.log 
(...)
Building lnd with output to /home/lluis.fita/cesm/scratch/b.day1.0/bld/lnd.bldlog.240515-143236
clm built in 0.009455 seconds
ERROR: BUILD FAIL: clm.buildlib failed, cat /home/lluis.fita/cesm/scratch/b.day1.0/bld/lnd.bldlog.240515-143236

Looking into:

cat /home/lluis.fita/cesm/scratch/b.day1.0/bld/lnd.bldlog.240515-143236
/usr/bin/env: ‘python’: No such file or directory
(...)
ERROR: Command /home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/components/pop/bld/build-namelist failed rc=2
out=ERROR in MARBL_diags_to_tavg.py
err=/usr/bin/env: ‘python’: No such file or directory
ERROR: env CASEROOT=/share/cesm/expriments/b.day1.0 CASEBUILD=/share/cesm/expriments/b.day1.0/Buildconf OCN_GRID=gx1v7OCN_TAVG_TRAC
ER_BUDGET=FALSE POPROOT=/home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox/components/pop /home/lluis.fita/CESM/v2.2.2/intel/my_ces
m_sandbox/components/pop/input_templates/ocn.ecosys.tavg.csh 4 .false. .true. failed: 256

Various python scripts had to be edited to make sure they use python3. To find them, I executed the following from $ROOTCESM (the root of the CESM sandbox):

$ head [place]/* | grep -B 2 'env python' | grep -v directory >& py.log
$ cat py.log

where [place] is an increasingly deep position within the folder structure of CESM: ./, *, */*, */*/*, */*/*/*, */*/*/*/*, */*/*/*/*/*, ...

For example

$ head ./* | grep -B 2 'env python' | grep -v directory >& py.log
head: error reading './cime': Is a directory
head: error reading './cime_config': Is a directory
head: error reading './components': Is a directory
head: error reading './components_orig': Is a directory
head: error reading './doc': Is a directory
head: error reading './manage_externals': Is a directory
$ cat py.log

==> ./describe_version <==
#!/usr/bin/env python3

Or even better:

$ grep -i python [place]/* | grep env | grep -v orig  | grep -v python3 >& py.log

components/pop/MARBL_scripts/add_cocco_to_init.py, components/pop/MARBL_scripts/MARBL_diags_to_tavg.py, cime/src/components/stub_comps_mct/siac/cime_config/buildlib, cime/src/components/stub_comps_mct/siac/cime_config/buildlib_cmake, cime/src/components/stub_comps_mct/siac/cime_config/buildnml

Modifying more python scripts in cime/src/build_scripts: cime/src/build_scripts/buildlib.cprnc, cime/src/build_scripts/buildlib.csm_share, cime/src/build_scripts/buildlib.gptl, cime/src/build_scripts/buildlib.kokkos, cime/src/build_scripts/buildlib.mct, cime/src/build_scripts/buildlib.mpi-serial, cime/src/build_scripts/buildlib.pio

Modifying more python scripts: components/clm/bld/namelist_files/createMkSrfEntry.py, components/clm/run_sys_tests, components/cam/cime_config/buildcpp, components/cam/cime_config/buildlib, components/cam/cime_config/buildnml, components/cam/manage_externals/checkout_externals, components/cdeps/cime_config/buildlib, components/cice/cime_config/buildcpp, components/cice/cime_config/buildlib, components/cice/cime_config/buildnml, components/cism/manage_externals/checkout_externals, components/clm/cime_config/buildlib, components/clm/cime_config/buildnml, components/clm/manage_externals/checkout_externals, components/clm/python/run_ctsm_py_tests, components/mosart/cime_config/buildlib, components/mosart/cime_config/buildnml, components/pop/cime_config/phys_cycle_postrun, components/pop/cime_config/phys_cycle_prerun, components/pop/cime_config/buildcpp, components/pop/cime_config/buildlib, components/pop/cime_config/buildnml, components/rtm/cime_config/buildlib, components/rtm/cime_config/buildnml, components/ww3/cime_config/buildlib, components/ww3/cime_config/buildnml

Also: components/clm/python/ctsm/test/test_sys_lilac_build_ctsm.py, components/clm/python/ctsm/test/test_unit_lilac_build_ctsm.py, components/clm/python/ctsm/test/test_unit_lilac_make_runtime_inputs.py, components/clm/python/ctsm/test/test_unit_machine.py, components/clm/python/ctsm/test/test_unit_path_utils.py, components/clm/python/ctsm/test/test_unit_run_sys_tests.py, components/clm/python/ctsm/test/test_unit_utils.py, components/clm/src/fates/tools/FatesPFTIndexSwapper.py, components/clm/src/fates/tools/modify_fates_paramfile.py, components/clm/src/fates/tools/ncvarsort.py, components/pop/externals/CVMix/bld/cvmix_setup

Another set of modified files: cime/config/cesm/machines/template.case.test, cime/config/cesm/machines/template.st_archive, cime/config/ufs/machines/template.case.run, cime/config/ufs/machines/template.case.test, cime/config/ufs/machines/template.st_archive, cime/scripts/lib/CIME/BuildTools/configure.py, cime/scripts/lib/CIME/case/case_submit.py
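
The search for scripts that still carry an old python shebang can also be done recursively, looking only at the first line of each file. This is a minimal sketch, not the procedure actually used for the lists above; the function name list_old_python_shebangs is hypothetical, and files with "orig" in their name are skipped following the backup convention of these notes.

```shell
# List files under a directory whose first line is a python shebang
# that does not already point to python3.
# Usage: list_old_python_shebangs DIR
list_old_python_shebangs() {
    find "$1" -type f ! -name '*orig*' | sort | while IFS= read -r f; do
        # Read only the first line; drop NUL bytes in case the file is binary
        first=$(head -n 1 "$f" 2>/dev/null | tr -d '\000')
        case "$first" in
            '#!'*python3*) ;;                     # already python3, nothing to do
            '#!'*python*)  printf '%s\n' "$f" ;;  # python or python2 shebang
        esac
    done
}
```

For example, `list_old_python_shebangs components/cism` would print one candidate file per line, which can then be reviewed before editing.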

There is an error during the building of the CISM component. The output of case.build:

(...)
- Building clm library
Building lnd with output to
/home/lluis.fita/cesm/scratch/b.day1.0/bld/lnd.bldlog.240515-155359
Component lnd build complete with 6 warnings
clm built in 184.188484 seconds
- Building atm Library
Building atm with output to
/home/lluis.fita/cesm/scratch/b.day1.0/bld/atm.bldlog.240515-155359
- Building ice Library
Building ice with output to
/home/lluis.fita/cesm/scratch/b.day1.0/bld/ice.bldlog.240515-155359
- Building ocn Library
Building ocn with output to
/home/lluis.fita/cesm/scratch/b.day1.0/bld/ocn.bldlog.240515-155359
- Building rof Library
Building rof with output to
/home/lluis.fita/cesm/scratch/b.day1.0/bld/rof.bldlog.240515-155359
- Building glc Library
Building glc with output to
/home/lluis.fita/cesm/scratch/b.day1.0/bld/glc.bldlog.240515-155359
- Building wav Library
Building wav with output to
/home/lluis.fita/cesm/scratch/b.day1.0/bld/wav.bldlog.240515-155359
- Building iac Library
Building iac with output to
/home/lluis.fita/cesm/scratch/b.day1.0/bld/iac.bldlog.240515-155359
- Building esp Library
Building esp with output to
/home/lluis.fita/cesm/scratch/b.day1.0/bld/esp.bldlog.240515-155359
sesp built in 1.537159 seconds
siac built in 1.544745 seconds
ifort: error #10236: File not found:
'/home/lluis.fita/cesm/scratch/b.day1.0/bld/glc/fortran_autogen_srcs/glide_io.F90'

ifort: error #10236: File not found:
'/home/lluis.fita/cesm/scratch/b.day1.0/bld/glc/fortran_autogen_srcs/glint_io.F90'
cism built in 35.435328 seconds
mosart built in 42.410334 seconds
ww built in 54.594975 seconds
Component ice build complete with 1 warnings
cice built in 81.223358 seconds
Component ocn build complete with 13 warnings
pop built in 145.569954 seconds
Component atm build complete with 14 warnings
cam built in 229.021996 seconds
ERROR: BUILD FAIL: cism.buildlib failed, cat
/home/lluis.fita/cesm/scratch/b.day1.0/bld/glc.bldlog.240515-155359

These files, which should be created automatically, are not being written:

bld/glc/fortran_autogen_srcs/glide_io.F90
bld/glc/fortran_autogen_srcs/glint_io.F90

In the case directory one has the file Buildconf/cismIOconf/cism.buildIO.csh, with the following content:

(...)
   #  create new _io.F90 file using CISM's python script
   #
---------------------------------------------------------------------------
   $PYTHON generate_ncvars.py $file_varsdef ncdf_template.F90.in

The script expects an environment variable PYTHON that points to the python interpreter. In the CESM root directory, the file components/cism/bld/cismIO/README.cismIO states:

This directory and its scripts are intended to allow the user to change IO fields from the CISM code.  The CISM IO files, *_io.F90, are
auto-generated and typically difficult to modify.  However, the corresponding variable definition files, *_vars.def, are easily modified
and the IO files can be re-generated by running the cism.buildIO.csh script contained in this directory.

Usage of this script requires that the user has defined an enviroment variable PYTHON pointing to a local version of python,  After that, 
the user simply runs the enclosed cism.buildIO.csh script, which runs a python script on each
(...)

Therefore, PYTHON must either be found within the cime/config/cesm/machines/config_*.xml files or be defined manually. Defining it in the shell before the build does not work:

$ export PYTHON=/usr/bin/python3
$ echo $PYTHON
/usr/bin/python3
$ ./case.build --clean
$ ./case.build

Therefore, it would have to be defined within Buildconf/cismIOconf/cism.buildIO.csh (that is, in the template of this shell script). Looking in $ROOTCESM at components/cism/source_cism/utils/build/generate_ncvars.py shows that it uses python2, which hydra does not support:

#!/usr/bin/env python2

The shebang is changed to:

#!/usr/bin/env python3

Two other files also need to be modified: components/cism/source_cism/utils/build/autogenerate-in-build-dir and components/cism/source_cism/utils/build/autogen-for-glint-and-glad-in-build-dir. Every call to python in them is replaced by python3:

$ cat components/cism/source_cism/utils/build/autogenerate-in-build-dir | grep python
# Call python script with source file arguments
python3 -V
python3 $CISM_SOURCE_DIR/utils/build/generate_ncvars.py $GLIDE_VARS_PATH $NCDF_TEMPL_PATH
$ cat components/cism/source_cism/utils/build/autogen-for-glint-and-glad-in-build-dir | grep python
# Call python script with source file arguments
python3 -V
python3 $CISM_SOURCE_DIR/utils/build/generate_ncvars.py $GLINT_VARS_PATH $NCDF_TEMPL_PATH
python3 $CISM_SOURCE_DIR/utils/build/generate_ncvars.py $GLINT_MBAL_PATH $NCDF_TEMPL_PATH
python3 $CISM_SOURCE_DIR/utils/build/generate_ncvars.py $GLAD_VARS_PATH $NCDF_TEMPL_PATH
python3 $CISM_SOURCE_DIR/utils/build/generate_ncvars.py $GLAD_MBAL_PATH $NCDF_TEMPL_PATH
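
The shebang change described above can be scripted. The following is a minimal sketch under stated assumptions: the function name use_python3_shebang and the `_orig` backup suffix are illustrative choices (the suffix mirrors the config_compilers_orig.xml convention of these notes), and only the first line of each file is touched.

```shell
# Point the shebang of each given script to python3, keeping a *_orig copy.
# Only line 1 is rewritten, and only when it ends in "python" or "python2".
use_python3_shebang() {
    for f in "$@"; do
        cp -p "$f" "${f}_orig"
        # Writing back into "$f" keeps its original permissions
        sed '1s|^#!\(.*\)python2\{0,1\}[[:space:]]*$|#!\1python3|' "${f}_orig" > "$f"
    done
}
```

Usage would be, for instance, `use_python3_shebang components/cism/source_cism/utils/build/generate_ncvars.py`. Bare `python` calls inside shell scripts, like the ones listed above, are not covered by this sketch and still need to be edited by hand.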

Case send

Submitting the case:

$ ./case.submit >& run_case-submit.log

Checking that it is running

$ qstat -u $USER

hydra: 
                                                                                  Req'd       Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID  NDS   TSK   Memory      Time    S   Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
43629.hydra             lluis.fita  larga    run.b.day1.0        721     1      1       --  168:00:00 R  00:00:51
43630.hydra             lluis.fita  larga    st_archive.b.day    --      1      1       --   00:20:00 H       -- 
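
To follow only the state of the jobs, the qstat listing can be reduced to two columns. This is a small sketch that assumes the 11-column layout shown above, with the state (R, H, C, ...) in the 10th field; the function name qstat_states is hypothetical.

```shell
# Print "jobid state" for every job line of `qstat -u USER` output read on stdin.
# Header and separator lines are skipped because they do not start with a job number.
qstat_states() {
    awk 'NF == 11 && $1 ~ /^[0-9]+\./ { print $1, $10 }'
}
# Usage: qstat -u "$USER" | qstat_states
```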

Errors

mpirun not found

The submission fails with:

$ cat run.b.day1.0.o43627
(...)
run command is mpirun  -np 768  /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe   >> cesm.log.$LID 2>&1  
ERROR: RUN FAIL: Command 'mpirun  -np 768  /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe   >> cesm.log.$LID 2>&1 ' failed
See log file for details: /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43627.hydra.240516-165514
$ cat /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43627.hydra.240516-165514
/bin/sh: 1: mpirun: not found

Redefined mpirun in cime/config/cesm/machines/config_machines.xml as:

      <executable>/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpirun</executable>

If the case has already been created (but not yet submitted), it can be changed directly in the file env_mach_specific.xml in the case directory.

hydra's compilation environment

The simulation finished too early:

$ qstat -u $USER

hydra: 
                                                                                  Req'd       Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID  NDS   TSK   Memory      Time    S   Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
43629.hydra             lluis.fita  larga    run.b.day1.0        721     1      1       --  168:00:00 C       -- 
43630.hydra             lluis.fita  larga    st_archive.b.day    --      1      1       --   00:20:00 C       -- 

Looking in the logs:

$ cat run.b.day1.0.o43629 
Generating namelists for /share/cesm/expriments/b.day1.0
Creating component namelists
(...)
-------------------------------------------------------------------------
run command is /opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpirun  -np 768  /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe   >> cesm.log.$LID 2>&1  
tm_poll: got event 1 return 0
tm_poll: INIT nodes 1
tm_poll: INIT daddy jobid 43629.hydra
tm_poll: INIT daddy node 0
tm_poll: INIT daddy tid 1
new_task: jobid=43629.hydra node=0 task=1
new_task: jobid=43629.hydra node=0 task=2
ERROR: RUN FAIL: Command '/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpirun  -np 768  /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe   >> cesm.log.$LID 2>&1 ' failed
See log file for details: /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43629.hydra.240516-170533

$ cat /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43629.hydra.240516-170533
(...)
Invalid character . in PBS_JOBID
tm_poll: got event 1 return 0
tm_poll: INIT nodes 1
tm_poll: INIT daddy jobid 43629.hydra
tm_poll: INIT daddy node -1
tm_poll: INIT daddy tid 0
new_task: jobid=43629.hydra node=-1 task=0
new_task: called with TM_ERROR_NODE
new_task: jobid=43629.hydra node=0 task=1

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 1006 RUNNING AT node43
=   EXIT CODE: 9
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
tm_poll: got event 2 return 0
new_task: jobid=43629.hydra node=0 task=2
tm_poll: got event 3 return 0
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

And also:

$ cat CaseStatus 
2024-05-16 11:52:51: case.setup starting 
 ---------------------------------------------------
2024-05-16 11:52:52: case.setup success 
 ---------------------------------------------------
2024-05-16 11:55:32: case.build starting 
 ---------------------------------------------------
2024-05-16 12:04:41: case.build error 
ERROR: BUILD FAIL: cism.buildlib failed, cat /home/lluis.fita/cesm/scratch/b.day1.0/bld/glc.bldlog.240516-115532
 ---------------------------------------------------
2024-05-16 15:12:59: case.build starting 
 ---------------------------------------------------
2024-05-16 15:13:16: case.build error 
ERROR: BUILD FAIL: cism.buildlib failed, cat /home/lluis.fita/cesm/scratch/b.day1.0/bld/glc.bldlog.240516-151259
 ---------------------------------------------------
2024-05-16 15:16:09: case.build starting 
 ---------------------------------------------------
CESM version is release-cesm2.2.2
Processing externals description file : Externals.cfg (/home/lluis.fita/CESM/v2.2.2/intel/my_cesm_sandbox)
(...)
Checking local status of required & optional components: cam, chem_proc, carma, cosp2, clubb, silhs, pumas, atmos_phys, 
atmos_cubed_sphere, cice, cdeps, fox, cime, cmeps, cism, source_cism, clm, fates, fms, mom, mosart, pop, cvmix, marbl, 
rtm, ww3, 
 M  ./cime
        modified sandbox, on cime5.8.32.9
e-o ./cime/src/drivers/nuopc/
        -, not checked out --> 
 M  ./components/cam
        modified sandbox, on cam_cesm2_2_rel_09
    ./components/cam/chem_proc
        clean sandbox, on tools/proc_atm/chem_proc/release_tags/chem_proc5_0_04
    ./components/cam/src/atmos_phys
        clean sandbox, on version0_00_007
    ./components/cam/src/dynamics/fv3/atmos_cubed_sphere
        clean sandbox, on fv3_cesm.04
    ./components/cam/src/physics/carma/base
        clean sandbox, on carma/release_tags/carma3_49_rel
    ./components/cam/src/physics/clubb
        clean sandbox, on clubb_release_b76a124_20200220_c20200320
    ./components/cam/src/physics/cosp2/src
        clean sandbox, on v2.1.4cesm
    ./components/cam/src/physics/pumas
        clean sandbox, on pumas_cam-release_v1.3
    ./components/cam/src/physics/silhs
        clean sandbox, on silhs_clubb_release_b76a124_20200220_c20200320
 M  ./components/cdeps
        modified sandbox, on d808b7c6f78a2d5dcfeb1da0d1a452a9b66e08c8
    ./components/cdeps/fox
        clean sandbox, on 7b9488446b193192dd3f0378541e71099cb4e8a8
 M  ./components/cice
        modified sandbox, on cice5-cesm2.2.2-20231220
 M  ./components/cism
        modified sandbox, on cism2_1_69_b
 M  ./components/cism/source_cism
        modified sandbox, on release-cesm2.2.2-f1a88d6derecho
 M  ./components/clm
        modified sandbox, on release-cesm2.2.04
 M  ./components/clm/src/fates
        modified sandbox, on sci.1.30.0_api.8.0.0
e-o ./components/mom
        -, not checked out --> mi_20200908
 M  ./components/mosart
        modified sandbox, on mosart1_0_37_1
 M  ./components/pop
        modified sandbox, on pop2_cesm2_2_rel_n01
 M  ./components/pop/externals/CVMix
        modified sandbox, on v0.98-beta
    ./components/pop/externals/MARBL
        clean sandbox, on cesm2.2-n00
 M  ./components/rtm
        modified sandbox, on rtm1_0_72
 M  ./components/ww3
        modified sandbox, on ww3_200710
e-o ./libraries/FMS
        -, not checked out --> fi_20200609_cesm2.2_231205
2024-05-16 15:20:36: case.build success 
 ---------------------------------------------------
2024-05-16 16:55:05: case.submit starting 
 ---------------------------------------------------
2024-05-16 16:55:12: case.submit error 
ERROR: Command: 'qsub -q larga -l walltime=00:20:00 -A none -q larga -l walltime=00:20:00 -A none  -W depend=afterok:43627.hydra -v ARGS_FOR_SCRIPT='--resubmit' case.st_archive' failed with error 'qsub: submit error (Invalid request)' from dir '/share/cesm/expriments/b.day1.0'
 ---------------------------------------------------
2024-05-16 16:55:14: case.run starting 
 ---------------------------------------------------
2024-05-16 16:55:41: model execution starting 
 ---------------------------------------------------
2024-05-16 16:55:41: model execution error 
ERROR: Command: 'mpirun  -np 768  /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe   >> cesm.log.$LID 2>&1 ' failed with error '' from dir '/home/lluis.fita/cesm/scratch/b.day1.0/run'
 ---------------------------------------------------
2024-05-16 16:55:41: case.run error 
ERROR: RUN FAIL: Command 'mpirun  -np 768  /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe   >> cesm.log.$LID 2>&1 ' failed
See log file for details: /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43627.hydra.240516-165514
 ---------------------------------------------------
2024-05-16 17:05:25: case.submit starting 
 ---------------------------------------------------
2024-05-16 17:05:33: case.submit error 
ERROR: Command: 'qsub -q larga -l walltime=00:20:00 -A none -q larga -l walltime=00:20:00 -A none  -W depend=afterok:43629.hydra -v ARGS_FOR_SCRIPT='--resubmit' case.st_archive' failed with error 'qsub: submit error (Invalid request)' from dir '/share/cesm/expriments/b.day1.0'
 ---------------------------------------------------
2024-05-16 17:05:33: case.run starting 
 ---------------------------------------------------
2024-05-16 17:05:43: model execution starting 
 ---------------------------------------------------
2024-05-16 17:06:34: model execution error 
ERROR: Command: '/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpirun  -np 768  /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe   >> cesm.log.$LID 2>&1 ' failed with error '' from dir '/home/lluis.fita/cesm/scratch/b.day1.0/run'
 ---------------------------------------------------
2024-05-16 17:06:34: case.run error 
ERROR: RUN FAIL: Command '/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpirun  -np 768  /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe   >> cesm.log.$LID 2>&1 ' failed
See log file for details: /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43629.hydra.240516-170533
 ---------------------------------------------------
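
Since CaseStatus grows quickly, the failing steps can be extracted on their own. A minimal sketch, assuming the "date time: step error" line format seen in the listing above; the function name casestatus_errors is hypothetical.

```shell
# Print "date time step" for every step that ended in error,
# reading a CaseStatus file on stdin.
casestatus_errors() {
    sed -n -E 's/^([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:]{8}): (.*) error *$/\1 \2/p'
}
# Usage: casestatus_errors < CaseStatus
```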

I suspect that it is related to the compilation environment. hydra does not have the module system; the compilation environment is set up via the shell script /opt/load-libs.sh. There must be a way to introduce it systematically in config_batch.xml so that it is executed in all PBS jobs. A new post was created in the CESM forum (#post-55530).

From another attempt:

$ cat /home/lluis.fita/cesm/scratch/b.day1.0/run/cesm.log.43633.hydra.240517-155125 | more
/home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe: error while loading shared libraries: libnetcdf.so.19: cannot open shared object file: No such file or directory
/home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe: error while loading shared libraries: libnetcdf.so.19: cannot open shared object file: No such file or directory
/home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe: error while loading shared libraries: libnetcdf.so.19: cannot open shared object file: No such file or directory
/home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe: error while loading shared libraries: libnetcdf.so.19: cannot open shared object file: No such file or directory
/home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe: error while loading shared libraries: libnetcdf.so.19: cannot open shared object file: No such file or directory
(...)
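
Whether the executable can find its shared libraries in a given environment can be checked before submitting, with ldd. This is a minimal sketch; the function name missing_libs is hypothetical, and the check only reflects the environment of the shell where it is run, which may differ from the one inside the PBS job.

```shell
# Print the shared libraries that the dynamic loader cannot resolve for a binary.
# Prints nothing when every dependency is found in the current environment.
missing_libs() {
    ldd "$1" 2>/dev/null | awk '/not found/ { print $1 }'
}
# Usage (path taken from the log above):
#   missing_libs /home/lluis.fita/cesm/scratch/b.day1.0/bld/cesm.exe
```

Running it once in a plain shell and once after `source /opt/load-libs.sh 1` should show whether libnetcdf.so.19 is only reachable with hydra's intel environment loaded.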

The next attempt uses the variable PRERUN_SCRIPT in the file env_run.xml of the case directory; see more details (searching for it here)

Trying different options:

    <entry id="PRERUN_SCRIPT">
      <type>char</type>
      <desc>External script to be run before model completion</desc>
      <values>
        <value>source /opt/load-libs.sh 1</value>
      </values>
$ ./case.submit
$ cat run.b.day1.0.o43637
ERROR: External script source /opt/load-libs.sh 1 not found

Looking inside /opt/load-libs.sh and using the environment script found there:

    <entry id="PRERUN_SCRIPT">
      <type>char</type>
      <desc>External script to be run before model completion</desc>
      <values>
        <value>/opt/env_scripts/load_intel-2021.4.0_mpich-3.4.2.sh</value>
      </values>
$ ./case.submit
$ cat run.b.day1.0.o43639
   Running /opt/env_scripts/load_intel-2021.4.0_mpich-3.4.2.sh 
/bin/sh: 1: Syntax error: Bad fd number

    <entry id="PRERUN_SCRIPT">
      <type>char</type>
      <desc>External script to be run before model completion</desc>
      <values>
        <value>'source /opt/load-libs.sh 1'</value>
      </values>
$ ./case.submit
$ cat run.b.day1.0.o43641
ERROR: External script 'source /opt/load-libs.sh 1' not found

A simple shell script with the required content is created in /home/lluis.fita/intel_env.csh:

#!/bin/sh
export PATH="/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin:$PATH"
export PATH="/opt/netcdf/netcdf-4/intel/2021.4.0/bin:$PATH"
export PATH="/opt/hdf5/hdf5-1.10.5/intel/2021.4.0/bin:$PATH"
export PATH="/opt/jasper/jasper-version-2.0.33/intel/2021.4.0/bin:$PATH"

export LD_LIBRARY_PATH=/opt/jasper/jasper-version-2.0.33/intel/2021.4.0/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/opt/zlib/zlib-1.2.11/intel/2021.4.0/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/opt/hdf5/hdf5-1.10.5/intel/2021.4.0/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/opt/netcdf/netcdf-4/intel/2021.4.0/lib:$LD_LIBRARY_PATH

And then

    <entry id="PRERUN_SCRIPT">
      <type>char</type>
      <desc>External script to be run before model completion</desc>
      <values>
        <value>/home/lluis.fita/intel_env.csh</value>
      </values>
$ ./case.submit
$ cat run.b.day1.0.o43647
Running /home/lluis.fita/intel_env.csh 
/bin/sh: 1: Syntax error: Bad fd number

whereas running the script directly from the terminal does not give any error:

$ /bin/sh /home/lluis.fita/intel_env.csh 
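
A note on why this approach may not help even without the error. Assuming that CIME runs PRERUN_SCRIPT as a separate process (the "not found" messages above suggest that the value is treated as a file path), the variables exported by the script would not reach the later mpirun call, because a child process cannot change the environment of its parent. Separately, `Bad fd number` is the message that dash gives for a bash-style `>& file` redirection, so it may come from the command line that wraps the script rather than from the script itself; this is a guess, not something verified on hydra. The sketch below only illustrates the first point; DEMO_LIB_PATH and the temporary file are illustrative names.

```shell
# A script that only exports a variable, like the environment scripts above.
env_script=$(mktemp)
printf 'export DEMO_LIB_PATH=/opt/demo/lib\n' > "$env_script"

# Executed as a child process: the export is lost when the child exits.
unset DEMO_LIB_PATH
sh "$env_script"
after_exec=${DEMO_LIB_PATH:-unset}

# Sourced into the current shell: the export is kept.
. "$env_script"
after_source=${DEMO_LIB_PATH:-unset}

echo "executed: $after_exec, sourced: $after_source"
rm -f "$env_script"
```

This is why the environment has to be loaded inside the same shell that launches mpirun, for instance by sourcing /opt/load-libs.sh in the job script itself.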

Optional additional components

ESMF

The High Performance Modeling Infrastructure ESMF (Earth System Modeling Framework) is required. In May 2024 the latest version was 8.6.0, which is the one being installed.

Getting the code on hydra, in the folder ESM:

$ wget https://github.com/esmf-org/esmf/archive/refs/tags/v8.6.0.tar.gz

Compilation follows this documentation, in particular the detailed procedure in specific-compilation-instructions.

GNU compilation

Installation location

$ mkdir -p v860/gnu
$ cd v860/gnu
$ tar xvfz ../v8.6.0.tar.gz
$ cd esmf-8.6.0/
$ ls
build  build_config  cmake  LICENSE  makefile  README.md  scripts  src

Starting by defining the location of the code

$ export ESMF_DIR=$PWD

Running make info to get the local configuration

$ make info >& run_make_info.log


Launching the makefile in parallel:

$ make -j8 lib >& run_make.log
$ tail run_make.log
(...)
ESMF library built successfully on Mon 13 May 2024 12:58:46 PM -03
To verify, build and run the unit and system tests with: make check
 or the more extensive: make all_tests

Verifying that installation worked fine

$ make all_tests >& run_make_tests.log
$ cat run_make_tests.log | grep failed
The following unit test files failed to build, failed to execute or crashed during execution:
Found 7800 non-exhaustive single processor unit tests, 7744 passed and 56 failed.
Found 8 single processor system tests, 8 passed and 0 failed.
Found 44 single processor examples, 44 passed and 0 failed.
Found 8 single processor system tests, 8 passed and 0 failed.
Found 44 single processor examples, 44 passed and 0 failed.
Found 7800 non-exhaustive single processor unit tests, 7744 passed and 56 failed.
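
The summary lines can be added up to get a single figure for the whole test run. A minimal sketch, assuming the "Found N ..., P passed and F failed." format shown above; the function name esmf_test_totals is hypothetical, and identical lines are counted once because the log repeats them.

```shell
# Sum the passed/failed counts reported in an ESMF test log read on stdin.
esmf_test_totals() {
    grep '^Found .* passed and .* failed\.$' | sort -u |
    awk '{ for (i = 2; i <= NF; i++) {
               if ($i == "passed")  p += $(i - 1)   # number just before "passed"
               if ($i == "failed.") f += $(i - 1)   # number just before "failed."
           }
         }
         END { printf "passed=%d failed=%d\n", p, f }'
}
# Usage: esmf_test_totals < run_make_tests.log
```

With the lines listed above this gives passed=7796 failed=56, that is, all failures are in the unit tests.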

Installing

$ make install >& run_make_install.log
$ tail run_make_install.log
(...)
ESMF installation complete.

intel compilation

Loading hydra's intel compilation environment

$ source /opt/load-libs.sh 1

Installation location

$ mkdir v860/intel
$ cd v860/intel
$ tar xvfz ../v8.6.0.tar.gz
$ cd esmf-8.6.0/
$ ls
build  build_config  cmake  LICENSE  makefile  README.md  scripts  src

Porting and validating CIME on a new platform

http://esmci.github.io/cime/versions/master/html/users_guide/porting-cime.html


Downloading the Input data

All input data will be downloaded in

/share/cesm

Input datasets are needed to run the model. CESM input data are available through a separate Subversion input data repository.

  • Change check_input_data header so it runs with Python 2.7.x version:
sed -i -e 's!/usr/bin/env python!/share/anaconda2/bin/python!' ./cime/scripts/Tools/check_input_data