Difference between revisions of «EMU»
Revision as of 19:16, 2 Jul 2025
Here we describe the installation and use of the ECCO model on CIMA-IFAECI's hydra HPC.
Installation
We follow the instructions from the 2025 ECCO summer school - EMU Installation, specifically those for installing the tools ourselves.
First we locate the right GitHub repository:
https://github.com/ECCO-GROUP/ECCO-EIS/tree/main/emu
We are going to install the model for all hydra users. So, our $INSTALLDIR will be:
INSTALLDIR=/share/EMU
Obtaining the code and installing it in $INSTALLDIR. We use the NASA Earthdata (https://ecco.jpl.nasa.gov/drive/) user lluisfita with its WebDAV password; to get the WebDAV password, log into NASA Earthdata. NOTE: the MIT certificate is not set up correctly, so we had to modify the script to keep going.
MIT certificate issues
The MIT certificate set-up used to download the EMU code produces the following messages:
$ wget --no-check-certificate --spider --user="lluisfita" --password="[pwd]" https://ecco.jpl.nasa.gov/drive/files/Version4/Release4/other/flux-forced
Spider mode enabled. Check if remote file exists.
--2025-06-23 12:34:57-- https://ecco.jpl.nasa.gov/drive/files/Version4/Release4/other/flux-forced
Resolving ecco.jpl.nasa.gov (ecco.jpl.nasa.gov)... 128.149.52.112
Connecting to ecco.jpl.nasa.gov (ecco.jpl.nasa.gov)|128.149.52.112|:443... connected.
WARNING: The certificate of 'ecco.jpl.nasa.gov' is not trusted.
WARNING: The certificate of 'ecco.jpl.nasa.gov' doesn't have a known issuer.
HTTP request sent, awaiting response... 401 Unauthorized
Authentication selected: Basic realm="ECCO_Drive"
Connecting to ecco.jpl.nasa.gov (ecco.jpl.nasa.gov)|128.149.52.112|:443... connected.
WARNING: The certificate of 'ecco.jpl.nasa.gov' is not trusted.
WARNING: The certificate of 'ecco.jpl.nasa.gov' doesn't have a known issuer.
HTTP request sent, awaiting response... 200 OK
Length: 13420 (13K) [text/html]
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.
Adding --no-check-certificate to the wget calls inside emu_setup.sh allows the data to be downloaded.
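A minimal sketch of how that flag could be added automatically, assuming every download in the script goes through a plain `wget ` call. The helper name add_no_check_certificate is ours and is not part of EMU. Keep in mind that the flag disables TLS verification for those downloads.

```shell
#!/bin/sh
# Hypothetical helper (not part of EMU): add --no-check-certificate to every
# wget call in a script. Lines that already carry the flag are left untouched.
add_no_check_certificate() {
    script="$1"
    tmp="${script}.tmp.$$"
    # Keep a backup of the pristine script the first time we patch it
    [ -e "${script}.orig" ] || cp -p "$script" "${script}.orig"
    # "/flag/!" restricts the substitution to lines that lack the flag
    sed '/--no-check-certificate/!s/wget /wget --no-check-certificate /g' \
        "$script" > "$tmp" && cat "$tmp" > "$script"
    rm -f "$tmp"
}

# Example, run after emu_setup.sh has been downloaded:
#   add_no_check_certificate ./emu_setup.sh
```

Writing back with `cat` rather than `mv` keeps the executable bit set by `chmod +x`.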
cd $INSTALLDIR
wget https://raw.githubusercontent.com/ECCO-GROUP/ECCO-EIS/main/emu/emu_setup.sh
chmod +x ./emu_setup.sh
./emu_setup.sh
------------------------------------------------------------------------------
This script sets up EMU, a collection of computational tools for analyzing the ECCO model (flux-forced version of ECCO Version 4 Release 4). The Tools include the following;
   1) Sampling (samp); Evaluates state time-series from model output.
   2) Forward Gradient (fgrd); Computes model's forward gradient.
   3) Adjoint (adj); Computes model's adjoint gradient.
   4) Convolution (conv); Evaluates adjoint gradient decomposition.
   5) Tracer (trc); Computes passive tracer evolution.
   6) Budget (budg); Evaluates budget time-series from model output.
   7) Modified Simulation (msim); Re-runs model with modified input.
   8) Attribution (atrb); Evaluates state time-series by control type.
   9) Auxiliary (aux): Generates user input files for other EMU tools.
************************
This script will install EMU's Programs (~1GB), its User Interface (~2MB), and download its Input Files (~1TB) to user-specified directories. Users should not move or alter these directories or their files unless noted otherwise (e.g., conforming batch scripts pbs_*.sh for the host system, installed in the User Interface directory). Once installed, any user of the host system should be able to utilize the installed files and programs; Separate installations for different users are not necessary.
Installation requires obtaining a NASA Earthdata account for downloading files from https://ecco.jpl.nasa.gov/drive/. Enter your Earthdata username and WebDAV password (not your Earthdata password) at the prompts below. The WebDAV password can be found at this URL after logging in with your Earthdata username and Earthdata password, or click the 'Back to WebDAV Credentials' button when browsing files at the URL.
See the README file that will be installed in the User Interface directory for details of EMU, including instructions on how to use it.
************************
Press ENTER key to continue ...
----------------------
Enter your Earthdata username: lluisfita
Enter your WebDAV password (*NOT* Earthdata password): [pwd]
----------------------
Enter directory name (emu_dir) to download and set up EMU's Programs (~1 GB) or press the ENTER key to use EMU's default (emu_dir under the present directory) ... ?
EMU's Programs will be installed in /share/EMU/emu_dir
----------------------
Enter directory name (emu_userinterface_dir) to install EMU's User Interface (~2 MB) or press the ENTER key to use EMU's default (emu_userinterface_dir under the present directory) ... ?
EMU's User Interface will be installed in /share/EMU/emu_userinterface_dir
----------------------
Enter directory name (emu_input_dir) to download up to 1.1 TB of EMU's Input Files or press the ENTER key to use EMU's default (emu_input_dir under the present directory) .... ?
EMU's Input Files will be downloaded to /share/EMU/emu_input_dir
************************
NOTE: See *.log files in /share/EMU/emu_dir/temp_setup should this script fail.
************************
----------------------
EMU's Programs can be installed in two different ways;
   1) Compiling source code on host (native)
   2) Using Singularity image (singularity)
Option 1) requires a TAF license to derive the MITgcm adjoint used by EMU's Adjoint Tool. Option 2) has compiled versions of the code in containerized form that do not require a separate TAF license to use.
Enter choice for type of EMU implementation ... (1 or 2)? 1
Implementation type choice is 1
----------------------
EMU uses batch scripts to run some of its tools in PBS (Portable Batch System). The PBS commands in these shell scripts (pbs_*.sh), installed in EMU's User Interface directory (emu_userinterface_dir) /share/EMU/emu_userinterface_dir may need to be revised for different batch systems and/or different hosts. Alternatively, these shell scripts can be run interactively if sufficient resources are available.
Enter the command for submitting batch jobs (e.g., qsub, sbatch, bsub <, condor_submit, msub) or press the ENTER key to have EMU run its batch scripts interactively ... ? qsub
Command to submit EMU's batch job scripts will be: qsub
----------------------
EMU's Input Files total 1.1 TB, of which (directory)
   175 GB (emu_ref) is needed by Sampling, Forward Gradient, Adjoint, Tracer, Budget, and Attribution
   195 GB (forcing) is needed by Forward Gradient, Adjoint, Modified Simultion
   380 GB (state_weekly) is needed by Tracer
   290 GB (emu_msim) is needed by Attribution
(Convolution Tool uses results of the Adjoint Tool and files downloaded by default.)
Choose among the following to download ...
   0) All Input Files (1.1 TB)
   1) Files (~175 GB) needed for Sampling and Budget Tools
   2) Files (~195 GB) needed for Modified Simultion Tools
   3) Files (~370 GB) needed for Adjoint and Forward Gradient Tool
   4) Files (~465 GB) needed for Attribution Tool
   5) Files (~555 GB) needed for Tracer Tool
or press the ENTER key to skip this step, which can take a while (~13 hours if downloading all input files.) EMU's Input Files can be downloaded later with shell script
   /share/EMU/emu_userinterface_dir/emu_input_setup.sh
See
   /share/EMU/emu_userinterface_dir/README_input_setup
for additional detail, including options to download the input in batch mode.
Enter Input Files download choice ... ? 0
----------------------
Choose number of CPU cores (nproc) for running MITgcm.
Choose among the following nproc ...
   13 36 48 68 72 96 192 360
Enter choice for nproc ... ? 48
Number of CPU cores to be used for MITgcm: 48
**********************
End of user input for EMU setup
Rest of this script is conducted without user input.
----------------------
Download and compiling EMU on host system in directory /share/EMU/emu_dir
----------------------
Download and compiling MITgcm and its adjoint in /share/EMU/emu_dir/emu/exe/nproc
This can take a while (~30 minutes). Progress can be monitored in file /share/EMU/emu_dir/temp_setup/emu_compile_mdl.log
   tail /share/EMU/emu_dir/temp_setup/emu_compile_mdl.log
Next we work through the problems reported in the log files, always looking at the newest file in emu_dir/temp_setup:
emu_dir/emu/native/emu_compile_mdl.sh: line 106: /usr/local/lib/global.profile: No such file or directory
Inside emu_dir/emu/native/emu_compile_mdl.sh we find the following lines, where the compilation environment is set up:
# Get the directory containing the script (full path to emu/native)
nativedir=$(dirname "$script_path")
(...)
# 7) Load module for compilation. 
source /usr/local/lib/global.profile
source ${nativedir}/set_modules.sh
CIMA's hydra has no /usr/local/lib/global.profile. The file ${nativedir}/set_modules.sh assumes that environment modules exist, but hydra does not have modules; it uses purpose-built scripts to set up the compilation environment. We adapt emu_compile_mdl.sh to hydra's characteristics:
$ cp emu_dir/emu/native/emu_compile_mdl.sh emu_dir/emu/native/emu_compile_mdl_orig.sh
$ vim emu_dir/emu/native/emu_compile_mdl.sh
$ diff emu_dir/emu/native/emu_compile_mdl.sh emu_dir/emu/native/emu_compile_mdl_orig.sh
106,108c106,107
< #source /usr/local/lib/global.profile
< #source ${nativedir}/set_modules.sh
< #source /opt/load-libs.sh 1
---
> source /usr/local/lib/global.profile
> source ${nativedir}/set_modules.sh
These modifications do not work, because the code is downloaded again on every run. We therefore modify emu_setup.sh to avoid re-downloading the code if it is already there (see next point).
Sourcing /opt/load-libs.sh from inside the script does not work either: /opt/load-libs.sh has a misconfiguration that makes it pick up EMU's script instead of the right configuration. Therefore, we must load the compilation environment before downloading and compiling EMU. On hydra it is loaded with:
$ source /opt/load-libs.sh 1
The following libraries, compiled with Intel 2021.4.0 compilers, were loaded:
* MPICH 3.4.2
* NetCDF 4
* HDF5 1.10.5
* JASPER 2.0.33
We need to define environment variables in order to compile the code:
$ nc-config --all
(...)
  --has-nczarr -> yes
  --prefix -> /opt/netcdf/netcdf-4/intel/2021.4.0
  --includedir -> /opt/netcdf/netcdf-4/intel/2021.4.0/include
  --libdir -> /opt/netcdf/netcdf-4/intel/2021.4.0/lib
  --version -> netCDF 4.8.1
$ export NETCDF=/opt/netcdf/netcdf-4/intel/2021.4.0
$ which mpif90
/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpif90
$ export MPI_ROOT=/opt/mpich/mpich-3.4.2/intel/2021.4.0
$ export MPI=true
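The same variables could be derived from the tools on PATH instead of being typed by hand. The following is only a sketch: the helper name prefix_from_tool is hypothetical, and it assumes the MPI installation follows the usual <prefix>/bin/mpif90 layout, as the output above suggests for hydra.

```shell
#!/bin/sh
# Hypothetical helper: turn ".../<prefix>/bin/<tool>" into "<prefix>".
prefix_from_tool() {
    dirname "$(dirname "$1")"
}

# Export only when the tools are on PATH (i.e. after sourcing load-libs.sh).
if command -v nc-config >/dev/null 2>&1; then
    NETCDF="$(nc-config --prefix)"        # same value as the --prefix line
    export NETCDF
fi
if command -v mpif90 >/dev/null 2>&1; then
    MPI_ROOT="$(prefix_from_tool "$(command -v mpif90)")"
    MPI=true
    export MPI_ROOT MPI
fi
```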
After solving this problem, the setup script stops abruptly without any message. However, emu_dir/temp_setup/download_emu_source.log shows:
$ cat ./emu_dir/temp_setup/download_emu_source.log
Cloning into 'ECCO-EIS'...
mv: cannot move 'ECCO-EIS/emu' to './emu': Directory not empty
In the previous attempt we had already cloned the code. We therefore make sure that the folder is removed before the clone is made, again by modifying a script, this time emu_setup.sh:
$ cp emu_setup.sh emu_setup_orig.sh
$ vim emu_setup.sh 
$ diff emu_setup.sh emu_setup_orig.sh
539,541d538
<     if test -d emu; then
<       echo "  Code has already previously been cloned... removing the pre-existing one" > "$log_file" 2>> "$log_file"
<       rm -rf ${emu_dir}/WORKDIR
<     else
548d544
<     fi
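The guard in this diff follows a common pattern: run a download step only when its target directory is missing. A stand-alone sketch of that pattern follows. The function name fetch_once and the REPO_URL placeholder are ours, and this is not the code in emu_setup.sh.

```shell
#!/bin/sh
# Hypothetical helper: run the given command only if the target directory
# does not exist yet, so a re-run does not trip over a previous download.
fetch_once() {
    target="$1"
    shift
    if [ -d "$target" ]; then
        echo "  $target already present... skipping download"
    else
        "$@"
    fi
}

# Example (REPO_URL is a placeholder to be set by the caller):
#   fetch_once emu sh -c 'git clone "$REPO_URL" && mv ECCO-EIS/emu ./emu'
```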
Compilation failed; we look into emu_dir/temp_setup/emu_compile_mdl.log.
Failed compilation: Compilation's environment
The compilation error says:
$ cat emu_dir/temp_setup/emu_compile_mdl.log
(...)
profiles_init_ncfile.f(406): remark #15009: profiles_init_ncfile_ has been targeted for automatic cpu dispatch
tracers_correction_step.f(385): remark #15009: tracers_correction_step_ has been targeted for automatic cpu dispatch
ptracers_reset.f(483): remark #15009: ptracers_reset_ has been targeted for automatic cpu dispatch
ld: /tmp/ipo_ifortCaY5f31.o: in function `diagnostics_calc_phivel_.V':
ipo_out1.f:(.text+0x3011c): undefined reference to `mpi_allreduce_'
ld: ipo_out1.f:(.text+0x30336): undefined reference to `mpi_allreduce_'
ld: /tmp/ipo_ifortJ3swBq1.o: in function `diagnostics_calc_phivel_.A':
(...)
ipo_out2.f:(.text.hot000a3+0xc30): undefined reference to `mpi_allreduce_'
ld: ipo_out2.f:(.text.hot000a3+0xdd5): undefined reference to `mpi_allreduce_'
ld: ipo_out2.f:(.text.hot000a3+0xecf): undefined reference to `mpi_allreduce_'
ld: /tmp/ipo_ifortmsrIiB2.o: in function `all_proc_die_':
ipo_out2.f:(.text.hot000af+0x266): undefined reference to `mpi_finalize_'
ld: /tmp/ipo_ifortmsrIiB2.o: in function `stop_if_error_':
ipo_out2.f:(.text.hot000b1+0x129b): undefined reference to `mpi_allreduce_'
make[1]: *** [Makefile:2183: mitgcmuv] Error 1
make[1]: Leaving directory '/share/EMU/emu_dir/WORKDIR/MITgcm/V4r4/flux-forced/build_trc'
make: *** [Makefile:2179: fwd_exe_target] Error 2
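Since the linker repeats the same message many times, it can help to reduce the log to the distinct missing symbols. A small sketch, assuming the GNU ld message format shown above (the function name undefined_symbols is ours):

```shell
#!/bin/sh
# Hypothetical helper: print each distinct symbol that the linker reported
# as an undefined reference in a build log.
undefined_symbols() {
    # Capture the text between the backtick and the closing quote
    sed -n "s/.*undefined reference to \`\([^']*\)'.*/\1/p" "$1" | sort -u
}

# Example:
#   undefined_symbols emu_dir/temp_setup/emu_compile_mdl.log
```

For the excerpt above this leaves only mpi_allreduce_ and mpi_finalize_, the Fortran-mangled names of MPI routines, which points at the MPI libraries on the link line rather than at the model source.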
Compilation is performed by the emu_dir/emu/native/emu_compile_mdl.sh script, which includes the following compilation steps:
(...)
echo " " 
echo "COMPILING offline tracer model ADJOINT -----------------------------------"
echo " "
if [ $new_compilation -eq 0 ]; then
    mkdir build_trc_ad
    cd build_trc_ad
#    ../../../tools/genmake2 -mods=../code_offline_ptracer -optfile=../../../tools/build_options/linux_amd64_ifort+mpi_ice_nas -mpi
    ../../../tools/genmake2 -mods=../code_offline_ptracer -optfile=../code/linux_amd64_ifort+mpi_ice_nas -mpi
    make depend
We look for the environment file:
$ find ./ -name linux_amd64_ifort+mpi_ice_nas
./emu_dir/WORKDIR/MITgcm/tools/build_options/linux_amd64_ifort+mpi_ice_nas
./emu_dir/WORKDIR/MITgcm/V4r4/flux-forced/code/linux_amd64_ifort+mpi_ice_nas
./emu_dir/WORKDIR/MITgcm/V4r4/flux-forced/code_offline_ptracer/linux_amd64_ifort+mpi_ice_nas
linux_amd64_ifort+mpi_ice_nas
We found multiple pre-defined files. We need to adapt them to CIMA's hydra and try to preserve them even after the code is downloaded again from the Git repository.
$ cat ./emu_dir/WORKDIR/MITgcm/tools/build_options/linux_amd64_ifort+mpi_ice_nas
(...)
LIBS='-L/nasa/sgi/mpt/2.14r19/lib -lmpi -L/nasa/netcdf/4.0/lib -lnetcdf'
INCLUDES='-I/nasa/sgi/mpt/2.14r19/include -I/nasa/netcdf/4.0/include'
$ cat ./emu_dir/WORKDIR/MITgcm/V4r4/flux-forced/code/linux_amd64_ifort+mpi_ice_nas
(...)
LIBS=''
if [ -n "$MPI_ROOT" -a "x$MPI" = xtrue ] ; then
    if [ -z "$MPI_INC_DIR" ]; then
      MPI_INC_DIR="${MPI_ROOT}/include"
    fi
    LIBS="$LIBS -L${MPI_ROOT}/lib -lmpi"
fi
if [ -n "$MPI_INC_DIR" -a "x$MPI" = xtrue ] ; then
    INCLUDES="$INCLUDES -I${MPI_INC_DIR}"
    #INCLUDEDIRS="$INCLUDEDIRS $MPI_INC_DIR"
    #- used for parallel (MPI) DIVA
$ cat ./emu_dir/WORKDIR/MITgcm/V4r4/flux-forced/code_offline_ptracer/linux_amd64_ifort+mpi_ice_nas
(...)
INCLUDES=''
LIBS=''
if [ -n "$MPI_ROOT" -a "x$MPI" = xtrue ] ; then
    if [ -z "$MPI_INC_DIR" ]; then
      MPI_INC_DIR="${MPI_ROOT}/include"
    fi
    LIBS="$LIBS -L${MPI_ROOT}/lib -lmpi"
fi
if [ -n "$MPI_INC_DIR" -a "x$MPI" = xtrue ] ; then
    INCLUDES="$INCLUDES -I${MPI_INC_DIR}"
    #INCLUDEDIRS="$INCLUDEDIRS $MPI_INC_DIR"
The file code/linux_amd64_ifort+mpi_ice_nas contains the compilation environment, defined through multiple shell variables:
#! /usr/bin/env bash
(...)
FC=ifort
CC=icc
CPP='/lib/cpp -traditional -P'
DEFINES='-DWORDLENGTH=4 -DINTEL_COMMITQQ'
F90FIXEDFORMAT='-fixed -Tf'
EXTENDED_SRC_FLAG='-132'
(...)
F90FLAGS=$FFLAGS
F90OPTIM=$FOPTIM
INCLUDEDIRS=''
INCLUDES=''
LIBS=''
if [ -n "$MPI_ROOT" -a "x$MPI" = xtrue ] ; then
    if [ -z "$MPI_INC_DIR" ]; then
      MPI_INC_DIR="${MPI_ROOT}/include"
    fi
    LIBS="$LIBS -L${MPI_ROOT}/lib -lmpi"
fi
if [ -n "$MPI_INC_DIR" -a "x$MPI" = xtrue ] ; then
    INCLUDES="$INCLUDES -I${MPI_INC_DIR}"
    #INCLUDEDIRS="$INCLUDEDIRS $MPI_INC_DIR"
    #- used for parallel (MPI) DIVA
    MPIINCLUDEDIR="$MPI_INC_DIR"
    #MPI_HEADER_FILES='mpif.h mpiof.h'
fi
if [ "x$NETCDF" != x ] ; then
    INCLUDES="$INCLUDES -I${NETCDF}/include"
    #INCLUDEDIRS="$INCLUDEDIRS ${NETCDF}/include"
    LIBS="$LIBS -L${NETCDF}/lib"
fi
CIMA's hydra does not define environment variables such as $MPI_ROOT, $MPI, ...; they must be defined explicitly after loading the compilation environment.
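Because the option file silently skips the MPI and NetCDF settings when these variables are empty, a pre-flight check before launching the setup can save a failed build. A minimal sketch (the function name missing_vars is hypothetical):

```shell
#!/bin/sh
# Hypothetical pre-flight check: print the names of the given variables
# that are unset or empty, one per line.
missing_vars() {
    for name in "$@"; do
        eval "value=\${$name:-}"      # indirect lookup, POSIX sh compatible
        [ -n "$value" ] || printf '%s\n' "$name"
    done
}

# Example:
#   missing=$(missing_vars MPI_ROOT MPI NETCDF)
#   [ -z "$missing" ] || { echo "Undefined: $missing" >&2; exit 1; }
```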
Only the linux_amd64_ifort+mpi_ice_nas file in emu_dir/WORKDIR/MITgcm/tools/build_options/ has library paths from NASA HPCs; we need to create a generalized version to make it work.
$ cp ./emu_dir/WORKDIR/MITgcm/tools/build_options/linux_amd64_ifort+mpi_ice_nas linux_amd64_ifort+mpi_ice_nas_generic
$ diff linux_amd64_ifort+mpi_ice_nas_generic ./emu_dir/WORKDIR/MITgcm/tools/build_options/linux_amd64_ifort+mpi_ice_nas
20,21c20,21
< LIBS='-L${MPI_ROOT}/lib -lmpi -L${NETCDF}/lib -lnetcdf'
< INCLUDES='-I${MPI_ROOT}/include -I${NETCDF}/include'
---
> LIBS='-L/nasa/sgi/mpt/2.14r19/lib -lmpi -L/nasa/netcdf/4.0/lib -lnetcdf'
> INCLUDES='-I/nasa/sgi/mpt/2.14r19/include -I/nasa/netcdf/4.0/include'
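The edit shown in this diff could also be scripted, which makes it easy to repeat after the code is downloaded again. A sketch, assuming the two NASA prefixes are the only hard-coded paths to replace (the function name make_generic_optfile is ours):

```shell
#!/bin/sh
# Hypothetical helper: write a copy of an option file in which the
# hard-coded NASA prefixes are replaced by variable references.
# The single quotes keep ${MPI_ROOT} and ${NETCDF} literal in the output.
make_generic_optfile() {
    sed -e 's|/nasa/sgi/mpt/2\.14r19|${MPI_ROOT}|g' \
        -e 's|/nasa/netcdf/4\.0|${NETCDF}|g' "$1" > "$2"
}

# Example:
#   make_generic_optfile \
#     ./emu_dir/WORKDIR/MITgcm/tools/build_options/linux_amd64_ifort+mpi_ice_nas \
#     linux_amd64_ifort+mpi_ice_nas_generic
```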
We modify emu_dir/emu/native/emu_compile_mdl.sh so that it uses the generic version of the environment file:
$ diff emu_dir/emu/native/emu_compile_mdl_orig.sh emu_dir/emu/native/emu_compile_mdl.sh 
(...)
104a105,109
> # Generalizing compilation....
> cp ${emu_dir}/WORKDIR/MITgcm/tools/build_options/linux_amd64_ifort+mpi_ice_nas ${emu_dir}/WORKDIR/MITgcm/tools/build_options/linux_amd64_ifort+mpi_ice_nas_orig
> cp ${emu_dir}/../linux_amd64_ifort+mpi_ice_nas_generic ${emu_dir}/WORKDIR/MITgcm/tools/build_options/linux_amd64_ifort+mpi_ice_nas
> 
(...)
