Difference between revisions of «EMU»

From Wikicima

Revision as of 17:48, 2 Jul 2025

Here we describe the installation and use of the ECCO model tools (EMU) on CIMA-IFAECI's hydra HPC cluster.

Installation

We follow the instructions from the 2025 ECCO summer school (EMU Installation), specifically those for installing the tools ourselves.

First we locate the relevant GitHub repository:

https://github.com/ECCO-GROUP/ECCO-EIS/tree/main/emu

We are going to install the model for all hydra users. So, our $INSTALLDIR will be:

INSTALLDIR=/share/EMU

Obtaining the code and installing it in $INSTALLDIR. We use the NASA Earthdata (https://ecco.jpl.nasa.gov/drive/) user lluisfita with its WebDAV password. To get the WebDAV password, log into NASA Earthdata. NOTE: the MIT certificate is not set up correctly, so we needed to modify the script to proceed.

MIT certificate issues

The MIT certificate set up for downloading the EMU code produces the following messages:

$ wget --no-check-certificate --spider --user="lluisfita" --password="[pwd]" \
    https://ecco.jpl.nasa.gov/drive/files/Version4/Release4/other/flux-forced
Spider mode enabled. Check if remote file exists.
--2025-06-23 12:34:57--
https://ecco.jpl.nasa.gov/drive/files/Version4/Release4/other/flux-forced
Resolving ecco.jpl.nasa.gov (ecco.jpl.nasa.gov)... 128.149.52.112
Connecting to ecco.jpl.nasa.gov
(ecco.jpl.nasa.gov)|128.149.52.112|:443... connected.
WARNING: The certificate of 'ecco.jpl.nasa.gov' is not trusted.
WARNING: The certificate of 'ecco.jpl.nasa.gov' doesn't have a known issuer.
HTTP request sent, awaiting response... 401 Unauthorized
Authentication selected: Basic realm="ECCO_Drive"
Connecting to ecco.jpl.nasa.gov
(ecco.jpl.nasa.gov)|128.149.52.112|:443... connected.
WARNING: The certificate of 'ecco.jpl.nasa.gov' is not trusted.
WARNING: The certificate of 'ecco.jpl.nasa.gov' doesn't have a known issuer.
HTTP request sent, awaiting response... 200 OK
Length: 13420 (13K) [text/html]
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.

Adding --no-check-certificate to the wget command inside emu_setup.sh allows the data to be downloaded.
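
One way to apply this change without editing by hand, once emu_setup.sh has been downloaded, is a small sed substitution. This is only a sketch: the helper name patch_wget_nocheck is hypothetical, it assumes GNU sed, and it assumes that every download in the script is a plain "wget " call. A backup copy is kept with the _orig suffix used elsewhere on this page.

```shell
# Hypothetical helper (not part of EMU): insert --no-check-certificate
# after every 'wget' invocation in the given script. Assumes GNU sed.
patch_wget_nocheck() {
    script="$1"
    backup="${script%.sh}_orig.sh"
    # Keep a pristine copy next to the script, but never overwrite an older backup
    [ -e "$backup" ] || cp "$script" "$backup"
    # Skip lines that already carry the flag so the patch can be re-run safely
    sed -i '/--no-check-certificate/! s/\bwget /wget --no-check-certificate /g' "$script"
}

# Example:
# patch_wget_nocheck ./emu_setup.sh
```

Note that disabling certificate checks removes a safety net, so it is probably best limited to this known host.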

cd $INSTALLDIR
wget https://raw.githubusercontent.com/ECCO-GROUP/ECCO-EIS/main/emu/emu_setup.sh
chmod +x ./emu_setup.sh
./emu_setup.sh

------------------------------------------------------------------------------
 This script sets up EMU, a collection of computational tools for analyzing
 the ECCO model (flux-forced version of ECCO Version 4 Release 4). The Tools 
 include the following;

   1) Sampling (samp); Evaluates state time-series from model output.
   2) Forward Gradient (fgrd); Computes model's forward gradient.
   3) Adjoint (adj); Computes model's adjoint gradient.
   4) Convolution (conv); Evaluates adjoint gradient decomposition.
   5) Tracer (trc); Computes passive tracer evolution.
   6) Budget (budg); Evaluates budget time-series from model output.
   7) Modified Simulation (msim); Re-runs model with modified input.
   8) Attribution (atrb); Evaluates state time-series by control type.
   9) Auxiliary (aux): Generates user input files for other EMU tools.

************************
 This script will install EMU's Programs (~1GB), its User Interface (~2MB), 
 and download its Input Files (~1TB) to user-specified directories. 

 Users should not move or alter these directories or their files unless 
 noted otherwise (e.g., conforming batch scripts pbs_*.sh for the host 
 system, installed in the User Interface directory). Once installed, 
 any user of the host system should be able to utilize the installed files 
 and programs; Separate installations for different users are not necessary. 

 Installation requires obtaining a NASA Earthdata account for downloading 
 files from https://ecco.jpl.nasa.gov/drive/. Enter your Earthdata 
 username and WebDAV password (not your Earthdata password) at the prompts 
 below. The WebDAV password can be found at this URL after logging in with 
 your Earthdata username and Earthdata password, or click the 'Back to 
 WebDAV Credentials' button when browsing files at the URL.

 See the README file that will be installed in the User Interface directory 
 for details of EMU, including instructions on how to use it.
************************

Press ENTER key to continue ... 


----------------------
Enter your Earthdata username: lluisfita
Enter your WebDAV password (*NOT* Earthdata password): [pwd]


----------------------
Enter directory name (emu_dir) to download and set up EMU's Programs (~1 GB) 
or press the ENTER key to use EMU's default (emu_dir under the present directory) ... ?


EMU's Programs will be installed in 
/share/EMU/emu_dir

----------------------
Enter directory name (emu_userinterface_dir) to install EMU's User Interface
(~2 MB) or press the ENTER key to use EMU's default (emu_userinterface_dir 
under the present directory) ... ?


EMU's User Interface will be installed in 
/share/EMU/emu_userinterface_dir

----------------------
Enter directory name (emu_input_dir) to download up to 1.1 TB of EMU's Input 
Files or press the ENTER key to use EMU's default (emu_input_dir under the 
present directory) .... ? 


EMU's Input Files will be downloaded to 
/share/EMU/emu_input_dir

************************
NOTE: See *.log files in /share/EMU/emu_dir/temp_setup should this script fail.
************************

----------------------
EMU's Programs can be installed in two different ways;
   1) Compiling source code on host (native) 
   2) Using Singularity image (singularity)

Option 1) requires a TAF license to derive the MITgcm adjoint used by EMU's 
Adjoint Tool. Option 2) has compiled versions of the code in 
containerized form that do not require a separate TAF license to use.

Enter choice for type of EMU implementation ... (1 or 2)?
1

Implementation type choice is 1

----------------------
EMU uses batch scripts to run some of its tools in PBS (Portable 
Batch System). The PBS commands in these shell scripts (pbs_*.sh),
installed in EMU's User Interface directory (emu_userinterface_dir)
/share/EMU/emu_userinterface_dir
may need to be revised for different batch systems and/or different hosts. 
Alternatively, these shell scripts can be run interactively if sufficient 
resources are available.

Enter the command for submitting batch jobs (e.g., qsub, sbatch, 
bsub <, condor_submit, msub) or press the ENTER key to have EMU 
run its batch scripts interactively ... ?
qsub

Command to submit EMU's batch job scripts will be: qsub

----------------------
EMU's Input Files total 1.1 TB, of which (directory)
   175 GB (emu_ref) is needed by Sampling, Forward Gradient, Adjoint, Tracer, Budget, and Attribution
   195 GB (forcing) is needed by Forward Gradient, Adjoint, Modified Simultion
   380 GB (state_weekly) is needed by Tracer
   290 GB (emu_msim) is needed by Attribution
   (Convolution Tool uses results of the Adjoint Tool and files downloaded by default.)

Choose among the following to download ... 
   0) All Input Files (1.1 TB) 
   1) Files (~175 GB) needed for Sampling and Budget Tools
   2) Files (~195 GB) needed for Modified Simultion Tools
   3) Files (~370 GB) needed for Adjoint and Forward Gradient Tool
   4) Files (~465 GB) needed for Attribution Tool
   5) Files (~555 GB) needed for Tracer Tool
or press the ENTER key to skip this step, which can take a while
(~13 hours if downloading all input files.) 

EMU's Input Files can be downloaded later with shell script
   /share/EMU/emu_userinterface_dir/emu_input_setup.sh 
See 
   /share/EMU/emu_userinterface_dir/README_input_setup 
for additional detail, including options to download the input
in batch mode.

Enter Input Files download choice ... ?
0


----------------------
Choose number of CPU cores (nproc) for running MITgcm.
Choose among the following nproc ... 

13
36
48
68
72
96
192
360

Enter choice for nproc ... ?
48


Number of CPU cores to be used for MITgcm: 48

**********************
 End of user input for EMU setup 
 Rest of this script is conducted without user input.

----------------------
Download and compiling EMU on host system in directory 
/share/EMU/emu_dir


----------------------
Download and compiling MITgcm and its adjoint in 
/share/EMU/emu_dir/emu/exe/nproc
This can take a while (~30 minutes). 
Progress can be monitored in file  /share/EMU/emu_dir/temp_setup/emu_compile_mdl.log
  tail /share/EMU/emu_dir/temp_setup/emu_compile_mdl.log 

Next we work through the problems reported in the log files, always checking the newest file in emu_dir/temp_setup.
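
A small helper can pick the newest log automatically. This is a minimal sketch: newest_log is a hypothetical name, and it assumes the log files end in .log, as the setup script's own note suggests.

```shell
# Hypothetical helper: print the most recently modified *.log file in a directory.
newest_log() {
    # ls -t sorts by modification time, newest first
    ls -t "$1"/*.log 2>/dev/null | head -n 1
}

# Example: follow the newest log of the current setup attempt
# tail -f "$(newest_log /share/EMU/emu_dir/temp_setup)"
```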

/share/EMU/emu_dir/temp_setup/emu_compile_mdl.log

emu_dir/emu/native/emu_compile_mdl.sh: line 106: /usr/local/lib/global.profile: No such file or directory

Inside emu_dir/emu/native/emu_compile_mdl.sh we found the following lines, where the compilation environment is set up:


# Get the directory containing the script (full path to emu/native)
nativedir=$(dirname "$script_path")
(...)
# 7) Load module for compilation. 
source /usr/local/lib/global.profile
source ${nativedir}/set_modules.sh

On CIMA's hydra there is no /usr/local/lib/global.profile. The file ${nativedir}/set_modules.sh assumes that modules exist, but hydra does not have modules; it uses purpose-built scripts to set up the compilation environment. We therefore adapt emu_compile_mdl.sh to hydra's characteristics:

$ cp emu_dir/emu/native/emu_compile_mdl.sh emu_dir/emu/native/emu_compile_mdl_orig.sh
$ vim emu_dir/emu/native/emu_compile_mdl.sh
$ diff emu_dir/emu/native/emu_compile_mdl.sh emu_dir/emu/native/emu_compile_mdl_orig.sh
106,108c106,107
< #source /usr/local/lib/global.profile
< #source ${nativedir}/set_modules.sh
< #source /opt/load-libs.sh 1
---
> source /usr/local/lib/global.profile
> source ${nativedir}/set_modules.sh

These modifications have no effect, because the code is downloaded again on each run. Therefore, we modify emu_setup.sh to avoid re-downloading the code if it is already there (see next point).

Sourcing /opt/load-libs.sh from within the script does not work either: it has a misconfiguration which causes EMU's script, instead of the right configuration, to be loaded. Therefore, we must load the compilation environment before downloading/compiling EMU. On hydra it is loaded with:

$ source /opt/load-libs.sh 1
The following libraries, compiled with Intel 2021.4.0 compilers, were loaded:
* MPICH 3.4.2
* NetCDF 4
* HDF5 1.10.5
* JASPER 2.0.33

We also need to define environment variables so that the code can be compiled:

$ nc-config --all
(...)
  --has-nczarr    -> yes

  --prefix        -> /opt/netcdf/netcdf-4/intel/2021.4.0
  --includedir    -> /opt/netcdf/netcdf-4/intel/2021.4.0/include
  --libdir        -> /opt/netcdf/netcdf-4/intel/2021.4.0/lib
  --version       -> netCDF 4.8.1
$ export NETCDF=/opt/netcdf/netcdf-4/intel/2021.4.0
$ which mpif90
/opt/mpich/mpich-3.4.2/intel/2021.4.0/bin/mpif90
$ export MPI_ROOT=/opt/mpich/mpich-3.4.2/intel/2021.4.0
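
The same two variables can be derived from the tools themselves rather than typed by hand, which avoids stale paths if the libraries are upgraded. This is a minimal sketch: prefix_of_tool is a hypothetical helper, and it assumes the usual <prefix>/bin/<tool> layout seen in the output above.

```shell
# Hypothetical helper: strip the trailing /bin/<tool> from a tool's full path.
prefix_of_tool() {
    dirname "$(dirname "$1")"
}

# Only export when the tools are on PATH (i.e. after sourcing /opt/load-libs.sh)
if command -v nc-config >/dev/null 2>&1; then
    export NETCDF="$(nc-config --prefix)"
fi
if command -v mpif90 >/dev/null 2>&1; then
    export MPI_ROOT="$(prefix_of_tool "$(command -v mpif90)")"
fi
echo "NETCDF=${NETCDF:-unset} MPI_ROOT=${MPI_ROOT:-unset}"
```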

/share/EMU/emu_dir/temp_setup/download_emu_source.log

After solving this problem, the setup script stops abruptly without any message. However, emu_dir/temp_setup/download_emu_source.log contains:

$ cat ./emu_dir/temp_setup/download_emu_source.log
Cloning into 'ECCO-EIS'...
mv: cannot move 'ECCO-EIS/emu' to './emu': Directory not empty

The code had already been cloned in the previous attempt. We therefore make sure that the pre-existing folder is dealt with before the clone is made, again by modifying a script, this time emu_setup.sh:

$ cp emu_setup.sh emu_setup_orig.sh
$ vim emu_setup.sh 
$ diff emu_setup.sh emu_setup_orig.sh
539,541d538
<     if test -d emu; then
<       echo "  Code has already previously been cloned... removing the pre-existing one" > "$log_file" 2>> "$log_file"
<       rm -rf ${emu_dir}/WORKDIR
<     else
548d544
<     fi
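
The idea of not downloading the code again when it is already there can be written as a stand-alone function. This is only a sketch of one possible approach, not the code inside emu_setup.sh: fetch_emu_once is a hypothetical name, and the clone-then-move sequence is inferred from the log messages shown above.

```shell
# Hypothetical helper: clone the repository and move its emu/ folder into
# place only when no emu/ folder exists yet in the current directory.
fetch_emu_once() {
    repo_url="$1"
    if [ -d emu ]; then
        echo "emu/ already present, skipping clone"
        return 0
    fi
    git clone "$repo_url" ECCO-EIS && mv ECCO-EIS/emu ./emu
}

# Example (repository linked at the top of this page):
# fetch_emu_once https://github.com/ECCO-GROUP/ECCO-EIS.git
```

Skipping the clone keeps any local edits under emu/, whereas removing the folder first would discard them.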

Compilation failed, so we look into emu_dir/temp_setup/emu_compile_mdl.log.

Failed compilation: compilation environment

The compilation error reads:

$ cat emu_dir/temp_setup/emu_compile_mdl.log 
(...)
profiles_init_ncfile.f(406): remark #15009: profiles_init_ncfile_ has been 
  targeted for automatic cpu dispatch
tracers_correction_step.f(385): remark #15009: tracers_correction_step_ has 
  been targeted for automatic cpu dispatch
ptracers_reset.f(483): remark #15009: ptracers_reset_ has been targeted for 
  automatic cpu dispatch
ld: /tmp/ipo_ifortCaY5f31.o: in function `diagnostics_calc_phivel_.V':
ipo_out1.f:(.text+0x3011c): undefined reference to `mpi_allreduce_'
ld: ipo_out1.f:(.text+0x30336): undefined reference to `mpi_allreduce_'
ld: /tmp/ipo_ifortJ3swBq1.o: in function `diagnostics_calc_phivel_.A':

(...)
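
The missing symbol mpi_allreduce_ is a Fortran MPI binding, so the link line evidently does not include a library that provides it. A quick way to see which library, if any, exports the symbol is sketched below. libs_defining is a hypothetical helper, and it assumes GNU nm and shared libraries named libmpi*.so.

```shell
# Hypothetical helper: list shared libraries in a directory that define a symbol.
libs_defining() {
    dir="$1"
    symbol="$2"
    for lib in "$dir"/libmpi*.so; do
        [ -e "$lib" ] || continue   # the glob matched nothing
        # nm -D reads the dynamic symbol table; keep defined symbols only
        if nm -D --defined-only "$lib" 2>/dev/null | grep -qw "$symbol"; then
            echo "$lib"
        fi
    done
}

# Example, once MPI_ROOT is set:
# libs_defining "$MPI_ROOT/lib" mpi_allreduce_
```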

Compilation is performed via the emu_dir/emu/native/emu_compile_mdl.sh script, which includes the following compilation step:

(...)
echo " " 
echo "COMPILING offline tracer model ADJOINT -----------------------------------"
echo " "
if [ $new_compilation -eq 0 ]; then
    mkdir build_trc_ad
    cd build_trc_ad
#    ../../../tools/genmake2 -mods=../code_offline_ptracer -optfile=../../../tools/build_options/linux_amd64_ifort+mpi_ice_nas -mpi
    ../../../tools/genmake2 -mods=../code_offline_ptracer -optfile=../code/linux_amd64_ifort+mpi_ice_nas -mpi
    make depend

The file code/linux_amd64_ifort+mpi_ice_nas defines the compilation environment through multiple variables:

#! /usr/bin/env bash

(...)
FC=ifort
CC=icc

CPP='/lib/cpp -traditional -P'
DEFINES='-DWORDLENGTH=4 -DINTEL_COMMITQQ'
F90FIXEDFORMAT='-fixed -Tf'
EXTENDED_SRC_FLAG='-132'
(...)

F90FLAGS=$FFLAGS
F90OPTIM=$FOPTIM

INCLUDEDIRS=''
INCLUDES=''
LIBS=''

if [ -n "$MPI_ROOT" -a "x$MPI" = xtrue ] ; then
    if [ -z "$MPI_INC_DIR" ]; then
                           MPI_INC_DIR="${MPI_ROOT}/include"
    fi
    LIBS="$LIBS -L${MPI_ROOT}/lib -lmpi"
fi

if [ -n "$MPI_INC_DIR" -a "x$MPI" = xtrue ] ; then
    INCLUDES="$INCLUDES -I${MPI_INC_DIR}"
    #INCLUDEDIRS="$INCLUDEDIRS $MPI_INC_DIR"
    #- used for parallel (MPI) DIVA
    MPIINCLUDEDIR="$MPI_INC_DIR"
    #MPI_HEADER_FILES='mpif.h mpiof.h'
fi

if [ "x$NETCDF" != x ] ; then
    INCLUDES="$INCLUDES -I${NETCDF}/include"
    #INCLUDEDIRS="$INCLUDEDIRS ${NETCDF}/include"
    LIBS="$LIBS -L${NETCDF}/lib"
fi

CIMA's hydra does not define environment variables such as $MPI_ROOT, $MPI, ...; they must be defined explicitly after loading the compilation environment.
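
To see why the missing variables matter, the LIBS logic of the file quoted above can be replayed outside the build. This is a simplified dry run covering only the MPI and NetCDF parts: optfile_libs is a hypothetical name, and MPI=true is set by hand here purely for illustration.

```shell
# Dry run of the LIBS logic from the option file (MPI and NetCDF parts only).
# optfile_libs is a hypothetical name; it only prints what LIBS would become.
optfile_libs() {
    LIBS=''
    if [ -n "${MPI_ROOT:-}" ] && [ "x${MPI:-}" = xtrue ]; then
        LIBS="$LIBS -L${MPI_ROOT}/lib -lmpi"
    fi
    if [ "x${NETCDF:-}" != x ]; then
        LIBS="$LIBS -L${NETCDF}/lib"
    fi
    echo "LIBS='$LIBS'"
}

# Without the variables, nothing is added to the link line
( unset MPI_ROOT MPI NETCDF; optfile_libs )
# With the paths found on hydra
( MPI_ROOT=/opt/mpich/mpich-3.4.2/intel/2021.4.0; MPI=true
  NETCDF=/opt/netcdf/netcdf-4/intel/2021.4.0; optfile_libs )
```

With the variables unset, LIBS stays empty, so no MPI library reaches the linker, which would be consistent with the undefined mpi_allreduce_ references.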

Use