ORCHIDEE/OR4L OR4L
Latest revision as of 15:37, 7 Nov 2017
OR4L
L. Fita's work-flow management for the ORCHIDEE model
ORCHIDEE work-flow management is done via 3 files (these are the specifics for hydra [CIMA cluster]):
- EXPERIMENTparameters.txt: general ASCII file which configures the experiment and the chain of simulations (chunks). This is the only file that needs to be modified
- run_experiments.pbs: PBS-queue job which prepares the environment of the experiment
- run_OR.pbs: PBS-queue job which launches the ORCHIDEE model and arranges the output
- There is a folder called components with the shell and python scripts necessary for the work-flow management
An experiment which covers a period of simulation is divided into chunks: small pieces of time which are manageable by the model. The work-flow follows these steps using run_experiments.pbs:
- Copies and links all the required files for a given chunk of the whole period of simulation, following the content of EXPERIMENTparameters.txt
- Launches run_OR.pbs, which will simulate the period of the given chunk
- Launches the next run_experiments.pbs (which waits until the end of run_OR.pbs)
All the scripts are located in hydra at:
/share/tools/work-flows/OR4L/hydra
How to simulate
- Create a new folder from which to launch the experiment, [ExperimentName]
$ mkdir [ExperimentName]
$ cd [ExperimentName]
- Copy the OR4L files to this folder
$ cp /share/tools/work-flows/OR4L/hydra/EXPERIMENTparameters.txt ./
$ cp /share/tools/work-flows/OR4L/hydra/run_experiment.pbs ./
$ cp /share/tools/work-flows/OR4L/hydra/run_OR.pbs ./
- Edit the configuration/set-up of the simulation of the experiment
$ vim EXPERIMENTparameters.txt
- Launch the simulation of the experiment
$ qsub run_experiment.pbs
While it is running one would see the ORCHIDEE job or_[SimName] in state `R' (running) and exp_[SimName] in state `H' (on hold):
$ qstat -u $USER

hydra:
                                                                   Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
397.hydra            lluis.fi larga    or_oF-OKstomate_  27567     1  16   20gb 168:0 R   --
398.hydra            lluis.fi larga    exp_oF-OKstomate     --     1   1    2gb 168:0 H   --
If the simulation crashes, fix the issue, go to [runHOME]/[ExpName]/[SimName] and re-launch the experiment (after the first run, scratch is switched automatically to `false'):
$ qsub run_experiment.pbs
Checking the experiment
Once the experiment runs, one needs to look at the following locations (named after the variables in EXPERIMENTparameters.txt):
- [runHOME]/[ExpName]/[SimName]: will contain the copies of the templates run.def, *.xml and a file chunk_attemps.inf which counts how many times a chunk has been attempted (if it reaches 4 times, OR4L is stopped)
- [runHOME]/[ExpName]/[SimName]/run: actual folder where the computing nodes run the model. In a folder called orout there is a folder for each chunk with the standard output of the model
- [runHOME]/[ExpName]/[SimName]/run/orout/[YYYYi][MMi][DDi][HHi][MIi][SSi]-[YYYYf][MMf][DDf][HHf][MIf][SSf]: folder with the standard output and all the required files to run a given chunk. The whole content of this folder is compressed and kept in [storageHOME]/[ExpName]/[SimName]/config_[YYYYi][MMi][DDi][HHi][MIi][SSi]-[YYYYf][MMf][DDf][HHf][MIf][SSf].tar.gz
- [storageHOME]/[ExpName]/[SimName] (in [storageHOST]): output of the chunks already run, as [YYYYi][MMi][DDi][HHi][MIi][SSi]-[YYYYf][MMf][DDf][HHf][MIf][SSf] for a chunk from [YYYYi]/[MMi]/[DDi] [HHi]:[MIi]:[SSi] to [YYYYf]/[MMf]/[DDf] [HHf]:[MIf]:[SSf]
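The chunk naming pattern above can be reproduced with a few lines of Python, which may help when looking for the folder or tarball of a given chunk. This is only a sketch: the function names `chunk_label` and `config_tarball` are hypothetical (they are not part of OR4L), and it assumes that the final date of a chunk label is the starting date of the next chunk.

```python
from datetime import datetime

# 14-digit layout: [YYYY][MM][DD][HH][MI][SS]
FMT = "%Y%m%d%H%M%S"

def chunk_label(start, end):
    """Return '[YYYYi]...[SSi]-[YYYYf]...[SSf]' for a chunk (hypothetical helper)."""
    return start.strftime(FMT) + "-" + end.strftime(FMT)

def config_tarball(start, end):
    """Return the name of the compressed configuration file of a chunk."""
    return "config_" + chunk_label(start, end) + ".tar.gz"

# Assumption: a 1-year chunk starting on 1958-01-01 ends on 1959-01-01
first = datetime(1958, 1, 1)
last = datetime(1959, 1, 1)
print(chunk_label(first, last))
print(config_tarball(first, last))
```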
When something went wrong
If there has been any problem, check the last chunk (in orout/[PERIODchunk]) to try to understand what happened and where the problem comes from:
- out_orchidee_[nnnn]: these are the files which contain the standard output while running the model, one file for each process. If the problem was related to the model execution and the model is prepared for that error, a proper message should appear (look first at the largest files):
$ ls -lrS out_orchidee_*
- run_or.log: this file contains the standard output of the model. Search for `segmentation faults', which look like this (it might differ):
forrtl: error (63): output conversion error, unit -5, file Internal Formatted Write
Image              PC                Routine            Line        Source
orchidee_ol        00000000032B736A  Unknown               Unknown  Unknown
orchidee_ol        00000000032B5EE5  Unknown               Unknown  Unknown
orchidee_ol        0000000003265966  Unknown               Unknown  Unknown
orchidee_ol        0000000003226EB5  Unknown               Unknown  Unknown
orchidee_ol        0000000003226671  Unknown               Unknown  Unknown
orchidee_ol        000000000324BC3C  Unknown               Unknown  Unknown
orchidee_ol        0000000003249C94  Unknown               Unknown  Unknown
orchidee_ol        00000000021B75FB  routing_mp_routin        3212  routing.f90
orchidee_ol        000000000217970D  routing_mp_routin        2690  routing.f90
orchidee_ol        00000000020915AE  routing_mp_routin         475  routing.f90
orchidee_ol        0000000000988860  sechiba_mp_sechib         491  sechiba.f90
orchidee_ol        00000000005E5A45  intersurf_mp_inte         374  intersurf.f90
orchidee_ol        00000000004DFB82  MAIN__                   1250  dim2_driver.f90
orchidee_ol        00000000004184DC  Unknown               Unknown  Unknown
libc.so.6          000000319021ECDD  Unknown               Unknown  Unknown
orchidee_ol        00000000004183D9  Unknown               Unknown  Unknown
- In [runHOME]/[ExpName]/[SimName], check the output of the PBS jobs, which are called:
  - exp_oF-[SimName].o[nnnn]: output of the run_experiment.pbs
  - or_oF-[SimName].o[nnnn]: output of the run_OR.pbs
- Check [runHOME]/[ExpName]/[SimName]/run/used_run.def, which holds all the parameters (even the default ones) used in the simulation
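In a traceback like the one shown for run_or.log, the useful rows are usually those where a source file and a line number are known. The following Python sketch filters them out; `located_frames` is a hypothetical helper (not part of OR4L), and it assumes the five-column layout (Image, PC, Routine, Line, Source) of the example, which may differ between compilers.

```python
def located_frames(log_text):
    """Return (routine, line, source) for traceback rows with a known line number."""
    frames = []
    for row in log_text.splitlines():
        fields = row.split()
        # Assumed layout: Image, PC, Routine, Line, Source (5 columns);
        # rows with 'Unknown' in the Line column are skipped
        if len(fields) != 5 or not fields[3].isdigit():
            continue
        frames.append((fields[2], int(fields[3]), fields[4]))
    return frames

# Shortened copy of the example traceback
sample = """forrtl: error (63): output conversion error, unit -5, file Internal Formatted Write
Image              PC                Routine            Line        Source
orchidee_ol        00000000032B736A  Unknown               Unknown  Unknown
orchidee_ol        00000000021B75FB  routing_mp_routin        3212  routing.f90
orchidee_ol        0000000000988860  sechiba_mp_sechib         491  sechiba.f90
orchidee_ol        00000000004DFB82  MAIN__                   1250  dim2_driver.f90
"""

for routine, line, source in located_frames(sample):
    print(source, line, routine)
```

The first row returned is the innermost call with source information (here routing.f90, line 3212), which is normally the place to start looking.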
EXPERIMENTSparameters.txt
This ASCII file configures the whole simulation. It assumes:
- Required files, forcings, storage and the compiled version of the code might be on different machines.
- There is a folder with a given template version of the run.def, which will be used and changed according to the requirements of the experiment
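All the parameters described in this section are written as `name = value` lines, with `#` used for comments. As an illustration only (the actual scripts of the work-flow read the file with their own shell code), a minimal Python reader could look like the sketch below. The function name `parse_parameters` is hypothetical, and the rule for inline comments (a `#` preceded by white space) is an assumption made so that a `#` inside a value is preserved.

```python
import re

def parse_parameters(text):
    """Parse 'name = value' lines, skipping blank lines and '#' comment lines."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Assumption: an inline comment starts at a '#' preceded by white space,
        # so a '#' used inside a value (as a separator) is kept
        line = re.sub(r"\s+#.*$", "", line)
        name, sep, value = line.partition("=")
        if not sep:
            continue
        params[name.strip()] = value.strip()
    return params

sample = """
# Experiment name
ExpName = DiPolo
# Start from the beginning (keeping folder structure)
scratch = false
nameRSTfile = sechiba_rest_out.nc # restart file
pyBIN=/home/lluis.fita/bin/anaconda2/bin/python
"""

params = parse_parameters(sample)
print(params["ExpName"], params["scratch"], params["nameRSTfile"])
```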
Name of the experiment
# Experiment name
ExpName = DiPolo
Name of the simulation. A given experiment may run the model with different set-ups, each one identified by a different simulation name
# Simulation name
SimName = OKstomate_CRUNCEP_spinup
Which python 2.x binary is to be used
# python binary
pyBIN=/home/lluis.fita/bin/anaconda2/bin/python
Whether this simulation should be run from the beginning or not. If it is set to `true', all the pre-existing content of the folder [ExpName]/[SimName] in the running and in the storage spaces will be removed. Be careful. If it is set to `false', the simulation will continue from the last successfully run chunk (checking the restart files).
# Start from the beginning (keeping folder structure)
scratch = false
Period of the simulation (in this example, following the values below, from 1 January 1958 to 1 January 2015)
# Experiment starting date
exp_start_date = 19580101000000
# Experiment ending date
exp_end_date = 20150101000000
Length of the chunks (here, as in all ORCHIDEE runs, at most 1 year!)
# Chunk Length [N]@[unit]
#   [unit]=[year, month, week, day, hour, minute, second]
chunk_length = 1@year
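To see how a period and a `[N]@[unit]` chunk length translate into a chain of chunks, the sketch below splits the example period into 1-year pieces. It is not the code used by run_experiment.pbs: `advance` and `split_chunks` are hypothetical names, and it assumes that each chunk ends where the next one starts and that the last chunk is cut at the experiment ending date.

```python
from datetime import datetime, timedelta

FMT = "%Y%m%d%H%M%S"
# timedelta keyword for each fixed-length unit
FIXED = {"week": "weeks", "day": "days", "hour": "hours",
         "minute": "minutes", "second": "seconds"}

def advance(date, n, unit):
    """Return `date` moved forward by n units."""
    if unit in ("year", "month"):
        months = date.month - 1 + n * (12 if unit == "year" else 1)
        # Assumption: the day of the month exists in the target month (e.g. the 1st)
        return date.replace(year=date.year + months // 12, month=months % 12 + 1)
    return date + timedelta(**{FIXED[unit]: n})

def split_chunks(start, end, chunk_length):
    """Split the period from start to end into chunks of '[N]@[unit]' length."""
    n_text, unit = chunk_length.split("@")
    n = int(n_text)
    if n <= 0:
        raise ValueError("chunk length must be positive")
    begin = datetime.strptime(start, FMT)
    stop = datetime.strptime(end, FMT)
    chunks = []
    while begin < stop:
        finish = min(advance(begin, n, unit), stop)
        chunks.append((begin.strftime(FMT), finish.strftime(FMT)))
        begin = finish
    return chunks

chunks = split_chunks("19580101000000", "20150101000000", "1@year")
print(len(chunks), chunks[0], chunks[-1])
```

With the example values this gives 57 chunks, the first one from 19580101000000 to 19590101000000.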
Selection of the machines, and of the user on each machine, where the different required files are located and where the output should be placed.
- NOTE: this will only work if the .ssh public/private keys have been set up for each involved USER/HOST (see more details in llaves_ssh).
- NOTE 2: All the forcings, compiled code, ... are already at hydra in the common space called share
- NOTE 3: From the computing nodes one cannot access the /share folder or any of CIMA's storage machines: skogul, freyja, ... For that reason, one needs to use this system of [USER]@[HOST] accounts. The *.pbs scripts use a series of wrappers of the standard functions cp, ln, ls, mv, ... which manage them `from' and `to' different pairs of [USER]@[HOST].
# Hosts
#   list of different hosts and specific user
#   [USER]@[HOST]
#   NOTE: this will only work if public keys have been set-up
##
# Host with compiled code, namelist templates
codeHOST=lluis.fita@hydra
# forcing Host with forcings (atmospherics and morphologicals)
forcingHOST=lluis.fita@hydra
# output Host with storage of output (including restarts)
outHOST=lluis.fita@hydra
Templates of the configuration of ORCHIDEE: run.def and *.xml files (located at [codeHOST]). NOTE: only run.def will be changed according to the content of EXPERIMENTparameters.txt (period of the chunk, atmospheric forcing, differences of the set-up, ...)
# Folder with the `run.def' and `xml' of the experiment
domainHOME = /home/lluis.fita/salidas/estudios/dominios/DiPolo/daily
Folder where the ORCHIDEE model will run on the computing nodes (two more folders, [ExpName]/[SimName], will be created inside it). ORCHIDEE will run in the folder [ExpName]/[SimName]/run
# Running folder
runHOME = /home/lluis.fita/estudios/DiPolo/sims
Folder with the compiled version of the model (located at [codeHOST])
# Folder with the compiled source of ORCHIDEE
orHOME = /share/modipsl/bin/
Folder to store all the output of the model (history files, restarts and a compressed file with the configuration and the standard output of the given run). The content of the folder will be organized by chunks (located at [storageHOST])
# Storage folder of the output
storageHOME = /home/lluis.fita/salidas/estudios/DiPolo/sims/output
Whether modules should be loaded (not used for hydra)
# Modules to load ('None' for any)
modulesLOAD = None
Which kind of simulation will be run (at this time only prepared for 'offline')
# Simulation kind
#   'offline': Realistic off-line run, with initial conditions at each change of year
#   'periodic': Realistic off-line run, with the same initial conditions for each year
kindSIM = offline
Names of the files used to check that the chunk has run properly
# Model reference output names (to be used as checking file names)
nameLISTfile = run.def # namelist
nameRSTfile = sechiba_rest_out.nc # restart file
nameOUTfile = sechiba_history.nc # output file
Extensions of the files which contain the configuration of the model
# Extensions of the files with the configuration of the model
configEXTS = def:xml
To continue from a previous chunk one needs to use the `restart' files. They need to be renamed, because otherwise they would be overwritten. Here one specifies the original name of the file [origFile] and the name to be used to avoid the overwriting [destFile], as a ':' separated list of [origFile]@[destFile]. This is handled by a complex bash script which can even deal with the change of dates according to the period of the chunk. The files will be located at [storageHOST]
# restart file names
# ':' list of [tmplrstfilen]|[NNNNN1]?[val1]#[...[NNNNNn]?[valn]]@[tmpllinkname]|[NNNNN1]?[val1]#[...[NNNNNn]?[valn]]
#   [tmplrstfilen]: template name of the restart file (if necessary with [NNNNN] variables to be substituted)
#   [NNNNN]: section of the file name to be automatically substituted
#     `[YYYY]': year in 4 digits
#     `[YY]': year in 2 digits
#     `[MM]': month in 2 digits
#     `[DD]': day in 2 digits
#     `[HH]': hour in 2 digits
#     `[SS]': second in 2 digits
#     `[JJJ]': julian day in 3 digits
#   [val]: value to use (which is systematically defined in `run_OR.pbs')
#     `%Y%': year in 4 digits
#     `%y%': year in 2 digits
#     `%m%': month in 2 digits
#     `%d%': day in 2 digits
#     `%h%': hour in 2 digits
#     `%s%': second in 2 digits
#     `%j%': julian day in 3 digits
#   [tmpllinkname]: template name of the link of the restart file (if necessary with [NNNNN] variables to be substituted)
rstFILES=sechiba_rest_out.nc@sechiba_rst.nc:stomate_rest_out.nc@stomate_rst.nc
Folder with the morphological forcing data (located at [forcingHOST]).
# Folder with the input morphological forcing data
indataHOME = /share/ORCHIDEE/data/IGCM/SRF
Files to be used as morphological forcings (it uses the same complex bash script as for the restarts)
# ':' separated list of [morphfilen]|[NNNNN1]?[val1]#[...[NNNNNn]?[valn]]@[tmpllinkname]|[NNNNN1]?[val1]#[...[NNNNNn]?[valn]]
#   [morphfilen]: morphological forcing file (relative to ${indataHOME}) (if necessary with [NNNNN] variables to be substituted)
#   [tmpllinkname]: template name of the link of the restart file (if necessary with [NNNNN] variables to be substituted)
indataFILES = albedo/alb_bg_modisopt_2D_ESA_v2.nc@alb_bg_modisopt_2D_ESA.nc:cartepente2d_15min.nc@cartepente2d_15min.nc:carteveg5km.nc@carteveg5km.nc:floodplains.nc@floodplains.nc:lai2D.nc@lai2D.nc:PFTMAPS/CMIP6/ESA-LUH2/historical/v1.2/withNoBio/13PFTmap_[YYYY]_ESA_LUH2v2h_withNoBio_v1.2.nc|YYYY?%Y%@PFTmap_025.nc:PFTmap_IPCC_1850.nc@PFTmap_IPCC.nc:reftemp.nc@reftemp.nc:soils_param.nc@soils_param.nc:soils_param_usda.nc@soils_param_usda.nc:soils_param_usdatop.nc@soils_param_usdatop.nc:routing.nc@routing.nc
Folder with the atmospheric forcing data (located at [forcingHOST]).
# Folder which contains the atmospheric data to force the model (here an example for CRU-NCEP v5.4 half degree at hydra)
iniatmosHOME = /share/ORCHIDEE/data/IGCM/SRF/METEO/CRU-NCEP/v5.4/halfdeg
Files to be used as atmospheric forcings (it uses the same complex bash script as for the restarts). Files must be located at [forcingHOST]. This example uses a CRU-NCEP file called cruncep_halfdeg_[YYYY].nc (where [YYYY] stands for a year in four digits). The entry asks to replace [YYYY] by %Y%, which will be the four-digit year of the chunk (C-like)
# ':' list of [atmosfilen]|[NNNNN1]?[val1]:[...[NNNNNn]?[valn]]@[tmpllinkname]|[NNNNN1]?[val1]#[...[NNNNNn]?[valn]]
#   [filenTMPL]: template of the atmospheric data file name with [NNNN] variables to be substituted
#   [tmpllinkname]: template name of the link of the restart file (if necessary with [NNNNN] variables to be substituted)
filenameTMPL = cruncep_halfdeg_[YYYY].nc|YYYY?%Y%@atmos_forcing.nc
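The substitution syntax shared by rstFILES, indataFILES and filenameTMPL can be illustrated in Python. This is a sketch and not the bash script used by the work-flow: the names `VALUES`, `expand` and `expand_list` are hypothetical, `#` is assumed to separate several substitutions, and a value which is not one of the listed `%x%` codes is assumed to be used literally.

```python
from datetime import datetime

# strftime equivalents of the '%x%' values listed in the comments of the file
VALUES = {"%Y%": "%Y", "%y%": "%y", "%m%": "%m", "%d%": "%d",
          "%h%": "%H", "%s%": "%S", "%j%": "%j"}

def expand(side, date):
    """Expand one '[template]|[KEY]?[val]#...' side for the date of the chunk."""
    template, _, subs = side.partition("|")
    for sub in filter(None, subs.split("#")):
        key, _, val = sub.partition("?")
        fmt = VALUES.get(val)
        # Assumption: values outside the '%x%' list are used as they are
        value = date.strftime(fmt) if fmt else val
        template = template.replace("[" + key + "]", value)
    return template

def expand_list(spec, date):
    """Return (file name, link name) pairs for a ':' separated specification."""
    pairs = []
    for entry in spec.split(":"):
        source, _, link = entry.partition("@")
        pairs.append((expand(source, date), expand(link, date)))
    return pairs

spec = "cruncep_halfdeg_[YYYY].nc|YYYY?%Y%@atmos_forcing.nc"
print(expand_list(spec, datetime(1958, 1, 1)))
```

For a chunk starting in 1958, the forcing file cruncep_halfdeg_1958.nc would be linked as atmos_forcing.nc.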
Name of the files with the set-up of the model
## configuration files (':' separated list)
ORdef = run.def
ORxml = context_orchidee.xml:field_def_orchidee.xml:file_def_orchidee.xml:iodef.xml
Here one can change values in the template run.def. Each provided parameter is given its new value. If a given parameter is not in the run.def template, it will be automatically added.
## def,xml changes ([fileA]@[parm1]:[val1];...;[parmN]:[valN]|...|[fileZ]@....)
nlparametres = run.def@STOMATE_OK_STOMATE:y;STOMATE_OK_CO2:y
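The behaviour described for nlparametres (change the value when the parameter exists, add it otherwise) can be sketched as follows. The functions `parse_changes` and `apply_changes` are hypothetical and do not belong to OR4L, and the sketch assumes that the run.def template holds its parameters as `NAME = value` lines.

```python
def parse_changes(spec):
    """Return {file: {parameter: value}} from '[file]@[parm]:[val];...|...'."""
    changes = {}
    for block in spec.split("|"):
        filename, _, assignments = block.partition("@")
        file_changes = changes.setdefault(filename, {})
        for assignment in assignments.split(";"):
            name, _, value = assignment.partition(":")
            file_changes[name] = value
    return changes

def apply_changes(def_text, new_values):
    """Replace 'NAME = value' lines of a run.def-like text and append missing names."""
    pending = dict(new_values)
    lines = []
    for line in def_text.splitlines():
        name = line.split("=")[0].strip()
        if "=" in line and name in pending:
            line = name + " = " + pending.pop(name)
        lines.append(line)
    # Parameters which were not found in the template are added at the end
    lines += [name + " = " + value for name, value in pending.items()]
    return "\n".join(lines)

changes = parse_changes("run.def@STOMATE_OK_STOMATE:y;STOMATE_OK_CO2:y")
# Hypothetical two-line template, only for illustration
template = "STOMATE_OK_STOMATE = n\nRIVER_ROUTING = y"
print(apply_changes(template, changes["run.def"]))
```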
Name of ORCHIDEE's executable (to be found in the [orHOME] folder at [codeHOST])
# Name of the executable
nameEXEC=orchidee_ol
':' separated list of netCDF file names from ORCHIDEE's output which do not need to be kept
# netCDF Files which will not be kept anywhere
NokeptfileNAMES=''
':' separated list of headers of netCDF file names from ORCHIDEE's output which need to be kept (NOTE: if one of these files does not exist, the system will stop!)
# Headers of netCDF files need to be kept
HkeptfileNAMES=sechiba_history:stomate_history:sechiba_history_4dim:sechiba_history_alma
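Since a missing file stops the system, it can be useful to compare the list of headers with the files a test run actually produces before launching a long experiment. The sketch below does that check; `missing_headers` is a hypothetical helper, and it assumes that a `header' is the beginning of the netCDF file name.

```python
def missing_headers(headers, filenames):
    """Return the headers for which no netCDF file name starts with them."""
    wanted = headers.split(":")
    return [header for header in wanted
            if not any(name.startswith(header) and name.endswith(".nc")
                       for name in filenames)]

headers = "sechiba_history:stomate_history:sechiba_history_4dim:sechiba_history_alma"
# Hypothetical content of the output folder of a chunk
produced = ["sechiba_history.nc", "stomate_history.nc", "sechiba_rest_out.nc"]
print(missing_headers(headers, produced))
```

Because the match is done on the beginning of the name, a file such as sechiba_history_4dim.nc would also satisfy the shorter header sechiba_history.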
':' separated list of headers of restart netCDF file names from ORCHIDEE's output which need to be kept
# Headers of netCDF restart files need to be kept
HrstfileNAMES=sechiba_rest_out:stomate_rest_out
ORCHIDEE off-line cannot run with parallel-netCDF. For that reason, output files are written for each computing node. At the end of the simulation they need to be concatenated with the tool flio_rbld (already compiled in hydra, to be found at [codeHOST]). This is done automatically at the end of the simulation.
# Extras. rebuild program folder
binREBUILD = /share/modipsl_IOIPSLtools/bin
Parallel configuration of the run. NOTE: ORCHIDEE off-line cannot be run using shared memory
# ORCHIDEE parallel run configuration
## Number of nodes
Nnodes = 1
## Number of mpi procs
Nmpiprocs = 16
## Number of shared memory threads ('None' for no openMP threads)
Nopenthreads = None
## Memory size of shared memory threads
SIZEopenthreads = 200M
Generic definitions
## Generic
errormsg=ERROR -- error -- ERROR -- error
warnmsg=WARNING -- warning -- WARNING -- warning