Slurm Burst Buffer Guide
- Overview
- Configuration (for system administrators)
- Job Submission Commands
- Persistent Burst Buffer Creation and Deletion Directives
- Interactive Job Options
- Status Commands
- Advanced Reservations
- Job Dependencies
Slurm includes support for
burst buffers,
a shared high-speed storage resource.
Slurm provides support for allocating these resources, staging files in,
scheduling compute nodes for jobs using these resources, then staging files out.
Burst buffers can also be used as temporary storage during a job's lifetime,
without file staging.
Another typical use case is for persistent storage, not associated with any
specific job.
This support is provided using a plugin mechanism so that various burst
buffer infrastructures may be easily configured.
One plugin is provided currently:
- datawarp - Uses Cray APIs to perform underlying management functions
Additional plugins may be provided in future releases of Slurm.
Slurm's mode of operation follows this general pattern:
- Read configuration information and initial state information
- After expected start times for pending jobs are established, allocate
burst buffers to those jobs expected to start earliest and start stage-in
of required files
- After stage-in has completed, jobs can be allocated compute nodes and begin
execution
- After the job has completed execution, begin file stage-out from the burst buffer
- After file stage-out has completed, the burst buffer can be released and the
job record purged
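As an illustration only, the sequence above can be read as a linear state machine. The bash sketch below is not part of Slurm: the function names `bb_next_stage` and `bb_walk_lifecycle` and the stage names (`pending`, `staging_in`, `running`, `staging_out`, `teardown`, `purged`) are hypothetical labels for the listed steps, not Slurm's internal job or buffer states.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: the burst buffer job lifecycle as a linear state machine.
# Stage names are illustrative labels only; Slurm's internal states differ.
bb_next_stage() {
  case $1 in
    pending)     printf '%s\n' staging_in ;;   # buffer allocated, stage-in started
    staging_in)  printf '%s\n' running ;;      # stage-in done, compute nodes allocated
    running)     printf '%s\n' staging_out ;;  # job finished, stage-out started
    staging_out) printf '%s\n' teardown ;;     # stage-out done, buffer released
    teardown)    printf '%s\n' purged ;;       # job record purged
    *)           echo "unknown stage: $1" >&2; return 1 ;;
  esac
}

# Walk one job through every stage, starting from "pending".
bb_walk_lifecycle() {
  local stage=pending
  printf '%s' "$stage"
  while [[ $stage != purged ]]; do
    stage=$(bb_next_stage "$stage") || return 1
    printf ' -> %s' "$stage"
  done
  printf '\n'
}
```

Running `bb_walk_lifecycle` prints `pending -> staging_in -> running -> staging_out -> teardown -> purged`.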
Configuration (for system administrators)
Burst buffer support in Slurm is enabled by specifying the plugin(s) to
load for managing these resources using the BurstBufferType
configuration parameter in the slurm.conf file.
Multiple plugin names may be specified in a comma separated list.
Detailed logging of burst buffer specific actions may be generated for debugging
purposes by using the DebugFlags=BurstBuffer configuration parameter.
The BurstBuffer debug flag (like many other DebugFlags) can produce very
verbose logging and is intended only for diagnostic purposes, not for
use in a production system.
# Excerpt of example slurm.conf file
BurstBufferType=burst_buffer/datawarp
# DebugFlags=BurstBuffer       # Commented out
Burst buffer specific options should be defined in a burst_buffer.conf file.
This file can contain information about who can or cannot use burst buffers,
timeouts, and paths of scripts used to perform various functions, etc.
TRES limits can be configured to establish
limits by association, QOS, etc.
The size of a job's burst buffer requirements can be used as a factor in
setting the job priority as described in the
multifactor priority document.
Note for Cray systems: The JSON-C library must be installed in order
to build Slurm's burst_buffer/datawarp plugin, which must parse JSON format data.
See Slurm's JSON installation information
for details.
Job Submission Commands
The normal mode of operation is for batch jobs to specify burst buffer
requirements within the batch script.
Batch script lines containing a prefix of "#BB" identify the job's burst buffer
space requirements, files to be staged in, files to be staged out, etc.
when using the burst_buffer/generic plugin.
The prefix of "#DW" (for "DataWarp") is used for burst buffer directives when
using the burst_buffer/datawarp plugin.
Please reference Cray documentation for details about the DataWarp options.
For DataWarp systems, the prefix of "#BB" can be used to create or delete
persistent burst buffer storage (NOTE: The "#BB" prefix is used since the
command is interpreted by Slurm and not by the Cray Datawarp software).
DataWarpシステムの場合、「#BB」のプレフィックスを使用して、永続バーストバッファストレージを作成または削除できます(注:コマンドはCray DatawarpソフトウェアではなくSlurmによって解釈されるため、「#BB」プレフィックスが使用されます)。
Interactive jobs (those submitted using the salloc and srun
commands) can specify their burst buffer space requirements using the "--bb"
or "--bbf" command line options, as described later in this document.
All of the "#SBATCH", "#DW" and "#BB" directives should be placed at the top
of the script (before any non-comment lines).
All persistent burst buffer creations and deletions happen before the
job's compute portion begins.
Similarly, files cannot be staged in or out at arbitrary points during
script execution.
All file stage-in happens prior to the job's compute portion
and all file stage-out happens after computation.
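Since all directives belong at the top of the script, a quick pre-submission check can flag misplaced lines. The helper below, `check_directive_placement`, is a hypothetical sketch and not a Slurm tool. It assumes a directive is any line beginning with "#SBATCH", "#DW" or "#BB" followed by whitespace, and that blank lines and other comments (including the "#!" line) do not count as commands.

```bash
#!/usr/bin/env bash
# Hypothetical pre-submission check: fail if any #SBATCH, #DW or #BB directive
# appears after the first non-comment, non-blank line of a batch script.
check_directive_placement() {
  local script=$1 line trimmed n=0 seen_cmd=0
  local re='^#(SBATCH|DW|BB)[[:space:]]'
  while IFS= read -r line || [[ -n $line ]]; do
    n=$((n + 1))
    # Strip leading whitespace so indented comments are still comments.
    trimmed=${line#"${line%%[![:space:]]*}"}
    if [[ $line =~ $re ]]; then
      if (( seen_cmd )); then
        echo "$script:$n: directive after first command: $line" >&2
        return 1
      fi
    elif [[ -n $trimmed && $trimmed != \#* ]]; then
      seen_cmd=1   # first real command; any later directive is misplaced
    fi
  done < "$script"
  return 0
}
```

For example, `check_directive_placement job.bash && sbatch job.bash` submits only when the check passes; `job.bash` is a placeholder name.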
The salloc and srun commands can create and use job-specific burst buffers.
For both commands, the "--bb" or "--bbf" option is used to specify the job's
burst buffer requirements.
Note that the burst buffer may not be accessible from a login node, in which
case salloc must spawn a shell on one of its allocated compute nodes.
See the
description of SallocDefaultCommand in the slurm.conf man page for more
information about how to spawn a remote shell.
A basic validation is performed on the job's burst buffer options at job
submit time.
If the options are invalid, the job will be rejected and an error message will
be returned directly to the user.
Note that unrecognized options may be ignored in order to support backward
compatibility (i.e. a job submission would not fail in the case of an option
being specified that is recognized by some versions of Slurm, but not recognized
by other versions).
If the job is accepted but later fails (e.g. some problem staging files), the
job will be held and its "Reason" field will be set to the error message provided
by the underlying infrastructure.
Users may also request to be notified by email upon completion of burst
buffer stage out using the "--mail-type=stage_out" or "--mail-type=all" option.
The subject line of the email will be of this form:
SLURM Job_id=12 Name=my_app Staged Out, StageOut time 00:05:07
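Users who filter these notifications may want the individual fields from the subject line. `parse_stage_out_subject` below is a hypothetical helper that assumes the subject has exactly the form of the sample above; if a site customizes its mail or the wording changes, the regular expression will not match.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract job id, job name and stage-out time from a
# stage-out notification subject. Assumes the exact wording of the sample.
parse_stage_out_subject() {
  local re='Job_id=([0-9]+) Name=(.*) Staged Out, StageOut time ([0-9:-]+)$'
  if [[ $1 =~ $re ]]; then
    printf '%s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
  else
    return 1   # not a stage-out subject (or a different format)
  fi
}
```

Applied to the sample subject, it prints `12 my_app 00:05:07`.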
Persistent Burst Buffer Creation and Deletion Directives
These options are used for both the burst_buffer/datawarp and
burst_buffer/generic plugins to create and delete persistent burst
buffers, which have a lifetime independent of the job.
- #BB create_persistent name=<name> capacity=<number> [access=<access>] [pool=<pool>] [type=<type>]
- #BB destroy_persistent name=<name> [hurry]
The persistent burst buffer name may not start with a numeric value (numeric
names are reserved for job-specific burst buffers).
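The directive syntax above is regular enough to generate from a script. The two functions below are hypothetical helpers (not Slurm commands) that assemble `create_persistent` and `destroy_persistent` lines and enforce only the naming rule stated here; capacity, access, pool and type values are passed through unchecked, since Slurm and the underlying plugin do the real validation.

```bash
#!/usr/bin/env bash
# Hypothetical helpers that emit persistent burst buffer directives.
# They only build text; Slurm performs the authoritative validation.

# Usage: bb_create_persistent NAME CAPACITY [ACCESS] [POOL] [TYPE]
bb_create_persistent() {
  local name=${1:-} capacity=${2:-} access=${3:-} pool=${4:-} type=${5:-}
  if [[ -z $name || -z $capacity ]]; then
    echo "name and capacity are required" >&2
    return 1
  fi
  # Names starting with a digit are reserved for job-specific buffers.
  if [[ $name == [0-9]* ]]; then
    echo "name may not start with a digit: $name" >&2
    return 1
  fi
  local line="#BB create_persistent name=$name capacity=$capacity"
  [[ -n $access ]] && line+=" access=$access"
  [[ -n $pool ]] && line+=" pool=$pool"
  [[ -n $type ]] && line+=" type=$type"
  printf '%s\n' "$line"
}

# Usage: bb_destroy_persistent NAME [hurry]
bb_destroy_persistent() {
  if [[ -z ${1:-} ]]; then
    echo "name is required" >&2
    return 1
  fi
  local line="#BB destroy_persistent name=$1"
  [[ ${2:-} == hurry ]] && line+=" hurry"
  printf '%s\n' "$line"
}
```

For example, `bb_create_persistent alpha 32GB striped '' scratch` prints `#BB create_persistent name=alpha capacity=32GB access=striped type=scratch`.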
The capacity (size) specification can include a suffix of "N" (nodes),
"K|KiB", "M|MiB", "G|GiB", "T|TiB", "P|PiB" (for powers of 1024) and "KB",
"MB", "GB", "TB", "PB" (for powers of 1000).
NOTE: Usually Slurm interprets
KB, MB, GB, TB and PB units as powers of 1024, but for burst buffer size
specifications Slurm supports both IEC and SI formats.
This is because the Cray API
supports both formats.
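To make the IEC/SI distinction concrete, `bb_size_to_bytes` below is a hypothetical bash helper that applies the suffix rules listed above. It is a sketch under stated assumptions: integer values only (so a value like "7.28TiB" is rejected), a bare number is taken to be bytes, and an "N" suffix is reported as a node count rather than converted.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a burst buffer capacity string to bytes.
# K..P and KiB..PiB are powers of 1024; KB..PB are powers of 1000.
# Assumptions: integers only; bare number = bytes; "N" = node count.
bb_size_to_bytes() {
  local re='^([0-9]+)([A-Za-z]*)$'
  if [[ ! $1 =~ $re ]]; then
    echo "invalid size: $1" >&2
    return 1
  fi
  # Force base 10 so values such as "08" are not read as octal.
  local num=$((10#${BASH_REMATCH[1]})) unit=${BASH_REMATCH[2]}
  case $unit in
    "")    echo "$num" ;;
    K|KiB) echo $((num * 1024)) ;;
    M|MiB) echo $((num * 1024**2)) ;;
    G|GiB) echo $((num * 1024**3)) ;;
    T|TiB) echo $((num * 1024**4)) ;;
    P|PiB) echo $((num * 1024**5)) ;;
    KB)    echo $((num * 1000)) ;;
    MB)    echo $((num * 1000**2)) ;;
    GB)    echo $((num * 1000**3)) ;;
    TB)    echo $((num * 1000**4)) ;;
    PB)    echo $((num * 1000**5)) ;;
    N)     echo "$num nodes" ;;
    *)     echo "unknown unit: $unit" >&2; return 1 ;;
  esac
}
```

Note that `bb_size_to_bytes 1G` yields 1073741824 while `bb_size_to_bytes 1GB` yields 1000000000, a difference of roughly 7%.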
The access parameter identifies the buffer access mode.
Supported access
modes for the burst_buffer/datawarp plugin include: striped, private, and ldbalance.
The pool parameter identifies the resource pool from which the burst buffer
should be created.
The default and available pools are configuration dependent.
The type parameter identifies the buffer type.
Supported type
modes for the burst_buffer/datawarp plugin include: cache and scratch.
Multiple persistent burst buffers may be created or deleted within a single job.
A sample batch script follows:
#!/bin/bash
#BB create_persistent name=alpha capacity=32GB access=striped type=scratch
#DW jobdw type=scratch capacity=1GB access_mode=striped
#DW stage_in type=file source=/home/alan/ destination=$DW_JOB_STRIPED/data
#DW stage_out type=file destination=/home/alan/data.out source=$DW_JOB_STRIPED/data
/home/alan/a.out
Persistent burst buffers can be created and deleted by a job requiring no
compute resources.
Submit a job with the desired burst buffer directives and
specify a node count of zero (e.g. "sbatch -N0 setup_buffers.bash").
Attempts to submit a zero size job without burst buffer directives or with
job-specific burst buffer directives will generate an error.
Note that zero size jobs are not supported for job arrays or heterogeneous
job allocations.
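A zero-node setup job only needs persistent buffer directives. `write_bb_setup_script` below is a hypothetical helper that writes such a script and, following the rules above, rejects job-specific directives and requires at least one #BB line. Allowing #SBATCH lines (for example a job name) is an assumption of this sketch.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a batch script for a zero-node ("sbatch -N0") job
# that only creates or destroys persistent burst buffers.
# Usage: write_bb_setup_script OUTFILE DIRECTIVE...
write_bb_setup_script() {
  local out=$1 d has_bb=0
  shift
  for d in "$@"; do
    case $d in
      "#BB create_persistent "*|"#BB destroy_persistent "*) has_bb=1 ;;
      "#SBATCH "*) ;;   # assumption: ordinary sbatch options are fine
      *) echo "not allowed in a zero-node job: $d" >&2; return 1 ;;
    esac
  done
  if (( ! has_bb )); then
    echo "at least one #BB directive is required" >&2
    return 1
  fi
  {
    printf '#!/bin/bash\n'
    printf '%s\n' "$@"
  } > "$out"
}
```

For example: `write_bb_setup_script setup_buffers.bash "#BB create_persistent name=alpha capacity=32GB access=striped type=scratch" && sbatch -N0 setup_buffers.bash`.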
NOTE: The ability to create and destroy persistent burst buffers may be
limited by the "Flags" option in the burst_buffer.conf file.
See the burst_buffer.conf man page for more information.
By default only privileged users
(i.e. Slurm operators and administrators)
can create or destroy persistent burst buffers.
Interactive Job Options
Interactive jobs may include directives for creating job-specific burst
buffers as well as file staging.
These options may be specified using either the "--bb" or "--bbf" option of
the salloc or srun command.
Note that support for creation and destruction of persistent burst buffers using
the "--bb" option is not provided.
The "--bbf" option take as an argument a filename and that file should contain
a collection of burst buffer operations identical to that used for batch jobs.
This file may contain file staging directives.
Alternately the "--bb" option may be used to specify burst buffer directives
as the option argument.
The format of those directives can either be identical
to those used in a batch script OR a very limited set of directives can be used,
which are translated to the equivalent script for later processing.
Multiple directives should be space separated.
- access=<access>
- capacity=<number>
- swap=<number>
- type=<type>
- pool=<name>
If a swap option is specified, the job must also specify the required
node count.
The capacity (size) specification can include a suffix of "N" (nodes),
"K|KiB", "M|MiB", "G|GiB", "T|TiB", "P|PiB" (for powers of 1024) and "KB",
"MB", "GB", "TB", "PB" (for powers of 1000).
NOTE: Usually Slurm interprets
KB, MB, GB, TB and PB units as powers of 1024, but for burst buffer size
specifications Slurm supports both IEC and SI formats.
This is because the Cray API
supports both formats.
A sample command line follows and we also show the equivalent burst buffer
script generated by the options:
# Sample execute line:
srun --bb="capacity=1G access=striped type=scratch" a.out

# Equivalent script as generated by Slurm's burst_buffer/datawarp plugin
#DW jobdw capacity=1GiB access_mode=striped type=scratch
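The translation in the sample can be mimicked in a few lines. `bb_opts_to_dw` below is a hypothetical sketch that reproduces only what the sample shows (`access` becomes `access_mode`, a bare "G"-style suffix becomes "GiB"). It is not the plugin's actual code, and it rejects `swap` and `pool` rather than guess how the plugin renders them.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: turn the short "--bb" option form into a #DW jobdw line,
# following only the sample translation. Not the datawarp plugin's code.
bb_opts_to_dw() {
  local -a opts
  local opt key val out="#DW jobdw"
  read -r -a opts <<< "$1"
  for opt in "${opts[@]}"; do
    key=${opt%%=*}
    val=${opt#*=}
    case $key in
      capacity)
        # The sample turns "1G" into "1GiB"; do the same for K/M/G/T/P.
        [[ $val =~ ^[0-9]+[KMGTP]$ ]] && val+="iB"
        out+=" capacity=$val" ;;
      access) out+=" access_mode=$val" ;;
      type)   out+=" type=$val" ;;
      *) echo "not handled in this sketch: $opt" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "$out"
}
```

`bb_opts_to_dw "capacity=1G access=striped type=scratch"` prints the same `#DW jobdw` line as the sample.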
Symbol Replacement
Slurm supports a number of symbols that can be used to automatically
fill in certain job details, e.g., to make stage_in or stage_out directory
paths vary with each job submission.
Supported symbols include:
Symbol | Replacement
%%     | %
%A     | Array Master Job Id
%a     | Array Task Id
%d     | Workdir
%j     | Job Id
%u     | User Name
%x     | Job Name
\\     | Stop further processing of the line
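The substitution rules can be sketched as a single left-to-right pass over the line. `bb_expand_symbols` below is a hypothetical bash helper, not Slurm's implementation. It takes the job values as arguments, leaves unknown `%` sequences unchanged, and on reaching `\\` drops that marker and copies the rest of the line verbatim; whether Slurm keeps or removes the `\\` marker itself is an assumption here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: expand burst buffer path symbols in one line.
# Usage: bb_expand_symbols LINE ARRAY_MASTER_ID ARRAY_TASK_ID WORKDIR JOB_ID USER JOB_NAME
bb_expand_symbols() {
  local line=$1 A=$2 a=$3 d=$4 j=$5 u=$6 x=$7
  local out="" c next i=0 n=${#line}
  while (( i < n )); do
    c=${line:i:1}
    next=${line:i+1:1}
    if [[ $c == '\' && $next == '\' ]]; then
      # Assumption: drop the "\\" marker and copy the remainder unchanged.
      out+=${line:i+2}
      break
    elif [[ $c == '%' && -n $next ]]; then
      case $next in
        %) out+='%' ;;
        A) out+=$A ;;
        a) out+=$a ;;
        d) out+=$d ;;
        j) out+=$j ;;
        u) out+=$u ;;
        x) out+=$x ;;
        *) out+="%$next" ;;   # unknown symbol: left as-is (assumption)
      esac
      (( i += 2 ))
      continue
    fi
    out+=$c
    (( i += 1 ))
  done
  printf '%s\n' "$out"
}
```

For example, `bb_expand_symbols '/scratch/%u/%j_%x' 0 0 /home/alan 12 alan my_app` prints `/scratch/alan/12_my_app`.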
Status Commands
Slurm's current burst buffer state information is available using the
scontrol show burst command or by using the sview command's
Burst Buffer tab.
A sample scontrol output is shown below.
The scontrol
"-v" option may be used for a more verbose output format.
$ scontrol show burst
Name=generic DefaultPool=ssd Granularity=100G TotalSpace=50T UsedSpace=42T
  StageInTimeout=30 StageOutTimeout=30 Flags=EnablePersistent,PrivateData
  AllowUsers=alan,brenda
  CreateBuffer=/usr/local/slurm/17.11/sbin/CB
  DestroyBuffer=/usr/local/slurm/17.11/sbin/DB
  GetSysState=/usr/local/slurm/17.11/sbin/GSS
  StartStageIn=/usr/local/slurm/17.11/sbin/SSI
  StartStageOut=/usr/local/slurm/17.11/sbin/SSO
  StopStageIn=/usr/local/slurm/17.11/sbin/PSI
  StopStageOut=/usr/local/slurm/17.11/sbin/PSO
  Allocated Buffers:
    JobID=18 CreateTime=2017-08-19T16:46:05 Pool=dwcache Size=10T State=allocated UserID=alan(1000)
    JobID=20 CreateTime=2017-08-19T16:46:45 Pool=dwcache Size=10T State=allocated UserID=alan(1000)
    Name=DB1 CreateTime=2017-08-19T16:46:45 Pool=dwcache Size=22T State=allocated UserID=brenda(1001)
  Per User Buffer Use:
    UserID=alan(1000) Used=20T
    UserID=brenda(1001) Used=22T
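The scontrol output is line-oriented, so per-user usage can be pulled out with awk. `bb_user_used` below is a hypothetical helper that assumes the layout of the sample above: a "Per User Buffer Use:" header followed by `UserID=name(uid) Used=...` entries, with each plugin section starting at an unindented `Name=` line. Other Slurm versions may format this output differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "scontrol show burst" output on stdin and print
# the Used= value(s) for one user from the "Per User Buffer Use:" section(s).
bb_user_used() {
  awk -v u="$1" '
    /^Name=/               { in_use = 0 }          # new plugin section
    /Per User Buffer Use:/ { in_use = 1; next }
    in_use && index($1, "UserID=" u "(") == 1 {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^Used=/) print substr($i, 6)     # drop the "Used=" prefix
    }'
}
```

Usage: `scontrol show burst | bb_user_used brenda` would print `22T` for the sample above.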
Access to the Cray burst buffer status tool, dwstat, is available from
the scontrol command using the "scontrol show bbstat ..." or
"scontrol show dwstat ..." command.
Options following "bbstat" or "dwstat" on
the scontrol execute line are passed directly to the dwstat command as shown
Cray DataWarp documentation
for details about dwstat options and output.
/opt/cray/dws/default/bin/dwstat
$ scontrol show dwstat
    pool units quantity    free gran
wlm_pool bytes  7.28TiB 7.28TiB 1GiB

$ scontrol show dwstat sessions
 sess state      token creator owner             created expiration nodes
  832 CA---  783000000  tester 12345 2015-09-08T16:20:36      never    20
  833 CA---  784100000  tester 12345 2015-09-08T16:21:36      never     1
  903 D---- 1875700000  tester 12345 2015-09-08T17:26:05      never     0

$ scontrol show dwstat configurations
 conf state inst    type access_type activs
  715 CA---  753 scratch      stripe      1
  716 CA---  754 scratch      stripe      1
  759 D--T-  807 scratch      stripe      0
  760 CA---  808 scratch      stripe      1
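Because the pool listing is a whitespace-separated table, the free space for one pool can be read by column name. `bb_pool_free` below is a hypothetical awk wrapper; it assumes a header line whose first field is "pool" and which contains a "free" column, as in the sample above, and it does not interpret the size string.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "scontrol show dwstat" pool output on stdin and
# print the "free" column for the named pool.
bb_pool_free() {
  awk -v p="$1" '
    $1 == "pool" { for (i = 1; i <= NF; i++) if ($i == "free") col = i; next }
    col && $1 == p { print $col }'
}
```

Usage: `scontrol show dwstat | bb_pool_free wlm_pool` prints `7.28TiB` for the sample above.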
Advanced Reservations
Burst buffer resources can be placed in an advanced reservation using the
BurstBuffer option.
The argument consists of four elements: [plugin:][pool:]#[units]
plugin is the burst buffer plugin name, currently either "datawarp" or "generic".
If no plugin is specified, the reservation applies to all configured burst
buffer plugins.
pool specifies a Cray generic burst buffer resource pool.
If "type" is not specified, the number is a measure of storage space.
units may be "N" (nodes), "K|KiB", "M|MiB", "G|GiB", "T|TiB", "P|PiB"
(for powers of 1024) and "KB", "MB", "GB", "TB", "PB" (for powers of 1000).
単位は、「N」(ノード)、「K | KiB」、「M | MiB」、「G | GiB」、「T | TiB」、「P | PiB」(1024の累乗)、「KB」です。 「MB」、「GB」、「TB」、「PB」(1000の累乗の場合)。
default units are bytes for reservations of storage space.
NOTE: Usually Slurm interprets KB, MB, GB, TB, PB, TB units as powers of 1024,
but for Burst Buffers size specifications Slurm supports both IEC/SI formats.
注:通常、SlurmはKB、MB、GB、TB、PB、TBの単位を1024の累乗として解釈しますが、バーストバッファーのサイズ仕様の場合、Slurmは両方のIEC / SI形式をサポートします。
This is because the Cray DataWarp API supports both formats.
これは、Cray DataWarpAPIが両方の形式をサポートしているためです。
Jobs using this reservation are not restricted to these burst buffer resources,
but may use these reserved resources plus any which are generally available.
Some examples follow.
$ scontrol create reservation starttime=now duration=60 \
  users=alan flags=any_nodes \
  burstbuffer=datawarp:100G,generic:20G

$ scontrol create reservation StartTime=noon duration=60 \
  users=brenda NodeCnt=8 \
  BurstBuffer=datawarp:20G

$ scontrol create reservation StartTime=16:00 duration=60 \
  users=joseph flags=any_nodes \
  BurstBuffer=datawarp:pool_test:4G
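The BurstBuffer argument is a comma-separated list of elements, as the first example shows. `bb_parse_resv_spec` below is a hypothetical parser that splits each element into plugin, pool and size. A two-field element such as "datawarp:20G" is ambiguous on its face; this sketch assumes a first field of "datawarp" or "generic" names a plugin and anything else names a pool, which is not a documented Slurm rule.

```bash
#!/usr/bin/env bash
# Hypothetical parser for reservation BurstBuffer= values such as
# "datawarp:100G,generic:20G" or "datawarp:pool_test:4G".
# Prints one "plugin=... pool=... size=..." line per element.
bb_parse_resv_spec() {
  local item plugin pool size
  local -a items parts
  IFS=, read -r -a items <<< "$1"
  for item in "${items[@]}"; do
    IFS=: read -r -a parts <<< "$item"
    plugin="" pool="" size=""
    case ${#parts[@]} in
      1) size=${parts[0]} ;;
      2) # Assumption: a known plugin name first means plugin:size.
         if [[ ${parts[0]} == datawarp || ${parts[0]} == generic ]]; then
           plugin=${parts[0]}
         else
           pool=${parts[0]}
         fi
         size=${parts[1]} ;;
      3) plugin=${parts[0]} pool=${parts[1]} size=${parts[2]} ;;
      *) echo "bad element: $item" >&2; return 1 ;;
    esac
    # No plugin means the reservation applies to all configured plugins.
    printf 'plugin=%s pool=%s size=%s\n' "${plugin:-all}" "${pool:-default}" "$size"
  done
}
```

For example, `bb_parse_resv_spec datawarp:pool_test:4G` prints `plugin=datawarp pool=pool_test size=4G`.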
Job Dependencies
If two jobs use burst buffers and one is dependent on the other (e.g.
"sbatch --dependency=afterok:123 ...") then the second job will not begin until
the first job completes and its burst buffer stage out completes.
If the second job does not use a burst buffer, but is dependent upon the first
job's completion, then it will not wait for the stage out operation of the first
job to complete.
The second job can be made to wait for the first job's stage out operation to
complete using the "afterburstbuffer" dependency option (e.g.
"sbatch --dependency=afterburstbuffer:123 ...").
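The choice between the two dependency types depends only on whether the second job has a burst buffer of its own. `bb_dependency` below is a hypothetical helper (not an sbatch feature) that encodes the rule described above and prints the matching `--dependency` option; the yes/no argument convention is an assumption of this sketch.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pick a --dependency option so that a follow-up job
# waits for the first job's burst buffer stage-out.
# Usage: bb_dependency FIRST_JOB_ID SECOND_USES_BB (yes|no)
bb_dependency() {
  local first=$1 uses_bb=$2
  if [[ ! $first =~ ^[0-9]+$ ]]; then
    echo "bad job id: $first" >&2
    return 1
  fi
  if [[ $uses_bb == yes ]]; then
    # Both jobs use burst buffers: the dependency already waits for stage-out.
    printf '%s\n' "--dependency=afterok:$first"
  else
    # Second job has no burst buffer: wait for stage-out explicitly.
    printf '%s\n' "--dependency=afterburstbuffer:$first"
  fi
}
```

For example, `first=$(sbatch --parsable stage_job.bash)` followed by `sbatch "$(bb_dependency "$first" no)" post_process.bash`; the script names are placeholders, and on multi-cluster systems `--parsable` may append ";cluster" to the job id, which this sketch would reject.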
Last modified 31 March 2020