How to Write a Model Definition File
A Model Definition file is written in YAML and defines everything that both DAFNI and other users need to know about a Model, e.g. the name of the Model or a description of what it is for. We'll cover some basics of this file format in the following examples, but there is plenty more to it that the formal reference covers in full.
If you've not used YAML before you might find it helpful to read through a Beginner's Guide to YAML. I'll link out to relevant sections of the YAML guide throughout this guide.
Document Root
First, we will define two top-level items in our definition file.
# example-model-definition.yml
kind: M
api_version: v1beta3
YAML Syntax
The syntax used for the kind and api_version fields defines a basic YAML mapping. You may find this guide useful in understanding the YAML syntax.
Firstly we have set the value of kind to M. This lets DAFNI know that this definition file defines a Model (there are definition files for other assets too). Next we define api_version, which tells DAFNI which version of the Model definition specification this definition conforms to. As DAFNI continues to develop and add new functionality, the Model definition specification will evolve and change. By specifying the version in the file, we can ensure that we always know how a particular definition file should be read. See the formal reference to see what versions are currently available.
Metadata
Next we will add a metadata section that allows you to define some important user-facing fields. The display_name and summary are two crucial fields for people discovering your Model. These are the values that you and other users will see in the Model Catalogue when browsing the Models on the platform. You should also add your contact details for the Model into the relevant contact_point fields (as shown in the example below). The description is an area that allows you to provide a far richer description of your Model and will be displayed when someone clicks to view the full entry for your Model in the Model Catalogue. The final field we need is the type field. This should be a one-word description of what type the Model will be. For instance, it could be forecasting, optimisation or testing; the following examples use model.
kind: M
api_version: v1beta3
metadata:
  display_name: Example Model
  name: example-model
  summary: A brief, one to two line summary of the Model.
  type: model
  publisher: DAFNI Example
  contact_point_name: DAFNI
  contact_point_email: info@dafni.ac.uk
  description: >
    A longer description that explains the purpose of the Model, its intended
    applications and other useful information such as assumptions that have been made
    when creating the Model and any potential impacts of these.

    The description can be written in paragraphs to provide clarity. Just leave a blank
    line in the description to start a new paragraph.
YAML Syntax
You will notice that the new fields we have added under metadata are indented. Whitespace is important to the meaning of YAML. You might also notice that you don't need to wrap the values in quotation marks to make them strings. We have also used a > to define a multiline string. Further information on YAML's syntax can be found here.
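To make the behaviour of > more concrete, the sketch below contrasts it with YAML's literal style (|), which this guide does not otherwise use. The key names folded_example and literal_example are made up for illustration and are not Model Definition fields.

```yaml
# Hypothetical keys, for illustration only (not Model Definition fields).
folded_example: >
  These two lines are joined
  into a single line, separated by a space.

  The blank line above is kept as a line break.
literal_example: |
  With the literal style,
  every line break is kept as written.
```

The folded style suits long descriptions, because you can wrap lines in the file without changing how the text is displayed.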
Spec
The last major part to add to the definition file is the spec part. This section of the definition contains the information required by DAFNI to be able to run the Model. It covers information such as what data the Model expects as inputs and what results the Model produces. Not only does this information allow DAFNI to run the Model, it also allows the Model to be linked with other Models in Workflows.
For the sake of brevity, I won't keep repeating the rest of the definition file in the following examples. Instead, it will be replaced with # rest of document #. Just remember that the rest of the information is required to form a valid Model Definition.
Inputs
The inputs section allows you to define what inputs your Model expects in order to run. DAFNI supports a range of input options that allow data to be passed to the Model in different ways.
Parameters
The Model Definition file allows you to define input environment variables using the parameters field. Each definition supports additional information, such as the data type that the value should be treated as.
# rest of document #
spec:
  inputs:
    parameters:
      - name: START_YEAR
        title: Start Year
        description: The year at which the Model execution should start.
        type: integer
        default: 2015
        min: 2010
        max: 2020
        required: true
      - name: END_YEAR
        title: End Year
        description: The year at which the Model execution should stop.
        type: integer
        default: 2025
        min: 2020
        max: 2030
        required: true
YAML Syntax
The above example uses YAML's syntax for defining a list of items as the value of parameters. Further information on YAML's syntax can be found here.
Because the parameters field is a list, you can add multiple definitions of input environment variables. There are other supported fields and types for defining input environment variables, so be sure to take a look at the formal Model Definition reference for more information.
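As a small illustration of the list syntax, YAML accepts the same list either in block style (one "- " item per line, as used for parameters) or in flow style (square brackets, as used for the command field in the larger example later in this guide). The key names below are placeholders rather than Model Definition fields.

```yaml
# Placeholder keys, for illustration only (not Model Definition fields).
block_style:
  - python
  - /src/main.py
flow_style: ["python", "/src/main.py"]  # the same two items as block_style
```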
Note: YAML expects boolean values to be written in lower case, that is, as true or false. This definition is described in section 10.2.1.2 of the YAML docs.
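A sketch of a boolean parameter written this way follows. It assumes boolean is a supported parameter type, as in the larger example later in this guide; the description wording is illustrative.

```yaml
# rest of document #
spec:
  inputs:
    parameters:
      - name: USE_CONDITION
        title: Use special condition
        description: Whether to apply a special condition.
        type: boolean
        default: false   # lower case, not False or FALSE
        required: true   # lower case, not True or TRUE
```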
Datasets
Another input field that can be specified is a Dataslot, or a number of Dataslots, that can be filled with a Dataset or multiple Datasets from the National Infrastructure Database (NID). Dataslots are specified using the dataslots field. Dataslots are filled with Datasets when the Model is run in a Workflow. This enables users to update the data being inserted into the Dataslot at run time. To help users of the Model choose the right kind of Datasets to insert into a Dataslot, a name and description should be provided for each of the slots. You must also provide the path that the Model expects the Datasets to be made available at. The required field dictates whether the Dataslot must be filled with a Dataset or whether this slot can be left empty. Finally, the default field is used to specify default Datasets to use in this slot. A default must be specified if required is true.
To add a default Dataset to a Dataslot, you need to know the unique ID of the Dataset, and the version of that particular Dataset you wish to use. The uid and the versionId of the Dataset should be set to their respective unique IDs. These identifiers take the form of a "universally unique identifier" (UUID), for example 09f4e250-bfbf-4b2f-9aed-0f18444f605e. You can find both of these on the details page for any Dataset, listed in the access panel shown in the image below.
You can click the copy buttons next to the UUIDs to copy them individually or alternatively you can click the "Copy YAML for Model Definition" button to copy the full YAML needed to put in the datasets list:
- aaab2e9e-5f85-4401-8cbf-7f9eecec94e9
You would then need to replace the path with the location where you would like the Dataset to be loaded.
# rest of document #
spec:
  inputs:
    parameters:
      # environment variables would be here #
    dataslots:
      - name: Geospatial Data
        description: >
          Description of what this Geospatial Data should contain.
        default:
          - 4d5e424a-e177-11ea-845a-9f0b1c85544d
          - 4d5e424a-e177-11ea-845a-9f0b1c85544d
        path: inputs/geospatial-data
        required: true
n.b. The path the Datasets in a Dataslot are to be included at must always be a child directory of inputs/, e.g. inputs/my-dataset-directory.
As with parameters, dataslots takes a list as an argument, so multiple Dataslots can be specified for a Model and each of these slots can take multiple Datasets in the default field.
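As an illustration, a Model with two Dataslots might be laid out as follows. The second slot, its name and its path are hypothetical, and the angle-bracket placeholders stand for real identifiers copied from the Dataset details page. The second slot is assumed to be optional, so it is given no default.

```yaml
# rest of document #
spec:
  inputs:
    dataslots:
      - name: Geospatial Data
        description: >
          Description of what this Geospatial Data should contain.
        default:
          - <dataset UUID 1>
          - <dataset UUID 2>
        path: inputs/geospatial-data
        required: true
      - name: Population Data         # hypothetical second slot
        description: >
          Description of what this Population Data should contain.
        path: inputs/population-data  # also a child directory of inputs/
        required: false               # optional slot, so no default is given
```

Giving each slot its own directory under inputs/ keeps the Datasets for different slots separate when the Model runs.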
Complete Example
Putting the pieces from the examples together, we end up with a definition file looking like the following.
kind: M
api_version: v1beta3
metadata:
  display_name: Example Model
  name: example-model
  publisher: DAFNI Example
  contact_point_name: DAFNI
  contact_point_email: info@dafni.ac.uk
  type: model
  summary: A brief, one to two line summary of the Model.
  description: >
    A longer description that explains the purpose of the Model, its intended
    applications and other useful information such as assumptions that have been made
    when creating the Model and any potential impacts of these.

    The description can be written in paragraphs to provide clarity. Just leave a blank
    line in the description to start a new paragraph.
spec:
  inputs:
    parameters:
      - name: START_YEAR
        title: Start Year
        description: The year at which the Model execution should start.
        type: integer
        default: 2015
        min: 2010
        max: 2020
        required: true
      - name: END_YEAR
        title: End Year
        description: The year at which the Model execution should stop.
        type: integer
        default: 2025
        min: 2020
        max: 2030
        required: true
    dataslots:
      - name: Geospatial Data
        description: >
          Description of what this Geospatial Data should contain.
        default:
          - 4d5e424a-e177-11ea-845a-9f0b1c85544d
          - 4d5e424a-e177-11ea-845a-9f0b1c85544d
        path: inputs/geospatial-data
        required: true
Here is an example of a larger, more complex definition file with additional optional fields. You can find more information for these fields in the formal reference.
You can also take a look at other example models in our Example Models Repository.
kind: M
api_version: v1beta3
metadata:
  display_name: Example Model
  name: example-model
  type: model
  publisher: DAFNI Example
  contact_point_name: DAFNI
  contact_point_email: info@dafni.ac.uk
  summary: A brief, one to two line summary of the Model.
  description: >
    A longer description that explains the purpose of the Model, its intended
    applications and other useful information such as assumptions that have been made
    when creating the Model and any potential impacts of these.

    The description can be written in paragraphs to provide clarity. Just leave a blank
    line in the description to start a new paragraph.
  source_code: https://github.com/example/source-code-repo
  licence: https://creativecommons.org/licenses/by/4.0/
  rights: open
  subject: Farming
  project_name: Example Project
  project_url: https://www.example.com
  funding: Funded by example project
  embargo_end_date: '2025-01-25'
spec:
  command: ["python", "/src/main.py"]
  inputs:
    parameters:
      - name: START_YEAR
        title: Start Year
        description: The year at which the Model execution should start.
        type: integer
        default: 2015
        min: 2010
        max: 2020
        required: true
      - name: END_YEAR
        title: End Year
        description: The year at which the Model execution should stop.
        type: integer
        default: 2025
        min: 2020
        max: 2030
        required: true
      - name: START_TIME
        title: Start Time of the sequence
        type: string
        default: None
        description: Start of sequence
        required: true
      - name: USE_CONDITION
        title: Use special condition
        type: boolean
        default: false
        description: Boolean for using a special condition
        required: true
      - name: TYPE
        title: Type
        default: None
        options:
          - name: red
            title: Red
          - name: amber
            title: Amber
          - name: green
            title: Green
        description: Which type to use for the sequence
        required: true
    dataslots:
      - name: Geospatial Data
        description: >
          Description of what this Geospatial Data should contain.
        default:
          - 4d5e424a-e177-11ea-845a-9f0b1c85544d
          - 4d5e424a-e177-11ea-845a-9f0b1c85544d
        path: inputs/geospatial-data
        required: true
  outputs:
    datasets:
      - name: output_1.json
        type: json
        description: A JSON file outputted from the Model.
      - name: output_2.csv
        type: csv
        description: A CSV file outputted from the Model.
  resources:
    use_gpu: true
  readiness_probe:
    host: localhost
    scheme: http
    path: /
    port: 8080
  sidecars:
    - name: example-sidecar
      image: sidecar-image
      command: ["python", "/src/main.py"]
Template
Below is a template for writing a Model definition file. This template provides a structured format to help you create a comprehensive definition file for your Model. Fill in the required fields and adjust the optional fields as necessary to suit your requirements. For detailed information on specific fields, refer to the Model Definition Reference.
kind: M # required
api_version: v1beta3 # required
metadata:
  display_name: <model display name> # required
  name: <model name> # required
  publisher: <publisher name> # required
  summary: <model summary> # required
  description: > # required - multi-line string (use '>' for multi-line)
    <model description>
  source_code: <link to source code> # optional
  contact_point_name: <contact point name> # required
  contact_point_email: <contact point email> # required
  licence: <url of applicable licence> # optional
  rights: <details of usage rights> # optional
  subject: <subject> # optional - options from same list used for workflows/datasets
  project_name: <project name> # optional - project name and url both required if one is provided
  project_url: <url of associated project> # optional - project name and url both required if one is provided
  funding: <project funding details> # optional
  embargo_end_date: <date embargo is lifted> # optional
spec:
  command: [<command>] # optional
  inputs: # optional
    parameters: # optional
      - name: <parameter name 1> # required
        title: <parameter title 1> # required
        description: <parameter description 1> # optional
        type: <parameter type 1> # required
        default: <parameter default 1> # optional - only needed if 'required: true'
        required: <true or false> # required
        min: <parameter min 1> # optional
        max: <parameter max 1> # optional
      #- ... more parameters as needed
      - name: <parameter name 2> # required
        title: <parameter title 2> # required
        default: <parameter default 2> # optional - only needed if 'required: true'
        options: # optional - for parameter with multiple "options" - only supports strings/ints/floats
          - name: <name> # required - value of this parameter option
            title: <title> # required - name displayed in drop-down box when selecting parameter value
          #- ... add more options as needed
        description: <parameter description 2> # optional
        required: <true or false> # required
    dataslots: # optional
      - name: <dataslot name 1> # required
        description: <dataslot description 1> # optional
        default:
          - <default UID 1> # optional - only needed if 'required: true'
          #- ... add more as needed
        path: <data path 1> # required
        required: <true or false> # required
      #- ... add more data slots as needed
  outputs: # optional
    datasets:
      - name: <output file name 1> # required
        type: <csv or json> # required
        description: <output description 1> # optional
      #- ... add more output datasets as needed
  resources: # optional
    use_gpu: <true or false> # optional
  readiness_probe: # optional
    host: <readiness host> # optional
    scheme: <readiness scheme> # optional
    path: <readiness path> # optional
    port: <readiness port> # optional
  sidecars: # optional
    - name: <sidecar name>
      image: <sidecar image>
      command: [<sidecar command>]