How to Write a Model Definition File
A Model Definition file is written in YAML and defines everything that both DAFNI and other users need to know about a Model, such as its name or a description of what it is for. The following examples cover the basics of this file format. The formal reference describes it in full.
If you've not used YAML before, you might find it helpful to read through a Beginner's Guide to YAML. I'll link out to relevant sections of the YAML guide throughout this guide.
Document Root
First, we will define two top-level items in our definition file.
```yaml
# example-model-definition.yml
kind: M
api_version: v1beta3
```
YAML Syntax
The syntax used for the `kind` and `api_version` fields defines a basic YAML mapping. You may find the Beginner's Guide to YAML useful for understanding this syntax.
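A mapping is a set of key/value pairs, much like a Python dictionary. As a rough illustration, the hypothetical helper below turns flat `key: value` lines into a `dict`. It is deliberately minimal (no nesting, lists, quoting or multiline strings) and is not a substitute for a real YAML parser.

```python
def parse_flat_mapping(text: str) -> dict[str, str]:
    """Parse flat 'key: value' lines into a dict (illustration only, not full YAML)."""
    mapping: dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines such as the filename comment.
        if not stripped or stripped.startswith("#"):
            continue
        # Split on the first colon only: the key is before it, the value after it.
        key, sep, value = stripped.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        mapping[key.strip()] = value.strip()
    return mapping


document = """\
# example-model-definition.yml
kind: M
api_version: v1beta3
"""

print(parse_flat_mapping(document))  # {'kind': 'M', 'api_version': 'v1beta3'}
```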
First, we have set the value of `kind` to `M`. This lets DAFNI know that this definition file defines a Model (there are definition files for other assets too). Next, we define `api_version`, which tells DAFNI which version of the Model definition specification this definition conforms to. As DAFNI continues to develop and add new functionality, the Model definition specification will evolve and change. By specifying the version in the file, we can ensure that we always know how a particular definition file should be read. See the formal reference for the versions that are currently available.
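The role of these two fields can be pictured as a simple validation step. The sketch below is a hypothetical helper, not part of DAFNI: it assumes the definition has already been loaded into a Python dictionary, and its set of supported versions contains only the one used in this guide (check the formal reference for the real list).

```python
def check_document_root(doc: dict[str, object], supported_versions: set[str]) -> list[str]:
    """Return problems with the top-level kind/api_version fields (empty list if none)."""
    problems: list[str] = []
    # kind must be M for a Model definition file.
    if doc.get("kind") != "M":
        problems.append(f"expected kind 'M' for a Model, got {doc.get('kind')!r}")
    # api_version says which version of the specification the file follows.
    version = doc.get("api_version")
    if version not in supported_versions:
        problems.append(f"unsupported api_version: {version!r}")
    return problems


# Hypothetical: only the version used in this guide.
SUPPORTED_VERSIONS = {"v1beta3"}

print(check_document_root({"kind": "M", "api_version": "v1beta3"}, SUPPORTED_VERSIONS))  # []
print(check_document_root({"kind": "M", "api_version": "v0"}, SUPPORTED_VERSIONS))
```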
Metadata
Next, we will add a `metadata` section that allows you to define some important user-facing fields. The `display_name` and `summary` are two crucial fields for people discovering your Model: these are the values that you and other users will see in the Model Catalogue when browsing the Models on the platform. You should also add your contact details for the Model in the relevant `contact_point` fields (as shown in the example below). The `description` lets you provide a far richer description of your Model and is displayed when someone clicks to view the full entry for your Model in the Model Catalogue. The final field we need is `type`. This should be a one-word description of what type of Model this is; for instance, it could be forecasting, optimisation or testing. The following examples use `model`.
```yaml
kind: M
api_version: v1beta3
metadata:
  display_name: Example Model
  name: example-model
  summary: A brief, one to two line summary of the Model.
  type: model
  publisher: DAFNI Example
  contact_point_name: DAFNI
  contact_point_email: info@dafni.ac.uk
  description: >
    A longer description that explains the purpose of the Model, its intended
    applications and other useful information such as assumptions that have been made
    when creating the Model and any potential impacts of these.

    The description can be written in paragraphs to provide clarity. Just leave a blank
    line in the description to start a new paragraph.
```
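Before uploading, it can be handy to sanity-check the metadata against the guidance above. The sketch below is a hypothetical helper (DAFNI performs its own validation and may require different fields): it assumes the `metadata` section has been loaded into a Python dictionary, reports any of the fields this guide asks for that are missing, and flags a `type` that is not a single word.

```python
# Fields this guide asks you to fill in (not necessarily DAFNI's exact required set).
RECOMMENDED_FIELDS = (
    "display_name",
    "summary",
    "description",
    "type",
    "contact_point_name",
    "contact_point_email",
)


def check_metadata(metadata: dict[str, str]) -> list[str]:
    """Return a list of problems found in a metadata mapping (empty if none)."""
    problems = [
        f"missing field: {name}" for name in RECOMMENDED_FIELDS if not metadata.get(name)
    ]
    # The guide asks for a one-word type such as 'forecasting' or 'model'.
    model_type = metadata.get("type", "")
    if model_type and len(model_type.split()) != 1:
        problems.append(f"type should be one word, got {model_type!r}")
    return problems


metadata = {
    "display_name": "Example Model",
    "name": "example-model",
    "summary": "A brief, one to two line summary of the Model.",
    "type": "model",
    "publisher": "DAFNI Example",
    "contact_point_name": "DAFNI",
    "contact_point_email": "info@dafni.ac.uk",
    "description": "A longer description that explains the purpose of the Model.",
}

print(check_metadata(metadata))  # []
```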
YAML Syntax
You will notice that the new fields we have added under `metadata` are indented. Whitespace is important to the meaning of YAML. You might also notice that you don't need to wrap the values in quotation marks to make them strings. We have also used `>` to define a multiline string (a folded block scalar, in which single line breaks become spaces and a blank line becomes a line break). Further information on YAML's syntax can be found in the Beginner's Guide to YAML.
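To make the behaviour of `>` concrete, the sketch below is a simplified model of YAML folding with the default "clip" chomping: lines within a paragraph are joined with spaces, each blank line becomes a line break, and one trailing newline is kept. It assumes plain, equally indented, non-empty content; real YAML treats more-indented lines differently, so use a YAML library for actual files.

```python
def fold(lines: list[str]) -> str:
    """Simplified YAML '>' folding for plain, equally indented lines."""
    paragraphs: list[list[str]] = [[]]
    for line in lines:
        text = line.strip()
        if text:
            paragraphs[-1].append(text)
        else:
            # A blank line closes the current paragraph (it becomes a line break).
            paragraphs.append([])
    # Clip chomping: drop trailing blank lines, then keep a single final newline.
    while len(paragraphs) > 1 and not paragraphs[-1]:
        paragraphs.pop()
    return "\n".join(" ".join(words) for words in paragraphs) + "\n"


description = [
    "A longer description that explains the purpose of the Model, its intended",
    "applications and other useful information.",
    "",
    "The description can be written in paragraphs to provide clarity.",
]

print(fold(description))
```

The printed text has two lines: the first two input lines are joined into one, and the blank line starts the second paragraph.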
Spec
The last major part to add to the definition file is the `spec` section. This section of the definition contains the information DAFNI needs to run the Model, such as what data the Model expects as inputs and what results it produces. Not only does this information allow DAFNI to run the Model, it also allows the Model to be linked with other Models in Workflows.
For the sake of brevity, I won't keep repeating the rest of the definition file in the following examples; instead, it will be replaced with `# rest of document #`. Just remember that the rest of the information is required to form a valid Model Definition.
Inputs
The inputs section allows you to define what inputs your Model expects in order to run. DAFNI supports a range of input options that allow data to be passed to the Model in different ways.
Parameters
The Model Definition file allows you to define input environment variables using the `parameters` field. Each of these definitions supports a range of additional information, such as the data type the value should be treated as, a default value and the permitted minimum and maximum.
```yaml
# rest of document #
spec:
  inputs:
    parameters:
      - name: START_YEAR
        title: Start Year
        description: The year at which the Model execution should start.
        type: integer
        default: 2015
        min: 2010
        max: 2020
        required: true
      - name: END_YEAR
        title: End Year
        description: The year at which the Model execution should stop.
        type: integer
        default: 2025
        min: 2020
        max: 2030
        required: true
```
YAML Syntax
The above example uses YAML's syntax for defining a list of items as the value of `parameters`: each item starts with a hyphen. Further information on YAML's syntax can be found in the Beginner's Guide to YAML.
Because the `parameters` field is a list, you can add multiple definitions of input environment variables. There are other supported fields and types for defining input environment variables, so be sure to take a look at the formal Model Definition reference for more information.
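The `min`, `max` and `default` fields are related: a sensible default should fall within the permitted range. The sketch below models a parameter entry with a hypothetical Python dataclass that mirrors only the fields used in the example above (not the full specification) and checks each default, assuming the bounds are inclusive.

```python
from dataclasses import dataclass


@dataclass
class Parameter:
    """Hypothetical, simplified view of one entry in spec.inputs.parameters."""
    name: str
    type: str
    default: int
    min: int
    max: int
    required: bool = False


def default_in_range(param: Parameter) -> bool:
    """True if the default lies within [min, max] (inclusive bounds assumed)."""
    return param.min <= param.default <= param.max


parameters = [
    Parameter("START_YEAR", "integer", default=2015, min=2010, max=2020, required=True),
    Parameter("END_YEAR", "integer", default=2025, min=2020, max=2030, required=True),
]

for p in parameters:
    print(p.name, default_in_range(p))
```

Both example parameters pass: 2015 lies in 2010 to 2020 and 2025 lies in 2020 to 2030.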
Note: YAML expects boolean values to be written in lower case, that is, as `true` or `false`. This is described in section 10.2.1.2 of the YAML specification.
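Section 10.2.1.2 belongs to the YAML 1.2 JSON schema, under which a boolean must match exactly `true` or `false`. The sketch below mimics that rule with Python's standard `re` module; other schemas, such as the YAML 1.2 Core schema, are more lenient (accepting `True` and `TRUE`, for example), so lower case is the safe choice. The `resolve_bool` helper is purely illustrative.

```python
import re
from typing import Optional

# YAML 1.2 JSON schema: booleans match exactly 'true' or 'false'.
JSON_SCHEMA_BOOL = re.compile(r"true|false")


def resolve_bool(scalar: str) -> Optional[bool]:
    """Return the boolean for a plain scalar, or None if it is not a JSON-schema boolean."""
    if JSON_SCHEMA_BOOL.fullmatch(scalar):
        return scalar == "true"
    return None


for value in ("true", "false", "True", "TRUE"):
    print(value, resolve_bool(value))
```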
Datasets
Another input that can be specified is a Dataslot, or a number of Dataslots, which can be filled with one or more Datasets from the National Infrastructure Database (NID). Dataslots are specified using the `dataslots` field. Dataslots are filled with Datasets when the Model is run in a Workflow, which enables users to change the data inserted into the Dataslot at run time. To help users of the Model choose the right kind of Datasets to insert into a Dataslot, a `name` and `description` should be provided for each slot. You must also provide the `path` at which the Model expects the Datasets to be made available. The `required` field dictates whether the Dataslot must be filled with a Dataset or can be left empty. Finally, the `default` field specifies the default Datasets to use in this slot. A `default` must be specified if `required` is `true`.
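The rule that a required Dataslot needs a default can be expressed as a small check. The dataclass and function below are hypothetical and cover only the fields discussed here; DAFNI's own validation may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Dataslot:
    """Hypothetical, simplified view of one entry in spec.inputs.dataslots."""
    name: str
    path: str
    required: bool = False
    default: list[str] = field(default_factory=list)


def check_dataslot(slot: Dataslot) -> list[str]:
    """Return problems with a Dataslot entry (empty list if none)."""
    problems: list[str] = []
    # A required slot must come with at least one default Dataset.
    if slot.required and not slot.default:
        problems.append(f"{slot.name}: a default must be given when required is true")
    return problems


geospatial = Dataslot(
    "Geospatial Data",
    "inputs/geospatial-data",
    required=True,
    default=["4d5e424a-e177-11ea-845a-9f0b1c85544d"],
)
print(check_dataslot(geospatial))  # []
```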
To add a default Dataset to a Dataslot, you need to know the unique ID of the Dataset and the version of that Dataset you wish to use. The `uid` and the `versionId` of the Dataset should be set to their respective unique IDs; these identifiers take the form of a universally unique identifier (UUID), for example `09f4e250-bfbf-4b2f-9aed-0f18444f605e`. You can find both of these on the details page for any Dataset, in the access panel shown in the image below.
You can click the copy buttons next to the UUIDs to copy them individually, or you can click the "Copy YAML for Model Definition" button to copy the full YAML entry to put in the list of default Datasets:
```yaml
- aaab2e9e-5f85-4401-8cbf-7f9eecec94e9
```
You then need to set the `path` to where you would like the Dataset to be loaded.
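If you paste IDs by hand rather than using the copy buttons, a quick format check can catch copy errors. This sketch uses Python's standard `uuid` module and additionally insists on the hyphenated 8-4-4-4-12 form shown above (case-insensitive), since `uuid.UUID` on its own also accepts braces, a `urn:uuid:` prefix or no hyphens. Whether DAFNI accepts those other spellings is not covered here; `is_uuid` is an illustrative helper.

```python
import uuid


def is_uuid(value: str) -> bool:
    """True if value is a UUID written in the hyphenated 8-4-4-4-12 form."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # str(UUID) is always lower-case and hyphenated, so compare against that form.
    return str(parsed) == value.lower()


for candidate in (
    "09f4e250-bfbf-4b2f-9aed-0f18444f605e",
    "aaab2e9e-5f85-4401-8cbf-7f9eecec94e9",
    "09f4e250bfbf4b2f9aed0f18444f605e",
    "not-a-uuid",
):
    print(candidate, is_uuid(candidate))
```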
```yaml
# rest of document #
spec:
  inputs:
    parameters:
      # environment variables would be here #
    dataslots:
      - name: Geospatial Data
        description: >
          Description of what this Geospatial Data should contain.
        default:
          - 4d5e424a-e177-11ea-845a-9f0b1c85544d
          - 4d5e424a-e177-11ea-845a-9f0b1c85544d
        path: inputs/geospatial-data
        required: true
```
n.b. The path at which the Datasets in a Dataslot are made available must always be a child directory of `inputs/`, e.g. `inputs/my-dataset-directory`.
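The path rule can be checked with Python's standard `pathlib.PurePosixPath`, which splits a path into components without touching the file system. This is a hypothetical sketch: it assumes relative paths, rejects `..` components, and treats nested directories such as `inputs/a/b` as allowed, which the rule above does not explicitly confirm.

```python
from pathlib import PurePosixPath


def is_valid_dataslot_path(path: str) -> bool:
    """True if path is a relative directory below inputs/ (nesting assumed allowed)."""
    parts = PurePosixPath(path).parts
    # Must start with 'inputs', name at least one sub-directory, and never climb out via '..'.
    return len(parts) >= 2 and parts[0] == "inputs" and ".." not in parts


for candidate in ("inputs/geospatial-data", "inputs/", "outputs/data", "inputs/../etc"):
    print(candidate, is_valid_dataslot_path(candidate))
```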
As with `parameters`, `dataslots` takes a list as its value, so multiple Dataslots can be specified for a Model, and each of these slots can take multiple Datasets in the `default` field.
Complete Example
Putting the pieces from the examples together, we end up with a definition file looking like the following.
```yaml
kind: M
api_version: v1beta3
metadata:
  display_name: Example Model
  name: example-model
  publisher: DAFNI Example
  contact_point_name: DAFNI
  contact_point_email: info@dafni.ac.uk
  type: model
  summary: A brief, one to two line summary of the Model.
  description: >
    A longer description that explains the purpose of the Model, its intended
    applications and other useful information such as assumptions that have been made
    when creating the Model and any potential impacts of these.

    The description can be written in paragraphs to provide clarity. Just leave a blank
    line in the description to start a new paragraph.
spec:
  inputs:
    parameters:
      - name: START_YEAR
        title: Start Year
        description: The year at which the Model execution should start.
        type: integer
        default: 2015
        min: 2010
        max: 2020
        required: true
      - name: END_YEAR
        title: End Year
        description: The year at which the Model execution should stop.
        type: integer
        default: 2025
        min: 2020
        max: 2030
        required: true
    dataslots:
      - name: Geospatial Data
        description: >
          Description of what this Geospatial Data should contain.
        default:
          - 4d5e424a-e177-11ea-845a-9f0b1c85544d
          - 4d5e424a-e177-11ea-845a-9f0b1c85544d
        path: inputs/geospatial-data
        required: true
```