Configuration¶

The default configuration language for NYMMS is written in YAML. For the most part it follows the YAML standard. It has one main addition, the !include macro.

!include can be used to include another file in a given file. This is useful when you have a main config file (say nodes.yaml) but want to allow external programs to provide more config (say in /etc/nymms/nodes/*.yaml).

In that specific example you’d put the following in the yaml file where you want the files included:

!include /etc/nymms/nodes/*.yaml

config.yaml¶

The config.yaml file is the main configuration for all of the daemons and scripts in NYMMS.

You can see an example by expanding the code block below.

Example config.yaml

monitor_timeout

This represents the default amount of time, in seconds, each monitor is given before it times out. Type: Integer. Default: 30

resources

This points to the filesystem location of the resources config (see resources.yaml). Type: String, file location. Default: /etc/nymms/resources.yaml

region

The AWS region used by the various daemons. Type: String, AWS Region. Default: us-east-1

state_domain

The SDB domain used for storing state. Type: String. Default: nymms_state

tasks_queue

The name of the SQS queue used for distributing tasks. Type: String. Default: nymms_tasks

results_topic

The name of the SNS topic where results are sent. Type: String. Default: nymms_results

private_context_file

The location of the private context file (see private.yaml). Type: String, file location. Default: /etc/nymms/private.yaml

task_expiration

If a task is found by a probe, and it is older than this time in seconds, then the probe will throw it away. Type: Integer. Default: 600

probe

This is a dictionary where probe specific configuration goes. Type: Dictionary.

max_retries: The maximum amount of times the probe will retry a monitor that is in a non-OK state. Type: Integer. Default: 2
queue_wait_time: The amount of time the probe will wait for a task to appear in the tasks_queue. AWS SQS only allows this to be a maximum of 20 seconds. In most cases, the default should be fine. Type: Integer. Default: 20
retry_delay: The amount of time in seconds that a probe will delay retries on non-OK, non-HARD monitors. This allows you to quickly retry monitors that are supposed to be failing, to verify that there is an actual issue. Type: Integer. Default: 30

reactor

This is a dictionary where reactor specific configuration goes. Type: Dictionary

handler_config_path: The directory where Reactor Handlers specific configurations are found. Type: String. Default: /etc/nymms/handlers
queue_name: The name of the SQS queue where reactions will be found. Type: String. Default: reactor_queue
queue_wait_time: The amount of time the probe will wait for a result to appear in the queue named in reactor.queue_name. AWS SQS only allows this to be a maximum of 20 seconds. In most cases, the default should be fine. Type: Integer. Default: 20
visibility_timeout: The amount of time (in seconds) that a message will disappear from the SQS reactor queue (defined in reactor.queue_name above) when it is picked up by a reactor. If the reactor doesn’t finish it’s work and delete the message within this amount of time, the message will re-appear in the queue. This allows the reactions to survive reactor crashes and the like. Type: Integer. Default: 30

scheduler

This is a dictionary where reactor specific configuration goes. Type: Dictionary

interval

How often, in seconds, the scheduler will schedule tasks. Type: Integer. Default: 300

backend

The dot-separated class path to use for the backend. The backend is what is used to find nodes that need to be monitored. Type: String. Default: nymms.scheduler.backends.yaml_backend.YamlBackend

backend_args

Any configuration args that the scheduler.backend above needs. Type: Dictionary

path: This is used by the YamlBackend, which is the default. This gives the name of the yaml file with node definitions that the YamlBackend uses. Type: String. Default: /etc/nymms/nodes.yaml

lock_backend

The backend used for locking multiple schedulers. Currently only SDB is available. Type: String. Default: SDB

lock_args

Any configuration args that the scheduler.lock_backend needs. Type: Dictionary.

duration: How long, in seconds, the scheduler will keep the lock for. Type: Integer. Default: 360
domain_name: The SDB domain name where locks are stored. Type: String. Default: nymms_locks
lock_name: The name of the lock. Type: String. Default: scheduler_lock

suppress

These are the config settings used by the suppression system. Type: Dictionary.

domain

The SDB domain where suppressions will be stored. Type: String. Default: nymms_suppress

cache_timeout

The amount of time, in seconds, to keep suppressions cached. Type: Integer. Default: 60

resources.yaml¶

The resources.yaml file is where you define your commands, monitors and monitoring groups.

commands: Commands are where you define the commands that will be used for monitoring services. The main config for each command is the command_string, which is a templatized string that defines the command line to a command line executable.
monitors: Monitors are specific instances of commands, allowing you to fill in templated variables in the command used. This allows your commands to be fairly generic and easily re-usable.
monitoring groups: Monitoring groups are used to tie monitors to individual nodes. It also lets you add some monitoring group specific variables that can be used in commands templates and other places.

Example resources.yaml

Config Options¶

commands

A dictionary of commands, the key of each is a unique name for the command, and the value is another dictionary with the commands configuration. Other than the command_string config option, you can specify any others you like - they will be accessible in the template of the command_string itself. Type: Dictionary.

command_string: A command line string using Jinja’s variable syntax. (ie: {{variable}}). Type: String.
other configs: You can specify as many other key/value entries as you like. They will be useable as variables in the command_string itself. Often times the values set here will be used as defaults for the command, provided the variable isn’t set anywhere else (such as on the monitor, or the node).

monitors

A dictionary of monitors, each of which calls a command defined above. The key of each entry is the name of the monitor, the value is another dictionary which contains configuration values for that monitor. Type: Dictionary

command: The name of a command defined in the resources file. This is the command that will be called for this monitor. Type: String.
monitoring_groups: A list of monitoring groups that this monitor is a part of. This is how you tie monitors to nodes - every monitor that is attached to a monitoring_group will be ran against every node that is attached to that monitoring_group.
other configs: You can specify as many other key/value entries as you like for each monitor. They will be useable as variables in the template strings used in the command for this monitor.

monitoring_groups

A dictionary of monitoring groups which tie together monitors and nodes. The keys of the dictionary are the monitoring_groups names, while the values are any extra config you want to put into the command context. Often times the values will be blank (see the example).

private.yaml¶

The private.yaml file is used to give context variables that can be used in various monitors, but which are not included when the tasks and results are sent over the wire. Largely these are used for things like passwords that are needed by monitors.

The variables that are provided by private.yaml need to be prepended by __private. when referring to them in templates. For example, if you have a private variable called db_password you would refer to it as __private.db_password in templates.

The contents of the private.yaml are simple key/value pairs.

Example private.yaml

nodes.yaml¶

The nodes.yaml file is the file used by default by the YamlBackend, which is used by the scheduler to figure out what nodes (instances, hosts, etc) need to be monitored. It’s a dictionary of node entries - each entry’s key is the name of the node. The value of each entry is a dictionary with the following options:

Example nodes.yaml

address: The network address of the node. This can be an ip address, or a hostname. If no address is provided, then it is assumed that the name of the node entry is the address. Type: String. Default: The node entry name.
monitoring_groups: A list of monitoring groups (as defined in resources.yaml) that this node is part of. Every monitor that is attached to a monitoring group will be applied to every node in the monitoring group. Type: List.
realm: The realm this node is a part of. See the realms documentation.