Chef Push Jobs

Chef Push Jobs is an extension of the Chef Infra Server that allows jobs to be run against nodes independently of a Chef Infra Client run. A job is an action or a command to be executed against a subset of nodes; the nodes against which a job is run are determined by the results of a search query made to the Chef Infra Server.

Chef Push Jobs uses the Chef Infra Server API and a Ruby client to initiate all connections to the Chef Infra Server. Connections use the same authentication and authorization model as any other request made to the Chef Infra Server. A knife plugin is used to initiate job creation and job tracking.

Install Push Jobs using the push-jobs cookbook and a Chef Infra Client run on each of the target nodes.

Requirements

Chef Push Jobs has the following requirements:

  • An on-premises Chef Infra Server. Hosted Chef does not support Chef Push Jobs.
  • The Chef Push Jobs client can be configured using a push-jobs cookbook, but Chef Infra Client must also be present on the node. Only Chef Infra Client can use a cookbook to configure a node.
  • TCP ports 10000, 10002, and 10003. By default, 10000 is the heartbeat port, 10002 is the command port, and 10003 is the API port. These may be changed in the Chef Push Jobs configuration file. The command port allows Chef Push Jobs clients to communicate with the Chef Push Jobs server, and also allows Chef Infra Server components to communicate with the Chef Push Jobs server. In a configuration with both front ends and back ends, the command port only needs to be open on the back-end servers. The Chef Push Jobs server waits for connections from the Chef Push Jobs client and never initiates a connection to a Chef Push Jobs client. If the Chef Infra Server has a public address that is not assigned locally (for example, in a cloud deployment or behind NAT), add the API port to the network security configuration so that the Chef Infra Server can connect to itself on the public IP address, if that is what the Chef Infra Server hostname points to.
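
As a rough aid for verifying the port requirement above, the following sketch checks whether the three default ports accept TCP connections from a node. The helper names (`port_open?`, `check_push_jobs_ports`) and the hostname in the commented call are hypothetical and not part of Chef Push Jobs. Adjust the port numbers if they were changed in the configuration file.

```ruby
require 'socket'

# Default Chef Push Jobs ports, as listed in the requirements above.
PUSH_JOBS_PORTS = {
  heartbeat: 10000,
  command: 10002,
  api: 10003,
}.freeze

# Returns true if a TCP connection to host:port succeeds within the timeout.
# Illustrative helper only; it is not part of Chef Push Jobs.
def port_open?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Checks every default port on the given host and returns { name => true/false }.
def check_push_jobs_ports(host)
  PUSH_JOBS_PORTS.transform_values { |port| port_open?(host, port) }
end

# Example call (hypothetical hostname):
# check_push_jobs_ports('chef.example.com')
```

A `false` result only shows that the connection attempt failed from the machine running the check. It does not distinguish a closed firewall port from a stopped service.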

Components

Chef Push Jobs has three main components: jobs (managed by the Chef Push Jobs server), a client that is installed on every node in the organization, and one (or more) workstations from which job messages are initiated.

All communication between these components is done with the following:

  • A heartbeat message between the Chef Push Jobs server and each managed node
  • A knife plugin named knife push jobs with four subcommands: job list, job start, job status, and node status
  • Various job messages sent from a workstation to the Chef Push Jobs server
  • A single job message that is sent (per job) from the Chef Push Jobs server to one (or more) nodes that are being managed by the Chef server

The following diagram shows the various components of Chef Push Jobs:

[Diagram: Chef Push Jobs components]

Jobs

The Chef Push Jobs server is used to send job messages to one (or more) managed nodes and also to manage the list of jobs that are available to be run against nodes.

A heartbeat message is used to let all of the nodes in an organization know that the Chef Push Jobs server is available. The Chef Push Jobs server listens for heartbeat messages from each Chef Push Jobs client. If there is no heartbeat from a Chef Push Jobs client, the Chef Push Jobs server will mark that node as unavailable for job messages until the heartbeat resumes.
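
The availability rule described above can be illustrated with a small sketch. It is not the Chef Push Jobs implementation: the class name `HeartbeatTracker` and the `max_silence` threshold are assumptions made for the example, since the text does not say how long the server waits before marking a node unavailable.

```ruby
# Tracks the last heartbeat seen from each node and reports availability.
# HeartbeatTracker and max_silence are hypothetical names for illustration.
class HeartbeatTracker
  def initialize(max_silence:)
    @max_silence = max_silence # seconds without a heartbeat before a node is unavailable
    @last_seen = {}
  end

  # Record a heartbeat from a node at the given time.
  def heartbeat(node_name, at: Time.now)
    @last_seen[node_name] = at
  end

  # A node is available only if a heartbeat arrived recently enough.
  def available?(node_name, now: Time.now)
    seen = @last_seen[node_name]
    !seen.nil? && (now - seen) <= @max_silence
  end
end

tracker = HeartbeatTracker.new(max_silence: 5)
start = Time.at(0)
tracker.heartbeat('node1', at: start)
tracker.available?('node1', now: start + 3)   # heartbeat is recent
tracker.available?('node1', now: start + 10)  # heartbeat has lapsed
tracker.heartbeat('node1', at: start + 11)
tracker.available?('node1', now: start + 12)  # heartbeat has resumed
```

Passing the clock in as an argument keeps the sketch deterministic, so the lapse and resume cases can be exercised without waiting.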

Nodes

The Chef Push Jobs client is used to receive job messages from the Chef Push Jobs server and to verify the heartbeat status. The Chef Push Jobs client uses the same authorization / authentication model as Chef Infra Client. The Chef Push Jobs client listens for heartbeat messages from the Chef Push Jobs server. If there is no heartbeat from the Chef Push Jobs server, the Chef Push Jobs client will finish its current job, but then stop accepting any new jobs until the heartbeat from the Chef Push Jobs server resumes.

Workstations

A workstation is used to manage Chef Push Jobs jobs. This includes maintaining the push-jobs cookbook and using knife to start and stop jobs, view job status, and manage job lists.

push-jobs Cookbook

The push-jobs cookbook contains attributes that are used to configure the Chef Push Jobs client. In addition, Chef Push Jobs relies on the whitelist attribute to manage the list of jobs (and commands) that are available to Chef Push Jobs.

Whitelist

A whitelist is a list of the jobs, and the command that each job runs, that are available to Chef Push Jobs. A whitelist is saved as an attribute in the push-jobs cookbook. For example:

default['push_jobs']['whitelist'] = {
  'job_name' => 'command',
}

The whitelist is accessed from a recipe using the node['push_jobs']['whitelist'] attribute. For example:

template 'name' do
  source 'name'
  ...
  variables(:whitelist => node['push_jobs']['whitelist'])
end

Use the knife exec subcommand to add a job to the whitelist. For example:

knife exec -E 'nodes.transform("name:A_NODE_NAME") do |n|
    n.normal["push_jobs"]["whitelist"]["ntpdate"] = "ntpdate -u time"
  end'

where ["ntpdate"] = "ntpdate -u time" is added to the whitelist:

default['push_jobs']['whitelist'] = {
  'ntpdate' => 'ntpdate -u time',
}
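
To make the role of the whitelist concrete, here is a minimal sketch of resolving a job name to its command and refusing anything that is not listed. `resolve_command` is a hypothetical helper, not a Chef Push Jobs API, and the `chef-client` entry is added only to give the example a second job.

```ruby
# Example whitelist in the same shape as the attribute above.
# The 'chef-client' entry is an illustrative addition.
whitelist = {
  'ntpdate' => 'ntpdate -u time',
  'chef-client' => 'chef-client',
}

# Resolve a job name to its command, refusing anything not on the whitelist.
# resolve_command is a hypothetical helper for illustration only.
def resolve_command(whitelist, job_name)
  whitelist.fetch(job_name) do
    raise ArgumentError, "job '#{job_name}' is not whitelisted"
  end
end

resolve_command(whitelist, 'ntpdate') # the command string stored for ntpdate
```

Requesting a name that is absent from the hash raises `ArgumentError` in this sketch. How the real client reports a rejected job is not covered here.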

Reference

The following sections describe the knife subcommands, the Push Jobs API, and configuration settings used by Chef Push Jobs.

knife push jobs

The knife push jobs subcommand is used by Chef Push Jobs to start jobs, view job status, view job lists, and view node status.

Note

Review the list of common options available to this (and all) knife subcommands and plugins.

job list

Use the job list argument to view a list of Chef Push Jobs jobs.

Syntax

This argument has the following syntax:

knife job list

Options

This command does not have any specific options.

job start

Use the job start argument to start a Chef Push Jobs job.

Syntax

This argument has the following syntax:

knife job start (options) COMMAND [NODE, NODE, ...]

Options

This argument has the following options:

--timeout TIMEOUT

The maximum amount of time (in seconds) that a job may run before it is stopped.

-q QUORUM, --quorum QUORUM

The minimum number of nodes that match the search criteria, are available, and acknowledge the job request. This can be expressed as a percentage (e.g. 50%) or as an absolute number of nodes (e.g. 145). Default value: 100%.

For example, suppose that ten nodes match the search criteria and --quorum 80% is used. If two of those nodes are unavailable and the remaining eight acknowledge the job request, quorum is met and the command is run against the eight available nodes.
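
The quorum arithmetic can be sketched as follows. The helper names are hypothetical, and rounding a percentage up to a whole node is an assumption. It is consistent with the examples in this section, but the text does not state the rounding rule explicitly.

```ruby
# Returns the number of nodes that must acknowledge for quorum to be met.
# quorum is a percentage string such as "90%" or an absolute count such as "145".
# Assumption: percentages are rounded up to a whole node.
def required_nodes(quorum, total_nodes)
  text = quorum.to_s.strip
  if text.end_with?('%')
    percent = Integer(text.chomp('%'), 10)
    (total_nodes * percent + 99) / 100 # integer ceiling of total * percent / 100
  else
    Integer(text, 10)
  end
end

# True when enough nodes acknowledged the job request.
def quorum_met?(quorum, total_nodes, acknowledged)
  acknowledged >= required_nodes(quorum, total_nodes)
end

quorum_met?('80%', 10, 8) # true: 8 of 10 nodes are required
quorum_met?('90%', 5, 4)  # false: 90% of 5 rounds up to 5 nodes
quorum_met?('80%', 5, 4)  # true: 4 of 5 nodes are required
```

Integer arithmetic is used for the ceiling so that the result does not depend on floating-point rounding.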

Examples

Run a job

To run a job named add-glasses against a node named ricardosalazar, run the following command:

knife job start add-glasses 'ricardosalazar'

Run a job using quorum percentage

To search for nodes assigned the role webapp, and where 90% of those nodes must be available, run the following command:

knife job start --quorum 90% 'chef-client' --search 'role:webapp'

Run a job using node names

To search for a specific set of nodes (named chico, harpo, groucho, gummo, zeppo), and where 90% of those nodes must be available, run the following command:

knife job start --quorum 90% 'chef-client' chico harpo groucho gummo zeppo

to return something similar to:

Started. Job ID: GUID12345abc
  quorum_failed
  Command: chef-client
  Created_at: date
  unavailable: zeppo
  was_ready:
    gummo
    groucho
    chico
    harpo
  On_timeout: 3600
  Status: quorum_failed

Note

If quorum had been set at 80% (--quorum 80%), then quorum would have passed with the previous example.

job status

Use the job status argument to view the status of Chef Push Jobs jobs. Each job is always in one of the following states:

new

New job status.

voting

Waiting for nodes to commit or refuse to run the command.

running

Running the command on the nodes.

complete

Ran the command. Check individual node statuses to see if they completed or had issues.

quorum_failed

Did not run the command on any nodes.

crashed

Crashed while running the job.

timed_out

Timed out while running the job.

aborted

Job aborted by user.

Syntax

This argument has the following syntax:

knife job status <job id>

Options

This command does not have any specific options.

Examples

View job status by job identifier

To view the status of a job that has the identifier of 235, run the following command:

knife job status 235

to return something similar to:

Node name   Status      Last updated
foo         Failed      2012-05-04 00:00
bar         Done        2012-05-04 00:01

node status

Use the node status argument to identify nodes that Chef Push Jobs may interact with. Each node is always in one of the following states:

new

Node has neither committed nor refused to run the command.

ready

Node has committed to run the command but has not yet run it.

running

Node is presently running the command.

succeeded

Node successfully ran the command (an exit code of 0 was returned).

failed

Node failed to run the command (a non-zero exit code was returned).

aborted

Node ran the command but stopped before completion.

crashed

Node went down after it started running the job.

nacked

Node was busy when asked to be part of the job.

unavailable

Node went down before it started running.

was_ready

Node was ready but quorum failed.

timed_out

Node timed out.

Syntax

This argument has the following syntax:

knife node status [<node> <node> ...]

Options

This command does not have any specific options.

Push Jobs API

The Push Jobs API is used to create jobs and retrieve status using Chef Push Jobs, a tool that pushes jobs against a set of nodes in the organization. All requests are signed using the Chef Infra Server API and the validation key on the workstation from which the requests are made. All commands are sent to the Chef Infra Server using the knife exec subcommand.

Each request must include /organizations/organization_name/pushy/ as part of the endpoint path. For example: /organizations/organization_name/pushy/jobs/ID or /organizations/organization_name/pushy/node_states.
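
A small helper can keep the path prefix consistent across calls. The sketch below only assembles the path string. `pushy_path` is a hypothetical name, and signing and sending the request are out of scope.

```ruby
# Builds a Push Jobs API path for an organization.
# pushy_path is a hypothetical helper; only the path layout comes from the text above.
def pushy_path(org_name, *segments)
  ['', 'organizations', org_name, 'pushy', *segments].join('/')
end

pushy_path('ORG_NAME', 'jobs')                     # "/organizations/ORG_NAME/pushy/jobs"
pushy_path('ORG_NAME', 'jobs', 'ID')               # "/organizations/ORG_NAME/pushy/jobs/ID"
pushy_path('ORG_NAME', 'node_states', 'NODE_NAME') # "/organizations/ORG_NAME/pushy/node_states/NODE_NAME"
```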

jobs

The /organizations/ORG_NAME/pushy/jobs endpoint has the following methods: GET and POST.

GET

The GET method is used to get a list of jobs.

This method has no parameters.

Request

GET /organizations/ORG_NAME/pushy/jobs

Response

The response is similar to:

[
  "aaaaaaaaaaaa25fd67fa8715fd547d3d",
  "aaaaaaaaaaaa6af7b14dd8a025777cf0"
]

Response Code   Description
200             OK. The request was successful.
400             Bad request. The contents of the request are not formatted correctly.
401             Unauthorized. The user or client who made the request could not be authenticated. Verify the user/client name, and that the correct key was used to sign the request.
403             Forbidden. The user who made the request is not authorized to perform the action.
404             Not found. The requested object does not exist.

POST

The POST method is used to start a job.

This method has no parameters.

Request

POST /organizations/ORG_NAME/pushy/jobs

with a request body similar to:

{
  "command": "chef-client",
  "run_timeout": 300,
  "nodes": ["NODE1", "NODE2", "NODE3", "NODE4", "NODE5", "NODE6"]
}

Response

The response is similar to:

{
  "id": "aaaaaaaaaaaa25fd67fa8715fd547d3d"
}

Response Code   Description
201             Created. The object was created.
400             Bad request. The contents of the request are not formatted correctly.
401             Unauthorized. The user or client who made the request could not be authenticated. Verify the user/client name, and that the correct key was used to sign the request.
403             Forbidden. The user who made the request is not authorized to perform the action.
404             Not found. The requested object does not exist.
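
The request body for starting a job can be assembled with Ruby's standard JSON library. This is a minimal sketch: `job_request_body` is a hypothetical helper, the default `run_timeout` of 300 is taken from the example body rather than from a documented default, and sending the signed request is not shown.

```ruby
require 'json'

# Builds the JSON body for POST /organizations/ORG_NAME/pushy/jobs.
# job_request_body is a hypothetical helper; the field names come from the example above.
# The run_timeout default of 300 mirrors the example and is an assumption.
def job_request_body(command:, nodes:, run_timeout: 300)
  JSON.generate({
    'command' => command,
    'run_timeout' => run_timeout,
    'nodes' => nodes,
  })
end

body = job_request_body(command: 'chef-client', nodes: %w[NODE1 NODE2 NODE3])

# The response carries the new job identifier, as in the example response above.
job_id = JSON.parse('{"id": "aaaaaaaaaaaa25fd67fa8715fd547d3d"}').fetch('id')
```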

jobs/ID

The /organizations/ORG_NAME/pushy/jobs/ID endpoint has the following methods: GET.

GET

The GET method is used to get the status of an individual job, including node state (running, complete, crashed).

This method has no parameters.

Request

GET /organizations/ORG_NAME/pushy/jobs/ID

Response

The response will return something similar to:

{
  "id": "aaaaaaaaaaaa25fd67fa8715fd547d3d",
  "command": "chef-client",
  "run_timeout": 300,
  "status": "running",
  "created_at": "Tue, 04 Sep 2012 23:01:02 GMT",
  "updated_at": "Tue, 04 Sep 2012 23:17:56 GMT",
  "nodes": {
    "running": ["NODE1", "NODE5"],
    "complete": ["NODE2", "NODE3", "NODE4"],
    "crashed": ["NODE6"]
  }
}

where:

  • nodes groups node names by node state. Each state is one of the following: aborted (node ran command, stopped before completion), complete (node ran command to completion), crashed (node went down after command started running), nacked (node was busy), new (node has not accepted or rejected command), ready (node has accepted command, command has not started running), running (node has accepted command, command is running), and unavailable (node went down before command started).
  • status is one of the following: aborted (the job was aborted), complete (the job completed; see nodes for individual node status), quorum_failed (the command was not run on any nodes), running (the command is running), timed_out (the command timed out), and voting (waiting for nodes; quorum not yet met).
  • updated_at is the date and time at which the job entered its present status.

Response Code   Description
200             OK. The request was successful.
400             Bad request. The contents of the request are not formatted correctly.
401             Unauthorized. The user or client who made the request could not be authenticated. Verify the user/client name, and that the correct key was used to sign the request.
403             Forbidden. The user who made the request is not authorized to perform the action.
404             Not found. The requested object does not exist.
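
A job status response like the one above can be summarized in a few lines, for example when polling until a job finishes. In this sketch `summarize_job` and `FINISHED_JOB_STATES` are hypothetical names, and the set of finished states is an assumption drawn from the job states listed for knife job status.

```ruby
require 'json'

# Job states after which the status is not expected to change.
# Assumption based on the job states described for knife job status.
FINISHED_JOB_STATES = %w[complete quorum_failed crashed timed_out aborted].freeze

# Summarizes a GET jobs/ID response: overall status, whether the job has
# finished, and how many nodes are in each node state.
def summarize_job(json_text)
  job = JSON.parse(json_text)
  {
    status: job['status'],
    finished: FINISHED_JOB_STATES.include?(job['status']),
    node_counts: job.fetch('nodes', {}).transform_values(&:length),
  }
end

response = <<~JSON
  {
    "id": "aaaaaaaaaaaa25fd67fa8715fd547d3d",
    "status": "running",
    "nodes": {
      "running": ["NODE1", "NODE5"],
      "complete": ["NODE2", "NODE3", "NODE4"],
      "crashed": ["NODE6"]
    }
  }
JSON

summary = summarize_job(response)
# summary[:finished]    => false
# summary[:node_counts] => {"running"=>2, "complete"=>3, "crashed"=>1}
```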

node_states

The /organizations/ORG_NAME/pushy/node_states endpoint has the following methods: GET.

GET

The GET method is used to get a list of nodes and their status (up or down).

This method has no parameters.

Request

GET /organizations/ORG_NAME/pushy/node_states

Response

The response is similar to:

[
  {
    "node_name": "FARQUAD",
    "status": "up",
    "updated_at": "Tue, 04 Sep 2012 23:17:56 GMT"
  },
  {
    "node_name": "DONKEY",
    "status": "up",
    "updated_at": "Tue, 04 Sep 2012 23:17:56 GMT"
  },
  {
    "node_name": "FIONA",
    "status": "down",
    "updated_at": "Tue, 04 Sep 2012 23:17:56 GMT"
  }
]

The status value for each node is either up or down.

Response Code   Description
200             OK. The request was successful.
400             Bad request. The contents of the request are not formatted correctly.
401             Unauthorized. The user or client who made the request could not be authenticated. Verify the user/client name, and that the correct key was used to sign the request.
403             Forbidden. The user who made the request is not authorized to perform the action.
404             Not found. The requested object does not exist.
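
Once the node list has been retrieved, finding the nodes that are down is a simple filter. The sketch below assumes that the response body is a JSON array of node-state objects with the node_name and status fields shown above. `down_nodes` is a hypothetical helper.

```ruby
require 'json'

# Returns the names of nodes whose status is "down".
# Assumes the response body is a JSON array of node-state objects.
def down_nodes(json_text)
  JSON.parse(json_text)
      .select { |state| state['status'] == 'down' }
      .map { |state| state['node_name'] }
end

node_states = <<~JSON
  [
    { "node_name": "FARQUAD", "status": "up" },
    { "node_name": "DONKEY", "status": "up" },
    { "node_name": "FIONA", "status": "down" }
  ]
JSON

down_nodes(node_states) # => ["FIONA"]
```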

node_states/NODE_NAME

The /organizations/ORG_NAME/pushy/node_states/NODE_NAME endpoint has the following methods: GET.

GET

The GET method is used to get the status (up or down) for an individual node.

This method has no parameters.

Request

GET /organizations/ORG_NAME/pushy/node_states/NODE_NAME

Response

The response is similar to:

{
  "node_name": "FIONA",
  "status": "down",
  "updated_at": "Tue, 04 Sep 2012 23:17:56 GMT"
}

where updated_at shows the date and time at which a node’s status last changed.

Response Code   Description
200             OK. The request was successful.
400             Bad request. The contents of the request are not formatted correctly.
401             Unauthorized. The user or client who made the request could not be authenticated. Verify the user/client name, and that the correct key was used to sign the request.
403             Forbidden. The user who made the request is not authorized to perform the action.
404             Not found. The requested object does not exist.

push-jobs-client

The Chef Push Jobs executable can be run as a command-line tool.

Options

This command has the following syntax:

push-jobs-client OPTION VALUE OPTION VALUE ...

This command has the following options:

-c CONFIG, --config CONFIG

The configuration file to use. Chef Infra Client and Chef Push Jobs client use the same configuration file: client.rb. Default value: Chef::Config.platform_specific_path("/etc/chef/client.rb").

-h, --help

Show help for the command.

-k KEY_FILE, --client-key KEY_FILE

The location of the file that contains the client key.

-l LEVEL, --log_level LEVEL

The level of logging to be stored in a log file.

-L LOCATION, --logfile LOCATION

The location of the log file. This is recommended when starting any executable as a daemon.

-N NODE_NAME, --node-name NODE_NAME

The unique identifier of the node.

-S URL, --server URL

The URL for the Chef Infra Server.

-v, --version

The version of Chef Push Jobs.

opscode-push-jobs-server.rb

The opscode-push-jobs-server.rb file is used to specify the configuration settings used by the Chef Push Jobs server.

This file is the default configuration file and is located at: /etc/opscode-push-jobs-server.

Settings

This configuration file has the following settings:

api_port

NGINX forwards requests to this port on the push-jobs server as part of the push-jobs communication channel. Default value: 10003.

command_port

The port on which a Chef Push Jobs server listens for requests that are to be executed on managed nodes. Default value: 10002.

heartbeat_interval

The frequency of the Chef Push Jobs server heartbeat message. Default value: 1000 (milliseconds).

server_heartbeat_port

The port on which the Chef Push Jobs server receives heartbeat messages from each Chef Push Jobs client. (This port is the ROUTER half of the ZeroMQ DEALER / ROUTER pattern.) Default value: 10000.

server_name

The name of the Chef Push Jobs server.

zeromq_listen_address

The IP address used by ZeroMQ. Default value: tcp://*.