DagsterDocs

Deploying Dagster to Docker#

If you are running on AWS ECS or another container-based orchestration system, you'll likely want to package Dagit using a Docker image.

A minimal skeleton Dockerfile that will run Dagit is shown below:

FROM python:3.7-slim

RUN mkdir -p /opt/dagster/dagster_home /opt/dagster/app

RUN pip install dagit dagster-postgres

# Copy your pipeline code and workspace to /opt/dagster/app
COPY pipelines.py workspace.yaml /opt/dagster/app/

ENV DAGSTER_HOME=/opt/dagster/dagster_home/

# Copy dagster instance YAML to $DAGSTER_HOME
COPY dagster.yaml /opt/dagster/dagster_home/

WORKDIR /opt/dagster/app

EXPOSE 3000

ENTRYPOINT ["dagit", "-h", "0.0.0.0", "-p", "3000"]

You'll also need to include a dagster.yaml file in the same directory as the Dockerfile to configure the Dagster instance that Dagit will use:

run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_db:
      username:
        env: DAGSTER_PG_USERNAME
      password:
        env: DAGSTER_PG_PASSWORD
      hostname:
        env: DAGSTER_PG_HOST
      db_name:
        env: DAGSTER_PG_DB
      port: 5432

event_log_storage:
  module: dagster_postgres.event_log
  class: PostgresEventLogStorage
  config:
    postgres_db:
      username:
        env: DAGSTER_PG_USERNAME
      password:
        env: DAGSTER_PG_PASSWORD
      hostname:
        env: DAGSTER_PG_HOST
      db_name:
        env: DAGSTER_PG_DB
      port: 5432

schedule_storage:
  module: dagster_postgres.schedule_storage
  class: PostgresScheduleStorage
  config:
    postgres_db:
      username:
        env: DAGSTER_PG_USERNAME
      password:
        env: DAGSTER_PG_PASSWORD
      hostname:
        env: DAGSTER_PG_HOST
      db_name:
        env: DAGSTER_PG_DB
      port: 5432

compute_logs:
  module: dagster_aws.s3.compute_log_manager
  class: S3ComputeLogManager
  config:
    bucket: "mycorp-dagster-compute-logs"
    prefix: "dagster-test-"

local_artifact_storage:
  module: dagster.core.storage.root
  class: LocalArtifactStorage
  config:
    base_dir: "/opt/dagster/local/"

In cases where you're using environment variable to configure the instance, you should ensure these environment variables are exposed in the running Dagit container.

In practice, you may want to volume mount your pipeline code into your containers to enable deployment patterns such as git-sync sidecars that avoid the need to rebuild images and redeploy containers when pipeline code changes.

Dagit servers expose a health check endpoint at /dagit_info, which returns a JSON response like:

{
  "dagit_version": "0.6.6",
  "dagster_graphql_version": "0.6.6",
  "dagster_version": "0.6.6"
}

Multi-container Docker deployment#

More advanced dagster deployments will require deploying more than one container. For example, if you are using dagster-daemon to run schedules and sensors or manage a queue of runs, you'll likely want a separate container running the dagster-daemon service. Dagster also supports a deployment setup where pipeline code can be updated and deployed separately in its own container, without needing to redeploy the other dagster services.

A complete working example of this architecture using docker-compose can be found in this example below, which includes a dagit container, a dagster-daemon container, and one or more containers with user pipeline code.

Example#

You can find the code for this example on Github

This example demonstrates a Dagster deployment that includes a Dagit container for loading and launching pipelines, a dagster-daemon container for managing a run queue and submitting runs from schedules and sensors, a postgres container for persistent storage, and a container with user pipeline code. The Dagster instance uses DockerRunLauncher to launch each run in its own container.

To start the deployment, run docker-compose up.