We go over common ways to customize your Dagster Helm deployment. This includes adding Kubernetes and Celery configuration at the pipeline and solid level, configuring Celery queues, and configuring your Helm release to use external resources.
We expect familiarity with the basic guide and advanced guide on deploying Dagster with Helm.
The dagster-k8s/config tag allows users to pass custom configuration to the Kubernetes Job, Job metadata, JobSpec, PodSpec, and PodTemplateSpec metadata. We can specify this information in a solid or pipeline's tags.
from dagster import pipeline, solid


@solid(
    tags={
        'dagster-k8s/config': {
            'container_config': {
                'resources': {
                    'requests': {'cpu': '250m', 'memory': '64Mi'},
                    'limits': {'cpu': '500m', 'memory': '2560Mi'},
                }
            },
            'pod_template_spec_metadata': {
                'annotations': {'cluster-autoscaler.kubernetes.io/safe-to-evict': 'true'}
            },
            'pod_spec_config': {
                'affinity': {
                    'nodeAffinity': {
                        'requiredDuringSchedulingIgnoredDuringExecution': {
                            'nodeSelectorTerms': [{
                                'matchExpressions': [{
                                    'key': 'beta.kubernetes.io/os',
                                    'operator': 'In',
                                    'values': ['windows', 'linux'],
                                }]
                            }]
                        }
                    }
                }
            },
        },
    },
)
def my_solid(context):
    context.log.info('running')


@pipeline(
    tags={
        'dagster-k8s/config': {
            'container_config': {
                'resources': {
                    'requests': {'cpu': '200m', 'memory': '32Mi'},
                }
            },
        }
    }
)
def my_pipeline():
    my_solid()
Users can configure multiple Celery queues (for example, one queue for each resource to be limited) and multiple Celery workers per queue via the runLauncher.config.celeryK8sRunLauncher.workerQueues section of values.yaml.
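For example, an override along these lines would run workers for an additional queue alongside the default one. This is a sketch: the queue names and replica counts are illustrative, and the available keys should be confirmed against the chart's values.yaml.

runLauncher:
  type: CeleryK8sRunLauncher
  config:
    celeryK8sRunLauncher:
      workerQueues:
        # The default queue that solids run on unless tagged otherwise.
        - name: "dagster"
          replicaCount: 2
        # An additional queue, e.g. to limit concurrent access to a resource.
        - name: "snowflake_queue"
          replicaCount: 1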
To use the queues, dagster-celery/queue can be set on solid tags. By default, all solids will be sent to the default Celery queue named dagster.
from dagster import solid


@solid(
    tags={
        'dagster-celery/queue': 'snowflake_queue',
    }
)
def my_solid(context):
    context.log.info('running')
Users can set dagster-celery/run_priority on the pipeline tags to configure the baseline priority of all solids from that pipeline. To set priority at the solid level, users can set dagster-celery/priority on the solid tags to configure additional priority. When priorities are set on both the pipeline and solid, the sum of both priorities will be used. In the example below, my_solid runs with a priority of 5 (a run priority of 3 plus a solid priority of 2).
from dagster import pipeline, solid


@solid(
    tags={
        'dagster-celery/priority': 2,
    }
)
def my_solid(context):
    context.log.info('running')


@pipeline(
    tags={
        'dagster-celery/run_priority': 3,
    }
)
def my_pipeline():
    my_solid()
In a real deployment, users will likely want to set up an external PostgreSQL database and configure the postgresql section of values.yaml.
postgresql:
  enabled: false
  postgresqlHost: "postgresqlHost"
  postgresqlUsername: "postgresqlUsername"
  postgresqlPassword: "postgresqlPassword"
  postgresqlDatabase: "postgresqlDatabase"
  service:
    port: 5432
Supplying .Values.postgresql.postgresqlPassword will create a Kubernetes Secret with key postgresql-password, containing the encoded password. This secret is used to supply the Dagster infrastructure with an environment variable that's used when creating the storages for the Dagster instance.
If you use a secrets manager like Vault, it may be convenient to manage this Secret outside of the Dagster Helm chart. In this case, the generation of this Secret within the chart should be disabled, and .Values.global.postgresqlSecretName should be set to the name of the externally managed Secret.
global:
  postgresqlSecretName: "dagster-postgresql-secret"

generatePostgresqlPasswordSecret: false
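The externally managed Secret must expose the password under the postgresql-password key described above. As a minimal sketch, with the Secret name matching postgresqlSecretName and the password value as a placeholder:

apiVersion: v1
kind: Secret
metadata:
  # Must match .Values.global.postgresqlSecretName.
  name: dagster-postgresql-secret
type: Opaque
stringData:
  # Key expected by the Dagster infrastructure; the value is a placeholder.
  postgresql-password: "postgresqlPassword"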
In a real deployment, users will likely want to set up an external message broker like Redis, and configure the rabbitmq and redis sections of values.yaml.
rabbitmq:
  enabled: false

redis:
  enabled: true
  internal: false
  host: "redisHost"
  port: 6379
  brokerDbNumber: 0
  backendDbNumber: 0
Users will likely want to permission a ServiceAccount bound to a properly scoped Role to launch Jobs and create other Kubernetes resources, as sketched below.
Users will likely want to use Secrets for managing secure information such as database logins.
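For the ServiceAccount point above, a minimal sketch might bind a namespaced Role that can manage Jobs and read their Pods. The names and the exact rule set here are illustrative, not the chart's defaults:

apiVersion: v1
kind: ServiceAccount
metadata:
  # Illustrative name.
  name: dagster-launcher
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dagster-launcher-role
rules:
  # Launch and manage run Jobs.
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "get", "list", "watch", "delete"]
  # Observe the Pods created by those Jobs.
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dagster-launcher-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: dagster-launcher-role
subjects:
  - kind: ServiceAccount
    name: dagster-launcher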
It may be desirable to manage two Helm releases for your Dagster deployment: one release for the Dagster infrastructure, which consists of Dagit and the Daemon, and another release for your User Code, which contains the definitions of your pipelines written in Dagster. This way, changes to User Code can be decoupled from upgrades to core Dagster infrastructure.
To do this, we offer the dagster chart and the dagster-user-deployments chart.
$ helm search repo dagster
NAME                                CHART VERSION   APP VERSION   DESCRIPTION
dagster/dagster                     0.11.0          0.11.0        Dagster is a system for building modern data ap...
dagster/dagster-user-deployments    0.11.0          0.11.0        A Helm subchart to deploy Dagster User Code dep...
To manage these separate deployments, we first need to isolate Dagster infrastructure to its own deployment. This can be done by disabling the subchart that deploys the User Code in the dagster chart. This will prevent the dagster chart from creating the services and deployments related to User Code, as these will be managed in a separate release.
dagster-user-deployments:
  enableSubchart: false
Next, the workspace for Dagit must be configured with the future hosts and ports of the services exposing access to the User Code.
dagit:
  workspace:
    enabled: true
    servers:
      - host: "k8s-example-user-code-1"
        port: 3030
      - ...
Finally, the dagster-user-deployments subchart can now be managed in its own release. The list of possible overrides for the subchart can be found in its values.yaml.
helm upgrade --install user-code dagster/dagster-user-deployments -f /path/to/values.yaml
You should now be familiar with the common ways to customize your Dagster Helm deployment.