
Main Concepts

Dagster is a data orchestrator. It lets you define pipelines (DAGs) in terms of the data flow between logical components called solids. These pipelines can be developed locally and run anywhere.

How to use the Main Concepts section

Each page in this section contains:

  • Overview: Description of the concept.
  • Relevant APIs: Index of top-level Dagster APIs that are relevant to the concept.
  • Examples: Easy-to-copy code snippets for the concept.
  • Patterns: A list of advanced patterns to use and anti-patterns to avoid with the concept.

Sections

Solids and Pipelines

Solids and pipelines are the building blocks of Dagster code. Solids are the units of computation, and pipelines connect solids into orchestration graphs based on the data that flows between them. This section covers how to define and use both.

Modes and Resources

Modes, together with resources, let you separate pipeline logic from its heavyweight external dependencies. This makes it possible to develop and test data pipelines in different environments.

Testing

Dagster enables you to build testable and maintainable data applications. This section shows how to unit-test your data pipelines, separate business logic from external dependencies, and run data quality tests.

Configuration System

Dagster provides a configuration system that allows you to document, schematize, and error-check your configuration. This section demonstrates how configurations work with different Dagster entities.

Dagster Types

Dagster includes gradual, opt-in typing for the inputs and outputs of solids. This section explains how to define, use, and test types in Dagster.

IO Management

IO Managers are user-provided objects that store solid outputs and load them as inputs to downstream solids. This section explains how Dagster thinks about IO management and shows how to define and use IO managers and other IO-related features.

Dagit & GraphQL API

Dagit is a web-based interface for viewing and interacting with Dagster objects. This section walks you through Dagit's functionality and the GraphQL API you can use to interact with Dagster programmatically.

Repositories and Workspaces

A workspace is a collection of user-defined repositories and information about where to find them. Dagster tools, like Dagit and the Dagster CLI, use workspaces to load user code. This section shows how to define repositories and workspaces and when to use them.

Schedules, Sensors, and Partitions

Schedules launch runs on a fixed interval, while sensors launch runs in response to changes in external state. This section demonstrates how to define both and how to use related capabilities such as partitioning and backfilling.

Assets

Assets are data objects that you produce during a pipeline run. This section walks you through how to inform Dagster about these assets so that they can be tracked over time.

Logging

Dagster includes a rich and extensible logging system. This section showcases Dagster's built-in logger and shows how you can customize loggers to fit your logging and monitoring infrastructure.