DagsterDocs

Repositories#

A repository is a collection of pipelines and other definitions that various tools, such as the Dagster CLI or Dagit, can target to load the definitions.

Relevant APIs#

NameDescription
@repositoryThe decorator used to define repositories. The decorator returns a RepositoryDefinition
RepositoryDefinitionBase class for repositories. You almost never want to use initialize this class directly. Instead, you should use the @repository which returns a RepositoryDefinition

Overview#

A repository is a convenient way to organize your pipelines and other definitions. Each repository:

  • Includes various definitions: Pipelines, Schedules, Sensors, and Partition Sets.
  • Is loaded in a different process than Dagster system processes like dagit. Any communication between the Dagster system and repository code occurs over an RPC mechanism, ensuring that problems in repository code can't affect Dagster or other repositories.
  • Can be loaded in its own Python environment, so you can manage your dependencies (or even your own Python versions) separately.

You can set up multiple repositories and load them all at once by creating a workspace.yaml file. This can be useful to group pipelines and other artifacts by team for organization purposes. See Workspace to learn more about setting up multiple repositories.


Defining a Repository#

Repositories are declared using the RepositoryDefinition API or repository decorator. For example:

@pipeline
def addition_pipeline():
    return add(return_one(), return_two())


@pipeline
def subtraction_pipeline():
    return subtract(return_one(), return_two())


@daily_schedule(
    pipeline_name="addition_pipeline",
    start_date=datetime.datetime(2020, 1, 1),
)
def daily_addition_schedule(date):
    return {}


@sensor(pipeline_name="addition_pipeline")
def addition_sensor(context):
    should_run = True
    if should_run:
        yield RunRequest(run_key=None, run_config={})


@repository
def my_repository():
    return [
        addition_pipeline,
        subtraction_pipeline,
        daily_addition_schedule,
        addition_sensor,
    ]

The repository specifies a list of items, each of which can be a PipelineDefinition, ScheduleDefinition, SensorDefinition, or PartitionSetDefinition.

Using a Repository#

If you save the code above as repo.py, you can then run the Dagster command line tools on it. Try running:

dagit -f repo.py

Now you can see that all pipelines, schedules and sensors in this repository are on the left:

repo-dagit

You can also use -m to specify a module where the repository lives (See dagit for the full list of ways to locate a repository).

Loading repositories via the -f or -m options is actually just a convenience function. The underlying abstraction is the Workspace, which determines all of the available repositories available to Dagit. See Workspace for more details.