
Types

Dagster includes facilities for typing the input and output values of solids (“runtime” types).

Built-in primitive types

dagster.Any

Use this type for any input, output, or config field whose type is unconstrained.

All values are considered to be instances of Any.

Examples:

@solid
def identity(_, x: Any) -> Any:
    return x

# Untyped inputs and outputs are implicitly typed Any
@solid
def identity_imp(_, x):
    return x

# Explicitly typed
@solid(
    input_defs=[InputDefinition('x', dagster_type=Any)],
    output_defs=[OutputDefinition(dagster_type=Any)]
)
def identity(_, x):
    return x

@solid(config_schema=Field(Any))
def any_config(context):
    return context.solid_config
dagster.Bool

Use this type for any boolean input, output, or config field. At runtime, this will perform an isinstance(value, bool) check. You may also use the ordinary bool type as an alias.

Examples:

@solid
def boolean(_, x: Bool) -> String:
    return 'true' if x else 'false'

@solid
def empty_string(_, x: String) -> bool:
    return len(x) == 0

# Explicit
@solid(
    input_defs=[InputDefinition('x', dagster_type=Bool)],
    output_defs=[OutputDefinition(dagster_type=String)]
)
def boolean(_, x):
    return 'true' if x else 'false'

@solid(
    input_defs=[InputDefinition('x', dagster_type=String)],
    output_defs=[OutputDefinition(dagster_type=bool)]
)
def empty_string(_, x):
    return len(x) == 0

@solid(config_schema=Field(Bool))
def bool_config(context):
    return 'true' if context.solid_config else 'false'
dagster.Int

Use this type for any integer input or output. At runtime, this will perform an isinstance(value, int) check. You may also use the ordinary int type as an alias.

Examples:

@solid
def add_3(_, x: Int) -> int:
    return x + 3

# Explicit
@solid(
    input_defs=[InputDefinition('x', dagster_type=Int)],
    output_defs=[OutputDefinition(dagster_type=Int)]
)
def add_3(_, x):
    return x + 3
dagster.Float

Use this type for any float input, output, or config value. At runtime, this will perform an isinstance(value, float) check. You may also use the ordinary float type as an alias.

Examples:

@solid
def div_2(_, x: Float) -> float:
    return x / 2

# Explicit
@solid(
    input_defs=[InputDefinition('x', dagster_type=Float)],
    output_defs=[OutputDefinition(dagster_type=float)]
)
def div_2(_, x):
    return x / 2

@solid(config_schema=Field(Float))
def div_y(context, x: Float) -> float:
    return x / context.solid_config
dagster.String

Use this type for any string input, output, or config value. At runtime, this will perform an isinstance(value, str) check. You may also use the ordinary str type as an alias.

Examples:

@solid
def concat(_, x: String, y: str) -> str:
    return x + y

# Explicit
@solid(
    input_defs=[
        InputDefinition('x', dagster_type=String),
        InputDefinition('y', dagster_type=str)
    ],
    output_defs=[OutputDefinition(dagster_type=str)]
)
def concat(_, x, y):
    return x + y

@solid(config_schema=Field(String))
def hello(context) -> str:
    return 'Hello, {friend}!'.format(friend=context.solid_config)
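The runtime checks described for Bool, Int, Float, and String are plain isinstance checks, so Python's own subclass rules apply. The sketch below uses only the standard library (it does not call Dagster), and passes_check is a hypothetical helper. It shows two consequences worth knowing: bool is a subclass of int, and an int is not a float.

```python
def passes_check(value, python_type):
    """Hypothetical stand-in for an isinstance-based runtime type check."""
    return isinstance(value, python_type)

# bool is a subclass of int, so a bare isinstance(value, int) accepts booleans.
bool_as_int = passes_check(True, int)

# int is not a subclass of float, so isinstance(value, float) rejects integers.
int_as_float = passes_check(1, float)

# A string satisfies the str check.
str_as_str = passes_check("hello", str)
```

If an upstream solid might return 1 rather than 1.0, converting with float(...) before returning keeps a Float-typed output consistent with this kind of check.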
dagster.Nothing

Use this type only for inputs and outputs, in order to establish an execution dependency without communicating a value. Inputs of this type will not be passed to the solid compute function, so it is necessary to use the explicit InputDefinition API to define them rather than the Python 3 type hint syntax.

All values are considered to be instances of Nothing.

Examples:

@solid
def wait(_) -> Nothing:
    time.sleep(1)
    return

@solid(
    input_defs=[InputDefinition('ready', dagster_type=Nothing)]
)
def done(_) -> str:
    return 'done'

@pipeline
def nothing_pipeline():
    done(wait())

# Any value will pass the type check for Nothing
@solid
def wait_int(_) -> Int:
    time.sleep(1)
    return 1

@pipeline
def nothing_int_pipeline():
    done(wait_int())
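To make the idea of an execution dependency without a value concrete, here is a toy sketch in plain Python. It is not how Dagster schedules solids; run_in_order and the depends_on mapping are hypothetical. It only shows that ordering can be enforced while the downstream function receives no argument.

```python
calls = []

def wait():
    calls.append("wait")  # side effect only; nothing is returned

def done():
    calls.append("done")  # takes no input from wait
    return "done"

def run_in_order(steps, depends_on):
    """Run each step after the steps it depends on, passing no values along."""
    finished = {}

    def run(name):
        if name in finished:
            return
        for upstream in depends_on.get(name, ()):
            run(upstream)
        finished[name] = steps[name]()

    for name in steps:
        run(name)
    return finished

# "done" is listed first on purpose: the dependency, not the listing order,
# decides that "wait" runs before it.
results = run_in_order(
    steps={"done": done, "wait": wait},
    depends_on={"done": ["wait"]},
)
```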
dagster.Optional

Use this type only for inputs and outputs, if the value can also be None.

Examples:

@solid
def nullable_concat(_, x: String, y: Optional[String]) -> String:
    return x + (y or '')

# Explicit
@solid(
    input_defs=[
        InputDefinition('x', dagster_type=String),
        InputDefinition('y', dagster_type=Optional[String])
    ],
    output_defs=[OutputDefinition(dagster_type=String)]
)
def nullable_concat(_, x, y):
    return x + (y or '')
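The rule that an Optional value may be None or a value of the inner type can be sketched in plain Python. The helper name passes_optional is hypothetical, and this illustrates the semantics rather than Dagster's implementation.

```python
def passes_optional(value, inner_type):
    """Accept None, otherwise require an instance of inner_type."""
    return value is None or isinstance(value, inner_type)

def nullable_concat(x, y=None):
    # Same body as the solid above: a missing y contributes an empty string.
    return x + (y or "")

joined_without_y = nullable_concat("a")
joined_with_y = nullable_concat("a", "b")
```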
dagster.List

Use this type for inputs or outputs that are lists.

Lists are also the appropriate input types when fanning in multiple outputs using a MultiDependencyDefinition or the equivalent composition function syntax.

Examples:

@solid
def concat_list(_, xs: List[String]) -> String:
    return ''.join(xs)

# Explicit
@solid(
    input_defs=[InputDefinition('xs', dagster_type=List[String])],
    output_defs=[OutputDefinition(dagster_type=String)]
)
def concat_list(_, xs) -> String:
    return ''.join(xs)

# Fanning in multiple outputs
@solid
def emit_1(_) -> int:
    return 1

@solid
def emit_2(_) -> int:
    return 2

@solid
def emit_3(_) -> int:
    return 3

@solid
def sum_solid(_, xs: List[int]) -> int:
    return sum(xs)

@pipeline
def sum_pipeline():
    sum_solid([emit_1(), emit_2(), emit_3()])
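Fan-in can be pictured as collecting each upstream result into one list before the downstream function runs. The sketch below is plain Python with hypothetical helpers (fan_in, passes_list_of); it imitates the shape of sum_pipeline above without using Dagster.

```python
def emit_1():
    return 1

def emit_2():
    return 2

def emit_3():
    return 3

def passes_list_of(values, inner_type):
    """A List[inner_type]-style check: a list whose items all match."""
    return isinstance(values, list) and all(
        isinstance(v, inner_type) for v in values
    )

def fan_in(*upstream):
    """Call each upstream function and gather the results, in order."""
    return [fn() for fn in upstream]

xs = fan_in(emit_1, emit_2, emit_3)
# Only sum the values once the list-level check has passed.
total = sum(xs) if passes_list_of(xs, int) else None
```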
dagster.Dict

Use this type for inputs or outputs that are dicts.

For inputs and outputs, you may optionally specify the key and value types using the square brackets syntax for Python typing.

Examples:

@solid
def repeat(_, spec: Dict) -> str:
    return spec['word'] * spec['times']

# Explicit
@solid(
    input_defs=[InputDefinition('spec', dagster_type=Dict)],
    output_defs=[OutputDefinition(String)]
)
def repeat(_, spec):
    return spec['word'] * spec['times']
dagster.Set

Use this type for inputs or outputs that are sets. Alias for typing.Set.

You may optionally specify the inner type using the square brackets syntax for Python typing.

Examples:

@solid
def set_solid(_, set_input: Set[String]) -> List[String]:
    return sorted([x for x in set_input])

# Explicit
@solid(
    input_defs=[InputDefinition('set_input', dagster_type=Set[String])],
    output_defs=[OutputDefinition(List[String])],
)
def set_solid(_, set_input):
    return sorted([x for x in set_input])
dagster.Tuple

Use this type for inputs or outputs that are tuples. Alias for typing.Tuple.

You may optionally specify the inner types using the square brackets syntax for Python typing.

Config values should be passed as a list (in YAML or the Python config dict).

Examples:

@solid
def tuple_solid(_, tuple_input: Tuple[String, Int, Float]) -> List:
    return [x for x in tuple_input]

# Explicit
@solid(
    input_defs=[InputDefinition('tuple_input', dagster_type=Tuple[String, Int, Float])],
    output_defs=[OutputDefinition(List)],
)
def tuple_solid(_, tuple_input):
    return [x for x in tuple_input]
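The note above says that tuple config values are passed as a list. A minimal sketch of turning such a list into a tuple and checking each position, in plain Python with a hypothetical helper name (this is not Dagster's own code):

```python
def tuple_from_config(config_value, inner_types):
    """Turn a config list into a tuple and check each position's type."""
    if len(config_value) != len(inner_types):
        raise TypeError(
            "expected %d items, got %d" % (len(inner_types), len(config_value))
        )
    for item, expected in zip(config_value, inner_types):
        if not isinstance(item, expected):
            raise TypeError(
                "%r is not an instance of %s" % (item, expected.__name__)
            )
    return tuple(config_value)

# The list form is what would appear in YAML or a Python config dict.
value = tuple_from_config(["foo", 1, 3.1], (str, int, float))
```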
class dagster.FileHandle[source]

A reference to a file as manipulated by a FileManager.

Subclasses may handle files that are resident on the local file system, in an object store, or in any arbitrary place where a file can be stored.

This exists to handle the very common case where you wish to write a computation that reads, transforms, and writes files, but where you also want the same code to work in local development as well as on a cluster where the files will be stored in a globally available object store such as S3.

abstract property path_desc

A representation of the file path for display purposes only.

class dagster.LocalFileHandle(path)[source]

A reference to a file on a local filesystem.

Making New Types

class dagster.DagsterType(type_check_fn, key=None, name=None, is_builtin=False, description=None, loader=None, materializer=None, serialization_strategy=None, auto_plugins=None, required_resource_keys=None, kind=<DagsterTypeKind.REGULAR: 'REGULAR'>, typing_type=None)[source]

Define a type in dagster. These can be used in the inputs and outputs of solids.

Parameters
  • type_check_fn (Callable[[TypeCheckContext, Any], [Union[bool, TypeCheck]]]) – The function that defines the type check. It takes the value flowing through the input or output of the solid. If it passes, return either True or a TypeCheck with success set to True. If it fails, return either False or a TypeCheck with success set to False. The first argument must be named context (or, if unused, _, _context, or context_). Use required_resource_keys for access to resources.

  • key (Optional[str]) –

    The unique key to identify types programmatically. The key property always has a value. If you omit the key argument to the init function, it instead receives the value of name. If neither key nor name is provided, a CheckError is thrown.

    In the case of a generic type such as List or Optional, this is generated programmatically based on the type parameters.

    For most use cases, name should be set and the key argument should not be specified.

  • name (Optional[str]) – A unique name given by a user. If key is None, key becomes this value. The name is omitted when the user does not specify a unique name for this type, as with a generic class.

  • description (Optional[str]) – A markdown-formatted string, displayed in tooling.

  • loader (Optional[DagsterTypeLoader]) – An instance of a class that inherits from DagsterTypeLoader and can map config data to a value of this type. Specify this argument if you will need to shim values of this type using the config machinery. As a rule, you should use the @dagster_type_loader decorator to construct these arguments.

  • materializer (Optional[DagsterTypeMaterializer]) – An instance of a class that inherits from DagsterTypeMaterializer and can persist values of this type. As a rule, you should use the @dagster_type_materializer decorator to construct these arguments.

  • serialization_strategy (Optional[SerializationStrategy]) – An instance of a class that inherits from SerializationStrategy. The default strategy for serializing this value when automatically persisting it between execution steps. You should set this value if the ordinary serialization machinery (e.g., pickle) will not be adequate for this type.

  • auto_plugins (Optional[List[Type[TypeStoragePlugin]]]) – If types must be serialized differently depending on the storage being used for intermediates, they should specify this argument. In these cases the serialization_strategy argument is not sufficient because serialization requires specialized API calls, e.g. to call an S3 API directly instead of using a generic file object. See dagster_pyspark.DataFrame for an example.

  • required_resource_keys (Optional[Set[str]]) – Resource keys required by the type_check_fn.

  • is_builtin (bool) – Defaults to False. This is used by tools to display or filter built-in types (such as String, Int) to visually distinguish them from user-defined types. Meant for internal use.

  • kind (DagsterTypeKind) – Defaults to DagsterTypeKind.REGULAR. This is used to determine the kind of runtime type for InputDefinition and OutputDefinition type checking.

  • typing_type – Defaults to None. A valid python typing type (e.g. Optional[List[int]]) for the value contained within the DagsterType. Meant for internal use.
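As a sketch of the type_check_fn contract described above, the function below takes a context first and the value second and returns a bool. It is plain Python so it can be run on its own; the names even_int_check and EvenInt are hypothetical, and the DagsterType call is left as a comment because it needs Dagster installed.

```python
def even_int_check(_context, value):
    """Return True only for even integers (bools are excluded explicitly)."""
    return (
        isinstance(value, int)
        and not isinstance(value, bool)
        and value % 2 == 0
    )

# With Dagster available, the function would be attached roughly like this:
#   EvenInt = DagsterType(name="EvenInt", type_check_fn=even_int_check)

# The check itself can be exercised directly; the context is unused here.
results = [even_int_check(None, v) for v in (4, 3, "4", True)]
```

Returning a plain bool is the simpler of the two options listed above; a TypeCheck result would be the choice when a description or metadata should accompany a failure.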

dagster.PythonObjectDagsterType(python_type, key=None, name=None, **kwargs)[source]

Define a type in dagster whose typecheck is an isinstance check.

Specifically, the type can either be a single python type (e.g. int), or a tuple of types (e.g. (int, float)) which is treated as a union.

Examples

ntype = PythonObjectDagsterType(python_type=int)
assert ntype.name == 'int'
assert_success(ntype, 1)
assert_failure(ntype, 'a')
ntype = PythonObjectDagsterType(python_type=(int, float))
assert ntype.name == 'Union[int, float]'
assert_success(ntype, 1)
assert_success(ntype, 1.5)
assert_failure(ntype, 'a')
Parameters
  • python_type (Union[Type, Tuple[Type, ...]]) – The dagster typecheck function calls isinstance on this type.

  • name (Optional[str]) – Name the type. Defaults to the name of python_type.

  • key (Optional[str]) – Key of the type. Defaults to name.

  • description (Optional[str]) – A markdown-formatted string, displayed in tooling.

  • loader (Optional[DagsterTypeLoader]) – An instance of a class that inherits from DagsterTypeLoader and can map config data to a value of this type. Specify this argument if you will need to shim values of this type using the config machinery. As a rule, you should use the @dagster_type_loader decorator to construct these arguments.

  • materializer (Optional[DagsterTypeMaterializer]) – An instance of a class that inherits from DagsterTypeMaterializer and can persist values of this type. As a rule, you should use the @dagster_type_materializer decorator to construct these arguments.

  • serialization_strategy (Optional[SerializationStrategy]) – An instance of a class that inherits from SerializationStrategy. The default strategy for serializing this value when automatically persisting it between execution steps. You should set this value if the ordinary serialization machinery (e.g., pickle) will not be adequate for this type.

  • auto_plugins (Optional[List[Type[TypeStoragePlugin]]]) – If types must be serialized differently depending on the storage being used for intermediates, they should specify this argument. In these cases the serialization_strategy argument is not sufficient because serialization requires specialized API calls, e.g. to call an S3 API directly instead of using a generic file object. See dagster_pyspark.DataFrame for an example.

dagster.dagster_type_loader(config_schema, required_resource_keys=None, loader_version=None, external_version_fn=None)[source]

Create a dagster type loader that maps config data to a runtime value.

The decorated function should take the execution context and parsed config value and return the appropriate runtime value.

Parameters
  • config_schema (ConfigSchema) – The schema for the config that’s passed to the decorated function.

  • loader_version (str) – (Experimental) The version of the decorated compute function. Two loading functions should have the same version if and only if they deterministically produce the same outputs when provided the same inputs.

  • external_version_fn (Callable) – (Experimental) A function that takes in the same parameters as the loader function (config_value) and returns a representation of the version of the external asset (str). Two external assets with identical versions are treated as identical to one another.

Examples:

@dagster_type_loader(Permissive())
def load_dict(_context, value):
    return value
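The body of a type loader is an ordinary function from (context, config value) to a runtime value. The sketch below shows such a function in plain Python; Point and load_point are hypothetical names, and the decorator line is shown as a comment since it requires Dagster.

```python
from collections import namedtuple

# Hypothetical runtime type that the loader produces.
Point = namedtuple("Point", ["x", "y"])

# With Dagster available this would be decorated, for example:
#   @dagster_type_loader(Permissive())
def load_point(_context, config_value):
    """Map a parsed config dict such as {'x': 1, 'y': 2} to a Point."""
    return Point(x=config_value["x"], y=config_value["y"])

# Called directly here; the context is not used by this loader.
point = load_point(None, {"x": 1, "y": 2})
```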
class dagster.DagsterTypeLoader[source]

Dagster type loaders are used to load unconnected inputs of the dagster type they are attached to.

The recommended way to define a type loader is with the @dagster_type_loader decorator.

dagster.dagster_type_materializer(config_schema, required_resource_keys=None)[source]

Create an output materialization hydration config that configurably materializes a runtime value.

The decorated function should take the execution context, the parsed config value, and the runtime value. It should materialize the runtime value and return an appropriate AssetMaterialization.

Parameters

  • config_schema (Any) – The type of the config data expected by the decorated function.

Examples:

# Takes a list of dicts such as might be read in using csv.DictReader, as well as a config
# value, and writes the rows to a CSV file at the configured path.
@dagster_type_materializer(str)
def materialize_df(_context, path, value):
    with open(path, 'w') as fd:
        writer = csv.DictWriter(fd, fieldnames=value[0].keys())
        writer.writeheader()
        writer.writerows(rowdicts=value)

    return AssetMaterialization.file(path)
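The file-writing part of the example above can be tried without Dagster. The sketch below is a standard-library version of that step; write_rows_csv is a hypothetical helper, and returning the path stands in for the AssetMaterialization that a real materializer would return.

```python
import csv
import os
import tempfile

def write_rows_csv(path, rows):
    """Write a non-empty list of dicts to a CSV file, header first."""
    with open(path, "w", newline="") as fd:
        writer = csv.DictWriter(fd, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path

rows = [{"name": "a", "n": "1"}, {"name": "b", "n": "2"}]
out_path = write_rows_csv(os.path.join(tempfile.mkdtemp(), "rows.csv"), rows)

# Read the file back to confirm the rows survived the round trip.
with open(out_path, newline="") as fd:
    round_trip = [dict(r) for r in csv.DictReader(fd)]
```

The helper assumes at least one row, because the header is taken from the first dict; an empty list would need separate handling.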
class dagster.DagsterTypeMaterializer[source]

Dagster type materializers are used to materialize outputs of the dagster type they are attached to.

The recommended way to define a type materializer is with the @dagster_type_materializer decorator.

dagster.usable_as_dagster_type(name=None, description=None, loader=None, materializer=None, serialization_strategy=None, auto_plugins=None)[source]

Decorate a Python class to make it usable as a Dagster Type.

This is intended to make it straightforward to annotate existing business logic classes to make them dagster types whose typecheck is an isinstance check against that python class.

Parameters
  • python_type (cls) – The python type to make usable as a dagster type.

  • name (Optional[str]) – Name of the new Dagster type. If None, the name (__name__) of the python_type will be used.

  • description (Optional[str]) – A user-readable description of the type.

  • loader (Optional[DagsterTypeLoader]) – An instance of a class that inherits from DagsterTypeLoader and can map config data to a value of this type. Specify this argument if you will need to shim values of this type using the config machinery. As a rule, you should use the @dagster_type_loader decorator to construct these arguments.

  • materializer (Optional[DagsterTypeMaterializer]) – An instance of a class that inherits from DagsterTypeMaterializer and can persist values of this type. As a rule, you should use the @dagster_type_materializer decorator to construct these arguments.

  • serialization_strategy (Optional[SerializationStrategy]) – An instance of a class that inherits from SerializationStrategy. The default strategy for serializing this value when automatically persisting it between execution steps. You should set this value if the ordinary serialization machinery (e.g., pickle) will not be adequate for this type.

  • auto_plugins (Optional[List[TypeStoragePlugin]]) – If types must be serialized differently depending on the storage being used for intermediates, they should specify this argument. In these cases the serialization_strategy argument is not sufficient because serialization requires specialized API calls, e.g. to call an S3 API directly instead of using a generic file object. See dagster_pyspark.DataFrame for an example.

Examples:

# dagster_aws.s3.file_manager.S3FileHandle
@usable_as_dagster_type
class S3FileHandle(FileHandle):
    def __init__(self, s3_bucket, s3_key):
        self._s3_bucket = check.str_param(s3_bucket, 's3_bucket')
        self._s3_key = check.str_param(s3_key, 's3_key')

    @property
    def s3_bucket(self):
        return self._s3_bucket

    @property
    def s3_key(self):
        return self._s3_key

    @property
    def path_desc(self):
        return self.s3_path

    @property
    def s3_path(self):
        return 's3://{bucket}/{key}'.format(bucket=self.s3_bucket, key=self.s3_key)
dagster.make_python_type_usable_as_dagster_type(python_type, dagster_type)[source]

Take any existing python type and map it to a dagster type (generally created with DagsterType). This can only be called once on a given python type.

Testing Types

dagster.check_dagster_type(dagster_type, value)[source]

Test a custom Dagster type.

Parameters
  • dagster_type (Any) – The Dagster type to test. Should be one of the built-in types, a dagster type explicitly constructed with as_dagster_type(), @usable_as_dagster_type, or PythonObjectDagsterType(), or a Python type.

  • value (Any) – The runtime value to test.

Returns

The result of the type check.

Return type

TypeCheck

Examples

assert check_dagster_type(Dict[Any, Any], {'foo': 'bar'}).success