Architecture Reference

Hybrid architecture

Dagster Cloud uses a hybrid architecture to ensure the security of user code and data.

The Dagster backend services, including the web frontend, GraphQL API, metadata database, and daemons (responsible for executing schedules and sensors), are hosted in Dagster Cloud. You are responsible for running an agent in your own environment.

When users interact with the web frontend, when queries are made against the GraphQL API, or when schedules and sensors tick, work is enqueued for your agent. Your agent polls the agent API to see if any work needs to be done, and launches user code as appropriate to fulfill requests. User code then streams metadata back to the agent API (GraphQL over HTTPS) so that it will be available in Dagster Cloud.
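The enqueue-and-poll flow described above can be sketched roughly as follows. This is an illustration only, not the actual agent implementation: the `FakeAgentApi` class, its method names, and the request shape are hypothetical stand-ins for the real agent API, which the agent reaches over HTTPS.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FakeAgentApi:
    """Hypothetical in-memory stand-in for the Dagster Cloud agent API."""

    pending: deque = field(default_factory=deque)
    results: list = field(default_factory=list)

    def enqueue(self, request: dict) -> None:
        # Stands in for work created by UI actions, GraphQL queries,
        # or schedule and sensor ticks.
        self.pending.append(request)

    def poll(self) -> list:
        # The agent initiates this call; nothing is pushed to the agent.
        work = list(self.pending)
        self.pending.clear()
        return work

    def report(self, result: dict) -> None:
        # Stands in for metadata flowing back to Dagster Cloud.
        self.results.append(result)


def handle_request(request: dict) -> dict:
    # Placeholder for launching or querying user code.
    return {"request_id": request["id"], "status": "done"}


def poll_once(api: FakeAgentApi) -> int:
    """Run one polling iteration and return the number of requests handled."""
    work = api.poll()
    for request in work:
        api.report(handle_request(request))
    return len(work)


api = FakeAgentApi()
api.enqueue({"id": 1, "kind": "launch_run"})
api.enqueue({"id": 2, "kind": "sensor_tick"})
handled = poll_once(api)
```

The point of the sketch is the direction of the calls: the agent always initiates both `poll` and `report`, so the hosted side never needs to open a connection into your environment.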

All user code runs within the customer’s environment, in isolation from Dagster system code.

[Figure: Dagster Cloud architecture diagram]

The Agent

Because the agent communicates with the Dagster Cloud control plane over a well-defined agent API, it’s possible to support agents that operate in arbitrary compute environments. This means that over time, Dagster Cloud’s support for different user deployment environments will expand, and that custom agents can take advantage of bespoke compute environments (such as HPC).

Currently, there are four agents:

  • The local agent, which launches user code in operating system subprocesses.
  • The Kubernetes agent, which launches user code in Kubernetes Jobs and Services.
  • The ECS agent, which launches user code in ECS Tasks.
  • The Docker agent, which launches user code in Docker containers.
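To make the simplest of these isolation models concrete, the following sketch runs a snippet of Python in a separate operating system process using only the standard library. It illustrates process-level isolation in general and is not taken from the local agent's source; the `launch_in_subprocess` helper is a hypothetical name.

```python
import subprocess
import sys


def launch_in_subprocess(code: str) -> subprocess.CompletedProcess:
    """Run a snippet of Python in a separate OS process and capture its output."""
    return subprocess.run(
        [sys.executable, "-c", code],  # same interpreter, new process
        capture_output=True,           # collect stdout and stderr
        text=True,                     # decode output as str
        check=False,                   # let the caller inspect returncode
    )


result = launch_in_subprocess("print('hello from user code')")
```

The container-based agents follow the same idea with a stronger boundary: the unit of isolation becomes a Docker container, an ECS Task, or a Kubernetes Job or Service rather than a child process.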

The Dagster team is actively developing other supported agents, targeting a range of managed container runtimes/PaaS platforms.

Security

When Dagster Cloud needs to interact with user code (for instance, to display the structure of a job in the Dagster Cloud user interface, to run the body of a sensor definition, or to launch a run for a job), it enqueues a message for the Dagster Cloud Agent. The Dagster Cloud Agent picks this message up, and then launches or queries user code running on the appropriate compute substrate.

Depending on the Agent implementation, user code may run in isolated OS processes, in Docker containers, in ECS Tasks, in Kubernetes Jobs and Services, or in a custom isolation strategy.

Queries to user code run over a well-defined gRPC interface. Dagster Cloud uses this interface:

  • To retrieve the names, config schemas, descriptions, tags, and structures of jobs, ops, repositories, partitions, schedules, and sensors defined in your code.
  • To evaluate schedule and sensor ticks and determine whether a run should be launched.
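As a rough picture of what "evaluating a tick" means, the sketch below models a sensor body as a plain function whose result says either which runs to request or why the tick was skipped. It deliberately avoids the Dagster library: `TickResult` and `evaluate_new_file_sensor` are hypothetical names invented for this example, and the real request and response types differ.

```python
from dataclasses import dataclass, field


@dataclass
class TickResult:
    """Hypothetical result of evaluating one sensor tick."""

    run_keys: list = field(default_factory=list)
    skip_reason: str = ""

    @property
    def should_launch(self) -> bool:
        # A run is launched only if the tick produced at least one run key.
        return bool(self.run_keys)


def evaluate_new_file_sensor(seen: set, current_files: list) -> TickResult:
    """User-defined sensor body: request one run per file not seen before."""
    new_files = sorted(f for f in current_files if f not in seen)
    if not new_files:
        return TickResult(skip_reason="no new files")
    return TickResult(run_keys=new_files)


first = evaluate_new_file_sensor(set(), ["b.csv", "a.csv"])
second = evaluate_new_file_sensor({"a.csv", "b.csv"}, ["a.csv", "b.csv"])
```

In this model the function body runs in your environment, and only the small result object needs to travel back to the hosted service.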

When the agent queries user code, it writes the response back to Dagster Cloud over a well-defined GraphQL interface.

Runs are launched by calling the dagster api CLI command (in a separate process/container as appropriate to the agent type). Run termination is handled by interrupting the user code process/container as appropriate for the compute substrate.
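For the subprocess case, termination by interruption can be sketched with the standard library alone. The child process here is a stand-in that simply sleeps; it is not a real run worker, and the sketch does not show how the agent itself tracks or terminates runs.

```python
import subprocess
import sys

# Start a long-running child process that stands in for a run worker.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Request termination (SIGTERM on POSIX, TerminateProcess on Windows).
proc.terminate()

# Wait for the child to exit so that it does not linger as a zombie.
exit_code = proc.wait(timeout=10)
```

For container-based agents, the equivalent step would be stopping the container, task, or job through the corresponding platform API.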

When runs are launched, the user code process/container streams structured metadata (containing everything that is viewable in the integrated logs viewer in the Dagster Cloud UI) back to Dagster Cloud over a well-defined GraphQL interface. Structured metadata is stored in AWS RDS, encrypted at rest.

At present, the run worker also uploads the compute logs (raw stdout and stderr from the runs) to Dagster Cloud. (Enabling compute log redirection to private customer storage, such as S3, is a roadmap feature.)

There is no ingress required from Dagster Cloud to user environments; all dataflow and network requests are unidirectional from user environments to Dagster Cloud.

Note: To ensure that user code remains completely isolated in the user environment, Dagster Cloud does not currently support previews of Dagstermill notebooks. Supporting these previews securely is a roadmap feature.