Deployment Overview
Helm Chart
Marquez uses Helm to manage deployments onto Kubernetes in a cloud environment. The chart and templates for the HTTP API server and Web UI are maintained in the Marquez repository and can be found in the chart directory. The chart's base values.yaml file includes an option to easily override deployment settings.
Note: The Marquez HTTP API server and Web UI images are published to Docker Hub.
Database
The Marquez HTTP API server relies only on PostgreSQL to store dataset, job, and run metadata, which keeps operational overhead minimal. We recommend a cloud-provided database, such as AWS RDS, when deploying Marquez onto Kubernetes.
Architecture
DOCKER
Figure 1: Minimal Marquez deployment via Docker.
KUBERNETES
Figure 2: Marquez deployment via Kubernetes.
COMPONENTS
Component | Image | Description |
---|---|---|
Marquez Web UI | marquezproject/marquez-web | The web UI used to view metadata. |
Marquez HTTP API | marquezproject/marquez | The core API used to collect metadata using OpenLineage. |
Database | bitnami/postgresql or cloud-provided | A PostgreSQL instance used to store metadata. |
Scheduler | User-provided | A scheduler used to run a workflow on a particular schedule (e.g., Airflow). |
Workflow | User-provided | A workflow using an OpenLineage integration to send lineage metadata to Marquez. |
Authentication
Our clients support authentication by automatically sending an API key on each request via Bearer Auth when configured on client instantiation. By default, the Marquez HTTP API does not require any form of authentication or authorization.
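As a rough illustration of what sending an API key via Bearer Auth looks like on the wire, the sketch below builds an HTTP request with an `Authorization: Bearer <key>` header using only the Python standard library. It is not the Marquez client itself. The helper name `build_request`, the base URL `http://localhost:5000`, the path `/api/v1/namespaces`, and the key value are assumptions chosen for the example, so substitute the values for your own deployment.

```python
import urllib.request
from typing import Optional


def build_request(base_url: str, path: str,
                  api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request, adding a Bearer token only when a key is given.

    Hypothetical helper for illustration; not part of any Marquez client.
    """
    headers = {"Accept": "application/json"}
    if api_key:
        # Bearer Auth: the API key travels in the Authorization header.
        headers["Authorization"] = f"Bearer {api_key}"
    url = f"{base_url.rstrip('/')}{path}"
    return urllib.request.Request(url, headers=headers)


# Assumed host, port, path, and key; adjust to match your deployment.
request = build_request("http://localhost:5000", "/api/v1/namespaces",
                        api_key="my-api-key")

# The request is only constructed here. To actually send it:
#   with urllib.request.urlopen(request) as response:
#       body = response.read()
```

Because the Marquez HTTP API does not require authentication by default, the header presumably only matters when the deployment has been set up to check it; omitting `api_key` produces a request without an `Authorization` header.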
Next Steps
The following guides will help you and your team effectively deploy and manage Marquez in a cloud environment: