Technology Market Monitoring Platform

A platform providing a complete overview of the technological ambitions and capabilities of companies and institutional actors worldwide. Built for a public sector client, it integrates patent data, job openings, news articles, commercial registry entries, and more into a cohesive graph of the technology economy.

Overview

Built for a public sector client, this platform provides a complete, structured overview of the technological ambitions and existing capabilities of companies and institutional actors across the world. The core challenge was pulling together heterogeneous, high-volume sources (patent data, job openings, news articles, commercial registry entries, and more) into a cohesive, queryable graph of the technology economy.

Architecture

Data ingestion is handled by Dagster pipelines, with each source modelled as a tracked asset with lineage, retry logic, and failure alerting. Raw source data lands in MinIO (S3-compatible blob storage), from where it feeds into a normalisation and entity-linking layer. dbt manages the transformation stages within a Postgres data warehouse, producing the clean, integrated records that form the basis of the graph. ArangoDB sits at the end of the pipeline for graph-based analysis, populated from the warehouse once the data is fully blended and resolved. The analyst-facing dashboarding and data export interface is built on ASP.NET Core and Angular. Deployment is automated via Ansible and GitLab CI/CD.

Source adapters (Python)
  → Dagster asset pipelines
  → MinIO (raw blob storage)
  → Postgres + dbt (warehouse transformation)
  → ArangoDB (graph analysis)
  → ASP.NET Core / Angular (dashboard & export)
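
As a rough illustration of what the normalisation and entity-linking layer has to do, the sketch below reduces organisation names from different sources to a shared matching key and groups records by it. It is a minimal sketch under stated assumptions, not the platform's actual logic: the suffix list, the function names normalise_name and link_records, and the sample records are all hypothetical, and real entity resolution would need far more than string cleanup.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical, deliberately short list of legal-form suffixes (assumption).
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "gmbh", "ag", "corp", "co", "sa", "bv"}


def normalise_name(name: str) -> str:
    """Reduce an organisation name to a crude matching key."""
    # Strip accents so "Énergie" and "Energie" compare equal.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    # Lowercase and drop full stops so "G.m.b.H." collapses to "gmbh".
    cleaned = ascii_name.lower().replace(".", "")
    # Treat any remaining punctuation as a token separator.
    tokens = re.sub(r"[^a-z0-9]+", " ", cleaned).split()
    # Drop trailing legal-form tokens, but never empty the name entirely.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def link_records(records):
    """Group (source, raw_name) pairs by their normalised key."""
    linked = defaultdict(list)
    for source, raw_name in records:
        linked[normalise_name(raw_name)].append((source, raw_name))
    return dict(linked)


# Invented sample records, one per source type named in the overview.
records = [
    ("patents", "ACME Robotics GmbH"),
    ("jobs", "Acme Robotics"),
    ("registry", "ACME ROBOTICS G.m.b.H."),
    ("news", "Nordlicht Énergie S.A."),
]
linked = link_records(records)
```

Here the three "Acme Robotics" spellings end up under one key, which is the property the downstream warehouse and graph stages rely on: one node per real-world actor, regardless of which source mentioned it.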

The platform initially used ArangoDB as the primary backing store, a natural choice given the graph-oriented nature of the end goal. Over time it became clear that the ArangoDB ecosystem had not matured to the level of relational data warehouses: operational tooling, query optimisation, and community support all lagged behind. It also became apparent that the graph analysis itself was a downstream step; the bulk of the work was blending and normalising data, which Postgres handles faster and with lower maintenance overhead. Restructuring the pipeline to use Postgres and dbt for the warehouse layer reduced the operational burden substantially, and ArangoDB was retained where it genuinely added value: traversal queries over the resolved graph.
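
The split between the two layers can be sketched in a few lines. This is only an illustration under loud assumptions: an in-memory SQLite database stands in for the Postgres warehouse, a plain Python breadth-first search stands in for an ArangoDB traversal, and the table names, column names, and sample rows are invented.

```python
import sqlite3
from collections import deque

# Relational layer (SQLite as a stand-in for Postgres; schema is hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE patents (org_key TEXT, technology TEXT);
    CREATE TABLE job_openings (org_key TEXT, technology TEXT);
    INSERT INTO patents VALUES
        ('acme robotics', 'lidar'),
        ('nordlicht energie', 'electrolysis');
    INSERT INTO job_openings VALUES
        ('acme robotics', 'computer vision'),
        ('orbit labs', 'lidar');
    """
)

# Blending step: a set-based UNION yields one deduplicated edge list.
edges = conn.execute(
    """
    SELECT org_key, technology FROM patents
    UNION
    SELECT org_key, technology FROM job_openings
    """
).fetchall()
conn.close()

# Graph layer: build an undirected adjacency map from the resolved edges.
graph = {}
for org, tech in edges:
    graph.setdefault(("org", org), set()).add(("tech", tech))
    graph.setdefault(("tech", tech), set()).add(("org", org))


def within_hops(start, max_hops):
    """Breadth-first traversal: map each reachable node to its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return seen


# Which other organisations share a technology with Acme Robotics?
reachable = within_hops(("org", "acme robotics"), 2)
related_orgs = sorted(
    name for kind, name in reachable if kind == "org" and name != "acme robotics"
)
```

The blending is plain set-based SQL, which is where a relational warehouse is strongest, while the "who is two hops away" question is the kind of query that is more natural to express as a graph traversal.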

What I Learned

  • Even when the end goal is graph-shaped, graph storage is not necessarily the right choice at every layer. Separating the data blending step from the graph analysis step opened up the option to use the best tool for each.
  • Adopting emerging technology carries hidden complexity costs that only surface under production load; the safer default is to reach for established tooling and reserve newer tech for problems it uniquely solves.
  • Dagster’s asset-based model made it straightforward to reason about data freshness and re-materialise specific pipeline segments without full reruns.
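
The idea behind re-materialising a pipeline segment, as mentioned in the last point, can be shown without any orchestration framework. The sketch below is not Dagster's API; it is a standard-library illustration in which the asset names and the UPSTREAM mapping are hypothetical. Given one changed asset, it selects that asset plus everything depending on it and orders the selection so that dependencies run first.

```python
from graphlib import TopologicalSorter

# Hypothetical asset graph: each asset maps to the assets it depends on.
UPSTREAM = {
    "raw_patents": set(),
    "raw_job_openings": set(),
    "stg_patents": {"raw_patents"},
    "stg_job_openings": {"raw_job_openings"},
    "resolved_entities": {"stg_patents", "stg_job_openings"},
    "graph_export": {"resolved_entities"},
}


def downstream_of(asset):
    """Return the asset plus everything that transitively depends on it."""
    selected = {asset}
    changed = True
    while changed:
        changed = False
        for name, deps in UPSTREAM.items():
            # An asset joins the selection once any of its inputs is selected.
            if name not in selected and deps & selected:
                selected.add(name)
                changed = True
    return selected


def rematerialisation_order(asset):
    """Order the affected segment so every asset runs after its inputs."""
    selected = downstream_of(asset)
    # Keep only dependencies inside the segment; the rest are left untouched.
    segment = {name: UPSTREAM[name] & selected for name in selected}
    return list(TopologicalSorter(segment).static_order())


order = rematerialisation_order("stg_patents")
```

In this example a change to the staged patent data re-runs only the staged patents, the resolved entities, and the graph export; the job-openings branch is left as it is.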