Spacedrive is an open source cross-platform file explorer, powered by a virtual distributed filesystem written in Rust.
arrow-schema
Scalable datastore for metrics, events, and real-time analytics
A high-performance observability data pipeline.
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
Incremental engine for long-horizon agents
Data Agent Ready Warehouse: one warehouse for analytics, search, AI, and Python sandboxing, rebuilt from scratch. Unified architecture on your S3.
Event streaming platform for agentic AI. Continuously ingest, transform, and serve event streams in real time, at scale.
Simple, Elastic-quality search for Postgres
Apache DataFusion SQL Query Engine
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.
The open-source Observability 2.0 database. One engine for metrics, logs, and traces — replacing Prometheus, Loki & ES.
High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
DORA (Dataflow-Oriented Robotic Architecture) is middleware designed to streamline and simplify the creation of AI-based robotic applications. It offers low-latency, composable, and distributed dataflow capabilities. Applications are modeled as directed graphs, also referred to as pipelines.
Official Rust implementation of Apache Arrow
A native Rust library for Delta Lake, with bindings into Python
An extensible, state-of-the-art framework for columnar compression, and the fastest FOSS columnar file format. Formerly at @spiraldb, now an Incubation Stage project at LF AI & Data, part of the Linux Foundation.
A portable accelerated SQL query, search, and LLM-inference engine, written in Rust, for data-grounded AI apps and agents.
Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.
Parseable is an observability datalake built from first principles.
The Feldera Incremental Computation Engine
The Auron accelerator for distributed computing frameworks (e.g., Spark) leverages native vectorized execution to accelerate query processing.
A cloud-native open source distributed time series database with high performance, high compression ratio and high availability.
Communication infrastructure for the AI era — one binary, one broker, one storage layer, any protocol
Tonbo is an embedded database for serverless and edge runtimes.
Apache Iceberg
Postgres Foreign Data Wrapper development framework in Rust.
AI-Native & Cloud-Native FS: A high-performance file semantic layer for cloud object storage, integrated with high-speed cache
Intuitive Data Workflows
Rust-based WebAssembly bindings to read and write Apache Parquet data
Scalable graph analytics database powered by a multithreaded, vectorized temporal engine, written in Rust
Local-first ETL/ELT studio: a drag-and-drop visual pipeline designer that compiles to SQL and runs on DuckDB. Tiny desktop app, no servers, git-friendly workspaces.
Analytical database for data-driven Web applications 🪶
A single-node analytical database engine with geospatial as a first-class citizen
GeoArrow in Rust, Python, and JavaScript (WebAssembly) with vectorized geometry operations
Protocol and libraries for sending and receiving OpenTelemetry data using Apache Arrow
Lakehouse native graph engine with git-style workflows
View parquet files online
Apache Paimon Rust: the Rust implementation of Apache Paimon.
A time-series database created for events, logs, traces, and metrics. Speaks the Postgres dialect and stores data in S3 via the Delta Lake protocol.
A command-line tool for querying databases
Run Graph Queries with Lance
DuckLake took Flight. Welcome to SwanLake.
Postgres protocol frontend for DataFusion
On-device property graph database. Schema-as-code. One CLI → One Folder. No Server. Think: DuckDB for graphs.
The power of Rust for the STAC ecosystem
Apache Arrow and Polars compatible, Rust-first columnar data library for real-time and systems workloads
Databricks's Zerobus Ingest SDKs
Manage Multimodal Agentic Context Lifecycle with Lance
High-performance, DSL-free stream processing