lakeFS (By Treeverse)

lakeFS brings Git-like capabilities to your data lake, enabling software engineering best practices for data.

lakeFS uses Git-like operations to help you manage data versioning at scale.

lakeFS: Git for Data

Git revolutionized software development by enabling essential engineering best practices such as collaboration, version control, and testing in isolation. lakeFS brings these same principles to data, giving data teams Git-like capabilities to manage their data lakes effectively.

Through an intuitive versioning engine, lakeFS introduces operations familiar from Git:

  • Branch: Create consistent, isolated copies of repositories without duplicating data.
  • Commit: Snapshot entire repositories immutably for full traceability.
  • Merge: Atomically integrate changes across branches.
  • Revert: Roll back to a prior state of a repository with ease.
  • Tag: Assign meaningful labels to immutable commits for quick reference.
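To make the semantics of these operations concrete, here is a minimal, self-contained Python sketch of a toy in-memory model. It is not the lakeFS API or its implementation: the class `ToyRepo` and its methods (`put`, `create_branch`, `commit`, `tag`, `merge`, `read`) are hypothetical. The sketch assumes content-addressed object storage, branches as pointers to commits (so creating a branch copies no data), commits as immutable path-to-object mappings, and a simplified three-way merge that fails on conflicting changes. Revert is omitted for brevity.

```python
import hashlib
from types import MappingProxyType
from typing import Dict, Mapping, Set, Tuple

# Hypothetical toy model for illustration only; this is NOT the lakeFS API.
Tree = Mapping[str, str]  # path -> content hash (object ID)


class ToyRepo:
    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}  # content-addressed: each distinct blob stored once
        self.commits: Dict[str, Tuple[Tuple[str, ...], Tree]] = {}  # ID -> (parents, tree)
        self.branches: Dict[str, str] = {}  # branch name -> head commit ID
        self.tags: Dict[str, str] = {}  # tag name -> commit ID
        self.staged: Dict[str, Dict[str, str]] = {}  # branch -> uncommitted writes
        self.branches["main"] = self._store_commit((), {}, "initial commit")
        self.staged["main"] = {}

    def _store_commit(self, parents: Tuple[str, ...], tree: Dict[str, str], message: str) -> str:
        # The ID hashes parents, tree, and message; the stored tree is a read-only view.
        payload = repr((parents, sorted(tree.items()), message)).encode()
        cid = hashlib.sha256(payload).hexdigest()[:12]
        self.commits[cid] = (parents, MappingProxyType(dict(tree)))
        return cid

    def put(self, branch: str, path: str, data: bytes) -> None:
        oid = hashlib.sha256(data).hexdigest()
        self.objects[oid] = data
        self.staged[branch][path] = oid

    def create_branch(self, name: str, source: str = "main") -> None:
        # Zero-copy: the new branch is just a pointer to the source's head commit.
        self.branches[name] = self.branches[source]
        self.staged[name] = {}

    def commit(self, branch: str, message: str) -> str:
        head = self.branches[branch]
        tree = {**self.commits[head][1], **self.staged[branch]}
        cid = self._store_commit((head,), tree, message)
        self.branches[branch], self.staged[branch] = cid, {}
        return cid

    def tag(self, name: str, commit_id: str) -> None:
        self.tags[name] = commit_id  # a human-readable label for an immutable commit

    def read(self, ref: str, path: str) -> bytes:
        # Resolve a branch, tag, or raw commit ID to its committed snapshot.
        cid = self.branches.get(ref) or self.tags.get(ref) or ref
        return self.objects[self.commits[cid][1][path]]

    def _ancestors(self, cid: str) -> Set[str]:
        seen: Set[str] = set()
        stack = [cid]
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(self.commits[c][0])
        return seen

    def merge(self, source: str, dest: str) -> str:
        src, dst = self.branches[source], self.branches[dest]
        common = self._ancestors(src) & self._ancestors(dst)
        # Merge base: a common ancestor that is not an ancestor of another common ancestor.
        base = next(c for c in common
                    if not any(c != o and c in self._ancestors(o) for o in common))
        b_tree, ours, theirs = (self.commits[c][1] for c in (base, dst, src))
        merged: Dict[str, str] = {}
        for path in set(b_tree) | set(ours) | set(theirs):
            b, o, t = b_tree.get(path), ours.get(path), theirs.get(path)
            if o == t or t == b:
                val = o  # identical on both sides, or unchanged by source
            elif o == b:
                val = t  # only source changed it
            else:
                raise ValueError(f"merge conflict on {path}")
            if val is not None:  # None: path absent from the result (e.g. deleted)
                merged[path] = val
        # Atomic: the dest pointer moves only after the whole merge has succeeded.
        cid = self._store_commit((dst, src), merged, f"merge {source} into {dest}")
        self.branches[dest] = cid
        return cid


repo = ToyRepo()
repo.put("main", "raw/events.csv", b"v1")
repo.tag("v1", repo.commit("main", "ingest v1"))
repo.create_branch("dev", "main")  # no objects copied
repo.put("dev", "raw/events.csv", b"v2")
repo.commit("dev", "reprocess events")
repo.merge("dev", "main")
print(repo.read("main", "raw/events.csv"))  # b'v2'
print(repo.read("v1", "raw/events.csv"))    # b'v1'
```

Because a branch here is only a pointer, each distinct object is stored once: the two versions of `raw/events.csv` are the only two entries in `objects`, no matter how many branches or tags reference them. The tag `v1` keeps resolving to the original snapshot even after `main` moves forward.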

Beyond data management, lakeFS enhances AI workflows by addressing several critical challenges:

  • Solving Inefficiencies in AI Model Development: Traditional workflows can spend up to 80% of their time on data plumbing. lakeFS synchronizes data, code, and models in a centralized system, reducing the time spent on data access and preparation.
  • Lowering Costs and Accelerating Training: lakeFS ensures fast, consistent data loading across environments, from local development to cloud GPUs, cutting costs and speeding up model training.
  • Achieving Full ML Reproducibility: By linking each model to the exact data used to train it, lakeFS provides complete traceability, so data scientists can debug and perform root cause analysis with clarity.

By combining software engineering rigor with AI-first capabilities, lakeFS helps teams manage data lakes at scale while delivering AI-driven insights faster and more efficiently.

Moving to a data branching solution has paid off quickly for us. A few days after completing the migration, we’ve already reduced testing time by 80% on two different projects.

Ryan Green, CTO, Enigma Technologies

Benefits of working with lakeFS (By Treeverse)

Solve AI inefficiencies

Centralize data, code, and models to streamline AI workflows and reduce plumbing efforts.

Reduce model training costs

Load data consistently across environments, cutting the cost and time of model training.

Debug faster with reproducibility

Achieve ML reproducibility by linking models to the exact data used during training.

Accelerate data operations

Manage data like code with intuitive Git-like operations such as branching and committing.



About Red Hat Ecosystem Catalog

The Red Hat Ecosystem Catalog is the official source for discovering and learning more about the Red Hat Ecosystem of both Red Hat and certified third-party products and services.

We’re the world’s leading provider of enterprise open source solutions—including Linux, cloud, container, and Kubernetes. We deliver hardened solutions that make it easier for enterprises to work across platforms and environments, from the core datacenter to the network edge.