@OpenDCAI

OpenDCAI

Define the future of Data-centric AI together

👋 Welcome

✨ We are dedicated to advancing research and open-source tools in Data-Centric Artificial Intelligence (DCAI). ✨

🚀 Our goal is to develop effective and efficient DCAI systems and algorithms that support and enhance the performance of AI models and applications.

Pinned

  1. DataFlow Public

    Easy data preparation with the latest LLM-based operators and pipelines.

    Python · 2.9k stars · 181 forks

  2. MyScaleDB Public

    Forked from OriginHubAI/MyScaleDB

    AI Database for unified, scalable SQL + vector data management, search and analytics

    C++ · 40 stars · 1 fork

  3. DataFlex Public

    DataFlex is a data-centric training framework that enhances model performance by selecting the most influential samples, optimizing their weights, or adjusting their mixing ratios.

    Python · 116 stars · 10 forks

  4. Paper2Any Public

    Turn paper/text/topic into editable research figures, technical route diagrams, and presentation slides.

    Python · 1.4k stars · 74 forks

Repositories

Showing 10 of 26 repositories
  • leonai Public
    Python · 16 stars · MIT · 1 fork · 0 issues · 0 pull requests · Updated Feb 7, 2026
  • DataFlow Public

    Easy data preparation with the latest LLM-based operators and pipelines.

    Python · 2,866 stars · Apache-2.0 · 181 forks · 11 issues · 4 pull requests · Updated Feb 7, 2026
  • DataFlow-WebUI
    Python · 15 stars · 11 forks · 0 issues · 0 pull requests · Updated Feb 6, 2026
  • DataFlow-Doc Public

    Documentation for DataFlow, a data-centric AI system for LLMs.

    Python · 11 stars · 28 forks · 4 issues · 0 pull requests · Updated Feb 6, 2026
  • Flash-MinerU Public

    Ray-based accelerator for the MinerU VLM inference pipeline. Lightweight, minimally invasive PDF → Markdown processing for multi-GPU and domestic (Chinese-made) compute environments.

    Python · 17 stars · AGPL-3.0 · 3 forks · 0 issues · 1 pull request · Updated Feb 5, 2026
  • Paper2Any Public

    Turn paper/text/topic into editable research figures, technical route diagrams, and presentation slides.

    Python · 1,369 stars · Apache-2.0 · 74 forks · 5 issues · 0 pull requests · Updated Feb 5, 2026
  • AgentFlow-Doc
    0 stars · 0 forks · 0 issues · 1 pull request · Updated Feb 4, 2026
  • DataFlow-Agent Public

    Agent for DataFlow: Automatic Data Workflow Design

    Python · 49 stars · Apache-2.0 · 8 forks · 1 issue · 1 pull request · Updated Feb 5, 2026
  • DataFlex Public

    DataFlex is a data-centric training framework that enhances model performance by selecting the most influential samples, optimizing their weights, or adjusting their mixing ratios.

    Python · 116 stars · 10 forks · 0 issues · 0 pull requests · Updated Feb 4, 2026
  • DataFlow-MM Public

    DataFlow-MM provides multimedia operators for DataFlow, aimed at preparing data for multimodal large language models.

    Python · 28 stars · Apache-2.0 · 16 forks · 2 issues · 2 pull requests · Updated Feb 3, 2026
