Data Prep Kit

The Data Prep Kit (DPK) is an open-source toolkit developed by IBM Research to streamline the preparation of unstructured data for Large Language Model (LLM) applications. It addresses the challenges of processing diverse data types, such as text and code, for use cases like fine-tuning, instruction tuning, and retrieval-augmented generation (RAG).

Web site

GitHub repository

Tech tags:

In production with: