DocSynth Tools
Last Update: 05/02/2025Overview
DocSynth Tools is a utility suite designed to process, convert, and manage QA pair datasets generated by either the DocQA Generator or SynthQA Generator. These tools streamline dataset preparation for AI model training across various platforms.

Available Tools
Doc Format Converter
Convert QA datasets between supported output formats to ensure compatibility with different training pipelines.
ViewDoc Merge
Combine multiple QA files into a single, consolidated JSONL file for streamlined processing and analysis.
ViewDoc Validation
Split a QA dataset into training and validation sets, a critical step for machine learning model development.
ViewDoc Splitter
Divide a large QA dataset into smaller, manageable files for distributed processing or modular testing.
ViewProcess Flow
1. Generate
Create QA pairs with DocQA or SynthQA
2. Process
Manage and transform with DocSynth Tools
3. Train
Use datasets for AI model training