Doc Format Converter

Last Update: 05/02/2025

Overview

Convert QA datasets between supported output formats to ensure compatibility with different training pipelines.

Supported Formats

DocSynth Single-Turn Format

DocSynth Single-Turn Format

{
  "conversations": [
    {"from": "human", "value": "Q"},
    {"from": "assistant", "value": "A"}
  ]
}

QA Format

{
  "question": "Q",
  "answer": "A"
}

OpenAI Format

{
  "messages": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "Q"},
    {"role": "assistant", "content": "A"}
  ]
}

Use Case

Adapt datasets from one structure to another to meet the input requirements of various machine learning models. For example, converting from DocSynth Single-Turn Format to OpenAI Format makes your dataset compatible with OpenAI fine-tuning APIs.

Functionality Overview

  1. Upload a QA dataset file in any supported format.
  2. Choose a target output format.
  3. The tool automatically restructures the file to match the target format's schema.
    • Example: Converting from QA Format to OpenAI Format includes adding a system role and wrapping QA pairs in message objects.

Example Conversion

Input (DocSynth Single-Turn Format)

{
  "conversations": [
    {
      "from": "human",
      "value": "What is AI?"
    },
    {
      "from": "assistant",
      "value": "AI is the simulation of human intelligence in machines."
    }
  ]
}

Converted Output (QA Format)

{
  "question": "What is AI?",
  "answer": "AI is the simulation of human intelligence in machines."
}

Output

A downloadable file in the selected target format, preserving the semantic content of the original QA pairs while adapting the structure.