WebQA Generator

Last Update: 05/02/2025

Overview

The WebQA Generator enables users to extract content directly from web pages and generate high-quality question-answer (QA) pairs. This tool is ideal for building datasets from online articles, documentation, blogs, or any publicly accessible web-based resource.

Configuration Options

URL Configuration

Enter one or more URLs to extract content from. All listed pages are processed together in a single session.

Example:

https://example.com
https://docs.example.org/guide
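To illustrate what the URL field needs to accept, the Python sketch below turns pasted text into a clean URL list. It assumes one URL per line and treats only http(s) addresses as valid. The function name `parse_url_list` and the de-duplication step are hypothetical and are not taken from the WebQA Generator itself.

```python
from urllib.parse import urlparse


def parse_url_list(raw: str) -> list[str]:
    """Split pasted input into a clean list of http(s) URLs.

    Hypothetical helper: assumes one URL per line, skips blank lines,
    and drops duplicates while keeping the original order.
    """
    urls: list[str] = []
    seen: set[str] = set()
    for line in raw.splitlines():
        candidate = line.strip()
        if not candidate:
            continue  # ignore blank lines
        parsed = urlparse(candidate)
        # Require a web scheme and a host, e.g. "https://example.com".
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Not a valid web URL: {candidate!r}")
        if candidate not in seen:
            seen.add(candidate)
            urls.append(candidate)
    return urls


raw_input = """https://example.com
https://docs.example.org/guide
"""
print(parse_url_list(raw_input))
# ['https://example.com', 'https://docs.example.org/guide']
```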

Chunk Size

Default: 1900 characters

Sets the size, in characters, of the segments that the extracted web content is split into before QA pairs are generated.

  • Smaller chunks → More specific QA pairs
  • Larger chunks → Broader contextual questions
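A minimal sketch of fixed-width character chunking with the documented 1900-character default follows. `chunk_text` is a hypothetical helper; the actual tool may also respect sentence or paragraph boundaries, which this sketch does not attempt.

```python
def chunk_text(text: str, chunk_size: int = 1900) -> list[str]:
    """Split text into consecutive segments of at most chunk_size characters.

    Plain fixed-width split; no attempt is made to keep sentences whole.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


page_text = "x" * 5000
chunks = chunk_text(page_text)
print([len(c) for c in chunks])  # [1900, 1900, 1200]
```

With fixed-width splitting, only the last chunk can be shorter than `chunk_size`.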

Questions per Chunk

Default: 3

Specifies how many QA pairs to generate per chunk of text.
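Chunk size and questions per chunk multiply: a page yields roughly one chunk per `chunk_size` characters, and each chunk yields `questions_per_chunk` pairs. The sketch below works through that arithmetic. It assumes fixed-width chunking and that the generator always returns exactly the requested number of pairs, which real model output may not guarantee. `expected_pair_count` is a hypothetical name.

```python
import math


def expected_pair_count(text_length: int, chunk_size: int = 1900,
                        questions_per_chunk: int = 3) -> int:
    """Estimate how many QA pairs one page yields under fixed-width chunking."""
    chunk_count = math.ceil(text_length / chunk_size)
    return chunk_count * questions_per_chunk


# A 5,000-character page: ceil(5000 / 1900) = 3 chunks, 3 * 3 = 9 pairs
print(expected_pair_count(5000))  # 9
```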

Auto-Download Dataset

Automatically downloads the generated QA pairs as a file when enabled.
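The exact file layout of the download is not specified here. As a minimal sketch, assuming the dataset is written as a single UTF-8 JSON array, saving the generated records could look like the following. `save_dataset` and the file name `webqa_dataset.json` are hypothetical.

```python
import json
from pathlib import Path


def save_dataset(records: list[dict], path: str) -> Path:
    """Write QA records to a UTF-8 JSON file and return its path."""
    out = Path(path)
    # ensure_ascii=False keeps non-English text readable in the file.
    out.write_text(json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")
    return out


records = [{"question": "What is a chunk?", "answer": "A segment of page text."}]
print(save_dataset(records, "webqa_dataset.json"))
```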

Auto-Save to Database

Saves the generated QA pairs directly to a connected database when enabled.
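The type of connected database and its schema are not documented here. The sketch below uses Python's built-in `sqlite3` module purely for illustration; the `qa_pairs` table, its columns, and `save_to_database` are all hypothetical.

```python
import sqlite3


def save_to_database(conn: sqlite3.Connection, source_url: str,
                     pairs: list[tuple[str, str]]) -> int:
    """Insert QA pairs into a hypothetical qa_pairs table; return rows added."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """CREATE TABLE IF NOT EXISTS qa_pairs (
                   id INTEGER PRIMARY KEY,
                   source_url TEXT NOT NULL,
                   question TEXT NOT NULL,
                   answer TEXT NOT NULL
               )"""
        )
        conn.executemany(
            "INSERT INTO qa_pairs (source_url, question, answer) VALUES (?, ?, ?)",
            [(source_url, q, a) for q, a in pairs],
        )
    return len(pairs)


conn = sqlite3.connect(":memory:")  # in-memory database for the example
save_to_database(conn, "https://example.com", [("Q", "A")])
print(conn.execute("SELECT COUNT(*) FROM qa_pairs").fetchone()[0])  # 1
```

Parameterized `?` placeholders keep page-derived text from being interpreted as SQL.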

Output Formats

Choose from the following supported formats:

DocSynth Single-Turn Format

Each QA pair is stored as a single-turn conversation: one human message (the question) followed by one assistant message (the answer).

Format: json

{
  "conversations": [
    {"from": "human", "value": "Q"},
    {"from": "assistant", "value": "A"}
  ]
}
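A minimal Python sketch of building one DocSynth single-turn record from a question and an answer, following the structure above. The helper name `to_docsynth` is hypothetical.

```python
import json


def to_docsynth(question: str, answer: str) -> dict:
    """Wrap one QA pair as a single-turn DocSynth conversation."""
    return {
        "conversations": [
            {"from": "human", "value": question},
            {"from": "assistant", "value": answer},
        ]
    }


print(json.dumps(to_docsynth("Q", "A")))
# {"conversations": [{"from": "human", "value": "Q"}, {"from": "assistant", "value": "A"}]}
```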

QA Format

Minimal format: each QA pair is a standalone object with only a question field and an answer field.

Format: json

{
  "question": "Q",
  "answer": "A"
}
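Because the DocSynth single-turn format and the QA format carry the same question and answer, converting between them is a direct field mapping. The sketch below assumes each DocSynth record contains exactly one `human` turn and one `assistant` turn; `docsynth_to_qa` is a hypothetical helper.

```python
def docsynth_to_qa(record: dict) -> dict:
    """Flatten a single-turn DocSynth record into the minimal QA format."""
    # Index turns by speaker so the order of the list does not matter.
    turns = {turn["from"]: turn["value"] for turn in record["conversations"]}
    return {"question": turns["human"], "answer": turns["assistant"]}


docsynth_record = {
    "conversations": [
        {"from": "human", "value": "Q"},
        {"from": "assistant", "value": "A"},
    ]
}
print(docsynth_to_qa(docsynth_record))  # {'question': 'Q', 'answer': 'A'}
```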

OpenAI Format

Compatible with OpenAI's chat fine-tuning format. Each record includes a system message alongside the user question and the assistant answer.

Format: json

{
  "messages": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "Q"},
    {"role": "assistant", "content": "A"}
  ]
}
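The sketch below builds OpenAI-format records with a configurable system message and serializes them as JSON Lines (one JSON object per line), which is the file layout OpenAI's fine-tuning uploads expect. The helper names `to_openai` and `to_jsonl` are hypothetical, and whether the WebQA Generator itself writes JSON Lines or a JSON array is not stated here.

```python
import json

SYSTEM_MESSAGE = (
    "Generate QA pairs that summarize key technical concepts "
    "in a concise and professional manner."
)


def to_openai(question: str, answer: str, system_message: str) -> dict:
    """Build one OpenAI-style chat record: system, then user, then assistant."""
    return {
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines: one compact JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"


records = [
    to_openai("What does chunk size control?", "The segment length in characters.", SYSTEM_MESSAGE),
    to_openai("What is the default chunk size?", "1900 characters.", SYSTEM_MESSAGE),
]
print(to_jsonl(records), end="")
```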

OpenAI Format – System Message Configuration

Define the system role's content in the OpenAI format to control the tone, behavior, or focus of generated QA pairs. This message provides initial context for the assistant.

Example:

{
  "messages": [
    {
      "role": "system",
      "content": "Generate QA pairs that summarize key technical concepts in a concise and professional manner."
    },
    {
      "role": "user",
      "content": "Q"
    },
    {
      "role": "assistant",
      "content": "A"
    }
  ]
}

Tailor the system message to the goals of your OpenAI fine-tuning pipeline.
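Before uploading a dataset, it can help to confirm that every record keeps the system, user, assistant order shown above and that no message, including the configured system message, is empty. A minimal sketch of such a check follows; `check_openai_record` is hypothetical and covers only these two rules, not OpenAI's full validation.

```python
EXPECTED_ROLES = ["system", "user", "assistant"]


def check_openai_record(record: dict) -> list[str]:
    """Return the problems found in one OpenAI-format record (empty list if none)."""
    problems: list[str] = []
    messages = record.get("messages", [])
    roles = [message.get("role") for message in messages]
    if roles != EXPECTED_ROLES:
        problems.append(f"expected roles {EXPECTED_ROLES}, got {roles}")
    for message in messages:
        # Treat missing or whitespace-only content as empty.
        if not str(message.get("content", "")).strip():
            problems.append(f"empty content for role {message.get('role')!r}")
    return problems


record = {
    "messages": [
        {"role": "system", "content": "Generate QA pairs that summarize key technical concepts."},
        {"role": "user", "content": "Q"},
        {"role": "assistant", "content": "A"},
    ]
}
print(check_openai_record(record))  # []
```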

Actions

Generate Q&A Pairs

Starts processing the listed URLs using the configured chunking, output, and generation settings.
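The flow this action triggers (split each page into chunks, generate pairs per chunk, emit records in the chosen format) can be sketched as follows. Fetching and HTML extraction are out of scope, so pages are passed in as already-extracted text, and `fake_generate` is a hypothetical stand-in for the model call. `run_generation` is likewise a hypothetical name, and the output uses the QA format.

```python
from typing import Callable

QAPair = tuple[str, str]


def run_generation(pages: dict[str, str],
                   generate: Callable[[str, int], list[QAPair]],
                   chunk_size: int = 1900,
                   questions_per_chunk: int = 3) -> list[dict]:
    """Chunk each page, ask `generate` for QA pairs, and emit QA-format records."""
    records = []
    for url, text in pages.items():
        # Fixed-width chunking, as in a plain character split.
        for start in range(0, len(text), chunk_size):
            chunk = text[start:start + chunk_size]
            for question, answer in generate(chunk, questions_per_chunk):
                records.append({"question": question, "answer": answer})
    return records


def fake_generate(chunk: str, n: int) -> list[QAPair]:
    # Hypothetical stand-in for the model call; returns placeholder pairs.
    return [(f"Question {i + 1} about {chunk[:10]!r}", "Answer") for i in range(n)]


pages = {"https://example.com": "y" * 4000}
print(len(run_generation(pages, fake_generate)))  # 9
```

Passing the generator in as a function keeps the chunking and formatting logic testable without a model.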

Start Over

Resets all fields and settings to their default state, clearing the URL list for a fresh session.
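A minimal sketch of the reset behavior, modeling the form state as a Python dataclass whose defaults mirror the documented values (1900-character chunks, 3 questions per chunk, empty URL list). `GeneratorSettings` and `start_over` are hypothetical names, and the remaining settings are omitted because their defaults are not documented here.

```python
from dataclasses import dataclass, field


@dataclass
class GeneratorSettings:
    """Hypothetical model of the form state; defaults mirror the documented ones."""
    # default_factory gives every instance its own empty list.
    urls: list[str] = field(default_factory=list)
    chunk_size: int = 1900
    questions_per_chunk: int = 3


def start_over() -> GeneratorSettings:
    """Discard the current session and return a fresh settings object."""
    return GeneratorSettings()


settings = GeneratorSettings(urls=["https://example.com"], chunk_size=800)
settings = start_over()
print(settings)
# GeneratorSettings(urls=[], chunk_size=1900, questions_per_chunk=3)
```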