
Guides

Step-by-step tutorials for common TDMP tasks and advanced workflows.

Getting Started

Quick Start: Your First Dataset

Time required: 10 minutes

This guide walks you through creating your first project and generating a dataset.

Steps:

  1. Create a Project

    • Go to Projects → New Project
    • Name it "My First Project"
    • Set visibility to Private
    • Click Create
  2. Upload a Schema

    • Navigate to your project
    • Go to Schemas tab
    • Click "Upload Schema"
    • Select an XSD or JSON Schema file
    • Wait for validation
  3. Create Constraints

    • Open your schema
    • Click "New Constraint Set"
    • Define rules for key fields
    • Save the constraint set
  4. Generate Data

    • Click "Generate Dataset"
    • Select your constraint set
    • Choose record count (start with 100)
    • Click Generate
    • Monitor progress in Datasets tab
  5. Download Your Data

    • Once complete, click the dataset
    • Choose your export format
    • Click Download
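
For repeatable setups, the same Quick Start flow can be driven through the TDMP API using the /users/token and /datasets/create endpoints shown in the CI/CD guide below. This Python sketch is illustrative only: the base URL, credentials, and IDs are placeholders, and the helper simply assembles the request body used in those examples.

```python
import json
import urllib.parse
import urllib.request

TDMP_URL = "https://tdmp.example.com"  # placeholder base URL


def build_dataset_request(project_id, schema_id, constraint_set_id,
                          name, record_count=100):
    """Assemble the JSON body for /datasets/create (field names taken
    from the CI/CD examples later in this guide)."""
    return {
        "project_id": project_id,
        "schema_id": schema_id,
        "constraint_set_id": constraint_set_id,
        "name": name,
        "record_count": record_count,
    }


def generate_dataset(username, password, payload):
    """Fetch a bearer token, then request dataset generation."""
    creds = urllib.parse.urlencode(
        {"username": username, "password": password}).encode()
    with urllib.request.urlopen(f"{TDMP_URL}/users/token", creds) as resp:
        token = json.load(resp)["access_token"]
    req = urllib.request.Request(
        f"{TDMP_URL}/datasets/create",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    body = build_dataset_request(123, 456, 789, "My First Dataset")
    print(generate_dataset("me@example.com", "secret", body))
```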

Setting Up Team Collaboration

Prerequisites: Admin or Project Owner role

Learn how to share projects and manage team access.

Steps:

  1. Invite Team Members

    • Go to Project Settings
    • Click "Members" tab
    • Enter the user's email or username
    • Assign permission level
    • Click Add
  2. Configure Permissions

    • Admin: Full project control
    • Edit: Create/modify datasets
    • View: Read-only access
  3. Share Resources

    • Save schemas to Library for team reuse
    • Export constraint sets
    • Share dataset links
  4. Track Activity

    • View audit logs (Admin only)
    • Monitor who generated what
    • Review changes history
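
The three permission levels above map naturally onto a small capability check. This sketch is purely illustrative of the model, not TDMP's actual authorization code.

```python
# Illustrative capability model for the three TDMP permission levels;
# not the platform's actual implementation.
PERMISSIONS = {
    "Admin": {"manage_members", "view_audit_logs", "edit_datasets", "view"},
    "Edit": {"edit_datasets", "view"},
    "View": {"view"},
}


def can(role: str, action: str) -> bool:
    """Return True if the given role is allowed to perform the action."""
    return action in PERMISSIONS.get(role, set())
```

For example, audit logs are Admin-only, so `can("Edit", "view_audit_logs")` is False while `can("Edit", "edit_datasets")` is True.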

Advanced Guides

Building Complex Constraint Sets

Master constraint configuration for realistic data generation.

Field-Level Constraints:

{
  "field_name": "email",
  "type": "string",
  "constraints": {
    "format": "email",
    "unique": true,
    "required": true,
    "pattern": "^[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}$"
  }
}

Common Patterns:

  • Email addresses: Use format: email or a custom regex pattern
  • Phone numbers: pattern: "^\\+?[1-9]\\d{1,14}$" (E.164 format)
  • Dates: minDate, maxDate, format: "YYYY-MM-DD"
  • Numbers: min, max, step, precision
  • Enums: enum: ["value1", "value2", "value3"]
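
The regex patterns above can be sanity-checked locally before uploading a constraint set. This sketch reuses the email and E.164 phone patterns exactly as given; the sample values are made up.

```python
import re

# Patterns copied from the constraint examples above.
EMAIL = re.compile(r"^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$")
PHONE = re.compile(r"^\+?[1-9]\d{1,14}$")  # E.164 format


def check(pattern: re.Pattern, value: str) -> bool:
    """Return True if the whole value matches the pattern."""
    return bool(pattern.fullmatch(value))
```

Note the email pattern is lowercase-only, so "Jane@example.com" is rejected while "jane.doe@example.com" passes; similarly "+14155552671" is valid E.164 but "0123" is not (no leading zero).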

Cross-Field Relationships:

{
  "relationships": [
    {
      "if": "country = 'USA'",
      "then": "state IN ['CA', 'NY', 'TX', ...]"
    }
  ]
}
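
A generator honors a rule like this by narrowing the dependent field's candidate values whenever the condition holds. A minimal sketch of the rule above (the original state list is truncated, so only the three listed values are used, and the non-US regions are invented for illustration):

```python
import random

# From the cross-field example above; the original list is truncated,
# so only the three states it shows are included here.
US_STATES = ["CA", "NY", "TX"]
ALL_REGIONS = US_STATES + ["ON", "QC"]  # hypothetical non-US regions


def pick_state(country: str, rng: random.Random) -> str:
    """Respect the rule: if country = 'USA' then state IN US_STATES."""
    pool = US_STATES if country == "USA" else ALL_REGIONS
    return rng.choice(pool)
```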

Integrating with CI/CD

Automate test data generation in your pipelines.

GitHub Actions Example:

name: Generate Test Data
on:
  push:
    branches: [main, develop]

jobs:
  generate-data:
    runs-on: ubuntu-latest
    env:
      TDMP_URL: ${{ secrets.TDMP_URL }}
    steps:
      - name: Generate Dataset
        run: |
          TOKEN=$(curl -X POST $TDMP_URL/users/token \
            -d "username=${{ secrets.TDMP_USER }}&password=${{ secrets.TDMP_PASS }}" \
            | jq -r '.access_token')

          curl -X POST $TDMP_URL/datasets/create \
            -H "Authorization: Bearer $TOKEN" \
            -H "Content-Type: application/json" \
            -d '{"project_id": 123, "schema_id": 456, "constraint_set_id": 789, "name": "CI Test Data", "record_count": 1000}'

Jenkins Pipeline:

pipeline {
    agent any
    stages {
        stage('Generate Test Data') {
            steps {
                sh '''
                    token=$(curl -s -X POST ${TDMP_URL}/users/token \
                        -d "username=${TDMP_USER}&password=${TDMP_PASS}" \
                        | jq -r .access_token)

                    curl -X POST ${TDMP_URL}/datasets/create \
                        -H "Authorization: Bearer $token" \
                        -H "Content-Type: application/json" \
                        -d @dataset-config.json
                '''
            }
        }
    }
}

Optimizing Large Dataset Generation

Best practices for generating 10,000+ records efficiently.

Strategies:

  1. Batch Generation

    • Split into smaller chunks (10k records each)
    • Generate in parallel if infrastructure allows
    • Combine the results in a post-processing step
  2. Simplify Constraints

    • Use built-in formats instead of complex regex
    • Reduce cross-field dependencies
    • Cache reference data
  3. Use Appropriate Formats

    • CSV for simple tabular data
    • Parquet for large analytical datasets
    • JSON for nested/complex structures
  4. Schedule Off-Peak

    • Generate during low-usage hours
    • Set up automated nightly jobs
    • Use queue priorities
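
The batching strategy above can be sketched as a simple chunk planner that splits one large job into fixed-size batches, each small enough to generate (and retry) independently:

```python
def plan_batches(total_records: int, chunk_size: int = 10_000) -> list[int]:
    """Split a large generation job into chunk_size-record batches,
    as suggested in strategy 1 above."""
    if total_records < 0 or chunk_size <= 0:
        raise ValueError("record count and chunk size must be positive")
    batches = []
    remaining = total_records
    while remaining > 0:
        batches.append(min(chunk_size, remaining))
        remaining -= batches[-1]
    return batches
```

For example, a 25,000-record job becomes three batches of 10,000, 10,000, and 5,000 records that can run in parallel if the infrastructure allows.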

Schema Best Practices

Design schemas for optimal data generation.

Do's:

  • ✅ Use clear, descriptive field names
  • ✅ Define appropriate data types
  • ✅ Include min/max length constraints
  • ✅ Document field purposes
  • ✅ Keep schemas modular and reusable

Don'ts:

  • ❌ Overly complex nested structures
  • ❌ Circular references
  • ❌ Ambiguous field names
  • ❌ Missing required field markers
  • ❌ Unrealistic constraint combinations

Integration Guides

AWS S3 Export Setup

Configure direct export to Amazon S3.

Steps:

  1. Create IAM User

    aws iam create-user --user-name tdmp-export
  2. Attach S3 Policy

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:PutObject", "s3:PutObjectAcl"],
          "Resource": "arn:aws:s3:::your-bucket/*"
        }
      ]
    }
  3. Configure in TDMP

    • Go to Integrations
    • Select AWS S3
    • Enter Access Key ID and Secret
    • Specify bucket name and region
    • Test connection
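
Before wiring the credentials into TDMP, you can sanity-check that the policy document grants the actions the export needs. This simplified check only scans Allow statements for exact action names; real IAM evaluation (wildcards, Deny statements, conditions) is more involved.

```python
import json

# The policy document from step 2 above.
POLICY = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:PutObjectAcl"],
      "Resource": "arn:aws:s3:::your-bucket/*"
    }
  ]
}
""")


def allows(policy: dict, action: str) -> bool:
    """True if any Allow statement lists the action exactly.
    Simplified: no wildcard, Deny, or condition handling."""
    return any(
        stmt.get("Effect") == "Allow" and action in stmt.get("Action", [])
        for stmt in policy.get("Statement", [])
    )
```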

MinIO Self-Hosted Storage

Set up MinIO for on-premise S3-compatible storage.

Docker Setup:

docker run -d \
  -p 9000:9000 \
  -p 9001:9001 \
  -e MINIO_ROOT_USER=admin \
  -e MINIO_ROOT_PASSWORD=password123 \
  -v /mnt/data:/data \
  minio/minio server /data --console-address ":9001"

TDMP Configuration:

  • Endpoint: http://localhost:9000
  • Access Key: Your MinIO access key
  • Secret Key: Your MinIO secret key
  • Bucket: Create via MinIO console

Troubleshooting Guides

Debugging Failed Generations

When datasets fail to generate:

  1. Check Task Logs

    • Navigate to Dataset details
    • View error messages
    • Identify failing constraints
  2. Validate Schema

    • Ensure schema is well-formed
    • Check for syntax errors
    • Verify data type definitions
  3. Review Constraints

    • Look for conflicting rules
    • Test with minimal constraints first
    • Gradually add complexity
  4. Check System Resources

    • Monitor CPU/memory usage
    • Check Celery worker status
    • Review Redis connection

Common Error Messages

"Schema validation failed"

  • Cause: Invalid XSD or JSON Schema syntax
  • Solution: Validate the schema with an external tool and fix the reported errors

"Constraint conflict detected"

  • Cause: Impossible constraint combinations (e.g., min > max)
  • Solution: Review constraint logic, adjust ranges

"Generation timeout"

  • Cause: Complex constraints or too many records
  • Solution: Reduce record count, simplify constraints

"Permission denied"

  • Cause: Insufficient access rights
  • Solution: Check project permissions, request access
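
"Constraint conflict detected" errors can often be caught locally before submitting a job. A minimal pre-flight check, assuming constraints are expressed as key/value pairs like the field-level example earlier in this guide:

```python
def find_conflicts(constraints: dict) -> list[str]:
    """Flag impossible combinations such as min > max, the common
    cause named above. Pairs checked are illustrative, based on the
    constraint keys used earlier in this guide."""
    conflicts = []
    for low, high in (("min", "max"),
                      ("minLength", "maxLength"),
                      ("minDate", "maxDate")):
        if low in constraints and high in constraints:
            if constraints[low] > constraints[high]:
                conflicts.append(f"{low} > {high}")
    return conflicts
```

For example, `find_conflicts({"min": 10, "max": 5})` reports the impossible range, while a valid set returns an empty list.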

Need More Help?