
Guides

Step-by-step tutorials for common TDMP tasks and advanced workflows.

Getting Started

Quick Start: Your First Dataset

Time required: 10 minutes

This guide walks you through creating your first project and generating a dataset.

Steps:

  1. Create a Project

    • Go to Projects → New Project
    • Name it "My First Project"
    • Set visibility to Private
    • Click Create
  2. Upload a Schema

    • Navigate to your project
    • Go to Schemas tab
    • Click "Upload Schema"
    • Select an XSD or JSON Schema file
    • Wait for validation
  3. Create Constraints

    • Open your schema
    • Click "New Constraint Set"
    • Define rules for key fields
    • Save the constraint set
  4. Generate Data

    • Click "Generate Dataset"
    • Select your constraint set
    • Choose record count (start with 100)
    • Click Generate
    • Monitor progress in Datasets tab
  5. Download Your Data

    • Once complete, click the dataset
    • Choose your export format
    • Click Download
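
For repeatable setups, the same Quick Start flow can be driven through the TDMP API using the /users/token and /datasets/create endpoints shown in the CI/CD guide below. This Python sketch is illustrative only: the base URL, credentials, and IDs are placeholders, and the helper simply assembles the request body used in those examples.

```python
import json
import urllib.parse
import urllib.request

TDMP_URL = "https://tdmp.example.com"  # placeholder base URL


def build_dataset_request(project_id, schema_id, constraint_set_id,
                          name, record_count=100):
    """Assemble the JSON body for /datasets/create (field names taken
    from the CI/CD examples later in this guide)."""
    return {
        "project_id": project_id,
        "schema_id": schema_id,
        "constraint_set_id": constraint_set_id,
        "name": name,
        "record_count": record_count,
    }


def generate_dataset(username, password, payload):
    """Fetch a bearer token, then request dataset generation."""
    creds = urllib.parse.urlencode(
        {"username": username, "password": password}).encode()
    with urllib.request.urlopen(f"{TDMP_URL}/users/token", creds) as resp:
        token = json.load(resp)["access_token"]
    req = urllib.request.Request(
        f"{TDMP_URL}/datasets/create",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    body = build_dataset_request(123, 456, 789, "My First Dataset")
    print(generate_dataset("me@example.com", "secret", body))
```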

Setting Up Team Collaboration

Prerequisites: Admin or Project Owner role

Learn how to share projects and manage team access.

Steps:

  1. Invite Team Members

    • Go to Project Settings
    • Click "Members" tab
    • Enter the user's email or username
    • Assign permission level
    • Click Add
  2. Configure Permissions

    • Admin: Full project control
    • Edit: Create/modify datasets
    • View: Read-only access
  3. Share Resources

    • Save schemas to Library for team reuse
    • Export constraint sets
    • Share dataset links
  4. Track Activity

    • View audit logs (Admin only)
    • Monitor who generated what
    • Review changes history
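
The three permission levels above map naturally onto a small capability check. This sketch is purely illustrative of the model, not TDMP's actual authorization code.

```python
# Illustrative capability model for the three TDMP permission levels;
# not the platform's actual implementation.
PERMISSIONS = {
    "Admin": {"manage_members", "view_audit_logs", "edit_datasets", "view"},
    "Edit": {"edit_datasets", "view"},
    "View": {"view"},
}


def can(role: str, action: str) -> bool:
    """Return True if the given role is allowed to perform the action."""
    return action in PERMISSIONS.get(role, set())
```

For example, audit logs are Admin-only, so `can("Edit", "view_audit_logs")` is False while `can("Edit", "edit_datasets")` is True.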

Advanced Guides

Building Complex Constraint Sets

Master constraint configuration for realistic data generation.

Field-Level Constraints:

{
  "field_name": "email",
  "type": "string",
  "constraints": {
    "format": "email",
    "unique": true,
    "required": true,
    "pattern": "^[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}$"
  }
}

Common Patterns:

  • Email addresses: Use format: email or a custom regex pattern
  • Phone numbers: pattern: "^\\+?[1-9]\\d{1,14}$" (E.164 format)
  • Dates: minDate, maxDate, format: "YYYY-MM-DD"
  • Numbers: min, max, step, precision
  • Enums: enum: ["value1", "value2", "value3"]
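
The regex patterns above can be sanity-checked locally before uploading a constraint set. This sketch reuses the email and E.164 phone patterns exactly as given; the sample values are made up.

```python
import re

# Patterns copied from the constraint examples above.
EMAIL = re.compile(r"^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$")
PHONE = re.compile(r"^\+?[1-9]\d{1,14}$")  # E.164 format


def check(pattern: re.Pattern, value: str) -> bool:
    """Return True if the whole value matches the pattern."""
    return bool(pattern.fullmatch(value))
```

Note the email pattern is lowercase-only, so "Jane@example.com" is rejected while "jane.doe@example.com" passes; similarly "+14155552671" is valid E.164 but "0123" is not (no leading zero).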

Cross-Field Relationships:

{
  "relationships": [
    {
      "if": "country = 'USA'",
      "then": "state IN ['CA', 'NY', 'TX', ...]"
    }
  ]
}
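
A generator honors a rule like this by narrowing the dependent field's candidate values whenever the condition holds. A minimal sketch of the rule above (the original state list is truncated, so only the three listed values are used, and the non-US regions are invented for illustration):

```python
import random

# From the cross-field example above; the original list is truncated,
# so only the three states it shows are included here.
US_STATES = ["CA", "NY", "TX"]
ALL_REGIONS = US_STATES + ["ON", "QC"]  # hypothetical non-US regions


def pick_state(country: str, rng: random.Random) -> str:
    """Respect the rule: if country = 'USA' then state IN US_STATES."""
    pool = US_STATES if country == "USA" else ALL_REGIONS
    return rng.choice(pool)
```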

Integrating with CI/CD

Automate test data generation in your pipelines.

GitHub Actions Example:

name: Generate Test Data
on:
  push:
    branches: [main, develop]

jobs:
  generate-data:
    runs-on: ubuntu-latest
    env:
      TDMP_URL: ${{ secrets.TDMP_URL }}
    steps:
      - name: Generate Dataset
        run: |
          TOKEN=$(curl -X POST $TDMP_URL/users/token \
            -d "username=${{ secrets.TDMP_USER }}&password=${{ secrets.TDMP_PASS }}" \
            | jq -r '.access_token')

          curl -X POST $TDMP_URL/datasets/create \
            -H "Authorization: Bearer $TOKEN" \
            -H "Content-Type: application/json" \
            -d '{"project_id": 123, "schema_id": 456, "constraint_set_id": 789, "name": "CI Test Data", "record_count": 1000}'

Jenkins Pipeline:

pipeline {
    agent any
    stages {
        stage('Generate Test Data') {
            steps {
                sh '''
                    token=$(curl -s -X POST ${TDMP_URL}/users/token \
                        -d "username=${TDMP_USER}&password=${TDMP_PASS}" \
                        | jq -r .access_token)

                    curl -X POST ${TDMP_URL}/datasets/create \
                        -H "Authorization: Bearer $token" \
                        -H "Content-Type: application/json" \
                        -d @dataset-config.json
                '''
            }
        }
    }
}

Optimizing Large Dataset Generation

Best practices for generating 10,000+ records efficiently.

Strategies:

  1. Batch Generation

    • Split into smaller chunks (10k records each)
    • Generate in parallel if infrastructure allows
    • Combine the results in a post-processing step
  2. Simplify Constraints

    • Use built-in formats instead of complex regex
    • Reduce cross-field dependencies
    • Cache reference data
  3. Use Appropriate Formats

    • CSV for simple tabular data
    • Parquet for large analytical datasets
    • JSON for nested/complex structures
  4. Schedule Off-Peak

    • Generate during low-usage hours
    • Set up automated nightly jobs
    • Use queue priorities
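
The batching strategy above can be sketched as a simple chunk planner that splits one large job into fixed-size batches, each small enough to generate (and retry) independently:

```python
def plan_batches(total_records: int, chunk_size: int = 10_000) -> list[int]:
    """Split a large generation job into chunk_size-record batches,
    as suggested in strategy 1 above."""
    if total_records < 0 or chunk_size <= 0:
        raise ValueError("record count and chunk size must be positive")
    batches = []
    remaining = total_records
    while remaining > 0:
        batches.append(min(chunk_size, remaining))
        remaining -= batches[-1]
    return batches
```

For example, a 25,000-record job becomes three batches of 10,000, 10,000, and 5,000 records that can run in parallel if the infrastructure allows.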

Schema Best Practices

Design schemas for optimal data generation.

Do's:

  • ✅ Use clear, descriptive field names
  • ✅ Define appropriate data types
  • ✅ Include min/max length constraints
  • ✅ Document field purposes
  • ✅ Keep schemas modular and reusable

Don'ts:

  • ❌ Overly complex nested structures
  • ❌ Circular references
  • ❌ Ambiguous field names
  • ❌ Missing required field markers
  • ❌ Unrealistic constraint combinations

Integration Guides

AWS S3 Export Setup

Configure direct export to Amazon S3.

Steps:

  1. Create IAM User

    aws iam create-user --user-name tdmp-export
  2. Attach S3 Policy

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:PutObject", "s3:PutObjectAcl"],
          "Resource": "arn:aws:s3:::your-bucket/*"
        }
      ]
    }
  3. Configure in TDMP

    • Go to Integrations
    • Select AWS S3
    • Enter Access Key ID and Secret
    • Specify bucket name and region
    • Test connection
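
Before wiring the credentials into TDMP, you can sanity-check that the policy document grants the actions the export needs. This simplified check only scans Allow statements for exact action names; real IAM evaluation (wildcards, Deny statements, conditions) is more involved.

```python
import json

# The policy document from step 2 above.
POLICY = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:PutObjectAcl"],
      "Resource": "arn:aws:s3:::your-bucket/*"
    }
  ]
}
""")


def allows(policy: dict, action: str) -> bool:
    """True if any Allow statement lists the action exactly.
    Simplified: no wildcard, Deny, or condition handling."""
    return any(
        stmt.get("Effect") == "Allow" and action in stmt.get("Action", [])
        for stmt in policy.get("Statement", [])
    )
```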

MinIO Self-Hosted Storage

Set up MinIO for on-premise S3-compatible storage.

Docker Setup:

docker run -d \
  -p 9000:9000 \
  -p 9001:9001 \
  -e MINIO_ROOT_USER=admin \
  -e MINIO_ROOT_PASSWORD=password123 \
  -v /mnt/data:/data \
  minio/minio server /data --console-address ":9001"

TDMP Configuration:

  • Endpoint: http://localhost:9000
  • Access Key: Your MinIO access key
  • Secret Key: Your MinIO secret key
  • Bucket: Create via MinIO console

Troubleshooting Guides

Debugging Failed Generations

When datasets fail to generate:

  1. Check Task Logs

    • Navigate to Dataset details
    • View error messages
    • Identify failing constraints
  2. Validate Schema

    • Ensure schema is well-formed
    • Check for syntax errors
    • Verify data type definitions
  3. Review Constraints

    • Look for conflicting rules
    • Test with minimal constraints first
    • Gradually add complexity
  4. Check System Resources

    • Monitor CPU/memory usage
    • Check Celery worker status
    • Review Redis connection

Common Error Messages

"Schema validation failed"

  • Cause: Invalid XSD or JSON Schema syntax
  • Solution: Validate the schema with an external tool and fix the reported errors

"Constraint conflict detected"

  • Cause: Impossible constraint combinations (e.g., min > max)
  • Solution: Review constraint logic, adjust ranges

"Generation timeout"

  • Cause: Complex constraints or too many records
  • Solution: Reduce record count, simplify constraints

"Permission denied"

  • Cause: Insufficient access rights
  • Solution: Check project permissions, request access
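
"Constraint conflict detected" errors can often be caught locally before submitting a job. A minimal pre-flight check, assuming constraints are expressed as key/value pairs like the field-level example earlier in this guide:

```python
def find_conflicts(constraints: dict) -> list[str]:
    """Flag impossible combinations such as min > max, the common
    cause named above. Pairs checked are illustrative, based on the
    constraint keys used earlier in this guide."""
    conflicts = []
    for low, high in (("min", "max"),
                      ("minLength", "maxLength"),
                      ("minDate", "maxDate")):
        if low in constraints and high in constraints:
            if constraints[low] > constraints[high]:
                conflicts.append(f"{low} > {high}")
    return conflicts
```

For example, `find_conflicts({"min": 10, "max": 5})` reports the impossible range, while a valid set returns an empty list.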

Need More Help?