Guides
Step-by-step tutorials for common TDMP tasks and advanced workflows.
Getting Started
Quick Start: Your First Dataset
Time required: 10 minutes
This guide walks you through creating your first project and generating a dataset.
Steps:
- Create a Project
  - Go to Projects → New Project
  - Name it "My First Project"
  - Set visibility to Private
  - Click Create
- Upload a Schema
  - Navigate to your project
  - Go to the Schemas tab
  - Click "Upload Schema"
  - Select an XSD or JSON Schema file
  - Wait for validation to finish
- Create Constraints
  - Open your schema
  - Click "New Constraint Set"
  - Define rules for key fields
  - Save the constraint set
- Generate Data
  - Click "Generate Dataset"
  - Select your constraint set
  - Choose a record count (start with 100)
  - Click Generate
  - Monitor progress in the Datasets tab
- Download Your Data
  - Once generation completes, click the dataset
  - Choose your export format
  - Click Download
Setting Up Team Collaboration
Prerequisites: Admin or Project Owner role
Learn how to share projects and manage team access.
Steps:
- Invite Team Members
  - Go to Project Settings
  - Click the "Members" tab
  - Enter the user's email or username
  - Assign a permission level
  - Click Add
- Configure Permissions
  - Admin: Full project control
  - Edit: Create/modify datasets
  - View: Read-only access
- Share Resources
  - Save schemas to the Library for team reuse
  - Export constraint sets
  - Share dataset links
- Track Activity
  - View audit logs (Admin only)
  - Monitor who generated what
  - Review change history
Advanced Guides
Building Complex Constraint Sets
Master constraint configuration for realistic data generation.
Field-Level Constraints:
{
  "field_name": "email",
  "type": "string",
  "constraints": {
    "format": "email",
    "unique": true,
    "required": true,
    "pattern": "^[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}$"
  }
}
Common Patterns:
- Email addresses: Use format: email or a custom regex
- Phone numbers: pattern: "^\\+?[1-9]\\d{1,14}$" (E.164 format)
- Dates: minDate, maxDate, format: "YYYY-MM-DD"
- Numbers: min, max, step, precision
- Enums: enum: ["value1", "value2", "value3"]
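The two regex patterns above can be tried against sample values before they go into a constraint set. The following is a minimal Python sketch, not TDMP's own validator: it assumes the patterns behave as ordinary anchored regular expressions, and `matches_pattern` is a hypothetical helper name.

```python
import re

# Patterns copied from the examples above, with the JSON escaping removed.
EMAIL_PATTERN = r"^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$"
E164_PATTERN = r"^\+?[1-9]\d{1,14}$"


def matches_pattern(pattern: str, value: str) -> bool:
    """Return True if the anchored pattern matches the value."""
    return re.search(pattern, value) is not None


# Sample values to try; swap in values that resemble your own data.
EMAIL_SAMPLES = ["jane.doe@example.com", "Jane@Example.com", "not-an-email"]
PHONE_SAMPLES = ["+14155552671", "0123456", "+1 415 555 2671"]

for sample in EMAIL_SAMPLES:
    print(sample, matches_pattern(EMAIL_PATTERN, sample))
for sample in PHONE_SAMPLES:
    print(sample, matches_pattern(E164_PATTERN, sample))
```

Note that the email pattern accepts lowercase characters only, so an address such as `Jane@Example.com` is rejected unless the pattern is widened.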
Cross-Field Relationships:
{
  "relationships": [
    {
      "if": "country = 'USA'",
      "then": "state IN ['CA', 'NY', 'TX', ...]"
    }
  ]
}
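To reason about what an if/then relationship allows, it can help to check a few sample records by hand. The sketch below is an illustration only and does not parse TDMP's rule syntax: the rule is re-expressed as a hypothetical Python dictionary (`if_field`, `if_value`, `then_field`, `allowed_values` are invented names), and the state list is shortened to the three values shown above.

```python
from typing import Any, Dict

# Hypothetical, simplified form of the rule above:
# if record[if_field] == if_value, record[then_field] must be an allowed value.
Rule = Dict[str, Any]

USA_STATE_RULE: Rule = {
    "if_field": "country",
    "if_value": "USA",
    "then_field": "state",
    "allowed_values": {"CA", "NY", "TX"},  # shortened list for illustration
}


def satisfies(record: Dict[str, Any], rule: Rule) -> bool:
    """Return True if the record does not violate the rule."""
    if record.get(rule["if_field"]) != rule["if_value"]:
        return True  # condition not met, so the rule does not apply
    return record.get(rule["then_field"]) in rule["allowed_values"]
```

A record whose `country` is not `USA` always passes, which is the usual reading of an if/then constraint.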
Integrating with CI/CD
Automate test data generation in your pipelines.
GitHub Actions Example:
name: Generate Test Data
on:
  push:
    branches: [main, develop]
jobs:
  generate-data:
    runs-on: ubuntu-latest
    steps:
      - name: Generate Dataset
        # TDMP_URL must be available in the job environment.
        run: |
          TOKEN=$(curl -s -X POST "$TDMP_URL/users/token" \
            -d "username=${{ secrets.TDMP_USER }}&password=${{ secrets.TDMP_PASS }}" \
            | jq -r '.access_token')
          curl -X POST "$TDMP_URL/datasets/create" \
            -H "Authorization: Bearer $TOKEN" \
            -H "Content-Type: application/json" \
            -d '{"project_id": 123, "schema_id": 456, "constraint_set_id": 789, "name": "CI Test Data", "record_count": 1000}'
Jenkins Pipeline:
pipeline {
    agent any
    stages {
        stage('Generate Test Data') {
            steps {
                sh '''
                    token=$(curl -s -X POST ${TDMP_URL}/users/token \
                        -d "username=${TDMP_USER}&password=${TDMP_PASS}" \
                        | jq -r .access_token)
                    curl -X POST ${TDMP_URL}/datasets/create \
                        -H "Authorization: Bearer $token" \
                        -H "Content-Type: application/json" \
                        -d @dataset-config.json
                '''
            }
        }
    }
}
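For pipelines that run Python rather than shell, the same two calls can be sketched with the standard library. This is a sketch under assumptions: it uses only the `/users/token` and `/datasets/create` endpoints and the `access_token` field shown in the examples above, and the function names are hypothetical. Unlike the curl examples, it URL-encodes the credentials, which matters when a password contains characters such as `&`.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build the form-encoded POST to /users/token."""
    body = urllib.parse.urlencode({"username": username, "password": password}).encode()
    return urllib.request.Request(f"{base_url}/users/token", data=body, method="POST")


def build_dataset_request(base_url: str, token: str, config: dict) -> urllib.request.Request:
    """Build the JSON POST to /datasets/create with a bearer token."""
    return urllib.request.Request(
        f"{base_url}/datasets/create",
        data=json.dumps(config).encode(),
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def generate_dataset(base_url: str, username: str, password: str, config: dict) -> dict:
    """Fetch a token, then request a dataset. Performs real network calls."""
    token = send(build_token_request(base_url, username, password))["access_token"]
    return send(build_dataset_request(base_url, token, config))
```

A CI job would call `generate_dataset` with values read from its secret store and a `config` dictionary containing the same fields as the JSON payload in the GitHub Actions example.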
Optimizing Large Dataset Generation
Best practices for generating 10,000+ records efficiently.
Strategies:
- Batch Generation
  - Split into smaller chunks (10k records each)
  - Generate in parallel if infrastructure allows
  - Combine results in post-processing
- Simplify Constraints
  - Use built-in formats instead of complex regex
  - Reduce cross-field dependencies
  - Cache reference data
- Use Appropriate Formats
  - CSV for simple tabular data
  - Parquet for large analytical datasets
  - JSON for nested/complex structures
- Schedule Off-Peak
  - Generate during low-usage hours
  - Set up automated nightly jobs
  - Use queue priorities
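The batch-generation strategy above comes down to splitting one large record count into chunk sizes that can be requested separately. A minimal sketch, assuming the 10k chunk size suggested above; `split_into_batches` is a hypothetical helper, not part of TDMP:

```python
from typing import List


def split_into_batches(total_records: int, batch_size: int = 10_000) -> List[int]:
    """Return the record count for each batch, largest batches first."""
    if total_records < 0:
        raise ValueError("total_records must not be negative")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    full_batches, remainder = divmod(total_records, batch_size)
    sizes = [batch_size] * full_batches
    if remainder:
        sizes.append(remainder)  # final, smaller batch
    return sizes
```

Each entry can then be used as the `record_count` of one generation request, and the resulting files combined afterwards.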
Schema Best Practices
Design schemas for optimal data generation.
Do's:
- ✅ Use clear, descriptive field names
- ✅ Define appropriate data types
- ✅ Include min/max length constraints
- ✅ Document field purposes
- ✅ Keep schemas modular and reusable
Don'ts:
- ❌ Overly complex nested structures
- ❌ Circular references
- ❌ Ambiguous field names
- ❌ Missing required field markers
- ❌ Unrealistic constraint combinations
Integration Guides
AWS S3 Export Setup
Configure direct export to Amazon S3.
Steps:
- Create IAM User
  aws iam create-user --user-name tdmp-export
- Attach S3 Policy
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": ["s3:PutObject", "s3:PutObjectAcl"],
        "Resource": "arn:aws:s3:::your-bucket/*"
      }
    ]
  }
- Configure in TDMP
  - Go to Integrations
  - Select AWS S3
  - Enter Access Key ID and Secret
  - Specify bucket name and region
  - Test connection
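When several buckets need the same export policy, the policy document from the steps above can be generated rather than edited by hand. The sketch below reproduces that policy for a given bucket name; `build_export_policy` and `write_policy_file` are hypothetical helper names.

```python
import json


def build_export_policy(bucket_name: str) -> dict:
    """Return the S3 export policy shown above, scoped to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:PutObjectAcl"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


def write_policy_file(bucket_name: str, path: str) -> None:
    """Write the policy as JSON so it can be passed to the AWS CLI."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_export_policy(bucket_name), handle, indent=2)
```

The written file could then be attached to the user with a command such as `aws iam put-user-policy --user-name tdmp-export --policy-name tdmp-s3-export --policy-document file://policy.json`, where the policy name is an example.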
MinIO Self-Hosted Storage
Set up MinIO for on-premise S3-compatible storage.
Docker Setup:
docker run -d \
  -p 9000:9000 \
  -p 9001:9001 \
  -e MINIO_ROOT_USER=admin \
  -e MINIO_ROOT_PASSWORD=password123 \
  -v /mnt/data:/data \
  minio/minio server /data --console-address ":9001"
TDMP Configuration:
- Endpoint: http://localhost:9000
- Access Key: Your MinIO access key
- Secret Key: Your MinIO secret key
- Bucket: Create via MinIO console
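Before entering the endpoint in TDMP, it can be worth confirming that the MinIO server is reachable. The sketch below probes MinIO's liveness endpoint, `/minio/health/live`, with the standard library; the function names are hypothetical, and the check only shows that the server responds, not that the access keys or bucket are valid.

```python
import urllib.error
import urllib.request


def minio_health_url(endpoint: str) -> str:
    """Build the liveness probe URL for a MinIO endpoint."""
    return endpoint.rstrip("/") + "/minio/health/live"


def is_minio_live(endpoint: str, timeout: float = 5.0) -> bool:
    """Return True if the liveness probe answers with HTTP 200."""
    try:
        with urllib.request.urlopen(minio_health_url(endpoint), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False  # unreachable, refused, timed out, or non-2xx status
```

For the Docker setup above, the call would be `is_minio_live("http://localhost:9000")`.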
Troubleshooting Guides
Debugging Failed Generations
When datasets fail to generate:
- Check Task Logs
  - Navigate to Dataset details
  - View error messages
  - Identify failing constraints
- Validate Schema
  - Ensure schema is well-formed
  - Check for syntax errors
  - Verify data type definitions
- Review Constraints
  - Look for conflicting rules
  - Test with minimal constraints first
  - Gradually add complexity
- Check System Resources
  - Monitor CPU/memory usage
  - Check Celery worker status
  - Review Redis connection
Common Error Messages
"Schema validation failed"
- Cause: Invalid XSD or JSON Schema syntax
- Solution: Validate schema with external tool, fix errors
"Constraint conflict detected"
- Cause: Impossible constraint combinations (e.g., min > max)
- Solution: Review constraint logic, adjust ranges
"Generation timeout"
- Cause: Complex constraints or too many records
- Solution: Reduce record count, simplify constraints
"Permission denied"
- Cause: Insufficient access rights
- Solution: Check project permissions, request access
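Some "Constraint conflict detected" errors, such as min > max, can be caught before a generation run is submitted. The following is a minimal sketch, not TDMP's own conflict detection: it assumes constraints are available as a dictionary keyed by field name, uses the `min`/`max` and `minDate`/`maxDate` keys from the common patterns earlier in this guide, and assumes dates are `YYYY-MM-DD` strings, which compare correctly as text.

```python
from typing import Any, Dict, List

# Pairs of (lower bound key, upper bound key) to compare.
BOUND_PAIRS = [("min", "max"), ("minDate", "maxDate")]


def find_range_conflicts(fields: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return the names of fields whose lower bound exceeds the upper bound."""
    conflicts = []
    for name, constraints in fields.items():
        for low_key, high_key in BOUND_PAIRS:
            low = constraints.get(low_key)
            high = constraints.get(high_key)
            if low is not None and high is not None and low > high:
                conflicts.append(name)
                break  # report each field once
    return conflicts
```

Running such a check locally only covers impossible ranges; conflicts between different fields still need to be reviewed in the constraint set itself.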
Need More Help?
- Quick Reference - Tips and shortcuts
- FAQ - Common questions
- API Documentation - For automation
- Contact your administrator for support