AI Media Gen

The AI Media Generator lets users create AI-generated images and videos with models from multiple providers. It offers a single interface for generating media from text prompts and optional reference images, and it supports both system-provided models and custom models from any compatible AI service provider.

Understanding the AI Media Generator Feature

Purpose: The AI Media Generator gives TechMaju applications a flexible framework for creating visual content (images and videos) with AI models. It supports multiple AI service providers through a unified interface, so organizations can choose the models that best fit their needs. The system handles both synchronous generation (typically images) and asynchronous generation (typically videos).

Function: This feature allows users to:
- Generate images using text-to-image (t2i) AI models from various providers
- Generate videos using text-to-video (t2v) or image-to-video (i2v) AI models
- Upload reference images for image-to-video generation
- Configure and use custom AI models from any compatible provider
- Track generation status and history across all model providers
- Download generated media files
- Monitor and cancel in-progress video generation tasks
- Browse generation history with previews and metadata

Extensibility: The system is designed to support:
- System Models: Pre-configured models ready to use out of the box
- Custom Models: Integration with any AI provider through custom API configuration
- Multiple Providers: Simultaneous use of models from different AI services
- Flexible Authentication: Support for various API key and authentication methods

Roles Required

The following roles are required to access and use the AI Media Generator:

Role	Access Level	Description
TM AI Developer	Full Access	Primary role for using AI Media Generation features
System Manager	Full Access	Administrative access to all DocType operations

Note: The page "AI Media Generator" is accessible to both System Manager and TM AI Developer roles.

1. TM AI Generated File

This DocType stores records of all generated media files (images and videos) regardless of which AI provider was used.

TM AI Generated File Fields

Field	Type	Description
file_name	Data	Unique identifier for the generated file (auto-generated, read-only)
file_url	Data	URL path to the generated media file (hidden, read-only)
ai_model	Link	Reference to the TM AI Model used for generation (read-only)
type	Select	Media type: "Image" or "Video" (read-only)
status	Select	Generation status: Queued, Running, Cancelled, Succeeded, Failed
prompt	Text	Original text prompt used for generation
parameters	JSON	Generation parameters (resolution, guidance_scale, seed, reference images, provider-specific settings, etc.)
video_gen_id	Data	Task ID for video generation tracking (provider-specific)
file_preview	HTML	Visual preview of the generated media (displayed in form view)
error_message	Small Text	Error details if generation fails

Status Indicators

Status	Color	Description
Queued	Cyan	Initial state for video generation tasks waiting to start
Running	Blue	Video generation currently in progress
Cancelled	Pink	Generation task cancelled by user
Succeeded	Green	Generation completed successfully
Failed	Red	Generation failed due to an error
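
The status values and indicator colors above can be captured in a small lookup. The following is a minimal Python sketch, not the feature's actual implementation. Treating Cancelled, Succeeded, and Failed as terminal states is an assumption drawn from the descriptions in the table, and the helper names are hypothetical.

```python
from enum import Enum


class Status(str, Enum):
    QUEUED = "Queued"
    RUNNING = "Running"
    CANCELLED = "Cancelled"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


# Indicator colors, copied from the Status Indicators table.
STATUS_COLORS = {
    Status.QUEUED: "Cyan",
    Status.RUNNING: "Blue",
    Status.CANCELLED: "Pink",
    Status.SUCCEEDED: "Green",
    Status.FAILED: "Red",
}

# Assumption: no further status change is expected after these.
TERMINAL = {Status.CANCELLED, Status.SUCCEEDED, Status.FAILED}


def indicator_color(status: str) -> str:
    """Return the indicator color for a status string such as "Queued"."""
    return STATUS_COLORS[Status(status)]


def is_terminal(status: str) -> bool:
    """Return True once a generation can no longer change state."""
    return Status(status) in TERMINAL
```

An unknown status string raises ValueError, which makes typos in status handling visible early.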

2. TM AI Model

This DocType defines available AI models from various providers for different tasks including media generation.

TM AI Model Fields

Field	Type	Description
model_name	Data	Display name of the model (unique, required)
model_id	Data	Provider's model identifier (e.g., "seedream-3-0-t2i-250415" for BytePlus)
model_type	Select	Model category: LLM, Image Gen, Video Gen, VLM
model_input	Select	Input type: Text, Image, or Text + Image
model_output	Select	Output type: Text, Image, or Video
active	Check	Enable/disable the model for use
is_system_model	Check	Flag for pre-configured system models (read-only)
custom_model	Data	Custom model ID for third-party AI providers
custom_api_key	Password	API key for custom models (securely stored)
custom_api_base_url	Data	API base URL for custom model providers
application	Link	Reference to My Application

Model Configuration Options

The system supports two types of model configurations:

1. System Models:
- Pre-configured models from default providers (currently BytePlus in MVP)
- Use model_id field for model identifier
- Authentication handled through site configuration
- Marked with is_system_model = 1

2. Custom Models:
- Models from any third-party AI provider
- Use custom_model for model identifier
- Use custom_api_key for authentication
- Use custom_api_base_url for API endpoint
- Supports OpenAI-compatible APIs or custom integrations
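
The two configuration types differ only in where the model identifier and credentials come from. The sketch below illustrates that choice with a plain dictionary standing in for a TM AI Model document. The function name and the site_api_key and site_base_url arguments are hypothetical; the real feature reads system credentials from site configuration in a way this document does not specify.

```python
from typing import Optional


def resolve_model(doc: dict, site_api_key: Optional[str] = None,
                  site_base_url: Optional[str] = None) -> dict:
    """Pick the model identifier and credentials for a TM AI Model record."""
    if doc.get("is_system_model"):
        # System models: model_id plus credentials from site configuration.
        return {
            "model": doc["model_id"],
            "api_key": site_api_key,
            "base_url": site_base_url,
        }
    # Custom models: everything comes from the model document itself.
    required = ("custom_model", "custom_api_key", "custom_api_base_url")
    missing = [field for field in required if not doc.get(field)]
    if missing:
        raise ValueError("Custom model is missing: " + ", ".join(missing))
    return {
        "model": doc["custom_model"],
        "api_key": doc["custom_api_key"],
        "base_url": doc["custom_api_base_url"],
    }
```

Failing early on an incomplete custom model keeps a misconfiguration from surfacing later as an opaque provider error.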

Model Type Support Matrix

Model Type	Naming Convention	Input Requirement	Use Case
Text-to-Image (t2i)	Contains "t2i" in model ID	Text only	Generate images from text prompts
Text-to-Video (t2v)	Contains "t2v" in model ID	Text only	Generate videos from text prompts
Image-to-Video (i2v)	Contains "i2v" in model ID	Text + Images (required)	Generate videos from text and reference images
Multimodal	model_input = "Text + Image"	Text + Images (optional)	Generate with optional reference images

Note: The exact capabilities depend on the specific AI provider and model being used.
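
The matrix above amounts to a simple classification rule. A minimal sketch follows, assuming the naming convention is checked before model_input; the document does not state which rule wins when both apply. The return values mirror the required, optional, and hidden cases described in the procedure for providing input.

```python
def input_requirement(model_id: str, model_input: str) -> str:
    """Return "required", "optional", or "hidden" for the reference-image section."""
    ident = model_id.lower()
    if "i2v" in ident:
        return "required"  # image-to-video needs reference images
    if "t2i" in ident or "t2v" in ident:
        return "hidden"    # text-only models take no images
    if model_input == "Text + Image":
        return "optional"  # multimodal models accept images but do not need them
    return "hidden"
```

For example, input_requirement("seedream-3-0-t2i-250415", "Text") returns "hidden".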

Procedure

1. Navigate to AI Media Generator

Path: Home > AI Workspace > AI Media Generator

Or use the search bar: Type "AI Media Generator" and select the page.

2. Understanding the Interface

The AI Media Generator features a split-pane layout:

Left Pane: Current result preview and generation history table
Right Pane: Generation controls and configuration options
Responsive Design: Automatically adapts to mobile devices (single column on screens < 992px)
Resizable: Drag the divider handle to adjust pane sizes

3. Selecting Media Type

At the top of the right pane, choose the media type:
- Image: For generating static images (faster, typically synchronous)
- Video: For generating video clips (slower, typically asynchronous with polling)
The interface will update to show relevant models and options for your selection.

4. Selecting an AI Model

Click the Model dropdown
Choose from available active models:
- For Image mode: Only Image Gen (t2i) models are displayed
- For Video mode: Only Video Gen (t2v/i2v) models are displayed
Models from different providers may be listed together
Model capabilities (e.g., frame rate, duration) are displayed below the dropdown

Note: Available models depend on your system configuration and which providers have been set up by your administrator.

5. Providing Input (Reference Images)

Note: Input file requirements depend on the selected model type and provider capabilities.

When Input Files are REQUIRED (i2v models):

Select File Upload or URL Input tab
For File Upload:
- Click Choose Files or drag and drop
- Supported formats: .jpg, .jpeg, .png
- Maximum: 3 files (may vary by provider)
For URL Input:
- Enter image URL and click Add
- Repeat for multiple images (max 3)
Preview cards show uploaded/added images
Click × on any card to remove an image

When Input Files are OPTIONAL (Text + Image models):

Follow the same steps as above, but images are not required
You may proceed with text prompt only

When Input Files are HIDDEN (t2i/t2v models):

Input file section is not displayed
Only text prompt is needed
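
The input rules in this step can be checked before a request is sent. The sketch below is illustrative only. It assumes the limit of 3 images and the .jpg, .jpeg, and .png formats listed above, and it applies the format check to URLs as well, which the document states only for file uploads.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}
MAX_IMAGES = 3  # may vary by provider


def validate_reference_images(images: list[str], requirement: str) -> list[str]:
    """Return a list of problems; an empty list means the inputs look acceptable.

    requirement is "required", "optional", or "hidden".
    """
    problems = []
    if requirement == "hidden" and images:
        problems.append("This model does not accept reference images")
    if requirement == "required" and not images:
        problems.append("Reference images required")
    if len(images) > MAX_IMAGES:
        problems.append(f"At most {MAX_IMAGES} images are allowed")
    for item in images:
        # urlparse handles both file names and URLs; only the path part matters.
        suffix = PurePosixPath(urlparse(item).path).suffix.lower()
        if suffix not in ALLOWED_EXTENSIONS:
            problems.append(f"Unsupported format: {item}")
    return problems
```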

6. Writing the Prompt

In the Prompt text area, enter a detailed description of the media you want to generate
Best practices:
- Be specific and descriptive
- Include details about style, mood, colors, composition
- For videos: Describe the action or movement
- Example: "A serene sunset over a calm ocean with vibrant orange and pink clouds, photorealistic style"

Note: Optimal prompt structure may vary by AI provider and model. Consult your model's documentation for best results.

7. Configuring Parameters

Available parameters depend on the selected model and provider. Common parameters include:

For Image Generation:

Resolution: Select from available sizes (varies by provider)
Guidance Scale: Typically 1.0 to 10.0 (higher = more prompt adherence)
Seed: -1 for random, or specific number for reproducibility
Watermark: Enable/disable watermark on generated image
Additional Parameters: Provider-specific options may be available

For Video Generation:

Resolution: Select from available options (e.g., 480p, 720p, 1080p)
Aspect Ratio: Various ratios depending on provider support
Duration: Available durations in seconds
Frame Rate: Supported frame rates (e.g., 24, 30 fps)
Seed: -1 for random, or specific number for reproducibility
Watermark: Enable/disable watermark on generated video
Additional Parameters: Provider-specific options may be available
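
As an illustration of how image parameters might be validated and serialized for the JSON parameters field, here is a minimal sketch. The 1.0 to 10.0 range and the seed rule come from this document. The default of 7.0, the watermark key name, and the function itself are assumptions; real limits depend on the provider.

```python
import json


def build_image_parameters(resolution: str, guidance_scale: float = 7.0,
                           seed: int = -1, watermark: bool = False) -> str:
    """Validate common image parameters and return them as a JSON string."""
    if not 1.0 <= guidance_scale <= 10.0:
        raise ValueError("guidance_scale must be between 1.0 and 10.0")
    if seed != -1 and seed <= 0:
        raise ValueError("seed must be -1 (random) or a positive integer")
    return json.dumps(
        {
            "resolution": resolution,
            "guidance_scale": guidance_scale,
            "seed": seed,
            "watermark": watermark,
        },
        sort_keys=True,
    )
```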

8. Generating Media

Review all inputs: model, prompt, reference images (if applicable), parameters
Click the Generate ✨ button
For Images (typically synchronous):
- Loading animation appears (border pulse effect)
- Generation completes based on provider speed (typically 30-90 seconds)
- Image appears in the left preview pane
- Status: "Succeeded" on completion
For Videos (typically asynchronous):
- Initial response is immediate with task ID
- Status starts as "Queued"
- Frontend polls status periodically
- Status progresses: Queued → Running → Succeeded
- Video appears in preview pane when completed
- Note: Video generation time varies by provider and parameters
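
The asynchronous video flow is a standard polling loop. The page does this in the browser; the Python sketch below shows the same logic for a script or test. The 5-second interval and 10-minute timeout are assumptions, and check_status stands in for whatever call returns the current status string.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"Succeeded", "Failed", "Cancelled"}


def poll_until_done(check_status: Callable[[], str], interval: float = 5.0,
                    timeout: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Call check_status until it returns a terminal status or the timeout passes."""
    waited = 0.0
    while True:
        status = check_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"Still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Passing sleep as an argument lets tests run the loop without real delays.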

9. Viewing Results

Preview Pane:
- Images display as <img> with full resolution
- Videos display with HTML5 player controls (play, pause, volume, fullscreen)
Generation Info:
- Model name and parameters displayed below preview
- Prompt text shown
- Resolution and generation timestamp

10. Downloading Generated Media

Click the Download button below the preview
File saves with smart filename format:
- Pattern: {model_name}_{YYYYMMDD_HHMMSS}_{hash}.{ext}
- Example: image_model_20260112_143022_abc12345.png
Files are also saved in Frappe File Manager:
- Images: "AI Generated Images/YYYY-MM/"
- Videos: "AI Generated Videos/YYYY-MM/"
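
The filename pattern and folder layout can be reproduced as follows. This is a sketch under stated assumptions: the document does not say what the 8-character hash is computed from or how the model name is normalized, so the SHA-256 digest of the file content and the lower-case slug rule are illustrative choices.

```python
import hashlib
import re
from datetime import datetime


def build_filename(model_name: str, created: datetime, content: bytes, ext: str) -> str:
    """Build a name of the form {model_name}_{YYYYMMDD_HHMMSS}_{hash}.{ext}."""
    # Assumption: lower-case the name and replace other characters with "_".
    slug = re.sub(r"[^a-z0-9]+", "_", model_name.lower()).strip("_")
    stamp = created.strftime("%Y%m%d_%H%M%S")
    # Assumption: the hash is the first 8 hex characters of a content digest.
    digest = hashlib.sha256(content).hexdigest()[:8]
    return f"{slug}_{stamp}_{digest}.{ext}"


def storage_folder(media_type: str, created: datetime) -> str:
    """Return the File Manager folder for "Image" or "Video" media."""
    root = {"Image": "AI Generated Images", "Video": "AI Generated Videos"}[media_type]
    return f"{root}/{created:%Y-%m}/"
```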

11. Browsing Generation History

The History table displays the last 20 generations across all models and providers
Columns shown:
- Preview: Thumbnail of generated media
- Model: AI model used
- Prompt: First 50 characters of the prompt
- Resolution: Size or aspect ratio
- Status: Current generation status
- Created: Timestamp of generation
Click on any row to reload that result into the preview pane
Click column headers to sort the table
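
The History table's shaping rules (newest 20 records, prompt cut to 50 characters) can be expressed in a few lines. The sketch below works on plain dictionaries; the creation key follows the standard Frappe timestamp field, and the output key names are hypothetical.

```python
def history_rows(records: list[dict], limit: int = 20,
                 prompt_chars: int = 50) -> list[dict]:
    """Return the newest records, shaped like the History table columns."""
    newest_first = sorted(records, key=lambda r: r["creation"], reverse=True)
    return [
        {
            "model": r["ai_model"],
            "prompt": r["prompt"][:prompt_chars],
            "status": r["status"],
            "created": r["creation"],
        }
        for r in newest_first[:limit]
    ]
```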

12. Cancelling Video Generation

For in-progress video tasks only (provider support required):

While status is "Queued" or "Running":
- A Cancel button appears in the preview pane
- Or open the TM AI Generated File document and click Cancel
Click Cancel to stop the generation
Status changes to "Cancelled"
No further polling occurs

Note: Cancellation support depends on the AI provider's API capabilities.

13. Accessing TM AI Generated File DocType (Advanced)

For administrative purposes or detailed tracking:

Navigate to: Home > TechMaju AI > TM AI Generated File
View all generated files with full metadata
Each document shows:
- File preview (image or video embedded)
- All generation parameters in JSON format
- Status tracking
- Error messages (if failed)
- Provider-specific task IDs (for videos)

Configuring AI Models (Administrator)

Adding System Models

For pre-configured provider models (currently BytePlus):

Navigate to: Home > TechMaju AI > TM AI Model
Click + Add TM AI Model
Fill in the fields:
- Model Name: Display name (e.g., "SeeDream 3.0 Image Generator")
- Model ID: Provider's model identifier
- Model Type: Select "Image Gen" or "Video Gen"
- Model Input: Select input type requirement
- Model Output: Select "Image" or "Video"
- Active: Check to enable
- Is System Model: Check for system models
Save the document

Note: System models use API credentials configured in site configuration.

Adding Custom Models

For third-party AI providers:

Navigate to: Home > TechMaju AI > TM AI Model
Click + Add TM AI Model
Fill in the fields:
- Model Name: Display name (e.g., "OpenAI DALL-E 3")
- Model Type: Select "Image Gen" or "Video Gen"
- Model Input: Select input type requirement
- Model Output: Select "Image" or "Video"
- Active: Check to enable
- Custom Model: Provider's model identifier
- Custom API Key: Your API key for the provider
- Custom API Base URL: Provider's API endpoint
Save the document

Supported Custom Providers:
- OpenAI-compatible APIs (DALL-E, Stable Diffusion, etc.)
- Any provider with compatible REST API
- Custom in-house AI models

Requirements for Custom Models:
- API must support text-to-image or text-to-video generation
- Response format should be compatible (URL or base64)
- Authentication via API key in headers

Architecture & Technical Details

File Storage Structure

Private Files/
├── AI Generated Images/
│   ├── 2026-01/
│   │   ├── model_20260112_143022_abc12345.png
│   │   └── ...
│   └── 2026-02/
└── AI Generated Videos/
    ├── 2026-01/
    │   ├── model_20260112_150030_xyz67890.mp4
    │   └── ...
    └── 2026-02/

API Endpoints

Endpoint	Method	Purpose
`techmaju_ai.api.ai_media_gen.get_media_models`	GET	Fetch available Image Gen and Video Gen models from all providers
`techmaju_ai.api.ai_media_gen.generate_media`	POST	Initiate image or video generation using specified model
`techmaju_ai.api.ai_media_gen.get_generation_history`	GET	Retrieve recent generation records across all models
`techmaju_ai.api.ai_media_gen.check_video_status`	GET	Poll video task status (provider-specific)
`techmaju_ai.api.ai_media_gen.cancel_video_generation`	POST	Cancel in-progress video task (provider-specific)
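
Frappe normally serves whitelisted methods under /api/method/ followed by the dotted path, and accepts token authentication in the Authorization header. Assuming the endpoints above are exposed that way, a request could be built as in the sketch below. The request is only constructed, not sent, and the parameter names in the usage note are hypothetical because this document does not list them.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

METHOD_PREFIX = "techmaju_ai.api.ai_media_gen."


def build_request(site_url: str, method: str, api_key: str, api_secret: str,
                  params: Optional[dict] = None, http_method: str = "GET") -> Request:
    """Build (but do not send) a request for one of the media endpoints."""
    url = f"{site_url.rstrip('/')}/api/method/{METHOD_PREFIX}{method}"
    headers = {
        "Authorization": f"token {api_key}:{api_secret}",
        "Accept": "application/json",
    }
    data = None
    if http_method == "GET":
        if params:
            url += "?" + urlencode(params)
    else:
        # POST endpoints receive their arguments as a JSON body.
        data = json.dumps(params or {}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=http_method)
```

A call such as build_request(site, "generate_media", key, secret, {"prompt": "..."}, "POST") produces a request that urllib.request.urlopen can send.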

Provider Integration Architecture

The system uses a flexible integration layer that supports multiple AI providers:

Current Implementation (MVP):
- Default Provider: BytePlus Ark API (Asia Pacific)
- Base URL: https://ark.ap-southeast.bytepluses.com/api/v3
- Authentication: Bearer token from site configuration

Extensibility:
- Custom Provider Support: Via custom_api_base_url and custom_api_key
- Multiple Providers: Can use different providers simultaneously
- Provider-Agnostic Interface: Unified user experience regardless of backend provider

Image Generation Flow:
1. User selects model (system or custom)
2. System routes to appropriate provider API
3. Synchronous or asynchronous processing based on provider
4. File downloaded and saved in unified storage
5. Metadata stored in TM AI Generated File

Video Generation Flow:
1. User selects model (system or custom)
2. System routes to appropriate provider API
3. Asynchronous task initiated with provider-specific task ID
4. Status polling via provider's status endpoint
5. Video downloaded when ready
6. Metadata stored with task tracking information

Security Features

Role-Based Access Control: All endpoints require TM AI Developer role
Secure Credential Storage: API keys encrypted in database
Signed URLs: Generated files use time-limited signed URLs for secure access
Parameter Validation: All inputs sanitized and validated before API calls
File Type Restrictions: Only safe file types accepted for uploads
Rate Limiting: Controlled by individual provider quotas

Best Practices

For Optimal Results

Prompt Engineering:
- Be descriptive and specific about desired output
- Include style descriptors appropriate for the model
- Specify composition, lighting, colors, and mood
- For videos: Describe the action or camera movement
- Note: Different AI models may interpret prompts differently - experiment with prompt styles
Model Selection:
- Choose models based on your specific use case and quality requirements
- System models are pre-configured and easier to use
- Custom models offer flexibility and choice of providers
- Consider cost, speed, and quality trade-offs
- Test multiple models to find the best fit for your needs
Reference Images (for i2v models):
- Use high-quality images (minimum 720p recommended)
- Ensure images are clear and well-lit
- Avoid copyrighted or sensitive content
- Maximum number of images depends on provider (typically 3)
Parameter Tuning:
- Start with default or recommended values
- Higher guidance scale typically = more prompt adherence but less creativity
- Use specific seeds to reproduce results
- Choose resolution based on use case (higher = better quality but slower/more expensive)
- Note: Parameter effects vary by model and provider
Video Generation:
- Expect variable wait times depending on provider and parameters
- Monitor status in history table
- Consider leaving tab open during generation for real-time updates
- Cancel tasks that are taking unexpectedly long

For System Administration

Model Management:
- Regularly review and update available models
- Test new models in development environment before production
- Deactivate deprecated, slow, or expensive models
- Monitor which models are most popular with users
- Consider cost-per-generation when adding models
Provider Configuration:
- For system models: Configure provider API keys in site configuration
- For custom models: Securely store API keys in TM AI Model documents
- Regularly rotate API keys for security
- Monitor API quotas and usage across all providers
- Set up alerts for quota limits or API failures
Storage Management:
- Monitor "AI Generated Images" and "AI Generated Videos" folders for disk usage
- Implement cleanup policies for old files (e.g., archive after 90 days)
- Consider external storage or CDN for serving media files
- Backup important generations before cleanup
Usage Monitoring:
- Review TM AI Usage Log regularly for cost tracking across all providers
- Analyze which models and providers are most cost-effective
- Set up alerts for high usage or errors
- Monitor failed generations to identify problematic models or prompts
Performance Optimization:
- Cache frequently generated prompts if applicable
- Implement queue management for high-traffic periods
- Use CDN for serving generated media files
- Consider load balancing across multiple providers
- Monitor response times by provider and model
Security:
- Regularly rotate all provider API keys
- Audit user generations for inappropriate content
- Ensure signed URL expiration is appropriate for your use case
- Restrict TM AI Developer role to trusted users only
- Review custom model configurations for security compliance
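
For the storage cleanup policy mentioned above, the monthly folder names make it easy to find candidates for archiving. The sketch below selects YYYY-MM folders whose last day is older than a cutoff; the 90-day figure is the example from this section, and the function name is hypothetical. It only computes a list and does not touch any files.

```python
from datetime import date, timedelta


def folders_to_archive(month_folders: list[str], today: date,
                       max_age_days: int = 90) -> list[str]:
    """Return the YYYY-MM folders whose newest possible file is past the cutoff."""
    cutoff = today - timedelta(days=max_age_days)
    old = []
    for name in month_folders:
        year, month = (int(part) for part in name.split("-"))
        # A month folder is old only once its last day is before the cutoff.
        first_of_next = date(year + month // 12, month % 12 + 1, 1)
        if first_of_next - timedelta(days=1) < cutoff:
            old.append(name)
    return old
```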

For Development

Testing:
- Test with non-production API keys in development
- Validate all model types before deploying to production
- Test error scenarios (API failures, invalid parameters, quota exceeded)
- Verify polling logic for asynchronous providers
- Test custom provider integration thoroughly
Adding New Providers:
- Document provider-specific API requirements
- Implement adapter pattern for provider-specific logic
- Test synchronous and asynchronous generation flows
- Verify error handling and status mapping
- Update documentation with provider-specific capabilities
Custom Models:
- Use custom_model, custom_api_key, and custom_api_base_url fields
- Ensure custom providers follow compatible API contracts
- Document any provider-specific parameter requirements
- Implement proper error handling for provider-specific errors
- Test authentication and authorization flows
Integration:
- Use exposed API endpoints to integrate with other DocTypes
- Consider embedding generation functionality in workflows
- Leverage generation history for analytics and reporting
- Build on top of the extensible model management system
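
The adapter pattern and status mapping recommended for new providers can look like the sketch below. It is a hypothetical illustration: the raw status strings are invented, and the actual integration layer may be organized differently. The point is that each provider translates its own vocabulary into the five statuses used by TM AI Generated File.

```python
from typing import Protocol


class ProviderAdapter(Protocol):
    """Interface each provider-specific adapter is expected to satisfy."""

    def map_status(self, raw_status: str) -> str: ...


class ExampleProviderAdapter:
    """Hypothetical adapter; the raw status strings are invented for illustration."""

    _STATUS_MAP = {
        "pending": "Queued",
        "processing": "Running",
        "done": "Succeeded",
        "error": "Failed",
        "cancelled": "Cancelled",
    }

    def map_status(self, raw_status: str) -> str:
        try:
            return self._STATUS_MAP[raw_status.lower()]
        except KeyError:
            # Surface unknown statuses instead of silently guessing.
            raise ValueError(f"Unmapped provider status: {raw_status!r}") from None
```

Raising on an unmapped status makes provider API changes visible during testing rather than leaving records in an ambiguous state.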

Troubleshooting

Common Issues

Issue	Possible Cause	Solution
"No models available"	No active Image Gen or Video Gen models configured	Activate models in TM AI Model DocType
"Reference images required"	Using i2v model without images	Upload at least one reference image
"Generation failed"	API error, quota exceeded, invalid parameters, provider issue	Check error_message field, verify API credentials and quotas
"Video stuck in Queued"	Provider API delay or failure	Wait, then check provider service status
"Download not working"	File URL expired or generation incomplete	Regenerate the media or check file_url field
"Permission denied"	User lacks TM AI Developer role	Assign TM AI Developer role to user
"Model not responding"	Custom model misconfigured or provider down	Verify custom_api_base_url and custom_api_key, check provider status
"Authentication failed"	Invalid or expired API key	Update API key in model configuration or site config

Provider-Specific Issues

For System Models:
- Verify site configuration has correct provider API key
- Check provider service status and quotas
- Review provider's API documentation for changes

For Custom Models:
- Verify custom_api_base_url is correct and accessible
- Check custom_api_key is valid and has sufficient quota
- Ensure custom model ID matches provider's model identifier
- Review provider-specific error messages in error_message field

Getting Help

Check generation error messages in TM AI Generated File document
Review server logs: bench --site [site-name] console → Check frappe.log
Verify provider API status and quotas
For custom models: Test API credentials independently
Check provider-specific documentation for parameter requirements
Contact provider support for API-specific issues

Image Generation Parameter Reference

Parameter	Valid Values	Default
Resolution	512×512, 768×768, 1024×1024, 1024×1792, 1792×1024, 2048×2048	1024×1024
Guidance Scale	1.0 to 10.0	7.0
Seed	-1 (random) or any positive integer	-1
Watermark	true / false	false

Video Generation Parameter Reference

Parameter	Valid Values	Default
Resolution	480p, 720p, 1080p	720p
Aspect Ratio	21:9, 16:9, 4:3, 1:1, 3:4, 9:16, 9:21, adaptive	adaptive
Duration	5, 10 (seconds)	10
Frame Rate	24 fps	24
Seed	-1 (random) or any positive integer	-1
Watermark	true / false	false
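
The video values and defaults in the table above can be applied as a defaults-plus-overrides merge. The sketch below encodes the table directly; the key names other than resolution and seed are hypothetical, and other providers may accept different values.

```python
# Allowed values and defaults, taken from the video parameter table.
VIDEO_OPTIONS = {
    "resolution": (["480p", "720p", "1080p"], "720p"),
    "aspect_ratio": (["21:9", "16:9", "4:3", "1:1", "3:4", "9:16", "9:21", "adaptive"], "adaptive"),
    "duration": ([5, 10], 10),
    "frame_rate": ([24], 24),
    "watermark": ([True, False], False),
}


def video_parameters(**overrides) -> dict:
    """Start from the table defaults and apply validated overrides."""
    params = {name: default for name, (_, default) in VIDEO_OPTIONS.items()}
    params["seed"] = -1  # -1 means random
    for name, value in overrides.items():
        if name == "seed":
            if value != -1 and value <= 0:
                raise ValueError("seed must be -1 or a positive integer")
        elif name not in VIDEO_OPTIONS:
            raise ValueError(f"Unknown parameter: {name}")
        elif value not in VIDEO_OPTIONS[name][0]:
            raise ValueError(f"Invalid value for {name}: {value!r}")
        params[name] = value
    return params
```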