AI Media Gen
The AI Media Generator is a comprehensive and extensible feature that enables users to create AI-generated images and videos using various AI models from multiple providers. This feature provides a unified interface for generating media content with text prompts and optional reference images, supporting both system-provided models and custom models from any compatible AI service provider.
Understanding the AI Media Generator Feature
Purpose: The AI Media Generator provides TechMaju applications with a flexible framework for leveraging various AI models to create visual content (images and videos). It supports multiple AI service providers through a unified interface, allowing organizations to choose the best models for their specific needs. The system handles both synchronous (images) and asynchronous (videos) generation processes seamlessly.
Function: This feature allows users to:
- Generate images using text-to-image (t2i) AI models from various providers
- Generate videos using text-to-video (t2v) or image-to-video (i2v) AI models
- Upload reference images for image-to-video generation
- Configure and use custom AI models from any compatible provider
- Track generation status and history across all model providers
- Download generated media files
- Monitor and cancel in-progress video generation tasks
- Browse generation history with previews and metadata
Extensibility: The system is designed to support:
- System Models: Pre-configured models ready to use out of the box
- Custom Models: Integration with any AI provider through custom API configuration
- Multiple Providers: Simultaneous use of models from different AI services
- Flexible Authentication: Support for various API key and authentication methods
Roles Required
The following roles are required to access and use the AI Media Generator:
| Role | Access Level | Description |
|---|---|---|
| TM AI Developer | Full Access | Primary role for using AI Media Generation features |
| System Manager | Full Access | Administrative access to all DocType operations |
Note: The page "AI Media Generator" is accessible to both System Manager and TM AI Developer roles.
Related DocTypes
1. TM AI Generated File
This DocType stores records of all generated media files (images and videos) regardless of which AI provider was used.
TM AI Generated File Fields
| Field | Type | Description |
|---|---|---|
| file_name | Data | Unique identifier for the generated file (auto-generated, read-only) |
| file_url | Data | URL path to the generated media file (hidden, read-only) |
| ai_model | Link | Reference to the TM AI Model used for generation (read-only) |
| type | Select | Media type: "Image" or "Video" (read-only) |
| status | Select | Generation status: Queued, Running, Cancelled, Succeeded, Failed |
| prompt | Text | Original text prompt used for generation |
| parameters | JSON | Generation parameters (resolution, guidance_scale, seed, reference images, provider-specific settings, etc.) |
| video_gen_id | Data | Task ID for video generation tracking (provider-specific) |
| file_preview | HTML | Visual preview of the generated media (displayed in form view) |
| error_message | Small Text | Error details if generation fails |
Status Indicators
| Status | Color | Description |
|---|---|---|
| Queued | Cyan | Initial state for video generation tasks waiting to start |
| Running | Blue | Video generation currently in progress |
| Cancelled | Pink | Generation task cancelled by user |
| Succeeded | Green | Generation completed successfully |
| Failed | Red | Generation failed due to an error |
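The statuses above form a small lifecycle. The following Python sketch encodes one plausible set of transitions, assuming images are recorded directly as Succeeded or Failed (they are generated synchronously) while video tasks start as Queued. The transition table is inferred from the status descriptions and is not the module's actual validation logic.

```python
from enum import Enum


class Status(str, Enum):
    """Statuses used by TM AI Generated File."""
    QUEUED = "Queued"
    RUNNING = "Running"
    CANCELLED = "Cancelled"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


# Assumed lifecycle. Queued -> Succeeded is allowed because a poll can miss
# a short Running phase. Terminal statuses have no outgoing transitions.
ALLOWED = {
    Status.QUEUED: {Status.RUNNING, Status.SUCCEEDED, Status.FAILED, Status.CANCELLED},
    Status.RUNNING: {Status.SUCCEEDED, Status.FAILED, Status.CANCELLED},
    Status.SUCCEEDED: set(),
    Status.FAILED: set(),
    Status.CANCELLED: set(),
}
TERMINAL = {status for status, targets in ALLOWED.items() if not targets}


def can_transition(current: Status, new: Status) -> bool:
    """True if a record may move from current to new (repeated polls are harmless)."""
    return new == current or new in ALLOWED[current]


def is_cancellable(status: Status) -> bool:
    """Only tasks that are still Queued or Running can be cancelled."""
    return status in (Status.QUEUED, Status.RUNNING)
```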
2. TM AI Model
This DocType defines the available AI models from various providers for different tasks, including media generation.
TM AI Model Fields
| Field | Type | Description |
|---|---|---|
| model_name | Data | Display name of the model (unique, required) |
| model_id | Data | Provider's model identifier (e.g., "seedream-3-0-t2i-250415" for BytePlus) |
| model_type | Select | Model category: LLM, Image Gen, Video Gen, VLM |
| model_input | Select | Input type: Text, Image, or Text + Image |
| model_output | Select | Output type: Text, Image, or Video |
| active | Check | Enable/disable the model for use |
| is_system_model | Check | Flag for pre-configured system models (read-only) |
| custom_model | Data | Custom model ID for third-party AI providers |
| custom_api_key | Password | API key for custom models (securely stored) |
| custom_api_base_url | Data | API base URL for custom model providers |
| application | Link | Reference to My Application |
Model Configuration Options
The system supports two types of model configurations:
1. System Models:
   - Pre-configured models from default providers (currently BytePlus in the MVP)
   - Use the model_id field as the model identifier
   - Authentication is handled through site configuration
   - Marked with is_system_model = 1
2. Custom Models:
   - Models from any third-party AI provider
   - Use the custom_model field as the model identifier
   - Use custom_api_key for authentication
   - Use custom_api_base_url for the API endpoint
   - Support OpenAI-compatible APIs or custom integrations
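The two configuration styles can be checked with a small validation routine. This is a hypothetical Python sketch, assuming a model record is available as a plain dict keyed by the field names in the TM AI Model table. It also assumes Image Gen models should output Image and Video Gen models should output Video; that matches the fields described above but is not confirmed to be enforced by the DocType.

```python
def validate_model_config(model: dict) -> list[str]:
    """Return a list of problems with a TM AI Model record (empty if valid).

    Hypothetical check; field names follow the TM AI Model field table.
    """
    errors = []
    if not model.get("model_name"):
        errors.append("model_name is required")

    if model.get("is_system_model"):
        # System models authenticate through site configuration.
        if not model.get("model_id"):
            errors.append("System models need model_id")
    else:
        # Custom models carry their own identifier, key, and endpoint.
        for field in ("custom_model", "custom_api_key", "custom_api_base_url"):
            if not model.get(field):
                errors.append(f"Custom models need {field}")
        base_url = model.get("custom_api_base_url") or ""
        if base_url and not base_url.startswith(("https://", "http://")):
            errors.append("custom_api_base_url must be an http(s) URL")

    # Assumed consistency rule between model_type and model_output.
    expected_output = {"Image Gen": "Image", "Video Gen": "Video"}.get(model.get("model_type"))
    if expected_output and model.get("model_output") != expected_output:
        errors.append(f"{model.get('model_type')} models should output {expected_output}")
    return errors
```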
Model Type Support Matrix
| Model Type | Naming Convention | Input Requirement | Use Case |
|---|---|---|---|
| Text-to-Image (t2i) | Contains "t2i" in model ID | Text only | Generate images from text prompts |
| Text-to-Video (t2v) | Contains "t2v" in model ID | Text only | Generate videos from text prompts |
| Image-to-Video (i2v) | Contains "i2v" in model ID | Text + Images (required) | Generate videos from text and reference images |
| Multimodal | model_input = "Text + Image" | Text + Images (optional) | Generate with optional reference images |
Note: The exact capabilities depend on the specific AI provider and model being used.
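The matrix can be read as a decision procedure for whether the reference-image section is required, optional, or hidden. The sketch below assumes the substring checks run on the effective model identifier (model_id for system models, custom_model for custom ones) and that an explicit "i2v", "t2i", or "t2v" marker takes precedence over model_input; the module's real precedence may differ.

```python
from enum import Enum


class ImageInput(Enum):
    REQUIRED = "required"  # i2v models
    OPTIONAL = "optional"  # Text + Image models without an explicit marker
    HIDDEN = "hidden"      # t2i / t2v models: prompt only


def effective_identifier(model: dict) -> str:
    """System models use model_id; custom models use custom_model."""
    if model.get("is_system_model"):
        return model.get("model_id", "")
    return model.get("custom_model", "")


def reference_image_mode(model_identifier: str, model_input: str) -> ImageInput:
    """Decide how the reference-image section behaves (assumed precedence)."""
    ident = model_identifier.lower()
    if "i2v" in ident:
        return ImageInput.REQUIRED
    if "t2i" in ident or "t2v" in ident:
        return ImageInput.HIDDEN
    if model_input == "Text + Image":
        return ImageInput.OPTIONAL
    return ImageInput.HIDDEN
```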
Procedure
1. Navigate to AI Media Generator
Path: Home > AI Workspace > AI Media Generator
Or use the search bar: Type "AI Media Generator" and select the page.
2. Understanding the Interface
The AI Media Generator features a split-pane layout:
- Left Pane: Current result preview and generation history table
- Right Pane: Generation controls and configuration options
- Responsive Design: Automatically adapts to mobile devices (single column on screens < 992px)
- Resizable: Drag the divider handle to adjust pane sizes
3. Selecting Media Type
At the top of the right pane, choose the media type:
- Image: For generating static images (faster, typically synchronous)
- Video: For generating video clips (slower, typically asynchronous with polling)
The interface will update to show relevant models and options for your selection.
4. Selecting an AI Model
- Click the Model dropdown
- Choose from available active models:
  - For Image mode: Only Image Gen (t2i) models are displayed
  - For Video mode: Only Video Gen (t2v/i2v) models are displayed
  - Models from different providers may be listed together
- Model capabilities (e.g., frame rate, duration) are displayed below the dropdown
Note: Available models depend on your system configuration and which providers have been set up by your administrator.
5. Providing Input (Reference Images)
Note: Input file requirements depend on the selected model type and provider capabilities.
When Input Files are REQUIRED (i2v models):
- Select the File Upload or URL Input tab
- For File Upload:
  - Click Choose Files or drag and drop
  - Supported formats: .jpg, .jpeg, .png
  - Maximum: 3 files (may vary by provider)
- For URL Input:
  - Enter an image URL and click Add
  - Repeat for multiple images (max 3)
- Preview cards show uploaded/added images
- Click × on any card to remove an image
When Input Files are OPTIONAL (Text + Image models):
- Follow the same steps as above, but images are not required
- You may proceed with text prompt only
When Input Files are HIDDEN (t2i/t2v models):
- Input file section is not displayed
- Only text prompt is needed
6. Writing the Prompt
- In the Prompt text area, enter a detailed description of the media you want to generate
- Best practices:
  - Be specific and descriptive
  - Include details about style, mood, colors, and composition
  - For videos: Describe the action or movement
- Example: "A serene sunset over a calm ocean with vibrant orange and pink clouds, photorealistic style"
Note: Optimal prompt structure may vary by AI provider and model. Consult your model's documentation for best results.
7. Configuring Parameters
Available parameters depend on the selected model and provider. Common parameters include:
For Image Generation:
- Resolution: Select from available sizes (varies by provider)
- Guidance Scale: Typically 1.0 to 10.0 (higher = more prompt adherence)
- Seed: -1 for random, or specific number for reproducibility
- Watermark: Enable/disable watermark on generated image
- Additional Parameters: Provider-specific options may be available
For Video Generation:
- Resolution: Select from available options (e.g., 480p, 720p, 1080p)
- Aspect Ratio: Various ratios depending on provider support
- Duration: Available durations in seconds
- Frame Rate: Supported frame rates (e.g., 24, 30 fps)
- Seed: -1 for random, or specific number for reproducibility
- Watermark: Enable/disable watermark on generated video
- Additional Parameters: Provider-specific options may be available
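Before a request is sent, the selected values end up in the parameters JSON stored on TM AI Generated File. The following sketch shows one way to validate image parameters. The key names and the 1.0 to 10.0 guidance range come from the descriptions above; the "WIDTHxHEIGHT" resolution format, the default values, and the seed upper bound are assumptions for illustration.

```python
import re

SIZE_PATTERN = re.compile(r"^(\d+)x(\d+)$")  # assumed "WIDTHxHEIGHT" format
MAX_SEED = 2**31 - 1  # assumed provider limit


def build_image_parameters(size: str = "1024x1024", guidance_scale: float = 2.5,
                           seed: int = -1, watermark: bool = True) -> dict:
    """Validate user input and return a parameters dict (defaults are placeholders)."""
    if not SIZE_PATTERN.match(size):
        raise ValueError(f"Resolution must look like 1024x1024, got {size!r}")
    guidance_scale = float(guidance_scale)
    if not 1.0 <= guidance_scale <= 10.0:
        raise ValueError("guidance_scale must be between 1.0 and 10.0")
    seed = int(seed)
    if seed != -1 and not 0 <= seed <= MAX_SEED:
        raise ValueError("seed must be -1 (random) or a non-negative integer")
    return {
        "size": size,
        "guidance_scale": guidance_scale,  # higher = closer prompt adherence
        "seed": seed,                      # -1 asks the provider for a random seed
        "watermark": bool(watermark),
    }
```

Keeping a concrete seed in the stored parameters is what makes a result reproducible later; a stored -1 only records that the seed was random.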
8. Generating Media
- Review all inputs: model, prompt, reference images (if applicable), parameters
- Click the Generate ✨ button
- For Images (typically synchronous):
  - A loading animation appears (border pulse effect)
  - Generation time depends on the provider (typically 30-90 seconds)
  - The image appears in the left preview pane
  - Status: "Succeeded" on completion
- For Videos (typically asynchronous):
  - The initial response returns immediately with a task ID
  - Status starts as "Queued"
  - The frontend polls the status periodically
  - Status progresses: Queued → Running → Succeeded
  - The video appears in the preview pane when completed
  - Note: Video generation time varies by provider and parameters
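The asynchronous flow amounts to a polling loop that stops at the first terminal status. The sketch below assumes a caller-supplied check_status function (for example, a hypothetical wrapper around check_video_status) that returns one of the status strings listed earlier; the interval and timeout values are placeholders. The page itself polls from the browser, so this Python version only illustrates the logic.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"Succeeded", "Failed", "Cancelled"}


def poll_until_done(check_status: Callable[[str], str], task_id: str,
                    interval: float = 5.0, timeout: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll a video task until it reaches a terminal status or the timeout expires.

    Returns the final status; raises TimeoutError if the task never finishes.
    """
    waited = 0.0
    while True:
        status = check_status(task_id)
        if status in TERMINAL_STATUSES:
            return status  # stop polling: no further requests are needed
        if waited >= timeout:
            raise TimeoutError(f"Task {task_id} still {status} after {timeout}s")
        sleep(interval)
        waited += interval
```

Injecting sleep makes the loop testable without real waiting.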
9. Viewing Results
Preview Pane:
- Images display as an HTML image element at full resolution
- Videos display with HTML5 player controls (play, pause, volume, fullscreen)
Generation Info:
- Model name and parameters displayed below preview
- Prompt text shown
- Resolution and generation timestamp
10. Downloading Generated Media
- Click the Download button below the preview
- The file is saved with a descriptive filename:
  - Pattern: {model_name}_{YYYYMMDD_HHMMSS}_{hash}.{ext}
  - Example: image_model_20260112_143022_abc12345.png
- Files are also saved in the Frappe File Manager:
  - Images: "AI Generated Images/YYYY-MM/"
  - Videos: "AI Generated Videos/YYYY-MM/"
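The filename pattern and folder layout can be reproduced as follows. This is an illustrative sketch: it assumes model_name is slugified to lowercase with underscores (consistent with the example image_model_...) and uses a random 8-character hex suffix. The documented examples (abc12345, xyz67890) suggest an 8-character suffix, but how the module actually derives it is not documented.

```python
import re
import secrets
from datetime import datetime
from typing import Optional


def slugify(name: str) -> str:
    """Lowercase and collapse anything that is not a letter or digit into '_'."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_") or "model"


def generated_file_name(model_name: str, ext: str, when: Optional[datetime] = None) -> str:
    """Build {model_name}_{YYYYMMDD_HHMMSS}_{hash}.{ext}."""
    when = when or datetime.now()
    suffix = secrets.token_hex(4)  # 8 hex characters; the real hash source is unknown
    return f"{slugify(model_name)}_{when:%Y%m%d_%H%M%S}_{suffix}.{ext.lstrip('.')}"


def generated_folder(media_type: str, when: datetime) -> str:
    """Return the File Manager folder, e.g. 'AI Generated Images/2026-01'."""
    root = {"Image": "AI Generated Images", "Video": "AI Generated Videos"}[media_type]
    return f"{root}/{when:%Y-%m}"
```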
11. Browsing Generation History
- The History table displays the last 20 generations across all models and providers
- Columns shown:
  - Preview: Thumbnail of generated media
  - Model: AI model used
  - Prompt: First 50 characters of the prompt
  - Resolution: Size or aspect ratio
  - Status: Current generation status
  - Created: Timestamp of generation
- Click on any row to reload that result into the preview pane
- Click column headers to sort the table
12. Cancelling Video Generation
For in-progress video tasks only (provider support required):
- While status is "Queued" or "Running":
  - A Cancel button appears in the preview pane
  - Alternatively, open the TM AI Generated File document and click Cancel
- Click Cancel to stop the generation
- Status changes to "Cancelled"
- No further polling occurs
Note: Cancellation support depends on the AI provider's API capabilities.
13. Accessing TM AI Generated File DocType (Advanced)
For administrative purposes or detailed tracking:
- Navigate to: Home > TechMaju AI > TM AI Generated File
- View all generated files with full metadata
- Each document shows:
  - File preview (image or video embedded)
  - All generation parameters in JSON format
  - Status tracking
  - Error messages (if failed)
  - Provider-specific task IDs (for videos)
Configuring AI Models (Administrator)
Adding System Models
For pre-configured provider models (currently BytePlus):
- Navigate to: Home > TechMaju AI > TM AI Model
- Click + Add TM AI Model
- Fill in the fields:
  - Model Name: Display name (e.g., "SeeDream 3.0 Image Generator")
  - Model ID: Provider's model identifier
  - Model Type: Select "Image Gen" or "Video Gen"
  - Model Input: Select input type requirement
  - Model Output: Select "Image" or "Video"
  - Active: Check to enable
  - Is System Model: Check for system models
- Save the document
Note: System models use API credentials configured in site configuration.
Adding Custom Models
For third-party AI providers:
- Navigate to: Home > TechMaju AI > TM AI Model
- Click + Add TM AI Model
- Fill in the fields:
  - Model Name: Display name (e.g., "OpenAI DALL-E 3")
  - Model Type: Select "Image Gen" or "Video Gen"
  - Model Input: Select input type requirement
  - Model Output: Select "Image" or "Video"
  - Active: Check to enable
  - Custom Model: Provider's model identifier
  - Custom API Key: Your API key for the provider
  - Custom API Base URL: Provider's API endpoint
- Save the document
Supported Custom Providers:
- OpenAI-compatible APIs (DALL-E, Stable Diffusion, etc.)
- Any provider with a compatible REST API
- Custom in-house AI models
Requirements for Custom Models:
- The API must support text-to-image or text-to-video generation
- The response format should be compatible (URL or base64)
- Authentication must use an API key sent in request headers
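For OpenAI-compatible providers, a compatible response typically carries either a URL or base64 data for each generated item. The sketch below assumes the OpenAI Images response shape (a data list whose items hold url or b64_json); other providers may need their own mapping, and the function name is hypothetical.

```python
import base64
from typing import Union


def extract_media(response: dict) -> list[Union[str, bytes]]:
    """Return each generated item as a URL string or decoded bytes.

    Assumes an OpenAI-style body: {"data": [{"url": ...} or {"b64_json": ...}]}.
    """
    items = response.get("data")
    if not items:
        raise ValueError(f"No media in response: {response.get('error', response)}")
    results = []
    for item in items:
        if item.get("url"):
            results.append(item["url"])  # to be downloaded and stored in File Manager
        elif item.get("b64_json"):
            results.append(base64.b64decode(item["b64_json"]))
        else:
            raise ValueError("Item has neither 'url' nor 'b64_json'")
    return results
```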
Architecture & Technical Details
File Storage Structure
Private Files/
├── AI Generated Images/
│ ├── 2026-01/
│ │ ├── model_20260112_143022_abc12345.png
│ │ └── ...
│ └── 2026-02/
└── AI Generated Videos/
├── 2026-01/
│ ├── model_20260112_150030_xyz67890.mp4
│ └── ...
└── 2026-02/
API Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
| techmaju_ai.api.ai_media_gen.get_media_models | GET | Fetch available Image Gen and Video Gen models from all providers |
| techmaju_ai.api.ai_media_gen.generate_media | POST | Initiate image or video generation using the specified model |
| techmaju_ai.api.ai_media_gen.get_generation_history | GET | Retrieve recent generation records across all models |
| techmaju_ai.api.ai_media_gen.check_video_status | GET | Poll video task status (provider-specific) |
| techmaju_ai.api.ai_media_gen.cancel_video_generation | POST | Cancel an in-progress video task (provider-specific) |
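Assuming these methods are whitelisted (as the page's own calls require), an external client can reach them through Frappe's standard /api/method/ route with token authentication. The sketch below only builds the request. The argument names passed to each method are not documented here, so the params shown in the test are placeholders.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_METHOD = "techmaju_ai.api.ai_media_gen"


def build_method_request(site_url: str, method: str, api_key: str, api_secret: str,
                         http_method: str = "GET", params: Optional[dict] = None) -> Request:
    """Build (but do not send) a request to a whitelisted Frappe method."""
    url = f"{site_url.rstrip('/')}/api/method/{BASE_METHOD}.{method}"
    headers = {"Authorization": f"token {api_key}:{api_secret}", "Accept": "application/json"}
    data = None
    if http_method == "GET" and params:
        url += "?" + urlencode(params)  # GET arguments travel in the query string
    elif http_method == "POST":
        headers["Content-Type"] = "application/json"
        data = json.dumps(params or {}).encode()
    return Request(url, data=data, headers=headers, method=http_method)
```

To send it, pass the request to urllib.request.urlopen; Frappe returns the method's result under a message key in the JSON body.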
Provider Integration Architecture
The system uses a flexible integration layer that supports multiple AI providers:
Current Implementation (MVP):
- Default Provider: BytePlus Ark API (Asia Pacific)
- Base URL: https://ark.ap-southeast.bytepluses.com/api/v3
- Authentication: Bearer token from site configuration
Extensibility:
- Custom Provider Support: Via custom_api_base_url and custom_api_key
- Multiple Providers: Different providers can be used simultaneously
- Provider-Agnostic Interface: A unified user experience regardless of the backend provider
Image Generation Flow:
1. User selects a model (system or custom)
2. System routes the request to the appropriate provider API
3. Processing is synchronous or asynchronous, depending on the provider
4. The file is downloaded and saved in unified storage
5. Metadata is stored in TM AI Generated File
Video Generation Flow:
1. User selects a model (system or custom)
2. System routes the request to the appropriate provider API
3. An asynchronous task is initiated with a provider-specific task ID
4. Status is polled via the provider's status endpoint
5. The video is downloaded when ready
6. Metadata is stored with task tracking information
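The routing step in both flows boils down to choosing a base URL, credential, and model identifier for each model record. A minimal sketch, assuming the site configuration exposes the BytePlus key under a hypothetical byteplus_api_key entry and that custom providers also accept a Bearer token (some may need a different header):

```python
from dataclasses import dataclass

BYTEPLUS_BASE_URL = "https://ark.ap-southeast.bytepluses.com/api/v3"


@dataclass(frozen=True)
class ProviderTarget:
    base_url: str
    model: str
    headers: dict


def resolve_provider(model: dict, site_config: dict) -> ProviderTarget:
    """Pick endpoint, model identifier, and auth header for a TM AI Model record."""
    if model.get("is_system_model"):
        api_key = site_config.get("byteplus_api_key")  # hypothetical config key
        base_url = BYTEPLUS_BASE_URL
        identifier = model.get("model_id", "")
    else:
        api_key = model.get("custom_api_key")
        base_url = (model.get("custom_api_base_url") or "").rstrip("/")
        identifier = model.get("custom_model", "")
    if not api_key or not base_url or not identifier:
        raise ValueError("Model is missing its API key, base URL, or model identifier")
    return ProviderTarget(base_url, identifier, {"Authorization": f"Bearer {api_key}"})
```

In a Frappe context, a Password field such as custom_api_key would normally be read with doc.get_password("custom_api_key") rather than as a plain attribute.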
Security Features
- Role-Based Access Control: All endpoints require TM AI Developer role
- Secure Credential Storage: API keys encrypted in database
- Signed URLs: Generated files use time-limited signed URLs for secure access
- Parameter Validation: All inputs sanitized and validated before API calls
- File Type Restrictions: Only safe file types accepted for uploads
- Rate Limiting: Controlled by individual provider quotas
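Time-limited signed URLs are commonly built with an HMAC over the path and an expiry timestamp. The sketch below illustrates that general technique only; it is not the signing scheme this module uses (which is not described here), and the expires and signature query parameter names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def sign_url(path: str, secret: bytes, ttl_seconds: int = 3600, now: Optional[float] = None) -> str:
    """Append an expiry and an HMAC-SHA256 signature to a file path."""
    expires = int((now if now is not None else time.time()) + ttl_seconds)
    sig = hmac.new(secret, f"{path}|{expires}".encode(), hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'expires': expires, 'signature': sig})}"


def verify_url(signed: str, secret: bytes, now: Optional[float] = None) -> bool:
    """Accept the URL only if the signature matches and it has not expired."""
    parts = urlsplit(signed)
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        sig = query["signature"][0]
    except (KeyError, ValueError):
        return False
    expected = hmac.new(secret, f"{parts.path}|{expires}".encode(), hashlib.sha256).hexdigest()
    current = now if now is not None else time.time()
    # compare_digest avoids leaking how many characters matched
    return hmac.compare_digest(sig, expected) and current < expires
```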
Best Practices
For Optimal Results
Prompt Engineering:
- Be descriptive and specific about desired output
- Include style descriptors appropriate for the model
- Specify composition, lighting, colors, and mood
- For videos: Describe the action or camera movement
- Note: Different AI models may interpret prompts differently; experiment with prompt styles
Model Selection:
- Choose models based on your specific use case and quality requirements
- System models are pre-configured and easier to use
- Custom models offer flexibility and choice of providers
- Consider cost, speed, and quality trade-offs
- Test multiple models to find the best fit for your needs
Reference Images (for i2v models):
- Use high-quality images (minimum 720p recommended)
- Ensure images are clear and well-lit
- Avoid copyrighted or sensitive content
- Maximum number of images depends on provider (typically 3)
Parameter Tuning:
- Start with default or recommended values
- A higher guidance scale typically means more prompt adherence but less creativity
- Use specific seeds to reproduce results
- Choose resolution based on use case (higher resolution gives better quality but is slower and more expensive)
- Note: Parameter effects vary by model and provider
Video Generation:
- Expect variable wait times depending on provider and parameters
- Monitor status in the history table
- Consider leaving the tab open during generation for real-time updates
- Cancel tasks that are taking unexpectedly long
For System Administration
Model Management:
- Regularly review and update available models
- Test new models in development environment before production
- Deactivate deprecated, slow, or expensive models
- Monitor which models are most popular with users
- Consider cost-per-generation when adding models
Provider Configuration:
- For system models: Configure provider API keys in site configuration
- For custom models: Securely store API keys in TM AI Model documents
- Regularly rotate API keys for security
- Monitor API quotas and usage across all providers
- Set up alerts for quota limits or API failures
Storage Management:
- Monitor "AI Generated Images" and "AI Generated Videos" folders for disk usage
- Implement cleanup policies for old files (e.g., archive after 90 days)
- Consider external storage or CDN for serving media files
- Backup important generations before cleanup
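A cleanup policy such as "archive after 90 days" maps naturally onto the YYYY-MM folder layout. This sketch, with a hypothetical function name, selects month folders whose last day falls before the retention cutoff, so a folder is only archived once every file in it is old enough.

```python
import calendar
from datetime import date, timedelta


def folders_to_archive(folder_names: list[str], today: date, retention_days: int = 90) -> list[str]:
    """Return YYYY-MM folder names whose entire month is older than the cutoff."""
    cutoff = today - timedelta(days=retention_days)
    selected = []
    for name in folder_names:
        try:
            year, month = (int(part) for part in name.split("-"))
            last_day = date(year, month, calendar.monthrange(year, month)[1])
        except ValueError:
            continue  # skip folders that do not follow the YYYY-MM convention
        if last_day < cutoff:
            selected.append(name)
    return sorted(selected)
```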
Usage Monitoring:
- Review TM AI Usage Log regularly for cost tracking across all providers
- Analyze which models and providers are most cost-effective
- Set up alerts for high usage or errors
- Monitor failed generations to identify problematic models or prompts
Performance Optimization:
- Cache frequently generated prompts if applicable
- Implement queue management for high-traffic periods
- Use CDN for serving generated media files
- Consider load balancing across multiple providers
- Monitor response times by provider and model
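Caching only makes sense for requests whose output is reproducible, which in practice means a fixed seed; with seed -1 every call is meant to produce a new result. A minimal sketch of a cache key, assuming the model, prompt, and parameters dict fully determine the output for a fixed seed (providers do not always guarantee this):

```python
import hashlib
import json
from typing import Optional


def generation_cache_key(model: str, prompt: str, parameters: dict) -> Optional[str]:
    """Return a stable cache key, or None when the request should not be cached."""
    if parameters.get("seed", -1) == -1:
        return None  # random seed: a cached result would defeat the purpose
    payload = json.dumps(
        {"model": model, "prompt": prompt.strip(), "parameters": parameters},
        sort_keys=True,  # key order must not change the hash
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```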
Security:
- Regularly rotate all provider API keys
- Audit user generations for inappropriate content
- Ensure signed URL expiration is appropriate for your use case
- Restrict TM AI Developer role to trusted users only
- Review custom model configurations for security compliance
For Development
Testing:
- Test with non-production API keys in development
- Validate all model types before deploying to production
- Test error scenarios (API failures, invalid parameters, quota exceeded)
- Verify polling logic for asynchronous providers
- Test custom provider integration thoroughly
Adding New Providers:
- Document provider-specific API requirements
- Implement adapter pattern for provider-specific logic
- Test synchronous and asynchronous generation flows
- Verify error handling and status mapping
- Update documentation with provider-specific capabilities
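One way to apply the adapter pattern is to give every provider the same small interface and let each adapter translate its provider's status vocabulary into the five statuses used by TM AI Generated File. The class and method names below are hypothetical, and the lowercase status strings stand in for whatever a given provider actually returns.

```python
from typing import Protocol


class VideoProvider(Protocol):
    """Interface every provider adapter is assumed to implement."""

    def submit(self, model: str, prompt: str, parameters: dict) -> str: ...
    def raw_status(self, task_id: str) -> str: ...


# Example mapping from a provider's vocabulary to TM AI Generated File statuses.
STATUS_MAP = {
    "queued": "Queued",
    "pending": "Queued",
    "running": "Running",
    "succeeded": "Succeeded",
    "failed": "Failed",
    "cancelled": "Cancelled",
}


def normalized_status(provider: VideoProvider, task_id: str) -> str:
    """Ask the adapter for a raw status and map it to a document status."""
    raw = provider.raw_status(task_id)
    try:
        return STATUS_MAP[raw.lower()]
    except KeyError:
        # Surface unmapped values instead of guessing, so the mapping gets fixed.
        raise ValueError(f"Unmapped provider status: {raw!r}") from None


class FakeProvider:
    """In-memory adapter for tests: no network, deterministic task IDs."""

    def __init__(self, statuses: list[str]):
        self._statuses = list(statuses)
        self.submitted: list[tuple[str, str]] = []

    def submit(self, model: str, prompt: str, parameters: dict) -> str:
        self.submitted.append((model, prompt))
        return f"task-{len(self.submitted)}"

    def raw_status(self, task_id: str) -> str:
        # Advance through the scripted statuses, then keep repeating the last one.
        return self._statuses.pop(0) if len(self._statuses) > 1 else self._statuses[0]
```

A fake adapter like this makes it possible to test the polling and error-handling paths without quota or network access.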
Custom Models:
- Use the custom_model, custom_api_key, and custom_api_base_url fields
- Ensure custom providers follow compatible API contracts
- Document any provider-specific parameter requirements
- Implement proper error handling for provider-specific errors
- Test authentication and authorization flows
Integration:
- Use exposed API endpoints to integrate with other DocTypes
- Consider embedding generation functionality in workflows
- Leverage generation history for analytics and reporting
- Build on top of the extensible model management system
Troubleshooting
Common Issues
| Issue | Possible Cause | Solution |
|---|---|---|
| "No models available" | No active Image Gen or Video Gen models configured | Activate models in TM AI Model DocType |
| "Reference images required" | Using i2v model without images | Upload at least one reference image |
| "Generation failed" | API error, quota exceeded, invalid parameters, provider issue | Check error_message field, verify API credentials and quotas |
| "Video stuck in Queued" | Provider API delay or failure | Wait, then check provider service status |
| "Download not working" | File URL expired or generation incomplete | Regenerate the media or check file_url field |
| "Permission denied" | User lacks TM AI Developer role | Assign TM AI Developer role to user |
| "Model not responding" | Custom model misconfigured or provider down | Verify custom_api_base_url and custom_api_key, check provider status |
| "Authentication failed" | Invalid or expired API key | Update API key in model configuration or site config |
Provider-Specific Issues
For System Models:
- Verify site configuration has the correct provider API key
- Check provider service status and quotas
- Review the provider's API documentation for changes
For Custom Models:
- Verify custom_api_base_url is correct and accessible
- Check that custom_api_key is valid and has sufficient quota
- Ensure the custom model ID matches the provider's model identifier
- Review provider-specific error messages in the error_message field
Getting Help
- Check generation error messages in TM AI Generated File document
- Review server logs (frappe.log); for interactive inspection, open a console with bench --site [site-name] console
- Verify provider API status and quotas
- For custom models: Test API credentials independently
- Check provider-specific documentation for parameter requirements
- Contact provider support for API-specific issues