AI Media Gen

The AI Media Generator is a comprehensive and extensible feature that enables users to create AI-generated images and videos using various AI models from multiple providers. This feature provides a unified interface for generating media content with text prompts and optional reference images, supporting both system-provided models and custom models from any compatible AI service provider.

Understanding the AI Media Generator Feature

Purpose: The AI Media Generator provides TechMaju applications with a flexible framework for leveraging various AI models to create visual content (images and videos). It supports multiple AI service providers through a unified interface, allowing organizations to choose the best models for their specific needs. The system handles both synchronous (images) and asynchronous (videos) generation processes seamlessly.

Function: This feature allows users to:

  • Generate images using text-to-image (t2i) AI models from various providers
  • Generate videos using text-to-video (t2v) or image-to-video (i2v) AI models
  • Upload reference images for image-to-video generation
  • Configure and use custom AI models from any compatible provider
  • Track generation status and history across all model providers
  • Download generated media files
  • Monitor and cancel in-progress video generation tasks
  • Browse generation history with previews and metadata

Extensibility: The system is designed to support:

  • System Models: Pre-configured models ready to use out of the box
  • Custom Models: Integration with any AI provider through custom API configuration
  • Multiple Providers: Simultaneous use of models from different AI services
  • Flexible Authentication: Support for various API key and authentication methods

Roles Required

The following roles are required to access and use the AI Media Generator:

Role | Access Level | Description
TM AI Developer | Full Access | Primary role for using AI Media Generation features
System Manager | Full Access | Administrative access to all DocType operations

Note: The page "AI Media Generator" is accessible to both System Manager and TM AI Developer roles.

1. TM AI Generated File

This DocType stores records of all generated media files (images and videos) regardless of which AI provider was used.

TM AI Generated File Fields

Field | Type | Description
file_name | Data | Unique identifier for the generated file (auto-generated, read-only)
file_url | Data | URL path to the generated media file (hidden, read-only)
ai_model | Link | Reference to the TM AI Model used for generation (read-only)
type | Select | Media type: "Image" or "Video" (read-only)
status | Select | Generation status: Queued, Running, Cancelled, Succeeded, Failed
prompt | Text | Original text prompt used for generation
parameters | JSON | Generation parameters (resolution, guidance_scale, seed, reference images, provider-specific settings, etc.)
video_gen_id | Data | Task ID for video generation tracking (provider-specific)
file_preview | HTML | Visual preview of the generated media (displayed in form view)
error_message | Small Text | Error details if generation fails

Status Indicators

Status | Color | Description
Queued | Cyan | Initial state for video generation tasks waiting to start
Running | Blue | Video generation currently in progress
Cancelled | Pink | Generation task cancelled by user
Succeeded | Green | Generation completed successfully
Failed | Red | Generation failed due to an error

2. TM AI Model

This DocType defines available AI models from various providers for different tasks including media generation.

TM AI Model Fields

Field | Type | Description
model_name | Data | Display name of the model (unique, required)
model_id | Data | Provider's model identifier (e.g., "seedream-3-0-t2i-250415" for BytePlus)
model_type | Select | Model category: LLM, Image Gen, Video Gen, VLM
model_input | Select | Input type: Text, Image, or Text + Image
model_output | Select | Output type: Text, Image, or Video
active | Check | Enable/disable the model for use
is_system_model | Check | Flag for pre-configured system models (read-only)
custom_model | Data | Custom model ID for third-party AI providers
custom_api_key | Password | API key for custom models (securely stored)
custom_api_base_url | Data | API base URL for custom model providers
application | Link | Reference to My Application

Model Configuration Options

The system supports two types of model configurations:

1. System Models:
    • Pre-configured models from default providers (currently BytePlus in MVP)
    • Use the model_id field for the model identifier
    • Authentication handled through site configuration
    • Marked with is_system_model = 1

2. Custom Models:
    • Models from any third-party AI provider
    • Use custom_model for the model identifier
    • Use custom_api_key for authentication
    • Use custom_api_base_url for the API endpoint
    • Supports OpenAI-compatible APIs or custom integrations
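The split between the two configurations can be sketched as a small resolver. This is a minimal illustration under stated assumptions, not the shipped code: ModelDoc and ResolvedModel are hypothetical stand-ins for a TM AI Model record, and because the site-configuration key that holds the system provider's API key is not documented here, the sketch receives it as a site_api_key argument.

```python
from dataclasses import dataclass
from typing import Optional

# Default system provider endpoint, as described for the BytePlus MVP.
DEFAULT_SYSTEM_BASE_URL = "https://ark.ap-southeast.bytepluses.com/api/v3"


@dataclass
class ModelDoc:
    """Hypothetical subset of a TM AI Model record."""
    model_id: Optional[str] = None
    is_system_model: bool = False
    custom_model: Optional[str] = None
    custom_api_key: Optional[str] = None
    custom_api_base_url: Optional[str] = None


@dataclass
class ResolvedModel:
    model: str      # identifier sent to the provider
    api_key: str    # credential used for the call
    base_url: str   # provider endpoint


def resolve_model(doc: ModelDoc, site_api_key: Optional[str]) -> ResolvedModel:
    """Pick identifier, key, and endpoint for a system or custom model."""
    if doc.is_system_model:
        # System models: model_id plus credentials from site configuration.
        if not doc.model_id or not site_api_key:
            raise ValueError("system model needs model_id and a site-level API key")
        return ResolvedModel(doc.model_id, site_api_key, DEFAULT_SYSTEM_BASE_URL)
    # Custom models: all three custom_* fields must be filled in.
    if not (doc.custom_model and doc.custom_api_key and doc.custom_api_base_url):
        raise ValueError("custom model needs custom_model, custom_api_key and custom_api_base_url")
    return ResolvedModel(doc.custom_model, doc.custom_api_key,
                         doc.custom_api_base_url.rstrip("/"))
```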

Model Type Support Matrix

Model Type | Naming Convention | Input Requirement | Use Case
Text-to-Image (t2i) | Contains "t2i" in model ID | Text only | Generate images from text prompts
Text-to-Video (t2v) | Contains "t2v" in model ID | Text only | Generate videos from text prompts
Image-to-Video (i2v) | Contains "i2v" in model ID | Text + Images (required) | Generate videos from text and reference images
Multimodal | model_input = "Text + Image" | Text + Images (optional) | Generate with optional reference images

Note: The exact capabilities depend on the specific AI provider and model being used.
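The naming conventions in the matrix suggest how the page can decide whether to require, offer, or hide the reference-image input. The sketch below is hypothetical: it assumes the identifier checked is model_id for system models or custom_model for custom ones, and that an "i2v" identifier takes precedence over the model_input field.

```python
from enum import Enum
from typing import Optional


class ImageInput(Enum):
    REQUIRED = "required"  # i2v models: reference images must be supplied
    OPTIONAL = "optional"  # model_input == "Text + Image": images may be added
    HIDDEN = "hidden"      # t2i / t2v models: text prompt only


def media_kind(model_identifier: str) -> Optional[str]:
    """Infer the output type from the naming convention, if it is followed."""
    ident = model_identifier.lower()
    if "t2i" in ident:
        return "Image"
    if "t2v" in ident or "i2v" in ident:
        return "Video"
    return None  # convention not followed: rely on the model_output field


def reference_image_mode(model_identifier: str, model_input: str) -> ImageInput:
    # Assumption: the "i2v" naming convention wins over model_input.
    if "i2v" in model_identifier.lower():
        return ImageInput.REQUIRED
    if model_input == "Text + Image":
        return ImageInput.OPTIONAL
    return ImageInput.HIDDEN
```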

Procedure

1. Navigate to AI Media Generator

Path: Home > AI Workspace > AI Media Generator

Or use the search bar: Type "AI Media Generator" and select the page.

2. Understanding the Interface

The AI Media Generator features a split-pane layout:

  • Left Pane: Current result preview and generation history table
  • Right Pane: Generation controls and configuration options
  • Responsive Design: Automatically adapts to mobile devices (single column on screens < 992px)
  • Resizable: Drag the divider handle to adjust pane sizes

3. Selecting Media Type

  1. At the top of the right pane, choose the media type:

    • Image: For generating static images (faster, typically synchronous)
    • Video: For generating video clips (slower, typically asynchronous with polling)
  2. The interface will update to show relevant models and options for your selection.

4. Selecting an AI Model

  1. Click the Model dropdown
  2. Choose from available active models:
    • For Image mode: Only Image Gen (t2i) models are displayed
    • For Video mode: Only Video Gen (t2v/i2v) models are displayed
  3. Models from different providers may be listed together
  4. Model capabilities (e.g., frame rate, duration) are displayed below the dropdown

Note: Available models depend on your system configuration and which providers have been set up by your administrator.

5. Providing Input (Reference Images)

Note: Input file requirements depend on the selected model type and provider capabilities.

When Input Files are REQUIRED (i2v models):

  1. Select File Upload or URL Input tab
  2. For File Upload:
    • Click Choose Files or drag and drop
    • Supported formats: .jpg, .jpeg, .png
    • Maximum: 3 files (may vary by provider)
  3. For URL Input:
    • Enter image URL and click Add
    • Repeat for multiple images (max 3)
  4. Preview cards show uploaded/added images
  5. Click × on any card to remove an image

When Input Files are OPTIONAL (Text + Image models):

  • Follow the same steps as above, but images are not required
  • You may proceed with text prompt only

When Input Files are HIDDEN (t2i/t2v models):

  • Input file section is not displayed
  • Only text prompt is needed
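The input rules above (.jpg, .jpeg, or .png; at most 3 images; required only for i2v models) can be expressed as a validation helper. This is a hypothetical sketch: it checks file extensions only, so it does not prove that a URL actually serves an image, and the default limit of 3 should be overridden when a provider allows a different number.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}
MAX_REFERENCE_IMAGES = 3  # default cap; may vary by provider


def validate_reference_images(sources: list, required: bool,
                              max_images: int = MAX_REFERENCE_IMAGES) -> list:
    """Return error messages; an empty list means the input is acceptable."""
    errors = []
    if required and not sources:
        errors.append("Reference images required")
    if len(sources) > max_images:
        errors.append(f"At most {max_images} reference images are allowed")
    for src in sources:
        # URLs are checked on their path (query string ignored); uploads on their name.
        path = urlparse(src).path if "://" in src else src
        if PurePosixPath(path).suffix.lower() not in ALLOWED_EXTENSIONS:
            errors.append(f"Unsupported format for {src!r}")
    return errors
```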

6. Writing the Prompt

  1. In the Prompt text area, enter a detailed description of the media you want to generate
  2. Best practices:
    • Be specific and descriptive
    • Include details about style, mood, colors, composition
    • For videos: Describe the action or movement
    • Example: "A serene sunset over a calm ocean with vibrant orange and pink clouds, photorealistic style"

Note: Optimal prompt structure may vary by AI provider and model. Consult your model's documentation for best results.

7. Configuring Parameters

Available parameters depend on the selected model and provider. Common parameters include:

For Image Generation:

  • Resolution: Select from available sizes (varies by provider)
  • Guidance Scale: Typically 1.0 to 10.0 (higher = more prompt adherence)
  • Seed: -1 for random, or specific number for reproducibility
  • Watermark: Enable/disable watermark on generated image
  • Additional Parameters: Provider-specific options may be available

For Video Generation:

  • Resolution: Select from available options (e.g., 480p, 720p, 1080p)
  • Aspect Ratio: Various ratios depending on provider support
  • Duration: Available durations in seconds
  • Frame Rate: Supported frame rates (e.g., 24, 30 fps)
  • Seed: -1 for random, or specific number for reproducibility
  • Watermark: Enable/disable watermark on generated video
  • Additional Parameters: Provider-specific options may be available
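A sketch of how these controls might be turned into a parameters payload before the API call. The key names mirror the parameters stored on TM AI Generated File, but the helper functions, the capabilities dictionary (standing in for the capabilities shown under the Model dropdown), and the rule that provider-specific extras may not override core keys are assumptions for illustration; providers may expect different key names.

```python
from typing import Optional

GUIDANCE_RANGE = (1.0, 10.0)  # "typically 1.0 to 10.0"; confirm per model


def _check_seed(seed: int) -> None:
    # -1 asks for a random seed; any other value must be non-negative.
    if seed != -1 and seed < 0:
        raise ValueError("seed must be -1 (random) or a non-negative integer")


def _merge_extra(params: dict, extra: Optional[dict]) -> dict:
    # Provider-specific options are added but may not silently override core keys.
    if extra:
        clash = set(extra) & set(params)
        if clash:
            raise ValueError(f"extra parameters override core keys: {sorted(clash)}")
        params = {**params, **extra}
    return params


def build_image_params(resolution: str, guidance_scale: float, watermark: bool,
                       seed: int = -1, extra: Optional[dict] = None) -> dict:
    low, high = GUIDANCE_RANGE
    if not low <= guidance_scale <= high:
        raise ValueError(f"guidance_scale must be between {low} and {high}")
    _check_seed(seed)
    params = {"resolution": resolution, "guidance_scale": guidance_scale,
              "seed": seed, "watermark": watermark}
    return _merge_extra(params, extra)


def build_video_params(capabilities: dict, resolution: str, aspect_ratio: str,
                       duration: int, frame_rate: int, watermark: bool,
                       seed: int = -1, extra: Optional[dict] = None) -> dict:
    params = {"resolution": resolution, "aspect_ratio": aspect_ratio,
              "duration": duration, "frame_rate": frame_rate}
    # Only accept values the selected model advertises.
    for name, value in params.items():
        allowed = capabilities.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} not supported; choose from {allowed}")
    _check_seed(seed)
    params.update(seed=seed, watermark=watermark)
    return _merge_extra(params, extra)
```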

8. Generating Media

  1. Review all inputs: model, prompt, reference images (if applicable), parameters
  2. Click the Generate ✨ button
  3. For Images (typically synchronous):
    • Loading animation appears (border pulse effect)
    • Generation completes based on provider speed (typically 30-90 seconds)
    • Image appears in the left preview pane
    • Status: "Succeeded" on completion
  4. For Videos (typically asynchronous):
    • Initial response is immediate with task ID
    • Status starts as "Queued"
    • Frontend polls status periodically
    • Status progresses: Queued → Running → Succeeded
    • Video appears in preview pane when completed
    • Note: Video generation time varies by provider and parameters

9. Viewing Results

  1. Preview Pane:

    • Images display at full resolution
    • Videos display with HTML5 player controls (play, pause, volume, fullscreen)
  2. Generation Info:

    • Model name and parameters displayed below preview
    • Prompt text shown
    • Resolution and generation timestamp

10. Downloading Generated Media

  1. Click the Download button below the preview
  2. File saves with smart filename format:
    • Pattern: {model_name}_{YYYYMMDD_HHMMSS}_{hash}.{ext}
    • Example: image_model_20260112_143022_abc12345.png
  3. Files are also saved in Frappe File Manager:
    • Images: "AI Generated Images/YYYY-MM/"
    • Videos: "AI Generated Videos/YYYY-MM/"

11. Browsing Generation History

  1. The History table displays the last 20 generations across all models and providers
  2. Columns shown:
    • Preview: Thumbnail of generated media
    • Model: AI model used
    • Prompt: First 50 characters of the prompt
    • Resolution: Size or aspect ratio
    • Status: Current generation status
    • Created: Timestamp of generation
  3. Click on any row to reload that result into the preview pane
  4. Click column headers to sort the table

12. Cancelling Video Generation

For in-progress video tasks only (provider support required):

  1. While status is "Queued" or "Running":
    • A Cancel button appears in the preview pane
    • Or open the TM AI Generated File document and click Cancel
  2. Click Cancel to stop the generation
  3. Status changes to "Cancelled"
  4. No further polling occurs

Note: Cancellation support depends on the AI provider's API capabilities.
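The status lifecycle and the cancel rule can be modelled as a small state machine. The transition table is an interpretation of the statuses described on this page, not the shipped logic: it lets Queued jump straight to a terminal state because a poll can miss the Running phase, and provider_cancel is a hypothetical callback that returns False when the provider cannot cancel.

```python
# Allowed status changes for a generation record (interpretation, not shipped logic).
ALLOWED_TRANSITIONS = {
    "Queued": {"Running", "Succeeded", "Failed", "Cancelled"},
    "Running": {"Succeeded", "Failed", "Cancelled"},
    "Succeeded": set(),  # terminal
    "Failed": set(),     # terminal
    "Cancelled": set(),  # terminal: no further polling
}
CANCELLABLE = {"Queued", "Running"}


def transition(current: str, new: str) -> str:
    """Return the new status, or raise if the change is not allowed."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal status change {current} -> {new}")
    return new


def cancel(current: str, task_id: str, provider_cancel) -> str:
    """Cancel an in-progress video task and return the resulting status."""
    if current not in CANCELLABLE:
        raise ValueError(f"cannot cancel a task in status {current}")
    # provider_cancel(task_id) -> bool; False means the provider cannot cancel.
    if not provider_cancel(task_id):
        raise RuntimeError("provider does not support or rejected cancellation")
    return transition(current, "Cancelled")
```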

13. Accessing TM AI Generated File DocType (Advanced)

For administrative purposes or detailed tracking:

  1. Navigate to: Home > TechMaju AI > TM AI Generated File
  2. View all generated files with full metadata
  3. Each document shows:
    • File preview (image or video embedded)
    • All generation parameters in JSON format
    • Status tracking
    • Error messages (if failed)
    • Provider-specific task IDs (for videos)

Configuring AI Models (Administrator)

Adding System Models

For pre-configured provider models (currently BytePlus):

  1. Navigate to: Home > TechMaju AI > TM AI Model
  2. Click + Add TM AI Model
  3. Fill in the fields:
    • Model Name: Display name (e.g., "SeeDream 3.0 Image Generator")
    • Model ID: Provider's model identifier
    • Model Type: Select "Image Gen" or "Video Gen"
    • Model Input: Select input type requirement
    • Model Output: Select "Image" or "Video"
    • Active: Check to enable
    • Is System Model: Check for system models
  4. Save the document

Note: System models use API credentials configured in site configuration.

Adding Custom Models

For third-party AI providers:

  1. Navigate to: Home > TechMaju AI > TM AI Model
  2. Click + Add TM AI Model
  3. Fill in the fields:
    • Model Name: Display name (e.g., "OpenAI DALL-E 3")
    • Model Type: Select "Image Gen" or "Video Gen"
    • Model Input: Select input type requirement
    • Model Output: Select "Image" or "Video"
    • Active: Check to enable
    • Custom Model: Provider's model identifier
    • Custom API Key: Your API key for the provider
    • Custom API Base URL: Provider's API endpoint
  4. Save the document

Supported Custom Providers:

  • OpenAI-compatible APIs (DALL-E, Stable Diffusion, etc.)
  • Any provider with a compatible REST API
  • Custom in-house AI models

Requirements for Custom Models:

  • API must support text-to-image or text-to-video generation
  • Response format should be compatible (URL or base64)
  • Authentication via API key in request headers
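For an OpenAI-compatible provider, an image request typically goes to {base_url}/images/generations with the key in an Authorization: Bearer header, and each result in the response's data list carries either a url or a b64_json field. The sketch below assumes that contract (it follows OpenAI's Images API); other providers may use different routes or fields, the function names are hypothetical, and video generation is not covered.

```python
import base64
import json
import urllib.request
from typing import Optional, Tuple


def build_image_request(base_url: str, api_key: str, model: str, prompt: str,
                        size: Optional[str] = None) -> Tuple[str, dict, bytes]:
    """Return (url, headers, body) for an OpenAI-style image generation call."""
    url = base_url.rstrip("/") + "/images/generations"
    headers = {
        "Authorization": f"Bearer {api_key}",  # API key sent as a header
        "Content-Type": "application/json",
    }
    payload = {"model": model, "prompt": prompt}
    if size:
        payload["size"] = size
    return url, headers, json.dumps(payload).encode("utf-8")


def send(url: str, headers: dict, body: bytes, timeout: float = 120.0) -> dict:
    """POST the request and decode the JSON response (performs network I/O)."""
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def extract_image(response_json: dict) -> Tuple[str, object]:
    """Return ("url", str) or ("bytes", bytes) for the first generated image."""
    item = response_json["data"][0]
    if item.get("url"):
        return "url", item["url"]
    if item.get("b64_json"):
        return "bytes", base64.b64decode(item["b64_json"])
    raise ValueError("response contains neither url nor b64_json")
```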

Architecture & Technical Details

File Storage Structure

Private Files/
├── AI Generated Images/
│   ├── 2026-01/
│   │   ├── model_20260112_143022_abc12345.png
│   │   └── ...
│   └── 2026-02/
└── AI Generated Videos/
    ├── 2026-01/
    │   ├── model_20260112_150030_xyz67890.mp4
    │   └── ...
    └── 2026-02/

API Endpoints

Endpoint | Method | Purpose
techmaju_ai.api.ai_media_gen.get_media_models | GET | Fetch available Image Gen and Video Gen models from all providers
techmaju_ai.api.ai_media_gen.generate_media | POST | Initiate image or video generation using specified model
techmaju_ai.api.ai_media_gen.get_generation_history | GET | Retrieve recent generation records across all models
techmaju_ai.api.ai_media_gen.check_video_status | GET | Poll video task status (provider-specific)
techmaju_ai.api.ai_media_gen.cancel_video_generation | POST | Cancel in-progress video task (provider-specific)
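Frappe exposes whitelisted server methods at /api/method/<dotted.path>, so these endpoints can also be called from outside the page, for example with token authentication (Authorization: token <api_key>:<api_secret>) for a user holding the TM AI Developer role. This is a sketch: the request arguments of each endpoint are not documented here, so the task_id and prompt keys used with it are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

METHOD_PREFIX = "techmaju_ai.api.ai_media_gen."


def build_call(site_url: str, method: str, api_key: str, api_secret: str,
               params: Optional[dict] = None, post: bool = False) -> urllib.request.Request:
    """Build a request for one of the endpoints listed above."""
    url = f"{site_url.rstrip('/')}/api/method/{METHOD_PREFIX}{method}"
    headers = {
        "Authorization": f"token {api_key}:{api_secret}",  # Frappe token auth
        "Accept": "application/json",
    }
    params = params or {}
    if post:
        headers["Content-Type"] = "application/json"
        body = json.dumps(params).encode("utf-8")
        return urllib.request.Request(url, data=body, headers=headers, method="POST")
    if params:
        url += "?" + urllib.parse.urlencode(params)  # GET arguments as a query string
    return urllib.request.Request(url, headers=headers, method="GET")


def call(req: urllib.request.Request, timeout: float = 30.0):
    """Send the request; Frappe wraps the method's return value in "message"."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp).get("message")
```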

Provider Integration Architecture

The system uses a flexible integration layer that supports multiple AI providers:

Current Implementation (MVP):

  • Default Provider: BytePlus Ark API (Asia Pacific)
  • Base URL: https://ark.ap-southeast.bytepluses.com/api/v3
  • Authentication: Bearer token from site configuration

Extensibility:

  • Custom Provider Support: Via custom_api_base_url and custom_api_key
  • Multiple Providers: Different providers can be used simultaneously
  • Provider-Agnostic Interface: Unified user experience regardless of backend provider

Image Generation Flow:

  1. User selects a model (system or custom)
  2. System routes the request to the appropriate provider API
  3. Processing is synchronous or asynchronous, depending on the provider
  4. File is downloaded and saved in unified storage
  5. Metadata is stored in TM AI Generated File

Video Generation Flow:

  1. User selects a model (system or custom)
  2. System routes the request to the appropriate provider API
  3. Asynchronous task is initiated with a provider-specific task ID
  4. Status is polled via the provider's status endpoint
  5. Video is downloaded when ready
  6. Metadata is stored with task tracking information
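Both flows route through a provider-specific layer. One way to structure that layer is an adapter per provider behind a common interface, sketched below with two in-memory fake providers so it runs without network access. The class names, the registry keyed by "system" or by custom_api_base_url, and the method signatures are all hypothetical, not the module's actual design.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class GenerationResult:
    status: str                     # "Succeeded" (sync) or "Queued" (async)
    file_url: Optional[str] = None  # set when media is available immediately
    task_id: Optional[str] = None   # set when the provider starts an async task


class ProviderAdapter(ABC):
    """Common interface every provider adapter implements."""

    @abstractmethod
    def generate(self, model: str, prompt: str, params: dict) -> GenerationResult:
        ...

    def check_status(self, task_id: str) -> str:
        raise NotImplementedError("provider has no async status endpoint")

    def cancel(self, task_id: str) -> bool:
        return False  # default: cancellation not supported


class FakeSyncImageProvider(ProviderAdapter):
    """Stand-in for a provider that returns the image in the same call."""

    def generate(self, model, prompt, params):
        return GenerationResult(status="Succeeded", file_url=f"generated/{model}.png")


class FakeAsyncVideoProvider(ProviderAdapter):
    """Stand-in for a provider that returns a task ID to poll."""

    def __init__(self):
        self._tasks: Dict[str, str] = {}

    def generate(self, model, prompt, params):
        task_id = f"task-{len(self._tasks) + 1}"
        self._tasks[task_id] = "Queued"
        return GenerationResult(status="Queued", task_id=task_id)

    def check_status(self, task_id):
        return self._tasks[task_id]

    def cancel(self, task_id):
        self._tasks[task_id] = "Cancelled"
        return True


def route(is_system_model: bool, custom_api_base_url: Optional[str],
          registry: Dict[str, ProviderAdapter]) -> ProviderAdapter:
    """Pick the adapter: one shared system adapter, else one per base URL."""
    key = "system" if is_system_model else (custom_api_base_url or "")
    if key not in registry:
        raise LookupError(f"no adapter registered for {key!r}")
    return registry[key]
```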

Security Features

  • Role-Based Access Control: All endpoints require TM AI Developer role
  • Secure Credential Storage: API keys encrypted in database
  • Signed URLs: Generated files use time-limited signed URLs for secure access
  • Parameter Validation: All inputs sanitized and validated before API calls
  • File Type Restrictions: Only safe file types accepted for uploads
  • Rate Limiting: Controlled by individual provider quotas

Best Practices

For Optimal Results

  1. Prompt Engineering:

    • Be descriptive and specific about desired output
    • Include style descriptors appropriate for the model
    • Specify composition, lighting, colors, and mood
    • For videos: Describe the action or camera movement
    • Note: Different AI models may interpret prompts differently, so experiment with prompt styles
  2. Model Selection:

    • Choose models based on your specific use case and quality requirements
    • System models are pre-configured and easier to use
    • Custom models offer flexibility and choice of providers
    • Consider cost, speed, and quality trade-offs
    • Test multiple models to find the best fit for your needs
  3. Reference Images (for i2v models):

    • Use high-quality images (minimum 720p recommended)
    • Ensure images are clear and well-lit
    • Avoid copyrighted or sensitive content
    • Maximum number of images depends on provider (typically 3)
  4. Parameter Tuning:

    • Start with default or recommended values
    • Higher guidance scale typically = more prompt adherence but less creativity
    • Use specific seeds to reproduce results
    • Choose resolution based on use case (higher = better quality but slower/more expensive)
    • Note: Parameter effects vary by model and provider
  5. Video Generation:

    • Expect variable wait times depending on provider and parameters
    • Monitor status in history table
    • Consider leaving tab open during generation for real-time updates
    • Cancel tasks that are taking unexpectedly long

For System Administration

  1. Model Management:

    • Regularly review and update available models
    • Test new models in development environment before production
    • Deactivate deprecated, slow, or expensive models
    • Monitor which models are most popular with users
    • Consider cost-per-generation when adding models
  2. Provider Configuration:

    • For system models: Configure provider API keys in site configuration
    • For custom models: Securely store API keys in TM AI Model documents
    • Regularly rotate API keys for security
    • Monitor API quotas and usage across all providers
    • Set up alerts for quota limits or API failures
  3. Storage Management:

    • Monitor "AI Generated Images" and "AI Generated Videos" folders for disk usage
    • Implement cleanup policies for old files (e.g., archive after 90 days)
    • Consider external storage or CDN for serving media files
    • Backup important generations before cleanup
  4. Usage Monitoring:

    • Review TM AI Usage Log regularly for cost tracking across all providers
    • Analyze which models and providers are most cost-effective
    • Set up alerts for high usage or errors
    • Monitor failed generations to identify problematic models or prompts
  5. Performance Optimization:

    • Cache frequently generated prompts if applicable
    • Implement queue management for high-traffic periods
    • Use CDN for serving generated media files
    • Consider load balancing across multiple providers
    • Monitor response times by provider and model
  6. Security:

    • Regularly rotate all provider API keys
    • Audit user generations for inappropriate content
    • Ensure signed URL expiration is appropriate for your use case
    • Restrict TM AI Developer role to trusted users only
    • Review custom model configurations for security compliance

For Development

  1. Testing:

    • Test with non-production API keys in development
    • Validate all model types before deploying to production
    • Test error scenarios (API failures, invalid parameters, quota exceeded)
    • Verify polling logic for asynchronous providers
    • Test custom provider integration thoroughly
  2. Adding New Providers:

    • Document provider-specific API requirements
    • Implement adapter pattern for provider-specific logic
    • Test synchronous and asynchronous generation flows
    • Verify error handling and status mapping
    • Update documentation with provider-specific capabilities
  3. Custom Models:

    • Use the custom_model, custom_api_key, and custom_api_base_url fields
    • Ensure custom providers follow compatible API contracts
    • Document any provider-specific parameter requirements
    • Implement proper error handling for provider-specific errors
    • Test authentication and authorization flows
  4. Integration:

    • Use exposed API endpoints to integrate with other DocTypes
    • Consider embedding generation functionality in workflows
    • Leverage generation history for analytics and reporting
    • Build on top of the extensible model management system

Troubleshooting

Common Issues

Issue | Possible Cause | Solution
"No models available" | No active Image Gen or Video Gen models configured | Activate models in the TM AI Model DocType
"Reference images required" | Using an i2v model without images | Upload at least one reference image
"Generation failed" | API error, quota exceeded, invalid parameters, or provider issue | Check the error_message field; verify API credentials and quotas
"Video stuck in Queued" | Provider API delay or failure | Wait, then check provider service status
"Download not working" | File URL expired or generation incomplete | Regenerate the media or check the file_url field
"Permission denied" | User lacks the TM AI Developer role | Assign the TM AI Developer role to the user
"Model not responding" | Custom model misconfigured or provider down | Verify custom_api_base_url and custom_api_key; check provider status
"Authentication failed" | Invalid or expired API key | Update the API key in the model configuration or site config

Provider-Specific Issues

For System Models:

  • Verify the site configuration has the correct provider API key
  • Check provider service status and quotas
  • Review the provider's API documentation for changes

For Custom Models:

  • Verify custom_api_base_url is correct and accessible
  • Check that custom_api_key is valid and has sufficient quota
  • Ensure custom_model matches the provider's model identifier
  • Review provider-specific error messages in the error_message field

Getting Help

  • Check generation error messages in TM AI Generated File document
  • Review server logs: check frappe.log in the bench logs directory; bench --site [site-name] console opens an interactive Python shell for inspecting records
  • Verify provider API status and quotas
  • For custom models: Test API credentials independently
  • Check provider-specific documentation for parameter requirements
  • Contact provider support for API-specific issues