A powerful Model Context Protocol (MCP) server for YouTube video transcription and metadata extraction. This server provides advanced tools for AI agents to retrieve video metadata and generate high-quality transcriptions with native language support.
🌟 Features
- Metadata Extraction: Retrieve comprehensive video details (title, description, view count, duration, etc.) without downloading the video.
- Smart Transcription:
  - In-Memory Processing: a fast, efficient pipeline that runs entirely in memory, with no disk I/O.
  - VAD (Voice Activity Detection): uses Silero VAD to detect speech and segment the audio precisely.
  - Multilingual Support: supports 99 languages.
  - Translation: transcribes into any supported language.
  - Caching: file-based caching avoids reprocessing videos that have already been transcribed.
- Optimized Performance:
  - Uses yt-dlp for robust extraction.
  - Hardware acceleration (MPS/CUDA) for Whisper inference.
  - Parallel processing of transcription segments.
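The metadata extraction feature can be sketched with yt-dlp's Python API. Calling `extract_info(url, download=False)` on that API returns an info dictionary without fetching any media. This is a minimal illustration, not the server's actual code. The helper names `fetch_metadata` and `summarize_metadata` are hypothetical, and so is the choice of fields to keep.

```python
from typing import Any


def summarize_metadata(info: dict[str, Any]) -> dict[str, Any]:
    """Keep a compact subset of the fields yt-dlp reports for a video.

    Missing keys become None instead of raising, because not every
    extractor fills in every field.
    """
    fields = ("id", "title", "description", "uploader",
              "duration", "view_count", "upload_date")
    return {key: info.get(key) for key in fields}


def fetch_metadata(url: str) -> dict[str, Any]:
    """Ask yt-dlp for video details without downloading the video itself."""
    import yt_dlp  # imported lazily so summarize_metadata works without yt-dlp

    options = {"quiet": True, "skip_download": True}
    with yt_dlp.YoutubeDL(options) as ydl:
        info = ydl.extract_info(url, download=False)
    return summarize_metadata(info)
```

Keeping the field selection in a separate pure function lets it be tested without network access.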
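One common way to keep audio processing in memory is to pipe bytes through ffmpeg's stdin and stdout instead of writing temporary files. The ffmpeg output is then converted to mono 16 kHz floating-point samples, the input format Whisper models expect. The sketch below assumes that approach; `decode_audio` and `pcm16_to_float` are hypothetical names. Container formats that require seeking may not decode from a pipe.

```python
import subprocess
import sys
from array import array

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio


def pcm16_to_float(raw: bytes) -> list[float]:
    """Convert little-endian signed 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    samples = array("h")
    samples.frombytes(raw[: len(raw) - len(raw) % 2])  # ignore a trailing odd byte
    if sys.byteorder == "big":
        samples.byteswap()  # s16le data is little-endian; fix order on big-endian hosts
    return [s / 32768.0 for s in samples]


def decode_audio(source: bytes) -> list[float]:
    """Decode an in-memory audio stream with ffmpeg using pipes, not temp files."""
    cmd = ["ffmpeg", "-nostdin", "-loglevel", "error", "-i", "pipe:0",
           "-f", "s16le", "-ac", "1", "-ar", str(SAMPLE_RATE), "pipe:1"]
    result = subprocess.run(cmd, input=source, capture_output=True, check=True)
    return pcm16_to_float(result.stdout)
```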
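Silero VAD's `get_speech_timestamps` returns speech spans as dictionaries with `start` and `end` sample indices. A plausible next step is to pack those spans into chunks no longer than Whisper's 30-second window. Whether the server does exactly this is an assumption. The helper `group_speech_segments` is hypothetical.

```python
from typing import TypedDict


class Span(TypedDict):
    start: int  # first sample of the span
    end: int    # sample index just past the span


def group_speech_segments(spans: list[Span], sample_rate: int = 16_000,
                          max_seconds: float = 30.0) -> list[Span]:
    """Merge consecutive speech spans into chunks no longer than max_seconds.

    Merged chunks include the short silences between spans. A single span
    longer than the limit is cut into fixed-size pieces.
    """
    limit = int(max_seconds * sample_rate)
    chunks: list[Span] = []
    for span in spans:
        start, end = span["start"], span["end"]
        # Split an overly long span so that no chunk exceeds the limit.
        while end - start > limit:
            chunks.append({"start": start, "end": start + limit})
            start += limit
        if chunks and end - chunks[-1]["start"] <= limit:
            chunks[-1]["end"] = end  # extend the current chunk to cover this span
        else:
            chunks.append({"start": start, "end": end})
    return chunks
```

Packing short spans together reduces the number of model calls. Splitting long spans keeps each call within the model's input window.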
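File-based caching needs a key that changes whenever any parameter affecting the output changes. For example, the same video transcribed with a different model or language must not reuse an old entry. The sketch below hashes those parameters into a file name and writes entries atomically. The key layout and the names `cache_key` and `cached_transcribe` are assumptions, not the server's documented cache format.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Optional


def cache_key(video_id: str, model: str, language: Optional[str], task: str) -> str:
    """Derive a stable file name from every parameter that changes the output."""
    payload = json.dumps([video_id, model, language, task])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_transcribe(cache_dir: Path, video_id: str, model: str,
                      language: Optional[str], task: str,
                      transcribe: Callable[[], str]) -> str:
    """Return a cached transcript if present; otherwise compute and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(video_id, model, language, task)}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))["text"]
    text = transcribe()
    # Write to a temporary file, then rename, so readers never see a partial entry.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"text": text}), encoding="utf-8")
    tmp.replace(path)
    return text
```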
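Parallel segment processing can be sketched with a thread pool. Threads help when the heavy work releases the GIL, as PyTorch operations generally do. Whether the server uses threads, processes, or batched inference is not stated, so treat this as an illustration only. `transcribe_segments` is a hypothetical name, and `transcribe` stands in for a per-segment model call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def transcribe_segments(segments: Sequence[T], transcribe: Callable[[T], str],
                        max_workers: int = 4) -> str:
    """Transcribe segments concurrently and join the results in original order.

    Executor.map yields results in input order even if segments finish out
    of order, so the joined transcript stays chronological.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        texts = list(pool.map(transcribe, segments))
    return " ".join(t.strip() for t in texts if t.strip())
```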
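Hardware acceleration is typically selected at startup by probing PyTorch for an available backend. The sketch below prefers CUDA, then Apple's MPS, and falls back to CPU. It assumes a PyTorch-based Whisper implementation, and `select_device` is a hypothetical helper.

```python
def select_device() -> str:
    """Pick the fastest available PyTorch device name: "cuda", "mps", or "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # without PyTorch, no accelerator can be used
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch releases.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```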