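From a One-Line Topic to a Finished YouTube Video: An AI Content Automation Pipeline

[Unverified]

📌 0. Series

Part	Title	Difficulty	Key technologies
Project 1	AI character and profile images from 10 photos	⭐⭐⭐	FLUX.1 dev · LoRA · ComfyUI
Project 2	Clone your own voice: an automated YouTube narrator	⭐⭐⭐	F5-TTS · Kokoro · Sesame CSM-1B
Project 3	One product image → a 15-second ad video, automatically	⭐⭐⭐⭐	Wan2.2 · HunyuanVideo 1.5 · LTX-2
Project 4	Automated factory defect inspection: NG/OK detection + robot coordinate extraction	⭐⭐~⭐⭐⭐⭐⭐	YOLOv12 · OpenCV · RealSense
Project 5	Video mood analysis → automatic BGM generation & sync	⭐⭐⭐	MusicGen · AudioCraft · Stable Audio Open
Project 6	From a one-line topic to a finished YouTube video: an AI content automation pipeline	⭐⭐⭐⭐	LangGraph · CrewAI · AutoGen 0.4
Project 7	AI that writes from photos: product detail pages written by Vision LLMs	⭐⭐⭐⭐	Qwen2.5-VL · InternVL3 · LLaVA-Next
Project 8	Let AI read your PDFs: building an internal knowledge RAG chatbot	⭐⭐⭐	LlamaIndex · ChromaDB · Qdrant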

📌 1. Introduction

What we'll build

Follow this post to the end, and a single line of topic input will automatically produce the result below.

Input: "Top 5 AI Trends of 2026"
   ↓
LangGraph + CrewAI agent orchestration
   │
   ├─ [Agent 1] Script writing → script.json
   ├─ [Agent 2] Thumbnail      → thumbnail.jpg  (uses Project 1)
   ├─ [Agent 3] Dubbing        → narration.wav  (uses Project 2)
   ├─ [Agent 4] Intro video    → intro.mp4      (uses Project 3)
   └─ [Agent 5] BGM generation → bgm.wav        (uses Project 5)
   ↓
Automatic FFmpeg merge
   ↓
Output: final_youtube.mp4 (ready to upload)

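What is agent orchestration? A way of connecting multiple AIs

Single AI call:
  Human → AI call → result
  (a human supplies every input by hand)

Agent orchestration:
  Human → orchestrator (LangGraph/CrewAI)
              ↓
      [Agent A] output → [Agent B] output → [Agent C]
              ↓
         final deliverable

Key concepts:
  Agent:  an AI that performs a specific role (scriptwriter, video editor, etc.)
  Task:   a concrete job assigned to an agent
  State:  the current work state shared between agents
  Edge:   execution order / conditional branching between agents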
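To make these four concepts concrete before bringing in a framework, here is a toy orchestrator in plain Python. It is a minimal sketch, not LangGraph's or CrewAI's API: `run_graph`, the mock agents and the `edges` routing table are hypothetical names. Each node (agent) returns a partial update that is merged into the shared state, and each edge is a function that picks the next node, or `None` to stop.

```python
from typing import Callable, Dict, Optional

State = Dict[str, object]
Node = Callable[[State], State]             # an "agent": reads state, returns updates
Router = Callable[[State], Optional[str]]   # an "edge": picks the next node or None


def run_graph(nodes: Dict[str, Node], edges: Dict[str, Router],
              entry: str, state: State, max_steps: int = 20) -> State:
    """Run nodes starting at `entry`, merging each node's update into the shared state."""
    current: Optional[str] = entry
    steps = 0
    while current is not None:
        if steps == max_steps:                       # guard against endless loops
            raise RuntimeError("max_steps exceeded: the graph may be looping")
        state = {**state, **nodes[current](state)}   # merge the partial update
        current = edges[current](state)              # conditional edge
        steps += 1
    return state


# Mock agents standing in for the real LLM / image models (hypothetical)
def write_script(state: State) -> State:
    return {"script": f"Script about {state['topic']}"}


def make_thumbnail(state: State) -> State:
    return {"thumbnail": "thumbnail.jpg"}


nodes = {"script": write_script, "thumbnail": make_thumbnail}
edges = {
    # only continue to the thumbnail if a script was produced
    "script": lambda s: "thumbnail" if s.get("script") else None,
    "thumbnail": lambda s: None,
}

result = run_graph(nodes, edges, "script", {"topic": "AI trends"})
print(result)
```

The `max_steps` guard plays the same role as LangGraph's `recursion_limit`, which Section 8 uses to stop runaway loops.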

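💡 Choosing between LangGraph and CrewAI:

LangGraph → when you want to control the execution flow yourself or need complex conditional branching

CrewAI → when you want to assemble a multi-agent team quickly and prototype in roughly 20 to 30 lines of code

This post implements both approaches.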

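📌 2. Environment Setup

2-1. Prerequisites from Projects 1~5

This post assumes you have already completed the following projects (Project 4 is not needed):

✅ Project 1: FLUX.1 dev + LoRA training completed
   └─ ./output/lora/my_character_v1.safetensors exists

✅ Project 2: F5-TTS installed
   └─ ./my_voice_sample.wav exists
   └─ ./my_voice_ref_text.txt exists

✅ Project 3: ComfyUI + Wan2.2 installed
   └─ ComfyUI is running at http://127.0.0.1:8188

✅ Project 5: MusicGen (AudioCraft) installed
   └─ the audiocraft package is installed

If any of these are missing:
  → Complete the corresponding project first, then come back.
  → Or replace each agent with a mock (fake output) and learn the pipeline structure first.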
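A quick way to see which prerequisites are still missing is a preflight check. The sketch below is a hypothetical helper, not part of any project in the series: `preflight`, `service_up` and the lists are assumed names. It only checks that the Project 2 files exist, that the F5-TTS and AudioCraft packages are importable, and that something answers HTTP at the ComfyUI and Ollama addresses used in this post. The Project 1 LoRA is left out because the thumbnail agent treats it as optional.

```python
import importlib.util
import urllib.error
import urllib.request
from pathlib import Path

REQUIRED_FILES = ["my_voice_sample.wav", "my_voice_ref_text.txt"]   # Project 2
REQUIRED_MODULES = ["f5_tts", "audiocraft"]                         # Projects 2 and 5
SERVICES = {
    "ComfyUI (Project 3)": "http://127.0.0.1:8188",
    "Ollama": "http://localhost:11434",
}


def service_up(url: str, timeout: float = 2.0) -> bool:
    """True if anything answers HTTP at `url` (even with an error status)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True                  # the server responded, just not with 200
    except OSError:                  # URLError, refused connection, timeout
        return False


def preflight(base_dir: str = ".", check_services: bool = True,
              check_modules: bool = True) -> list:
    """Return human-readable problems; an empty list means ready to run."""
    base = Path(base_dir)
    problems = [f"missing file: {name}" for name in REQUIRED_FILES
                if not (base / name).exists()]
    if check_modules:
        problems += [f"package not importable: {mod}" for mod in REQUIRED_MODULES
                     if importlib.util.find_spec(mod) is None]
    if check_services:
        problems += [f"{name} not reachable at {url}"
                     for name, url in SERVICES.items() if not service_up(url)]
    return problems
```

Running `print(preflight())` from the project root before starting the pipeline lists anything still missing; the items it reports are the ones to mock first.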

2-2. Installing LangGraph, CrewAI, and Ollama

# LangGraph + LangChain
pip install langgraph langchain langchain-community
pip install langchain-ollama    # Ollama LLM integration

# CrewAI
pip install crewai crewai-tools

# Ollama (local LLM runner)
# Linux:
curl -fsSL https://ollama.com/install.sh | sh

# macOS / Windows:
# Download the installer from https://ollama.com/download

# Download Ollama models
ollama pull llama3.2            # 3B lightweight (4 GB RAM)
ollama pull qwen2.5:14b         # 14B high quality (16 GB RAM)
ollama pull qwen2.5:7b          # 7B balanced (8 GB RAM)

# Start the Ollama server (skip if the installer already runs it as a service)
ollama serve                    # http://localhost:11434

# Additional packages
pip install diffusers transformers accelerate pillow    # thumbnail generation (FLUX)
pip install soundfile numpy requests                    # audio processing, ComfyUI API calls

# FFmpeg (ffmpeg and ffprobe) must be installed separately and be on PATH

Ollama model selection guide:

RAM 4~8 GB:   llama3.2:3b    → average script quality, fast
RAM 8~16 GB:  qwen2.5:7b     → good script quality, balanced
RAM 16 GB+:   qwen2.5:14b    → high script quality, slower
RAM 32 GB+:   qwen2.5:32b    → expert-level scripts
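The guide above can be encoded as a small lookup so a script can pick a model automatically. This is a hypothetical helper (`pick_ollama_model` is not an Ollama API); the thresholds are exactly the RAM tiers listed above, and the returned tag is what you would pass to `ollama pull` or `create_script_agent`.

```python
# (minimum RAM in GB, Ollama model tag), checked from the largest tier down
MODEL_BY_MIN_RAM = [
    (32, "qwen2.5:32b"),
    (16, "qwen2.5:14b"),
    (8, "qwen2.5:7b"),
    (4, "llama3.2:3b"),
]


def pick_ollama_model(ram_gb: float) -> str:
    """Return the largest model tag whose RAM tier fits in `ram_gb`."""
    for min_ram, tag in MODEL_BY_MIN_RAM:
        if ram_gb >= min_ram:
            return tag
    raise ValueError(f"{ram_gb} GB RAM is below the 4 GB minimum in the guide")


print(pick_ollama_model(16))
```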

📌 3. Overall Pipeline Architecture

Topic input: "Top 5 AI Trends of 2026"
      │
      ▼
┌─────────────────────────────────────────┐
│         LangGraph State Manager          │
│   (global state & agent orchestration)   │
└─────────────────────────────────────────┘
      │
      ▼
[Node 1] Script-writing agent (Ollama + Qwen2.5)
  Output: script.json (per-scene script + timeline)
      │
      ▼ (parallelizable)
      ├──────────────────────────────────┐
      │                                  │
[Node 2] Thumbnail generation     [Node 3] Dubbing
  (FLUX.1 dev)                    (F5-TTS)
  Output: thumbnail.jpg           Output: narration.wav
      │                                  │
      ▼                                  │
[Node 4] Intro video generation          │
  (Wan2.2 / ComfyUI)                     │
  Output: intro.mp4                      │
      │                                  │
      ▼                                  │
[Node 5] BGM generation                  │
  (MusicGen)                             │
  Output: bgm.wav                        │
      │                                  │
      └────────────────┬─────────────────┘
                       │
                       ▼
               [Node 6] FFmpeg merge
                 Output: final_youtube.mp4
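The arrows are easier to reason about as a dependency table. The sketch below uses Python's standard `graphlib` to group the six nodes into stages whose members could run concurrently. The dependencies are an assumption read off the node functions in Section 5: the intro video needs the thumbnail image, and BGM needs the narration length from dubbing, so in code BGM depends on dubbing rather than on the intro video as drawn. `DEPENDS_ON` and `parallel_stages` are hypothetical names.

```python
from graphlib import TopologicalSorter

DEPENDS_ON = {
    "script": set(),
    "thumbnail": {"script"},               # needs thumbnail_prompt and title
    "dubbing": {"script"},                 # needs the scene narrations
    "intro": {"script", "thumbnail"},      # image-to-video starts from the thumbnail
    "bgm": {"script", "dubbing"},          # needs bgm_mood and the narration length
    "merge": {"intro", "dubbing", "bgm"},
}


def parallel_stages(graph: dict) -> list:
    """Group nodes into stages; nodes in the same stage do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


for i, stage in enumerate(parallel_stages(DEPENDS_ON), 1):
    print(f"stage {i}: {stage}")
```

It prints four stages: the script alone, then dubbing and thumbnail together, then BGM and intro together, and finally the merge.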

📌 4. Building Each Agent

4-1. Script-Writing Agent

# agents/script_agent.py
from langchain_ollama import ChatOllama
from langchain_core.messages import HumanMessage, SystemMessage
import json

def create_script_agent(model_name="qwen2.5:7b"):
    """Script-writing agent backed by Ollama"""
    llm = ChatOllama(
        model       = model_name,
        temperature = 0.7,
        base_url    = "http://localhost:11434",
    )
    return llm

def generate_script(llm, topic, duration_sec=120, style="educational YouTube"):
    """
    Topic → per-scene script JSON
    duration_sec: target video length (seconds)
    """
    system_prompt = """You are a professional YouTube scriptwriter.
You write engaging, educational video scripts on the given topic.

Respond ONLY in the following JSON format:
{
  "title": "video title",
  "description": "video description (for the YouTube description box, 3 lines)",
  "tags": ["tag1", "tag2", "tag3"],
  "thumbnail_prompt": "English prompt for thumbnail image generation",
  "bgm_mood": "BGM mood description (in English)",
  "scenes": [
    {
      "id": "scene_01",
      "type": "intro",
      "duration": 10,
      "narration": "narration text",
      "visual": "on-screen description",
      "b_roll": "background footage description"
    }
  ]
}"""

    user_prompt = f"""Topic: {topic}
Target length: {duration_sec} seconds
Style: {style}

Scene structure guide:
- scene_01 (intro):    10 s, hook the viewer and spark curiosity
- scene_02~N (main):   20~30 s each, core content
- scene_last (outro):  10 s, ask viewers to subscribe and like

Write the complete JSON in the format above."""

    messages = [
        SystemMessage(content=system_prompt),
        HumanMessage(content=user_prompt),
    ]

    print(f"🖊️  Generating script: '{topic}'")
    response = llm.invoke(messages)

    # Parse JSON: if the model wrapped it in a code fence, keep only the fenced part
    content = response.content
    if "```json" in content:
        content = content.split("```json")[1].split("```")[0]
    elif "```" in content:
        content = content.split("```")[1]

    script = json.loads(content.strip())
    print(f"✅ Script ready: {len(script['scenes'])} scenes")
    return script
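Local models often surround the JSON with prose ("Sure! Here is your script: ...") or trailing remarks, which the split-based parsing above does not handle. A more forgiving alternative, sketched with the standard library only, finds the first `{` and lets `json.JSONDecoder.raw_decode` read exactly one object, then checks the keys the later nodes rely on. `extract_script_json` and `REQUIRED_KEYS` are hypothetical names; you could call it in place of `json.loads(content.strip())`. Depending on your langchain-ollama version, passing `format="json"` to `ChatOllama` may also constrain the reply to JSON.

```python
import json
import re

# Matches a code fence (three backticks, optional "json") and captures its body
FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)
REQUIRED_KEYS = {"title", "scenes", "thumbnail_prompt", "bgm_mood"}


def extract_script_json(text: str) -> dict:
    """Pull the first JSON object out of an LLM reply and validate required keys."""
    match = FENCE.search(text)
    if match:
        text = match.group(1)
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object in the response")
    # raw_decode stops at the end of the first complete object,
    # so trailing chatter after the JSON is ignored
    obj, _ = json.JSONDecoder().raw_decode(text[start:])
    missing = REQUIRED_KEYS - obj.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not obj["scenes"]:
        raise ValueError("script has no scenes")
    return obj
```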

4-2. Thumbnail Generation Agent

# agents/thumbnail_agent.py
from diffusers import FluxPipeline
import torch
from PIL import Image, ImageDraw, ImageFont
import os

def create_thumbnail_agent():
    """Thumbnail generation agent using FLUX.1 dev"""
    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev",
        torch_dtype=torch.bfloat16,
    ).to("cuda")

    # Load the LoRA trained in Project 1 (optional)
    lora_path = "./output/lora/my_character_v1.safetensors"
    if os.path.exists(lora_path):
        pipe.load_lora_weights(lora_path)
        print("LoRA loaded")

    return pipe

def generate_thumbnail(pipe, thumbnail_prompt, title,
                        output_path="./output/thumbnail.jpg"):
    """
    Generate a thumbnail and overlay the title text
    """
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    full_prompt = (
        f"{thumbnail_prompt}, "
        "YouTube thumbnail style, high contrast, bold colors, "
        "eye-catching, 16:9 aspect ratio, professional, "
        "no text, clean composition"
    )

    print("🖼️  Generating thumbnail...")
    image = pipe(
        prompt              = full_prompt,
        width               = 1280,   # YouTube thumbnail ratio (16:9)
        height              = 720,
        num_inference_steps = 28,
        guidance_scale      = 3.5,
    ).images[0]

    # Overlay the title text
    image = add_title_overlay(image, title)
    image.save(output_path, "JPEG", quality=95)
    print(f"✅ Thumbnail saved: {output_path}")
    return output_path

def add_title_overlay(image, title, font_size=60):
    """Add the title text on a band at the bottom of the image"""
    w, h = image.size

    # Semi-transparent black background for the text area
    overlay = Image.new("RGBA", image.size, (0, 0, 0, 0))
    overlay_draw = ImageDraw.Draw(overlay)
    overlay_draw.rectangle([(0, h-120), (w, h)], fill=(0, 0, 0, 160))

    image = Image.alpha_composite(image.convert("RGBA"), overlay).convert("RGB")
    draw  = ImageDraw.Draw(image)

    # Draw the text (falls back to Pillow's default font if the TTF file is missing;
    # the default font may not be able to render Korean characters)
    try:
        font = ImageFont.truetype("./fonts/NotoSansKR-Bold.ttf", font_size)
    except OSError:
        font = ImageFont.load_default()

    draw.text((w//2, h-60), title, fill="white",
              font=font, anchor="mm")
    return image
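`add_title_overlay` draws the title on a single line, so long titles can run off the edges at 60 px. One option, sketched below with the standard `textwrap` module, is to cap the title at two wrapped lines (ending with an ellipsis when truncated) and compute a vertical center for each line inside the 120 px band. Both helpers are hypothetical; you would call `draw.text((w//2, y), line, ...)` once per returned line. The width is a character count, not pixels, and 24 is only a guess for a 60 px font on a 1280 px image (Korean characters are wider), so tune it for your font.

```python
import textwrap


def wrap_title(title: str, width: int = 24, max_lines: int = 2) -> list:
    """Wrap the title to at most `max_lines` lines of `width` characters,
    ending with an ellipsis if it had to be truncated."""
    return textwrap.wrap(title, width=width, max_lines=max_lines, placeholder=" …")


def line_centers(n_lines: int, band_top: int, band_height: int) -> list:
    """Evenly spaced vertical centers for n lines inside the text band."""
    step = band_height / (n_lines + 1)
    return [round(band_top + step * (i + 1)) for i in range(n_lines)]


lines = wrap_title("The five biggest AI trends that will reshape everything in 2026")
print(lines, line_centers(len(lines), band_top=720 - 120, band_height=120))
```

For a single line, the center works out to `h - 60`, the same position the original overlay uses.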

4-3. Dubbing Agent

# agents/dubbing_agent.py
from f5_tts.api import F5TTS
import soundfile as sf
import numpy as np
import os

def create_dubbing_agent():
    """F5-TTS dubbing agent"""
    tts = F5TTS()
    return tts

def generate_narration(tts, script, ref_audio, ref_text,
                        output_dir="./output/audio"):
    """
    Per-scene script → per-scene audio files
    Returns: {"scene_01": {"path": ..., "duration": ...}, ..., "__full__": {...}}
    """
    os.makedirs(output_dir, exist_ok=True)
    audio_map    = {}
    total_scenes = len(script["scenes"])

    for i, scene in enumerate(script["scenes"]):
        scene_id  = scene["id"]
        narration = scene["narration"]

        if not narration.strip():
            continue

        print(f"🎙️  Dubbing [{i+1}/{total_scenes}]: {scene_id}")

        wav, sr, _ = tts.infer(
            ref_file = ref_audio,
            ref_text = ref_text,
            gen_text = narration,
            speed    = 1.05,   # slightly faster than normal, a brisk YouTube pace
        )

        save_path = os.path.join(output_dir, f"{scene_id}.wav")
        sf.write(save_path, wav, sr)
        audio_map[scene_id] = {
            "path":     save_path,
            "duration": len(wav) / sr,
        }

    # Concatenate all narration into one file (0.3 s of silence after each scene)
    all_audio   = []
    silence_sec = 0.3
    for scene in script["scenes"]:
        sid = scene["id"]
        if sid in audio_map:
            audio, sr = sf.read(audio_map[sid]["path"])
            all_audio.append(audio)
            all_audio.append(np.zeros(int(sr * silence_sec)))

    if all_audio:
        full_audio = np.concatenate(all_audio)
        full_path  = os.path.join(output_dir, "full_narration.wav")
        sf.write(full_path, full_audio, sr)
        audio_map["__full__"] = {"path": full_path,
                                  "duration": len(full_audio) / sr}

    scene_count = len([k for k in audio_map if k != "__full__"])
    print(f"✅ Dubbing complete: {scene_count} scenes")
    return audio_map
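For reference, the concatenation step can also be done without numpy or soundfile. This standard-library sketch uses the `wave` module to join scene files with a silent gap and returns each scene's start time, which is exactly what a timeline needs. It assumes 16-bit PCM files (soundfile's default WAV subtype) that share one sample rate and channel count; unlike the loop above, it inserts the gap only between scenes, not after the last one. `concat_wavs` is a hypothetical name.

```python
import wave


def concat_wavs(paths, out_path, gap_sec=0.3):
    """Join 16-bit PCM WAV files with `gap_sec` of silence between them.

    Returns the start time (seconds) of each input inside the output file.
    """
    clips = []
    for path in paths:
        with wave.open(path, "rb") as w:
            clips.append((w.getparams(), w.readframes(w.getnframes())))
    if not clips:
        raise ValueError("no input files")

    first = clips[0][0]
    for (p, _), path in zip(clips, paths):
        if p.sampwidth != 2 or (p.nchannels, p.framerate) != (first.nchannels, first.framerate):
            raise ValueError(f"{path}: expected 16-bit PCM matching the first file")

    gap_frames = int(first.framerate * gap_sec)
    gap = b"\x00" * (gap_frames * 2 * first.nchannels)   # 16-bit silence = zero bytes
    starts, cursor = [], 0.0
    with wave.open(out_path, "wb") as out:
        out.setnchannels(first.nchannels)
        out.setsampwidth(2)
        out.setframerate(first.framerate)
        for i, (p, frames) in enumerate(clips):
            if i > 0:
                out.writeframes(gap)
                cursor += gap_frames / first.framerate
            starts.append(cursor)
            out.writeframes(frames)
            cursor += p.nframes / first.framerate
    return starts
```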

4-4. Intro Video Agent

# agents/video_agent.py
import requests, time, os

COMFYUI_URL = "http://127.0.0.1:8188"

def generate_intro_video(thumbnail_path, visual_description,
                          output_name="intro",
                          num_frames=81, fps=24):
    """
    Thumbnail image → intro video (Wan2.2 via the ComfyUI API)
    """
    # Same ComfyUI API approach as in Project 3
    prompt = (
        f"{visual_description}, "
        "slow cinematic zoom in, dramatic lighting, "
        "YouTube intro style, high quality"
    )

    # NOTE: node class names and inputs depend on the custom nodes installed in
    # Project 3; export your own working graph in ComfyUI's API format and reuse it.
    # If LoadImage rejects the absolute path, copy the thumbnail into
    # ComfyUI/input/ and pass just the file name.
    workflow = {
        "1": {"class_type": "LoadImage",
              "inputs": {"image": os.path.abspath(thumbnail_path)}},
        "2": {"class_type": "WanVideoModelLoader",
              "inputs": {"model": "wan2.2-i2v-14b-720p.safetensors"}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt, "clip": ["2", 1]}},
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"text": "blurry, shaky, low quality",
                         "clip": ["2", 1]}},
        "5": {"class_type": "WanVideoSampler",
              "inputs": {"model": ["2", 0], "image": ["1", 0],
                         "positive": ["3", 0], "negative": ["4", 0],
                         "steps": 25, "cfg": 6.0,
                         "num_frames": num_frames, "fps": fps}},
        "6": {"class_type": "VHS_VideoCombine",
              "inputs": {"images":          ["5", 0],
                         "frame_rate":       fps,
                         "filename_prefix":  output_name,
                         "format":           "video/mp4"}},
    }

    print("🎬 Generating intro video...")
    resp      = requests.post(f"{COMFYUI_URL}/prompt",
                               json={"prompt": workflow})
    resp.raise_for_status()
    prompt_id = resp.json()["prompt_id"]

    while True:
        status = requests.get(
            f"{COMFYUI_URL}/history/{prompt_id}"
        ).json()
        if prompt_id in status:
            outputs = status[prompt_id].get("outputs", {})
            for node_id, output in outputs.items():
                if "gifs" in output:   # VHS_VideoCombine lists its files under "gifs"
                    filename = output["gifs"][0]["filename"]
                    video_path = f"./ComfyUI/output/{filename}"
                    print(f"✅ Intro video ready: {video_path}")
                    return video_path
            break
        print("⏳ Still generating the intro video...")
        time.sleep(10)

    return None
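The polling loop above has no upper bound, so a crashed or stuck ComfyUI job blocks the whole pipeline. A hedged alternative adds a deadline and fails loudly when the job finishes without a video. The HTTP call is passed in as `fetch_history` (for example `lambda pid: requests.get(f"{COMFYUI_URL}/history/{pid}").json()`), which keeps the sketch testable without a server; `wait_for_video` and its parameters are hypothetical, and it looks only for the `"gifs"` key the loop above already relies on.

```python
import time


def wait_for_video(fetch_history, prompt_id, timeout=1800.0, interval=10.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll ComfyUI history until a video filename appears or `timeout` seconds pass."""
    deadline = clock() + timeout
    while clock() < deadline:
        history = fetch_history(prompt_id)
        if prompt_id in history:
            for output in history[prompt_id].get("outputs", {}).values():
                if output.get("gifs"):          # key used by VHS_VideoCombine
                    return output["gifs"][0]["filename"]
            raise RuntimeError(f"job {prompt_id} finished without a video output")
        sleep(interval)
    raise TimeoutError(f"ComfyUI job {prompt_id} not finished after {timeout:.0f} s")
```

The returned name is relative to ComfyUI's output folder, so the caller still joins it with that path as the original function does.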

4-5. BGM Agent

# agents/bgm_agent.py
import torch
import soundfile as sf
import numpy as np
from audiocraft.models import MusicGen

def create_bgm_agent(size="medium"):
    model = MusicGen.get_pretrained(f"facebook/musicgen-{size}")
    return model

def generate_bgm(model, bgm_mood, duration_sec,
                 output_path="./output/bgm.wav"):
    """
    BGM mood description → BGM (reuses the Project 5 logic)
    """
    # Generate at most 30 s per call (MusicGen's training window);
    # longer videos are covered by looping below
    gen_duration = min(duration_sec, 30.0)

    model.set_generation_params(
        duration    = gen_duration,
        top_k       = 250,
        temperature = 1.0,
        cfg_coef    = 3.0,
    )

    print(f"🎵 Generating BGM... ({gen_duration:.1f} s)")
    with torch.no_grad():
        wav = model.generate([bgm_mood])

    audio = wav[0].cpu().numpy()   # shape: (channels, samples)

    # Loop if shorter than the video (np.tile repeats along the last, time axis)
    if duration_sec > gen_duration:
        repeat_count = int(np.ceil(duration_sec / gen_duration)) + 1
        audio        = np.tile(audio, repeat_count)

    # Trim to the target length (slice the sample axis, not the channel axis)
    target_samples = int(duration_sec * model.sample_rate)
    audio          = audio[..., :target_samples]

    # 2-second fade-out applied to every channel
    fade_samples = min(int(2.0 * model.sample_rate), audio.shape[-1])
    audio[..., -fade_samples:] *= np.linspace(1, 0, fade_samples)

    sf.write(output_path, audio.T, model.sample_rate)   # soundfile expects (samples, channels)
    print(f"✅ BGM saved: {output_path}")
    return output_path
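`np.tile` repeats the clip back to back, so every seam can be an audible hard cut. A common alternative is to overlap the end of one repeat with the start of the next using a short linear crossfade. The sketch below shows the idea on a plain Python list of mono samples so the arithmetic is easy to follow; `loop_with_crossfade` is a hypothetical helper. With real audio you would port it to numpy and pick a crossfade of, for example, half a second (`int(0.5 * model.sample_rate)` samples).

```python
def loop_with_crossfade(clip, target_len, xfade, fade_out):
    """Repeat `clip` until `target_len` samples, blending each seam over `xfade`
    samples, then apply a linear fade-out over the last `fade_out` samples."""
    if len(clip) <= xfade:
        raise ValueError("clip must be longer than the crossfade")
    out = list(clip)
    while len(out) < target_len:
        seam = len(out) - xfade
        for i in range(xfade):                  # old tail fades out, new head fades in
            w = (i + 1) / (xfade + 1)
            out[seam + i] = out[seam + i] * (1 - w) + clip[i] * w
        out.extend(clip[xfade:])                # rest of the new repeat
    out = out[:target_len]
    n = min(fade_out, len(out))
    for i in range(n):
        out[len(out) - n + i] *= 1 - (i + 1) / n   # reaches 0.0 on the last sample
    return out


print(loop_with_crossfade([0.0, 1.0, 2.0, 3.0], target_len=7, xfade=1, fade_out=0))
```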

📌 5. Implementing the LangGraph Workflow

5-1. Defining the State

# pipeline/state.py
from typing import TypedDict, Optional, Dict

class PipelineState(TypedDict):
    """
    Overall pipeline state shared between agents.
    Each node reads this state and returns its updates.
    """
    # Input
    topic:           str             # topic
    duration_sec:    int             # target video length (seconds)

    # Agent outputs
    script:          Optional[Dict]  # script JSON
    thumbnail_path:  Optional[str]   # thumbnail file path
    audio_map:       Optional[Dict]  # per-scene audio file map
    intro_video:     Optional[str]   # intro video path
    bgm_path:        Optional[str]   # BGM file path
    final_video:     Optional[str]   # final video path

    # Execution control
    current_step:    str             # step currently running
    error:           Optional[str]   # error message
    retry_count:     int             # number of retries so far
    completed:       bool            # whether the run finished
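Because the node functions read keys such as `state["thumbnail_path"]` directly, it helps to build the starting state in one place instead of listing all twelve fields by hand at every call site. The factory below is a hypothetical helper; its keys mirror the `PipelineState` fields above, and you would wrap the result as `PipelineState(**make_initial_state(topic))`.

```python
def make_initial_state(topic: str, duration_sec: int = 120) -> dict:
    """Return a dict with every PipelineState key: outputs empty, control fields reset."""
    state = {
        "topic": topic,
        "duration_sec": duration_sec,
        "current_step": "init",
        "retry_count": 0,
        "completed": False,
    }
    for key in ("script", "thumbnail_path", "audio_map", "intro_video",
                "bgm_path", "final_video", "error"):
        state[key] = None
    return state


print(make_initial_state("Top 5 AI Trends of 2026"))
```

A one-line sanity check such as `set(PipelineState.__annotations__) == make_initial_state("x").keys()` catches drift if fields are added to the class later.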

5-2. Connecting Nodes & Setting Edges

# pipeline/langgraph_pipeline.py
from langgraph.graph import StateGraph, END
from agents.script_agent    import create_script_agent, generate_script
from agents.thumbnail_agent import create_thumbnail_agent, generate_thumbnail
from agents.dubbing_agent   import create_dubbing_agent, generate_narration
from agents.video_agent     import generate_intro_video
from agents.bgm_agent       import create_bgm_agent, generate_bgm
from pipeline.ffmpeg_merge  import ffmpeg_merge          # defined in Section 7
from pipeline.state         import PipelineState
import os, json

# ─── Initialize agents ───────────────────────────────────────
# All models are loaded up front, which can require a lot of GPU memory
llm            = create_script_agent("qwen2.5:7b")
flux_pipe      = create_thumbnail_agent()
tts            = create_dubbing_agent()
musicgen       = create_bgm_agent("medium")

REF_AUDIO = "./my_voice_sample.wav"
with open("./my_voice_ref_text.txt", encoding="utf-8") as f:
    REF_TEXT = f.read().strip()

os.makedirs("./output", exist_ok=True)

# ─── Node functions ───────────────────────────────────────

def node_generate_script(state: PipelineState) -> PipelineState:
    """Node 1: script generation"""
    state["current_step"] = "script"
    try:
        script = generate_script(
            llm, state["topic"], state["duration_sec"]
        )
        # Save the script
        with open("./output/script.json", "w", encoding="utf-8") as f:
            json.dump(script, f, ensure_ascii=False, indent=2)

        state["script"] = script
        print(f"✅ Script ready: {script['title']}")
    except Exception as e:
        state["error"] = f"Script generation failed: {e}"
    return state

def node_generate_thumbnail(state: PipelineState) -> PipelineState:
    """Node 2: thumbnail generation"""
    state["current_step"] = "thumbnail"
    try:
        path = generate_thumbnail(
            flux_pipe,
            state["script"]["thumbnail_prompt"],
            state["script"]["title"],
            output_path="./output/thumbnail.jpg",
        )
        state["thumbnail_path"] = path
    except Exception as e:
        state["error"] = f"Thumbnail generation failed: {e}"
    return state

def node_generate_dubbing(state: PipelineState) -> PipelineState:
    """Node 3: dubbing"""
    state["current_step"] = "dubbing"
    try:
        audio_map = generate_narration(
            tts, state["script"],
            REF_AUDIO, REF_TEXT,
            output_dir="./output/audio",
        )
        state["audio_map"] = audio_map
    except Exception as e:
        state["error"] = f"Dubbing failed: {e}"
    return state

def node_generate_intro(state: PipelineState) -> PipelineState:
    """Node 4: intro video generation"""
    state["current_step"] = "intro_video"
    try:
        intro_desc = state["script"]["scenes"][0]["b_roll"]
        path = generate_intro_video(
            state["thumbnail_path"], intro_desc,
            output_name="intro",
        )
        state["intro_video"] = path
    except Exception as e:
        state["error"] = f"Intro video failed: {e}"
    return state

def node_generate_bgm(state: PipelineState) -> PipelineState:
    """Node 5: BGM generation"""
    state["current_step"] = "bgm"
    try:
        total_dur = state["audio_map"]["__full__"]["duration"]
        path = generate_bgm(
            musicgen,
            state["script"]["bgm_mood"],
            total_dur,
            output_path="./output/bgm.wav",
        )
        state["bgm_path"] = path
    except Exception as e:
        state["error"] = f"BGM generation failed: {e}"
    return state

def node_final_merge(state: PipelineState) -> PipelineState:
    """Node 6: final merge"""
    state["current_step"] = "merge"
    try:
        output = ffmpeg_merge(
            intro_video = state["intro_video"],
            narration   = state["audio_map"]["__full__"]["path"],
            bgm         = state["bgm_path"],
            output_path = "./output/final_youtube.mp4",
        )
        state["final_video"] = output
        state["completed"]   = True
    except Exception as e:
        state["error"] = f"Final merge failed: {e}"
    return state
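All six node functions repeat one pattern: set `current_step`, call an agent, and turn any exception into `state["error"]`. A decorator can factor that out. This sketch relies on LangGraph merging the dict a node returns into the state, so nodes only return the keys they change; it also skips a node once an earlier one has failed, so a single error does not cascade through the remaining steps. `pipeline_node` and the example node body are hypothetical.

```python
import functools


def pipeline_node(step: str):
    """Wrap a node so it records `current_step`, skips after an earlier error,
    and converts exceptions into an `error` update instead of crashing the graph."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(state: dict) -> dict:
            if state.get("error"):                 # an earlier node already failed
                return {"current_step": f"{step} (skipped)"}
            try:
                return {"current_step": step, **(fn(state) or {})}
            except Exception as e:
                return {"current_step": step, "error": f"{step} failed: {e}"}
        return wrapper
    return decorator


@pipeline_node("thumbnail")
def node_thumbnail(state: dict) -> dict:
    # hypothetical body: a real node would call generate_thumbnail(...)
    return {"thumbnail_path": f"./output/{state['script']['title']}.jpg"}


print(node_thumbnail({"script": {"title": "demo"}}))
```

The skipped case still writes `current_step`, so every node produces at least one update.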

5-3. Conditional Branching

# Conditional edge: retry or stop on error.
# A routing function should only READ the state: LangGraph does not keep
# changes made inside it, so the retry counter is updated in its own node.
def should_retry(state: PipelineState) -> str:
    """
    Decide what to do after the merge step
    Returns: "retry" / "end"
    """
    if state.get("error"):
        if state.get("retry_count", 0) < 2:
            return "retry"
        print("❌ Maximum retries exceeded. Stopping.")
        return "end"

    return "end"     # completed successfully

def node_prepare_retry(state: PipelineState) -> dict:
    """Increment the retry counter and clear the error before starting over"""
    retry_count = state.get("retry_count", 0) + 1
    print(f"⚠️  Error: {state['error']}. Retry {retry_count}/2...")
    return {"retry_count": retry_count, "error": None}

# ─── Build the graph ──────────────────────────────────────────
def build_pipeline():
    graph = StateGraph(PipelineState)

    # Register nodes
    graph.add_node("script",        node_generate_script)
    graph.add_node("thumbnail",     node_generate_thumbnail)
    graph.add_node("dubbing",       node_generate_dubbing)
    graph.add_node("intro",         node_generate_intro)
    graph.add_node("bgm",           node_generate_bgm)
    graph.add_node("merge",         node_final_merge)
    graph.add_node("prepare_retry", node_prepare_retry)

    # Entry node
    graph.set_entry_point("script")

    # Sequential edges
    graph.add_edge("script",        "thumbnail")
    graph.add_edge("thumbnail",     "dubbing")
    graph.add_edge("dubbing",       "intro")
    graph.add_edge("intro",         "bgm")
    graph.add_edge("bgm",           "merge")
    graph.add_edge("prepare_retry", "script")    # start over from the script

    # Conditional edge (error handling)
    graph.add_conditional_edges(
        "merge",
        should_retry,
        {
            "retry": "prepare_retry",
            "end":   END,
        }
    )

    return graph.compile()

# ─── Run ─────────────────────────────────────────────────
pipeline = build_pipeline()

initial_state = PipelineState(
    topic        = "Top 5 AI Trends of 2026",
    duration_sec = 120,
    script=None, thumbnail_path=None, audio_map=None,
    intro_video=None, bgm_path=None, final_video=None,
    current_step="init", error=None,
    retry_count=0, completed=False,
)

final_state = pipeline.invoke(initial_state)
print(f"\n🎉 Final video: {final_state['final_video']}")

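📌 6. Building the CrewAI Multi-Agent Setup

6-1. How Agent / Task / Crew fit together

CrewAI structure:

Agent  = role + goal + backstory (who it is)
  e.g. "You are a professional YouTube scriptwriter."

Task   = concrete work instructions (what it does)
  e.g. "Write a 3-minute script on 'AI trends' in JSON."

Crew   = the execution unit that combines Agents and Tasks
  e.g. run [script agent, thumbnail agent, dubbing agent] in that order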
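How results flow from one task to the next is the part that most often confuses people, so here is a conceptual model of a sequential crew in plain Python. It is not CrewAI's API (`ToyTask` and `kickoff` are hypothetical and much simpler); it only illustrates two ideas used in the next section: `{placeholders}` in a description are filled from the kickoff inputs, and a task sees the outputs of the tasks listed in its `context`.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyTask:
    description: str                          # may contain {placeholders}
    agent: Callable[[str], str]               # stands in for an LLM-backed Agent
    context: List["ToyTask"] = field(default_factory=list)
    output: str = ""


def kickoff(tasks: List[ToyTask], inputs: dict) -> str:
    """Run tasks in list order (like a sequential process); return the last output."""
    for task in tasks:
        prompt = task.description.format(**inputs)
        for dep in task.context:              # earlier results this task depends on
            prompt += f"\n[context] {dep.output}"
        task.output = task.agent(prompt)
    return tasks[-1].output


writer = ToyTask("Write a script about {topic}", agent=lambda p: "SCRIPT JSON")
# this "agent" echoes its prompt so we can see the injected context
designer = ToyTask("Design a thumbnail", agent=lambda p: p, context=[writer])
print(kickoff([writer, designer], {"topic": "AI trends"}))
```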

6-2. Passing Results Between Agents

# pipeline/crewai_pipeline.py
from crewai import Agent, Task, Crew, Process, LLM
from crewai.tools import tool
import json, os

os.makedirs("./output", exist_ok=True)

# LLM setup: recent CrewAI versions take their own LLM object with a
# provider-prefixed model name, rather than a LangChain ChatOllama instance
ollama_llm = LLM(
    model    = "ollama/qwen2.5:7b",
    base_url = "http://localhost:11434",
)

# ─── Custom tools ─────────────────────────────────────

@tool("generate_thumbnail_tool")
def generate_thumbnail_tool(thumbnail_prompt: str) -> str:
    """Generates a thumbnail with FLUX.1 dev. Input: an English prompt"""
    from agents.thumbnail_agent import create_thumbnail_agent, generate_thumbnail
    pipe = create_thumbnail_agent()
    path = generate_thumbnail(pipe, thumbnail_prompt,
                               title="",
                               output_path="./output/thumbnail.jpg")
    return f"Thumbnail created: {path}"

@tool("generate_dubbing_tool")
def generate_dubbing_tool(script_json_path: str) -> str:
    """Dubs the whole script with F5-TTS. Input: path to script.json"""
    from agents.dubbing_agent import create_dubbing_agent, generate_narration
    with open(script_json_path, encoding="utf-8") as f:
        script = json.load(f)
    with open("./my_voice_ref_text.txt", encoding="utf-8") as f:
        ref_text = f.read().strip()
    tts       = create_dubbing_agent()
    audio_map = generate_narration(tts, script, "./my_voice_sample.wav", ref_text)
    return f"Dubbing complete: {len(audio_map)-1} scenes / full track: {audio_map['__full__']['path']}"

@tool("generate_bgm_tool")
def generate_bgm_tool(mood_and_duration: str) -> str:
    """Generates BGM with MusicGen. Input format: 'mood description|length in seconds'"""
    from agents.bgm_agent import create_bgm_agent, generate_bgm
    mood, duration = mood_and_duration.rsplit("|", 1)   # split on the last '|'
    model = create_bgm_agent("medium")
    path  = generate_bgm(model, mood.strip(), float(duration.strip()))
    return f"BGM created: {path}"

# ─── Agents ────────────────────────────────────────

script_writer = Agent(
    role  = "Professional YouTube scriptwriter",
    goal  = "Write engaging scripts on the given topic that viewers watch to the end",
    backstory = (
        "A YouTube content strategist with 10 years of experience. "
        "Has written many scripts for channels with over 1 million subscribers. "
        "Specializes in scripts that balance SEO and audience retention."
    ),
    llm     = ollama_llm,
    verbose = True,
)

thumbnail_designer = Agent(
    role  = "YouTube thumbnail designer",
    goal  = "Generate AI thumbnails that maximize click-through rate",
    backstory = (
        "A designer specializing in CTR optimization. "
        "Expert in AI thumbnail generation with FLUX.1 dev."
    ),
    llm   = ollama_llm,
    tools = [generate_thumbnail_tool],
)

dubbing_director = Agent(
    role  = "AI dubbing director",
    goal  = "Dub the script with a natural-sounding AI voice",
    backstory = (
        "A former voice actor with 10 years of experience, now an AI dubbing specialist. "
        "Specializes in high-quality voice cloning with F5-TTS."
    ),
    llm   = ollama_llm,
    tools = [generate_dubbing_tool],
)

music_producer = Agent(
    role  = "AI music producer",
    goal  = "Generate copyright-free BGM that matches the video's mood",
    backstory = (
        "A music producer specializing in YouTube BGM. "
        "Specializes in custom BGM made with MusicGen."
    ),
    llm   = ollama_llm,
    tools = [generate_bgm_tool],
)

# ─── Tasks ──────────────────────────────────────────

task_script = Task(
    description = (
        "Topic: {topic}\n"
        "Target length: {duration} seconds\n"
        "Write a YouTube video script for this topic in JSON format.\n"
        "Return only the JSON, with no extra text."
    ),
    expected_output = "The complete script as a single JSON object",
    agent           = script_writer,
    output_file     = "./output/script.json",   # CrewAI writes the task output here
)

task_thumbnail = Task(
    description = (
        "Using the script written by the previous agent, "
        "generate the thumbnail image from its thumbnail_prompt."
    ),
    expected_output = "Thumbnail file path: ./output/thumbnail.jpg",
    agent           = thumbnail_designer,
    context         = [task_script],   # receives the script task's result as context
)

task_dubbing = Task(
    description = (
        "Using the file ./output/script.json as input, "
        "dub the narration of every scene."
    ),
    expected_output = "Full narration file: ./output/audio/full_narration.wav",
    agent           = dubbing_director,
    context         = [task_script],
)

task_bgm = Task(
    description = (
        "Generate BGM based on the script's bgm_mood value and the narration length. "
        "Input format: 'mood description|length in seconds'"
    ),
    expected_output = "BGM file: ./output/bgm.wav",
    agent           = music_producer,
    context         = [task_script, task_dubbing],
)

# ─── Assemble & run the crew ─────────────────────────────────────

crew = Crew(
    agents  = [script_writer, thumbnail_designer,
               dubbing_director, music_producer],
    tasks   = [task_script, task_thumbnail,
               task_dubbing, task_bgm],
    process = Process.sequential,   # run tasks in order
    verbose = True,
)

result = crew.kickoff(inputs={
    "topic":    "Top 5 AI Trends of 2026",
    "duration": 120,
})

print("\n🎉 CrewAI run complete!")
print(result)
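`generate_bgm_tool` trusts the LLM to send exactly `mood|seconds`, but small local models often add quotes or units such as "seconds", or forget the separator, and `float()` then raises. A more tolerant parser is sketched below; `parse_mood_and_duration`, the 120-second default and the 600-second cap are assumptions to adapt, and the tool body would call it instead of splitting the string directly.

```python
import re


def parse_mood_and_duration(text: str, default_sec: float = 120.0,
                            max_sec: float = 600.0) -> tuple:
    """Parse 'mood|seconds' leniently; returns (mood, seconds)."""
    cleaned = text.strip().strip("'\"")
    mood, sep, tail = cleaned.rpartition("|")
    if not sep:                                   # no '|': treat everything as the mood
        return cleaned, default_sec
    match = re.search(r"\d+(?:\.\d+)?", tail)     # first number, ignoring units
    seconds = float(match.group()) if match else default_sec
    return mood.strip(), min(max(seconds, 1.0), max_sec)


print(parse_mood_and_duration("'upbeat electronic|95.5 seconds'"))
```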

📌 7. Automating the Final FFmpeg Merge

Automatic cut editing based on the script timeline

# pipeline/ffmpeg_merge.py
import subprocess
import os
import soundfile as sf
import numpy as np

def build_timeline(script, audio_map):
    """
    Script + audio files → timeline
    Returns: (timeline, total_duration), where timeline is a list of
    {"scene_id", "start", "end", "duration", "audio_path", "visual"} dicts
    """
    timeline = []
    cursor   = 0.0

    for scene in script["scenes"]:
        sid = scene["id"]
        if sid not in audio_map:
            continue

        duration = audio_map[sid]["duration"]
        timeline.append({
            "scene_id":   sid,
            "start":      cursor,
            "end":        cursor + duration,
            "duration":   duration,
            "audio_path": audio_map[sid]["path"],
            "visual":     scene.get("visual", ""),
        })
        cursor += duration + 0.3   # 0.3 s gap between scenes (same as the narration concat)

    return timeline, cursor

def ffmpeg_merge(intro_video, narration, bgm, output_path):
    """
    Intro video + narration + BGM → final video
    """
    # Narration length
    audio, sr = sf.read(narration)
    total_duration = len(audio) / sr

    print(f"\n🎬 Starting final merge (total {total_duration:.1f} s)")

    # Step 1: get the intro video length
    intro_dur_cmd = [
        "ffprobe", "-v", "quiet",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        intro_video
    ]
    intro_dur = float(
        subprocess.run(intro_dur_cmd, capture_output=True,
                       text=True, check=True).stdout.strip()
    )
    print(f"Intro video length: {intro_dur:.1f} s")

    # Step 2: the intro plays without narration,
    # so prepend as much silence as the intro lasts
    silence_path = "./output/silence_intro.wav"
    silence = np.zeros(int(intro_dur * sr))
    sf.write(silence_path, silence, sr)

    # Silence + narration
    narration_with_intro = "./output/narration_with_intro.wav"
    concat_audio_cmd = [
        "ffmpeg", "-y",
        "-i", silence_path,
        "-i", narration,
        "-filter_complex",
        "[0:a][1:a]concat=n=2:v=0:a=1[aout]",
        "-map", "[aout]",
        narration_with_intro
    ]
    subprocess.run(concat_audio_cmd, check=True)

    # Step 3: loop the intro video so it covers the whole runtime
    # (with stream copy the cut lands on a keyframe, so the length is approximate)
    loop_video = "./output/intro_looped.mp4"
    loop_cmd = [
        "ffmpeg", "-y",
        "-stream_loop", "-1",
        "-i", intro_video,
        "-t", str(total_duration + intro_dur),
        "-c", "copy",
        loop_video
    ]
    subprocess.run(loop_cmd, check=True)

    # Step 4: final merge of video + narration + BGM
    final_cmd = [
        "ffmpeg", "-y",
        "-i", loop_video,
        "-i", narration_with_intro,
        "-i", bgm,
        "-filter_complex",
        # Mix narration at 100% + BGM at 25%
        # (amix also scales its inputs down by default; recent FFmpeg builds
        #  accept amix's normalize=0 option if the voice ends up too quiet)
        "[1:a]volume=1.0[voice];"
        "[2:a]volume=0.25[bgm];"
        "[voice][bgm]amix=inputs=2:duration=first[aout]",
        "-map",  "0:v",
        "-map",  "[aout]",
        "-c:v",  "libx264",
        "-crf",  "18",
        "-preset", "fast",
        "-c:a",  "aac",
        "-b:a",  "192k",
        "-shortest",
        output_path
    ]
    subprocess.run(final_cmd, check=True)

    # Clean up temporary files
    for tmp in [silence_path, narration_with_intro, loop_video]:
        if os.path.exists(tmp):
            os.remove(tmp)

    print(f"✅ Final video saved: {output_path}")
    return output_path
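Because `build_timeline` already knows when each scene starts and ends, the same data can become a caption file, and YouTube accepts SubRip (.srt) uploads. This sketch assumes you add the narration text to each timeline entry under a `"text"` key (for example `"text": scene["narration"]` inside `build_timeline`), which the function above does not do. The final video starts with the silent intro, so pass the intro length as `offset`. `srt_timestamp` and `timeline_to_srt` are hypothetical helpers.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def timeline_to_srt(timeline, offset: float = 0.0) -> str:
    """Build SRT text from timeline entries that carry 'start', 'end' and 'text'."""
    blocks = []
    for i, item in enumerate(timeline, 1):
        start = srt_timestamp(item["start"] + offset)
        end = srt_timestamp(item["end"] + offset)
        blocks.append(f"{i}\n{start} --> {end}\n{item['text']}\n")
    return "\n".join(blocks)


demo = [{"start": 0.0, "end": 2.5, "text": "Hello"},
        {"start": 2.8, "end": 65.04, "text": "World"}]
print(timeline_to_srt(demo))
```

Writing the result with `open("./output/captions.srt", "w", encoding="utf-8")` keeps Korean captions intact.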

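📌 8. Checking Results & Troubleshooting

When an agent gets stuck in a loop

# Cap the maximum number of iterations

# LangGraph: set recursion_limit
from langgraph.checkpoint.memory import MemorySaver

pipeline = build_pipeline()

config = {
    # Max graph steps before LangGraph raises GraphRecursionError.
    # One full pass through this pipeline is about 6 steps,
    # so 25 leaves room for a couple of retries.
    "recursion_limit": 25,
    # thread_id is only used when the graph is compiled with a checkpointer,
    # e.g. graph.compile(checkpointer=MemorySaver())
    "configurable": {"thread_id": "pipeline_01"}
}

final_state = pipeline.invoke(initial_state, config=config)

# CrewAI: set max_iter
script_writer = Agent(
    role     = "Professional YouTube scriptwriter",
    goal     = "...",
    backstory= "...",
    llm      = ollama_llm,
    max_iter = 3,           # max reasoning/tool iterations per task before a final answer
    max_rpm  = 10,          # max 10 LLM requests per minute
)

# Timeout (when an agent hangs)
# Note: signal.alarm is Unix-only and must be set from the main thread
import signal

def timeout_handler(signum, frame):
    raise TimeoutError("Agent run timed out (5 minutes)")

signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(300)    # 5-minute timeout
try:
    result = pipeline.invoke(initial_state)
finally:
    signal.alarm(0)  # cancel the timeout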
Retry logic when an intermediate step fails

# Checkpoint-based retry: re-run from the failed node onward

import pickle, os
from typing import Optional

CHECKPOINT_DIR = "./output/checkpoints"
os.makedirs(CHECKPOINT_DIR, exist_ok=True)

def save_checkpoint(state: PipelineState, step: str):
    """Save the current state as a checkpoint"""
    path = os.path.join(CHECKPOINT_DIR, f"{step}.pkl")
    with open(path, "wb") as f:
        pickle.dump(state, f)
    print(f"💾 Checkpoint saved: {step}")

def load_checkpoint(step: str) -> Optional[PipelineState]:
    """Restore state from a checkpoint; returns None if there is none.
    Only load files you wrote yourself: unpickling can run arbitrary code."""
    path = os.path.join(CHECKPOINT_DIR, f"{step}.pkl")
    if os.path.exists(path):
        with open(path, "rb") as f:
            state = pickle.load(f)
        print(f"♻️  Checkpoint restored: {step}")
        return state
    return None

def node_generate_script_with_checkpoint(state):
    # If this step already finished for the same topic, restore it from the checkpoint
    cached = load_checkpoint("script")
    if cached and cached.get("script") and cached.get("topic") == state["topic"]:
        state["script"] = cached["script"]
        print("✅ Script: restored from cache")
        return state

    # Retry logic
    for attempt in range(3):
        try:
            script = generate_script(llm, state["topic"], state["duration_sec"])
            state["script"] = script
            save_checkpoint(state, "script")
            return state
        except Exception as e:
            print(f"⚠️  Script generation failed (attempt {attempt+1}/3): {e}")
            if attempt == 2:
                state["error"] = str(e)

    return state
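The loop above retries immediately, which may not help when the cause is temporary, such as Ollama still loading a model or the GPU briefly running out of memory. A generic helper with exponential backoff is sketched below; `retry` and its defaults are hypothetical, and `sleep` is injectable only so the behavior can be checked without waiting. The `for attempt in range(3)` loop could then become `state["script"] = retry(lambda: generate_script(llm, state["topic"], state["duration_sec"]))`.

```python
import time


def retry(fn, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call fn(); after each failure wait base_delay * 2**i seconds and try again.
    The last failure is re-raised unchanged."""
    for i in range(attempts):
        try:
            return fn()
        except Exception as e:
            if i == attempts - 1:
                raise
            delay = base_delay * 2 ** i
            print(f"attempt {i + 1}/{attempts} failed ({e}); retrying in {delay:.0f} s")
            sleep(delay)
```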

Optimizing total run time

# Run independent steps at the same time
import asyncio
import concurrent.futures

async def run_parallel_agents(state: PipelineState):
    """
    Thumbnail and dubbing do not depend on each other → run them in parallel
    (after the script is done). BGM needs the narration length, so it runs after dubbing.
    Note: running FLUX and F5-TTS on the same GPU at once may run out of VRAM.
    """
    loop = asyncio.get_running_loop()

    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        # Each thread gets its own copy so they don't write to the same dict
        future_thumbnail = loop.run_in_executor(
            executor, node_generate_thumbnail, dict(state)
        )
        future_dubbing = loop.run_in_executor(
            executor, node_generate_dubbing, dict(state)
        )

        # Wait for both to finish
        thumbnail_state, dubbing_state = await asyncio.gather(
            future_thumbnail, future_dubbing
        )

    # Merge the results
    state["thumbnail_path"] = thumbnail_state["thumbnail_path"]
    state["audio_map"]      = dubbing_state["audio_map"]
    state["error"]          = thumbnail_state.get("error") or dubbing_state.get("error")

    # BGM is generated once the narration length is known
    state = node_generate_bgm(state)
    return state

# Usage (after the script node has run): state = asyncio.run(run_parallel_agents(state))

# Measure total time
import time

start = time.time()
final_state = pipeline.invoke(initial_state)
elapsed = time.time() - start

print(f"\n⏱️  Total run time: {elapsed/60:.1f} min")
print("""
Expected time per step:
  Script:          1~3 min   (depends on the Ollama model size)
  Thumbnail:       2~5 min   (FLUX.1 dev, 28 steps)
  Dubbing:         2~5 min   (depends on the number of scenes)
  Intro video:     10~15 min (Wan2.2 720p)
  BGM:             1~2 min   (MusicGen medium)
  Final merge:     1 min     (FFmpeg)
  ─────────────────────
  Total (sequential):  17~31 min
  Total (parallel):    15~26 min (thumbnail + dubbing in parallel)
""")

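✅ Completion checklist

Ollama is running (ollama serve)

Output paths from Projects 1~5 verified

LangGraph pipeline tested step by step

CrewAI multi-agent run confirmed

Checkpoint save/restore works

FFmpeg final merge completed

final_youtube.mp4 plays back and its quality has been checked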