
Implement retry with exponential backoff for LLM API calls (handling rate limits and timeouts).

LLM API calls fail for many reasons: rate limits (429), provider overload (503), timeouts, and network jitter.

Solid retry logic is a baseline requirement in production.

python
import time, random, logging
from typing import Callable, TypeVar
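from openai import (
    OpenAI,
    APIConnectionError,
    APITimeoutError,
    InternalServerError,
    RateLimitError,
)

T = TypeVar("T")
log = logging.getLogger(__name__)

def retry_with_backoff(
    fn: Callable[[], T],
    *,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    jitter: bool = True,
    # Transient errors only. The base APIError also covers 400/401/403,
    # which must not be retried.
    retryable=(RateLimitError, APITimeoutError, APIConnectionError, InternalServerError),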
) -> T:
    """Exponential backoff with jitter + respect Retry-After header."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable as e:
            if attempt == max_attempts - 1:
                raise  # last attempt, bubble up
            
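            # Respect the Retry-After header (sent with a 429). The SDK exposes it
            # on the HTTP response, not as an attribute of the exception.
            response = getattr(e, "response", None)
            header = response.headers.get("retry-after") if response is not None else None
            try:
                delay = float(header) if header else None
            except ValueError:
                delay = None  # HTTP-date form: fall back to backoff
            if delay is None:
                # Exponential: 1s, 2s, 4s, 8s, ..., capped at max_delay
                delay = min(base_delay * 2**attempt, max_delay)
                if jitter:
                    # "Full jitter" guards against a thundering herd
                    delay = random.uniform(0, delay)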
            
            log.warning(
                f"Attempt {attempt+1}/{max_attempts} failed: {e}. "
                f"Retrying in {delay:.1f}s"
            )
            time.sleep(delay)

# --- USAGE ---
# Turn off the SDK's own retries (default 2) so attempts do not multiply
client = OpenAI(max_retries=0)

def call_llm(prompt: str) -> str:
    return retry_with_backoff(
        lambda: client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            timeout=30,
        ).choices[0].message.content,
        max_attempts=5,
    )
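
To make the schedule concrete, the sketch below isolates the formula `min(base_delay * 2**attempt, max_delay)` and the full-jitter draw. The helper names `backoff_ceiling` and `full_jitter` are hypothetical and not part of the code above; the seed is there only so the demo output is repeatable.

```python
import random

def backoff_ceiling(attempt: int, base_delay: float = 1.0, max_delay: float = 60.0) -> float:
    """Upper bound of the delay after the failed attempt `attempt` (0-indexed)."""
    return min(base_delay * 2 ** attempt, max_delay)

def full_jitter(ceiling: float, rng: random.Random) -> float:
    """Full jitter: pick a delay uniformly between 0 and the ceiling."""
    return rng.uniform(0, ceiling)

if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    for attempt in range(8):
        ceiling = backoff_ceiling(attempt)
        delay = full_jitter(ceiling, rng)
        print(f"attempt {attempt}: ceiling={ceiling:.0f}s, jittered={delay:.2f}s")
```

With the defaults, the ceilings run 1, 2, 4, 8, 16, 32 seconds and then stay at the 60-second cap. Note that with `max_attempts=5` only the first four delays are ever slept, because the fifth failure is re-raised.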

Async version (what production code typically uses):

python
import asyncio
import random
from openai import AsyncOpenAI, RateLimitError, APITimeoutError

async def retry_async(fn, max_attempts=5, base=1.0, cap=60.0):
    """Async full-jitter backoff; fn is a zero-argument coroutine function."""
    for i in range(max_attempts):
        try:
            return await fn()
        except (RateLimitError, APITimeoutError):
            if i == max_attempts - 1:
                raise
            # Non-blocking sleep, so other tasks keep running meanwhile
            await asyncio.sleep(random.uniform(0, min(base * 2**i, cap)))
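
A usage sketch for the async pattern that runs without network access. `FlakyError` and `make_flaky_call` are hypothetical stand-ins for `RateLimitError` and a real `AsyncOpenAI` request, and the retry helper is re-declared with a configurable exception tuple so the block is self-contained. Delays are shrunk so the demo finishes quickly.

```python
import asyncio
import random

class FlakyError(Exception):
    """Hypothetical stand-in for a retryable API error."""

async def retry_async(fn, *, max_attempts=5, base=1.0, cap=60.0, retryable=(FlakyError,)):
    """Await fn() until it succeeds, sleeping with full jitter between attempts."""
    for i in range(max_attempts):
        try:
            return await fn()
        except retryable:
            if i == max_attempts - 1:
                raise
            await asyncio.sleep(random.uniform(0, min(base * 2 ** i, cap)))

def make_flaky_call(failures: int):
    """Build a coroutine function that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    async def flaky_call() -> str:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise FlakyError(f"simulated failure {state['calls']}")
        return f"ok after {state['calls']} calls"

    return flaky_call

async def main() -> list:
    # Run several retried calls concurrently; each one backs off independently.
    calls = [make_flaky_call(n) for n in (0, 1, 2)]
    return await asyncio.gather(*(retry_async(c, base=0.01, cap=0.05) for c in calls))

if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because the sleep is `asyncio.sleep`, a call that is backing off does not block the others, which is the main reason to prefer the async variant when many requests are in flight.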

Production-ready libraries (recommended over writing your own):

  • tenacity: a powerful Python retry decorator:
python
from openai import RateLimitError, APITimeoutError
from tenacity import retry, wait_exponential, stop_after_attempt, retry_if_exception_type

@retry(
    # No jitter here; tenacity's wait_random_exponential is the jittered variant
    wait=wait_exponential(multiplier=1, min=1, max=60),
    stop=stop_after_attempt(5),
    retry=retry_if_exception_type((RateLimitError, APITimeoutError)),
)
def call_llm(prompt): ...
  • backoff: a Python library with a simple decorator API.
  • OpenAI SDK built-in: the v1 SDK ships with a max_retries parameter (default 2), e.g. OpenAI(max_retries=5).

Production best practices:

1. Idempotency key (e.g. an idempotency_key option, where the provider honors it): avoids duplicate billing when a retry repeats a request that actually went through.
2. Respect the Retry-After header: if the provider says 30s, do not retry any sooner.
3. Jitter: full jitter (a random delay between 0 and the computed delay) prevents a thundering herd when many clients retry at once.
4. Different strategies per error:
- 429 rate limit → wait as long as Retry-After says.
- 500/503 server error → exponential backoff.
- 400/401/403 → do NOT retry (the request itself is wrong).
- Timeout → retry, but with a tight limit.
5. Circuit breaker: if the error rate exceeds a threshold, trip the breaker and either fall back to another provider or reject early. Library: pybreaker.
6. Fallback model: when the primary fails, downgrade to another model (GPT-4o → Claude 3.5 Sonnet → Haiku).
7. Retry budget: cap total retries per user or feature to avoid runaway cost.
8. Log with a trace ID: log every attempt with its request_id so failures can be debugged.
9. Metrics: track the retry rate and the success-after-retry rate; investigate any spike.
10. Deadline budget: for user-facing requests, total latency has a ceiling (e.g. 10s). Reduce the remaining retry attempts dynamically as the deadline approaches.
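
Items 4 and 10 can be folded into one decision function. The sketch below is a minimal illustration that works on raw HTTP status codes rather than SDK exceptions. The name `next_delay`, its parameters, the timeout retry limit of 2, and the choice to treat 502 and 504 like 500/503 are all assumptions made for the example.

```python
import random
import time
from typing import Optional

NON_RETRYABLE = {400, 401, 403}        # request errors: retrying cannot help
SERVER_ERRORS = {500, 502, 503, 504}   # assumption: 502/504 handled like 500/503

def next_delay(
    status: Optional[int],                # None models a client-side timeout
    attempt: int,                         # 0-indexed attempt that just failed
    deadline: float,                      # absolute ceiling on the time.monotonic() clock
    retry_after: Optional[float] = None,  # parsed Retry-After seconds, if any
    base: float = 1.0,
    cap: float = 60.0,
    max_timeout_retries: int = 2,
    now: Optional[float] = None,          # injectable clock reading for tests
) -> Optional[float]:
    """Return the seconds to sleep before retrying, or None to give up."""
    now = time.monotonic() if now is None else now
    if status in NON_RETRYABLE:
        return None
    if status is None and attempt >= max_timeout_retries:
        return None                       # timeouts get a tighter retry limit
    if status == 429 and retry_after is not None:
        delay = retry_after               # the provider said how long to wait
    elif status == 429 or status is None or status in SERVER_ERRORS:
        delay = random.uniform(0, min(base * 2 ** attempt, cap))  # full jitter
    else:
        return None                       # unknown status: fail fast
    # Deadline budget: never sleep past the point where the caller stops waiting.
    return delay if now + delay < deadline else None
```

A caller would compute `deadline = time.monotonic() + 10` once per user request and stop retrying as soon as `next_delay` returns `None`, which keeps the total latency under the ceiling regardless of how many attempts remain.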

Gateway solutions: LiteLLM and Portkey handle retry, fallback, and circuit breaking transparently, so no custom code is needed.
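
Where a gateway is not an option, the circuit breaker from item 5 is small enough to sketch by hand. The class below is a simplified illustration: it trips on consecutive failures rather than an error rate, and it is not the pybreaker API. `CircuitBreaker`, `CircuitOpenError`, `call_with_fallback`, and every parameter name are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock                  # injectable so tests can fake time
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open, rejecting early")
            # Reset timeout elapsed: half-open, let one trial call through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            raise
        self.failures = 0                   # a success closes the circuit
        self.opened_at = None
        return result

def call_with_fallback(primary: CircuitBreaker, primary_fn: Callable[[], T],
                       fallback_fn: Callable[[], T]) -> T:
    """Try the primary provider through its breaker; on any failure use the fallback."""
    try:
        return primary.call(primary_fn)
    except Exception:
        return fallback_fn()
```

While the breaker is open, `call_with_fallback` goes straight to the fallback without touching the primary provider, which is what protects an overloaded upstream from additional retry traffic.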
