Soundtrack with an LLM

Use the Epidemic Sound API + an LLM to automatically find and adapt music for your videos.

Workflow Overview:

Authenticate → Analyze Video (LLM) → Search Music → Preview → Adapt Length

1. Authentication

The Epidemic Sound API uses a two-token flow. Your backend obtains a Partner Token (never exposed to clients), then requests a User Token for each end user. The User Token is safe to use in frontend code.

Endpoints
  • POST /v0/partner-token
  • POST /v0/token

See the full authentication docs for details.

Token Lifecycle

Token         | Where to store    | TTL
Partner Token | Backend only      | 1 day
User Token    | Client / frontend | 7 days

Example: Get a Partner Token

curl -X POST https://partner-content-api.epidemicsound.com/v0/partner-token \
  -H "Content-Type: application/json" \
  -d '{
    "accessKeyId": "YOUR_KEY_ID",
    "accessKeySecret": "YOUR_KEY_SECRET"
  }'

Example: Get a User Token

curl -X POST https://partner-content-api.epidemicsound.com/v0/token \
  -H "Authorization: Bearer PARTNER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "userId": "unique-user-id" }'

Use the User Token in subsequent API calls: Authorization: Bearer USER_TOKEN.
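As a sketch, the two calls above can be wrapped in a small backend helper. This assumes Node 18+ (global fetch) and that the token responses carry the token in an accessToken field, which is an assumption here; check the authentication docs for the exact response shape. The fetchImpl parameter is injectable for testing.

```javascript
// Exchange API credentials for a Partner Token, then mint a User Token.
// NOTE: the `accessToken` response field name is an assumption; verify it
// against the authentication docs for your API version.
const API_BASE = 'https://partner-content-api.epidemicsound.com'

async function getPartnerToken(accessKeyId, accessKeySecret, fetchImpl = fetch) {
  const res = await fetchImpl(`${API_BASE}/v0/partner-token`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ accessKeyId, accessKeySecret }),
  })
  if (!res.ok) throw new Error(`partner-token request failed: ${res.status}`)
  return (await res.json()).accessToken
}

async function getUserToken(partnerToken, userId, fetchImpl = fetch) {
  const res = await fetchImpl(`${API_BASE}/v0/token`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${partnerToken}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ userId }),
  })
  if (!res.ok) throw new Error(`token request failed: ${res.status}`)
  return (await res.json()).accessToken
}
```

Keep getPartnerToken strictly server-side; only the User Token it helps mint should ever reach the client.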

2. Video Analysis with an LLM

Before searching for music, analyze your video to extract mood, tempo, and other attributes. You can use any LLM (Gemini, GPT-4, Claude, etc.) that supports video or image input.

What you build

Upload a video (or YouTube URL) to your LLM and instruct it to return a JSON object with music search parameters.

Suggested LLM Response Schema

Instruct your LLM to return JSON with a term field that maps directly to the search API. The search endpoint supports semantic search, so the LLM should describe the desired music in natural language — including mood, energy, tempo, genre, instruments, and vocal style — rather than using structured filter parameters.

{
  "type": "object",
  "properties": {
    "term": {
      "type": "string",
      "description": "Natural-language search query for finding music (max 500 characters). Describe the desired mood, energy, tempo, genre, instruments, and vocal style as a flowing description. The search engine understands semantic queries like 'warm uplifting indie folk with fingerpicked acoustic guitar, around 110 bpm, happy and hopeful energy' better than structured filters."
    }
  }
}

Example output:

{
  "term": "uplifting indie folk with warm fingerpicked acoustic guitar and soft strings, happy and hopeful energy, around 100-120 bpm, instrumental"
}

Guiding the model

Some LLMs support the JSON Schema examples keyword in the schema, which is useful for steering the output. If the LLM you use does not support schema examples, you can put example values in your prompt text instead—e.g. an "Example output:" block like the one above, or example phrases in the field descriptions.
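To make the request concrete, here is a minimal sketch of a payload you might send to the LLM, using a text brief in place of a video upload. The shape follows the OpenAI Chat Completions API with response_format json_object; the model name, endpoint, and payload shape will differ per provider and are assumptions here, not part of the Epidemic Sound API.

```javascript
// Build a Chat Completions-style request body that asks the model to
// return the { "term": ... } JSON described above.
// ASSUMPTION: OpenAI-style payload shape; adapt for your LLM provider.
function buildAnalysisRequest(videoDescription) {
  return {
    model: 'gpt-4o', // any multimodal, JSON-capable model
    response_format: { type: 'json_object' },
    messages: [
      {
        role: 'system',
        content:
          'You are a music supervisor. Return a JSON object with a single ' +
          '"term" field: a natural-language music search query (max 500 ' +
          'characters) describing mood, energy, tempo, genre, instruments, ' +
          'and vocal style.',
      },
      { role: 'user', content: `Video description:\n${videoDescription}` },
    ],
  }
}
```

POST this body to your provider's chat endpoint, then parse the term field out of the model's JSON reply before calling the search API.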

3. Search Music

Call the search endpoint with the query generated by your LLM.

Endpoint

GET /v0/tracks/search

See the search endpoint docs for details.

Example Request

curl "https://partner-content-api.epidemicsound.com/v0/tracks/search?term=uplifting%20indie%20folk&limit=10" \
-H "Authorization: Bearer USER_TOKEN"

Key Query Parameters

Parameter | Description
term      | Search query — supports keywords (genres, moods, instruments) and semantic search with natural language
limit     | Results per page (default 50, max 60)
offset    | Pagination offset
sort      | Relevance, Date, Title, Popularity, BPM, Duration
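A small helper can assemble the query string from these parameters. URLSearchParams handles percent-encoding of the natural-language term; this sketch only covers the parameters listed above.

```javascript
// Build a /v0/tracks/search URL from the documented query parameters.
// URLSearchParams takes care of encoding spaces and punctuation in `term`.
function buildSearchUrl(term, { limit = 10, offset = 0, sort } = {}) {
  const params = new URLSearchParams({ term, limit, offset })
  if (sort) params.set('sort', sort) // omit when you want default relevance
  return `https://partner-content-api.epidemicsound.com/v0/tracks/search?${params}`
}
```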

Response Fields to Display

Each track in the response includes:

  • id – unique track ID (use with /v0/tracks/{id}/stream for playback)
  • title – track title
  • mainArtist – artist name
  • bpm – beats per minute
  • length – duration in seconds
  • imageUrl – cover art URL

Agentic Search with Tool Calling

Instead of having the LLM return a search query that your code executes, you can give the LLM a search_music tool and let it search autonomously. This approach lets the LLM:

  • Run multiple searches from different angles (e.g., varying the genre, energy, or instrumentation across queries)
  • Review results and refine its query if the first search returned poor matches
  • Filter out obvious misfits (e.g., vocal tracks when instrumental is needed)

Define a search tool

Declare a function the LLM can call. Keep the interface simple — just a natural-language query:

{
  "name": "search_music",
  "description": "Search Epidemic Sound's catalog using semantic search. Returns up to 30 tracks with metadata.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Natural-language search query (max 500 characters). Describe the feel, mood, genre, instruments, and energy as a flowing description."
      }
    },
    "required": ["query"]
  }
}

Implement the tool handler

When the LLM calls search_music, your backend executes the search against the Epidemic Sound API and returns the results:

async function handleSearchMusic({ query }) {
  // userToken: the User Token obtained in step 1
  const response = await fetch(
    `https://partner-content-api.epidemicsound.com/v0/tracks/search?term=${encodeURIComponent(query)}&limit=30`,
    { headers: { Authorization: `Bearer ${userToken}` } }
  )
  return await response.json()
}

Run a multi-turn loop

Send the context (video, text brief, or user prompt) to the LLM with a system prompt that instructs it to call search_music, review the results, and return a final selection:

  1. Send context + prompt to the LLM with the tool declaration
  2. LLM calls search_music (one or more times) — your backend executes each call and returns the results
  3. LLM reviews the returned tracks, optionally refines with another search
  4. LLM returns a final JSON response with selected_track_ids from the search results

The LLM can call the tool multiple times in one session, exploring different angles before settling on its recommendations.
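The loop above can be sketched with the model client and search handler injected as functions. The { toolCall } and { final } reply shapes used here are assumptions for illustration, not a real SDK interface; map them onto whatever tool-calling protocol your LLM provider exposes.

```javascript
// Minimal agentic loop. `callModel(messages)` is your LLM client and is
// assumed to return either { toolCall: { args } } when it wants to search,
// or { final: {...} } when it has a selection. `searchMusic(args)` executes
// the Epidemic Sound search (e.g. handleSearchMusic above).
async function runSearchAgent(callModel, searchMusic, context, maxTurns = 5) {
  const messages = [
    {
      role: 'system',
      content:
        'Call search_music, review the results, then return ' +
        '{"selected_track_ids": [...]} as your final answer.',
    },
    { role: 'user', content: context },
  ]
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await callModel(messages)
    if (reply.final) return reply.final // model settled on its selection
    const results = await searchMusic(reply.toolCall.args) // run the search
    messages.push({ role: 'tool', content: JSON.stringify(results) })
  }
  throw new Error('Agent did not converge within maxTurns')
}
```

The maxTurns cap bounds cost and latency when the model keeps refining its query.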

4. Preview Playback

Display search results in a list with cover art, title, artist, BPM, and duration. Add a play/pause button to preview each track.

Streaming Endpoint
  • GET /v0/tracks/{trackId}/stream

See the stream endpoint docs for details.

How Streaming Works

  • The stream endpoint returns a manifest URL (HLS format) for adaptive streaming.
  • Audio is encoded in AAC format (smaller footprint than MP3 for similar quality).
  • To play the manifest URL: use hls.js in web browsers; Safari/iOS have native support; ExoPlayer on Android.

Preview Player Example

<audio id="preview"></audio>
<script src="https://cdn.jsdelivr.net/npm/hls.js@latest"></script>

<script>
  const audio = document.getElementById('preview')

  async function playTrack(trackId, userToken) {
    const res = await fetch(
      `https://partner-content-api.epidemicsound.com/v0/tracks/${trackId}/stream`,
      { headers: { Authorization: `Bearer ${userToken}` } }
    )
    const { url: streamUrl } = await res.json()

    if (Hls.isSupported()) {
      // hls.js path: most desktop browsers
      const hls = new Hls()
      hls.loadSource(streamUrl)
      hls.attachMedia(audio)
      hls.on(Hls.Events.MANIFEST_PARSED, () => audio.play())
    } else if (audio.canPlayType('application/vnd.apple.mpegurl')) {
      // Native HLS support: Safari / iOS
      audio.src = streamUrl
      audio.play()
    }
  }
</script>

5. Select & Adapt Length

Once the user picks a track, you can generate an edited version that matches their video duration. The Epidemic Sound API provides Edit Versions endpoints for this (a.k.a. "adapt length").

Endpoints (Beta)
  • POST /v0/tracks/{trackId}/versions
  • GET /v0/tracks/{trackId}/versions/{jobId}

See the edit versions endpoint docs for details.

Flow

  1. Start a job: POST to /v0/tracks/{trackId}/versions with your target duration in milliseconds (1 second to 5 minutes max).

curl -X POST "https://partner-content-api.epidemicsound.com/v0/tracks/123456/versions" \
  -H "Authorization: Bearer USER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "targetDurationMs": 30000 }'

The response includes a jobId.

  2. Poll for status: GET /v0/tracks/{trackId}/versions/{jobId} until status is COMPLETED.

curl "https://partner-content-api.epidemicsound.com/v0/tracks/123456/versions/JOB_ID" \
  -H "Authorization: Bearer USER_TOKEN"

  3. Use the result: When complete, the response contains preview and download URLs for the adapted track.

Job Status Values

Status      | Meaning
PENDING     | Job queued
IN_PROGRESS | Processing
COMPLETED   | Ready – URLs available
FAILED      | Error occurred

Longer durations increase processing time. Poll every 2–5 seconds until complete.
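The polling step can be sketched as a small helper. Here getStatus is assumed to wrap the GET status call above and return the parsed JSON body; it is injected so the loop stays testable.

```javascript
// Poll an edit-version job until it reaches a terminal status.
// `getStatus` wraps GET /v0/tracks/{trackId}/versions/{jobId} and returns
// the parsed response body, e.g. { status: 'IN_PROGRESS', ... }.
async function pollUntilComplete(getStatus, { intervalMs = 3000, maxAttempts = 60 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await getStatus()
    if (job.status === 'COMPLETED') return job // preview/download URLs available
    if (job.status === 'FAILED') throw new Error('Edit version job failed')
    await new Promise((resolve) => setTimeout(resolve, intervalMs)) // wait 2-5 s
  }
  throw new Error('Timed out waiting for edit version')
}
```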