This topic describes how to stream real-time responses from the Cortex Code Agent SDK.
By default, the SDK yields complete AssistantMessage objects after the model finishes generating each response. To
receive incremental updates as text and thinking blocks are generated, enable partial message streaming by setting
includePartialMessages (TypeScript) or include_partial_messages (Python) to true.
When partial messages are enabled, Cortex Code emits StreamEvent objects for partial text and thinking content.
Complete tool calls still arrive as AssistantMessage objects, and tool results still arrive as UserMessage
objects.
With partial messages enabled, the SDK yields StreamEvent messages that carry partial streaming events, in addition to the usual
AssistantMessage, UserMessage, and ResultMessage objects. To print streamed text, your code needs to:
Check each message’s type to distinguish StreamEvent from other types.
For StreamEvent, extract the event field and check its type.
Look for content_block_delta events where delta.type is text_delta.

import { query } from "cortex-code-agent-sdk";

for await (const message of query({
  prompt: "List the files in my project",
  options: {
    cwd: process.cwd(),
    includePartialMessages: true,
    allowedTools: ["Bash", "Read"],
  },
})) {
  if (message.type === "stream_event") {
    const event = message.event;
    if (event.type === "content_block_delta") {
      if (event.delta.type === "text_delta") {
        process.stdout.write(event.delta.text);
      }
    }
  }
}

import asyncio
from cortex_code_agent_sdk import query, CortexCodeAgentOptions
from cortex_code_agent_sdk.types import StreamEvent

async def stream_response():
    async for message in query(
        prompt="List the files in my project",
        options=CortexCodeAgentOptions(
            cwd=".",
            include_partial_messages=True,
            allowed_tools=["Bash", "Read"],
        ),
    ):
        if isinstance(message, StreamEvent):
            event = message.event
            if event.get("type") == "content_block_delta":
                delta = event.get("delta", {})
                if delta.get("type") == "text_delta":
                    print(delta.get("text", ""), end="", flush=True)

asyncio.run(stream_response())

With partial messages enabled, you commonly receive messages in the following order:
SystemMessage -- session initialization
StreamEvent (content_block_start) -- text or thinking block
StreamEvent (content_block_delta) -- text_delta or thinking_delta chunks
...
StreamEvent (content_block_stop)
AssistantMessage -- complete text/thinking block, or complete tool_use block
UserMessage -- complete tool_result block
... more assistant/user turns ...
ResultMessage -- final result
Without partial messages enabled, you still receive the same complete assistant, user, and result messages, but not
StreamEvent. Depending on the session, the SDK can also emit system events such as initialization, status, and
background-task notifications.
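
As a rough illustration of the StreamEvent portion of that order, the following Python sketch groups text_delta chunks into one string per content block. It operates on plain dictionaries rather than real SDK objects, so it runs without the SDK installed. The function name collect_text_blocks and the sample events are hypothetical, written by hand for this example, and the sketch relies only on the fields already shown in this topic (type, delta.type, delta.text).

```python
from typing import Any, Dict, Iterable, List, Optional


def collect_text_blocks(events: Iterable[Dict[str, Any]]) -> List[str]:
    """Group text_delta chunks into one string per content block.

    Hypothetical helper: `events` stands in for the `event` field of
    successive StreamEvent messages.
    """
    blocks: List[str] = []
    current: Optional[List[str]] = None  # chunks of the block being streamed

    for event in events:
        event_type = event.get("type")
        if event_type == "content_block_start":
            current = []
        elif event_type == "content_block_delta" and current is not None:
            delta = event.get("delta", {})
            # Only text is collected here; thinking_delta chunks are skipped.
            if delta.get("type") == "text_delta":
                current.append(delta.get("text", ""))
        elif event_type == "content_block_stop" and current is not None:
            if current:  # ignore blocks that produced no text
                blocks.append("".join(current))
            current = None
    return blocks


# Hand-written sample events (an assumption, not captured SDK output).
sample_events = [
    {"type": "content_block_start"},
    {"type": "content_block_delta", "delta": {"type": "thinking_delta"}},
    {"type": "content_block_stop"},
    {"type": "content_block_start"},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hello, "}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "world"}},
    {"type": "content_block_stop"},
]
print(collect_text_blocks(sample_events))
```

The thinking block in the sample contributes nothing to the result because the sketch collects only text_delta chunks.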
The following example accumulates streamed text in a local buffer and re-renders the current response each time a new
text_delta arrives. In a real application, replace the render function with your framework’s state update
logic:
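
A minimal sketch of this pattern in Python, assuming events shaped like the dictionaries in the Python example earlier in this topic. The class StreamedTextView and the frames list are hypothetical names introduced for illustration and are not part of the SDK. The events are fed in by hand so the sketch runs without the SDK installed.

```python
from typing import Any, Callable, Dict, List


class StreamedTextView:
    """Accumulates text_delta chunks and re-renders after each one."""

    def __init__(self, render: Callable[[str], None]) -> None:
        self._render = render  # replace with your framework's state update
        self.buffer = ""

    def handle_event(self, event: Dict[str, Any]) -> None:
        # Ignore everything except text deltas.
        if event.get("type") != "content_block_delta":
            return
        delta = event.get("delta", {})
        if delta.get("type") != "text_delta":
            return
        self.buffer += delta.get("text", "")
        self._render(self.buffer)  # re-render the full response so far


# Stand-in renderer: record every rendered frame in a list.
frames: List[str] = []
view = StreamedTextView(render=frames.append)

# Hand-written sample events (an assumption, not captured SDK output).
for sample in [
    {"type": "content_block_start"},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hel"}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "lo"}},
    {"type": "content_block_stop"},
]:
    view.handle_event(sample)

print(frames)
```

In the streaming loop, you would call view.handle_event(message.event) for each StreamEvent instead of iterating over sample events.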
Where your configuration of Cortex Code uses a model provided on the
Model and Service Pass-Through Terms,
your use of that model is further subject to the terms for that model on that page.
The data classification of inputs and outputs is as set forth in the following table.