docs: add how-to on multi-modal tool calling (#21667)

Can move this to a dedicated multi-modal section if desired.
pull/18991/merge
ccurme 3 weeks ago committed by GitHub
parent 5c64c004cc
commit 12b599c47f
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

@ -172,6 +172,7 @@ LangChain Tools contain a description of the tool (to pass to the language model
- [How to: add a human in the loop to tool usage](/docs/how_to/tools_human)
- [How to: do parallel tool use](/docs/how_to/tools_parallel)
- [How to: handle errors when calling tools](/docs/how_to/tools_error)
- [How to: call tools using multi-modal data](/docs/how_to/tool_calls_multi_modal)
### Agents

@ -0,0 +1,160 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "4facdf7f-680e-4d28-908b-2b8408e2a741",
"metadata": {},
"source": [
"# How to call tools with multi-modal data\n",
"\n",
"Here we demonstrate how to call tools with multi-modal data, such as images.\n",
"\n",
"Some multi-modal models, such as those that can reason over images or audio, support [tool calling](/docs/concepts/#functiontool-calling) features as well.\n",
"\n",
"To call tools using such models, simply bind tools to them in the [usual way](/docs/how_to/tool_calling), and invoke the model using content blocks of the desired type (e.g., containing image data).\n",
"\n",
"Below, we demonstrate examples using [OpenAI](/docs/integrations/platforms/openai) and [Anthropic](/docs/integrations/platforms/anthropic). We will use the same image and tool in all cases. Let's first select an image, and build a placeholder tool that expects as input the string \"sunny\", \"cloudy\", or \"rainy\". We will ask the models to describe the weather in the image."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "0d9fd81a-b7f0-445a-8e3d-cfc2d31fdd59",
"metadata": {},
"outputs": [],
"source": [
"from typing import Literal\n",
"\n",
"from langchain_core.tools import tool\n",
"\n",
"image_url = \"https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg\"\n",
"\n",
"\n",
"@tool\n",
"def weather_tool(weather: Literal[\"sunny\", \"cloudy\", \"rainy\"]) -> None:\n",
" \"\"\"Describe the weather\"\"\"\n",
" pass"
]
},
{
"cell_type": "markdown",
"id": "8656018e-c56d-47d2-b2be-71e87827f90a",
"metadata": {},
"source": [
"## OpenAI\n",
"\n",
"For OpenAI, we can feed the image URL directly in a content block of type \"image_url\":"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a8819cf3-5ddc-44f0-889a-19ca7b7fe77e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[{'name': 'weather_tool', 'args': {'weather': 'sunny'}, 'id': 'call_mRYL50MtHdeNuNIjSCm5UPmB'}]\n"
]
}
],
"source": [
"from langchain_core.messages import HumanMessage\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"model = ChatOpenAI(model=\"gpt-4o\").bind_tools([weather_tool])\n",
"\n",
"message = HumanMessage(\n",
" content=[\n",
" {\"type\": \"text\", \"text\": \"describe the weather in this image\"},\n",
" {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n",
" ],\n",
")\n",
"response = model.invoke([message])\n",
"print(response.tool_calls)"
]
},
{
"cell_type": "markdown",
"id": "e5738224-1109-4bf8-8976-ff1570dd1d46",
"metadata": {},
"source": [
"Note that we recover tool calls with parsed arguments in LangChain's [standard format](/docs/how_to/tool_calling) in the model response."
]
},
{
"cell_type": "markdown",
"id": "0cee63ff-e09f-4dd8-8323-912edbde94f6",
"metadata": {},
"source": [
"## Anthropic\n",
"\n",
"For Anthropic, we can format a base64-encoded image into a content block of type \"image\", as below:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "d90c4590-71c8-42b1-99ff-03a9eca8082e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[{'name': 'weather_tool', 'args': {'weather': 'sunny'}, 'id': 'toolu_016m9KfknJqx5fVRYk4tkF6s'}]\n"
]
}
],
"source": [
"import base64\n",
"\n",
"import httpx\n",
"from langchain_anthropic import ChatAnthropic\n",
"\n",
"image_data = base64.b64encode(httpx.get(image_url).content).decode(\"utf-8\")\n",
"\n",
"model = ChatAnthropic(model=\"claude-3-sonnet-20240229\").bind_tools([weather_tool])\n",
"\n",
"message = HumanMessage(\n",
" content=[\n",
" {\"type\": \"text\", \"text\": \"describe the weather in this image\"},\n",
" {\n",
" \"type\": \"image\",\n",
" \"source\": {\n",
" \"type\": \"base64\",\n",
" \"media_type\": \"image/jpeg\",\n",
" \"data\": image_data,\n",
" },\n",
" },\n",
" ],\n",
")\n",
"response = model.invoke([message])\n",
"print(response.tool_calls)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Loading…
Cancel
Save