mirror of https://github.com/hwchase17/langchain
docs: add how-to on multi-modal tool calling (#21667)
Can move this to a dedicated multi-modal section if desired.pull/18991/merge
parent
5c64c004cc
commit
12b599c47f
@ -0,0 +1,160 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4facdf7f-680e-4d28-908b-2b8408e2a741",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# How to call tools with multi-modal data\n",
|
||||
"\n",
|
||||
"Here we demonstrate how to call tools with multi-modal data, such as images.\n",
|
||||
"\n",
|
||||
"Some multi-modal models, such as those that can reason over images or audio, support [tool calling](/docs/concepts/#functiontool-calling) features as well.\n",
|
||||
"\n",
|
||||
"To call tools using such models, simply bind tools to them in the [usual way](/docs/how_to/tool_calling), and invoke the model using content blocks of the desired type (e.g., containing image data).\n",
|
||||
"\n",
|
||||
"Below, we demonstrate examples using [OpenAI](/docs/integrations/platforms/openai) and [Anthropic](/docs/integrations/platforms/anthropic). We will use the same image and tool in all cases. Let's first select an image, and build a placeholder tool that expects as input the string \"sunny\", \"cloudy\", or \"rainy\". We will ask the models to describe the weather in the image."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "0d9fd81a-b7f0-445a-8e3d-cfc2d31fdd59",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import Literal\n",
|
||||
"\n",
|
||||
"from langchain_core.tools import tool\n",
|
||||
"\n",
|
||||
"image_url = \"https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg\"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"@tool\n",
|
||||
"def weather_tool(weather: Literal[\"sunny\", \"cloudy\", \"rainy\"]) -> None:\n",
|
||||
" \"\"\"Describe the weather\"\"\"\n",
|
||||
" pass"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8656018e-c56d-47d2-b2be-71e87827f90a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## OpenAI\n",
|
||||
"\n",
|
||||
"For OpenAI, we can feed the image URL directly in a content block of type \"image_url\":"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "a8819cf3-5ddc-44f0-889a-19ca7b7fe77e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[{'name': 'weather_tool', 'args': {'weather': 'sunny'}, 'id': 'call_mRYL50MtHdeNuNIjSCm5UPmB'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_core.messages import HumanMessage\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"\n",
|
||||
"model = ChatOpenAI(model=\"gpt-4o\").bind_tools([weather_tool])\n",
|
||||
"\n",
|
||||
"message = HumanMessage(\n",
|
||||
" content=[\n",
|
||||
" {\"type\": \"text\", \"text\": \"describe the weather in this image\"},\n",
|
||||
" {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n",
|
||||
" ],\n",
|
||||
")\n",
|
||||
"response = model.invoke([message])\n",
|
||||
"print(response.tool_calls)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e5738224-1109-4bf8-8976-ff1570dd1d46",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note that we recover tool calls with parsed arguments in LangChain's [standard format](/docs/how_to/tool_calling) in the model response."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0cee63ff-e09f-4dd8-8323-912edbde94f6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Anthropic\n",
|
||||
"\n",
|
||||
"For Anthropic, we can format a base64-encoded image into a content block of type \"image\", as below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "d90c4590-71c8-42b1-99ff-03a9eca8082e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[{'name': 'weather_tool', 'args': {'weather': 'sunny'}, 'id': 'toolu_016m9KfknJqx5fVRYk4tkF6s'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import base64\n",
|
||||
"\n",
|
||||
"import httpx\n",
|
||||
"from langchain_anthropic import ChatAnthropic\n",
|
||||
"\n",
|
||||
"image_data = base64.b64encode(httpx.get(image_url).content).decode(\"utf-8\")\n",
|
||||
"\n",
|
||||
"model = ChatAnthropic(model=\"claude-3-sonnet-20240229\").bind_tools([weather_tool])\n",
|
||||
"\n",
|
||||
"message = HumanMessage(\n",
|
||||
" content=[\n",
|
||||
" {\"type\": \"text\", \"text\": \"describe the weather in this image\"},\n",
|
||||
" {\n",
|
||||
" \"type\": \"image\",\n",
|
||||
" \"source\": {\n",
|
||||
" \"type\": \"base64\",\n",
|
||||
" \"media_type\": \"image/jpeg\",\n",
|
||||
" \"data\": image_data,\n",
|
||||
" },\n",
|
||||
" },\n",
|
||||
" ],\n",
|
||||
")\n",
|
||||
"response = model.invoke([message])\n",
|
||||
"print(response.tool_calls)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
Loading…
Reference in New Issue