In this post we take a look at the function calling capabilities of the open-source model NousResearch/Hermes-2-Pro-Mistral-7B (@Hermes-2-Pro-Mistral-7B).
Start by creating a virtual environment:
python3 -m venv env
source env/bin/activate
Then install:
pip install openai
pip install python-dotenv # or define your environment variables differently
pip install langchain # utilities for converting functions to OpenAI tools format.
I also have a Hugging Face Inference Endpoint running NousResearch/Hermes-2-Pro-Mistral-7B. In my .env file I have the following:
OPENAI_API_KEY=your_key
HUGGING_FACE_ACCESS_TOKEN=your_key
HUGGING_FACE_ENDPOINT_URL=url_for_endpoint
TOGETHER_AI_BASE_URL=https://api.together.xyz/v1
TOGETHER_API_KEY=your_key
# ruff: noqa: F403, F405, F811
import os
from dotenv import load_dotenv
load_dotenv()
HUGGING_FACE_ACCESS_TOKEN = os.environ["HUGGING_FACE_ACCESS_TOKEN"]
HUGGING_FACE_ENDPOINT_URL = os.environ["HUGGING_FACE_ENDPOINT_URL"]
TOGETHER_API_KEY = os.environ["TOGETHER_API_KEY"]
TOGETHER_AI_BASE_URL = os.environ["TOGETHER_AI_BASE_URL"]
In a previous blog post I discussed how we can use the OpenAI Python client to run inference with open-source models through services that are OpenAI-compatible. I'll copy part of that code here.
import ast
import json
import random
from datetime import datetime, timedelta
from typing import Any, Dict, Optional, Union
from langchain.tools import tool
from langchain_core.utils.function_calling import convert_to_openai_tool
from openai import OpenAI
from openai._streaming import Stream
from openai.types.chat.chat_completion import ChatCompletion
from openai.types.chat.chat_completion_chunk import ChatCompletionChunk
today = datetime.now().strftime("%A %Y-%m-%d")
class OpenAIChatCompletion:
clients: Dict = dict()
@classmethod
    def _load_client(cls, base_url: Optional[str] = None, api_key: Optional[str] = None) -> OpenAI:
        # Cache one client per (base_url, api_key) pair so repeated calls reuse it.
        client_key = (base_url, api_key)
        if cls.clients.get(client_key) is None:
            cls.clients[client_key] = OpenAI(base_url=base_url, api_key=api_key)
        return cls.clients[client_key]
def __call__(
self,
model: str,
messages: list,
base_url: Optional[str] = None,
api_key: Optional[str] = None,
**kwargs: Any,
) -> Union[ChatCompletion, Stream[ChatCompletionChunk]]:
# https://platform.openai.com/docs/api-reference/chat/create
# https://github.com/openai/openai-python
client = self._load_client(base_url, api_key)
return client.chat.completions.create(model=model, messages=messages, **kwargs)
Simply use it like this:
llm = OpenAIChatCompletion()
print(llm(model="gpt-3.5-turbo-0125", messages=[dict(role="user", content="Hello!")]))
We can also use the same class to run inference with Hermes-2-Pro-Mistral-7B through a Hugging Face Inference Endpoint.
You don't need an inference endpoint to run this model; you could use the transformers library directly and run it locally. Just remember to use the proper prompt format. Here I'm using the messages format.
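If you do want to run it locally, here is a minimal sketch using the transformers library, where the tokenizer's chat template handles the prompt format for you (this assumes you have the weights downloaded and enough GPU memory; it is not the endpoint setup used in this post):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Hermes-2-Pro-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

messages = [dict(role="user", content="Explain why open source AI is important.")]
# apply_chat_template formats the messages with the model's own prompt template.
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=500, do_sample=True, temperature=1.0)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

With the endpoint, the call looks like this: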
print(
llm(
model="tgi",
api_key=HUGGING_FACE_ACCESS_TOKEN,
base_url=HUGGING_FACE_ENDPOINT_URL,
messages=[
dict(
role="system",
content="You are an OpenSource LLM that rivals OpenAI GPT. Your goal is to bring open source AI to everyone!",
),
dict(role="user", content="Explain why open source AI is important."),
],
max_tokens=2000,
temperature=1,
)
.choices[0]
.message.content
)
First we will define some functions/tools which the LLM will have access to. Here I use langchain to convert the Python functions into the tools format used by OpenAI. It's much faster than writing those JSON objects by hand. Note that Hermes-2-Pro-Mistral-7B uses this same format too!
I am leaving out the actual logic for each function; I mainly want to test the model's ability to pick out the correct function and arguments. The important step here is to document each function and its arguments.
@tool
def get_weather_forecast(location: str, date: str) -> str:
"""
Provides a weather forecast for a given location and date.
Args:
location (str): The name of the city and state, e.g. 'San Francisco, CA'.
date (str): The date of the forecast in YYYY-MM-DD format, e.g. '2023-07-01'.
Returns:
str: A string containing the weather forecast, e.g. 'Partly cloudy with a high of 72F (22C).'
"""
pass
@tool
def book_flight(
departure_city: str,
arrival_city: str,
departure_date: str,
return_date: str,
num_passengers: int,
cabin_class: str,
) -> dict:
"""
Book a round-trip flight for the given parameters.
Args:
departure_city (str): The full city name with the departure airport, e.g. "Toronto".
arrival_city (str): The full city name with the arrival airport, e.g. "Austin".
departure_date (str): The departure date in YYYY-MM-DD format.
return_date (str): The return date in YYYY-MM-DD format.
num_passengers (int): The number of passengers.
cabin_class (str): The cabin class, e.g. "economy", "business", "first".
Returns:
dict: A dict with the booking details including airline, flight numbers, price and booking confirmation code.
"""
pass
@tool
def book_movie_tickets(movie_name: str, theater_name: str, date: str, time: str, num_tickets: int) -> dict:
"""
Book movie tickets for the given movie, theater, date, time, and number of tickets.
Args:
movie_name (str): The name of the movie.
theater_name (str): The name of the theater.
date (str): The date of the movie showing (YYYY-MM-DD).
time (str): The time of the movie showing (HH:MM).
num_tickets (int): The number of tickets to book for the movie.
Returns:
dict: Returns a dictionary with booking details if successful, otherwise returns a dictionary with an error message.
"""
pass
@tool
def translate_text(text: str, target_language: str) -> str:
"""
Translate the given text into the specified target language.
Args:
text (str): The text to be translated.
target_language (str): The target language code (e.g., 'es' for Spanish, 'fr' for French).
Returns:
str: The translated text in the target language.
"""
pass
@tool
def get_recipe(dish_name: str) -> str:
"""
Returns a recipe for the given dish name.
Args:
dish_name (str): The name of the dish to get the recipe for.
Returns:
str: A string containing the recipe instructions.
"""
pass
@tool
def solve_math_problem(problem: str) -> str:
"""
Solves a given math equation using a symbolic math library.
Simply pass in the equation.
Args:
problem (str): The equation to be solved.
Returns:
str: The solution to the equation.
"""
pass
@tool
def send_slack_message(channel_name: str, message: str) -> bool:
"""
Send a message to a Slack channel.
Args:
channel_name (str): The name of the channel.
message (str): The message to be sent.
Returns:
bool: True if the message was sent successfully, False otherwise.
"""
pass
functions = [
get_weather_forecast,
book_flight,
book_movie_tickets,
translate_text,
get_recipe,
solve_math_problem,
send_slack_message,
]
tools = [convert_to_openai_tool(f) for f in functions]
Here is an example of two of the tool definitions. Note that this is the same tools format used by OpenAI.
tools[0]
tools[-1]
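For reference, the converted schema for get_weather_forecast looks roughly like this (the exact description strings depend on how langchain parses the docstring, so treat this as an approximation):

```python
{
    "type": "function",
    "function": {
        "name": "get_weather_forecast",
        "description": "Provides a weather forecast for a given location and date.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The name of the city and state, e.g. 'San Francisco, CA'."},
                "date": {"type": "string", "description": "The date of the forecast in YYYY-MM-DD format, e.g. '2023-07-01'."},
            },
            "required": ["location", "date"],
        },
    },
}
```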
Here is a list of questions to test out the function calling capabilities. For each question we have the text and the ground truth expected function name and arguments. This way we can have a mini evaluation for how well the function calling works.
questions = [
{
"question": "What will the weather be like in Seattle, WA tomorrow?",
"tool_calls": [
{
"name": "get_weather_forecast",
"arguments": {
"location": "Seattle, WA",
"date": (datetime.now() + timedelta(days=1)).strftime("%Y-%m-%d"),
},
}
],
},
{
"question": "What's the forecast for Miami for today?",
"tool_calls": [
{
"name": "get_weather_forecast",
"arguments": {"location": "Miami, FL", "date": datetime.now().strftime("%Y-%m-%d")},
}
],
},
{
"question": "Will I need an umbrella in New York City two days from now?",
"tool_calls": [
{
"name": "get_weather_forecast",
"arguments": {
"location": "New York City, NY",
"date": (datetime.now() + timedelta(days=2)).strftime("%Y-%m-%d"),
},
}
],
},
{
"question": "Book me a round-trip flight from New York City to Los Angeles departing on June 15th and returning June 22nd for 2 passengers in economy class.",
"tool_calls": [
{
"name": "book_flight",
"arguments": {
"departure_city": "NYC",
"arrival_city": "LAX",
"departure_date": datetime(datetime.now().year, 6, 15).strftime("%Y-%m-%d"),
"return_date": datetime(datetime.now().year, 6, 22).strftime("%Y-%m-%d"),
"num_passengers": 2,
"cabin_class": "economy",
},
}
],
},
{
"question": "I need to book a first class round-trip flight for 4 people from Chicago to Miami. We want to leave on December 1 and return on December 12.",
"tool_calls": [
{
"name": "book_flight",
"arguments": {
"departure_city": "Chicago",
"arrival_city": "Miami",
"departure_date": datetime(datetime.now().year, 12, 1).strftime("%Y-%m-%d"),
"return_date": datetime(datetime.now().year, 12, 12).strftime("%Y-%m-%d"),
"num_passengers": 4,
"cabin_class": "first",
},
}
],
},
{
"question": "I want to book 3 tickets for The Super Mario Bros. Movie at AMC Empire 25 on April 7th at 7:30 PM.",
"tool_calls": [
{
"name": "book_movie_tickets",
"arguments": {
"movie_name": "The Super Mario Bros. Movie",
"theater_name": "AMC Empire 25",
"date": datetime(datetime.now().year, 4, 7).strftime("%Y-%m-%d"),
"time": "19:30",
"num_tickets": 3,
},
}
],
},
{
"question": "Book 2 tickets for Guardians of the Galaxy Vol. 3 at Regal Union Square on May 5th for the 9:45 PM show.",
"tool_calls": [
{
"name": "book_movie_tickets",
"arguments": {
"movie_name": "Guardians of the Galaxy Vol. 3",
"theater_name": "Regal Union Square",
"date": datetime(datetime.now().year, 5, 5).strftime("%Y-%m-%d"),
"time": "21:45",
"num_tickets": 2,
},
}
],
},
{
"question": "How do you say 'Hello, how are you?' in Spanish?",
"tool_calls": [
{
"name": "translate_text",
"arguments": {"text": "Hello, how are you?", "target_language": "es"},
}
],
},
{
"question": "Translate 'I love programming' to French.",
"tool_calls": [
{
"name": "translate_text",
"arguments": {"text": "I love programming", "target_language": "fr"},
}
],
},
{
"question": "How do I make pesto?",
"tool_calls": [{"name": "get_recipe", "arguments": {"dish_name": "pesto"}}],
},
{
"question": "What's a good vegan chili recipe?",
"tool_calls": [{"name": "get_recipe", "arguments": {"dish_name": "vegan chili"}}],
},
{
"question": "Can you give me a recipe for chocolate chip cookies?",
"tool_calls": [{"name": "get_recipe", "arguments": {"dish_name": "chocolate chip cookies"}}],
},
{
"question": "Solve the equation: x^2 + 2x + 1=0.",
"tool_calls": [{"name": "solve_math_problem", "arguments": {"problem": "x^2 + 2x + 1=0"}}],
},
{
"question": "Solve the equation: 3x - 7 = 5x + 9",
"tool_calls": [{"name": "solve_math_problem", "arguments": {"problem": "3x - 7 = 5x + 9"}}],
},
{
"question": "Solve the equation: sin(x) = 0",
"tool_calls": [{"name": "solve_math_problem", "arguments": {"problem": "sin(x) = 0"}}],
},
{
"question": "Send a message to the general channel on Slack saying 'Hello, world!'",
"tool_calls": [
{
"name": "send_slack_message",
"arguments": {"channel_name": "general", "message": "Hello, world!"},
}
],
},
{
"question": "Send a message to the sales-team channel on Slack with the message: 'Please register for the conference.'",
"tool_calls": [
{
"name": "send_slack_message",
"arguments": {
"channel_name": "sales-team",
"message": "Please register for the conference.",
},
}
],
},
{
"question": "Send a message to the office-updates channel with the message 'FOOD IS HERE!'",
"tool_calls": [
{
"name": "send_slack_message",
"arguments": {"channel_name": "office-updates", "message": "FOOD IS HERE!"},
}
],
},
]
random.shuffle(tools)
random.shuffle(questions)
First we will use gpt-3.5-turbo-0125 to extract the function name and arguments for each question.
def extract_tool_calls(resp):
resp = resp.choices[0].message
if resp.tool_calls:
final_tools = []
for tool_call in resp.tool_calls:
final_tools.append(
{
"name": tool_call.function.name,
"arguments": json.loads(tool_call.function.arguments),
}
)
return final_tools
else:
return None
I'm going to use GPT-4 to check the "correctness" of the predicted/generated function arguments by comparing them with the expected arguments. This step is completely optional; instead, you could use exact string matching or something else. I was curious to see how well this would work, though.
def check_tool_call_arguments(expected, predicted):
    # Ask GPT-4 whether the expected function name and arguments match the predicted ones.
    if expected["name"] != predicted["name"]:
        return False, f'Function names do not match. Expected: {expected["name"]}. Predicted: {predicted["name"]}'
prompt = f"""
Check if the following queries are approx equal. Use fuzzy logic matching for strings.
Check to see if the arguments are semantically similar, especially for free form text.
If you decide they are equivalent then return TRUE and only TRUE with no other explanation.
Otherwise return FALSE and give an explanation why they don't match.
Expected Arguments: {expected['arguments']}
Predicted Arguments: {predicted['arguments']}
"""
resp = llm(model="gpt-4-0125-preview", messages=[dict(role="user", content=prompt)])
if resp.choices[0].message.content.lower().strip() == "true":
return True, None
explanation = resp.choices[0].message.content.lower().strip()
return False, explanation
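As a quick sanity check of the judge itself, you can feed it a pair that should match semantically but not exactly (a hypothetical example, not part of the evaluation set):

```python
ok, explanation = check_tool_call_arguments(
    expected={"name": "get_recipe", "arguments": {"dish_name": "pesto"}},
    predicted={"name": "get_recipe", "arguments": {"dish_name": "basil pesto"}},
)
print(ok, explanation)  # ideally True, None -- though the judge is not deterministic
```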
Okay, let's loop over the questions and use gpt-3.5-turbo-0125 to extract the function name and arguments.
def eval_openai_inference_models(model="gpt-3.5-turbo-0125", base_url=None, api_key=None):
total = 0
total_correct = 0
for question in questions:
resp = llm(
api_key=api_key,
base_url=base_url,
model=model,
tools=tools,
messages=[
dict(role="system", content=f"The date today is {today}"),
dict(role="user", content=question["question"]),
],
)
tool_calls = extract_tool_calls(resp)
if tool_calls is None:
print(f'Model {model} failed to return any tool calls for question {question["question"]}')
total += 1
continue
assert len(tool_calls) == len(question["tool_calls"])
for tool_call, expected_call in zip(tool_calls, question["tool_calls"]):
correct_call, explanation = check_tool_call_arguments(expected_call, tool_call)
if not correct_call:
print(f'QUESTION: {question["question"]}')
print(f'EXPECTED Tool Call: {question["tool_calls"][0]}')
print(f"GENERATED Tool Call: {tool_call}")
print(f"EXPLANATION: {explanation}\n\n")
else:
total_correct += 1
total += 1
return total_correct, total
model = "gpt-3.5-turbo-0125"
total_correct, total = eval_openai_inference_models(model=model, base_url=None, api_key=None)
print(
f'Correctly called the proper functions {total_correct} times out of {total}. But check the "failure" cases above since they may be correct anyway.'
)
model = "gpt-4-0125-preview"
total_correct, total = eval_openai_inference_models(model=model, base_url=None, api_key=None)
print(
f'Correctly called the proper functions {total_correct} times out of {total}. But check the "failure" cases above since they may be correct anyway.'
)
model = "mistralai/Mistral-7B-Instruct-v0.1"
total_correct, total = eval_openai_inference_models(model=model, base_url=TOGETHER_AI_BASE_URL, api_key=TOGETHER_API_KEY)
print(
f'Correctly called the proper functions {total_correct} times out of {total}. But check the "failure" cases above since they may be correct anyway.'
)
model = "mistralai/Mixtral-8x7B-Instruct-v0.1"
total_correct, total = eval_openai_inference_models(model=model, base_url=TOGETHER_AI_BASE_URL, api_key=TOGETHER_API_KEY)
print(
f'Correctly called the proper functions {total_correct} times out of {total}. But check the "failure" cases above since they may be correct anyway.'
)
::: {.callout-warning}
Both models had issues with the pesto question. I wonder if this is something on together.ai's end and how they implemented the function calling feature. IDK!
:::
Now we will repeat the evaluation with NousResearch/Hermes-2-Pro-Mistral-7B.
The function calling format is documented on the model card as well as in this repo. The way we define the tools is the same format as with OpenAI. However, we don't pass in a tools argument; instead, we use a special system prompt which defines the tools.
def extract_tool_calls(tool_calls_str):
    # The model emits one or more blocks of the form:
    #   <tool_call>
    #   {'arguments': {...}, 'name': '...'}
    #   </tool_call>
    # Split on the closing tag, grab the dict on the second line of each block,
    # and parse it with ast.literal_eval (the model uses single quotes, so it
    # isn't valid JSON).
    tool_calls = tool_calls_str.split("</tool_call>\n")
    parsed_results = []
    for tool_call in tool_calls:
        if tool_call:
            dict_str = tool_call.split("\n")[1]
            tool_call_dict = ast.literal_eval(dict_str)
            parsed_results.append({"arguments": tool_call_dict["arguments"], "name": tool_call_dict["name"]})
    return parsed_results
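To see what the parser expects, here is a hypothetical completion string run through it:

```python
sample = (
    "<tool_call>\n"
    "{'arguments': {'dish_name': 'pesto'}, 'name': 'get_recipe'}\n"
    "</tool_call>\n"
)
print(extract_tool_calls(sample))
# [{'arguments': {'dish_name': 'pesto'}, 'name': 'get_recipe'}]
```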
system_prompt = (
f"The date today is {today}\n"
+ """
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
<tools>
"""
+ str(tools)
+ """
</tools> Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']} For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{'arguments': <args-dict>, 'name': <function-name>}
</tool_call>
"""
)
total = 0
total_correct = 0
for question in questions:
resp = llm(
model="tgi",
base_url=HUGGING_FACE_ENDPOINT_URL,
api_key=HUGGING_FACE_ACCESS_TOKEN,
messages=[
dict(role="system", content=system_prompt),
dict(role="user", content=question["question"]),
],
max_tokens=500,
)
tool_calls = extract_tool_calls(resp.choices[0].message.content)
assert len(tool_calls) == len(question["tool_calls"])
for tool_call, expected_call in zip(tool_calls, question["tool_calls"]):
correct_call, explanation = check_tool_call_arguments(expected_call, tool_call)
if not correct_call:
print(f'QUESTION: {question["question"]}')
print(f'EXPECTED Tool Call: {question["tool_calls"][0]}')
print(f"GENERATED Tool Call: {tool_call}")
print(f"EXPLANATION: {explanation}\n\n")
else:
total_correct += 1
total += 1
print(
f'Correctly called the proper functions {total_correct} times out of {total}. But check the "failure" cases above since they may be correct anyway.'
)
Wow, it got all of them correct! It may not get them all right every time, though; run it again to see if any mistakes are made. Sometimes I saw it forget to fill in num_tickets, for example.
Let's look at a single question to see the output from the model.
today
question = "I want to go see Dune 2 on Wednesday night with 5 of my friends. We will be going to the Halifax Bayers Lake Cineplex Theatre. Get tickets for the 7pm show. Thanks!"
resp = llm(
model="tgi",
base_url=HUGGING_FACE_ENDPOINT_URL,
api_key=HUGGING_FACE_ACCESS_TOKEN,
messages=[
dict(role="system", content=system_prompt),
dict(role="user", content=question),
],
)
resp
print(resp.choices[0].message.content)
tool_calls = extract_tool_calls(resp.choices[0].message.content)
tool_calls
The model also supports multiple function calls!
tasks = f"""
Today's date is {today}.
Please complete the following tasks for me:
1. I want to go see Dune 2 on Monday night with 5 of my friends. We will be going to the Halifax Bayers Lake Cineplex Theatre. Get tickets for the 7pm show.
2. Please check the weather for Monday night so I know how to dress.
3. Also please book my plane ticket to Toronto. I will be leaving Tuesday and coming back 2 days later on Thursday. First class please.
4. Send a slack message to the research channel to let them know I will not be there this week in the office.
"""
resp = llm(
model="tgi",
base_url=HUGGING_FACE_ENDPOINT_URL,
api_key=HUGGING_FACE_ACCESS_TOKEN,
messages=[
dict(role="system", content=system_prompt),
dict(role="user", content=tasks),
],
max_tokens=1000,
)
tool_calls = extract_tool_calls(resp.choices[0].message.content)
tool_calls
Impressive!
You can take the arguments, pass them into the actual functions, and give the results back to the model. See the model card or repo for how to do that.
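Here is a rough sketch of what that loop could look like, assuming the <tool_response> message format described on the model card (our functions above all just pass, so the results here would be None; treat this as a shape, not a tested implementation):

```python
# Map tool names to the langchain tool objects defined earlier.
name_to_function = {f.name: f for f in functions}

messages = [
    dict(role="system", content=system_prompt),
    dict(role="user", content=question),
]
resp = llm(model="tgi", base_url=HUGGING_FACE_ENDPOINT_URL, api_key=HUGGING_FACE_ACCESS_TOKEN, messages=messages)
content = resp.choices[0].message.content
messages.append(dict(role="assistant", content=content))

# Execute each requested tool and hand the result back inside <tool_response> tags.
for call in extract_tool_calls(content):
    result = name_to_function[call["name"]].invoke(call["arguments"])
    messages.append(
        dict(
            role="tool",
            content=f'<tool_response>\n{{"name": "{call["name"]}", "content": {result}}}\n</tool_response>',
        )
    )

final = llm(model="tgi", base_url=HUGGING_FACE_ENDPOINT_URL, api_key=HUGGING_FACE_ACCESS_TOKEN, messages=messages)
print(final.choices[0].message.content)
```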
There is JSON Mode support too!
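For JSON mode, the model card describes a dedicated system prompt that pins the output to a schema; paraphrased, it looks something like this (check the card for the exact wording):

```python
# `schema` is a JSON schema for the object you want back,
# e.g. generated from a pydantic model.
schema = '{"title": "Answer", "type": "object", "properties": {"answer": {"type": "string"}}, "required": ["answer"]}'
json_mode_system_prompt = (
    "You are a helpful assistant that answers in JSON. "
    f"Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema>"
)
```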
I'm just getting started playing around with this powerful open-source model. I can't wait to explore it more!