Intro
DSPy kept popping up on my X timeline and looked pretty interesting, so I took a few days to look into it. I haven't gone deep yet, but I think I have a high-level understanding. The library is fairly new (as of writing this), but there is excitement around it and a growing community. I am hopeful that the documentation and library will continue to improve throughout the year. If you are completely new to DSPy, I would suggest the following resources:
Read through the newer documentation here.
Check out the README in the DSPy GitHub repo and the examples there.
Try to code up some simple examples on your own data.
Check out the Discord server.
Skim through or read some of the associated papers (see the paper links on the DSPy repo README).
There are also some decent videos on YouTube. Simply search for DSPy LLM etc.
Follow Omar Khattab.
ENV Setup
python3 -m venv env
source env/bin/activate
pip install dspy-ai
pip install openai --upgrade
pip install --upgrade notebook ipywidgets
import os
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
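Hard-coding the key in a notebook makes it easy to leak by accident. Here is a minimal sketch of an alternative, assuming you would rather export the key in your shell and only get prompted when it is missing. Note that `ensure_api_key` is a hypothetical helper of mine, not part of DSPy or the OpenAI client.

```python
import os
from getpass import getpass


def ensure_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, prompting once if it is missing."""
    key = os.environ.get(var_name)
    if not key:
        # getpass hides the typed value so it does not end up in notebook output.
        key = getpass(f"Enter {var_name}: ")
        os.environ[var_name] = key
    return key
```

Calling `ensure_api_key()` once before creating the language model should be enough, since the OpenAI client reads the same environment variable.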
BIG-Bench Hard Dataset - Penguins In a Table - Example
Within the BIG-Bench Hard dataset [@suzgun2022challenging] there are various tasks. You can pass one of these strings to load_dataset to load the corresponding records for that task.
['tracking_shuffled_objects_seven_objects', 'salient_translation_error_detection', 'tracking_shuffled_objects_three_objects', 'geometric_shapes', 'object_counting', 'word_sorting', 'logical_deduction_five_objects', 'hyperbaton', 'sports_understanding', 'logical_deduction_seven_objects', 'multistep_arithmetic_two', 'ruin_names', 'causal_judgement', 'logical_deduction_three_objects', 'formal_fallacies', 'snarks', 'boolean_expressions', 'reasoning_about_colored_objects', 'dyck_languages', 'navigate', 'disambiguation_qa', 'temporal_sequences', 'web_of_lies', 'tracking_shuffled_objects_five_objects', 'penguins_in_a_table', 'movie_recommendation', 'date_understanding']
We will use the penguins_in_a_table task.
import dspy
from datasets import load_dataset
ds = load_dataset("maveriq/bigbenchhard", "penguins_in_a_table")["train"]
examples = [dspy.Example({"question": r["input"], "answer": r["target"]}).with_inputs("question") for r in ds]
print(f"There are {len(examples)} examples.")
trainset = examples[0:20]
valset = examples[20:]
example = trainset[10]
for k, v in example.items():
    print(f"\n{k.upper()}:\n")
    print(v)
QUESTION:
Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. We then delete the penguin named Bernard from the table.
How many penguins are more than 8 years old?
Options:
(A) 1
(B) 2
(C) 3
(D) 4
(E) 5
ANSWER:
(A)
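The split above simply takes the first 20 records for training and the rest for validation. If the records happen to be ordered in some way, a shuffled split may be safer. This is a minimal sketch, assuming a fixed seed is acceptable for reproducibility; `split_examples` is a hypothetical helper, and the `range(146)` stand-in just mirrors the 20 + 126 records used in this post.

```python
import random


def split_examples(items, n_train=20, seed=0):
    """Shuffle a copy of `items` reproducibly, then split into train/validation."""
    shuffled = list(items)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Stand-in data; with the post's variables this would be split_examples(examples).
train, val = split_examples(range(146))
print(len(train), len(val))  # 20 126
```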
We will use the DSPy OpenAI connector to make calls to gpt-3.5. Note that DSPy caches API calls so that subsequent calls with the same input will read from the cache instead of calling the OpenAI API a second time.
llm = dspy.OpenAI(model="gpt-3.5-turbo-0125", max_tokens=250)
dspy.settings.configure(lm=llm)
We can test that the calls to OpenAI are working:
llm("Testing testing, is anyone out there?")
llm(example.question)
At any point we can look at the last n calls to the llm:
llm.inspect_history(n=2)
Testing testing, is anyone out there?
Hello! I'm here to help. What can I assist you with today?
Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. We then delete the penguin named Bernard from the table.
How many penguins are more than 8 years old?
Options:
(A) 1
(B) 2
(C) 3
(D) 4
(E) 5
There are 2 penguins who are more than 8 years old: Vincent (9 years old) and Gwen (8 years old).
Therefore, the answer is (B) 2.
Our evaluation metric will check if the llm output contains the correct multiple choice answer. To define an evaluation metric in DSPy we create a function like the example below. The first two inputs should be instances of dspy.Example. The metric function can contain any logic you need to evaluate your task. You can read more about the trace argument in the documentation. It needs to be there, even if you do not use it explicitly.
import re
def eval_metric(true, prediction, trace=None):
    pred = prediction.answer
    # Find every "(X)" style option and keep the last one mentioned.
    matches = re.findall(r"\([A-Z]\)", pred)
    parsed_answer = matches[-1] if matches else ""
    return parsed_answer == true.answer
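To see how this metric treats different kinds of model output, here is a small standalone check. It repeats the same regex logic, with the parsing pulled out into a hypothetical `parse_choice` helper, and uses `SimpleNamespace` objects as stand-ins for the DSPy example and prediction so it runs without any API calls.

```python
import re
from types import SimpleNamespace


def parse_choice(text: str) -> str:
    """Return the last '(X)' style option found in text, or '' if none."""
    matches = re.findall(r"\([A-Z]\)", text)
    return matches[-1] if matches else ""


def eval_metric(true, prediction, trace=None):
    return parse_choice(prediction.answer) == true.answer


gold = SimpleNamespace(answer="(A)")  # stand-in for a dspy.Example
for raw in ["(A) 1", "Answer: (B) 2", "Not (B), so (A)", "Vincent"]:
    pred = SimpleNamespace(answer=raw)  # stand-in for a prediction
    print(repr(raw), "->", repr(parse_choice(raw)), eval_metric(gold, pred))
```

Two things are worth noticing: when several options appear, the last one wins, and an output with no bracketed letter at all (like a bare name or number) is always scored as wrong.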
We set up an evaluation pipeline:
from dspy.evaluate import Evaluate
evaluate = Evaluate(devset=valset, metric=eval_metric, num_threads=6, display_progress=True, display_table=10)
Here is a simple module in DSPy for basic question and answer.
class BasicQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.prog = dspy.Predict("question -> answer")

    def forward(self, question):
        return self.prog(question=question)
basic_qa = BasicQA()
Calling the module invokes __call__, which in turn runs the forward method, similar to how things work in PyTorch.
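If that dispatch pattern is new to you, here is a minimal sketch of it in plain Python: calling the instance triggers `__call__`, which delegates to `forward`. `TinyModule` and `Shout` are toy classes for illustration only. The real `dspy.Module` (like `torch.nn.Module`) does more bookkeeping around the call.

```python
class TinyModule:
    """Toy base class: calling an instance forwards the arguments to forward()."""

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)


class Shout(TinyModule):
    def forward(self, question):
        return question.upper()


shout = Shout()
print(shout(question="how many penguins?"))  # HOW MANY PENGUINS?
```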
pred = basic_qa(question=example.question)
print("\nQUESTION:\n")
print(example.question)
print("\nANSWER:\n")
print(example.answer)
print("\nPREDICTION:\n")
print(pred.answer)
QUESTION:
Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. We then delete the penguin named Bernard from the table.
How many penguins are more than 8 years old?
Options:
(A) 1
(B) 2
(C) 3
(D) 4
(E) 5
ANSWER:
(A)
PREDICTION:
(B) 2
eval_metric(example, pred)
llm.inspect_history(n=1)
Given the fields `question`, produce the fields `answer`.
---
Follow the following format.
Question: ${question}
Answer: ${answer}
---
Question: Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. We then delete the penguin named Bernard from the table. How many penguins are more than 8 years old? Options: (A) 1 (B) 2 (C) 3 (D) 4 (E) 5
Answer: (B) 2
Now we can pass each question in the validation set through the LLM and check whether we get the correct answer:
evaluate(basic_qa)
Average Metric: 44 / 126 (34.9): 100%|██████████| 126/126 [00:00<00:00, 1308.82it/s]
Average Metric: 44 / 126 (34.9%)
| | question | example_answer | pred_answer | eval_metric |
|---|---|---|---|---|
| 0 | Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis,... | (A) | 3 | False |
| 1 | Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis,... | (D) | (C) 50 | False |
| 2 | Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis,... | (A) | Answer: (C) 3 | False |
| 3 | Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis,... | (A) | Answer: (B) 2 | False |
| 4 | Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis,... | (B) | (B) 5 | ✔️ [True] |
| 5 | Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis,... | (C) | (B) 2 | False |
| 6 | Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis,... | (E) | James | False |
| 7 | Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis,... | (A) | (B) 2 | False |
| 8 | Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis,... | (C) | Answer: Vincent | False |
| 9 | Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis,... | (D) | Answer: Donna | False |
... 116 more rows not displayed ...
34.92
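Conceptually, the number reported above is just the share of validation questions the metric marks as correct. Here is a rough sketch of that calculation with toy stand-ins: `average_metric`, `always_a`, and `exact_match` are hypothetical names, and the real `Evaluate` class also handles threading, progress display, and the results table.

```python
def average_metric(program, devset, metric):
    """Run program on every (question, gold) pair; return (num_correct, percent)."""
    correct = sum(bool(metric(gold, program(question))) for question, gold in devset)
    return correct, round(100 * correct / len(devset), 2)


# Toy stand-ins: a "program" that always answers "(A)" and exact-match scoring.
toy_devset = [("q1", "(A)"), ("q2", "(B)"), ("q3", "(A)"), ("q4", "(C)")]
always_a = lambda question: "(A)"
exact_match = lambda gold, pred: gold == pred

print(average_metric(always_a, toy_devset, exact_match))  # (2, 50.0)
print(round(100 * 44 / 126, 2))  # 34.92, the 44 / 126 result reported above
```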
DSPy uses optimizers to optimize the modules. In this example, optimization is a process that chooses which demos/examples are best to put into the prompt in order to increase the evaluation metric. At the time of writing, the optimizers are called teleprompters (prompting from a distance). I think they will be renamed to optimizers in a future refactoring. The DSPy documentation states that an optimizer can adjust/edit:
Demo examples in the prompt.
Instructions of the prompt.
Weights of the actual LLM (for example fine tuning an open source model).
I have only played around with optimizers that optimize which demos/examples are put into the prompt.
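To make the idea of "choosing which demos go into the prompt" concrete, here is a deliberately simplified sketch of random search over demo subsets. It is not DSPy's implementation. `random_search_demos` is a hypothetical function, and `score_fn` is a stand-in for "build a prompt with these demos and run the evaluation metric on a validation set". The DSPy optimizers do considerably more than this.

```python
import random


def random_search_demos(candidates, score_fn, k=2, num_trials=10, seed=0):
    """Try random k-sized demo subsets and keep the best-scoring one."""
    rng = random.Random(seed)
    best_demos, best_score = [], score_fn([])  # baseline: no demos (zero-shot)
    for _ in range(num_trials):
        demos = rng.sample(candidates, k)
        score = score_fn(demos)
        if score > best_score:
            best_demos, best_score = demos, score
    return best_demos, best_score


# Toy scoring function (hypothetical): pretend demos "d1" and "d3" are the helpful ones.
def toy_score(demos):
    return sum(d in {"d1", "d3"} for d in demos) / 2


best, score = random_search_demos(["d0", "d1", "d2", "d3", "d4"], toy_score)
print(best, score)
```

The important design point is that the search is driven entirely by the metric: nothing in the loop needs to know why a particular set of demos helps.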
from dspy.teleprompt import BootstrapFewShotWithRandomSearch
config = dict(max_bootstrapped_demos=2, max_labeled_demos=4, num_candidate_programs=2, num_threads=6)
teleprompter = BootstrapFewShotWithRandomSearch(metric=eval_metric, **config)
optimized_qa = teleprompter.compile(basic_qa, trainset=trainset, valset=valset)
There is a lot of output from the above code block which I am hiding to keep things cleaner.
You can now evaluate the optimized model to see if the accuracy has improved.
evaluate(optimized_qa)
llm.inspect_history()
Given the fields `question`, produce the fields `answer`.
---
Follow the following format.
Question: ${question}
Answer: ${answer}
---
Question: Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. And here is a similar table, but listing giraffes: name, age, height (cm), weight (kg) Jody, 5, 430, 620 Gladys, 10, 420, 590 Marian, 2, 310, 410 Donna, 9, 440, 650 How many giraffes are more than 5 years old? Options: (A) 1 (B) 2 (C) 3 (D) 4 (E) 5
Answer: (B)
---
Question: Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. What is the name of the last penguin sorted by alphabetic order? Options: (A) Louis (B) Bernard (C) Vincent (D) Gwen (E) James
Answer: (C)
---
Question: Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. We now add a penguin to the table: James, 12, 90, 12 We then delete the penguin named Bernard from the table. How many penguins are more than 5 years old and weight more than 12 kg? Options: (A) 1 (B) 2 (C) 3 (D) 4 (E) 5
Answer: (A)
---
Question: Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. How many animals are listed in the table? Options: (A) 1 (B) 2 (C) 3 (D) 4 (E) 5
Answer: (D)
---
Question: Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. Which is the second heaviest penguin? Options: (A) Louis (B) Bernard (C) Vincent (D) Gwen (E) James
Answer: (B) Bernard
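In the history above, the earlier question/answer blocks appear to be the demos placed into the prompt, and the final question is the one actually being answered. As a rough sketch of that layout (my own reconstruction from the printed output, not DSPy's template code, with the exact whitespace being a guess), a prompt like this could be assembled as follows. `build_few_shot_prompt` is a hypothetical helper.

```python
def build_few_shot_prompt(demos, question):
    """Assemble a prompt in the layout shown above: header, demos, then the query."""
    header = (
        "Given the fields `question`, produce the fields `answer`.\n\n---\n\n"
        "Follow the following format.\n\n"
        "Question: ${question}\nAnswer: ${answer}"
    )
    blocks = [header]
    for demo_question, demo_answer in demos:
        blocks.append(f"Question: {demo_question}\nAnswer: {demo_answer}")
    # The final block leaves the answer open for the model to complete.
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n---\n\n".join(blocks)


prompt = build_few_shot_prompt(
    demos=[("How many animals are listed in the table?", "(D)")],
    question="Which is the second heaviest penguin?",
)
print(prompt)
```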
Now we can try a Chain of Thought [@wei2023chainofthought] prompt.
class CoT(dspy.Module):
    def __init__(self):
        super().__init__()
        self.prog = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.prog(question=question)
cot_qa = CoT()
evaluate(cot_qa)
llm.inspect_history()
Given the fields `question`, produce the fields `answer`.
---
Follow the following format.
Question: ${question}
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: ${answer}
---
Question: Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. We now add a penguin to the table: James, 12, 90, 12 Which penguin is taller than the other ones? Options: (A) Louis (B) Bernard (C) Vincent (D) Gwen (E) James
Reasoning: Let's think step by step in order to produce the answer. We need to compare the height of each penguin in the table and determine which one is the tallest. Louis is 50 cm tall, Bernard is 80 cm tall, Vincent is 60 cm tall, Gwen is 70 cm tall, and James is 90 cm tall. Therefore, James is taller than all the other penguins.
Answer: (E) James
Now we will try to optimize our chain of thought program. I am also hiding the output from this cell to keep things cleaner.
from tqdm import tqdm
tqdm._instances.clear()  # reset any progress bars left over from earlier runs
config = dict(max_bootstrapped_demos=1, max_labeled_demos=4, num_candidate_programs=4, num_threads=6)
teleprompter = BootstrapFewShotWithRandomSearch(metric=eval_metric, **config)
optimized_cot_qa = teleprompter.compile(cot_qa, trainset=trainset, valset=valset)
evaluate(optimized_cot_qa)
llm.inspect_history(n=1)
Given the fields `question`, produce the fields `answer`.
---
Follow the following format.
Question: ${question}
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: ${answer}
---
Question: Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. We then delete the penguin named Bernard from the table. How many penguins are more than 8 years old? Options: (A) 1 (B) 2 (C) 3 (D) 4 (E) 5
Reasoning: Let's think step by step in order to produce the answer. We know that after deleting Bernard, the penguins left are Louis, Vincent, and Gwen. Among them, only Vincent is more than 8 years old.
Answer: (A) 1
---
Question: Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. How many penguins are more than 5 years old? Options: (A) 1 (B) 2 (C) 3 (D) 4 (E) 5
Answer: (C)
---
Question: Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. And here is a similar table, but listing giraffes: name, age, height (cm), weight (kg) Jody, 5, 430, 620 Gladys, 10, 420, 590 Marian, 2, 310, 410 Donna, 9, 440, 650 How many animals are more than 5 years old? Options: (A) 5 (B) 6 (C) 7 (D) 8 (E) 9
Answer: (A)
---
Question: Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. Which penguin is older than Gwen? Options: (A) Louis (B) Bernard (C) Vincent (D) Gwen (E) James
Answer: (C)
---
Question: Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. We then delete the penguin named Bernard from the table. What is the name of the last penguin sorted by alphabetic order? Options: (A) Louis (B) Bernard (C) Vincent (D) Gwen (E) James
Reasoning: Let's think step by step in order to produce the answer. After deleting Bernard, the remaining penguins are Louis, Vincent, and Gwen. Sorting them alphabetically, the last penguin is Vincent.
Answer: (C) Vincent
It's really nice that the above focused on:
Writing small modules/programs.
Choosing an optimizer.
Running the compile/optimization step.
Running an evaluation.
I really like this approach compared with manually writing prompts and hoping for the best.