3 Commits

Author SHA1 Message Date
William Jeynes d21a8b537e Add new accuracy results 2026-04-05 11:50:53 +01:00
William Jeynes 42cf4da794 Why no tool use? 2026-04-04 23:47:21 +01:00
William Jeynes f303ca9ea4 Switch to 4o mini 2026-04-04 23:11:39 +01:00
4 changed files with 18 additions and 6 deletions
+7 -4
@@ -14,10 +14,13 @@ Experiments modifying pipeline
Experiments with different model types:
| Model | % Correct | % Change |
|-------------------------------|----------:|---------:|
-| gpt-5-mini | 33 | 0 |
-| gpt-5.4-mini | 32.4 | -0.02 |
-| llama3.1:8b-instruct-q4_K_M | ? | ? |
-| qwen3.5:9b | 0 | -100 |
+| gpt-5-mini | 45.51 | |
+| gpt-5.4-mini | 32.4 | |
+| gpt-5.4-nano | 23.28 | |
+| gpt-4.1-mini | 27.85 | |
+| gpt-4o-mini | 32.47 | |
+| llama3.1:8b-instruct-q4_K_M | ? | |
+| qwen3.5:9b | 0 | |
Percentage of valid URLs
| Model | Number | Percentage |
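The new rows leave the % Change column empty. A minimal sketch for filling it, assuming % Change means the relative change against the gpt-5-mini baseline (the top row) rather than the difference in percentage points; `ModelResult` and `withChanges` are hypothetical names, not part of the repository:

```typescript
// Sketch: fill the "% Change" column relative to a baseline model.
// Assumption: "% Change" is the relative change against gpt-5-mini,
// not the absolute difference in percentage points.
interface ModelResult {
  model: string;
  percentCorrect: number | null; // null where the table shows "?"
}

function percentChange(baseline: number, value: number): number {
  // Relative change in percent, rounded to two decimal places.
  return Math.round(((value - baseline) / baseline) * 100 * 100) / 100;
}

function withChanges(
  results: ModelResult[],
  baselineModel: string,
): { model: string; percentChange: number | null }[] {
  const baseline = results.find((r) => r.model === baselineModel);
  if (!baseline || baseline.percentCorrect === null) {
    throw new Error(`No usable baseline for ${baselineModel}`);
  }
  const base = baseline.percentCorrect;
  return results.map((r) => ({
    model: r.model,
    // Unknown scores stay unknown instead of being guessed.
    percentChange:
      r.percentCorrect === null ? null : percentChange(base, r.percentCorrect),
  }));
}

// Values copied from the updated table above.
const results: ModelResult[] = [
  { model: "gpt-5-mini", percentCorrect: 45.51 },
  { model: "gpt-5.4-mini", percentCorrect: 32.4 },
  { model: "gpt-5.4-nano", percentCorrect: 23.28 },
  { model: "gpt-4.1-mini", percentCorrect: 27.85 },
  { model: "gpt-4o-mini", percentCorrect: 32.47 },
  { model: "llama3.1:8b-instruct-q4_K_M", percentCorrect: null },
  { model: "qwen3.5:9b", percentCorrect: 0 },
];

for (const row of withChanges(results, "gpt-5-mini")) {
  console.log(row.model, row.percentChange);
}
```

Rows marked `?` are carried through as `null`, so the column can be regenerated once the llama run completes.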
+1 -1
@@ -9,7 +9,7 @@ export function createModelNode(tools: any, promptPath: string): GraphNode<typeo
const sysPrompt = await hydratePrompt(promptPath, state);
const model = new ChatOpenAI({
-model: "gpt-5-mini"
+model: "gpt-4o-mini"
});
const modelWithTools = model.bindTools(Object.values(tools));
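The commit swaps models by editing the string literal passed to `ChatOpenAI`. Since the accuracy table compares several models, a minimal sketch of reading the name from configuration instead, assuming a hypothetical `MODEL_NAME` environment variable (not part of this repository) with `gpt-4o-mini` as the fallback:

```typescript
// Sketch: choose the chat model from configuration instead of editing source.
// Assumption: MODEL_NAME is a hypothetical environment variable; it is not
// defined anywhere in the original repository.
const DEFAULT_MODEL = "gpt-4o-mini";

function resolveModelName(env: Record<string, string | undefined>): string {
  const fromEnv = env["MODEL_NAME"]?.trim();
  // Fall back to the default when the variable is unset or blank.
  return fromEnv ? fromEnv : DEFAULT_MODEL;
}

// Inside createModelNode this would replace the hard-coded literal:
//   const model = new ChatOpenAI({ model: resolveModelName(process.env) });
console.log(resolveModelName({}));
console.log(resolveModelName({ MODEL_NAME: "gpt-5-mini" }));
```

Each row of the table could then be produced by rerunning the pipeline with a different `MODEL_NAME`, with no code change between runs.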
+9
@@ -8,6 +8,10 @@ Produce up-to 5 specific "trigger events" that happened that could have led to t
Remember the time frame of the disinformation campaign: ###CDATE###
Include no information or events that would not have been available at the time.
You NEED TO use the tools available to you to produce up-to-date URLs and search queries, or else you will be wrong and the analysis invalid.
You NEED TO use the web search and open URL tools to ensure page validity, or else all work up to this point will have to be discarded.
Produce no text other than the JSON.
Include a concise but specific search query that can be looked up on a search engine in order to allow for the verification.
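The prompt asks for at most five events, JSON only, each with a search query. A minimal sketch of checking a response against those rules before scoring, assuming the output is a JSON array of objects with a hypothetical `searchQuery` field (the real schema is not shown in this diff):

```typescript
// Sketch: check a model response against the prompt's output rules.
// Assumption: the response is a JSON array of event objects with a
// hypothetical "searchQuery" string field; the real schema is not in this diff.
interface TriggerEvent {
  searchQuery: string;
  [key: string]: unknown;
}

type ParseResult =
  | { ok: true; events: TriggerEvent[] }
  | { ok: false; error: string };

const MAX_EVENTS = 5; // "Produce up-to 5 specific trigger events"

function parseTriggerEvents(raw: string): ParseResult {
  let data: unknown;
  try {
    // "Produce no text other than the JSON": any surrounding prose fails here.
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "response is not pure JSON" };
  }
  if (!Array.isArray(data)) {
    return { ok: false, error: "expected a JSON array of events" };
  }
  if (data.length > MAX_EVENTS) {
    return { ok: false, error: `expected at most ${MAX_EVENTS} events, got ${data.length}` };
  }
  for (let i = 0; i < data.length; i++) {
    const item: unknown = data[i];
    const query =
      typeof item === "object" && item !== null
        ? (item as Record<string, unknown>)["searchQuery"]
        : undefined;
    // Every event needs a non-empty query so it can be verified later.
    if (typeof query !== "string" || query.trim() === "") {
      return { ok: false, error: `event ${i} has no search query` };
    }
  }
  return { ok: true, events: data as TriggerEvent[] };
}

console.log(parseTriggerEvents('[{"searchQuery": "example query"}]'));
console.log(parseTriggerEvents("Here is the JSON: []"));
```

Rejecting malformed output up front keeps a formatting failure from being counted as an incorrect answer in the accuracy results.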
@@ -26,4 +30,9 @@ Events will be reordered as part of processing, each statement must stand alone
The preceding messages act as examples of previous responses to potentially fictional events and the scores they were given.
Analysis should only be completed for proposed events that would garner >0.7 points.
This pipeline is running well past your knowledge cutoff.
Any URLs will have changed significantly over time.
You NEED TO use the tools available to you to produce up-to-date URLs and search queries, or else you will be wrong and the analysis invalid.
You NEED TO use the web search and open URL tools to ensure page validity, or else all work up to this point will have to be discarded.
Let's go through it step by step.
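The commit tries to force tool use by repeating the instructions in the prompt. A minimal sketch for checking whether that worked, assuming the run's messages carry a `tool_calls` array shaped like LangChain's `AIMessage.tool_calls` (`name` plus `args`); `MessageLike`, `countToolCalls`, `missingTools`, and the tool names `web_search` and `open_url` are hypothetical:

```typescript
// Sketch: summarise which tools the model actually called during a run.
// Assumption: messages follow the shape of LangChain's AIMessage, where
// tool_calls is an optional array of { name, args }. MessageLike is a
// hypothetical local type so the sketch runs without dependencies.
interface ToolCallLike {
  name: string;
  args: Record<string, unknown>;
}

interface MessageLike {
  tool_calls?: ToolCallLike[];
}

function countToolCalls(messages: MessageLike[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const message of messages) {
    // Messages without tool calls (e.g. the final JSON answer) are skipped.
    for (const call of message.tool_calls ?? []) {
      counts.set(call.name, (counts.get(call.name) ?? 0) + 1);
    }
  }
  return counts;
}

function missingTools(messages: MessageLike[], required: string[]): string[] {
  const counts = countToolCalls(messages);
  return required.filter((toolName) => !counts.has(toolName));
}

// Hypothetical tool names; substitute the names of the tools passed to bindTools.
const run: MessageLike[] = [
  { tool_calls: [{ name: "web_search", args: { query: "example" } }] },
  {},
];
console.log(missingTools(run, ["web_search", "open_url"]));
```

An empty result from `missingTools` only shows that each required tool was called at least once; it does not confirm that the URLs in the final answer are the pages that were opened.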
+1 -1
@@ -5,7 +5,7 @@ set -e
run_agent () {
echo "Starting LangGraph agent..."
cd agent
-npx @langchain/langgraph-cli dev
+npx @langchain/langgraph-cli dev --host 127.0.0.1
}
run_ensemble_service () {