Start work on calculating % of valid URLs
+20
-1
@@ -1,3 +1,22 @@
## Refining the agent output
TODO: Table and document experiments

Experiments modifying the pipeline:

| Variant          | % Correct | % Change vs. baseline |
|------------------|----------:|----------------------:|
| Baseline         |        33 |                     0 |
| Improved Prompt  |     39.96 |                    21 |
| Add Examples     |     44.67 |                    35 |
| Date             |     45.51 |                    38 |
| Chain of Thought |     43.38 |                    31 |
| Self-Critique    |     44.36 |                    34 |

Experiments with different model types:

| Model                         | % Correct | % Change vs. baseline |
|-------------------------------|----------:|----------------------:|
| gpt-5-mini                    |        33 |                     0 |
| gpt-5.4-mini                  |      32.4 |                    -2 |
| llama3.1:8b-instruct-q4_K_M   |         ? |                     ? |
| qwen3.5:9b                    |         0 |                  -100 |

Percentage of correct URLs
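
A minimal sketch of the arithmetic behind the two columns in the tables above, assuming "% Correct" is the share of agent outputs whose URL was judged correct and "% Change" is the relative change against the baseline score of 33. The function names `percent_correct` and `percent_change` are hypothetical and are not taken from this repository; how a URL is judged correct is left out because the commit does not specify it.

```python
def percent_correct(results):
    """Return the share of True entries in `results` as a percentage.

    `results` is assumed to be a list of booleans, one per evaluated URL.
    """
    if not results:
        raise ValueError("no results to score")
    return 100.0 * sum(results) / len(results)


def percent_change(value, baseline):
    """Return the relative change of `value` against `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (value - baseline) / baseline


if __name__ == "__main__":
    baseline = 33.0  # baseline "% Correct" from the tables
    scores = {"Improv Prompt": 39.96, "Add Examples": 44.67, "qwen3.5:9b": 0.0}
    for name, score in scores.items():
        # Print the signed relative change, rounded to a whole percent.
        print(f"{name}: {percent_change(score, baseline):+.0f}%")
```

Reporting the change as a percentage of the baseline keeps both tables on one scale, so a fractional value such as 0.21 and a percentage such as -100 do not end up in the same column.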