Update README.md
README.md
CHANGED
|
@@ -116,6 +116,7 @@ The `AutonomousWebAgent` is a sophisticated, multi-component search and retrieva
|
|
| 116 |
- `ToTNode` and `ToTSearch` classes enable the agent to generate thoughts, evaluate them, and navigate through them as a tree, considering various potential paths to best answer the query.
|
| 117 |
- It combines MCTS and RAG to synthesize responses based on the generated thought paths.
|
| 118 |
|
|
|
|
| 119 |
### Training Process
|
| 120 |
|
| 121 |
The training process for the agent involves episodic learning, where it interacts with various queries from a predefined list. Each query initiates an episode, and the agent performs actions based on its learned policy:
|
|
@@ -292,6 +293,183 @@ After each epoch, the model is evaluated on the validation set, computing the av
|
|
| 292 |
### Checkpoints
|
| 293 |
At the end of each epoch, the model saves checkpoints of all components, enabling easy resumption or further fine-tuning as needed.
|
| 294 |
|
| 295 |
|
| 296 |
## Requirements
|
| 297 |
|
|
|
|
| 116 |
- `ToTNode` and `ToTSearch` classes enable the agent to generate thoughts, evaluate them, and navigate through them as a tree, considering various potential paths to best answer the query.
|
| 117 |
- It combines MCTS and RAG to synthesize responses based on the generated thought paths.
|
| 118 |
|
| 119 |
+
|
| 120 |
### Training Process
|
| 121 |
|
| 122 |
The training process for the agent involves episodic learning, where it interacts with various queries from a predefined list. Each query initiates an episode, and the agent performs actions based on its learned policy:
|
|
|
|
| 293 |
### Checkpoints
|
| 294 |
At the end of each epoch, the model saves checkpoints of all components, enabling easy resumption or further fine-tuning as needed.
|
| 295 |
|
| 296 |
+
## Inference Details
|
| 297 |
+
|
| 298 |
+
1. Input Processing:
|
| 299 |
+
- The function takes a query (text input), world model components, a root thought node, and a tokenizer.
|
| 300 |
+
- The query is tokenized and encoded using the provided tokenizer.
|
| 301 |
+
|
| 302 |
+
2. Inference Modes:
|
| 303 |
+
The function supports three inference modes:
|
| 304 |
+
|
| 305 |
+
a. 'without_world_model':
|
| 306 |
+
- This mode directly uses the transformer model to generate text.
|
| 307 |
+
- It doesn't utilize the world model components or the Tree of Thought.
|
| 308 |
+
- The transformer generates text autoregressively up to the specified max length.
|
| 309 |
+
|
| 310 |
+
b. 'world_model':
|
| 311 |
+
- This mode uses the world model components but doesn't use the Tree of Thought.
|
| 312 |
+
- It generates actions based on the prediction network's output.
|
| 313 |
+
|
| 314 |
+
c. 'world_model_tree_of_thought':
|
| 315 |
+
- This is the most comprehensive mode, using both the world model and the Tree of Thought.
|
| 316 |
+
|
| 317 |
+
3. World Model Inference Process:
|
| 318 |
+
For the 'world_model' and 'world_model_tree_of_thought' modes:
|
| 319 |
+
|
| 320 |
+
a. Initial State:
|
| 321 |
+
- The query is passed through the transformer model.
|
| 322 |
+
- The representation network creates an initial state representation from the transformer output.
|
| 323 |
+
|
| 324 |
+
b. Action Selection:
|
| 325 |
+
- For 'world_model':
|
| 326 |
+
- The prediction network generates policy logits from the state representation.
|
| 327 |
+
- Actions are selected based on the highest probabilities in the policy.
|
| 328 |
+
|
| 329 |
+
- For 'world_model_tree_of_thought':
|
| 330 |
+
- It uses Monte Carlo Tree Search (MCTS) to explore the Tree of Thought.
|
| 331 |
+
- For each MCTS iteration:
|
| 332 |
+
* Selection: Traverse the tree to find a leaf node.
|
| 333 |
+
* Expansion: Add child nodes to the leaf.
|
| 334 |
+
* Evaluation: Use the prediction network to estimate the value of the node.
|
| 335 |
+
* Backpropagation: Update the values and visit counts of nodes.
|
| 336 |
+
- The best action is chosen based on visit counts after MCTS.
|
| 337 |
+
|
| 338 |
+
c. State Transition:
|
| 339 |
+
- The selected action is applied to the current state using the dynamics network.
|
| 340 |
+
- This creates a new state representation for the next step.
|
| 341 |
+
|
| 342 |
+
d. Sequence Generation:
|
| 343 |
+
- The process repeats for the specified number of steps or until a termination condition is met.
|
| 344 |
+
- For the Tree of Thought approach, it continues until reaching a leaf node in the thought tree.
|
| 345 |
+
|
| 346 |
+
4. Output:
|
| 347 |
+
- For 'without_world_model', it returns the generated text.
|
| 348 |
+
- For 'world_model' and 'world_model_tree_of_thought', it returns a sequence of selected actions (thoughts).
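As a rough illustration of steps 3a to 3c in the 'world_model' mode, the sketch below runs a greedy rollout in plain Python. It is not the repository's implementation: `greedy_rollout`, `toy_prediction`, and `toy_dynamics` are hypothetical names, the two toy functions stand in for the prediction and dynamics networks, and `initial_state` stands in for the representation network's output.

```python
import math
from typing import Callable, List, Sequence, Tuple

State = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn policy logits into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_rollout(
    initial_state: State,
    prediction: Callable[[State], Tuple[List[float], float]],
    dynamics: Callable[[State, int], State],
    num_steps: int,
) -> List[int]:
    """Pick the most probable action at each step, then advance the state."""
    state = initial_state
    actions: List[int] = []
    for _ in range(num_steps):
        policy_logits, _value = prediction(state)   # prediction step
        probs = softmax(policy_logits)
        action = max(range(len(probs)), key=probs.__getitem__)
        actions.append(action)
        state = dynamics(state, action)             # state transition
    return actions


# Hypothetical stand-ins for the learned networks (assumptions, not real models).
def toy_prediction(state: State) -> Tuple[List[float], float]:
    # Use the state entries as logits and their mean as the value estimate.
    return list(state), sum(state) / len(state)


def toy_dynamics(state: State, action: int) -> State:
    # Lower the chosen entry so that another action can win on a later step.
    next_state = list(state)
    next_state[action] -= 1.0
    return next_state


print(greedy_rollout([0.5, 2.0, 1.2], toy_prediction, toy_dynamics, num_steps=3))
```

With these toy functions the rollout prints `[1, 2, 1]`. A real run would return indices into whatever action (thought) vocabulary the agent was trained with.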
|
| 349 |
+
|
| 350 |
+
World model inference uses the learned representations and dynamics to navigate the problem-solving process. The Tree of Thought approach adds structure by guiding the model through a predefined hierarchy of problem-solving steps, which may make it more effective on complex problem-solving tasks.
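The four MCTS phases listed for 'world_model_tree_of_thought' (selection, expansion, evaluation, backpropagation) can be sketched over a small predefined thought hierarchy as follows. This is a minimal sketch under stated assumptions: `ThoughtNode`, `ucb_score`, `mcts`, and `toy_value` are hypothetical names, UCB1 is assumed as the selection rule (the text does not say which rule is used), and `toy_value` is a lookup table standing in for the prediction network's value estimate.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(eq=False)
class ThoughtNode:
    """Hypothetical search node; the repository's own classes may differ."""
    name: str
    parent: Optional["ThoughtNode"] = field(default=None, repr=False)
    children: List["ThoughtNode"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0


def ucb_score(child: ThoughtNode, parent_visits: int, c: float = 1.4) -> float:
    """UCB1 (an assumption): mean value plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # try unvisited children first
    mean = child.value_sum / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts(
    root: ThoughtNode,
    tree: Dict[str, List[str]],
    evaluate: Callable[[str], float],
    iterations: int,
) -> ThoughtNode:
    """Run MCTS (iterations >= 1) and return the most-visited child of root."""
    for _ in range(iterations):
        node = root
        # Selection: descend until a node with no expanded children is reached.
        while node.children:
            parent_visits = node.visits
            node = max(node.children, key=lambda ch: ucb_score(ch, parent_visits))
        # Expansion: attach the children defined by the thought hierarchy.
        for child_name in tree.get(node.name, []):
            node.children.append(ThoughtNode(child_name, parent=node))
        # Evaluation: estimate the value of the selected node.
        value = evaluate(node.name)
        # Backpropagation: update visit counts and values up to the root.
        current: Optional[ThoughtNode] = node
        while current is not None:
            current.visits += 1
            current.value_sum += value
            current = current.parent
    return max(root.children, key=lambda ch: ch.visits)


# A fragment of a thought hierarchy plus made-up values (illustrative only).
tree = {
    "Problem-Solving Process": [
        "Problem Identification", "Problem Analysis", "Solution Generation",
    ],
    "Problem Analysis": ["Root Cause Analysis", "Data Collection"],
}
toy_values = {"Problem Analysis": 0.9, "Root Cause Analysis": 0.8, "Data Collection": 0.7}


def toy_value(name: str) -> float:
    return toy_values.get(name, 0.1)


root = ThoughtNode("Problem-Solving Process")
best = mcts(root, tree, toy_value, iterations=60)
print(best.name)
```

Choosing the final action by visit count rather than by mean value matches the description above. Because the toy values favour one branch, the search concentrates its visits there.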
|
| 351 |
+
|
| 352 |
+
Here I am utilising Trees of Thought as a way to organise sets of policies and sequences of actions. These tree structures give the World Model a general thought pattern, much as humans develop thought patterns for solving certain kinds of problems (e.g. understand, describe, analyse).
|
| 353 |
+
|
| 354 |
+
Here are some example Trees of Thought:
|
| 355 |
+
graph TD
|
| 356 |
+
A[Problem-Solving Process] --> B[Problem Identification]
|
| 357 |
+
A --> C[Problem Analysis]
|
| 358 |
+
A --> D[Solution Generation]
|
| 359 |
+
A --> E[Implementation]
|
| 360 |
+
A --> F[Evaluation and Adjustment]
|
| 361 |
+
B --> B1[Define the Problem]
|
| 362 |
+
B --> B2[Identify Stakeholders]
|
| 363 |
+
B --> B3[Determine Constraints]
|
| 364 |
+
B --> B4[Recognize Problem Type]
|
| 365 |
+
B --> B5[Historical Context]
|
| 366 |
+
C --> C1[Root Cause Analysis]
|
| 367 |
+
C --> C2[System Mapping]
|
| 368 |
+
C --> C3[Data Collection]
|
| 369 |
+
C --> C4[Impact Assessment]
|
| 370 |
+
C --> C5[Theoretical Framework]
|
| 371 |
+
D --> D1[Creative Problem Solving]
|
| 372 |
+
D --> D2[Analytical Approach]
|
| 373 |
+
D --> D3[Mathematical Computation]
|
| 374 |
+
D --> D4[Decision Making]
|
| 375 |
+
E --> E1[Action Planning]
|
| 376 |
+
E --> E2[Resource Allocation]
|
| 377 |
+
E --> E3[Change Management]
|
| 378 |
+
F --> F1[Verification]
|
| 379 |
+
F --> F2[Performance Metrics]
|
| 380 |
+
F --> F3[Feedback Loops]
|
| 381 |
+
F --> F4[Continuous Improvement]
|
| 382 |
+
C3 --> C3a[Quantitative Data]
|
| 383 |
+
C3 --> C3b[Qualitative Data]
|
| 384 |
+
C3 --> C3c[Data Validation]
|
| 385 |
+
D1 --> D1a[Divergent Thinking]
|
| 386 |
+
D1 --> D1b[Convergent Thinking]
|
| 387 |
+
D1 --> D1c[Lateral Thinking]
|
| 388 |
+
D2 --> D2a[Logical Reasoning]
|
| 389 |
+
D2 --> D2b[Critical Analysis]
|
| 390 |
+
D2 --> D2c[Systems Thinking]
|
| 391 |
+
D3 --> D3a[Basic Operations]
|
| 392 |
+
D3 --> D3b[Advanced Operations]
|
| 393 |
+
D3 --> D3c[Computational Methods]
|
| 394 |
+
D4 --> D4a[Decision Trees]
|
| 395 |
+
D4 --> D4b[Multi-Criteria Analysis]
|
| 396 |
+
D4 --> D4c[Probabilistic Reasoning]
|
| 397 |
+
G[Cross-Cutting Considerations] --> G1[Ethical Framework]
|
| 398 |
+
G --> G2[Stakeholder Management]
|
| 399 |
+
G --> G3[Interdisciplinary Connections]
|
| 400 |
+
G --> G4[Technological Integration]
|
| 401 |
+
G --> G5[Emotional Intelligence]
|
| 402 |
+
G --> G6[Collaborative Problem Solving]
|
| 403 |
+
G1 --> G1a[Value-based Decision Making]
|
| 404 |
+
G1 --> G1b[Long-term Consequences]
|
| 405 |
+
G2 --> G2a[Direct Stakeholders]
|
| 406 |
+
G2 --> G2b[Indirect Stakeholders]
|
| 407 |
+
G2 --> G2c[Conflicting Interests]
|
| 408 |
+
G3 --> G3a[Related Fields]
|
| 409 |
+
G3 --> G3b[Cross-disciplinary Impact]
|
| 410 |
+
G4 --> G4a[AI-assisted Problem Solving]
|
| 411 |
+
G4 --> G4b[Data-driven Insights]
|
| 412 |
+
G4 --> G4c[Digital Collaboration Tools]
|
| 413 |
+
G5 --> G5a[Self-Awareness]
|
| 414 |
+
G5 --> G5b[Empathy]
|
| 415 |
+
G5 --> G5c[Stress Management]
|
| 416 |
+
G6 --> G6a[Team Dynamics]
|
| 417 |
+
G6 --> G6b[Communication Strategies]
|
| 418 |
+
G6 --> G6c[Conflict Resolution]
|
| 419 |
+
H[Computational Considerations] --> H1[CPU Operations]
|
| 420 |
+
H --> H2[GPU Parallelization]
|
| 421 |
+
H --> H3[Floating-Point Precision]
|
| 422 |
+
I[Order of Operations] --> I1[Parentheses]
|
| 423 |
+
I --> I2[Exponents]
|
| 424 |
+
I --> I3[Multiplication and Division]
|
| 425 |
+
I --> I4[Addition and Subtraction]
|
| 426 |
+
J[Critical Thinking] --> J1[Assumptions Questioning]
|
| 427 |
+
J --> J2[Bias Recognition]
|
| 428 |
+
K[Future Perspective] --> K1[Short-term Projections]
|
| 429 |
+
K --> K2[Long-term Scenarios]
|
| 430 |
+
K --> K3[Potential Impacts]
|
| 431 |
+
L[Learning and Adaptation] --> L1[Reflective Practice]
|
| 432 |
+
L --> L2[Knowledge Transfer]
|
| 433 |
+
L --> L3[Adaptive Problem Solving]
|
| 434 |
+
|
| 435 |
+
|
| 436 |
+
graph TD
|
| 437 |
+
A[Meta-Cognitive Strategies] --> B[Creative Problem Solving]
|
| 438 |
+
A --> C[Systems Thinking]
|
| 439 |
+
A --> D[Decision Making]
|
| 440 |
+
A --> E[Emotional Intelligence]
|
| 441 |
+
A --> F[Collaborative Problem Solving]
|
| 442 |
+
B --> B1[Divergent Thinking]
|
| 443 |
+
B --> B2[Convergent Thinking]
|
| 444 |
+
B --> B3[Lateral Thinking]
|
| 445 |
+
C --> C1[Holistic Perspective]
|
| 446 |
+
C --> C2[Feedback Loops]
|
| 447 |
+
C --> C3[Emergent Properties]
|
| 448 |
+
D --> D1[Decision Trees]
|
| 449 |
+
D --> D2[Multi-Criteria Decision Analysis]
|
| 450 |
+
D --> D3[Probabilistic Reasoning]
|
| 451 |
+
E --> E1[Self-Awareness]
|
| 452 |
+
E --> E2[Empathy]
|
| 453 |
+
E --> E3[Stress Management]
|
| 454 |
+
F --> F1[Team Dynamics]
|
| 455 |
+
F --> F2[Communication Strategies]
|
| 456 |
+
F --> F3[Conflict Resolution]
|
| 457 |
+
G[Learning and Adaptation]
|
| 458 |
+
A --> G
|
| 459 |
+
G --> G1[Reflective Practice]
|
| 460 |
+
G --> G2[Knowledge Transfer]
|
| 461 |
+
G --> G3[Adaptive Problem Solving]
|
| 462 |
+
H[Ethical Framework]
|
| 463 |
+
A --> H
|
| 464 |
+
H --> H1[Value-based Decision Making]
|
| 465 |
+
H --> H2[Stakeholder Analysis]
|
| 466 |
+
H --> H3[Long-term Consequences]
|
| 467 |
+
I[Technological Integration]
|
| 468 |
+
A --> I
|
| 469 |
+
I --> I1[AI-assisted Problem Solving]
|
| 470 |
+
I --> I2[Data-driven Insights]
|
| 471 |
+
I --> I3[Digital Collaboration Tools]
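One possible way to hold a tree like the ones above in code is a nested dictionary, where each root-to-leaf path is a candidate sequence of thoughts. The sketch below is an assumption about representation, not the format the repository uses: `thought_tree` and `leaf_paths` are hypothetical names, and only a fragment of the second graph is encoded.

```python
from typing import Dict, Iterator, List, Tuple

# Fragment of the "Meta-Cognitive Strategies" graph; leaves map to empty dicts.
thought_tree: Dict[str, dict] = {
    "Meta-Cognitive Strategies": {
        "Creative Problem Solving": {
            "Divergent Thinking": {},
            "Convergent Thinking": {},
            "Lateral Thinking": {},
        },
        "Decision Making": {
            "Decision Trees": {},
            "Probabilistic Reasoning": {},
        },
    }
}


def leaf_paths(tree: Dict[str, dict], prefix: Tuple[str, ...] = ()) -> Iterator[List[str]]:
    """Yield every root-to-leaf path as a list of thought names."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            yield from leaf_paths(children, path)
        else:
            yield list(path)


for path in leaf_paths(thought_tree):
    print(" -> ".join(path))
```

Each printed line is one path from the root to a leaf, which corresponds to the point where the Tree of Thought mode stops generating.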
|
| 472 |
+
|
| 473 |
|
| 474 |
## Requirements
|
| 475 |
|