AgentRobot is a multi-agent human-computer interaction system designed to enhance large language models by integrating visual recognition, speech recognition, and speech synthesis. The system's architecture comprises specialized agents, including a Brain Agent for decision-making, an Eye Agent for visual input processing, an Ear Agent for audio input processing, and a Mouth Agent for audio output. These agents collaborate to achieve a natural and intuitive human-computer interaction experience, giving large language models the ability to 'see,' 'hear,' and 'speak'.
Key Features
01Chinese speech recognition and synthesis
02Configurable agent parameters and model settings
03Web interface for visualization and control
042 GitHub stars
05Real-time face detection and recognition
06Multi-agent architecture for modularity