BabyAI routes user prompts through leading large language models

The V2 expansion of BabyAI will aggregate generative AI responses from top large language models (LLMs): GPT-4, Claude 3 and Llama 2.

  1. User enters a prompt in BabyAI

  2. BabyAI routes the prompt to the GPT-4, Claude 3 and Llama 2 APIs

  3. User receives a response from each of GPT-4, Claude 3 and Llama 2

  4. User can rate which response is best and receive a crypto reward

  5. User can choose to combine all three LLM outputs into a single response
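The five steps above can be read as a fan-out pipeline: one prompt goes out to several models, and the replies come back for the user to compare. The following is a minimal Python sketch under stated assumptions, not BabyAI's implementation. The model clients are hypothetical stubs standing in for the real GPT-4, Claude 3 and Llama 2 API calls. The names `route_prompt`, `record_rating` and `combine` are illustrative. The crypto reward is not modeled, and `combine` simply concatenates the outputs because the source does not say how responses are merged.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict

# A model client takes a prompt and returns the model's text response.
ModelClient = Callable[[str], str]


def make_stub(name: str) -> ModelClient:
    """Hypothetical placeholder for a real LLM API client."""
    return lambda prompt: f"[{name}] response to: {prompt}"


# Hypothetical registry; real clients would call each provider's API.
MODELS: Dict[str, ModelClient] = {
    "gpt-4": make_stub("gpt-4"),
    "claude-3": make_stub("claude-3"),
    "llama-2": make_stub("llama-2"),
}


def route_prompt(prompt: str, models: Dict[str, ModelClient] = MODELS) -> Dict[str, str]:
    """Steps 2-3: send the same prompt to every model in parallel, collect replies."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(client, prompt) for name, client in models.items()}
        return {name: fut.result() for name, fut in futures.items()}


@dataclass
class Rating:
    prompt: str
    responses: Dict[str, str]
    preferred: str  # model whose response the user rated best


def record_rating(prompt: str, responses: Dict[str, str], preferred: str) -> Rating:
    """Step 4: store which model the user preferred (reward payout not modeled)."""
    if preferred not in responses:
        raise ValueError(f"unknown model: {preferred}")
    return Rating(prompt, responses, preferred)


def combine(responses: Dict[str, str]) -> str:
    """Step 5: naive merge by concatenation; the actual method is unspecified."""
    return "\n\n".join(f"{name}: {text}" for name, text in responses.items())
```

For example, `route_prompt("What is RLHF?")` returns one reply per model, and `record_rating("What is RLHF?", replies, "claude-3")` logs the user's choice. Running the API calls concurrently keeps the wait close to the slowest single model rather than the sum of all three.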

The data generated when users rate LLM responses and select which LLM performs best is highly valuable. Companies like OpenAI spend substantial sums on Reinforcement Learning from Human Feedback (RLHF), in which human raters evaluate the responses of a large language model such as ChatGPT. BabyAI extends this kind of human feedback across multiple large language models to obtain live benchmarking data from real-world user interactions.
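One simple way to turn "which response was best" ratings into benchmarking data is a win share: the fraction of ratings in which each model was picked. The sketch below assumes each rating is stored as the name of the preferred model. The function names `win_share` and `leaderboard` are hypothetical, and this is an illustration rather than the method BabyAI uses.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def win_share(picks: Iterable[str], models: Iterable[str]) -> Dict[str, float]:
    """Fraction of ratings in which each model's response was chosen as best."""
    models = list(models)
    # Ignore picks for models outside the benchmark set.
    counts = Counter(p for p in picks if p in models)
    total = sum(counts.values())
    return {m: (counts[m] / total if total else 0.0) for m in models}


def leaderboard(shares: Dict[str, float]) -> List[Tuple[str, float]]:
    """Models sorted from most to least preferred; ties keep input order."""
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)


picks = ["gpt-4", "claude-3", "claude-3", "llama-2"]
shares = win_share(picks, ["gpt-4", "claude-3", "llama-2"])
print(leaderboard(shares))
# [('claude-3', 0.5), ('gpt-4', 0.25), ('llama-2', 0.25)]
```

Win share is easy to compute and explain, but it ignores how hard each prompt was and which models were competing. Pairwise rating systems such as Elo or Bradley-Terry are common alternatives when that matters.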

