Episodes

  • AgentBench: Evaluating LLMs as Agents
    Nov 27 2024

    Large Language Models (LLMs) are rapidly evolving, but how do we assess their ability to act as agents in complex, real-world scenarios? Join Jenny as we explore AgentBench, a benchmark designed to evaluate LLMs in diverse environments, from operating systems to digital card games.

    We'll delve into the key findings, including the strengths and weaknesses of different LLMs and the challenges of developing truly intelligent agents.

    13 min
  • Ivy-VL: A Lightweight Multimodal Model for Everyday Devices
    Dec 9 2024

    In this episode, we dive into Ivy-VL, a lightweight multimodal AI model released by AI Safeguard in collaboration with Carnegie Mellon University (CMU) and Stanford University. With only 3 billion parameters, Ivy-VL processes both image and text inputs to generate text outputs, balancing performance, speed, and efficiency. Its compact design supports deployment on edge devices like AI glasses and smartphones, making advanced AI accessible on everyday hardware.

    Join us as we explore Ivy-VL's development, real-world applications, and how this collaborative effort is redefining the future of multimodal AI for smart devices. Whether you're an AI enthusiast, developer, or tech-savvy professional, tune in to learn how Ivy-VL is setting new standards for accessible AI technology.

    19 min