
AI Research Highlights | Week 38, 2023

Updated: Feb 20

1. Large Language Models as Optimizers

DeepMind researchers have introduced OPRO (Optimization by PROmpting), a framework in which LLMs serve as optimizers. At each step, the LLM reads a meta-prompt containing previously generated solutions and their scores and proposes new solutions; these are then scored and added back to the meta-prompt for the next step. Applied to prompt optimization, OPRO finds prompts that outperform human-designed ones by up to 8% on GSM8K and by up to 50% on Big-Bench Hard tasks.

An overview of the OPRO framework
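
To make the iteration concrete, here is a minimal sketch of an OPRO-style loop. It is not the authors' implementation: `build_meta_prompt`, `optimize`, `toy_score`, and `make_toy_proposer` are hypothetical names, the meta-prompt wording is made up, and the optimizer LLM and the scorer are replaced by toy stand-ins so the example runs without any model.

```python
import random
from typing import Callable, List, Tuple

History = List[Tuple[str, float]]  # (prompt, score) pairs seen so far


def build_meta_prompt(task: str, history: History, top_k: int = 3) -> str:
    """Combine the task description with the top_k best prompts, worst to best."""
    best = sorted(history, key=lambda pair: pair[1])[-top_k:]
    lines = [task, "Previous instructions with scores:"]
    lines += [f"text: {prompt} | score: {value:.2f}" for prompt, value in best]
    lines.append("Write a new instruction that achieves a higher score.")
    return "\n".join(lines)


def optimize(
    task: str,
    propose: Callable[[str], str],   # stand-in for the optimizer LLM
    score: Callable[[str], float],   # stand-in for evaluating a prompt on a task
    seed_prompt: str,
    steps: int = 8,
) -> Tuple[str, float]:
    """Repeatedly propose a candidate from the meta-prompt, score it, and record it."""
    history: History = [(seed_prompt, score(seed_prompt))]
    for _ in range(steps):
        candidate = propose(build_meta_prompt(task, history))
        history.append((candidate, score(candidate)))
    return max(history, key=lambda pair: pair[1])


# --- Toy stand-ins (assumptions for illustration only) ---
TARGET_WORDS = ("step", "check", "answer")


def toy_score(prompt: str) -> float:
    """Fraction of target words present; a real scorer would measure task accuracy."""
    words = set(prompt.lower().split())
    return sum(word in words for word in TARGET_WORDS) / len(TARGET_WORDS)


def make_toy_proposer(seed: int = 0) -> Callable[[str], str]:
    rng = random.Random(seed)
    vocabulary = ["step", "check", "answer", "quickly", "think"]

    def propose(meta_prompt: str) -> str:
        # A real optimizer LLM would read the whole meta-prompt; this stand-in
        # only extends the best-scoring prompt, which is listed last.
        scored = [line for line in meta_prompt.splitlines() if line.startswith("text: ")]
        best_prompt = scored[-1][len("text: "):].rsplit(" | score:", 1)[0]
        return f"{best_prompt} {rng.choice(vocabulary)}"

    return propose


best_prompt, best_score = optimize(
    "Improve the instruction for a math word-problem solver.",
    make_toy_proposer(),
    toy_score,
    seed_prompt="Solve the problem",
)
print(best_prompt, best_score)
```

In a real setup, `propose` would call an LLM with the meta-prompt and `score` would run the candidate prompt on a held-out set of task examples.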

2. When Do Program-of-Thoughts Work for Reasoning?

To answer the question "What kind of data format is crucial for an LLM’s reasoning abilities?", the researchers propose the complexity-impacted reasoning score (CIRS), which combines structural and logical attributes to quantify the link between code and reasoning ability. They study program-of-thought prompting and find that the key factor is code data of appropriate complexity, as characterized by its logical and structural properties.

Using the complexity-impacted reasoning score (CIRS) to measure the complexity of code reasoning steps
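
As a rough illustration of combining structural and logical attributes of code into a single number, the sketch below computes a toy complexity score with Python's standard `ast` module. This is not the CIRS formula from the paper: the chosen features, the multiplication used to combine them, and the names `structural_features`, `logical_complexity`, and `toy_complexity_score` are all assumptions made for this example.

```python
import ast

# Node types counted as decision points (an assumption of this sketch).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.BoolOp)


def structural_features(tree: ast.AST) -> dict:
    """Size, variety, and nesting depth of the abstract syntax tree."""
    nodes = list(ast.walk(tree))

    def depth(node: ast.AST) -> int:
        children = list(ast.iter_child_nodes(node))
        return 1 + max((depth(child) for child in children), default=0)

    return {
        "node_count": len(nodes),
        "node_types": len({type(node).__name__ for node in nodes}),
        "depth": depth(tree),
    }


def logical_complexity(tree: ast.AST) -> int:
    """One plus the number of decision points, a rough cyclomatic-style count."""
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))


def toy_complexity_score(source: str) -> float:
    """Toy score: (sum of structural features) x (logical complexity)."""
    tree = ast.parse(source)
    features = structural_features(tree)
    structural = features["node_count"] + features["node_types"] + features["depth"]
    return float(structural * logical_complexity(tree))


simple = "x = 1 + 2\n"
branched = "for i in range(3):\n    if i % 2 == 0:\n        print(i)\n"
print(toy_complexity_score(simple), toy_complexity_score(branched))
```

With a score like this, code samples can be bucketed by complexity to check which range helps reasoning most, which is the kind of analysis the summary above describes.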

3. NExT-GPT: Any-to-Any Multimodal LLM

Researchers from the National University of Singapore have proposed NExT-GPT, an end-to-end, general-purpose, any-to-any multimodal LLM (MM-LLM) system. The graphic below shows how NExT-GPT achieves general multimodal understanding and any-to-any modality input and output by connecting an LLM with multimodal adaptors and diffusion decoders. Complete details about the project, including a demo, code, and dataset, are available on the project website.

4. AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models

Researchers at ICT, Chinese Academy of Sciences, have proposed AGIBench, a multi-dimensional benchmark for LLMs. It labels each question with a four-tuple <ability branch, knowledge, difficulty, modal>, which lets it automatically identify the properties of its 927 questions spanning 20 core knowledge areas and 68 subdomains, as illustrated below. The authors used AGIBench to evaluate 12 state-of-the-art LLMs; the results are reported in the paper. You can download AGIBench from

AGIBench comprises 927 questions spanning 20 primary knowledge domains and 68 subdomains.
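
To show why a four-tuple label is useful, the sketch below attaches a <ability branch, knowledge, difficulty, modal> label to each graded question and aggregates accuracy along any one of the four dimensions. The class names, the `accuracy_by` helper, and all label values are hypothetical; they are not taken from AGIBench's data or code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class QuestionLabel:
    """The four-tuple <ability branch, knowledge, difficulty, modal>."""
    ability_branch: str
    knowledge: str
    difficulty: int
    modal: str


@dataclass(frozen=True)
class GradedQuestion:
    label: QuestionLabel
    correct: bool  # whether the model answered this question correctly


def accuracy_by(results: Iterable[GradedQuestion], field: str) -> Dict[object, float]:
    """Group graded questions by one label field and compute accuracy per group."""
    totals: Dict[object, int] = defaultdict(int)
    hits: Dict[object, int] = defaultdict(int)
    for item in results:
        key = getattr(item.label, field)
        totals[key] += 1
        hits[key] += int(item.correct)
    return {key: hits[key] / totals[key] for key in totals}


# Made-up labels and outcomes, for illustration only.
results = [
    GradedQuestion(QuestionLabel("understanding", "physics", 1, "text"), True),
    GradedQuestion(QuestionLabel("reasoning", "physics", 3, "text"), False),
    GradedQuestion(QuestionLabel("reasoning", "geometry", 2, "image"), True),
]

print(accuracy_by(results, "modal"))
print(accuracy_by(results, "ability_branch"))
```

Because every question carries all four labels, the same set of graded answers can be sliced by ability, knowledge area, difficulty, or modality without re-running the evaluation.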

5. Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

Tencent AI Lab, together with several academic institutions, published a survey of work on hallucination in Large Language Models (LLMs). It covers how the term is defined, how LLM hallucination differs from hallucination as traditionally studied, how to evaluate it, where it comes from, and how to mitigate it.

The structure of A Survey on Hallucination in Large Language Models

6. Life-inspired Interoceptive Artificial Intelligence for Autonomous and Adaptive Agents

This paper proposes an interoceptive AI framework that draws on cybernetics, reinforcement learning (RL), artificial life, and active inference. The authors emphasize distinguishing internal states from external states and present three core ideas for interoceptive AI (see figure below). The paper is a good attempt to draw on biology to build more autonomous, adaptive AI agents, and it offers new ideas for integrating cognitive neuroscience, biology, and computer science. One of the authors has also posted a thread that explains the work further.

7. The Rise and Potential of Large Language Model Based Agents: A Survey

Fudan NLP Group and miHoYo published a comprehensive survey on LLM-based AI agents, covering the construction of agents, agents in practice, and societies made up of agents. The authors also listed some must-read papers at

An envisioned agent society

*The researchers behind the publications deserve full credit for their work.

