SSRN Top 10,000 Papers

Only the top 10 papers in this ranking are shown below.

Rank 1
Overseas Development Institute (ODI) and Maastricht University
Downloads: 94,187 (last 12 months); 225,514 (all time). Citations: 2. Authors: 1.

Rank 2
Harvard University - Business School (HBS), Harvard University - Business School (HBS), University of Pennsylvania - Wharton School, Harvard University Lab for Innovation Sciences, Massachusetts Institute of Technology (MIT) - Sloan School of Management, Boston Consulting Group, Henderson Institute, Boston Consulting Group, Henderson Institute, and Harvard Business School - Technology and Operations Management Group
Last Revised: 27 Sep 2023
Downloads: 63,594 (last 12 months); 63,596 (all time). Citations: 17. Authors: 9.

Rank 3
University of Chicago - Law School and University of St. Thomas School of Law
Last Revised: 26 Feb 2024
Downloads: 58,369 (last 12 months); 107,493 (all time). Citations: 2. Authors: 2.

Rank 4
University of Chicago Booth School of Business, University of Chicago - Booth School of Business and University of Chicago Booth School of Business
Downloads: 57,836 (last 12 months); 57,862 (all time). Citations: 2. Authors: 3.

Rank 5
Yale SOM and University of Chicago - Booth School of Business
Downloads: 50,974 (last 12 months); 69,492 (all time). Citations: 11. Authors: 2.

Rank 6
Quantigic Solutions LLC and NYU - Courant Institute of Mathematical Sciences
Last Revised: 13 Sep 2019
Downloads: 29,736 (last 12 months); 112,096 (all time). Citations: 1. Authors: 2.

Rank 7
DAV College for Girls and ISGEC Heavy Engineering Limited
Downloads: 26,781 (last 12 months); 92,129 (all time). Citations: 7. Authors: 2.

Rank 8
Concretum Research and Peak Capital Trading
Last Revised: 21 Feb 2024
Downloads: 26,700 (last 12 months); 44,713 (all time). Citations: 0. Authors: 2.

Rank 9
City University of Hong Kong, North Carolina State University and City University of Hong Kong
Last Revised: 10 Apr 2024
Downloads: 25,202 (last 12 months); 25,202 (all time). Citations: 0. Authors: 3.

Rank 10
Emory University - Department of Finance, University of Arizona - Department of Finance and University of Missouri at Columbia - Department of Finance
Downloads: 24,105 (last 12 months); 24,106 (all time). Citations: 0. Authors: 3.


The top 10 journal articles of 2020

In 2020, APA’s 89 journals published more than 5,000 articles—the most ever and 25% more than in 2019. Here’s a quick look at the 10 most downloaded to date.

Vol. 52, No. 1. Print version: page 24.


1. Me, My Selfie, and I: The Relations Between Selfie Behaviors, Body Image, Self-Objectification, and Self-Esteem in Young Women

Veldhuis, J., et al.

Young women who appreciate their bodies and consider them physical objects are more likely to select, edit, and post selfies to social media, suggests this study in Psychology of Popular Media (Vol. 9, No. 1). Researchers surveyed 179 women, ages 18 to 25, on how often they took selfies, how they selected selfies to post, how often they used filters and editing techniques, and how carefully they planned their selfie postings. They also assessed participants’ levels of body appreciation and dissatisfaction, self-objectification, and self-esteem. Higher levels of self-objectification were linked to more time spent on all selfie behaviors, while body appreciation was related to more time spent selecting selfies to post, but not frequency of taking or editing selfies. Body dissatisfaction and self-esteem were not associated with selfie behaviors. DOI: 10.1037/ppm0000206

2. A Closer Look at Appearance and Social Media: Measuring Activity, Self-Presentation, and Social Comparison and Their Associations With Emotional Adjustment

Zimmer-Gembeck, M. J., et al.

This Psychology of Popular Media (online first publication) article presents a tool to assess young people’s preoccupation with their physical appearance on social media. Researchers administered a 21-item survey about social media to 281 Australian high school students. They identified 18 items with strong inter-item correlation centered on three categories of social media behavior: online self-presentation, appearance-related online activity, and appearance comparison. In a second study with 327 Australian university students, scores on the 18-item survey were found to be associated with measures of social anxiety and depressive symptoms, appearance-related support from others, general interpersonal stress, coping flexibility, sexual harassment, disordered eating, and other factors. The researchers also found that young women engaged in more appearance-related social media activity and appearance comparison than did young men. DOI: 10.1037/ppm0000277

3. The Novel Coronavirus (COVID-2019) Outbreak: Amplification of Public Health Consequences by Media Exposure

Garfin, D. R., et al.

Repeated media exposure to the COVID-19 pandemic may be associated with psychological distress and other public health consequences, according to this commentary in Health Psychology (Vol. 39, No. 5). The authors reviewed research about trends in health behavior and psychological distress as a response to media coverage of crises, including terrorist attacks, school shootings, and disease outbreaks. They found that repeated media exposure to collective crises was associated with increased anxiety and heightened acute and post-traumatic stress, with downstream effects on health outcomes such as new incidence of cardiovascular disease. Moreover, misinformation can further amplify stress responses and lead to misplaced or misguided health-protective and help-seeking behaviors. The authors recommended public health agencies use social media strategically, such as with hashtags, to keep residents updated during the pandemic. They also urged the public to avoid sensationalism and repeated coverage of the same information. DOI: 10.1037/hea0000875

4. Barriers to Mental Health Treatment Among Individuals With Social Anxiety Disorder and Generalized Anxiety Disorder

Goetter, E. M., et al.

This study in Psychological Services (Vol. 17, No. 1) indicates that 3 in 4 people who suffer from anxiety do not receive proper care. Researchers recruited 226 participants in the United States who were previously diagnosed with social anxiety disorder or generalized anxiety disorder and assessed their symptom severity and asked them to self-report any barriers to treatment. Shame and stigma were the highest cited barriers, followed by logistical and financial barriers and not knowing where to seek treatment. Participants with more severe symptoms reported more barriers to treatment than those with milder symptoms. Racial and ethnic minorities reported more barriers than racial and ethnic majorities even after controlling for symptom severity. The researchers called for increased patient education and more culturally sensitive outreach to reduce treatment barriers. DOI: 10.1037/ser0000254

5. The Construction of “Critical Thinking”: Between How We Think and What We Believe

This History of Psychology (Vol. 23, No. 3) article examines the emergence of “critical thinking” as a psychological concept. The author describes how, between World War I and World War II in the United States, the concept emerged out of growing concerns about how easily people’s beliefs could be changed and was constructed in a way that was independent of what people believed. The author delves into how original measurements of critical thinking avoided assumptions about the accuracy of specific real-world beliefs and details how subsequent critical thinking tests increasingly focused on logical abilities, often favoring outcome (what we believe) over process (how we think). DOI: 10.1037/hop0000145

6. Treatment of Alcohol Use Disorder: Integration of Alcoholics Anonymous and Cognitive Behavioral Therapy

Breuninger, M. M., et al.

This article in Training and Education in Professional Psychology (Vol. 14, No. 1) details how to work with alcohol use disorder patients who are participating in both cognitive behavioral therapy (CBT) and Alcoholics Anonymous (AA). The authors point to distinctions between AA and CBT: The goal of AA is total abstinence and the primary therapeutic relationship is with a peer in recovery, while CBT takes a less absolute approach and the primary relationship is with a psychotherapist. The authors also point to commonalities: both approaches emphasize identifying and replacing dysfunctional beliefs and place value in social support. The authors recommend clinicians and trainees become more educated about AA and recommend a translation of the 12-step language into CBT terminology to bridge the gap. DOI: 10.1037/tep0000265

7. Positivity Pays Off: Clients’ Perspectives on Positive Compared With Traditional Cognitive Behavioral Therapy for Depression

Geschwind, N., et al.

Positive cognitive behavioral therapy, a version of CBT focused on exploring exceptions to the problem rather than the problem itself, personal strengths, and embracing positivity, works well to counter depressive symptoms and build well-being, according to this study in Psychotherapy (Vol. 57, No. 3). Participants received a block of eight sessions of traditional CBT and a block of eight sessions of positive CBT. Researchers held in-depth interviews with 12 of these participants. Despite initial skepticism, most participants reported preferring positive CBT but indicated experiencing a steeper learning curve than with traditional CBT. Researchers attributed positive CBT’s favorability to four factors: feeling empowered, benefiting from effects of positive emotions, learning to appreciate baby steps, and rediscovering optimism as a personal strength. DOI: 10.1037/pst0000288

8. Targeted Prescription of Cognitive-Behavioral Therapy Versus Person-Centered Counseling for Depression Using a Machine Learning Approach

Delgadillo, J., & Gonzalez Salas Duhne, P.

A machine learning algorithm can identify which patients would derive more benefit from cognitive behavioral therapy (CBT) versus counseling for depression, suggests research in this Journal of Consulting and Clinical Psychology (Vol. 88, No. 1) article. Researchers retrospectively explored data from 1,085 patients in the United Kingdom treated with either CBT or counseling for depression and discovered six patient characteristics—age, employment status, disability, and three diagnostic measures of major depression and social adjustment—relevant to developing an algorithm for prescribing the best approach. The researchers then used the algorithm to determine which therapy would work best for an additional 350 patients with depression. They found that patients receiving their optimal treatment type were twice as likely to improve significantly. DOI: 10.1037/ccp0000476
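The targeting idea here, fitting outcome models on retrospective records for each therapy and recommending whichever arm a new patient is predicted to benefit from more, can be illustrated with a minimal sketch. This uses synthetic data and invented feature columns; it is not the authors' published algorithm.

```python
# Illustrative sketch only (synthetic data, invented features); not the authors'
# published algorithm. Fits one outcome model per therapy on retrospective records,
# then recommends whichever therapy predicts a higher chance of improvement.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
# Hypothetical patient features: age, employed (0/1), disability (0/1), 3 scale scores.
X = np.column_stack([
    rng.integers(18, 80, n),
    rng.integers(0, 2, n),
    rng.integers(0, 2, n),
    rng.normal(size=(n, 3)),
])
treatment = rng.integers(0, 2, n)              # 0 = CBT, 1 = counseling (as delivered)
improved = rng.integers(0, 2, n)               # observed outcome (synthetic placeholder)

models = {}
for t in (0, 1):                               # one outcome model per treatment arm
    models[t] = LogisticRegression(max_iter=1000).fit(
        X[treatment == t], improved[treatment == t]
    )

def recommend(patient_features):
    """Recommend the arm with the higher predicted probability of improvement."""
    p_cbt = models[0].predict_proba([patient_features])[0, 1]
    p_counseling = models[1].predict_proba([patient_features])[0, 1]
    return "CBT" if p_cbt >= p_counseling else "counseling"

print(recommend(X[0]))
```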

9. Traumatic Stress in the Age of COVID-19: A Call to Close Critical Gaps and Adapt to New Realities

Horesh, D., & Brown, A. D.

This article in Psychological Trauma: Theory, Research, Practice, and Policy (Vol. 12, No. 4) argues that COVID-19 should be examined from a post-traumatic stress perspective. The authors call for mental health researchers and clinicians to develop better diagnoses and prevention strategies for COVID-related traumatic stress; create guidelines and talking points for the media and government officials to use when speaking to an anxious, and potentially traumatized, public; and provide mental health training to professionals in health care, education, childcare, and occupational support in order to reach more people. DOI: 10.1037/tra0000592

10. Emotional Intelligence Predicts Academic Performance: A Meta-Analysis

MacCann, C., et al.

Students with high emotional intelligence get better grades and score higher on standardized tests, according to the research presented in this article in Psychological Bulletin (Vol. 146, No. 2). Researchers analyzed data from 158 studies representing more than 42,529 students—ranging in age from elementary school to college—from 27 countries. The researchers found that students with higher emotional intelligence earned better grades and scored higher on achievement tests than those with lower emotional intelligence. This finding was true even when controlling for intelligence and personality factors, and the association held regardless of age. The researchers suggest that students with higher emotional intelligence succeed because they cope well with negative emotions that can harm academic performance; they form stronger relationships with teachers, peers, and family; and their knowledge of human motivations and social interactions helps them understand humanities subject matter. DOI: 10.1037/bul0000219

5 interviews to listen to now

Psychology’s most innovative thinkers are featured on APA’s Speaking of Psychology podcast, which highlights important research and helps listeners apply psychology to their lives. The most popular episodes of 2020, as measured by the number of downloads in the first 30 days, were:

  • How to have meaningful dialogues despite political differences, with Tania Israel, PhD
  • Canine cognition and the survival of the friendliest, with Brian Hare, PhD
  • The challenges faced by women in leadership, with Alice Eagly, PhD
  • How to choose effective, science-based mental health apps, with Stephen Schueller, PhD
  • Psychedelic therapy, with Roland Griffiths, PhD

Listen to all of the Speaking of Psychology episodes.


🔥Highlighting the top ML papers every week.

dair-ai/ML-Papers-of-the-Week


ML Papers of the Week

Subscribe to our newsletter to get a weekly list of top ML papers in your inbox.

At DAIR.AI we ❤️ reading ML papers so we've created this repo to highlight the top ML papers of every week.

Here is the weekly series:

  • Top ML Papers of the Week (August 12 - August 18)
  • Top ML Papers of the Week (August 5 - August 11)
  • Top ML Papers of the Week (July 29 - August 4)
  • Top ML Papers of the Week (July 22 - July 28)
  • Top ML Papers of the Week (July 15 - July 21)
  • Top ML Papers of the Week (July 8 - July 14)
  • Top ML Papers of the Week (July 1 - July 7)
  • Top ML Papers of the Week (June 24 - June 30)
  • Top ML Papers of the Week (June 17 - June 23)
  • Top ML Papers of the Week (June 10 - June 16)
  • Top ML Papers of the Week (June 3 - June 9)
  • Top ML Papers of the Week (May 27 - June 2)
  • Top ML Papers of the Week (May 20 - May 26)
  • Top ML Papers of the Week (May 13 - May 19)
  • Top ML Papers of the Week (May 6 - May 12)
  • Top ML Papers of the Week (April 29 - May 5)
  • Top ML Papers of the Week (April 22 - April 28)
  • Top ML Papers of the Week (April 15 - April 21)
  • Top ML Papers of the Week (April 8 - April 14)
  • Top ML Papers of the Week (April 1 - April 7)
  • Top ML Papers of the Week (March 26 - March 31)
  • Top ML Papers of the Week (March 18 - March 25)
  • Top ML Papers of the Week (March 11 - March 17)
  • Top ML Papers of the Week (March 4 - March 10)
  • Top ML Papers of the Week (February 26 - March 3)
  • Top ML Papers of the Week (February 19 - February 25)
  • Top ML Papers of the Week (February 12 - February 18)
  • Top ML Papers of the Week (February 5 - February 11)
  • Top ML Papers of the Week (January 29 - February 4)
  • Top ML Papers of the Week (January 22 - January 28)
  • Top ML Papers of the Week (January 15 - January 21)
  • Top ML Papers of the Week (January 8 - January 14)
  • Top ML Papers of the Week (January 1 - January 7)
  • Top ML Papers of the Week (December 24 - December 31)

  • Top ML Papers of the Week (December 18 - December 24)
  • Top ML Papers of the Week (December 11 - December 17)
  • Top ML Papers of the Week (December 4 - December 10)
  • Top ML Papers of the Week (November 27 - December 3)
  • Top ML Papers of the Week (November 20 - November 26)
  • Top ML Papers of the Week (November 13 - November 19)
  • Top ML Papers of the Week (November 6 - November 12)
  • Top ML Papers of the Week (October 30 - November 5)
  • Top ML Papers of the Week (October 23 - October 29)
  • Top ML Papers of the Week (October 16 - October 22)
  • Top ML Papers of the Week (October 9 - October 15)
  • Top ML Papers of the Week (October 2 - October 8)
  • Top ML Papers of the Week (September 25 - October 1)
  • Top ML Papers of the Week (September 18 - September 24)
  • Top ML Papers of the Week (September 11 - September 17)
  • Top ML Papers of the Week (September 4 - September 10)
  • Top ML Papers of the Week (August 28 - September 3)
  • Top ML Papers of the Week (August 21 - August 27)
  • Top ML Papers of the Week (August 14 - August 20)
  • Top ML Papers of the Week (August 7 - August 13)
  • Top ML Papers of the Week (July 31 - August 6)
  • Top ML Papers of the Week (July 24 - July 30)
  • Top ML Papers of the Week (July 17 - July 23)
  • Top ML Papers of the Week (July 10 - July 16)
  • Top ML Papers of the Week (July 3 - July 9)
  • Top ML Papers of the Week (June 26 - July 2)
  • Top ML Papers of the Week (June 19 - June 25)
  • Top ML Papers of the Week (June 12 - June 18)
  • Top ML Papers of the Week (June 5 - June 11)

  • Top ML Papers of the Week (May 29 - June 4)
  • Top ML Papers of the Week (May 22 - 28)
  • Top ML Papers of the Week (May 15 - 21)
  • Top ML Papers of the Week (May 8 - 14)

  • Top ML Papers of the Week (May 1-7)
  • Top ML Papers of the Week (April 24 - April 30)
  • Top ML Papers of the Week (April 17 - April 23)
  • Top ML Papers of the Week (April 10 - April 16)
  • Top ML Papers of the Week (April 3 - April 9)
  • Top ML Papers of the Week (Mar 27 - April 2)
  • Top ML Papers of the Week (Mar 20-Mar 26)
  • Top ML Papers of the Week (Mar 13-Mar 19)
  • Top ML Papers of the Week (Mar 6-Mar 12)
  • Top ML Papers of the Week (Feb 27-Mar 5)
  • Top ML Papers of the Week (Feb 20-26)
  • Top ML Papers of the Week (Feb 13 - 19)
  • Top ML Papers of the Week (Feb 6 - 12)
  • Top ML Papers of the Week (Jan 30-Feb 5)
  • Top ML Papers of the Week (Jan 23-29)
  • Top ML Papers of the Week (Jan 16-22)
  • Top ML Papers of the Week (Jan 9-15)
  • Top ML Papers of the Week (Jan 1-8)

Follow us on Twitter

Join our Discord

Top ML Papers of the Week (August 12 - August 18) - 2024

1) - a novel AI agent that can develop and write a full conference-level scientific paper costing less than $15; it automates scientific discovery by enabling frontier LLMs to perform independent research and summarize findings; it also uses an automated reviewer to evaluate the generated papers; claims to achieve near-human performance in evaluating paper scores; claims to produce papers that exceed the acceptance threshold at a top machine learning conference as judged by their automated reviewer. ,
2) - a new frontier model with strong code, math, and reasoning capabilities which includes a large and small model; outperforms both Claude 3.5 Sonnet and GPT-4-Turbo on the LMSYS Chatbot Arena; claims to improve capabilities including instruction following, retrieval, tool use, and enhancing factuality; competes with Claude 3.5 Sonnet (June release) and GPT-4o (May release) on MMLU and HumanEval. ,
3) - proposes AgentWrite to enable off-the-shelf LLMs to generate coherent outputs beyond 20K words; in a divide-and-conquer approach, AgentWrite breaks the long generation task into multiple writing subtasks, generates each one, and concatenates the outputs to get a final output (i.e., plan + write; see the sketch after this list); the approach is then used to build SFT datasets that are used to tune LLMs to generate coherent longer outputs automatically; a 9B parameter model, further improved through DPO, achieves state-of-the-art performance on their benchmark and surpasses proprietary models.
4) - trains an auto-encoder LM to label and tag chunks; it retrieves relevant chunks, tags them as either or , and annotates chunks for continuous processing; then a filter model is trained to formulate the next-hop query based on the original question and previous annotations; this is done iteratively until all chunks are tagged as or the maximum # of iterations is reached; after the process above has gathered enough information to answer the initial question, the final generator (an LLM) generates the final answer. ,
5) - a fine-grained evaluation framework for diagnosing retrieval and generation modules in RAG; shows that RAGChecker has better correlations with human judgment; reports several revealing insightful patterns and trade-offs in design choices of RAG architectures. ,
6) - combines GraphRAG and VectorRAG leading to a HybridRAG system that outperforms both individually; it was tested on a set of financial earning call transcripts. Combining the advantages of both approaches provides more accurate answers to queries. ,
7) - introduces self-play mutual reasoning to improve the reasoning capabilities of small language models without fine-tuning or superior models; MCTS is augmented with human-like reasoning actions, obtained from SLMs, to build richer reasoning trajectories; a separate SLM provides unsupervised feedback on the trajectories and the target SLM selects the final reasoning trajectory as the answer; rStar boosts GSM8K accuracy from 12.51% to 63.91% for LLaMA2-7B and consistently improves the accuracy of other SLMs. ,
8) - investigates the scaling behaviors of inference-time computation in LLMs; in particular, it analyses how much an LLM can be improved provided a fixed amount of inference-time compute; finds that the effectiveness of different scaling approaches varies by difficulty of prompt; it then proposes an adaptive compute-optimal strategy that can improve efficiency by more than 4x compared to a best-of-N baseline; reports that in a FLOPs-matched evaluation, optimally scaling test-time compute can outperform a 14x larger model. ,
9) - a graph-based framework for the medical domain with a focus on enhancing LLMs and generating evidence-based results; leverages a hybrid static-semantic approach to chunk documents to improve context capture; entities and medical knowledge are represented through graphs which leads to an interconnected global graph; this approach improves precision and outperforms state-of-the-art models on multiple medical Q&A benchmarks. ,
10) - a comprehensive overview of NL2SQL techniques powered by LLMs; covers models, data collection, evaluation methods, and error analysis. ,
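Below is a minimal sketch of the plan-then-write loop behind AgentWrite (item 3 above). The `call_llm` helper, prompt wording, and section count are placeholders rather than the paper's implementation; any chat-completion API could stand in for the stub.

```python
# Illustrative plan-then-write loop in the spirit of AgentWrite (item 3 above).
# `call_llm` is a stub standing in for any LLM API; it is not the paper's code.
def call_llm(prompt: str) -> str:
    return f"[model output for: {prompt[:40]}...]"  # stub so the sketch runs end to end

def agent_write(task: str, n_sections: int = 5) -> str:
    # Step 1 (plan): break the long writing task into ordered section plans.
    plan = call_llm(
        f"Break the following writing task into {n_sections} numbered section plans, "
        f"one per line:\n{task}"
    )
    sections = [line for line in plan.splitlines() if line.strip()][:n_sections]

    # Step 2 (write): generate each section conditioned on the plan and on what has
    # already been written, then concatenate everything into the final long output.
    written = []
    for i, section_plan in enumerate(sections, start=1):
        text = call_llm(
            f"Task: {task}\nOverall plan:\n{plan}\n"
            f"Already written (tail):\n{''.join(written)[-2000:]}\n"
            f"Now write section {i}: {section_plan}"
        )
        written.append(text + "\n\n")
    return "".join(written)

print(agent_write("Write a 10,000-word survey of long-context LLMs"))
```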

Top ML Papers of the Week (August 5 - August 11) - 2024

1) - an open unified model for real-time, promptable object segmentation in images and videos; can be applied to unseen visual content without the need for custom adaptation; to enable accurate mask prediction in videos, a memory mechanism is introduced to store information on the object and previous interactions; the memory module also allows real-time processing of arbitrarily long videos; SAM2 significantly outperforms previous approaches on interactive video segmentation across 17 zero-shot video datasets while requiring three times fewer human-in-the-loop interactions. ,
2) - investigates if structured generation can impact an LLM’s reasoning and domain knowledge comprehensive capabilities; observes that there is a significant decline in LLM’s reasoning abilities when applying format restrictions compared to free-form responses; this degradation effect is further amplified when applying stricter format constraints to reasoning tasks. ,
3) - a survey paper on current practices and solutions for LLM-based agents for software engineering; covers important topics such as requirement engineering, code generation, test generation, and autonomous decision making; it also includes benchmarks, metrics, and models used in different software engineering applications. ,
4) - presents an open-source interactive tool to learn about the inner workings of a Transformer model; it runs a GPT-2 instance locally in the user's browser and allows experimenting with your own inputs. ,
5) - introduces RAGFoundry, an open-source framework for augmented LLMs for RAG use cases; it supports data creation, training, inference, and evaluation; one useful application is the creation of data-augmented datasets for tuning and evaluating LLMs in RAG settings. ,
6) - proposes integrating synthetic data to build a highly specialized SoTA text-to-SQL model called SENSE; synthetic data from strong models enhances data diversity, while valuable erroneous data from weaker models, combined with an executor, provides execution feedback to learn from; preference learning is used to instruction-tune LLMs to learn from both correct and incorrect samples; SENSE achieves state-of-the-art results on the SPIDER and BIRD benchmarks, bridging the performance gap between open-source models and methods that use closed-source models.
7) - proposes an approach to help users create personalized prompts by articulating the preferred outputs via interactions; it involves two stages: 1) an initial instruction shaped by the model based on user-provided unlabeled data, and 2) the model shares the output and the user provides feedback with refinements on outputs and instruction; this iterative process results in a personalized few-shot prompt that performs better and more optimally on the desired task. ,
8) - an approach to improve model-based evaluators using synthetic training data only; it first generates contrasting outputs (good and bad model responses) and trains an LLM-as-a-Judge to produce reasoning traces and final judgments; the self-improvement scheme repeats the training process in an iterative way using its improved predictions; claims to outperform LLM-judges such as GPT-4 and match top-performing reward models trained on labeled examples; improves a strong LLM (Llama3-70BInstruct) from 75.4 to 88.3 (88.7 with majority vote) on RewardBench. ,
9) - proposes a simple framework to automatically generate evaluation datasets to assess knowledge usage of different LLMs under different scenarios; it defines a schema from seed documents and then generates diverse documents which leads to question-answering pairs; the QA pairs are based on both the articles and configurations.
10) - provides a systematic review of existing Mamba-based models across domains and tasks; specifically, focuses on advancements of Mamba-based models, techniques for adapting Mamba to diverse data, applications where Mamba excels, and promising research directions ,
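A rough sketch of the self-improvement loop described in item 8 above (self-taught evaluators): synthesize contrasting good/bad responses, have the current judge produce a reasoning trace and verdict, keep only traces that prefer the intended better response, and fine-tune on them. `call_llm` and `fine_tune` are placeholders, not the paper's implementation.

```python
# Sketch of the self-taught evaluator loop (item 8 above): synthesize good/bad
# response pairs, let the current judge reason and give a verdict, keep only the
# traces that prefer the intended better response, and fine-tune on them.
# `call_llm` and `fine_tune` are placeholders, not the paper's implementation.
def call_llm(prompt: str) -> str:
    return "step-by-step reasoning ... Verdict: A"   # stub so the sketch runs

def fine_tune(examples):                             # stand-in for an SFT step
    print(f"fine-tuning the judge on {len(examples)} judgment traces")

prompts = ["Summarize this article ...", "Write a SQL query for monthly revenue ..."]

for iteration in range(3):                           # iterate to self-improve the judge
    training_examples = []
    for p in prompts:
        good = call_llm(f"Answer well: {p}")
        bad = call_llm(f"Answer with subtle mistakes: {p}")   # contrasting output
        judgment = call_llm(
            f"Question: {p}\nResponse A: {good}\nResponse B: {bad}\n"
            "Reason step by step, then end with 'Verdict: A' or 'Verdict: B'."
        )
        # Keep only traces where the judge prefers the intended better response (A).
        if judgment.strip().endswith("Verdict: A"):
            training_examples.append({"prompt": p, "judgment": judgment})
    fine_tune(training_examples)
```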

Top ML Papers of the Week (July 29 - August 4) - 2024

1) - proposes a self-improving alignment technique (no human supervision) where the LLM judges its own judgements and uses the feedback to improve its judgment skills; shows that leveraging this LLM-as-a-Meta-Judge approach improves the LLM's ability to judge and follow instructions; just doing self-improvement to generate better responses (act) saturates quickly; this work improves the LLM's ability to judge itself (judge) to avoid issues like reward hacking; in addition to the act and judge roles, a third role called meta-judge is used to evaluate the model's own judgements. ,
2) - presents an LLM-based multi-agent framework to perform complex web-information seeking and integration tasks; a web planner effectively decomposes complex queries followed by a web searcher that performs hierarchical information retrieval on the Internet to improve the relevancy of the retrieved information; the planning component is powered by an iterative graph construction which is used to better model complex problem-solving processes; the multi-agent framework handles long context problems better by distributing reasoning and retrieval tasks to specialized agents. ,
3) - presents an end-to-end self-reasoning framework to improve the reliability and traceability of RAG systems; leverages the reasoning trajectories generated by the LLM itself; the LLM is used to carry out the following 3 processes: 1) relevance-aware: judges the relevance between the retrieved documents and the question, 2) evidence-aware selective: chooses and cites relevant documents, and then automatically selects snippets of key sentences as evidence from the cited documents, and 3) trajectory analysis: generates a concise analysis based on all gathered self-reasoning trajectories generated by the previous 2 processes and then provides the final inferred answer; this method helps the model to be more selective, reason and distinguish relevant and irrelevant documents, therefore improving the accuracy of the overall RAG system; the framework achieves comparable performance to GPT-4 with only 2K training samples (generated by GPT-4). ,
4) - limits the model reasoning output length without sacrificing performance; shows that constraining the reasoning of LLaMA2-70b to 100 words improves the accuracy from 36.01% (CoT) to 41.07% (CCoT) on GSM8K, while reducing the average output length by 28 words. ,
5) - develops a gating model that predicts if a conversational system requires RAG to improve its responses; shows that RAG-based conversational systems have the potential to generate high-quality responses and high generation confidence; it also claims to identify a correlation between the generation's confidence level and the relevance of the augmented knowledge. ,
6) - offers a comprehensive suite of LLM-based safety content moderation models built on Gemma 2; includes classifiers for key harm types such as dangerous content, toxicity, hate speech, and more. ,
7) - proposes a benchmark to evaluate persona agent capabilities in LLMs; finds that Claude 3.5 Sonnet only has a 2.97% relative improvement in PersonaScore compared to GPT 3.5 despite being a much more advanced model. ,
8) - provides a comprehensive survey on machine unlearning in generative AI. ,
9) - proposes an approach to address inefficiencies in KV cache memory consumption; it focuses on the long-context scenarios and the inference side of things; it presents a query-dependent KV cache pruning method to minimize attention weight loss while selectively pruning the least significant channels ,
10) - a survey of the current methods used to achieve refusal in LLMs; provides evaluation benchmarks and metrics used to measure abstention in LLMs. ,
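Item 4 above (constrained chain-of-thought, CCoT) amounts to a one-line prompt change: keep the usual step-by-step instruction but add an explicit word budget. The wording below is illustrative; the paper's exact prompt may differ.

```python
# Standard CoT vs. constrained CoT (CCoT) prompting, per item 4 above.
# Prompt wording is illustrative only.
def cot_prompt(question: str) -> str:
    return f"{question}\nLet's think step by step."

def ccot_prompt(question: str, budget: int = 100) -> str:
    # The only difference: an explicit limit on the length of the reasoning.
    return f"{question}\nLet's think step by step and limit the answer to {budget} words."

q = "Natalia sold clips to 48 friends in April and half as many in May. How many in total?"
print(cot_prompt(q))
print(ccot_prompt(q))
```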

Top ML Papers of the Week (July 22 - July 28) - 2024

1) - a collection of LLMs that includes 8B, 70B, and 405B parameter models; supports eight languages and extends the context window to 128K tokens; performs competitively and in some cases outperforms state-of-the-art models across capabilities like general knowledge, math reasoning, and tool use.
2) - solved 4 out of 6 problems in this year’s IMO which is the equivalent of a silver-medal score; AlphaProof consists of a Gemini model that automatically translates natural language problem statements into formal statements (i.e., formalizer network); then a solver network searches for proofs/disproofs and progressively trains itself using AlphaZero to learn to solve even more complex problems; AlphaGeometry 2, a neuro symbolic hybrid system, proved the geometry problem; based on the Gemini model and trained from scratch on large amounts of synthetic data. ,
3) - compares RAG and long-context LLMs and finds that long-context LLMs outperform RAG on average performance while RAG is significantly less expensive; proposes Self-Route, leveraging self-reflection to route queries to RAG or LC; reports that Self-Route significantly reduces computational cost while maintaining comparable performance to LC. ,
4) - presents a platform to develop generalist agents that interact with the world through software; features include 1) an interaction mechanism for interaction between agents, interfaces, and environments, 2) an environment including a sandboxed operating system and web browser available to the agents, 3) interface to create and execute code, 4) multi-agent support, and 5) an evaluation framework. ,
5) - introduces a novel dynamic token pruning method for efficient long-context LLM inference; it can accelerate the prefilling stage of a Llama 2 7B model by 2.34x and maintain high accuracy; it selectively computes the KV for tokens that are important for the next token prediction in both the prefilling and decoding stages; it allows language models to dynamically select different subsets of tokens from the context in different generation steps, even though they might be pruned in previous steps. ,
6) - claims it is possible to iteratively fine-tune LLMs with the ability to improve their own response over multiple turns with additional environment feedback; the LLM learns to recursively detect and correct its previous mistakes in subsequent iterations; improves the self-improvement abilities of 7B models on reasoning tasks (GSM8K and MATH), attaining an improvement over turns that’s unseen in strong proprietary models. ,
7) - provides a survey on employing LLMs for Text-to-SQL tasks, including prompt engineering techniques, fine-tuning methods, benchmarks, and more. ,
8) - open-sources a large-scale multimodal interleaved dataset consisting of 1 trillion tokens which has 3.4 billion images; it also includes new sources such as PDFs and ArXiv papers. ,
9) - investigates the effects of training models on recursively generated data; finds that training on model-generated content can cause irreversible defects where the original content distribution disappears; shows that the effect, referred to as model collapse, occurs in LLMs, VAEs, and GMMs; while tested on smaller scale models (~100M params), the authors suggest this effect is highly likely to transfer to larger models over time. ,
10) - proposes a new training-free approach to mitigate hallucination in LLMs; they scaled the readout vector that constrains generation in a memory-augmented LLM decoder; recent works claim that LLMs with explicit memory mechanisms can help lower hallucination; this work uses a memory-augmented LLM and constrains generation in the decoder by applying lightweight memory primitives to reduce hallucination. ,
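A minimal sketch of the Self-Route idea from item 3 above: answer with RAG first, and only fall back to feeding the full long context when the model judges the retrieved passages insufficient. `call_llm` and `retrieve` are placeholders, not the paper's implementation.

```python
# Minimal Self-Route sketch (item 3 above): try the cheap RAG path first and fall
# back to the full long context only when the model says the passages are not
# sufficient. `call_llm` and `retrieve` are placeholders, not the paper's code.
def call_llm(prompt: str) -> str:
    return "unanswerable"                      # stub so the sketch runs

def retrieve(query: str, corpus: list[str], k: int = 5) -> list[str]:
    return corpus[:k]                          # naive stand-in retriever

def self_route(query: str, corpus: list[str]) -> str:
    chunks = retrieve(query, corpus)
    rag_answer = call_llm(
        "Answer using ONLY the passages below. If they are not sufficient, "
        f"reply exactly 'unanswerable'.\nPassages:\n{chunks}\nQuestion: {query}"
    )
    if rag_answer.strip().lower() != "unanswerable":
        return rag_answer                      # cheap path: retrieval was enough
    # Expensive path: give the model the full long context instead.
    return call_llm(f"Full context:\n{''.join(corpus)}\nQuestion: {query}")

print(self_route("Who signed the treaty?", ["document one ...", "document two ..."]))
```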

Top ML Papers of the Week (July 15 - July 21) - 2024

1) - iteratively trains small verifiers to predict solution correctness, helpful provers to produce correct solutions accepted by the verifier, and sneaky provers that produce incorrect solutions that fool the verifier; this process helps train models that can produce text that is correct and easy to understand by both humans and AI systems which leads to more trustworthy systems. ,
2) - presents an efficient encoding method to optimize an LLM’s understanding and reasoning capability on spreadsheets; develops a sheet compressor consisting of structural-anchor-based compression, inverse index translation, and data-format-aware aggregation modules to efficiently compress and encode spreadsheets; in GPT-4’s in-context learning, it improves performance in spreadsheet table detection by 25.6%. ,
3) - proposes an effective context compression method to reduce long context and speed up generation time in RAG systems; the long contexts are compressed into a small number of context embeddings which allow different compression rates that trade-off decoding time for generation quality; reduces inference time by up to 5.69 × and GFLOPs by up to 22 × while maintaining high performance. ,
4) - demonstrates the use of weak supervision to elicit strong reasoning capabilities in LLMs without relying on human annotations or advanced models; reports that strong models can automatically refine their training data without explicitly being trained to do so; enables expanding a model's learning scope and scaling performance on reasoning. ,
5) - a collection of prompt engineering methods for a variety of NLP tasks. ,
6) - finds that simply reformulating an LLM request into past tense can jailbreak many state-of-the-art LLMs; for example "How to make a Molotov cocktail?" can be rephrased as "How did people make a Molotov cocktail?"; finds that the success rate of such requests can increase from 1% to 88% using direct requests on GPT-4o; concludes that current alignment techniques may not always generalize as intended. ,
7) - proposes a framework (NeedleBench) of progressively challenging tasks to assess the long-context retrieval and reasoning capabilities of LLMs; they also present the Ancestral Trace Challenge that increases the need for complex logical reasoning which is common in real-world long-context tasks; their findings suggest that current LLMs struggle to handle reasoning tasks with complex logical relationships, even with texts shorter than 2K tokens. ,
8) - investigates self-supervised methods to distill high-quality outputs from System 2 techniques and then fine-tune System 1 to match the predictions of the System 2 technique but without generating intermediate steps; the process of distilling reasoning into System 1 results in less inference cost. ,
9) - shares practical tips for developing with and evaluating LLMs; solutions covered range from ReAct to RAG to parameter-efficient methods. ,
10) - provides an illustrated guide and graphical taxonomy of recent advances in non-Euclidean machine learning. ,
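One simple variant of the System 2 distillation recipe in item 8 above: sample several chain-of-thought completions, use agreement of the final answers (self-consistency) as an unsupervised filter, and fine-tune the model on input-to-answer pairs with the intermediate reasoning discarded. `call_llm` and `fine_tune` are placeholders, and the consistency threshold is arbitrary.

```python
# Sketch of distilling System 2 into System 1 (item 8 above): sample several CoT
# completions, keep inputs whose final answers agree (self-consistency filter), and
# fine-tune on input -> final answer pairs with the reasoning discarded.
# `call_llm` and `fine_tune` are placeholders; the 0.75 threshold is arbitrary.
from collections import Counter

def call_llm(prompt: str) -> str:
    return "... so the answer is 42"            # stub so the sketch runs

def fine_tune(pairs):                           # stand-in for an SFT step
    print(f"fine-tuning System 1 on {len(pairs)} distilled pairs")

def final_answer(completion: str) -> str:
    return completion.rsplit("answer is", 1)[-1].strip()

inputs = ["Question 1 ...", "Question 2 ..."]
distilled = []
for x in inputs:
    samples = [call_llm(f"{x}\nThink step by step and end with 'the answer is X'.")
               for _ in range(8)]
    answers = Counter(final_answer(s) for s in samples)
    best, count = answers.most_common(1)[0]
    if count / len(samples) >= 0.75:            # keep only confident majority answers
        distilled.append((x, best))             # note: no reasoning trace is kept
fine_tune(distilled)
```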

Top ML Papers of the Week (July 8 - July 14) - 2024

1) - proposes to adapt FlashAttention to take advantage of modern hardware; the techniques used to speed up attention on modern GPUs include producer-consumer asynchrony, interleaving block-wise matmul and softmax operations, and block quantization and incoherent processing; achieves speedup on H100 GPUs by 1.5-2.0x with FP16 reaching up to 740 TFLOPs/s (75% utilization), and with FP8 reaching close to 1.2 PFLOPs/s. ,
2) - introduces a new instruction fine-tuning framework to perform effective context ranking and answering generation to enhance an LLM’s RAG capabilities; it leverages a small ranking dataset to outperform existing expert ranking models; shows that a Llama3-RankRAG significantly outperforms Llama3-ChatQA-1.5 and GPT-4 models on nine knowledge-intensive benchmarks. ,
3) - introduces a parameter-efficient expert retrieval mechanism that leverages the product key technique for sparse retrieval from a million tiny experts; it attempts to decouple computational cost from parameter count by efficiently routing to a very large number of tiny experts through a learned index structure used for routing; demonstrates superior efficiency compared to dense FFW, coarse-grained MoEs, and Product Key Memory (PKM) layers. ,
4) - explores the reasoning of LLMs from a geometrical perspective; reports that a higher intrinsic dimension implies greater expressive capacity of the LLM; reports that they establish a connection between the expressive power of LLMs and the density of their self-attention graphs; their analysis demonstrates that the density of these graphs defines the intrinsic dimension of the inputs to the MLP blocks. ,
5) - proposes a new method that detects and significantly reduces contextual hallucinations in LLMs (e.g., reduces by 10% in the XSum summarization task); builds a hallucination detection model based on input features given by the ratio of attention weights on the context vs. newly generated tokens (for each attention head); the hypothesis is that contextual hallucinations are related to the extent to which an LLM attends to the provided contextual information; they also propose a decoding strategy based on their detection method which mitigates the contextual hallucination; the detector can also be transferred across models without the need for retraining. ,
6) - proposes efficient router models to dynamically select between stronger and weak LLMs during inference to achieve a balance between cost and performance; the training framework leverages human preference data and data augmentation techniques to boost performance; shows to significantly reduce costs by over 2x in certain cases while maintaining the quality of responses. ,
7) - a survey paper on Mixture of Experts (MoE), including the technical details of MoE, open-source implementations, evaluation techniques, and applications of MoE in practice. ,
8) - a new framework to address several limitations in multi-agent frameworks such as integrating diverse third-party agents and adaptability to dynamic task requirements; introduces an agent integration protocol, instant messaging architecture design, and dynamic mechanisms for effective collaboration among heterogeneous agents. ,
9) - a new pipeline for end-to-end text-to-3D asset generation in under a minute; integrates state-of-the-art components like AssetGen and TextureGen to represent 3D objects in three ways, namely view space, in volumetric space, and in UV space; achieves a win rate of 68% with respect to the single-stage model. ,
10) - proposes new sequence modeling layers with linear complexity and an expressive hidden state; the hidden state is defined as an ML model itself, capable of updating even on the test sequence; both a linear-model and a two-layer-MLP hidden state are found to match or exceed baseline models like Transformers, Mamba, and modern RNNs; the linear variant is faster than a Transformer at 8k context and matches Mamba in wall-clock time.
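The detector in item 5 above is built on a simple feature: for each attention head, the share of attention mass that lands on the provided context versus on previously generated tokens. The sketch below computes that ratio from an attention tensor and trains a small classifier on it; the random tensors and labels are placeholders for real model attentions and annotations.

```python
# Sketch of attention-ratio ("lookback") features for contextual hallucination
# detection, per item 5 above. Random tensors stand in for real decoder attentions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_heads, context_len, gen_len = 8, 256, 32

def lookback_features(attn):
    """attn: (n_heads, gen_len, context_len + gen_len) attention weights."""
    ctx = attn[:, :, :context_len].sum(-1)     # attention mass on the context
    new = attn[:, :, context_len:].sum(-1)     # attention mass on generated tokens
    ratio = ctx / (ctx + new + 1e-9)           # per head, per generation step
    return ratio.mean(axis=1)                  # average over steps -> (n_heads,)

# Synthetic training set: one feature vector per generated span, plus labels.
X = np.stack([
    lookback_features(rng.random((n_heads, gen_len, context_len + gen_len)))
    for _ in range(100)
])
y = rng.integers(0, 2, 100)                    # placeholder hallucination labels
detector = LogisticRegression(max_iter=1000).fit(X, y)
print("predicted hallucination prob:", detector.predict_proba(X[:1])[0, 1])
```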

Top ML Papers of the Week (July 1 - July 7) - 2024

1) - presents an automated data generation pipeline to synthesize high-quality datasets for function-calling applications; shows that 7B models trained on curated datasets outperform GPT-4 models and other state-of-the-art models on the Berkeley Function-Calling Benchmark; a dataset consisting of 60K entries is also released to help with research in function-calling enabled agents. ,
2) - a new model based on GPT-4 to help write critiques for responses generated by ChatGPT; trained using RLHF using a large number of inputs that contained mistakes for which it had to critique; built to help human trainers spot mistakes during RLHF and claims that CriticGPT critiques are preferred by trainers over ChatGPT critiques in 63% of cases on naturally occurring bugs. ,
3) - shows the best practices for building effective RAG workflows; proposes strategies that focus on performance and efficiency, including emerging multimodal retrieval techniques. ,
4) - proposes 1 billion diverse personas to facilitate the creation of diverse synthetic data for different scenarios; uses a novel persona-driven data synthesis methodology to generate diverse and distinct data covering a wide range of perspectives; to measure the quality of the synthetic datasets, they performed an out-of-distribution evaluation on MATH. A fine-tuned model on their synthesized 1.07M math problems achieves 64.9% on MATH, matching the performance of gpt-4-turbo-preview at only a 7B scale. ,
5) - proposes the use of self-evaluation to defend against adversarial attacks; uses a pre-trained LLM to build defense which is more effective than fine-tuned models, dedicated safety LLMs, and enterprise moderation APIs; they evaluate different settings like attacks on the generator only and generator + evaluator combined; it shows that building a dedicated evaluator can significantly reduce the success rate of attacks. ,
6) - introduces OpenAutoCoder-Agentless, an agentless system that solves 27.3% of GitHub issues on SWE-bench Lite; claims to outperform all other open-source AI-powered software engineering agents.
7) - presents the Ctrl-G framework to facilitate control of LLM generations that reliably follow logical constraints; it combines LLMs and Hidden Markov Models to enable following logical constraints (represented as deterministic finite automata); Ctrl-G achieves over 30% higher satisfaction rate in human evaluation compared to GPT-4.
8) - closely investigates the effects and effectiveness of synthetic data and how it shapes a model’s internal biases, calibration, attributes, and preferences; finds that LLMs are sensitive towards certain attributes even when the synthetic data prompts appear neutral; demonstrates that it’s possible to steer the generation profiles of models towards desirable attributes. ,
9) - proposes a new task, SummHay, to test a model’s ability to process a Haystack and generate a summary that identifies the relevant insights and cites the source documents; reports that long-context LLMs score 20% on the benchmark, which lags the human performance estimate (56%); RAG components are found to boost performance on the benchmark, which makes it a viable option for holistic RAG evaluation.
10) - analyzes current agent evaluation practices and reveals shortcomings that potentially hinder real-world application; proposes an implementation that jointly optimizes cost and accuracy and a framework to avoid overfitting agents. ,
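A bare-bones version of the self-evaluation defense from item 5 above: a second call to an unmodified LLM screens the incoming request (and optionally the generated output), and the system refuses when the evaluator objects. `call_llm` is a placeholder, not the paper's implementation.

```python
# Bare-bones self-evaluation defense (item 5 above): an unmodified LLM screens the
# request and optionally the output; refuse when the evaluator objects.
# `call_llm` is a placeholder, not the paper's implementation.
def call_llm(prompt: str) -> str:
    return "SAFE"                              # stub so the sketch runs

def is_safe(text: str) -> bool:
    verdict = call_llm(
        "You are a safety evaluator. Reply with exactly 'SAFE' or 'UNSAFE'.\n"
        f"Text to evaluate:\n{text}"
    )
    return verdict.strip().upper().startswith("SAFE")

def guarded_generate(user_input: str) -> str:
    if not is_safe(user_input):                # screen the incoming request
        return "Request refused."
    output = call_llm(user_input)
    if not is_safe(output):                    # optionally screen the generated output
        return "Response withheld."
    return output

print(guarded_generate("Tell me about photosynthesis."))
```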

Top ML Papers of the Week (June 24 - June 30) - 2024

1) - a new LLM-based biological model that generates a new green fluorescent protein called esmGFP; builds on a bidirectional transformer, uses masked language models for the objective function, leverages geometric attention to represent atomic coordinates, and applies chain-of-thought prompting to generate fluorescent proteins; estimates that esmGFP represents an equivalent of over 500 million years of natural evolution performed by an evolutionary simulator. ,
2) - presents a family of open models ranging between 2B to 27B parameters; demonstrates strong capabilities in reasoning, math, and code generation, outperforming models twice its size. ,
3) - a suite of open pre-trained models (7B and 13B parameters) designed for code optimization tasks; it’s built on top of Code Llama and trained on a corpus of 546 billion tokens of LLVM-IR and assembly code; it’s also instruction fine-tuned to interpret compiler behavior; achieves 77% of the optimizing potential of autotuning search and performs accurate disassembling 14% of the time compared to the autotuning technique on which it was trained. ,
4) - proposes LongRAG, which combines RAG with long-context LLMs to enhance performance; uses a long retriever to significantly reduce the number of extracted units by operating on longer retrieval units; the long reader takes in the long retrieval units and leverages the zero-shot answer extraction capability of long-context LLMs to improve performance of the overall system; claims to achieve 64.3% on HotpotQA (full-wiki), which is on par with the state-of-the-art model. ,
5) - proposes a fine-tuning approach to improve the accuracy of retrieving information in LLMs while maintaining reasoning capabilities over long-context inputs; the fine-tuning dataset comprises numerical dictionary key-value retrieval tasks (350 samples); finds that this approach mitigates the "lost-in-the-middle" phenomenon and improves performance on both information retrieval and long-context reasoning. ,
6) - proposes a graph-based agent system to enhance the long-context abilities of LLMs; it structures long text into a graph and employs an agent to explore the graph (using predefined functions guided by a step-by-step rational plan) to effectively generate answers for questions; consistently outperforms GPT-4-128k across context lengths from 16k to 256k. ,
7) - presents a context-aware dynamic draft tree to increase the speed of inference; the previous speculative sampling method used a static draft tree for sampling which only depended on position but lacked context awareness; achieves speedup ratios ranging from 3.05x-4.26x, which is 20%-40% faster than previous work; these speedup ratios occur because the new method significantly increases the number of accepted draft tokens. ,
8) - presents an approach for dealing with length bias and training instruction-following language models that better follow length-constraint instructions; fine-tunes a model using DPO with a length-instruction-augmented dataset and shows fewer length-constraint violations while keeping response quality high.
9) - survey on LLM-based synthetic data generation, curation, and evaluation. ,
10) - a new optimizer that reduces memory footprint (45%-50% less) by using fewer learning rates and performs on par with or even outperforms AdamW; it carefully partitions parameters into blocks and assigns each block a single high-quality learning rate, which outperforms Adam; achieves consistent results on language models sized from 125M to 7B for pre-training, SFT, and RLHF.
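The LongRAG recipe in item 4 above can be sketched as: pack whole documents into long retrieval units, retrieve only a couple of units, and hand them to a long-context reader. The packing size, the lexical scorer, and `call_llm` below are stand-ins, not the paper's retriever or reader.

```python
# Sketch of the LongRAG recipe (item 4 above): pack documents into long retrieval
# units, retrieve a couple of units, and let a long-context reader answer.
# The packing size, lexical scorer, and `call_llm` are stand-ins, not the paper's code.
def call_llm(prompt: str) -> str:
    return "[extracted answer]"                # stub so the sketch runs

def group_into_units(docs, max_chars=16000):
    units, current = [], ""
    for d in docs:                             # pack whole documents into long units
        if current and len(current) + len(d) > max_chars:
            units.append(current)
            current = ""
        current += d + "\n\n"
    if current:
        units.append(current)
    return units

def score(query, unit):                        # naive lexical stand-in for the retriever
    return sum(unit.lower().count(w) for w in query.lower().split())

def long_rag(query, docs, k=2):
    units = group_into_units(docs)
    top = sorted(units, key=lambda u: score(query, u), reverse=True)[:k]
    return call_llm(f"Context:\n{''.join(top)}\nQuestion: {query}\nAnswer:")

print(long_rag("Where was the treaty signed?", ["document a ...", "document b ..."]))
```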

Top ML Papers of the Week (June 17 - June 23) - 2024

1) - a new model that achieves state-of-the-art performance on several common benchmarks such as MMLU and HumanEval; it outperforms Claude 3 Opus and GPT-4o on several benchmarks with the exception of math word problem-solving tasks; achieves strong performance on vision tasks which also helps power several new features like image-text transcription and generation of artifacts. ,
2) - competes with closed-sourced models on code and math generation tasks; achieves 90.2% on HumanEval and 75.7% on MATH; these results are higher than GPT-4-Turbo-0409 performance according to their report; includes a 16B and 236B parameter model with 128K context length. ,
3) - a new framework for automatic differentiation through backpropagation on textual feedback provided by an LLM; the natural-language feedback is backpropagated to improve individual components and optimize the computation graph; users only provide an objective function, without hand-tuning prompts or components; claims to achieve the best scores on LeetCodeHard and SoTA performance on GPQA when combined with GPT-4o.
4) - conducts a deep performance analysis of long-context LLMs on in-context retrieval and reasoning; they first present a benchmark with real-world tasks requiring 1M token context; reports that long-context LLMs can rival state-of-the-art retrieval and RAG systems, without any explicit training on the tasks; suggests that compositional reasoning (required in SQL-like tasks) is still challenging for these LLMs; they also encourage the need for continued research on advanced prompting strategies as they noted significant boosts in performance when applying them for long context problems. ,
5) - enhances decision making with a new RAG technique called iterative plan-then-RAG (PlanRAG); involves two steps: 1) an LM generates the plan for decision making by examining data schema and questions and 2) the retriever generates the queries for data analysis; the final step checks if a new plan for further analysis is needed and iterates on previous steps or makes a decision on the data; PlanRAG is found to be more effective than iterative RAG on the proposed Decision QA tasks. ,
6) - presents a modification of the next-token prediction objective called goldfish loss to help mitigate the verbatim generation of memorized training data; it uses a simple technique that excludes a pseudorandom subset of training tokens at training time; they show that the goldfish loss resists memorization and keeps the model useful; however, it may need to train for longer to more effectively learn from the training data. ,
7) - reports achieving GPT-4-level mathematical olympiad solutions using an approach that integrates LLMs with Monte Carlo Tree Search; this approach focuses on enhancing the mathematical reasoning performance of the system through capabilities such as systematic exploration, self-refinement, and self-evaluation.
8) - investigates more closely how LLMs utilize external knowledge over parametric information for factual queries; finds that in a RAG pipeline, LLMs take a “shortcut” and display a strong bias towards utilizing only the context information to answer the question, while relying minimally on their parametric memory. ,
9) - an open-source video generation model that can generate 16-second 720p videos; it’s a 1.1B parameter model trained on more than 30m data and now supports image-to-video; presents an enhanced diffusion model and video compression network for spatial and temporal compression; increases controllability of generations and reduces training costs. ,
10) - proposes an inference-time tree search algorithm for LM agents to perform exploration and enable multi-step reasoning; it’s tested on interactive web environments and applied to GPT-4o to significantly improve performance; demonstrates that performance scales when increasing test-time compute. ,
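A schematic version of the goldfish loss from item 6 above: compute the usual next-token cross-entropy but exclude a pseudorandom subset of token positions from the loss, so no training sequence can be reproduced verbatim. The masking rule here is a seeded random draw for illustration; the paper's exact masking scheme may differ.

```python
# Schematic goldfish loss (item 6 above): a pseudorandom subset of token positions
# is excluded from the next-token loss so sequences cannot be memorized verbatim.
# This is an illustrative PyTorch-style loss, not the paper's exact masking scheme.
import torch
import torch.nn.functional as F

def goldfish_loss(logits, targets, drop_every=4, seed=0):
    """logits: (batch, seq, vocab); targets: (batch, seq) token ids."""
    batch, seq, vocab = logits.shape
    g = torch.Generator().manual_seed(seed)
    # Keep ~ (1 - 1/drop_every) of positions; dropped positions contribute no loss.
    keep = torch.rand(batch, seq, generator=g) > (1.0 / drop_every)
    per_token = F.cross_entropy(
        logits.reshape(-1, vocab), targets.reshape(-1), reduction="none"
    ).reshape(batch, seq)
    return (per_token * keep).sum() / keep.sum().clamp(min=1)

logits = torch.randn(2, 8, 100)                 # toy model outputs
targets = torch.randint(0, 100, (2, 8))         # toy next-token targets
print(goldfish_loss(logits, targets))
```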

Top ML Papers of the Week (June 10 - June 16) - 2024

1) - provides an instruct model to generate high-quality data and a reward model to filter out data on several attributes; demonstrates strong performance on common benchmarks like MMLU and GSM8K; it’s competitive with GPT-4 on several tasks, including high scores in multi-turn chat; a preference dataset is also released along with the base model.
2) - proposes LLM-driven objective discovery of state-of-the-art preference optimization; no human intervention is used and an LLM is prompted to propose and implement the preference optimization loss functions based on previously evaluated performance metrics; discovers an algorithm that adaptively combined logistic and exponential losses. ,
3) - a framework to enhance an LLM-based agent's capabilities to achieve high-level goals; adaptively breaks down a high-level goal into a tree structure of practical subgoals during interaction with the environment; improves performance on various tasks, including competitive, cooperative, and deferred feedback environments ,
4) - an approach that leverages the collective strengths of multiple LLMs through a Mixture-of-Agents methodology; layers are designed with multiple LLM agents and each agent builds on the outputs of other agents in the previous layers; surpasses GPT-4o on AlpacaEval 2.0, MT-Bench and FLASK. ,
5) - a new hybrid architecture that enables tokens in the LLM to cross-attend to node embeddings from a GNN-based neural algorithmic reasoner (NAR); the resulting model, called TransNAR, demonstrates improvements in OOD reasoning across algorithmic tasks ,
6) - improves an LLM’s ability to effectively acquire new knowledge from raw documents through self-teaching; the three steps involved are 1) a self-teaching component that augments documents with a set of knowledge-intensive tasks focusing on memorization, comprehension, and self-reflection, 2) uses the deployed model to acquire knowledge from new documents while reviewing its QA skills, and 3) the model is configured to continually learn using only the new documents which helps with thorough acquisition of new knowledge. ,
7) - a framework that enables a multimodal LLM to access a visual sketchpad and tools to draw on the sketchpad; it can equip a model like GPT-4 with the capability to generate intermediate sketches to reason over complex tasks; improves performance on many tasks over strong base models with no sketching; GPT-4o equipped with SketchPad sets a new state of the art on all the tasks tested. ,
8) - proposes an approach to significantly reduce hallucination (10x) by tuning millions of expert adapters (e.g., LoRAs) to learn exact facts and retrieve them from an index at inference time; the memory experts are specialized to ensure faithful and factual accuracy on the data it was tuned on; claims to enable scaling to a high number of parameters while keeping the inference cost fixed. ,
9) - introduces Table-LLaVa 7B, a multimodal LLM for multimodal table understanding; it’s competitive with GPT-4V and significantly outperforms existing MLLMs on multiple benchmarks; also develops a large-scale dataset MMTab, covering table images, instructions, and tasks. ,
10) - proposes an approach to tune an LLM to effectively utilize information from the middle part of the context; it first proposes a training-efficient method to extend LLMs to longer context lengths (e.g., 4K -> 256K); it uses a truncated Gaussian to encourage sampling from the middle part of the context during fine-tuning; the approach helps to alleviate the so-called "Lost-in-the-Middle" problem in long-context LLMs. ,
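A compact sketch of the Mixture-of-Agents layering from item 4 above: several proposer calls answer in parallel, each subsequent layer sees all answers from the previous layer, and a final call aggregates. `call_llm`, the model names, and the layer count are placeholders, not the paper's configuration.

```python
# Compact Mixture-of-Agents sketch (item 4 above). `call_llm`, model names, and
# layer counts are placeholders, not the paper's configuration.
def call_llm(model: str, prompt: str) -> str:
    return f"[{model} answer]"                  # stub so the sketch runs

MODELS = ["model-a", "model-b", "model-c"]      # hypothetical open-weight models

def mixture_of_agents(question: str, n_layers: int = 3) -> str:
    previous = []
    for _ in range(n_layers):
        prompt = question
        if previous:                            # later layers see the earlier layer's answers
            refs = "\n".join(f"- {r}" for r in previous)
            prompt = (f"{question}\n\nCandidate answers from other assistants:\n"
                      f"{refs}\n\nUse them to produce a better answer.")
        previous = [call_llm(m, prompt) for m in MODELS]
    # Final aggregation over the last layer's proposals.
    refs = "\n".join(f"- {r}" for r in previous)
    return call_llm(MODELS[0], f"Synthesize the single best final answer from:\n{refs}")

print(mixture_of_agents("Explain why the sky is blue."))
```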

Top ML Papers of the Week (June 3 - June 9) - 2024

1) - proposes a massive multilingual model that leverages transfer learning across 200 languages; it’s based on a sparsely Gated Mixture of Experts architecture and trained on data via an approach tailored for low-resource languages; evaluates on 40K translations and achieves an average of 44% improvement in translation quality. ,
2) - proposes a new scalable method based on sparse autoencoders to extract around 16 million interpretable patterns from GPT-4; the method demonstrates predictable scaling and is more efficient than previous techniques. ,
3) - a new architecture that combines state space models (SSMs) and structured attention; it uses 8x larger states and trains 50% faster; the new state space duality layer is more efficient and scalable compared to the approach used in Mamba; it also improves results on tasks that require large state capacity. ,
4) - proposes an implementation that eliminates matrix multiplication operations from LLMs while maintaining performance at billion-parameter scales; the performance between full precision Transformers and the MatMul-free models narrows as the model size increases; claims that by using an optimized kernel during inference, memory consumption is reduced by more than 10x. ,
5) - presents a thought-augmented reasoning approach to enhance the accuracy, efficiency, and robustness of LLM-based reasoning; it leverages a meta-buffer containing high-level thoughts (thought templates) distilled from problem-solving processes; the relevant thought template is then retrieved and instantiated with task-specific reasoning structures for the thought-augmented reasoning process; it demonstrates SOTA performance on 10 challenging tasks while requiring 12% of the cost of multi-query prompting methods like Tree-of-Thoughts. ,
6) - a training framework to teach LLMs to express more accurate fine-grained confidence estimates and self-reflective rationales; it performs supervised finetuning on a dataset that contains summaries of the differences between multiple reasoning chains; reinforcement learning is then applied to calibrate confidence estimates, encouraging the LLM to produce accurate, high-confidence predictions while penalizing overconfidence in erroneous outputs. ,
7) - studies the geometry of categorical concepts and how the hierarchical relations between them are encoded in LLMs; finds that simple categorical concepts are represented as simplices by the LLMs and complex concepts are represented as polytopes constructed from direct sums of simplices, which reflect the hierarchical structure. ,
8) - proposes a method to align LLMs to a specific setting via a very small number of demonstrations as feedback; it aligns LLM outputs to a user’s demonstrated behaviors and can learn fine-grained style and task alignment across domains; outperforms few-shot prompting, SFT, and self-play methods on the tested benchmarks. ,
9) - provides an overview of methods used for alignment of LLMs; explores the 4 following directions: 1) aligning through inductive bias, 2) aligning through behavior imitation, 3) aligning through model feedback, and 4) aligning through environment feedback. ,
10) - a new framework featuring various environments and tasks for broad, real-time, and concurrent agent exploration; builds a generally capable LLM-based agent with self-evolution abilities and explores its potential beyond previously seen data across tasks and environments. ,
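As a companion to item 2 above, here is a minimal k-sparse (TopK) autoencoder over captured LLM activations; the activation width, number of latents, and k are illustrative and far smaller than anything used at GPT-4 scale.

```python
import torch
import torch.nn as nn

class TopKSparseAutoencoder(nn.Module):
    """Minimal k-sparse autoencoder over residual-stream activations (sketch;
    d_model, n_latents, and k are illustrative, not the paper's settings)."""
    def __init__(self, d_model=4096, n_latents=65536, k=32):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, n_latents)
        self.decoder = nn.Linear(n_latents, d_model)

    def forward(self, x):
        pre = self.encoder(x)
        # keep only the k largest latents per example; everything else is zeroed
        topk = torch.topk(pre, self.k, dim=-1)
        z = torch.zeros_like(pre).scatter_(-1, topk.indices, torch.relu(topk.values))
        return self.decoder(z), z

sae = TopKSparseAutoencoder()
acts = torch.randn(8, 4096)              # stand-in for captured LLM activations
recon, latents = sae(acts)
loss = ((recon - acts) ** 2).mean()      # reconstruction objective
```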

Top ML Papers of the Week (May 27 - June 2) - 2024

1) - proposes a new position encoding method, CoPE, to enable the position to be conditioned on context by incrementing position only on certain tokens; the position encoding is context-dependent and can represent different levels of position abstraction; the general position encoding method can attend to the i-th particular word, noun, or sentence; improves perplexity on language modeling and coding tasks. ,
2) - proposes a method that improves the logical reasoning capabilities of LLMs by integrating symbolic expressions and logical rules with chain-of-thought (CoT) prompting; the prompting technique is called Symbolic Chain-of-Thought and it’s a fully LLM-based framework with the following key steps: 1) translates natural language context to symbolic format, 2) derives step-by-step plan to solve problems following symbolic logical rules, and 3) uses a verifier to check the translation and reasoning chain. ,
3) - achieves 99% accuracy on 100-digit addition problems by training on only 20-digit numbers with a single GPU; the main challenge this work addresses is the inability of transformers to track the exact position of digits; they do this by adding an embedding to each digit that encodes its position relative to the start of the number; these gains also transfer to multi-step reasoning tasks that include sorting and multiplication. ,
4) - presents an introduction to vision-language models along with key details of how they work and how to effectively train these models. ,
5) - combines the language understanding abilities of LLMs with the reasoning abilities of GNNs in a RAG style; the GNN extracts useful and relevant graph information while the LLM takes the information and leverages its capabilities to perform question answering over knowledge graphs (KGQA); GNN-RAG improves vanilla LLMs on KGQA and outperforms or matches GPT-4 performance with a 7B tuned LLM. ,
6) - presents a new attention mechanism that can be trained in parallel (like Transformers) and be updated efficiently with new tokens, requiring constant memory usage for inference (like RNNs); the attention formulation is based on the parallel prefix scan algorithm which enables efficient computation of attention’s many-to-many RNN output; achieves comparable performance to Transformers on 38 datasets while being more time and memory-efficient. ,
7) - a family of multilingual language models that can serve up to 23 languages; it intentionally focuses on fewer languages and allocates more capacity to these languages; shows that it can outperform massively multilingual models on those specific languages. ,
8) - claims that long-LLMs are not a necessity to solve long-context tasks; proposes a reasoning framework to enable short-LLMs to address long-context tasks by adaptively accessing and utilizing the context based on the presented tasks; it decomposes the long context into short contexts and processes them using a decision-making process. ,
9) - claims that LLMs can generate useful insights from their analysis of trends and financial ratios; shows that GPT-4 performs on par with narrowly specialized models; and achieves a profitable trading strategy based on GPT’s predictions. ,
10) - a simpler and more effective approach for preference optimization with a reference-free reward; uses the average log probability of a sequence as an implicit reward (i.e., no reference model required), which makes it more compute and memory efficient; demonstrates that it outperforms existing approaches like DPO and claims to produce the strongest 8B open-source model; a sketch of this kind of length-normalized preference loss follows this list. ,
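A minimal sketch of the reference-free, length-normalized preference loss described in item 10 above; the beta and gamma values and the toy inputs are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn.functional as F

def simple_preference_loss(logp_chosen, logp_rejected, len_chosen, len_rejected,
                           beta=2.0, gamma=1.0):
    """Reference-free preference loss: the implicit reward is the length-normalized
    (average) log probability of each sequence, and the chosen response is pushed
    above the rejected one by at least a margin gamma (illustrative values)."""
    reward_chosen = beta * logp_chosen / len_chosen
    reward_rejected = beta * logp_rejected / len_rejected
    return -F.logsigmoid(reward_chosen - reward_rejected - gamma).mean()

# Toy usage with summed token log-probs from the policy model
loss = simple_preference_loss(torch.tensor([-40.0]), torch.tensor([-55.0]),
                              torch.tensor([20.0]), torch.tensor([22.0]))
```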

Top ML Papers of the Week (May 20 - May 26) - 2024

1) - presents an effective method to extract millions of abstract features from an LLM that represent specific concepts; these concepts could represent people, places, programming abstractions, emotions, and more; reports that some of the discovered features are directly related to the safety aspects of the model, including features tied to security vulnerabilities and backdoors in code, bias, deception, sycophancy, and dangerous/criminal content; these features can also be used to intuitively steer the model’s output. ,
2) - introduces a parametric world knowledge model to facilitate agent planning; the agent model can self-synthesize knowledge from expert and sampled trajectories; this is used to train the world knowledge model; prior task knowledge is used to guide global planning and dynamic state knowledge is used to guide the local planning; demonstrates superior performance compared to various strong baselines when adopting open-source LLMs like Mistral-7B and Gemma-7B. ,
3) - analyzes the risks and opportunities of open-source generative AI models; argues that the overall benefits of open-source generative AI outweigh its risks. ,
4) - proposes a hierarchical reasoning aggregation framework for improving the reasoning capabilities of LLMs; the approach, called Aggregation of Reasoning (AoR), selects answers based on the evaluation of reasoning chains; AoR uses dynamic sampling to adjust the number of reasoning chains with respect to the task complexity; it uses results from the evaluation phase to determine whether to sample additional reasoning chains; a known flaw of majority voting is that it fails in scenarios where the correct answer is in the minority; AoR focuses on evaluating the reasoning chains to improve the selection of the final answer; AoR outperforms various prominent ensemble methods and can be used with various LLMs to improve performance on complex reasoning tasks. ,
5) - presents an opinion paper addressing important questions to understand the proximity to artificial general intelligence (AGI); it provides a summary of strategies necessary to achieve AGI which includes a detailed survey, discussion, and original perspectives.
6) - proposes a layer-condensed KV cache to achieve efficient inference in LLMs; only computes and caches the key-values (KVs) of a small number of layers which leads to saving memory consumption and improved inference throughput; can achieve up to 26x higher throughput than baseline transformers while maintaining satisfactory performance. ,
7) - provides guidance and lessons for evaluating large language models; discusses challenges and best practices, along with the introduction of an open-source library for evaluating LLMs. ,
8) - presents INDUS, a comprehensive suite of LLMs for Earth science, biology, physics, planetary sciences, and more; includes an encoder model, embedding model, and small distilled models. ,
9) - introduces an approach to generate Lean 4 proof data from high-school and undergraduate-level mathematical competition problems; it uses the synthetic data, comprising 8 million formal statements and proofs, to fine-tune a DeepSeekMath 7B model; achieves whole-proof generation accuracies of 46.3% with 64 samples and 52% cumulatively on the Lean 4 miniF2F test; this surpasses the baseline GPT-4 (23.0%) with 64 samples and a tree search RL method (41.0%). ,
10) - provides a comprehensive and systematic survey of the current state of efficient multimodal large language models; discusses efficient structures and strategies, applications, limitations, and promising future directions. ,

Top ML Papers of the Week (May 13 - May 19) - 2024

1) - a new model with multimodal reasoning capabilities with real-time support across audio, vision, and text; it can accept as input any combination of text, audio, image, and video to generate combinations of text, audio, and image outputs; it’s reported to match GPT-4 Turbo performance while being faster and 50% cheaper via the API. ,
2) - a lightweight transformer decoder model with a 2M context window with multimodal capabilities; it is designed for efficiency and yields the fastest output generation of all models on several evaluated languages; overall, Gemini 1.5 Flash performs uniformly better compared to Gemini 1.0 Pro and even performs at a similar level to 1.0 Ultra on several benchmarks. ,
3) - Google DeepMind’s most capable video generation model generates high-quality, 1080p resolution videos beyond 1 minute; it supports masked editing on videos and can also generate videos from an input image along with text; the model can extend video clips to 60 seconds and more while maintaining consistency, using a latent diffusion transformer. ,
4) - a family of token-based mixed-modal models for generating images and text in any arbitrary sequence; reports state-of-the-art performance in image captioning, outperforms Llama 2 in text-only tasks, and is competitive with Mixtral 8x7B and Gemini-Pro; exceeds the performance of Gemini Pro and GPT-4V on a new long-form mixed-modal generation evaluation. ,
5) - studies the impact of fine-tuning on new knowledge on the hallucination tendencies of LLMs; the setup uses fine-tuning examples that introduce new knowledge; shows that LLMs struggle to acquire new factual knowledge via fine-tuning; also finds that as new knowledge is eventually learned, the model’s tendency to hallucinate increases. ,
6) - trains a hypernetwork taking a tokenizer as input and predicting the corresponding embeddings; it demonstrates generalization to new tokenizers both with encoder and decoder LLMs; reports that the method achieves performance close to the original models' performance in cross-lingual and coding tasks while reducing the length of the tokenized sequence. ,
7) - leverages LLMs to connect task-specific models for audio content creation and editing; decomposes users' instructions into several tasks and tackles each task collaboratively with the appropriate module; it enables users to interact and produce audio content without explicit commands. ,
8) - provides an easily reproducible recipe for online iterative RLHF; discusses theoretical insights and algorithmic principles of online iterative RLHF and practical implementation. ,
9) - a decoder-decoder LLM architecture that only caches key-value pairs once; it involves a cross-decoder stacked upon a self-decoder: the self-decoder efficiently encodes global key-value caches, which the cross-decoder then reuses via cross-attention; this leads to a significant reduction in GPU memory use without sacrificing capabilities; achieves performance comparable to Transformers across various settings when scaling up model size and number of training tokens. ,
10) - presents a method for creating anything in 3D by simulating the real-world capture process using a multi-view diffusion model; it can generate consistent novel views of a scene which can be used as input to 3D reconstruction techniques to produce 3D representation rendered in real-time; the scene from CAT3D can be generated in less than one minute and is reported to outperform existing methods on single image and few-view 3D scene creation tasks. ,

Top ML Papers of the Week (May 6 - May 12) - 2024

1) - releases a new state-of-the-art model for accurately predicting the structure and interactions of molecules; it can generate the 3D structures of proteins, DNA, RNA, and smaller molecules; the model uses an improved version of the Evoformer module and assembles its predictions using a diffusion network; the diffusion process starts with a cloud of atoms which converges to the final molecular structure. ,
2) - attempts to scale LSTMs to billions of parameters using the latest techniques from modern LLMs while mitigating common limitations of LSTMs; to give LSTMs the ability to revise storage decisions, they introduce exponential gating and a new memory mixing mechanism (termed sLSTM); to enhance the storage capacities of LSTMs, they add a matrix memory and a covariance update rule (termed mLSTM); both the sLSTM and mLSTM cells stabilize their exponential gates using the same technique; these extensions lead to xLSTM blocks that are residually stacked into the final xLSTM architecture; compared to Transformers, xLSTMs have linear computation and constant memory complexity with respect to sequence length; the xLSTM architecture is shown to be efficient at handling different aspects of long context problems; achieves better validation perplexities when compared to different model classes like Transformers, SSMs, and RNNs. ,
3) - a strong MoE model comprising 236B parameters, of which 21B are activated for each token; supports a context length of 128K tokens and uses Multi-head Latent Attention (MLA) for efficient inference by compressing the Key-Value (KV) cache into a latent vector; DeepSeek-V2 and its chat versions achieve top-tier performance among open-source models. ,
4) - enhances LLMs with Monte Carlo Tree Search (MCTS) to improve mathematical reasoning capabilities; the MCTS framework extends the LLM to achieve a more effective balance between exploration and exploitation; for this work, the idea is to generate high-quality math reasoning data without professional human annotations; the assumption is that a well pre-trained LLM already possesses mathematical knowledge to generate reasoning steps but needs better stimulation such as an advanced prompting or search strategy; unlike other methods such as Program-of-thought and Chain-of-thought, no solutions are required for the training data, just the math questions and the answers; the integration of LLMs, a value model, and the MCTS framework enables an effective and autonomous process of generating high-quality math reasoning data; the value model also aids the policy model in searching for effective solution paths. ,
5) - investigates using LLMs to automate and accelerate sim-to-real design; it requires the physics simulation for the target task and automatically constructs reward functions and domain randomization distributions to support real-world transfer; discovers sim-to-real configurations competitive with existing human-designed ones on quadruped locomotion and dexterous manipulation tasks. ,
6) - proposes efficient parallel decoders that reduce inference latency by decoding an n-token sequence per inference step; the inspiration for this work comes from humans' ability to form complete sentences before articulating word by word; this process can be mimicked and learned by fine-tuning pre-trained LLMs to perform parallel decoding; the model is trained to map randomly initialized n-token sequences to the same result yielded by autoregressive (AR) decoding in as few steps as possible; a consistency loss helps with multiple-token prediction and a standard AR loss prevents deviation from the target LLM and ensures generation quality; shows 2.4x to 3.4x improvements in generation speed while preserving generation quality; a sketch of this kind of parallel refinement loop follows this list. ,
7) - develops an approach to understanding the effects of numeric deviation and applies it to the widely-adopted Flash Attention optimization; finds that Flash Attention sees roughly an order of magnitude more numeric deviation as compared to Baseline Attention at BF16. ,
8) - presents an overview of generative methodologies in video generation, where world models facilitate the synthesis of highly realistic visual content; examines challenges and limitations of world models, and discusses their potential future directions. ,
9) - harvests 10 million naturally existing instruction data points from the pre-training web corpus to enhance LLM reasoning; the approach first recalls relevant documents, extracts instruction-response pairs, and then refines the extracted pairs using open-source LLMs; MAmmoTH2-7B's (Mistral) performance increases from 11% to 34% on MATH and from 36% to 67% on GSM8K. ,
10) - introduces Granite, a series of code models trained on code written in 116 programming languages; it consists of models ranging in size from 3 to 34 billion parameters, suitable for applications ranging from application modernization tasks to on-device memory-constrained use cases; demonstrates that the models reach state-of-the-art performance among available open-source code LLMs. ,
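To make the parallel decoding loop in item 6 above concrete, here is a rough Jacobi-style refinement sketch; the `model` object and its `.logits` attribute are hypothetical stand-ins for a causal LM, and the paper's contribution is the consistency fine-tuning that makes such loops converge in very few iterations.

```python
import torch

@torch.no_grad()
def jacobi_decode(model, prompt_ids, n_tokens=16, max_iters=32, pad_id=0):
    """Iteratively refine an n-token guess in parallel until it reaches the fixed
    point that greedy autoregressive decoding would produce (sketch only)."""
    guess = torch.full((1, n_tokens), pad_id, dtype=torch.long)
    for _ in range(max_iters):
        inputs = torch.cat([prompt_ids, guess], dim=1)
        logits = model(inputs).logits
        # position prompt_len-1+i predicts token i of the continuation
        preds = logits[:, prompt_ids.shape[1] - 1 : -1, :].argmax(dim=-1)
        if torch.equal(preds, guess):   # fixed point reached: same as AR decoding
            break
        guess = preds                   # refine all n tokens at once
    return guess
```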

Top ML Papers of the Week (April 29 - May 5) - 2024

1) - proposes Kolmogorov-Arnold Networks (KANs) as alternatives to Multi-Layer Perceptrons (MLPs); KANs apply learnable activation functions on edges that represent the weights; with no linear weights used, KANs can outperform MLPs and possess faster neural scaling laws; the authors show that KANs can be used as collaborators to help scientists discover mathematics and physical laws. ,
2) - proposes a multi-token prediction approach that performs language modeling by training the model to predict the following n tokens using n independent output heads; the output heads operate on top of a shared transformer trunk; multi-token prediction is shown to be useful when using larger model sizes and can speed up inference up to 3x; the proposed 13B parameter model solves 12% more problems on HumanEval and 17% more on MBPP than comparable next-token models; a sketch of the multi-head setup follows this list. ,
3) - presents a family of multimodal models specialized in medicine and based on the strong multimodal and long-context reasoning capabilities of Gemini; achieves state-of-the-art performance on 10/14 benchmarks, surpassing GPT-4 models; it achieves 91% accuracy on the MedQA (USMLE) benchmark using an uncertainty-guided search strategy. ,
4) - presents an approach to train LLMs to effectively utilize information retrieval; it first proposes a training approach to teach an LLM to generate a special token when it's not confident or doesn't know the answer to a question; the fine-tuned model outperforms a base LLM in two fixed alternate settings that include never retrieving and always retrieving context. ,
5) - covers the most important recent developments in RAG and RAU systems; it includes evolution, taxonomy, and an analysis of applications; there is also a section on how to enhance different components of these systems and how to properly evaluate them; it concludes with a section on limitations and future directions. ,
6) - open-sources Prometheus 2 (7B & 8x7B), state-of-the-art open evaluator LLMs that closely mirror human and GPT-4 judgments; they support both direct assessment and pairwise ranking formats grouped with user-defined evaluation criteria; according to the experimental results, this open-source model seems to be the strongest among all open evaluator LLMs; the key seems to be in merging evaluator LMs trained on either direct assessment or pairwise ranking formats. ,
7) - proposes a self-play-based method for aligning language models; this optimization procedure treats the problem as a constant-sum two-player game to identify the Nash equilibrium policy; it addresses the shortcomings of DPO and IPO and effectively increases the log-likelihood of chosen responses and decreases that of rejected ones; SPPO outperforms DPO and IPO on MT-Bench and the Open LLM Leaderboard. ,
8) - presents a technical introduction to current techniques used to interpret the inner workings of Transformer-based language models; it provides a detailed overview of the internal mechanisms implemented in these models. ,
9) - provides an overview of the recent advances in identifying, evaluating, and mitigating hallucination in multimodal LLMs; it also provides an overview of causes, evaluation benchmarks, metrics, and other strategies to deal with challenges related to detecting hallucinations. ,
10) - studies the in-context learning behavior of LLMs at extreme context lengths with long-context models; shows that performance increases as hundreds or thousands of demonstrations are used; demonstrates that long-context ICL is less sensitive to random input shuffling than short-context ICL; concludes that the effectiveness of long-context LLMs is not due to task learning but to attending to similar examples. ,
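A minimal sketch of the multi-token prediction setup from item 2 of the list above: n independent linear heads sit on a shared trunk, and head k is trained against the token k+1 positions ahead. The trunk itself is abstracted away and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiTokenHeads(nn.Module):
    """n independent output heads on a shared trunk, each predicting one of the
    next n tokens (sketch; sizes are illustrative)."""
    def __init__(self, d_model=512, vocab_size=32000, n_future=4):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab_size) for _ in range(n_future))

    def forward(self, trunk_hidden):               # [batch, seq, d_model]
        # head k produces logits for the token k+1 steps ahead of each position
        return [head(trunk_hidden) for head in self.heads]

def multi_token_loss(logits_per_head, input_ids):
    """Cross-entropy averaged over heads; head k is trained against tokens
    shifted by k+1 positions."""
    loss = 0.0
    for k, logits in enumerate(logits_per_head):
        shift = k + 1
        pred = logits[:, :-shift, :]               # positions that still have a target
        tgt = input_ids[:, shift:]
        loss = loss + nn.functional.cross_entropy(
            pred.reshape(-1, pred.size(-1)), tgt.reshape(-1))
    return loss / len(logits_per_head)

heads = MultiTokenHeads()
hidden = torch.randn(2, 16, 512)                   # stand-in for trunk activations
ids = torch.randint(0, 32000, (2, 16))
loss = multi_token_loss(heads(hidden), ids)
```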

Top ML Papers of the Week (April 22 - April 28) - 2024

1) - a new 3.8B parameter language model called phi-3-mini that is trained on 3.3 trillion tokens and is reported to rival Mixtral 8x7B and GPT-3.5; has a default context length of 4K but also includes a version that is extended to 128K (phi-mini-128K); combines heavily filtered web data and synthetic data to train the 3.8B model; it also reports results on 7B and 14B models trained on 4.8T tokens (phi-3-small and phi-3-medium) ,
2) - a new open language model that employs a layer-wise scaling strategy to efficiently allocate parameters, leading to better efficiency and accuracy; comes in different sizes such as 270M, 450M, 1.1B, and 3B; achieves a 2.36% improvement in accuracy compared to OLMo while requiring 2× fewer pre-training tokens; a sketch of layer-wise scaling follows this list. ,
3) - an open-source LLM (Apache 2.0 license) that uses a unique Dense-MoE hybrid transformer architecture; performs on par with Llama 3 70B in enterprise metrics like coding (HumanEval+ & MBPP+), SQL (Spider), and instruction following (IFEval); claims to use a 17x smaller compute budget than Llama 3 70B; the training compute is roughly under $2 million (less than 3K GPU weeks). ,
4) - presents an approach to overcome the lost-in-the-middle challenge common in LLMs. It applies an explicit "information-intensive" training procedure on Mistral-7B to enable the LLM to fully utilize the context. It leverages a synthetic dataset where the answer requires 1) fine-grained information awareness on a short segment (∼128 tokens) within a synthesized long context (4K−32K tokens), and 2) the integration and reasoning of information from two or more short segments. The resulting model, FILM-7B (Fill-in-the-Middle), shows that it can robustly retrieve information from different positions in its 32K context window. ,
5) - a large-scale web dataset containing 15 trillion tokens for training language models; filters and deduplicates CommonCrawl data from 2013 to 2024 with the goal of improving data quality. ,
6) - achieves precision editing of the human genome with a programmable gene editor designed with an AI system powered by an LLM trained on biological diversity at scale. ,
7) - combines LLMs with crawlers with the goal of helping crawlers handle diverse and changing web environments more efficiently; the web crawler agent leverages the hierarchical structure of HTML for progressive understanding; employs top-down and step-back operations, and leverages the DOM tree structure, to generate a complete and executable crawler. ,
8) - provides a comprehensive overview of the latest advancements for Graph ML in the era of LLMs; covers the recent developments in Graph ML, how LLM can enhance graph features, and how it can address issues such as OOD and graph heterogeneity. ,
9) - provides a comprehensive survey on self-evolution approaches in LLMs. ,
10) - trains an LLM to inspect the execution traces of programs and reason about run-time behavior via synthetic chain-of-thought rationales; improves the fix rate of a PaLM 2 model on MBPP and HumanEval by 26.1% and 14.3%; the model also shows that it can generalize to unknown scenarios. ,
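Item 2 above allocates parameters non-uniformly across depth; the sketch below shows one way such layer-wise scaling can be expressed as a per-layer config, with purely illustrative ranges for the number of attention heads and the FFN multiplier (the actual model's ranges and interpolation details may differ).

```python
def layer_wise_scaling(n_layers=28, n_heads_min=12, n_heads_max=20,
                       ffn_mult_min=0.5, ffn_mult_max=4.0):
    """Allocate parameters non-uniformly across depth: earlier layers get fewer
    attention heads and a smaller FFN multiplier, later layers get more, with
    linear interpolation in between (sketch with illustrative ranges)."""
    configs = []
    for i in range(n_layers):
        t = i / (n_layers - 1)
        n_heads = int(round(n_heads_min + t * (n_heads_max - n_heads_min)))
        ffn_mult = ffn_mult_min + t * (ffn_mult_max - ffn_mult_min)
        configs.append({"layer": i, "n_heads": n_heads, "ffn_multiplier": round(ffn_mult, 2)})
    return configs

for cfg in layer_wise_scaling()[:3]:   # inspect the first few layer configs
    print(cfg)
```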

Top ML Papers of the Week (April 15 - April 21) - 2024

1) - a family of LLMs that includes 8B and 70B pretrained and instruction-tuned models; Llama 3 8B outperforms Gemma 7B and Mistral 7B Instruct; Llama 3 70B broadly outperforms Gemini Pro 1.5 and Claude 3 Sonnet. ,
2) - a new open-source sparse mixture-of-experts model that reports that compared to the other community models, it delivers the best performance/cost ratio on MMLU; shows strong performance on reasoning, knowledge retrieval, maths, and coding. ,
3) - attempts to replicate the third estimation procedure of the compute-optimal scaling law proposed in Hoffmann et al. (2022) (i.e., Chinchilla scaling); finds that “the reported estimates are inconsistent with their first two estimation methods, fail at fitting the extracted data, and report implausibly narrow confidence intervals.” A sketch of fitting the parametric loss form appears after this list. ,
4) - aims to quantify the tug-of-war between RAG and LLMs' internal prior; it focuses on GPT-4 and other LLMs on question answering for the analysis; finds that providing correct retrieved information fixes most of the model mistakes (94% accuracy); when the documents contain more incorrect values and the LLM's internal prior is weak, the LLM is more likely to recite incorrect information; the LLMs are found to be more resistant when they have a stronger prior. ,
5) - presents a comprehensive overview of the RAG domain, its evolution, and challenges; it includes a detailed discussion of four important aspects of RAG systems: pre-retrieval, retrieval, post-retrieval, and generation. ,
6) - investigates the expressive power of state space models (SSMs) and reveals that they are limited, similarly to transformers, in that SSMs cannot express computation outside the complexity class 𝖳𝖢^0; finds that SSMs cannot solve state-tracking problems like permutation composition and other tasks such as evaluating code or tracking entities in a long narrative. ,
7) - discusses how to deploy an efficient RAG system for structured output tasks; the RAG system combines a small language model with a very small retriever; it shows that RAG can enable deploying powerful LLM-powered systems in limited-resource settings while mitigating issues like hallucination and increasing the reliability of outputs. ,
8) - presents a concise summary of emerging AI agent architectures; it focuses the discussion on capabilities like reasoning, planning, and tool calling which are all needed to build complex AI-powered agentic workflows and systems; the report includes current capabilities, limitations, insights, and ideas for future development of AI agent design. ,
9) - analyzes the in-context recall performance of different LLMs using several needle-in-a-haystack tests; shows various LLMs recall facts at different lengths and depths; finds that a model's recall performance is significantly affected by small changes in the prompt; the interplay between prompt content and training data can degrade the response quality; the recall ability of a model can be improved with increasing size, enhancing the attention mechanism, trying different training strategies, and applying fine-tuning. ,
10) - a survey paper on state space models (SSMs) with experimental comparison and analysis; it reviews current SSMs, improvements compared to alternatives, challenges, and their applications. ,
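To make the subject of item 3 above concrete, here is a toy sketch of fitting the parametric scaling law L(N, D) = E + A/N^alpha + B/D^beta used in the Chinchilla approach-3 estimate. The data points below are synthetic, generated from approximately the constants reported by Hoffmann et al., so this only illustrates the fitting procedure, not the replication itself.

```python
import numpy as np
from scipy.optimize import curve_fit

def chinchilla_loss(ND, E, A, B, alpha, beta):
    # Parametric scaling law: L(N, D) = E + A / N^alpha + B / D^beta
    N, D = ND
    return E + A / N**alpha + B / D**beta

# Synthetic observations; real replications instead fit points extracted from
# the original paper's figures.
N = np.array([4e8, 1e9, 2.8e9, 7e9, 1.7e10, 7e10])   # parameter counts
D = 20 * N                                            # tokens (roughly compute-optimal)
L = chinchilla_loss((N, D), 1.69, 406.4, 410.7, 0.34, 0.28)

params, _ = curve_fit(chinchilla_loss, (N, D), L,
                      p0=[1.5, 300.0, 300.0, 0.3, 0.3], maxfev=50000)
print(dict(zip(["E", "A", "B", "alpha", "beta"], np.round(params, 3))))
```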

Top ML Papers of the Week (April 8 - April 14) - 2024

1) - integrates compressive memory into a vanilla dot-product attention layer; the goal is to enable Transformer LLMs to effectively process infinitely long inputs with a bounded memory footprint and computation; proposes a new attention technique called Infini-attention which incorporates a compressive memory module into a vanilla attention mechanism; it builds both masked local attention and long-term linear attention into a single Transformer block; this allows the Infini-Transformer model to efficiently handle both long- and short-range contextual dependencies; outperforms baseline models on long-context language modeling with a 114x compression ratio of memory; a sketch of the compressive-memory update appears after this list. ,
2) - proposes an open-vocabulary benchmark dataset to measure the capabilities of AI models to perform embodied question answering (EQA); it contains 1600 human-generated questions composed from 180 real-world environments; also provides an LLM-powered evaluation protocol for the task and shows that models like GPT-4V are significantly behind human-level performance. ,
3) - a family of open code LLMs based on Gemma; CodeGemma 7B models excel in mathematical reasoning and match the code capabilities of other open models; the instruction-tuned CodeGemma 7B model is the more powerful model for Python coding as assessed via the HumanEval benchmark; results also suggest that the model performs best on GSM8K among 7B models; the CodeGemma 2B model achieves SoTA code completion and is designed for fast code infilling and deployment in latency-sensitive settings. ,
4) - applies knowledge distillation to a small LM using rationales generated by the large LM, with the hope of narrowing the gap in reasoning capabilities; at inference, the rationale is generated by the lightweight LM and the answer prediction is then left for the frozen large LM; this resource-efficient approach avoids the need to fine-tune the large model and instead offloads the rationale generation to the small language model; the knowledge-distilled LM is further optimized with reinforcement learning using several rationale-oriented and task-oriented reward signals; the LM-guided CoT prompting approach proposed in this paper outperforms both standard prompting and CoT prompting. Self-consistency decoding also enhances performance. ,
5) - an overview by Google DeepMind on synthetic data research, covering applications, challenges, and future directions; discusses important topics when working with synthetic data such as ensuring quality, factuality, fidelity, unbiasedness, trustworthiness, privacy, and more. ,
6) - presents an approach for general reasoning and search on tasks that can be decomposed into components; the proposed graph-based framework, THOUGHTSCULPT, incorporates iterative self-revision capabilities and allows an LLM to build an interwoven network of thoughts; unlike other approaches such as Tree-of-Thoughts that shape the reasoning process using a tree, this new approach incorporates Monte Carlo Tree Search (MCTS) to efficiently navigate the search space; due to its ability for continuous thought iteration, THOUGHTSCULPT is particularly suitable for tasks such as open-ended generation, multi-step reasoning, and creative ideation. ,
7) - a survey on multilingual LLMs including a thorough review of methods, a taxonomy, emerging frontiers, challenges, and resources to advance research ,
8) - investigates knowledge capacity scaling laws where it evaluates a model’s capability via loss or benchmarks, to estimate the number of knowledge bits a model stores; reports that "Language models can and only can store 2 bits of knowledge per parameter, even when quantized to int8, and such knowledge can be flexibly extracted for downstream applications. Consequently, a 7B model can store 14B bits of knowledge, surpassing the English Wikipedia and textbooks combined based on our estimation." ,
9) - proposes techniques to align LLMs to quote verbatim from memorized information in the pre-training data; the alignment approach is not only able to generate high-quality quoted verbatim statements but also preserves overall response quality; it leverages a synthetic preference dataset for quoting without any human annotation and aligns the target model to quote using preference optimization. ,
10) - aims to quantify the degree of influence between 23 fields of study and NLP; the cross-field engagement of NLP has declined from 0.58 in 1980 to 0.31 in 2022; the study also finds that NLP citations are dominated by CS which accounts for over 80% of citations with emphasis on AI, ML, and information retrieval; overall, NLP is growing more insular -- higher growth of intra-field citation and a decline in multidisciplinary works. ,
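The sketch below illustrates the compressive-memory side of item 1 from the list above: a linear-attention memory is read with the current segment's queries, mixed with the ordinary local attention output through a learned gate, and then updated with the segment's keys and values. The shapes, the ELU+1 feature map, and the update order follow the general recipe but should be read as assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def elu1(x):                                  # feature map used for linear attention
    return F.elu(x) + 1.0

def compressive_memory_step(q, k, v, local_out, M, z, beta):
    """One segment of a compressive-memory attention sketch. Shapes:
    q, k, v, local_out: [seq, d_head]; M: [d_head, d_head]; z: [d_head];
    beta: scalar gate logit."""
    sq, sk = elu1(q), elu1(k)
    mem_out = (sq @ M) / (sq @ z).clamp(min=1e-6).unsqueeze(-1)  # memory retrieval
    M = M + sk.transpose(0, 1) @ v                               # memory update
    z = z + sk.sum(dim=0)                                        # normalizer update
    gate = torch.sigmoid(beta)
    out = gate * mem_out + (1.0 - gate) * local_out              # mix long & local
    return out, M, z

d = 64
q, k, v = torch.randn(128, d), torch.randn(128, d), torch.randn(128, d)
local = torch.randn(128, d)                    # stand-in for masked local attention
out, M, z = compressive_memory_step(q, k, v, local,
                                    torch.zeros(d, d), torch.zeros(d), torch.tensor(0.0))
```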

Top ML Papers of the Week (April 1 - April 7) - 2024

1) - proposes a jailbreaking technique called many-shot jailbreaking to evade the safety guardrails of LLMs; this jailbreaking technique exploits the longer context window supported by many modern LLMs; it includes a very large number of faux dialogues (~256) preceding the final question which effectively steers the model to produce harmful responses. ,
2) - a new open-source agentic system that can automatically solve GitHub issues with similar accuracy as Devin on the SWE-bench; the agent interacts with a specialized terminal and enables important processing of files and executable tests to achieve good performance; on SWE-bench, SWE-agent resolves 12.29% of issues, achieving the state-of-the-art performance on the full test set. ,
3) - demonstrates that transformer models can learn to efficiently and dynamically allocate FLOPs to specific positions in a sequence; this helps to optimize the allocation along the sequence for different layers across model depth; findings suggest that for a given FLOP budget models can be trained to perform faster and better than their baseline counterparts. ,
4) - evaluates 13 long-context LLMs on long in-context learning and finds that they perform relatively well up to a context length of around 20K tokens; beyond 20K, however, performance dips dramatically for most LLMs, with GPT-4 as the exception. ,
5) - inspired by a human cognitive capacity to imagine unseen worlds, this new work proposes Visualization-of-Thought (VoT) prompting to elicit spatial reasoning in LLMs; VoT enables LLMs to "visualize" their reasoning traces, creating internal mental images, that help to guide subsequent reasoning steps; when tested on multi-hop spatial reasoning tasks like visual tiling and visual navigation, VoT outperforms existing multimodal LLMs. ,
6) - finds that a simple layer-pruning strategy applied to popular open-weight pretrained LLMs shows minimal performance degradation until a large fraction (up to half) of the layers is removed; optimal blocks to prune are identified using a layer-similarity mechanism, followed by a small amount of fine-tuning to heal the damage; a sketch of the similarity-based block selection follows this list. ,
7) - an 8B model trained with less than $0.1 million in cost that outperforms LLaMA2-7B; shows that LLM training can be much cheaper than generally thought; JetMoE-8B has 24 blocks where each block has two MoE layers: Mixture of Attention Heads (MoA) and Mixture of MLP Experts (MoE); each MoA and MoE layer has 8 experts, and 2 experts are activated for each input token, giving 2.2B active parameters. ,
8) - proposes a method for representation fine-tuning (ReFT) that operates on a frozen base model and learns task-specific interventions on hidden representations; in other words, by manipulating a small fraction of model representations it is possible to effectively steer model behavior to achieve better downstream performance at inference time; also proposes LoReFT as a drop-in replacement for PEFTs that is 10-50x more parameter efficient. ,
9) - proposes a suite of LLMs (Eurus) optimized for reasoning and achieving SoTA among open-source models on tasks such as mathematics and code generation; Eurus-70B outperforms GPT-3.5 Turbo in reasoning, largely due to a newly curated, high-quality alignment dataset designed for complex reasoning tasks; the data includes instructions with preference trees consisting of reasoning chains, multi-turn interactions, and pairwise data for preference learning. ,
10) - explores training LLMs with neural text compressors; the proposed compression technique segments text into blocks that each compress to the same bit length; the approach improves at scale and outperforms byte-level baselines on both perplexity and inference speed benchmarks; latency is reduced thanks to the shorter sequence length. ,
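A small sketch of the layer-similarity idea behind item 6 above: given hidden states captured at every layer on a calibration set, choose the contiguous block of layers whose input and output representations are closest in angular distance, since removing that block should change the computation the least. The function names and toy data are illustrative.

```python
import numpy as np

def best_block_to_prune(hidden_states, n_prune):
    """hidden_states[l] has shape [tokens, d] for each layer l (captured on a small
    calibration set); return the start/end of the most redundant n_prune-layer block."""
    def angular_distance(a, b):
        cos = np.sum(a * b, axis=-1) / (
            np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + 1e-8)
        return np.mean(np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi)

    n_layers = len(hidden_states) - 1          # states include the embedding output
    scores = [angular_distance(hidden_states[l], hidden_states[l + n_prune])
              for l in range(n_layers - n_prune + 1)]
    start = int(np.argmin(scores))             # smallest distance => most redundant block
    return start, start + n_prune              # prune layers [start, start + n_prune)

# Toy usage with random activations standing in for real calibration data
states = [np.random.randn(256, 512) for _ in range(33)]   # 32 layers + embeddings
print(best_block_to_prune(states, n_prune=8))
```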

Top ML Papers of the Week (March 26 - March 31) - 2024

1) - a new 132B parameter open LLM that outperforms all the established open-source models on common benchmarks like MMLU and GSM8K; DBRX was pretrained on 12T tokens (text and code) and uses a mixture-of-experts (MoE) architecture; its inference is up to 2x faster than LLaMA2-70B and it is about 40% of the size of Grok-1 in terms of both total and active parameter counts; there is also DBRX Instruct which demonstrates good performance in programming and mathematics; while DBRX is trained as a general-purpose LLM, it still surpasses CodeLLaMA-70B Instruct, a model built explicitly for code generation. ,
2) - xAI’s latest long-context LLM for advanced understanding and reasoning and problem-solving capabilities; Grok-1.5 achieved a 50.6% score on the MATH benchmark and a 90% score on the GSM8K benchmark; this model can process long contexts of up to 128K tokens and demonstrates powerful retrieval capabilities. ,
3) - a generative AI model based on diffusion models that shows powerful capabilities to quantify uncertainty in weather forecasting; it can generate a large ensemble conditioned on as few as one or two forecasts from an operational numerical weather prediction system. ,
4) - finds that the latest LLMs have not surpassed human proficiency in physics coding assignments; also finds that GPT-4 significantly outperforms GPT-3.5 and prompt engineering can further enhance performance. ,
5) - a simple framework to enhance multi-modality vision models; specifically, visual tokens are enhanced through an additional visual encoder for high-resolution refinement without increasing the token count; achieves top performance in several zero-shot benchmarks and even surpasses well-developed private models. ,
6) - investigates long-form factuality in the open domain by generating a prompt set of questions spanning 38 topics; also proposes an LLM-based agent to perform evaluation for the task; finds that LLM agents can achieve superhuman rating performance and are reported to be 20 times cheaper than human annotators. ,
7) - a unified framework for training open-source LLM-based agents; it consists of a modular architecture with a planning module that can learn subgoal generation and a module trained to translate them to action with tool usage. ,
8) - an LLM agent operating system that integrates LLMs into operating systems as a brain; it can optimize resource allocation, handle context switching, enable concurrent execution of agents, provide tool services, and even maintain access control for agents. ,
9) - a dataset with an instruction evaluation benchmark and a separate training set for teaching information retrieval models to follow real-world instructions; a FollowIR-7B model shows significant improvements (over 13%) after fine-tuning on the training set. ,
10) - an iterative data augmentation strategy that leverages a teacher LLM to enhance a small seed dataset by augmenting additional data that can be used to effectively fine-tune models; it significantly enhances the performance of LLMs in the low-data regime, outperforming both traditional fine-tuning and other data augmentation baselines. ,

Top ML Papers of the Week (March 18 - March 25) - 2024

1) - a mixture-of-experts model with 314B parameters which includes the open release of the base model weights and network architecture; the MoE model activates 25% of the weights for a given token and its pretraining cutoff date is October 2023. ,
2) - an approach for automating foundation model development using evolution to combine open-source models; facilitates cross-domain merging where a Japanese Math LLM achieved state-of-the-art performance on Japanese LLM benchmarks, even surpassing models with significantly more parameters, despite not being explicitly trained for these tasks. ,
3) - an AI-powered assistant for football tactics developed and evaluated in collaboration with domain experts from Liverpool FC; the system offers coaches a way to sample and explore alternative player setups for a corner kick routine and select the tactic with the highest predicted likelihood of success; TacticAI’s model suggestions are favored over existing tactics 90% of the time and it offers an effective corner kick retrieval system. ,
4) - provides an overview of tool use in LLMs, including a formal definition of the tool-use paradigm, scenarios where LLMs leverage tool usage, and the tasks for which this approach works well; it also provides an analysis of complex tool usage and summarizes testbeds and evaluation metrics across LM tooling works. ,
5) - proposes RankPrompt, a prompting method to enable LLMs to self-rank their responses without additional resources; this self-ranking approach ranks candidates through a systematic, step-by-step comparative evaluation; it seems to work well as it leverages the capabilities of LLMs to generate chains of comparisons as demonstrations; RankPrompt significantly enhances the reasoning performance of ChatGPT and GPT-4 on many arithmetic and commonsense reasoning tasks. ,
6) - a family of open-access decompilation LLMs ranging from 1B to 33B parameters; these models are trained on 4 billion tokens of C source code and corresponding assembly code; the authors also introduce Decompile-Eval, a dataset for assessing the re-compilability and re-executability of decompiled code and evaluating from the perspective of program semantics; LLM4Decompile demonstrates the capability to decompile 21% of the assembly code, achieving a 50% improvement over GPT-4. ,
7) - designs data and methods to effectively fine-tune language models for agents, referred to as Agent-FLAN; this enables Llama2-7B to outperform prior best works by 3.5% across various agent evaluation datasets; Agent-FLAN greatly alleviates the hallucination issues and consistently improves the agent capability of LLMs when scaling model sizes while generally improving the LLM. ,
8) - shows that it’s possible to learn a large amount of non-public information about an API-protected LLM using the logits; with a relatively small number of API queries, the approach estimates the embedding size of OpenAI's gpt-3.5-turbo to be about 4,096; the paper also proposes guardrails against the attacks used; a sketch of the rank-based embedding-size estimate follows this list. ,
9) - an open-source, large-scale robot manipulation dataset to train and build more capable and robust robotic manipulation policies; it contains 76K demonstration trajectories, collected across 564 scenes and 86 tasks; training with DROID leads to higher performing policies and generalization. ,
10) - combines the benefits of RAG and fine-tuning to improve a model's ability to answer questions in "open-book" in-domain settings; its CoT-style responses help to further improve reasoning. ,
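Item 8 above exploits the fact that a model's logits live in a subspace whose dimension equals the hidden size. The sketch below shows the rank-based estimate on synthetic data; obtaining full logit vectors from a real API requires additional tricks described in the paper, and all names and sizes here are illustrative.

```python
import numpy as np

def estimate_hidden_size(logit_matrix, tol=1e-3):
    """Estimate the embedding (hidden) size from a matrix of logit vectors collected
    over many prompts: because logits = hidden @ W_out, the matrix has numerical rank
    roughly equal to the hidden dimension."""
    s = np.linalg.svd(logit_matrix, compute_uv=False)
    s = s / s[0]                               # normalize singular values
    return int(np.sum(s > tol))                # count the ones above the noise floor

# Toy demo with a fake "model": 64-dim hidden states and a 1000-word vocabulary.
rng = np.random.default_rng(0)
hidden, vocab, n_prompts = 64, 1000, 300
H = rng.normal(size=(n_prompts, hidden))       # hidden states for each prompt
W_out = rng.normal(size=(hidden, vocab))       # unembedding matrix
logits = H @ W_out + rng.normal(scale=1e-6, size=(n_prompts, vocab))
print(estimate_hidden_size(logits))            # prints 64
```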

Top ML Papers of the Week (March 11 - March 17) - 2024

1) - a generalist AI agent for 3D virtual environments that follows natural-language instructions in a broad range of 3D virtual environments and video games; SIMA is evaluated across 600 basic skills, spanning navigation, object interaction, and menu use. Language seems to be a huge factor in performance. ,
2) - shows that iteratively revising a chain of thoughts with information retrieval can significantly improve LLM reasoning and generation in long-horizon generation tasks; the key idea is that each thought step is revised with relevant retrieved information to the task query, the current and past thought steps; Retrieval Augmented Thoughts (RAT) can be applied to different models like GPT-4 and CodeLlama-7B to improve long-horizon generation tasks (e.g., creative writing and embodied task planning); RAT is a zero-shot prompting approach and provides significant improvements to baselines that include zero-shot CoT prompting, vanilla RAG, and other baselines. ,
3) - presents a generalization of STaR, called Quiet-STaR, to enable language models (LMs) to learn to reason in more general and scalable ways; Quiet-STaR enables LMs to generate rationales at each token to explain future text; it proposes a token-wise parallel sampling algorithm that helps improve LM predictions by efficiently generating internal thoughts; the rationale generation is improved using REINFORCE. ,
4) - an overview of the common issue of knowledge conflict when working with LLMs; the survey paper categorizes these conflicts into context-memory, inter-context, and intra-memory conflict; it also provides insights into causes and potential ways to mitigate these knowledge conflict issues. ,
5) - presents the first model-stealing attack that extracts information from production language models like ChatGPT or PaLM-2; shows that it's possible to recover the embedding projection layer of a transformer-based model through typical API access; as an example, the entire projection matrix was extracted from the OpenAI ada and babbage models for under $20. ,
6) - proposes mixing expert LLMs into a Mixture-of-Experts LLM as a more compute-efficient approach for training LLMs; it's shown to be more efficient than training a larger generalist LLM or several separate specialized LLMs; the approach, BTX, first trains (in parallel) multiple copies of a seed LLM specialized in different domains (i.e., expert LLMs) and merges them into a single LLM using MoE feed-forward layers, followed by fine-tuning of the overall unified model. ,
7) - proposes a benchmark, BrainBench, for evaluating the ability of LLMs to predict neuroscience results; finds that LLMs surpass experts in predicting experimental outcomes; an LLM tuned on neuroscience literature was shown to perform even better. ,
8) - a 35B parameter model, with a context length of 128K, optimized for use cases that include reasoning, summarization, and question answering; Command-R has the capability for multilingual generation evaluated in 10 languages and performant tool use and RAG capabilities; it has been released for research purposes. ,
9) - studies embeddings derived from regularized linear models and derives analytically how cosine similarity can yield arbitrary and meaningless similarities; also finds that for some linear models the similarities are not even unique, while for others they are controlled by the regularization; the authors caution against blindly using cosine similarity and present considerations and alternatives; a small numeric illustration of the rescaling argument follows this list. ,
10) - provides a comprehensive overview of methods, analysis, and insights into multimodal LLM pre-training; studies different architecture components and finds that carefully mixing image-caption, interleaved image-text, and text-only data is key for state-of-the-art performance; it also proposes a family of multimodal models up to 30B parameters that achieve SOTA in pre-training metrics and exhibit properties such as enhanced in-context learning, multi-image reasoning, and few-shot chain-of-thought prompting. ,
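A tiny numeric illustration of the argument in item 9 above: for regularized matrix-factorization-style models, the learned factors are only identified up to a rescaling that leaves predictions unchanged, yet the rescaling changes cosine similarities. The matrices below are random stand-ins, not embeddings from a real model.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))                 # item embeddings from some linear model
D = np.diag([0.1, 1.0, 10.0])               # rescaling absorbed by the other factor

def cosine(M):
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return (M / norms) @ (M / norms).T

# A and A @ D can correspond to the same regularized solution (the inverse scaling
# is absorbed by the other factor, leaving predictions unchanged), yet their cosine
# similarities differ, which is why cosine similarity can be arbitrary here.
print(np.round(cosine(A)[0], 2))
print(np.round(cosine(A @ D)[0], 2))
```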

Top ML Papers of the Week (March 4 - March 10) - 2024

1) - consists of a family of three models (Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus); Claude 3 Opus (the strongest model) seems to outperform GPT-4 on common benchmarks like MMLU and HumanEval; Claude 3 capabilities include analysis, forecasting, content creation, code generation, and conversing in non-English languages like Spanish, Japanese, and French; a 200K context window is supported but can be extended to 1M tokens for select customers; the models also have strong vision capabilities for processing formats like photos, charts, and graphs; Anthropic claims these models have a more nuanced understanding of requests and make fewer refusals. ,
2) - proposes functional benchmarks for the evaluation of the reasoning capabilities of LLMs; finds a reasoning gap with current models ranging from 58.35% to 80.31%; however, the authors also report that those gaps can be reduced with more sophisticated prompting strategies. ,
3) - proposes a memory-efficient approach for training LLMs through low-rank gradient projection; the training strategy allows full-parameter learning and is more memory-efficient than common low-rank adaptation methods such as LoRA; reduces memory usage by up to 65.5% in optimizer states while maintaining both efficiency and performance for pre-training on LLaMA 1B and 7B architectures; a sketch of the projection idea follows this list. ,
4) - a new position paper discusses the topic of reasoning and planning for LLMs; here is a summary of the author's conclusion: "To summarize, nothing that I have read, verified, or done gives me any compelling reason to believe that LLMs do reasoning/planning, as normally understood. What they do instead, armed with web-scale training, is a form of universal approximate retrieval, which, as I have argued, can sometimes be mistaken for reasoning capabilities". ,
5) - provides an overview of RAG used in different generation scenarios like code, image, and audio, including a taxonomy of RAG enhancements with reference to key papers. ,
6) - proposes an approach to enhance the planning capabilities of LLMs through explicit action knowledge; uses an action knowledge base and a knowledgeable self-learning phase to guide the model's action generation, mitigate planning hallucination, and enable continuous improvement; outperforms existing baselines and shows the potential of integrating external action knowledge to streamline planning with LLMs and solve complex planning challenges. ,
7) - a comprehensive review of Sora and some of the key developments powering this model, including limitations and opportunities of large vision models. ,
8) - introduces SaulLM-7B, a large language model for the legal domain explicitly designed for legal text comprehension and generation; presents an instructional fine-tuning method that leverages legal datasets to further enhance performance in legal tasks. ,
9) - investigates the use of multimodal LLMs for converting a visual design into code implementation, which is key for automating front-end engineering; introduces a benchmark of 484 diverse real-world webpages and a set of evaluation metrics to measure the design-to-code capability; further develops a suite of multimodal prompting methods and shows their effectiveness on GPT-4V and Gemini Pro Vision; an open-source fine-tuned Design2Code model matches the performance of Gemini Pro Vision, however, GPT-4V performs the best on the task. ,
10) - a transformer-based 3D reconstruction model for fast feed-forward 3D generation; it can produce 3D mesh from a single image in under 0.5 seconds; improvement includes better data processing, model design, and training. ,
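A rough sketch of the low-rank gradient projection idea in item 3 above: the gradient is projected onto its top-r singular subspace, Adam-style moments are kept in that compressed space, and the update is projected back to the full weight shape. Real implementations refresh the projection only every few hundred steps and handle tall/wide matrices separately; everything here, including the hyperparameters, is illustrative.

```python
import torch

def lowrank_projected_step(weight, grad, state, rank=4, lr=1e-3,
                           beta1=0.9, beta2=0.999, eps=1e-8):
    """One optimizer step that keeps Adam moments in a low-rank projection of the
    gradient (sketch; bias correction and periodic projection refresh omitted)."""
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]                              # projection onto the top-r subspace
    g_low = P.T @ grad                           # [rank, n] compressed gradient
    if "m" not in state:
        state["m"] = torch.zeros_like(g_low)
        state["v"] = torch.zeros_like(g_low)
    state["m"] = beta1 * state["m"] + (1 - beta1) * g_low
    state["v"] = beta2 * state["v"] + (1 - beta2) * g_low**2
    update_low = state["m"] / (state["v"].sqrt() + eps)
    weight -= lr * (P @ update_low)              # project the update back to full size
    return weight, state

W, g = torch.randn(64, 64), torch.randn(64, 64)
W, opt_state = lowrank_projected_step(W, g, state={})
```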

Top ML Papers of the Week (February 26 - March 3) - 2024

1) - a foundation model trained on internet videos with the ability to generate a variety of action-controllable 2D worlds given an image prompt; Genie has 11B parameters and consists of a spatiotemporal video tokenizer, an autoregressive dynamics model, and a scalable latent action model; the latent action space enables training agents to imitate behaviors from unseen videos, which is promising for building more generalist agents. ,
2) - a new LLM with strong multilingual, reasoning, maths, and code generation capabilities; features include: 1) 32K tokens context window, 2) native multilingual capacities, 3) strong abilities in reasoning, knowledge, maths, and coding benchmarks, and 4) function calling and JSON format natively supported. ,
3) - introduces a high-performing and cost-effective 1-bit LLM variant called BitNet b1.58 where every parameter is ternary, taking values in {-1, 0, 1}; given the same model size and training tokens, BitNet b1.58 can match the perplexity and task performance of a full-precision Transformer LLM (i.e., FP16); the benefits of this 1-bit LLM are significantly better latency, memory, throughput, and energy consumption; a sketch of the ternary quantization function follows this list. ,
4) - a comprehensive overview (180+ pages) and analysis of LLM datasets. ,
5) - explores open-action learning for language agents through an iterative learning strategy that creates and improves actions using Python functions; on each iteration, the proposed framework (LearnAct) expands the action space and enhances action effectiveness by revising and updating available actions based on execution feedback; the LearnAct framework was tested on robotic planning and AlfWorld environments; it improves agent performance by 32% in AlfWorld compared to ReAct+Reflexion. ,
6) - a new framework for generating expressive video by utilizing a direct audio-to-video synthesis approach; by leveraging an Audio2Video diffusion model it bypasses the need for intermediate 3D models or facial landmarks; EMO can produce convincing speaking videos and singing videos in various styles while outperforming existing methods in terms of expressiveness and realism. ,
7) - a position paper with a focus on open foundation models and their impact, benefits, and risks; proposes a risk assessment framework for analyzing risk and explains why the marginal risk of open foundation models is low in some cases; it also offers a more grounded assessment of the societal impact of open foundation models. ,
8) - a family of open LLMs for code with three different sizes (3B, 7B, and 15B); the 15B model was trained on 14 trillion tokens and 600+ programming languages with a context window of 16K tokens and employs a fill-in-the-middle objective; it matches 33B+ models on many evaluations like code completion, code reasoning, and math reasoning aided through PAL. ,
9) - an overview of LLMs for tabular data tasks including key techniques, metrics, datasets, models, and optimization approaches; it covers limitations and unexplored ideas with insights for future research directions. ,
10) - shows how to leverage LLMs and combine multiple approaches like retrieval augmentation, fine-tuning, tool usage, and more; the proposed framework is applied to urban and spatial planning but there are a lot of insights and practical tips that apply to other domains. ,
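Item 3 above constrains every weight to {-1, 0, 1}; a common way to express this is absmean quantization, sketched below. During training this would sit inside a straight-through estimator, and the exact scaling details are assumptions rather than a faithful reproduction of the paper's kernels.

```python
import torch

def absmean_ternary_quant(W, eps=1e-6):
    """Quantize a weight matrix to ternary values {-1, 0, 1}: scale by the mean
    absolute value, round, and clip (sketch of the absmean scheme)."""
    gamma = W.abs().mean()
    W_ternary = (W / (gamma + eps)).round().clamp_(-1, 1)
    return W_ternary, gamma        # gamma is kept to rescale outputs at inference

W = torch.randn(4, 6) * 0.05
W_q, scale = absmean_ternary_quant(W)
print(W_q)
```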

Top ML Papers of the Week (February 19 - February 25) - 2024

1) - a suite of image generation models ranging from 800M to 8B parameters; combines a diffusion transformer architecture and flow matching for improved performance in multi-subject prompts, image quality, and spelling abilities; a technical report is to be published soon. ,
2) - a series of open models inspired by the same research and tech used for Gemini; includes 2B (trained on 2T tokens) and 7B (trained on 6T tokens) models including base and instruction-tuned versions; trained on a context length of 8192 tokens; generally outperforms Llama 2 7B and Mistral 7B. ,
3) - an overview and a good list of references that apply LLMs for data annotation; includes a taxonomy of methods that employ LLMs for data annotation; covers three aspects: LLM-based data annotation, assessing LLM-generated annotations, and learning with LLM-generated annotations. ,
4) - presents generative representational instruction tuning where an LLM is trained to perform both generative and embedding tasks and designed to distinguish between them via the instructions; produces new state-of-the-art on MTEB and the unification is reported to speed up RAG by 60% for long documents. ,
5) - proposes LoRA+, which improves performance and finetuning speed (up to ∼2X speedup) at the same computational cost as LoRA; the key difference between LoRA and LoRA+ is how the learning rate is set: LoRA+ sets different learning rates for the two LoRA adapter matrices while in LoRA the learning rate is the same; a sketch of how this looks in an optimizer configuration follows this list. ,
6) - shows that many components of PPO are unnecessary in an RLHF context; it also shows that a simpler REINFORCE variant outperforms both PPO and newly proposed alternatives such as DPO and RAFT; overall, it shows that online RL optimization can be beneficial and low cost. ,
7) - explores the capability of transformer-based models in extremely long context processing; finds that both GPT-4 and RAG performance heavily rely on the first 25% of the input, which means there is room for improved context processing mechanisms; reports that recurrent memory augmentation of transformer models achieves superior performance on documents of up to 10 million tokens. ,
8) - investigates how LLMs solve multi-step problems through a framework consisting of a generator, a discriminator, and a planning method (e.g., iterative correction and tree search); reports that planning methods demand discriminators with at least 90% accuracy, which current LLMs do not demonstrate; finds that tree search is at least 10 to 20 times slower, which makes it impractical for real-world applications despite its good performance. ,
9) - proposes a chain-of-thought (CoT) decoding method to elicit the reasoning capabilities from pre-trained LLMs without explicit prompting; claims to significantly enhance a model’s reasoning capabilities over greedy decoding across reasoning benchmarks; finds that the model's confidence in its final answer increases when CoT is present in its decoding path. ,
10) - a family of open-source systems for generating, executing, and iteratively refining code; proposes a dataset of 68K multi-turn interactions; integrates execution and human feedback for dynamic code refinement and produces high performance on benchmarks like HumanEval and EvalPlus. ,
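
To make the LoRA+ idea in item 5 above concrete, here is a minimal PyTorch sketch, not the authors' implementation: a single LoRA layer whose B adapter matrix gets a learning rate scaled by a chosen ratio relative to the A matrix; the layer sizes and the `lr_ratio` value are illustrative assumptions.

```python
# Minimal sketch of the LoRA+ idea (illustrative, not the authors' code):
# the two adapter matrices get different learning rates via optimizer param groups.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():          # frozen pretrained weights
            p.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x):
        return self.base(x) + x @ self.lora_A.T @ self.lora_B.T

model = LoRALinear(128, 128)
base_lr = 1e-4
lr_ratio = 16  # assumption: LoRA+ sets lr(B) much higher than lr(A); the exact ratio is a tunable choice

optimizer = torch.optim.AdamW([
    {"params": [model.lora_A], "lr": base_lr},
    {"params": [model.lora_B], "lr": base_lr * lr_ratio},
])
```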

Top ML Papers of the Week (February 12 - February 18) - 2024

1) - a text-to-video AI model that can create videos of up to a minute of realistic and imaginative scenes given text instructions; it can generate complex scenes with multiple characters, different motion types, and backgrounds, and understand how they relate to each other; other capabilities include creating multiple shots within a single video with persistence across characters and visual style. ,
2) - a compute-efficient multimodal mixture-of-experts model that focuses on capabilities such as recalling and reasoning over long-form content; it can reason over long documents potentially containing millions of tokens, including hours of video and audio; improves the state-of-the-art performance in long-document QA, long-video QA, and long-context ASR. Gemini 1.5 Pro matches or outperforms Gemini 1.0 Ultra across standard benchmarks and achieves near-perfect retrieval (>99%) up to at least 10 million tokens, a significant advancement compared to other long-context LLMs. ,
3) - a collection of vision models trained on a feature prediction objective using 2 million videos; relies on self-supervised learning and doesn’t use pretrained image encoders, text, negative examples, reconstruction, or other supervision sources; claims to achieve versatile visual representations that perform well on both motion and appearance-based tasks, without adaptation of the model’s parameters. ,
4) - a general-purpose 1M context multimodal model trained on long videos and books using RingAttention; sets new benchmarks in difficult retrieval tasks and long video understanding; uses masked sequence packing for mixing different sequence lengths, loss weighting, and model-generated QA dataset for long sequence chat; open-sources a family of 7B parameter models that can process long text and videos of over 1M tokens. ,
5) - finds that the boundary between trainable and untrainable neural network hyperparameter configurations is fractal; observes fractal hyperparameter landscapes for every neural network configuration studied, including deep linear networks; also observes that the best-performing hyperparameters lie at the edge of stability (a toy grid sketch follows at the end of this list). ,
6) - a framework to build generalist computer agents that interface with key elements of an operating system like Linux or MacOS; it also proposes a self-improving embodied agent for automating general computer tasks; this agent outperforms the previous methods by 35% on the general AI assistants (GAIA) benchmark. ,
7) - uses LLMs to automatically improve existing human-written tests; reports that after an evaluation on Reels and Stories products for Instagram, 75% of TestGen-LLM's test cases were built correctly, 57% passed reliably, and 25% increased coverage. ,
8) - a dedicated LLM trained for chemistry-related tasks; claims to outperform GPT-3.5 on principal tasks such as name conversion, molecular caption, and reaction prediction; it also surpasses GPT-4 on two of these tasks. ,
9) - reviews three popular families of LLMs (GPT, Llama, PaLM), their characteristics, contributions, and limitations; includes a summary of capabilities and techniques developed to build and augment LLMs; it also discusses popular datasets for LLM training, fine-tuning, and evaluation, along with LLM evaluation metrics; concludes with open challenges and future research directions. ,
10) - shows that LLM agents can automatically hack websites and perform tasks like SQL injections without human feedback or explicit knowledge about the vulnerability beforehand; this is enabled by an LLM’s tool usage and long context capabilities; shows that GPT-4 is capable of such hacks, including finding vulnerabilities in websites in the wild; open-source models did not show the same capabilities. ,
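
As a loose illustration of the trainable/untrainable boundary idea in item 5 above, here is a toy sketch of my own construction (not the paper's experimental setup): it sweeps per-layer learning rates for a tiny two-layer network and marks which configurations diverge.

```python
# Toy sketch: map which (per-layer learning rate) pairs lead to divergence
# for a tiny two-layer network. Illustrative only; not the paper's experiments.
import torch

def diverges(lr1, lr2, steps=200):
    torch.manual_seed(0)
    x, y = torch.randn(256, 16), torch.randn(256, 1)
    w1 = torch.randn(16, 32, requires_grad=True)
    w2 = torch.randn(32, 1, requires_grad=True)
    opt = torch.optim.SGD([{"params": [w1], "lr": lr1},
                           {"params": [w2], "lr": lr2}])
    for _ in range(steps):
        loss = ((torch.tanh(x @ w1) @ w2 - y) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        if not torch.isfinite(loss):   # training blew up
            return True
    return False

lrs = [float(10 ** e) for e in torch.linspace(-3, 1, 9)]
for lr1 in lrs:
    row = ["x" if diverges(lr1, lr2) else "." for lr2 in lrs]
    print(f"{lr1:8.4f} " + "".join(row))   # 'x' marks untrainable configurations
```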

Top ML Papers of the Week (February 5 - February 11) - 2024

1) - trains a 270M parameter transformer model with supervised learning on a dataset of 10 million chess games with up to 15 billion data points; reaches a Lichess blitz Elo of 2895 against humans, and solves a series of challenging chess puzzles; it shows the potential of training at scale for chess and without the need for any domain-specific tweaks or explicit search algorithms. ,
2) - an LLM-based agent that can utilize 16K APIs from Rapid API; proposes a simple framework consisting of 1) a hierarchical API-retriever to identify relevant API candidates to a query, 2) a solver to resolve user queries, and 3) a self-reflection mechanism to reactivate AnyTool if the initial solution is impracticable; this tool leverages the function calling capability of GPT-4 so no further training is needed; the hierarchical API-retriever is inspired by a divide-and-conquer approach to help reduce the search scope of the agents which leads to overcoming limitations around context length in LLMs; the self-reflection component helps with resolving easy and complex queries efficiently. ,
3) - investigates and expands the theoretical understanding of learning with attention layers by exploring the interplay between positional and semantic attention; it employs a toy model of dot-product attention and identifies an emergent phase transition between semantic and positional learning; shows that, provided with sufficient data, a dot-product attention layer outperforms a linear positional baseline when using the semantic mechanism. ,
4) - proposes an indirect reasoning method to strengthen the reasoning power of LLMs; it employs the logic of contrapositives and contradictions to tackle IR tasks such as factual reasoning and mathematical proof; it consists of two key steps: 1) enhance the comprehensibility of LLMs by augmenting data and rules (i.e., the logical equivalence of the contrapositive), and 2) design prompt templates to stimulate LLMs to implement indirect reasoning based on proof by contradiction; experiments on LLMs like GPT-3.5-turbo and Gemini Pro show that the proposed method enhances the overall accuracy of factual reasoning by 27.33% and mathematical proof by 31.43% compared to traditional direct reasoning methods. ,
5) - a low-cost system for bimanual teleoperation that improves the performance, user-friendliness, and durability of ALOHA; efforts include hardware improvements such as grippers and gravity compensation with a higher quality simulation model; this potentially enables large-scale data collection on more complex tasks to help advanced research in robot learning. ,
6) - presents a study on the scaling properties of raw agents instantiated from LLMs; finds that performance scales with the number of agents when simply using a sampling-and-voting method (a minimal sketch follows at the end of this list). ,
7) - proposes a new framework, Self-Discover, that enables LLMs to select from multiple reasoning techniques (e.g., critical thinking and thinking step-by-step) to compose task-specific reasoning strategies; outperforms CoT (applied to GPT-4 and PaLM 2) on BigBench-Hard experiments and requires 10-40x less inference compute than other inference-intensive methods such as CoT-Self-Consistency; the self-discovered reasoning structures are also reported to transfer well between LLMs and small language models (SLMs). ,
8) - continues pretraining a code base model with 120B math-related tokens; introduces GRPO (a variant to PPO) to enhance mathematical reasoning and reduce training resources via a memory usage optimization scheme; DeepSeekMath 7B achieves 51.7% on MATH which approaches the performance level of Gemini-Ultra (53.2%) and GPT-4 (52.9%); when self-consistency is used the performance improves to 60.9%. ,
9) - provides an overview of LLMs for table processing, including methods, benchmarks, prompting techniques, and much more. ,
10) - discusses the essential aspects of LLM-based multi-agent systems; it includes a summary of recent applications for problem-solving and world simulation; it also discusses datasets, benchmarks, challenges, and future opportunities to encourage further research and development from researchers and practitioners. ,
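
A minimal sketch of the sampling-and-voting idea from item 6 above: sample several independent answers and return the majority vote. The `ask_llm` function is a hypothetical stand-in for whatever chat/completion API is used.

```python
# Minimal sketch of sampling-and-voting: query the model several times at
# non-zero temperature and return the most common answer. `ask_llm` is a
# hypothetical placeholder for an actual LLM call.
from collections import Counter

def ask_llm(prompt: str, temperature: float = 0.7) -> str:
    raise NotImplementedError("plug in your LLM client here")

def answer_by_voting(prompt: str, n_agents: int = 10) -> str:
    answers = [ask_llm(prompt).strip() for _ in range(n_agents)]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner
```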

Top ML Papers of the Week (January 29 - February 4) - 2024

1) - introduces Open Language Model (OLMo), a 7B parameter model; it includes open training code, open data, full model weights, evaluation code, and fine-tuning code; it shows strong performance on many generative tasks; there is also a smaller version of it, OLMo 1B. ,
2) - a comprehensive survey outlining design formulations for model architecture and training pipeline around multimodal large language models. ,
3) - proposes Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation in a RAG system; the core idea is to implement a self-correct component for the retriever and improve the utilization of retrieved documents for augmenting generation; the retrieval evaluator helps to assess the overall quality of retrieved documents given a query; using web search and optimized knowledge utilization operations can improve automatic self-correction and efficient utilization of retrieved documents. ,
4) - introduces an overview of research developments in LLMs for mathematical reasoning; discusses advancements, capabilities, limitations, and applications to inspire ongoing research on LLMs for Mathematics. ,
5) - covers compression algorithms like pruning, quantization, knowledge distillation, low-rank approximation, parameter sharing, and efficient architecture design. ,
6) - employs Mixture-of-Experts tuning for large vision-language models, constructing a sparse model with a substantial reduction in active parameters at a constant computational cost; this approach also helps to address the performance degradation associated with multi-modal learning and model sparsity. ,
7) - uses an off-the-shelf instruction-tuned model prompted to paraphrase web documents in specific styles and formats such as “like Wikipedia” or “question-answer format” to jointly pre-train LLMs on real and synthetic rephrases; it speeds up pre-training by ~3x, improves perplexity, and improves zero-shot question answering accuracy on many tasks. ,
8) - a study that focuses on the components needed to improve the retrieval component of a RAG system; confirms that relevant information should be placed near the query, as the model will struggle to attend to it otherwise; surprisingly, it finds that related documents don't necessarily lead to improved performance for the RAG system; even more unexpectedly, irrelevant and noisy documents can help drive up accuracy if placed correctly (a prompt-assembly sketch follows at the end of this list). ,
9) - discusses hallucination issues and techniques to mitigate hallucination in Large Vision-Language Models (LVLM); it introduces LVLM hallucination evaluation methods and benchmarks; provides tips and a good analysis of the causes of LVLM hallucinations and potential ways to mitigate them. ,
10) - a new LLM compression technique that proposes a post-training sparsification scheme that replaces each weight matrix with a smaller dense matrix; helps reduce the embedding dimension of the network and can remove up to 20% of model parameters for Llama2-70B and Phi-2 models while retaining most of the zero-shot performance of the dense models. ,
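
Following the retrieval findings in item 8 above, here is a small prompt-assembly sketch that keeps the highest-scoring documents closest to the query; the scoring and ordering details are illustrative assumptions, not the paper's exact protocol.

```python
# Sketch: assemble a RAG prompt so that the most relevant documents sit
# closest to the query (i.e., last in the context). Ordering/scoring details
# are illustrative assumptions, not the paper's exact setup.
from typing import List, Tuple

def build_prompt(query: str, scored_docs: List[Tuple[float, str]], k: int = 5) -> str:
    top = sorted(scored_docs, key=lambda t: t[0], reverse=True)[:k]
    # Reverse so the best-scoring document is the last one before the question.
    context = "\n\n".join(doc for _score, doc in reversed(top))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```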

Top ML Papers of the Week (January 22 - January 28) - 2024

1) - a robust monocular depth estimation solution that can deal with any image under any circumstances; automatically annotates large-scale unlabeled data (~62M images), which helps to reduce generalization error; proposes effective strategies to leverage the power of this large-scale unlabeled data; besides generalization ability, it establishes a new state of the art through fine-tuning and even yields an enhanced depth-conditioned ControlNet. ,
2) - proposes FuseLLM with the core idea of externalizing knowledge from multiple LLMs and transferring their capabilities to a target LLM; leverages the generative distributions of source LLMs to externalize both their collective knowledge and individual strengths and transfer them to the target LLM through continual training; finds that the FuseLLM can improve the performance of the target model across a range of capabilities such as reasoning, common sense, and code generation. ,
3) - adapts the Mamba SSM to learn directly from raw bytes; bytes lead to longer sequences, on which autoregressive Transformers scale poorly; this work reports large benefits in inference speed and even outperforms subword Transformers. ,
4) - a diffusion-based image-conditioned inpainting model to balance fast inference with high-fidelity while enabling accurate semantic manipulations in a given scene content; outperforms existing zero-shot diffusion inpainting methods and even few-shot diffusion personalization algorithms such as DreamPaint. ,
5) - introduces weight averaged reward models (WARM), which involve fine-tuning multiple reward models and then averaging them in weight space; weight averaging improves efficiency compared to traditional prediction ensembling, and it improves the quality and alignment of LLM predictions (a minimal averaging sketch follows at the end of this list). ,
6) - a survey of resource-efficient LLMs and multimodal foundation models; provides a comprehensive analysis and insights into ML efficiency research, including architectures, algorithms, and practical system designs and implementations. ,
7) - first presents a red teaming dataset of 10 subtasks (e.g., image misleading, multi-modal jailbreaking, face fairness, etc.); finds that 10 prominent open-source VLMs struggle with red teaming to different degrees and have up to a 31% performance gap with GPT-4V; also applies red teaming alignment to LLaVA-v1.5 with SFT using the proposed red teaming dataset, which improves model performance by 10% on the test set. ,
8) - a text-to-video space-time diffusion model for synthesizing videos with realistic and coherent motion; introduces a Space-Time U-Net architecture to generate the entire temporal duration of a video at once via a single pass; achieves state-of-the-art text-to-video generation results and supports a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylized generation. ,
9) - a simple framework for LLM inference acceleration using multiple decoding heads that predict multiple subsequent tokens in parallel; parallelization substantially reduces the number of decoding steps; it can achieve over 2.2x speedup without compromising generation quality, while Medusa-2 further improves the speedup to 2.3-3.6x. ,
10) - a comprehensive benchmark with an open-source evaluation framework to perform analytical evaluation of LLM agents; helps to assess the capabilities and limitations of LLM agents and demystifies agent behaviors, which leads to building stronger and more robust LLM agents. ,
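
A minimal sketch of the weight-averaging step behind WARM in item 5 above: average the state dicts of several fine-tuned reward models with identical architecture. The uniform weighting is an assumption for illustration.

```python
# Sketch: average the parameters of several reward models with identical
# architecture (uniform weights; WARM-style weight-space averaging).
import copy
import torch

def average_models(models):
    avg = copy.deepcopy(models[0])
    avg_state = avg.state_dict()
    for key in avg_state:
        stacked = torch.stack([m.state_dict()[key].float() for m in models])
        avg_state[key] = stacked.mean(dim=0)   # uniform average in weight space
    avg.load_state_dict(avg_state)
    return avg
```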

Top ML Papers of the Week (January 15 - January 21) - 2024

1) - an AI system that acts as a theorem prover that can solve Olympiad geometry problems without human demonstrations; this system is trained on synthetic data involving millions of theorems and proofs across different levels of complexity; the data is used to train a neural language model that can solve olympiad-level problems and approaches the performance of an average International Mathematical Olympiad (IMO) gold medallist. ,
2) - a code-oriented iterative flow that improves LLMs on code generation; it involves two key steps to improve code generation capabilities in LLMs: i) additional generated data (problem self-reflection and test reasoning) to aid the iterative process, and ii) enriching public tests using additional AI-generated tests; using the CodeContests validation dataset, GPT-4 pass@5 accuracy increased from 19% using a single well-crafted prompt to 44% using the AlphaCodium flow; it even outperforms AlphaCode using a significantly smaller computation budget and 4 orders of magnitude fewer LLM calls. ,
3) - report discussing the tradeoff between RAG and fine-tuning when using LLMs like Llama 2 and GPT-4; performs a detailed analysis and highlights insights when applying the pipelines on an agricultural dataset; observes that there is an accuracy increase of over 6 p.p. when fine-tuning the model and this is cumulative with RAG, which increases accuracy by 5 p.p. further. ,
4) - proposes a self-alignment method that uses the model itself for LLM-as-a-Judge prompting to provide its rewards during training; Iterative DPO is used for instruction following training using the preference pairs built from the generated data which comes from a self-instruction creation phase; using this approach, fine-tuning a Llama 2 70B model on three iterations can lead to a model that outperforms LLMs like Claude 2 and Gemini Pro on the AlpacaEval 2.0 leaderboard. ,
5) - introduces proxy-tuning, a decoding-time algorithm that modifies the logits of a target LLM by adding the difference between the logits of a small fine-tuned model and those of its untuned base counterpart; this can enable a larger target base model to perform as well as a fine-tuned version of it; proxy-tuning applied to Llama2-70B using proxies of only 7B size closes 88% of the gap between Llama2-70B and its tuned chat version (a minimal sketch follows at the end of this list). ,
6) - proposes an approach, ReFT, to enhance the generalizability of LLMs for reasoning; it starts with applying SFT and then applies online RL for further refinement while automatically sampling reasoning paths to learn from; this differs from RLHF in that it doesn’t utilize a reward model learned from human-labeled data; ReFT demonstrates improved performance and generalization abilities on math problem-solving. ,
7) - thoroughly surveys the methodologies and explores their strengths and limitations; provides a taxonomy of different approaches involving prompt engineering or calibrating open-source LLMs for evaluation. ,
8) - proposes a framework that leverages a model itself to explain its internal representations; it decodes information from LLM hidden representations which is possible by “patching” representations into a separate inference pass that encourages the extraction of that information; it can be used to answer questions about an LLM’s computation and can even be used to fix latent multi-hop reasoning errors. ,
9) - suggests that language models often generalize well from easy to hard data, i.e., easy-to-hard generalization; it argues that it can be better to train on easy data as opposed to hard data, even when the emphasis is on improving performance on hard data, and suggests that the scalable oversight problem may be easier than previously thought. ,
10) - an approach to efficiently scale LLMs by combining state space models (SSMs) with Mixture of Experts (MoE); MoE-Mamba outperforms both Mamba and Transformer-MoE; it reaches the same performance as Mamba in 2.2x fewer training steps while preserving the inference performance gains of Mamba over the Transformer. ,
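
A minimal sketch of the proxy-tuning arithmetic from item 5 above: at each decoding step, shift the large base model's logits by the difference between a small tuned expert and its small untuned counterpart. The greedy loop and the assumption of Hugging Face-style models sharing a vocabulary are illustrative.

```python
# Sketch of proxy-tuning's decoding-time logit arithmetic:
#   steered_logits = big_base + (small_tuned - small_base)
# Greedy decoding only; assumes HF-style causal LMs with a shared vocabulary.
import torch

@torch.no_grad()
def proxy_tuned_step(big_base, small_tuned, small_base, input_ids):
    big = big_base(input_ids).logits[:, -1, :]
    expert = small_tuned(input_ids).logits[:, -1, :]
    anti = small_base(input_ids).logits[:, -1, :]
    steered = big + (expert - anti)          # shift the target model's logits
    return steered.argmax(dim=-1, keepdim=True)

@torch.no_grad()
def generate(big_base, small_tuned, small_base, input_ids, max_new_tokens=32):
    for _ in range(max_new_tokens):
        next_id = proxy_tuned_step(big_base, small_tuned, small_base, input_ids)
        input_ids = torch.cat([input_ids, next_id], dim=-1)
    return input_ids
```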

Top ML Papers of the Week (January 8 - January 14) - 2024

1) - a method for text-driven generative object insertion into neural 3D scenes; it enables users to provide textual descriptions and a 2D bounding box in a reference viewpoint to generate new objects in 3D scenes; InseRF is also capable of controllable and 3D-consistent object insertion without requiring explicit 3D information as input. ,
2) - shows that LLMs can learn deceptive behavior that persists through safety training; for instance, an LLM trained to write secure code when the prompt states one year can insert exploitable code when given a different year; this backdoor behavior can persist even when training LLMs with techniques like reinforcement learning and adversarial training. ,
3) - shows that effectively combining existing small models of different sizes (6B/13B parameters) can result in systems that can compete with ChatGPT level performance; the goal is to build a collaborative conversational system that can effectively leverage these models to improve engagement and quality of chat AIs and generate more diverse responses. ,
4) - proposes an end-to-end video generation pipeline that integrates the text-to-image model, video motion generator, reference image embedding module, and frame interpolation module; it can generate high-resolution video with advanced fidelity and smoothness compared to other leading and popular text-to-video systems. ,
5) - a comprehensive study (100+ pages) of trustworthiness in LLMs, discussing challenges, benchmarks, evaluation, analysis of approaches, and future directions; proposes a set of principles for trustworthy LLMs that span 8 dimensions, including a benchmark across 6 dimensions (truthfulness, safety, fairness, robustness, privacy, and machine ethics); it also presents a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets; while proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, there are a few open-source models that are closing the gap. ,
6) - a new framework, inspired by Chain-of-Thought prompting, to instruct LLMs to dynamically plan a chain of operations that transforms a complex table to reliably answer the input question; an LLM is used to iteratively generate operations, step-by-step, that will perform necessary transformations to the table (e.g., adding columns or deleting info). ,
7) - proposes 40 persuasion techniques to systematically jailbreak LLMs; their adversarial prompts (also referred to as persuasive adversarial prompts) achieve a 92% attack success rate on aligned LLMs, like Llama 2-7B and GPT-4, without specialized optimization. ,
8) - proposes RAISE, an advanced architecture to enhance LLMs for conversational agents; it's inspired by the ReAct framework and integrates a dual-component memory system; it utilizes a scratchpad and retrieved examples to augment the agent's capabilities; the scratchpad serves as transient storage (akin to short-term memory) and the retrieval module operates as the agent's long-term memory; this system mirrors human short-term and long-term memory and helps to maintain context and continuity which are key in conversational systems. ,
9) - finds that widely used open-source LLMs are extremely sensitive to prompt formatting in few-shot settings; subtle changes in prompt formatting with a Llama 2 13B model can result in a performance difference of up to 76 accuracy points (a small evaluation-harness sketch follows at the end of this list). ,
10) - a comprehensive survey that covers the current state of adversarial ML with a proper taxonomy of concepts, discussions, adversarial methods, mitigation tactics, and remaining challenges. ,
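
To probe the formatting sensitivity described in item 9 above, here is a small harness sketch that evaluates the same few-shot task under several format variants; `ask_llm`, the formats, and the toy task are hypothetical placeholders.

```python
# Sketch: evaluate the same few-shot task under different prompt formats to
# measure formatting sensitivity. `ask_llm` is a hypothetical LLM call.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model here")

FORMATS = {
    "colon":   lambda q, a: f"Q: {q}\nA: {a}",
    "dash":    lambda q, a: f"Question - {q}\nAnswer - {a}",
    "newline": lambda q, a: f"{q}\n{a}",
}

def accuracy(fmt, shots, test_set):
    prefix = "\n\n".join(fmt(q, a) for q, a in shots)
    correct = 0
    for q, gold in test_set:
        pred = ask_llm(prefix + "\n\n" + fmt(q, "")).strip()
        correct += int(pred == gold)
    return correct / len(test_set)

# for name, fmt in FORMATS.items():
#     print(name, accuracy(fmt, few_shot_examples, test_examples))
```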

Top ML Papers of the Week (January 1 - January 7) - 2024

1) - proposes a system that learns bimanual mobile manipulation with low-cost whole-body teleoperation; it first collects high-quality demonstrations and then performs supervised behavior cloning; finds that co-training with existing ALOHA datasets increases performance on complex mobile manipulation tasks such as sautéing and serving a piece of shrimp and opening a two-door wall cabinet to store heavy cooking pots, all while keeping the budget under $32K. ,
2) - summarizes 32 techniques to mitigate hallucination in LLMs; introduces a taxonomy categorizing methods like RAG, Knowledge Retrieval, CoVe, and more; provides tips on how to apply these methods and highlights the challenges and limitations inherent in them. ,
3) - shows that a supervised fine-tuned LLM can be improved without acquiring additional human-annotated data; inspired by self-play, it first uses the LLM to generate its own training data from its previous iterations; it then refines its policy by distinguishing the self-generated responses from those obtained from human-annotated data; shows that the method can improve the LLM's performance and outperform models trained via DPO with GPT-4 preference data (a data-construction sketch follows at the end of this list). ,
4) - proposes a post-pretraining method to improve an LLM’s knowledge without catastrophic forgetting; it achieves this by tuning expanded identity blocks using only new corpus while freezing the inherited blocks; uses math and code data to train a LLaMA Pro-8.3B initialized from Llama2-7B; these models achieve advanced performance on various benchmarks compared to base models while preserving the original general capabilities. ,
5) - explores composing existing foundation models with specific models to expand capabilities; introduces cross-attention between models to compose representations that enable new capabilities; as an example, a PaLM2-S model was augmented with a smaller model trained on low-resource languages to improve English translation and arithmetic reasoning for low-resource languages; this was also done with a code-specific model, which led to a 40% improvement over the base code model on code generation and explanation tasks. ,
6) - achieves efficient inference of Mixtral-8x7B models through offloading; it applies separate quantization for attention layers and experts to fit the model in combined GPU and CPU memory; designs a MoE-specific offloading strategy that enables running Mixtral-8x7B on desktop hardware and free-tier Google Colab instances. ,
7) - explores the potential of GPT-4V as a generalist web agent; in particular, can such a model follow natural language instructions to complete tasks on a website? the authors first developed a tool to enable web agents to run on live websites; findings suggest that GPT-4V can complete 50% of tasks on live websites, possible through manual grounding of its textual plans into actions on the websites. ,
8) - a lightweight extension to traditional LLMs for reasoning over visual documents; focuses on using bounding box information to incorporate spatial layout structure; proposes a pre-training objective that addresses the irregular layout and heterogeneous content present in visual documents; it’s then fine-tuned on an instruction dataset and demonstrates SoTA performance on 14 out of 16 datasets across several document intelligence tasks. ,
9) - a comprehensive overview of the benefits of training LLMs with code-specific data. Some capabilities include enhanced code generation, enabling reasoning, function calling, automated self-improvements, and serving intelligent agents. ,
10) - proposes an image generation model that tackles heterogeneous image generation tasks and generalizes across unseen tasks; it first enhances the model’s ability to ground its generation on external multimodal context and then fine-tunes on image generation tasks with multimodal instructions. ,
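
A data-construction sketch for the self-play idea in item 3 above: treat the human-annotated response as the preferred output and the model's own previous-iteration generation as the rejected one. `generate_with_current_model` is a hypothetical placeholder, and the downstream training step (a DPO-style objective on these pairs) is not shown.

```python
# Sketch: build preference pairs for self-play-style fine-tuning.
# The human (ground-truth) response is "chosen"; the model's own generation
# from the previous iteration is "rejected". The generation call is a placeholder.
def generate_with_current_model(prompt: str) -> str:
    raise NotImplementedError("generate with the model from the previous iteration")

def build_self_play_pairs(sft_dataset):
    pairs = []
    for example in sft_dataset:                 # each example: {"prompt", "response"}
        pairs.append({
            "prompt": example["prompt"],
            "chosen": example["response"],                                # human-annotated data
            "rejected": generate_with_current_model(example["prompt"]),  # self-generated
        })
    return pairs
```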

Top ML Papers of the Week (December 25 - December 31)

1) - presents an 18 billion parameter visual language model specializing in GUI understanding and navigation; supports high-resolution inputs (1120x1120) and shows abilities in tasks such as visual Q&A, visual grounding, and GUI Agent; achieves state of the art on 5 text-rich and 4 general VQA benchmarks. ,
2) - surveys 300+ papers and summarizes research developments to look at in the space of Generative AI; it covers computational challenges, scalability, real-world implications, and the potential for Gen AI to drive progress in fields like healthcare, finance, and education. ,
3) - a unified library that supports comprehensive evaluation and analysis of LLMs; it consists of functionalities for prompt construction, prompt engineering, dataset and model loading, adversarial prompt attack, dynamic evaluation protocols, and analysis tools. ,
4) - performs red-teaming on three functionalities exposed in the GPT-4 APIs: fine-tuning, function calling, and knowledge retrieval; Main findings: 1) fine-tuning on as few as 15 harmful examples or 100 benign examples can remove core safeguards from GPT-4, 2) GPT-4 Assistants divulge the function call schema and can be made to execute arbitrary function calls, and 3) knowledge retrieval can be hijacked by injecting instructions into retrieval documents. ,
5) - investigates how MLP layers implement a lookup table for factual recall; scopes the study on how early MLPs in Pythia 2.8B look up which of 3 different sports various athletes play; suggests that early MLP layers act as a lookup table and recommends thinking about the recall of factual knowledge in the model as multi-token embeddings. ,
6) - presents a diverse and high-quality math-centric corpus comprising ~9.5 billion tokens to train foundation models. ,
7) - introduces 26 guiding principles designed to streamline the process of querying and prompting large language models; applies these principles to conduct extensive experiments on LLaMA-1/2 (7B, 13B and 70B), GPT-3.5/4 to verify their effectiveness on instructions and prompts design. ,
8) - provides a comprehensive survey of seminal foundational models for reasoning, highlighting the latest advancements in various reasoning tasks, methods, benchmarks, and potential future directions; also discusses how other developments like multimodal learning, autonomous agents, and super alignment accelerate and extend reasoning research. ,
9) - proposes LLaRA, which adapts an LLM for dense retrieval; it consists of two pretext tasks: EBAE (Embedding-Based Auto-Encoding) and EBAR (Embedding-Based Auto-Regression), where the text embeddings from the LLM are used to reconstruct the tokens of the input sentence and predict the tokens of the next sentence, respectively; a LLaMA-2-7B model improved on benchmarks like MS MARCO and BEIR.
10) - provides a comprehensive preliminary comparison and combination of vision-language models like Gemini and GPT-4V through several qualitative cases; finds that GPT-4V is precise and succinct in responses, while Gemini excels in providing detailed, expansive answers accompanied by relevant imagery and links. ,

Top ML Papers of the Week (December 18 - December 24)

1) - provides an impartial and reproducible study comparing several popular models like Gemini, GPT, and Mixtral; Gemini Pro achieves comparable but slightly lower accuracy than the current version of GPT-3.5 Turbo; Gemini and GPT were better than Mixtral. ,
2) - a high-speed inference engine for deploying LLMs locally; exploits the high locality in LLM inference to design a GPU-CPU hybrid inference engine; hot-activated neurons are preloaded onto the GPU for fast access, while cold-activated neurons (the majority) are computed on the CPU; this approach significantly reduces GPU memory demands and CPU-GPU data transfer. ,
3) - discovered a new structural class of antibiotics with explainable graph algorithms; the approach enables explainable deep learning guided discovery of structural classes of antibiotics which helps to provide chemical substructures that underlie antibiotic activity. ,
4) - introduces a large language model for zero-shot video generation; it’s capable of a variety of video generation tasks such as image-to-video and video stylization; trains an autoregressive model to learn across video, image, audio, and text modalities by using multiple tokenizers; shows that language models can synthesize and edit video with some degree of temporal consistency. ,
5) - introduces an LLM-based multimodal agent framework to operate smartphone applications; learns to navigate new apps through autonomous exploration or observing human demonstrations; shows proficiency in handling diverse tasks across different applications like email, social media, shopping, editing tools, and more. ,
6) - proposes an approach that efficiently runs LLMs that exceed the available DRAM capacity by storing the model parameters on flash memory but bringing them on demand to DRAM; enables running models up to twice the size of the available DRAM, with a 4-5x and 20-25x increase in inference speed compared to naive loading approaches on CPU and GPU, respectively. ,
7) - proposes a ReAct-style agent with self-critique for improving on the task of long-form question answering; it shows that the agent can be improved through ReST-style (reinforced self-training) iterative fine-tuning on its reasoning traces; specifically, it uses growing-batch RL with AI feedback for continuous self-improvement and self-distillation; like a few other recent papers, it focuses on minimizing human involvement (i.e., it doesn't rely on human-labeled training data); it generates synthetic data with self-improvement from AI feedback, which can then be used to distill the agent into smaller models (1-2 orders of magnitude smaller) with performance comparable to that of the pre-trained agent. ,
8) - uses a simple random search algorithm to implement adversarial attacks on GPT-4; it achieves jailbreaking by appending an adversarial suffix to an original request, then iteratively making slight random changes to the suffix, and keeping changes if it increases the log probability of the token “Sure” at the first position of the response. ,
9) - an overview of all the retrieval augmented generation (RAG) research that has been happening. ,
10) - presents results for a new challenge that involves sample-efficient pretraining on a developmentally plausible corpus; the winning submission, which uses flashy LTG BERT, beat Llama 2 70B on 3/4 evals; other approaches that saw good results included data preprocessing or training on shorter context. ,

Top ML Papers of the Week (December 11 - December 17)

1) - uses LLMs to search for new solutions in mathematics & computer science; proposes FunSearch which combines a pre-trained LLM with a systematic evaluator and iterates over them to evolve low-scoring programs into high-scoring ones, discovering new knowledge; one of the key findings in this work is that safeguarding against LLM hallucinations is important to produce mathematical discoveries and solutions to other real-world problems. ,
2) - studies whether weak model supervision can elicit the full capabilities of stronger models; finds that strong pretrained models naively fine-tuned on labels generated by a weak model can perform better than their weak supervisors; reports that by fine-tuning GPT-4 with a GPT-2-level supervisor it is possible to recover close to GPT-3.5-level performance on NLP tasks. ,
3) - a unified model based on flow-matching capable of generating various audio modalities; designs description-based and example-based prompting to enhance controllability and unify speech and sound generation paradigms; adapts a self-supervised infilling objective to pre-train on large quantities of unlabeled audio; performs well on speech and sound generation and unlocks new methods for generating audio with novel vocal and acoustic styles. ,
4) - a survey on the progress of LLMs on mathematical tasks; covers papers and resources on LLM research around prompting techniques and tasks such as math word problem-solving and theorem proving. ,
5) - proposes LLM360 to support open and collaborative AI research by making the end-to-end LLM training process transparent and reproducible; releases 7B parameter LLMs pre-trained from scratch, AMBER and CRYSTALCODER, including their training code, data, intermediate checkpoints, and analyses. ,
6) - a comprehensive survey (analyzing 300+ papers) on LLMs in medicine; includes an overview of the principles, applications, and challenges faced by LLMs in medicine. ,
7) - proposes an approach for self-training with feedback that can substantially reduce dependence on human-generated data; the model-generated data combined with a reward function improves the performance of LLMs on problem-solving tasks. ,
8) - a neural RGBD SLAM method capable of photorealistically reconstructing real-world scenes without compromising speed and efficiency; extends classical 3D Gaussians for scene representation to overcome the limitations of the previous methods. ,
9) - introduces a new production-ready RL agent software package that enables researchers and practitioners to develop RL AI agents that adapt to environments with limited observability, sparse feedback, and high stochasticity. ,
10) - compresses trained model weights into a lower precision format to reduce memory requirements; the approach combines lattice codebooks with incoherence processing to create 2 bit quantized models; significantly closes the gap between 2 bit quantized LLMs and unquantized 16 bit models. ,

Top ML Papers of the Week (December 4 - December 10)

1) - a series of multimodal models with multimodal reasoning capabilities across text, images, video, audio, and code; claims to outperform human experts on MMLU, a popular benchmark to test the knowledge and problem-solving abilities of AI models; capabilities reported include multimodality, multilinguality, factuality, summarization, math/science, long-context, reasoning, and more. ,
2) - a lightweight Segment Anything Model (SAM) that exhibits decent performance with largely reduced complexity; leverages masked autoencoders with 20x fewer parameters and 20x faster runtime; EfficientSAM performs within 2 points (44.4 AP vs 46.5 AP) of the original SAM model. ,
3) - a series of fully open-source LLMs for code that close the gap with top code models while having no more than 7B parameters; trained on 75K synthetic instruction data; uses open-source references for the production of more diverse, realistic, high-quality, and controllable data; outperforms state-of-the-art code models with similar or even larger sizes on several coding benchmarks, including Python text-to-code generation, multilingual coding, and data-science program completion; MagicoderS-CL-7B based on CodeLlama surpasses ChatGPT on HumanEval+ (66.5 vs. 65.9 in pass@1). ,
4) - a comprehensive overview that summarizes different scenarios where LLMs are used on graphs, such as pure graphs, text-rich graphs, and text-paired graphs. ,
5) - an LLM-based safeguard model that involves a small (Llama2-7B) customizable instruction-tuned model that can classify safety risks in prompts and responses for conversational AI agent use cases; the model can be leveraged in a zero-shot or few-shot way if you need to adapt it to a different safety risk taxonomy that meets the requirements for a target use case; it can also be fine-tuned on a specific dataset to adapt to a new taxonomy. ,
6) - proposes an approach called Kahneman-Tversky Optimization (KTO) that matches or exceeds the performance of DPO-based methods at scales from 1B to 30B; KTO maximizes the utility of LLM generations instead of maximizing the log-likelihood of preferences as most current methods do. ,
7) - a simple extension of the chain-of-thought approach that improves LM code-driven reasoning; it encourages LMs to format semantic sub-tasks in a program as pseudocode so that the interpreter can explicitly catch undefined behavior and hand it off to an LM to simulate; on BIG-Bench Hard, Chain of Code achieves 84%, a gain of 12% over Chain of Thought. ,
8) - an overview of current research in data management within both the pretraining and supervised fine-tuning stages of LLMs; it covers different aspects of data management strategy design: data quantity, data quality, domain/task composition, and more. ,
9) - an open-source LLM for listwise zero-shot reranking that bridges the effectiveness gap with GPT-4 and in some cases surpasses the proprietary model; it outperforms GPT-4 on the NovelEval test set, comprising queries and passages past its training period, which addresses concerns about data contamination. ,
10) - a comprehensive review of algorithmic advancements aimed at improving LLM efficiency; covers various topics related to efficiency, including scaling laws, data utilization, architectural innovations, training and tuning strategies, and inference techniques. ,

Top ML Papers of the Week (November 27 - December 3)

1) - a new AI system for material design that finds 2.2 million new crystals, including 380,000 stable materials; presents a new deep learning tool that increases the speed and efficiency of discovery by predicting the stability of new materials. ,
2) - provides an exhaustive overview of tasks where open-source LLMs claim to be on par or better than ChatGPT. ,
3) - a novel training approach that efficiently samples large-scale foundation image diffusion models in just 1-4 steps while maintaining high image quality; combines score distillation and an adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps; reaches performance of state-of-the-art diffusion models in only four steps. ,
4) - a family of research models that enable end-to-end expressive cross-lingual communication in a streaming fashion; introduces an improved SeamlessM4T model trained on more low-resource language data; also applies a red-teaming effort for safer multimodal machine translation. ,
5) - a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain; builds on Llama-2 and extends pretraining on a curated medical corpus; MEDITRON-70B outperforms GPT-3.5 and Med-PaLM and is within 5% of GPT-4 and 10% of Med-PaLM-2. ,
6) - performs a systematic exploration of prompt engineering to boost the performance of LLMs on medical question answering; uses prompt engineering methods that are general purpose and make no use of domain expertise; prompt engineering led to enhancing GPT-4’s performance and achieves state-of-the-art results on nine benchmark datasets in the MultiMedQA suite. ,
7) - a unified instruction-guided multimodal retriever that handles eight retrieval tasks across modalities; can generalize to unseen retrieval tasks and achieves robust performance across existing datasets and zero-shot generalization to new tasks; presents a multimodal retrieval benchmark to help standardize the evaluation of multimodal information retrieval. ,
8) - argues that to protect people’s privacy, medical professionals, not commercial interests, must drive the development and deployment of such models. ,
9) - introduces Dobb-E, an affordable and versatile general-purpose system for learning robotic manipulation within household settings; Dobb-E can learn new tasks with only 5 minutes of user demonstrations; experiments reveal unique challenges absent or ignored in lab robotics, including the effects of strong shadows and variable demonstration quality from non-expert users, among others. ,
10) - proposes an unsupervised approach to speech-to-speech translation that can learn from monolingual data alone; combines a masked autoencoder, unsupervised embedding mapping, and back-translation; results show that the model outperforms a baseline cascade system and showcases its capability to retain para-/non-linguistic features such as pauses, speaking rates, and speaker identity. ,

Top ML Papers of the Week (November 20 - November 26)

1) - leverages the reasoning and instruction-following capabilities of LLMs to decide what to attend to; it regenerates the input context to include only the relevant portions before attending to the regenerated context to elicit the final response from the model; increases factuality and outperforms standard attention-based LLMs on tasks such as QA and math word problems (a two-pass prompting sketch follows at the end of this list). ,
2) - an overview of the methodologies for enhancing Transformer architecture modules that optimize long-context capabilities across all stages from pre-training to inference. ,
3) - approach to reduce inference time of LLMs based on a variant of speculative sampling and parallel decoding; achieves significant speed-ups (up to 30%) by only learning as little as O(d_emb) additional parameters. ,
4) - a multimodal model for learning across audio, video, and text which decouples the multimodal modeling into separate, focused autoregressive models; the inputs are processed according to the modalities; this approach can handle longer videos compared to other models and it outperforms state-of-the-art approach on video QA, long video QA, and audio-video-text benchmark. ,
5) - proposes an approach to teach smaller language models to reason; specifically, the LM is taught to use reasoning techniques, such as step-by-step processing, recall-then-generate, recall-reason-generate, extract-generate, and direct-answer methods; outperforms models of similar size and attains performance levels similar to or better than those of models 5-10x larger, as assessed on complex tasks that test advanced reasoning abilities in zero-shot settings. ,
6) - proposes a graduate-level Google-proof QA benchmark consisting of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry; the strongest GPT-4 based baseline achieves 39% accuracy; this benchmark offers scalable oversight experiments that can help obtain reliable and truthful information from modern AI systems that surpass human capabilities. ,
7) - summary of CoT reasoning, foundational mechanics underpinning CoT techniques, and their application to language agent frameworks. ,
8) - a benchmark for general AI assistants consisting of real-world questions that require a set of fundamental abilities such as reasoning, multimodal handling, web browsing, and generally tool-use proficiency; shows that human respondents obtain 92% vs. 15% for GPT-4 equipped with plugins. ,
9) - proposes a collaborative multi-round framework for the medical domain that leverages role-playing LLM-based agents to enhance LLM proficiency and reasoning capabilities. ,
10) - presents a suite of improved TÜLU models for advancing the understanding and best practices of adapting pretrained language models to downstream tasks and user preferences; TÜLU 2 suite achieves state-of-the-art performance among open models and matches or exceeds the performance of GPT-3.5-turbo-0301 on several benchmarks. ,
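
A two-pass prompting sketch of the idea in item 1 above: first ask the model to rewrite the context keeping only what is relevant to the question, then answer from the regenerated context. `ask_llm` is a hypothetical placeholder and the prompt wording is illustrative, not the paper's exact prompts.

```python
# Sketch of two-pass prompting: regenerate the context to keep only the
# relevant parts, then answer from the cleaned context.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def two_pass_answer(context: str, question: str) -> str:
    cleaned = ask_llm(
        "Rewrite the following text, keeping only the parts that are relevant "
        f"to the question and removing everything else.\n\nText:\n{context}\n\n"
        f"Question: {question}\n\nRelevant text:"
    )
    return ask_llm(f"Context:\n{cleaned}\n\nQuestion: {question}\nAnswer:")
```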

Top ML Papers of the Week (November 13 - November 19)

1) - presents new models for controlled image editing and text-to-video generation based on diffusion models; Emu Video can generate high-quality video by using text-only, image-only, or combined text and image inputs; Emu Edit enables free-form editing through text instructions. ,
2) - an approach to improve the robustness and reliability of retrieval-augmented language models in facing noisy, irrelevant documents and in handling unknown scenarios; CoN generates sequential reading notes for the retrieved documents, enabling an evaluation of their relevance to the given question and integrating this information to formulate the final answer; CoN significantly outperforms standard retrieval-augmented language models and achieves an average improvement of +7.9 in EM score given entirely noisy retrieved documents and +10.5 in rejection rates for real-time questions that fall outside the pre-training knowledge scope. ,
3) - explores the impact of large language models, particularly GPT-4, across various scientific fields including drug discovery, biology, and computational chemistry; assesses GPT-4's understanding of complex scientific concepts, its problem-solving capabilities, and its potential to advance scientific research through expert-driven case assessments and benchmark testing. ,
4) - fine-tunes language model for factuality without requiring human labeling; it learns from automatically generated factuality preference rankings and targets open-ended generation settings; it significantly improves the factuality of Llama-2 on held-out topics compared with RLHF or decoding strategies targeted at factuality. ,
5) - proposes a contrastive chain-of-thought method to enhance language model reasoning; the approach provides both valid and invalid reasoning demonstrations to guide the model to reason step-by-step while reducing reasoning mistakes; also proposes an automatic method to construct contrastive demonstrations and demonstrates improvements over CoT prompting (a prompt-construction sketch follows at the end of this list). ,
6) - provides an overview of LLMs for code, including a review of 50+ models, 30+ evaluation tasks, and 500 related works. ,
7) - an open-world agent that can perceive multimodal input ,
8) - proposes a method that improves the quality of the context provided to the generator via two steps: 1) identifying useful context based on lexical and information-theoretic approaches, and 2) training context filtering models that can filter retrieved contexts at inference; outperforms existing approaches on extractive question answering. ,
9) - proposes an approach for improving LLM safety with multi-round automatic red-teaming; incorporates automatic adversarial prompt writing and safe response generation, which increases red-teaming scalability and the safety of LLMs; the violation rate of an LLM with limited safety alignment is reduced by up to 84.7% after 4 rounds of MART, achieving performance comparable to LLMs with extensive adversarial prompt writing. ,
10) - explores the use of an autonomous stock trading agent powered by LLMs; finds that the agent acts upon insider tips and hides the reason behind the trading decision; shows that helpful and safe LLMs can strategically deceive users in a realistic situation without direct instructions or training for deception. ,
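
A prompt-construction sketch for the contrastive chain-of-thought idea in item 5 above: the few-shot prompt pairs each question with both a valid and an explicitly labeled invalid reasoning demonstration. The exact wording is an illustrative assumption.

```python
# Sketch: build a contrastive chain-of-thought prompt containing both a
# correct and an explicitly wrong demonstration for each few-shot example.
def contrastive_cot_prompt(examples, question):
    blocks = []
    for ex in examples:   # ex: {"question", "correct_reasoning", "wrong_reasoning", "answer"}
        blocks.append(
            f"Question: {ex['question']}\n"
            f"Correct explanation: {ex['correct_reasoning']}\n"
            f"Wrong explanation (do not reason like this): {ex['wrong_reasoning']}\n"
            f"Answer: {ex['answer']}"
        )
    blocks.append(f"Question: {question}\nCorrect explanation:")
    return "\n\n".join(blocks)
```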

Top ML Papers of the Week (November 6 - November 12)

1) - a comprehensive survey ,
2) - explores simplifying the transformer block and finds that many block components can be removed with no loss of training speed; using different architectures like autoregressive decoder-only and BERT encoder-only models, the simplified blocks emulate per-update training speed and performance of standard transformers, and even achieve 15% faster training throughput with fewer parameters ,
3) - investigates how effectively transformers can bridge between pretraining data mixture to identify and learn new tasks in-context which are both inside and outside the pretraining distribution; in the regimes studied, there is limited evidence that the models’ in-context learning behavior is capable of generalizing beyond their pretraining data. ,
4) - a single-stage transformer-based LLM that operates over several streams of compressed discrete music representation; it can generate high-quality samples ,
5) - a method that makes it possible to take advantage of increasing scale and capacity in Transformer models without increasing the computational cost; achieved by working on a subblock of the widened representation at each layer and using a predict-and-correct mechanism to update the inactivated blocks; it widens the learned representation while incurring only a negligible increase in latency. ,
6) - an effective prompting method that uses LLMs to rephrase and expand questions posed by humans to improve overall performance; it can improve the performance of different models across a wide range of tasks; the approach can be combined with chain-of-thought to improve performance further. ,
7) - provides an exhaustive evaluation of the latest state-of-the-art visual language model, GPT-4V(vision), and its application in autonomous driving; the model demonstrates superior performance in scene understanding and causal reasoning compared to existing autonomous systems. ,
8) - outlines technical details of the GPT4All model family along with the open-source repository that aims to democratize access to LLMs. ,
9) - an approach that enables the scalable serving of many LoRA adapters; it stores all adapters in main memory and fetches the adapters of currently running queries to GPU memory; employs a novel tensor parallelism strategy and highly optimized custom CUDA kernels for heterogeneous batching of LoRA computation; improves throughput by 4x compared to other solutions and increases the number of served adapters by several orders of magnitude. ,
10) - proposes a dynamic QA benchmark ,

Top ML Papers of the Week (October 30 - November 5)

1) - a state-of-the-art neural weather model that extends both the lead time range and the variables that an observation-based model can predict well; learns from both dense and sparse data sensors and makes predictions up to 24 hours ahead for precipitation, wind, temperature, and dew point. ,
2) - a comprehensive survey ,
3) - a large benchmarking framework for a diverse suite of computer vision tasks; find that while vision transformers ,
4) - proposes using LLMs for industrial chip design by leveraging domain adaptation techniques; evaluates different applications for chip design such as assistant chatbot, electronic design automation, and bug summarization; domain adaptation significantly improves performance over general-purpose models on a variety of design tasks; using a domain-adapted LLM for RAG further improves answer quality. ,
5) - proposes a compute-efficient method for efficiently extending the context window of LLMs beyond what it was pretrained on; extrapolates beyond the limited context of a fine-tuning dataset and models have been reproduced up to 128K context length. ,
6) - introduces a dataset consisting of more than 38M density functional theory ,
7) - presents a unified and methodological framework to enforce, discover, and promote symmetry in machine learning; also discusses how these ideas can be applied to ML models such as multilayer perceptrons and basis function regression. ,
8) - reports progress on a new iteration of AlphaFold that greatly expands its range of applicability; shows capabilities for joint structure prediction of complexes including proteins, nucleic acids, small molecules, ions, and modified residues; demonstrates greater accuracy on protein-nucleic acid interactions than specialist predictors. ,
9) - explores the ability of LLMs to understand emotional stimuli; conducts automatic experiments on 45 tasks using various LLMs, including Flan-T5-Large, Vicuna, Llama 2, BLOOM, ChatGPT, and GPT-4; the tasks span deterministic and generative applications that represent comprehensive evaluation scenarios; experimental results show that LLMs have a grasp of emotional intelligence. ,
10) - finds that when training LLMs in FP8, most variables in training, such as gradients and optimizer states, can employ low-precision data formats without compromising model accuracy and without requiring changes to hyperparameters. ,

Top ML Papers of the Week (October 23 - October 29)

1) - a 7B parameter model with competitive performance to ChatGPT on AlpacaEval; applies distilled supervised fine-tuning to improve task accuracy and distilled direct preference optimization on AI feedback data to better align the model; shows performance comparable to 70B-parameter chat models aligned with human feedback. ,
2) - investigates the fact-checking capabilities of LLMs like GPT-4; results show the enhanced prowess of LLMs when equipped with contextual information; GPT-4 outperforms GPT-3, but accuracy varies based on query language and claim veracity; while LLMs show promise in fact-checking, they demonstrate inconsistent accuracy. ,
3) - introduces an end-to-end framework for high-resolution image and video synthesis; involves a diffusion process that denoises inputs at multiple resolutions jointly and uses a NestedUNet architecture; enables a progressive training schedule from lower to higher resolutions leading to improvements in optimization for high-resolution generation. ,
4) - a new approach for spoken language modeling trained end-to-end to directly process spectrograms; it can be fine-tuned to generate high-quality accurate spoken language; the method surpasses existing spoken language models in speaker preservation and semantic coherence. ,
5) - presents a benchmark to assess LLMs' abilities in knowledge understanding, differentiation, and association; benchmark results show ,
6) - explores the problem of pretraining data detection, which aims to determine if a black-box model was trained on a given text; proposes a detection method named Min-K% Prob as an effective tool for benchmark example contamination detection, privacy auditing of machine unlearning, and copyrighted text detection in an LM's pretraining data (a minimal scoring sketch follows at the end of this list). ,
7) - evaluates a performant ConvNet architecture pretrained on JFT-4B at scale; observes a log-log scaling law between the held out loss and compute budget; after fine-tuning on ImageNet, NFNets match the reported performance of Vision Transformers with comparable compute budgets. ,
8) - a dataset of Creative-Commons-licensed ,
9) - a short paper outlining risks from upcoming and advanced AI systems, including an examination of social harms, malicious uses, and other potential societal issues emerging from the rapid adoption of autonomous AI systems. ,
10) - an LLM program that consists of branch, solve, and merge modules parameterized with specific prompts to the base LLM; this enables an LLM to plan a decomposition of task into multiple parallel sub-tasks, independently solve them, and fuse solutions to the sub-tasks; improves evaluation correctness and consistency for multiple LLMs. ,
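
A minimal sketch of a Min-K%-Prob-style score from item 6 above, assuming a Hugging Face causal LM: compute per-token log probabilities, average the lowest k percent, and treat higher averages as evidence that the text may have been seen during pretraining; the model choice and any decision threshold are assumptions.

```python
# Sketch of a Min-K%-Prob-style score with a Hugging Face causal LM:
# average the k% lowest per-token log-probabilities of the text; higher
# averages suggest the text was more likely seen during pretraining.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@torch.no_grad()
def min_k_prob(text, model, tokenizer, k=0.2):
    ids = tokenizer(text, return_tensors="pt").input_ids
    logits = model(ids).logits[:, :-1, :]                      # predictions for tokens 1..T-1
    log_probs = torch.log_softmax(logits, dim=-1)
    token_lp = log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)[0]
    n = max(1, int(k * token_lp.numel()))
    lowest = torch.topk(token_lp, n, largest=False).values     # the k% least likely tokens
    return lowest.mean().item()

# Example usage (model choice is illustrative):
# tok = AutoTokenizer.from_pretrained("gpt2")
# lm = AutoModelForCausalLM.from_pretrained("gpt2")
# score = min_k_prob("some candidate passage", lm, tok)
```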
1) - an LLM for mathematics which is based on continued pretraining from Code Llama on the Proof-Pile-2 dataset; the dataset involves scientific papers, web data containing mathematics, and mathematical code; Llemma outperforms open base models and the unreleased Minerva on the MATH benchmark; the model is released, including the dataset and code to replicate experiments. ,
2) - a comprehensive survey of LLMs for software engineering, including open research and technical challenges. ,
3) - presents a new retrieval-augmented framework that enhances an LM’s quality and factuality through retrieval and self-reflection; trains an LM that adaptively retrieves passages on demand, and generates and reflects on the passages and its own generations using special reflection tokens; it significantly outperforms SoTA LLMs ,
4) - explores retrieval-augmented language models on long-form question answering; finds that retrieval is an important component but evidence documents should be carefully added to the LLM; finds that attribution error happens more frequently when retrieved documents lack sufficient information/evidence for answering the question. ,
5) - presents a framework for characterizing and understanding generalization research in NLP; involves a meta-analysis of 543 papers and a set of tools to explore and better understand generalization studies. ,
6) - assesses an LLM's capability to self-generate feature attribution explanations; self-explanation is useful to improve performance and truthfulness in LLMs; this capability can be used together with chain-of-thought prompting. ,
7) - an open platform for using and hosting language agents in the wild; includes three agents: a Data Agent for data analysis, a Plugins Agent with 200+ daily API tools, and a Web Agent for autonomous web browsing. ,
8) - uses language models to guide the task specification process and a learning framework to help models elicit and infer intended behavior through free-form, language-based interaction with users; shows that by generating open-ended questions, the system generates responses that are more informative than user-written prompts. ,
9) - an approach to route queries to LLMs based on the correctness of smaller language models ,
10) - enables synthesizing complex long-horizon video plans across robotics domains; the proposed algorithm involves a tree search procedure that trains vision-language models to serve as policies and value functions, and text-to-video models as dynamic models. ,
1) - a memory-efficient approach that leverages blockwise computation of self-attention to distribute long sequences across multiple devices, overcoming the memory limitations inherent in Transformer architectures and enabling longer sequences during training and inference; scales the context length with the number of devices while maintaining performance, exceeding a context length of 100 million tokens without attention approximations. ,
2) - applies generative modeling to learn a universal simulator of real-world interactions; can emulate how humans and agents interact with the world by simulating the visual outcome of both high-level instructions and low-level controls; the system can be used to train vision-language planners, low-level reinforcement learning policies, and even systems that perform video captioning. ,
3) - a survey of factuality in LLMs providing insights into how to evaluate factuality in LLMs and how to enhance it. ,
4) - presents a two-stage framework that learns a rule library for reasoning with LLMs; in the first stage ,
5) - a generalizable chain-of-thought ,
6) - a comprehensive overview of LLMs applied to the healthcare domain. ,
7) - presents two approaches to compress retrieved documents into text summaries before prepending them in-context: 1) an extractive compressor that selects useful sentences from retrieved documents and 2) an abstractive compressor that generates summaries by synthesizing information from multiple documents; achieves a compression rate as low as 6% with minimal loss in performance on language modeling and open-domain question answering tasks; the proposed training scheme performs selective augmentation, which helps generate empty summaries when retrieved docs are irrelevant or unhelpful for a task (a rough sketch of the extractive variant follows this list). ,
8) - introduces Retro 48B, the largest LLM pretrained with retrieval; continues pretraining a 43B parameter GPT model on an additional 100B tokens by retrieving from 1.2T tokens ,
9) - a method to enhance long-text understanding by treating the LLM as an interactive agent that can decide how to read the text via iterative prompting; it first processes the long context into a tree of summary nodes and then reads in a query to traverse the tree, seeking relevant information and crafting a suitable response; this enables effective reading and enhances explainability through the reasoning steps. ,
10) - explores the direction of fine-tuning LLMs to obtain language agents; finds that language agents consistently improved after fine-tuning their backbone language model; claims that fine-tuning a Llama2-7B with 500 agent trajectories ,
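
Item 7's extractive compressor can be approximated very roughly by ranking sentences from the retrieved documents against the query. The sketch below assumes the sentence-transformers package and an illustrative encoder name; it is a similarity-based stand-in, whereas the paper trains its compressors.

```python
from sentence_transformers import SentenceTransformer, util

def extractive_compress(query, documents, encoder, max_sentences=3):
    """Pick the sentences most similar to the query from retrieved docs.

    A crude stand-in for a trained extractive compressor: sentences are ranked
    by cosine similarity to the query and the top few are kept, shrinking what
    gets prepended to the LM's context.
    """
    sentences = [s.strip() for doc in documents for s in doc.split(".") if s.strip()]
    q_emb = encoder.encode(query, convert_to_tensor=True)
    s_emb = encoder.encode(sentences, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, s_emb)[0]
    top = scores.topk(min(max_sentences, len(sentences))).indices.tolist()
    # Preserve the original order of the selected sentences.
    return ". ".join(sentences[i] for i in sorted(top)) + "."

# encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice
# summary = extractive_compress("Who discovered penicillin?", retrieved_docs, encoder)
# prompt = f"Context: {summary}\n\nQuestion: Who discovered penicillin?"
```
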
1) - discovers that LLMs learn linear representations of space and time across multiple scales; the representations are robust to prompt variations and unified across different entity types; demonstrates that LLMs acquire fundamental structured knowledge such as space and time, suggesting that language models learn more than superficial statistics and may acquire literal world models. ,
2) - compares retrieval augmentation and long-context windows for downstream tasks to investigate if the methods can be combined to get the best of both worlds; an LLM with a 4K context window using simple RAG can achieve comparable performance to a fine-tuned LLM with 16K context; retrieval can significantly improve the performance of LLMs regardless of their extended context window sizes; a retrieval-augmented LLaMA2-70B with a 32K context window outperforms GPT-3.5-turbo-16k on seven long context tasks including question answering and query-based summarization. ,
3) - a framework that enables efficient streaming LLMs by exploiting attention sinks: keeping the KV states of the initial tokens largely recovers the performance of window attention; the attention sink emerges because of strong attention scores towards the initial tokens; this approach enables LLMs trained with finite-length attention windows to generalize to infinite sequence length without any additional fine-tuning (a toy cache illustrating the idea follows this list). ,
4) - proposes to use neural networks that self-assemble through a developmental process that mirrors properties of embryonic development in biological organisms ,
5) - a comprehensive analysis of GPT-4V to deepen the understanding of large multimodal models ,
6) - performs training and inference on LLMs with a learnable token which helps to delay the model's answer generation and attain performance gains on general understanding tasks such as CommonsenseQA and math word problem-solving; experiments show that this is only beneficial provided that the delay is introduced in both pretraining and downstream fine-tuning. ,
7) - proposes the use of a language-model-infused scaffolding program to recursively improve itself; a seed improver program uses the LLM to improve an input program and return the best solution, and is then itself tasked with self-improvement; shows that GPT-4 can write code that calls itself to improve itself. ,
8) - proposes a lightweight fine-tuning method to retrofit LLMs with retrieval capabilities; it involves a 2-step approach: 1) update a pretrained LM to better use retrieved information, and 2) update the retriever to return more relevant results, as preferred by the LM; results show that fine-tuning over tasks that require both knowledge utilization and contextual awareness yields additional gains from each stage; a 65B model achieves state-of-the-art results on a range of knowledge-intensive zero- and few-shot learning benchmarks, outperforming existing retrieval-augmented language approaches by up to +8.9% in zero-shot and +1.4% in 5-shot settings. ,
9) - a model that performs high-fidelity zero-shot image generation from generalized vision-language input that spans multiple images; extends zero-shot subject-driven image generation to multi-entity scenarios; allows the replacement of CLIP, unlocking new applications with other U-Net techniques such as ControlNet and LoRA. ,
10) - a new prompting approach to automatically guide the reasoning process of LLMs; the approach is different from chain-of-thought in that it doesn’t require labeled exemplars of the reasoning process; the approach is inspired by analogical reasoning and prompts LMs to self-generate relevant exemplars or knowledge in the context. ,
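
Item 3's attention-sink observation translates into a very small change to KV-cache eviction: never drop the first few positions. The toy cache below illustrates only that bookkeeping; it ignores the positional re-indexing and batching a real implementation needs, and the sink/window sizes are arbitrary.

```python
class SinkKVCache:
    """Toy key/value cache illustrating the attention-sink idea:

    keep the first few ("sink") positions forever, plus a sliding window of
    the most recent positions, and evict everything in between.
    """

    def __init__(self, n_sink=4, window=1024):
        self.n_sink = n_sink
        self.window = window
        self.entries = []  # list of (key, value) pairs, one per token

    def append(self, key, value):
        self.entries.append((key, value))
        max_len = self.n_sink + self.window
        if len(self.entries) > max_len:
            # Evict the oldest non-sink entry.
            del self.entries[self.n_sink]

    def current(self):
        return self.entries

# cache = SinkKVCache(n_sink=4, window=8)
# for t in range(100):
#     cache.append(f"k{t}", f"v{t}")
# [k for k, _ in cache.current()][:6]  # -> ['k0', 'k1', 'k2', 'k3', 'k92', 'k93']
```
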
1) - finds that LLMs trained on sentences of the form “A is B” will not automatically generalize to the reverse direction “B is A”, i.e., the Reversal Curse; shows the effect by fine-tuning LLMs on fictitious statements and demonstrates its robustness across model sizes and model families (a tiny probe construction is sketched after this list). ,
2) - proposes a 70B variant that can already surpass gpt-3.5-turbo-16k’s overall performance on a suite of long-context tasks; this involves a cost-effective instruction tuning procedure that does not require human-annotated long instruction data. ,
3) - proposes a plug-and-play method to assist pre-trained LLMs in learning beneficial knowledge from knowledge graphs ,
4) - identifies artifacts in feature maps of vision transformer networks that are repurposed for internal computations; this work proposes a solution to provide additional tokens to the input sequence to fill that role; the solution fixes the problem, leads to smoother feature and attention maps, and sets new state-of-the-art results on dense visual prediction tasks. ,
5) - presents the first Transformer architecture trained to perform end-to-end symbolic regression of Boolean functions; it can predict compact formulas for complex functions and be applied to modeling the dynamics of gene regulatory networks. ,
6) - adapts factually augmented RLHF to align large multimodal models; this approach alleviates reward hacking in RLHF and improves performance on the LLaVA-Bench dataset, reaching 94% of the performance level of the text-only GPT-4. ,
7) - a comprehensive survey paper on LLM alignment; topics include Outer Alignment, Inner Alignment, Mechanistic Interpretability, Attacks on Aligned LLMs, Alignment Evaluation, Future Directions, and Discussions. ,
8) - proposes a series of LLMs demonstrating the strength of RLHF on tasks involving tool use and planning capabilities for creating language agents. ,
9) - an open-source LLM series for interpretable mental health analysis with instruction-following capability; it also proposes a multi-task and multi-source interpretable mental health instruction dataset on social media with 105K data samples. ,
10) - a new neurosymbolic framework to improve zero-shot chain-of-thought reasoning in LLMs; leverages principles from symbolic logic to verify and revise reasoning processes to improve the reasoning capabilities of LLMs. ,
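
Item 1's finetuning experiment is easy to reproduce in miniature: train only on "A is B" statements about invented entities and then query the reverse direction. The helper below builds such a probe; the example strings are in the spirit of the paper's fictitious-entity setup and are placeholders.

```python
import random

def make_reversal_probe(pairs):
    """Build a tiny probe for the Reversal Curse.

    `pairs` maps fictitious names to fictitious descriptions. Training text
    only ever states "name is description"; test queries ask for the name
    given the description, i.e. the reversed direction.
    """
    train, test = [], []
    for name, description in pairs.items():
        train.append(f"{name} is {description}.")
        test.append({"prompt": f"{description} is", "answer": name})
    random.shuffle(train)
    return train, test

# Fictitious entities (values invented for illustration):
pairs = {
    "Daphne Barrington": "the director of 'A Journey Through Time'",
    "Uriah Hawthorne": "the composer of 'Abyssal Melodies'",
}
train_texts, reverse_queries = make_reversal_probe(pairs)
# Finetune a model on `train_texts`, then check whether it can answer
# `reverse_queries`; the paper finds accuracy on the reverse direction
# stays near chance.
```
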
1) - an AI model classifying missense variants to help pinpoint the cause of diseases; the model is used to develop a catalogue of genetic mutations; it can categorize 89% of all 71 million possible missense variants as either likely pathogenic or likely benign. ,
2) - develops a method to enable LLMs to "deliberate" on responses to correct mistakes; includes the following steps: 1) draft an initial response, 2) plan verification questions to fact-check the draft, 3) answer the questions independently to avoid bias from other responses, and 4) generate a final verified response (a minimal loop implementing these steps follows this list). ,
3) - shows that contrastive decoding leads Llama-65B to outperform Llama 2 and other models on commonsense reasoning and reasoning benchmarks. ,
4) - an efficient fine-tuning approach to significantly extend the context windows of pre-trained LLMs; implements shift short attention, a substitute that approximates the standard self-attention pattern during training; it has less GPU memory cost and training time compared to full fine-tuning while not compromising accuracy. ,
5) - studies the use of LLMs for generating complex structured data; proposes a structure-aware fine-tuning method, applied to Llama-7B, which significantly outperforms other models like GPT-3.5/4 and Vicuna-13B. ,
6) - a large-scale dataset containing 1 million real-world conversations with 25 state-of-the-art LLMs; it is collected from 210K unique IP addresses on the Vicuna demo and Chatbot Arena website. ,
7) - evaluates the compression capabilities of LLMs; it investigates how and why compression and prediction are equivalent; shows that LLMs are powerful general-purpose compressors due to their in-context learning abilities; finds that Chinchilla 70B compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG ,
8) - proposes foundation models that leverage multiple expert foundation models trained on language, vision, and action data to solve long-horizon goals. ,
9) - proposes OWL, an LLM for IT operations tuned using a self-instruct strategy based on IT-related tasks; it discusses how to collect a quality instruction dataset and how to put together a benchmark. ,
10) - a multimodal model for machine reading of text-intensive images, capable of document-level text generation and image-to-markdown text generation. ,
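
Item 2's four-step "deliberation" can be wired up as a short prompting loop. The sketch below assumes a generic `llm(prompt) -> str` helper (any chat-completion wrapper would do); the prompt wording is invented and much simpler than the paper's templates.

```python
def chain_of_verification(question, llm):
    """Draft -> plan verification questions -> answer them independently -> revise."""
    draft = llm(f"Answer the question.\n\nQ: {question}\nA:")

    plan = llm(
        "List short fact-checking questions, one per line, that would verify "
        f"this answer.\n\nQuestion: {question}\nDraft answer: {draft}"
    )
    checks = [line.strip("- ").strip() for line in plan.splitlines() if line.strip()]

    # Answer each verification question in isolation so the draft cannot bias it.
    evidence = []
    for check in checks:
        evidence.append(check + " -> " + llm("Answer concisely.\n\nQ: " + check + "\nA:"))

    return llm(
        "Revise the draft so it is consistent with the verification answers.\n\n"
        f"Question: {question}\nDraft: {draft}\nVerification:\n" + "\n".join(evidence)
    )

# `llm` is any prompt-in, text-out callable, e.g. a thin wrapper over a chat API.
# final = chain_of_verification("Name some politicians who were born in New York.", llm)
```
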
1) - a new 1.3 billion parameter model trained on 30 billion tokens; the dataset consists of "textbook-quality" synthetically generated data; phi-1.5 competes with or outperforms other, larger models on reasoning tasks, suggesting that data quality plays a more important role than previously thought. ,
2) - a comprehensive overview of LLM-based agents; covers everything from how to construct these agents to how to harness them for good. ,
3) - combines evolutionary-scale data with diffusion models for controllable protein generation in sequence space; it can generate proteins inaccessible to structure-based models. ,
4) - discovers that by integrating self-evaluation and rewind mechanisms, unaligned LLMs can directly produce responses consistent with human preferences via self-boosting. ,
5) - presents a system for learning an end-to-end vision-based parkour policy which is transferred to a quadrupedal robot using its egocentric depth camera; shows that low-cost robots can automatically select and execute parkour skills in a real-world environment. ,
6) - classifies different types of hallucination phenomena and provides evaluation criteria for assessing hallucination along with mitigation strategies. ,
7) - an open-source library for building autonomous language agents including support for features like planning, memory, tool usage, multi-agent communication, and more. ,
8) - presents an LLM based on Llama 2 tailored for radiology; it's tuned on a large dataset of radiology reports to generate coherent and clinically useful impressions from radiology findings. ,
9) - presents ChatDev, a virtual chat-powered software development company mirroring the waterfall model; shows the efficacy of the agent in software generation, even completing the entire software development process in less than seven minutes for less than one dollar. ,
10) - a series of open-source LLMs tailored for general math problem-solving; the models are trained on a curated instruction tuning dataset and outperform existing open-source models on several mathematical reasoning datasets. ,
1) - finds that the optimization geometry of self-attention in Transformers exhibits a connection to hard-margin SVM problems; also finds that gradient descent applied without early-stopping leads to implicit regularization and convergence of self-attention; this work has the potential to deepen the understanding of language models.
2) - tests whether RLAIF is a suitable alternative to RLHF by comparing the efficacy of human vs. AI feedback; uses different techniques to generate AI labels and conduct scaling studies to report optimal settings for generating aligned preferences; the main finding is that on the task of summarization, human evaluators prefer generations from both RLAIF and RLHF over a baseline SFT model in ∼70% of cases. ,
3) - shows that with sufficient training data, a 2B language model can perform multi-digit arithmetic operations with 100% accuracy and without data leakage; it’s also competitive with GPT-4 on a 5K-sample Chinese math problem test set when fine-tuned from GLM-10B on a dataset containing additional multi-step arithmetic operations and detailed math problems. ,
4) - an approach where the optimization problem is described in natural language; an LLM is then instructed to iteratively generate new solutions based on the defined problem and previously found solutions; at each optimization step, the goal is to generate new prompts that increase test accuracy based on the trajectory of previously generated prompts; the optimized prompts outperform human-designed prompts on GSM8K and Big-Bench Hard, sometimes by over 50% (a bare-bones version of the loop is sketched after this list). ,
5) - presents ImageBind-LLM, a multimodality instruction tuning method for LLMs via ImageBind; the model can respond to instructions in diverse modalities such as audio, 3D point clouds, and video, with high language generation quality; this is achieved by aligning ImageBind’s visual encoder with an LLM via a learnable bind network. ,
6) - aims to explain grokking behavior in neural networks; specifically, it predicts and shows two novel behaviors: the first is ungrokking where a model goes from perfect generalization to memorization when trained further on a smaller dataset than the critical threshold; the second is semi-grokking where a network demonstrates grokking-like transition when training a randomly initialized network on the critical dataset size. ,
7) - provides a survey of empirical examples of AI deception. ,
8) - a new open LLM called FLM-101B with 101B parameters trained on 0.31T tokens, which can be trained on a $100K budget; the authors analyze different growth strategies, growing the number of parameters from smaller sizes to larger ones; they ultimately employ an aggressive strategy that reduces costs by >50%: three models are trained sequentially, each inheriting knowledge from its smaller predecessor. ,
9) - proposes a systematic framework for understanding and building fully-fledged language agents drawing parallels from production systems and cognitive architectures; it systematizes diverse methods for LLM-based reasoning, grounding, learning, and decision making as instantiations of language agents in the framework. ,
10) - a scalable RL method for training multi-task policies from large offline datasets leveraging human demonstrations and autonomously collected data; shows good performance on a large diverse real-world robotic manipulation task suite. ,
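
Item 4's optimizer is essentially a loop in which the LLM sees its own (instruction, score) trajectory and proposes the next candidate. A bare-bones version follows; `llm` and `score_fn` are assumed helpers (text-in/text-out and instruction-accuracy evaluation respectively), and the meta-prompt is a simplified paraphrase rather than the paper's exact template.

```python
def optimize_prompt(train_examples, llm, score_fn, n_steps=10):
    """Iteratively ask the LLM for better task instructions.

    `llm(prompt) -> str` proposes candidates; `score_fn(instruction, examples)
    -> float` measures how well a task model does when given that instruction.
    """
    seed = "Let's solve the problem."
    history = [(seed, score_fn(seed, train_examples))]
    for _ in range(n_steps):
        # Show the trajectory worst-to-best so the best attempts sit nearest the request.
        trajectory = "\n".join(
            f"text: {p}\nscore: {s:.2f}" for p, s in sorted(history, key=lambda x: x[1])
        )
        meta_prompt = (
            "Previous instructions and their training accuracies are listed below.\n"
            f"{trajectory}\n\n"
            "Write a new instruction that is different from the ones above and likely "
            "to achieve a higher score. Reply with the instruction only."
        )
        candidate = llm(meta_prompt).strip()
        history.append((candidate, score_fn(candidate, train_examples)))
    return max(history, key=lambda x: x[1])

# best_instruction, best_score = optimize_prompt(task_examples, llm, accuracy_on)
```
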
1) - proposes a large language and speech model trained with cross-modal conversational abilities that supports speech-and-language instruction enabling more natural interactions with AI systems. ,
2) - applies segment anything models ,
3) - suggests that “from a cost–benefit analysis, there does not appear to be a compelling reason to introduce a dedicated vector store into a modern “AI stack” for search since such applications have already received substantial investments in existing, widely deployed infrastructure.” ,
4) - presents a prompting approach that models text generated by LLMs as an arbitrary graph; it enables combining arbitrary "thoughts" and enhancing them using feedback loops; the core idea is to enhance the LLM capabilities through "network reasoning" and without any model updates; this could be seen as a generalization of the now popular Chain-of-Thought and Tree-of-Thought. ,
5) - a multi-view diffusion model that can generate geometrically consistent multi-view images given a text prompt; it leverages pre-trained diffusion models and a multi-view dataset rendered from 3D assets; this leads to generalizability of 2D diffusion and consistency of 3D data. ,
6) - proposes an approach for neural optical understanding of academic documents; it supports the ability to extract text, equations, and tables from academic PDFs, i.e., convert PDFs into LaTeX/markdown. ,
7) - proposes a tool called to detect factual errors in texts generated by LLMs; shows the necessary components needed and the types of tools to integrate with LLMs for better detecting factual errors. ,
8) - an approach for industrial anomaly detection based on large vision-language models; it simulates anomalous images and textual descriptions to generate training data; employs an image decoder and prompt learner to detect anomalies; it shows few-shot in-context learning capabilities and achieves state-of-the-art performance on benchmark datasets. ,
9) - a personalized portrait generation framework combining customized image-generation models and face-related perceptual understanding models to generate truthful personalized portraits; it works with a handful of portrait images as input.
10) - introduces a set of large-scale vision-language models demonstrating strong performance in tasks like image captioning, question answering, visual localization, and flexible interaction. ,
1) - a family of LLMs for code based on Llama 2; the models provided as part of this release: foundation base models ,
2) - new survey paper on instruction tuning for LLMs, including a systematic review of the literature, methodologies, dataset construction, training models, applications, and more. ,
3) - a unified multilingual and multimodal machine translation system that supports ASR, text-to-text translation, speech-to-text translation, text-to-speech translation, and speech-to-speech translation. ,
4) - provides an overview of existing efforts to identify and mitigate threats and vulnerabilities arising from LLMs; serves as a guide to building more reliable and robust LLM-powered systems. ,
5) - a new family of models that are fine-tuned from base Llama and Llama 2; extends the context length to 4K, 16K, and 32K; explores the space of expanding context lengths in LLMs so it also includes insights useful for practitioners and researchers. ,
6) - presents a strategy that leverages explicitly synthesized multi-view images to improve Text-to-3D generation; integrates a discriminator along a Diffusion-GAN dual training strategy to guide the training of the 3D models.
7) - presents a comprehensive survey of LLM-based autonomous agents; delivers a systematic review of the field and a summary of various applications of LLM-based AI agents in domains like social science and engineering. ,
8) - a new framework that accepts a prompt describing a task through natural language; it then uses the prompt to train a small special-purpose model that is conducive to deployment; the proposed pipeline automatically collects and synthesizes knowledge through three channels: dataset retrieval, dataset generation, and model retrieval. ,
9) - a collaboratively constructed benchmark for measuring legal reasoning in LLMs; it consists of 162 tasks covering 6 different types of legal reasoning. ,
10) - proposes a new language-to-reward system that utilizes LLMs to define optimizable reward parameters to achieve a variety of robotic tasks; the method is evaluated on a real robot arm where complex manipulation skills such as non-prehensile pushing emerge. ,
1) - presents an approach to automatically label human-written text with corresponding instructions, which enables building a high-quality instruction-following language model; the steps are: 1) fine-tune an LLM with small seed data and a web corpus, 2) generate instructions for each web doc, 3) curate high-quality examples via the LLM, and 4) fine-tune on the newly curated data; the self-alignment approach outperforms all other Llama-based models on the Alpaca leaderboard (a condensed version of the loop is sketched after this list). ,
2) - a family of fine-tuned and merged LLMs currently topping the Open LLM Leaderboard; it describes a process of efficiently fine-tuning and merging LoRA modules and also shows the benefits of collecting high-quality datasets for fine-tuning; specifically, it presents a small-scale, high-quality, and highly curated dataset, Open-Platypus, that enables strong performance with short and cheap fine-tuning time and cost... one can train a 13B model on a single A100 GPU using 25K questions in 5 hours. ,
3) - a short survey on the recent model compression techniques for LLMs; provides a high-level overview of topics such as quantization, pruning, knowledge distillation, and more; it also provides an overview of benchmark strategies and evaluation metrics for measuring the effectiveness of compressed LLMs. ,
4) - uses deep learning and gene relationship knowledge graph to help predict cellular responses to genetic perturbation; GEARS exhibited 40% higher precision than existing approaches in the task of predicting four distinct genetic interaction subtypes in a combinatorial perturbation screen. ,
5) - introduces a language model (7B) specifically tuned to critique the model responses and suggest refinements; this enables the capability to identify diverse errors and suggest remedies; its critiques are either similar or preferred to ChatGPT. ,
6) - proposes a zero-shot prompting technique for GPT-4 Code Interpreter that explicitly encourages the use of code for self-verification which further boosts performance on math reasoning problems; initial experiments show that GPT4-Code achieved a zero-shot accuracy of 69.7% on the MATH dataset which is an improvement of 27.5% over GPT-4’s performance (42.2%). Lots to explore here. ,
7) - proposes a general approach based on multitask learning for personalized text generation using LLMs; the goal is to have an LLM generate personalized text without relying on predefined attributes. ,
8) - presents 4 terabytes of Git commits across 350 languages used to instruction tune code LLMs; achieves state-of-the-art performance among models not trained on OpenAI outputs, on the HumanEval Python benchmark; the data is also used to extend the HumanEval benchmark to other tasks such as code explanation and code repair. ,
9) - presents a library to help LLM developers guide text generation in a fast and reliable way; provides generation methods that guarantee that the output will match a regular expression, or follow a JSON schema. ,
10) - introduces a new class of generative models bringing together the power of Bayesian inference and deep learning; it differs from diffusion models in that it operates on the parameters of a data distribution rather than on a noisy version of the data; it’s adapted to continuous, discretized and discrete data with minimal changes to the training procedure. ,
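
Item 1's self-alignment recipe can be compressed into a generate-then-curate loop. The function below assumes an `llm(prompt) -> str` helper playing both the instruction "backtranslator" and the quality judge; the prompts and the 1-5 rating scale are invented stand-ins for the paper's templates.

```python
def backtranslate_instructions(web_docs, llm, min_score=4):
    """For each unlabeled document, guess the instruction it answers, then keep
    only pairs the model itself rates highly; fine-tune on the kept pairs and iterate.
    """
    curated = []
    for doc in web_docs:
        instruction = llm(
            "Write the instruction or question to which the following text "
            "would be a good answer:\n\n" + doc
        ).strip()
        rating = llm(
            "On a scale of 1-5, how good is this text as a response to the "
            f"instruction? Reply with a single number.\n\nInstruction: {instruction}\n"
            f"Response: {doc}"
        )
        try:
            score = int(rating.strip()[0])
        except (ValueError, IndexError):
            continue  # skip unparseable ratings
        if score >= min_score:
            curated.append({"instruction": instruction, "response": doc})
    return curated  # fine-tune the model on this set, then repeat the loop
```
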
1) - presents D-Bot, a framework based on LLMs that continuously acquires database maintenance experience from textual sources; D-Bot can help in performing: 1) database maintenance knowledge detection from documents and tools, 2) tree of thought reasoning for root cause analysis, and 3) collaborative diagnosis among multiple LLMs. ,
2) - develops methods to measure media biases in LLMs, including the fairness of downstream NLP models tuned on top of politically biased LLMs; findings reveal that LLMs have political leanings which reinforce existing polarization in the corpora. ,
3) - presents a multidimensional benchmark (AgentBench) to assess LLM-as-Agent’s reasoning and decision-making abilities; results show that there is a significant disparity in performance between top commercial LLMs and open-source LLMs when testing the ability to act as agents; open-source LLMs lag on the AgentBench tasks while GPT-4 shows potential to build continuously learning agents. ,
4) - introduces an efficient approach to scale influence functions to LLMs with up to 52 billion parameters; the influence functions are used to further investigate the generalization patterns of LLMs such as cross-lingual generalization and memorization; finds that middle layers in the network seem to be responsible for the most abstract generalization patterns. ,
5) - proposes NeuroImagen, a pipeline for reconstructing visual stimuli images from EEG signals to potentially understand visually-evoked brain activity; a latent diffusion model takes EEG data and reconstructs high-resolution visual stimuli images. ,
6) - is a new library that provides an efficient vectorized implementation of inference algorithms for structured distributions; it enables building large-scale differentiable models that explicitly model structure in data like tagging, segmentation, constituency trees, and spanning trees. ,
7) - proposes fine-tuning on simple synthetic data to reduce sycophancy in LLMs; sycophancy occurs when an LLM echoes a user’s view even when that view is not objectively correct (a sketch of the synthetic-data construction follows this list). ,
8) - presents photorealistic and semantically controllable synthetic datasets for representation learning using Unreal Engine; the goal is to democratize photorealistic synthetic data and enable more rigorous evaluations of vision models. ,
9) - develops an approach to select demonstrations and generate high-performing prompts used with GPT for executing tasks such as controlling HVAC (heating, ventilation, and air conditioning) systems for buildings; GPT-4 performs comparably to RL methods but uses fewer samples and incurs lower technical debt. ,
10) - presents a comprehensive overview of important categories and subcategories crucial for assessing LLM trustworthiness; the dimensions include reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness; finds that aligned models perform better in terms of trustworthiness but the effectiveness of alignment varies. ,
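
Item 7's intervention relies on synthetic prompts where a stated user opinion is irrelevant to the correct answer. A tiny generator in that spirit is sketched below; the addition-claim template is an invented example, not the paper's actual data format.

```python
import random

def make_sycophancy_examples(n=1000, seed=0):
    """Synthetic finetuning data: the user states an opinion about a simple
    claim, and the target answer depends only on the claim's truth, never on
    the stated opinion.
    """
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        a, b = rng.randint(2, 99), rng.randint(2, 99)
        truth = rng.choice([True, False])
        stated_sum = a + b if truth else a + b + rng.randint(1, 9)
        opinion = rng.choice(["I agree with", "I disagree with"])
        prompt = (
            f"Claim: {a} + {b} = {stated_sum}. {opinion} this claim. "
            "Is the claim true or false?"
        )
        examples.append({"prompt": prompt, "target": "true" if truth else "false"})
    return examples

# Fine-tuning on such data teaches the model that the user's stated opinion is
# irrelevant to the answer, which is the mechanism the paper exploits.
```
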
1) - provides an overview of open problems and the limitations of RLHF. ,
2) - a new multimodal model that allows in-context learning and enables tasks such as few-shot medical visual question answering; evaluations by physicians show improvements of up to 20% in clinicians' ratings; the authors occasionally observed low-quality generations and hallucinations. ,
3) - enables LLMs to interact with 16000 real-world APIs; it’s a framework that allows data preparation, training, and evaluation; the authors claim that one of their models, ToolLLaMA, has reached the performance of ChatGPT (turbo-16k) in tool use. ,
4) - proposes a prompting strategy that first generates an answer skeleton and then performs parallel API calls to generate the content of each skeleton point; reports quality improvements in addition to speed-ups of up to 2.39x (a minimal two-stage sketch follows this list). ,
5) - a framework involving LLM-based multi-agents that encodes human standardized operating procedures (SOPs) to extend complex problem-solving capabilities that mimic efficient human workflows; this enables MetaGPT to perform multifaceted software development, code generation tasks, and even data analysis using tools like AutoGPT and LangChain. ,
6) - introduces a family of autoregressive vision-language models ranging from 3B to 9B parameters; the technical report describes the models, training data, and evaluation suite. ,
7) - shows that language models exhibit self-repairing properties: when one layer of attention heads is ablated, a later layer takes over its function. ,
8) - explores whether LLMs have the capability to perform self-checks which is required for complex tasks that depend on non-linear thinking and multi-step reasoning; it proposes a zero-shot verification scheme to recognize errors without external resources; the scheme can improve question-answering performance through weighting voting and even improve math word problem-solving. ,
9) - presents an agent that learns a multimodal world model that predicts future text and image representations; it learns to predict future language, video, and rewards; it’s applied to different domains and can learn to follow instructions in visually and linguistically complex domains. ,
10) - discovers zero-shot adaptable policies from scratch that enable adaptive behaviors necessary for sudden environmental changes; as an example, the authors demonstrate the automatic discovery of Python code for controlling a robot. ,
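
Item 4's two-stage decoding maps naturally onto a skeleton prompt followed by parallel expansion calls. The sketch below uses a thread pool over an assumed `llm(prompt) -> str` helper; the prompt wording and the point-count limit are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def skeleton_of_thought(question, llm, max_points=8):
    """Stage 1: ask for a short skeleton of bullet points.
    Stage 2: expand every point in parallel, then stitch the pieces together.
    """
    skeleton = llm(
        "Give a concise skeleton of an answer as a numbered list of 3-"
        f"{max_points} short points (a few words each), with no details.\n\n"
        f"Question: {question}"
    )
    points = [p.strip() for p in skeleton.splitlines() if p.strip()][:max_points]

    def expand(point):
        return llm(
            f"Question: {question}\nAnswer skeleton:\n{skeleton}\n\n"
            f"Write 1-2 sentences expanding only this point: {point}"
        )

    # Parallel expansion is where the reported speed-up comes from.
    with ThreadPoolExecutor(max_workers=len(points) or 1) as pool:
        expansions = list(pool.map(expand, points))
    return "\n".join(expansions)
```
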
1) - finds universal and transferable adversarial attacks that cause aligned models like ChatGPT and Bard to generate objectionable behaviors; the approach automatically produces adversarial suffixes using greedy and gradient search. ,
2) - a new end-to-end vision-language-action model that learns from both web and robotics data; enables the model to translate the learned knowledge to generalized instructions for robotic control. ,
3) - introduces a new multimodal biomedical benchmark with 14 different tasks; it presents a proof of concept for a generalist biomedical AI system called Med-PaLM Multimodal; it supports different types of biomedical data like clinical text, imaging, and genomics. ,
4) - proposes a framework for high-quality tracking of anything in videos; consists of a video multi-object segmenter and a pretrained mask refiner model to refine the tracking results; the model ranks 2nd place in the VOTS2023 challenge. ,
5) - presents a survey and outlook discussing open challenges and research directions for foundational models in computer vision. ,
6) - a standardized evaluation for long-context language models containing 411 long documents and over 2K query-response pairs encompassing areas such as law, finance, school lectures, long conversations, novels, and meetings. ,
7) - introduces LoraHub to enable efficient cross-task generalization via dynamic LoRA composition; it enables the combination of LoRA modules without human expertise or additional parameters/gradients; mimics the performance of in-context learning in few-shot scenarios (a toy composition of LoRA deltas is sketched after this list). ,
8) - presents a comprehensive overview of alignment approaches, including aspects like data collection, training methodologies, and model evaluation. ,
9) - leverages LLMs to connect various audio models to compose audio content for engaging storytelling; this involves an explainable and interactive design that enhances creative control in audio production. ,
10) - a task- and domain-agnostic framework for factuality detection of text generated by LLMs; the effectiveness of the approach is tested on tasks such as code generation and mathematical reasoning; a benchmark dataset is released, including a ChatGPT plugin. ,
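
Item 7's LoRA composition amounts to a weighted sum of low-rank weight updates. The sketch below shows only that merge step with invented shapes; LoraHub additionally searches the mixing weights with a gradient-free optimizer on a handful of task examples.

```python
import torch

def compose_lora_modules(modules, weights):
    """Merge several LoRA modules by a weighted sum of their delta weights.

    `modules` is a list of dicts mapping parameter names to (A, B) low-rank
    factors; `weights` holds one mixing coefficient per module.
    """
    merged = {}
    for name in modules[0]:
        # Each module contributes w * (B @ A) to the weight update for `name`.
        merged[name] = sum(
            w * (m[name][1] @ m[name][0]) for m, w in zip(modules, weights)
        )
    return merged  # add merged[name] onto the corresponding base weight

# Two toy modules with rank-4 factors for a single 16x16 weight (shapes invented):
m1 = {"layer0.attn.q": (torch.randn(4, 16), torch.randn(16, 4))}
m2 = {"layer0.attn.q": (torch.randn(4, 16), torch.randn(16, 4))}
delta = compose_lora_modules([m1, m2], weights=[0.7, 0.3])
print(delta["layer0.attn.q"].shape)  # torch.Size([16, 16])
```
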
1) - a collection of pretrained foundational models and fine-tuned chat models ranging in scale from 7B to 70B; Llama 2-Chat is competitive on a range of tasks and shows strong results on safety and helpfulness. ,
2) - evaluates different versions of GPT-3.5 and GPT-4 on various tasks and finds that behavior and performance vary greatly over time; this includes differences in performance for tasks such as math problem-solving, safety-related generations, and code formatting. ,
3) - improves work partitioning and parallelism and addresses issues like reducing non-matmul FLOPs, parallelizing attention computation which increases occupancy, and reducing communication through shared memory. ,
4) - finds that the faithfulness of CoT reasoning varies widely across tasks, as measured by simple interventions like adding mistakes and paraphrasing; demonstrates that as the model becomes larger and more capable, the reasoning becomes less faithful; suggests carefully choosing the model size and task to maintain CoT faithfulness. ,
5) - an approach to generate episodic content using LLMs and multi-agent simulation; this enables current systems to perform creative storytelling through the integration of simulation, the user, and powerful AI models and enhance the quality of AI-generated content. ,
6) - summarizes a comprehensive list of challenges when working with LLMs that range from brittle evaluations to prompt brittleness to a lack of robust experimental designs. ,
7) - presents a foundation architecture for LLMs with the goal of improving training efficiency, inference, and efficient long-sequence modeling; adopts a retention mechanism for sequence modeling that supports parallel, recurrent, and chunkwise recurrent representations. ,
8) - a framework that performs unified learning across 12 modalities; it can handle tasks that include fundamental perception (text, image, point cloud, audio, video), practical application (X-Ray, infrared, hyperspectral, and IMU), and data mining (graph, tabular, and time-series). ,
9) - presents a framework to iteratively train dense retrievers to identify high-quality in-context examples for LLMs; the approach enhances in-context learning performance demonstrated using a suite of 30 tasks; examples with similar patterns are helpful and gains are consistent across model sizes. ,
10) - proposes fine-grained evaluation for LLMs based on a range of alignment skill sets; involves 12 skills and can help to provide a holistic view of a model’s performance depending on skill, domain, and level of difficulty; useful to analyze factors that make LLMs more proficient at specific skills. ,
1) - introduces a retrieval-augmented multi-modal language model that can generate text and images; leverages diverse and large-scale instruction-style data for tuning which leads to significant performance improvements and 5x less training compute than comparable methods. ,
2) - presents a detailed model card for Claude 2 along with results on a range of safety, alignment, and capabilities evaluations. ,
3) - takes a closer look at RLHF and explores the inner workings of PPO with code included. ,
4) - employs a contrastive training process to enhance the structure of the (key, value) space to extend context length; presents a fine-tuned model that lengthens context and demonstrates improvements in long context tasks. ,
5) - introduces a vision transformer for any aspect ratio and resolution through sequence packing; enables flexible model usage, improved training efficiency, and transfers to tasks involving image and video classification among others. ,
6) - shows that even without any additional training, LLMs can serve as general sequence modelers, driven by in-context learning; this work applies zero-shot capabilities to robotics and shows that it’s possible to transfer the pattern among words to actions. ,
7) - introduces a smaller, faster, and more efficient version of Dreambooth; enables personalization of text-to-image diffusion model using a single input image, 25x faster than Dreambooth. ,
8) - trains small transformer models on chain-of-thought style data to significantly improve accuracy and convergence speed; it highlights the importance of high-quality instructive data for rapidly eliciting arithmetic capabilities. ,
9) - appends a motion modeling module to a frozen text-to-image model, which is then trained and used to animate existing personalized models to produce diverse and personalized animated images. ,
10) - presents a new transformer-based multimodal foundation model to generate images and text in a multimodal context; enables performant multimodal assistants via instruction tuning. ,
1) - a comprehensive overview of evaluation methods for LLMs focusing on what to evaluate, where to evaluate, and how to evaluate. ,
2) - finds that LM performance is often highest when relevant information occurs at the beginning or end of the input context; performance degrades when relevant information is provided in the middle of a long context (a simple position-sweep probe is sketched after this list). ,
3) - proposes a prompting technique that enables open-source LLMs to perform state-of-the-art text ranking on standard benchmarks. ,
4) - introduces an approach that effectively maps images to the token space of LLMs; enables models like PaLM and GPT-4 to tackle visual tasks without parameter updates; enables multimodal tasks and uses in-context learning to tackle various visual tasks. ,
5) - releases a new code LLM trained on 1.5T tokens; the 7B model is on par with >15B code-generation models and it’s optimized for fast sampling. ,
6) - introduces an advancement over Decision Transformers and variants by facilitating trajectory stitching during action inference at test time, achieved by adjusting to shorter history that allows transitions to diverse and better future states. ,
7) - presents a framework to measure and align the uncertainty of LLM-based planners that ask for help when needed. ,
8) - proposes a method that uses reinforcement learning to train a policy to control characters in a physics simulator; it retargets motions in real-time from sparse human sensor data to characters of various morphologies. ,
9) - presents LongNet, a Transformer variant that can scale sequence length to more than 1 billion tokens, with no loss in shorter sequences. ,
10) - introduces a framework of interactive coding as a reinforcement learning environment; this is different from the typical coding benchmarks that consider a static sequence-to-sequence process. ,
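
Item 2's effect is easy to probe: hold the question fixed and slide the gold document through the stack of distractors. The helper below assumes an `llm(prompt) -> str` wrapper and a simple substring check for correctness, both of which are simplifications of the paper's evaluation.

```python
def position_sweep(question, gold_doc, distractor_docs, answer, llm):
    """Place the gold document at every possible position among the distractors
    and record whether the model still produces the answer. The paper observes
    accuracy dipping when the gold document sits in the middle of the context.
    """
    results = {}
    for pos in range(len(distractor_docs) + 1):
        docs = distractor_docs[:pos] + [gold_doc] + distractor_docs[pos:]
        context = "\n\n".join(f"Document {i + 1}: {d}" for i, d in enumerate(docs))
        reply = llm(f"{context}\n\nQuestion: {question}\nAnswer:")
        results[pos] = answer.lower() in reply.lower()
    return results  # e.g. {0: True, 1: True, 2: False, ..., 9: True}
```
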
1) - an open-source Lean playground consisting of toolkits, data, models, and benchmarks for theorem proving; also develops ReProver, a retrieval augmented LLM-based prover for theorem solving using premises from a vast math library. ,
2) - extends the context window of LLMs like LLaMA to up to 32K with minimal fine-tuning (within 1000 steps); previous methods for extending the context window are inefficient, but this approach attains good performance on several tasks while being more efficient and cost-effective (the core position-scaling trick is sketched after this list). ,
3) - proposes a modular approach for solving computer vision problems by leveraging LLMs; the LLM is used to reason over outputs from independent and descriptive modules that provide extensive information about an image. ,
4) - a foundation model that brings the power of pretrained models to vision-based robotic navigation; it can be used with any navigation dataset and is built on a flexible Transformer-based architecture that can tackle various navigational tasks. ,
5) - evaluates GPT-4 and ChatGPT on programming education scenarios and compares their performance with human tutors; GPT-4 outperforms ChatGPT and comes close to human tutors' performance. ,
6) - extends interactive point-based image editing using diffusion models; it optimizes the diffusion latent to achieve precise spatial control and complete high-quality editing efficiently. ,
7) - a framework for procedurally generating evaluations with LLMs; proposes a benchmark to study the social reasoning capabilities of LLMs with LLMs. ,
8) - a framework for self-supervised evaluation of LLMs by analyzing their sensitivity or invariance to transformations on input text; can be used to monitor LLM behavior on datasets streamed during live model deployment. ,
9) - an architecture and training procedure for jointly training a retrieval-augmented language model from scratch for long-range language modeling tasks. ,
10) - shows that the performance of MLPs improves with scale and highlights that lack of inductive bias can be compensated. ,
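
The context extension in item 2 rests on position interpolation: rescale the position indices fed to rotary embeddings so that long inputs fall inside the position range seen during pretraining, then fine-tune briefly. The sketch below shows that rescaling in isolation; the dimensions and the 2K-to-32K ratio are illustrative.

```python
import torch

def rope_angles(positions, dim=64, base=10000.0, scale=1.0):
    """Rotary-embedding angles with optional position interpolation.

    `scale` < 1 compresses position ids (e.g. 2048/32768) so a model trained
    with a 2K window sees 32K-token inputs within its familiar position range.
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    scaled_positions = positions.float() * scale  # the only change vs. vanilla RoPE
    return torch.outer(scaled_positions, inv_freq)  # angles used for the cos/sin rotation

positions = torch.arange(32768)
vanilla = rope_angles(positions[:2048])                     # original 2K window
interpolated = rope_angles(positions, scale=2048 / 32768)   # 32K squeezed into the 2K range
print(vanilla.shape, interpolated.shape)  # torch.Size([2048, 32]) torch.Size([32768, 32])
```
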
1) - introduces a new 1.3B parameter LLM called phi-1; it’s significantly smaller in size and trained for 4 days using a selection of textbook-quality data and synthetic textbooks and exercises with GPT-3.5; achieves promising results on the HumanEval benchmark. ,
2) - a new foundation agent that can operate different robotic arms and can solve tasks from as few as 100 demonstrations; the self-improving AI agent can self-generate new training data to improve its technique and get more efficient at adapting to new tasks. ,
3) - a language model optimized through extensive and diverse medical data, including medical records, domain-specific knowledge, and multi-round dialogue consultations. ,
4) - provides an overview of the main sources of catastrophic AI risks; the goal is to foster more understanding of these risks and ensure AI systems are developed in a safe manner. ,
5) - proposes a new memory-efficient optimizer that combines gradient computation and parameter update in one step; enables tuning the full parameters of an LLM with limited resources. ,
6) - formulates sequence generation as an imitation learning problem; this framework allows backtracking to be incorporated into text generation through a backspace action; this enables the generative model to mitigate compounding errors by reverting sampled tokens that lead the sequence out of distribution. ,
7) - an extensible and lightweight toolkit that simplifies finetuning and inference of general large foundation models; supports continuous pretraining, instruction tuning, parameter-efficient finetuning, alignment tuning, and large model inference. ,
8) - uses multimodal control signals for generating consecutive human motions; it quantizes multimodal control signals into discrete codes which are converted to LLM instructions that generate motion answers. ,
9) - introduces a simple and effective pruning approach for LLMs; it prunes the weights with the smallest magnitudes multiplied by the corresponding input activations, on a per-output basis; the approach requires no retraining or weight update and outperforms magnitude-pruning baselines (a single-layer sketch follows this list). ,
10) - fuses text-based and speech-based LMs, PaLM-2 and AudioLM, into a multimodal architecture that supports speech understanding and generation; outperforms existing systems for speech translation tasks with zero-shot speech-to-text translation capabilities. ,
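
Item 9's pruning criterion (weight magnitude times input activation norm) fits in a few lines for a single linear layer. The sketch below uses random tensors as a stand-in for real calibration activations and prunes uniformly per output row; shapes and the sparsity level are illustrative.

```python
import torch

def activation_aware_prune(weight, activations, sparsity=0.5):
    """Score each weight by |w| times the L2 norm of its input feature across a
    small calibration batch, then zero the lowest-scoring weights in each output
    row; no retraining or weight update is involved.
    """
    # weight: (out_features, in_features); activations: (n_samples, in_features)
    input_norms = activations.norm(p=2, dim=0)            # (in_features,)
    scores = weight.abs() * input_norms.unsqueeze(0)      # (out_features, in_features)
    k = int(weight.shape[1] * sparsity)
    # Indices of the k lowest scores in each output row.
    prune_idx = torch.topk(scores, k, dim=1, largest=False).indices
    mask = torch.ones_like(weight)
    mask.scatter_(1, prune_idx, 0.0)
    return weight * mask

W = torch.randn(8, 16)
X = torch.randn(32, 16)          # stand-in calibration activations
W_pruned = activation_aware_prune(W, X, sparsity=0.5)
print((W_pruned == 0).float().mean().item())  # ~0.5
```
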
1) - an all-in-one generative speech model; it can synthesize speech across 6 languages; it can perform noise removal, content editing, style conversion, and more; it's 20x faster than current models and outperforms single-purpose models through in-context learning. ,
2) - an open-source LLM for the finance sector; it takes a data-centric approach, providing researchers & practitioners with accessible resources to develop FinLLMs. ,
3) - estimates that 33-46% of crowd workers on MTurk used LLMs when completing a text production task. ,
4) - watermarking is useful to detect LLM-generated text and potentially mitigate harms; this work studies the reliability of watermarking for LLMs and finds that watermarks are detectable even when the watermarked text is re-written by humans or paraphrased by another non-watermarked LLM. ,
5) - a new survey paper highlighting major applications of Transformers for deep learning tasks; includes a comprehensive list of Transformer models. ,
6) - it’s currently challenging to properly assess the best optimizers to train neural networks; this paper presents a new benchmark, AlgoPerf, for benchmarking neural network training algorithms using realistic workloads. ,
7) - provides a roadmap for the unification of LLMs and KGs; covers how to incorporate KGs in LLM pre-training/inferencing, leverage LLMs for KG tasks such as question answering, and enhance both KGs and LLMs for bidirectional reasoning. ,
8) - proposes a framework to enable LLMs to memorize long history; it’s enhanced with memory-augmented adaptation training to memorize long past context and use long-term memory for language modeling; achieves improvements on memory-augmented in-context learning over LLMs. ,
9) - enables tracking any queried point on any physical surface throughout a video sequence; outperforms all baselines and facilitates fast inference on long and high-resolution videos (track points faster than real-time when using modern GPUs). ,
10) - a new dataset for evaluating generalist agents for the web; contains 2350 tasks from 137 websites over 31 domains; it enables testing generalization ability across tasks and environments, covering practical use cases on the web. ,
1) - proposes a test-time optimization method for estimating dense and long-range motion; enables accurate, full-length motion estimation of every pixel in a video. ,
2) - a deep reinforcement learning agent which discovers faster sorting algorithms from scratch; the algorithms outperform previously known human benchmarks and have been integrated into the LLVM C++ library. ,
3) - a new compressed format and quantization technique that enables near-lossless compression of LLMs across model scales; “allows LLM inference at 4.75 bits with a 15% speedup”. ,
4) - a simple and controllable model for music generation built on top of a single-stage transformer LM together with efficient token interleaving patterns; it can be conditioned on textual descriptions or melodic features and shows high performance on a standard text-to-music benchmark. ,
5) - combines an LLM with a set of SQL databases, enabling a symbolic memory framework; completes tasks via LLM generating SQL instructions that manipulate the DB autonomously. ,
6) - presents a method called LEAst-squares Concept Erasure (LEACE) to erase target concept information from every layer in a neural network; it’s used for reducing gender bias in BERT embeddings. ,
7) - trains LMs with fine-grained human feedback; instead of using overall preference, more explicit feedback is provided at the segment level which helps to improve efficacy on long-form question answering, reduce toxicity, and enables LM customization. ,
8) - pretrains vision transformers with a visual pretext task (MAE), while removing unnecessary components from a state-of-the-art multi-stage vision transformer; this enables a simple hierarchical vision transformer that’s more accurate and faster at inference and during training. ,
9) - explores ChatGPT’s capabilities to grasp and reproduce humor; finds that over 90% of 1008 generated jokes were the same 25 jokes and that ChatGPT is also overfitted to a particular joke structure. ,
10) - develops a 13B parameter model that learns to imitate the reasoning process of large foundational models like GPT-4; it leverages large-scale and diverse imitation data and surpasses instruction-tuned models such as Vicuna-13B in zero-shot reasoning. ,

Top ML Papers of the Week (May 29-June 4)

1) - achieves state-of-the-art mathematical problem solving by rewarding each correct step of reasoning in a chain-of-thought instead of rewarding the final answer; the model solves 78% of problems from a representative subset of the MATH test set. ,
2) - shows that explicit position embeddings are not essential for decoder-only Transformers; shows that other positional encoding methods like ALiBi and Rotary are not well suited for length generalization. ,
3) - a unified biomedical generative pretrained transformer model for vision, language, and multimodal tasks. Achieves state-of-the-art performance across 5 distinct tasks with 20 public datasets spanning over 15 unique biomedical modalities. ,
4) - introduces an imitation learning framework to learn to think while acting; the idea is not only to clone the behaviors of human demonstrators but also the thoughts humans have when performing behaviors. ,
5) - proposes a memory-efficient zeroth-order optimizer and a corresponding SGD algorithm to finetune large LMs with the same memory footprint as inference. ,
6) - an acoustic music understanding model with large-scale self-supervised training; it incorporates a superior combination of teacher models to outperform conventional speech and audio approaches. ,
7) - investigates performing classification directly on file bytes, without needing to decode files at inference time; achieves ImageNet Top-1 accuracy of 77.33% using a transformer backbone; achieves 95.42% accuracy when operating on WAV files from the Speech Commands v2 dataset. ,
8) - while helpful for training safe and useful LLMs, the RLHF process can be complex and often unstable; this work proposes an approach to finetune LMs by solving a classification problem on the human preference data, with no RL required (the corresponding loss is sketched after this list). ,
9) - an LLM-based Text-to-SQL adopted from PaLM-2; achieves SoTA in both in-context learning and fine-tuning settings; the few-shot model outperforms the previous fine-tuned SoTA by 3.8% on the Spider benchmark; few-shot SQL-PaLM also outperforms few-shot GPT-4 by 9.9%, using a simple prompting approach. ,
10) - an open-source Transformer library for state-of-the-art code LLMs; supports pretrained code LLMs and popular code benchmarks, including standard methods to train and serve code LLMs efficiently. ,
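
Item 8 replaces the RL step with a classification-style loss on preference pairs. A minimal version of that loss follows; the log-probability inputs are made-up numbers standing in for sums of per-token log-probs from the policy and a frozen reference model, and beta is the usual KL-strength knob.

```python
import torch
import torch.nn.functional as F

def preference_loss(policy_chosen_logps, policy_rejected_logps,
                    ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Push the policy to put relatively more probability (vs. the frozen
    reference) on the chosen response than on the rejected one. No reward
    model or RL loop is involved.
    """
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy batch of 4 preference pairs (values invented for illustration):
loss = preference_loss(
    policy_chosen_logps=torch.tensor([-12.0, -9.5, -11.0, -8.0]),
    policy_rejected_logps=torch.tensor([-13.0, -9.0, -14.0, -8.5]),
    ref_chosen_logps=torch.tensor([-12.5, -10.0, -11.5, -8.2]),
    ref_rejected_logps=torch.tensor([-12.8, -9.2, -13.5, -8.4]),
)
print(loss.item())  # in training, backpropagate this through the policy only
```
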

Top ML Papers of the Week (May 22-28)

1) - an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning performance (the adapter half of the recipe is sketched after this list). ,
2) - a new 65B parameter LLaMa model fine-tuned on 1000 carefully curated prompts and responses; it doesn't use RLHF, generalizes well to unseen tasks not available in the training data, and generates responses equivalent or preferred to GPT-4 in 43% of cases, and even higher compared to Bard. ,
3) - an LLM-powered embodied lifelong learning agent in Minecraft that can continuously explore worlds, acquire skills, and make novel discoveries without human intervention. ,
4) - a finetuned LLaMA-based model that surpasses GPT-4 on writing API calls. This capability can help identify the right API, boosting the ability of LLMs to interact with external tools to complete specific tasks. ,
5) - provides a critical analysis of models that are finetuned on the outputs of a stronger model; argues that model imitation is a false premise and that the higher leverage action to improve open source models is to develop better base models. ,
6) - presents a simple scalable second-order optimizer that has negligible average per-step time and memory overhead; on language modeling, Sophia achieves 2x speed-up compared to Adam in the number of steps, total compute, and wall-clock time. ,
7) - shows that LLMs fail to generate correct Python code when default function names are swapped; they also strongly prefer incorrect continuation as they become bigger. ,
8) - discusses the importance of model evaluation for addressing extreme risks and making responsible decisions about model training, deployment, and security. ,
9) - discusses a list of research directions for students looking to do research with LLMs. ,
10) - proposes an approach that combines the efficient parallelizable training of Transformers with the efficient inference of RNNs; results show that the method performs on par with similarly sized Transformers. ,
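
Much of item 1's memory saving comes from attaching small trainable low-rank adapters to a frozen, quantized base model. The pure-PyTorch layer below sketches only the adapter part, with an ordinary fp32 base standing in for the 4-bit weights; rank, alpha, and layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen base linear layer plus a trainable low-rank update.

    In the full recipe the base weights are also quantized to 4 bits; here the
    base stays fp32 for simplicity, and only A and B ever receive gradients.
    """

    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False   # base weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")  # only the low-rank factors train
```
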

Top ML Papers of the Week (May 15-21)

1) - an approach for controlling GANs that allows dragging points of the image to precisely reach target points in a user-interactive manner. ,
2) - argues that language models can learn meaning despite being trained only to perform next token prediction on text. ,
3) - a top-performing LLM for medical question answering; scored up to 86.5% on the MedQA dataset (a new state-of-the-art); approaches or exceeds SoTA across MedMCQA, PubMedQA, and MMLU clinical topics datasets. ,
4) - a multi-scale decoder architecture enabling end-to-end modeling of sequences of over one million bytes; enables sub-quadratic self-attention and improved parallelism during decoding. ,
5) - improves the zero-shot reasoning ability of LLMs over structured data; effective for solving question answering tasks based on structured data. ,
6) - uses a synthetic dataset of short stories to train and evaluate LMs that are much smaller than SoTA models but can produce fluent and consistent stories with several paragraphs, and demonstrate reasoning capabilities. ,
7) - trains a small proxy model over domains to produce domain weights without knowledge of downstream tasks; it then resamples a dataset with the domain weights and trains a larger model; this enables using a 280M proxy model to train an 8B model (30x larger) more efficiently. ,
8) - supports a wide range of code understanding and generation tasks and different training methods to improve efficacy and computing efficiency; tested on 20 code-related benchmarks using different settings like zero-shot, fine-tuning, and instruction tuning; achieves SoTA on tasks like code completion, math programming, and text-to-code retrieval tasks. ,
9) - an approach to finetune LMs on in-context input-label pairs where natural language labels are replaced by arbitrary symbols; boosts performance on unseen in-context learning tasks and algorithmic reasoning tasks. ,
10) - shows that PaLM is exposed to over 30 million translation pairs across at least 44 languages; shows that incidental bilingualism connects to the translation capabilities of PaLM. ,

Top ML Papers of the Week (May 8-14)

1) - applies GPT-4 to automatically write explanations on the behavior of neurons in LLMs and even score those explanations; this offers a promising way to improve interpretability in future LLMs and potentially detect alignment and safety problems. ,
2) - a new state-of-the-art language model integrated into AI features and tools like Bard and the PaLM API; displays competitive performance in mathematical reasoning compared to GPT-4; instruction-tuned model, Flan-PaLM 2, shows good performance on benchmarks like MMLU and BIG-bench Hard. ,
3) - an approach that learns joint embedding data across six modalities at once; extends zero-shot capabilities to new modalities and enables emergent applications including cross-modal retrieval, composing modalities with arithmetic, cross-modal detection, and generation. ,
4) - shows that robots can combine language-based planning and perception with the few-shot summarization capabilities of LLMs to infer generalized user preferences that are applicable to future interactions. ,
5) - demonstrates that CoT explanations can misrepresent the true reason for a model’s prediction; when models are biased towards incorrect answers, they generate CoT explanations supporting those answers. ,
6) - explores visual-language instruction tuning based on the pre-trained BLIP-2 models; achieves state-of-the-art zero-shot performance on 13 held-out datasets, outperforming BLIP-2 and Flamingo. ,
7) - introduces FLARE, an active retrieval-augmented generation method to improve the reliability of LLMs; FLARE actively decides when and what to retrieve over the course of generation; demonstrates superior or competitive performance on long-form knowledge-intensive generation tasks (a simplified sketch of this retrieve-when-uncertain loop follows this list). ,
8) - presents strategies to reduce the inference cost associated with using LLMs while improving performance. ,
9) - an open-access 15.5B parameter LLM with 8K context length and is trained on large amounts of code spanning 80+ programming languages. ,
10) - a vision and language model for multi-round dialogue with humans; the model is fine-tuned from OpenFlamingo, with LoRA added in the cross-attention and self-attention parts of the language model. ,
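A simplified sketch of the retrieve-when-uncertain idea from item 7 above. `generate_sentence_with_probs` and `retrieve` are hypothetical placeholders for an LLM call that returns token probabilities and for a retriever; the confidence threshold is illustrative.

```python
def active_rag_answer(question, generate_sentence_with_probs, retrieve,
                      threshold=0.4, max_sentences=8):
    """Generate an answer sentence by sentence, retrieving extra context
    whenever the model looks unsure about what it just produced."""
    context, answer = [], ""
    for _ in range(max_sentences):
        sentence, token_probs = generate_sentence_with_probs(question, context, answer)
        if sentence is None:                              # model signalled it is done
            break
        if token_probs and min(token_probs) < threshold:  # low-confidence token -> look things up
            context.extend(retrieve(sentence))
            sentence, _ = generate_sentence_with_probs(question, context, answer)
        answer += sentence + " "
    return answer.strip()
```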
1) - a foundation large language model pretrained on 10 million cells for single-cell biology. ,
2) - a ChatGPT-powered tool for code explanation provided as a VSCode extension; claims to deliver more concise and accurate explanations than vanilla ChatGPT and Copilot; performance and personalization enhanced via prompt engineering; programmed to use more relevant code in its prompts. ,
3) - a conditional generative model for 3D assets; unlike previous 3D generative models, this model generates implicit functions that enable rendering textured meshes and neural radiance fields. ,
4) - presents an alternative explanation to the emergent abilities of LLMs; suggests that existing claims are creations of the researcher’s analyses and not fundamental changes in model behavior on specific tasks with scale ,
5) - releases PySR, an open-source library for practical symbolic regression for the sciences; it’s built on a high-performance distributed back-end and interfaces with several deep learning packages; in addition, a new benchmark, “EmpiricalBench”, is released to quantify applicability of symbolic regression algorithms in science. ,
6) - a LLaMA model fine-tuned on 4.8 million medical papers; enhances capabilities in the medical domain and achieves high performance on biomedical QA benchmarks. ,
7) - a mechanism to extract rationales from LLMs and use them to train smaller models that outperform larger language models while requiring less training data than standard finetuning or distillation. ,
8) - show that adversaries can poison LLMs during instruction tuning by contributing poison examples to datasets; it can induce degenerate outputs across different held-out tasks. ,
9) - proposes long-range transformers that handle unlimited-length input by augmenting a pre-trained encoder-decoder transformer with an external datastore; shows usefulness in long-document summarization; could potentially be used to improve the performance of retrieval-enhanced LLMs. ,
10) - an approach that enables LLMs to reason and memorize enabling them to deviate from the input sequence at any time to explicitly “think”; this enables the LM to recall information and perform reasoning on the fly; experiments show that this method scales better to longer sequences unseen during training. ,
1) - applies deep reinforcement learning to synthesize agile soccer skills for a miniature humanoid robot; the resulting policy allows dynamic movement skills such as fast recovery, walking, and kicking. ,
2) - leverages a recurrent memory transformer architecture to increase BERT’s effective context length to two million tokens while maintaining high memory retrieval accuracy. ,
3) - an interactive tool for video object tracking and segmentation; it’s built on top of Segment Anything and allows flexible tracking and segmenting via user clicks. ,
4) - provides an overview of fundamental techniques and key concepts in SSL; it also introduces practical considerations for implementing SSL methods successfully. ,
5) - a comprehensive and practical guide for practitioners working with LLMs; discusses many use cases with practical applications and limitations of LLMs in real-world scenarios. ,
6) - connects ChatGPT with audio foundational models to handle challenging audio tasks and a modality transformation interface to enable spoken dialogue. ,
7) - releases a new multimodal dataset benchmark containing 12.8B image-text pairs. ,
8) - provides a deeper assessment of ChatGPT's performance on the important information extraction task. ,
9) - investigates if chatbot assistants like ChatGPT can provide responses to patient questions while emphasizing quality and empathy; finds that chatbot responses were preferred over physician responses and rated significantly higher in terms of both quality and empathy. ,
10) - introduces methods for accelerating and stabilizing training of large-scale language vision models. ,
1) - a new method for training high-performance computer vision models based on self-supervised learning; enables learning rich and robust visual features without supervision which are useful for both image-level visual tasks and pixel-level tasks; tasks supported include image classification, instance retrieval, video understanding, depth estimation, and much more. ,
2) - an approach that trains language models to compress prompts into gist tokens reused for compute efficiency; this approach enables 26x compression of prompts, resulting in up to 40% FLOPs reductions. ,
3) - presents a framework for large-scale biomolecular simulation; this is achieved through the high accuracy of equivariant deep learning and the ability to scale to large and long simulations; the system is able to “perform nanoseconds-long stable simulations of protein dynamics and scale up to a 44-million atom structure of a complete, all-atom, explicitly solvated HIV capsid on the Perlmutter supercomputer.” ,
4) - performs human evaluation to audit popular generative search engines such as Bing Chat, Perplexity AI, and NeevaAI; finds that, on average, only 52% of generated sentences are supported by citations and 75% of citations support their associated sentence. ,
5) - an AI system based on LLMs and text-to-image models that generates music visualizations. ,
6) ,
7) - presents an approach that uses language-only GPT-4 to generate multimodal language-image instruction-following data; applies instruction tuning with the data and introduces LLaVA, an end-to-end trained large multimodal model for general-purpose visual and language understanding. ,
8) ,
9) - a plug-and-play compositional reasoning framework that augments LLMs and can infer the appropriate sequence of tools to compose and execute in order to generate final responses; achieves 87% accuracy on ScienceQA and 99% on TabMWP. ,
10) - applies latent diffusion models to high-resolution video generation; validates the model on creative content creation and real driving videos of 512 x 1024 and achieves state-of-the-art performance. ,
1) - combines mip-NeRF 360 and grid-based models to improve NeRFs that train 22x faster than mip-NeRF 360. ,
2) - proposes an architecture that extends LLMs to build agents that enable simulations of human-like behavior; these capabilities are possible by storing a complete record of an agent's experiences, synthesizing memories over time into higher-level reflections, and retrieving them dynamically to plan behavior. ,
3) - presents an agent that combines LLMs for autonomous design, planning, and execution of scientific experiments; shows emergent scientific research capabilities, including the successful performance of catalyzed cross-coupling reactions. ,
4) - derives optimization algorithms that explicitly leverage neural architecture; it proposes a first-order optimizer without hyperparameters that trains CNNs at ImageNet scale. ,
5) - presents an LLM chemistry agent that performs tasks across synthesis, drug discovery, and materials design; it integrates 13 expert-design tools to augment LLM performance in chemistry and demonstrate effectiveness in automating chemical tasks. ,
6) - A Survey of ChatGPT and GPT-4 ,
7) - an open-source research platform to facilitate the development and evaluation of LLMs in solving complex, multi-step tasks through manipulating various domain expert models. ,
8) - a new benchmark to assess foundational models in the context of human-centric standardized exams, including college entrance exams, law school admission tests, and math competitions, among others. ,
9) - proposes an approach that teaches LLMs to debug their predicted program via few-shot demonstrations; this allows a model to identify its mistakes by explaining generated code in natural language; achieves SoTA on several code generation tasks like text-to-SQL generation. ,
10) - a promptable, interactive model for various segmentation tasks that yields competitive performance on open-vocabulary and interactive segmentation benchmarks. ,
1) - presents a set of resources to establish foundational models for image segmentation; releases the largest segmentation dataset with over 1 billion masks on 11M licensed images; the model’s zero-shot performance is competitive with or even superior to fully supervised results. ,
2) - presents GPT-4-LLM, a "first attempt" to use GPT-4 to generate instruction-following data for LLM fine-tuning; the dataset is released and includes 52K unique English and Chinese instruction-following data; the dataset is used to instruction-tune LLaMA models which leads to superior zero-shot performance on new tasks. ,
3) - discusses important considerations regarding the capabilities and limitations of LLMs. ,
4) - a new 50 pages survey on large language models. ,
5) - an open-source chat model fine-tuned with LoRA. Leverages 100K dialogs generated from ChatGPT chatting with itself; it releases the dialogs along with 7B, 13B, and 30B parameter models. ,
6) - a new benchmark of 134 text-based Choose-Your-Own-Adventure games to evaluate the capabilities and unethical behaviors of LLMs. ,
7) - generates pseudo data from knowledge gained through pre-training and fine-tuning; adds the data to the training dataset for the next step; results show that different frameworks can be improved in performance using code-related generation tasks. ,
8) - an overview of applications of ChatGPT and GPT-4; the analysis is done on 194 relevant papers and discusses capabilities, limitations, concerns, and more ,
9) - a suite for analyzing LLMs across training and scaling; includes 16 LLMs trained on public data and ranging in size from 70M to 12B parameters. ,
10) - unifies segmentation tasks into a generalist model through an in-context framework that supports different kinds of data. ,
1) - a new 50B parameter large language model for finance. Claims the largest domain-specific dataset yet with 363 billion tokens... further augmented with 345 billion tokens from general-purpose datasets; outperforms existing models on financial tasks while not sacrificing performance on general LLM benchmarks. ,
2) - a low-cost system that performs end-to-end imitation learning from real demonstrations; also presents an algorithm called Action Chunking with Transformers to learn a generative model that allows a robot to learn difficult tasks in the real world. ,
3) - a system that leverages LLMs like ChatGPT to conduct task planning, select models and act as a controller to execute subtasks and summarize responses according to execution results. ,
4) - a medical chat model fine-tuned on LLaMA using medical domain knowledge. Collects data on around 700 diseases and generated 5K doctor-patient conversations to finetune the LLM. ,
5) - a lightweight adaptation method to efficiently fine-tune LLaMA into an instruction-following model; generates responses comparable to the fully fine-tuned 7B-parameter Alpaca; it’s also extended to support multi-modal input. ,
6) - demonstrates that ChatGPT can outperform crowd-workers for several annotation tasks such as relevance, topics, and frames detection; besides better zero-shot accuracy, the per-annotation cost of ChatGPT is about twenty times cheaper than MTurk. ,
7) - shows that a pre-trained LLM agent can execute computer tasks using a simple prompting scheme where the agent recursively criticizes and improves its outputs. ,
8) - a paradigm to enhance large language model completions by allowing models to communicate feedback and iteratively improve output; DERA outperforms base GPT-4 on clinically-focused tasks. ,
9) - discusses why AI systems will become more fit than humans and the potential dangers and risks involved, including ways to mitigate them. ,
10) - a review examining avenues of partial differential equations research advanced by machine learning. ,
1) - a comprehensive investigation of an early version of GPT-4 when it was still in active development by OpenAI. ,
2) - proposes an agent with dynamic memory and self-reflection capabilities to enhance its existing reasoning trace and task-specific action choice abilities. ,
3) - shows that GPT-4 exceeds the passing score on USMLE by over 20 points and outperforms GPT-3.5 as well as models specifically fine-tuned on medical knowledge (Med-PaLM, a prompt-tuned version of Flan-PaLM 540B). ,
4) - investigates the potential implications of GPT models and related systems on the US labor market. ,
5) - a long-input Transformer model that employs conditional computation, devoting more resources to important tokens in both feedforward and attention layers. ,
6) - compares human-generated ideas with those generated by generative AI chatbots like ChatGPT and YouChat; reports that 9.4% of humans were more creative than GPT-4 and that GAIs are valuable assistants in the creative process. ,
7) - a comprehensive capability analysis of GPT series models; evaluates performance on 9 natural language understanding tasks using 21 datasets. ,
8) - presents a prompting technique that aims to improve LLMs' faithfulness using strategies such as opinion-based prompts and counterfactual demonstrations. ,
9) - a method for extracting room-scale textured 3D meshes from 2D text-to-image models. ,
10) - a trillion parameter language model with sparse heterogeneous computing. ,
1) - GPT-4 - a large multimodal model with broader general knowledge and problem-solving abilities. ,
2) - a method for grounding language embeddings from models like CLIP into NeRF; this enables open-ended language queries in 3D. ,
3) - an overview of language models covering recent developments and future directions. It also covers topics like linguistic units, structures, training methods, evaluation, and applications. ,
4) - a method for transformer interpretability that can trace a language model's predictions as they develop layer by layer. ,
5) - a new pre-training paradigm using techniques that jointly improve training data efficiency and capabilities of LMs in the infilling task; performance improvement is shown in code generation tasks. ,
6) - demonstrates that careful design of deep RNNs using standard signal propagation arguments can recover the performance of deep state-space models on long-range reasoning tasks. ,
7) - a new approach to tune a lightweight and versatile retriever to automatically retrieve prompts to improve zero-shot performance and help mitigate hallucinations. ,
8) - proposes ConvMixer, a parameter-efficient fully-convolutional model which replaces self-attention and MLP layers in ViTs with less-expressive depthwise and pointwise convolutional layers. ,
9) - a compact and flexible architecture that enables easy 3D surface reconstruction from any NeRF-driven approach; distills NeRFs into geometrically-accurate 3D meshes. ,
10) - a high-throughput generation engine for running LLMs with limited GPU memory. , ,
1) - incorporates real-world continuous sensor modalities resulting in an embodied LM that performs tasks such as robotic manipulation planning, visual QA, and other embodied reasoning tasks. , ,
2) - a parameter-efficient vision-language model powered by an ensemble of domain experts; it efficiently pools expert knowledge from different domains and adapts it to various vision-language reasoning tasks. , , ,
3) - it connects ChatGPT and different visual foundation models to enable users to interact with ChatGPT beyond language format. ,
4) - an overview of generative AI - from GAN to ChatGPT. ,
5) - shows that with scale, LLMs can override semantic priors when presented with enough flipped labels; these models can also perform well when replacing targets with semantically-unrelated targets. ,
6) - provides an overview of foundation models for decision making, including tools, methods, and new research directions. ,
7) - a subquadratic drop-in replacement for attention; it interleaves implicit long convolutions and data-controlled gating and can learn on sequences 10x longer and up to 100x faster than optimized attention. , , ,
8) - a new open-source toolkit for in-context learning and LLM evaluation; supports various state-of-the-art retrieval and inference methods, tasks, and zero-/few-shot evaluation of LLMs. , ,
9) - a technique that improves LLM performance on mathematical reasoning problems; it uses zero-shot chain-of-thought prompting and verification to ensure generated answers are accurate. ,
10) - enables scaling up GANs on large datasets for text-to-image synthesis; it’s found to be orders of magnitude faster at inference time, synthesizes high-resolution images, & supports various latent space editing applications. , ,
1) - introduces a multimodal large language model called Kosmos-1; achieves great performance on language understanding, OCR-free NLP, perception-language tasks, visual QA, and more. ,
2) - finds that human brain activity is best explained by the activations of modern language models enhanced with long-range and hierarchical predictions. ,
3) - combines evolutionary prompt engineering with soft prompt-tuning to find high-performing models; it leverages few-shot prompting which is further improved by using an evolutionary search approach to improve the in-context examples. ,
4) - a new family of generative models that achieve high sample quality without adversarial training. ,
5) - a new task that automatically discovers corpus-level differences via language description in a goal-driven way; applications include discovering insights from commercial reviews and error patterns in NLP systems. , ,
6) - proposes an approach for high-resolution image reconstruction with latent diffusion models from human brain activity. ,
7) - a scalable approach to planning with LLMs in embodied settings through grounding functions; grounded decoding (GD) is found to be a general, flexible, and expressive approach to embodied tasks. ,
8) - a framework for language-driven representation learning from human videos and captions for robotics. , , ,
9) - demonstrates that dropout can mitigate underfitting when used at the start of training; it counteracts SGD stochasticity and limits the influence of individual batches when training models. ,
10) - an approach that enables versatile conversational interactions with mobile UIs using a single LLM. ,
1) - a 65B parameter foundation model released by Meta AI; relies on publicly available data and outperforms GPT-3 on most benchmarks despite being 10x smaller. ,
2) - a 5B parameter creative and controllable diffusion model trained on billions (text, image) pairs. , , ,
3) - an alternative algorithm to train LLMs from feedback; the feedback is converted to instruction by relabeling the original one and training the model, in a supervised way, for better alignment. ,
4) - a prompting technique to adapt LLMs to different task-specific example prompts (annotated with human-designed chain-of-thought reasoning); this process involves finding where the LLM is most uncertain and annotating those questions (a toy sketch of this uncertainty-based selection follows this list). ,
5) - a survey offering a unified view of the building blocks of modular neural networks; it also includes a discussion about modularity in the context of scaling LMs, causal inference, and other key topics in ML. , ,
6) - an approach that recites passages from the LLM’s own memory to produce final answers; shows high performance on knowledge-intensive tasks. ,
7) - an approach that uses LLMs to suggest functionally correct, performance-improving code edits. ,
8) - a comprehensive analysis of novel prompt injection threats to application-integrated LLMs. ,
9) - proposes a fine-tuning method to align generative models using human feedback. ,
10) - a memory-efficient radiance field representation for real-time view synthesis of large-scale scenes in a browser. ,
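A toy sketch of the uncertainty-based selection mentioned in item 4 above. The disagreement metric and the `sample_answers` helper are assumptions for illustration; the underlying paper also considers other uncertainty measures.

```python
def select_uncertain_questions(questions, sample_answers, k=5, budget=2):
    """Rank questions by answer disagreement across k sampled completions and
    return the most uncertain ones for human chain-of-thought annotation.

    `sample_answers(question, k)` is a hypothetical LLM call returning k answers.
    """
    scored = []
    for q in questions:
        answers = sample_answers(q, k)
        disagreement = len(set(answers)) / k   # 1.0 means every sampled answer differs
        scored.append((disagreement, q))
    scored.sort(reverse=True)
    return [q for _, q in scored[:budget]]
```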
1) - a simple and effective optimization algorithm that’s more memory-efficient than Adam. ,
2) ,
3) - a 3D-aware conditional generative model extended with neural radiance fields for controllable photorealistic image synthesis.
4) - finds strong evidence that language models trained with RLHF have the capacity for moral self-correction. The capability emerges at 22B model parameters and typically improves with scale. ,
5) - uses reinforcement learning to align computer vision models with task rewards; observes large performance boost across multiple CV tasks such as object detection and colorization.
6) - an unsupervised method for text-image alignment that leverages pretrained language models; it enables few-shot image classification with LLMs. ,
7) - a survey of language models that are augmented with reasoning skills and the capability to use tools. ,
8) - an approach to incorporate geometry-guided transformations into neural networks using geometric algebra. ,
9) - proposes a policy framework for auditing LLMs. ,
10) - a transformer architecture that replaces the sequence of feedforward transformer blocks with a single large Associative Memory model; this follows the popularity that Hopfield Networks have gained in the field of ML. ,
1) - introduces language models that teach themselves to use external tools via simple API calls. ,
2) - proposes using language models for open-world game playing. ,
3) - a comprehensive analysis of ChatGPT failures for categories like reasoning, factual errors, maths, and coding. ,
4) - optimizing hard text prompts through efficient gradient-based optimization. ,
5) - proposes a cheap and scalable data selection framework based on an importance resampling algorithm to improve the downstream performance of LMs. ,
6) - proposes an approach for structure and content-guided video synthesis with diffusion models. , ,
7) - performs a more rigorous evaluation of ChatGPT on reasoning, hallucination, and interactivity. ,
8) - proposes diffusion models to generate high-quality 30-second music clips via text prompts. , ,
9) - introduces an efficient, privacy-preserving transfer learning framework to adapt foundational models to downstream data without access to the full model. , ,
10) - proposes a model for zero-shot image-to-image translation. , ,
1) - a retrieval-augmented LM framework that adapts a retriever to a large-scale, black-box LM like GPT-3. ,
2) - shows that diffusion-based generative models can memorize images from the training data and emit them at generation time. ,
3) - releases a more extensive, publicly available collection of tasks, templates, and methods for advancing instruction-tuned models. ,
4) - incorporates vision features to elicit chain-of-thought reasoning in multimodality, enabling the model to generate effective rationales that contribute to answer inference. ,
5) - a diffusion model that performs text-based motion and appearance editing of general videos. , ,
6) ,
7) - investigates the mathematical capabilities of ChatGPT on a new holistic benchmark called GHOSTS. ,
8) - trains an AI agent to navigate purely by feeling its way around; no use of vision, audio, or any other sensing (as in animals). , ,
9) - a generative model that synthesizes large-scale 3D landscapes from random noises. ,
10) - finds that many prompting techniques fail when presented with irrelevant context for arithmetic reasoning. ,
1) - a generative model for generating high-fidelity music from text descriptions. ,
2) - an approach to reduce the gap, in terms of performance and hardware utilization, between state space models and attention for language modeling. ,
3) - a watermarking framework for proprietary language models. ,
4) - a new text-to-4D model for dynamic scene generation from input text. , ,
5) - a foundation model for weather and climate, including many capabilities for atmospheric science tasks. , ,
6) - If you're looking for interesting open problems in DL, this is a good reference. Not sure if intentional but it also looks useful to get a general picture of current trends in deep learning with ~300 references. ,
7) - an approach for zero-shot machine-generated text detection; uses raw log probabilities from the LLM to determine if the passage was sampled from it (a small example of computing such log probabilities follows this list). ,
8) - a new model that aims to regain the competitiveness of GANs for fast large-scale text-to-image synthesis. , ,
9) - an LLM that can generate protein sequences with a predictable function across large protein families. ,
10) - investigates the possibility of parallelizing boosting. ,
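A small, hedged example related to item 7's use of raw log probabilities: the snippet below only computes the average per-token log probability of a passage under an open model via Hugging Face `transformers`. The detection method itself compares such scores against perturbed rewrites rather than thresholding a single number, and `gpt2` here is just a stand-in model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def mean_token_logprob(text: str, model_name: str = "gpt2") -> float:
    """Average log probability per token of `text` under a causal LM."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)             # predict token t+1 from prefix
    token_lp = logprobs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp.mean().item()

print(mean_token_logprob("The quick brown fox jumps over the lazy dog."))
```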
1) - an excellent summary of some notable research Google AI did in 2022. ,
2) - a review paper on the capabilities of LLMs from a cognitive science perspective. ,
3) - an agent trained at scale that leads to a general in-context learning algorithm able to adapt to open-ended embodied 3D problems. ,
4) - an approach to help provide explanations of generative transformer models through memory-efficient attention manipulation. ,
5) - short overview of key concepts in graph representation learning. ,
6) - an approach that extends the functionality of existing pre-trained text-to-image diffusion models by enabling conditioning on grounding inputs. , ,
7) - proposes a method with the capability of editing images from human instructions. ,
8) ,
9) - a new method for automatically adjusting the learning rate during training, applicable to more than a dozen diverse ML problems. ,
10) - a user-friendly color editing approach for the neural radiance field to achieve a more efficient view-consistent recoloring. ,
1) - a general algorithm to collect diamonds in Minecraft from scratch without human data or curricula, a long-standing challenge in AI. ,
2) - a compiler for converting RASP programs into transformer weights; this way of constructing neural network weights enables the development and evaluation of new interpretability tools. , ,
3) - multimodal deep learning is a new book published on ArXiv. ,
4) - new work analyzing how generative LMs could potentially be misused for disinformation and how to mitigate these types of risks. ,
5) - empirically identifies reasons why retrieval-augmented LMs (specifically k-nearest neighbor LMs) perform better than standard parametric LMs. , ,
6) - investigates the use of existing LMs (e.g., Flan-U-PaLM 540B) combined with associative read-write memory to simulate the execution of a universal Turing machine. ,
7) - transformers for RL will be a fascinating research area to track. The same is true for the reverse direction (RL for Transformers)... a notable example: using RLHF to improve LLMs (e.g., ChatGPT). ,
8) - introduces scaling laws for generative mixed-modal language models. ,
9) - a transformer-based network showing robust local feature matching, outperforming the state-of-the-art methods on several benchmarks. ,
10) - addresses the time series forecasting problem with generative modeling; involves a bidirectional VAE backbone equipped with diffusion, denoising for prediction accuracy, and disentanglement for model interpretability. ,
1) - introduces Muse, a new text-to-image generation model based on masked generative transformers; significantly more efficient than other diffusion models like Imagen and DALLE-2. , , ,
2) - introduces VALL-E, a text-to-speech model that achieves state-of-the-art zero-shot performance; the text-to-speech synthesis task is treated as a conditional language modeling task. ,
3) - shows the potential of enhancing LLMs by retrieving relevant external knowledge based on decomposed reasoning steps obtained through chain-of-thought prompting. ,
4) - presents a technique for compressing large language models while not sacrificing performance; "pruned to at least 50% sparsity in one-shot, without any retraining." ,
5) - a performant model based on a fully convolutional masked autoencoder framework and other architectural improvements. CNNs are striking back! , ,
6) - with more capabilities, we are starting to see a wider range of applications with LLMs. This paper utilized large language models for conducting corporate lobbying activities. , ,
7) - aims to better understand how deep learning models overfit or memorize examples; interesting phenomena observed; important work toward a mechanistic theory of memorization. ,
8) - new idea to create new coherent neural networks by reusing pretrained fragments of existing NNs. Not straightforward but there is potential in terms of efficiently reusing learned knowledge in pre-trained networks for complex tasks. ,
9) - proposes iterated decomposition, an approach to improve Science Q&A through a human-in-the-loop workflow for refining compositional LM programs. ,
10) - a nice overview of some important ideas in RL. ,

We use a combination of AI-powered tools, analytics, and human curation to build the lists of papers.

Subscribe to our NLP Newsletter to stay on top of ML research and trends.

Join our Discord.


Int J Environ Res Public Health


The Top 100 Most Cited Scientific Papers in the Public, Environmental & Occupational Health Category of Web of Science: A Bibliometric and Visualized Analysis

Vicenç Hernández-González

1 Human Movement Research Group (RGHM), University of Lleida, Plaça de Víctor Siurana, 25003 Lleida, Spain

2 Physical Education and Sport Section, University of Lleida, Av. De l’Estudi General, 25001 Lleida, Spain

Josep Maria Carné-Torrent

Carme Jové-Deltell, Álvaro Pano-Rodríguez and Joaquin Reverter-Masia

Associated Data

The Web of Science (WoS) data can be accessed through the WoS’s official website: https://www.webofscience.com/wos/alldb/basic-search (accessed on 14 March 2022).

(1) Background: The main basis for the public recognition of the merits of scientists has always been the system of scientific publications and citations. Our goal is to identify and analyze the most cited articles in the Public, Environmental & Occupational Health category. (2) Methods: We searched the Web of Science for all articles published in the “Public, Environmental & Occupational Health” category up to March 2022 and selected the 100 most cited articles. We recorded the number of citations, the journal, the year of publication, quartile, impact factor, institution, country, authors, topic, type of publication and collaborations. (3) Results: 926,665 documents were analyzed. The top 100 had 401,620 citations. The journal with the most articles was the Journal of Clinical Epidemiology and the one with the highest number of citations was Medical Care. The year with the highest number of articles in the top 100 was 1998. The country with the highest percentage of publications was the USA and the most productive institution was Harvard. The most frequent keywords were bias, quality, and extension. The largest collaboration node was between the USA, Canada, Germany, Spain, Australia, France, and Sweden. (4) Conclusions: This bibliometric study on Public, Environmental & Occupational Health provides valuable information not only to identify topics of interest in the analyzed category, but also to identify the differences in the topics they study.

1. Introduction

Bibliometrics is a science that uses statistical and mathematical procedures to track the general trend of research in a specific field [ 1 ]. Various authors have examined the participation of researchers in scientific activities, as well as the differences and conditioning factors across the different fields of scientific knowledge [ 2 ]. These authors attribute different publication frequencies and practices to the different scientific disciplines [ 2 ]. The Web of Science (WoS) online database includes all important research papers and provides integrated analysis tools to produce representative figures; that is, it is the reference database for institutions, researchers and actors linked to science [ 3 , 4 , 5 ].

Within bibliometrics, citation analysis is one of the most used tools to assess the academic impact of an article in a specific area of knowledge [ 6 ]. The number of citations a publication receives does not necessarily reflect the quality of the research or the relevance of its authors [ 7 ], but it has been suggested that articles with the highest number of citations may have the ability to generate changes in practice, controversy, discussion and more research [ 6 , 8 , 9 ], or, as suggested by Zhu et al. [ 1 ], the number of citations can measure the article’s influence and merit. In addition, WoS search results could be exported to software for later analysis such as VOS-viewer [ 10 ], which could provide important information associated with collaboration networks between countries, institutions or authors.

Although there have been bibliometric analyses of articles in the field of food safety [ 11 , 12 ]; environmental health [ 13 , 14 ]; health promotion [ 15 , 16 ]; health education [ 17 ]; mental health [ 18 , 19 ]; sport health [ 20 ]; and occupational health [ 21 , 22 ], the entire category of Public, Environmental & Occupational Health has never been studied worldwide.

Few studies have offered a standardized measure of the wide range of dissemination activities in a scientific category that allows a detailed observation of production, collaboration and interrelation in a scientific field. No explorations based on quantitative methodologies have been performed to build indicators with which to empirically test the scientific productivity of the Public, Environmental & Occupational Health category.

To our knowledge, there is no study that bibliometrically analyzes the highly cited articles of the Public, Environmental & Occupational Health category. Therefore, this study aimed to identify and analyze the 100 most cited articles in the Public, Environmental & Occupational Health category to understand the historical perspective and promote discussion and scientific progress in this specialty.

2. Materials and Methods

2.1. Search Strategy and Eligibility Criteria

Bibliometric analysis was performed on 14 March 2022. Articles were identified by two independent researchers, who searched the Web of Science Core Collection (Clarivate Analytics), a research platform that provides a substantial bibliographic database through the Science Citation Index Expanded (SCIE), using the “Public, Environmental & Occupational Health” search category. We refined the search by selecting original research articles and reviews. The 100 articles with the most citations were eligible for bibliometric analysis and were arranged in descending order of citation count. Any disagreement between the reviewers was discussed until a final decision was reached.

2.2. Data Extraction

Two authors independently retrieved information from all articles. Through the Web of Science, the 100 articles with the highest number of citations were selected and exported into an Excel document, where the following were recorded: the number of citations, name of the journal, year of publication, first and last author and co-authors, total number of authors, geographical location, origin and associated institute, the title of the article, type of document (article or review), abstract and corresponding author. For the analysis of authors, all the authors who participated in the study were counted. In the bibliometric analysis by country, each country that participated in a study was taken into account and the citations received were counted. A country was not credited more than once for the same study, even if several authors from different institutions in that country had participated in it. The number of articles per country was counted as long as there was at least one author from that country in the study. If the first author was affiliated with two institutions, the first institution was selected for inclusion.
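A minimal sketch of the per-country counting rule described above, assuming a tidy author-level table (the column names and numbers are invented): each country is credited at most once per article before aggregating.

```python
import pandas as pd

df = pd.DataFrame({
    "article_id": [1, 1, 1, 2],
    "country":    ["USA", "USA", "Canada", "USA"],
    "citations":  [30229, 30229, 30229, 24802],      # citations repeated on each author row
})

per_country = (
    df.drop_duplicates(["article_id", "country"])    # one credit per (article, country) pair
      .groupby("country")
      .agg(n_articles=("article_id", "nunique"),
           total_citations=("citations", "sum"))
)
print(per_country)
```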

2.3. Statistical Analysis

We used IBM SPSS Statistics for Windows, Version 27.0 (Armonk, NY, USA: IBM Corp.) for correlation analysis. Correlation was determined using Pearson’s correlation coefficient (r), and when p < 0.05, the difference was considered statistically significant. We used a popular bibliometric analysis tool, VOSviewer 1.6.18 software (CWTS, Leiden, The Netherlands) [ 23 ], for cooperative network identification and keyword co-occurrence analysis. In addition, it could generate visual maps of knowledge. We also used the MapChart program [ 24 ], a platform from which a personalized map of different regions of the world was created, using colors and descriptions.
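For readers who want to reproduce the kind of correlation test described above outside SPSS, a minimal sketch with SciPy; the numbers are made up, standing in for variables such as citation counts and article age.

```python
from scipy.stats import pearsonr

citations = [30229, 24802, 18779, 11110, 11091, 4068, 2485]
age_years = [35, 30, 20, 26, 15, 24, 2]

r, p = pearsonr(citations, age_years)
print(f"r = {r:.3f}, p = {p:.3f}")   # significant if p < 0.05, per the paper's criterion
```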

The study flowchart is shown in Figure 1 ; the search covered studies published from 1900 to 2022 in the Public, Environmental & Occupational Health category of Web of Science. After applying the search strategy, 926,665 documents were retrieved. For the analysis, only articles and review articles were considered, which led to the exclusion of 294,361 documents. Of the remaining 632,304, the 100 documents with the highest number of citations were retained for the study, and the other 632,204 documents were excluded.

Figure 1. Flowchart of the study.

3. Results

3.1. Publication Year, Citation and Bibliometric Analysis of the Keywords

The 100 most cited publications in the Public, Environmental & Occupational Health category were published between 1938 and 2020, of which around 70% were published in 1998 or later. We performed an analysis of publication trends by 6-year intervals based on a ranking of publication dates. Between 1998 and 2003, 29 documents were published, with 1998 ( n = 11) being the year of greatest production. The volume of output has visibly increased over time: of the 100 articles, only 29 were published before 1998, whereas another 29 appeared in the 1998 to 2003 period alone ( Figure 2 ).

Figure 2. Pattern of distribution of top-cited articles (number of articles per year).

The top 100 articles were cited 401,620 times in total, with an average of 4016 citations per article (range 1846 to 30,229). No significant correlation was found between the total number of citations and the age of the articles (r = −0.121, p = 0.229). The most cited article (30,229 citations) was “A new method of classifying prognostic co-morbidity in longitudinal studies: development and validation” by Charlson et al. [ 25 ], published in the Journal of Chronic Diseases . Based on the number of publications among the 100 articles, and analyzing the citations per publication, 1998 was the most productive year, with 11 articles in the top 100 list (42,320 citations and an average of 3847 citations/article) ( Table 1 ).

The top 100 articles with the most total citations in the Public, Environmental & Occupational Health category.

Rank | Times Cited (WoS Core) | First Author | Article Title | Country | Publication Year
1 | 30,229 | Charlson, ME | A new method of classifying prognostic co-morbidity in longitudinal studies: development and validation | USA | 1987
2 | 24,802 | Ware, JE | The MOS 36-item short-form health survey (SF-36). 1. Conceptual framework and item selection | USA | 1992
3 | 18,779 | Higgins, JPT | Quantifying heterogeneity in a meta-analysis | England | 2002
4 | 11,110 | Ware, JE | A 12-item short-form health survey: construction of scales and preliminary tests of reliability and validity | USA | 1996
5 | 11,091 | von Elm, E | The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: Guidelines for reporting observational studies | Switzerland | 2007
6 | 10,754 | Schulz, KF | CONSORT 2010 Statement: Updated guidelines for reporting parallel group randomised trials | USA | 2010
7 | 10,302 | Moher, D | Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement | Canada | 2009
8 | 7,879 | Deyo, RA | Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases | USA | 1992
9 | 7,582 | Felitti, VJ | Relationship of childhood abuse and household dysfunction to many of the leading causes of death in adults: The adverse childhood experiences (ACE) study | USA | 1998
10 | 6,438 | Harrell, FE | Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors | USA | 1996
11 | 6,116 | Elixhauser, A | Comorbidity measures for use with administrative data | USA | 1998
12 | 5,658 | Quan, HD | Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data | Canada | 2005
13 | 5,138 | Oberdorster, G | Nanotoxicology: An emerging discipline evolving from studies of ultrafine particles | USA | 2005
14 | 5,111 | Terwee, CB | Quality criteria were proposed for measurement properties of health status questionnaires | Netherlands | 2007
15 | 5,034 | Zou, GY | A modified Poisson regression approach to prospective studies with binary data | Canada | 2004
16 | 4,838 | McHorney, CA | The MOS 36-item short-form health survey (SF-36). 2. Psychometric and clinical tests of validity in measuring physical and mental health constructs | USA | 1993
17 | 4,707 | Downs, SH | The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions | England | 1998
18 | 4,636 | Garner, JS | CDC definitions for nosocomial infections, 1988 | USA | 1988
19 | 4,451 | Pencina, MJ | Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond | USA | 2008
20 | 4,356 | Peduzzi, P | A simulation study of the number of events per variable in logistic regression analysis | USA | 1996
21 | 4,273 | von Elm, E | The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies | Switzerland | 2008
22 | 4,143 | White, IR | Multiple imputation using chained equations: Issues and guidance for practice | England | 2011
23 | 4,068 | D’Agostino, RB | Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group | USA | 1998
24 | 3,962 | Charlson, M | Validation of a combined comorbidity index | USA | 1994
25 | 3,947 | Horan, TC | CDC/NHSN surveillance definition of health care-associated infection and criteria for specific types of infections in the acute care setting | USA | 2008
26 | 3,845 | Guyatt, GH | GRADE guidelines: 9. Rating up the quality of evidence | Canada | 2011
27 | 3,816 | Ahlbom, A | Guidelines for limiting exposure to time-varying electric, magnetic, and electromagnetic fields (up to 300 GHz) | Germany | 1998
28 | 3,783 | Caspersen, CJ | Physical activity, exercise, and physical fitness: definitions and distinctions for health-related research | USA | 1985
29 | 3,755 | de Onis, M | Development of a WHO growth reference for school-aged children and adolescents | Switzerland | 2007
30 | 3,495 | Balshem, H | GRADE guidelines: 3. Rating the quality of evidence | USA | 2011
31 | 3,385 | McHorney, CA | The MOS 36-item short-form health survey (SF-36). 3. Tests of data quality, scaling assumptions, and reliability across diverse patient groups | USA | 1994
32 | 3,380 | Willett, WC | Reproducibility and validity of a semiquantitative food frequency questionnaire | USA | 1985
33 | 3,363 | Guillemin, F | Cross-cultural adaptation of health-related quality-of-life measures: literature review and proposed guidelines | Canada | 1993
34 | 3,361 | Bergner, M | The Sickness Impact Profile: development and final revision of a health status measure | USA | 1981
35 | 3,246 | Miles, AA | The estimation of the bactericidal power of the blood | Canada | 1938
36 | 3,223 | Parmar, MKB | Extracting summary statistics to perform meta-analyses of the published literature for survival endpoints | Italy | 1998
37 | 3,198 | Daughton, CG | Pharmaceuticals and personal care products in the environment: Agents of subtle change? | USA | 1999
38 | 3,174 | Herdman, M | Development and preliminary testing of the new five-level version of EQ-5D (EQ-5D-5L) | Spain | 2011
39 | 3,162 | Dolan, P | Modeling valuations for EuroQol health states | England | 1997
40 | 3,155 | Hudak, PL | Development of an upper extremity outcome measure: The DASH (Disabilities of the Arm, Shoulder, and Hand) | Canada | 1996
41 | 3,133 | Berkman, LF | Social networks, host resistance, and mortality: a 9-year follow-up study of Alameda County residents | USA | 1979
42 | 3,079 | Morisky, DE | Concurrent and predictive validity of a self-reported measure of medication adherence | USA | 1986
43 | 3,039 | Clarke, DH | Techniques for hemagglutination and hemagglutination-inhibition with arthropod-borne viruses | Ireland | 1958
44 | 3,028 | Israel, BA | Review of community-based research: Assessing partnership approaches to improve public health | USA | 1998
45 | 3,007 | Robins, JM | Marginal structural models and causal inference in epidemiology | USA | 2000
46 | 2,976 | Andresen, EM | Screening for depression in well older adults: evaluation of a short form of the CES-D | USA | 1994
47 | 2,946 | Varni, JW | PedsQL (TM) 4.0: Reliability and validity of the Pediatric Quality of Life Inventory (TM) Version 4.0 generic core scales in healthy and patient populations | USA | 2001
48 | 2,917 | Mangram, AJ | Guideline for Prevention of Surgical Site Infection, 1999 | USA | 1999
49 | 2,883 | Kroenke, K | The Patient Health Questionnaire-2: Validity of a two-item depression screener | USA | 2003
50 | 2,878 | Glasgow, RE | Evaluating the public health impact of health promotion interventions: The RE-AIM framework | USA | 1999
51 | 2,849 | Norman, GR | Interpretation of changes in health-related quality of life: The remarkable universality of half a standard deviation | Canada | 2003
52 | 2,848 | Newcombe, RG | Interval estimation for the difference between independent proportions: Comparison of eleven methods | Wales | 1998
53 | 2,743 | Ludvigsson, JF | External review and validation of the Swedish national inpatient register | Sweden | 2011
54 | 2,714 | Kim, HJ | Permutation tests for joinpoint regression with applications to cancer rates | USA | 2000
55 | 2,714 | Colborn, T | Developmental effects of endocrine-disrupting chemicals in wildlife and humans | USA | 1993
56 | 2,688 | Resnikoff, S | Global data on visual impairment in the year 2002 | Switzerland | 2004
57 | 2,592 | Van den Berg, M | Toxic equivalency factors (TEFs) for PCBs, PCDDs, PCDFs for humans and wildlife | Netherlands | 1998
58 | 2,589 | Lynge, E | The Danish National Patient Register | Denmark | 2011
59 | 2,574 | Williams, OD | The Atherosclerosis Risk in Communities (ARIC) Study: Design and objectives | USA | 1989
60 | 2,517 | Willett, W | Total energy intake: implications for epidemiologic analyses | USA | 1986
61 | 2,485 | Wang, CY | Immediate Psychological Responses and Associated Factors during the Initial Stage of the 2019 Coronavirus Disease (COVID-19) Epidemic among the General Population in China | China | 2020
62 | 2,470 | FerroLuzzi, A | Physical status: The use and interpretation of anthropometry - Introduction | USA | 1995
63 | 2,460 | Austin, PC | Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples | Canada | 2009
64 | 2,452 | Pedersen, CB | The Danish Civil Registration System | Denmark | 2011
65 | 2,435 | Cai, ZJ | WHO Expert Committee on Drug Dependence: Thirty-first report - Introduction | China | 1999
66 | 2,413 | Baumgartner, RN | Epidemiology of sarcopenia among the elderly in New Mexico | USA | 1998
67 | 2,356 | Quan, HD | Updating and Validating the Charlson Comorbidity Index and Score for Risk Adjustment in Hospital Discharge Abstracts Using Data From 6 Countries | Canada | 2011
68 | 2,304 | Bild, DE | Multi-ethnic study of atherosclerosis: Objectives and design | USA | 2002
69 | 2,271 | Steyerberg, EW | Assessing the Performance of Prediction Models: A Framework for Traditional and Novel Measures | Netherlands | 2010
70 | 2,270 | Mukaka, MM | Statistics Corner: A guide to appropriate use of correlation coefficient in medical research | England | 2012
71 | 2,255 | Workowski, KA | Sexually Transmitted Diseases Treatment Guidelines, 2015 | USA | 2015
72 | 2,255 | Peppard, PE | Increased Prevalence of Sleep-Disordered Breathing in Adults | USA | 2013
73 | 2,220 | Skevington, SM | The World Health Organization’s WHOQOL-BREF quality of life assessment: Psychometric properties and results of the international field trial. A report from the WHOQOL group | England | 2004
74 | 2,181 | Woolf, AD | Burden of major musculoskeletal conditions | England | 2003
75 | 2,154 | Cohen, SH | Clinical Practice Guidelines for Clostridium difficile Infection in Adults: 2010 Update by the Society for Healthcare Epidemiology of America (SHEA) and the Infectious Diseases Society of America (IDSA) | USA | 2010
76 | 2,143 | Gooley, TA | Estimation of failure probabilities in the presence of competing risks: New representations of old estimators | USA | 1999
77 | 2,135 | Cella, D | The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008 | USA | 2010
78 | 2,132 | Greenland, S | Causal diagrams for epidemiologic research | USA | 1999
79 | 2,125 | Klepeis, NE | The National Human Activity Pattern Survey (NHAPS): a resource for assessing exposure to environmental pollutants | USA | 2001
80 | 2,093 | Smith, GD | ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? | England | 2003
81 | 2,082 | Cardo, D | National Nosocomial Infections Surveillance (NNIS) System Report, data summary from January 1992 through June 2004, issued October 2004 | USA | 2004
82 | 2,010 | Dowell, D | CDC Guideline for Prescribing Opioids for Chronic Pain: United States, 2016 | USA | 2016
83 | 1,998 | Rose, G | Sick individuals and sick populations | England | 1985
84 | 1,997 | Vittinghoff, E | Relaxing the rule of ten events per variable in logistic and Cox regression | USA | 2007
85 | 1,980 | Washburn, RA | The Physical Activity Scale for the Elderly (PASE): Development and evaluation | USA | 1993
86 | 1,969 | Guh, DP | The incidence of co-morbidities related to obesity and overweight: A systematic review and meta-analysis | Canada | 2009
87 | 1,968 | Baio, J | Prevalence of Autism Spectrum Disorder Among Children Aged 8 Years: Autism and Developmental Disabilities Monitoring Network, 11 Sites, United States, 2014 | USA | 2018
88 | 1,960 | Torre, LA | Global Cancer Incidence and Mortality Rates and Trends: An Update | USA | 2016
89 | 1,932 | Sterne, JAC | Funnel plots for detecting bias in meta-analysis: Guidelines on choice of axis | England | 2001
90 | 1,927 | Gandek, B | Cross-validation of item selection and scoring for the SF-12 Health Survey in nine countries: Results from the IQOLA Project | USA | 1998
91 | 1,909 | Thompson, SG | How should meta-regression analyses be undertaken and interpreted? | England | 2002
92 | 1,906 | Slovic, P | Risk as analysis and risk as feelings: Some thoughts about affect, reason, risk, and rationality | USA | 2004
93 | 1,885 | Wang, Y | The obesity epidemic in the United States: gender, age, socioeconomic, racial/ethnic, and geographic characteristics. A systematic review and meta-regression analysis | USA | 2007
94 | 1,876 | Say, L | Global causes of maternal death: a WHO systematic analysis | Switzerland | 2014
95 | 1,871 | Rice, D | Critical periods of vulnerability for the developing nervous system: Evidence from humans and animal models | USA | 2000
96 | 1,868 | Mokkink, LB | The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study | Netherlands | 2010
97 | 1,864 | Reitsma, JB | Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews | Netherlands | 2005
98 | 1,856 | Jemal, A | Global Patterns of Cancer Incidence and Mortality Rates and Trends | USA | 2010
99 | 1,848 | Feinstein, AR | High agreement but low kappa. 1. The problem of two paradoxes | USA | 1990
100 | 1,846 | Hochberg, Y | More powerful procedures for multiple significance testing | Israel | 1990

The oldest study included in the list was published by Miles et al. [ 26 ] in 1938, entitled “The estimation of the bactericidal power of the blood”, with 3246 citations. The most recent study included was published in 2020 by Wang et al. [ 27 ], entitled “Immediate Psychological Responses and Associated Factors during the Initial Stage of the 2019 Coronavirus Disease (COVID-19) Epidemic among the General Population in China”, with 2485 citations, published in the International Journal of Environmental Research and Public Health ( Table 1 ).

Eighty-six of the 100 publications were original research, and the remaining 14 were reviews. The average number of citations per article in the review works was 3285 citations/article compared to 4135 citations/article in the original works ( Table 1 ). The most common important keywords included quality of life, comorbidity, disease, cancer, clinical-trials, bias and epidemiology, and the keywords that appeared the most were “bias” (total link strength of 14), “quality” (total link strength of 11) and “extension” (total link strength of 10), which had a strong link with “epidemiology”, “metaanalysis” and “cancer” ( Figure 3 ).
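A toy illustration of the co-occurrence counting behind the "total link strength" figures reported above; the keyword sets are invented, and VOSviewer computes a link's strength as the number of documents in which the two keywords appear together.

```python
from collections import Counter
from itertools import combinations

papers = [
    {"bias", "epidemiology", "meta-analysis"},
    {"bias", "quality"},
    {"quality", "extension", "epidemiology"},
]

link_strength = Counter()                      # strength of each keyword pair
for keywords in papers:
    for a, b in combinations(sorted(keywords), 2):
        link_strength[(a, b)] += 1

total_link_strength = Counter()                # sum of a keyword's link strengths
for (a, b), w in link_strength.items():
    total_link_strength[a] += w
    total_link_strength[b] += w

print(total_link_strength.most_common(3))
```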

Figure 3. The co-occurrence network of keywords. Note: The size of the nodes indicates the frequency of occurrence. The curves between the nodes represent their co-occurrence in the same publication. The smaller the distance between two nodes, the higher the number of co-occurrences of the two keywords.

3.2. Authors and Bibliometric Analysis of the Co-Authorship

A total of 487 authors contributed to the 100 most cited articles. The number of authors per article ranged from 1 to 26 (mean 5.53). Analysis of the 10 most productive authors, based on their number of articles in the top 100 regardless of authorship position, showed that Ware, J.E., Altman, D.G. and Horan, T.C. were the authors with the highest number of articles.

Ware, J.E., from the USA, accumulated 46,062 citations across the five articles listed and has an h-index of 100, averaging 9212 citations/article. Altman, D.G., from England, published four papers with a total of 36,420 citations, an average of 9105 per article, and an h-index of 182. The third position belongs to Horan, T.C., from the USA, with four published documents, an h-index of 25, and more than 13,500 total citations ( Table 2 ).

The top authors with the most articles in the top 100.

Author | Number of Articles | H-Index | First Author | Last Author | Co-Author | Total Citations | Mean Citations per Article
Ware, JE | 5 | 100 | 2 | | 3 | 46,062 | 9,212
Altman, DG | 4 | 182 | | 1 | 3 | 36,420 | 9,105
Horan, TC | 4 | 25 | 1 | | 3 | 13,582 | 3,396
Egger, M | 3 | 30 | | 1 | 2 | 17,296 | 5,765
Charlson, M | 2 | 58 | 2 | | | 34,191 | 17,096
Sherbourne, CD | 2 | 66 | | 2 | | 28,187 | 14,094
Moher, D | 2 | 21 | 1 | 1 | | 21,056 | 10,528
Higgins, JPT | 2 | 102 | 1 | 1 | | 20,688 | 10,344
Thompson, SG | 2 | 58 | 1 | 1 | | 20,688 | 10,344
Gotzsche, PC | 2 | 82 | | | 2 | 15,364 | 7,682

The total number of citations was not related to the number of authors (r = −0.118, p = 0.058). However, the average number of citations per article was associated with the number of authors (r = 0.210, p < 0.001).

There was little collaboration among most of the main authors, forming only one research cooperation network. Authors with a minimum of three papers were considered for this analysis; of the 487 authors, seven reached the threshold ( Figure 4 ). Altman, D.G. formed a collaborative network with five other researchers, with a link strength of 15.

Figure 4. The author collaboration network. Note: The collaboration map of authors reflects the scientific research cooperation between them. The circle/node signifies the authors; the size of the circle/node signifies the number of articles. The lines denote the authors’ collaboration strength, and each color signifies a cluster.

3.3. Countries, Institutions and Bibliometric Analysis of the Collaboration

A total of 26 countries published the 100 most cited articles in the Public, Environmental & Occupational Health category. Table 3 shows the twelve most productive countries; the USA contributed the most, with 65 documents, followed by England with 21 and Canada with 17 articles. These same three countries also obtained the greatest numbers of citations. However, the country with the highest rate of citations per article was Italy, with an average of 5151 citations/article, followed by England with an average of 4724 citations/article.

Table 3. The top countries with the most highly cited articles.

Country | Times Cited (WoS Core) | Number of Articles | Mean Citations per Article
USA | 266,604 | 65 | 4102
England | 99,202 | 21 | 4724
Canada | 66,702 | 17 | 3924
Switzerland | 44,620 | 12 | 3718
Netherlands | 36,641 | 10 | 3664
Denmark | 30,369 | 8 | 3796
Italy | 15,452 | 3 | 5151
Australia | 12,051 | 4 | 3013
Spain | 10,814 | 4 | 2704
France | 10,081 | 4 | 2520
Sweden | 9,697 | 4 | 2424
Norway | 9,267 | 3 | 3089

Two collaboration clusters were established. The larger one involved seven countries, with the USA having the most active partnership (a total link strength of 45, collaborating on 57 documents); its main research cooperators included Canada, Germany, Spain, Australia, France and Sweden. In the other cluster, England (with a link strength of 37 and 20 documents) collaborated strongly with mainly European countries such as Denmark, The Netherlands and Switzerland. We found that Italy and Norway rarely cooperated with other countries in these investigations ( Figure 5 ).

Figure 5. The country collaboration network.

The world map revealed that the articles were mainly concentrated in North America and Western Europe, and less so in Oceania. Specifically, the USA was the country with the highest production of documents, followed by England and Canada ( Figure 6 ).

Figure 6. The distribution map of the number of published articles worldwide for countries (MapChart).

In total, 228 institutions participated in the 100 articles. The number of institutions per article ranged between 1 and 21, with an average of 3.8 institutions/article. The article with 21 participating institutions was a review on toxic equivalency factors (TEFs) published in 1998, with 2592 citations. The World Health Organization, with eight articles included in this bibliometric analysis, was the institution with the greatest scientific representation ( Table 4 ); in four of the eight papers it appeared as the main institution of the study. Its total number of citations was 20,339, with an average of 2542 citations per article. The second institution was Harvard University in the USA, with six documents, one of them as the main institution, a total of more than 25,000 citations and an average of 4209 citations per article.

Table 4. The top institutions with the most highly cited articles.

Institution | Country | Number of Articles | Number as First Institution | Total Citations | Mean Citations per Article
World Health Organization (WHO) | Switzerland & Netherlands | 8 | 4 | 20,339 | 2542.4
Harvard University | USA | 6 | 1 | 25,255 | 4209.2
University of Washington | USA | 6 | 1 | 19,690 | 3281.7
McMaster University | Canada | 5 | 2 | 15,212 | 3042.4
Columbia University | USA | 3 | 0 | 9,904 | 3301.3
Centers for Disease Control & Prevention | USA | 4 | 1 | 20,264 | 5066.0
Johns Hopkins Bloomberg | USA | 4 | 1 | 9,232 | 2308.0
Tufts University | USA | 4 | 2 | 18,055 | 4513.8
University of Bristol | England | 4 | 2 | 19,389 | 4847.3
University of London | England | 4 | 2 | 22,069 | 5517.3
Oxford University | England | 4 | 0 | 36,420 | 9105.0
University of Toronto | Canada | 4 | 0 | 11,413 | 2853.3
U.S. Environmental Protection Agency | USA | 4 | 2 | 11,506 | 2876.5

There was a strong, significant correlation between the number of institutions and the number of authors (r = 0.848, p < 0.001), whereas the correlation between total citations and the number of participating institutions was weak, negative and non-significant (r = −0.115, p = 0.286).

In the collaboration network analysis ( Figure 7 ), institutions with a minimum of three collaborations were considered; 19 institutions reached the threshold and three cooperation clusters were formed. In the first of them, McMaster University cooperated with institutions such as Harvard University, the University of Washington and the University of Toronto, collaborating on five articles. The University of Washington had a strong partnership with the University of Minnesota, the NCI and Wake Forest University. The analysis also highlighted the collaboration network that the World Health Organization maintains with the University of Toronto, the University of Wisconsin and the US EPA, with more than seven shared documents.

Figure 7. The institution collaboration network.

3.4. Journal Analysis

The Web of Science “Public, Environmental & Occupational Health” category had 204 indexed journals, of which 35 made it to the top 100 most cited articles list. A total of 16 journals were in the first quartile (approximately 45%), 9 journals in the second quartile, 3 in the third quartile, and 5 were from Q4. Two journals were out of print or had changed their name. A total of 79% of the studies were published in high-impact journals (Q1 or Q2).

The IFs of the 35 journals ranged from 0.875 ( Malawi Medical Journal ) to 59.769 ( MMWR Surveillance Summaries ). Twenty-two journals had an IF below 5.000, seven had an IF between 5.000 and 10.000, and four had an IF above 10.000.

Table 5 shows the top nine journals that published three or more articles.

Table 5. The top journals that published the top 100 highly cited articles in the Public, Environmental & Occupational Health category.

Source Title | Records | Total Citations | % of Total Citations | Citations per Paper | Impact Factor (2020) | IF without Self-Citations | Quartile
Journal of Clinical Epidemiology | 15 | 64,753 | 16.12 | 4317 | 6.437 | 5.771 | Q1
Medical Care | 12 | 74,189 | 18.47 | 6182 | 2.983 | 2.891 | Q2
Statistics in Medicine | 12 | 55,022 | 13.70 | 4585 | 2.373 | 2.149 | Q3
 | 10 | 27,963 | 6.96 | 2796 | 4.897 | 4.722 | Q1
 | 5 | 15,513 | 3.86 | 3103 | 9.031 | 8.657 | Q1
 | 4 | 12,897 | 3.21 | 3224 | 9.408 | 9.252 | Q1
 | 3 | 10,665 | 2.66 | 3555 | 2.918 | 2.655 | Q2
 | 3 | 7,410 | 1.85 | 2470 | 4.822 | 4.623 | Q1
 | 3 | 7,262 | 1.81 | 2421 | 4.147 | 3.898 | Q1

Journal of Clinical Epidemiology was the most productive journal ( n = 15), followed by Medical Care and Statistics in Medicine ( n = 12 each). The top five journals published 54% of the articles and accounted for more than 59% of the total citations. The self-citation rate for the top nine journals ranged from 1.7% for the Bulletin of the World Health Organization to 10.3% for the Journal of Clinical Epidemiology . The journal with the highest number of citations was Medical Care ( n = 74,189), with a mean of 6182 citations per article.

4. Discussion

This is the first paper that analyzes the 100 most cited papers in the Public, Environmental & Occupational Health category of Web of Science. This article identifies the authors, journals, countries, institutions, etc., with the greatest impact in this category from the beginning of the 20th century to the present. The sample size was set at 100 manuscripts to provide a manageable and significant number of articles to be analyzed, in accordance with several published works [ 1 , 6 , 8 , 9 , 28 , 29 ].

The period of greatest publication activity starts in 1998; a clear upward trend in the production of works began during the period 1989–2010 and then faded in the last decade, so that the series largely behaves as a stochastic process. The Mann–Kendall trend test ( Figure 3 ) revealed a positive trend towards a greater number of articles over the years from 1985 onwards, although of only borderline significance ( p = 0.055, Kendall’s tau-b). Our results are in line with those found by the authors of [ 6 , 29 , 30 ]. They contrast with recent reviews on other topics and specialties in which most of the most cited papers were published earlier, in the 1980s [ 31 ], or later, from the year 2000 [ 32 , 33 ]. The socioeconomic growth of recent years may be one of the drivers of this advance in scientific research, an evolution that the dissemination and communication of science has been experiencing as exponential change for some time. To understand these changes, it is necessary to know how science spreads. In the professional field, one of the main ways the research community disseminates its work is the publication of scientific articles; in addition, researchers also use social networks and other Internet channels (scientific forums, blogs, etc.). These tools have become protagonists in recent years, encouraging wider dissemination of science and, therefore, greater awareness of what is published [ 34 ]. Some experts believe that more recent studies are cited today because of these advances in scientific dissemination [ 10 ]. It may seem surprising that the studies with the highest number of citations are recent; among other factors, this could be due to the appearance of scientific journals in electronic format, facilitating access and thus favoring circulation within the scientific community [ 6 ].
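To illustrate the kind of trend test reported above, the short sketch below runs Kendall's tau-b between publication year and a yearly article count, which is one common way to implement a Mann–Kendall-style check for a monotonic trend. The yearly counts are invented placeholders, since the study's underlying per-year data are not reproduced here.

```python
# Minimal sketch of a Mann-Kendall-style trend check via Kendall's tau-b.
# The yearly article counts are hypothetical placeholders, not the study's data.
from scipy.stats import kendalltau

years = list(range(1985, 2021))
counts = [0, 0, 1, 0, 1, 1, 2, 1, 2, 3, 2, 3, 4, 5, 4,
          5, 6, 5, 6, 7, 6, 5, 4, 5, 4, 3, 2, 2, 1, 1,
          1, 0, 1, 0, 0, 1]  # one value per year in `years`

tau, p_value = kendalltau(years, counts)  # tau-b (handles ties) by default
print(f"Kendall's tau-b = {tau:.3f}, p = {p_value:.3f}")
```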

Some specialists go further, suggesting that the publication of an article is only complete when it is read and understood by a large part of society; that is, it is not enough simply to publish, the audience must clearly understand its content and thus be able to cite it [ 35 ].

The keyword co-occurrence analysis found that the words “cancer”, “quality of life”, “comorbidity”, “epidemiology” and “disease” had the highest frequency of co-occurrence in the research within the analyzed category. Our work reflects, in part, a growing trend in public health research: studies on quality of life, comorbidity and cancer have been a focus of the scientific community, and specifically of the Public, Environmental & Occupational Health category [ 36 , 37 ]. A quick bibliographic search in WoS shows that these terms occupy the fourth, eighth and fourteenth positions, respectively, in number of records across the different categories.
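To make the co-occurrence idea concrete, the sketch below tallies pairwise keyword co-occurrences across a handful of made-up keyword lists; counts of this kind are the raw material behind a map such as Figure 3. The keyword lists are illustrative placeholders, not the study's records.

```python
# Minimal sketch: counting keyword co-occurrences across publications.
from collections import Counter
from itertools import combinations

papers_keywords = [
    ["quality of life", "comorbidity", "cancer"],
    ["bias", "epidemiology", "meta-analysis"],
    ["cancer", "epidemiology", "quality of life"],
    ["bias", "quality", "extension", "epidemiology"],
]

pair_counts = Counter()
for keywords in papers_keywords:
    # Each unordered pair of keywords appearing in the same paper counts once.
    for a, b in combinations(sorted(set(keywords)), 2):
        pair_counts[(a, b)] += 1

for pair, n in pair_counts.most_common(5):
    print(pair, n)
```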

Metadata from all documents were used to identify the most productive authors and the most impactful sources. The high number of authors (487) contributing to the 100 articles, with an average of more than five authors per article, made it difficult to determine the individual contribution and, consequently, the role of each author [ 38 ]. As suggested by Bruni et al. [ 33 ], in multi-author articles the first position is traditionally occupied by the main contributor, while the last position is reserved for the supervisor. The most impactful authors in the studied category generally held relevant positions, either as main author or as supervisor. This practice is becoming more common under the influence of the experimental sciences, which assign the same importance to the first and the last author, based on the author/director relationship. This interpretation is known as the FLAE approach, short for “first-last-author emphasis” [ 39 ].

The h-index quantifies the research performance of individual scientists, incorporating both the number and the visibility of their publications [ 40 ]. In this work, we can see an unequal distribution of the h-index among the authors of the 100 articles; the number of citations that a scientific subcommunity grants to a manuscript is directly related to the number of researchers that make up that subcommunity [ 41 ]. The analyzed category is a very broad field of knowledge; therefore, the number of citations of the articles will differ greatly depending on the topic analyzed.
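For readers unfamiliar with the metric, an author's h-index is the largest number h such that h of their papers each have at least h citations. A minimal sketch, using invented citation counts, is shown below.

```python
# Minimal sketch: h-index from a list of per-paper citation counts.
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(ranked, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

# Invented example: five papers with these citation counts.
print(h_index([46000, 120, 45, 10, 3]))  # prints 4
```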

As indicated by Jung et al. [ 42 ], in a context of great interest in intensifying international collaboration within scientific practice, their paper proposes an approach to measuring and visualizing international collaborative work at the institutional level. The low collaboration observed between the different authors in our work contrasts with that found in studies such as those by Zhu et al. [ 1 ] or Yu et al. [ 10 ]. The joint analysis of the collaboration indexes of the relationships between the different authors allows a better interpretation of the structure of international scientific collaboration networks in the studied category [ 43 ]. One of the questions addressed in our work was whether a high level of international, and potentially multinational, collaboration with other institutions could affect the visibility of the research and the frequency of citations in a category [ 44 ]. This was not the case, but we were able to map and identify the existing collaborations within the Public, Environmental & Occupational Health field, as well as the main citation sources.

The most relevant works were mainly produced in North America, specifically the USA and Canada, and in Western Europe. Citation analyses in previous studies showed the same trend [ 45 , 46 ]. This can be explained by several reasons: first, the cumulative geographical advantage, since citations originate more frequently from institutions located in the same country as the author’s place of residence [ 47 , 48 ]. Second, as suggested by Wang et al. [ 49 ], the USA can count on a broad scientific community and generous science funding policies. In fact, the most productive institutions in our study are geographically located in the USA and Europe. A third reason may be that larger universities provide greater opportunities for scientists to collaborate and work on similar topics, and co-authorship may lead to higher citation rates.

Most articles originated from two economically advanced regions: North America and Western Europe. These regions have earlier access to research and greater capacity to support medical research [ 29 ]. This can be seen in the data published by the WHO in a 2020 report, in which high-income countries spend a higher percentage of GDP on R&D in the health sector [ 50 ].

The cooperative network of research institutions can reveal the distribution of research forces in the field of Public, Environmental & Occupational Health. The USA has the most extensive cooperative relationships and prefers to cooperate with Canada and some European countries. England, with the second highest number of co-authored articles, prefers to work with other European countries. Our results would be in line with those found by Song et al. [ 51 ] in the Entrepreneurship research area. In the field of science, collaborative work, institutional and disciplinary structures face the challenge of a global context. This challenge has led to the creation of initiatives such as e-Science in the United Kingdom, which was announced as a global collaboration program in key areas of science, and the development of the next generation of infrastructure. These types of initiatives show that contemporary scientific practice is characterized by being very collaborative, multidisciplinary, global work with intensive data management [ 52 ].

According to the results, more than half of the classified articles were published in only five journals, which also collected more than half of the total citations. This shows that a significant share of the studies was concentrated in a limited core of journals, in accordance with Bradford’s law [ 53 ]. As indicated by Highhouse et al. [ 54 ], authors tend to send their work to the most prestigious journals, attracted, according to Bruni et al. [ 33 ], by the greater visibility in search results and the higher probability of being cited.
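Bradford's law, loosely stated, says that if the journals of a field are ranked by productivity and split into zones each yielding roughly the same number of articles, the numbers of journals in successive zones grow roughly as 1 : n : n². The sketch below derives such zones; the first nine per-journal record counts follow Table 5, while the distribution of the remaining 26 journals is assumed so that the totals match 35 journals and 100 articles.

```python
# Minimal sketch: splitting journals into Bradford-style productivity zones.
def bradford_zones(records_per_journal, n_zones=3):
    ranked = sorted(records_per_journal, reverse=True)
    target = sum(ranked) / n_zones          # articles per zone
    zones, current, acc = [], [], 0
    for r in ranked:
        current.append(r)
        acc += r
        if acc >= target and len(zones) < n_zones - 1:
            zones.append(current)
            current, acc = [], 0
    zones.append(current)                   # last zone takes the remainder
    return zones

# Top nine counts follow Table 5; the tail of 26 journals is assumed.
records = [15, 12, 12, 10, 5, 4, 3, 3, 3] + [2] * 7 + [1] * 19
print([len(z) for z in bradford_zones(records)])  # e.g. [3, 9, 23]: small core, middle zone, long tail
```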

If we look at the quartiles of the journals, the works mostly appear in first- and second-quartile journals. As stated by Torres-Salinas and Cabezas-Clavijo [ 55 ], publication in high-impact journals generates benefits, starting with the fact that a scientist who regularly publishes in these journals will be able to advance smoothly in their scientific career and will be recognized as an expert in their field. Other authors affirm that publication in high-impact journals helps researchers develop their own criteria, increases self-esteem, strengthens confidence and feeds the desire to continue researching and publishing, in addition to guaranteeing quality through arbitration such as peer review [ 35 , 56 ].

There is no doubt that the use of the JIF as an evaluation measure generates debate, but today it is a useful way to measure the prestige and importance of scientific journals in the international system, as well as of their researchers [ 57 ]. Many authors have pointed out that the JIF has limitations, such as: (a) a built-in bias that favors American journals (in our study, six of the nine journals in the ranking are published in the USA); (b) highly variable IFs between fields and between specialties within fields; (c) vulnerability to inflation due to journal self-citation; (d) vulnerability to inflation due to the publication of review articles and meta-analyses; and (e) an arbitrary citation window that penalizes some fields or specialties within them [ 54 , 57 ].

Finally, we also wanted to compare the JIF with and without self-citations. The level of self-citation of the analyzed journals was relatively low, with some exceptions. The abuse of self-citation is another element that can substantially affect the JIF. The self-citation rate in the presented list was low (8.1%) compared with other studies [ 33 , 58 , 59 ]. This is a bias that many platforms have been working to correct for years [ 60 , 61 ].
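As a rough illustration of what excluding self-citations does to the metric, the sketch below computes a two-year impact factor (citations received in year Y by items published in years Y-1 and Y-2, divided by the number of citable items in those two years) with and without self-citations. All figures are invented.

```python
# Minimal sketch: two-year impact factor with and without journal self-citations.
# All numbers are invented for illustration.
def impact_factor(citations_to_prev_two_years, items_prev_two_years, self_citations=0):
    return (citations_to_prev_two_years - self_citations) / items_prev_two_years

cites, items, self_cites = 1200, 250, 100
jif = impact_factor(cites, items)
jif_no_self = impact_factor(cites, items, self_citations=self_cites)
self_citation_rate = self_cites / cites

print(f"JIF = {jif:.2f}, JIF without self-citations = {jif_no_self:.2f}, "
      f"self-citation rate = {self_citation_rate:.1%}")
```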

5. Conclusions

This work identifies relevant aspects to encourage scientific mapping in the Public, Environmental & Occupational Health category. The analysis can support the governance of specific areas or help outline an institution’s research. The analyzed category covers very varied topics; nevertheless, it allowed us to identify the most cited authors, the institutions with the greatest visibility and the most notable articles.

It has also made it possible to analyze the researchers who are forming national and international collaboration networks, as well as to identify the most collaborative authors and institutions.

Currently, the publication rate of American researchers is the highest in the category studied, and US institutions are among the most productive. In addition, the collaborative network of countries, institutions and authors shows the influence of European and American countries in the Public, Environmental & Occupational Health category.

Keyword analysis was an effective method to identify interesting topics among researchers and mark research trend lines.

The results of this research open up new possibilities for institutions to identify new strategies and policies that allow them to consolidate their research networks.

Although there has been an exponential growth in work, greater efforts are still required from both researchers and institutions.

In this article, valuable information is provided not only to identify topics of interest in the analyzed category, but also to identify the differences in the topics studied between the areas that form the category.

Funding Statement

This research was funded by Human Movement Research Group: SGR-Cat 2021 grant number SGR 1463. Generalitat de Catalunya.

Author Contributions

Conceptualization, V.H.-G. and J.M.C.-T.; methodology, V.H.-G., C.J.-D. and J.M.C.-T.; formal analysis, V.H.-G. and J.R.-M.; investigation, J.M.C.-T., C.J.-D. and Á.P.-R.; data curation, V.H.-G. and J.M.C.-T.; writing—original draft preparation, V.H.-G. and J.R.-M.; writing—review and editing, V.H.-G., Á.P.-R. and J.R.-M.; visualization, J.M.C.-T. and Á.P.-R.; supervision, J.R.-M.; project administration, J.R.-M.; funding acquisition, V.H.-G., C.J.-D., Á.P.-R. and J.R.-M. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

The authors declare no conflict of interest.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.


The Best of Applied Artificial Intelligence, Machine Learning, Automation, Bots, Chatbots

Top 10 Influential AI Research Papers in 2023 from Google, Meta, Microsoft, and More

December 5, 2023 by Mariya Yao


From Generative Agents research paper

In this article, we delve into ten transformative research papers from diverse domains, spanning language models, image processing, image generation, and video editing. As discussions around Artificial General Intelligence (AGI) reveal that AGI seems more approachable than ever, it’s no wonder that some of the featured papers explore various paths to AGI, such as extending language models or harnessing reinforcement learning for domain-spanning mastery.

If you’d like to skip around, here are the research papers we featured:

  • Sparks of AGI by Microsoft
  • PALM-E by Google
  • LLaMA 2 by Meta AI
  • LLaVA by University of Wisconsin–Madison, Microsoft, and Columbia University
  • Generative Agents by Stanford University and Google
  • Segment Anything by Meta AI
  • DALL-E 3 by OpenAI
  • ControlNet by Stanford University
  • Gen-1 by Runway
  • DreamerV3 by DeepMind and University of Toronto

If this in-depth educational content is useful for you, subscribe to our AI mailing list to be alerted when we release new material. 

Top 10 AI Research Papers 2023

1. Sparks of AGI by Microsoft

In this research paper, a team from Microsoft Research analyzes an early version of OpenAI’s GPT-4, which was still under active development at the time. The team argues that GPT-4 represents a new class of large language models, exhibiting more generalized intelligence compared to previous AI models. Their investigation reveals GPT-4’s expansive capabilities across various domains, including mathematics, coding, vision, medicine, law, and psychology. They highlight that GPT-4 can solve complex and novel tasks without specialized prompting, often achieving performance close to human level. 

The Microsoft team also emphasizes the potential of GPT-4 to be considered an early, albeit incomplete, form of artificial general intelligence (AGI). They focus on identifying GPT-4’s limitations and discuss the challenges in progressing towards more advanced and comprehensive AGI versions. This includes considering new paradigms beyond the current next-word prediction model.


Where to learn more about this research?

  • Sparks of Artificial General Intelligence: Early experiments with GPT-4 (research paper)
  • Sparks of AGI: early experiments with GPT-4 (a talk by the paper’s first author Sébastien Bubeck)

Where can you get implementation code?

  • Not applicable


2. PALM-E by Google

The research paper introduces PaLM-E , a novel approach to language models that bridges the gap between words and percepts in the real world by directly incorporating continuous sensor inputs. This embodied language model seamlessly integrates multi-modal sentences containing visual, continuous state estimation, and textual information. These inputs are trained end-to-end with a pre-trained LLM and applied to various embodied tasks, including sequential robotic manipulation planning, visual question answering, and captioning.

PaLM-E, particularly the largest model with 562B parameters, demonstrates remarkable performance across a wide range of tasks and modalities. Notably, it excels in embodied reasoning tasks, exhibits positive transfer from joint training across language, vision, and visual-language domains, and achieves state-of-the-art results on the OK-VQA benchmark. Beyond embodied reasoning, PaLM-E-562B also exhibits an array of capabilities, including zero-shot multimodal chain-of-thought reasoning, few-shot prompting, OCR-free math reasoning, and multi-image reasoning, despite being trained on only single-image examples.


  • PaLM-E: An Embodied Multimodal Language Model (research paper)
  • PaLM-E (demos)
  • PaLM-E (blog post)
  • Code implementation of the PaLM-E model is not available.

3. LLaMA 2 by Meta AI


LLaMA 2 is an enhanced version of its predecessor, trained on a new data mix, featuring a 40% larger pretraining corpus, doubled context length, and grouped-query attention. The LLaMA 2 series of models includes LLaMA 2 and LLaMA 2-Chat , optimized for dialogue, with sizes ranging from 7 to 70 billion parameters. These models exhibit superior performance in helpfulness and safety benchmarks compared to open-source counterparts and are comparable to some closed-source models. The development process involved rigorous safety measures, including safety-specific data annotation and red-teaming. The paper aims to contribute to the responsible development of LLMs by providing detailed descriptions of fine-tuning methodologies and safety improvements.
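The paper itself focuses on training and safety methodology rather than usage, but as a rough sketch of how the released chat checkpoints are typically queried through the Hugging Face transformers library, something like the snippet below works; the model id, prompt template and generation settings are assumptions, and downloading the weights requires accepting Meta's license on the Hub.

```python
# Illustrative sketch (not from the paper): querying a LLaMA 2 chat checkpoint
# with Hugging Face transformers. Model id and settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "[INST] Summarize grouped-query attention in one sentence. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```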


  • Llama 2: Open Foundation and Fine-Tuned Chat Models (research paper)
  • Llama 2: open source, free for research and commercial use (blog post)
  • Meta AI released LLaMA 2 models to individuals, creators, researchers, and businesses of all sizes. You can access model weights and starting code for pretrained and fine-tuned LLaMA 2 language models through GitHub .

4. LLaVA by University of Wisconsin–Madison, Microsoft, and Columbia University

The research paper introduces LLaVA (Large Language and Vision Assistant), a groundbreaking multimodal model that leverages language-only GPT-4 to generate instruction-following data for both text and images. This novel approach extends the concept of instruction tuning to the multimodal space, enabling the development of a general-purpose visual assistant.

The paper addresses the challenge of a scarcity of vision-language instruction-following data by presenting a method to convert image-text pairs into the appropriate instruction-following format, utilizing GPT-4. They construct a large multimodal model (LMM) by integrating the open-set visual encoder of CLIP with the language decoder LLaMA. The fine-tuning process on generated instructional vision-language data proves effective, and practical insights are offered for building a general-purpose instruction-following visual agent.

The paper’s contributions include the generation of multimodal instruction-following data, the development of large multimodal models through end-to-end training on generated data, and the achievement of state-of-the-art performance on the Science QA multimodal reasoning dataset. Additionally, the paper demonstrates a commitment to open-source principles by making the generated multimodal instruction data, codebase for data generation and model training, model checkpoint, and a visual chat demo available to the public.


  • Visual Instruction Tuning (research paper)
  • LLaVA: Large Language and Vision Assistant (blog post with demos)
  • The LLaVa code implementation is available on GitHub .

5. Generative Agents by Stanford University and Google

The paper introduces a groundbreaking concept – generative agents that can simulate believable human behavior. These agents exhibit a wide range of actions, from daily routines like cooking breakfast to creative endeavors such as painting and writing. They form opinions, engage in conversations, and remember past experiences, creating a vibrant simulation of human-like interactions.

To achieve this, the paper presents an architectural framework that extends large language models, allowing agents to store their experiences in natural language, synthesize memories over time, and retrieve them dynamically for behavior planning. These generative agents find applications in various domains, from role-play scenarios to social prototyping in virtual worlds. The research validates their effectiveness through evaluations, emphasizing the importance of memory, reflection, and planning in creating convincing agent behavior while addressing ethical and societal considerations.


  • Generative Agents: Interactive Simulacra of Human Behavior (research paper)
  • Generative Agents (video presentation of the research by the paper’s first author, Joon Sung Park)
  • The core simulation module for generative agents was released on GitHub .

6. Segment Anything by Meta AI

In this paper, the Meta AI team introduced a groundbreaking task, model, and dataset for image segmentation. Leveraging an efficient model in a data collection loop, the project has created the most extensive segmentation dataset to date, featuring over 1 billion masks for 11 million licensed and privacy-respecting images. To achieve their goal of building a foundational model for image segmentation, the project focuses on promptable models trained on a diverse dataset. SAM, the Segment Anything Model , employs a straightforward yet effective architecture comprising an image encoder, a prompt encoder, and a mask decoder. The experiments demonstrate that SAM competes favorably with fully supervised results on a diverse range of downstream tasks, including edge detection, object proposal generation, and instance segmentation. 
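To give a feel for the promptable interface, here is a minimal usage sketch based on the publicly released segment-anything package; the checkpoint path, image file and point coordinates are placeholders.

```python
# Illustrative sketch: prompting SAM with a single foreground point, using the
# released segment-anything package. Paths and coordinates are placeholders.
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # placeholder path
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),  # (x, y) of a foreground click
    point_labels=np.array([1]),           # 1 = foreground point
    multimask_output=True,
)
print(masks.shape, scores)
```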


  • Segment Anything (research paper)
  • Segment Anything (the research website with demos, datasets, etc)
  • The Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images have been released here .

7. DALL-E 3 by OpenAI

The research paper presents a groundbreaking approach to addressing one of the most significant challenges in text-to-image models: prompt following. Text-to-image models have historically struggled with accurately translating detailed image descriptions into visuals, often misinterpreting prompts or overlooking critical details. The authors of the paper hypothesize that these issues come from noisy and inaccurate image captions in the training dataset. To overcome this limitation, they developed a specialized image captioning system capable of generating highly descriptive and precise image captions. These enhanced captions are then used to recaption the training dataset for text-to-image models. The results are remarkable, with the DALL-E model trained on the improved dataset showcasing significantly enhanced prompt-following abilities.

Note: The paper does not cover training or implementation details of the DALL-E 3 model and only focuses on evaluating the improved prompt following of DALL-E 3 as a result of training on highly descriptive generated captions.


  • Improving Image Generation with Better Captions (research paper)
  • DALL-E 3 (blog post by OpenAI)
  • The code implementation of DALL-E 3 is not available, but the authors released text-to-image samples collected for the evaluations of DALL-E against the competitors.

8. ControlNet by Stanford University

ControlNet is a neural network structure designed by the Stanford University research team to control pretrained large diffusion models and support additional input conditions. ControlNet learns task-specific conditions in an end-to-end manner and demonstrates robust learning even with small training datasets. The training process is as fast as fine-tuning a diffusion model and can be performed on personal devices or scaled to handle large amounts of data using powerful computation clusters. By augmenting large diffusion models like Stable Diffusion with ControlNets, the researchers enable conditional inputs such as edge maps, segmentation maps, and keypoints, thereby enriching methods to control large diffusion models and facilitating related applications.
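As a hedged illustration of how such conditional control is commonly used in practice, the sketch below wires a pretrained Canny-edge ControlNet into a Stable Diffusion pipeline via the diffusers library; the model ids and the pre-computed edge image are assumptions, not details taken from the paper.

```python
# Illustrative sketch (not from the paper): conditioning Stable Diffusion on a
# Canny edge map with a pretrained ControlNet through the diffusers library.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

edge_map = load_image("canny_edges.png")  # placeholder: a pre-computed edge image
result = pipe("a futuristic living room", image=edge_map, num_inference_steps=30)
result.images[0].save("controlled_output.png")
```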


  • Adding Conditional Control to Text-to-Image Diffusion Models (research paper)
  • Ablation Study: Why ControlNets use deep encoder? What if it was lighter? Or even an MLP? (blog post by ControlNet developers)
  • The official implementation of this paper is available on GitHub .

9. Gen-1 by Runway

The Gen-1 research paper introduced a groundbreaking advancement in the realm of video editing through the fusion of text-guided generative diffusion models. While such models had previously revolutionized image creation and manipulation, extending their capabilities to video editing had remained a formidable challenge. Existing methods either required laborious re-training for each input or resorted to error-prone techniques to propagate image edits across frames. In response to these limitations, the researchers presented a structure and content-guided video diffusion model that allowed seamless video editing based on textual or visual descriptions of the desired output. The suggested solution was to leverage monocular depth estimates with varying levels of detail to gain precise control over structure and content fidelity. 

Gen-1 was trained jointly on images and videos, paving the way for versatile video editing capabilities. It empowered users with fine-grained control over output characteristics, enabling customization based on a few reference images. Extensive experiments demonstrated its prowess, from preserving temporal consistency to achieving user preferences in editing outcomes.


  • Structure and Content-Guided Video Synthesis with Diffusion Models (research paper)
  • Gen-1: The Next Step Forward for Generative AI (blog post by Runway)
  • Gen-2: The Next Step Forward for Generative AI (blog post by Runway)
  • The code implementation of Gen-1 is not available.

10. DreamerV3 by DeepMind and University of Toronto

The paper introduces DreamerV3 , a pioneering algorithm, based on world models, that showcases remarkable performance across a wide spectrum of domains, encompassing both continuous and discrete actions, visual and low-dimensional inputs, 2D and 3D environments, varied data budgets, reward frequencies, and reward scales. At the heart of DreamerV3 lies a world model that learns from experience, combining rich perception and imagination training. This model incorporates three neural networks: one for predicting future outcomes based on potential actions, another for assessing the value of different situations, and a third for learning how to navigate toward valuable situations. The algorithm’s generalizability across domains with fixed hyperparameters is achieved through the transformation of signal magnitudes and robust normalization techniques. 

A particularly noteworthy achievement of DreamerV3 is its ability to conquer the challenge of collecting diamonds in the popular video game Minecraft entirely from scratch, without any reliance on human data or curricula. DreamerV3 also demonstrates scalability, where larger models directly translate to higher data efficiency and superior final performance.


  • Mastering Diverse Domains through World Models (research paper)
  • DreamerV3 (project website)
  • A reimplementation of DreamerV3 is available on GitHub .

In 2023, the landscape of AI research witnessed remarkable advancements, and these ten transformative papers have illuminated the path forward. From innovative language models to groundbreaking image generation and video editing techniques, these papers have pushed the boundaries of AI capabilities. As we reflect on these achievements, we anticipate even more transformative discoveries and applications on the horizon, shaping the AI landscape for years to come.



About Mariya Yao

Mariya is the co-author of Applied AI: A Handbook For Business Leaders and former CTO at Metamaven. She "translates" arcane technical concepts into actionable business advice for executives and designs lovable products people actually want to use. Follow her on Twitter at @thinkmariya to raise your AI IQ.


Academia Insider

Top 100 Journal Publications In The World: Global Rankings

Discover the pinnacle of academic publishing with our definitive list of the Top 100 Journal Publications in the World.

This carefully curated selection spans a wide array of disciplines, offering insights into the journals that lead in innovation, research impact, and scholarly influence.

Based on authoritative metrics and citation data from Google Scholar, this guide serves as an invaluable resource for you to contribute to, and stay informed on, the cutting edge of your field.

Top 100 Journal Publications In The World

Name of Journal | h5-Index Score | Field
Nature | 467 | Multidisciplinary
The New England Journal of Medicine | 439 | Medicine
Science | 424 | Multidisciplinary
IEEE/CVF Conference on Computer Vision and Pattern Recognition | 422 | Computer Science
The Lancet | 368 | Medicine
Nature Communications | 349 | Multidisciplinary
Advanced Materials | 326 | Materials Science
Cell | 316 | Biology
Neural Information Processing Systems | 309 | Computer Science
International Conference on Learning Representations | 303 | Computer Science
JAMA | 286 | Medicine
Science of The Total Environment | 273 | Environmental Science
Nature Medicine | 268 | Biomedical Science
Proceedings of the National Academy of Sciences | 268 | Multidisciplinary
Angewandte Chemie International Edition | 266 | Chemistry
Chemical Reviews | 264 | Chemistry
International Conference on Machine Learning | 254 | Computer Science
Chemical Society Reviews | 248 | Chemistry
Journal of Cleaner Production | 246 | Environmental Science & Technology
Nucleic Acids Research | 238 | Molecular Biology

How To Determine A Journal’s Rankings?

Determining the value of a journal in the dense forest of academic publishing can be quite the task, especially with the myriad of scientific disciplines vying for your attention.

Impact Factor

A good starting point is to look at the journal’s impact factor.

This metric, often used as an indicator of a journal’s prestige and influence, reflects the average number of citations received by articles published in the journal. 

High-impact journals like those in the “Journal of the American Academy of Sciences” or “Springer’s Environmental Research” often boast impressive impact factors, signaling their importance in their respective fields.

Databases – Google Scholar, Scopus

Another critical aspect to consider is the journal’s ranking within international databases like Google Scholar or Scopus.

These platforms aggregate data on academic journals and provide rankings based on various metrics, including citation counts and impact factors.

A journal listed in the top 100 according to Google Scholar rank, for instance, is likely to be a high-quality publication respected by scholars worldwide. Google uses the H5-index score to rank journals.
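Concretely, the h5-index is just an h-index restricted to a journal's output from the last five complete years: the largest number h such that h of those articles each have at least h citations. A small sketch with invented data:

```python
# Minimal sketch: h5-index = h-index over articles from the last five complete
# calendar years. The (year, citations) pairs are invented for illustration.
def h5_index(articles, current_year):
    recent = sorted(
        (c for year, c in articles if current_year - 5 <= year < current_year),
        reverse=True,
    )
    # For a descending list, h is the number of ranks i with citations >= i.
    return sum(1 for i, c in enumerate(recent, start=1) if c >= i)

articles = [(2019, 40), (2019, 12), (2020, 8), (2021, 5), (2022, 3), (2017, 100)]
print(h5_index(articles, current_year=2024))  # prints 4; the 2017 paper falls outside the window
```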


It’s essential to dive into these rankings and understand the criteria behind them, as they offer insights into a journal’s reputation and the impact of its research output.

Top 100 Journal Publications In The World (Google Scholar)

Google Scholar, a widely used resource for academic literature, provides a comprehensive ranking of journals based on metrics such as citation counts and impact factors. Here, we explore the top 100 journal publications in the world according to the latest Google Scholar rankings (at the time of writing), looking at their h5-index scores, their fields, and a brief overview of each.

  • Nature: Score: 467 – Field: Multidisciplinary. Nature is a flagship journal renowned for publishing some of the most pioneering research across all scientific disciplines. Its high score reflects its broad influence and the groundbreaking nature of its articles.
  • The New England Journal of Medicine (NEJM): Score: 439 – Field: Medicine. NEJM is considered one of the most prestigious medical journals, known for its rigorous peer review process and for publishing high-impact research that often shapes global health policies and practices.
  • Science: Score: 424 – Field: Multidisciplinary. As a peer-reviewed academic journal, Science is celebrated for its highly cited papers across diverse scientific fields, reflecting its role in advancing knowledge and fostering scientific discourse.
  • IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): Score: 422 – Field: Computer Science. CVPR is a highly regarded conference in the field of computer vision, with its proceedings being pivotal for researchers and professionals looking to stay abreast of the latest developments in the field.
  • The Lancet: Score: 368 – Field: Medicine. The Lancet’s long-standing reputation for publishing high-quality medical research makes it a go-to source for clinicians and researchers alike, covering a wide array of medical disciplines.
  • Nature Communications: Score: 349 – Field: Multidisciplinary. This open-access journal is known for facilitating the exchange of ideas and findings across all areas of science, ensuring broad visibility of research through its accessible format.
  • Advanced Materials: Score: 326 – Field: Materials Science. With a focus on cutting-edge research in materials science, Advanced Materials is instrumental for those exploring the design and applications of new materials.
  • Cell: Score: 316 – Field: Biology. Cell’s high impact factor is testament to its role in publishing significant discoveries in biology, particularly those that advance understanding of cell structure and function.
  • Neural Information Processing Systems (NeurIPS): Score: 309 – Field: Computer Science. As a premier conference, NeurIPS is at the forefront of new advancements in machine learning and artificial intelligence, showcasing high-quality research that pushes the boundaries of the field.
  • International Conference on Learning Representations (ICLR): Score: 303 – Field: Computer Science. ICLR has rapidly become a key venue for the dissemination of cutting-edge research in deep learning and neural networks, reflecting the dynamic nature of the field.
  • JAMA (Journal of the American Medical Association): Score: 286 – Field: Medicine. JAMA is recognized for its influential research articles that span the spectrum of medicine, aiming to inform clinical practice and health policy.
  • Science of The Total Environment: Score: 273 – Field: Environmental Science. This journal provides an interdisciplinary platform for research on the environment, tackling issues from pollution and contamination to sustainable practices.
  • Nature Medicine: Score: 268 – Field: Biomedical Science. Focused on groundbreaking research in health and disease, Nature Medicine is a key resource for understanding the biomedical implications of scientific discoveries.
  • Proceedings of the National Academy of Sciences (PNAS): Score: 268 – Field: Multidisciplinary. PNAS publishes high-quality research across the physical, biological, and social sciences, making it a cornerstone of the scientific community.
  • Angewandte Chemie International Edition: Score: 266 – Field: Chemistry. Known for its high-quality papers in all areas of chemistry, this journal is essential for chemists seeking comprehensive insights into the field’s latest developments.
  • Chemical Reviews: Score: 264 – Field: Chemistry. This journal stands out for its in-depth review articles that synthesize current knowledge and trends in chemistry, serving as invaluable resources for researchers and educators alike.
  • International Conference on Machine Learning (ICML): Score: 254 – Field: Computer Science. ICML is a leading conference focusing on the latest in machine learning research, offering insights into theoretical foundations and practical applications.
  • Chemical Society Reviews: Score: 248 – Field: Chemistry. With its focus on review articles, this journal provides a critical overview of the major developments in the chemical sciences, guiding researchers through the vast landscape of the field.
  • Journal of Cleaner Production: Score: 246 – Field: Environmental Science & Technology. This journal addresses the environmental and sustainability challenges in production and consumption, promoting research that leads to cleaner and more efficient practices.
  • Nucleic Acids Research: Score: 238 – Field: Molecular Biology. Specializing in DNA and RNA research, this journal is pivotal for those in the field of genetics and molecular biology, publishing high-impact studies on the structure, function, and applications of nucleic acids.

For journals from position 21-100, we list them down below:

  • European Conference on Computer Vision : Score: 238 – Field: Computer Vision
  • Advanced Energy Materials : Score: 236 – Field: Energy Materials
  • Journal of the American Chemical Society : Score: 235 – Field: Chemistry
  • IEEE Access : Score: 233 – Field: Multidisciplinary Engineering
  • Advanced Functional Materials : Score: 230 – Field: Materials Science
  • IEEE/CVF International Conference on Computer Vision : Score: 228 – Field: Computer Vision
  • Renewable and Sustainable Energy Reviews : Score: 226 – Field: Energy & Environment
  • ACS Nano : Score: 220 – Field: Nanoscience
  • BMJ : Score: 218 – Field: Medicine
  • Physical Review Letters : Score: 216 – Field: Physics
  • International Journal of Molecular Sciences : Score: 215 – Field: Molecular Sciences
  • Journal of Clinical Oncology : Score: 213 – Field: Oncology
  • AAAI Conference on Artificial Intelligence : Score: 212 – Field: Artificial Intelligence
  • Science Advances : Score: 212 – Field: Multidisciplinary Science
  • PLoS ONE : Score: 212 – Field: Multidisciplinary Science
  • Frontiers in Immunology : Score: 212 – Field: Immunology
  • Scientific Reports : Score: 210 – Field: Multidisciplinary Science
  • Circulation : Score: 206 – Field: Cardiology
  • Chemical Engineering Journal : Score: 206 – Field: Chemical Engineering
  • Energy & Environmental Science : Score: 205 – Field: Environmental Science
  • Applied Catalysis B: Environmental : Score: 205 – Field: Environmental Science
  • International Journal of Environmental Research and Public Health : Score: 201 – Field: Public Health
  • The Lancet Oncology : Score: 198 – Field: Oncology
  • Journal of the American College of Cardiology : Score: 193 – Field: Cardiology
  • Meeting of the Association for Computational Linguistics (ACL) : Score: 192 – Field: Computational Linguistics
  • Nutrients : Score: 189 – Field: Nutrition & Dietetics
  • Nature Genetics : Score: 188 – Field: Genetics
  • Morbidity and Mortality Weekly Report : Score: 186 – Field: Public Health
  • Applied Energy : Score: 186 – Field: Energy
  • Nature Biotechnology : Score: 185 – Field: Biotechnology
  • Sustainability : Score: 185 – Field: Sustainability
  • Nano Energy : Score: 184 – Field: Energy
  • Joule : Score: 183 – Field: Energy
  • Journal of Materials Chemistry A : Score: 183 – Field: Materials Chemistry
  • Nature Materials : Score: 180 – Field: Materials Science
  • IEEE Transactions on Pattern Analysis and Machine Intelligence : Score: 179 – Field: Computer Science
  • ACS Applied Materials & Interfaces : Score: 179 – Field: Materials Science
  • Nature Energy : Score: 177 – Field: Energy
  • ACS Catalysis : Score: 177 – Field: Catalysis
  • The Lancet Infectious Diseases : Score: 176 – Field: Infectious Diseases
  • Conference on Empirical Methods in Natural Language Processing (EMNLP) : Score: 176 – Field: Natural Language Processing
  • Journal of Business Research : Score: 173 – Field: Business
  • Gastroenterology : Score: 172 – Field: Gastroenterology
  • European Heart Journal : Score: 171 – Field: Cardiology
  • IEEE Internet of Things Journal : Score: 171 – Field: Internet of Things
  • Nature Nanotechnology : Score: 170 – Field: Nanotechnology
  • Environmental Pollution : Score: 170 – Field: Environmental Science
  • The Astrophysical Journal : Score: 169 – Field: Astrophysics
  • Environmental Science & Technology : Score: 169 – Field: Environmental Science
  • Frontiers in Psychology : Score: 169 – Field: Psychology
  • Immunity : Score: 168 – Field: Immunology
  • Sensors : Score: 168 – Field: Sensors & Electronics
  • Annals of Oncology : Score: 166 – Field: Oncology
  • ACS Energy Letters : Score: 166 – Field: Energy
  • Journal of Hazardous Materials : Score: 166 – Field: Environmental Science
  • IEEE Communications Surveys & Tutorials : Score: 164 – Field: Telecommunications
  • Nature Neuroscience : Score: 164 – Field: Neuroscience
  • Gut : Score: 164 – Field: Gastroenterology
  • Molecular Cancer : Score: 164 – Field: Cancer Research
  • Molecules : Score: 164 – Field: Chemistry
  • Small : Score: 164 – Field: Nanoscience
  • Clinical Infectious Diseases : Score: 163 – Field: Infectious Diseases
  • Nature Methods : Score: 163 – Field: Methods in Life Sciences
  • Accounts of Chemical Research : Score: 163 – Field: Chemistry
  • IEEE Transactions on Industrial Informatics : Score: 162 – Field: Industrial Informatics
  • Physical Review D : Score: 161 – Field: Physics
  • Bioresource Technology : Score: 161 – Field: Bioenergy and Bioproducts
  • American Economic Review : Score: 160 – Field: Economics
  • Cell Metabolism : Score: 160 – Field: Metabolism
  • Monthly Notices of the Royal Astronomical Society : Score: 160 – Field: Astronomy
  • Chemosphere : Score: 160 – Field: Environmental Science
  • Blood : Score: 158 – Field: Hematology
  • Cell Reports : Score: 158 – Field: Cell Biology
  • Nano Letters : Score: 158 – Field: Nanoscience
  • Advanced Science : Score: 158 – Field: Multidisciplinary Science
  • Journal of High Energy Physics : Score: 158 – Field: High Energy Physics
  • Nature Reviews Immunology : Score: 157 – Field: Immunology
  • Technological Forecasting and Social Change : Score: 157 – Field: Social Sciences & Technology
  • Frontiers in Microbiology : Score: 155 – Field: Microbiology
  • Water Research : Score: 155 – Field: Water Science & Engineering

Prestigious Academic Journals By Ranking and Citation

This list of Top 100 Journal Publications represents the zenith of scholarly communication, embodying excellence across diverse scientific and academic fields.

This list not only highlights the journals that set the benchmark for quality and impact but also serves as a beacon for researchers aiming for the highest standards in their work.

As the landscape of academic publishing continues to evolve, these publications will undoubtedly play a pivotal role in shaping the future of research and knowledge dissemination.


Dr Andrew Stapleton has a Masters and PhD in Chemistry from the UK and Australia. He has many years of research experience and has worked as a Postdoctoral Fellow and Associate at a number of Universities. Although having secured funding for his own research, he left academia to help others with his YouTube channel all about the inner workings of academia and how to make it work for you.


Research Paper Topics

Academic Writing Service

Choose your Topic Smart

What starts well ends well, so you need to be really careful with research paper topics. The topic of a research paper defines the whole piece of writing. How often have you chosen a book by its title? First impressions are often influential, so make sure your topic will attract the reader instantly. By choosing your topic smartly, half of your job is done. That is why we have singled out several secrets on how to pick the best topic for you. Also see the list of 1000 thesis topics .

Browse Research Paper Topics by Category:

  • Anthropology
  • Argumentative
  • Communication
  • Criminal Justice
  • Environmental
  • Political Science

What is the Key to a Perfect Topic for a Research Paper?

The key to a perfect topic includes three main secrets: interest, precision, and innovation.


It is impossible to do something great if you have no interest in what you are doing. For this reason, make sure you choose the topic that drives you. If you are bored by what you investigate, do not expect that your paper will be exciting. Right now, spend some minutes or even hours thinking about what interests you. Jot down all your preferences in life, science, politics, social issues etc. It will help you get the idea what you can write about.

After realizing what drives you, narrow this general idea to a more specific one. A research paper is not about beating around the bush. You will need clear facts and data. You will have to provide evidence to your ideas. You will need to be precise, specific and convincing.

Finally, the idea of any research is that it should be surprising and distinctive. Think what makes your perspective and approach special. What is the novelty of your research?

Use Technology

If you are still stuck, use technology. Today we have an opportunity to make our lives easier with a bit of technology used. You can find paper topic generators online. This software will examine the category you want to investigate and the keywords from your research. Within several seconds, this program generates paper topics, so you can try it yourself. It can help you get started with your assignment.

100% Effective Advice

We will now give you advice that is 100% effective when picking a topic. Firstly, forget about what others may think about your topic. This is your topic and your perception of the world; stay personal and let your personal style get you top grades. Secondly, never decide on a topic before analyzing the background for your research; in other words, investigate the topic before you start the research proper. It happens quite often that students choose a topic and later realize there is no data or information to use, so conduct some research beforehand. Thirdly, read other researchers’ papers on the topic you want to write about. It will help you get an idea of the investigation and understand whether you truly want to write a paper on this topic. Finally, once you have picked the topic and started your research, make sure you dedicate your time and energy to it. If you want high results, you need to study every little detail of your research.

Examine Different Ideas

People often come up with genius ideas after analyzing thousands of other people’s ideas. This is how our brain works. That is why you can analyze other people’s ideas for research paper topics and think up your own. If you have never written any paper of that kind, it will help you understand the gist of this assignment, the style and the requirements. By comparing different topics, you can motivate yourself and get inspired with these ideas. Luckily, you have come to the right place. Here is our list of top 100 research paper topics.

Top 10 Argumentative Research Paper Topics:

Argumentative research papers examine some controversial issues. Your task is to provide your point of view, your argument, and support your idea with the evidence. This academic assignment requires appropriate structuring and formatting.

  • Does a College Education Pay?
  • Dual Career Families and Working Mothers
  • Electronic Copyright and Piracy
  • Drinking on Campus
  • Education for Homeless Children
  • Glass ceiling
  • Honor System at Colleges
  • Sex and Violence on TV
  • World Population and Hunger
  • World Trade and Globalization

Top 10 Economics Research Paper Topics:

If you are studying economics, you can find a variety of topics on our site. Check out topics in micro- and macroeconomics, and see ideas on urgent economic problems, economic models, and strategies. Get inspired and come up with your perfect topic.

  • Beyond Make-or-Buy: Advances in Transaction Cost Economics
  • Economic Aspects of Cultural Heritage
  • Economics of Energy Markets
  • Globalization and Inequality
  • International Trade and Trade Restrictions
  • Aggregate Expenditures Model and Equilibrium Output
  • Taxes Versus Standards
  • Predatory Pricing and Strategic Entry Barriers
  • Marxian and Institutional Industrial Relations in the United States
  • Twentieth-Century Economic Methodology

Top 10 Education Research Paper Topics:

Education raises many questions and offers few answers, and the list of education topics is endless. We have chosen the top 10 topics on urgent issues in education, with ideas related to different approaches, methodology, classroom management, etc.

  • Teachers Thinking About Their Practice
  • Cognitive Approaches to Motivation in Education
  • Responsive Classroom Management
  • Ten Steps to Complex Learning
  • Economics and School-to-Work
  • Reading and Literacy in Adolescence
  • Diversifying the Teaching Force
  • Teacher-Student Relationships
  • Preparing for College and Graduate School
  • Role of Professional Learning

Top 10 History Research Paper Topics:

Choose a topic in cultural, economic, environmental, military, political, or social history. See what other researchers have investigated, compare their ideas, and pick the topic that interests you.

  • European Expansion
  • Orientalism
  • Current trends in Historiography
  • Green Revolution
  • Religion and War
  • Women’s Emancipation Movements
  • History of Civilization

Top 10 Psychology Research Paper Topics:

The list of psychology categories and topics is enormous. We have singled out the most popular psychology topics of 2019, most of them in modern psychology. Choose the topic that appeals to you the most, or ask our professionals to help you come up with an original idea.

  • Imaging Techniques for the Localization of Brain Function
  • Memory and Eyewitness Testimony
  • Traditional Neuroscience Research Methods
  • Meditation and the Relaxation Response
  • Assessment of Mental Health in Older Adults
  • Cross-Cultural Psychology and Research
  • Industrial and Organizational Psychology
  • Diagnostic and Statistical Manual of Mental Disorders
  • Prejudice and Stereotyping
  • Nature Versus Nurture

Top 10 Biology Research Paper Topics:

Here you can find topics related to the science of all forms of life. Examine the topics from different fields in biology and choose the best one for you.

  • Biological Warfare
  • Clone and Cloning
  • Genetic Disorders
  • Genetic Engineering
  • Kangaroos and Wallabies
  • Mendelian Laws of Inheritance
  • Molecular Biology
  • Sexually Transmitted Diseases

Top 10 Chemistry Research Paper Topics:

The best way to understand chemistry is to write a paper on a chemistry topic. Below you can see topics from different fields of chemistry: organic, inorganic, physical, analytical, and others.

  • Acids and Bases
  • Alkaline Earth Metals
  • Dyes and Pigments
  • Chemical Warfare
  • Industrial Minerals
  • Photochemistry
  • Soaps and Detergents
  • Transition Elements

Top 10 Physics Research Paper Topics:

Check out topics on classical and modern physics, and find ideas for writing about the interrelationships between physics and other sciences.

  • Aerodynamics
  • Atomic Theory
  • Celestial Mechanics
  • Fluid Dynamics
  • Magnetic recording
  • Microwave Communication
  • Quantum mechanics
  • Subatomic particles

Top 10 Sociology Research Paper Topics:

Find ideas related to different sociological theories, research and methodologies.

  • Feminist Methodologies and Epistemology
  • Quality-of-Life Research
  • Sociology of Men and Masculinity
  • Sociology of Leisure and Recreation
  • Environmental Sociology
  • Teaching and Learning in Sociology
  • The History of Sociology: The North American Perspective
  • The Sociology of Voluntary Associations
  • Marriage and Divorce in the United States
  • Urban Sociology in the 21st Century

Top 10 Technology Research Paper Topics:

See topics related to cutting-edge technology, or dive into the history of electronics or even early advances in agriculture.

  • Food Preservation: Freeze Drying, Irradiation, and Vacuum Packing
  • Tissue Culturing
  • Digital Telephony
  • Computer-Aided Control Technology
  • Minerals Prospecting
  • Prefabricated Buildings
  • Timber Engineering
  • Quantum Electronic Devices
  • Thermal Water Moderated Nuclear Reactors
  • Long Range Radars and Early Warning Systems

What Makes a Good Topic for a Research Paper?

A good research paper topic is the one that is successful and manageable in your particular case. A successful research paper poses an interesting question you can actually answer. Just as important, it poses a question you can answer within the time available. The question should be one that interests you and deserves exploration. It might be an empirical question or a theoretical puzzle. In some fields, it might be a practical problem or policy issue. Whatever the question is, you need to mark off its boundaries clearly and intelligently so you can complete the research paper and not get lost in the woods. That means your topic should be manageable as well as interesting and important.

A topic is  manageable  if you can:

  • Master the relevant literature
  • Collect and analyze the necessary data
  • Answer the key questions you have posed
  • Do it all within the time available, with the skills you have

A topic is  important  if it:

  • Touches directly on major theoretical issues and debates, or
  • Addresses substantive topics of great interest in your field

Ideally, your topic can do both, engaging theoretical and substantive issues. In elementary education, for example, parents, teachers, scholars, and public officials all debate the effectiveness of charter schools, the impact of vouchers, and the value of different reading programs. A research paper on any of these would resonate within the university and well beyond it. Still, as you approach such topics, you need to limit the scope of your investigation so you can finish your research and writing on time. After all, to be a good research paper, it first has to be a completed one. A successful research paper poses an interesting question you can actually answer within the time available for the project. Some problems are simply too grand, too sweeping to master within the time limits. Some are too minor to interest you or anybody else.

The solution, however, is not to find a lukewarm bowl of porridge, a bland compromise. Nor is it to abandon your interest in larger, more profound issues such as the relationship between school organization and educational achievement or between immigration and poverty. Rather, the solution is to select a well-defined topic that is closely linked to some larger issue and then explore that link. Your research paper will succeed if you nail a well-defined topic. It will rise to excellence if you probe that topic deeply and show how it illuminates wider issues. The best research papers deal with important issues framed in manageable ways: a well-defined topic, closely linked to some larger issue, that can illuminate it.

You can begin your project with either a large issue or a narrowly defined topic, depending on your interests and the ideas you have generated. Whichever way you start, the goals are the same: to connect the two in meaningful ways and to explore your specific topic in depth.

Of course, the choice of a particular research paper topic depends on the course you're taking. Our site offers research paper topics and example research papers for a wide range of courses.

Moving from a Research Paper Idea to a Research Paper Topic

Let’s begin as most students actually do, by going from a “big issue” to a more manageable research paper topic. Suppose you start with a big question such as, “Why has the United States fought so many wars since 1945?” That’s certainly a big, important question. Unfortunately, it’s too complex and sprawling to cover well in a research paper. Working with your professor or instructor, you could zero in on a related but feasible research topic, such as “Why did the Johnson administration choose to escalate the U.S. war in Vietnam?” By choosing this topic, your research paper can focus on a specific war and, within that, on a few crucial years in the mid-1960s.

You can draw on major works covering all aspects of the Vietnam War and the Johnson administration’s decision making. You have access to policy memos that were once stamped top secret. These primary documents have now been declassified, published by the State Department, and made available to research libraries. Many are readily available on the Web. You can also take advantage of top-quality secondary sources (that is, books and articles based on primary documents, interviews, and other research data).

Drawing on these primary and secondary sources, you can uncover and critique the reasons behind U.S. military escalation. As you answer this well-defined question about Vietnam, you can (and you should) return to the larger themes that interest you, namely, “What does the escalation in Southeast Asia tell us about the global projection of U.S. military power since 1945?” As one of America’s largest military engagements since World War II, the war in Vietnam should tell us a great deal about the more general question.

The goal here is to pick a good case to study, one that is compelling in its own right and speaks to the larger issue. It need not be a typical example, but it does need to illuminate the larger question. Some cases are better than others precisely because they illuminate larger issues. That’s why choosing the best cases makes such a difference in your research paper.

Since you are interested in why the United States has fought so often since 1945, you probably shouldn’t focus on U.S. invasions of Grenada, Haiti, or Panama in the past two decades. Why? Because the United States has launched numerous military actions against small, weak states in the Caribbean for more than a century. That is important in its own right, but it doesn’t say much about what has changed so dramatically since 1945. The real change since 1945 is the projection of U.S. power far beyond the Western Hemisphere, to Europe and Asia. You cannot explain this change—or any change, for that matter—by looking at something that remains constant.

In this case, to analyze the larger pattern of U.S. war fighting and the shift it represents, you need to pick examples of distant conflicts, such as Korea, Vietnam, Kosovo, Afghanistan, or Iraq. That's the noteworthy change since 1945: U.S. military intervention outside the Western Hemisphere. The United States has fought frequently in such areas since World War II but rarely before then. Alternatively, you could use statistics covering many cases of U.S. intervention around the world, perhaps supplemented with some telling case studies.

Students in the humanities want to explore their own big ideas, and they, too, need to focus their research. In English literature, their big issue might be “masculinity” or, to narrow the range a bit, “masculinity in Jewish American literature.” Important as these issues are, they are too vast for anyone to read all the major novels plus all the relevant criticism and then frame a comprehensive research paper.

If you don’t narrow these sprawling topics and focus your work, you can only skim the surface. Skimming the surface is not what you want to do in a research paper. You want to understand your subject in depth and convey that understanding to your readers.

That does not mean you have to abandon your interest in major themes. It means you have to restrict their scope in sensible ways. To do that, you need to think about which aspects of masculinity really interest you and then find works that deal with them.

You may realize your central concern is how masculinity is defined in response to strong women. That focus would still leave you considerable flexibility, depending on your academic background and what you love to read. That might be anything from a reconsideration of Macbeth to an analysis of early twentieth-century American novels, where men must cope with women in assertive new roles. Perhaps you are interested in another aspect of masculinity: the different ways it is defined within the same culture at the same moment. That would lead you to novelists who explore these differences in their characters, perhaps contrasting men who come from different backgrounds, work in different jobs, or simply differ emotionally. Again, you would have considerable flexibility in choosing specific writers.

Connecting a Specific Research Paper Topic to a Bigger Idea

Not all students begin their research paper concerned with big issues such as masculinity or American wars over the past half century. Some start with very specific topics in mind. One example might be the decision to create NAFTA, the North American Free Trade Agreement encompassing Canada, the United States, and Mexico. Perhaps you are interested in NAFTA because you discussed it in a course, heard about it in a political campaign, or saw its effects firsthand on local workers, companies, and consumers. It intrigues you, and you would like to study it in a research paper. The challenge is to go from this clear-cut subject to a larger theme that will frame your paper.

Why do you even need to figure out a larger theme? Because NAFTA bears on several major topics, and you cannot explore all of them. Your challenge—and your opportunity—is to figure out which one captures your imagination.

One way to think about that is to finish this sentence: “For me, NAFTA is a case of ___________.” If you are mainly interested in negotiations between big and small countries, then your answer is, “For me, NAFTA is a case of a large country like the United States bargaining with a smaller neighbor.” Your answer would be different if you are mainly interested in decision making within the United States, Mexico, or Canada. In that case, you might say, “NAFTA seems to be a case where a strong U.S. president pushed a trade policy through Congress.” Perhaps you are more concerned with the role played by business lobbies. “For me, NAFTA is a case of undue corporate influence over foreign economic policy.” Or you could be interested in the role of trade unions, environmental groups, or public opinion.

The NAFTA decision is related to all these big issues and more. You cannot cover them all. There is not enough time, and even if there were, the resulting paper would be too diffuse, too scattershot. To make an impact, throw a rock, not a handful of pebbles.

Choosing one of these large issues will shape your research paper on NAFTA. If you are interested in U.S. decision making, for example, you might study the lobbying process or perhaps the differences between Democrats and Republicans. If you are interested in diplomacy, you would focus on negotiations between the United States, Canada, and Mexico. Either would make an interesting research paper, but they are different topics.

Although the subject matter and analysis are decidedly different in the humanities, many of the same considerations still apply to topic selection. In English or comparative literature, for example, you may be attracted to a very specific topic such as several poems by William Wordsworth. You are not trying, as a social scientist would, to test some generalizations that apply across time or space. Rather, you want to analyze these specific poems, uncover their multiple meanings, trace their allusions, and understand their form and beauty.

As part of the research paper, however, you may wish to say something bigger, something that goes beyond these particular poems. That might be about Wordsworth’s larger body of work. Are these poems representative or unusual? Do they break with his previous work or anticipate work yet to come? You may wish to comment on Wordsworth’s close ties to his fellow “Lake Poets,” Coleridge and Southey, underscoring some similarities in their work. Do they use language in shared ways? Do they use similar metaphors or explore similar themes? You may even wish to show how these particular poems are properly understood as part of the wider Romantic movement in literature and the arts. Any of these would connect the specific poems to larger themes.

How to Refine Your Research Paper Topic

One of your professor's or instructor's most valuable contributions to the success of your research paper is to help you refine your topic. They can help you select the best cases for detailed study or the best data and statistical techniques. They can help you find cases that shed light on larger questions, have good data available, and are discussed in a rich secondary literature. They may also know valuable troves of documents to explore. That's why it is so important to bring these issues up in early meetings. These discussions with your instructor are crucial in moving from a big but ill-defined idea to a smart, feasible topic. Some colleges supplement this advising process by offering special workshops and tutorial support for students. These are great resources, and you should take full advantage of them. They can improve your project in at least two ways.

First, tutors and workshop leaders are usually quite adept at helping you focus and shape your topic. That’s what they do best. Even if they are relatively new teachers, they have been writing research papers themselves for many years. They know how to do it well and how to avoid common mistakes. To craft their own papers, they have learned how to narrow their topics, gather data, interpret sources, and evaluate conjectures. They know how to use appropriate methods and how to mine the academic literature. In all these ways, they can assist you with their own hard-won experience. To avoid any confusion, just make sure your instructor knows what advice you are getting from workshop leaders and tutors. You want everyone to be pulling in the same direction.

Second, you will benefit enormously from batting around your research paper in workshops. The more you speak about your subject, the better you will understand it yourself. The better you understand it, the clearer your research and writing will be. You will learn about your project as you present your ideas; you will learn more as you listen to others discuss your work; and you will learn still more as you respond to their suggestions. Although you should do that in sessions with your instructor, you will also profit from doing it in workshops and tutorial sessions.

Secrets to Keep in Mind when Writing a Research Paper

As a bonus, we have prepared several secrets to make your paper perfect. Firstly, always write your paper from scratch. Do not copy existing materials, as that can lead to an unsatisfactory mark or even expulsion. Secondly, start your research early; do not put off investigating the topic. The earlier you start, the easier it will be to meet the deadline. Thirdly, plan your work and create an outline for your task. A plan will keep you systematic, and it will help you avoid writer's block, since you always have an outline to follow. Another secret is following all the requirements: a research paper is an academic assignment, so the structural and formatting standards matter. Finally, make sure you proofread and edit your work. Check your paper for grammar and spelling mistakes and examine your choice of vocabulary. If it seems like too much, you can always ask our professional editors to check the paper for you. A mistake-free paper is essential for high results.

Custom Research Paper Writing Service

If you still have concerns about your research paper, we are here to answer your questions. It is no secret that studying at college is becoming more and more difficult. Every week brings an overload of tasks and assignments; you work hard and sleep little, and you can end up on the edge of a nervous breakdown trying to finish everything on time. That is why we are here, helping thousands of students study smart.

You can contact us 24/7 and order your paper. We never miss a deadline and always provide our clients with top-notch quality. When you feel that you cannot handle it on your own, a bit of assistance will do no harm. All our writers are experts with years of experience; they are aware of all the subtleties of academic writing and know the current college requirements. You can turn to us for help any time and we will get down to work immediately. From choosing the topic to writing the whole paper, this is what we have to offer. Getting top grades is much easier when real professionals help you.

  • ABM Thesis Topics
  • Accounting and Finance Thesis Topics
  • Computer Science Thesis Topics
  • Education Thesis Topics
  • Law Thesis Topics
  • Literature Thesis Topics


113 Great Research Paper Topics


One of the hardest parts of writing a research paper can be just finding a good topic to write about. Fortunately we've done the hard work for you and have compiled a list of 113 interesting research paper topics. They've been organized into ten categories and cover a wide range of subjects so you can easily find the best topic for you.

In addition to the list of good research topics, we've included advice on what makes a good research paper topic and how you can use your topic to start writing a great paper.

What Makes a Good Research Paper Topic?

Not all research paper topics are created equal, and you want to make sure you choose a great topic before you start writing. Below are the three most important factors to consider to make sure you choose the best research paper topics.

#1: It's Something You're Interested In

A paper is always easier to write if you're interested in the topic, and you'll be more motivated to do in-depth research and write a paper that really covers the entire subject. Even if a certain research paper topic is getting a lot of buzz right now or other people seem interested in writing about it, don't feel tempted to make it your topic unless you genuinely have some sort of interest in it as well.

#2: There's Enough Information to Write a Paper

Even if you come up with the absolute best research paper topic and you're so excited to write about it, you won't be able to produce a good paper if there isn't enough research about the topic. This can happen for very specific or specialized topics, as well as topics that are too new to have enough research done on them at the moment. Easy research paper topics will always be topics with enough information to write a full-length paper.

Trying to write a research paper on a topic that doesn't have much research on it is incredibly hard, so before you decide on a topic, do a bit of preliminary searching and make sure you'll have all the information you need to write your paper.

#3: It Fits Your Teacher's Guidelines

Don't get so carried away looking at lists of research paper topics that you forget any requirements or restrictions your teacher may have put on research topic ideas. If you're writing a research paper on a health-related topic, deciding to write about the impact of rap on the music scene probably won't be allowed, but there may be some sort of leeway. For example, if you're really interested in current events but your teacher wants you to write a research paper on a history topic, you may be able to choose a topic that fits both categories, like exploring the relationship between the US and North Korea. No matter what, always get your research paper topic approved by your teacher first before you begin writing.

113 Good Research Paper Topics

Below are 113 good research topics to help you get you started on your paper. We've organized them into ten categories to make it easier to find the type of research paper topics you're looking for.

Arts/Culture

  • Discuss the main differences in art from the Italian Renaissance and the Northern Renaissance .
  • Analyze the impact a famous artist had on the world.
  • How is sexism portrayed in different types of media (music, film, video games, etc.)? Has the amount/type of sexism changed over the years?
  • How has the music of slaves brought over from Africa shaped modern American music?
  • How has rap music evolved in the past decade?
  • How has the portrayal of minorities in the media changed?


Current Events

  • What have been the impacts of China's one child policy?
  • How have the goals of feminists changed over the decades?
  • How has the Trump presidency changed international relations?
  • Analyze the history of the relationship between the United States and North Korea.
  • What factors contributed to the current decline in the rate of unemployment?
  • What have been the impacts of states which have increased their minimum wage?
  • How do US immigration laws compare to immigration laws of other countries?
  • How have the US's immigration laws changed in the past few years/decades?
  • How has the Black Lives Matter movement affected discussions and view about racism in the US?
  • What impact has the Affordable Care Act had on healthcare in the US?
  • What factors contributed to the UK deciding to leave the EU (Brexit)?
  • What factors contributed to China becoming an economic power?
  • Discuss the history of Bitcoin or other cryptocurrencies.
  • Do students in schools that eliminate grades do better in college and their careers?
  • Do students from wealthier backgrounds score higher on standardized tests?
  • Do students who receive free meals at school get higher grades compared to when they weren't receiving a free meal?
  • Do students who attend charter schools score higher on standardized tests than students in public schools?
  • Do students learn better in same-sex classrooms?
  • How does giving each student access to an iPad or laptop affect their studies?
  • What are the benefits and drawbacks of the Montessori Method ?
  • Do children who attend preschool do better in school later on?
  • What was the impact of the No Child Left Behind act?
  • How does the US education system compare to education systems in other countries?
  • What impact does mandatory physical education classes have on students' health?
  • Which methods are most effective at reducing bullying in schools?
  • Do homeschoolers who attend college do as well as students who attended traditional schools?
  • Does offering tenure increase or decrease quality of teaching?
  • How does college debt affect future life choices of students?
  • Should graduate students be able to form unions?


  • What are different ways to lower gun-related deaths in the US?
  • How and why have divorce rates changed over time?
  • Is affirmative action still necessary in education and/or the workplace?
  • Should physician-assisted suicide be legal?
  • How has stem cell research impacted the medical field?
  • How can human trafficking be reduced in the United States/world?
  • Should people be able to donate organs in exchange for money?
  • Which types of juvenile punishment have proven most effective at preventing future crimes?
  • Has the increase in US airport security made passengers safer?
  • Analyze the immigration policies of certain countries and how they are similar and different from one another.
  • Several states have legalized recreational marijuana. What positive and negative impacts have they experienced as a result?
  • Do tariffs increase the number of domestic jobs?
  • Which prison reforms have proven most effective?
  • Should governments be able to censor certain information on the internet?
  • Which methods/programs have been most effective at reducing teen pregnancy?
  • What are the benefits and drawbacks of the Keto diet?
  • How effective are different exercise regimes for losing weight and maintaining weight loss?
  • How do the healthcare plans of various countries differ from each other?
  • What are the most effective ways to treat depression ?
  • What are the pros and cons of genetically modified foods?
  • Which methods are most effective for improving memory?
  • What can be done to lower healthcare costs in the US?
  • What factors contributed to the current opioid crisis?
  • Analyze the history and impact of the HIV/AIDS epidemic .
  • Are low-carbohydrate or low-fat diets more effective for weight loss?
  • How much exercise should the average adult be getting each week?
  • Which methods are most effective to get parents to vaccinate their children?
  • What are the pros and cons of clean needle programs?
  • How does stress affect the body?
  • Discuss the history of the conflict between Israel and the Palestinians.
  • What were the causes and effects of the Salem Witch Trials?
  • Who was responsible for the Iran-Contra situation?
  • How have New Orleans and the government's response to natural disasters changed since Hurricane Katrina?
  • What events led to the fall of the Roman Empire?
  • What were the impacts of British rule in India ?
  • Was the atomic bombing of Hiroshima and Nagasaki necessary?
  • What were the successes and failures of the women's suffrage movement in the United States?
  • What were the causes of the Civil War?
  • How did Abraham Lincoln's assassination impact the country and reconstruction after the Civil War?
  • Which factors contributed to the colonies winning the American Revolution?
  • What caused Hitler's rise to power?
  • Discuss how a specific invention impacted history.
  • What led to Cleopatra's fall as ruler of Egypt?
  • How has Japan changed and evolved over the centuries?
  • What were the causes of the Rwandan genocide ?


  • Why did Martin Luther decide to split with the Catholic Church?
  • Analyze the history and impact of a well-known cult (Jonestown, Manson family, etc.)
  • How did the sexual abuse scandal impact how people view the Catholic Church?
  • How has the Catholic church's power changed over the past decades/centuries?
  • What are the causes behind the rise in atheism/ agnosticism in the United States?
  • What influences in Siddhartha's life resulted in him becoming the Buddha?
  • How has media portrayal of Islam/Muslims changed since September 11th?

Science/Environment

  • How has the earth's climate changed in the past few decades?
  • How has the use and elimination of DDT affected bird populations in the US?
  • Analyze how the number and severity of natural disasters have increased in the past few decades.
  • Analyze deforestation rates in a certain area or globally over a period of time.
  • How have past oil spills changed regulations and cleanup methods?
  • How has the Flint water crisis changed water regulation safety?
  • What are the pros and cons of fracking?
  • What impact has the Paris Climate Agreement had so far?
  • What have NASA's biggest successes and failures been?
  • How can we improve access to clean water around the world?
  • Does ecotourism actually have a positive impact on the environment?
  • Should the US rely on nuclear energy more?
  • What can be done to save amphibian species currently at risk of extinction?
  • What impact has climate change had on coral reefs?
  • How are black holes created?
  • Are teens who spend more time on social media more likely to suffer anxiety and/or depression?
  • How will the loss of net neutrality affect internet users?
  • Analyze the history and progress of self-driving vehicles.
  • How has the use of drones changed surveillance and warfare methods?
  • Has social media made people more or less connected?
  • What progress has currently been made with artificial intelligence ?
  • Do smartphones increase or decrease workplace productivity?
  • What are the most effective ways to use technology in the classroom?
  • How is Google search affecting our intelligence?
  • When is the best age for a child to begin owning a smartphone?
  • Has frequent texting reduced teen literacy rates?


How to Write a Great Research Paper

Even great research paper topics won't give you a great research paper if you don't hone your topic before and during the writing process. Follow these three tips to turn good research paper topics into great papers.

#1: Figure Out Your Thesis Early

Before you start writing a single word of your paper, you first need to know what your thesis will be. Your thesis is a statement that explains what you intend to prove/show in your paper. Every sentence in your research paper will relate back to your thesis, so you don't want to start writing without it!

As some examples, if you're writing a research paper on whether students learn better in same-sex classrooms, your thesis might be "Research has shown that elementary-age students in same-sex classrooms score higher on standardized tests and report feeling more comfortable in the classroom."

If you're writing a paper on the causes of the Civil War, your thesis might be "While the dispute between the North and South over slavery is the most well-known cause of the Civil War, other key causes include differences in the economies of the North and South, states' rights, and territorial expansion."

#2: Back Every Statement Up With Research

Remember, this is a research paper you're writing, so you'll need to use lots of research to make your points. Every statement you give must be backed up with research, properly cited the way your teacher requested. You're allowed to include opinions of your own, but they must also be supported by the research you give.

#3: Do Your Research Before You Begin Writing

You don't want to start writing your research paper and then learn that there isn't enough research to back up the points you're making, or, even worse, that the research contradicts the points you're trying to make!

Get most of your research on your good research topics done before you begin writing. Then use the research you've collected to create a rough outline of what your paper will cover and the key points you're going to make. This will help keep your paper clear and organized, and it'll ensure you have enough research to produce a strong paper.




The top list of academic research databases



Whether you are writing a thesis, dissertation, or research paper, surveying prior literature and research findings is a key task. More likely than not, you will be looking for trusted resources, usually peer-reviewed research articles.

Academic research databases make it easy to locate the literature you are looking for. We have compiled the top list of trusted academic resources to help you get started with your research:

Scopus is one of the two big commercial, bibliographic databases that cover scholarly literature from almost any discipline. Besides searching for research articles, Scopus also provides academic journal rankings, author profiles, and an h-index calculator .

  • Coverage: 90.6 million core records
  • References: N/A
  • Discipline: Multidisciplinary
  • Access options: Limited free preview, full access by institutional subscription only
  • Provider: Elsevier

Search interface of Scopus
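Incidentally, the h-index that Scopus reports for authors is easy to compute yourself: it is the largest h such that the author has h papers cited at least h times each. Here is a minimal sketch (the citation counts in the example are made up):

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(counts, start=1):
        if citations >= rank:
            h = rank
        else:
            break
    return h

# Example: five papers cited 10, 8, 5, 4, and 3 times give an h-index of 4.
print(h_index([10, 8, 5, 4, 3]))  # -> 4
```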

Web of Science, also known as Web of Knowledge, is the second big bibliographic database. Usually, academic institutions provide free access to either Web of Science or Scopus on their campus network.

  • Coverage: approx. 100 million items
  • References: 1.4 billion
  • Access options: institutional subscription only
  • Provider: Clarivate (formerly Thomson Reuters)

Web of Science landing page

PubMed is the number one resource for anyone looking for literature in medicine or the biological sciences. PubMed stores abstracts and bibliographic details of more than 30 million papers and provides full-text links to the publisher sites or to the free PDF on PubMed Central (PMC).

  • Coverage: approx. 35 million items
  • Discipline: Medicine and Biological Sciences
  • Access options: free
  • Provider: NIH

Search interface of PubMed
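PubMed can also be searched programmatically through the free NCBI E-utilities web API. The sketch below (assuming the third-party requests package is installed; the query term is only an example) searches PubMed and fetches abstracts for the top hits:

```python
import requests

# Minimal sketch: search PubMed via the NCBI E-utilities API and fetch
# abstracts for the first few hits. The query term is only an example.
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

search = requests.get(
    f"{BASE}/esearch.fcgi",
    params={"db": "pubmed", "term": "covid-19 clinical features",
            "retmax": 5, "retmode": "json"},
).json()
ids = search["esearchresult"]["idlist"]

abstracts = requests.get(
    f"{BASE}/efetch.fcgi",
    params={"db": "pubmed", "id": ",".join(ids),
            "rettype": "abstract", "retmode": "text"},
).text
print(abstracts)
```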

For education sciences, ERIC is the number one destination. ERIC stands for Education Resources Information Center, and is a database that specifically hosts education-related literature.

  • Coverage: approx. 1.6 million items
  • Discipline: Education
  • Provider: U.S. Department of Education

Search interface of ERIC academic database

IEEE Xplore is the leading academic database in the field of engineering and computer science. It covers not only journal articles but also conference papers, standards, and books.

  • Coverage: approx. 6 million items
  • Discipline: Engineering
  • Provider: IEEE (Institute of Electrical and Electronics Engineers)

Search interface of IEEE Xplore

ScienceDirect is the gateway to the millions of academic articles published by Elsevier, 1.4 million of which are open access. Journals and books can be searched via a single interface.

  • Coverage: approx. 19.5 million items

Search interface of ScienceDirect

The DOAJ is an open-access academic database that can be accessed and searched for free.

  • Coverage: over 8 million records
  • Provider: DOAJ

Search interface of DOAJ database

JSTOR is another great resource for finding research papers. Any article published before 1924 in the United States is available for free, and JSTOR also offers scholarships for independent researchers.

  • Coverage: more than 12 million items
  • Provider: ITHAKA

Search interface of JSTOR

Start using a reference manager like Paperpile to save, organize, and cite your references. Paperpile integrates with PubMed and many popular databases, so you can save references and PDFs directly to your library using the Paperpile buttons:




Google Scholar reveals its most influential papers for 2021

Early clinical observations of COVID-19 and its mortality risk factors among the most cited output, while a five-year-old AI paper continues to command attention.


Examples of using SSD, an object-detection algorithm described in a highly cited artificial intelligence paper. Credit: Wei Liu et al. European Conference on Computer Vision (2016)

24 August 2021


COVID-19-related papers have eclipsed artificial intelligence research in the annual listing of the most highly-cited publications in the Google Scholar database. The most highly cited COVID-19 paper, published in The Lancet in early 2020, has garnered more than 30,000 citations to date (see below for paper summary).

But, in the database of almost 400 million academic papers and other scholarly literature, even it fell a long way short of the most highly cited paper of the last five years, ‘Deep Residual Learning for Image Recognition’, published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition by a team from Microsoft in 2016.

The five-year-old paper’s astonishing ascendancy continues, from 25,256 citations in 2019 to 49,301 citations in 2020 to 82,588 citations in 2021. We wrote about it last year here .

The 2021 Google Scholar Metrics ranking tracks papers published between 2016 and 2020, and includes citations from all articles that were indexed in Google Scholar as of July 2020. Google Scholar is the largest database in the world of its kind.

Below we describe selections from Google Scholar’s most highly-cited articles for 2021. COVID-19 research dominated new arrivals in the list, but we’re also featuring a popular AI paper from 2016, and research that provides an economical shortcut to seeing patterns of human genetic variation, also from 2016.

See our coverage of the 2019 and 2020 lists.

‘Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China’

30,529 citations

Published in February 2020, this is one of the earliest papers to describe the clinical characteristics of COVID-19. It was authored by researchers in China and doctors working in hospitals in Wuhan, the city where COVID-19 was first detected in late 2019.

The team, from institutions such as the Jin Yin-tan Hospital in Wuhan and China-Japan Friendship Hospital in Beijing, reviewed the clinical and nursing reports, chest X-rays and lab results of the first 41 COVID-19 patients. They noted that the novel virus acts similarly to SARS and MERS, in that it causes pneumonia, but is different in that it seldom manifests as a runny nose or intestinal symptoms.

The final sentences of the paper call for robust and rapid testing, because of the likelihood of the disease spreading out of control:

“Reliable quick pathogen tests and feasible differential diagnosis based on clinical description are crucial for clinicians in their first contact with suspected patients. Because of the pandemic potential of 2019-nCoV, careful surveillance is essential to monitor its future host adaption, viral evolution, infectivity, transmissibility, and pathogenicity.”

The paper has been referenced or cited in almost 100 policy documents to date , including several released by the World Health Organization on topics such as mask-wearing and clinical care of patients with severe symptoms .

‘Clinical Characteristics of Coronavirus Disease 2019 in China’

New England Journal of Medicine

19,656 citations

Published online in February 2020, this study was a retrospective review of medical records for 1,099 COVID-19 cases reported to the National Health Commission of the People's Republic of China between 11 December 2019 and 29 January 2020.

The team, which included almost 40 researchers from China from institutions such as the Guangzhou Medical University in Guangzhou and Wuhan Jinyintan Hospital in Wuhan, accessed electronic medical records from 552 hospitals in mainland China to summarise exposure risk, signs and symptoms, laboratory and radiologic findings related to COVID-19 infection.

The study garnered a lot of media attention based on the evidence it put forward that men might be more severely impacted by disease – 58% of the patient cohort were male.

However, as Sharon Begley reported for STAT , “It’s possible the apparent sex imbalance reflects patterns of travel and contacts that make men more likely to be exposed to carriers of the virus, not any inherent biological differences. It’s also possible the apparent worse disease severity in men could skew the data.”

A paper published in JAMA around the same time by researchers in the United States reported that, among hospitalized patients, there is “a slight predominance of men”.

A Nature Communications meta-analysis, published in December 2020, looked at 92 studies covering more than three million patients and concluded that, while males and females appeared to be equally susceptible to infection, men were 2.84 times more likely to end up in intensive care and 1.39 times more likely to die from the disease.

‘Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study’

17,047 citations

Published in March 2020, The Lancet described this study as the first time researchers have examined risk factors associated with severe symptoms and death in hospitalised or deceased patients. Of the 191 patients studied, 137 were discharged from hospital and 54 died.

The study, by researchers from hospitals in China, also presented new data on viral shedding – information that informed early understanding of how the virus spreads and can be detected over the course of infection.

“The extended viral shedding noted in our study has important implications for guiding decisions around isolation precautions and antiviral treatment in patients with confirmed COVID-19 infection,” said co-lead author, Bin Cao, from the China-Japan Friendship Hospital and Capital Medical University in Beijing.

“However, we need to be clear that viral shedding time should not be confused with other self-isolation guidance for people who may have been exposed to COVID-19 but do not have symptoms, as this guidance is based on the incubation time of the virus.”

‘A Novel Coronavirus from Patients with Pneumonia in China, 2019’

The New England Journal of Medicine

16,194 citations

On 31 December 2019, the Chinese Center for Disease Control and Prevention (China CDC) dispatched a rapid response team to accompany health authorities in Hubei province and Wuhan city in conducting COVID-19 investigations.

This study, published in January 2020, reported the results of that investigation, including the clinical features of the pneumonia of two patients.

Described by Jose Manuel Jimenez-Guardeño, a researcher in the Department of Infectious Diseases at King's College London, UK, and colleagues in an article for The Conversation as "the article that released this virus to the world", the paper details how the virus was isolated in cell cultures from patients with pneumonia in Wuhan.

“In fact, actual photographs of SARS-CoV-2 were shown to the world for the first time here,” say Jimenez-Guardeño and his co-authors .


The study authors urged that more epidemiologic investigations were needed in order to characterize transmission modes, reproduction intervals and other characteristics of the virus to inform strategies to control and stop its spread.

‘SSD: Single Shot MultiBox Detector’

European Conference on Computer Vision

15,368 citations

A change of pace from recent COVID-19 studies, this paper, led by Wei Liu from the University of North Carolina at Chapel Hill and published in 2016, remains one of the most highly cited in the field of artificial intelligence (AI). It describes a new method for detecting objects in images or video footage using a single deep neural network – a set of AI algorithms inspired by the neurological processes that fire in the human cerebral cortex.

The approach, called the Single Shot MultiBox Detector, or SSD, has been described as faster than Faster R-CNN – another object detection technology that was described in a very highly cited paper published in 2015 ( see our coverage here ).

SSD works by dividing the image into a grid, with each grid cell responsible for detecting objects within that part of the image. As the name indicates, the network is able to identify all objects within an image in a single pass, allowing for real-time analysis.

SSD is now one of a handful of object detection technologies available. YOLO (You Only Look Once) is a similar single-shot object detection algorithm, whereas R-CNN and Faster R-CNN use a two-step approach, which involves first identifying the regions where objects might be and then detecting them.
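For readers who want to try single-pass detection themselves, the sketch below uses the SSD300 implementation bundled with torchvision. This is an assumption on our part: it is a re-implementation pretrained on COCO, not the authors' original code, and "street.jpg" is a placeholder filename.

```python
import torch
from torchvision.io import read_image
from torchvision.models.detection import ssd300_vgg16
from torchvision.transforms.functional import convert_image_dtype

model = ssd300_vgg16(weights="DEFAULT")  # COCO-pretrained SSD300
model.eval()

img = convert_image_dtype(read_image("street.jpg"), torch.float)  # placeholder image

with torch.no_grad():
    # A single forward pass returns boxes, class labels, and confidence scores
    # for every detected object -- there is no separate region-proposal stage.
    detections = model([img])[0]

for box, label, score in zip(detections["boxes"], detections["labels"], detections["scores"]):
    if score > 0.5:
        print(int(label), float(score), [round(v, 1) for v in box.tolist()])
```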

‘Analysis of protein-coding genetic variation in 60,706 humans’

7,696 citations

Led by Monkol Lek from the University of Sydney in Australia and Daniel MacArthur from the Broad Institute of MIT and Harvard University , this 2016 paper presents an open-access catalogue of more than 60,000 human exome sequences (exomes are the coding portions of genes) from people of European, African, South Asian, East Asian, and Latinx ancestry.

The collection was compiled as part of the Exome Aggregation Consortium project, run by an international group of researchers with a focus on exome sequencing. As exomes only make up about 2% of the human genome , the approach has been praised for being able to highlight patterns of genetic variation, including known disease-related variants, in a more cost-effective way than whole-genome sequencing.

Presented at a 2015 genomics conference, the catalogue encompasses 7.4 million genetic variants, which can be used to identify those connected to rare diseases. “Large-scale reference datasets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes,” Lek said when the paper was published.



08-21-24 | 8:00 am

How AI tools help students—and their professors—in academic research

New systems can help surface relevant research papers and quickly understand what they have to say.


For students and professional scholars alike, starting a new research project typically means digging through academic literature to understand what others have already written.

That can take a considerable amount of time, with researchers tracking down and combing through journal articles to begin their research and contextualize their own findings. But a growing collection of AI-powered tools aims to make that process easier. These new tools can help researchers more quickly find relevant papers, pull out relevant information from them, or both.

“It can be a really helpful way to get started with research, especially for students who aren’t familiar with the research process,” says Breanne Kirsch, director of the library at Illinois College. “As long as they’re taught how to use it in an ethical way, and that they can then expand beyond what it does.”

A tool called  Elicit  can help researchers conduct what are called  systematic reviews , which involve going through copious amounts of published research to find an answer to a question, like how a particular drug affects a medical condition. “It’s all very, very manual,” says James Brady, head of engineering at Elicit. “It takes teams of people many months, and you know, costs hundreds of thousands or millions of dollars to do these things.”

Elicit can make that process much faster, and also help researchers by quickly finding and summarizing published papers related to a particular question. It can also generate tables describing a whole set of relevant papers, with columns for data points like algorithms and statistical techniques used, variables examined, and the number of participants in experiments.
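To picture the kind of output described here, the sketch below assembles a comparable per-paper table by hand with pandas; the rows are invented placeholders, not actual Elicit results:

```python
import pandas as pd

# Illustrative only: the kind of per-paper extraction table described above,
# filled with made-up placeholder rows rather than real Elicit output.
papers = [
    {"paper": "Smith et al. 2021", "design": "randomized controlled trial",
     "statistics": "mixed-effects model", "variables": "dose, age", "participants": 312},
    {"paper": "Lee & Park 2019", "design": "cohort study",
     "statistics": "Cox regression", "variables": "dose, comorbidities", "participants": 1045},
]

table = pd.DataFrame(papers)
print(table.to_string(index=False))
```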

The company recommends researchers still look at the original papers, and Brady emphasizes that the tool doesn’t replace the human judgment and analysis necessary to scientific research. “It’s not like you take the final step of Elicit and hit the publish button and then it ends up in  Nature  or something,” he says, but it can still greatly speed the process of sifting through and understanding prior work.

Understanding how AI can help academic research is part of a larger industry question of how and when the technology can  replace or supplement  traditional web search tools. And  since the 1990s , computer scientists have realized that the academic publishing landscape—where scholars cite each other’s papers and publish in journals with a particular reputation in a particular field—isn’t that different from the  internet ecosystem . That means techniques for finding relevant materials, minimizing AI errors and hallucinations, and presenting useful and verifiable results to the user may transfer from academia to the broader web.

Indeed, not everyone searching for scientific answers is a professional scientist. And the organizations behind these tools say they can be especially helpful for people looking to understand new fields of interest, whether they’re students, professionals doing interdisciplinary work, or interested members of the public.

Eric Olson, cofounder and CEO at AI research search engine  Consensus , says about 50% of the tool’s research is at academic institutions, where it’s often used by graduate students. “We typically do quite well with folks who need that easy, quick access to research but maybe aren’t a full-blown expert yet,” he says.

Consensus lets users type in natural language queries to get answers summarized from across published work. It surfaces summaries of particular papers, metadata like publication year and citation count, and an indication of how much scientific consensus exists on a particular question. Another sizable audience for the tool is healthcare workers, including doctors, who use it to get insights more quickly than traditional scholarly search engines or Google can provide. Everyday users also turn to Consensus to research health topics, parenting practices, and policy issues in the news, Olson says.

Like other companies in the field, Consensus doesn’t simply rely on a single GPT-style large language model to generate answers to user questions. The company deploys a custom search engine to find papers addressing a query, and a variety of expert-trained language models to extract relevant information and—equally important—verify the paper is actually on topic, cutting the chance that an overzealous AI model will try to point out facts that aren’t actually there.

“I’m only gonna let this go to the model if we think that it actually has a relevant insight in it,” Olson says. “It’s a really great trick to reduce the risk of misinterpreting the paper.”
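
The gatekeeping step Olson describes, in which a paper only reaches the answering model once a separate check judges it relevant, can be sketched roughly as below. Every helper function and the 0.8 threshold are assumptions made for the illustration, not Consensus's actual components.

```python
# Rough sketch of a retrieve-then-verify pipeline: a search step proposes
# candidate papers, a relevance check filters them, and only survivors reach
# the model that extracts an insight. All helpers are hypothetical stand-ins.

def search_index(query: str) -> list[dict]:
    """Stand-in for a custom search engine over paper abstracts."""
    return [{"id": "demo-1", "abstract": "A randomized trial found that ..."}]

def relevance_score(query: str, abstract: str) -> float:
    """Stand-in for an expert-trained relevance classifier (returns 0-1)."""
    return 0.9

def extract_insight(query: str, abstract: str) -> str:
    """Stand-in for the model that summarizes what the paper says."""
    return "The paper reports a small positive effect."

def answer(query: str, threshold: float = 0.8) -> list[str]:
    insights = []
    for paper in search_index(query):
        # Gate: only papers judged on-topic reach the summarizing model,
        # reducing the chance of an answer invented from an off-topic paper.
        if relevance_score(query, paper["abstract"]) >= threshold:
            insights.append(extract_insight(query, paper["abstract"]))
    return insights

print(answer("Does creatine improve cognition?"))
```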

Academic publishing giant Elsevier has similarly developed a  tool called Scopus AI  to search through research collected in its  Scopus database , which includes article abstracts and metadata from tens of thousands of journals (including those published by rival publishers). Scopus AI can generate summary responses based on particular queries, suggest additional questions to help users expand their knowledge of the field, and highlight “foundational papers” and “topic expert” authors who have especial influence in an area of expertise.

“We’ve actually found this is quite a shared need across a number of different people who are at this precipice of trying to understand another domain,” says Maxim Khan, SVP of analytics products and data platform at Elsevier.

Khan says users have confirmed it helps them understand new fields faster and come across papers they might not otherwise have discovered. Due in part to licensing terms, the tool doesn’t include full text, meaning users can’t directly query material in articles beyond the abstracts and citations.

Other software can help users dive deep into specific research. An  AI tool from JStor , still in limited beta, lets users see article summaries customized to their particular queries and can answer questions based on document contents, pointing to particular passages that contain the answer. That can help users figure out which papers are relevant enough for a close read, and the tool can also point to other topics or particular papers for a user to investigate based on particular passages.
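
One common way to build that kind of "answer plus supporting passage" behavior is to score each passage against the question, answer only from the best match, and hand that passage back to the reader. The sketch below is a generic illustration of the pattern under that assumption, not JStor's implementation, and ask_llm() is a hypothetical placeholder.

```python
# Generic sketch of passage-grounded question answering: score each passage
# against the question, answer only from the best match, and return that
# passage so the reader can verify the claim. ask_llm() is hypothetical.

def overlap_score(question: str, passage: str) -> float:
    """Crude stand-in for a real retrieval model: fraction of shared words."""
    q_words = set(question.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / (len(q_words) or 1)

def answer_with_citation(question: str, passages: list[str], ask_llm) -> dict:
    best = max(passages, key=lambda p: overlap_score(question, p))
    prompt = f"Answer using only this passage:\n{best}\n\nQuestion: {question}"
    return {"answer": ask_llm(prompt), "supporting_passage": best}

# Example usage with a stubbed model:
passages = ["The methods section describes the survey.", "The results show a decline."]
print(answer_with_citation("What do the results show?", passages, lambda p: "stub"))
```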

Given its focus on helping students with research, the organization deliberately chose not to generate aggregate answers to a question from multiple articles. Beth LaPensee, senior product manager at Ithaka, says the software can help students who are still learning research skills and specialized vocabulary understand material they might otherwise struggle with. In a June blog post, Guthrie and LaPensee compared the process to learning the basic plot of a Shakespeare play before diving into the antiquated text, and said it can be especially helpful with humanities and social science papers, which customarily don’t include abstracts.

The software has also proven helpful to professors. “One faculty member we were talking to said that they could do in one day what used to take them four or five days,” LaPensee says.

And the organization has found that participants in the AI beta, which is slated to expand in the fall, spend “significantly more time on JStor” than other users.

Measuring results—and even knowing what to measure—is naturally an important part of testing new AI resources. Since 2015, a project called Semantic Scholar has focused on using AI to analyze scientific papers. It’s part of Ai2, the AI research institute founded by the late Microsoft cofounder Paul Allen, and today it includes features to help users understand papers, like surfacing definitions of technical terms from within a paper or other research it cites, answering general questions about specific papers, and generating “tl;dr” summaries of papers based on the types of descriptions authors post on social media.

How to test whether those summaries were helpful wasn’t immediately obvious, recalls Dan Weld, chief scientist and general manager of Semantic Scholar. If users were benefiting from them, they might either click more articles from search results—if the summaries indicated they were interesting—or fewer, if the summaries helped them weed out extraneous results. But when the summaries were later added to email alerts, the results seemed positive—users clicked fewer emailed articles overall, but were more likely to save articles they clicked, suggesting the summaries steered them to interesting work.
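
As a rough illustration of the kind of signal Weld describes, one could compare click-through and save-per-click rates for alerts with and without summaries. The event log format and field names below are assumptions made for the sketch, not Semantic Scholar's telemetry.

```python
# Illustrative sketch: from a log of email-alert events, compare how often
# recommended articles were clicked and how often clicked articles were then
# saved. A lower click rate paired with a higher save-per-click rate matches
# the pattern described above. The event schema here is hypothetical.

def engagement_rates(events: list[dict]) -> dict:
    shown = sum(1 for e in events if e["action"] == "shown")
    clicked = sum(1 for e in events if e["action"] == "clicked")
    saved = sum(1 for e in events if e["action"] == "saved")
    return {
        "click_rate": clicked / shown if shown else 0.0,
        "save_per_click": saved / clicked if clicked else 0.0,
    }

with_summaries = (
    [{"action": "shown"}] * 100 + [{"action": "clicked"}] * 10 + [{"action": "saved"}] * 6
)
without_summaries = (
    [{"action": "shown"}] * 100 + [{"action": "clicked"}] * 15 + [{"action": "saved"}] * 5
)

print(engagement_rates(with_summaries))     # fewer clicks, more saves per click
print(engagement_rates(without_summaries))
```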

Evaluating a feature Semantic Scholar is currently testing to answer questions from across multiple papers is even more challenging, according to Weld, who says, “It’s really quite difficult to compare different systems. There are some other systems out there that do question answering—we think ours is better than theirs, but we can’t prove it yet.”

And since different AI research tools have access to different sets of papers as well as different features, researchers may still find they need to use multiple AI platforms—often along with traditional database tools—to find everything they need. It’s important to note, Illinois College’s Kirsch says, that reading AI summaries can’t substitute for working through actual papers and verifying that they say what the tools claim, tempting though it can be.

“While the generative AI tools may help as a starting point, just like Wikipedia would, you still want to go to some of those actual sources,” she says. “You can’t just rely solely on the GenAI tools. You also need to look at the sources themselves and make sure it really does make sense for what you’re trying to do and what you’re trying to research.”

Advancements in Mechanical Ventilation: Understanding Physiology to Mitigate Complications

About this Research Topic

Minimizing the risk of Ventilation-Induced Lung Injury (VILI) necessitates a profound understanding of the physiology of mechanical ventilation. However, this critical aspect has not been extensively addressed. Similarly, only a limited number of experiments have been geared towards understanding the mechanism of lung-diaphragm interaction and its association with Ventilation-Induced Diaphragm Dysfunction (VIDD).

The aim of this Research Topic is to gather comprehensive insights into the physiology of mechanical ventilation. We aim to explore potential strategies for enhancing gas exchange and preventing complications such as barotrauma (pneumothorax, pneumomediastinum, and subcutaneous emphysema). We also consider it critical to delve into respiratory physiology and gas exchange in the context of Extracorporeal Membrane Oxygenation (ECMO), another significant area worthy of further investigation.

We welcome work on all techniques available to clinicians to improve patients' gas exchange, whether through non-invasive or invasive respiratory support (NRS) or by replacing mechanical ventilation with ECMO. Setting mechanical ventilation to minimize the risk of VILI can help improve the outcomes of patients in the intensive care unit (ICU). Particular emphasis will be placed on articles focusing on the lung-diaphragm interaction and on strategies that can mitigate Ventilation-Induced Diaphragm Dysfunction (VIDD) and Sepsis-Induced Diaphragm Dysfunction (SIDD).

We welcome submissions of original research articles, reviews, case series, and case reports that explore optimal respiratory strategies to mitigate VILI and VIDD. We are particularly interested in understanding how and when these strategies could be implemented to reduce barotrauma. We also encourage submissions of studies that attempt to elucidate the mechanisms of lung physiology during ECMO treatment.

Keywords: Mechanical Ventilation, Ventilation-Induced Lung Injury (VILI), Ventilation-Induced Diaphragm Dysfunction (VIDD), Extracorporeal Membrane Oxygenation (ECMO), Respiratory Physiology

Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.
