Short-Term Papers: Types, Examples, and Financial Insights

Pros and cons of short-term paper

Pros:

  • Provides a quick and low-risk financing alternative
  • Allows for easy accessibility of funds
  • Attractive returns for investors

Cons:

  • May have lower yields compared to other investments
  • Dependent on the stability of issuers
  • Potential for fluctuations in market conditions

Key takeaways

  • Short-term papers, with maturities under nine months, offer a low-risk financing alternative for governments, corporations, and financial institutions.
  • Investors find short-term papers attractive due to their safety, liquidity, and the potential for returns, making them a preferred choice for institutional investors.
  • Examples of short-term papers include Treasury Bills, commercial paper, promissory notes, certificates of deposit, money market funds, and short-term municipal notes.
  • Short-term papers play a vital role in economic stability, contributing to liquidity provision and providing flexible financing options for businesses during changing market conditions.
  • Factors influencing short-term paper yields include economic conditions, credit ratings, and central bank interest rate policies, shaping the dynamic nature of returns for investors and issuers.


What is Commercial Paper?


A short-term, unsecured debt instrument with a duration of 1-270 days

Commercial paper refers to a short-term, unsecured debt obligation that is issued by financial institutions and large corporations as an alternative to costlier methods of funding. It is a money market instrument that generally comes with a maturity of up to 270 days.


Commercial paper is sold at a discount to its face value to compensate the investor, as opposed to paying cash interest like a typical debt security. In other words, the difference between the face value at maturity and the investor’s discounted purchase price is the investor’s “profit.” The need for commercial paper often arises due to corporations facing a short-term need to cover expenses.
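To make that arithmetic concrete, here is a minimal sketch in Python. The face value, purchase price, and term below are hypothetical figures chosen only to illustrate how the discount becomes the investor's return; they are not drawn from the article.

```python
# Hypothetical commercial paper purchase, for illustration only.
face_value = 1_000_000       # amount the issuer repays at maturity
purchase_price = 990_000     # discounted price the investor pays today
days_to_maturity = 180       # term of the paper (well under the 270-day cap)

# The investor's "profit" is simply face value minus the discounted purchase price.
profit = face_value - purchase_price                                 # 10,000

# Expressed as a return on the money actually invested...
holding_period_return = profit / purchase_price                      # about 1.01%

# ...and annualized on the 360-day money market convention, so it can be
# compared with other short-term instruments.
annualized_yield = holding_period_return * 360 / days_to_maturity    # about 2.02%

print(f"Profit at maturity: ${profit:,}")
print(f"Holding-period return: {holding_period_return:.3%}")
print(f"Annualized yield (360-day basis): {annualized_yield:.3%}")
```

Because the paper is so short-dated, the dollar discount looks small, but the annualized figure is what issuers and investors actually compare against bank loans or Treasury bills.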

Commercial paper is often referred to as an unsecured promissory note, as it is not backed by anything other than the issuer’s promise to repay the face value at the maturity date specified on the note.

  • Commercial paper is a short-term, unsecured debt instrument with a duration of 1-270 days.
  • Financial institutions and large corporations are the main issuers of commercial paper because they have high credit ratings. There is trust in the market that they will repay unsecured promissory notes of this nature.
  • Commercial paper is usually sold at a discount to its face value and is a cheaper alternative to other forms of borrowing.

Risks of Commercial Paper

1. Credit rating

It is important to note that, due to the promissory nature of commercial paper, only large corporations with high credit ratings will be able to sell the instrument at a reasonable rate. Such corporations are colloquially known as “blue-chip companies,” and they are the only ones that enjoy the option of issuing such debt instruments without collateral backing.

If a smaller organization were to try to issue commercial paper, it is quite likely that there would not be enough trust on the part of investors to buy the securities. The credit risk, which can be defined as the likelihood that a borrower is unable to repay the loan, will be too high for smaller organizations, and there will be no market for this type of issue.

2. Liquidity

Another potential risk of commercial paper, although less relevant than with other, longer-term debt instruments, is that of liquidity. Liquidity generally refers to the ability of a security to be converted into cash at a price that reflects its fair value. That is to say, liquidity reflects how easily a security can be bought or sold in the market.

In the case of commercial paper, liquidity is less of a concern than credit (default) risk because the debt matures quite rapidly, leaving little room for additional trading on secondary markets. For this reason, such secondary markets are quite small, even though commercial paper is one of the most widely used money market debt instruments.

Real-World Example

As a real-world example, suppose a large corporation such as Microsoft Corp. would like additional low-cost funding to launch a new research and development program. The company’s leadership would weigh its options and possibly conclude that commercial paper is a more attractive source of capital than taking out a line of credit with a financial institution.

In such a situation, Microsoft would be leveraging its status as an established business with a high credit rating to issue an unsecured debt instrument, such as commercial paper, and in the process lower its cost of capital.




Commercial Paper

It is an unsecured promissory note with a fixed amount of interest issued to meet short-term financing requirements


What Is Commercial Paper?


Commercial paper is an unsecured promissory note frequently issued by banks, companies, and other financial organizations with a fixed amount of interest to meet their short-term financing requirements.

Large corporations issue (sell) commercial paper in the commercial paper market to raise money for short-term debt obligations (such as payroll), and it is backed only by the issuing bank's or company's promise to pay the face amount on the note's maturity date.

In most cases, the face value of the commercial paper is discounted when it is issued, and the difference between the two amounts constitutes the interest that the investor will receive.

It is often issued at a discount to its face value and matures anywhere from a few days to a few months, with the most common maturities ranging from 30 to 270 days.
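As a rough sketch of how that discount is usually quoted in practice, the price can be backed out from an annualized discount rate and the days to maturity. The rate and amounts below are assumptions chosen for illustration, not figures from the article.

```python
# Hypothetical pricing from a quoted discount rate (360-day bank discount convention).
face_value = 500_000        # amount repaid at maturity
discount_rate = 0.045       # assumed 4.5% annualized discount rate
days_to_maturity = 90       # within the common 30-270 day range

# The dollar discount scales with the fraction of the year the paper is outstanding.
dollar_discount = face_value * discount_rate * days_to_maturity / 360
purchase_price = face_value - dollar_discount

# The investor's interest is the discount itself, collected when the face value is repaid.
print(f"Dollar discount: ${dollar_discount:,.2f}")    # $5,625.00
print(f"Purchase price:  ${purchase_price:,.2f}")     # $494,375.00
```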

Money market funds, institutional investors, corporations, and individual investors who seek low-risk, short-term investments are typical buyers of commercial paper.

It carries lower interest rates than other kinds of borrowing because the organizations that issue it typically have strong credit ratings.

It also allows issuers to readily convert their short-term assets into cash, while investors can trade their short-term debt paper holdings in the secondary market. This liquidity provides flexibility and allows investors to access cash swiftly if needed.

The issuance is governed by regulations that ensure transparency and investor protection. The Securities and Exchange Commission (SEC) oversees commercial paper offerings in the United States, imposing disclosure obligations on issuing corporations.

These obligations include disclosing information regarding the issuer's financial health, activities, and potential risks so investors can make informed judgments.

The interest rates are often lower than those of other debt instruments, reflecting the issuer's perceived creditworthiness. As a result, it is an attractive investment option for people seeking to preserve capital while earning a return.

Investors must carefully examine creditworthiness and market conditions to manage risk and earn an appropriate return. Commercial paper is critical to preserving business financial stability and promoting economic growth.

Key Takeaways

Commercial paper is an unsecured promissory note issued by banks and companies for short-term funding needs. Typically issued at a discount, it matures in anywhere from a few days to a few months, attracting low-risk, short-term investors.

It offers a cost-effective source of short-term capital, flexibility in borrowing amount and maturity period, quick access to cash, and diversification of funding sources.

Risks include credit, liquidity, market, regulatory, and reinvestment risk, all of which investors should consider carefully.

It is issued through underwriters, involving credit rating, documentation, marketing, and distribution. Investors purchase paper, and upon maturity, the issuer repays face value; secondary market trading is available.

Types of commercial paper include unsecured, asset-backed, financial, non-financial, foreign, seasoned, and tax-exempt, each catering to different preferences and needs.

Advantages of Commercial Paper

Let us take a brief look at the different benefits of issuing the paper below:

For issuing entities

  • Cost-Effectiveness: It is frequently a more cost-effective source of short-term capital than other kinds of borrowing, such as bank loans or lines of credit. The interest rates are often lower, especially for issuers with excellent credit ratings, which can lead to significant cost savings for businesses.
  • Flexibility: It provides flexibility in terms of the amount borrowed and the maturity period. The size and duration of the paper issuance can be tailored to the issuer's specific funding needs, allowing them to align their financing with their short-term cash flow requirements.
  • Financing Diversification: Companies can diversify their funding sources by issuing paper. Relying solely on bank loans or lines of credit may expose enterprises to concentration risk if the banking sector is disrupted or credit conditions change.
  • Issuance Ease: It is an alternative way to raise financing from a broader group of investors. Compared to obtaining a regular bank loan, the approval and issuance process is often shorter and requires less documentation.
  • Access to Liquidity: It enables organizations to access cash more quickly, which can be critical in meeting immediate working capital requirements or capitalizing on short-term business opportunities.

For investors

  • Accessibility: It typically matures in a few days to a few months. This appeals to short-term investors because it allows them to swiftly reinvest their assets at potentially higher rates if interest rates rise.
  • Higher Yields: While the paper provides low-risk returns, the yields are often higher than those of other low-risk short-term assets such as government securities or money market accounts. Investors can earn a decent return while preserving their capital.
  • Low Risk: It is typically issued by well-established corporations with strong credit ratings. This lowers the default risk, making it a reasonably safe investment alternative, especially compared to longer-term debt instruments.
  • Liquid Investment: It is generally considered a liquid investment. Investors can sell their paper holdings on the secondary market before maturity, giving them liquidity and the opportunity to retrieve their cash if necessary.
  • Investment Diversification: It allows investors to diversify their investment portfolios. They can invest in paper issued by various companies and industries, spreading their risk across different issuers and thereby improving their portfolio's overall risk-reward profile.

Risks of Commercial Paper

While commercial paper has many advantages, it also carries certain risks. Credit risk is a substantial one, since it exposes investors to the chance of nonpayment or delayed repayment if the issuing firm defaults.

To mitigate this risk, investors carefully evaluate issuers' creditworthiness, considering credit ratings provided by respected rating agencies. Let us take a look at some of the risks that commercial paper carries:

  • Liquidity Risk: While it is normally considered a liquid investment, there may be times when the secondary market becomes illiquid. Investors may experience difficulties selling the paper before maturity if there is a dearth of buyers or a loss of confidence in the market.
  • Issuer-Specific Risks: The risks connected with an issuer's financial health, industry-specific issues, or management actions can impact the paper's creditworthiness. Changes in the issuer's credit rating or in the perception of its creditworthiness can affect market demand and pricing for the paper.
  • Credit Risk: It is an unsecured debt instrument, which means there is no collateral backing the obligation. The creditworthiness of the issuer therefore becomes critical in determining the risk of default. If the issuer's financial condition deteriorates or there is a substantial economic downturn, the issuer may default on principal and interest payments.
  • Market Risk: Market conditions can influence pricing. Interest rate fluctuations can affect the price and yield of the paper: if interest rates rise, the value of existing paper may fall, potentially resulting in losses for investors who wish to sell before maturity (a numeric sketch follows this list).
  • Regulatory Risk: It is subject to regulatory scrutiny and compliance obligations. Changes in regulations or the introduction of new regulations may impact the issuance or trading of the paper, thereby influencing its liquidity and value.
  • Reinvestment Risk: Investors who rely on paper income face the risk of reinvesting the proceeds at lower interest rates if the paper matures during a period of falling interest rates.
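To illustrate the market-risk bullet above with numbers, the sketch below reprices a hypothetical note after a rise in short-term rates. The face value, rates, and remaining term are assumptions chosen for illustration only.

```python
# Hypothetical repricing of commercial paper after short-term rates rise.
face_value = 1_000_000
days_remaining = 60          # days left until the paper matures


def price_at(discount_rate: float) -> float:
    """Price under the 360-day bank discount convention."""
    return face_value * (1 - discount_rate * days_remaining / 360)


price_before = price_at(0.04)   # bought when similar paper yielded about 4%
price_after = price_at(0.05)    # market rates move up to about 5%

# An investor who must sell before maturity would realize roughly this decline.
print(f"Price at 4%: ${price_before:,.2f}")                            # $993,333.33
print(f"Price at 5%: ${price_after:,.2f}")                             # $991,666.67
print(f"Mark-to-market decline: ${price_before - price_after:,.2f}")   # $1,666.67
```

An investor who simply holds to maturity still collects the full face value; the loss is only realized if the paper has to be sold early at the new, higher market rate.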

How Commercial Paper Works

Commercial paper is created through a simple procedure that involves issuers, investors, and financial intermediaries. 

Here is a detailed description of how the short-term debt paper works.

  • Issuance and Terms: Through the underwriter, the issuer decides the amount of paper to be issued and the terms, including the maturity date, interest rate, and other pertinent components. Based on the issuer's creditworthiness and current market conditions, the underwriter supports the issuer in calculating the appropriate interest rate.
  •  Credit Rating and Documentation: The credit rating evaluates the issuer's creditworthiness and informs investors about the risk of the paper. The issuer prepares the relevant documentation, such as a prospectus that explains the issuer's financial position, operations, and terms.
  • Marketing and Distribution: The underwriter uses its network and expertise to attract buyers and generate demand for the paper. The distribution may occur through a private placement to a small group of investors or through public offers registered with the appropriate regulatory authorities.
  • Investor Purchase: Once satisfied, investors submit their purchase orders to the underwriter or their authorized broker. Investors can specify the face value and maturity date of the paper they want to purchase.
  • Issuance and Settlement: The underwriter completes the issuance process by accepting purchase orders and confirming the distribution of the paper to investors. The issuer then distributes them to investors, either electronically or as physical certificates.
  • Interest Payments and Maturity: If the paper is interest-bearing, the issuer pays interest to investors on the agreed-upon rate and schedule; for paper sold at a discount, the investor's return is the difference between the purchase price and the face value. When the paper matures, the issuer repays the face value to the investors.
  • Secondary Market Trading: Investors who want to sell their short-term debt paper before maturity can do so in the secondary market. The secondary market provides liquidity and allows investors to buy or sell the paper to other market participants at a price different from the initial acquisition price (a small numeric sketch of this lifecycle follows the list).
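Pulling those steps together, here is a small sketch of the cash flows over one hypothetical issue's life: purchase at a discount on day 0, then either an early sale in the secondary market or repayment of the face value at maturity. Every figure is an assumption made up for illustration.

```python
# Hypothetical lifecycle of a single commercial paper investment.
face_value = 250_000
term_days = 120
issue_discount_rate = 0.050     # assumed annualized discount rate at issuance

# Day 0: the investor buys at a discount (360-day convention).
purchase_price = face_value * (1 - issue_discount_rate * term_days / 360)

# Option A: hold to maturity and receive the full face value from the issuer.
hold_profit = face_value - purchase_price

# Option B: sell after 45 days in the secondary market, where comparable paper
# is assumed to trade at a 4.6% discount rate.
days_remaining = term_days - 45
sale_price = face_value * (1 - 0.046 * days_remaining / 360)
early_sale_profit = sale_price - purchase_price

print(f"Purchase price (day 0):         ${purchase_price:,.2f}")    # $245,833.33
print(f"Hold to maturity, profit:       ${hold_profit:,.2f}")       # $4,166.67
print(f"Sell on day 45, sale price:     ${sale_price:,.2f}")        # $247,604.17
print(f"Sell on day 45, profit to date: ${early_sale_profit:,.2f}") # $1,770.83
```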

Types of Commercial Paper

Commercial paper comes in various forms to meet the demands and preferences of issuers and investors. 

Here are a few examples of commercial paper:

  • Unsecured Commercial Paper: The most common type of commercial paper, in which the issuer provides no specific collateral or security for the debt. Investors evaluate the risk of the paper based on the issuer's creditworthiness and reputation.
  • Asset-Backed Commercial Paper (ABCP): It is backed by a single asset or a pool of underlying assets, such as accounts receivable, inventory, or mortgage loans, and secured by the cash flows those assets generate. ABCP gives investors greater security by tying the paper to tangible assets.
  • Financial Commercial Paper: It is issued by financial entities such as banks, investment firms, or insurance organizations. These institutions use the paper to raise short-term capital to meet operational needs, finance trading activities, or support lending operations.
  • Non-Financial Commercial Paper: Firms in industries other than the financial industry issue these. Companies use non-financial short-term debt paper to meet short-term liquidity needs, finance working capital, or fund specific projects.
  • Foreign Commercial Paper: It is issued by entities located in countries other than the investor's home country. Investors may seek foreign commercial paper exposure to diversify their investment portfolios or to capitalize on specific market opportunities.
  • Seasoned Commercial Paper: It is commercial paper that has had one or more rollovers or renewals. It is more established and may have a track record that investors can use to gauge risk.
  • Tax-Exempt Commercial Paper: It is issued by municipal bodies such as states, cities, or public agencies and is exempt from certain taxes, such as federal income tax. Investors looking for tax advantages may be interested in tax-exempt commercial paper.

Commercial Paper FAQs

It involves issuers borrowing funds from investors for a set period of up to 270 days. When the securities mature, the issuers repay the principal amount to investors, together with interest.

Corporations of various sizes and levels of creditworthiness, as well as financial organizations such as banks and insurance companies, can issue commercial paper.

Credit risk (the issuer failing to pay), market risk (interest rate variations affecting prices), liquidity risk (a lack of buyers in the secondary market), and issuer-specific risks (financial health and industry-specific issues) are all concerns associated with commercial paper.



Term Paper – Format, Examples and Writing Guide


Definition:

A term paper is a type of academic writing assignment that is typically assigned to students at the end of a semester or term. It is usually a research-based paper meant to demonstrate the student’s understanding of a particular topic, as well as their ability to analyze and synthesize information from various sources.

Term papers are usually longer than other types of academic writing assignments and can range anywhere from 5 to 20 pages or more, depending on the level of study and the specific requirements of the assignment. They often require extensive research and the use of a variety of sources, including books, articles, and other academic publications.

Term Paper Format

The format of a term paper may vary depending on the specific requirements of your professor or institution. However, a typical term paper usually consists of the following sections:

  • Title page: This should include the title of your paper, your name, the course name and number, your instructor’s name, and the date.
  • Abstract: This is a brief summary of your paper, usually no more than 250 words. It should provide an overview of your topic, the research question or hypothesis, your methodology, and your main findings or conclusions.
  • Introduction: This section should introduce your topic and provide background information on the subject. You should also state your research question or hypothesis and explain the importance of your research.
  • Literature review: This section should review the existing literature on your topic. You should summarize the key findings and arguments made by other scholars and identify any gaps in the literature that your research aims to address.
  • Methodology: This section should describe the methods you used to collect and analyze your data. You should explain your research design, sampling strategy, data collection methods, and data analysis techniques.
  • Results: This section should present your findings. You can use tables, graphs, and charts to illustrate your data.
  • Discussion: This section should interpret your findings and explain what they mean in relation to your research question or hypothesis. You should also discuss any limitations of your study and suggest areas for future research.
  • Conclusion: This section should summarize your main findings and conclusions. You should also restate the importance of your research and its implications for the field.
  • References: This section should list all the sources you cited in your paper using a specific citation style (e.g., APA, MLA, Chicago).
  • Appendices: This section should include any additional materials that are relevant to your study but not essential to your main argument (e.g., survey questions, interview transcripts).

Structure of Term Paper

Here’s an example structure for a term paper:

I. Introduction

A. Background information on the topic

B. Thesis statement

II. Literature Review

A. Overview of current literature on the topic

B. Discussion of key themes and findings from literature

C. Identification of gaps in current literature

III. Methodology

A. Description of research design

B. Discussion of data collection methods

C. Explanation of data analysis techniques

IV. Results

A. Presentation of findings

B. Analysis and interpretation of results

C. Comparison of results with previous studies

V. Discussion

A. Summary of key findings

B. Explanation of how results address the research questions

C. Implications of results for the field

VI. Conclusion

A. Recap of key points

B. Significance of findings

C. Future directions for research

VII. References

A. List of sources cited in the paper

How to Write a Term Paper

Here are some steps to help you write a term paper:

  • Choose a topic: Choose a topic that interests you and is relevant to your course. If your professor has assigned a topic, make sure you understand it and clarify any doubts before you start.
  • Research: Conduct research on your topic by gathering information from various sources such as books, academic journals, and online resources. Take notes and organize your information systematically.
  • Create an outline: Create an outline of your term paper by arranging your ideas and information in a logical sequence. Your outline should include an introduction, body paragraphs, and a conclusion.
  • Write a thesis statement: Write a clear and concise thesis statement that states the main idea of your paper. Your thesis statement should be included in your introduction.
  • Write the introduction: The introduction should grab the reader’s attention, provide background information on your topic, and introduce your thesis statement.
  • Write the body: The body of your paper should provide supporting evidence for your thesis statement. Use your research to provide details and examples to support your argument. Make sure to organize your ideas logically and use transition words to connect paragraphs.
  • Write the conclusion: The conclusion should summarize your main points and restate your thesis statement. Avoid introducing new information in the conclusion.
  • Edit and proofread: Edit and proofread your term paper carefully to ensure that it is free of errors and flows smoothly. Check for grammar, spelling, and punctuation errors.
  • Format and cite your sources: Follow the formatting guidelines provided by your professor and cite your sources properly using the appropriate citation style.
  • Submit your paper: Submit your paper on time and according to the instructions provided by your professor.

Term Paper Example

Here’s an example of a term paper:

Title: The Role of Artificial Intelligence in Cybersecurity

As the world becomes more digitally interconnected, cybersecurity threats are increasing in frequency and sophistication. Traditional security measures are no longer enough to protect against these threats. This paper explores the role of artificial intelligence (AI) in cybersecurity, including how AI can be used to detect and respond to threats in real-time, the challenges of implementing AI in cybersecurity, and the potential ethical implications of AI-powered security systems. The paper concludes with recommendations for organizations looking to integrate AI into their cybersecurity strategies.

Introduction:

The increasing number of cybersecurity threats in recent years has led to a growing interest in the potential of artificial intelligence (AI) to improve cybersecurity. AI has the ability to analyze vast amounts of data and identify patterns and anomalies that may indicate a security breach. Additionally, AI can automate responses to threats, allowing for faster and more effective mitigation of security incidents. However, there are also challenges associated with implementing AI in cybersecurity, such as the need for large amounts of high-quality data, the potential for AI systems to make mistakes, and the ethical considerations surrounding the use of AI in security.

Literature Review:

This section of the paper reviews existing research on the use of AI in cybersecurity. It begins by discussing the types of AI techniques used in cybersecurity, including machine learning, natural language processing, and neural networks. The literature review then explores the advantages of using AI in cybersecurity, such as its ability to detect previously unknown threats and its potential to reduce the workload of security analysts. However, the review also highlights some of the challenges associated with implementing AI in cybersecurity, such as the need for high-quality training data and the potential for AI systems to be fooled by sophisticated attacks.

Methodology:

To better understand the challenges and opportunities associated with using AI in cybersecurity, this paper conducted a survey of cybersecurity professionals working in a variety of industries. The survey included questions about the types of AI techniques used in their organizations, the challenges they faced when implementing AI in cybersecurity, and their perceptions of the ethical implications of using AI in security.

The results of the survey showed that while many organizations are interested in using AI in cybersecurity, they face several challenges when implementing these systems. These challenges include the need for high-quality training data, the potential for AI systems to be fooled by sophisticated attacks, and the difficulty of integrating AI with existing security systems. Additionally, many respondents expressed concerns about the ethical implications of using AI in security, such as the potential for AI to be biased or to make decisions that are harmful to individuals or society as a whole.

Discussion:

Based on the results of the survey and the existing literature, this paper discusses the potential benefits and risks of using AI in cybersecurity. It also provides recommendations for organizations looking to integrate AI into their security strategies, such as the need to prioritize data quality and to ensure that AI systems are transparent and accountable.

Conclusion:

While there are challenges associated with implementing AI in cybersecurity, the potential benefits of using these systems are significant. AI can help organizations detect and respond to threats more quickly and effectively, reducing the risk of security breaches. However, it is important for organizations to be aware of the potential ethical implications of using AI in security and to take steps to ensure that these systems are transparent and accountable.

References:

  • Alkhaldi, S., Al-Daraiseh, A., & Lutfiyya, H. (2019). A Survey on Artificial Intelligence Techniques in Cyber Security. Journal of Information Security, 10(03), 191-207.
  • Gartner. (2019). Gartner Top 10 Strategic Technology Trends for 2020. Retrieved from https://www.gartner.com/smarterwithgartner/gartner-top-10-strategic-technology-trends-for-2020/
  • Kshetri, N. (2018). Blockchain’s roles in meeting key supply chain management objectives. International Journal of Information Management, 39, 80-89.
  • Lipton, Z. C. (2018). The mythos of model interpretability. arXiv preprint arXiv:1606.03490.
  • Schneier, B. (2019). Click Here to Kill Everybody: Security and Survival in a Hyper-Connected World. WW Norton & Company.
  • Wahab, M. A., Rahman, M. S., & Islam, M. R. (2020). A Survey on AI Techniques in Cybersecurity. International Journal of Scientific & Engineering Research, 11(2), 22-27.

When to Write Term Paper

A term paper is usually a lengthy research paper that is assigned to students at the end of a term or semester. There are several situations when writing a term paper may be required, including:

  • As a course requirement: In most cases, a term paper is required as part of the coursework for a particular course. It may be assigned by the instructor as a way of assessing the student’s understanding of the course material.
  • To explore a specific topic: A term paper can be an excellent opportunity for students to explore a specific topic of interest in-depth. It allows them to conduct extensive research on the topic and develop their understanding of it.
  • To develop critical thinking skills: Writing a term paper requires students to engage in critical thinking and analysis. It helps them to develop their ability to evaluate and interpret information, as well as to present their ideas in a clear and coherent manner.
  • To prepare for future academic or professional pursuits: Writing a term paper can be an excellent way for students to prepare for future academic or professional pursuits. It can help them to develop the research and writing skills necessary for success in higher education or in a professional career.

Purpose of Term Paper

The main purposes of a term paper are:

  • Demonstrate mastery of a subject: A term paper provides an opportunity for students to showcase their knowledge and understanding of a particular subject. It requires students to research and analyze the topic, and then present their findings in a clear and organized manner.
  • Develop critical thinking skills: Writing a term paper requires students to think critically about their subject matter, analyzing various sources and viewpoints, and evaluating evidence to support their arguments.
  • Improve writing skills: Writing a term paper helps students improve their writing skills, including organization, clarity, and coherence. It also requires them to follow specific formatting and citation guidelines, which can be valuable skills for future academic and professional endeavors.
  • Contribute to academic discourse: A well-written term paper can contribute to academic discourse by presenting new insights, ideas, and arguments that add to the existing body of knowledge on a particular topic.
  • Prepare for future research: Writing a term paper can help prepare students for future research, by teaching them how to conduct a literature review, evaluate sources, and formulate research questions and hypotheses. It can also help them develop research skills that they can apply in future academic or professional endeavors.

Advantages of Term Paper

There are several advantages of writing a term paper, including:

  • In-depth exploration: Writing a term paper allows you to delve deeper into a specific topic, allowing you to gain a more comprehensive understanding of the subject matter.
  • Improved writing skills: Writing a term paper involves extensive research, critical thinking, and the organization of ideas into a cohesive written document. As a result, writing a term paper can improve your writing skills significantly.
  • Demonstration of knowledge: A well-written term paper demonstrates your knowledge and understanding of the subject matter, which can be beneficial for academic or professional purposes.
  • Development of research skills: Writing a term paper requires conducting thorough research, analyzing data, and synthesizing information from various sources. This process can help you develop essential research skills that can be applied in many other areas.
  • Enhancement of critical thinking: Writing a term paper encourages you to think critically, evaluate information, and develop well-supported arguments. These skills can be useful in many areas of life, including personal and professional decision-making.
  • Preparation for further academic work: Writing a term paper is excellent preparation for more extensive academic projects, such as a thesis or dissertation.



Everything You Need to Know to Write an A+ Term Paper



A term paper is a written assignment given to students at the end of a course to gauge their understanding of the material. Term papers typically count for a good percentage of your overall grade, so of course, you’ll want to write the best paper possible. Luckily, we’ve got you covered. In this article, we’ll teach you everything you need to know to write an A+ term paper, from researching and outlining to drafting and revising.

Quick Steps to Write a Term Paper

  • Hook your readers with an interesting and informative intro paragraph. State your thesis and your main points.
  • Support your thesis by providing quotes and evidence that back your claim in your body paragraphs.
  • Summarize your main points and leave your readers with a thought-provoking question in your conclusion.

Researching & Outlining

Step 1 Understand your assignment.

  • Think of your term paper as the bridge between what you’ve learned in class and how you apply that knowledge to real-world topics.
  • For example, a history term paper may require you to explore the consequences of a significant historical event, like the Civil War. An environmental science class, on the other hand, may have you examine the effects of climate change on a certain region.
  • Your guidelines should tell you the paper’s word count and formatting style, like whether to use in-text citations or footnotes and whether to use single- or double-spacing. If these things aren’t specified, be sure to reach out to your instructor.

Step 2 Choose an interesting topic.

  • Make sure your topic isn’t too broad. For example, if you want to write about Shakespeare’s work, first narrow it down to a specific play, like Macbeth, then choose something even more specific like Lady Macbeth’s role in the plot.
  • If the topic is already chosen for you, explore unique angles that can set your content and information apart from the more obvious approaches many others will probably take. [3]
  • Try not to have a specific outcome in mind, as this will close you off to new ideas and avenues of thinking. Rather than trying to mold your research to fit your desired outcome, allow the outcome to reflect a genuine analysis of the discoveries you made. Ask yourself questions throughout the process and be open to having your beliefs challenged.
  • Reading other people's comments, opinions, and entries on a topic can often help you to refine your own, especially where they comment that "further research" is required or where they posit challenging questions but leave them unanswered.

Step 3 Do your research.

  • For example, if you’re writing a term paper about Macbeth, your primary source would be the play itself. Then, look for other research papers and analyses written by academics and scholars to understand how they interpret the text.

Step 4 Craft your thesis statement.

  • For example, if you’re writing a paper about Lady Macbeth, your thesis could be something like “Shakespeare’s characterization of Lady Macbeth reveals how desire for power can control someone’s life.”
  • Remember, your research and thesis development doesn’t stop here. As you continue working through both the research and writing, you may want to make changes that align with the ideas forming in your mind and the discoveries you continue to unearth.
  • On the other hand, don’t keep looking for new ideas and angles for fear of feeling confined. At some point, you’re going to have to say enough is enough and make your point. You may have other opportunities to explore these questions in future studies, but for now, remember your term paper has a finite word length and an approaching due date!

Step 5 Develop an outline for the paper.

  • Abstract: An abstract is a concise summary of your paper that informs readers of your topic, its significance, and the key points you’ll explore. It must stand on its own and make sense without referencing outside sources or your actual paper.
  • Introduction: The introduction establishes the main idea of your paper and directly states the thesis. Begin your introduction with an attention-grabbing sentence to intrigue your readers, and provide any necessary background information to establish your paper’s purpose and direction.
  • Body paragraphs: Each body paragraph focuses on a different argument supporting your thesis. List specific evidence from your sources to back up your arguments. Provide detailed information about your topic to enhance your readers’ understanding. In your outline, write down the main ideas for each body paragraph and any outstanding questions or points you’re not yet sure about.
  • Results: Depending on the type of term paper you’re writing, your results may be incorporated into your body paragraphs or conclusion. These are the insights that your research led you to. Here you can discuss how your perspective and understanding of your topic shifted throughout your writing process.
  • Conclusion: Your conclusion summarizes your argument and findings. You may restate your thesis and major points as you wrap up your paper.

Drafting Your Term Paper

Step 1 Make your point in the introduction.

  • Writing an introduction can be challenging, but don’t get too caught up on it. As you write the rest of your paper, your arguments might change and develop, so you’ll likely need to rewrite your intro at the end, anyway. Writing your intro is simply a means of getting started and you can always revise it later. [10]
  • Be sure to define any words your readers might not understand. For example, words like “globalization” have many different meanings depending on context, and it’s important to state which ones you’ll be using as part of your introductory paragraph.

Step 2 Persuade your readers with your body paragraphs.

  • Try to relate the subject of the essay (say, Plato’s Symposium) to a tangentially related issue you happen to know something about (say, the growing trend of free-wheeling hookups in frat parties). Slowly bring the paragraph around to your actual subject and make a few generalizations about why this aspect of the book/subject is so fascinating and worthy of study (such as how different the expectations for physical intimacy were then compared to now).

Step 3 Summarize your argument with your conclusion.

  • You can also reflect on your own experience of researching and writing your term paper. Discuss how your understanding of your topic evolved and any unexpected findings you came across.

Step 4 Write your abstract.

  • While peppering quotes throughout your text is a good way to help make your point, don’t overdo it. If you use too many quotes, you’re basically allowing other authors to make the point and write the paper for you. When you do use a quote, be sure to explain why it is relevant in your own words.
  • Try to sort out your bibliography at the beginning of your writing process to avoid having a last-minute scramble. When you have all the information beforehand (like the source’s title, author, publication date, etc.), it’s easier to plug them into the correct format.

Step 6 Come up with a good title.

Revising & Finalizing Your Term Paper

Step 1 Make your writing as concise as possible.

  • Trade in weak “to-be” verbs for stronger “action” verbs. For example: “I was writing my term paper” becomes “I wrote my term paper.”

Step 2 Check for grammar and spelling errors.

  • It’s extremely important to proofread your term paper. If your writing is full of mistakes, your instructor will assume you didn’t put much effort into your paper. If you have too many errors, your message will be lost in the confusion of trying to understand what you’ve written.

Step 3 Have someone else read over your paper.

  • If you add or change information to make things clearer for your readers, it’s a good idea to look over your paper one more time to catch any new typos that may have come up in the process.

Matthew Snipp, PhD

  • The best essays are like grass court tennis: the argument should flow in a "rally" style, building persuasively to the conclusion.
  • If you get stuck, consider giving your professor a visit. Whether you're still struggling for a thesis or you want to go over your conclusion, most instructors are delighted to help and they'll remember your initiative when grading time rolls around.
  • At least 2 hours for 3-5 pages.
  • At least 4 hours for 8-10 pages.
  • At least 6 hours for 12-15 pages.
  • Double those hours if you haven't done any homework and you haven't attended class.
  • For papers that are primarily research-based, add about two hours to those times (although you'll need to know how to research quickly and effectively, beyond the purview of this brief guide).



  • ↑ https://www.binghamton.edu/counseling/self-help/term-paper.html
  • ↑ Matthew Snipp, PhD. Research Fellow, U.S. Bureau of the Census. Expert Interview. 26 March 2020.
  • ↑ https://emory.libanswers.com/faq/44525
  • ↑ https://writing.wisc.edu/handbook/assignments/planresearchpaper/
  • ↑ https://owl.purdue.edu/owl/general_writing/the_writing_process/thesis_statement_tips.html
  • ↑ https://libguides.usc.edu/writingguide/outline
  • ↑ https://gallaudet.edu/student-success/tutorial-center/english-center/writing/guide-to-writing-introductions-and-conclusions/
  • ↑ https://www.ncbi.nlm.nih.gov/pubmed/26731827
  • ↑ https://writing.wisc.edu/handbook/assignments/writing-an-abstract-for-your-research-paper/
  • ↑ https://www.ivcc.edu/stylesite/Essay_Title.pdf
  • ↑ https://www.uni-flensburg.de/fileadmin/content/institute/anglistik/dokumente/downloads/how-to-write-a-term-paper-daewes.pdf
  • ↑ https://library.sacredheart.edu/c.php?g=29803&p=185937
  • ↑ https://www.cornerstone.edu/blog-post/six-steps-to-really-edit-your-paper/

About This Article

Matthew Snipp, PhD

If you need to write a term paper, choose your topic, then start researching that topic. Use your research to craft a thesis statement which states the main idea of your paper, then organize all of your facts into an outline that supports your thesis. Once you start writing, state your thesis in the first paragraph, then use the body of the paper to present the points that support your argument. End the paper with a strong conclusion that restates your thesis. For tips on improving your term paper through active voice, read on!


How to Write a Term Paper From Start to Finish


The term paper, often regarded as the culmination of a semester's hard work, is a rite of passage for students in pursuit of higher education. Here's an interesting fact to kick things off: Did you know that the term paper's origins can be traced back to ancient Greece, where scholars like Plato and Aristotle utilized written works to explore and document their philosophical musings? Just as these great minds once wrote their thoughts on parchment, you, too, can embark on this intellectual voyage with confidence and skill.

How to Write a Term Paper: Short Description

In this article, we'll delve into the core purpose of this kind of assignment – to showcase your understanding of a subject, your research abilities, and your capacity to communicate complex ideas effectively. But it doesn't stop there. We'll also guide you in the art of creating a well-structured term paper format, a roadmap that will not only keep you on track but also ensure your ideas flow seamlessly and logically. Packed with valuable tips on writing, organization, and time management, this resource promises to equip you with the tools needed to excel in your academic writing.

Understanding What Is a Term Paper

A term paper, a crucial component of your college education, is often assigned towards the conclusion of a semester. It's a vehicle through which educators gauge your comprehension of the course content. Imagine it as a bridge between what you've learned in class and your ability to apply that knowledge to real-world topics.

For instance, in a history course, you might be asked to delve into the causes and consequences of a significant historical event, such as World War II. In a psychology class, your term paper might explore the effects of stress on mental health, or in an environmental science course, you could analyze the impact of climate change on a specific region.

Writing a term paper isn't just about summarizing facts. It requires a blend of organization, deep research, and the art of presenting your findings in a way that's both clear and analytical. This means structuring your arguments logically, citing relevant sources, and critically evaluating the information you've gathered.

For further guidance, we've prepared an insightful guide for you, authored by our expert essay writer. It's brimming with practical tips and valuable insights to help you stand out in this academic endeavor and earn the recognition you deserve.

How to Start a Term Paper

Before you start, keep the guidelines for the term paper format firmly in mind. If you have any doubts, don't hesitate to reach out to your instructor for clarification before you begin your research and writing process. And remember, procrastination is your worst enemy in this endeavor. If you're aiming to produce an exceptional piece and secure a top grade, it's essential to plan ahead and allocate dedicated time each day to work on it. Now, let our term paper writing services provide you with some valuable tips to help you on your journey:


  • Hone Your Topic: Start by cultivating a learning mindset that empowers you to effectively organize your thoughts. Discover how to research a topic in the section below.
  • Hook Your Readers: Initiate a brainstorming session and unleash a barrage of creative ideas to captivate your audience right from the outset. Pose intriguing questions, share compelling anecdotes, offer persuasive statistics, and more.
  • Craft a Concise Thesis Statement: If you find yourself struggling to encapsulate the main idea of your paper in just a sentence or two, it's time to revisit your initial topic and consider narrowing it down.
  • Understand Style Requirements: Your work must adhere to specific formatting guidelines. Delve into details about the APA format and other pertinent regulations in the section provided.
  • Delve Deeper with Research: Equipped with a clearer understanding of your objectives, dive into your subject matter with a discerning eye. Ensure that you draw from reputable and reliable sources.
  • Begin Writing: Don't obsess over perfection from the get-go. Just start writing, and don't worry about initial imperfections. You can always revise or remove those early sentences later. The key is to start the term paper as soon as you've amassed sufficient information.


Term Paper Topics

Selecting the right topic for your term paper is a critical step, one that can significantly impact your overall experience and the quality of your work. While instructors sometimes provide specific topics, there are instances when you have the freedom to choose your own. To guide you in writing your term paper, consider the following factors when deciding on your topic:


  • Relevance to Assignment Length: Begin by considering the required length of your paper. Whether it's a substantial 10-page paper or a more concise 5-page one, understanding the word count will help you determine the appropriate scope for your subject. This will inform whether your topic should be broad or more narrowly focused.
  • Availability of Resources: Investigate the resources at your disposal. Check your school or community library for books and materials that can support your research. Additionally, explore online sources to ensure you have access to a variety of reference materials.
  • Complexity and Clarity: Ensure you can effectively explain your chosen topic, regardless of how complex it may seem. If you encounter areas that are challenging to grasp fully, don't hesitate to seek guidance from experts or your professor. Clarity and understanding are key to producing a well-structured term paper.
  • Avoiding Overused Concepts : Refrain from choosing overly trendy or overused topics. Mainstream subjects often fail to captivate the interest of your readers or instructors, as they can lead to repetitive content. Instead, opt for a unique angle or approach that adds depth to your paper.
  • Manageability and Passion : While passion can drive your choice of topic, it's important to ensure that it is manageable within the given time frame and with the available resources. If necessary, consider scaling down a topic that remains intriguing and motivating to you, ensuring it aligns with your course objectives and personal interests.

Worrying About the Quality of Your Upcoming Essay?

"Being highly trained professionals, our writers can provide term paper help by creating a paper specifically tailored to your needs.

Term Paper Outline

Before embarking on the journey of writing a term paper, it's crucial to establish a well-structured outline. Be mindful of any specific formatting requirements your teacher may have in mind, as these will guide your outline's structure. Here's a basic format to help you get started:

  • Cover Page: Begin with a cover page featuring your name, course number, teacher's name, and the deadline date, centered at the top.
  • Abstract: Craft a concise summary of your work that informs readers about your paper's topic, its significance, and the key points you'll explore.
  • Introduction: Commence your term paper introduction with a clear and compelling statement of your chosen topic. Explain why it's relevant and outline your approach to addressing it.
  • Body: This section serves as the meat of academic papers, where you present the primary findings from your research. Provide detailed information about the topic to enhance the reader's understanding. Ensure you incorporate various viewpoints on the issue and conduct a thorough analysis of your research.
  • Results: Share the insights and conclusions that your research has led you to. Discuss any shifts in your perspective or understanding that have occurred during the course of your project.
  • Discussion: Conclude your term paper with a comprehensive summary of the topic and your findings. You can wrap up with a thought-provoking question or encourage readers to explore the subject further through their own research.

How to Write a Term Paper with 5 Steps

Before you begin your term paper, it's crucial to understand what a term paper proposal entails. This proposal serves as your way to introduce and justify your chosen topic to your instructor, and it must gain approval before you start writing the actual paper.

In your proposal, include recent studies or research related to your topic, along with proper references. Clearly explain the topic's relevance to your course, outline your objectives, and organize your ideas effectively. This helps your instructor grasp your term paper's direction. If needed, you can also seek assistance from our expert writers and buy a term paper.


Draft the Abstract

The abstract is a critical element of a term paper, and it plays a crucial role in piquing the reader's interest. To create a captivating abstract, consider these key points from our dissertation writing service:

  • Conciseness: Keep it short and to the point, around 150-250 words. No need for lengthy explanations.
  • Highlight Key Elements: Summarize the problem you're addressing, your research methods, and primary findings or conclusions. For instance, if your paper discusses the impact of social media on mental health, mention your research methods and significant findings.
  • Engagement: Make your abstract engaging. Use language that draws readers in. For example, if your paper explores the effects of artificial intelligence on the job market, you might begin with a question like, 'Is AI revolutionizing our work landscape, or should we prepare for the robots to take over?'
  • Clarity: Avoid excessive jargon or technical terms to ensure accessibility to a wider audience.

Craft the Introduction

The introduction sets the stage for your entire term paper and should engage readers from the outset. To craft an intriguing introduction, consider these tips:

  • Hook Your Audience: Start with a captivating hook, such as a thought-provoking question or a compelling statistic. For example, if your paper explores the impact of smartphone addiction, you could begin with, 'Can you remember the last time you went a whole day without checking your phone?'
  • State Your Purpose: Clearly state the purpose of your paper and its relevance. If your term paper is about renewable energy's role in combating climate change, explain why this topic is essential in today's world.
  • Provide a Roadmap: Briefly outline how your paper is structured. For instance, if your paper discusses the benefits of mindfulness meditation, mention that you will explore its effects on stress reduction, emotional well-being, and cognitive performance.
  • Thesis Statement: Conclude your introduction with a concise thesis statement that encapsulates the central argument or message of your paper. In the case of a term paper on the impact of online education, your thesis might be: 'Online education is revolutionizing learning by providing accessibility, flexibility, and innovative teaching methods.'

Develop the Body Sections: Brainstorming Concepts and Content


The body of your term paper is where you present your research, arguments, and analysis. To generate ideas and write engaging text in the body sections, consider these strategies from our research paper writer:

  • Structure Your Ideas: Organize your paper into sections or paragraphs, each addressing a specific aspect of your topic. For example, if your term paper explores the impact of social media on interpersonal relationships, you might have sections on communication patterns, privacy concerns, and emotional well-being.
  • Support with Evidence: Back up your arguments with credible evidence, such as data, research findings, or expert opinions. For instance, when discussing the effects of social media on mental health, you can include statistics on social media usage and its correlation with anxiety or depression.
  • Offer Diverse Perspectives: Acknowledge and explore various viewpoints on the topic. When writing about the pros and cons of genetic engineering, present both the potential benefits, like disease prevention, and the ethical concerns associated with altering human genetics.
  • Use Engaging Examples: Incorporate real-life examples to illustrate your points. If your paper discusses the consequences of climate change, share specific instances of extreme weather events or environmental degradation to make the topic relatable.
  • Ask Thought-Provoking Questions: Integrate questions throughout your text to engage readers and stimulate critical thinking. In a term paper on the future of artificial intelligence, you might ask, 'How will AI impact job markets and the concept of work in the coming years?'

Formulate the Conclusion

The conclusion section should provide a satisfying wrap-up of your arguments and insights. To craft a compelling conclusion, follow these steps:

  • Revisit Your Thesis: Begin by restating your thesis statement. This reinforces the central message of your paper. For example, if your thesis is about the importance of biodiversity conservation, reiterate that biodiversity is crucial for ecological balance and human well-being.
  • Summarize Key Points: Briefly recap the main points you've discussed in the body of your paper. For instance, if you've been exploring the impact of globalization on local economies, summarize the effects on industries, job markets, and cultural diversity.
  • Emphasize Your Main Argument: Reaffirm the significance of your thesis and the overall message of your paper. Discuss why your findings are important or relevant in a broader context. If your term paper discusses the advantages of renewable energy, underscore its potential to combat climate change and reduce our reliance on fossil fuels.
  • Offer a Thoughtful Reflection: Share your own reflections or insights about the topic. How has your understanding evolved during your research? Have you uncovered any unexpected findings or implications? If your paper discusses the future of space exploration, consider what it means for humanity's quest to explore the cosmos.
  • End with Impact: Conclude your term paper with a powerful closing statement. You can leave the reader with a thought-provoking question, a call to action, or a reflection on the broader implications of your topic. For instance, if your paper is about the ethics of artificial intelligence, you could finish by asking, 'As AI continues to advance, what ethical considerations will guide our choices and decisions?'

Edit and Enhance the Initial Draft

After completing your initial draft, the revision and polishing phase is essential for improving your paper. Here's how to refine your work efficiently:

  • Take a Break: Step back and return to your paper with a fresh perspective.
  • Structure Check: Ensure your paper flows logically and transitions smoothly from the introduction to the conclusion.
  • Clarity and Conciseness: Trim excess words for clarity and precision.
  • Grammar and Style: Proofread for errors and ensure consistent style.
  • Citations and References: Double-check your citations and reference list.
  • Peer Review: Seek feedback from peers or professors for valuable insights.
  • Enhance Intro and Conclusion: Make your introduction and conclusion engaging and impactful.
  • Coherence Check: Ensure your arguments support your thesis consistently.
  • Read Aloud: Reading your paper aloud helps identify issues.
  • Final Proofread: Perform a thorough proofread to catch any remaining errors.

Term Paper Format

When formatting your term paper, consider its length and the required citation style, which depends on your research topic. Proper referencing is crucial to avoid plagiarism in academic writing. Common citation styles include APA and MLA.

If you're unsure how to cite a term paper for the social sciences, use the APA format, including the author's name, book title, publication year, publisher, and location when citing a book.

For liberal arts and humanities, MLA is common, requiring the publication name, date, and location for referencing.

Adhering to the appropriate term paper format and citation style ensures an organized and academically sound paper. Follow your instructor's guidelines for a polished and successful paper.

Term Paper Example

To access our term paper example, simply click the button below.

The timeline of events from 1776 to 1861 that ultimately prompted the American Civil War relates to a number of subjects modern historians acknowledge as the origins and causes of the war. Pre-Civil War events had both long-term and short-term influences on the war, such as the election of Abraham Lincoln as president in 1860, which led to the fall of Fort Sumter in April of the following year. In that period, contentions surrounding states' rights progressively exploded in Congress, since they were among the earliest disputes to form after independence. Congress focused on resolving significant issues that affected the states, which led to further issues. In that order, US history from 1776 to 1861 provides a rich record, as politicians brought forth dissimilarities, dissections, and tensions between the Southern slave states and the Northern states that were loyal to the Union. The events that unfolded from 1776 to 1861 involved a series of issues that fed the great sectional crisis, producing political divisions and the build-up to the Civil War that made the North and the South seem like distinctive and timeless regions that predated the crisis itself.

Final Thoughts

In closing, approach the task of writing term papers with determination and a positive outlook. Begin well in advance, maintain organization, and have faith in your capabilities. Don't hesitate to seek assistance if required, and express your individual perspective with confidence. You're more than capable of succeeding in this endeavor!

Need a Winning Hand in Academia?

Arm yourself with our custom-crafted academic papers that are sharper than a well-honed pencil! Order now and conquer your academic challenges with style!

What is the Difference between a Term Paper and a Research Paper?

What is the Fastest Way to Write a Term Paper?

Daniel Parker


is a seasoned educational writer focusing on scholarship guidance, research papers, and various forms of academic essays including reflective and narrative essays. His expertise also extends to detailed case studies. A scholar with a background in English Literature and Education, Daniel’s work on EssayPro blog aims to support students in achieving academic excellence and securing scholarships. His hobbies include reading classic literature and participating in academic forums.


Adam is an expert in nursing and healthcare, with a strong background in history, law, and literature. Holding advanced degrees in nursing and public health, his analytical approach and comprehensive knowledge help students navigate complex topics. On EssayPro blog, Adam provides insightful articles on everything from historical analysis to the intricacies of healthcare policies. In his downtime, he enjoys historical documentaries and volunteering at local clinics.



How to Write a Term Paper

  • Purpose of a term paper
  • How to start a term paper
  • Structure and outline
  • Step-by-step writing guide
  • Standard term paper format
  • Term paper examples
  • Writing tips

What is the purpose of a term paper?

How to start a term paper correctly.

  • Choose your topic by focusing on what inspires you unless you are already given a topic.
  • Take time to research and analyze your subject.
  • Start with a term paper outline (see our templates in the next sections).
  • Come up with a strong thesis statement before writing anything for body paragraphs.
  • Provide topic sentences and practical examples.
  • Provide a strong lesson in the conclusion if it suits the subject you write about.
  • Edit and proofread available information for trustworthiness.

Term paper structure and outline

  • Introduction. This is where you talk about the subject and a problem you are researching. It helps to introduce your thesis statement and explain the objectives that have been set.
  • Body Paragraphs. As a rule, in writing college term papers, one must write down several subheadings and headings to divide ideas and arguments into several (at least four) paragraphs. As done below, each body paragraph should contain one idea and a strong topic sentence.
  • Heading 1: History of the argument and background.
  • Heading 2: Extent of the problem that you write about.
  • Heading 3: Effects of the problem and possible causes.
  • Heading 4: Possible solutions and outcomes.
  • Conclusion. The final part should represent a strong summary and a response to your thesis statement.

Step 1: Data collection

Step 2: Explaining research relevance

Step 3: Introducing your subject

Step 4: Literature review preparation

Step 5: Offering results and conclusions

Step 6: Structural term paper evaluation

Step 7: Check your citations and references


Helpful term paper examples

  • Term paper examples that earned an A grade from the University of Delaware
  • Sample term paper offered by the Justus-Liebig Universitat Giessen
  • Purdue Owl Lab Citation Formats Database
  • Simon Fraser University Sample Term Paper

Term paper writing tips

  • Choose a topic that inspires you if you have an opportunity. If you have been given an already existing prompt to write, research your subject online and ask about the use of course materials. It will help you to narrow things down and already have source materials for referencing purposes.
  • If you can choose a subject to write a final paper for your course, think about something you can support with statistical data and some practical evidence.
  • Most importantly, keep your term paper relevant to the main objectives of your study course.
  • Keep your tone reflective and natural as you write.
  • Double-check your grading rubric regarding limitations and obligatory requirements that must be met.
  • Always proofread your term paper aloud!
  • If you have an opportunity, consider editing your term paper with the help of a friend or a fellow college student.



Term Paper vs Research Paper: What’s the Difference?


by  Antony W

June 27, 2024


It's easy to confuse a term paper and a research paper because they share a number of overlapping elements. However, there are features that set them apart. In this term paper versus research paper guide, we look at the similarities and differences so you never confuse the two assignments again.

What is a Research Paper?

We can define a research paper as an academic assignment that requires a student to investigate a subject methodically and theoretically and present their findings on the topic. Notably, research papers focus on analyzing issues (or problems) within a specific course.

In other words, when your professor asks you to write a research paper, they expect you to study a specific problem. More often than not, the problem under investigation is one that either has had questionable results in the past or hasn't had extensive coverage in the existing studies.

What is a Term Paper?

A term paper, on the other hand, is an assignment issued to test a student's knowledge of a given subject or theme after a given period of study.

The type of assignment you write in the case of a term paper will vary depending on your instructor's preference. They may ask you to write an essay, complete a test, or do some school work linked to the theme you've explored in a classroom setting.

You will write a term paper near the end of a class, and what you score for the assignment will count in that specific subject’s final grades.

Term Paper vs Research Paper: What Are The Key Differences?

Many elements overlap between a term paper and a research paper, but that doesn't mean they're identical. You need to learn the differences so that you never confuse these two types of assignments.

The table below outlines the main differences between a term paper and a research paper.

  • Term paper: assigned in the middle or at the end of a given study period or term. Research paper: assigned at the beginning of a term.
  • Term paper: written using a term paper outline. Research paper: follows the format of an academic work, also known as the research paper outline.
  • Term paper: has a shorter deadline, usually between a day and a week. Research paper: a longer assignment that takes weeks or even months to complete.
  • Term paper: written to examine a student's level of understanding of a topic or theme already discussed in the classroom. Research paper: focuses on solving a particular problem.
  • Term paper: supports a thesis statement. Research paper: often written to support a hypothesis.
  • Term paper: influences a student's overall grade. Research paper: doesn't always influence a student's final grade.
  • Term paper: can be as short as one page. Research paper: length varies and is often not less than 5 pages.

These differences may not be clear at first glance, so it helps to review them whenever you're in doubt.

Let's take this even further by explaining these similarities and differences in more detail so that you have a clearer picture of both assignments.

Term Paper vs Research Paper: Similarities

Topic Selection

The criteria for topic selection are the same for term and research papers. Your instructor can either assign you a topic or ask you to choose one yourself, with the latter being the more common case.

If your instructor has given you the freedom to choose a topic yourself, make sure the subject you pick relates to the discussions held and the study materials issued in class.

Requirements

Both term and research papers need to adhere to an academic formatting and referencing style. You'll find these requirements clearly indicated in the assignment's instructions. If your instructor doesn't give you a formatting and referencing style to use, stick to MLA or APA.

A Term Paper Can Be a Research Assignment

We understand that this can bring a lot of confusion, but it’s important to note that a term paper can also be a research assignment. If your instructor has asked you to investigate a topic based on existing evidence by using a methodological approach in a 10-page term paper, they’re most likely asking you to write a research assignment.

Term Paper vs Research Paper: Understanding the Differences

Structural Differences

One clear difference between a term paper and a research paper is the components that go into the assignment.

A research paper should have an introduction, literature review, methodology, results (or findings), discussion (or analysis), conclusion, and reflection (optional).

You won’t have a question to explore in a term paper and it doesn’t include a hypothesis either. The assignment doesn’t require appendices, but your instructor may ask you to include an annotated bibliography in the term paper.

Differences in Goals

The goal of a research paper assignment is to solve a specific problem. Often, you’ll have to study existing literature to find gaps or contradictions and then suggest solutions based on your findings. 

A term paper, on the other hand, seeks to test your knowledge of a topic. The emphasis is on testing your understanding of a given subject or theme discussed in the classroom.

Differences in Length

A term paper is longer than a typical essay, but it won’t be as voluminous as a research paper. In fact, term papers hardly ever go beyond 20 pages, and the shortest ones that Help for Assessment writers have worked on are as short as 1,000 words.

A research paper is longer than a term paper, with the number of pages typically ranging between 10 and 40, and sometimes more.

Term papers tend to be shorter partly because they're a bridge between essays and research works, and mostly because they don't require serious data collection and detailed analysis.

Differences in Deadlines and Grades

The word term, with respect to academic assignments, refers to a finite period within which a task should be completed. In essence, a term paper is an assessment given at the very end of a course, and it often determines a student's final grade.

A research paper may or may not influence your final grade depending on the instructions given – or your professor’s preference.

About the author 

Antony W is a professional writer and coach at Help for Assessment. He spends countless hours every day researching and writing great content filled with expert advice on how to write engaging essays, research papers, and assignments.

  • Open access
  • Published: 28 August 2020

Short-term stock market price trend prediction using a comprehensive deep learning system

  • Jingyi Shen 1 &
  • M. Omair Shafiq   ORCID: orcid.org/0000-0002-1859-8296 1  

Journal of Big Data, volume 7, Article number: 66 (2020)


Abstract

In the era of big data, deep learning for predicting stock market prices and trends has become even more popular than before. We collected 2 years of data from the Chinese stock market and proposed a comprehensive customization of feature engineering and a deep learning-based model for predicting the price trend of stock markets. The proposed solution is comprehensive as it includes pre-processing of the stock market dataset, utilization of multiple feature engineering techniques, combined with a customized deep learning-based system for stock market price trend prediction. We conducted comprehensive evaluations of frequently used machine learning models and conclude that our proposed solution outperforms them due to the comprehensive feature engineering that we built. The system achieves overall high accuracy for stock market trend prediction. With the detailed design and evaluation of prediction term lengths, feature engineering, and data pre-processing methods, this work contributes to the stock analysis research community in both the financial and technical domains.

Introduction

The stock market is one of the major fields that investors are dedicated to, thus stock market price trend prediction is always a hot topic for researchers from both financial and technical domains. In this research, our objective is to build a state-of-the-art prediction model for price trend prediction, which focuses on short-term price trend prediction.

As concluded by Fama in [26], financial time series prediction is known to be a notoriously difficult task due to the generally accepted, semi-strong form of market efficiency and the high level of noise. Back in 2003, Wang et al. in [44] already applied artificial neural networks to stock market price prediction and focused on volume as a specific feature of the stock market. One of their key findings was that volume was not found to be effective in improving forecasting performance on the datasets they used, which were the S&P 500 and DJI. Ince and Trafalis in [15] targeted short-term forecasting and applied a support vector machine (SVM) model to stock price prediction. Their main contribution was a comparison between the multi-layer perceptron (MLP) and SVM, finding that SVM outperformed MLP in most scenarios, although the result was also affected by the different trading strategies. In the meantime, researchers from financial domains were applying conventional statistical methods and signal processing techniques to analyzing stock market data.

Optimization techniques such as principal component analysis (PCA) were also applied to short-term stock price prediction [22]. Over the years, researchers have not only focused on stock price-related analysis but have also tried to analyze stock market transactions such as volume burst risks, which broadens the stock market analysis research domain and indicates that this research domain still has high potential [39]. As artificial intelligence techniques evolved in recent years, many proposed solutions attempted to combine machine learning and deep learning techniques based on previous approaches, and then proposed new metrics that serve as training features, such as Liu and Wang [23]. This type of previous work belongs to the feature engineering domain and can be considered the inspiration for the feature extension ideas in our research. Liu et al. in [24] proposed a convolutional neural network (CNN) as well as a long short-term memory (LSTM) neural network based model to analyze different quantitative strategies in stock markets. The CNN serves the stock selection strategy and automatically extracts features based on quantitative data, followed by an LSTM that preserves the time-series features for improving profits.

The latest work also proposes a similar hybrid neural network architecture, integrating a convolutional neural network with a bidirectional long short-term memory to predict the stock market index [4]. While researchers frequently proposed different neural network solution architectures, it raised further discussion about whether the high cost of training such models is worth the result.

There are three key contributions of our work: (1) a new dataset extracted and cleansed, (2) a comprehensive feature engineering procedure, and (3) a customized long short-term memory (LSTM) based deep learning model.

We have built the dataset ourselves from the data source, an open-sourced data API called Tushare [43]. The novelty of our proposed solution is that we propose a feature engineering procedure along with a fine-tuned system instead of just an LSTM model alone. We observed the previous works, found the gaps, and proposed a solution architecture with a comprehensive feature engineering procedure before training the prediction model. With the success of the feature extension method collaborating with recursive feature elimination algorithms, it opens doors for many other machine learning algorithms to achieve high accuracy scores for short-term price trend prediction. This proved the effectiveness of our proposed feature extension as feature engineering. We further introduced our customized LSTM model and further improved the prediction scores in all the evaluation metrics. The proposed solution outperformed the machine learning and deep learning-based models in similar previous works.
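To make the feature selection step above concrete, here is a minimal, hedged sketch of recursive feature elimination using scikit-learn with a random forest as the ranking estimator. The file name, the label column, and the choice of keeping 10 features are illustrative assumptions, not the paper's actual configuration.

```python
# Illustrative sketch only: recursive feature elimination (RFE) over a
# hypothetical feature table. Column names and file path are placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Hypothetical table: one row per trading day, technical indicators plus
# extended features as columns, and a binary 'trend' label (1 = up, 0 = down).
data = pd.read_csv("features.csv")
X = data.drop(columns=["trend"])
y = data["trend"]

# Rank features with a random forest and keep the 10 highest-ranked ones.
selector = RFE(
    estimator=RandomForestClassifier(n_estimators=200, random_state=0),
    n_features_to_select=10,
)
selector.fit(X, y)

selected_features = X.columns[selector.support_]
print("Selected features:", list(selected_features))
```

Any estimator that exposes feature importances or coefficients could take the random forest's place here; the selected subset would then feed the downstream trend classifier.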

The remainder of this paper is organized as follows. The “Survey of related works” section describes the survey of related works. “The dataset” section provides details on the data that we extracted from the public data sources and the dataset prepared. The “Methods” section presents the research problems, methods, and design of the proposed solution; detailed technical design with algorithms and how the model is implemented are also included in this section. The “Results” section presents comprehensive results and an evaluation of our proposed model, comparing it with the models used in most of the related works. The “Discussion” section provides a discussion and comparison of the results. The “Conclusion” section presents the conclusion. This research paper has been built based on Shen [36].

Survey of related works

In this section, we discuss related works. We reviewed the related work in two different domains: technical and financial.

Kim and Han in [19] built a model as a combination of artificial neural networks (ANN) and genetic algorithms (GAs) with discretization of features for predicting the stock price index. The data used in their study include the technical indicators as well as the direction of change in the daily Korea stock price index (KOSPI). They used data containing samples of 2928 trading days, ranging from January 1989 to December 1998, and give their selected features and formulas. They also applied optimization of feature discretization, as a technique that is similar to dimensionality reduction. The strength of their work is that they introduced GA to optimize the ANN. The limitations, however, are that the number of input features and processing elements in the hidden layer is fixed at 12 and not adjustable, and that in the learning process of the ANN the authors only focused on two factors in optimization. Still, they believed that GA has great potential for feature discretization optimization. Our initialized feature pool refers to their selected features. Qiu and Song in [34] also presented a solution to predict the direction of the Japanese stock market based on an optimized artificial neural network model. In this work, the authors utilize genetic algorithms together with artificial neural network based models, and name it a hybrid GA-ANN model.

Piramuthu in [33] conducted a thorough evaluation of different feature selection methods for data mining applications. He used four datasets, which were credit approval data, loan defaults data, web traffic data, and the Tam and Kiang data, and compared how different feature selection methods optimized decision tree performance. The feature selection methods he compared included probabilistic distance measures: the Bhattacharyya measure, the Matusita measure, the divergence measure, the Mahalanobis distance measure, and the Patrick-Fisher measure; and inter-class distance measures: the Minkowski distance measure, the city block distance measure, the Euclidean distance measure, the Chebychev distance measure, and the nonlinear (Parzen and hyper-spherical kernel) distance measure. The strength of this paper is that the author evaluated both probabilistic distance-based and several inter-class feature selection methods. Besides, the author performed the evaluation based on different datasets, which reinforced the strength of this paper. However, the evaluation algorithm was a decision tree only, so we cannot conclude whether the feature selection methods would still perform the same on a larger dataset or a more complex model.

Hassan and Nath in [9] applied the Hidden Markov Model (HMM) to stock market forecasting on the stock prices of four different airlines. They reduced the states of the model to four: the opening price, closing price, the highest price, and the lowest price. The strong point of this paper is that the approach does not need expert knowledge to build a prediction model. However, this work is limited to the airline industry and evaluated on a very small dataset, so it may not lead to a prediction model with generality. This approach is one that could be exploited for comparison in stock market prediction related works. The authors selected a maximum of 2 years as the date range of the training and testing dataset, which provided us a date range reference for our evaluation part.

Lei in [21] exploited a Wavelet Neural Network (WNN) to predict stock price trends. The author also applied Rough Set (RS) for attribute reduction as an optimization. Rough Set was utilized to reduce the stock price trend feature dimensions. It was also used to determine the structure of the Wavelet Neural Network. The dataset of this work consists of five well-known stock market indices, i.e., (1) SSE Composite Index (China), (2) CSI 300 Index (China), (3) All Ordinaries Index (Australia), (4) Nikkei 225 Index (Japan), and (5) Dow Jones Index (USA). Evaluation of the model was based on different stock market indices, and the result was convincing with generality. Using Rough Set to optimize the feature dimensions before processing reduces the computational complexity. However, the author only stressed the parameter adjustment in the discussion part but did not specify the weakness of the model itself. Meanwhile, we also found that the evaluations were performed on indices; the same model may not have the same performance if applied to a specific stock.

Lee in [20] used the support vector machine (SVM) along with a hybrid feature selection method to carry out prediction of stock trends. The dataset in this research is a sub-dataset of the NASDAQ Index in the Taiwan Economic Journal Database (TEJD) in 2008. The feature selection part used a hybrid method in which supported sequential forward search (SSFS) played the role of the wrapper. Another advantage of this work is that they designed a detailed procedure of parameter adjustment with performance under different parameter values. The clear structure of the feature selection model is also heuristic for the primary stage of model structuring. One of the limitations is that the performance of SVM was compared to a back-propagation neural network (BPNN) only and not to other machine learning algorithms.

Sirignano and Cont leveraged a deep learning solution trained on a universal feature set of financial markets in [40]. The dataset used included buy and sell records of all transactions, and cancellations of orders, for approximately 1000 NASDAQ stocks through the order book of the stock exchange. The NN consists of three layers with LSTM units and a feed-forward layer with rectified linear units (ReLUs) at the end, with the stochastic gradient descent (SGD) algorithm as an optimization. Their universal model was able to generalize and cover stocks other than the ones in the training data. Though they mentioned the advantages of a universal model, the training cost was still expensive. Meanwhile, due to the inexplicit programming of the deep learning algorithm, it is unclear whether useless features contaminated the model when the data was fed into it. It would have been better if they had performed a feature selection step before training the model, which is an effective way to reduce the computational complexity.

Ni et al. in [30] predicted stock price trends by exploiting SVM and performed fractal feature selection for optimization. The dataset they used is the Shanghai Stock Exchange Composite Index (SSECI), with 19 technical indicators as features. Before processing the data, they optimized the input data by performing feature selection. When finding the best parameter combination, they also used a grid search method with k-fold cross-validation. Besides, their evaluation of different feature selection methods is also comprehensive. As the authors mentioned in their conclusion, they only considered the technical indicators and not macro and micro factors in the financial domain. The source of the datasets that the authors used was similar to our dataset, which makes their evaluation results useful to our research. They also mentioned the k-fold cross-validation method when testing hyper-parameter combinations.

McNally et al. in [27] leveraged RNN and LSTM for predicting the price of Bitcoin, optimized by using the Boruta algorithm for the feature engineering part, which works similarly to the random forest classifier. Besides feature selection, they also used Bayesian optimization to select LSTM parameters. The Bitcoin dataset ranged from the 19th of August 2013 to the 19th of July 2016. They used multiple optimization methods to improve the performance of their deep learning methods. The primary problem of their work is overfitting. The research problem of predicting the Bitcoin price trend has some similarities with stock market price prediction: hidden features and noise embedded in the price data are threats to this work. The authors treated the research question as a time sequence problem. The best part of this paper is the feature engineering and optimization part; we could replicate the methods they exploited in our data pre-processing.

Weng et al. in [45] focused on short-term stock price prediction by using ensemble methods of four well-known machine learning models. The dataset for this research consists of five sets of data, obtained from three open-sourced APIs and an R package named TTR. The machine learning models they used are (1) a neural network regression ensemble (NNRE), (2) a Random Forest with unpruned regression trees as base learners (RFR), (3) AdaBoost with unpruned regression trees as base learners (BRT), and (4) a support vector regression ensemble (SVRE). This is a thorough study of ensemble methods specified for short-term stock price prediction. With background knowledge, the authors selected eight technical indicators in this study, then performed a thoughtful evaluation of the five datasets. The primary contribution of this paper is that they developed a platform for investors using R, which does not need users to input their own data but calls an API to fetch the data from online sources straightforwardly. From the research perspective, they only evaluated the prediction of the price for 1 up to 10 days ahead but did not evaluate terms longer than two trading weeks or shorter than 1 day. The primary limitation of their research was that they only analyzed 20 U.S.-based stocks; the model might not generalize to other stock markets or would need further revalidation to see if it suffers from overfitting problems.

Kara et al. in [17] also exploited ANN and SVM in predicting the movement of the stock price index. The dataset they used covers the time period from January 2, 1997, to December 31, 2007, of the Istanbul Stock Exchange. The primary strength of this work is its detailed record of parameter adjustment procedures. The weaknesses are that neither the technical indicators nor the model structure has novelty, and the authors did not explain how their model performed better than other models in previous works; thus, more validation work on other datasets would help. They explained how ANN and SVM work with stock market features and also recorded the parameter adjustment. The implementation part of our research could benefit from this previous work.

Jeon et al. in [16] performed research on a millisecond interval-based big dataset by using pattern graph tracking to complete stock price prediction tasks. The dataset they used is a millisecond interval-based big dataset of historical stock data from KOSCOM, from August 2014 to October 2014, of 10G–15G capacity. The authors applied Euclidean distance and Dynamic Time Warping (DTW) for pattern recognition. For feature selection, they used stepwise regression. The authors completed the prediction task with an ANN, using Hadoop and RHive for big data processing. The “Results” section is based on the result processed by a combination of SAX and Jaro–Winkler distance. Before processing the data, they generated aggregated data at 5-min intervals from the discrete data. The primary strength of this work is the explicit structure of the whole implementation procedure. However, they exploited a relatively old model, and another weakness is that the overall time span of the training dataset is extremely short. It is difficult to access millisecond interval-based data in real life, so the model is not as practical as a daily-based data model.

Huang et al. in [12] applied a fuzzy-GA model to complete the stock selection task. They used the stocks of the 200 largest companies by market capitalization listed on the Taiwan Stock Exchange as the investment universe. Besides, the yearly financial statement data and the stock returns were taken from the Taiwan Economic Journal (TEJ) database at www.tej.com.tw/ for the time period from 1995 to 2009. They constructed the fuzzy membership function with model parameters optimized with GA and extracted features for optimizing stock scoring. The authors proposed an optimized model for the selection and scoring of stocks. Different from a prediction model, the authors focused more on stock rankings, selection, and performance evaluation. Their structure is more practical among investors. But in the model validation part, they did not compare the model with existing algorithms but only with the statistics of the benchmark, which made it challenging to identify whether GA would outperform other algorithms.

Fischer and Krauss in [5] applied long short-term memory (LSTM) to financial market prediction. The dataset they used is the S&P 500 index constituents from Thomson Reuters. They obtained all month-end constituent lists for the S&P 500 from Dec 1989 to Sep 2015, then consolidated the lists into a binary matrix to eliminate survivor bias. The authors also used RMSprop as an optimizer, which is a mini-batch version of rprop. The primary strength of this work is that the authors used the latest deep learning technique to perform predictions. However, they relied on the LSTM technique alone and lacked background knowledge in the financial domain. Although the LSTM outperformed the standard DNN and logistic regression algorithms, the authors did not mention the effort required to train an LSTM with long time dependencies.

Tsai and Hsiao in [42] proposed a solution as a combination of different feature selection methods for the prediction of stocks. They used the Taiwan Economic Journal (TEJ) database as the data source. The data used in their analysis was from the year 2000 to 2007. In their work, they used a sliding window method and combined it with multi-layer perceptron (MLP) based artificial neural networks with back propagation as their prediction model. They also applied principal component analysis (PCA) for dimensionality reduction, and genetic algorithms (GA) and classification and regression trees (CART) to select important features. They did not rely on technical indices only; instead, they also included both fundamental and macroeconomic indices in their analysis. The authors also reported a comparison of feature selection methods. The validation part was done by combining the model performance stats with statistical analysis.

Pimenta et al. in [32] leveraged an automated investing method by using multi-objective genetic programming and applied it to the stock market. The dataset was obtained from the Brazilian stock exchange market (BOVESPA), and the primary techniques they exploited were a combination of multi-objective optimization, genetic programming, and technical trading rules. For optimization, they leveraged genetic programming (GP) to optimize decision rules. The novelty of this paper was in the evaluation part: they included a historical period that was a critical moment of Brazilian politics and economics when performing validation. This approach reinforced the generalization strength of their proposed model. When selecting the sub-dataset for evaluation, they also set criteria to ensure more asset liquidity. However, the baseline of the comparison was basic and fundamental, and the authors did not perform any comparison with other existing models.

Huang and Tsai in [13] conducted a filter-based feature selection assembled with a hybrid self-organizing feature map (SOFM) support vector regression (SVR) model to forecast the Taiwan index futures (FITX) trend. They divided the training samples into clusters to marginally improve the training efficiency. The authors proposed a comprehensive model, which was a combination of two novel machine learning techniques in stock market analysis. Besides, the feature selection optimizer was also applied before the data processing to improve the prediction accuracy and reduce the computational complexity of processing daily stock index data. Though they optimized the feature selection part and split the sample data into small clusters, it was already strenuous to train this model on daily stock index data. It would be difficult for this model to predict trading activities in shorter time intervals, since the data volume would increase drastically. Moreover, the evaluation is not strong enough since they set a single SVR model as a baseline but did not compare the performance with other previous works, which makes it difficult for future researchers to identify why the SOFM-SVR model outperforms other algorithms.

Thakur and Kumar in [41] also developed a hybrid financial trading support system by exploiting multi-category classifiers and random forest (RAF). They conducted their research on stock indices from NASDAQ, DOW JONES, S&P 500, NIFTY 50, and NIFTY BANK. The authors proposed a hybrid model that combined random forest (RF) algorithms with a weighted multicategory generalized eigenvalue support vector machine (WMGEPSVM) to generate “Buy/Hold/Sell” signals. Before processing the data, they used Random Forest (RF) for feature pruning. The authors proposed a practical model designed for real-life investment activities, which could generate three basic signals for investors to refer to. They also performed a thorough comparison of related algorithms. However, they did not mention the time and computational complexity of their work. Meanwhile, an unignorable issue of their work was the lack of a financial domain knowledge background: investors can regard the indices data as one of the attributes, but cannot take the signal from indices and apply it directly to trading a specific stock.

Hsu in [11] assembled feature selection with a back propagation neural network (BNN) combined with genetic programming to predict the stock/futures price. The dataset in this research was obtained from the Taiwan Stock Exchange Corporation (TWSE). The authors have introduced the background knowledge in detail. The weakness of their work is the lack of a dataset description. The model is a combination of models proposed in other previous works. Though we did not see novelty in this work, we can still conclude that the genetic programming (GP) algorithm is accepted in the stock market research domain. To reinforce the validation strengths, it would be good to consider adding GP models into the evaluation if the model is predicting a specific price.

Hafezi et al. in [7] built a bat-neural network multi-agent system (BN-NMAS) to predict stock prices. The dataset was obtained from the Deutsche Bundesbank. They also applied the Bat algorithm (BA) for optimizing neural network weights. The authors illustrated their overall structure and logic of the system design in clear flowcharts. However, since very few previous works have been performed on DAX data, it is difficult to tell whether the model they proposed would retain its generality if migrated to other datasets. The system design and feature selection logic are fascinating and worth referring to. Their findings on optimization algorithms are also valuable for research in the stock market price prediction domain. It is worth trying the Bat algorithm (BA) when constructing neural network models.

Long et al. in [25] conducted a deep learning approach to predict the stock price movement. The dataset they used is the Chinese stock market index CSI 300. For predicting the stock price movement, they constructed a multi-filter neural network (MFNN) with stochastic gradient descent (SGD) and a back propagation optimizer for learning the NN parameters. The strength of this paper is that the authors exploited a novel hybrid model constructed from different kinds of neural networks, which provides inspiration for constructing hybrid neural network structures.

Atsalakis and Valavanis in [1] proposed a neuro-fuzzy system, composed of a controller named the Adaptive Neuro-Fuzzy Inference System (ANFIS), to achieve short-term stock price trend prediction. The noticeable strength of this work is the evaluation part: not only did they compare their proposed system with the popular data models, but they also compared it with investment strategies. The weakness we found in their proposed solution is that the solution architecture lacks an optimization part, which might limit the model performance. Since our proposed solution also focuses on short-term stock price trend prediction, this work is heuristic for our system design. Meanwhile, by comparing with the popular trading strategies of investors, their work inspired us to compare the strategies used by investors with the techniques used by researchers.

Nekoeiqachkanloo et al. in [29] proposed a system with two different approaches for stock investment. The strengths of their proposed solution are obvious. First, it is a comprehensive system that consists of data pre-processing and two different algorithms to suggest the best investment portions. Second, the system is also embedded with a forecasting component, which retains the features of the time series. Last but not least, their input features are a mix of fundamental features and technical indices that aim to fill the gap between the financial domain and the technical domain. However, their work has a weakness in the evaluation part: instead of evaluating the proposed system on a large dataset, they chose 25 well-known stocks, and there is a high possibility that these well-known stocks share some common hidden features.

As another related recent work, Idrees et al. [14] published a time series-based prediction approach for the volatility of the stock market. ARIMA is not a new approach in the time series prediction research domain. Their work focuses more on the feature engineering side. Before feeding the features into ARIMA models, they designed three steps for feature engineering: analyze the time series, identify whether the time series is stationary or not, and perform estimation by plotting ACF and PACF charts and looking for parameters. The only weakness of their proposed solution is that the authors did not perform any customization on the existing ARIMA model, which might limit the potential for the system performance to be improved.

One of the main weaknesses found in the related works is the limited data pre-processing mechanisms built and used. Technical works mostly tend to focus on building prediction models. When they select the features, they list all the features mentioned in previous works, run them through a feature selection algorithm, and select the best-voted features. Related works in the investment domain have shown more interest in behavior analysis, such as how herding behaviors affect stock performance, or how the percentage of the firm's common stock held by inside directors affects the performance of a certain stock. These behaviors often need a pre-processing procedure built on standard technical indices and investment experience to recognize.

In the related works, a thorough statistical analysis is often performed on a special dataset to conclude new features, rather than performing feature selection. Some data, such as the percentage of a certain index fluctuation, has been proven to be effective on stock performance. We believe that by extracting new features from the data and then combining such features with existing common technical indices, the existing and well-tested prediction models will benefit significantly.
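As a rough illustration of this idea, the sketch below derives a few extended features from raw daily bars and places them next to common technical indices. The column names ('close', 'volume') and the window sizes are assumptions for the example, not the feature set actually used in this work.

```python
# Hedged sketch of feature extension: append derived columns to raw daily data.
# Column names and window lengths are illustrative assumptions.
import pandas as pd

def extend_features(daily: pd.DataFrame) -> pd.DataFrame:
    out = daily.copy()
    # Common technical indices
    out["ma_5"] = out["close"].rolling(5).mean()        # 5-day moving average
    out["ma_20"] = out["close"].rolling(20).mean()      # 20-day moving average
    # Extended features derived from the raw data
    out["return_1d"] = out["close"].pct_change()                  # daily return
    out["volatility_10"] = out["return_1d"].rolling(10).std()     # 10-day volatility
    out["volume_ratio"] = out["volume"] / out["volume"].rolling(20).mean()
    return out.dropna()
```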

The dataset

This section details the data that was extracted from the public data sources, and the final dataset that was prepared. Stock market-related data are diverse, so we first compared the related works from the survey of financial research works in stock market data analysis to specify the data collection directions. After collecting the data, we defined a data structure of the dataset. Given below, we describe the dataset in detail, including the data structure, and data tables in each category of data with the segment definitions.

Description of our dataset

In this section, we describe the dataset in detail. This dataset consists of 3558 stocks from the Chinese stock market. Besides the daily price data and daily fundamental data of each stock ID, we also collected the suspension and resumption history, the top 10 shareholders, etc. We chose 2 years as the time span of this dataset for two reasons: (1) most investors perform stock market price trend analysis using the data within the latest 2 years, and (2) using more recent data benefits the analysis result. We collected data through an open-source API, namely Tushare [43]; meanwhile, we also leveraged web-scraping techniques to collect data from Sina Finance web pages and the SWS Research website.

Data structure

Figure 1 illustrates all the data tables in the dataset. We collected four categories of data in this dataset: (1) basic data, (2) trading data, (3) finance data, and (4) other reference data. All the data tables can be linked to each other by a common field called "Stock ID", a unique stock identifier registered in the Chinese stock market. Table 1 shows an overview of the dataset.

Fig. 1 Data structure for the extracted dataset

Table 1 lists the field information of each data table, as well as the category each data table belongs to.

In this section, we present the proposed methods and the design of the proposed solution. Moreover, we also introduce the architecture design as well as algorithmic and implementation details.

Problem statement

We analyzed the best possible approach for predicting short-term price trends from different aspects: feature engineering, financial domain knowledge, and prediction algorithm. Then we addressed three research questions in each aspect, respectively: How can feature engineering benefit model prediction accuracy? How do findings from the financial domain benefit prediction model design? And what is the best algorithm for predicting short-term price trends?

The first research question is about feature engineering. We would like to know how the feature selection method benefits the performance of prediction models. From the abundance of previous works, we can conclude that stock price data are embedded with a high level of noise, and there are also correlations between features, which makes price prediction notoriously difficult. That is also the primary reason why most of the previous works introduced a feature engineering part as an optimization module.

The second research question evaluates the effectiveness of findings we extracted from the financial domain. Different from previous works, besides the common evaluation of data models such as training costs and scores, our evaluation emphasizes the effectiveness of newly added features extracted from the financial domain. We introduce some features from the financial domain; however, we only obtained specific findings from previous works, and the related raw data needs to be processed into usable features. After extracting related features from the financial domain, we combine them with other common technical indices to vote out the features with higher impact. There are numerous features said to be effective in the financial domain, and it would be impossible for us to cover all of them. Thus, how to appropriately convert the findings from the financial domain into a data processing module of our system design is a hidden research question that we attempt to answer.

The third research question concerns which algorithm we should use to model our data. In previous works, researchers have put effort into exact price prediction. We decompose the problem into predicting the trend first and then the exact number; this paper focuses on the first step. Hence, the objective becomes a binary classification problem, while also finding an effective way to eliminate the negative effect of the high level of noise. Our approach is to decompose the complex problem into sub-problems with fewer dependencies, resolve them one by one, and then compile the resolutions into an ensemble model as an aiding system for investment decision reference.

In the previous works, researchers have used a variety of models for predicting stock price trends. Since most of the best-performing models are based on machine learning techniques, in this work we compare our approach with the leading machine learning models in the evaluation part to answer this research question.

Proposed solution

The high-level architecture of our proposed solution can be separated into three parts. First is the feature selection part, which guarantees that the selected features are highly effective. Second, we look into the data and perform dimensionality reduction. The last part, which is the main contribution of our work, is building a prediction model for target stocks. Figure 2 depicts the high-level architecture of the proposed solution.

Fig. 2 High-level architecture of the proposed solution

There are many ways to classify stocks into categories. Some investors prefer long-term investments, while others show more interest in short-term investments. It is common to see stock-related reports showing only average performance while the stock price is increasing drastically; this is one of the phenomena indicating that stock price prediction follows no fixed rules, so finding effective features before training a model on the data is necessary.

In this research, we focus on short-term price trend prediction. Currently, we only have raw data with no labels, so the very first step is to label the data. We mark the price trend by comparing the current closing price with the closing price of n trading days ago, where n ranges from 1 to 10 since our research focuses on the short term. If the price trend goes up, we mark it as 1, and as 0 in the opposite case. To be more specific, we use the indices of the (n − 1)-th day to predict the price trend of the n-th day.
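To make the labeling rule concrete, below is a minimal sketch in pandas; the series name and sample prices are illustrative assumptions, not the paper's actual data.

```python
import pandas as pd

def label_price_trend(close: pd.Series, n: int) -> pd.Series:
    """Mark a day as 1 if its closing price is higher than the closing price
    n trading days earlier, and 0 otherwise (the paper varies n from 1 to 10)."""
    up = close > close.shift(n)          # shift(n) gives the close of n trading days ago
    return up.astype(int)

# Example with illustrative prices; the first n rows have no comparison and default to 0.
closes = pd.Series([10.0, 10.2, 10.1, 10.4, 10.3])
print(label_price_trend(closes, n=2).tolist())   # [0, 0, 1, 1, 1]
```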

According to the previous works, some researchers who applied both financial domain knowledge and technical methods on stock data were using rules to filter the high-quality stocks. We referred to their works and exploited their rules to contribute to our feature extension design.

However, to ensure the best performance of the prediction model, we will look into the data first. There are a large number of features in the raw data; if we involve all the features into our consideration, it will not only drastically increase the computational complexity but will also cause side effects if we would like to perform unsupervised learning in further research. So, we leverage the recursive feature elimination (RFE) to ensure all the selected features are effective.

We found that most of the previous works in the technical domain analyzed all stocks, while in the financial domain researchers prefer to analyze specific investment scenarios. To fill the gap between the two domains, we decided to apply a feature extension based on the findings gathered from the financial domain before starting the RFE procedure.

Since we plan to model the data as time series, the more features there are, the more complex the training procedure will be. So, we leverage dimensionality reduction by using randomized PCA in our proposed solution architecture.

Detailed technical design elaboration

This section provides an elaboration of the detailed technical design as being a comprehensive solution based on utilizing, combining, and customizing several existing data preprocessing, feature engineering, and deep learning techniques. Figure  3 provides the detailed technical design from data processing to prediction, including the data exploration. We split the content by main procedures, and each procedure contains algorithmic steps. Algorithmic details are elaborated in the next section. The contents of this section will focus on illustrating the data workflow.

Fig. 3 Detailed technical design of the proposed solution

Based on the literature review, we select the most commonly used technical indices and then feed them into the feature extension procedure to get the expanded feature set. We select the most effective i features from the expanded feature set. Then we feed the data with the i selected features into the PCA algorithm to reduce the dimension to j features. After we get the best combination of i and j, we process the data into the finalized feature set and feed it into the LSTM [10] model to get the price trend prediction result.

The novelty of our proposed solution is that we will not only apply the technical method on raw data but also carry out the feature extensions that are used among stock market investors. Details on feature extension are given in the next subsection. Experiences gained from applying and optimizing deep learning based solutions in [ 37 , 38 ] were taken into account while designing and customizing feature engineering and deep learning solution in this work.

Applying feature extension

The first main procedure in Fig. 3 is the feature extension. In this block, the input data consists of the most commonly used technical indices concluded from related works. The three feature extension methods are max–min scaling, polarizing, and calculating the fluctuation percentage. Not all technical indices are applicable to all three feature extension methods; this procedure applies only the meaningful extension methods to each technical index, chosen by looking at how each index is calculated. The technical indices and the corresponding feature extension methods are illustrated in Table 2.

After the feature extension procedure, the expanded features are combined with the most commonly used technical indices, i.e., the input data with the output data, and fed into the RFE block as input data in the next step.

Applying recursive feature elimination

After the feature extension above, we explore the most effective i features by using the Recursive Feature Elimination (RFE) algorithm [6]. We estimate all the features by two attributes: coefficient and feature importance. We also limit the number of features removed from the pool at a time to one, which means we remove one feature at each step and retain all the relevant features. The output of the RFE block is then the input of the next step, which refers to PCA.

Applying principal component analysis (PCA)

The very first step before leveraging PCA is feature pre-processing. Some of the features after RFE are percentage data, while others are very large numbers, i.e., the outputs from RFE are in different units, which will affect the principal component extraction result. Thus, before feeding the data into the PCA algorithm [8], feature pre-processing is necessary. We also illustrate the effectiveness and compare methods in the "Results" section.

After performing feature pre-processing, the next step is to feed the processed data with selected i features into the PCA algorithm to reduce the feature matrix scale into j features. This step is to retain as many effective features as possible and meanwhile eliminate the computational complexity of training the model. This research work also evaluates the best combination of i and j, which has relatively better prediction accuracy, meanwhile, cuts the computational consumption. The result can be found in the “ Results ” section, as well. After the PCA step, the system will get a reshaped matrix with j columns.

Fitting long short-term memory (LSTM) model

PCA reduces the dimensions of the input data, but data pre-processing is still mandatory before feeding the data into the LSTM layer. The reason for adding the data pre-processing step before the LSTM model is that the input matrix formed by principal components has no time steps, while one of the most important parameters for training an LSTM is the number of time steps. Hence, we have to model the matrix into corresponding time steps for both the training and testing datasets.

After performing the data pre-processing part, the last step is to feed the training data into the LSTM and evaluate the performance using the testing data. As a variant of the RNN, even with one LSTM layer the network is still a deep neural network, since it processes sequential data and memorizes its hidden states through time. An LSTM layer is composed of one or more LSTM units, and an LSTM unit consists of cells and gates to perform classification and prediction based on time series data.

The LSTM structure is formed by two layers. The input dimension is determined by j after the PCA algorithm. The first layer is the input LSTM layer, and the second layer is the output layer. The final output will be 0 or 1, indicating whether the predicted stock price trend is going down or going up, as a supporting suggestion for investors making their next investment decision.
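As an illustration only, a two-layer structure of this kind could be sketched in Keras as below; the hidden size, optimizer, and other hyperparameters are assumptions for demonstration, since this passage does not fix them.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def build_trend_model(n_time_steps: int, j: int) -> Sequential:
    """One LSTM layer over the sequence of j principal components,
    followed by a sigmoid output layer giving the up (1) / down (0) probability."""
    model = Sequential([
        LSTM(32, input_shape=(n_time_steps, j)),   # hidden size 32 is illustrative
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["binary_accuracy"])
    return model
```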

Design discussion

Feature extension is one of the novelties of our proposed price trend predicting system. In the feature extension procedure, we use technical indices to collaborate with the heuristic processing methods learned from investors, which fills the gap between the financial research area and technical research area.

Since we propose a price trend prediction system, feature engineering is extremely important to the final prediction result. Not only is the feature extension method helpful to guarantee we do not miss potentially correlated features, but the feature selection method is also necessary for pooling the effective features. The more irrelevant features are fed into the model, the more noise is introduced. Each main procedure is carefully considered for its contribution to the whole system design.

Besides the feature engineering part, we also leverage LSTM, a state-of-the-art deep learning method for time-series prediction, which guarantees the prediction model can capture both complex hidden patterns and time-series-related patterns.

It is known that the training cost of deep learning models is expensive in both time and hardware; another advantage of our system design is the optimization procedure, PCA. It retains the principal components of the features while reducing the scale of the feature matrix, thus helping the system save the training cost of processing the large time-series feature matrix.

Algorithm elaboration

This section provides comprehensive details on the algorithms we built while utilizing and customizing different existing techniques, including the terminologies, parameters, and optimizers. In the legend on the right side of Fig. 3, we denote the algorithm steps as octagons; all of them are elaborated in this "Algorithm elaboration" section.

Before diving deep into the algorithm steps, here is a brief introduction to the data pre-processing: since we use supervised learning algorithms, we also need to generate the ground truth. The ground truth of this research is generated by comparing the closing price of the current trading date with the closing price of the earlier trading date the user wants to compare with. A price increase is labeled as 1; otherwise, the ground truth is labeled as 0. Because this research is not only focused on predicting the price trend of one specific period but on the short term in general, the ground truth is processed over a range of trading days. Since the algorithms do not change with the prediction term length, we can regard the term length as a parameter.

The algorithmic details are elaborated below. The first algorithm is the hybrid feature engineering part for preparing high-quality training and testing data; it corresponds to the feature extension, RFE, and PCA blocks in Fig. 3. The second algorithm is the LSTM procedure block, including time-series data pre-processing, NN construction, training, and testing.

Algorithm 1: Short-term stock market price trend prediction—applying feature engineering using FE + RFE + PCA

The function FE corresponds to the feature extension block. For the feature extension procedure, we apply three different processing methods to translate the findings from the financial domain into a technical module in our system design. Since not all the indices are applicable for expansion, we only choose the proper method(s) for certain features to perform the feature extension (FE), according to Table 2.

The normalize method preserves the relative proportions of the values and transforms the technical indices into the range [0, 1]. Polarize is a well-known method often used by real-world investors: they sometimes only consider whether a technical index value is above or below zero. We process some of the features using the polarize method and prepare them for RFE. Max–min (or min–max) [35] scaling is a transformation method often used as an alternative to zero-mean and unit-variance scaling. Another well-known method is the fluctuation percentage; we transform the technical indices' fluctuation percentages into the range [−1, 1].
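A minimal sketch of these four extension methods is given below, assuming each technical index arrives as a pandas Series; the exact formulas used in the paper may differ (for example, the normalization basis), so treat these as illustrative only.

```python
import pandas as pd

def normalize(s: pd.Series) -> pd.Series:
    """Map an index into [0, 1] while preserving relative proportions."""
    return (s - s.min()) / (s.max() - s.min())

def polarize(s: pd.Series) -> pd.Series:
    """1 if the index value is above zero, else 0."""
    return (s > 0).astype(int)

def max_min_scale(s: pd.Series, low: float = 0.0, high: float = 1.0) -> pd.Series:
    """Min-max scaling into [low, high], an alternative to zero-mean/unit-variance scaling."""
    return low + (s - s.min()) * (high - low) / (s.max() - s.min())

def fluctuation_percentage(s: pd.Series) -> pd.Series:
    """Day-over-day fluctuation percentage, clipped to [-1, 1]."""
    return s.pct_change().fillna(0.0).clip(-1.0, 1.0)
```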

The function RFE () in the first algorithm refers to recursive feature elimination. Before we perform the training data scale reduction, we will have to make sure that the features we selected are effective. Ineffective features will not only drag down the classification precision but also add more computational complexity. For the feature selection part, we choose recursive feature elimination (RFE). As [ 45 ] explained, the process of recursive feature elimination can be split into the ranking algorithm, resampling, and external validation.

The ranking algorithm fits the model to the features and ranks them by their importance to the model. We set the parameter to retain i features; each iteration of feature selection retains the Si top-ranked features, then refits the model and assesses the performance again to begin another iteration. The ranking algorithm eventually determines the top Si features.

The RFE algorithm is known to suffer from the over-fitting problem. To eliminate the over-fitting issue, we run the RFE algorithm multiple times on randomly selected stocks as the training set and ensure that all the features we select are high-weighted. This procedure is called data resampling. Resampling can be built as an optimization step forming an outer layer of the RFE algorithm.
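A sketch of this step using scikit-learn is shown below; the linear-kernel SVR estimator matches what is reported later in the evaluation, while the resampling loop over randomly drawn stocks is indicated only in comments.

```python
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

def rfe_select(X, y, n_keep: int = 30):
    """Recursively eliminate one feature per iteration until n_keep features remain."""
    selector = RFE(estimator=SVR(kernel="linear"),
                   n_features_to_select=n_keep,
                   step=1)                     # remove exactly one feature per step
    selector.fit(X, y)
    return selector.support_                   # boolean mask over the input features

# Resampling (sketch): repeat rfe_select() on several randomly chosen subsets of stocks
# and keep the features that are retained most often across the runs (voting).
```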

The last part of our hybrid feature engineering algorithm is for optimization purposes. For the training data matrix scale reduction, we apply Randomized principal component analysis (PCA) [ 31 ], before we decide the features of the classification model.

Financial ratios of a listed company are used to represent its growth ability, earning ability, solvency, etc. Each financial ratio consists of a set of technical indices; each technical index (or feature) we add appends another column of data to the data matrix, resulting in lower training efficiency and redundancy. If non-relevant or less relevant features are included in the training data, the precision of classification also decreases.

$$\mathrm{ACR} = \frac{\sum_{i=1}^{j} \lambda_{i}}{\sum_{i=1}^{p} \lambda_{i}}$$

where \(\lambda_{i}\) is the variance explained by the i-th principal component, \(j\) is the number of retained components, and \(p\) is the total number of features.

The above equation represents the explanatory power of the principal components extracted by the PCA method for the original data. If the ACR is below 85%, the PCA method would be unsuitable due to the loss of original information. Because the covariance matrix is sensitive to the orders of magnitude of the data, there should be a data standardization procedure before performing PCA. The commonly used standardization methods are mean-standardization and normal-standardization, noted as given below:

Mean-standardization: \(X_{ij}^{*} = X_{ij}/\overline{X_{j}}\), where \(\overline{X_{j}}\) represents the mean value of the j-th feature.

Normal-standardization: \(X_{ij}^{*} = (X_{ij} - \overline{X_{j}})/s_{j}\), where \(\overline{X_{j}}\) represents the mean value and \(s_{j}\) the standard deviation of the j-th feature.
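A possible implementation of the standardization followed by randomized PCA, using NumPy and scikit-learn, is sketched below; returning the accumulated contribution rate alongside the components is our own illustrative addition.

```python
import numpy as np
from sklearn.decomposition import PCA

def standardize_and_reduce(X: np.ndarray, j: int, method: str = "normal"):
    """Standardize the selected features, then extract j principal components
    with the randomized solver; also report the accumulated contribution rate (ACR)."""
    mean = X.mean(axis=0)
    if method == "mean":                       # mean-standardization: X_ij / mean(X_j)
        Xs = X / mean
    else:                                      # normal-standardization: (X_ij - mean) / std
        Xs = (X - mean) / X.std(axis=0)
    pca = PCA(n_components=j, svd_solver="randomized")
    Z = pca.fit_transform(Xs)
    acr = pca.explained_variance_ratio_.sum()  # should stay above 0.85 per the rule of thumb
    return Z, acr
```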

The array fe_array is defined according to Table 2: the row number maps to the features, and columns 0, 1, 2, and 3 denote the extension methods of normalize, polarize, max–min scale, and fluctuation percentage, respectively. We then fill in the values of the array by the rule that 0 stands for no need to expand and 1 means the feature needs the corresponding extension method. The final algorithm of data preprocessing using RFE and PCA is illustrated as Algorithm 1.
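Purely as an illustration of the data structure (the actual rows and values come from Table 2, which we do not reproduce here), fe_array might look like this:

```python
import numpy as np

# Columns: 0 = normalize, 1 = polarize, 2 = max-min scale, 3 = fluctuation percentage.
# Rows map to the technical indices of Table 2; the two rows below are hypothetical examples.
fe_array = np.array([
    [1, 0, 1, 1],   # e.g., a moving-average-style index: scale it and track its fluctuation
    [0, 1, 0, 0],   # e.g., an oscillator that is only compared against zero
])
```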

Algorithm 2: Price trend prediction model using LSTM

After the principal component extraction, we get the scale-reduced matrix, which means the i most effective features are converted into j principal components for training the prediction model. We utilized an LSTM model and added a conversion procedure for our stock price dataset. The detailed algorithm design is illustrated in Algorithm 2. The function TimeSeriesConversion() converts the principal components matrix into time series by shifting the input data frame according to the number of time steps [3], i.e., the term length in this research. The processed dataset consists of the input sequence and the forecast sequence. In this research, the parameter LAG is 1, because the model detects the pattern of feature fluctuation on a daily basis. Meanwhile, N_TIME_STEPS is varied from 1 trading day to 10 trading days. The functions DataPartition(), FitModel(), and EvaluateModel() are regular steps without customization. The NN structure design, optimizer decision, and other parameters are illustrated in the function ModelCompile().
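The conversion can be sketched as a simple sliding window over the principal-component matrix; the variable names here are ours, and the real TimeSeriesConversion() may differ in detail.

```python
import numpy as np

def time_series_conversion(pcs: np.ndarray, labels: np.ndarray, n_time_steps: int):
    """Turn the (days, j) principal-component matrix into LSTM samples of shape
    (n_time_steps, j), each paired with the next day's trend label (LAG = 1)."""
    X, y = [], []
    for t in range(n_time_steps, len(pcs)):
        X.append(pcs[t - n_time_steps:t])   # the preceding n_time_steps trading days
        y.append(labels[t])                 # the label to forecast
    return np.asarray(X), np.asarray(y)
```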

Some procedures impact the efficiency but do not affect the accuracy or precision, and vice versa, while other procedures may affect both efficiency and prediction result. To fully evaluate our algorithm design, we structure the evaluation part by main procedures and evaluate how each procedure affects the algorithm performance. First, we evaluated our solution on a machine with a 2.2 GHz i7 processor and 16 GB of RAM. Furthermore, we also evaluated our solution on an Amazon EC2 instance with a 3.1 GHz processor, 16 vCPUs, and 64 GB of RAM.

In the implementation part, we expanded 20 features into 54 features, while retaining the 30 most effective features. In this section, we discuss the evaluation of feature selection. The dataset was divided into two different subsets, i.e., training and testing datasets. The test procedure included two parts: one testing dataset for feature selection, and another for model testing. We denote the feature selection dataset and model testing dataset as DS_test_f and DS_test_m, respectively.

We randomly selected two-thirds of the stock data by stock ID for RFE training and denote the dataset as DS_train_f; all the data consist of full technical indices and expanded features throughout 2018. The estimator of the RFE algorithm is SVR with a linear kernel. We rank the 54 features by voting, obtain 30 effective features, then process them using the PCA algorithm to perform dimensionality reduction into 20 principal components. The rest of the stock data forms the testing dataset DS_test_f to validate the effectiveness of the principal components we extracted from the selected features. We reformed all the data from 2018 as the training dataset of the data model and denote it as DS_train_m. The model testing dataset DS_test_m consists of the first 3 months of data in 2019, which has no overlap with the dataset we utilized in the previous steps. This approach prevents hidden problems caused by overfitting.

Term length

To build an efficient prediction model, instead of modeling the data as a time series, we determined to use 1-day-ahead indices data to predict the price trend of the next day. We tested the RFE algorithm over a range of short terms from 1 day to 2 weeks (ten trading days) to evaluate how the commonly used technical indices correlate with price trends. For evaluating the prediction term length, we fully expanded the features as in Table 2 and fed them to RFE. During the test, we found that different term lengths have different levels of sensitivity to the same set of indices.

We get the close price of the first trading date and compare it with the close price of the n-th trading date. Since we are predicting the price trend, we do not consider term lengths whose cross-validation score is below 0.5. After the test, as we can see from Fig. 4, there are three term lengths that are most sensitive to the indices we selected from the related works: n = {2, 5, 10}, which indicates that price trend prediction of every other day, 1 week, and 2 weeks using this indices set is likely to be more reliable.

Fig. 4 How do term lengths affect the cross-validation score of RFE

These curves have different patterns. For the 2-week term length, the cross-validation score increases with the number of features selected. If the prediction term length is 1 week, the cross-validation score decreases once more than 8 features are selected. For every-other-day price trend prediction, the best cross-validation score is achieved by selecting 48 features, although the score merely fluctuates with the number of features selected. Biweekly prediction requires 29 features to achieve its best score. In Table 3, we list the top 15 effective features for these three period lengths. In the next step, we evaluate the RFE result for these three term lengths, as shown in Fig. 4.

We compare the output feature set of RFE against the all-original feature set as a baseline: the all-original feature set consists of n features, and we choose the n most effective features from the RFE output to evaluate the result using a linear SVR. We used two different approaches to evaluate feature effectiveness. The first method combines all the data into one large matrix and evaluates it by running the RFE algorithm once. The other method runs RFE for each individual stock and determines the most effective features by voting.
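The per-stock voting variant can be sketched as below; representing each stock's RFE output as a list of retained feature names is an assumption made for illustration.

```python
from collections import Counter

def vote_features(per_stock_selections, n_keep: int = 30):
    """Each stock contributes one vote per feature it retained after RFE;
    the n_keep most-voted features form the final feature set."""
    votes = Counter()
    for kept in per_stock_selections:        # e.g., ["RSI_5", "SLOWK", ...] for one stock
        votes.update(kept)
    return [feature for feature, _ in votes.most_common(n_keep)]
```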

Feature extension and RFE

From the result of the previous subsection, we can see that when predicting the price trend for every other day or biweekly, the best result is achieved by selecting a large number of features. Within the selected features, some features produced by extension methods rank higher than the original features, which shows that the feature extension method is useful for optimizing the model. Feature extension affects both precision and efficiency; in this part, we only discuss the precision aspect and leave the efficiency aspect to the next step, since PCA is the most effective method for training efficiency optimization in our design. We evaluated how feature extension affects RFE and use the test result to measure the improvement gained by involving feature extension.

We further test the effectiveness of feature extension, i.e., whether polarizing, max–min scaling, and calculating the fluctuation percentage work better than the original technical indices. The best case for this test is the weekly prediction, since it has the fewest effective features selected. From the result of the last section, we know the best cross-validation score appears when selecting 8 features. The test consists of two steps: the first step tests the feature set formed by original features only (in this case, only SLOWK, SLOWD, and RSI_5 are included), and the next step tests the feature set of all 8 features selected in the previous subsection. We carried out the test by defining the simplest DNN model with three layers.
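A sketch of such a minimal three-layer DNN is shown below; the layer widths and training settings are illustrative assumptions, not the exact configuration used in the experiments.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def build_simple_dnn(n_features: int) -> Sequential:
    """The 'simplest' three-layer DNN used only to compare feature sets,
    not the final LSTM prediction model."""
    model = Sequential([
        Dense(32, activation="relu", input_shape=(n_features,)),  # widths are illustrative
        Dense(16, activation="relu"),
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["binary_accuracy"])
    return model
```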

The normalized confusion matrices of testing the two feature sets are illustrated in Fig. 5. The left one is the confusion matrix of the feature set with expanded features, and the right one is the test result of using original features only. The precisions of true positives and true negatives improved by 7% and 10%, respectively, which shows that our feature extension method design is reasonably effective.

Fig. 5 Confusion matrix of validating feature extension effectiveness

Feature reduction using principal component analysis

PCA affects algorithm performance in terms of both prediction accuracy and training efficiency. Since this part should be evaluated together with the NN model, we again use the simplest three-layer DNN model from the previous step to perform the evaluation. This part introduces the evaluation method and the result of the optimization part of the model from the perspectives of computational efficiency and accuracy impact.

In this section, we choose bi-weekly prediction for a use-case analysis, since it has a smoothly increasing cross-validation score curve and, unlike every-other-day prediction, it has already excluded more than 20 ineffective features. In the first step, we select all 29 effective features and train the NN model without performing PCA; this creates a baseline of accuracy and training time for comparison. To evaluate accuracy and efficiency, we vary the number of principal components over 5, 10, 15, 20, and 25. Table 4 records how the number of features affects model training efficiency, and the stacked bar chart in Fig. 6 illustrates how PCA affects training efficiency. Table 6 shows the accuracy and efficiency analysis of different feature pre-processing procedures. The times shown in Tables 4 and 6 are based on experiments conducted on a standard user machine, to show the viability of our solution with limited or average resource availability.

Fig. 6 Relationship between feature number and training time

We also list the confusion matrix of each test in Fig. 7. The stacked bar chart shows that the overall time spent on training the model decreases with the number of selected features, and that the PCA method is significantly effective in optimizing training dataset preparation. For the time spent in the training stage itself, PCA is not as effective as in the data preparation stage, although it is possible that the optimization effect of PCA is not drastic simply because of the simple structure of the NN model.

Fig. 7 How does the number of principal components affect evaluation results

Table 5 indicates that the overall prediction accuracy is not drastically affected by reducing the dimension. However, accuracy alone cannot confirm that PCA has no side effect on model prediction, so we looked into the confusion matrices of the test results.

From Fig. 7 we can conclude that PCA does not have a severe negative impact on prediction precision. The true positive and false positive rates are barely affected, while the false negative and true negative rates are influenced by 2% to 4%. Besides evaluating how the number of selected features affects training efficiency and model performance, we also tested how data pre-processing procedures affect the training procedure and prediction result. Normalizing and max–min scaling are the most common data pre-processing procedures performed before PCA, since the measurement units of features vary, and they are said to increase subsequent training efficiency.

We ran another test that added pre-processing procedures before extracting 20 principal components from the original dataset, and compared the results in terms of training-stage time elapsed and prediction precision. However, the test results lead to different conclusions. From Table 6 we can conclude that feature pre-processing does not have a significant impact on training efficiency, but it does influence the model prediction accuracy. Moreover, the first confusion matrix in Fig. 8 indicates that without any feature pre-processing procedure, the false negative and true negative rates are severely affected, while the true positive and false positive rates are not. If normalization is performed before PCA, both the true positive and true negative rates decrease by approximately 10%. This test also proved that the best feature pre-processing method for our feature set is max–min scaling.

Fig. 8 Confusion matrices of different feature pre-processing methods

In this section, we discuss and compare the results of our proposed model, other approaches, and the most related works.

Comparison with related works

From the previous works, we found that the most commonly exploited models for short-term stock market price trend prediction are the support vector machine (SVM), multilayer perceptron artificial neural network (MLP), Naive Bayes classifier (NB), random forest classifier (RAF), and logistic regression classifier (LR). The test case for comparison is also bi-weekly price trend prediction; to evaluate the best result of all models, we keep all 29 features selected by the RFE algorithm. For the MLP evaluation, to test whether the number of hidden layers affects the metric scores, we denoted the layer number as n and tested n = {1, 3, 5}, with 150 training epochs for all tests. We found only slight differences in model performance, which indicates that the number of MLP layers hardly affects the metric scores.
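For reference, the baseline classifiers can be assembled with scikit-learn as below; the hyperparameters are library defaults or illustrative guesses, not necessarily those used in the reported experiments, and X_train / y_train / X_test / y_test are hypothetical splits.

```python
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Baselines compared against the proposed LSTM on the 29 RFE-selected features.
baselines = {
    "SVM": SVC(),
    "MLP": MLPClassifier(hidden_layer_sizes=(64, 64, 64), max_iter=150),
    "NB":  GaussianNB(),
    "RAF": RandomForestClassifier(),
    "LR":  LogisticRegression(max_iter=1000),
}

# for name, model in baselines.items():
#     model.fit(X_train, y_train)
#     print(name, model.score(X_test, y_test))
```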

From the confusion matrices in Fig. 9, we can see that all the machine learning models perform well when training with the full feature set we selected by RFE. From the perspective of training time, the NB model achieved the best efficiency. The LR algorithm cost less training time than the other algorithms while achieving prediction results similar to costlier models such as SVM and MLP. The RAF algorithm achieved a relatively high true positive rate but performed poorly in predicting negative labels. Our proposed LSTM model achieves a binary accuracy of 93.25%, which is a significantly high precision for predicting the bi-weekly price trend. We also pre-processed the data through PCA to obtain five principal components, then trained for 150 epochs. The learning curve of our proposed solution, based on feature engineering and the LSTM model, is illustrated in Fig. 10. The confusion matrix is the figure on the right in Fig. 11, and detailed metric scores can be found in Table 9.

Fig. 9 Model prediction comparison—confusion matrices

Fig. 10 Learning curve of proposed solution

Fig. 11 Proposed model prediction precision comparison—confusion matrices

The detailed evaluation results are recorded in Table 7. We also discuss the evaluation results in the next section.

Because the result structure of our proposed solution differs from most of the related works, it would be difficult to make a naïve comparison with previous works. For example, it is hard to find the exact accuracy of price trend prediction in most of the related works, since the authors prefer to report the gain rate of simulated investment. The gain rate is a processed number based on simulated investment tests; sometimes one correct investment decision with a large trading volume can achieve a high gain rate regardless of the price trend prediction accuracy. Besides, a unique and heuristic innovation of our proposed solution is that we transform the problem of predicting an exact price into two sequential problems, i.e., predicting the price trend first and focusing on building an accurate binary classification model, which constructs a solid foundation for predicting the exact price change in future works. Besides the different result structure, the datasets that previous works researched are also different from ours. Some of the previous works involve news data to perform sentiment analysis and exploit the SE part as another system component to support their prediction model.

The latest related work we can compare with is Zubair et al. [47]; the authors use multiple R-squared for model accuracy measurement. Multiple R-squared, also called the coefficient of determination, shows the strength of predictor variables in explaining the variation in stock return [28]. They used three datasets (KSE 100 Index, Lucky Cement Stock, Engro Fertilizer Limited) to evaluate the proposed multiple regression model and achieved 95%, 89%, and 97%, respectively. Since, except for the KSE 100 Index, the datasets in this related work are individual stocks, we choose the evaluation result of the first dataset of their proposed model.

We list the performance of the leading stock price trend prediction models in Table 8; on the comparable metrics, the metric scores of our proposed solution are generally better than those of other related works. Instead of concluding arbitrarily that our proposed model outperformed the models in related works, we first look into the dataset column of Table 8. Looking at the dataset used by each work, [18] only trained and tested their proposed solution on three individual stocks, which makes it difficult to prove the generalization of their proposed model. Ayo [2] analyzed stock data from the New York Stock Exchange (NYSE), but the weakness is that they only performed analysis on the closing price, which is a feature embedded with high noise. Zubair et al. [47] trained their proposed model on both individual stocks and index prices, but as we mentioned in the previous section, the index price only covers a limited number of features and stock IDs, which further affects the model training quality. For our proposed solution, we collected sufficient data from the Chinese stock market and applied the FE + RFE algorithm on the original indices to obtain more effective features; the comprehensive evaluation across 3558 stock IDs reasonably demonstrates the generalization and effectiveness of our proposed solution on the Chinese stock market. However, the authors of Khaidem and Dey [18] and Ayo [2] analyzed the United States stock market, Zubair et al. [47] analyzed Pakistani stock market prices, and we obtained our dataset from the Chinese stock market; the policies of different countries might impact model performance, which needs further research to validate.

Proposed model evaluation—PCA effectiveness

Besides comparing the performance across popular machine learning models, we also evaluated how the PCA algorithm optimizes the training procedure of the proposed LSTM model. We recorded the comparison of confusion matrices between training the model with 29 features and with five principal components in Fig. 11. The model training using the full 29 features takes 28.5 s per epoch on average, while it only takes 18 s per epoch on average when training on the feature set of five principal components; PCA thus improved the training efficiency of the LSTM model by roughly 36.8% ((28.5 − 18)/28.5). The detailed metrics data are listed in Table 9. We discuss the complexity analysis in the next section.

Complexity analysis of proposed solution

This section analyzes the complexity of our proposed solution. The long short-term memory is different from other NNs: it is a variant of the standard RNN that also has time steps with a memory and gate architecture. In previous work [46], the authors analyzed RNN architecture complexity; they introduced a method that regards an RNN as a directed acyclic graph and proposed the concept of recurrent depth, which helps analyze the intricacy of an RNN.

The recurrent depth is a positive rational number, and we denote it as \(d_{rc}\). As \(n\) grows, \(d_{rc}\) measures the average maximum number of nonlinear transformations per time step. We then unfold the directed acyclic graph of the RNN and denote the processed graph as \(g_{c}\); meanwhile, we denote \(C(g_{c})\) as the set of directed cycles in this graph. For a cycle \(v\), we denote \(\sigma_{s}(v)\) as the sum of its edge weights and \(l(v)\) as its length. The following characterization is proved under a mild assumption, which can be found in [46]:

$$d_{rc} = \max_{v \in C(g_{c})} \frac{\sigma_{s}(v)}{l(v)}$$

They also found another crucial factor that impacts the performance of LSTMs: the recurrent skip coefficient. We denote \(s_{rc}\) as the reciprocal of the recurrent skip coefficient; note that \(s_{rc}\) is also a positive rational number.

According to the above definition, our proposed model is a 2-layer stacked LSTM with \(d_{rc} = 2\) and \(s_{rc} = 1\). From the experiments performed in previous work, the authors also found that when facing long-term dependency problems, LSTMs may benefit from decreasing the reciprocal of the recurrent skip coefficient and from increasing the recurrent depth. The empirical findings mentioned above are useful for further enhancing the performance of our proposed model.

This work consists of three parts: data extraction and pre-processing of the Chinese stock market dataset, feature engineering, and a stock price trend prediction model based on long short-term memory (LSTM). We collected, cleaned up, and structured 2 years of Chinese stock market data. We reviewed different techniques often used by real-world investors and developed a new algorithm component, named feature extension, which proved to be effective. We applied the feature extension (FE) approaches with recursive feature elimination (RFE), followed by principal component analysis (PCA), to build a feature engineering procedure that is both effective and efficient. The system is customized by assembling the feature engineering procedure with an LSTM prediction model, achieving high prediction accuracy that outperforms the leading models in most related works. We also carried out a comprehensive evaluation of this work. By comparing the most frequently used machine learning models with our proposed LSTM model under the feature engineering part of our proposed system, we drew many heuristic findings that could become future research questions in both the technical and financial research domains.

Our proposed solution is a unique customization compared to the previous works because, rather than just proposing yet another state-of-the-art LSTM model, we proposed a fine-tuned and customized deep learning prediction system that combines comprehensive feature engineering with LSTM to perform prediction. By examining the observations from previous works, we fill the gap between investors and researchers with a feature extension algorithm applied before recursive feature elimination, and obtain a noticeable improvement in model performance.

Though we have achieved a decent outcome from our proposed solution, this research has further potential. During the evaluation procedure, we also found that the RFE algorithm is not sensitive to term lengths other than 2-day, weekly, and biweekly. More in-depth research into which technical indices would influence the irregular term lengths is a possible future research direction. Moreover, by combining the latest sentiment analysis techniques with feature engineering and the deep learning model, there is also high potential to develop a more comprehensive prediction system trained on diverse types of information such as tweets, news, and other text-based data.

Abbreviations

LSTM: Long short-term memory
PCA: Principal component analysis
RNN: Recurrent neural network
ANN: Artificial neural network
DNN: Deep neural network
DTW: Dynamic time warping
RFE: Recursive feature elimination
SVM: Support vector machine
CNN: Convolutional neural network
SGD: Stochastic gradient descent
ReLU: Rectified linear unit
MLP: Multilayer perceptron

Atsalakis GS, Valavanis KP. Forecasting stock market short-term trends using a neuro-fuzzy based methodology. Expert Syst Appl. 2009;36(7):10696–707.


Ayo CK. Stock price prediction using the ARIMA model. In: 2014 UKSim-AMSS 16th international conference on computer modelling and simulation. 2014. https://doi.org/10.1109/UKSim.2014.67 .

Brownlee J. Deep learning for time series forecasting: predict the future with MLPs, CNNs and LSTMs in Python. Machine Learning Mastery. 2018. https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/

Eapen J, Bein D, Verma A. Novel deep learning model with CNN and bi-directional LSTM for improved stock market index prediction. In: 2019 IEEE 9th annual computing and communication workshop and conference (CCWC). 2019. pp. 264–70. https://doi.org/10.1109/CCWC.2019.8666592 .

Fischer T, Krauss C. Deep learning with long short-term memory networks for financial market predictions. Eur J Oper Res. 2018;270(2):654–69. https://doi.org/10.1016/j.ejor.2017.11.054 .


Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Mach Learn 2002;46:389–422.

Hafezi R, Shahrabi J, Hadavandi E. A bat-neural network multi-agent system (BNNMAS) for stock price prediction: case study of DAX stock price. Appl Soft Comput J. 2015;29:196–210. https://doi.org/10.1016/j.asoc.2014.12.028 .

Halko N, Martinsson PG, Tropp JA. Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 2011;53(2):217–88.


Hassan MR, Nath B. Stock market forecasting using Hidden Markov Model: a new approach. In: Proceedings—5th international conference on intelligent systems design and applications 2005, ISDA’05. 2005. pp. 192–6. https://doi.org/10.1109/ISDA.2005.85 .

Hochreiter S, Schmidhuber J. Long short-term memory. J Neural Comput. 1997;9(8):1735–80. https://doi.org/10.1162/neco.1997.9.8.1735 .

Hsu CM. A hybrid procedure with feature selection for resolving stock/futures price forecasting problems. Neural Comput Appl. 2013;22(3–4):651–71. https://doi.org/10.1007/s00521-011-0721-4 .

Huang CF, Chang BR, Cheng DW, Chang CH. Feature selection and parameter optimization of a fuzzy-based stock selection model using genetic algorithms. Int J Fuzzy Syst. 2012;14(1):65–75. https://doi.org/10.1016/J.POLYMER.2016.08.021 .

Huang CL, Tsai CY. A hybrid SOFM-SVR with a filter-based feature selection for stock market forecasting. Expert Syst Appl. 2009;36(2 PART 1):1529–39. https://doi.org/10.1016/j.eswa.2007.11.062 .

Idrees SM, Alam MA, Agarwal P. A prediction approach for stock market volatility based on time series data. IEEE Access. 2019;7:17287–98. https://doi.org/10.1109/ACCESS.2019.2895252 .

Ince H, Trafalis TB. Short term forecasting with support vector machines and application to stock price prediction. Int J Gen Syst. 2008;37:677–87. https://doi.org/10.1080/03081070601068595 .

Jeon S, Hong B, Chang V. Pattern graph tracking-based stock price prediction using big data. Future Gener Comput Syst. 2018;80:171–87. https://doi.org/10.1016/j.future.2017.02.010 .

Kara Y, Acar Boyacioglu M, Baykan ÖK. Predicting direction of stock price index movement using artificial neural networks and support vector machines: the sample of the Istanbul Stock Exchange. Expert Syst Appl. 2011;38(5):5311–9. https://doi.org/10.1016/j.eswa.2010.10.027 .

Khaidem L, Dey SR. Predicting the direction of stock market prices using random forest. 2016. pp. 1–20.

Kim K, Han I. Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of stock price index. Expert Syst Appl. 2000;19:125–32. https://doi.org/10.1016/S0957-4174(00)00027-0 .

Lee MC. Using support vector machine with a hybrid feature selection method to the stock trend prediction. Expert Syst Appl. 2009;36(8):10896–904. https://doi.org/10.1016/j.eswa.2009.02.038 .

Lei L. Wavelet neural network prediction method of stock price trend based on rough set attribute reduction. Appl Soft Comput J. 2018;62:923–32. https://doi.org/10.1016/j.asoc.2017.09.029 .

Lin X, Yang Z, Song Y. Expert systems with applications short-term stock price prediction based on echo state networks. Expert Syst Appl. 2009;36(3):7313–7. https://doi.org/10.1016/j.eswa.2008.09.049 .

Liu G, Wang X. A new metric for individual stock trend prediction. Eng Appl Artif Intell. 2019;82(March):1–12. https://doi.org/10.1016/j.engappai.2019.03.019 .

Liu S, Zhang C, Ma J. CNN-LSTM neural network model for quantitative strategy analysis in stock markets. 2017;1:198–206. https://doi.org/10.1007/978-3-319-70096-0 .

Long W, Lu Z, Cui L. Deep learning-based feature engineering for stock price movement prediction. Knowl Based Syst. 2018;164:163–73. https://doi.org/10.1016/j.knosys.2018.10.034 .

Malkiel BG, Fama EF. Efficient capital markets: a review of theory and empirical work. J Finance. 1970;25(2):383–417.

McNally S, Roche J, Caton S. Predicting the price of bitcoin using machine learning. In: Proceedings—26th Euromicro international conference on parallel, distributed, and network-based processing, PDP 2018. pp. 339–43. https://doi.org/10.1109/PDP2018.2018.00060 .

Nagar A, Hahsler M. News sentiment analysis using R to predict stock market trends. 2012. http://past.rinfinance.com/agenda/2012/talk/Nagar+Hahsler.pdf . Accessed 20 July 2019.

Nekoeiqachkanloo H, Ghojogh B, Pasand AS, Crowley M. Artificial counselor system for stock investment. 2019. ArXiv Preprint arXiv:1903.00955 .

Ni LP, Ni ZW, Gao YZ. Stock trend prediction based on fractal feature selection and support vector machine. Expert Syst Appl. 2011;38(5):5569–76. https://doi.org/10.1016/j.eswa.2010.10.079 .

Pang X, Zhou Y, Wang P, Lin W, Chang V. An innovative neural network approach for stock market prediction. J Supercomput. 2018. https://doi.org/10.1007/s11227-017-2228-y .

Pimenta A, Nametala CAL, Guimarães FG, Carrano EG. An automated investing method for stock market based on multiobjective genetic programming. Comput Econ. 2018;52(1):125–44. https://doi.org/10.1007/s10614-017-9665-9 .

Piramuthu S. Evaluating feature selection methods for learning in data mining applications. Eur J Oper Res. 2004;156(2):483–94. https://doi.org/10.1016/S0377-2217(02)00911-6 .

Qiu M, Song Y. Predicting the direction of stock market index movement using an optimized artificial neural network model. PLoS ONE. 2016;11(5):e0155133.

Scikit-learn. Scikit-learn Min-Max Scaler. 2019. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html . Retrieved 26 July 2020.

Shen J. Thesis, “Short-term stock market price trend prediction using a customized deep learning system”, supervised by M. Omair Shafiq, Carleton University. 2019.

Shen J, Shafiq MO. Deep learning convolutional neural networks with dropout—a parallel approach. ICMLA. 2018;2018:572–7.


Shen J, Shafiq MO. Learning mobile application usage—a deep learning approach. ICMLA. 2019;2019:287–92.

Shih D. A study of early warning system in volume burst risk assessment of stock with Big Data platform. In: 2019 IEEE 4th international conference on cloud computing and big data analysis (ICCCBDA). 2019. pp. 244–8.

Sirignano J, Cont R. Universal features of price formation in financial markets: perspectives from deep learning. Ssrn. 2018. https://doi.org/10.2139/ssrn.3141294 .


Thakur M, Kumar D. A hybrid financial trading support system using multi-category classifiers and random forest. Appl Soft Comput J. 2018;67:337–49. https://doi.org/10.1016/j.asoc.2018.03.006 .

Tsai CF, Hsiao YC. Combining multiple feature selection methods for stock prediction: union, intersection, and multi-intersection approaches. Decis Support Syst. 2010;50(1):258–69. https://doi.org/10.1016/j.dss.2010.08.028 .

Tushare API. 2018. https://github.com/waditu/tushare . Accessed 1 July 2019.

Wang X, Lin W. Stock market prediction using neural networks: does trading volume help in short-term prediction?. n.d.

Weng B, Lu L, Wang X, Megahed FM, Martinez W. Predicting short-term stock prices using ensemble methods and online data sources. Expert Syst Appl. 2018;112:258–73. https://doi.org/10.1016/j.eswa.2018.06.016 .

Zhang S. Architectural complexity measures of recurrent neural networks, (NIPS). 2016. pp. 1–9.

Zubair M, Fazal A, Fazal R, Kundi M. Development of stock market trend prediction system using multiple regression. Computational and mathematical organization theory. Berlin: Springer US; 2019. https://doi.org/10.1007/s10588-019-09292-7 .



Acknowledgements

This research is supported by Carleton University, in Ottawa, ON, Canada. This research paper has been built based on the thesis [ 36 ] of Jingyi Shen, supervised by M. Omair Shafiq at Carleton University, Canada, available at https://curve.carleton.ca/52e9187a-7f71-48ce-bdfe-e3f6a420e31a .

Funding

NSERC and Carleton University.

Author information

Authors and affiliations.

School of Information Technology, Carleton University, Ottawa, ON, Canada

Jingyi Shen & M. Omair Shafiq


Contributions

All authors read and approved the final manuscript.

Corresponding author

Correspondence to M. Omair Shafiq .

Ethics declarations

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .



Cite this article.

Shen, J., Shafiq, M.O. Short-term stock market price trend prediction using a comprehensive deep learning system. J Big Data 7 , 66 (2020). https://doi.org/10.1186/s40537-020-00333-6

Download citation

Received : 24 January 2020

Accepted : 30 July 2020

Published : 28 August 2020

DOI : https://doi.org/10.1186/s40537-020-00333-6


Keywords: Deep learning, Stock market trend, Feature engineering



What Is Tax-Exempt Commercial Paper?

  • How It Works

The Bottom Line


Tax-Exempt Commercial Paper: What It Is, How It Works






Key Takeaways

  • Tax-exempt commercial paper is short-term unsecured debt where the bondholder does not pay federal, state, or local taxes on the interest payments.
  • Tax-exempt commercial paper is issued with a fixed interest rate, has a maturity date of fewer than 270 days, and is commonly denominated in increments of $1,000.
  • Interest rates on tax-exempt commercial paper are typically higher than those on other short-term cash instruments but lower than those on taxable debt.

Understanding Tax-Exempt Commercial Paper

Tax-exempt commercial paper is usually issued to finance short-term liabilities, which provides the debt holders (bondholders) with some level of tax preference on their debt investment earnings. Tax-exempt commercial paper is issued with a fixed interest rate , has a maturity date of fewer than 270 days, and is commonly denominated in increments of $1,000.

Commercial paper is mostly a promissory note backed by the issuing financial institution's health. Federal government policy does not cover losses incurred from investing in commercial paper. Furthermore, the Federal Deposit Insurance Corporation (FDIC) does not insure against losses from investing in tax-exempt commercial paper. An investor's due diligence should include checking the desired tax-exempt commercial paper's quality ratings listed by agencies such as Standard & Poor's or Moody's.

Because of default risk and the short time to maturity, interest rates on tax-exempt commercial paper are typically higher than those on other short-term cash instruments; at the same time, the tax exemption means they are lower than rates on comparable taxable debt. Tax-exempt commercial paper rates also tend to rise as the economy grows.

Granting tax-exempt status is an indirect method of government support for the issuing entities, as opposed to funding them directly. The government forgoes the collection of taxes on the interest income, on the logic that the issuing entity will engage in activities that serve the community and generate more value than the lost tax revenue. Tax-exempt commercial paper can thus be viewed as an instrument of public policy.

In practice, only issuers with an investment-grade rating can issue commercial paper. Institutions such as universities and governments typically issue tax-exempt commercial paper, while banks, mutual funds, and brokerage firms buy it. The buyers may hold the paper as an investment or act as intermediaries and resell it to their customers. The market for tax-exempt commercial paper issued directly to smaller investors is limited. Following the 2008 financial crisis, new regulation limits the type and amount of commercial paper that money market funds may hold.

The Federal Reserve Board (FRB) publishes current borrowing rates on commercial paper on its website. The FRB also publishes the rates of highly rated commercial paper in a statistical release occurring each Friday. Information relating to the total amount of outstanding paper issued is also released once per week.

[Chart: 3-month Fed rate for AA financial commercial paper, as of July 2023.]

Tax-Exempt Commercial Paper Benefits

Tax-exempt commercial paper benefits the borrower (issuer), which can access funds at lower rates than it might otherwise pay to borrow from a traditional financial institution, such as a bank. It can also benefit the lender (bond buyer), whose net, after-tax rate of return may end up higher than if it had invested in taxable commercial paper.
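
As a rough illustration of that after-tax comparison, the sketch below (in Python) compares a hypothetical taxable commercial paper yield with a hypothetical tax-exempt yield; the 5.0% and 3.8% yields and the 35% marginal tax rate are assumptions chosen for illustration, not quoted market rates.

```python
# Illustrative only: the yields and tax bracket below are assumed, not market data.

def after_tax_yield(taxable_yield: float, marginal_tax_rate: float) -> float:
    """Yield a taxable instrument delivers after taxes."""
    return taxable_yield * (1.0 - marginal_tax_rate)

def tax_equivalent_yield(tax_exempt_yield: float, marginal_tax_rate: float) -> float:
    """Taxable yield needed to match a tax-exempt yield after taxes."""
    return tax_exempt_yield / (1.0 - marginal_tax_rate)

taxable = 0.050      # hypothetical taxable commercial paper yield
tax_exempt = 0.038   # hypothetical tax-exempt commercial paper yield
bracket = 0.35       # assumed marginal tax rate

print(f"Taxable paper after tax: {after_tax_yield(taxable, bracket):.3%}")
print(f"Tax-exempt paper:        {tax_exempt:.3%}")
print(f"Tax-equivalent yield:    {tax_equivalent_yield(tax_exempt, bracket):.3%}")
```

For this hypothetical investor, the 3.8% tax-exempt yield beats the 5.0% taxable yield, which is worth only 3.25% after tax; the tax-equivalent yield of the exempt paper is roughly 5.85%.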

Why Do Municipalities Issue Commercial Paper?

Municipalities and local governments may issue tax-exempt commercial paper as a way to meet short-term financial obligations, such as payroll or government expenses. They may also issue commercial paper as a way to meet expenses while pursuing longer-term capital raises.

Where Do You Buy Tax-Exempt Commercial Paper?

Most commercial paper is sold in very large increments that are not available to the average retail investor. However, you can gain exposure to the commercial paper market by investing in a mutual fund or money market fund that invests in tax-exempt commercial paper.

Who Can Issue Tax-Exempt Commercial Paper?

Only governments and affiliated bodies can issue tax-exempt commercial paper. The rules for issuance are determined by the tax code of the issuing state.

The Bottom Line

Tax-exempt commercial paper refers to short-term securities whose interest is exempt from certain state or local income taxes. It is frequently used by local and municipal governments as a way to finance their short-term debt obligations. Due to the associated risks, interest rates on tax-exempt commercial paper are typically higher than those on other short-term cash instruments.

Board of Governors of the Federal Reserve System. "Commercial Paper Rates and Outstanding Summary: About Commercial Paper."

Federal Deposit Insurance Corporation. "Financial Products That Are Not Insured by the FDIC."

Washington State, Office of the Attorney General. "Ability of State and Local Governments to Invest in Commercial Paper: AGO 1993 No. 8 - Apr 27 1993."

U.S. Securities and Exchange Commission. "Testimony on 'Perspectives on Money Market Mutual Fund Reforms.'"

Board of Governors of the Federal Reserve System. "Commercial Paper Rates and Outstanding Summary: Commercial Paper Rates."

Board of Governors of the Federal Reserve System. "Commercial Paper Rates and Outstanding Summary: Volume Statistics for Commercial Paper Issuance."

Federal Reserve Board. "Commercial Paper Rates and Outstanding Summary."


Time Series Forecasting

505 papers with code • 71 benchmarks • 31 datasets

Time Series Forecasting is the task of fitting a model to historical, time-stamped data in order to predict future values. Traditional approaches include moving average, exponential smoothing, and ARIMA, though models as varied as RNNs, Transformers, and XGBoost can also be applied. The most popular benchmark is the ETTh1 dataset. Models are typically evaluated using the Mean Square Error (MSE) or Root Mean Square Error (RMSE).
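
The classical baselines named above fit in a few lines. The sketch below is a rough illustration, with a toy series and an assumed window size and smoothing factor: it produces a moving-average and a simple-exponential-smoothing forecast and scores both with MSE and RMSE.

```python
# Minimal forecasting baselines: moving average and simple exponential smoothing,
# scored with MSE / RMSE. The toy series, window, and alpha are illustrative assumptions.
import numpy as np

series = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118], dtype=float)
train, test = series[:-3], series[-3:]

# Moving-average forecast: repeat the mean of the last `window` observations.
window = 4
ma_forecast = np.full(len(test), train[-window:].mean())

# Simple exponential smoothing: level_t = alpha * y_t + (1 - alpha) * level_{t-1}.
alpha = 0.5
level = train[0]
for y in train[1:]:
    level = alpha * y + (1 - alpha) * level
ses_forecast = np.full(len(test), level)

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

for name, pred in [("moving average", ma_forecast), ("exp. smoothing", ses_forecast)]:
    err = mse(test, pred)
    print(f"{name:15s}  MSE = {err:7.2f}  RMSE = {err ** 0.5:5.2f}")
```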


Benchmarks

[Benchmark leaderboard omitted. Best-performing models across the listed datasets include D-PAD, SegRNN, PatchMixer, PatchTST/64, AutoCon, GBRT, MoLE-RMLP/RLinear/DLinear, SCINet, TSMixer, STGCN-Cov, Informer, SCNN, AA-Forecast, and GA-LSTM.]

Most implemented papers

Sequence to Sequence Learning with Neural Networks

Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
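
A minimal sketch of that encoder-decoder pattern, assuming PyTorch and illustrative layer sizes (not the paper's exact configuration), is shown below: one LSTM compresses the source sequence into a fixed-size hidden state, and a second LSTM decodes the target sequence conditioned on that state.

```python
# Sketch of a sequence-to-sequence model: an encoder LSTM maps the input sequence
# to a fixed-size hidden state, and a decoder LSTM generates the target sequence
# conditioned on that state. Vocabulary and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=1000, tgt_vocab=1000, emb=64, hidden=128, layers=2):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, num_layers=layers, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        # Encode: only the final (hidden, cell) state is kept -- the fixed-size vector.
        _, state = self.encoder(self.src_emb(src))
        # Decode with teacher forcing, conditioning every step on the encoder state.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)              # (batch, tgt_len, tgt_vocab) logits

model = Seq2Seq()
src = torch.randint(0, 1000, (8, 12))         # batch of 8 source sequences, length 12
tgt = torch.randint(0, 1000, (8, 10))         # shifted target sequences, length 10
print(model(src, tgt).shape)                  # torch.Size([8, 10, 1000])
```

The original paper also reverses the order of the source sequence before encoding, which was reported to ease optimization; the sketch omits that detail for brevity.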

Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting

Multi-horizon forecasting problems often contain a complex mix of inputs -- including static (i.e., time-invariant) covariates, known future inputs, and other exogenous time series that are only observed historically -- without any prior information on how they interact with the target.

Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks

laiguokun/multivariate-time-series-data • 21 Mar 2017

Multivariate time series forecasting is an important machine learning problem across many domains, including predictions of solar plant energy output, electricity consumption, and traffic jam situation.

DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks

Probabilistic forecasting, i.e., estimating the probability distribution of a time series' future given its past, is a key enabler for optimizing business processes.
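
A hedged sketch of the general idea behind such probabilistic forecasters (not the DeepAR architecture itself): a recurrent network whose output head parameterizes a Gaussian over the next value, trained by minimizing negative log-likelihood. The layer sizes and the toy data below are assumptions for illustration.

```python
# Sketch of probabilistic forecasting: an LSTM predicts the parameters of a Gaussian
# over the next value and is trained with negative log-likelihood.
# Illustrative only; sizes and data are assumptions.
import torch
import torch.nn as nn

class ProbForecaster(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.mu = nn.Linear(hidden, 1)
        self.sigma = nn.Linear(hidden, 1)

    def forward(self, x):                      # x: (batch, time, 1)
        h, _ = self.rnn(x)
        mu = self.mu(h)
        sigma = torch.nn.functional.softplus(self.sigma(h)) + 1e-3   # keep sigma > 0
        return mu, sigma

torch.manual_seed(0)
series = torch.randn(16, 20, 1)                # toy batch of univariate series
x, y = series[:, :-1, :], series[:, 1:, :]     # predict the next value at each step

model = ProbForecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for _ in range(200):
    mu, sigma = model(x)
    nll = -torch.distributions.Normal(mu, sigma).log_prob(y).mean()
    opt.zero_grad()
    nll.backward()
    opt.step()

mu, sigma = model(x)
print("one-step-ahead mean / std for series 0:", float(mu[0, -1]), float(sigma[0, -1]))
```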

N-BEATS: Neural basis expansion analysis for interpretable time series forecasting

We focus on solving the univariate time series point forecasting problem using deep learning.

Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting

Spatiotemporal forecasting has various applications in the neuroscience, climate, and transportation domains.

AA-Forecast: Anomaly-Aware Forecast for Extreme Events

Moreover, the framework employs a dynamic uncertainty optimization algorithm that reduces the uncertainty of forecasts in an online manner.

iTransformer: Inverted Transformers Are Effective for Time Series Forecasting

These forecasters leverage Transformers to model the global dependencies over temporal tokens of time series, with each token formed by multiple variates of the same timestamp.

Are Transformers Effective for Time Series Forecasting?

Recently, there has been a surge of Transformer-based solutions for the long-term time series forecasting (LTSF) task.

GluonTS: Probabilistic Time Series Models in Python

We introduce Gluon Time Series (GluonTS, available at https://gluon-ts.mxnet.io), a library for deep-learning-based time series modeling.


The Mind and Brain of Short-Term Memory

The past 10 years have brought near-revolutionary changes in psychological theories about short-term memory, with similarly great advances in the neurosciences. Here, we critically examine the major psychological theories (the “mind”) of short-term memory and how they relate to evidence about underlying brain mechanisms. We focus on three features that must be addressed by any satisfactory theory of short-term memory. First, we examine the evidence for the architecture of short-term memory, with special attention to questions of capacity and how—or whether—short-term memory can be separated from long-term memory. Second, we ask how the components of that architecture enact processes of encoding, maintenance, and retrieval. Third, we describe the debate over the cause of forgetting from short-term memory: whether interference or decay is responsible. We close with a conceptual model tracing the representation of a single item through a short-term memory task, describing the biological mechanisms that might support psychological processes on a moment-by-moment basis as an item is encoded, maintained over a delay with some forgetting, and ultimately retrieved.

INTRODUCTION

Mentally add 324 and 468. Follow the instructions to complete any form for your federal income taxes. Read and comprehend this sentence.

What are the features of the memory system that allows us to complete these and other complex tasks? Consider the opening example. First, you must create a temporary representation in memory for the two numbers. This representation needs to survive for several seconds to complete the task. You must then allocate your attention to different portions of the representation so that you can apply the rules of arithmetic required by the task. By one strategy, you need to focus attention on the “tens” digits (“2” and “6”) and mitigate interference from the other digits (e.g., “3” and “4”) and from partial results of previous operations (e.g., the “12” that results from adding “4” and “8”). While attending to local portions of the problem, you must also keep accessible the parts of the problem that are not in the current focus of attention (e.g., that you now have the units digit “2” as a portion of the final answer). These tasks implicate a short-term memory (STM). In fact, there is hardly a task that can be completed without the involvement of STM, making it a critical component of cognition.

Our review relates the psychological phenomena of STM to their underlying neural mechanisms. The review is motivated by three questions that any adequate account of STM must address:

1. What is its structure?

A proper theory must describe an architecture for short-term storage. Candidate components of this architecture include storage buffers, a moving and varying focus of attention, or traces with differing levels of activation. In all cases, it is essential to provide a mechanism that allows a representation to exist beyond the sensory stimulation that caused it or the process that retrieved the representation from long-term memory (LTM). This architecture should be clear about its psychological constructs. Furthermore, being clear about the neural mechanisms that implement those constructs will aid in development of psychological theory, as we illustrate below.

2. What processes operate on the stored information?

A proper theory must articulate the processes that create and operate on representations. Candidate processes include encoding and maintenance operations, rehearsal, shifts of attention from one part of the representation to another, and retrieval mechanisms. Some of these processes are often classified as executive functions.

3. What causes forgetting?

A complete theory of STM must account for the facts of forgetting. Traditionally, the two leading contending accounts of forgetting have relied on the concepts of decay and interference. We review the behavioral and neurophysiological evidence that has traditionally been brought to the table to distinguish decay and interference accounts, and we suggest a possible mechanism for short-term forgetting.

Most models of STM fall between two extremes: Multistore models view STM and LTM as architecturally separate systems that rely on distinct representations. By contrast, according to unitary-store models, STM and LTM rely largely on the same representations, but differ by ( a ) the level of activation of these representations and ( b ) some of the processes that normally act upon them. We focus on the distinctions drawn by these theories as we examine the evidence concerning the three questions that motivate our review. In this discussion, we assume that a representation in memory consists of a bundle of features that define a memorandum, including the context in which that memorandum was encountered.

WHAT IS THE STRUCTURE OF SHORT-TERM MEMORY?

Multistore models that differentiate short- and long-term memory.

In his Principles of Psychology , William James (1890) articulated the view that short-term (“primary”) memory is qualitatively different from long-term (“secondary”) memory (see also Hebb 1949 ). The most influential successor to this view is the model of STM developed by Baddeley and colleagues (e.g., Baddeley 1986 , 1992 ; Baddeley & Hitch 1974 ; Repov & Baddeley 2006 ). For the years 1980 to 2006, of the 16,154 papers that cited “working memory” in their titles or abstracts, fully 7339 included citations to Alan Baddeley.

According to Baddeley’s model, there are separate buffers for different forms of information. These buffers, in turn, are separate from LTM. A verbal buffer, the phonological loop, is assumed to hold information that can be rehearsed verbally (e.g., letters, digits). A visuospatial sketchpad is assumed to maintain visual information and can be further fractionated into visual/object and spatial stores ( Repov & Baddeley 2006 , Smith et al. 1995 ). An episodic buffer that draws on the other buffers and LTM has been added to account for the retention of multimodal information ( Baddeley 2000 ). In addition to the storage buffers described above, a central executive is proposed to organize the interplay between the various buffers and LTM and is implicated in controlled processing.

In short, the multistore model includes several distinctions: ( a ) STM is distinct from LTM, ( b ) STM can be stratified into different informational buffers based on information type, and ( c ) storage and executive processes are distinguishable. Evidence in support of these claims has relied on behavioral interference studies, neuropsychological studies, and neuroimaging data.

Evidence for the distinction between short- and long-term memory

Studies of brain-injured patients who show a deficit in STM but not LTM or vice versa lead to the implication that STM and LTM are separate systems. 1 Patients with parietal and temporal lobe damage show impaired short-term phonological capabilities but intact LTM ( Shallice & Warrington 1970 , Vallar & Papagno 2002 ). Conversely, it is often claimed that patients with medial temporal lobe (MTL) damage demonstrate impaired LTM but preserved STM (e.g., Baddeley & Warrington 1970 , Scoville & Milner 1957 ; we reinterpret these effects below).

Neuroimaging data from healthy subjects have yielded mixed results, however. A meta-analysis comparing regions activated during verbal LTM and STM tasks indicated a great deal of overlap in neural activation for the tasks in the frontal and parietal lobes ( Cabeza et al. 2002 , Cabeza & Nyberg 2000 ). Three studies that directly compared LTM and STM in the same subjects did reveal some regions selective for each memory system ( Braver et al. 2001 , Cabeza et al. 2002 , Talmi et al. 2005 ). Yet, of these studies, only one found that the MTL was uniquely activated for LTM ( Talmi et al. 2005 ). What might account for the discrepancy between the neuropsychological and neuroimaging data?

One possibility is that neuroimaging tasks of STM often use longer retention intervals than those employed for neuropsychological tasks, making the STM tasks more similar to LTM tasks. In fact, several studies have shown that the MTL is important when retention intervals are longer than a few seconds ( Buffalo et al. 1998 , Cabeza et al. 2002 , Holdstock et al. 1995 , Owen et al. 1995 ). Of the studies that compared STM and LTM in the same subjects, only Talmi et al. (2005) used an STM retention interval shorter than five seconds. This study did find, in fact, that the MTL was uniquely recruited at longer retention intervals, providing support for the earlier neuropsychological work dissociating long- and short-term memory. As we elaborate below, however, there are other possible interpretations, especially with regard to the MTL’s role in memory.

Evidence for separate buffers in short-term memory

The idea that STM can be parceled into information-specific buffers first received support from a series of studies of selective interference (e.g., Brooks 1968 , den Heyer & Barrett 1971 ). These studies relied on the logic that if two tasks use the same processing mechanisms, they should show interfering effects on one another if performed concurrently. This work showed a double dissociation: Verbal tasks interfered with verbal STM but not visual STM, and visual tasks interfered with visual STM but not verbal STM, lending support to the idea of separable memory systems (for reviews, see Baddeley 1986 and Baddeley & Hitch 1974 ).

The advent of neuroimaging has allowed researchers to investigate the neural correlates of the reputed separability of STM buffers. Verbal STM has been shown to rely primarily on left inferior frontal and left parietal cortices, spatial STM on right posterior dorsal frontal and right parietal cortices, and object/visual STM on left inferior frontal, left parietal, and left inferior temporal cortices (e.g., Awh et al. 1996 , Jonides et al. 1993 , Smith & Jonides 1997 ; see review by Wager & Smith 2003 ). Verbal STM shows a marked left hemisphere preference, whereas spatial and object STM can be distinguished mainly by a dorsal versus ventral separation in posterior cortices (consistent with Ungerleider & Haxby 1994 ; see Baddeley 2003 for an account of the function of these regions in the service of STM).

The more recently postulated episodic buffer arose from the need to account for interactions between STM buffers and LTM. For example, the number of words recalled in an STM experiment can be greatly increased if the words form a sentence ( Baddeley et al. 1987 ). This “chunking” together of words to increase short-term capacity relies on additional information from LTM that can be used to integrate the words ( Baddeley 2000 ). Thus, there must be some representational space that allows for the integration of information stored in the phonological loop and LTM. This ability to integrate information from STM and LTM is relatively preserved even when one of these memory systems is damaged ( Baddeley & Wilson 2002 , Baddeley et al. 1987 ). These data provide support for an episodic buffer that is separable from other short-term buffers and from LTM ( Baddeley 2000 , Baddeley & Wilson 2002 ). Although neural evidence about the possible localization of this buffer is thin, there is some suggestion that dorsolateral prefrontal cortex plays a role ( Prabhakaran et al. 2000 , Zhang et al. 2004 ).

Evidence for separate storage and executive processes

Baddeley’s multistore model assumes that a collection of processes act upon the information stored in the various buffers. Jointly termed the “central executive,” these processes are assumed to be separate from the storage buffers and have been associated with the frontal lobes.

Both lesion and neuroimaging data support the distinction between storage and executive processes. For example, patients with frontal damage have intact STM under conditions of low distraction ( D’Esposito & Postle 1999 , 2000 ; Malmo 1942 ). However, when distraction is inserted during a delay interval, thereby requiring the need for executive processes to overcome interference, patients with frontal damage show significant memory deficits ( D’Esposito & Postle 1999 , 2000 ). By contrast, patients with left temporo-parietal damage show deficits in phonological storage, regardless of the effects of interference ( Vallar & Baddeley 1984 , Vallar & Papagno 2002 ).

Consistent with these patterns, a meta-analysis of 60 functional neuroimaging studies indicated that increased demand for executive processing recruits dorsolateral frontal cortex and posterior parietal cortex ( Wager & Smith 2003 ). By contrast, storage processes recruit predominately posterior areas in primary and secondary association cortex. These results corroborate the evidence from lesion studies and support the distinction between storage and executive processing.

Unitary-Store Models that Combine Short-Term and Long-Term Memory

The multistore models reviewed above combine assumptions about the distinction between short-term and long-term systems, the decomposition of short-term memory into information-specific buffers, and the separation of systems of storage from executive functions. We now consider unitary models that reject the first assumption concerning distinct systems.

Contesting the idea of separate long-term and short-term systems

The key data supporting separable short-term and long-term systems come from neuropsychology. To review, the critical contrast is between patients who show severely impaired LTM with apparently normal STM (e.g., Cave & Squire 1992 , Scoville & Milner 1957 ) and those who show impaired STM with apparently normal LTM (e.g., Shallice & Warrington 1970 ). However, questions have been raised about whether these neuropsychological studies do, in fact, support the claim that STM and LTM are separable. A central question is the role of the medial temporal lobe. It is well established that the MTL is critical for long-term declarative memory formation and retrieval ( Gabrieli et al. 1997 , Squire 1992 ). However, is the MTL also engaged by STM tasks? Much research with amnesic patients showing preserved STM would suggest not, but Ranganath & Blumenfeld (2005) have summarized evidence showing that MTL is engaged in short-term tasks (see also Ranganath & D’Esposito 2005 and Nichols et al. 2006 ).

In particular, there is growing evidence that a critical function of the MTL is to establish representations that involve novel relations. These relations may be among features or items, or between items and their context. By this view, episodic memory is a special case of such relations (e.g., relating a list of words to the experimental context in which the list was recently presented), and the special role of the MTL concerns its binding capabilities, not the timescale on which it operates. STM that is apparently preserved in amnesic patients may thus reflect a preserved ability to maintain and retrieve information that does not require novel relations or binding, in keeping with their preserved retrieval of remote memories consolidated before the amnesia-inducing lesion.

If this view is correct, then amnesic patients should show deficits in situations that require STM for novel relations, which they do (Hannula et al. 2005, Olson et al. 2006b ). They also show STM deficits for novel materials (e.g., Buffalo et al. 1998 , Holdstock et al. 1995 , Olson et al. 1995, 2006a ). As mentioned above, electrophysiological and neuroimaging studies support the claim that the MTL is active in support of short-term memories (e.g., Miyashita & Chang 1988 , Ranganath & D’Esposito 2001 ). Taken together, the MTL appears to operate in both STM and LTM to create novel representations, including novel bindings of items to context.

Additional evidence for the STM-LTM distinction comes from patients with perisylvian cortical lesions who are often claimed to have selective deficits in STM (e.g., Hanley et al. 1991 , Warrington & Shallice 1969 ). However, these deficits may be substantially perceptual. For example, patients with left perisylvian damage that results in STM deficits also have deficits in phonological processing in general, which suggests a deficit that extends beyond STM per se (e.g., Martin 1993 ).

The architecture of unitary-store models

Our review leads to the conclusion that short- and long-term memory are not architecturally separable systems—at least not in the strong sense of distinct underlying neural systems. Instead, the evidence points to a model in which short-term memories consist of temporary activations of long-term representations. Such unitary models of memory have a long history in cognitive psychology, with early theoretical unification achieved via interference theory ( Postman 1961 , Underwood & Schultz 1960). Empirical support came from demonstrations that memories in both the short and long term suffered from proactive interference (e.g., Keppel & Underwood 1962 ).

Perhaps the first formal proposal that short-term memory consists of activated long-term representations was by Atkinson & Shiffrin (1971 , but also see Hebb 1949) . The idea fell somewhat out of favor during the hegemony of the Baddeley multistore model, although it was given its first detailed computational treatment by Anderson (1983) . It has recently been revived and greatly developed by Cowan (1988 , 1995 , 2000) , McElree (2001) , Oberauer (2002) , Verhaeghen et al. (2004) , Anderson et al. (2004) , and others. The key assumption is the construct of a very limited focus of attention, although as we elaborate below, there are disagreements regarding the scope of the focus.

One shared assumption of these models is that STM consists of temporary activations of LTM representations or of representations of items that were recently perceived. The models differ from one to another regarding specifics, but Cowan’s model (e.g., Cowan 2000 ) is representative. According to this model, there is only one set of representations of familiar material—the representations in LTM. These representations can vary in strength of activation, where that strength varies as a function of such variables as recency and frequency of occurrence. Representations that have increased strength of activation are more available for retrieval in STM experiments, but they must be retrieved nonetheless to participate in cognitive action. In addition, these representations are subject to forgetting over time. A special but limited set of these representations, however, can be within the focus of attention, where being within the focus makes these representations immediately available for cognitive processing. According to this and similar models, then, STM is functionally seen as consisting of LTM representations that are either in the focus of attention or at a heightened level of activation.

These unitary-store models suggest a different interpretation of frontal cortical involvement in STM from multistore models. Early work showing the importance of frontal cortex for STM, particularly that of Fuster and Goldman-Rakic and colleagues, was first seen as support for multistore models (e.g., Funahashi et al. 1989 , Fuster 1973 , Jacobsen 1936 , Wilson et al. 1993 ). For example, single-unit activity in dorsolateral prefrontal cortex regions (principal sulcus, inferior convexity) that was selectively responsive to memoranda during the delay interval was interpreted as evidence that these regions were the storage sites for STM. However, the sustained activation of frontal cortex during the delay period does not necessarily mean that this region is a site of STM storage. Many other regions of neo-cortex also show activation that outlasts the physical presence of a stimulus and provides a possible neural basis for STM representations (see Postle 2006 ). Furthermore, increasing evidence suggests that frontal activations reflect the operation of executive processes [including those needed to keep the representations in the focus of attention; see reviews by Postle (2006) , Ranganath & D’Esposito (2005) , Reuter-Lorenz & Jonides (2007) , and Ruchkin et al. (2003) ]. Modeling work and lesion data provide further support for the idea that the representations used in both STM and LTM are stored in those regions of cortex that are involved in initial perception and encoding, and that frontal activations reflect processes involved in selecting this information for the focus of attention and keeping it there ( Damasio 1989 , McClelland et al. 1995 ).

The principle of posterior storage also allows some degree of reconciliation between multi- and unitary-store models. Posterior regions are clearly differentiated by information type (e.g., auditory, visual, spatial), which could support the information-specific buffers postulated by multistore models. Unitary-store models focus on central capacity limits, irrespective of modality, but they do allow for separate resources ( Cowan 2000 ) or feature components ( Lange & Oberauer 2005 , Oberauer & Kliegl 2006 ) that occur at lower levels of perception and representation. Multi- and unitary-store models thus both converge on the idea of modality-specific representations (or components of those representations) supported by distinct posterior neural systems.

Controversies over Capacity

Regardless of whether one subscribes to multi- or unitary-store models, the issue of how much information is stored in STM has long been a prominent one ( Miller 1956 ). Multistore models explain capacity estimates largely as interplay between the speed with which information can be rehearsed and the speed with which information is forgotten ( Baddeley 1986 , 1992 ; Repov & Baddeley 2006 ). Several studies have measured this limit by demonstrating that approximately two seconds worth of verbal information can be re-circulated successfully (e.g., Baddeley et al. 1975 ).

Unitary-store models describe capacity as limited by the number of items that can be activated in LTM, which can be thought of as the bandwidth of attention. However, these models differ on what that number or bandwidth might be. Cowan (2000) suggested a limit of approximately four items, based on performance discontinuities such as errorless performance in immediate recall when the number of items is less than four, and sharp increases in errors for larger numbers. (By this view, the classic “seven plus or minus two” is an overestimate because it is based on studies that allowed participants to engage in processes of rehearsal and chunking, and reflected contributions of both the focus and LTM; see also Waugh & Norman 1965 .) At the other extreme are experimental paradigms suggesting that the focus of attention consists of a single item ( Garavan 1998 , McElree 2001 , Verhaeghen & Basak 2007 ). We briefly consider some of the central issues behind current controversies concerning capacity estimates.

Behavioral and neural evidence for the magic number 4

Cowan (2000) has reviewed an impressive array of studies leading to his conclusion that the capacity limit is four items, plus or minus one (see his Table 1). Early behavioral evidence came from studies showing sharp drop-offs in performance at three or four items on short-term retrieval tasks (e.g., Sperling 1960 ). These experiments were vulnerable to the criticism that this limit might reflect output interference occurring during retrieval rather than an actual limit on capacity. However, additional evidence comes from change-detection and other tasks that do not require the serial recall of individual items. For example, Luck & Vogel (1997) presented subjects with 1 to 12 colored squares in an array. After a blank interval of nearly a second, another array of squares was presented, in which one square may have changed color. Subjects were to respond whether the arrays were identical. These experiments and others that avoid the confound of output-interference (e.g., Pashler 1988 ) likewise have yielded capacity estimates of approximately four items.
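
One standard way such change-detection data are converted into a capacity estimate (the estimator is not spelled out in the text above, but it is widely used in this literature) is Cowan's K: K = set size × (hit rate − false-alarm rate). The sketch below applies it to hypothetical hit and false-alarm rates chosen only to illustrate the characteristic plateau near four items.

```python
# Cowan's K: a standard capacity estimator for change-detection tasks.
# K = N * (hit_rate - false_alarm_rate), where N is the array set size.
# The hit / false-alarm rates below are assumed for illustration, not real data.

def cowan_k(set_size: int, hit_rate: float, false_alarm_rate: float) -> float:
    return set_size * (hit_rate - false_alarm_rate)

observed = {
    2: (0.98, 0.02),
    4: (0.95, 0.05),
    8: (0.72, 0.22),
    12: (0.62, 0.29),
}

for n, (hit, fa) in observed.items():
    print(f"set size {n:2d}: K = {cowan_k(n, hit, fa):.1f}")
# K rises with set size and then levels off near 4, the pattern usually taken
# as evidence for a roughly four-item capacity limit.
```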

Electrophysiological and neuroimaging studies also support the idea of a four-item capacity limit. The first such report was by Vogel & Machizawa (2004) , who recorded event-related potentials (ERPs) from subjects as they performed a visual change-detection task. ERP recording shortly after the onset of the retention interval in this task indicated a negative-going wave over parietal and occipital sites that persisted for the duration of the retention interval and was sensitive to the number of items held in memory. Importantly, this signal plateaued when array size reached between three and four items. The amplitude of this activity was strongly correlated with estimates of each subject’s memory capacity and was less pronounced on incorrect than correct trials, indicating that it was causally related to performance. Subsequent functional magnetic resonance imaging (fMRI) studies have observed similar load- and accuracy-dependent activations, especially in intraparietal and intraoccipital sulci ( Todd & Marois 2004 , 2005 ). These regions have been implicated by others (e.g., Yantis & Serences 2003 ) in the control of attentional allocation, so it seems plausible that one rate-limiting step in STM capacity has to do with the allocation of attention ( Cowan 2000 ; McElree 1998 , 2001 ; Oberauer 2002 ).

Evidence for more severe limits on focus capacity

Another set of researchers agree there is a fixed capacity, but by measuring a combination of response time and accuracy, they contend that the focus of attention is limited to just one item (e.g., Garavan 1998 , McElree 2001 , Verhaeghen & Basak 2007 ). For example, Garavan (1998) required subjects to keep two running counts in STM, one for triangles and one for squares—as shape stimuli appeared one after another in random order. Subjects controlled their own presentation rate, which allowed Garavan to measure the time spent processing each figure before moving on. He found that responses to a figure of one category (e.g., a triangle) that followed a figure from the other category (e.g., a square) were fully 500 milliseconds longer than responses to the second of two figures from the same category (e.g., a triangle followed by another triangle). These findings suggested that attention can be focused on only one internal counter in STM at a time. Switching attention from one counter to another incurred a substantial cost in time. Using a speed-accuracy tradeoff procedure, McElree (1998) came to the same conclusion that the focus of attention contained just one item. He found that the retrieval speed for the last item in a list was substantially faster than for any other item in the list, and that other items were retrieved at comparable rates to each other even though the accuracy of retrieval for these other items varied.

Oberauer (2002) suggested a compromise solution to the “one versus four” debate. In his model, up to four items can be directly accessible, but only one of these items can be in the focus of attention. This model is similar to that of Cowan (2000) , but adds the assumption that an important method of accessing short-term memories is to focus attention on one item, depending on task demands. Thus, in tasks that serially demand attention on several items (such as those of Garavan 1998 or McElree 2001 ), the mechanism that accomplishes this involves changes in the focus of attention among temporarily activated representations in LTM.

Alternatives to capacity limits based on number of items

Attempting to answer the question of how many items may be held in the focus implicitly assumes that items are the appropriate unit for expressing capacity limits. Some reject this basic assumption. For example, Wilken & Ma (2004) demonstrated that a signal-detection account of STM, in which STM capacity is primarily constrained by noise, better fit behavioral data than an item-based fixed-capacity model. Recent data from change-detection tasks suggest that object complexity ( Eng et al. 2005 ) and similarity ( Awh et al. 2007 ) play an important role in determining capacity. Xu & Chun (2006) offer neuroimaging evidence that may reconcile the item-based and complexity accounts: In a change-detection task, they found that activation of inferior intra-parietal sulcus tracked a capacity limit of four, but nearby regions were sensitive to the complexity of the memoranda, as were the behavioral results.

Other researchers disagree with fixed item-based limits because they have demonstrated that the limit is mutable. Practice may improve subjects’ ability to use processes such as chunking to allow greater functional capacities ( McElree 1998 , Verhaeghen et al. 2004 ; but see Oberauer 2006 ). However, this type of flexibility appears to alter the amount of information that can be compacted into a single representation rather than the total number of representations that can be held in STM ( Miller 1956 ). The data of Verhaeghen et al. (2004; see Figure 5 of that paper) suggest that the latter number still approximates four, consistent with Cowan’s claims.

Building on these findings, we suggest a new view of capacity. The fundamental idea that attention can be allocated to one piece of information in memory is correct, but the definition of what that one piece is needs to be clarified. It cannot be that just one item is in the focus of attention because if that were so, hardly any computation would be possible. How could one add 3+4, for example, if at any one time, attention could be allocated only to the “3” or the “4” or the “+” operation? We propose that attention focuses on what is bound together into a single “functional context,” whether that context is defined by time, space, some other stimulus characteristic such as semantic or visual similarity or momentary task relevance. By this account, attention can be placed on the whole problem “3+4,” allowing relevant computations to be made. Complexity comes into play by limiting the number of subcomponents that can be bound into one functional context.

What are we to conclude from the data concerning the structure of STM? We favor the implication that the representational bases for perception, STM, and LTM are identical. That is, the same neural representations initially activated during the encoding of a piece of information show sustained activation during STM (or retrieval from LTM into STM; Wheeler et al. 2000 ) and are the repository of long-term representations. Because regions of neocortex represent different sorts of information (e.g., verbal, spatial), it is reasonable to expect that STM will have an organization by type of material as well. Functionally, memory in the short term seems to consist of items in the focus of attention along with recently attended representations in LTM. These items in the focus of attention number no more than four, and they may be limited to just a single representation (consisting of items bound within a functional context).

We turn below to processes that operate on these representations.

WHAT PROCESSES OPERATE ON THE STORED INFORMATION?

Theoretical debate about the nature of STM has been dominated by discussion of structure and capacity, but the issue of process is also important. Verbal rehearsal is perhaps most intuitively associated with STM and plays a key role in the classic model ( Baddeley 1986 ). However, as we discuss below, rehearsal most likely reflects a complex strategy rather than a primitive STM process. Modern approaches offer a large set of candidate processes, including encoding and maintenance ( Ranganath et al. 2004 ), attention shifts ( Cowan 2000 ), spatial rehearsal ( Awh & Jonides 2001 ), updating (Oberauer 2005), overwriting ( Neath & Nairne 1995 ), cue-based parallel retrieval ( McElree 2001 ), and interference-resolution ( Jonides & Nee 2006 ).

Rather than navigating this complex and growing list, we take as our cornerstone the concept of a limited focus of attention. The central point of agreement for the unitary-store models discussed above is that there is a distinguishable focus of attention in which representations are directly accessible and available for cognitive action. Therefore, it is critical that all models must identify the processes that govern the transition of memory representations into and out of this focused state.

The Three Core Processes of Short-Term Memory: Encoding, Maintenance, and Retrieval

If one adopts the view that a limited focus of attention is a key feature of short-term storage, then understanding processing related to this limited focus amounts to understanding three basic types of cognitive events 2 : ( a ) encoding processes that govern the transformation from perceptual representations into the cognitive/attentional focus, ( b ) maintenance processes that keep information in the focus (and protect it from interference or decay), and ( c ) retrieval processes that bring information from the past back into the cognitive focus (possibly reactivating perceptual representations).

Encoding of items into the focus

Encoding processes are the traditional domain of theories of perception and are not treated explicitly in any of the current major accounts of STM. Here we outline three implicit assumptions about encoding processes made in most accounts of STM, and we assess their empirical and theoretical support.

First, the cognitive focus is assumed to have immediate access to perceptual processing— that is, the focus may include contents from the immediate present as well as contents retrieved from the immediate past. In Cowan’s (2000) review of evidence in favor of the number four in capacity estimates, several of the experimental paradigms involve focused representations of objects in the immediate perceptual present or objects presented less than a second ago. These include visual tracking experiments ( Pylyshyn et al. 1994 ), enumeration ( Trick & Pylyshyn 1993 ), and whole report of spatial arrays and spatiotemporal arrays ( Darwin et al. 1972 , Sperling 1960 ). Similarly, in McElree’s (2006) and Garavan’s (1998) experiments, each incoming item in the stream of material (words or letters or objects) is assumed to be represented momentarily in the focus.

Second, all of the current theories assume that perceptual encoding into the focus of attention results in a displacement of other items from the focus. For example, in McElree’s single-item focus model, each incoming item not only has its turn in the focus, but it also replaces the previous item. On the one hand, the work reviewed above regarding performance discontinuities after the putative limit of STM capacity has been reached appears to support the idea of whole-item displacement. On the other hand, as also described above, this limit may be susceptible to factors such as practice and stimulus complexity. An alternative to whole-item displacement as the basis for interference is a graded similarity-based interference, in which new items entering the focus may partially overwrite features of the old items or compete with old items to include those featural components in their representations as a function of their similarity. At some level, graded interference is clearly at work in STM, as Nairne (2002) and others have demonstrated (we review this evidence in more detail below). But the issue at hand is whether the focus is subject to such graded interference, and if such interference is the process by which encoding (or retrieving) items into the focus displaces prior items. Although there does not appear to be evidence that bears directly on this issue (the required experiments would involve manipulations of similarity in just the kinds of paradigms that Cowan, McElree, Oberauer, and others have used to provide evidence for the limited focus), the performance discontinuities strongly suggest that something like displacement is at work.

Third, all of the accounts assume that perceptual encoding does not have obligatory access to the focus. Instead, encoding into the focus is modulated by attention. This follows rather directly from the assumptions about the severe limits on focus capacity: There must be some controlled way of directing which aspects of the perceptual present, as well as the cognitive past, enter into the focused state. Stated negatively, there must be some way of preventing aspects of the perceptual present from automatically entering into the focused state. Postle (2006) recently found that increased activity in dorsolateral prefrontal cortex during the presentation of distraction during a retention interval was accompanied by a selective decrease in inferior temporal cortical activity. This pattern suggests that prefrontal regions selectively modulated posterior perceptual areas to prevent incoming sensory input from disrupting the trace of the task-relevant memorandum.

In summary, current approaches to STM have an obligation to account for how controlled processes bring relevant aspects of perception into cognitive focus and leave others out. It is by no means certain that existing STM models and existing models of perceptual attention are entirely compatible on this issue, and this is a matter of continued lively debate ( Milner 2001 , Schubert & Frensch 2001 , Woodman et al. 2001 ).

Maintenance of items in the focus

Once an item is in the focus of attention, what keeps it there? If the item is in the perceptual present, the answer is clear: attention-modulated, perceptual encoding. The more pressing question is: What keeps something in the cognitive focus when it is not currently perceived? For many neuroscientists, this is the central question of STM—how information is held in mind for the purpose of future action after the perceptual input is gone. There is now considerable evidence from primate models and from imaging studies on humans for a process of active maintenance that keeps representations alive and protects them from irrelevant incoming stimuli or intruding thoughts (e.g., Postle 2006 ).

We argue that this process of maintenance is not the same as rehearsal. Indeed, the number of items that can be maintained without rehearsal forms the basis of Cowan’s (2000) model. Under this view, rehearsal is not a basic process but rather is a strategy for accomplishing the functional demands for sustaining memories in the short term—a strategy composed of a series of retrievals and re-encodings. We consider rehearsal in more detail below, but we consider here the behavioral and neuroimaging evidence for maintenance processes.

There is now considerable evidence from both primate models and human electroencephalography and fMRI studies for a set of prefrontal-posterior circuits underlying active maintenance. Perhaps the most striking is the classic evidence from single-cell recordings showing that some neurons in prefrontal cortex fire selectively during the delay period in delayed-match-to-sample tasks (e.g., Funahashi et al. 1989 , Fuster 1973 ). As mentioned above, early interpretations of these frontal activations linked them directly to STM representations ( Goldman-Rakic 1987 ), but more recent theories suggest they are part of a frontal-posterior STM circuit that maintains representations in posterior areas ( Pasternak & Greenlee 2005 , Ranganath 2006 , Ruchkin et al. 2003 ). Furthermore, as described above, maintenance operations may modulate perceptual encoding to prevent incoming perceptual stimuli from disrupting the focused representation in posterior cortex ( Postle 2006 ). Several computational neural-network models of circuits for maintenance hypothesize that prefrontal cortical circuits support attractors, self-sustaining patterns observed in certain classes of recurrent networks ( Hopfield 1982 , Rougier et al. 2005 , Polk et al. 2002 ). A major challenge is to develop computational models that are able to engage in active maintenance of representations in posterior cortex while simultaneously processing, to some degree, incoming perceptual material (see Renart et al. 1999 for a related attempt).
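
To make the notion of an attractor concrete, here is a minimal sketch of a Hopfield-style network in the spirit of Hopfield (1982), not a model of prefrontal circuitry: binary patterns are stored with a Hebbian rule, and a degraded cue settles back into the stored pattern, which then sustains itself without further input. The network size and noise level are illustrative assumptions.

```python
# Minimal Hopfield-style attractor network (illustrative sketch only).
# Patterns are stored with a Hebbian outer-product rule; asynchronous updates
# drive a noisy cue back to the stored pattern, which is then self-sustaining.
import numpy as np

rng = np.random.default_rng(0)
n = 64                                           # number of units (assumed)
patterns = rng.choice([-1, 1], size=(3, n))      # three stored memories

# Hebbian learning: sum of outer products, no self-connections.
W = sum(np.outer(p, p) for p in patterns) / n
np.fill_diagonal(W, 0)

def settle(state, steps=5 * n):
    state = state.copy()
    for _ in range(steps):                       # asynchronous unit updates
        i = rng.integers(n)
        state[i] = 1 if W[i] @ state >= 0 else -1
    return state

# Degrade a stored pattern (flip 20% of units), then let the network settle.
cue = patterns[0].copy()
flip = rng.choice(n, size=n // 5, replace=False)
cue[flip] *= -1

recalled = settle(cue)
print("overlap with stored pattern:", int(recalled @ patterns[0]), "/", n)
# The recalled state typically matches the stored pattern exactly: an attractor.
```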

Retrieval of items into the focus

Many of the major existing STM architectures are silent on the issue of retrieval. However, all models that assume a limited focus also assume that there is some means by which items outside that focus (either in a dormant long-term store or in some highly activated portion of LTM) are brought into the focus by switching the attentional focus onto those items. Following Sternberg (1966) , McElree (2006) , and others, we label this process “retrieval.” Despite this label, it is important to keep in mind that the associated spatial metaphor of an item moving from one location to another is misleading given our assumption about the common neural representations underlying STM and LTM.

There is now considerable evidence, mostly from mathematical models of behavioral data, that STM retrieval of item information is a rapid, parallel, content-addressable process. The current emphasis on parallel search processes is quite different from the earliest models of STM retrieval, which postulated a serial scanning process (i.e., Sternberg 1966 ; see McElree 2006 for a recent review and critique). Serial-scanning models fell out of favor because of empirical and modeling work showing that parallel processes provide a better account of the reaction time distributions in STM tasks (e.g., Hockley 1984 ). For example, McElree has created a variation on the Sternberg recognition probe task that provides direct support for parallel, rather than serial, retrieval. In the standard version of the task, participants are presented with a memory set consisting of a rapid sequence of verbal items (e.g., letters or digits), followed by a probe item. The task is to identify whether the probe was a member of the memory set. McElree & Dosher’s (1989) innovation was to manipulate the deadline for responding. The time course of retrieval (accuracy as a function of response deadline) can be separately plotted for each position within the presentation sequence, allowing independent assessments of accessibility (how fast an item can be retrieved) and availability (asymptotic accuracy) as a function of set size and serial position. Many experiments yield a uniform rate of access for all items except for the most recent item, which is accessed more quickly. The uniformity of access rate is evidence for parallel access, and the distinction between the most recent item and the other items is evidence for a distinguished focus of attention.
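
Such time-course data are conventionally summarized with a shifted exponential approach to an asymptote, d'(t) = λ(1 − exp(−β(t − δ))) for t > δ, where λ reflects availability (asymptotic accuracy) and β and δ reflect accessibility (retrieval speed). The sketch below fits that function to simulated data; the functional form is the standard one in this literature, but the numbers are assumptions, not McElree & Dosher's fitted values.

```python
# Sketch: fitting a speed-accuracy tradeoff (SAT) curve of the form
#   d'(t) = lam * (1 - exp(-beta * (t - delta)))  for t > delta, else 0.
# lam = asymptotic accuracy (availability); beta, delta = retrieval dynamics
# (accessibility). The data below are simulated, not real.
import numpy as np
from scipy.optimize import curve_fit

def sat(t, lam, beta, delta):
    return np.where(t > delta, lam * (1.0 - np.exp(-beta * (t - delta))), 0.0)

rng = np.random.default_rng(1)
deadlines = np.array([0.1, 0.2, 0.3, 0.5, 0.8, 1.2, 2.0, 3.0])   # seconds
true_params = (2.5, 4.0, 0.15)                                   # assumed lam, beta, delta
dprime = sat(deadlines, *true_params) + rng.normal(0, 0.05, deadlines.size)

params, _ = curve_fit(sat, deadlines, dprime, p0=(2.0, 3.0, 0.1))
lam, beta, delta = params
print(f"asymptote (availability)  = {lam:.2f}")
print(f"rate                      = {beta:.2f}")
print(f"intercept (accessibility) = {delta:.2f}")
# Faster retrieval dynamics show up as a larger beta or smaller delta,
# while differences in availability show up in the asymptote lam.
```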

Neural Mechanisms of Short- and Long-Term Memory Retrieval

The cue-based retrieval processes described above for STM are very similar to those posited for LTM (e.g., Anderson et al. 2004 , Gillund & Shiffrin 1984 , Murdock 1982 ). As a result, retrieval failures resulting from similarity-based interference and cue overlap are ubiquitous in both STM and LTM. Both classic studies of recall from STM (e.g., Keppel & Underwood 1962 ) and more recent studies of interference in probe-recognition tasks (e.g., Jonides & Nee 2006 , McElree & Dosher 1989 , Monsell 1978 ) support the idea that interference plays a major role in forgetting over short retention intervals as well as long ones (see below). These common effects would not be expected if STM retrieval were a different process restricted to operate over a limited buffer, but they are consistent with the notion that short-term and long-term retrieval are mediated by the same cue-based mechanisms.

The heavy overlap in the neural substrates for short-term and long-term retrieval provides additional support for the idea that retrieval processes are largely the same over different retention intervals. A network of medial temporal regions, lateral prefrontal regions, and anterior prefrontal regions has been extensively studied and shown to be active in long-term retrieval tasks (e.g., Buckner et al. 1998 , Cabeza & Nyberg 2000 , Fletcher & Henson 2001 ). We reviewed above the evidence for MTL involvement in both short- and long-term memory tasks that require novel representations (see section titled “Contesting the Idea of Separate Long-Term and Short-Term Systems”). Here, we examine whether the role of frontal cortex is the same for both short- and long-term retrieval.

The conclusion derived from neuroimaging studies of various different STM procedures is that this frontal role is the same in short-term and long-term retrieval. For example, several event-related fMRI studies of the retrieval stage of the probe-recognition task found increased activation in lateral prefrontal cortex similar to the activations seen in studies of LTM retrieval (e.g., D’Esposito et al. 1999 , D’Esposito & Postle 2000 , Manoach et al. 2003 ). Badre & Wagner (2005) also found anterior prefrontal activations that overlapped with regions implicated in episodic recollection. The relatively long retention intervals often used in event-related fMRI studies leaves them open to the criticism that by the time of the probe, the focus of attention has shifted elsewhere, causing the need to retrieve information from LTM (more on this discussion below). However, a meta-analysis of studies that involved bringing very recently presented items to the focus of attention likewise found specific involvement of lateral and anterior prefrontal cortex ( Johnson et al. 2005 ). These regions appear to be involved in retrieval, regardless of timescale.

The same conclusion may be drawn from recent imaging studies that have directly compared long- and short-term retrieval tasks using within-subjects designs ( Cabeza et al. 2002 , Ranganath et al. 2003 , Talmi et al. 2005 ). Ranganath et al. (2003) found the same bilateral ventrolateral and dorsolateral prefrontal regions engaged in both short- and long-term tasks. In some cases, STM and LTM tasks involve the same regions but differ in the relative amount of activation shown within those regions. For example, Cabeza et al. (2002) reported similar engagement of medial temporal regions in both types of task, but greater anterior and ventrolateral activation in the long-term episodic tasks. Talmi et al. (2005) reported greater activation in both medial temporal and lateral frontal cortices for recognition probes of items presented early in a 12-item list (presumably necessitating retrieval from LTM) versus items presented later in the list (presumably necessitating retrieval from STM). One possible reason for this discrepancy is that recognition for late-list items did not require retrieval because these items were still in the focus of attention. This account is plausible since late-list items were drawn either from the last-presented or second-to-last presented item and preceded the probe by less than two seconds.

In summary, the bulk of the neuroimaging evidence points to the conclusion that the activation of frontal and medial temporal regions depends on whether the information is currently in or out of focus, not whether the task nominally tests STM or LTM. Similar reactivation processes occur during retrieval from LTM and from STM when the active maintenance has been interrupted (see Sakai 2003 for a more extensive review).

The Relationship of Short-Term Memory Processes to Rehearsal

Notably, our account of core STM processes excludes rehearsal. How does rehearsal fit in? We argue that rehearsal is simply a controlled sequence of retrievals and re-encodings of items into the focus of attention ( Baddeley 1986 , Cowan 1995 ). The theoretical force of this assumption can be appreciated by examining the predictions it makes when coupled with our other assumptions about the structures and processes of the underlying STM architecture. Below we outline these predictions and the behavioral, developmental, neuroimaging, and computational work that support this view.

Rehearsal as retrieval into the focus

When coupled with the idea of a single-item focus, the assumption that rehearsal is a sequence of retrievals into the focus of attention makes a very clear prediction: A just-rehearsed item should display the same retrieval dynamics as a just-perceived item. McElree (2006) directly tested this prediction using a version of his response-deadline recognition task, in which subjects were given a retention interval between presentation of the list and the probe rather than presented with the probe immediately after the list. Subjects were explicitly instructed to rehearse the list during this interval and were trained to do so at a particular rate. By controlling the rate, it was possible to know when each item was rehearsed and hence re-established in the focus. The results were compelling: When an item was predicted to be in focus because it had just been rehearsed, it showed the same fast retrieval dynamics as an item that had just been perceived. In short, the speed-accuracy tradeoff functions showed the familiar in-focus/out-of-focus dichotomy of the standard paradigm, but the dichotomy was established for internally controlled rehearsal as well as externally controlled perception.
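
The retrieval dynamics at issue here are conventionally summarized by fitting speed-accuracy tradeoff data with an exponential approach to an asymptote, d′(t) = λ(1 − e^(−β(t − δ))) for t > δ, with asymptote λ, rate β, and intercept δ. The sketch below uses that standard functional form with purely illustrative parameter values (not McElree's estimates) to show the signature pattern: a just-rehearsed or just-perceived item reaches its accuracy asymptote earlier and faster than an out-of-focus item.

```python
import numpy as np

def sat_curve(t, lam, beta, delta):
    """Exponential approach to an asymptote, a standard summary of
    speed-accuracy tradeoff (SAT) data:
    d'(t) = lam * (1 - exp(-beta * (t - delta))) for t > delta, else 0."""
    return np.where(t > delta, lam * (1.0 - np.exp(-beta * (t - delta))), 0.0)

# Illustrative (hypothetical) parameters: an in-focus item rises faster
# (larger rate beta, earlier intercept delta) than an out-of-focus item,
# while both can share a similar asymptote lam.
t = np.linspace(0, 3, 301)            # processing time in seconds
in_focus     = sat_curve(t, lam=2.5, beta=8.0, delta=0.30)
out_of_focus = sat_curve(t, lam=2.4, beta=3.0, delta=0.45)

# Time at which each curve first reaches 90% of its asymptote.
for name, curve, lam in [("in focus", in_focus, 2.5), ("out of focus", out_of_focus, 2.4)]:
    idx = np.argmax(curve >= 0.9 * lam)
    print(f"{name:>13}: reaches 90% of asymptote at ~{t[idx]:.2f} s")
```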

Rehearsal as strategic retrieval

Rehearsal is often implicitly assumed to be a component of active maintenance, but formal treatments of STM typically take the opposite view, treating rehearsal as a strategic composition of more basic processes rather than as a primitive. For example, Cowan (2000) provides evidence that although first-grade children do not use verbal rehearsal strategies, they nevertheless have measurable focus capacities. In fact, Cowan (2000) uses this evidence to argue that the performance of very young children reveals the fundamental capacity limits of the focus of attention precisely because it is not confounded with rehearsal.

If rehearsal is the controlled composition of more primitive STM processes, then rehearsal should activate the same brain circuits as the primitive processes, possibly along with additional (frontal) circuits associated with their control. In other words, there should be overlap of rehearsal with brain areas sub-serving retrieval and initial perceptual encoding. Likewise, there should be control areas distinct from those of the primitive processes.

Both predictions receive support from neuroimaging studies. The first prediction is broadly confirmed: There is now considerable evidence for the reactivation of areas associated with initial perceptual encoding in tasks that require rehearsal (see Jonides et al. 2005 for a recent review; note also that evidence exists for reactivation in LTM retrieval: Wheeler 2000 , 2006 ).

The second prediction—that rehearsal engages additional control areas beyond those participating in maintenance, encoding, and retrieval—receives support from two effects. One is that verbal rehearsal engages a set of frontal structures associated with articulation and its planning: supplementary motor, premotor, inferior frontal, and posterior parietal areas (e.g., Chein & Fiez 2001, Jonides et al. 1998 , Smith & Jonides 1999 ). The other is that spatial rehearsal engages attentionally mediated occipital regions, suggesting rehearsal processes that include retrieval of spatial information ( Awh et al. 1998 , 1999 , 2001 ).

Computational modeling relevant to strategic retrieval

Finally, prominent symbolic and connectionist computational models of verbal STM tasks are based on architectures that do not include rehearsal as a primitive process, but rather assume it as a strategic composition of other processes operating over a limited focus. The Burgess & Hitch (2005 , 2006) connectionist model, the Executive-Process/Interactive Control (EPIC) symbolic model ( Meyer and Kieras 1997 ), and the Atomic Components of Thought (ACT-R) hybrid model ( Anderson & Matessa 1997 ) all assume that rehearsal in verbal STM consists of a controlled sequence of retrievals of items into a focused state. They all assume different underlying mechanisms for the focus (the Burgess & Hitch model has a winner-take-all network; ACT-R has an architectural buffer with a capacity of one chunk; EPIC has a special auditory store), but all assume strategic use of this focus to accomplish rehearsal. These models jointly represent the most successful attempts to account for a range of detailed empirical phenomena traditionally associated with rehearsal, especially in verbal serial recall tasks. Their success therefore provides further support for the plausibility of a compositional view of rehearsal.
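
To make the compositional view concrete, here is a deliberately minimal sketch (not a reimplementation of the Burgess & Hitch, EPIC, or ACT-R models) in which the only primitives are encoding into a one-item focus, cue-based retrieval back into the focus, and decay of whatever is outside the focus; rehearsal is then nothing more than a controlled loop over retrievals. The class name, decay rule, and parameter values are all hypothetical.

```python
import math

class MinimalSTM:
    """A toy single-item-focus architecture in which rehearsal is a controlled
    sequence of retrievals into the focus. The activation values and the
    exponential decay rule are illustrative assumptions only."""

    def __init__(self, decay_rate=0.5):
        self.decay_rate = decay_rate
        self.activation = {}   # item -> current activation
        self.focus = None      # at most one chunk in the focus

    def encode(self, item):
        """Perceptual encoding places an item in the focus at full strength."""
        self.focus = item
        self.activation[item] = 1.0

    def tick(self, dt=0.25):
        """Time passes: everything outside the focus decays; the focused item
        is actively maintained."""
        for item in self.activation:
            if item != self.focus:
                self.activation[item] *= math.exp(-self.decay_rate * dt)

    def rehearse(self, item):
        """Rehearsal = cue-based retrieval of an out-of-focus item back into
        the focus, which restores its activation."""
        self.focus = item
        self.activation[item] = 1.0

def rehearse_list(items, cycles=3, dt=0.25):
    stm = MinimalSTM()
    for it in items:                 # initial presentation of the list
        stm.encode(it)
        stm.tick(dt)
    for _ in range(cycles):          # covert rehearsal: serial retrievals into the focus
        for it in items:
            stm.rehearse(it)
            stm.tick(dt)
    return stm

stm = rehearse_list(["A", "B", "C"])
print("focus:", stm.focus)
print({k: round(v, 3) for k, v in stm.activation.items()})
```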

WHY DO WE FORGET?

Forgetting in STM is a vexing problem: What accounts for failures to retrieve something encoded just seconds ago? There are two major explanations for forgetting, often placed in opposition: time-based decay and similarity-based interference. Below, we describe some of the major findings in the literature related to each of these explanations, and we suggest that they may ultimately result from the same underlying principles.

Decay Theories: Intuitive but Problematic

The central claim of decay theory is that as time passes, information in memory erodes, and so it is less available for later retrieval. This explanation has strong intuitive appeal. However, over the years there have been sharp critiques of decay, questioning whether it plays any role at all (for recent examples, see Lewandowsky et al. 2004 and the review in this journal by Nairne 2002 ).

Decay explanations are controversial for two reasons: First, experiments attempting to demonstrate decay can seldom eliminate alternative explanations. For example, Keppel & Underwood (1962) demonstrated that forgetting in the classic Brown-Peterson paradigm (designed to measure time-based decay) was due largely, if not exclusively, to proactive interference from prior trials. Second, without an explanation of how decay occurs, it is difficult to see decay theories as more than a restatement of the problem. Some functional arguments have been made for the usefulness of the notion of memory decay—that decaying activations adaptively mirror the likelihood that items will need to be retrieved (Anderson & Schooler 1991), or that decay is functionally necessary to reduce interference (Altmann & Gray 2002). Nevertheless, McGeoch’s (1932) famous criticism of decay theories still holds merit: Rust does not occur because of time itself but because of oxidation processes that unfold over time. Decay theories must explain the processes by which decay could occur, i.e., they must identify the oxidation process in STM.
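
The Anderson & Schooler (1991) argument has a standard formalization in the ACT-R architecture (Anderson et al. 2004), in which an item's base-level activation is the log of a sum of power-law-decaying traces of its past uses. The toy calculation below uses that equation with the conventional decay parameter d = 0.5 and hypothetical usage histories; it is meant only to illustrate the functional claim that activation tracks the recency and frequency with which an item has been needed.

```python
import math

def base_level_activation(times_since_use, d=0.5):
    """ACT-R-style base-level activation: B = ln(sum_j t_j ** -d), where each
    t_j is the time (in seconds) since the j-th past use of the item and d is
    the decay parameter (0.5 is the conventional default)."""
    return math.log(sum(t ** -d for t in times_since_use))

# Hypothetical usage histories (seconds since each prior use of an item).
recent_and_frequent = [2, 10, 30, 60]
old_and_sparse      = [300, 900]

print("recent & frequent:", round(base_level_activation(recent_and_frequent), 2))
print("old & sparse:     ", round(base_level_activation(old_and_sparse), 2))
```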

Retention-interval confounds: controlling for rehearsal and retroactive interference

The main problem in testing decay theories is controlling for what occurs during the retention interval. Many experiments include an attention-demanding task to prevent participants from using rehearsal that would presumably circumvent decay. However, a careful analysis of these studies by Roediger et al. (1977) raises doubts about whether a secondary task actually prevents rehearsal. They compared conditions in which a retention interval was filled by nothing, by a relatively easy task, or by a relatively difficult one. Both filled-interval conditions led to worse memory performance, but the difficulty of the intervening task had no effect. Roediger et al. (1977) concluded that the primary memory task and the interpolated task, although demanding, drew on different pools of processing resources, so the interpolated tasks may not have been effective in preventing rehearsal. If so, this sort of secondary-task technique does not allow a convincing test of the decay hypothesis.

Another problem with tasks that fill the retention interval is that they require subjects to use STM (consider counting backward, as in the Brown-Peterson paradigm). This could lead to active displacement of items from the focus according to views (e.g., McElree 2001 ) that posit such displacement as a mechanism of STM forgetting, or increase the noise according to interference-based explanations (see discussion below in What Happens Neurally During the Delay?). By either account, the problem with retention-interval tasks is that they are questionable ways to prevent rehearsal of the to-be-remembered information, and they introduce new, distracting information that may engage STM. This double-edged sword makes it difficult to tie retention-interval manipulations directly to decay.

Attempts to address the confounding factors

A potential way out of the rehearsal conundrum is to use stimuli that are not easily converted to verbal codes and that therefore may be difficult to rehearse. For example, Harris (1952) used tones that differed so subtly in pitch that they would be difficult for subjects without perfect pitch to name. On each trial, participants were first presented with a to-be-remembered tone, followed by a retention interval of 0.1 to 25 seconds, and finally a probe tone. The accuracy of deciding whether the initial and probe tones were the same declined with longer retention intervals, consistent with the predictions of decay theory.

Using another technique, McKone (1995 , 1998) reduced the probability of rehearsal or other explicit-memory strategies by using an implicit task. Words and nonwords were repeated in a lexical-decision task, with the measure of memory being faster performance on repeated trials than on novel ones (priming). To disentangle the effects of decay and interference, McKone varied the time between repetitions (the decay-related variable) while holding the number of items between repetitions (the interference-related variable) constant, and vice versa. She found that greater time between repetitions reduced priming even after accounting for the effects of intervening items, consistent with decay theory. However, interference and decay effects seemed to interact and to be especially important for nonwords.
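
The logic of this design can be laid out explicitly. In the sketch below (with hypothetical numbers), comparing conditions A and B varies the time between repetitions while holding the number of intervening items constant, isolating a decay-related effect; comparing B and C varies the number of intervening items while holding elapsed time constant, isolating an interference-related effect.

```python
# Hypothetical condition table illustrating the two orthogonal contrasts in a
# McKone-style design. 'lag_items' = number of trials between the first and
# second presentation of a word; 'item_duration' = seconds per intervening trial.
conditions = [
    {"name": "A", "lag_items": 4,  "item_duration": 1.0},
    {"name": "B", "lag_items": 4,  "item_duration": 3.0},  # same items as A, more time -> decay contrast
    {"name": "C", "lag_items": 12, "item_duration": 1.0},  # same time as B, more items -> interference contrast
]

for c in conditions:
    elapsed = c["lag_items"] * c["item_duration"]
    print(f"condition {c['name']}: {c['lag_items']:>2} intervening items, "
          f"{elapsed:>4.1f} s between repetitions")
```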

Procedures such as those used by Harris (1952) and McKone (1995 , 1998) do not have the problems associated with retention-interval tasks. They are, however, potentially vulnerable to the criticism of Keppel & Underwood (1962) regarding interference from prior trials within the task, although McKone’s experiments address this issue to some degree. Another potential problem is that these participants’ brains and minds are not inactive during the retention interval ( Raichle et al. 2001 ). There is increasing evidence that the processes ongoing during nominal “resting states” are related to memory, including STM ( Hampson et al. 2006 ). Spontaneous retrieval by participants during the retention interval could interfere with memory for the experimental items. So, although experiments that reduce the influence of rehearsal provide some of the best evidence of decay, they are not definitive.

What happens neurally during the delay?

Neural findings of delay-period activity have also been used to support the idea of decay. For example, at the single-cell level, Fuster (1995) found that in monkeys performing a delayed-response task, delay-period activity in inferotemporal cortex steadily declined over 18 seconds (see also Pasternak & Greenlee 2005 ). At a molar level, human neuroimaging studies often show delay-period activity in prefrontal and posterior regions, and this activity is often thought to support maintenance or storage (see review by Smith & Jonides 1999 ). As reviewed above, it is likely that the posterior regions support storage and that frontal regions support processes related to interference-resolution, control, attention, response preparation, motivation, and reward.

Consistent with the suggestive primate data, Jha & McCarthy (2000) found a general decline in activation in posterior regions over a delay period, which suggests some neural evidence for decay. However, this decline in activation was not obviously related to performance, which suggests two (not mutually exclusive) possibilities: ( a ) the decline in activation was not representative of decay, so it did not correlate with performance, or ( b ) these regions might not have been storage regions (but see Todd & Marois 2004 and Xu & Chun 2006 for evidence more supportive of load sensitivity in posterior regions).

The idea that neural activity decays also faces a serious challenge in the classic results of Malmo (1942) , who found that a monkey with frontal lesions was able to perform a delayed response task extremely well (97% correct) if visual stimulation and motor movement (and therefore associated interference) were restricted during a 10-second delay. By contrast, in unrestricted conditions, performance was as low as 25% correct (see also Postle & D’Esposito 1999 ). In summary, evidence for time-based declines in neural activity that would naturally be thought to be part of a decay process is at best mixed.

Is there a mechanism for decay?

Although there are data supporting the existence of decay, much of this evidence is subject to alternative, interference-based explanations. However, as Crowder (1976) noted, “Good ideas die hard.” At least a few key empirical results (Harris 1952; McKone 1995, 1998) do seem to implicate some kind of time-dependent decay. If one assumes that decay happens, how might it occur?

One possibility—perhaps most compatible with results like those of Malmo (1942) —is that what changes over time is not the integrity of the representation itself, but the likelihood that attention will be attracted away from it. As more time passes, the likelihood increases that attention will be attracted away from the target and toward external stimuli or other memories, and it will be more difficult to return to the target representation. This explanation seems compatible with the focus-of-attention views of STM that we have reviewed. By this explanation, capacity limits are a function of attention limits rather than a special property of STM per se.

Another explanation, perhaps complementary to the first, relies on stochastic variability in the neuronal firing patterns that make up the target representation. The temporal synchronization of neuronal activity is an important part of the representation (e.g., Deiber et al. 2007 , Jensen 2006 , Lisman & Idiart 1995 ). As time passes, variability in the firing rates of individual neurons may cause them to fall increasingly out of synchrony unless they are reset (e.g., by rehearsal). As the neurons fall out of synchrony, by this hypothesis, the firing pattern that makes up the representation becomes increasingly difficult to discriminate from surrounding noise [see Lustig et al. (2005) for an example that integrates neural findings with computational ( Frank et al. 2001 ) and behaviorally based ( Brown et al. 2000 ) models of STM].
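
A minimal simulation makes the desynchronization idea concrete. Assume, purely for illustration, that each neuron contributing to a representation starts in phase with the others and then accumulates independent Gaussian phase jitter on every time step, with no resetting by rehearsal; coherence is measured as the length of the mean resultant vector across neurons. Under these assumptions coherence declines smoothly toward the level expected from noise, yielding a decay-like forgetting curve without any change in connectivity. The drift magnitude and time step below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def coherence(phases):
    """Length of the mean resultant vector: 1.0 = perfectly synchronized
    firing phases, ~0.0 = phases indistinguishable from uniform noise."""
    return np.abs(np.mean(np.exp(1j * phases)))

n_neurons = 200
phases = np.zeros(n_neurons)   # start perfectly synchronized
drift_sd = 0.3                 # radians of random phase jitter per step (assumed)
dt = 0.1                       # seconds per step

print(" time (s)   coherence")
for step in range(31):
    if step % 5 == 0:
        print(f"{step * dt:8.1f}   {coherence(phases):9.3f}")
    phases += rng.normal(0.0, drift_sd, n_neurons)   # unreset stochastic drift
```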

Interference Theories: Comprehensive but Complex

Interference effects play several roles in memory theory: First, they are the dominant explanation of forgetting. Second, some have suggested that STM capacity and its variation among individuals are largely determined by the ability to overcome interference (e.g., Hasher & Zacks 1988 , Unsworth & Engle 2007 ). Finally, differential interference effects in STM and LTM have been used to justify the idea that they are separate systems, and common interference effects have been used to justify the idea that they are a unitary system.

Interference theory has the opposite problem of decay: It is comprehensive but complex ( Crowder 1976 ). The basic principles are straightforward. Items in memory compete, with the amount of interference determined by the similarity, number, and strength of the competitors. The complexity stems from the fact that interference may occur at multiple stages (encoding, retrieval, and possibly storage) and at multiple levels (the representation itself or its association with a cue or a response). Interference from the past (proactive interference; PI) may affect both the encoding and the retrieval of new items, and it often increases over time. By contrast, interference from new items onto older memories (retroactive interference; RI) frequently decreases over time and may not be as reliant on similarity (see discussion by Wixted 2004 ).

Below, we review some of the major findings with regard to interference in STM, including a discussion of its weaknesses in explaining short-term forgetting. We then present a conceptual model of STM that attempts to address these weaknesses and the questions regarding structure, process, and forgetting raised throughout this review.

Interference Effects in Short-Term Memory

Selection-based interference effects.

The Brown-Peterson task, originally conceived to test decay theory, became a workhorse for testing similarity-based interference as well. In the “release-from-PI” version ( Wickens 1970 ), short lists of categorized words are used as memoranda. Participants learn one three-item list on each trial, perform some other task during the retention interval, and then attempt to recall the list. For the first three trials, all lists consist of words from the same category (e.g., flowers). The typical PI effects occur: Recall declines over subsequent trials. The critical manipulation occurs on the final list. If it is from a different category (e.g., sports), recall is much higher than if it is from the same category as preceding trials. In some cases, performance on this set-shift or release-from-PI trial is nearly as high as on the very first trial.

The release-from-PI effect was originally interpreted as an encoding effect. Even very subtle shifts (e.g., from “flowers” to “wild-flowers”) produce the effect if participants are warned about the shift before the words are presented (see Wickens 1970 for an explanation). However, Gardiner et al. (1972) showed that release also occurs if the shift-cue is presented only at the time of the retrieval test—i.e., after the list has been encoded. They suggested that cues at retrieval could reduce PI by differentiating items from the most recent list, thus aiding their selection.

Selection processes remain an important topic in interference research. Functional neuroimaging studies consistently identify a region in left inferior frontal gyrus (LIFG) as active during interference resolution, at least for verbal materials (see a review by Jonides & Nee 2006 ). This region appears to be generally important for selection among competing alternatives, e.g., in semantic memory as well as in STM ( Thompson-Schill et al. 1997 ). In STM, LIFG is most prominent during the test phase of interference trials, and its activation during this phase often correlates with behavioral measures of interference resolution ( D’Esposito et al. 1999 , Jonides et al. 1998 , Reuter-Lorenz et al. 2000 , Thompson-Schill et al. 2002 ). These findings attest to the importance of processes for resolving retrieval interference. The commonality of the neural substrate for interference resolution across short-term and long-term tasks provides yet further support for the hypothesis of shared retrieval processes for the two types of memory.

Interference effects occur at multiple levels, and it is important to distinguish between interference at the level of representations and interference at the level of responses. The LIFG effects described above appear to be familiarity based and to occur at the level of representations. Items on a current trial must be distinguished and selected from among items on previous trials that are familiar because of prior exposure but are currently incorrect. A separate contribution occurs at the level of responses: An item associated with a positive response on a prior trial may now be associated with a negative response, or vice versa. This response-based conflict can be separated from the familiarity-based conflict, and its resolution appears to rely more on the anterior cingulate ( Nelson et al. 2003 ).

Other mechanisms for interference effects?

Despite the early work of Keppel & Underwood (1962) , most studies examining encoding in STM have focused on RI: how new information disrupts previous memories. Early theorists described this disruption in terms of displacement of entire items from STM, perhaps by disrupting consolidation (e.g., Waugh & Norman 1965 ). However, rapid serial visual presentation studies suggest that this type of consolidation is complete within a very short time—approximately 500 milliseconds, and in some situations as short as 50 milliseconds ( Vogel et al. 2006 ).

What about interference effects beyond this time window? As reviewed above, most current focus-based models implicitly assume that something like whole-item displacement is at work, but these models may need to be elaborated to account for retroactive similarity-based interference, such as the phonological interference effects reviewed by Nairne (2002). The models of Nairne (2002) and Oberauer (2006) suggest a direction for such an elaboration. Rather than a competition at the item level for a single-focus resource, these models posit a lower-level similarity-based competition for “feature units.” By this idea, items in STM are represented as bundles of features (e.g., color, shape, spatial location, temporal location). Representations of these features in turn are distributed over multiple units. The more two items overlap, the more they compete for these feature units, resulting in greater interference. This proposed mechanism fits well with the idea that working memory reflects the heightened activation of representations that are distributed throughout sensory, semantic, and motor cortex (Postle 2006), and with the idea that similarity-based interference constrains the capacity of the focus (see above; Awh et al. 2007). Hence, rather than whole-item displacement, specific feature competition may underlie the majority of encoding-stage RI.
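
The feature-unit idea can be expressed schematically (this is a caricature, not the Nairne 2002 or Oberauer 2006 models themselves): treat each item as a set of feature units and let the interference a competitor exerts on a target grow with the proportion of the target's features it shares. The feature labels and items below are invented for illustration.

```python
def overlap_interference(target, competitors):
    """Toy feature-overlap interference: each item is a set of feature units,
    and a competitor interferes with the target in proportion to the fraction
    of the target's features it shares (values are illustrative only)."""
    return {name: len(target & feats) / len(target)
            for name, feats in competitors.items()}

# Hypothetical feature bundles (e.g., phonological and contextual features).
target = {"b", "ee", "list2", "pos3", "red"}
competitors = {
    "phonological neighbor": {"b", "ee", "list2", "pos1", "blue"},
    "same-list item":        {"k", "ah", "list2", "pos2", "red"},
    "unrelated item":        {"z", "oo", "list9", "pos7", "green"},
}

for name, score in overlap_interference(target, competitors).items():
    print(f"{name:>22}: interference ~ {score:.2f}")
```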

Interference-based decay?

Above, we proposed a mechanism for decay based on the idea that stochastic variability causes the neurons making up a representation to fall out of synchrony (become less coherent in their firing patterns). Using the terminology of Nairne (2002) and Oberauer (2006) , the feature units become less tightly bound. Importantly, feature units that are not part of a representation also show some random activity due to their own stochastic variability, creating a noise distribution. Over time, there is an increasing likelihood that the feature units making up the to-be-remembered item’s representation will overlap with those of the noise distribution, making them increasingly difficult to distinguish. This increasing overlap with the noise distribution and loss of feature binding could lead to the smooth forgetting functions often interpreted as evidence for decay.

Such a mechanism for decay has interesting implications. It may explain why PI effects interact with retention interval. Prior trials with similar items would structure the noise distribution so that it is no longer random but rather is biased to share components with the representation of the to-be-remembered item (target). Representations of prior, now-irrelevant items might compete with the current target’s representation for control of shared feature units, increasing the likelihood (rate) at which these units fall out of synchrony.

Prior similar items may also dampen the fidelity of the target representation to begin with, weakening the initial binding of its feature units and thus causing them to fall out of synchrony more quickly. In addition, poorly learned items might have fewer differentiating feature units, and these units may be less tightly bound and therefore more vulnerable to falling out of synchrony. This could explain why Keppel & Underwood (1962) found that poorly learned items resulted in retention-interval effects even on the first trial. It may also underlie the greater decay effects that McKone (1995, 1998) found for nonwords than for words, if one assumes that nonwords have fewer meaning-based units and connections.

A SUMMARY OF PRINCIPLES AND AN ILLUSTRATION OF SHORT-TERM MEMORY AT WORK

Here we summarize the principles of STM that seem best supported by the behavioral and neural evidence. Building on these principles, we offer a hypothetical sketch of the processes and neural structures that are engaged by a canonical STM task, the probe recognition task with distracting material.

Principles of Short-Term Memory

We have motivated our review by questions of structure, process, and forgetting. Rather than organize our summary this way, we wish to return here to the title of our review and consider what psychological and neural mechanisms seem best defended by empirical work. Because we have provided details about each of these issues in our main discussion, we summarize them here as bullet points. Taken together, they provide answers to our questions about structure, process, and forgetting.

The mind of short-term memory

• Representations in memory are composed of bundles of features for stored information, including features representing the context in which that information was encountered.
• Representations in memory vary in activation, with a dormant state characterizing long-term memories, and varying states of activation due to recent perceptions or retrievals of those representations.
• There is a focus of attention in which a bound collection of information may be held in a state that makes it immediately available for cognitive action. Attention may be focused on only a single chunk of information at a time, where a chunk is defined as a set of items that are bound by a common functional context.
• Items may enter the focus of attention via perceptual encoding or via cue-based retrieval from LTM.
• Items are maintained in the focus via a controlled process of maintenance, with rehearsal being a case of controlled sequential allocation of attentional focus.
• Forgetting occurs when items leave the focus of attention and must compete with other items to regain the focus (interference), or when the fidelity of the representation declines over time due to stochastic processes (decay).

The brain of short-term memory

• Items in the focus of attention are represented by patterns of heightened, synchronized firing of neurons in primary and secondary association cortex.
• The sensorimotor features of items in the focus of attention or those in a heightened state of activation are the same as those activated by perception or action. Information within a representation is associated with the cortical region that houses it (e.g., verbal, spatial, motor). In short, item representations are stored where they are processed.
• Medial temporal structures are important for binding items to their context for both the short- and long-term and for retrieving items whose context is no longer in the focus of attention or not yet fully consolidated in the neocortex.
• The capacity to focus attention is constrained by parietal and frontal mechanisms that modulate processing as well as by increased noise in the neural patterns arising from similarity-based interference or from stochastic variability in firing.
• Frontal structures support controlled processes of retrieval and interference resolution.
• Placing an item into the focus of attention from LTM involves reactivating the representation that is encoded in patterns of neural connection weights.
• Decay arises from the inherent variability of the neural firing of feature bundles that build a representation: The likelihood that the firing of multiple features will fall out of synchrony increases with time due to stochastic variability.

A Sketch of Short-Term Memory at Work

The theoretical principles outlined above summarize our knowledge of the psychological and neural bases of STM, but further insight can be gained by attempting to see how these mechanisms might work together, moment-by-moment, to accomplish the demands of simple tasks. We believe that working through an illustration will not only help to clarify the nature of the proposed mechanisms, but it may also lead to a picture of STM that is more detailed in its bridging of neural process and psychological function.

Toward these ends, we present here a specific implementation of the principles that allows us to give a description of the mechanisms that might be engaged at each point in a simple visual STM task. This exercise leads us to a view of STM that is heavily grounded in concepts of neural activation and plasticity. More specifically, we complement the assumptions about cognitive and brain function above with simple hypotheses about the relative supporting roles of neuronal firing and plasticity (described below). Although somewhat speculative in nature, this description is consistent with the summary principles, and it grounds the approach more completely in a plausible neural model. In particular, it has the virtue of providing an unbroken chain of biological mechanisms that supports the encoding of short-term memories over time.

Figure 1 traces the representation of one item in memory over the course of a few seconds in our hypothetical task. The cognitive events are demarcated at the top of the figure, and the task events at the bottom. In the hypothetical task, the subject must keep track of three visual items (such as novel shapes). The first item is presented for 700 milliseconds, followed by a delay of 2 seconds. The second stimulus then appears, followed by a delay of a few seconds, then the third stimulus, and another delay. Finally, the probe appears, and contact must be made with the memory for the first item. The assumption is that subjects will engage in a strategy of actively maintaining each item during the delay periods.

Figure 1

The processing and neural representation of one item in memory over the course of a few seconds in a hypothetical short-term memory task, assuming a simple single-item focus architecture. The cognitive events are demarcated at the top; the task events, at the bottom. The colored layers depict the extent to which different brain areas contribute to the representation of the item over time, at distinct functional stages of short-term memory processing. The colored layers also distinguish two basic types of neural representation: Solid layers depict memory supported by a coherent pattern of active neural firing, and hashed layers depict memory supported by changes in synaptic patterns. The example task requires processing and remembering three visual items; the figure traces the representation of the first item only. In this task, the three items are sequentially presented, and each is followed by a delay period. After the delay following the third item, a probe appears that requires retrieval of the first item. See the text for details corresponding to the numbered steps in the figure.

Before walking through the timeline in Figure 1 , let us take a high-level view. At any given time point, a vertical slice through the figure is intended to convey two key aspects of the neural basis of the memory. The first is the extent to which multiple cortical areas contribute to the representation of the item, as indicated by the colored layers corresponding to different cortical areas. The dynamic nature of the relative sizes of the layers captures several of our theoretical assumptions concerning the evolving contribution of those different areas at different functional stages of STM. The second key aspect is the distinction between memory supported by a coherent pattern of active neural firing (captured in solid layers) and memory supported by synaptic plasticity (captured in the hashed layers) ( Fuster 2003 , Grossberg 2003 , Rolls 2000 ). The simple hypothesis represented here is that perceptual encoding and active-focus maintenance are supported by neuronal firing, and memory of items outside the focus is supported by short-term synaptic plasticity ( Zucker & Regehr 2002 ). 3

We now follow the time course of the neural representation of the first item (in the order indicated by the numbers in the figure). ( 1 ) The stimulus is presented and rapidly triggers a coherent pattern of activity in posterior perceptual regions, representing both low-level visual features of the item content and its abstract identification in higher-level regions. ( 2 ) There is also a rapid onset of the representation of item-context binding (temporal context in our example) supported by the medial-temporal lobes (see section titled “Contesting the Idea of Separate Long-Term and Short-Term Systems”) ( Ranganath & Blumenfeld 2005 ). ( 3 ) Over the first few hundred milliseconds, this pattern increases in quality, yielding speed-accuracy tradeoffs in perceptual identification. ( 4 ) Concurrent with the active firing driven by the stimulus, very short-term synaptic plasticity across cortical areas begins to encode the item’s features and its binding to context. Zucker & Regehr (2002) identify at least three distinct plasticity mechanisms that begin to operate on this time scale (tens of milliseconds) and that together are sufficient to produce memories lasting several seconds. (For the use of this mechanism in a prominent neural network model of STM, see Burgess & Hitch 1999 , 2005 , 2006 .) ( 5 ) At the offset of the stimulus, the active firing pattern decays very rapidly (consistent with identified mechanisms of rapid decay in short-term potentiation; Zucker & Regehr 2002 ), but ( 6 ) active maintenance, mediated by increased activity in frontal and parietal systems, maintains the firing pattern during the delay period (see sections titled “The Architecture of Unitary-Store Models” and “Maintenance of Items in the Focus”) ( Pasternak & Greenlee 2005 , Ranganath 2006 , Ruchkin et al. 2003 ). This active delay firing includes sustained contribution of MTL to item-context binding (see section titled “Contesting the Idea of Separate Long-Term and Short-Term Systems”). Significant reduction in coherence of the firing pattern may occur as a result of stochastic drift as outlined above (in sections titled “What Happens Neurally During the Delay?” and “Interference-Based Decay?”), possibly leading to a kind of short-term decay during maintenance (see section titled “What Happens Neurally During the Delay?”) ( Fuster 1995 , Pasternak & Greenlee 2005 ). ( 7 ) The active maintenance involves the reuse of posterior perceptual regions in the service of the task demands on STM. This reuse includes even early perceptual areas, but we show here a drop in the contribution of primary perceptual regions to maintenance in order to indicate a relatively greater effect of top-down control on the later high-level regions ( Postle 2006 , Ranganath 2006 ). ( 8 ) During this delay period of active maintenance, short-term potentiation continues to lay down a trace of the item and its binding to context via connection weights both within and across cortical regions. The overall efficacy of this memory encoding is the result of the interaction of the possibly decaying active firing pattern with the multiple plasticity mechanisms and their individual facilitation and depression profiles ( Zucker & Regehr 2002 ).
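
The division of labor sketched in steps 1–8, and its continuation when the next item arrives, can be caricatured with a two-trace simulation: one variable stands for the coherent firing pattern (driven by the stimulus, held up by active maintenance, collapsing rapidly once the focus moves on) and the other for short-term synaptic potentiation (accumulating while firing is strong and decaying slowly thereafter). The timeline, rate constants, and units below are illustrative assumptions chosen only to mimic the qualitative shape of Figure 1, not fitted values.

```python
import numpy as np

dt = 0.05                 # seconds per simulation step
T = np.arange(0, 8, dt)   # 8 s spanning item 1, its delay, and later items

# Hypothetical schedule for item 1: perceived for 0.7 s, actively maintained
# during the 2 s delay, then out of the focus once item 2 appears.
perceived  = (T >= 0.0) & (T < 0.7)
maintained = (T >= 0.7) & (T < 2.7)

firing, synapse = 0.0, 0.0
firing_trace, synapse_trace = [], []
for i in range(len(T)):
    if perceived[i]:
        firing += (1.0 - firing) * 4.0 * dt   # stimulus-driven rise of the coherent firing pattern
    elif maintained[i]:
        firing += (0.9 - firing) * 1.0 * dt   # top-down maintenance holds firing near ceiling
    else:
        firing -= firing * 6.0 * dt           # rapid collapse of firing outside the focus
    # Short-term potentiation accumulates while firing is strong and decays slowly.
    synapse += (0.3 * firing - 0.15 * synapse) * dt
    firing_trace.append(firing)
    synapse_trace.append(synapse)

for t_probe in (0.5, 2.0, 3.5, 7.5):
    i = int(t_probe / dt)
    print(f"t = {t_probe:3.1f} s   firing = {firing_trace[i]:.2f}   synaptic trace = {synapse_trace[i]:.2f}")
```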

( 9 ) At the end of the delay period and the onset of the second stimulus, the focus rapidly shifts to the new stimulus, and the active firing of the neural pattern of the target stimulus ceases. ( 10 ) The memory of the item is now carried completely by the changed synaptic weights, but this change is partially disrupted by the incoming item and its engagement of a similar set of neural activity patterns. Cognitively, this disruption yields similarity-based retroactive interference (see “Other Mechanisms for Interference Effects?”) ( Nairne 2002 ). ( 11 ) Even in the absence of interference, a variety of biochemical processes give rise to the decay of short-term neural change and therefore the gradual loss of the memory trace over time. This pattern of interference and decay continues during processing of both the second and third stimulus. The probe triggers a rapid memory retrieval of the target item ( 12 ), mediated in part by strategic frontal control (see “Neural Mechanisms of Short- and Long-Term Memory Retrieval”) ( Cabeza et al. 2002 , Ranganath et al. 2004 ). This rapid retrieval corresponds to the reinstantiation of the target item’s firing pattern in both posterior perceptual areas ( 13 ) and medial-temporal regions, the latter supporting the contextual binding. A plausible neural mechanism for the recovery of this activity pattern at retrieval is the emergent pattern-completion property of attractor networks ( Hopfield 1982 ). Attractor networks depend on memories encoded in a pattern of connection weights, whose formation and dynamics we have sketched above in terms of short-term synaptic plasticity. Such networks also naturally give rise to the kind of similarity-based proactive interference clearly evident in STM retrieval (see “Selection-Based Interference Effects”) ( Jonides & Nee 2006 , Keppel & Underwood 1962 ).
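
The pattern-completion step can likewise be illustrated with a bare-bones Hopfield-style attractor network (Hopfield 1982): a few binary patterns are stored in a symmetric weight matrix via a Hebbian outer-product rule, and a degraded cue settles back toward the nearest stored pattern. The network size, number of patterns, and noise level below are arbitrary choices; the point is only that retrieval from weights, and proactive interference when a cue resembles several stored patterns, fall naturally out of this kind of storage.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_patterns = 64, 3

# Store a few random +/-1 patterns with the Hebbian outer-product rule.
patterns = rng.choice([-1, 1], size=(n_patterns, n_units))
W = np.zeros((n_units, n_units))
for p in patterns:
    W += np.outer(p, p)
np.fill_diagonal(W, 0)

def settle(state, steps=10):
    """Asynchronous updates: each unit takes the sign of its weighted input."""
    state = state.copy()
    for _ in range(steps):
        for i in rng.permutation(n_units):
            state[i] = 1 if W[i] @ state >= 0 else -1
    return state

# Cue = the first stored pattern with 30% of its units flipped (a noisy probe).
cue = patterns[0].copy()
flip = rng.choice(n_units, size=int(0.3 * n_units), replace=False)
cue[flip] *= -1

recovered = settle(cue)
overlaps = patterns @ recovered / n_units   # +1.0 means a perfect match
print("overlap of retrieved state with each stored pattern:", np.round(overlaps, 2))
```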

We have intentionally left underspecified a precise quantitative interpretation of the y-axis in Figure 1. Psychologically, it perhaps corresponds to a combination of availability (largely driven by the dichotomous nature of the focus state) and accessibility (driven by a combination of both firing and plasticity). Neurally, it perhaps corresponds to some joint measure of actual and potential firing amplitude and coherence.

We are clearly a long way from generating something like the plot in Figure 1 from neuroimaging data on actual tasks—though plots of event-related potentials in STM tasks give us an idea of what these data may look like (Ruchkin et al. 2003). There no doubt is more missing from Figure 1 than is included (e.g., the role of subcortical structures such as the basal ganglia in frontal/parietal-mediated control, or the reciprocal cortical-thalamic circuits that shape the nature of the neocortical patterns). We nevertheless believe that the time course sketched in Figure 1 is useful for making clear many of the central properties that characterize the psychological and neural theory of human STM outlined above: (a) STM engages essentially all cortical areas—including medial temporal lobes—and does so from the earliest moments, though it engages these areas differentially at different functional stages. (b) STM reuses the same posterior cortical areas and representations that subserve perception, and active maintenance of these representations depends on these posterior areas receiving input from frontal-parietal circuits. (c) Focused items are distinguished both functionally and neurally by active firing patterns, and nonfocused memories depend on synaptic potentiation and thereby suffer from decay and retroactive interference. (d) Nonfocused memories are reinstantiated into active firing states via an associative retrieval process subject to proactive interference from similarly encoded patterns.

Postscript: Revisiting Complex Cognition

A major goal of this review has been to bring together psychological theorizing (the mind) and neuroscientific evidence (the brain) of STM. However, any celebration of this union is premature until we address this question: Can our account explain how the mind and brain accomplish the everyday tasks (e.g., completing a tax form) that opened this review? The recognition probe task used in our example and the other procedures discussed throughout the main text are considerably simpler than those everyday tasks. Is it plausible to believe that the system outlined here, particularly in light of its severely limited capacity, could support human cognition in the wild?

It is sobering to note that Broadbent (1993) and Newell (1973 , 1990) asked this question nearly two decades ago, and at that time they were considering models of STM with even larger capacities than the one advocated here. Even so, both observed that none of the extant computational models of complex cognitive tasks (e.g., the Newell & Simon 1972 models of problem solving) used contemporary psychological theories of STM. Instead, the complex-cognition models assumed much larger (in some cases, effectively unlimited) working memories. The functional viability of the STM theories of that time was thus never clearly demonstrated. Since then, estimates of STM capacity have only grown smaller, so the question, it would seem, has grown correspondingly more pressing.

Fortunately, cognitive modeling and cognitive theory have also developed over that time, and in ways that would have pleased both Broadbent and Newell. Importantly, many computational cognitive architectures now make assumptions about STM capacity that are congruent with the STM models discussed in this review. The most prominent example is ACT-R, a descendant of the early Newell production-system models. ACT-R continues to serve as the basis of computational models of problem solving (e.g., Anderson & Douglass 2001), sentence processing (Lewis & Vasishth 2005, Lewis et al. 2006), and complex interactive tasks (Anderson et al. 2004). However, the current version of ACT-R has a focus-based structure with an effective capacity limit of four or fewer items (Anderson et al. 2004).

Another important theoretical development is the long-term working memory approach of Ericsson & Kintsch (1995) . This approach describes how LTM, using the kind of fast-encoding and cue-based associative retrieval processes assumed here, can support a variety of complex cognitive tasks ranging from discourse comprehension to specialized expert performance. In both the modern approaches to computational architecture and long-term working memory, the power of cognition resides not in capacious short-term buffers but rather in the effective use of an associative LTM. A sharply limited focus of attention does not, after all, seem to pose insurmountable functional problems.

In summary, this review describes the still-developing convergence of computational models of complex cognition, neural network models of simple memory tasks, modern psychological studies of STM, and neural studies of memory in both humans and primates. The points of contact among these different methods of studying STM have multiplied over the past several years. As we have pointed out, significant and exciting challenges in furthering this integration lie ahead.

1 Another line of neural evidence about the separability of short- and long-term memory comes from electrophysiological studies of animals engaged in short-term memory tasks. We review this evidence and its interpretation in The Architecture of Unitary-Store Models section.

2 This carving up of STM processes is also consistent with recent approaches to individual differences in working memory, which characterize individual variation not in terms of variation in buffer capacity, but rather in variation in maintenance and retrieval processes ( Unsworth & Engle 2007 ).

3 The alternative to this strong claim is that memory items outside the focus might also be supported by residual active firing. The empirical results reviewed above indicating load-dependent posterior activation might lend support to this alternative if one assumes that the memory load in those experiments was not entirely held in the focus, and that these activations exclusively index firing associated with the memory load itself.

LITERATURE CITED

• Altmann EM, Gray WD. Forgetting to remember: the functional relationship of decay and interference. Psychol. Sci. 2002;13(1):27–33.
• Anderson JR. Retrieval of information from long-term memory. Science. 1983;220(4592):25–30.
• Anderson JR, Bothell D, Byrne MD, Douglass S, Lebiere C, Qin Y. An integrated theory of mind. Psychol. Rev. 2004;111:1036–1060.
• Anderson JR, Douglass S. Tower of Hanoi: evidence for the cost of goal retrieval. J. Exp. Psychol.: Learn. Mem. Cogn. 2001;27:1331–1346.
• Anderson JR, Matessa M. A production system theory of serial memory. Psychol. Rev. 1997;104(4):728–748.
• Anderson JR, Schooler LJ. Reflections of the environment in memory. Psychol. Sci. 1991;2(6):396–408.
• Atkinson RC, Shiffrin RM. The control of short-term memory. Sci. Am. 1971;224:82–90.
• Awh E, Barton B, Vogel EK. Visual working memory represents a fixed number of items regardless of complexity. Psychol. Sci. 2007;18(7):622–628.
• Awh E, Jonides J. Overlapping mechanisms of attention and spatial working memory. Trends Cogn. Sci. 2001;5(3):119–126.
• Awh E, Jonides J, Reuter-Lorenz PA. Rehearsal in spatial working memory. J. Exp. Psychol.: Hum. Percept. Perform. 1998;24:780–790.
• Awh E, Jonides J, Smith EE, Buxton RB, Frank LR, et al. Rehearsal in spatial working memory: evidence from neuroimaging. Psychol. Sci. 1999;10(5):433–437.
• Awh E, Jonides J, Smith EE, Schumacher EH, Koeppe RA, Katz S. Dissociation of storage and rehearsal in verbal working memory: evidence from PET. Psychol. Sci. 1996;7:25–31.
• Baddeley AD. Working Memory. Oxford: Clarendon; 1986.
• Baddeley AD. Working memory. Science. 1992;225:556–559.
• Baddeley AD. The episodic buffer: a new component of working memory? Trends Cogn. Sci. 2000;4(11):417–423.
• Baddeley AD. Working memory: looking back and looking forward. Nat. Rev. Neurosci. 2003;4(10):829–839.
• Baddeley AD, Hitch G. Working memory. In: Bower GA, editor. Recent Advances in Learning and Motivation. Vol. 8. New York: Academic; 1974. pp. 47–90.
• Baddeley AD, Thomson N, Buchanan M. Word length and structure of short-term memory. J. Verbal Learn. Verbal Behav. 1975;14(6):575–589.
• Baddeley AD, Vallar G, Wilson BA. Sentence comprehension and phonological working memory: some neuropsychological evidence. In: Coltheart M, editor. Attention and Performance XII: The Psychology of Reading. London: Erlbaum; 1987. pp. 509–529.
• Baddeley AD, Warrington EK. Amnesia and the distinction between long- and short-term memory. J. Verbal Learn. Verbal Behav. 1970;9:176–189.
• Baddeley AD, Wilson BA. Prose recall and amnesia: implications for the structure of working memory. Neuropsychologia. 2002;40:1737–1743.
• Badre D, Wagner AD. Frontal lobe mechanisms that resolve proactive interference. Cereb. Cortex. 2005;15:2003–2012.
• Braver TS, Barch DM, Kelley WM, Buckner RL, Cohen NJ, et al. Direct comparison of prefrontal cortex regions engaged by working and long-term memory tasks. Neuroimage. 2001;14:48–59.
• Broadbent D. Comparison with human experiments. In: Broadbent D, editor. The Simulation of Human Intelligence. Oxford: Blackwell Sci; 1993. pp. 198–217.
• Brooks LR. Spatial and verbal components of the act of recall. Can. J. Psychol. 1968;22:349–368.
• Brown GDA, Preece T, Hulme C. Oscillator-based memory for serial order. Psychol. Rev. 2000;107(1):127–181.
• Buckner RL, Koutstaal W, Schacter DL, Wagner AD, Rosen BR. Functional-anatomic study of episodic retrieval using fMRI: I. Retrieval effort versus retrieval success. NeuroImage. 1998;7(3):151–162.
• Buffalo EA, Reber PJ, Squire LR. The human perirhinal cortex and recognition memory. Hippocampus. 1998;8:330–339.
• Burgess N, Hitch GJ. Memory for serial order: a network model of the phonological loop and its timing. Psychol. Rev. 1999;106(3):551–581.
• Burgess N, Hitch GJ. Computational models of working memory: putting long-term memory into context. Trends Cogn. Sci. 2005;9:535–541.
• Burgess N, Hitch GJ. A revised model of short-term memory and long-term learning of verbal sequences. J. Mem. Lang. 2006;55:627–652.
• Cabeza R, Dolcos F, Graham R, Nyberg L. Similarities and differences in the neural correlates of episodic memory retrieval and working memory. Neuroimage. 2002;16:317–330.
• Cabeza R, Nyberg L. Imaging cognition II: an empirical review of 275 PET and fMRI studies. J. Cogn. Neurosci. 2000;9:254–265.
• Cave CB, Squire LR. Intact verbal and nonverbal short-term memory following damage to the human hippocampus. Hippocampus. 1992;2:151–163.
• Cowan N. Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information processing system. Psychol. Bull. 1988;104:163–191.
• Cowan N. Attention and Memory: An Integrated Framework. New York: Oxford Univ. Press; 1995.
• Cowan N. The magical number 4 in short-term memory: a reconsideration of mental storage capacity. Behav. Brain Sci. 2000;24:87–185.
• Crowder R. Principles of Learning and Memory. Hillsdale, NJ: Erlbaum; 1976.
• Damasio AR. Time-locked multiregional retroactivation: a system-level proposal for the neuronal substrates of recall and recognition. Cognition. 1989;33:25–62.
• Darwin CJ, Turvey MT, Crowder RG. Auditory analogue of Sperling partial report procedure—evidence for brief auditory storage. Cogn. Psychol. 1972;3(2):255–267.
• Deiber MP, Missonnier P, Bertrand O, Gold G, Fazio-Costa L, et al. Distinction between perceptual and attentional processing in working memory tasks: a study of phase-locked and induced oscillatory brain dynamics. J. Cogn. Neurosci. 2007;19(1):158–172.
• den Heyer K, Barrett B. Selective loss of visual and verbal information in STM by means of visual and verbal interpolated tasks. Psychon. Sci. 1971;25:100–102.
• D’Esposito M, Postle BR. The dependence of span and delayed-response performance on prefrontal cortex. Neuropsychologia. 1999;37(11):1303–1315.
• D’Esposito M, Postle BR. Neural correlates of processes contributing to working memory function: evidence from neuropsychological and pharmacological studies. In: Monsell S, Driver J, editors. Control of Cognitive Processes. Cambridge, MA: MIT Press; 2000. pp. 580–602.
• D’Esposito M, Postle BR, Jonides J, Smith EE, Lease J. The neural substrate and temporal dynamics of interference effects in working memory as revealed by event-related fMRI. Proc. Natl. Acad. Sci. USA. 1999;96:7514–7519.
• Eng HY, Chen DY, Jiang YH. Visual working memory for simple and complex visual stimuli. Psychon. Bull. Rev. 2005;12:1127–1133.
• Ericsson KA, Kintsch W. Long-term working memory. Psychol. Rev. 1995;102:211–245.
• Fletcher PC, Henson RNA. Frontal lobes and human memory—insights from functional neuroimaging. Brain. 2001;124:849–881.
• Frank MJ, Loughry B, O’Reilly RC. Interactions between the frontal cortex and basal ganglia in working memory: a computational model. Cogn. Affect. Behav. Neurosci. 2001;1:137–160.
• Funahashi S, Bruce CJ, Goldman-Rakic PS. Mnemonic coding of visual space in the monkey’s dorsolateral prefrontal cortex. J. Neurophysiol. 1989;61:331–349.
• Fuster JK. Thoughts from the long-term memory chair. Behav. Brain Sci. 2003;26:734–735.
• Fuster JM. Unit activity in prefrontal cortex during delayed response performance: neuronal correlates of transient memory. J. Neurophysiol. 1973;36:61–78.
• Fuster JM. Memory in the Cerebral Cortex. Cambridge, MA: MIT Press; 1995.
• Gabrieli JDE, Brewer JB, Desmond JE, Glover GH. Separate neural bases of two fundamental memory processes in the human medial temporal lobe. Science. 1997;276:264–266.
• Garavan H. Serial attention within working memory. Mem. Cogn. 1998;26:263–276.
• Gardiner JM, Craik FIM, Birtwist J. Retrieval cues and release from proactive inhibition. J. Verbal Learn. Verbal Behav. 1972;11(6):778–783.
• Gillund G, Shiffrin RM. A retrieval model for both recognition and recall. Psychol. Rev. 1984;91(1):1–67.
• Goldman-Rakic PS. Circuitry of primate pre-frontal cortex and regulation of behavior by representational memory. In: Plum F, editor. Handbook of Physiology: The Nervous System. Vol. 5. Bethesda, MD: Am. Physiol. Soc.; 1987. pp. 373–417.
• Grossberg S. From working memory to long-term memory and back: linked but distinct. Behav. Brain Sci. 2003;26:737–738.
• Hampson M, Driesen NR, Skudlarski P, Gore JC, Constable RT. Brain connectivity related to working memory performance. J. Neurosci. 2006;26(51):13338–13343.
• Hanley JR, Young AW, Pearson NA. Impairment of the visuo-spatial sketch pad. Q. J. Exp. Psychol. Hum. Exp. Psychol. 1991;43:101–125.
• Hannula DE, Tranel D, Cohen NJ. The long and the short of it: relational memory impairments in amnesia, even at short lags. J. Neurosci. 2006;26(32):8352–8359.
• Harris JD. The decline of pitch discrimination with time. J. Exp. Psychol. 1952;43(2):96–99.
• Hasher L, Zacks RT. Working memory, comprehension, and aging: a review and a new view. In: Bower GH, editor. The Psychology of Learning and Motivation. Vol. 22. New York: Academic; 1988. pp. 193–225.
• Hebb DO. The Organization of Behavior. New York: Wiley; 1949.
• Hockley WE. Analysis of response-time distributions in the study of cognitive-processes. J. Exp. Psychol.: Learn. Mem. Cogn. 1984;10(4):598–615.
• Holdstock JS, Shaw C, Aggleton JP. The performance of amnesic subjects on tests of delayed matching-to-sample and delayed matching-to-position. Neuropsychologia. 1995;33:1583–1596.
• Hopfield JJ. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA. 1982;79(8):2554–2558.
• Jacobsen CF. The functions of the frontal association areas in monkeys. Comp. Psychol. Monogr. 1936;13:1–60.
• James W. Principles of Psychology. New York: Henry Holt; 1890.
• Jensen O. Maintenance of multiple working memory items by temporal segmentation. Neuroscience. 2006;139:237–249.
• Jha AP, McCarthy G. The influence of memory load upon delay-interval activity in a working-memory task: an event-related functional MRI study. J. Cogn. Neurosci. 2000;12:90–105.
• Johnson MK, Raye CL, Mitchell KJ, Greene EJ, Cunningham WA, Sanislow CA. Using fMRI to investigate a component process of reflection: prefrontal correlates of refreshing a just-activated representation. Cogn. Affect. Behav. Neurosci. 2005;5:339–361.
• Jonides J, Lacey SC, Nee DE. Processes of working memory in mind and brain. Curr. Dir. Psychol. Sci. 2005;14:2–5.
• Jonides J, Nee DE. Brain mechanisms of proactive interference in working memory. Neuroscience. 2006;139:181–193.
• Jonides J, Smith EE, Koeppe RA, Awh E, Minoshima S, Mintun MA. Spatial working memory in humans as revealed by PET. Nature. 1993;363:623–625.
• Jonides J, Smith EE, Marshuetz C, Koeppe RA, Reuter-Lorenz PA. Inhibition in verbal working memory revealed by brain activation. Proc. Natl. Acad. Sci. USA. 1998;95:8410–8413.
• Keppel G, Underwood BJ. Proactive-inhibition in short-term retention of single items. J. Verbal Learn. Verbal Behav. 1962;1:153–161.
• Lange EB, Oberauer K. Overwriting of phonemic features in serial recall. Memory. 2005;13:333–339.
• Lewandowsky S, Duncan M, Brown GDA. Time does not cause forgetting in short-term serial recall. Psychon. Bull. Rev. 2004;11:771–790.
• Lewis RL, Vasishth S. An activation-based theory of sentence processing as skilled memory retrieval. Cogn. Sci. 2005;29:375–419.
• Lewis RL, Vasishth S, Van Dyke J. Computational principles of working memory in sentence comprehension. Trends Cogn. Sci. 2006;10:447–454.
• Lisman JE, Idiart MAP. Storage of 7+/−2 short-term memories in oscillatory subcycles. Science. 1995;267:1512–1515.
• Luck SJ, Vogel EK. The capacity of visual working memory for features and conjunctions. Nature. 1997;390:279–281.
• Lustig C, Matell MS, Meck WH. Not “just” a coincidence: frontal-striatal interactions in working memory and interval timing. Memory. 2005;13:441–448.
• Malmo RB. Interference factors in delayed response in monkeys after removal of frontal lobes. J. Neurophysiol. 1942;5:295–308.
• Manoach DS, Greve DN, Lindgren KA, Dale AM. Identifying regional activity associated with temporally separated components of working memory using event-related functional MRI. NeuroImage. 2003;20(3):1670–1684.
• Martin RC. Short-term memory and sentence processing: evidence from neuropsychology. Mem. Cogn. 1993;21:176–183.
• McClelland JL, McNaughton BL, O’Reilly RC. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychol. Rev. 1995;102:419–457.
• McElree B. Attended and nonattended states in working memory: accessing categorized structures. J. Mem. Lang. 1998;38:225–252.
• McElree B. Working memory and focal attention. J. Exp. Psychol.: Learn. Mem. Cogn. 2001;27:817–835.
• McElree B. Accessing recent events. Psychol. Learn. Motiv. 2006;46:155–200.
• McElree B, Dosher BA. Serial position and set size in short-term memory: time course of recognition. J. Exp. Psychol.: Gen. 1989;118:346–373.
• McGeoch J. Forgetting and the law of disuse. Psychol. Rev. 1932;39:352–370.
• McKone E. Short-term implicit memory for words and non-words. J. Exp. Psychol.: Learn. Mem. Cogn. 1995;21(5):1108–1126.
• McKone E. The decay of short-term implicit memory: unpacking lag. Mem. Cogn. 1998;26(6):1173–1186.
• Meyer DE, Kieras DE. A computational theory of executive cognitive processes and multiple-task performance: 1. Basic mechanisms. Psychol. Rev. 1997;104(1):3–65.
• Miller GA. The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol. Rev. 1956;63:81–97.
• Milner PM. Magical attention. Behav. Brain Sci. 2001;24(1):131.
  • Miyashita Y, Chang HS. Neuronal correlate of pictorial short-term memory in the primate temporal cortex. Nature. 1968; 331 :68–70. [ PubMed ] [ Google Scholar ]
  • Monsell S. Recency, immediate recognition memory, and reaction-time. Cogn. Psychol. 1978; 10 (4):465–501. [ Google Scholar ]
  • Murdock BB. A theory for the storage and retrieval of item and associative information. Psychol. Rev. 1982; 89 (6):609–626. [ PubMed ] [ Google Scholar ]
  • Nairne JS. Remembering over the short-term: the case against the standard model. Annu. Rev. Psychol. 2002; 53 :53–81. [ PubMed ] [ Google Scholar ]
  • Neath I, Nairne JS. Word-length effects in immediate memory: overwriting trace decay theory. Psychon. Bull. Rev. 1995; 2 :429–441. [ PubMed ] [ Google Scholar ]
  • Nelson JK, Reuter-Lorenz PA, Sylvester CYC, Jonides J, Smith EE. Dissociable neural mechanisms underlying response-based and familiarity-based conflict in working memory. Proc. Natl. Acad. Sci. USA. 2003; 100 :11171–11175. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Newell A. You can’t play 20 questions with nature and win: projective comments on the papers of this symposium. In: Chase WG, editor. Visual Information Processing; Academic; New York. 1973. pp. 283–310. [ Google Scholar ]
  • Newell A. Unified Theories of Cognition. Cambridge, MA: Harvard Univ. Press; 1990. [ Google Scholar ]
  • Newell A, Simon H. Human Problem Solving. Englewood Cliffs, NJ: Prentice-Hall; 1972. [ Google Scholar ]
  • Nichols EA, Kao Y-C, Verfaellie M, Gabrieli JDE. Working memory and long-term memory for faces: evidence from fMRI and global amnesia for involvement of the medial temporal lobes. Hippocampus. 2006; 16 :604–616. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Oberauer K. Access to information in working memory: exploring the focus of attention. J. Exp. Psychol.: Learn. Mem. Cogn. 2002; 28 :411–421. [ PubMed ] [ Google Scholar ]
  • Oberauer K. Is the focus of attention in working memory expanded through practice? J. Exp. Psychol.: Learn. Mem. Cogn. 2006; 32 :197–214. [ PubMed ] [ Google Scholar ]
  • Oberauer K, Kliegl R. A formal model of capacity limits in working memory. J. Mem. Lang. 2006; 55 :601–626. [ Google Scholar ]
  • Olson IR, Moore KS, Stark M, Chatterjee A. Visual working memory is impaired when the medial temporal lobe is damaged. J. Cogn. Neurosci. 2006a; 18 :1087–1097. [ PubMed ] [ Google Scholar ]
  • Olson IR, Page K, Moore KS, Chatterjee A, Verfaellie M. Working memory for conjunctions relies on the medial temporal lobe. J. Neurosci. 2006b; 26 :4596–4601. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Owen AM, Sahakian BJ, Semple J, Polkey CE, Robbins TW. Visuo-spatial short-term recognition memory and learning after temporal lobe excisions, frontal lobe excisions or amygdala-hippocampectomy in man. Neuropsychologia. 1995; 33 :1–24. [ PubMed ] [ Google Scholar ]
  • Pashler H. Familiarity and visual change detection. Percept. Psychophys. 1988; 44 :369–378. [ PubMed ] [ Google Scholar ]
  • Pasternak T, Greenlee MW. Working memory in primate sensory systems. Nat. Rev. Neurosci. 2005; 6 :97–107. [ PubMed ] [ Google Scholar ]
  • Polk TA, Simen P, Lewis RL, Freedman E. A computational approach to control in complex cognition. Cogn. Brain Res. 2002; 15 (1):71–83. [ PubMed ] [ Google Scholar ]
  • Postle BR. Working memory as an emergent property of the mind and brain. Neuroscience. 2006; 139 :23–38. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Postle BR, D’Esposito M. “What”—then—“where” in visual working memory: an event-related, fMRI study. J. Cogn. Neurosci. 1999; 11 (6):585–597. [ PubMed ] [ Google Scholar ]
  • Postman L. Extra-experimental interference and retention of words. J. Exp. Psychol. 1961; 61 (2):97–110. [ PubMed ] [ Google Scholar ]
  • Prabhakaran V, Narayanan ZZ, Gabrieli JDE. Integration of diverse information in working memory within the frontal lobe. Nat. Neurosci. 2000; 3 :85–90. [ PubMed ] [ Google Scholar ]
  • Pylyshyn ZW. Some primitive mechanisms of spatial attention. Cognition. 1994; 50 :363–384. [ PubMed ] [ Google Scholar ]
  • Pylyshyn ZW, Burkell J, Fisher B, Sears C, Schmidt W, Trick L. Multiple parallel access in visual-attention. Can. J. Exp. Psychol. Rev. Can. Psychol. Exp. 1994; 48 (2):260–283. [ PubMed ] [ Google Scholar ]
  • Raichle ME, MacLeod AM, Snyder AZ, Powers WJ, Gusnard DA, Shulman GL. A default mode of brain function. Proc. Natl. Acad. Sci. USA. 2001; 98 :676–682. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Ranganath C. Working memory for visual objects: complementary roles of inferior temporal, medial temporal, and prefrontal cortex. Neuroscience. 2006; 139 :277–289. [ PubMed ] [ Google Scholar ]
  • Ranganath C, Blumenfeld RS. Doubts about double dissociations between short- and long-term memory. Trends Cogn. Sci. 2005; 9 :374–380. [ PubMed ] [ Google Scholar ]
  • Ranganath C, DeGutis J, D’Esposito M. Category-specific modulation of inferior temporal activity during working memory encoding and maintenance. Cogn. Brain Res. 2004; 20 :37–45. [ PubMed ] [ Google Scholar ]
  • Ranganath C, D’Esposito M. Medial temporal lobe activity associated with active maintenance of novel information. Neuron. 2001; 31 :865–873. [ PubMed ] [ Google Scholar ]
  • Ranganath C, D’Esposito M. Directing the mind’s eye: prefrontal, inferior and medial temporal mechanisms for visual working memory. Curr. Opin. Neurobiol. 2005; 15 :175–182. [ PubMed ] [ Google Scholar ]
  • Ranganath C, Johnson MK, D’Esposito M. Prefrontal activity associated with working memory and episodic long-term memory. Neuropsychologia. 2003; 41 (3):378–389. [ PubMed ] [ Google Scholar ]
  • Renart A, Parga N, Rolls ET. Backward projections in the cerebral cortex: implications for memory storage. Neural Comput. 1999; 11 (6):1349–1388. [ PubMed ] [ Google Scholar ]
  • Repov G, Baddeley AD. The multi-component model of working memory: explorations in experimental cognitive psychology. Neuroscience. 2006; 139 :5–21. [ PubMed ] [ Google Scholar ]
  • Reuter-Lorenz PA, Jonides J. The executive is central to working memory: insights from age performance and task variations. In: Conway AR, Jarrold C, Kane MJ, Miyake A, Towse JN, editors. Variations in Working Memory. London/New York: Oxford Univ. Press: 2007. pp. 250–270. [ Google Scholar ]
  • Reuter-Lorenz PA, Jonides J, Smith EE, Hartley A, Miller A, et al. Age differences in the frontal lateralization of verbal and spatial working memory revealed by PET. J. Cogn. Neurosci. 2000; 12 :174–187. [ PubMed ] [ Google Scholar ]
  • Roediger HL, Knight JL, Kantowitz BH. Inferring decay in short-term-memory—the issue of capacity. Mem. Cogn. 1977; 5 (2):167–176. [ PubMed ] [ Google Scholar ]
  • Rolls ET. Memory systems in the brain. Annu. Rev. Psychol. 2000; 51 :599–630. [ PubMed ] [ Google Scholar ]
  • Rougier NP, Noelle DC, Braver TS, Cohen JD, O’Reilly RC. Prefrontal cortex and flexible cognitive control: rules without symbols. Proc. Natl. Acad. Sci. USA. 2005; 102 (20):7338–7343. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Ruchkin DS, Grafman J, Cameron K, Berndt RS. Working memory retention systems: a state of activated long-term memory. Behav. Brain Sci. 2003; 26 :709–777. [ PubMed ] [ Google Scholar ]
  • Sakai K. Reactivation of memory: role of medial temporal lobe and prefrontal cortex. Rev. Neurosci. 2003; 14 (3):241–252. [ PubMed ] [ Google Scholar ]
  • Schubert T, Frensch PA. How unitary is the capacity-limited attentional focus? Behav. Brain Sci. 2001; 24 (1):146. [ Google Scholar ]
  • Scoville WB, Milner B. Loss of recent memory after bilateral hippocampal lesions. J. Neurol. Neurosurg. Psychiatry. 1957; 20 :11–21. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Shallice T, Warrington EK. Independent functioning of verbal memory stores: a neuropsychological study. Q. J. Exp. Psychol. 1970; 22 :261–273. [ PubMed ] [ Google Scholar ]
  • Smith EE, Jonides J. Working memory: a view from neuroimaging. Cogn. Psychol. 1997; 33 :5–42. [ PubMed ] [ Google Scholar ]
  • Smith EE, Jonides J. Neuroscience—storage and executive processes in the frontal lobes. Science. 1999; 283 :1657–1661. [ PubMed ] [ Google Scholar ]
  • Smith EE, Jonides J, Koeppe RA, Awh E, Schumacher EH, Minoshima S. Spatial vs object working-memory: PET investigations. J. Cogn. Neurosci. 1995; 7 :337–356. [ PubMed ] [ Google Scholar ]
  • Sperling G. The information available in brief visual presentations. Psychol. Monogr. 1960; 74 Whole No. 498. [ Google Scholar ]
  • Squire L. Memory and the hippocampus: a synthesis from findings with rats, monkeys, and humans. Psychol. Rev. 1992; 99 :195–231. [ PubMed ] [ Google Scholar ]
  • Sternberg S. High-speed scanning in human memory. Science. 1966; 153 :652–654. [ PubMed ] [ Google Scholar ]
  • Talmi D, Grady CL, Goshen-Gottstein Y, Moscovitch M. Neuroimaging the serial position curve. Psychol. Sci. 2005; 16 :716–723. [ PubMed ] [ Google Scholar ]
  • Thompson-Schill SL, D’Esposito M, Aguirre GK, Farah MJ. Role of left inferior prefrontal cortex in retrieval of semantic knowledge: a reevaluation. Proc. Natl. Acad. Sci. USA. 1997; 94 :14792–14797. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Thompson-Schill SL, Jonides J, Marshuetz C, Smith EE, D’Esposito M, et al. Effects of frontal lobe damage on interference effects in working memory. J. Cogn. Affect. Behav. Neurosci. 2002; 2 :109–120. [ PubMed ] [ Google Scholar ]
  • Todd JJ, Marois R. Capacity limit of visual short-term memory in human posterior parietal cortex. Nature. 2004; 428 (6984):751–754. [ PubMed ] [ Google Scholar ]
  • Todd JJ, Marois R. Posterior parietal cortex activity predicts individual differences in visual short-term memory capacity. Cogn. Affect. Behav. Neurosci. 2005; 5 :144–155. [ PubMed ] [ Google Scholar ]
  • Trick LM, Pylyshyn ZW. What enumeration studies can show us about spatial attention—evidence for limited capacity preattentive processing. J. Exp. Psychol.: Hum. Percept. Perform. 1993; 19 (2):331–351. [ PubMed ] [ Google Scholar ]
  • Ungerleider LG, Haxby JV. “What” and “where” in the human brain. Curr. Opin. Neurobiol. 1994; 4 :157–165. [ PubMed ] [ Google Scholar ]
  • Unsworth N, Engle RW. The nature of individual differences in working memory capacity: active maintenance in primary memory and controlled search from secondary memory. Psychol. Rev. 2007; 114 :104–132. [ PubMed ] [ Google Scholar ]
  • Vallar G, Baddeley AD. Fractionation of working memory: neuropsychological evidence for a phonological short-term store. J. Verbal Learn. Verbal Behav. 1984; 23 :151–161. [ Google Scholar ]
  • Vallar G, Papagno C. Neuropsychological impairments of verbal short-term memory. In: Baddeley AD, Kopelman MD, Wilson BA, editors. The Handbook of Memory Disorders. 2nd ed. Chichester, UK: Wiley; 2002. pp. 249–270. [ Google Scholar ]
  • Verhaeghen P, Basak C. Aging and switching of the focus of attention in working memory: results from a modified N-Back task. Q. J. Exp. Psychol. A. 2007 In press. [ PubMed ] [ Google Scholar ]
  • Verhaeghen P, Cerella J, Basak C. A working memory workout: how to expand the focus of serial attention from one to four items in 10 hours or less. J. Exp. Psychol.: Learn. Mem. Cogn. 2004; 30 :1322–1337. [ PubMed ] [ Google Scholar ]
  • Vogel EK, Machizawa MG. Neural activity predicts individual differences in visual working memory capacity. Nature. 2004; 426 :748–751. [ PubMed ] [ Google Scholar ]
  • Vogel EK, Woodman GF, Luck SJ. The time course of consolidation in visual working memory. J. Exp. Psychol.: Hum. Percept. Perform. 2006; 32 :1436–1451. [ PubMed ] [ Google Scholar ]
  • Wager TD, Smith EE. Neuroimaging studies of working memory: a meta-analysis. Neuroimage. 2003; 3 :255–274. [ PubMed ] [ Google Scholar ]
  • Warrington EK, Shallice T. The selective impairment of auditory verbal short-term memory. Brain. 1969; 92 :885–896. [ PubMed ] [ Google Scholar ]
  • Waugh NC, Norman DA. Primary memory. Psychol. Rev. 1965; 72 :89–104. [ PubMed ] [ Google Scholar ]
  • Wheeler ME, Peterson SE, Buckner RL. Memory’s echo: vivid remembering reactivates sensory-specific cortex. Proc. Natl. Acad. Sci. USA. 2000; 97 (20):11125–11129. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Wheeler ME, Shulman GL, Buckner RL, Miezin FM, Velanova K, Petersen SE. Evidence for separate perceptual reactivation and search processes during remembering. Cereb. Cortex. 2006; 16 (7):949–959. [ PubMed ] [ Google Scholar ]
  • Wickens DD. Encoding categories of words—empirical approach to meaning. Psychol. Rev. 1970; 77 :1–15. [ Google Scholar ]
  • Wilken P, Ma WJ. A detection theory account of change detection. J. Vis. 2004; 4 :1120–1135. [ PubMed ] [ Google Scholar ]
  • Wilson FAW, O’Scalaidhe SP, Goldman-Rakic PS. Dissociation of object and spatial processing domains in primate prefrontal cortex. Science. 1993; 260 :1955–1958. [ PubMed ] [ Google Scholar ]
  • Wixted JT. The psychology and neuroscience of forgetting. Annu. Rev. Psychol. 2004; 55 :235–269. [ PubMed ] [ Google Scholar ]
  • Woodman GF, Vogel EK, Luck SJ. Attention is not unitary. Behav. Brain Sci. 2001; 24 (1):153. [ Google Scholar ]
  • Xu YD, Chun MM. Dissociable neural mechanisms supporting visual short-term memory for objects. Nature. 2006; 440 :91–95. [ PubMed ] [ Google Scholar ]
  • Yantis S, Serences JT. Cortical mechanisms of space-based and object-based attentional control. Curr. Opin. Neurobiol. 2003; 13 :187–193. [ PubMed ] [ Google Scholar ]
  • Zhang D, Zhang X, Sun X, Li Z, Wang Z, et al. Cross-modal temporal order memory for auditory digits and visual locations: an fMRI study. Hum. Brain Mapp. 2004; 22 :280–289. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Zucker RS, Regehr WG. Short-term synaptic plasticity. Annu. Rev. Physiol. 2002; 64 :355–405. [ PubMed ] [ Google Scholar ]


Title: Short-Term Electricity-Load Forecasting by Deep Learning: A Comprehensive Survey

Abstract: Short-Term Electricity-Load Forecasting (STELF) refers to the prediction of the immediate demand (in the next few hours to several days) for the power system. Various external factors, such as weather changes and the emergence of new electricity consumption scenarios, can impact electricity demand, causing load data to fluctuate and become non-linear, which increases the complexity and difficulty of STELF. In the past decade, deep learning has been applied to STELF, modeling and predicting electricity demand with high accuracy, and contributing significantly to the development of STELF. This paper provides a comprehensive survey on deep-learning-based STELF over the past ten years. It examines the entire forecasting process, including data pre-processing, feature extraction, deep-learning modeling and optimization, and results evaluation. This paper also identifies some research challenges and potential research directions to be further investigated in future work.
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)


Open access | Published: 03 September 2024

Short-term air quality prediction based on EMD-transformer-BiLSTM

Jie Dong 1,2, Yaoli Zhang 2 & Jiang Hu 3

Scientific Reports, volume 14, Article number: 20513 (2024)


  • Engineering
  • Environmental sciences
  • Mathematics and computing

Actual acquired air quality time series data are highly volatile and nonstationary, and accurately predicting nonlinear time series data containing complex noise is an ongoing challenge. This paper proposes an air quality prediction method based on empirical mode decomposition (EMD), a transformer and a bidirectional long short-term memory neural network (BiLSTM), which is good at addressing the ultrashort-term prediction of nonlinear time-series data and shows good performance for application to the air quality dataset of Patna, India (6:00 am on October 3, 2015–0:00 pm on July 1, 2020). The AQI sequence is first decomposed into intrinsic mode functions (IMFs) via EMD and subsequently predicted separately via the improved transformer algorithm based on BiLSTM, where linear prediction is performed for IMFs with simple trends. Finally, the predicted values of each IMF are integrated using BiLSTM to obtain the predicted AQI values. This paper predicts the AQI in Patna with a time window of 5 h, and the RMSE, MAE and MAPE are as low as 5.6853, 2.8230 and 2.23%, respectively. Moreover, the scalability of the proposed model is validated on air quality datasets from several other cities, and the results prove that the proposed hybrid model has high performance and broad application prospects in real-time air quality prediction.


Introduction

In recent years, global air quality problems have become increasingly serious. According to the World Meteorological Organization, if greenhouse gas emissions remain high, ground-level ozone levels are expected to rise by the second half of the twenty-first century in heavily polluted regions, especially in Asia, including Pakistan, northern India and Bangladesh 1 , where they have already increased by 20%. On Nov. 3, the air quality in New Delhi, India, reached its worst level since the beginning of winter, seriously affecting economic activity in India. However, the existing monitoring systems in most countries report conditions with a certain lag and cannot predict air pollution in advance, so reliable early warning information cannot be provided to the government. Accurately predicting air quality to support decision-making and social development has therefore become an urgent problem.

The air quality index (AQI) is a composite index that converts the levels of various pollutants into a quantitative description of air quality levels. However, real-life measurements of AQIs usually involve complex noise with typical nonlinear characteristics. To retain the original physical characteristics of the data, this paper uses the EMD method to decompose the input AQI series, and the resulting IMF components can represent different local characteristics of the original data 2 . If the decomposed IMFs are separately predicted in parallel, the time needed to identify the features of the signal is reduced, which greatly improves the efficiency and accuracy of algorithm training. For the predicted component data, most of the previous studies used the direct summation method for integration 3 . In this paper, to improve the accuracy of the model as much as possible and considering that there is still some correlation between the decomposed IMFs, the BiLSTM model is integrated to obtain the final AQI prediction results. Overall, the advantage of this approach lies in utilizing empirical mode decomposition (EMD) to decompose time series, enabling each component to be separately predicted by predictive models, thereby better capturing variations in different frequencies within the sequence. Subsequently, a neural network algorithm is trained based on the initial decomposition components to integrate the prediction results of each component into an overall prediction result, effectively enhancing the accuracy and robustness of the predictions.

For the prediction model, one disadvantage of the transformer architecture is that the quadratic computation and memory requirements of self-attention limit the size of the context and prediction window, and some experiments have shown the architecture to be ineffective for long-term prediction 4 . Because its attention computation can induce spurious correlations between data at different times, longer inputs increase the probability of this phenomenon. Therefore, this paper investigates whether the transformer architecture can be useful for short-term prediction. On the other hand, the pointwise dot-product self-attention in the transformer is not sensitive to local information, which makes the model prone to abnormal predictions, whereas BiLSTM is good at capturing bidirectional dependencies in time series; thus, a transformer-BiLSTM model that combines the two can balance global and local sequence information well. In addition, when the transformer is used for time series prediction, the self-attention mechanism of the decoder accumulates errors layer by layer, and in many cases the prediction is even worse than when the decoder is replaced with a fully connected layer 5 . Therefore, it is appropriate to use a BiLSTM network to improve the decoder. Furthermore, because the components obtained from EMD exhibit different frequencies and trends, not all component sequences are suitable for prediction by complex algorithms. Therefore, the prediction for each component is taken as the result with the minimum root mean square error (RMSE) between the Transformer-BiLSTM model and simple linear regression.

The contributions of this paper are as follows. (1) Expand the application methods when signal decomposition methods such as EMD are used for time series prediction: For the prediction of IMFs, two prediction methods are used in a selective manner considering the different characteristics of the data; in the signal reconstruction process, BiLSTM is used to predict the final results considering the possible correlation among the original data and all IMFs. (2) Investigate whether the improved transformer architecture using the BiLSTM model can perform well for short-term prediction: the transformer architecture does not perform as well as expected in long-term time series prediction tasks, and this paper further investigates whether this architecture is good for short-term prediction tasks by improving the decoder part to enhance the local information perception capability of the model. (3) Enrich the library of methods for air quality prediction in India: This paper presents the first example of an EMD-transformer-BiLSTM model and demonstrates through experiments that the accuracy of this model is improved compared with that of other models with similar structures used for air quality prediction.

The paper is structured as follows. Section “ Literature review ” reviews the literature on research methods for air quality prediction. Section “ Models ” presents the framework of the model proposed in this paper and some of its computational processes. Section “ Experimental results and analysis ” presents the preparation before model training, the results of model training and a detailed explanation and analysis of the results. In Section “ Extension analysis ”, the prediction results of the proposed model for several different cities are given to verify the scalability of the model. Section “ Conclusions ” summarizes the results of this paper and the limitations of the model and proposes future research directions.

Literature review

Statistical analysis

The commonly used methods for predicting air quality can be divided into two main categories: statistical measures and machine learning algorithms. Statistical methods make predictions by applying regression models based on mathematical statistics, such as autoregressive integrated moving average (ARIMA) models and generalized autoregressive conditional heteroskedasticity (GARCH) models. Polydorasa et al. 6 compared the effectiveness of air quality prediction using partial differential equations and univariate Box‒Jenkins models. Alsoltany et al. 7 suggested that residual errors are sometimes caused by the uncertainty or inaccuracy of the model structure; therefore, the fuzzy linear regression parameter estimation method was used to predict the concentrations of urban pollutants. However, these methods can only describe the trend state of time series in the field of linear regression and rely heavily on the assumption of a normal distribution of data, making it difficult to more accurately fit real data in real life. In the field of measurement, in recent years, several scholars have proposed using the generalized Pareto distribution (DCP) model to fit the time dependence of air pollutant concentrations and provide a better fit for the tail correlation of the data 8 .

Machine learning

With the development of big data technology, since machine learning and deep learning methods based on nonparametric statistics can fit complex multiple interactive relationships and nonlinear relationships, an increasing number of scholars have applied these algorithms to prediction tasks.

In the realm of nonparametric machine learning algorithms, Donnelly et al. 9 employed a nonparametric kernel regression approach, integrating temporal variations in pollutant concentrations, historical correlations with meteorological factors, and seasonal and diurnal periodicity factors to achieve real-time forecasting of air quality for the next 48 h with low resource requirements and high accuracy. Castelli et al. 10 utilized support vector regression (SVR) to predict hourly pollutant concentrations and the air quality index (AQI), experimentally indicating that the radial basis function (RBF) is the kernel type allowing SVR to achieve the most accurate predictions. Mengash et al. 11 also developed an automated atmospheric particulate matter concentration prediction tool based on various machine learning algorithms, employing the chi-square feature selection method for feature screening after computing multimodal features of particulate matter concentration in dynamic environments. Zhan et al. 12 , taking representative machine learning algorithms as an example, introduced a novel framework based on feature selection, error correction models, and a novel kernel acceleration method for residual estimation that is capable of handling large-scale data and significantly reducing the prediction interval.

In the domain of deep learning algorithms, Neagu et al. 13 combined fuzzy inference and neural networks for the prediction of air quality. Corani 14 used feedforward neural network (FFNN), pruning neural network (PNN), and lazy learning (LL) methods to predict ozone concentrations in Milan and found that LL, as a local linear prediction algorithm, can eliminate the overfitting problem, update faster and is more interpretable. Kim et al. 15 devised a data-driven method for predicting the indoor air quality in subway stations by leveraging recurrent neural networks (RNNs). They utilized the partial least squares (PLS) method during preprocessing to establish a linear relationship model between the input and output variables, thereby facilitating the selection of key input variables to optimize the prediction model. The experimental results showed that the prediction results of RNN models have good performance and high interpretability. Mellit 16 used least squares support vector machines (LS-SVMs) for short-term forecasting of meteorological time series and verified that LS-SVMs produced significantly better results than artificial neural networks (ANNs). Singh et al. 17 used partial least squares regression (PLSR), multiple polynomial regression (MPR) and ANN models to predict the levels of atmospheric air pollutants such as SO2 and ultimately found that the performance of the nonlinear model was relatively better than that of the linear model. Li et al. 18 worked on building deep networks and considered spatiotemporal correlation to propose an air quality prediction method based on spatiotemporal deep learning (STDL) and stacked autoencoder (SAE) models, and the results demonstrated that the model could predict air quality at multiple stations simultaneously. Yi et al. 19 proposed Deepair, a deep neural network (DNN)-based method consisting of a spatial transformation component and a deep distributed fusion network, and experimental results based on data from Chinese cities showed that Deepair outperformed 10 classical prediction methods. In summary, the development of machine learning methods in the field of air quality prediction has evolved from traditional statistical approaches to deep learning. These methods offer more powerful and accurate tools for addressing air quality prediction issues. On the one hand, they have improved the selection and extraction of features related to air quality. On the other hand, they employ model fusion and ensemble methods to combine results from multiple models.

Improvement of the LSTM algorithm

Although nonlinear machine learning methods achieve strong generalizability in predicting air quality, these methods have difficulty capturing the effects of long lags between series, thus limiting the prediction accuracy. As a classic time series forecasting model, LSTM models offer advantages in short-term air quality prediction by effectively capturing both long-term and short-term dependencies within time series data and possessing stronger memory and sequence modeling capabilities. Compared to LSTM models, BiLSTM models can further enhance predictive performance by better leveraging contextual information. To make full use of more historical data in time series, many scholars have started to analyze and improve the structure of LSTMs in recent years. Li et al. 20 argued that the existing methods for predicting air pollutant concentrations at that time could not effectively model long-term dependence, and most of them ignored spatial correlation; thus, a method that merges meteorological data and time-stamped data was proposed. Therefore, they proposed a new long- and short-term memory neural network extension (LSTME) model that simultaneously considers spatiotemporal correlations, and multiscale predictions were performed for different periods. The results prove that the model can achieve satisfactory performance. Wen et al. 21 proposed a spatiotemporal convolutional long short-term memory neural network extension (C-LSTME) model for predicting air quality concentrations, incorporating historical air pollutant concentrations at each site as well as adaptive k-nearest neighboring sites to improve the model prediction performance. Ma 22 worked on solving the problem of data shortages in air quality prediction tasks and proposed a stacked bidirectional long short-term memory (TLS-BLSTM) network based on migration learning for predicting air quality at new stations lacking data. Li et al. 23 applied a model based on a one-dimensional convolutional neural network (CNN), LSTM, and attention mechanism for urban PM2.5 concentration prediction; additionally, they added meteorological data and data from neighboring air quality monitoring stations to their input data while using an attention mechanism to capture the importance level of the influence of different temporal feature states in the past on future PM2.5 concentrations to improve the prediction accuracy. Zhang et al. 24 proposed a semisupervised model based on empirical modal decomposition (EMD) and a bidirectional long short-term memory (BiLSTM) network to improve short-term trend prediction, especially for the identification of unexpected situations.

Decomposition ensemble methods

Air quality data typically exhibit complex nonlinearities, comprising multiple components with different frequency characteristics, making accurate trend prediction, especially long-term trend prediction, challenging. To address this issue, numerous scholars have sought to improve prediction accuracy and reliability from the perspective of information decomposition ensemble methods. After decomposing PM2.5 data into multiple components via empirical mode decomposition (EMD), Jin et al. 25 employed convolutional neural networks (CNNs) to group all the components based on their frequency characteristics. Finally, they applied gated recurrent units (GRUs) to predict each group and fused the results to obtain the outcome. Song and Fu 26 integrated three single prediction models, the radial basis function neural network (RBFNN), the RBFNN algorithm based on ensemble empirical mode decomposition (EEMD), and the ARIMA algorithm based on EEMD-RBFNN, into a composite forecasting model (CFM). Through weight allocation, they achieved decomposition integration, providing a novel ensemble method for AQI prediction. Wang et al. 27 proposed a novel multiscale hybrid learning framework based on robust local mean decomposition and a moving window integration strategy for particle concentration prediction. This framework can capture linear and nonlinear patterns and improve prediction accuracy and generalizability through ensemble methods. Subsequently, Wang et al. 28 introduced a new forecasting method capable of effectively capturing trends and fluctuations in AQI. Specifically, by constructing ternary interval value sequences of AQI data and performing multiscale decomposition using multivariate variational mode decomposition (MVMD), they conducted separate predictions followed by simple addition for integration. Cai et al. 29 proposed a novel decomposition-ensemble-reconstruction prediction framework that utilizes entropy to compute and study decomposed subcomponents. They employed different prediction tools (ARIMA, CNN, and TCN) to capture different time-scale patterns of reconstructed subtime series data, demonstrating significant superiority over classical deep learning algorithms. It is evident that utilizing decomposition ensemble methods based on EMD for air quality forecasting can better handle complex air quality data and capture multiple components with different frequency characteristics, thereby improving the prediction accuracy and reliability. Moreover, this approach can adapt to different scale prediction requirements, better meeting the practical application scenarios of air quality forecasting.

Among the above four categories of methods, whether based on traditional neural network models or statistical econometric models, many have more elaborate parameter settings, longer training cycles, and complex model hierarchies, which make it difficult to verify the generalization performance of the models. In the application of decomposition ensemble methods, while previous research has considered using nonsingle prediction methods for forecasting components, there has been a slight oversight in the fusion of prediction results by employing only simple ensemble methods. Moreover, although deep learning methods can obtain good results in air quality prediction, the implementation of these methods requires sufficient historical datasets, and the amount of data significantly limits the model performance. Some research methods fail to effectively extract the spatiotemporal characteristics of air pollutant concentration data or gauge the impact of different temporal characteristics on future air quality, and most of them fail to effectively model the spatiotemporal dependence of air quality indices at the same time, showing low accuracy in long-term predictions and unexpected situations. Therefore, considering the shortcomings of existing research, this paper integrates BiLSTM and Transformer architectures to better address the potential issues of information forgetting and gradient vanishing in BiLSTM when dealing with long sequences, as well as the inadequate modeling of long-term dependency relationships in Transformers. Additionally, by utilizing EMD for sequence decomposition followed by ensemble neural network algorithms, the predictive capability of the model is further enhanced.

Model framework

In the model of this paper, the input data are first decomposed using EMD to obtain intrinsic mode functions 30 . To accurately predict these component sequences, the data are first processed by positional encoding; a vector containing the data features is then passed to the decoder after a multihead self-attention mechanism and a feedforward neural network based on a residual structure. The BiLSTM then outputs the predicted sequence after learning these relationships 31 . Finally, a fully connected layer interprets each time step of the output sequence and produces the predicted value.

However, because the final decomposed IMFs usually have very simple, nearly linear trends, predicting them with a heavily parameterized neural network would not improve the results and would only increase the overall complexity of the algorithm. Experiments showed that the final results obtained by predicting all IMFs with neural networks alone were worse than those obtained with the model proposed in this paper. Therefore, for each IMF, this study compares the transformer-BiLSTM model with simple linear regression: the component is first predicted with the Transformer-BiLSTM algorithm and then with simple linear regression, and the prediction with the smaller RMSE is kept as the result for that component. This selective structure helps ensure good overall prediction performance by choosing the most suitable method in each case; a minimal sketch of the selection step follows.
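The selection step can be sketched as follows; the function and variable names are illustrative rather than taken from the paper, and the Transformer-BiLSTM and linear-regression forecasts are assumed to be computed elsewhere.

import numpy as np

def select_imf_forecast(y_true, pred_transformer_bilstm, pred_linear):
    # Keep whichever candidate forecast has the smaller RMSE for this IMF.
    y_true = np.asarray(y_true)
    candidates = {"transformer_bilstm": np.asarray(pred_transformer_bilstm),
                  "linear_regression": np.asarray(pred_linear)}
    errors = {name: float(np.sqrt(np.mean((y_true - pred) ** 2)))
              for name, pred in candidates.items()}
    best = min(errors, key=errors.get)
    return best, candidates[best]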

Moreover, based on the training set data decomposed into IMFs, a training set is constructed with the N IMFs as N input features and the corresponding real AQI values as labels, and a BiLSTM model is trained on it. The total sample size of this training set is 23,174. Finally, this model is used to reconstruct the prediction results of the IMF sequences: the predicted sequences of the N components serve as input, and the trained BiLSTM produces the final AQI prediction sequence, where both the input and output sample sizes are 7724 32 . The model flow used in this paper is shown in Fig.  1 .

Figure 1. Flow of the EMD-transformer-BiLSTM hybrid model.

For the preprocessing of time series data, the EMD algorithm can reflect the original physical characteristics of the system more accurately than the wavelet transform and seems more effective in dealing with nonlinear and nonstationary signals 33 . The steps of the method are as follows: for a signal \(x(t)\) , all the local maxima and all the local minima are fitted with two cubic spline curves, and the mean of the upper and lower envelopes is denoted by \(m(t)\) . Let \(h\left(t\right)=x\left(t\right)-m\left(t\right)\) . If \(h(t)\) satisfies the IMF condition (Eq.  1 ), then \(h(t)\) is the first IMF; otherwise, \(h(t)\) is treated as the new \(x(t)\) and the sifting is repeated. \({h}_{1,k}(t)\) denotes the difference between the obtained signal and the envelope mean after k repetitions.

Suppose \({h}_{1,k}(t)\) is the first IMF; let \(x(t)=x(t)- {h}_{1,k}(t)\) , and repeat the above steps. When the residual is a monotonic function or the amplitude is less than a predetermined value, several components \({C}_{i}(t)\) can be obtained, and the residual is \(r(t)\) .

Moreover, compared with other advanced signal decomposition algorithms, EMD excels in balancing performance stability with relatively low computational complexity. Methods such as EEMD and CEEMD are essential improvements to the EMD method. The computational complexity of these improved methods is usually greater due to the introduction of additional steps and iterative processes aimed at enhancing decomposition performance. Moreover, given its widespread recognition and ample research support, we ultimately opt to use EMD as the decomposition method to ensure reliable and stable results.
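As a rough illustration of the decomposition step, the sketch below uses the third-party PyEMD package; the paper does not state which EMD implementation was used, so this is an assumed choice, and the file name is hypothetical.

import numpy as np
from PyEMD import EMD   # pip package "EMD-signal"; an assumed implementation choice

aqi = np.loadtxt("patna_aqi.csv")            # hypothetical 1-D AQI series
emd = EMD()
emd.emd(aqi)                                  # run the sifting procedure
imfs, residue = emd.get_imfs_and_residue()    # IMF components C_i(t) and residual r(t)
print(f"Decomposed into {imfs.shape[0]} IMFs")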

Position encoding

The transformer model discards the traditional RNN and CNN structures. To avoid losing the sequential information of the time series data, this paper uses positional encoding (hereafter referred to as PE) to add the position information of each data point into the input vector of the model so that the self-attention mechanism can determine the absolute and relative position of each data point in the overall sequence. The even-numbered dimensions are encoded with the sine function and the odd-numbered dimensions with the cosine function, so the encoded values are bounded within [-1, 1], avoiding interference from the magnitude of the position index. Scaling the position index down by a factor of \({10000}^{2i/{d}_{model}}\) first effectively avoids cases in which different positions produce the same PE values 34 . The calculation method of the positional encoding is shown in Eqs. ( 2 )–( 3 ):
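The equation bodies are not reproduced above; in the standard transformer formulation they are \(P{E}_{(pos,2i)}=\mathrm{sin}\left(pos/{10000}^{2i/{d}_{model}}\right)\) and \(P{E}_{(pos,2i+1)}=\mathrm{cos}\left(pos/{10000}^{2i/{d}_{model}}\right)\) , which the following PyTorch sketch implements; the dimensions are illustrative, not the paper's settings.

import math
import torch

def sinusoidal_positional_encoding(seq_len, d_model):
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)      # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                         * (-math.log(10000.0) / d_model))                  # 1 / 10000^(2i/d_model)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)    # even dimensions use sine
    pe[:, 1::2] = torch.cos(position * div_term)    # odd dimensions use cosine
    return pe

x_embedded = torch.randn(32, 128, 64)               # (batch, window length, d_model), dummy data
x_with_pos = x_embedded + sinusoidal_positional_encoding(128, 64)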

Multihead self-attention

Compared with the recurrent neural network structure, the self-attention mechanism can compute the input data in parallel in less time and space complexity, and it also enhances the interpretability of the whole model because the correlation between the data can be visualized. The self-attention mechanism plays a role in allowing the model to observe the correlations between different data throughout the input, thus discovering and solving the problem of lagged intercorrelations of time series data.

Assuming that \({Q}_{0}\) represents the information of certain data and \({K}_{0}\) represents the information of the rest of the input data, \({Q}_{0}\) and \({K}_{0}\) are pointwise dot products used to obtain a weight matrix containing the correlation information between the two vectors, which can be multiplied by the original data to obtain a weighted summed output. To enhance the fitting ability of the model, three trainable parameter matrices, Q , K , and V , are used; these matrices are obtained by linear transformation of the input matrix X with different parameter matrices. The self-attention mechanism is defined in Eq. ( 4 ):
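The body of Eq. ( 4 ) is not reproduced above; in the standard scaled dot-product form it reads \(\mathrm{Attention}\left(Q,K,V\right)=\mathrm{softmax}\left(Q{K}^{T}/\sqrt{{d}_{k}}\right)V\) , where \({d}_{k}\) is the dimension of the key vectors. This is stated here as the usual form, since the original equation body is missing from the text.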

Compared to single self-attention, multihead self-attention promotes the advantages of integrated learning. The input matrices Q , K , and V are linearly transformed so that each attention mechanism function is responsible for only one subspace of the final output sequence and that the results are independent of each other, which fully utilizes the original information of the data and effectively reduces the risk of overfitting.
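As a concrete illustration, the sketch below applies PyTorch's built-in nn.MultiheadAttention to a batch of encoded windows; the layer sizes are placeholders rather than the paper's configuration.

import torch
import torch.nn as nn

d_model, n_heads = 64, 4
attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=n_heads, batch_first=True)

x = torch.randn(32, 128, d_model)        # (batch, window length, d_model), dummy encoded input
out, weights = attn(x, x, x)             # self-attention: query, key and value share the input
# out: (32, 128, 64); weights: (32, 128, 128) correlation map between time steps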

Residual network

The main problems encountered by deep learning for network depth are gradient disappearance and gradient explosion. During the training process, each layer extracts features from the previous layer; thus, the network degenerates as the number of layers increases. On the other hand, residual networks take a jump connection approach to avoid these problems. Usually, a residual block consists of a direct mapping part and a residual part, which enables a connection between the input and output so that the newly added layer needs to learn only new features based on the original input layer, i.e., learning the residuals, thus avoiding the phenomenon that the error in the training set increases as the network deepens. The general manifestation of the residual block is shown in Eq. ( 5 ). \(\mathcal{F}\) is a function of the residuals, \({x}_{l}\) is the input to the \(l\) th layer, and \({W}_{l}\) is the parameter corresponding to the \(l\) th layer.
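The body of Eq. ( 5 ) is not reproduced above; in its usual form the residual block reads \({x}_{l+1}={x}_{l}+\mathcal{F}\left({x}_{l},{W}_{l}\right)\) , i.e., the output of layer \(l\) is its input plus the learned residual. This is stated here as the standard form, since the original equation body is missing from the text.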

BiLSTM

To solve the problem of gradient vanishing in ordinary recurrent neural networks when the input sequence is too long, the BiLSTM algorithm combines two LSTM models running in opposite directions; the bidirectional results enhance access to information, forgetting nonessential information and retaining critical information. The LSTM network is composed of the input data at time t \({X}_{t}\) , the cell state \({C}_{t}\) , the temporary cell state \(\widetilde{{C}_{t}}\) , the hidden state \({h}_{t}\) , the forget gate \({f}_{t}\) , the memory gate \({i}_{t}\) , and the output gate \({o}_{t}\) . By dynamically memorizing and forgetting information, the network transmits effective information and discards invalid information, addressing the problem that RNNs cannot establish long-term associations 35 . Forgetting, memorizing and output are controlled by the forget gate, memory gate and output gate, respectively, which are calculated from the hidden state at the previous moment \({h}_{t-1}\) and the current input \({X}_{t}\) . The specific computational flowchart of BiLSTM is shown in Fig.  2 .

Figure 2. Flow of the BiLSTM algorithm.
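A minimal BiLSTM decoder head of the kind described above can be sketched in PyTorch as follows; the layer sizes are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class BiLSTMHead(nn.Module):
    def __init__(self, d_model=64, hidden=20, out_dim=1):
        super().__init__()
        self.bilstm = nn.LSTM(input_size=d_model, hidden_size=hidden,
                              batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, out_dim)   # concatenated forward + backward states

    def forward(self, x):                          # x: (batch, window length, d_model)
        h, _ = self.bilstm(x)                      # h: (batch, window length, 2 * hidden)
        return self.fc(h)                          # one prediction per time step

y = BiLSTMHead()(torch.randn(8, 5, 64))            # -> shape (8, 5, 1)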

During training, the model also adds a masking mechanism to the decoder. When the transformer structure is trained, the entire target sequence is converted into feature vectors and fed in at once, but the decoder output at each step should depend only on the previous results so that future information is not used in advance. Thus, this paper adds a mask tensor to the BiLSTM input.
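The idea can be illustrated with a standard causal (look-ahead) mask; this is a generic sketch, not the paper's exact tensor layout.

import torch

seq_len = 5
# True entries mark future positions that a given time step is not allowed to attend to.
causal_mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()
# The mask can be passed, e.g., as attn_mask to nn.MultiheadAttention so that step t
# only uses information from steps <= t.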

Experimental results and analysis

Datasets and configurations

To verify the effectiveness of the proposed method, the AQI dataset of Patna, India (6:00 am on October 3, 2015 to 0:00 pm on July 1, 2020) is used for prediction with a time interval of 1 h. The data were obtained from the Central Pollution Control Board via the official website of the Government of India ( https://airquality.cpcb.gov.in/AQI_India/ ). In this paper, the air quality index at time T + 1 is predicted from the air quality indices of the previous T hours. This is single-step, single-variable prediction. The structure of the data is shown in Fig.  3 .

Figure 3. The structure of the AQI data.

Because the missing values are few relative to the total amount of data and occur in long continuous stretches, they are removed directly in this paper. The final dataset used in the experiments consists of 30,898 samples. This paper then uses the sliding window method to construct time series samples, which divides the input series into inputs and labels delayed by \(\Delta t\) time units. Because this is single-step prediction, if the input is steps 1–10, its labels are steps 2–11, matching the transformer's seq2seq output form. Seventy-five percent of the data are used for training; i.e., the training set contains 23,174 samples and the test set contains 7724 samples. Let the sample size of the training set be \(n\) ; the total number of time series samples to be constructed is then \(n - \Delta t + 1\) . The value of \(\Delta t\) affects the number of time series samples and the number of features in each sample, and thus the performance of the model, so its choice is important 22 . In this paper, to verify the application of the model to ultrashort-term air quality index prediction, experiments are conducted for \(\Delta t\) ranging from 1 to 5.
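A minimal sketch of this sliding-window construction is given below; with the toy series 1 to 11 and \(\Delta t = 10\) , the single sample has input 1–10 and labels 2–11, matching the example above. The exact off-by-one bookkeeping at the end of the series is an implementation detail.

import numpy as np

def make_windows(series, delta_t):
    # Input window x[t : t+delta_t]; label window shifted by one step, x[t+1 : t+delta_t+1].
    X, y = [], []
    for start in range(len(series) - delta_t):
        X.append(series[start:start + delta_t])
        y.append(series[start + 1:start + delta_t + 1])
    return np.stack(X), np.stack(y)

series = np.arange(1, 12, dtype=float)       # toy series 1..11
X, y = make_windows(series, delta_t=10)      # X[0] = 1..10, y[0] = 2..11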

The experiments in this paper were conducted in Python (version 3.8) using CUDA 11.3 and the deep learning development framework PyTorch (version 1.11.0) to construct the network model. All the experiments were conducted on a remote PC equipped with a 15-core Intel processor and 80 GB of RAM; the specific environment configuration is shown in Table 1 .

Parameter setting

When preprocessing the data using EMD, the AQI sequence was decomposed into 12 IMFs. This indicates that the air quality data used in this study are complex and exhibit nonlinear and oscillatory behaviors, requiring 12 IMFs to effectively capture the structure of the data. The components obtained after EMD of the AQI data in this article are shown in Fig.  4 .

Figure 4. The IMFs obtained after EMD.

To train the transformer-BiLSTM model for IMF component prediction, this paper uses adaptive moment estimation with decoupled weight decay (AdamW) as the optimizer. AdamW builds on the Adam optimizer by decoupling weight decay from the gradient update, which gives better generalization performance and a wider range of well-performing hyperparameters. The specific settings are shown in Table 2 . For the BiLSTM model used for component sequence reconstruction, 20 hidden units are used in the experiments.
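For illustration, the optimizer can be configured with torch.optim.AdamW as below; the stand-in model, learning rate and weight-decay values are placeholders rather than the settings reported in Table 2.

import torch
import torch.nn as nn

model = nn.LSTM(input_size=1, hidden_size=20, batch_first=True, bidirectional=True)  # stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)        # decoupled weight decay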

Evaluation metrics

After constructing the time series samples and initializing the model parameters, this paper applies the EMD-transformer-BiLSTM network to model the training set data for predicting the AQI in the next hour. Since the performances of most algorithms on training data are biased and often overfit, the results of this study are based on the test set. The root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) are used to evaluate the predictive performance of the hybrid model. The specific definitions are shown in Eqs. ( 6 )–( 8 ):
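Since Eqs. ( 6 )–( 8 ) are not reproduced in the text, the sketch below states the usual definitions of the three metrics.

import numpy as np

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)   # expressed in percent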

In this paper, the training process for the prediction of each component is recorded, and curves of the training loss and test loss are obtained. Some of these training processes are shown in Fig.  5 , where the y-axis is logarithmically scaled so that the difference between the two types of error can be observed more clearly. For the third IMF, the training loss is almost consistently smaller than the test loss, which indicates that the model has converged and no overfitting has occurred. For the later IMFs, the test loss is lower than the training loss in the later stages of training. The likely reason is that information from the test period leaked into the training data when EMD was applied to the full series, so the prediction error on the test set is naturally lower than on the training set. Analysis of the ninth IMF shows that the model converges progressively more slowly, mainly because the later an IMF is extracted, the simpler its trend, and a model with complex parameters struggles to obtain better accuracy on such components.

Figure 5. Model training process: loss curves for the IMFs.

In addition, in the hybrid model proposed in this paper, as it involves decision-making steps for result assessment and model selection, the overall parameter complexity lies between \(n(IMFs)*O(linear) + O(bilstm)\) and \(n(IMFs)*O(transformer)+O(bilstm)\) . This paper estimates the computational complexity based on the number of parameters and finds that the overall parameter scale of the model ranges from 0.02 to 127.01 MB. The time complexity of neural networks is typically challenging to precisely compute using traditional analytical methods. For convolutional neural networks, one can consider the computational load of convolution and pooling operations and estimate their time complexity based on the network's hierarchical structure. However, for more complex neural network architectures such as recurrent neural networks or deep residual networks, their time complexity often requires assessment through experimentation or simulation. In our study, the training runtime on a computer configured with the specified parameters is approximately between 15 and 45 min, which is shorter than the prediction interval (1 h). This demonstrates that the model's training runtime does not exceed the data sampling interval, maintaining the practicality of the model.

To analyze the characteristics of the proposed hybrid model from three perspectives (data preprocessing, decoder, and encoder), the comparison models chosen in this paper are ARIMA; RNN; LSTM; BiLSTM; BiLSTM based on the wavelet transform 36 ; the transformer with the decoder replaced by a linear layer 37 ; and the transformer with the decoder replaced by an LSTM 38 or a BiLSTM 39 . The latter two are essentially improved RNN models with self-attention, a current research direction for many scholars. Among the comparison models, the most classical time series model, ARIMA \((p, d, q)\) , is included first; it contains an autoregressive part of order \(p\) and a moving average part of order \(q\) , and a \(d\) -order difference is applied to the original sequence. The baseline transformer model is less effective for time series prediction because its decoder accumulates errors; therefore, the comparison model in this paper removes the original decoder and leaves only one linear layer to output the final predictions, which also makes it easier to compare the performance of the target model under controlled variables. The transformer-LSTM model replaces the decoder of the transformer with an LSTM network with two hidden layers 40 .

These models are used to predict the AQI series with time windows of 1, 2, 3, 4 and 5 h. Windows of 1 to 5 h are chosen to meet the demand for ultrashort-term forecasting while taking data availability and the forecasting objective into account: ample historical air quality data are available for modeling, and the monitoring frequency is relatively high, so shorter time windows are appropriate. The final prediction results and the best case for each model are shown in Tables 3 and 4.
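A minimal sketch of how sliding-window samples of width w can be constructed from an hourly series is given below; it is purely illustrative, since the paper does not publish its windowing code.

```python
import numpy as np

def make_windows(series: np.ndarray, w: int):
    """Turn a 1-D series into (X, y) pairs: the previous w values predict the next one."""
    X = np.stack([series[i:i + w] for i in range(len(series) - w)])
    y = series[w:]
    return X, y

series = np.arange(10, dtype=float)   # stand-in AQI sequence
X, y = make_windows(series, w=3)
print(X.shape, y.shape)               # (7, 3) (7,)
```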

A comparison of the results shows that the EMD-transformer-BiLSTM model performs best in all the ultrashort-term prediction experiments. During the experimental phase, each configuration was run at least three times, and validation was carried out across the five short-term time windows; overall, the proposed model outperformed the comparison models, and the consistency and stability of the results suggest that they are not driven by randomness. According to Table 4, the EMD-transformer-BiLSTM model has the lowest values for all three evaluation metrics, indicating that the hybrid model performs best overall. Among the best results of each model, the RMSE, MAE and MAPE of the EMD-transformer-BiLSTM model were 11.46%, 11.77% and 15.53% lower, respectively, than those of the second-ranked wavelet-BiLSTM model. The traditional ARIMA model is well suited to time series with seasonal trends; applied to static forecasting of the AQI data, it performed remarkably well, second only to the hybrid model proposed in this paper. In contrast, although RNNs can capture long-term dependencies in sequences, they did not reach their full potential in our experiments and performed worse than ARIMA. A naive (persistence) model was also included as a benchmark; it is particularly effective when a series is strongly autocorrelated, which makes it well suited to air quality data. It achieved a prediction RMSE of approximately 7.5943, comparable to ARIMA, but still short of the proposed hybrid model. Figure 6 shows the best prediction results of all models across the different time windows, except for the naive (persistence) model, which does not use a sliding window and serves solely as a benchmark.
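For reference, the three evaluation metrics used in Tables 3 and 4 can be computed as in the generic sketch below (MAPE assumes no zero values in the true series).

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    # Expressed in percent; assumes y_true contains no zeros
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

y_true = np.array([50.0, 80.0, 120.0])
y_pred = np.array([48.0, 85.0, 110.0])
print(rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```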

Figure 6. Comparison of the final prediction results of each model.

Furthermore, comparing the transformer-BiLSTM and EMD-transformer-BiLSTM models shows that the proposed improved EMD decomposition ensemble method, based on a selective branching structure and neural network integration, does enhance prediction accuracy. To validate the reliability of this result, the two models were run in ten independent repeated experiments, and the Mann‒Whitney U test was used to compare the outcomes. A nonparametric test was chosen because the distribution of the experimental results is difficult to determine and the sample size is relatively small. The computed Mann‒Whitney U statistic is approximately 98, with a two-tailed p value of approximately 0.00033, far below 0.05; the null hypothesis that the two sets of results are drawn from the same distribution is therefore rejected. From a statistical testing perspective, the novel decomposition ensemble method proposed in this paper thus contributes to improving prediction accuracy.
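A sketch of the corresponding test with SciPy is shown below; the per-run RMSE values are placeholders rather than the paper's measurements.

```python
from scipy.stats import mannwhitneyu

# Placeholder RMSE values for ten repeated runs of each model (not the paper's numbers)
rmse_transformer_bilstm = [7.1, 7.3, 7.0, 7.4, 7.2, 7.5, 7.1, 7.3, 7.2, 7.4]
rmse_emd_hybrid         = [6.2, 6.4, 6.1, 6.3, 6.5, 6.2, 6.4, 6.3, 6.1, 6.2]

stat, p_value = mannwhitneyu(rmse_transformer_bilstm, rmse_emd_hybrid,
                             alternative="two-sided")
print(stat, p_value)   # p < 0.05 -> reject the hypothesis of identical distributions
```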

To further demonstrate the reliability of the experimental results, a robustness test was conducted by varying the ratio used to partition the data into training and test sets. The results are shown in Fig. 7; the fluctuations in the three metrics are relatively small, indicating that the experimental results are robust.

Figure 7. Results of the robustness test.

Data preprocessing

The wavelet-BiLSTM model predicts time series with large, frequent fluctuations less accurately. Training on denoised data does not make the model overfit or fail to learn the underlying pattern; rather, it weakens the model's ability to predict abrupt changes in the series. The wavelet transform does not decompose a signal adaptively according to its own characteristics; it requires the prior selection of a suitable wavelet basis and number of decomposition levels, and these subjective choices affect the characteristics of the reconstructed sequence to some extent. EMD avoids this issue and therefore has an advantage over the wavelet method 41 . Furthermore, for the wavelet transform, the choice of wavelet basis and other parameters can significantly affect the results for a given input 42 . Given the substantial variation in air quality data across regions, adopting EMD as a more objective preprocessing method improves the generalizability of the model.
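To make the contrast concrete, the following is a minimal EMD sketch assuming the PyEMD package (the paper does not state which implementation was used); note that no basis function or number of decomposition levels has to be chosen in advance.

```python
import numpy as np
from PyEMD import EMD   # assumes the PyEMD package is installed

t = np.linspace(0, 1, 1000)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)  # stand-in series

emd = EMD()
imfs = emd(signal)       # rows are IMFs, ordered from high to low frequency
print(imfs.shape)
```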

Transformer architecture

The LSTM and BiLSTM models fit the data better than the Transformer (-Linear) model, and BiLSTM has smaller fitting errors because it takes information from both directions of the sequence into account. However, both models fit the tail of the sequence poorly: the predictions lag the true values and are generally higher than them, which is the opposite of the behavior of the Transformer (-Linear) model. Notably, the Transformer-LSTM model fits the tail and the abruptly changing parts of the sequence better than the LSTM and BiLSTM models. The transformer is good at capturing long-term dependencies and excels at information interaction because it uses global information; however, its dot-product self-attention is weaker at extracting local information 43 and does not perform as well as LSTM in this respect. As a result, although the Transformer (-Linear) model captures the global trend of the sequence more accurately, its predicted values are less precise and show a short-term lag.
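The mechanism under discussion is standard scaled dot-product attention, sketched generically below (not the paper's exact implementation): every position attends to every other position, which provides the global view but no built-in locality.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # similarity of every query with every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over keys
    return weights @ V                                     # global, position-agnostic mixing

Q = K = V = np.random.randn(6, 8)   # 6 time steps, model dimension 8
print(scaled_dot_product_attention(Q, K, V).shape)   # (6, 8)
```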

Decoder section

The transformer-LSTM model still does not fit the clustered-effect sequences well enough, but it alleviates the problems of the Transformer (-Linear) model and shows a clear improvement over it, which confirms, to some extent, the effectiveness of introducing an LSTM into the decoder. In the RNN-based LSTM and BiLSTM networks, the output of each neuron feeds directly into its own next time step; this sequential traversal makes the LSTM very good at extracting local information. However, the same sequential structure makes it difficult to handle abrupt changes and leads to poor fitting of the sequences at the tail of the test set. The best-case transformer-BiLSTM model shows encouraging performance: it combines the advantages of transformers and long short-term memory (LSTM) networks, identifies clustered-effect and abruptly changing sequences better, and shows no significant lag in the overall predictions.
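A rough PyTorch sketch of a BiLSTM-based decoder head of this kind is shown below; the layer sizes are illustrative assumptions, not the paper's hyperparameters.

```python
import torch
import torch.nn as nn

class BiLSTMDecoder(nn.Module):
    """Reads the encoder output in both directions, then maps to a one-step forecast."""
    def __init__(self, d_model=64, hidden=64):
        super().__init__()
        self.bilstm = nn.LSTM(d_model, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, 1)   # forward + backward hidden states

    def forward(self, enc_out):                # enc_out: (batch, seq_len, d_model)
        out, _ = self.bilstm(enc_out)
        return self.proj(out[:, -1, :])        # predict the next value from the last step

dec = BiLSTMDecoder()
print(dec(torch.randn(4, 5, 64)).shape)        # torch.Size([4, 1])
```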

The final hybrid model

In contrast to the previous models, the EMD-transformer-BiLSTM model does not systematically under- or overestimate the true values; instead, it responds slightly more strongly during periods of frequent fluctuation, i.e., its predictions may overshoot when the true values rise suddenly and undershoot when they fall suddenly. The proposed EMD-transformer-BiLSTM hybrid model uses EMD to decompose the input sequence into multiple component sequences, which are predicted separately; the component predictions are then integrated by a BiLSTM. The transformer-BiLSTM model used for the component prediction embeds a BiLSTM in the decoder that captures bidirectional temporal dependencies, overcoming the transformer's weakness with local information, while the model retains a strong grasp of the global contour of the sequence; thus, the model balances global and local contextual information well.
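Putting the pieces together, the decomposition, per-component prediction and reconstruction flow described above can be summarized by the following sketch; `decompose`, `transformer_branch`, `linear_branch` and `reconstructor` are hypothetical stand-ins for the components discussed in the paper, not published code.

```python
import numpy as np

def forecast_aqi(series, decompose, transformer_branch, linear_branch, reconstructor):
    """Decompose -> predict each IMF with the better of two branches -> reconstruct."""
    imfs = decompose(series)                      # EMD: one sub-series per IMF
    component_preds = []
    for imf in imfs:
        pred_complex, err_complex = transformer_branch(imf)   # transformer-BiLSTM branch
        pred_simple, err_simple = linear_branch(imf)          # simple linear branch
        component_preds.append(pred_complex if err_complex <= err_simple else pred_simple)
    # A BiLSTM learns how to recombine the per-component predictions into the AQI forecast
    return reconstructor(np.stack(component_preds))
```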

Extension analysis

The generalization ability of a model is important: a model that generalizes well tends to perform well on data it has not previously seen, and a neural network with good generalization ability also aids interpretation and leads to a more regular and reliable architecture design 44 . We therefore examine whether the proposed model performs well on cities with different air quality levels. This paper obtained the air quality level of each city in India; the AQI distribution map is shown in Fig. 8. Finally, four cities with very poor or very good air quality and four cities with moderate air quality are selected for prediction to determine whether the prediction model works well for cities with different air quality levels.

Figure 8. Air quality distribution in India (data from IQAir, Switzerland).

This paper selects city datasets of different sizes for these experiments to verify whether the data volume has a large impact on model performance and to examine the robustness of the model. The hyperparameter settings are consistent with those in Table 2. The selected city datasets and experimental results are summarized in Table 5; the cutoff time for all of these datasets is 0:00:00 on July 1, 2020.

The proposed EMD-transformer-BiLSTM hybrid model shows encouraging performance on the AQI prediction tasks for all eight cities, including Thiruvananthapuram and Bengaluru, and performs best on the Delhi dataset, with an RMSE, MAE and MAPE as low as 1.8013, 0.9523 and 0.58%, respectively. The model achieves its best results for Delhi and Bengaluru, which also have the largest datasets, indicating that a larger amount of data still improves model performance. The prediction results of the model on the different datasets are shown in Fig. 9.

Figure 9. Results of the generalization performance verification experiment.

Conclusions

Hybrid model

Accurate air quality predictions can have an important impact on economic development, but the observed time series are often highly volatile, nonstationary and nonlinear. To predict the AQI more accurately, this paper proposes a hybrid model based on EMD, transformers and BiLSTM. The original data are first decomposed with EMD, and the component sequences are positionally encoded separately. The vectors containing the data feature information are then fed through multihead self-attention and a residual network into the BiLSTM-based decoder. The resulting predictions are compared with those of a simple linear regression model, and the result with the smaller RMSE is chosen as the predicted value of each component. Finally, a BiLSTM model reconstructs the predicted values of the components to obtain the final predicted AQI values.

Results analysis

According to the experimental results, the EMD-transformer-BiLSTM model achieves the best performance among structurally similar models for air quality prediction: its RMSE, MAE and MAPE were 11.46%, 11.77% and 15.53% lower, respectively, than those of the second-ranked model. EMD-based preprocessing better preserves the original physical characteristics of the data and handles nonlinear and nonstationary time series well. The pointwise dot-product self-attention in the transformer is insensitive to local information, which leaves the model vulnerable to anomalies in the time series, whereas BiLSTM is good at capturing bidirectional temporal information. Conversely, LSTM is at a disadvantage in parallel processing, while the multihead self-attention of the transformer excels at parallel computation; the transformer-BiLSTM model combining the two therefore balances global and local sequence information well.

Innovations

Taking the different characteristics of the IMFs into account, we use either a complex or a simple prediction method for each decomposed component. In the signal reconstruction step, a BiLSTM is used to produce the final result, considering the possible correlation between the original data and all IMFs. Motivated by previous findings that the Transformer architecture does not perform as well as expected in long-term time series prediction, this paper further investigates whether the architecture can be applied to ultrashort-term prediction by improving the decoder.

Limitations

However, this study still does not address the data leakage that arises when EMD is applied before prediction; one remedy is to construct samples with a sliding window and rerun the EMD for each step forward in the prediction 45 . Moreover, although this study focused on univariate time series, the proposed method is expected to extend to spatiotemporal data that exploit auxiliary information such as meteorological and time-stamp data while accounting for spatiotemporal correlation, which is a direction for future work.
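A sketch of the leakage-free variant referred to here (walk-forward EMD, recomputed on each training window only) is given below, assuming the PyEMD package; `fit_and_forecast` is a hypothetical per-window predictor supplied by the caller.

```python
import numpy as np
from PyEMD import EMD   # assumes the PyEMD package is installed

def walk_forward_emd_forecast(series, window, fit_and_forecast):
    """Re-run EMD on each training window only, so no future values leak into the decomposition."""
    emd = EMD()
    preds = []
    for t in range(window, len(series)):
        history = series[t - window:t]        # strictly past observations
        imfs = emd(history)                   # decomposition recomputed at every step
        preds.append(fit_and_forecast(imfs))  # forecast series[t] from the window's IMFs
    return np.array(preds)
```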

Data availability

The data used in this study were obtained from the Central Pollution Control Board of India ( https://airquality.cpcb.gov.in/AQI_India/ ); hourly air quality index data were obtained for Patna, Thiruvananthapuram, Bengaluru, Visakhapatnam, Amritsar, Gurugram and Delhi from 16:00 on January 1, 2015, to 0:00 on July 1, 2020.

References

Bikkina, S. et al. Air quality in megacity Delhi affected by countryside biomass burning. Nat. Sustain. 2 (3), 200–205 (2019).


Meng, E. et al. A robust method for non-stationary streamflow prediction based on improved EMD-SVM model. J. Hydrol. 568 , 462–478 (2019).


Abedinia, O. et al. Improved EMD-based complex prediction model for wind power forecasting. IEEE Trans. Sustain. Energy 11 (4), 2790–2802 (2020).

Zeng, A., Chen, M., Zhang, L. et al. Are transformers effective for time series forecasting? arXiv preprint arXiv:2205.13504 (2022).

Zhou, H., Zhang, S., Peng, J. et al. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence Vol. 35 11106–11115 (2021).

Polydoras, G. N., Anagnostopoulos, J. S. & Bergeles, G. C. Air quality predictions: dispersion model vs Box-Jenkins stochastic models. An implementation and comparison for Athens, Greece. Appl. Therm. Eng. 18 (11), 1037–1048 (1998).


Alsoltany, S. N. & Alnaqash, I. A. Estimating fuzzy linear regression model for air pollution predictions in Baghdad City. Al Nahrain J. Sci. 18 (2), 157–166 (2015).


Huang, C. et al. Statistical inference of dynamic conditional generalized Pareto distribution with weather and air quality factors. Mathematics 10 (9), 1433 (2022).

Donnelly, A., Misstear, B. & Broderick, B. Real time air quality forecasting using integrated parametric and non-parametric regression techniques. Atmos. Environ. 103 , 53–65 (2015).


Castelli, M. et al. A machine learning approach to predict air quality in California. Complexity https://doi.org/10.1155/2020/8049504 (2020).

Mengash, H. A. et al. Smart cities-based improving atmospheric particulate matters prediction using chi-square feature selection methods by employing machine learning techniques. Appl. Artif. Intell. 36 (1), 2067647 (2022).

Zhan, H., Zhu, X. & Hu, J. A probabilistic forecasting approach for air quality spatio-temporal data based on kernel learning method. Appl. Soft Comput. 132 , 109858 (2023).

Neagu, C. D. et al. Air quality prediction using neuro-fuzzy tools. IFAC Proc. Vol. 34 (8), 229–235 (2001).

Corani, G. Air quality prediction in Milan: Feed-forward neural networks, pruned neural networks and lazy learning. Ecol. Model. 185 (2–4), 513–529 (2005).

Kim, M. H., Kim, Y. S., Sung, S. W. et al. Data-driven prediction model of indoor air quality by the preprocessed recurrent neural networks. In 2009 ICCAS-SICE 1688–1692 (IEEE, 2009).

Mellit, A., Pavan, A. M. & Benghanem, M. Least squares support vector machine for short-term prediction of meteorological time series. Theor. Appl. Climatol. 111 (1), 297–307 (2013).

Singh, K. P. et al. Linear and nonlinear modeling approaches for urban air quality prediction. Sci. Total Environ. 426 , 244–255 (2012).


Li, X. et al. Deep learning architecture for air quality predictions. Environ. Sci. Pollut. Res. 23 (22), 22408–22417 (2016).

Yi X, Zhang J, Wang Z, et al. Deep distributed fusion network for air quality prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 965–973 (2018).

Li, X. et al. Long short-term memory neural network for air pollutant concentration predictions: Method development and evaluation. Environ. Pollut. 231 , 997–1004 (2017).


Wen, C. et al. A novel spatiotemporal convolutional long short-term neural network for air pollution prediction. Sci. Total Environ. 654 , 1091–1099 (2019).

Ma, J. et al. Air quality prediction at new stations using spatially transferred bi-directional long short-term memory network. Sci. Total Environ. 705 , 135771 (2020).

Li, S. et al. Urban PM 2.5 concentration prediction via attention-based CNN–LSTM. Appl. Sci. 10 (6), 1953 (2020).

Zhang, L. et al. Air quality predictions with a semi-supervised bidirectional LSTM neural network. Atmos. Pollut. Res. 12 (1), 328–339 (2021).

Jin, X. B. et al. Deep hybrid model based on EMD with classification by frequency characteristics for long-term air quality prediction. Mathematics 8 (2), 214 (2020).

Song, C. & Fu, X. Research on different weight combination in air quality forecasting models. J. Clean. Prod. 261 , 121169 (2020).

Wang, Z. et al. Daily PM 2.5 and PM 10 forecasting using linear and nonlinear modeling framework based on robust local mean decomposition and moving window ensemble strategy. Appl. Soft Comput. 114 , 108110 (2022).

Wang, Z. et al. A new perspective on air quality index time series forecasting: A ternary interval decomposition ensemble learning paradigm. Technol. Forecast. Soc. Change 191 , 122504 (2023).

Cai, P., Zhang, C. & Chai, J. Forecasting hourly PM 2.5 concentrations based on decomposition-ensemble-reconstruction framework incorporating deep learning algorithms. Data Sci. Manag. 6 (1), 46–54 (2023).

Meng, Z., Xie, Y. & Sun, J. Short-term load forecasting using neural attention model based on EMD. Electr. Eng. 104 (3), 1857–1866 (2022).

Zhang, Y. et al. Accurate prediction of water quality in urban drainage network with integrated EMD-LSTM model. J. Clean. Prod. 354 , 131724 (2022).

Jiang, B., Liu, Y., Xie, H. Super short-term wind speed prediction based on CEEMD decomposition and BILSTM-Transformer model. In 2023 IEEE 3rd International Conference on Power, Electronics and Computer Applications (ICPECA) 876–882 (IEEE, 2023).

Qiu, X. et al. Empirical mode decomposition based ensemble deep learning for load demand time series forecasting. Appl. Soft Comput. 54 , 246–255 (2017).

Vaswani, A., Shazeer, N., Parmar, N. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017).

Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9 (8), 1735–1780 (1997).

Li, S., Jin, X., Xuan, Y. et al. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. Adv. Neural Inf. Process. Syst. 32 (2019).

Liang, X. et al. LSTM with wavelet transform based data preprocessing for stock price prediction. Math. Probl. Eng. https://doi.org/10.1155/2019/1340174 (2019).

Mohammadi Farsani, R. & Pazouki, E. A transformer self-attention model for time series forecasting. J. Electr. Comput. Eng. Innov. JECEI 9 (1), 1–10 (2020).

Zeyer, A., Bahar, P., Irie, K. et al. A comparison of Transformer and LSTM encoder-decoder models for ASR. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 8–15 (IEEE, 2019).

Yan, Q. et al. An improved feature-time Transformer encoder-Bi-LSTM for short-term forecasting of user-level integrated energy loads. Energy Build. 297 , 113396 (2023).

Rhif, M. et al. Wavelet transform application for/in non-stationary time-series analysis: A review. Appl. Sci. 9 (7), 1345 (2019).

Yu, C. et al. Matrix-based wavelet transformation embedded in recurrent neural networks for wind speed prediction. Appl. Energy 324 , 119692 (2022).

Huang, Z., Xu, P., Liang, D. et al. TRANS-BLSTM: Transformer with bidirectional LSTM for language understanding. arXiv preprint arXiv:2003.07000 (2020).

Zhang, C. et al. Understanding deep learning (still) requires rethinking generalization. Commun. ACM 64 (3), 107–115 (2021).

Qian, Z. et al. A review and discussion of decomposition-based hybrid models for wind energy forecasting applications. Appl. Energy 235 , 939–953 (2019).


Funding

Hubei Provincial Social Science Foundation Early Funding Project, “Research on the Value Co-creation Mechanism and Realization Path of Hubei New Energy Vehicle Green Supply Chain from the Perspective of Configuration” (23ZD148); Hubei University of Automotive Technology Doctoral Research Start-up Fund, “Research on the Long-term Mechanism and Path of Green and High-quality Development of the Yangtze River Economic Belt from the Perspective of Digital Empowerment” (BK202010).

Author information

Authors and affiliations

Fudan University, Shanghai, 200433, China

Zhongnan University of Economics and Law, Wuhan, 430073, China

Jie Dong & Yaoli Zhang

School of Economics and Management, Hubei University of Automotive Technology, Shiyan, 442002, China


Contributions

Conceptualization, J.D.; methodology, J.D.; formal analysis, J.H.; data curation, Y.Z.; supervision, Y.Z.; writing—original draft preparation, J.D.; writing—review and editing, J.D. and J.H. All the authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Jiang Hu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .


About this article

Cite this article

Dong, J., Zhang, Y. & Hu, J. Short-term air quality prediction based on EMD-transformer-BiLSTM. Sci Rep 14, 20513 (2024). https://doi.org/10.1038/s41598-024-67626-1


Received: 22 February 2024

Accepted: 15 July 2024

Published: 03 September 2024

DOI: https://doi.org/10.1038/s41598-024-67626-1


Keywords

  • Hourly forecast
  • Air quality index
  • Transformer


