Meta’s LLM Fumbles at Summarizing Research Papers

Industry Trends

Meta’s large language model, Open Pre-trained Transformer (OPT-175B), comprises around 175 billion parameters and was trained to match the accuracy of GPT-3, despite using only 15% of the data the latter was trained with.

Its self-attention mechanism, which encodes a sequence into a single fixed-length vector, is blamed for OPT misreading the text, as the model finds it difficult to decode long sequences.
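One common workaround for such long-sequence limits, sketched below purely for illustration (this is not Meta’s code, and the `summarize` callable stands in for whatever model or API a researcher uses), is to split a long paper into overlapping chunks, summarize each chunk, and then summarize the combined partial summaries.

```python
# Minimal sketch of chunked summarization for documents that exceed a model's
# usable context length. The `summarize` argument is assumed to be any
# callable mapping a text string to its summary.

def chunk_text(text: str, chunk_words: int = 800, overlap_words: int = 100):
    """Split text into overlapping word-based chunks."""
    words = text.split()
    step = chunk_words - overlap_words
    chunks = []
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_words]
        if chunk:
            chunks.append(" ".join(chunk))
    return chunks


def summarize_long_document(text: str, summarize) -> str:
    """Summarize each chunk, then summarize the concatenated partial summaries."""
    partial_summaries = [summarize(chunk) for chunk in chunk_text(text)]
    return summarize(" ".join(partial_summaries))
```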

The training data comprises only 880 GB of English text collected from academic and professional sources.

Given that retraining a model as large as OPT is costly and impractical, smaller models fine-tuned for specific tasks can help researchers, as in the sketch below.
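As a hedged illustration of that approach, the sketch below fine-tunes a small pretrained summarizer using the Hugging Face `transformers` and `datasets` libraries; the choice of `facebook/bart-base` and the `ccdv/arxiv-summarization` dataset are illustrative assumptions, not details from the article.

```python
# Sketch: fine-tune a small seq2seq model on paper/abstract pairs for summarization.
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments,
                          DataCollatorForSeq2Seq)
from datasets import load_dataset

model_name = "facebook/bart-base"  # small model chosen for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Assumed dataset with "article" and "abstract" columns; any similar corpus works.
dataset = load_dataset("ccdv/arxiv-summarization")

def preprocess(batch):
    # Tokenize papers as inputs and abstracts as target labels.
    inputs = tokenizer(batch["article"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["abstract"], max_length=256, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = dataset.map(preprocess, batched=True,
                        remove_columns=dataset["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="bart-paper-summarizer",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    learning_rate=3e-5,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```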
