From time to time, posts circulate on Twitter claiming that ChatGPT cannot reason reliably or answers inconsistently. A recent example involved the question, "If 23 shirts take 1 hour to dry outside, how long will it take for 44 shirts?" The provided answer was deemed incorrect, sparking a conversation about the AI's reasoning capabilities. Reference (tweet dated 20 Dec 2023):
https://twitter.com/abacaj/status/1737206667387850936
This instance underscores a genuine issue: ChatGPT's default responses sometimes miss the mark on the first attempt. However, various prompt-engineering techniques can significantly increase the likelihood of an accurate response. One such method is Chain-of-Thought (CoT) prompting.
The seminal research paper "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models," first published in January 2022 and cited over 2,000 times in scholarly articles, demonstrates the effectiveness of this technique, particularly for arithmetic reasoning. CoT is defined as a series of intermediate natural-language reasoning steps that lead to the final output. The method has become a benchmark for evaluating complex reasoning in large language models (LLMs).
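The idea behind CoT prompting can be sketched in a few lines of Python. The exemplar below is illustrative, not taken from the paper: each few-shot example shows the intermediate reasoning, not just the final answer, so the model is nudged to reason step by step about the new question.

```python
# Sketch of few-shot Chain-of-Thought prompting. The exemplar text and the
# helper name `build_cot_prompt` are illustrative assumptions, not part of
# any official API.

COT_EXEMPLARS = [
    {
        "question": "If 5 shirts take 1 hour to dry outside, how long do 10 shirts take?",
        "reasoning": (
            "Drying happens in parallel: every shirt hung outside dries at the "
            "same time, so the number of shirts does not change the drying time."
        ),
        "answer": "1 hour",
    },
]

def build_cot_prompt(question: str) -> str:
    """Assemble a few-shot CoT prompt that ends with the new question."""
    parts = []
    for ex in COT_EXEMPLARS:
        parts.append(f"Q: {ex['question']}")
        parts.append(f"A: {ex['reasoning']} The answer is {ex['answer']}.")
    parts.append(f"Q: {question}")
    parts.append("A:")  # the model continues from here with its own reasoning
    return "\n".join(parts)

prompt = build_cot_prompt(
    "If 23 shirts take 1 hour to dry outside, how long will it take for 44 shirts?"
)
print(prompt)
```

Because the worked exemplar already spells out the parallel-drying logic, the model is far more likely to apply the same reasoning pattern to the 44-shirt question instead of multiplying the numbers.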
For users of the OpenAI platform, CoT prompting works with both GPT-3.5 and GPT-4. For most users who are not AI experts, there is also a simpler strategy for eliciting more robust answers: tweaking the system prompt.
Returning to the shirt-drying example, a system prompt tuned to encourage reasoning might be: "Approach this question systematically and thoughtfully. Reflect on your reasoning using principles of common sense and logic." This guidance nudges the model toward a more structured, logical approach to problem solving. The result below shows the difference.
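The system-prompt tweak described above could be wired up as follows. This is a minimal sketch: it only builds the `messages` payload, and the actual OpenAI API call is shown commented out so the snippet stays self-contained and does not require an API key.

```python
# Sketch of pairing a reasoning-oriented system prompt with the user question.
# Only the message payload is constructed here; the network call is left as a
# comment.

system_prompt = (
    "Approach this question systematically and thoughtfully. "
    "Reflect on your reasoning using principles of common sense and logic."
)

messages = [
    {"role": "system", "content": system_prompt},
    {
        "role": "user",
        "content": "If 23 shirts take 1 hour to dry outside, "
                   "how long will it take for 44 shirts?",
    },
]

# With the `openai` Python client this payload would be sent as, e.g.:
# from openai import OpenAI
# client = OpenAI()  # requires OPENAI_API_KEY in the environment
# response = client.chat.completions.create(model="gpt-4", messages=messages)
# print(response.choices[0].message.content)

print(messages[0]["content"])
```

The same user question is sent in both cases; only the system message changes, which makes it easy to compare the default response against the reasoning-prompted one.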