As a thought experiment, compare the thoughts of a teen with those of a working adult. More specifically, focus on the thoughts that are directly converted into actions: essentially, the reasoning one performs in response to ‘What should I do next?’ Though this varies a lot from person to person, one pattern I have observed is a major difference in the timeframe and the existing context behind the thought. Assuming teens have their basic survival needs satisfied, their thoughts and actions are easily and continuously steered by the variety of sensory inputs they receive, and their responses are highly reactive as well. That sounds highly adaptive, yet adults do the opposite and still manage stable survival in the real world. Why is that? Why can’t the reactive mode remain the method of survival throughout life?
Obviously, adults carry more responsibility and focus more on the basic means of survival, whichever path they choose, among other reasons. But one aspect doesn’t seem to add up! Any directed action requires a complex interaction of many thoughts. Teens lack history and experience, and hence do not anticipate the little nuances of real-world execution. But the facts are available to both parties. With the rise of the internet and AI, the gap in ‘history’ has largely been erased, since recorded knowledge is open to everyone. What remains is the gap in real-world experience, which grows in proportion to time, something neither party can control.
Current AI models and chatbots, built mostly on transformers (attention layers combined with MLPs), seem much like the teen: very eager to predict the next most probable token, using only the limited context window of facts available. Even the training is based on minimising a loss, typically cross-entropy for next-token prediction rather than Mean Squared Error (MSE), which rewards probability placed on the ‘right’ token and punishes probability placed on the ‘wrong’ ones. The target is binary: one token is right and all the others are wrong! This is repeated on a predefined dataset for a long time until the loss curve flattens out near a minimum. Hence, the model largely turns out to be only as good as its dataset. It is ready to automate repetitive tasks and produce the most probable outputs. This seems very similar to millions of students going through the education system.
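To make the ‘rewarding right and punishing wrong’ idea concrete, here is a minimal sketch of cross-entropy, the loss usually used for next-token prediction. The three-word vocabulary and the logit values are made up purely for illustration; a real model scores tens of thousands of tokens.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Loss for one prediction: -log(probability given to the right token)."""
    probs = softmax(logits)
    return -math.log(probs[target_index])

# Hypothetical toy vocabulary; the 'right' next token is "mat".
vocab = ["cat", "dog", "mat"]
target = vocab.index("mat")

confident_right = cross_entropy([0.0, 0.0, 4.0], target)  # most mass on "mat"
unsure = cross_entropy([0.0, 0.0, 0.0], target)           # uniform guess
confident_wrong = cross_entropy([4.0, 0.0, 0.0], target)  # most mass on "cat"

print(confident_right, unsure, confident_wrong)
```

The target is binary (one right token), but the penalty is graded: a confident wrong guess costs far more than an unsure one, and training nudges the weights to shrink this number across the whole dataset.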
But consider the adults! They may have undergone the same training during their childhood, yet the modern world’s requirements are fundamentally different and more subtle. Rather than produce a mass-scaled output similar to the average of the dataset, they must continuously and consciously choose which complex mix of context to hold on to, based on the combination of facts and experiences that the desired output needs. Facts may be absolute, yet experiences are relative. And since the barrier to reaching the other end of the earth, even physically, no longer exists, operating within this world requires far more fragments of condensed context. Hence, they “reinforce” themselves with “dynamic weights” in response to the “wide environment” and “continuously adapt”, condensing an ever wider and more complex context. Imagine this as the second derivative of storing facts.
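The “reinforce and continuously adapt” idea can be sketched with a toy reinforcement-learning loop. This is only an illustration under stated assumptions: an epsilon-greedy agent on a three-option bandit, where the payoff rates, step count and exploration rate are invented for the example. Unlike training on a fixed dataset, the agent’s estimates (its “weights”) change after every interaction with the environment.

```python
import random

def run_bandit(true_rates, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-known option, sometimes explore."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_rates)  # the agent's current beliefs
    pulls = [0] * len(true_rates)        # how often each option was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_rates))  # explore a random option
        else:
            arm = max(range(len(true_rates)), key=lambda a: estimates[a])  # exploit
        # The environment answers with a noisy reward (1 or 0).
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        pulls[arm] += 1
        # Incremental average: shift the belief toward the latest experience.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
    return estimates, pulls

# Hypothetical payoff rates, hidden from the agent.
estimates, pulls = run_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
print(estimates, pulls, best_arm)
```

Nothing here is told what the ‘right’ answer is in advance; the agent condenses its own experience into a few numbers and keeps revising them.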
Hence, reinforcement learning keeps gaining popularity within the ML world. The human brain seems to operate at an even higher derivative. It can create, dissolve and populate the weights themselves according to the immediate requirements of the context. Basically, it does not just utilise context; it generates it! In this sense, the context available for the output becomes effectively infinite. And adults ‘generalise’ across the various requirements they meet. Hence, the context in this case is not limited to facts; it need not even be related to them. One can train oneself to produce and dissolve knowledge at will. It is what makes humans … humans.
What does this mean for teens? Are they doomed in their course of survival due to a lack of experience? Nope. Rather, they find their way out of the rapid fire of facts aimed at them and learn to master their own context. This becomes their lifetime experience and forms a strong positive feedback loop.
Hence, education, in my opinion, was never meant for learning the ‘right’ thing, but rather for learning to learn to learn the right thing … and the wrong thing. If that makes sense lol. And this third-derivative process will always be initiated by a system operating at a far higher derivative: YOU!