A Brief Discussion on Epistemology and the Philosophy of Science

This article serves as the foreword to my doctoral thesis, in which I address several philosophical questions that have influenced my research journey. Although this segment doesn’t directly engage with my main research questions, it includes reflections on how I envision my work contributing to broader academic discussions.

Epistemology and the Philosophy of Science

During my three years of MPhil and DPhil studies, I have grappled with a central question: should I focus on prediction or explanation? In the era of computational social science, the influx of vast amounts of data and the continual growth of computational capacity have enabled the integration of sophisticated machine learning models into social science research (Edelmann et al. 2020; Lazer et al. 2020). The evidence on their value is mixed. Results from large-scale prediction competitions like the Fragile Families Challenge suggest that advanced machine learning models may have predictive power only comparable to that of simpler theory-based regression models (Salganik et al. 2020; Salganik et al. 2019), whereas studies in social epidemiology have demonstrated that more complex models, such as neural networks, significantly outperform traditional theory-based approaches in predicting health outcomes based on social factors (Kino et al. 2021; Kreatsoulas and Subramanian 2018; Seligman, Tuljapurkar and Rehkopf 2018; Zhao et al. 2021). Some computational social scientists believe that data-driven social science, which focuses on R², may prove more useful than theory-driven approaches that aim to uncover mechanisms and emphasize p-values (Breiman 2001).

However, data-driven models are often criticized as “black boxes” that lack sociological insight. The foundational philosophy of prediction models, rooted in engineering, prioritizes optimization over explanation, and explanation, arguably, should be the scientist’s focus. Science should help us understand the natural world, not merely predict it. Historically, an excessive focus on predictive accuracy has sometimes hindered understanding of true mechanisms. For example, Ptolemy’s geocentric model, which placed Earth at the universe’s centre and assumed that other planets moved in perfect circles around it, was a data-driven model optimized with extensive observational data into a complex system of forty to sixty circles (Jones 2006; Murschel 1995). Despite its impressive predictive accuracy (erring by only around ten days over 1,500 years), Ptolemy’s model was eventually superseded by the heliocentric model proposed by Copernicus, which, despite initially poorer predictive power due to a lack of training data, provided a more accurate understanding of our solar system’s structure (Gingerich 1993; Kuhn 1997).

Geocentric model based on Aristotle and Ptolemy. Source: https://astronomy.stackexchange.com/questions/38927/was-the-geocentric-model-correct-at-all

Thus, while data-driven prediction models may offer superior forecasts, they do not necessarily enhance our understanding of the world’s mechanisms. A sole focus on prediction can be problematic.

On the other hand, scepticism towards explanations is warranted. Firstly, the criticisms that social data scientists level at explanatory models are justifiable: such models often lack predictive power (Seligman et al. 2018). Furthermore, the characteristics of non-linear complex systems suggest that even a comprehensive understanding of a social system’s mechanisms can be undermined by minor errors in measuring its initial conditions, which compromise the accuracy of long-term predictions. It is therefore worth asking how meaningful explanatory models really are.

Moreover, I am sceptical about our ability to distinguish between objective explanations and subjective narratives. Since the dawn of civilization, humans have been captivated by stories and by connecting dots that need not be connected[1]. The degree to which humans can perceive the world without framing it into narratives remains questionable. Psychological studies suggest that our brains may be biologically inclined to structure information into narratives. For instance, research on split-brain patients (individuals whose left and right cerebral hemispheres are disconnected) reveals that when a stimulus is presented exclusively to one hemisphere (typically the non-verbal right), the verbal left hemisphere, responsible for rationalization, will fabricate plausible explanations for the observed actions, despite having no access to the actual stimulus (Gazzaniga 1967, 1998). More recent findings indicate that elevated dopamine levels might enhance the perception of meaningful patterns but can also lead people to interpret random noise as significant patterns, producing unusual cognitive phenomena such as paranormal beliefs (Krummenacher et al. 2010). These findings suggest that explanations based on causality could be subjective, rooted in biological predispositions. As David Hume implies, human beings’ beliefs in causation are, to a large extent, not derived from reason or from any logical deduction from observing the world (Goertz et al. 2012). It is possible that when we explain things, we are simply creating narratives from noisy data for the sake of self-satisfaction.
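
The earlier point about minor initial measurement errors in non-linear systems can be illustrated with a small sketch. It uses the logistic map, a standard toy chaotic system, as a stand-in; this is my own illustrative assumption and not a model of any real social system. The function name, the starting value 0.2, and the error of one part in a billion are all hypothetical choices.

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate the logistic map x -> r * x * (1 - x), a toy chaotic system."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Hypothetical values: the "true" initial state, and a measurement of it
# that is off by one part in a billion.
true_path = logistic_trajectory(0.2)
measured_path = logistic_trajectory(0.2 + 1e-9)

# Gap between the two trajectories at each step, even though the
# mechanism (the update rule) is known exactly in both cases.
gaps = [abs(a - b) for a, b in zip(true_path, measured_path)]

for step in (0, 10, 20, 30, 40, 50, 60):
    print(f"step {step:2d}: gap = {gaps[step]:.3e}")
```

In this sketch the rule generating the data is fully known, yet the tiny initial error grows until the two trajectories no longer resemble each other. Knowing the mechanism, in other words, does not by itself guarantee long-run predictive accuracy.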

If neither prediction nor explanation is perfectly solid, the only remaining option for human beings may be description, which is common in the field of demography. Accurate and up-to-date descriptions of social phenomena can provide a great deal of new information. One example is life expectancy studies during the COVID-19 pandemic. High-quality descriptions of changes in life expectancy (Aburto et al. 2021; Schöley et al. 2022) and daily updates on mortality data (Centers for Disease Control and Prevention 2024) could provide valuable information for policymakers and the public, allowing for decisions that extend beyond their local knowledge. However, it is my contention that, from the standpoint of a social scientist, the principal limitation of descriptive research is that it offers relatively modest intellectual stimulation.

An alternative approach that could potentially yield both rigorous knowledge and greater intellectual engagement is the method of falsification. If humans are truly obsessed with creating narratives from randomness, many existing explanations could be false positives. Therefore, there is an urgent need to test existing explanations with different data and research designs. However, one should always note that falsification is not straightforward. As the Quine-Duhem thesis suggests, one can never test a hypothesis in isolation; testing one hypothesis inevitably involves multiple interconnected hypotheses (Harding 2012). Moreover, there is always a tendency to explain away falsifying evidence by abandoning some auxiliary hypotheses instead of the core one[2]. Nevertheless, falsification requires less information than verification, and falsifying evidence may bring more insight than verifying evidence.

An insightful experiment by P.C. Wason, mentioned in Daniel Kahneman’s famous book Thinking, Fast and Slow, is a good example of why. The experiment asked participants to guess the rule behind the sequence “2, 4, 6.” They proposed sequences of their own, and the experimenter told them whether each one conformed to the rule. Most failed to find the actual rule, “ascending order,” because they only tested examples that supported their assumptions and so kept confirming the “imaginary” rule they had concocted (Daniel 2017; Wason 1960). In this scenario, accumulating verification evidence for an “imaginary” rule would only cloud our understanding of the true mechanism, whereas a single piece of falsification evidence could dramatically change the situation.
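
The logic of the 2-4-6 task can be sketched in a few lines of code. This is a minimal illustration, not a reproduction of Wason’s procedure: the participant’s guessed rule (“each number is the previous one plus two”) and the specific test triples are hypothetical choices of mine.

```python
def true_rule(triple):
    """The experimenter's hidden rule: the numbers are in ascending order."""
    a, b, c = triple
    return a < b < c


def guessed_rule(triple):
    """A hypothetical participant's guess: each number is the previous plus 2."""
    a, b, c = triple
    return b - a == 2 and c - b == 2


# Verification strategy: only propose triples that the guess says should pass.
confirming_tests = [(n, n + 2, n + 4) for n in range(1, 50)]
confirmations = [true_rule(t) for t in confirming_tests]

# Falsification strategy: propose a triple that the guess says should FAIL.
falsifying_test = (1, 2, 10)

print("confirming tests answered 'yes':", sum(confirmations), "of", len(confirmations))
print("guess predicts for (1, 2, 10):", guessed_rule(falsifying_test))
print("experimenter answers for (1, 2, 10):", true_rule(falsifying_test))
```

Every confirming test receives a “yes,” so no number of them can separate the guessed rule from the true one. The single test that the guess predicts should fail is the only one on which the two rules disagree, which is why it carries the information.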


[1] The story of the Seven Sisters is a compelling illustration. The Pleiades, also known as the Seven Sisters, is an open star cluster. It appears in the myths of many cultures, such as Chinese, Greek, and Aboriginal Australian mythology, each featuring a similar storyline. The emphasis on seven sisters is curious, given that contemporary observers typically discern only six stars in the Pleiades. Recent astronomical research proposes that the consistent depiction of seven stars across diverse mythologies, along with the shared narrative motifs, may reflect the antiquity of the Seven Sisters story, which potentially originated around 100,000 BC. At that time, seven stars were observable to a majority of humans on Earth. Over the subsequent millennia, the motion of Pleione has left only six stars discernible to contemporary observers. This suggests that the myth may trace back to the earliest human communities gathered around campfires, gazing skyward some 100,000 years ago.

[2] For instance, one could always blame omitted variable bias in a regression-based study.
