Danish researchers use machine learning to predict life events, including when someone might die
This week, a study published in Nature Computational Science described life2vec, a machine learning model that, given the right data, can make predictions about a person's life, including their thoughts, feelings, and behavior, and even how likely they are to die.
Sune Lehmann, a lead author of the study and a professor at the Technical University of Denmark, cautioned that life2vec is only a "research prototype" and cannot yet be used for real-world tasks.
Lehmann and his team drew on data from the Danish national registers, which hold information on about 6 million people from 2008 to 2016, covering education, health, income, and employment.
The team applied natural language processing techniques, building a vocabulary of life events so that each person's history could be written as a sequence of sentence-like entries, such as "In September 2012, Francesco received twenty thousand Danish kroner as a castle guard in Helsingør" or "During her third year at boarding school, Hermine excelled in her chosen subjects."
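The article does not spell out the study's encoding scheme, but the general idea of turning register records into token sequences can be sketched in a few lines of Python. Everything below is a hypothetical illustration: the `LifeEvent` record, its field names, the token format, and the date given to the boarding-school event are made up, and the real life2vec pipeline is likely far more elaborate. Each event becomes a short "sentence" of discrete tokens, events are put in time order, and each distinct token gets an integer id, the form in which a sequence model would consume it.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class LifeEvent:
    """One register record. Hypothetical fields, for illustration only."""
    when: date
    category: str               # e.g. "INCOME", "EDUCATION"
    attributes: dict[str, str]  # free-form details of the event


def event_to_tokens(event: LifeEvent) -> list[str]:
    # Category first, then one token per attribute in sorted key order,
    # so the same kind of event always produces tokens in the same order.
    tokens = [event.category]
    for key in sorted(event.attributes):
        tokens.append(f"{key.upper()}_{event.attributes[key].upper()}")
    return tokens


def build_vocabulary(sentences: list[list[str]]) -> dict[str, int]:
    # Give each distinct token an integer id; id 0 is reserved for unknown tokens.
    vocab = {"[UNK]": 0}
    for sentence in sentences:
        for token in sentence:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def encode(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    return [vocab.get(token, vocab["[UNK]"]) for token in tokens]


# The article's two example events as structured records
# (the date of the second event is invented for the example).
life = [
    LifeEvent(date(2012, 9, 1), "INCOME",
              {"amount": "20000_DKK", "job": "castle_guard", "city": "helsingor"}),
    LifeEvent(date(2013, 6, 1), "EDUCATION",
              {"school": "boarding_school", "year": "3", "grades": "excellent"}),
]

# Order events chronologically and flatten them into one sequence per person.
sentences = [event_to_tokens(e) for e in sorted(life, key=lambda e: e.when)]
vocab = build_vocabulary(sentences)
person_sequence = [token for sentence in sentences for token in sentence]
ids = encode(person_sequence, vocab)  # -> [1, 2, 3, 4, 5, 6, 7, 8]
```

Tokens that were never seen while building the vocabulary map to the reserved `[UNK]` id, a common convention in language models.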
Lehmann explained that the algorithm learned from these data and could accurately predict certain aspects of a person's life, such as thoughts, feelings, and behavior, as well as whether the person was likely to die within the next few years.
To estimate the likelihood of death, the researchers used data on more than 2.3 million people aged 35 to 65 from 2008 to 2015, and trained the model to predict whether each of them would survive beyond 2016.
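The key design point in this setup is a time cutoff: the model sees only events recorded before 2016, and the outcome it must predict lies after that date. The sketch below illustrates that split under stated assumptions. The `Person` record, the exact cutoff date of 1 January 2016, the age computed from birth year alone, and the `end_of_followup` parameter are all hypothetical, since the article does not say precisely how long after 2016 survival was tracked.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional

# Observation window from the article (2008-2015); the exact cutoff day is an assumption.
HISTORY_START = date(2008, 1, 1)
CUTOFF = date(2016, 1, 1)


@dataclass
class Person:
    """Hypothetical record; real register data look different."""
    person_id: int
    birth_year: int
    death_date: Optional[date]  # None if alive at the end of follow-up


def in_cohort(p: Person, min_age: int = 35, max_age: int = 65) -> bool:
    # Approximate age at the cutoff from the birth year alone (an assumption).
    age = CUTOFF.year - p.birth_year
    alive_at_cutoff = p.death_date is None or p.death_date >= CUTOFF
    return min_age <= age <= max_age and alive_at_cutoff


def died_label(p: Person, end_of_followup: date) -> int:
    # 1 if the person died between the cutoff and the hypothetical end of follow-up, else 0.
    return int(p.death_date is not None and CUTOFF <= p.death_date < end_of_followup)


def history_before_cutoff(events: list[tuple[date, str]]) -> list[str]:
    # Keep only events inside the observation window, in time order,
    # so nothing that happened after the cutoff can leak into the model's input.
    return [token for when, token in sorted(events) if HISTORY_START <= when < CUTOFF]


# Four made-up people.
people = [
    Person(1, 1960, None),              # 56 at cutoff, alive: in cohort, survives
    Person(2, 1970, date(2017, 3, 1)),  # 46 at cutoff, dies in 2017: in cohort
    Person(3, 1990, None),              # 26 at cutoff: too young
    Person(4, 1955, date(2014, 5, 1)),  # died before the cutoff: excluded
]
cohort = [p for p in people if in_cohort(p)]
labels = {p.person_id: died_label(p, end_of_followup=date(2020, 1, 1)) for p in cohort}

# One made-up event history; the 2007 and 2017 events fall outside the window.
events = [
    (date(2012, 9, 1), "INCOME"),
    (date(2017, 1, 1), "HOSPITAL_VISIT"),
    (date(2010, 6, 1), "EDUCATION"),
    (date(2007, 4, 1), "MOVE"),
]
model_input = history_before_cutoff(events)  # -> ["EDUCATION", "INCOME"]
```

Filtering the input history this way is what makes the later test meaningful: the model cannot have "seen" the deaths it is asked to predict.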
To test life2vec's performance, they selected 100,000 individuals, half of whom survived past 2016 and half of whom died. The algorithm was not told which individuals had survived and which had died.
The researchers then asked the algorithm to predict, for each individual, whether they had survived past 2016. It was correct in 78% of cases; because the sample was split evenly, guessing at random would have been right only about half the time. The authors report that life2vec predicted mortality at least 11% more accurately than other state-of-the-art models.
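A small calculation shows why the balanced test sample matters. In the general population most people aged 35 to 65 survive any few-year window, so a useless model that always answers "survives" would score well on accuracy. On a 50/50 sample, that same model scores exactly 50%, which is the baseline the reported 78% should be compared against. The sketch below assumes the two halves were drawn by simple random sampling (the article does not say how the 100,000 people were selected), and the group sizes are toy numbers.

```python
import random


def balanced_sample(survivor_ids, deceased_ids, n_total, seed=0):
    """Draw n_total people, half from each outcome group (simple random sampling assumed)."""
    rng = random.Random(seed)
    half = n_total // 2
    picked = [(pid, 0) for pid in rng.sample(survivor_ids, half)]
    picked += [(pid, 1) for pid in rng.sample(deceased_ids, half)]
    rng.shuffle(picked)  # mix the groups so position reveals nothing about the label
    return picked


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label (0 = survived, 1 = died)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy population: 1,000 survivors and 200 deaths (made-up numbers).
survivors = list(range(1000))
deceased = list(range(1000, 1200))

# On the imbalanced population, always predicting "survived" looks good...
y_population = [0] * len(survivors) + [1] * len(deceased)
naive_population_acc = accuracy(y_population, [0] * len(y_population))  # 1000/1200, about 0.83

# ...but on a balanced sample the same constant guess is right only half the time.
sample = balanced_sample(survivors, deceased, n_total=200)
y_sample = [label for _, label in sample]
naive_sample_acc = accuracy(y_sample, [0] * len(y_sample))  # exactly 0.5
```

Balancing the test set trades realism for interpretability: accuracy on it no longer reflects how rare deaths are in the population, but it cannot be inflated by simply predicting the majority outcome.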
Analyzing the results, the researchers found that men had a higher risk of dying early. Being a blue-collar worker and having a mental health diagnosis, such as depression or anxiety, were also associated with earlier death. Conversely, holding a professional position or earning a higher income was associated with the "survival" category.
Despite these promising findings, the study has several limitations, chiefly its non-randomized, non-blinded design and possible socio-demographic biases in the sample. And although life2vec achieved impressive results, it still cannot accurately predict when or how a person will die.
While some experts are concerned about the implications of such technology, Lehmann acknowledges that many insurance companies are keen on this kind of work, eager to stay a step ahead of consumers once such models are commercialized.
Arthur Caplan, head of the Division of Medical Ethics at NYU Grossman School of Medicine, agrees that insurance companies would be more willing to offer policies if they had access to predictive models like life2vec. However, he warns that such models cannot accurately predict when or how someone will die; they cannot foresee accidents, for example.
Caplan expects that within five years there will be more advanced predictive models built on better data, which could make insurance harder to sell once consumers know their own risks.
Although such technologies could provide valuable insights, Caplan points out that they could also strip away some of life's mystery, such as not knowing when death will come.