I am a data scientist (6 years of experience, 5 of them in RWE) and a researcher in ageing and age-related diseases (13 years). I hold a PhD, have designed 13 and executed 26 data science/epidemiological projects, and have authored 16+ research papers. I have led teams, including a small group of two data scientists, supervised the MSc dissertations of four students who earned excellent grades, and collaborated with researchers across multiple teams. My technical expertise spans classification and regression, survival and time-series analyses, and deep learning (CNN, GAN, RNN).

Evgeniy R. Galimov

PRO


Available to hire


Experience Level

Expert
Expert
Expert
Expert
Expert
Expert
Expert
Intermediate

Language

English: Fluent
Russian: Fluent

Work Experience

Principal Data Scientist at Phastar
October 1, 2024 - Present
Led 7 Real-World Evidence (RWE) projects for AstraZeneca in cardiovascular, renal, and metabolic diseases, covering dashboard development, population estimation, and multi-source data preprocessing (CPRD, Optum Claims, PharMetrics Plus). Conducted classification modeling, survival analysis, and propensity score matching to support RWE studies. Developed an analytical-mapping tool to assist clinical-trial recruitment and built agentic AI systems with natural language interfaces to automate analytical workflows from clinical queries.
Data Scientist at Imperial College Health Partners
November 1, 2021 - September 30, 2024
Designed and delivered five data-analysis RWE projects involving regression, classification, survival and time-series analyses to improve management of CVD, bowel disease, and COVID-19 (projects for Janssen-Cilag, GSK, and Daiichi Sankyo). Led a two-person data-science team as analytical lead in The Networked Data Lab. Wrote a proposal for a data-driven CVD management project for Novartis and contributed to five other proposals and three grants. Managed projects addressing frailty in social care and forecasting adverse CVD events in hypercholesterolaemic patients.
Co-founder at Aptadeep
April 1, 2020 - October 1, 2021
Developed the business plan and market analysis, and worked on product development, for a machine-learning-based aptamer discovery platform; published research on predicting C. elegans lifespan from morphology data using ML and convolutional neural networks; studied age-related changes in alternative splicing across species using RNA-seq.
Cohort Member at Entrepreneur First
April 1, 2020 - October 1, 2021
Participant in the 14th cohort, focused on entrepreneurship and early-stage venture development.
Researcher and Data Scientist (Postdoctoral Fellow) at Institute of Healthy Ageing, University College London
September 1, 2019 - February 29, 2020
Developed expertise in classical machine learning (ML) and deep learning (DL); designed and implemented a novel model of the evolution of ageing in C. elegans (published in Aging Cell); worked with real-world human data (blood biomarkers) to predict type 2 diabetes.
UK Researcher and Data Scientist (Postdoctoral Fellow) at UK Research and Innovation, Imperial College London
January 1, 2019 - August 31, 2019
Designed and performed two screens to characterize the effects of microbiota on proteostasis and development; wrote code for the robotised screening platform and the flow cytometry analyses used in these screens.
Researcher (Postdoctoral Fellow) at Institute of Healthy Ageing, University College London
October 1, 2014 - December 31, 2018
Led research resulting in seven high-impact publications, mostly as first author; independently developed hypotheses, executed research plans, and demonstrated problem solving and prioritization.

Education

PhD in molecular biology at Lomonosov Moscow State University, Faculty of Bioengineering and Bioinformatics
January 1, 2007 - January 1, 2012
Integrated master's degree (first-class honours) in Bioengineering at Lomonosov Moscow State University, Faculty of Bioengineering and Bioinformatics
January 1, 2002 - January 1, 2007

Qualifications

Udacity Nanodegree: Deep Learning
January 1, 2023 - December 31, 2023
Coursera: Practical Machine Learning
January 1, 2020 - December 31, 2020
Healthcare NLP and Spark NLP workshops
January 1, 2023 - December 31, 2023

Industry Experience

Healthcare, Life Sciences, Professional Services, Software & Internet
    paper Impact of being an unpaid carer on health conditions and healthcare access in North West London

    Networked Data Lab: NDL North West London
    Project Description
    The Networked Data Lab (NDL) is a pioneering collaborative network of analysts who use linked data, open analytics, and public and patient involvement to tackle the most pressing challenges in health and social care. The initiative is led by The Health Foundation, working closely with five partner labs across the UK. The North West London Networked Data Lab (NWL NDL) is a partnership between Imperial College Health Partners (ICHP), the North West London Health and Care Partnership, Imperial College’s School of Public Health, and the Institute of Global Health Innovation (IGHI).

    The overarching aim of the NDL is to improve health and care services and to reduce health inequalities in the UK. The current project specifically aims to understand the needs and health issues of unpaid carers, and their pathways to services.

    The aims of the study are to:

    Explore the demographic profiles of unpaid carers as well as their geographical distribution in North West London
    Estimate the effect of being a carer on health-related metrics and the risk of developing various long-term conditions
    Analyse how the COVID-19 pandemic affected unpaid carers’ access to healthcare services

    To achieve these aims, we extracted healthcare data for unpaid carers, identified through a list of SNOMED codes, from the Discover dataset. Discover is a de-identified dataset containing linked, coded primary care, secondary care, acute, mental health, community health and social care records for over 2.5 million patients who live in, and are registered with a GP in, North West London [4]. We also created a matched cohort, based on gender, age, Index of Multiple Deprivation (IMD) and ethnicity, to serve as a control population for comparisons. The matched cohort contains professional carers; however, for brevity, in this study we refer to unpaid carers as carers and to the matched population as non-carers.
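    The matched-cohort step can be illustrated with a minimal exact-matching sketch in plain Python. It is not the study's code: each person is assumed to be a dict with hypothetical fields `id`, `gender`, `age_band`, `imd_decile` and `ethnicity` (coarsening age into bands is itself an assumption), the 1:`ratio` design is illustrative, and controls are drawn without replacement so none is reused.

```python
import random
from collections import defaultdict


def exact_match(cases, pool, keys=("gender", "age_band", "imd_decile", "ethnicity"),
                ratio=1, seed=0):
    """Match each case to up to `ratio` controls with identical values on `keys`.

    Hypothetical helper: field names and the matching ratio are assumptions.
    Controls are drawn without replacement, so each is used at most once.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in pool:
        strata[tuple(person[k] for k in keys)].append(person)
    for members in strata.values():
        rng.shuffle(members)  # random choice of controls within each stratum

    matches = {}
    for case in cases:
        stratum = strata.get(tuple(case[k] for k in keys), [])
        n = min(ratio, len(stratum))
        matches[case["id"]] = [stratum.pop()["id"] for _ in range(n)]
    return matches


cases = [
    {"id": "c1", "gender": "F", "age_band": "40-49", "imd_decile": 3, "ethnicity": "White"},
    {"id": "c2", "gender": "M", "age_band": "60-69", "imd_decile": 1, "ethnicity": "Asian"},
]
pool = [
    {"id": "p1", "gender": "F", "age_band": "40-49", "imd_decile": 3, "ethnicity": "White"},
    {"id": "p2", "gender": "F", "age_band": "40-49", "imd_decile": 3, "ethnicity": "White"},
    {"id": "p3", "gender": "M", "age_band": "60-69", "imd_decile": 2, "ethnicity": "Asian"},
]
matches = exact_match(cases, pool)
```

    Case `c2` receives no control here because no one in the pool shares its IMD decile; in practice, unmatched cases are reported and either dropped or matched on looser criteria.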

    paper Modelling the demand of gastrointestinal endoscopic procedures in North West London

    Project Description
    This repository contains the code for forecasting the number of weekly endoscopies (and the points reflecting the staff capacity required for endoscopy) over the next 5 years, based on up to 10 years of historical data obtained from 6 providers in North West London. Three approaches were used for the modelling: Prophet, SARIMA and exponential smoothing. Forecasts from the best models for the chosen sites/procedures were combined to predict demand for each provider or for North West London as a whole.
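    To illustrate the exponential-smoothing approach and the combination step, here is a minimal sketch of additive Holt (level plus trend) smoothing in plain Python. It is not the repository's code: it omits the seasonal component and the parameter fitting that library implementations provide, the `alpha`/`beta` values are arbitrary assumptions, and "combining" is assumed to mean summing per-site weekly forecasts.

```python
def holt_forecast(series, alpha=0.5, beta=0.3, horizon=4):
    """Additive Holt (double) exponential smoothing; returns `horizon` forecasts.

    alpha smooths the level and beta smooths the trend; both are assumed here
    rather than fitted. No seasonal term is modelled.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


def combine(site_forecasts):
    """Sum per-site forecasts week by week into a provider- or region-level total."""
    return [sum(week) for week in zip(*site_forecasts)]


# Usage with made-up weekly counts for two hypothetical sites
site_a = holt_forecast([40, 42, 41, 45, 47, 48], horizon=3)
site_b = holt_forecast([20, 21, 23, 22, 24, 25], horizon=3)
total = combine([site_a, site_b])
```

    Summing is only appropriate when the site-level series do not overlap; choosing the "best" model per site would, in this sketch, mean comparing holdout errors across candidate models before combining.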

    Data sources
    The data were received from the following providers:
    * Chelsea and Westminster NHS Foundation Trust
    * Imperial College Healthcare NHS Trust
    * London North West Healthcare NHS Trust
    * The Hillingdon Hospitals NHS Foundation Trust
    * Healthshare
    * NHS bowel cancer screening programme (BCSP)

    Data
    The data contain the date, procedure codes, procedure categories, patient numbers and points for each procedure, for each of five datasets: referrals, rebookings, emergency, surveillance and removals.
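    Before modelling, procedure-level records have to be rolled up into weekly series. A minimal sketch, assuming each record is a hypothetical `(procedure_date, dataset, patients, points)` tuple and that weeks start on Monday (both assumptions, not taken from the repository):

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_totals(records):
    """Aggregate (procedure_date, dataset, patients, points) records by dataset and week.

    Returns {(dataset, week_start): (patients, points)}, where week_start is the
    Monday of the week containing the procedure date.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for proc_date, dataset, patients, points in records:
        week_start = proc_date - timedelta(days=proc_date.weekday())
        totals[(dataset, week_start)][0] += patients
        totals[(dataset, week_start)][1] += points
    return {key: tuple(value) for key, value in sorted(totals.items())}


records = [
    (date(2023, 1, 2), "referrals", 1, 1.0),  # Monday
    (date(2023, 1, 5), "referrals", 2, 2.0),  # Thursday of the same week
    (date(2023, 1, 9), "referrals", 1, 1.5),  # following Monday
    (date(2023, 1, 3), "emergency", 1, 1.0),
]
weekly = weekly_totals(records)
```

    Weeks with no procedures are simply absent from the result; a forecasting pipeline would normally fill them with zeros so the series is evenly spaced.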

    paper IPTW-adjusted Cox regression to assess the effectiveness of sotrovimab

    The main aim of the project was to compare the risk of COVID-19-related hospitalisation and/or COVID-19-related death within 28 days of the observed/imputed treatment date between highest-risk patients treated and not treated with sotrovimab.

    Inverse probability of treatment weighting (IPTW) based on propensity scores was used to balance baseline patient characteristics, and thereby adjust for measured confounders, between the treated and untreated cohorts. Propensity scores (the probability of treatment given baseline covariates) were obtained using logistic regression or gradient boosting machine models with the following covariates: age, gender, time period of COVID-19 diagnosis (Omicron BA.1, BA.2 or BA.5 period), presence of renal disease (binary), presence of multiple highest-risk conditions (≥2, binary), presence of high-risk conditions (binary), solid organ transplant (binary), COVID-19 vaccination status (binary), time since vaccination, and ethnicity (see the publication for the full list of variables and models). To obtain an appropriate estimate of the variance of the treatment effect and better control the type I error rate, the weights were stabilised. Balance in baseline characteristics between the weighted treated and untreated groups was assessed using standardised differences.
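    The weighting and balance-check steps can be sketched in plain Python, assuming the propensity scores `ps` have already been estimated (for example by logistic regression) and lie strictly between 0 and 1, and that treatment is coded 0/1. A stabilised weight is the marginal probability of the observed treatment divided by the propensity of that treatment; the standardised difference uses weighted means and a pooled weighted SD. The function names and toy values are hypothetical, and the simple weighted variance here omits small-sample corrections that some software applies.

```python
import math


def stabilised_weights(treated, ps):
    """Stabilised IPTW weight P(T = t) / P(T = t | X) for each patient's observed t.

    treated: 0/1 treatment indicators; ps: estimated P(T = 1 | X) per patient.
    """
    p_treat = sum(treated) / len(treated)  # marginal probability of treatment
    return [p_treat / p if t else (1 - p_treat) / (1 - p)
            for t, p in zip(treated, ps)]


def weighted_smd(x, treated, w):
    """Weighted standardised difference in covariate x between treated and untreated."""
    def weighted_mean_var(group):
        pairs = [(wi, xi) for wi, xi, ti in zip(w, x, treated) if ti == group]
        total = sum(wi for wi, _ in pairs)
        mean = sum(wi * xi for wi, xi in pairs) / total
        var = sum(wi * (xi - mean) ** 2 for wi, xi in pairs) / total
        return mean, var

    m1, v1 = weighted_mean_var(1)
    m0, v0 = weighted_mean_var(0)
    return (m1 - m0) / math.sqrt((v1 + v0) / 2)  # pooled SD in the denominator


# Usage with toy data
treated = [1, 1, 0, 0]
ps = [0.8, 0.5, 0.5, 0.2]
age = [70, 64, 58, 61]
w = stabilised_weights(treated, ps)
smd_age = weighted_smd(age, treated, w)
needs_adjustment = abs(smd_age) > 0.1  # balance threshold stated in the analysis
```

    Covariates flagged this way are the ones the text describes carrying into the outcome model.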

    Cox proportional hazards models with stabilised weights were fitted to estimate the hazard ratio (HR) of COVID-19-related hospitalisation and/or COVID-19-related death. Covariates that remained unbalanced after weighting (standardised difference >0.1) were included in the Cox proportional hazards model. IPTW, and hence the doubly robust estimation, was performed separately for each Cox model.
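    To make the outcome step concrete, here is a minimal sketch of a weighted Cox model with a single binary treatment indicator, maximising the weighted partial log-likelihood by Newton-Raphson with the Breslow approximation for ties. It is an illustration only, with a hypothetical function name: it leaves out the unbalanced covariates that make the published analysis doubly robust and does not compute the robust (sandwich) variance usually reported with IPTW; in practice a survival library would be used.

```python
import math


def weighted_cox_hr(time, event, x, w, iters=50, tol=1e-10):
    """Hazard ratio exp(beta) from a weighted Cox model with one covariate x.

    Newton-Raphson on the weighted partial log-likelihood; tied event times
    share the same risk set (Breslow approximation).
    """
    n = len(time)
    beta = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for i in range(n):
            if not event[i]:
                continue
            risk = [j for j in range(n) if time[j] >= time[i]]  # still at risk at time[i]
            r = [w[j] * math.exp(beta * x[j]) for j in risk]
            s0 = sum(r)
            s1 = sum(rj * x[j] for rj, j in zip(r, risk))
            s2 = sum(rj * x[j] ** 2 for rj, j in zip(r, risk))
            mean = s1 / s0
            score += w[i] * (x[i] - mean)          # weighted score contribution
            info += w[i] * (s2 / s0 - mean ** 2)   # weighted information contribution
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return math.exp(beta)


# Usage with toy data: x = 1 for treated, w = stabilised IPTW weights
time = [1, 2, 3, 4, 5, 6]
event = [1, 1, 1, 0, 1, 1]
treated = [1, 0, 1, 0, 0, 1]
weights = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
hr = weighted_cox_hr(time, event, treated, weights)
```

    Weighting a patient by 2 is equivalent here to including that patient twice, which is the intuition behind IPTW creating a pseudo-population in which treatment is independent of the measured covariates.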