Adam Lavertu
Profile Url: adam-lavertu
Researcher at Stanford University
Journal of Biomedical Informatics, 2019-10-16
Social media has been identified as a promising potential source of information for pharmacovigilance. The adoption of social media data has been hindered by the massive and noisy nature of the data. Initial attempts to use social media data have relied on exact text matches to drugs of interest, and therefore suffer from the gap between formal drug lexicons and the informal nature of social media. The Reddit comment archive represents an ideal corpus for bridging this gap. We trained a word embedding model, RedMed, to facilitate the identification and retrieval of health entities from Reddit data. We compare the performance of our model trained on a consumer-generated corpus against publicly available models trained on expert-generated corpora. Our automated classification pipeline achieves an accuracy of 0.88 and a specificity of >0.9 across four different term classes. Of all drug mentions, an average of 79% (±0.5%) were exact matches to a generic or trademark drug name, 14% (±0.5%) were misspellings, 6.4% (±0.3%) were synonyms, and 0.13% (±0.05%) were pill marks. We find that our system captures an additional 20% of mentions; these would have been missed by approaches that rely solely on exact string matches. We provide a lexicon of misspellings and synonyms for 2,978 drugs and a word embedding model trained on a health-oriented subset of Reddit.
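The difference between exact string matching and lexicon-expanded matching described above can be illustrated with a minimal sketch. The toy lexicon, the function names, and the sample comments are hypothetical and are not taken from the RedMed lexicon; the published system derives its variants from a word embedding model, which this sketch does not reproduce.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: surface form -> (canonical drug, mention type).
# These entries are illustrative assumptions, not the published lexicon.
LEXICON = {
    "alprazolam": ("alprazolam", "exact"),
    "xanax": ("alprazolam", "exact"),
    "xanex": ("alprazolam", "misspelling"),
    "xannies": ("alprazolam", "synonym"),
    "oxycodone": ("oxycodone", "exact"),
    "oxycodon": ("oxycodone", "misspelling"),
    "oxy": ("oxycodone", "synonym"),
}


def find_mentions(text, lexicon=LEXICON):
    """Return (token, canonical drug, mention type) for each lexicon hit."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [(tok, *lexicon[tok]) for tok in tokens if tok in lexicon]


def mention_type_shares(comments, lexicon=LEXICON):
    """Fraction of all drug mentions that fall into each mention type."""
    counts = Counter(
        kind for comment in comments for _, _, kind in find_mentions(comment, lexicon)
    )
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()} if total else {}


comments = ["xanax helps", "xanex helps", "oxy", "xannies"]
shares = mention_type_shares(comments)
# Mentions an exact-match-only approach would miss: everything not "exact".
missed_by_exact_match = 1.0 - shares.get("exact", 0.0)
```

In this toy example only one of four mentions is an exact match, so an exact-match-only approach would miss the rest; the abstract reports the real proportions.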
The opioid epidemic persists in the United States; in 2019, annual drug overdose deaths increased by 4.6% to 70,980, including 50,042 opioid-related deaths. The widespread abuse of opioids across geographies and demographics and the rapidly changing dynamics of abuse require reliable and timely information to monitor and address the crisis. Social media platforms include petabytes of participant-generated data, some of which offers a window into the relationship between individuals and their use of drugs. We assessed the utility of Reddit data for public health surveillance, with a focus on the opioid epidemic. We built a natural language processing pipeline to identify opioid-related comments and created a cohort of 1,689,039 geo-located Reddit users, each assigned to a city and state. We followed these users over a period of 10+ years and measured their opioid-related activity over time. We benchmarked the activity of this cohort against CDC overdose death rates for different drug classes and NFLIS drug report rates. Our Reddit-derived rates of opioid discussion strongly correlated with external benchmarks on the national, regional, and city level. During the period of our study, kratom emerged as an active discussion topic; we analyzed mentions of kratom to understand the dynamics of its use. We also examined changes in opioid discussions during the COVID-19 pandemic; in 2020, many opioid classes showed marked increases in discussion patterns. Our work suggests the complementary utility of social media as a part of public health surveillance activities.
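Benchmarking a Reddit-derived discussion rate against an external series, as described above, amounts to computing a correlation between two aligned sequences. The sketch below uses the Pearson coefficient as an assumption; the abstract does not state which correlation measure was used, and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

In practice `xs` would hold, for example, yearly opioid discussion rates per region and `ys` the matching benchmark rates for the same years and regions.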
The scale and speed of the COVID-19 pandemic have strained many parts of the national healthcare infrastructure, including communicable disease monitoring and prevention. Many local health departments now receive hundreds or thousands of COVID-19 case reports a day. Many arrive via faxed handwritten forms, often intermingled with other faxes sent to a general fax line, making it difficult to rapidly identify the highest priority cases for outreach and monitoring. We present an AI-based system capable of real-time identification and triage of handwritten faxed COVID-19 forms. The system relies on two models: one model to identify which received pages correspond to case report forms, and a second model to extract information from the set of identified case reports. We evaluated the system on a set of 1,224 faxes received by a local health department over a two-week period. For the 88% of faxes of sufficient quality, the system detects COVID-19 reports with high precision (0.98) and high recall (0.91). Among all received COVID-19 faxes, the system identifies high priority cases with a specificity of 0.87, a precision of 0.46, and a recall of 0.83. Our system can be adapted to new forms after a brief training period. Covid Fast Fax can support local health departments in their efforts to control the spread of COVID-19 and limit its impact on the community. The tool is freely available.
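The triage figures above (specificity, precision, recall) all derive from a confusion matrix. The sketch below shows the arithmetic; the counts are hypothetical, chosen only so that the ratios round to the reported values, and are not the study's data. It illustrates how a modest precision can coexist with high recall and specificity when high priority cases are a minority.

```python
def triage_metrics(tp, fp, tn, fn):
    """Precision, recall, and specificity from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "precision": ratio(tp, tp + fp),    # flagged cases that are truly high priority
        "recall": ratio(tp, tp + fn),       # high priority cases that were flagged
        "specificity": ratio(tn, tn + fp),  # lower priority cases left unflagged
    }


# Hypothetical counts for illustration only (not from the paper).
metrics = triage_metrics(tp=83, fp=97, tn=650, fn=17)
rounded = {name: round(value, 2) for name, value in metrics.items()}
```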
Genetics plays a key role in drug response, affecting efficacy and toxicity. Pharmacogenomics aims to understand how genetic variation influences drug response and develop clinical guidelines to aid clinicians in personalized treatment decisions informed by genetics. Although pharmacogenomics has not been broadly adopted into clinical practice, genetics influences treatment decisions regardless. Physicians adjust patient care based on observed response to medication, which may occur as a result of genetic variants harbored by the patient. Here we seek to understand the genetics of drug selection in statin therapy, a class of drugs widely used for high cholesterol treatment. Genetics is known to play an important role in statin efficacy and toxicity, leading to significant changes in patient outcome. We performed genome-wide association studies (GWAS) on statin selection among 59,198 participants in the UK Biobank and found that variants known to influence statin efficacy are significantly associated with statin selection. Specifically, we find that carriers of variants in APOE and LPA that are known to decrease efficacy of treatment are more likely to be on atorvastatin, a stronger statin. Additionally, carriers of the APOE and LPA variants are more likely to be on a higher intensity dose (a dose that reduces low-density lipoprotein cholesterol by greater than 40%) of atorvastatin than non-carriers (APOE: p(high intensity) = 0.16, OR = 1.7, P = 1.64 × 10⁻⁴; LPA: p(high intensity) = 0.17, OR = 1.4, P = 1.14 × 10⁻²). These findings represent the largest genetic association study of statin selection and statin dose association to date and provide evidence for the role of LPA and APOE in statin response, furthering the possibility of personalized statin therapy.
Competing Interest Statement: R.B.A. is a stockholder in Personalis.com, 23andme.com. M.A.R. is on the SAB of 54Gene and Computational Advisory Board for Goldfinch Bio and has advised BioMarin, Third Rock Ventures, MazeTx and Related Sciences.
Pharmacogenetics (PGx) studies the influence of genetic variation on drug response. Clinically actionable associations inform guidelines created by the Clinical Pharmacogenetics Implementation Consortium (CPIC), but the broad impact of genetic variation on entire populations is not well understood. We analyzed PGx allele and phenotype frequencies for 487,409 participants in the U.K. Biobank, in the largest PGx study to date. For fourteen CPIC pharmacogenes known to influence human drug response, we find that 99.5% of individuals may have an atypical response to at least one drug; on average they may have an atypical response to 12 drugs. Non-European populations carry a greater frequency of variants that are predicted to be functionally deleterious; many of these are not captured by current PGx allele definitions. Strategies for detecting and interpreting rare variation will be critical for enabling broad application of pharmacogenetics.
Competing Interest Statement: R.B.A. is a stockholder in Personalis.com, 23andme.com.
Adverse drug reactions (ADRs) impact the health of hundreds of thousands of individuals annually in the United States, with associated costs in the hundreds of billions. The monitoring and analysis of the severity of adverse drug reactions is limited by the current qualitative and categorical system of severity classifications. Previous efforts have generated quantitative estimates for a subset of ADRs, but were limited in scope due to the time and costs associated with the efforts. We present a semi-supervised approach that estimates ADR severity by using a lexical network of ADR word embeddings and label propagation. We use this method to estimate the severity of 28,113 ADRs, representing 12,198 unique ADR concepts from MedDRA. Our Severity of Adverse Events Derived from Reddit (SAEDR) scores correlate well with real-world outcomes. SAEDR scores had Spearman correlations with ADR case outcomes in FAERS of 0.595, 0.633, and -0.748 for death, serious outcome, and no outcome, respectively. We investigate different methods for defining initial seed term sets and evaluate their impact on severity estimates. We analyzed severity distributions for ADRs based on their appearance in Boxed Warning drug label sections, as well as ADRs with sex-specific associations. We find that ADRs discovered postmarket have significantly greater severity compared to those discovered in clinical trials. We create quantitative Drug RIsk Profile (DRIP) scores for 968 drugs that have a Spearman correlation of 0.377 with drugs ranked by FAERS cases resulting in death, where the given drug was the primary suspect. We make the SAEDR and DRIP scores publicly available in order to enable more quantitative analysis of pharmacovigilance data.
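The semi-supervised idea above, spreading severity from a few seed terms across a similarity network, can be sketched with a basic clamped label propagation. This is a generic textbook variant written under stated assumptions, not the authors' implementation: the graph, edge weights, seed values, and the function name are all hypothetical, and in the paper the network is built from word embeddings rather than specified by hand.

```python
def propagate_labels(edges, seeds, n_iter=200, tol=1e-8):
    """Clamped label propagation on an undirected weighted graph.

    edges: dict mapping (node_a, node_b) -> positive similarity weight.
    seeds: dict mapping node -> known severity in [0, 1]; seeds stay fixed.
    Returns a dict mapping every node in the graph to a severity score.
    """
    neighbors = {}
    for (a, b), weight in edges.items():
        neighbors.setdefault(a, {})[b] = weight
        neighbors.setdefault(b, {})[a] = weight

    # Unlabeled nodes start at a neutral 0.5 (an arbitrary assumption).
    scores = {node: seeds.get(node, 0.5) for node in neighbors}
    for _ in range(n_iter):
        updated = {}
        max_delta = 0.0
        for node, nbrs in neighbors.items():
            if node in seeds:
                updated[node] = seeds[node]  # clamp seed terms
                continue
            total_weight = sum(nbrs.values())
            # New score is the similarity-weighted mean of neighbor scores.
            new = sum(w * scores[n] for n, w in nbrs.items()) / total_weight
            max_delta = max(max_delta, abs(new - scores[node]))
            updated[node] = new
        scores = updated
        if max_delta < tol:
            break
    return scores


# Hypothetical toy network: a chain from a severe seed to a mild seed.
toy_edges = {
    ("death", "cardiac arrest"): 1.0,
    ("cardiac arrest", "dizziness"): 1.0,
    ("dizziness", "dry mouth"): 1.0,
}
toy_seeds = {"death": 1.0, "dry mouth": 0.0}
toy_scores = propagate_labels(toy_edges, toy_seeds)
```

On this chain the unlabeled terms settle between the two seeds, with the term nearer the severe seed scoring higher, which is the qualitative behavior the method relies on.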