Using Precision Health resources to empower COVID-19 research
– Okay I think in the interest of time I'm gonna get started.
Good morning everyone, my name is Vicki Ellingrod and I am the faculty lead for the education workgroup for Precision Health.
Before we begin with our speaker today, I wanted to provide you some updates about some of the newer Precision Health activities that we have going on.
In terms of education, we've recently introduced a Precision Health certificate program.
It is one of the first in the nation to focus primarily on Precision Health.
It is open not only to current graduate students across campus but also to professional students and master's students.
If you are interested in being part of this certificate program, we are now enrolling students for fall, and I encourage you to visit our website if you would like to learn more.
Additionally, on Tuesday, June 30th, at noon, there is going to be an online information session; you can find the link on the Precision Health webpage.
So if this is something that's of interest to you, please go ahead and register for that information session to learn more.
In terms of Precision Health activities for facilitating research, one of our goals is to build the infrastructure to enable interdisciplinary research with various tools and resources across campus.
While one of these resources is going to be discussed shortly by our speaker today, I also want to make sure you know that we have a new analytics platform to help researchers from all disciplines access and use Precision Health data.
As with our certificate program, more information is available on the Precision Health webpage.
Precision Health also recognizes the importance of not only scientific discovery related to the individualization of healthcare, but also translating those findings into practice.
Our implementation workgroup has been doing just that.
And some of the most recent findings of this work are also being highlighted on the Precision Health webpage.
And lastly, one of the key roles that Precision Health has been playing on campus is the funding of research.
Precision Health recognizes the importance of encouraging research not only through educational events such as this one, but also with real research dollars.
Therefore I'm happy to announce that nearly $6.5 million in funds have already been distributed for research, as well as for promising trainees here on campus.
With this information and these updates, I hope I've enticed you to learn more about Precision Health.
I would also like to encourage you to become an official member of Precision Health; more information about how to do that is available on the webpage.
So if you go to precisionhealth.edu, there's a wealth of information there waiting for you to learn more about how you can become involved, potentially apply for some of the research funds, or take advantage of our educational programs.
I also invite you to save the date for our symposium taking place on Wednesday, September 23rd.
This year it will be a virtual event; however, we'll have several prominent national speakers, and you'll have a chance to interact with other community members within Michigan and beyond through a virtual e-poster session.
So please watch your email for more information about this event. I really think this is going to be a very different event than any you've been a part of, focusing not only on the science of Precision Health but also on its impact on the communities that we all work with and serve.
So with that, I want to introduce our speaker for today.
We are excited to host this workshop on a topic that has become very important to everyone very quickly, and that is COVID-19.
As more research is being done across campus on this and related topics, you're here today to learn about a valuable resource available to support COVID research: Precision Health's Michigan Genomics Initiative, or MGI, cohort of nearly 80,000 participants.
Our speaker today is Erin O'Brien Kaleba, Director of the Research Data Warehouse and DataDirect, as well as the Data Office for Clinical and Translational Research.
She and her team partner with investigators and academic departments to translate research data needs into technical solutions with an experienced team of database programmers, and she has particular expertise in accessing the MGI data.
So here's how this is gonna work: Erin is going to speak, and I'm gonna be moderating the chat.
So if you have questions, please feel free to type them in the chat.
And then around 12:45 or so, we'll be asking some of those questions.
And with that, I want to thank Erin for agreeing to present today, and we'll hand it over to you.
– Thanks so much, Vicki.
You can see Vicki has my name under her photo (Vicki laughs), so she's gonna take the hard questions and I'll take the softballs.
So yeah, thank you so much to Precision Health, to Tina for all the coordination, and to Vicki and Rachel Dawson for your leadership.
Also thank you to my team, who does all the work that I'll shamelessly take credit for in this presentation.
But I'd like you to think about this talk today as just the start of a conversation about what we have built to date for your use and what we will be building next, based on what you tell us is the most critical data type, solution, or integration you need to be successful.
So that you know, the alternate title to this really could be: why is the Precision Health platform ideal for studying this disease? What is it about COVID-19, its rapid uptick, its pervasiveness, that makes the Precision Health platform, with all its resources and its experts, the ideal place to centralize a lot of what we're making available?
Another alternate title could have been: if you remember nothing else from this talk, remember that there are people and there are tools that can serve you and help you remember what was said today.
So let's start with a little bit of background.
The world seems stumped by this pandemic: the ferocity of it, the unpredictability of it.
I mean, the epidemic itself was predictable.
We saw this coming; the scientific community of the world knew one of these big pandemics was coming. Yet the course of the disease, and who's most impacted, is largely unpredictable, and that needs to be studied.
That is ripe for predictive models: looking at all the data and information we have and making tools to predict who's next and what's next. So although we're stumped by this one, pandemics are not new: the Black Death, cholera, yellow fever, and as recently as the 1980s, HIV and AIDS.
And we got lovely images such as these plague physicians, and painters back in the day would spend their time on these really grim paintings of plagues and their impact on society.
This particular one actually shows not just peasants who are sick but kings and people in the upper crust, showing that plagues don't discriminate when it comes to who gets sick.
So the reason this is so important to study, and why the Precision Health Initiative is ideal for it, is that we know the immune system attacks the virus; that's what gets you the inflammatory response and gets you a fever.
But in some cases the immune system goes berserk.
I don't know if that's a clinical term, but it resonates with us.
It does more damage than the virus itself, causing something called a cytokine storm.
And what we know is that this can cause the blood vessels to leak, and the blood pressure will absolutely plummet.
The immune cells will attack the lungs that they're supposed to be protecting.
But what needs to be studied is who is most at risk for this extreme reaction. Whose immune system will behave in a predictable way and get rid of the virus? And who is going to have an immune response that is actually more damaging? Are there social determinants that would let us say this particular group needs heavier prevention because they're more at risk? Are there genetic factors that suggest one group versus another is more prone to this type of extreme response? Are there other things we haven't even thought about yet, beyond clinical, social, and genetic factors, that are really the keys for us to study? You all may know what those are, and we'd love to hear what you're finding in your labs and in your social research that we should make available, because it's not going to be one factor; it's going to be some mix of factors.
This is why I say that the Precision Health platform is ideal for studying this from all these different angles.
So today I'm gonna talk about: What is the Precision Health Initiative, and what is the Data Office? When should I seek their services and their tools? How is COVID-19 being defined globally, versus for clinical care, versus for research? And for life a few months from now, what should we expect, other than me getting a haircut and long-overdue highlights? So what the Precision Health Initiative is really doing is taking information about the patient, reams and reams of it, hundreds of millions of documents and vital signs and surveys, and trying to put it into a place where you can easily access it, where we don't expose sensitive patient information to risk or loss, and where the patient can be part of the conversation about what we can and should study about them.
This is really amazing: if you look at most academic institutions, there's something like the Precision Health Initiative.
I think Michigan's is a little bit unique, though.
You'll see institutions call it the Personalized Medicine Initiative or the Precision Medicine Initiative.
U of M was intentional about calling it Precision Health.
And I like the definition on the left-hand side of the screen: it's really about studying populations with the intent to drill down to actionable decisions for the individual person.
And really the pursuit of wellness.
So we're not just studying the genetic and social factors behind sickness and disease; we're studying why some people stay well.
What is it about them, their lifestyle, their genes, that makes them stay well? So one thing that sets the University of Michigan apart is this focus on health and wellness as well as disease.
A second thing is that we have 19 schools and colleges at U of M that are all part of the Precision Health Initiative, many of which have contributed dollars, faculty, and expertise.
This is a cross-campus initiative: the provost and the deans of Engineering, Public Health, Pharmacy, Medicine, and so many others have said, this is important, we're gonna actually put money on the table.
And the Precision Health Initiative is not so brand new that we're starting from scratch.
It's an accelerator and incubator.
It's meant to take all the pockets of innovation that were already out there, glue them together, and say to a social work researcher, a Ross School of Business researcher, and someone in Medicine: you're all making advances.
Your contributions to discovery and to translation will be stronger if we work together.
So that's the Precision Health Initiative.
There's lots more information on that if you're interested.
Let's talk specifically about the Data Office.
The Data Office has gone through a lot of changes, or an identity crisis.
I joke that it used to be just a Jeremy: back when I was hired, it felt like every department or every school had a Jeremy.
I'd ask, how do you get data? How do your researchers get data to answer their questions? They'd say, “We've got a guy named Jeremy who pulls it for us.” So Jeremy became the Honest Broker Office.
And that title was meant to reflect the fact that this office would be the go-between for the patient and the researcher, and would provide that sensitive information.
The Honest Broker Office then joined with the Research Data Warehouse and DataDirect to become the Data Office for Clinical and Translational Research; DOCTR is another acronym for that.
So it's the people who get you the data you need, but not more than the data you need.
I've listed here some of the types of data needs.
There are many, many more.
But I find it's sometimes helpful to hear some examples so that you know: is this something we can help you with, or can we help you get in touch with a different group? The most basic is cohort discovery.
How many patients with COVID-19 transferred from outside hospitals? How many patients with lupus have had a knee replacement? A lot of times funders will have a funding opportunity, and we want to be responsive, but we don't even know how many of these patients we see here at Michigan, so you go with a gut feel: I think there are about five patients a week that I see with this condition.
We also serve the needs of retrospective observational studies.
For example, studying the difference in outcomes based on the length of time someone was on a ventilator.
Those retrospective observational results can then really inform a prospective cohort design and help with recruitment.
So once I've identified my eligible population, I want to recruit them in the most efficient way possible.
For those who are eligible, when's their next appointment? When are they gonna come in to see their rheumatologist, so I can maybe have someone ready at the Domino's Farms rheumatology clinic?
So those are reports our tools can print out for you on a daily or weekly basis: when eligible patients are due for their next appointment with us.
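A report like this is essentially a filter that joins the eligible cohort to the appointment schedule. The sketch below is only an illustration under stated assumptions: appointments are taken to be available as (date, clinic) pairs per patient, and the function name, data layout, and seven-day window are hypothetical, not how the Data Office's tools are actually built.

```python
from datetime import date, timedelta

def upcoming_visits(eligible, appointments, today, days=7):
    """Hypothetical recruitment report.

    eligible: set of patient IDs who meet the study criteria
    appointments: dict mapping patient ID -> list of (date, clinic) tuples
    Returns (patient_id, visit_date, clinic) for each eligible patient whose
    next appointment falls between `today` and `today + days`, sorted by date.
    """
    window_end = today + timedelta(days=days)
    report = []
    for pid in eligible:
        # Keep only appointments inside the reporting window.
        upcoming = [(d, clinic) for d, clinic in appointments.get(pid, [])
                    if today <= d <= window_end]
        if upcoming:
            visit_date, clinic = min(upcoming)  # earliest visit in the window
            report.append((pid, visit_date, clinic))
    return sorted(report, key=lambda row: (row[1], row[0]))


appointments = {
    "P1": [(date(2020, 6, 29), "Rheumatology"), (date(2020, 8, 1), "Rheumatology")],
    "P2": [(date(2020, 7, 20), "Rheumatology")],   # outside the window
    "P3": [(date(2020, 6, 25), "Primary care")],   # already in the past
    "P4": [(date(2020, 6, 30), "Rheumatology")],   # not eligible below
}
print(upcoming_visits({"P1", "P2", "P3"}, appointments, today=date(2020, 6, 26)))
```

Run daily or weekly, the same function would produce the kind of rolling "who is coming in next" list described above; ineligible patients never appear because the loop only walks the eligible set.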
Then there's Precision Health, with so many possibilities that I'm gonna talk about them in the next set of slides.
This is the clinical data I've been describing plus a whole extra suite of data options.
So, how many patients with underlying atrial fibrillation have genetic and environmental data that I can analyze together with, let's say, their medication list? And finally there's the growth of network-funded research, or multicenter studies, which is how the NIH wants to fund research nowadays: not just “here, Michigan, here, Duke, we're gonna fund you to do this,” but “we need you to work across health systems so that we get a bigger bang for the buck.”
So how can we make U of M data look like everyone else's data so we can share it more nimbly? How can we map it? Even if everyone has the Epic medical record, the implementations are so unique that you really do have to map to common data standards in order to share and link efficiently with other health systems.
So those are the types of data needs; obviously there are many more, but each of those data needs comes in a lot of flavors.
And so, as Vicki mentioned, we have a technical set of skills in Precision Health and the Data Office, and we also have a regulatory set of skills that helps tease out each of these unique nuances.
So here are the variations of those requests.
Could you pull a limited data set for me? I can't have direct identifiers.
Could you pull data for my fellow? Could I share it with my visiting med students? Could you give me all the data so I can build machine learning algorithms? Can I store this dataset in the cloud? Can I link this data with genetic data, with geolocation data, with patient survey data that I have? So each individual data need, just like research itself, is a unique question with unique nuances.
We tease out: Where's the best place to store it? Who can have access? How anonymized do the data need to be? These are some of the things that our office does.
So it's really about taking data from the patient as the source and putting it through the lens of what the IRB said was most ethical in the conduct of the research with those data, and the compliance questions about privacy and security: Where can you store it? Can you invite others into your environment to analyze it? Can you reach back out to that patient? That's the lens. We're not the IRB, and we're not compliance, but we use the guidance from those two pillar organizations to look at each researcher's request and get them the data they need.
So let's talk specifically about COVID-19 data.
I feel like these slides will be stale in about 10 minutes, because this virus and this pandemic are evolving so rapidly that we are trying to keep up and iterate in a way that lets you, the researcher or research coordinator, know what you're looking at, as opposed to what you pulled a month ago.
So we're trying to establish consistency, but I recognize that it's very hard as we learn more and more.
So in March, when the U.S., and Southeast Michigan in particular, really began to experience the full impact of the pandemic, there was a flurry of activity. I'm blown away, I'm so impressed by how quickly committees were stood up; I can't recall what they called them, but command centers were stood up to learn about this and to help those primarily at the bedside.
Those are the front lines: Do we have enough ventilators? Do we need to convert step-down floors into full ICUs? They needed the data first.
So research, as important as we think it is, as you think it is as a researcher, is not first in line for data, right? Data for research rarely needs to be real time.
Whereas for care delivery and operations, the COVID-19 data starting in March were high urgency; they had to be real time.
We had to know when someone was going to deteriorate, if we could even predict that.
The data had to be defined in an absolutely standard, consistent way so we could see our trends.
Integration with other types of data was there, but it was minimal.
So how is research different? When we got in line in March and said, “Oh, we need to start studying this, we need to get information out to the research community so that this can be studied,” we definitely recognized we needed to come behind that front line.
As for the urgency for research: usually, and I'll say usually because COVID-19 has changed everything, it's medium to low.
The freshness can be a day old or a week old; it doesn't have to be real time. You're not treating a patient right in front of you, you're studying.
So if it's a couple of days stale, that's okay.
What you're asking is usually novel; it's not like a quality improvement measure where you're measuring the same thing over and over.
In research, you're asking a question that we don't know the answer to, so there's a lot of newness to it.
And integrating with other types of data, looking at long-term outcomes, at patient experience and patient perception, at wearables and biomarkers and all that, is incredibly important for research.
My favorite Martin Luther King quote talks about “the fierce urgency of now.”
Certainly it applies to so much going on right now, and certainly to COVID-19; you could say it's “a time for vigorous and positive action.”
And so I think about research being behind the front-line needs, but not far behind them.
The urgency of this pandemic is greater than what we've seen with other types of research.
So what have we tried to make available for you? I'm gonna go through a couple of data types: how you get them, and what you need to have in place to obtain these data.
I'm gonna talk about self-serve tools, things you can do on your own today, from your own desk.
Hopefully with lawn mowers less loud than the ones outside my window right now; my neighbors mow at all hours of the day.
Then what a custom extract means: if the self-serve tools are not meeting your needs because your data needs are so complex, we can help you out with a custom data extract.
Then what biospecimens, human tissue or viral tissue, are available for your use in research, and what genetic information derived from those biospecimens is available to you.
So let's start with structured clinical data.
By discrete or structured data elements, think about data collected in the course of someone's care that's picked from either a drop-down menu or radio buttons.
So these are things like labs, medications, and diagnoses.
The DataDirect tool and the EMERSE tool are the two self-serve tools.
DataDirect is really the tool for the structured data.
It gives you access to clinical data that's a day old or a week old, which you can use either for cohort discovery, with no IRB needed, or to obtain raw patient-level data if you have IRB approval.
So for COVID-19, a lot of people were asking us: How is it being defined? Do I have to put in every single lab test that is available for COVID-19 and then look for a positive or a negative? What if someone got tested elsewhere and transferred into U of M; how do I find them? So we thought it would be helpful to build what we're calling a starting population.
I think of this like the baking show where they put something into the oven and immediately pull it out fully baked.
So this is a pre-baked definition, or computed phenotype, for COVID-19 that's available to you today.
The research definition is patients at Michigan Medicine who are presumptive positive or positive for COVID-19 as a result of one of the six lab tests that you see below.
Or they have the ICD-10 diagnosis code U07.2, which is associated with positive COVID-19 even if there's no evidence of a test.
This slide is not really readable, but it's there to show you the logic of how we built this, what sits behind that starting population, so that when you go in and start your research on COVID-19 and grab the starting population, which updates daily and is fresh as of yesterday, you can say in your paper or your methods: here's what's included, and here's what's not included.
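The starting-population rule described here is an OR of two criteria: a positive or presumptive-positive result on any of the listed lab tests, or the COVID-19 diagnosis code. The sketch below only illustrates that boolean logic. The six lab test codes are hypothetical placeholders (the talk shows them on a slide but does not name them), and the input layout is assumed; this is not the actual DataDirect phenotype code.

```python
# Hypothetical placeholders: the six COVID-19 lab tests are shown on a slide
# in the talk but are not named in the transcript.
COVID_LAB_TESTS = {"LAB_1", "LAB_2", "LAB_3", "LAB_4", "LAB_5", "LAB_6"}
POSITIVE_RESULTS = {"positive", "presumptive positive"}
COVID_DX_CODES = {"U07.2"}  # ICD-10 code as stated in the talk

def starting_population(lab_results, diagnoses):
    """Return patient IDs meeting the research definition sketched here.

    lab_results: iterable of (patient_id, test_code, result_text)
    diagnoses: iterable of (patient_id, icd10_code)
    """
    # Positive or presumptive-positive result on any of the listed tests.
    by_lab = {
        pid for pid, test, result in lab_results
        if test in COVID_LAB_TESTS and result.strip().lower() in POSITIVE_RESULTS
    }
    # The diagnosis code counts even when no test result is on file.
    by_dx = {pid for pid, code in diagnoses if code in COVID_DX_CODES}
    return by_lab | by_dx


labs = [
    ("P1", "LAB_1", "Positive"),
    ("P2", "LAB_2", "Negative"),
    ("P3", "OTHER_TEST", "Positive"),  # not one of the listed tests
    ("P4", "LAB_3", "Presumptive Positive"),
]
dx = [("P2", "U07.2"), ("P5", "J18.9")]
print(sorted(starting_population(labs, dx)))
```

Writing the rule out this explicitly is also what makes the methods-section statement possible: P2 is included by diagnosis despite a negative test, while P3 is excluded because its positive result comes from a test outside the list.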
So how is this research definition different from what our colleagues on the front line are using? In MiChart, the electronic medical record, there's a COVID-19 definition.
Our research definition is confirmed cases only, including both inpatient and discharged patients.
One of the MiChart definitions covers only current occupancy: currently admitted patients with COVID-19.
Once they're discharged, they're not part of that number because they're no longer needed for calculating resources or supplies.
Ours includes inpatients, discharged patients, and people who came here for a test and were sent home because their oxygen saturation level was okay, even though they still had a fever.
And then in the medical record there's something called the active infection record.
It appears on a patient's chart and tells you, the care provider, what type of droplet precautions you need to consider.
In MiChart, that goes into the definition of COVID.
That alone does not go into our definition; we need either a test or a diagnosis code.
So if you see a number in the electronic record and it's different from ours, that might explain it.
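One way to see why the two counts can disagree is to write both rules as predicates over the same patient. This is a hypothetical sketch: the field names are invented, and combining the "currently admitted" census with the active-infection flag into a single operational rule is an assumption drawn from the two MiChart behaviors described above, not MiChart's actual logic.

```python
from dataclasses import dataclass

@dataclass
class PatientStatus:
    """Hypothetical per-patient flags; not the actual MiChart data model."""
    patient_id: str
    positive_test: bool = False
    covid_dx_code: bool = False
    active_infection_flag: bool = False
    currently_admitted: bool = False

def in_research_definition(p: PatientStatus) -> bool:
    # Research definition as described: a test or a diagnosis code is required;
    # the active infection flag alone does not qualify. Admission status is not
    # checked, so discharged patients and people tested and sent home still count.
    return p.positive_test or p.covid_dx_code

def in_current_census(p: PatientStatus) -> bool:
    # Assumed operational view: only currently admitted patients, and the
    # active infection flag by itself is enough to count as COVID-19.
    return p.currently_admitted and (
        p.positive_test or p.covid_dx_code or p.active_infection_flag
    )


flag_only = PatientStatus("P1", active_infection_flag=True, currently_admitted=True)
discharged = PatientStatus("P2", positive_test=True, currently_admitted=False)
for p in (flag_only, discharged):
    print(p.patient_id, in_research_definition(p), in_current_census(p))
```

Each patient lands in exactly one of the two counts, which is the kind of discrepancy the talk warns about when comparing numbers from the electronic record with the research starting population.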
So in DataDirect, once you log in and create a new query, you can start either with a denominator of all patients who have ever had a medical record number at Michigan, which is over 4 million, or with the population of anyone who tested positive for COVID-19.
For level one users, it's datadirect.edu; for level two users, it's datadirectmed@umich.
The beauty of Precision Health is that you don't have to be employed by Michigan Medicine to access health data for the important research you do.
So we hope that eventually there won't be two versions of DataDirect; there will be just one, your credentials and levels of approval will all be authenticated on the back end, and the user experience will be one tool.
Right now it's two different tools, but with the same data.
You can get that starting population of COVID-19 patients in either tool.
So let me know if you have any trouble with that.
So here's some of the structured data available.
Patient demographics, dates of admission, dates of primary care visits, and diagnoses, whether from your problem list or the billed diagnosis for the reason you're being seen that day.
All surgeries and procedures are in there, medications that were ordered, medications that were administered inpatient, and labs. And then I'll talk a little bit more about the Central Biorepository, a core part of the Precision Health Initiative, in which patients consent to have their data, their biospecimens, and their genetic information accessed for future research.
And that's in Vicky Blanc's Central Biorepository.
Those are structured data; now let's talk about unstructured data.
I think at the same time that I'm giving this town hall, David Hanauer is doing a training for EMERSE.
EMERSE is a tool that has more than 100 million clinical documents.
So think about pathology reports, discharge summaries, physician notes.
David says that 80% of the important information about the patient is actually in free text, not in a structured field that's easy to query.
So he's developed this tool that will take you right to the language in a patient's chart that contains a string of text you're interested in, so you don't have to comb through the whole chart up front.
Remdesivir is the antiviral drug that's getting a lot of attention, with clinical trials underway.
I was quickly able to say, okay, 176 unique patients since January 1st, 2020 have had the term remdesivir mentioned in their notes.
For some of them it may say something like patient declined, or patient started on it and had an adverse reaction.
It will tell you how many patients there are, and it'll bring you right to their charts.
You do need IRB approval for this because it is not de-identified; it's actual views into the patient's notes.
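The remdesivir example boils down to counting unique patients with at least one note, on or after a cutoff date, that mentions a term. The sketch below reproduces only that counting logic over an in-memory list of notes; EMERSE is a full document search engine with its own interface and is not implemented this way. The note tuples and function name are hypothetical.

```python
import re
from datetime import date

def patients_mentioning(notes, term, since):
    """Return IDs of patients with at least one note on or after `since`
    that mentions `term` as a whole word (case-insensitive).

    notes: iterable of (patient_id, note_date, note_text)
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return {
        pid for pid, note_date, text in notes
        if note_date >= since and pattern.search(text)
    }


notes = [
    ("P1", date(2020, 4, 2), "Patient declined remdesivir."),
    ("P1", date(2020, 4, 9), "Started on Remdesivir; monitoring LFTs."),
    ("P2", date(2019, 12, 1), "Remdesivir discussed."),  # before the cutoff
    ("P3", date(2020, 5, 1), "No antivirals given."),
    ("P5", date(2020, 3, 20), "REMDESIVIR infusion, adverse reaction noted."),
]
hits = patients_mentioning(notes, "remdesivir", since=date(2020, 1, 1))
print(len(hits), sorted(hits))
```

Returning a set deduplicates patients with several matching notes (P1 here), which is what a "unique patients" count requires. Note that a simple match like this cannot tell "declined" from "started on"; a reviewer still has to read the matched text, which is why the talk stresses that the tool brings you to the note itself.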
So now we've covered clinical data: structured, and unstructured free text.
Now let's talk about what tissue and biospecimens are available to you.
For specimens related to COVID-19, we have quickly growing repositories of plasma, serum, and nasal swab tissue.
This is a little bit hard to read, but again, we can help walk you through this.
There are several requirements for obtaining clinical residuals, or blood left over from clinical diagnosis and clinical laboratory testing.
UMOR stood up a committee very quickly to help prioritize COVID-19 research.
It wasn't an approval body the way the IRB is, but it was a necessary step to prioritize: to say to the IRB, to Precision Health, to the Data Office, this is absolutely a high-priority research study.
Fast-track it, get them the resources they need; or, this is really important, but it's medium or low priority in terms of COVID-19.
And this group would turn around prioritization within 24 to 48 hours.
I don't think they slept. They still need to receive any research ideas related to COVID-19.
Their volume is a lot lower now than it was in March and April.
But they did a fantastic job of asking: What are you studying? Will you be depleting our resource of plasma? If so, the study has to be of the utmost scientific and translational value.
So we can help walk you through that.
But the DataDirect tool can at least tell you how many are available.
So say I use my starting population of COVID-19, and then I want to also include a diagnosis of, let's say, atrial fibrillation, and then I wanna say, “Okay, of these patients who met my criteria, how many of them have a specimen available?” because I'm going to want to run some separate analyses on those.
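The query just described is a set intersection followed by a filter on specimen inventory. The sketch below shows that logic with plain Python sets; the function name and the inventory layout (patient ID mapped to a list of specimen types on hand) are hypothetical and do not describe how DataDirect stores biorepository data.

```python
def specimen_availability(starting_population, diagnosis_cohort, specimen_inventory):
    """Count patients in both cohorts, and how many of them have a specimen.

    starting_population, diagnosis_cohort: sets of patient IDs
    specimen_inventory: dict mapping patient ID -> list of specimen types on hand
    """
    cohort = starting_population & diagnosis_cohort
    # An empty list means the patient is known to the repository but nothing remains.
    with_specimen = {pid for pid in cohort if specimen_inventory.get(pid)}
    return len(cohort), len(with_specimen)


covid = {"P1", "P2", "P3"}
afib = {"P2", "P3", "P9"}
inventory = {"P2": ["plasma", "serum"], "P3": [], "P9": ["nasal swab"]}
print(specimen_availability(covid, afib, inventory))
```

P9 has a specimen but is excluded because it is not in the COVID-19 starting population, so only the overlap is checked against the inventory.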
And as I mentioned, some of those specimens have been further genotyped or sequenced.
So genetic data about patients with COVID-19 is available, though not on all patients.
But I'll tell you a couple of things that we're making available, and I'd love your feedback on how you would like to see this information, how you would like to consume it, and what else we should think about.
As part of the Michigan Genomics Initiative, as Vicki mentioned, more than 80,000 patients at Michigan have consented to have their DNA analyzed with a genome-wide association study chip that tells us about the variants in their genotypes. And patients with a COVID-positive nasal swab have had the virus itself, its RNA, sequenced.
That's an increasing number, and it's going to be publicly available.
The viral RNA detail itself will be public, but we can take that RNA about the virus and couple it with other types of data we have about those same patients: their geolocation data, their social determinants, their clinical course of care.
That's again a huge advantage that the University of Michigan has over other groups who are doing this.
So if you're looking for COVID-19-positive patients who are also in MGI, you could say, all right, I'm going to work with the research facilitators or the Data Office to obtain their genotype, their GWAS data.
And the viral RNA, again, comes only from COVID-positive nasal swabs.
And that's going to be available soon.
So what if the self-serve tools are inadequate to answer your research question, because it's highly complex and the variables relate to each other in highly complex ways?
We have a set of SQL analysts, database analysts who can pull custom data for you according to your specifications from multiple systems, link it to other types of data, and deliver it to your secure server wherever you're going to do your analysis.
We've had probably 10 to 20 requests for custom data extracts related to COVID-19.
Part of our influx of requests came when the basic science enterprise at Michigan had to close and human subjects clinical trials had to close; researchers and their teams turned to things they could do remotely, and data is one of those powerful things.
Informatics, modeling, associations: those are all things you can do while wearing sweatpants at home.
So our business has been unbelievably busy, and we are so grateful for that.
Because you can still request data pulls, you can still play with DataDirect, and you can still have consults with the research scientific facilitators, all from home.
So it's helped researchers bridge this time between the closing of the enterprise and the reopening of traditional research.
It's also generated a lot of excitement among these researchers about data that were available that they had no idea about.
So we do a lot of consultations where people ask: Is this available? How many do you have with this? Can I link it with data I already have? When you get to an actual data pull from one of our data analysts, it's a $60-an-hour recharge rate. A lot of the data requests are a one-time delivery; others are a scheduled or automated delivery.
So, for example, every Monday morning a refresh of the data will automatically come to your secure analytic location.
So let's look ahead for the last couple of minutes.
I want to tell you about new data added since you may have last used the tools or services.
And I want to hear from you what else we should be going after, what else we should be studying.
In the last couple of months, we've been working a lot on patient surveys, or patient-reported outcomes, and we'll talk more about that.
We have information on clinical trial enrollment, so we're working toward being able to say: I found these patients are eligible for my trial; how many of them are already enrolled in a clinical trial and probably could not be part of mine? For cancer staging, treatment, and outcomes, we're working with the cancer tumor registry, which is a really valuable source of data for anyone who was diagnosed or treated for cancer at Michigan.
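The clinical trial enrollment question above is a set difference: eligible patients minus those already enrolled elsewhere. This is a minimal sketch assuming enrollment is available as a set of patient IDs; it treats any current enrollment as a reason not to approach the patient, which is a simplification, since whether co-enrollment is allowed depends on the specific trials.

```python
def split_by_enrollment(eligible, currently_enrolled):
    """Split an eligible cohort by existing clinical trial enrollment.

    eligible: set of patient IDs who meet my trial's criteria
    currently_enrolled: set of patient IDs already enrolled in another trial
    Returns (available_to_approach, already_enrolled).
    """
    already_enrolled = eligible & currently_enrolled
    available = eligible - currently_enrolled
    return available, already_enrolled


available, enrolled = split_by_enrollment({"P1", "P2", "P3", "P4"}, {"P2", "P7"})
print(sorted(available), sorted(enrolled))
```

Keeping the already-enrolled group instead of discarding it lets a study team report how many eligible patients were lost to competing trials.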
And it goes deep into theirstaging, their treatment, their outcomes, their biomarkers, things that in the regular medical record might be in a path report or might be in free text somewhere, this isactually unstructured fields.
These are registrars thatour Cancer Center hires to really go in-depth and dataenter these cancer findings.
The Michigan index is up todate as of a couple months ago and it has date ofdeath and cause of death for anyone from Michigan medicine who died in the state of Michigan.
Additionally, we have access to the National Death Index, although it is not in the self-serve tools.
Think of that as much broader: it's national in scope.
However, it's only date of death; we don't have cause of death from there.
So researchers are telling me that they need some sort of hybrid of the two to get a more complete picture for outcomes research where you're studying death.
Natural language processing: we're working a lot with the learning health system on this, taking free text or strings of data like presenting symptoms and chief complaint and converting those into structured data that would be searchable like any other structured field.
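As a rough illustration of turning free text into searchable fields, here is a minimal keyword-matching sketch in Python. The symptom list, field names, and regular expressions are hypothetical, and simple pattern matching like this is a stand-in, not the NLP approach the learning health system actually uses; real clinical NLP also has to handle negation ("denies fever"), misspellings, and abbreviations.

```python
import re
from typing import Dict

# Hypothetical symptom vocabulary: each structured field maps to phrases that
# might appear in a free-text chief complaint. This toy matcher ignores
# negation, so "denies fever" would still be flagged as fever.
SYMPTOM_PATTERNS = {
    "fever": re.compile(r"\b(fever|febrile|chills)\b", re.IGNORECASE),
    "cough": re.compile(r"\bcough(ing)?\b", re.IGNORECASE),
    "shortness_of_breath": re.compile(r"\b(shortness of breath|sob|dyspnea)\b", re.IGNORECASE),
    "loss_of_smell": re.compile(r"\b(loss of smell|anosmia)\b", re.IGNORECASE),
}


def structure_chief_complaint(text: str) -> Dict[str, bool]:
    """Convert one free-text chief complaint into searchable yes/no fields."""
    return {field: bool(pattern.search(text)) for field, pattern in SYMPTOM_PATTERNS.items()}


record = structure_chief_complaint("Pt reports fever x3 days, dry cough, SOB on exertion")
print(record)
# → {'fever': True, 'cough': True, 'shortness_of_breath': True, 'loss_of_smell': False}
```

Once every note is reduced to the same set of boolean fields, a cohort query such as "fever and cough" becomes an ordinary filter rather than a text search.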
And then we're taking U of M data and spending a great amount of time mapping it to some of these national data standards.
So there's the PCORI network, the Pediatric Trial Network, and OMOP, which is a standard that's used in Precision Health activities.
A common data model will lay out, let's say, 14 data tables and say that all of your lab files need LOINC codes for each result, you know.
And all of your demographics have to be coded the same way: African American is one, Asian is two.
And so when you map your data, it's the same data, just in a new format that makes you ready to collaborate with other institutions.
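A minimal Python sketch of what mapping to a common data model means in practice: local labels are translated into shared codes while the measurement itself stays the same. The race codes reuse the illustrative numbering from the talk, and the lab codes are LOINC-style examples chosen for illustration; the table contents, field names, and helper functions are hypothetical and do not reproduce the actual PCORnet or OMOP specifications.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical shared vocabularies. Race codes follow the talk's illustrative
# numbering; lab codes are LOINC-style examples, not a complete mapping.
RACE_CODES = {"african american": 1, "black or african american": 1, "asian": 2}
LAB_CODES = {"GLU": "2345-7", "HGB": "718-7"}


@dataclass
class LocalLab:
    patient_id: str
    local_test: str  # the institution's own test name
    value: float


@dataclass
class CdmLab:
    patient_id: str
    lab_code: str  # shared code every participating site uses
    value: float


def map_race(local_label: str) -> Optional[int]:
    """Return the shared race code, or None if the label has no mapping yet."""
    return RACE_CODES.get(local_label.strip().lower())


def map_lab(row: LocalLab) -> CdmLab:
    """Same measurement, re-expressed with the shared lab code.

    Raises KeyError for a local test that has not been mapped yet.
    """
    return CdmLab(row.patient_id, LAB_CODES[row.local_test], row.value)


print(map_race("Asian"))
print(map_lab(LocalLab("p1", "GLU", 98.0)))
```

Returning None for an unmapped label (rather than guessing a code) leaves those rows visible for manual review, which matters when the mapped data will be pooled with other institutions.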
So we often get asked by our colleagues in engineering, “Is this big data?” And I never know how to answer.
I say it's a lot of data; I don't know if it's big data.
But I do know it's growing bigger by the day: it's millions of encounters from patients who receive their care at Michigan.
It's millions of genetic variants, it's hundreds of surveys.
Right now, what we know is that it's not a very diverse population.
And so Precision Health is putting great effort into expanding our reach and expanding the opportunity for patients from different socioeconomic backgrounds and from different racial or ethnic backgrounds to participate in MGI, and to help us shape what we study, what we collect, and what's important to them.
Michigan Medicine alone is not especially diverse: it's 95% English-speaking and 85% white or Caucasian.
It also has a very high education level, so we really do...
So it's big data, or at least a lot of data, and it is getting bigger.
It takes in the medical phenotype and the genetic genotype.
Increasingly, family history is being asked about and coded in a way you can study.
Behavioral and lifestyle data, too.
So we have the Apple Watch study called MIPACT, where we're learning a lot about patients' blood pressure during their regular day.
And surveys they respond to via their Apple Watch.
What's their pain score when they're about to take their opioid? Are there any patterns we can learn from them? Environmental factors that may contribute to someone's outcomes.
Social factors that may be contributing to who does well and who doesn't.
So gluing all of these together is our goal for the Precision Health Initiative, and we're getting close.
So although we're getting a lot of data, we don't need to get data just for data's sake.
We need to be data rich and information rich.
There's this expression, DRIP: data rich, information poor. You sit on a pile of data but you can't make any sense of it; you can't use it in a way that could lead to a study result or a conclusion.
And so we want to be data rich and information rich; that's our goal.
So the two surveys that I want to close with are really exciting.
So Bhramar Mukherjee is leading an effort to do an epidemiology-based questionnaire called EPIQ.
This is one long and detailed questionnaire, similar to what's being done in the U.K. with the UK Biobank.
And it goes over environmental factors: your amount of sleep, your smoking, anxiety, your family, your family's health.
And it's really the type of information about a patient that is more predictive of outcomes than the data we collect in a 15-minute doctor's appointment.
These factors are what lead to someone having healthy outcomes or not, and we've never collected them in such a systematic way.
So we're in a pilot phase; we have 60 surveys collected to date.
We also have a COVID-19 questionnaire that we run through Qualtrics, and we sent it to more than 50,000 patients with and without COVID-19.
These are patients in our Michigan Genomics Initiative who have agreed to be recontacted.
It asks about perceptions of COVID-19: How healthy is your family? What age group is your family? For those who were infected, what were the symptoms? Were you hospitalized? Were you sent home? How long did that fever last? This will be discoverable in DataDirect.
As of this morning, we have 8,032 surveys completed.
Bhramar Mukherjee, one of our faculty, said that survey response rates have never been higher, because people are at home and have the opportunity to fill out surveys.
So one of the small benefits of this pandemic and the stay-home orders is that people have time to answer all of these questions, which is great.
Geolocation data: we've taken the street address of patients and mapped it to a latitude and longitude, and that latitude and longitude falls within a census tract ID.
And so for that census tract ID, there are national data available about the characteristics of the block where a patient lives.
It's not that individual patient's income and education level, you know; it's data about that specific area.
So it's much more granular than zip code data, much more actionable, and much more related to health.
And this is the first time we've had this available.
So I'm just gonna go through these characteristics.
A lot of them have been compiled by national groups into indices, such as neighborhood affluence and a neighborhood immigrant index.
And so for some of them you can take advantage of what's already been pulled together; others just come to you raw, like: for this census tract ID, what percent of households have an education less than eighth grade and an income less than this? And it gives you some context when you're studying outcomes of a given condition.
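The tract-level linkage described above can be sketched as a simple join in Python, assuming each patient's address has already been geocoded to a latitude/longitude and assigned an 11-digit census tract GEOID (the geocoding step itself is not shown). The GEOIDs, attribute names, and values below are made up for illustration; note that the attached values describe the area, not the individual patient.

```python
from typing import Optional

# Hypothetical tract-level attributes keyed by an 11-digit census tract GEOID
# (2-digit state + 3-digit county + 6-digit tract). Values are invented for
# illustration; real ones would come from national census-derived sources.
TRACT_ATTRIBUTES = {
    "26161400100": {"pct_less_than_8th_grade": 1.2, "neighborhood_affluence": 0.41},
    "26161402200": {"pct_less_than_8th_grade": 4.8, "neighborhood_affluence": 0.18},
}

# Patients whose addresses were already geocoded and assigned to a tract.
patients = [
    {"patient_id": "p1", "tract_geoid": "26161400100"},
    {"patient_id": "p2", "tract_geoid": "26161402200"},
    {"patient_id": "p3", "tract_geoid": None},  # address could not be geocoded
]


def attach_tract_context(patient: dict) -> dict:
    """Add area-level (not individual-level) context to one patient record."""
    geoid: Optional[str] = patient["tract_geoid"]
    context = TRACT_ATTRIBUTES.get(geoid, {}) if geoid else {}
    return {**patient, **context}


enriched = [attach_tract_context(p) for p in patients]
for row in enriched:
    print(row)
```

Patients without a usable tract simply get no area columns, so an analysis can treat them as missing instead of silently assigning a neighboring area's values.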
So I just want to finish with this: with all these data, we do our best to de-identify, and we do our best to protect the patient who has so generously allowed us to use their information to study.
But we can never say something's 100% de-identified.
If I have a single data set with lab values, I could do a pretty good job de-identifying it.
But I just showed you all that information: information from their Apple Watch, information from their surveys, information about their genetics.
And when you combine all of those de-identified data sets together, patient identity becomes increasingly discoverable.
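The re-identification risk of linking data sets can be made concrete with k-anonymity: the size of the smallest group of records that share the same combination of quasi-identifying values. In this Python sketch with made-up records, each column stands in for a different de-identified source (a lab extract, a wearable study, genetic data); the column names are hypothetical. Each added column shrinks the smallest group until one patient is unique.

```python
from collections import Counter

# Made-up records. Each column imagines a different de-identified source:
# age_band from a lab extract, watch_user from a wearable study,
# variant from genetic data.
records = [
    {"age_band": "40-49", "watch_user": True, "variant": "A"},
    {"age_band": "40-49", "watch_user": True, "variant": "B"},
    {"age_band": "40-49", "watch_user": False, "variant": "A"},
    {"age_band": "40-49", "watch_user": False, "variant": "A"},
    {"age_band": "50-59", "watch_user": True, "variant": "A"},
    {"age_band": "50-59", "watch_user": True, "variant": "A"},
    {"age_band": "50-59", "watch_user": False, "variant": "B"},
    {"age_band": "50-59", "watch_user": False, "variant": "B"},
]


def smallest_group(rows, columns):
    """k in the k-anonymity sense: size of the smallest group sharing these values."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return min(counts.values())


for cols in (["age_band"], ["age_band", "watch_user"], ["age_band", "watch_user", "variant"]):
    print(cols, "-> k =", smallest_group(records, cols))
# → k = 4, then 2, then 1 (one patient becomes unique once all three are linked)
```

No single column here identifies anyone, which is exactly why the linked data still need the same protections as identifiable data.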
And so one of the main messages we really, really stress in Precision Health is that, as much as all these resources are available and we're so excited about them, there's an additional onus on you, the user, and on us, the provider of the information, to make sure we're protecting these data.
Even if we can give you a de-identified data set, we make sure that you keep a strong password on it and keep it in a HIPAA-configured enclave, even if it doesn't have HIPAA identifiers.
So that's one thing underpinning all of the data I just got so excited telling you about: we really, really need to partner in keeping those data protected; otherwise, this will all go away.
So now I would love to hear from you all about what else we should be working on.
What did I miss? What else is key to COVID, or to moving our Precision Health Initiative forward, that we haven't thought about or that I maybe didn't mention? – Thank you, Erin, that was wonderful.
I don't know how to virtually clap.
(both laugh) – We'll pretend the lawnmower in the background is applause, Vicki.
– Well, you may be hearing my dog applaud in the living room. – Oh good, good, okay.
– But anyway, we did have a couple of questions come in, which is great.
So one of the questions has to do with COVID genetic data availability.
They're interested in getting it nanopore sequenced and would appreciate it if you could get the FAST5 files.
– Excellent. I would love to put you in touch with our colleagues in the Central Biorepository, but it might also be our colleagues in public health biostats who work with the raw data derived from the DNA.
So Vicki, maybe I'll work with you to reach that individual, or the individual can send me an email at the address on the screen and I'll get you in touch with the right people for that. That's a great question.
– So there were also two other questions that I think go along with each other.
So one is: is it possible to add patient age at encounter to the Precision Health version of DataDirect? Age is... – Oh good.
– Age is associated with COVID-19 susceptibility, so it's important to include as a covariate. It is available in the medical school version of DataDirect but not the Precision Health version.
And then the other question has to do with the patients in the COVID database: patients included in the starting population are inpatient or discharged.
Does that mean the population includes a higher proportion of severe COVID patients than the general population of people infected, since many people don't need to be in the hospital? – Great question, great question.
So first, patient age at encounter: that's absolutely doable in the Precision Health version of DataDirect.
Because that DataDirect sits on top of a de-identified data set, within a given patient's record we took all the dates and shifted them by the exact same amount, so that we can still do calculations between, say, started on this medication and got this reaction at this time; but they're not real dates, they've been shifted.
And so age at encounter could fall under that same approach.
So I'll make sure we can add that. Thank you so much for that suggestion.
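Per-patient date shifting can be sketched in a few lines of Python: one random offset is drawn per patient and applied to every date in that patient's record, so real calendar dates are hidden while the intervals between events survive. The function name, the ±365-day range, and the example events are assumptions for illustration, not the actual DataDirect de-identification procedure.

```python
import random
from datetime import date, timedelta


def shift_patient_dates(events, rng, max_days=365):
    """Shift every date in ONE patient's record by the same random offset.

    The +/- max_days range is an illustrative assumption.
    """
    offset = timedelta(days=rng.randint(-max_days, max_days))
    return [(label, d + offset) for label, d in events]


events = [("started medication", date(2020, 3, 2)), ("adverse reaction", date(2020, 3, 9))]
rng = random.Random(42)  # fixed seed only so the example is repeatable
shifted = shift_patient_dates(events, rng)

original_gap = (events[1][1] - events[0][1]).days
shifted_gap = (shifted[1][1] - shifted[0][1]).days
print(original_gap, shifted_gap)  # → 7 7
```

Age at encounter is also an interval (encounter date minus birth date), so if the birth date receives the same per-patient offset, the age computed in days is unchanged by the shift.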
And then for patients who tested positive, I would say they're more severe if they made it to Michigan, maybe because a community hospital diagnosed them but didn't have the full ICU, ventilators, and other support staff.
So in that way they are, but really, we don't even look at hospitalization as a criterion; we look at positive tests and diagnoses.
You can study who was hospitalized and who was sent home.
What I was trying to do with that slide is say that the population includes all of those sent home after they were diagnosed, which was a large percentage of the COVID positives.
In March and April, you know, when our numbers were highest, getting people out of the hospital was one of the biggest goals.
If they had a high fever, they could be sent home; if they had low oxygen saturation, they had to be admitted.
And so the ED and others had to walk this fine line of who to hospitalize and who not to.
Whether you got sent home or you got admitted, those are all in the population.
So we don't actually look at current location; we just look at: do we have evidence of a positive test? Or did your record have a COVID diagnosis? So I probably made it sound like we skewed toward inpatient, but really, it's anyone with COVID-19 that we saw here at Michigan Medicine.
– Great. Another question is: what is the best path forward to obtain more granular information about biospecimens?
When I search in DataDirect, the specimen type I'm interested in does not appear for the COVID-19 cohort, i.e., nasal swabs. And what derivative forms of the sample are available? – Great question, thank you.
I hope next week we'll be exposing nasal swabs.
Right now, that screenshot I showed of COVID-19 is the blood and serum.
And so the nasal swabs will hopefully be available next week. I think they're gonna be coming through the feed: we get a nightly data feed from the Central Biorepository lab system.
In the meantime, if you want to email me and just ask how many patients in this cohort have a nasal swab, I could tell you that through a manual pull, but it will be available for discovery.
And if anyone wants to help us pilot test this: we're trying to think of the best way to say these patients have specimens, here are the various types.
These patients have genetic data: this is the virus RNA, this is the patient's DNA.
And so we're trying to figure out the best way to display all of that.
– So we also have a question about some of the NLP data.
Could you repeat a little bit about how to access the NLP data through the self-service system? And specifically, what kind of information is included in those NLP data? – Yeah, thank you for that.
Right now we have about three use cases that we've tried out with NLP, and we're not sure how to display those in DataDirect, but we have them in coordination with the Node.
And then someone on the research data warehouse team, Heung Ju, has also done work with NLP.
And so again, it's one of those data types we have available and can deliver to you through a custom data pull.
We don't know how to display it in the self-serve tools yet.
But if you have suggestions, whoever asked the question, I would love to work with you.
– And that right now is the end of the Q&As or chats that came in.
So if anyone else has another question, I can unmute you, provided I can figure out how to do that.
I've not had to do that yet. I think you have to raise a hand somehow.
And I'm not exactly sure how to do that.
(laughs) I might need some help.
I'm not seeing anything though.
But if you tell me your name and that you have a question, I certainly can unmute you.
I have a lot of appreciation for my daughter's college, which had a huge chat with thousands of parents, and for what the president of the university had to undergo for that.
So I'm sure I can figure it out with 40 people.
– Oh my gosh I love it.
– I'm not seeing any more questions.
– I do have an email to share with you later, Erin.
And once again I want to thank you. I think this was fantastic and very timely, and I certainly share your appreciation of how fast things moved and how Michigan was able to mobilize so quickly to do this in a structured fashion.
It shows that we truly are the leader in this, and your team in particular being able to share this data in such a wide way is truly remarkable.
I also want to thank Tina Crozier, who helped put on this event and dealt with a lot of the logistics.
I think this is one of the first webinars that Precision Health has done.
So thank you, Tina, for your organizational skills, and I want to thank everyone who came and attended.
The recording is going to be available on the Precision Health webpage, which again is precisionhealth.
We will also email all the attendees to let them know when that is up.
But again, I think this was really remarkable.
So thank you so much.
– Thanks Vicki.
– And we are concluded.
– All right.
– Thank you.
– Thank you.