Assistant professor of computer science Christopher Ré received a “genius grant” from the John D. and Catherine T. MacArthur Foundation on Monday. One of 24 fellowships awarded by the foundation, the $625,000 grant will fund his work across a broad range of computer science areas.
Recognized for his contributions to making big data analysis more accessible, Ré developed an inference engine called DeepDive, which has been used in medical research and anti-human trafficking efforts. The Daily spoke with him about his plans for its further application.
The Stanford Daily (TSD): Much of your recent work has focused on big data, a field that has become a buzzword in computer science. When did you first become interested in big data and its application to solving real-world problems?
Christopher Ré (CR): When I was an undergrad, I was taking my first database class. A guy named Jim Gray, a Turing Award winner, came by and spoke. He worked on this project called the WorldWide Telescope, which now I think is called the Sloan Digital Sky Survey.
One of the things that happened during that class is that I saw this guy (I was kind of a math student at the time) who was working on these really applied problems for physicists and astrophysicists. He had this beautiful abstraction that he was providing to them, and he had a very simple observation: all the telescopes in the world eventually dump their data onto hard disks, and if you just took all those hard disks and strapped them together, you could have a much bigger telescope. It was one of those “aha” moments when you realize this is a key place where you can have a huge amount of impact.
I started thinking about a career in data management more seriously as a result of that, and over the last 10 or 15 years I have been thinking about different ways that I could apply it. We’re not doing as much with physics now, but I’m really interested in continuing this observation that all the world’s scientific knowledge ends up in publications. That’s where it is, that’s where it lives, and yet it’s somehow inaccessible: it’s right out there to have, but people can’t make use of it. That got me really excited, and that’s why I started working on the DeepDive project.
TSD: Can you tell me a little about the DeepDive project and how you’re trying to revolutionize databases and their capabilities?
CR: So one of the big problems is that in a traditional database, you have to have very clean, precise information. Databases were really made for financial transactions. Every record inside that database is perfectly correct, and so a lot of the technology that goes into databases is about dealing with these perfectly correct, clean records.
Now, if you want to go out into the literature, these are documents written by people, so they’re necessarily vague and ambiguous. They have a lot of imprecision in them. Trying to map that text, those emails and those webpages that contain all that valuable information into that precisely structured form is a real challenge, and people have been working on it in natural language processing, in databases and in a bunch of other areas.
With DeepDive, what we’re really excited about is that we’ve been able to take advantage of some great work that has gone on here at Stanford over the last 10 or 15 years, like Chris Manning’s Stanford NLP [Natural Language Processing] group, which has built all these tools that have allowed us to actually build these end-to-end systems, all the way from the text to the structured data, with really high quality. That is based not just on the work in our group; a huge number of people have contributed to it, Chris among them.
What we’ve been focused on is, now that we’ve started to build these systems, can we build them dramatically faster, dramatically more easily and with higher quality? DeepDive is really pushing in that direction: more information, higher resolution and, we hope, easier use. We’re not very close yet, but we’re moving in that direction.
TSD: You mentioned making it more accessible and easier to use. What further applications do you see DeepDive being used for?
CR: Even just today, we’ve been talking to people who have a whole bunch of image data. We have some students with projects looking at things like lung cancer: they have medical images showing the cells of different tissues, and they have reports written by pathologists. The pathologist has a really hard challenge. They have to look through these massive images and try to identify how severe a cancer is, what kind of cancer it is and whether it’s even cancer to begin with.
Machines don’t get tired; they can look at all this information. What we’re aiming for with DeepDive is to expand the types of information and the types of data that are involved, and that will allow us to have applications in medical schools and various branches of science.
We’ve been active in anti-human trafficking work. The possibilities are really staggering for us right now. We’re getting a lot of emails, as one might imagine, about different places to apply it. So we’re really excited to try to find out which key new features are going to enable new applications. But the easy one is medical imaging.
TSD: The MacArthur Foundation is dedicated to supporting people committed to building a “more just, verdant and peaceful world.” Why do you believe you were selected to uphold this mission, and how do you think you can serve that cause?
CR: It’s a great charter. I’m not sure that we deserve that mantle, but we’re really excited by it. I think one of the things that the MacArthur Foundation was interested in, or at least they told us they were interested in, was the fact that we were taking some of this technology and applying it in new places for society, like the anti-human trafficking work that we’ve been doing with DARPA and a bunch of other teams.
Anti-human trafficking is a problem that has gone on for a long time, but it hasn’t really received a lot of attention because it’s very difficult to get ahold of. But this is the kind of data that we can unearth now, and then go back and have a societal impact. And I’m hopeful that, with this award and some of the attention for this kind of work, more people will engage with it. At Stanford we already have a tradition of it: my colleague Jure Leskovec ran something on data mining for societal good this past year.
So at Stanford we’re sort of leading the way in trying to say, “Hey, all this great technology not only can make better services for people, better consumer applications, but can fundamentally change society.” This is really the exciting place to be to do that work, and I think in part the MacArthur Foundation was recognizing not just our individual work but all the great work that people have done here. I got lucky; I got picked out of the hat, so that was pretty exciting.
Contact Tristan Vanech at tvanech ‘at’ stanford.edu.