
Data Curation at Scale

Professor Renée J. Miller

Data curation seeks to support all the processes necessary to manage and maintain value in data over its lifecycle. Curation includes understanding the origins of data and how it was created, cleaned, or integrated, as well as data discovery. In Toronto's Big Data Curation Lab, we are studying how to scale important data curation tasks. Among these, data integration and data discovery are perhaps the most important ways to add value to data. In our iBench project, we are striving to raise the bar in the empirical evaluation of data integration, data exchange, and schema mapping systems by providing a metadata generator that can create large, realistic scenarios to test both the scalability and functionality of systems.

In complementary research, we are investigating the challenges of curating open data. In this talk, I will briefly describe our LinkedCT system, a curated linked open data system of clinical drug trials. I will conclude the talk with some interesting challenges we have observed in creating internet-scale data management solutions for open data.

Speaker bio

Renée J. Miller is a Professor of Computer Science and the Bell Canada Chair of Information Systems at the University of Toronto. She is a fellow of the Royal Society of Canada, Canada's National Academy, and a fellow of the ACM. She received the US Presidential Early Career Award for Scientists and Engineers (PECASE), the highest honor bestowed by the United States government on outstanding scientists and engineers beginning their careers. She received an NSF CAREER Award, the Premier's Research Excellence Award, and an IBM Faculty Award.

Her research is in the area of data integration and data curation. She and her IBM co-authors received the ICDT Test-of-Time Award for their influential 2003 paper establishing the foundations of data exchange. She has served on the Board of Trustees of the VLDB Endowment and as President of the Endowment. Her research is funded by NSERC, NSF, IBM, SAP, and Bell Canada, among others. She received her PhD in Computer Science from the University of Wisconsin, Madison, and Bachelor's degrees in Mathematics and in Cognitive Science from MIT.


