Data-Intensive Scalable Computing: Taking Google-Style Computing Beyond Web Search

Randy Bryant (Carnegie Mellon, School of Computer Science)
Web search engines have become fixtures in our society, but few people
realize that they are actually publicly accessible supercomputing
systems, where a single query can unleash the power of several hundred
processors operating on a data set of over 200 terabytes. With Internet
search, computing has risen to entirely new levels of scale, especially
in terms of the sizes of the data sets involved. Google and its
competitors have created a new class of large-scale computer systems,
which we label "Data-Intensive Scalable Computer" (DISC) systems. DISC
systems differ from conventional supercomputers in that their focus is on
data: they acquire and maintain continually changing data sets, in
addition to performing large-scale computations over the data.

With the massive amounts of data arising from such diverse sources as
telescope imagery, medical records, online transaction records, and web
pages, DISC systems have the potential to achieve major advances in
science, health care, business, and information access. DISC opens up
many important research topics in system design, resource management,
programming models, parallel algorithms, and applications. By engaging
the academic research community in these issues, we can explore the
fundamental aspects of a societally important style of computing more
systematically and in a more open forum.