Relation Extraction with Matrix Factorization and Universal Schemas
At the heart of machine reading is relation extraction: predicting relations between entities, such as employeeOf(Person,Company). Machine learning approaches to this task require either manual annotation or, for distant supervision, pre-existing databases that use the same schema. Yet for many interesting questions (who criticized whom?) no suitable databases are available. For example, there is no criticized(Person,Person) relation in Freebase.
In this talk I will first present some of our earlier work on distantly supervised extraction. Then I will show that the need for pre-existing datasets can be avoided by using what we call a "universal schema": the union of all involved schemas (surface form predicates such as "X-was-criticized-by-Y", and relations in the schemas of pre-existing databases). This extended schema allows us to answer new questions not yet supported by any structured schema, and to answer old questions more accurately. For example, if we learn to accurately predict the surface form relation "X-is-scientist-at-Y", this can help us better predict the Freebase employee(X,Y) relation.
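To make the idea of a universal schema concrete, here is a minimal sketch in Python. The entity pairs, the facts, and the helper name build_universal_schema are hypothetical and chosen only for illustration. The sketch assumes that every fact is an (entity pair, relation) observation and that surface form predicates and database relations are simply merged into one set of columns.

```python
from collections import defaultdict

# Toy observations (hypothetical, for illustration only).
# Surface form predicates extracted from text:
text_facts = [
    (("Alice", "MIT"), "X-is-scientist-at-Y"),
    (("Bob", "Acme"), "X-is-scientist-at-Y"),
    (("Carol", "Dave"), "X-was-criticized-by-Y"),
]
# Relations taken from a pre-existing structured database:
db_facts = [
    (("Alice", "MIT"), "employee"),
]


def build_universal_schema(*fact_sources):
    """Merge several fact sources into one schema.

    Returns (columns, rows): columns is the sorted union of all relation
    names, and rows maps each entity pair to the set of relations
    observed for it in any source.
    """
    rows = defaultdict(set)
    for source in fact_sources:
        for entity_pair, relation in source:
            rows[entity_pair].add(relation)
    columns = sorted({rel for rels in rows.values() for rel in rels})
    return columns, dict(rows)


columns, rows = build_universal_schema(text_facts, db_facts)
for pair, relations in rows.items():
    print(pair, sorted(relations))
```

In this toy data the pair (Alice, MIT) carries both a surface form predicate and a database relation, which is the kind of overlap that lets evidence from text inform predictions about the structured schema.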
To populate a database with such a schema, we present a family of matrix factorization models that predict affinity between database tuples and relations. We show that this achieves substantially higher accuracy than the traditional classification approach. More importantly, by operating simultaneously on relations observed in text and in pre-existing structured DBs, we are able to reason about unstructured and structured data in mutually supporting ways. As a result, our approach outperforms state-of-the-art distant supervision even on the pre-existing schema.
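As a rough illustration of what "predicting affinity between tuples and relations" can look like, the sketch below fits one embedding per entity tuple and one per relation, and scores a cell by the sigmoid of their dot product. This is a simplified stand-in and not necessarily any of the models from the talk. It assumes a logistic loss, plain SGD, and that unobserved cells count as negatives. The toy data, the held-out cell, and all function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    # Numerically safe logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def train_factorization(cells, n_rows, n_cols, rank=2, epochs=300,
                        lr=0.1, l2=0.001, seed=0):
    """Fit tuple (row) and relation (column) embeddings with SGD.

    cells: list of (row_index, col_index, label), label in {0, 1}.
    Returns (row_vecs, col_vecs, history), where history holds the
    mean logistic loss per epoch.
    """
    rng = random.Random(seed)
    row_vecs = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    col_vecs = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for r, c, y in cells:
            a, v = row_vecs[r], col_vecs[c]
            p = sigmoid(sum(ai * vi for ai, vi in zip(a, v)))
            total -= math.log(max(p if y else 1.0 - p, 1e-12))
            g = y - p  # gradient of the log-likelihood w.r.t. the logit
            for k in range(rank):
                ak, vk = a[k], v[k]
                a[k] += lr * (g * vk - l2 * ak)
                v[k] += lr * (g * ak - l2 * vk)
        history.append(total / len(cells))
    return row_vecs, col_vecs, history


def predict(row_vecs, col_vecs, r, c):
    # Affinity between tuple r and relation c as a probability.
    return sigmoid(sum(a * v for a, v in zip(row_vecs[r], col_vecs[c])))


# Hypothetical toy data: rows are entity tuples, columns are relations.
tuples = [("Alice", "MIT"), ("Eve", "CMU"), ("Bob", "Acme"), ("Carol", "Dave")]
relations = ["X-is-scientist-at-Y", "X-was-criticized-by-Y", "employee"]
observed = {(0, 0), (0, 2), (1, 0), (1, 2), (2, 0), (3, 1)}
held_out = (2, 2)  # employee(Bob, Acme) is hidden during training

# Assumption: every other unobserved cell is treated as a negative example.
cells = [(r, c, 1 if (r, c) in observed else 0)
         for r in range(len(tuples)) for c in range(len(relations))
         if (r, c) != held_out]

row_vecs, col_vecs, history = train_factorization(cells, len(tuples), len(relations))
held_out_score = predict(row_vecs, col_vecs, *held_out)
print(f"mean loss: {history[0]:.3f} -> {history[-1]:.3f}")
print(f"score for employee(Bob, Acme): {held_out_score:.2f}")
```

The held-out cell is never seen during training, so its score comes only from what the embeddings learned from the other cells. Here the hope is that "X-is-scientist-at-Y" and employee end up with similar relation embeddings because they co-occur on other tuples. Treating unobserved cells as negatives is a strong simplification, since a missing fact is not necessarily a false one.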