Query rewriting for caching and security
Supervisors
Suitable for
Abstract
Prerequisites: Foundational AI/ML background
Natural language interfaces for data-driven systems face a fundamental conflict between personalization and efficiency.
User queries are frequently "data-dependent," meaning the user's private data and the logical structure of their request are
intertwined within the query string itself. This fusion of logic and private data renders traditional caching ineffective:
because caches key on the raw query text, requests expressing an identical intent with different personal data appear as
textually unique queries, forcing redundant computation. Furthermore, this design exposes
sensitive user data to the core query planning and optimization layers, creating significant privacy vulnerabilities and data
leakage risks throughout the system.
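As a concrete illustration of the caching problem, the minimal sketch below (all names and queries are hypothetical, not part of the proposal) shows why an exact-match cache keyed on the query text can never recognise that two users share the same intent:

```python
import hashlib

# Hypothetical exact-match plan cache, keyed on the raw query text.
plan_cache: dict[str, object] = {}

def cache_key(query: str) -> str:
    return hashlib.sha256(query.encode("utf-8")).hexdigest()

# Two users express the same intent, but their private data is embedded
# in the query string, so the cache keys never match and no work is reused.
q1 = "Show all invoices for alice@example.com over $500 since March"
q2 = "Show all invoices for bob@example.com over $250 since January"

assert cache_key(q1) != cache_key(q2)  # identical intent, zero cache hits
```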
This research proposes a new AI model to address this challenge by performing
intelligent query rewriting. The model will function as an abstraction layer, intercepting a data-dependent natural language
query and transforming it into two distinct components: a canonical, **data-independent template** that represents the abstract
operational intent, and a separate, structured **parameter object** that isolates all the user-specific data. This decoupling
is the central hypothesis, designed to systematically separate the *what* (the logic) from the *who* (the data).
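To make the intended decomposition concrete, the following sketch shows the kind of output the rewriting model would be trained to produce. The `RewriteResult` structure, the template syntax, and the hand-written example are illustrative assumptions, not a specification of the model:

```python
from dataclasses import dataclass, field

@dataclass
class RewriteResult:
    # Canonical, data-independent template: the abstract intent with named slots.
    template: str
    # Structured parameter object: every user-specific value, isolated from the logic.
    params: dict[str, object] = field(default_factory=dict)

# A hand-written example of what the model would be expected to produce for
# "Show all invoices for alice@example.com over $500 since March":
example = RewriteResult(
    template="LIST invoices WHERE customer = :email AND amount > :min_amount AND date >= :since",
    params={"email": "alice@example.com", "min_amount": 500, "since": "March"},
)
```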
The benefits of this separation are twofold. First, the data-independent templates become highly cacheable, allowing the
system to reuse computationally expensive execution plans for all users expressing the same intent, which promises a significant
increase in performance and scalability. Second, the separation enables a more secure processing model in which the generalized
template is handled by a public-facing planner, while the isolated, sensitive data is managed by a secure, trusted module. This
ensures that private information is shielded from the main planning environment and introduced only at the final point of execution, greatly
enhancing data privacy.
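The sketch below ties both benefits together under the same illustrative assumptions, reusing the `RewriteResult` structure from above: the plan cache is keyed on the canonical template alone, the planner never sees user data, and the trusted module binds the sensitive parameters only at execution time. It is a minimal outline of the intended workflow, not a definitive design:

```python
import hashlib
from typing import Callable

# Hypothetical plan cache keyed on the canonical template alone: every user who
# expresses the same intent shares one entry, regardless of their private data.
plan_cache: dict[str, Callable[[dict], list]] = {}

def plan_key(template: str) -> str:
    return hashlib.sha256(template.encode("utf-8")).hexdigest()

def public_planner(template: str) -> Callable[[dict], list]:
    """Untrusted planning layer: sees only the data-independent template."""
    def plan(params: dict) -> list:
        # Stand-in for the expensive, cacheable plan produced by the optimiser.
        return [("execute", template, params)]
    return plan

def trusted_execute(rewrite: "RewriteResult") -> list:
    """Trusted module: the isolated parameters are bound only at this point."""
    key = plan_key(rewrite.template)
    if key not in plan_cache:                            # plan once per intent...
        plan_cache[key] = public_planner(rewrite.template)
    return plan_cache[key](rewrite.params)               # ...then bind sensitive data last
```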