A benchmark to test linguistic robustness.

View the Project on GitHub DataManagementLab/ParaphraseBench

Introducing ParaphraseBench – a benchmark to evaluate the robustness of natural language interfaces to databases (NLIDBs).

Current benchmarks such as GeoQuery do not explicitly test different linguistic variations, which is important for understanding the robustness of an NLIDB. To test different linguistic variants in a principled manner, we therefore curated a new benchmark as part of our paper on DBPal. It covers different linguistic variations of the user's natural language (NL) input and maps each one to an expected SQL output.

The schema of our new benchmark models a medical database that contains a single table comprising attributes of a hospital's patients, such as name, age, and disease. In total, the benchmark consists of 290 pairs of NL-SQL queries. The queries are grouped into categories according to the linguistic variation used in the NL query: naïve, syntactic paraphrases, morphological paraphrases, lexical paraphrases, and semantic paraphrases, as well as a set of queries with missing information.

While the NL queries in the naïve category represent a direct translation of their SQL counterpart, the other categories are more challenging: syntactic paraphrases emphasize structural variances, lexical paraphrases pose challenges such as alternative phrases, semantic paraphrases use semantic similarities such as synonyms, morphological paraphrases add affixes, apply stemming, etc., and the NL queries with missing information stress implicit and incomplete NL queries.
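Because the queries are grouped by linguistic variation, results are most informative when reported per category rather than as a single number. The following is a minimal sketch of such a per-category score, not the evaluation procedure of the benchmark itself. It assumes the pairs have already been loaded as `(category, nl, expected_sql)` tuples and that correctness is judged by a normalized exact string match. The names `normalize_sql`, `accuracy_by_category`, `EXAMPLE_PAIRS`, and `toy_translate`, the `patients` table name, and the two example pairs are all hypothetical and serve illustration only.

```python
from collections import defaultdict


def normalize_sql(sql):
    """Lowercase, collapse whitespace, and drop a trailing semicolon."""
    return " ".join(sql.strip().rstrip(";").lower().split())


def accuracy_by_category(pairs, translate):
    """Return the fraction of exactly matched queries per category.

    pairs:     iterable of (category, nl, expected_sql) tuples
    translate: callable mapping an NL query to a SQL string (the NLIDB under test)
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, nl, expected_sql in pairs:
        total[category] += 1
        if normalize_sql(translate(nl)) == normalize_sql(expected_sql):
            correct[category] += 1
    return {category: correct[category] / total[category] for category in total}


# Hypothetical pairs for illustration; they are NOT taken from the benchmark files.
EXAMPLE_PAIRS = [
    ("naive", "show the name of all patients with age 80",
     "SELECT name FROM patients WHERE age = 80"),
    ("syntactic", "where age is 80, show the name of patients",
     "SELECT name FROM patients WHERE age = 80"),
]


def toy_translate(nl):
    """Stand-in for an NLIDB: it only handles queries that start with 'show'."""
    if nl.startswith("show"):
        return "select name from patients where age = 80;"
    return "SELECT * FROM patients"


scores = accuracy_by_category(EXAMPLE_PAIRS, toy_translate)
```

Exact string matching is the simplest possible criterion and will reject semantically equivalent SQL that is written differently; comparing query results on a populated database would be a more lenient alternative.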

In the following, we show an example query for each of these categories in our benchmark:

Please cite the following paper when using this benchmark (download bib file):

Title: An End-to-end Neural Natural Language Interface for Databases
Authors: Utama, Prasetya; Weir, Nathaniel; Basik, Fuat; Binnig, Carsten; Cetintemel, Ugur; Hättasch, Benjamin; Ilkhechi, Amir; Ramaswamy, Shekar; Usta, Arif
Publication: eprint arXiv:1804.00401
Publication Date: 04/2018
Origin: ARXIV
Keywords: Computer Science - Databases, Computer Science - Computation and Language, Computer Science - Human-Computer Interaction
Bibliographic Code: 2018arXiv180400401U

Read about how to use the benchmark.