FlashRelate: Extracting Relational Data from Semi-Structured Spreadsheets Using Examples
With hundreds of millions of users, spreadsheets are one of the most important end-user applications. Spreadsheets are easy to use and allow users great flexibility in storing data. This flexibility comes at a price: users often treat spreadsheets as a poor man’s database, leading to creative solutions for storing high-dimensional data in a two-dimensional grid. The trouble arises when users need to answer queries with their data. Data manipulation tools make strong assumptions about data layouts and cannot read these ad-hoc databases. Converting data into the appropriate layout requires programming skills or a major investment in manual reformatting. The effect is that vast amounts of real-world data are “locked in” to a proliferation of one-off formats.
We introduce FlashRelate, a synthesis engine that lets ordinary users extract structured data from spreadsheets without programming. Instead, users drive the extraction process by specifying output examples, which FlashRelate uses to synthesize a program in Flare. Flare is a novel extraction language that extends regular expressions with geometric constructs. We built an interactive user interface on top of FlashRelate that lets end-users generate Flare programs by point-and-click. We demonstrate that correct extraction programs can be synthesized in seconds from a small number of examples for 43 real-world scenarios. Finally, our case study shows that FlashRelate addresses the widespread problem of data trapped in corporate and government formats.
A video demonstration is available at: http://tinyurl.com/mh3bo3a
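To make the idea of combining regular expressions with geometric constraints more concrete, the following is a minimal Python sketch. It is not Flare syntax and not FlashRelate's synthesis algorithm: the grid, the patterns, and the helper names `first_match` and `extract` are all hypothetical, and the extraction rule is written by hand rather than synthesized from examples. It only illustrates how a cell can be selected by a regular expression and then related to other cells by direction (here, "the nearest year above" and "the nearest name to the left").

```python
import re

# Hypothetical toy spreadsheet: a cross-tab with years across the top
# and country names down the first column.
GRID = [
    ["Country", "2010", "2011"],
    ["Albania", "1.5", "1.7"],
    ["Austria", "3.2", ""],
]

NUMBER = re.compile(r"^\d+(\.\d+)?$")
YEAR = re.compile(r"^\d{4}$")
NAME = re.compile(r"^[A-Z][a-z]+$")


def first_match(grid, r, c, dr, dc, pattern):
    """Walk from (r, c) in direction (dr, dc); return the first matching cell."""
    r, c = r + dr, c + dc
    while 0 <= r < len(grid) and 0 <= c < len(grid[r]):
        if pattern.match(grid[r][c]):
            return grid[r][c]
        r, c = r + dr, c + dc
    return None


def extract(grid):
    """Return (country, year, value) tuples, one per numeric data cell."""
    tuples = []
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if not NUMBER.match(cell):
                continue  # regex constraint on the anchor cell
            year = first_match(grid, r, c, -1, 0, YEAR)  # geometric: above
            name = first_match(grid, r, c, 0, -1, NAME)  # geometric: left
            if year is not None and name is not None:
                tuples.append((name, year, cell))
    return tuples


if __name__ == "__main__":
    for t in extract(GRID):
        print(t)
```

Header cells such as "2010" also match the numeric pattern, but they are skipped because no year cell lies above them; the empty cell is skipped because it fails the anchor pattern. In this sketch the rule is fixed in code, whereas the abstract describes deriving such a program from user-provided output examples.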
Tue 16 Jun
|09:15 - 09:40|
Jedidiah McClurg (University of Colorado Boulder), Hossein Hojjat (Cornell University), Pavol Cerny (University of Colorado Boulder), Nate Foster (Cornell University)
|09:40 - 10:05|
Aditya Nori (Microsoft Research, UK), Sherjil Ozair (IIT Delhi), Sriram Rajamani (Microsoft Research), Deepak Vijaykeerthy (Microsoft Research)
|10:05 - 10:30|
Dan Barowy (University of Massachusetts Amherst), Sumit Gulwani (Microsoft Research), Ted Hart (Microsoft Research), Benjamin Zorn (Microsoft Research)