Lab of Advanced Algorithms and Applications

The Lab of Advanced Algorithms and Applications (A³, read "a-cube") is mainly devoted to the design, analysis and experimentation of algorithms and data structures for storing, compressing, mining and retrieving information from large amounts of textual data like Web repositories, XML file collections, textual databases, genomic/DNA sequences.


We are developing a new semantic-annotation technology for short textual fragments, called TAGME [ACM CIKM 2010, IEEE Software 2012]. This tool has been applied succesfully to many contexts concerning with the clustering, the classification and the similarity-comparison of short texts. We are also studying the design of of “Multi-objective” data compressor, a novel paradigm in which optimization techniques are plugged into the design of data compressors. For example, in the Bicriteria Data Compression ([ACM-SIAM SODA14], [ESA2014] to appear), the compressor solves the following problem: "return a compressed file which minimizes the compressed space provided that the decompression time is less than T". Similarly, the compressor can "return a compressed file which minimizes the decompression time provided that the compressed space is less than S bits". A third line of research is a classic one for our group lasting for 20 years now and regarding the design of compressed indexes for big data (such as String B-tree [JACM 1999] and FM-index [JACM 2005]). Recently we have designed cache-oblivious compressed version of those indexes for dictionaries of strings [ESA 2013], and distribution-ware compressed indexes for data collections [Algorithmica 2013].

Current Partnerships

  • Bassilichi
  • Cerved Group
  • Google (Zurich)
  • Yahoo! Research (Barcelona)
  • Net7
  • SpazioDati
  • Tiscali Italia

Current Grants

  • [2014] Yahoo! Faculty Research and Engagement Program 2014
  • [2013-2015] Regione Toscana, Net7, StudioFlu, SpazioDati
  • [2013-2014] Bassilichi
  • [2013-2014] Google Research Award
  • [2012-2014] Italian MIUR-PRIN Project "ARS-Technomedia"
  • [2011-2012] Telecom Italia Working Capital
  • [2010] Google Research Award
  • [2010-2012] Italian MIUR-PRIN Project "The Mad Web"
  • [2006-2011] Yahoo! Research
  • [2009-2012] Italian MIUR-FIRB Project "Linguistica"

Latest News

WWW 2015 accepted papers » January 26, 2015

We announce that our papers “Compressed indexes for string-searching in labeled graphs” and “GERBIL – General Entity Annotator Benchmark” have been accepted at the 24th International World Wide Web Conference (WWW 2015) conference.

See you in Florence ;)

“Partitioned Elias-Fano Indexes” wins the SIGIR Best paper award » July 18, 2014

We are glad to announce that Giuseppe Ottaviano and Rossano Venturini’s “Partitioned Elias-Fano Indexes” won the SIGIR 2014 Best Paper Award. Here’s a link to the article.

Yahoo Faculty Research and Engagement Program award » July 18, 2014

We are happy to announce that Rossano Venturini has been awarded with a Yahoo Faculty Research and Engagement Program (FREP)The project involves the design and the experimentation of compressed data structures for indexing users’ personal collections of documents which grow and change frequently in the time.

SMAPH achieves the best result at ERD 2014 » July 17, 2014

We are happy to announce that our system, SMAPH, co-developed by Marco Cornolti and Paolo Ferragina (University of Pisa), Massimiliano Ciaramita (Google), Hinrich Schütze and Stefan Rüd (University of Munich) achieved the best result in the ERD Challenge hosted by SIGIR 2014. Teams participating in the challenge (around 20) had to build a working system to do Entity Recognition and Disambiguation on search-engine queries, i.e. given a query, find the entities associated to it.

The problem of NER in queries is somehow harder than in long texts. Queries are often malformed, ambiguous and, most of all, lack context. A searcher that issue a query like glasses may be interested either in the drinkware or in eyeglasses, while a searcher that issue a query like google glasses has yet another need. armstrong moon landing should point to Neil Armstrong, while armstrong trumpet should point to Louis Armstrong.

SMAPH disambiguates queries. It piggybacks on a search engine to normalize the keywords of the query, then disambiguates them, and prune away bad entities. On the ERD challenge (short track) it scored the best result (68.5% F1). The system will shortly be available to be queried through a web service. Details on the system implementation are given in a paper.

We also participated as Acube Lab with WAT, a new version of TagMe, to the long-track competition (i.e. disambiguation of long texts), achieving a nice result (though unbalanced towards precision ;) ).

Further readings:  Research at Google blogLMUERD Challenge papers.