Sebastiano Vigna

Professor at the Università degli Studi di Milano

Università degli Studi di Milano

Sebastiano Vigna’s research focuses on the interaction between theory and practice. He has worked on theoretical topics such as computability on the reals, distributed computability, self-stabilization, minimal perfect hashing, succinct data structures, query recommendation, algorithms for large graphs, pseudorandom number generation, theoretical and experimental analysis of spectral rankings such as PageRank, and axiomatization of centrality measures.

He is also (co)author of several widely used software tools ranging from high-performance Java libraries to SciPy, a search engine, a crawler, a text editor, and a graph compression framework that is used by Common Crawl and Software Heritage for managing huge graphs. He recently ported the latter to Rust, supported by a new Rust framework for zero-copy deserialization and memory mapping. In 2011 he collaborated on the first computation of the distance distribution of the whole Facebook graph, which showed that there are just 3.74 degrees of separation on Facebook. His work on the Elias-Fano encoding and quasi-succinct indices has been implemented in Facebook’s “folly” library. He also proposed the first open ranking of Wikipedia pages (http://wikirank.di.unimi.it/), which is based on his body of work on centrality in networks.

His pseudorandom number generator xorshift128+ is the current stock generator of Google’s V8 JavaScript engine and is used by Chrome, Safari, Firefox, Edge, and Node.js; it is also the stock generator of the Erlang language. His generator xoshiro256++ is the SmallRng of Rust, and xoshiro256** is the stock generator of the .NET framework and Lua. He also worked with Guy Steele on the redesign of the Java 17 random API, which now includes several of his generators.

Interests

Compression of web and social graphs
Analysis of web and social graphs
Pseudorandom number generators
Efficient data structures for large datasets