Data Science on Information-rich Graphs: New Frontiers, Problems and Methods
June 12, 2019, at 11:00
Sala Consiglio, 8th floor, via Celoria 18
Speaker: Francesco Gullo, UniCredit R&D Department
Contact person: Nicolò Cesa-Bianchi
Graphs have become a ubiquitous model for representing real-world structured data. They are routinely used to describe a large variety of data such as the Web, social networks, knowledge bases, (heterogeneous) information networks, biological networks, financial networks, and many more. The proliferation of heterogeneous data acquired from a variety of sources has given rise to increasingly complex linked-data representations. As a result, today's real-world graphs carry a wide range of additional information on their vertices and/or edges: weights, labels, feature vectors, probabilities of existence, probability distributions over weights or labels, time series capturing the dynamic evolution of the network, and so on. These enriched graphs offer a unique opportunity, but also pose a serious challenge, for improving the quality of data-science methods on graphs.
The focus of this talk is on formulating, theoretically characterizing, and designing effective yet efficient algorithms for novel data-science problems on information-rich graphs. Specifically, as exemplary cases, we discuss two problems: conditional reliability in uncertain graphs and network-based receivable financing. The former is the problem of estimating reliability between vertices of an uncertain graph (a graph whose edges exist with a certain probability) when edge-existence probabilities depend on a set of conditions. In particular, the main goal here is to determine the top-k conditions that maximize the reliability between two (sets of) vertices. The latter deals with receivable financing, a well-established service in finance whereby cash is advanced to firms against receivables their customers have yet to pay. Here we show how the limitations of traditional centralized receivable-financing services can be overcome by adopting a network-based perspective in which customers are able to pay each other autonomously. This is achieved by formulating and solving a novel combinatorial-optimization problem on a (multi-)graph of receivables.
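As a rough illustration of the first problem, the sketch below estimates source-to-target reliability in an uncertain graph by plain Monte Carlo sampling of possible worlds, then ranks conditions by that estimate. It is not the method presented in the talk. The edge format (a per-condition probability dictionary), the function names, and the choice to score each condition on its own rather than jointly are all assumptions made for this example.

```python
import random
from collections import defaultdict, deque


def reachable(adj, source, target):
    """Breadth-first search on one sampled possible world."""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def estimate_reliability(edges, source, target, condition,
                         samples=10000, seed=0):
    """Monte Carlo estimate of P(target reachable from source | condition).

    edges: list of (u, v, probs) directed edges, where probs maps a
    condition name to the edge-existence probability under that condition
    (hypothetical format, assumed for this sketch).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Sample one possible world: keep each edge independently.
        adj = defaultdict(list)
        for u, v, probs in edges:
            if rng.random() < probs.get(condition, 0.0):
                adj[u].append(v)
        if reachable(adj, source, target):
            hits += 1
    return hits / samples


def top_k_conditions(edges, source, target, conditions, k,
                     samples=10000, seed=0):
    """Rank conditions by estimated reliability, each scored independently
    (a simplifying assumption of this sketch)."""
    scored = [
        (estimate_reliability(edges, source, target, c, samples, seed), c)
        for c in conditions
    ]
    scored.sort(reverse=True)
    return [c for _, c in scored[:k]]
```

Sampling is used here because exact reliability computation is known to be #P-hard in general; the estimate's accuracy improves with the number of samples.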
Francesco Gullo is a researcher in the R&D department at UniCredit. He received his PhD from the University of Calabria, Italy, in 2010. During his PhD, he was an intern at George Mason University, Fairfax, VA, USA. Before joining UniCredit, he spent 1.5 years at the University of Calabria, Italy (as a postdoc), and 4 years at Yahoo Labs, Spain (first as a postdoc, then as a research scientist).
His research focuses on algorithmic data science, i.e., on formulating novel problems to gain insights, information, and knowledge from data, theoretically characterizing them, and designing and analyzing effective yet efficient algorithms for their solution. As for data types, he has placed special emphasis on graphs, text, and time series, but he has also worked with Euclidean data, probabilistic data, and semistructured data. Large-scale data processing and combinatorial optimization are recurring themes in his work. His research has been published in premier venues in the areas of data mining, machine learning, databases, Web data management, and NLP, including SIGMOD, VLDB, KDD, ICDM, CIKM, EDBT, WSDM, ECML-PKDD, SDM, TODS, TKDE, TKDD, Machine Learning, DAMI, JCSS, and Pattern Recognition.
He has also been active in serving the data-science scientific community, for example as Workshop Chair of ICDM’16, as an organizer of workshops and symposia (MIDAS workshop @ECML-PKDD ’16-’19, MultiClust mini-symposium @SDM’14, MultiClust workshop @KDD’13, 3Clust workshop @PAKDD’12), and as a program committee member of major conferences (KDD, WWW, ICDM, WSDM, ECML-PKDD, CIKM, ICWSM, SDM).