Memory- and Computation-Efficient Statistical Tools for Big Matrices

Meetup
Anglais
Extension
Auteur·rice
Date de publication

8 mars 2022

Bonjour à toutes et tous,

R Lille organise un nouveau MeetUp :
Memory- and Computation-Efficient Statistical Tools for Big Matrices

Celui-ci sera présenté par Florian Privé et aura lieu le Thursday, the 7th of April, 2022 - 18:00 CET.

[Résumé]
R package {bigstatsr} (https://github.com/privefl/bigstatsr) provides functions for fast statistical analysis of large-scale data encoded as matrices. The package can handle matrices that are too large to fit in memory. The Package {bigstatsr} is based on a similar format (called FBM) as the format big.matrix provided by the R package {bigmemory}. The Package {bigstatsr} enables users with a laptop to perform statistical analysis of several dozens of gigabytes of data, and HPC users to handle matrices of hundreds of GB. The package is fast and efficient because of four different reasons. First, {bigstatsr} is memory-efficient because it uses only small chunks of data at a time. Second, special care has been taken to implement effective algorithms. Third, FBM objects use memory-mapping, which provides efficient accesses to matrices. Finally, as matrices are stored on-disk, many processes can easily access them in parallel.

The main features currently available in {bigstatsr} are:

  • truncated SVD/PCA,
  • penalized linear/logistic regressions,
  • column-wise linear/logistic regressions tests,
  • matrix operations,
  • parallelization / split-apply-combine stategy.

In this webinar, I will present this package and its features, as well as quickly present two other packages, {bigsparser}, which provides an on-disk sparse matrix format, and {bigsnpr}, which extends {bigstatsr} to provide specific tools for analyzing genotype data.

[Bio]
Florian Privé is a senior researcher in predictive human genetics, fond of Data Science and an R(cpp) enthusiast. He is also the founder and former organizer of the Grenoble R user group. You can find him on Twitter and GitHub as @privefl.

[A Propos]
This event (in English) is organised by the R Lille user group (https://rlille.fr/) (Lille, France). The meetup will be recorded and made available on YouTube (http://youtube.rlille.fr/).

Les inscriptions sont sur Meetup : http://meetup.rlille.fr/events/284358899

L’ensemble des diapositives sera mis à disposition sur le GitHub du groupe : https://github.com/RLille/meetups/tree/main/meetups/2022-04-07/materials

Le Meetup sera enregistré et diffusé sur Youtube (http://youtube.rlille.fr/).

À bientôt !
Mickaël CANOUIL