Stefano Cairo, PhD, and Gilad Silberberg, PhD, answer key questions from their recent webinar.
- What’s the minimal sample size required for a PPMO study? As with any statistical test, a minimum number of samples is required to reach reasonable statistical power. Machine learning tasks also need a test group, so the requirement is even higher. We usually get good results with at least 30 samples in the training set plus several samples set aside as a test set.
- Can PPMO integrate other data types? Yes. We can integrate SNP, epidemiologic, and demographic data, as well as any other data considered relevant.
- How many PDX in vivo models would be needed per cancer indication to get reliable drug response results? There is no predefined number; the experimental design is tailored to the scientific question being addressed. All our PDXs are very well characterized at the molecular level, including RNASeq, WES, and whole proteome analysis, integrated with detailed clinical information and PDX growth parameters, and we have developed a web-based tool to facilitate the selection of models of interest. Given the variety and breadth of our PDX panel, the number of PDXs selected may be too large and need to be reduced for reasons of time and/or budget. In these cases, it is possible to run an ex vivo prescreen of the selected models to identify the PDXs that will be tested in vivo.
- What can PPMO/sPLS be used for and what are the disadvantages of sparse PLS? The sparse approach is designed to reduce the large number of features that machine learning models of biological systems typically have to handle. The abundance of features in omics data poses a problem for machine learning. With a sparse approach, the model focuses on the minimal predictive set of features. As a consequence, elements related to the mode of action may be eliminated from the model: the model is not concerned with the mode of action but with the minimal number of features that allows discriminating between the two groups.
- It is great that you can classify responders and non-responders based on a multi-omics approach. How do you see this translating to clinical practice for patient enrollment based on the identified biomarkers/signature? The bottleneck of many molecular classifiers is indeed their application in clinical routine. The most straightforward approach for a multi-omics-based classifier would be to focus on the molecular parameters identified by the classifier that can be measured in pathology laboratories, evaluate the classifier's robustness in the training and test sets, and retrospectively validate it in clinical cohorts.
- In the data-driven approach for biomarker discovery, how does the number of models and their tumor types influence the discovered biomarkers? The most challenging issue in biomarker discovery is overcoming tumor heterogeneity across patients. The more representative of the patient population the models used for biomarker identification are, the higher the chances of identifying molecular features that correlate with tumor behavior regardless of the tumor entity or subtype.
- For biomarker discovery purposes, and potentially to develop a CDX, would you suggest working with a relatively low 'n' in PDX models, or a larger population in organoids? For the development of a CDX, the use of PDXs is recommended, as they are physiologically much closer to the human tumor. Our in vivo platform is ideal support for biomarker development because of the very large number of models, which allows us to cover patients’ heterogeneity, and their extremely well-characterized molecular profiles.
- I see you have around 26 PDX models with RNASeq, WES, and proteomics available in a specific indication. Would this population be sufficient to robustly identify biomarkers of sensitivity/resistance? This number limits the machine learning workflow. One exploratory alternative is to train the model without validation. This can be done with either a sparse or a non-restrictive approach. The interpretation of the latter would be similar to that of factor analysis.
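
To make the sparse idea and the training/test split from the answers above more concrete, here is a minimal Python sketch. It is not the PPMO implementation: it is a simplified, single-component, sPLS-style illustration on synthetic data, and every name and number in it (`sparse_weights`, `keep`, 36 samples, 50 features, 3 informative features) is a hypothetical choice made for the example. The weight of each feature is its covariance with the responder/non-responder label, soft-thresholded so that only a small number of features keep a non-zero weight.

```python
import random

random.seed(0)

N_SAMPLES, N_FEATURES, N_INFORMATIVE = 36, 50, 3

# Synthetic data (hypothetical): alternating labels, 1 = responder, 0 = non-responder.
y = [i % 2 for i in range(N_SAMPLES)]
X = []
for label in y:
    row = [random.gauss(0.0, 1.0) for _ in range(N_FEATURES)]
    for j in range(N_INFORMATIVE):
        row[j] += 4.0 * label  # only the first few features carry signal
    X.append(row)

# 30 training samples plus a small held-out test set.
X_train, y_train = X[:30], y[:30]
X_test, y_test = X[30:], y[30:]


def column_means(rows):
    """Mean of each feature (column) over the given samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sparse_weights(rows, labels, keep):
    """Unit-norm weight vector with at most `keep` non-zero entries."""
    means = column_means(rows)
    label_mean = sum(labels) / len(labels)
    # Covariance-like score between each centered feature and the centered label.
    cov = [
        sum((row[j] - means[j]) * (lab - label_mean) for row, lab in zip(rows, labels))
        for j in range(len(means))
    ]
    # Soft-threshold at the (keep+1)-th largest absolute score.
    ranked = sorted((abs(c) for c in cov), reverse=True)
    threshold = ranked[keep] if keep < len(ranked) else 0.0
    w = [max(abs(c) - threshold, 0.0) * (1.0 if c > 0 else -1.0) for c in cov]
    norm = sum(v * v for v in w) ** 0.5
    return [v / norm for v in w] if norm > 0 else w


def scores(rows, weights, means):
    """Project samples onto the weight vector after centering with training means."""
    return [sum(w * (v - m) for w, v, m in zip(weights, row, means)) for row in rows]


train_means = column_means(X_train)
weights = sparse_weights(X_train, y_train, keep=N_INFORMATIVE)
selected = [j for j, w in enumerate(weights) if w != 0.0]

# Classify with a cutoff halfway between the two class means of the training scores.
train_scores = scores(X_train, weights, train_means)
mean_pos = sum(s for s, lab in zip(train_scores, y_train) if lab == 1) / y_train.count(1)
mean_neg = sum(s for s, lab in zip(train_scores, y_train) if lab == 0) / y_train.count(0)
cutoff = (mean_pos + mean_neg) / 2

predicted = [1 if s > cutoff else 0 for s in scores(X_test, weights, train_means)]
accuracy = sum(p == t for p, t in zip(predicted, y_test)) / len(y_test)

print("selected features:", selected)
print("test accuracy:", accuracy)
```

The sketch keeps only the features that best separate the two groups and ignores everything else, which mirrors the trade-off described above: the selected features discriminate responders from non-responders, but they are not guaranteed to reflect the mode of action. With fewer samples, the held-out test step would have to be dropped, and the result should then be read as exploratory.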