LASSO Regression with Multiple Imputations for the Selection of Key Variables Affecting the Fatty Acid Profile of Nannochloropsis oculata.
Vasilis Andriopoulos, Michael Kornaros. Published in: Marine Drugs (2023)
The marine microalga Nannochloropsis oculata has garnered significant interest as a potential source of lipids for both biofuel and nutrition, as it contains significant amounts of C16:0, C16:1, and C20:5, n-3 (EPA) fatty acids (FA). Growth parameters such as temperature, pH, light intensity, and nutrient availability play a crucial role in the fatty acid profile of microalgae, and N. oculata is no exception. This study aims to identify key variables for the FA profile of N. oculata grown autotrophically. To that end, the most relevant literature data were gathered and combined with our previous work and with novel experimental data, for a total of 121 observations. The examined response variables were the percentages of C14:0, C16:0, C16:1, C18:1, C18:2, and C20:5, n-3 in total FAs, their respective ratios to C16:0, and the content of each of those fatty acids in the biomass on an ash-free dry weight basis. Many potential predictor variables were collected, and dummy variables were introduced to account for bias in the measured variables originating from different authors as well as for other parameters. The method of multiple imputations was chosen to handle missing data, with limits based on the literature and on model-based estimation, for example the use of the software PHREEQC and residual modelling to estimate pH. To eliminate unimportant predictor variables, LASSO (Least Absolute Shrinkage and Selection Operator) regression analysis with a novel definition of the optimal lambda was employed. LASSO regression identified the most relevant predictors while minimizing the risk of overfitting the model. Subsequently, stepwise linear regression with interaction terms was used to further study the effects of the selected predictors. After two rounds of regression, sparse refined models were obtained, and their coefficients were evaluated based on significance.
Our analysis confirms well-known effects, such as that of temperature, and uncovers previously unreported effects of aeration, calcium, magnesium, and manganese. Of special interest is the negative effect of aeration on polyunsaturated fatty acids (PUFAs), which is possibly related to the enzymatic kinetics of fatty acid desaturation under increased oxygen concentration. These findings contribute to the optimization of the fatty acid profile of N. oculata for different purposes, such as the production of food or feed high in PUFAs, or of biofuels high in saturated and monounsaturated FA methyl esters (FAME).
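
The general idea of combining multiple imputation with LASSO-based variable selection can be illustrated with a minimal sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the synthetic data, the bounds `lower` and `upper`, the fixed penalty `lam` (the abstract does not define the paper's optimal lambda), the crude imputation that draws from each column's observed distribution, and the selection-frequency summary across imputations.

```python
import numpy as np

def impute_once(X, lower, upper, rng):
    """Fill NaNs with draws from each column's observed mean/std, clipped to limits."""
    X_imp = X.copy()
    for j in range(X.shape[1]):
        missing = np.isnan(X[:, j])
        if missing.any():
            observed = X[~missing, j]
            draws = rng.normal(observed.mean(), observed.std(), missing.sum())
            X_imp[missing, j] = np.clip(draws, lower[j], upper[j])
    return X_imp

def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1 on standardized X."""
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # each column has mean 0, variance 1
    yc = y - y.mean()
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual that excludes the contribution of predictor j
            residual = yc - Xs @ beta + Xs[:, j] * beta[j]
            rho = Xs[:, j] @ residual / n
            beta[j] = soft_threshold(rho, lam)  # column norm squared / n equals 1
    return beta

def selection_frequency(X, y, lower, upper, lam=0.3, m=20, seed=0):
    """Fraction of the m imputed datasets in which each predictor is kept by LASSO."""
    rng = np.random.default_rng(seed)
    kept = np.zeros(X.shape[1])
    for _ in range(m):
        beta = lasso_cd(impute_once(X, lower, upper, rng), y, lam)
        kept += beta != 0
    return kept / m

# Hypothetical synthetic data: 121 observations, 6 predictors, 2 of them relevant
rng = np.random.default_rng(42)
X_full = rng.normal(size=(121, 6))
y = 2.0 * X_full[:, 0] - 1.5 * X_full[:, 1] + rng.normal(scale=0.5, size=121)
X = X_full.copy()
X[rng.random(X.shape) < 0.2] = np.nan  # about 20% of entries missing at random
lower, upper = np.full(6, -3.0), np.full(6, 3.0)  # assumed plausibility limits

freq = selection_frequency(X, y, lower, upper)
print(freq)
```

Predictors with a high selection frequency are candidates for a subsequent refinement step, such as the stepwise regression with interaction terms described above. In practice, the penalty would be tuned rather than fixed, and a proper multiple-imputation model would replace the simple marginal draws used here.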