The code at this page can be downloaded here
Before starting, please ensure you have installed the TriplotGUI package following the Setup instructions.
Data description
The data used in this tutorial section can be downloaded here in rda format.
This example utilizes HealthyNordicDiet_2, which is a modified version of the HealthyNordicDiet data provided by the Triplot package (Schillemans et al. 2019). The HealthyNordicDiet is a synthetic dataset that simulates data from a case-control study within the Swedish Västerbotten Intervention Programme cohort, exploring plasma metabolites reflecting healthy Nordic dietary patterns and their association with type 2 diabetes development (Shi et al. 2018). The simulated HealthyNordicDiet_2 data consists of three data frames:
- Clinical measurements (ClinData): Contains 11 variables:
"ID","T2D"(Cases and controls for type 2 diabetes),"PairedInfo"(Pairing/matching information for case and controls, a unique number is assigned to each case-control pairs),"Energy","Gender","Age","BMI","Smoking","PhysicalActivityIndex","Education","FastingGlucose".
- Plasma metabolites predictive of BMI (MetaboliteData): Comprises 20 variables
- Dietary intake as measured by food frequency questionnaires (FoodData): Includes 11 variables:
"HFI"(Healthy Food Index),"BSDS"(the Baltic Sea Diet Score),"Wholegrains","Sausage","Pizza","Refined.bread","Fruits","Liquor","Wine","Hamburger","Poultry","Fish","Margarine","Cabbage","Carrot","low_fat_Dairy","Vegetables"
The data frames are row-wise matched by observations and consist of 1000 synthetic observations.
Reference
Shi, L., Brunius, C., Johansson, I., Bergdahl, I. A., Lindahl, B., Hanhineva, K., & Landberg, R. (2018). Plasma metabolites associated with healthy Nordic dietary indexes and risk of type 2 diabetes—a nested case-control study in a Swedish population. The American Journal of Clinical Nutrition, 108(3), 564-575. 10.1093/ajcn/nqy145
Schillemans T, Shi L, Liu X, Åkesson A, Landberg R, Brunius C. Visualization and Interpretation of Multivariate Associations with Disease Risk Markers and Disease Risk-The Triplot. Metabolites. 2019 Jul 6;9(7):133. doi: 10.3390/metabo9070133
Research question
We want to assess the relationship between diet, metabolic profiles and Type 2 diabetes and BMI. We assume that metabolites serves as mediators, mediating the effect of dietary exposures on Type 2 diabetes and BMI.
Data exploration
HealthyNordicDiet_2 is loaded in the R environment upon running library(TriplotGUI). Let’s begin with some data exploration to understand the structure and contents of the dataset:
Check HealthyNordicDiet_2
Check the HealthyNordicDiet_2 list:
Check datasets
Check the names of variables in each data:
Show the code
colnames(HealthyNordicDiet_2$ClinData) [1] "ID" "T2D" "PairedInfo"
[4] "Energy" "Gender" "Age"
[7] "BMI" "Smoking" "PhysicalActivityIndex"
[10] "Education" "FastingGlucose"
Show the code
colnames(HealthyNordicDiet_2$MetaboliteData) [1] "Pipecolic.acid.betaine" "lysoPC.18.0."
[3] "HP143.1179.1.55" "lysoPE.22.6."
[5] "HN151.0065.4.97" "RP225.1482.9.15"
[7] "X3.4.5.Trimethoxycinnamic.acid" "DHA"
[9] "RP383.1671.10.53" "gama.tocopherol"
[11] "RP431.3516.11.96" "RP490.3516.8.99"
[13] "lysoPC.22.6." "Steroid.glucuronide"
[15] "RP684.5545.13.41" "RP197.0926.3.17"
[17] "RP303.6468.9.98" "RP816.5684.12.52"
[19] "RP827.5356.11.88" "RP826.5442.12.17"
[21] "RN201.1498.8.23" "RN203.0022.3.00"
[23] "EPA" "PC.18.2.15.0."
[25] "RN814.5610.12.14" "RN153.0190.3.5"
[27] "RN427.1641.10.59" "RN457.1742.10.50"
[29] "RN830.5845.12.46" "RN446.1541.10.59"
[31] "RN472.1596.10.53"
Show the code
colnames(HealthyNordicDiet_2$FoodData) [1] "HFI" "BSDS" "Wholegrains" "Sausage"
[5] "Pizza" "Refined.bread" "Fruits" "Liquor"
[9] "Wine" "Hamburger" "Poultry" "Fish"
[13] "Margarine" "Cabbage" "Carrot" "low_fat_Dairy"
[17] "Vegetables"
Check variables’ class
We then transform the data to dataframe format and use TriplotGUI’s checkdata() function to examine the classes of variables.
Show the code
ClinData <- as.data.frame(HealthyNordicDiet_2$ClinData)
MetaboliteData <- as.data.frame(HealthyNordicDiet_2$MetaboliteData)
FoodData <- as.data.frame(HealthyNordicDiet_2$FoodData)
ClinData_check <- checkdata(ClinData)
MetaboliteData_check <- checkdata(MetaboliteData)
FoodData_check <- checkdata(FoodData)Show the code
ClinData_check$class_summary_statistics$check_class_vector
ID T2D PairedInfo
"numeric" "factor" "factor"
Energy Gender Age
"numeric" "factor" "numeric"
BMI Smoking PhysicalActivityIndex
"numeric" "factor" "factor"
Education FastingGlucose
"factor" "numeric"
$check_class_table
check_class_vector
factor numeric
6 5
Show the code
MetaboliteData_check$class_summary_statistics$check_class_vector
Pipecolic.acid.betaine lysoPC.18.0.
"numeric" "numeric"
HP143.1179.1.55 lysoPE.22.6.
"numeric" "numeric"
HN151.0065.4.97 RP225.1482.9.15
"numeric" "numeric"
X3.4.5.Trimethoxycinnamic.acid DHA
"numeric" "numeric"
RP383.1671.10.53 gama.tocopherol
"numeric" "numeric"
RP431.3516.11.96 RP490.3516.8.99
"numeric" "numeric"
lysoPC.22.6. Steroid.glucuronide
"numeric" "numeric"
RP684.5545.13.41 RP197.0926.3.17
"numeric" "numeric"
RP303.6468.9.98 RP816.5684.12.52
"numeric" "numeric"
RP827.5356.11.88 RP826.5442.12.17
"numeric" "numeric"
RN201.1498.8.23 RN203.0022.3.00
"numeric" "numeric"
EPA PC.18.2.15.0.
"numeric" "numeric"
RN814.5610.12.14 RN153.0190.3.5
"numeric" "numeric"
RN427.1641.10.59 RN457.1742.10.50
"numeric" "numeric"
RN830.5845.12.46 RN446.1541.10.59
"numeric" "numeric"
RN472.1596.10.53
"numeric"
$check_class_table
check_class_vector
numeric
31
Show the code
FoodData_check$class_summary_statistics$check_class_vector
HFI BSDS Wholegrains Sausage Pizza
"factor" "numeric" "numeric" "numeric" "numeric"
Refined.bread Fruits Liquor Wine Hamburger
"numeric" "numeric" "numeric" "numeric" "numeric"
Poultry Fish Margarine Cabbage Carrot
"numeric" "numeric" "numeric" "numeric" "numeric"
low_fat_Dairy Vegetables
"numeric" "numeric"
$check_class_table
check_class_vector
factor numeric
1 16
Note that in the given data, "HFI"(Healthy Food Index) is a factor variable and "BSDS"(the Baltic Sea Diet Score) is a numeric variable. You could use table(FoodData$HFI) and table(FoodData$BSDS) to check the difference.
Check sanities for variables
To check for missing (NA) or abnormal values (e.g., NaN, negative values, infinite values, blank values) in each variable, you can use the checkdata() function. This function generates a summary table for each data frame, showing the number of observations containing NA, NaN, zero, negative, infinite (Inf), and blank values, along with their percentages.
Show the code
if (!require(kableExtra)) {
install.packages("kableExtra") # Install the package if it's not available.
}
library(kableExtra) # Load it after installation.Show the code
kable(ClinData_check$everycolumn)| ID | T2D | PairedInfo | Energy | Gender | Age | BMI | Smoking | PhysicalActivityIndex | Education | FastingGlucose | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| NA_number | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NA_percentage | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NaN_number | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NaN_percentage | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| negative_number | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| negative_percentage | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| zero_number | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| zero_percentage | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| inf_number | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| inf_percentage | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| blank_number | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| blank_percentage | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Show the code
kable(MetaboliteData_check$everycolumn)| Pipecolic.acid.betaine | lysoPC.18.0. | HP143.1179.1.55 | lysoPE.22.6. | HN151.0065.4.97 | RP225.1482.9.15 | X3.4.5.Trimethoxycinnamic.acid | DHA | RP383.1671.10.53 | gama.tocopherol | RP431.3516.11.96 | RP490.3516.8.99 | lysoPC.22.6. | Steroid.glucuronide | RP684.5545.13.41 | RP197.0926.3.17 | RP303.6468.9.98 | RP816.5684.12.52 | RP827.5356.11.88 | RP826.5442.12.17 | RN201.1498.8.23 | RN203.0022.3.00 | EPA | PC.18.2.15.0. | RN814.5610.12.14 | RN153.0190.3.5 | RN427.1641.10.59 | RN457.1742.10.50 | RN830.5845.12.46 | RN446.1541.10.59 | RN472.1596.10.53 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NA_number | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NA_percentage | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NaN_number | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NaN_percentage | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| negative_number | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| negative_percentage | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| zero_number | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| zero_percentage | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| inf_number | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| inf_percentage | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| blank_number | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| blank_percentage | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Show the code
kable(FoodData_check$everycolumn)| HFI | BSDS | Wholegrains | Sausage | Pizza | Refined.bread | Fruits | Liquor | Wine | Hamburger | Poultry | Fish | Margarine | Cabbage | Carrot | low_fat_Dairy | Vegetables | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NA_number | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | 0.00 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| NA_percentage | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | 0.00 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| NaN_number | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | 0.00 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| NaN_percentage | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | 0.00 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| negative_number | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | 0.00 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| negative_percentage | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | 0.00 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| zero_number | 0 | 0 | 63.000 | 139.000 | 195.000 | 155.000 | 180.00 | 367.000 | 288.000 | 231.000 | 117.000 | 79.000 | 41.000 | 253.000 | 225.000 | 237.000 | 192.000 |
| zero_percentage | 0 | 0 | 0.063 | 0.139 | 0.195 | 0.155 | 0.18 | 0.367 | 0.288 | 0.231 | 0.117 | 0.079 | 0.041 | 0.253 | 0.225 | 0.237 | 0.192 |
| inf_number | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | 0.00 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| inf_percentage | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | 0.00 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| blank_number | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | 0.00 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| blank_percentage | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | 0.00 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
You shall only continue to use the data when the class of variables are correct and the missing or abnormal values in the variable are properly handled.
Build data for analysis
We treat food variables as our exposures and BMI and type 2 diabetes as outcomes. Note that BMI is a continuous variable. We can additionally construct BMI as a categorical outcome using the following criteria:
- BMI <= 18.5: Underweight
- 18.5 <BMI <= 25: Normal
- 25 < BMI <= 30: Overweight
- 30 < BMI: Obese
We want to explore the exposure-outcome relationships through the metabolomics data, using metabolites as assumed mediators. Food data serves as exposures, while BMI (both continuous and categorical) and Type 2 Diabetes are treated as outcomes. Gender, age, education, physical activity, and smoking are used as potential confounders for the exposure-mediator and mediator-outcome associations. Additionally, auxiliary information regarding case-control matches, fasting glucose levels, education, and physical activity is provided to enhance the analysis.
Show the code
exposure2 <- FoodData
Omics2 <- MetaboliteData
outcome2 <- cbind.data.frame(ClinData[,c("T2D","BMI")], BMI_cat)
covariates2 <- ClinData[, c("Gender" ,"Age","Education","PhysicalActivityIndex","Smoking" )]
auxiliary2 <- ClinData[, c("PairedInfo","FastingGlucose","Education","PhysicalActivityIndex", "Energy")]Code example of using TriplotGUI
We will use more complex settings in this example. Please go to Tutorial(simple).if you want to try simpler settings.
Step 1: Data reduction of Omics variables
Using TriplotGUI package, first we perform dimension reduction, using weighted correlation network analysis (WGCNA) on metabolomics data.
Show the code
# Using the complex settings
reduced_Omics2 <- OmicsReduction(dataframe = Omics2,
plotType = c("scree","score","loading","scoreloading"),
pcType = "principal",
# Users can choose freely to use principal() instead of prcomp() to perform PCA.The result should be very similar, but principal() allows more functionalities
firstPC = 1,
secondPC= 2,
option = "WGCNA",
# In this example we use WGCNA to perform data reduction, therefore specifying `option="WGCNA"`.
scale = TRUE,
center = TRUE,
eigenLoading = "loading",
# If you want to show loading or scale it to eigevalues.
rotate = "none",
# This argument is inherited from the principal() function
sizeVariable = auxiliary2$FastingGlucose,
sizeVariableName = "FastigGlucose",
colorVariable = auxiliary2$Education,
colorVariableName = "Education",
shapeVariable = auxiliary2$PhysicalActivityIndex,
shapeVariableName = "PhysicalActivityIndex",
# The size,shape,color variables are used to differentiate points on the score plot and biplot
loadingsName = TRUE,
# Show the names of loadings
loadingsCutpercent = 0.2,
# The loadings bigger than 20% of all the loadings will show up
minModuleSize = 3
# The mimimum number of variable in each cluster of WGCNA
)In WGCNA, Omics variables are grouped into clusters. PCA is performed on each cluster, and the first principal component of each cluster is used as the module for WGCNA.
You could try out the arguments of the
OmicsReduction()function to visualize the results and view the whole function here-
The
minModuleSizeargument inOmicsReduction()is crucial as it defines the minimum number of variables in each cluster. Setting this too high may result in all variables being grouped into a single cluster. For instance, with 20 variables andminModuleSize=15, you might end up with one cluster containing all 20 variables. In this case, only the first principal component of this single cluster would be used and displayed on both axes of the resulting triplots.To avoid this, adjust
minModuleSizebased on your total variable count and desired cluster granularity. Experiment with different values to achieve a balance between meaningful clusters and interpretable results.
You can see scree plot, score plot, loadings plot and biplot at this stage.
The scree plot in WGCNA visualizes the variance explained by each cluster. Clusters are color-coded, with column heights representing the proportion of total variance explained by each cluster. The darker color in each column shows the variance explained by the first principal component (module) of that cluster.
The grey cluster, showing only a darker color, indicates it contains just one variable, which serves as its own first principal component/module. This can occur even with minModuleSize=2, as some variables may be left unclustered.
Clusters are arranged from left to right based on the number of variables they contain, with the leftmost cluster having the most variables and the rightmost having the least. This arrangement helps in quickly identifying the relative sizes and importance of different clusters in the dataset.
In the score plot, observations are colored by education level, with different shapes representing varying physical activity indices and different sizes indicating fasting glucose levels. This visualization allows for an integrated view of how these characteristics relate to the modules derived from the WGCNA analysis, highlighting patterns among individuals based on their traits.
The loading plot displays loadings aligned along the axes, reflecting the use of first principal components/modules from each cluster in WGCNA. This alignment occurs because each axis represents a distinct principal component analysis, making it impossible for loadings to point in the direction of another axis.
Sometimes the following text will show up in your console and the figure cannot be directly run:
libpng warning: Image width is zero in IHDR
libpng error: Invalid IHDR data
Warning message:
In dev.off(which) : agg could not write to the given file\
Run gc() in your console to clear the memory. Then rerun the code again to view the figure.
Although informative, the biplot combining both score and loading can appear cluttered due to the high density of information it contains.
A TPObject is constructed based on the data reduction results from the WGCNA analysis. This object serves as a central repository for storing and trandfering information generated through the various steps during the analysis.
Show the code
scores <- reduced_Omics2$object$scores
loadings <- reduced_Omics2$object$loadings
variance <- reduced_Omics2$object$variance
TPObject1 <- makeTPO(scores = scores,
loadings = loadings,
variance = variance)
Making TriPlotObject (TPO)
--------------------------
Score matrix has 1000 observations and 5 components.
TPO has 0 attached correlations.
TPO has 0 attached risks.
Step 2-3: Exposure-Omics and Omics-Outcome associations
The associations between modules and food items and the associations between modules and risk markers can be then calculated, with confounders adjusted. Note that it is possible for users to skip code regarding exposure association or risk association, if their interests focus on only outcome associations or exposure associations with Omics variables.
The modules information saved in the TPObject can be directly utilized in the exposureAssociation() function. This function calculates exposure associations coefficients and p-values between the modules and exposures (food variables and diet scores), while adjusting for specified confounders.
Show the code
Correlations_object <- exposureAssociation(TPObject = TPObject1,
exposure = exposure2,
use = 'pairwise',
# use inherit from cor() function
method = 'pearson',
# method inherit from cor() function
allowCategorical = FALSE,
# one-hot-encoding on categorical exposures
partial = TRUE,
# perform confonder adjustment
confounder = covariates2)-
The
exposureAssociation()function offers options for handling categorical variables and adjusting for confounders. When settingallowcategorical=FALSE, categorical exposures with more than two classes undergo one-hot-encoding, transforming them into binary variables. This can be verified by examiningCorrelations_object$cor_estimate, which shows separate correlation coefficients for each binary representation of HFI levels. However,allowcategorical=FALSEis limited for meaningful use. The binary variables may be only meaningful when we want to compare one level to all the other levels together. We use this option for categorical variable"HFI"for demonstration purpose. Ideally, one should useallowcategorical=TRUEfor HFI.Conversely,
allowcategorical=TRUEleaves categorical variables like"HFI"untransformed. In this case, linear models regressing modules on"HFI"are used for correlation analysis. The correlation coefficients are derived from the square root of the proportion of HFI’s sum of squares to the total sum of squares. Setting
partial=TRUEensures that specified confounders are adjusted for in the correlation analysis.
The correlation results obtained from the exposureAssociation() function are then incorporated into the TPObject.
Show the code
TPObject2 <- addExposure(TPObject = TPObject1,
corrEstimate = Correlations_object$cor_estimate,
corrPvalue = Correlations_object$cor_pvalue)
Adding correlation to TPO
-------------------------
TPO has 23 attached correlations.
TPO has 0 attached risks.
The modules information saved in the TPObject can be directly utilized in the outcomeAssociation() function. This function calculates risk estimates and p-values between the modules and outcomes (BMI, type 2 diabetes and categorical BMI), while adjusting for specified confounders.
Show the code
Risks_object <- outcomeAssociation(TPObject = TPObject2,
outcome = outcome2,
confounder = covariates2,
partial = TRUE,
# performs confounder adjustment
multinomial = TRUE,
# multinomial regression is used
pair = auxiliary2$PairedInfo,
# use the pairing information of case and control
CI = 0.95
# confidence interval
)The outcomeAssociation() function offers various options for handling outcomes and adjusting for confounders, with behavior differing based on whether pair information is provided.
- Without
pairinformation:-
multinomial=FALSEtransforms categorical outcomes with >2 classes into binary variables via one-hot-encoding. Logistic regression is used for binary outcomes, and linear regression for continuous outcomes. -
multinomial=TRUEapplies multinomial regression to categorical outcomes with >2 classes, yielding n-1 estimates. Binary and continuous outcomes are handled as before.
-
- With pair information:
-
multinomial=FALSEtransforms categorical outcomes as above, then uses conditional logistic regression for binary outcomes and linear mixed models for continuous outcomes, incorporating pairing information. -
multinomial=TRUEignores pairing information, applying multinomial regression to categorical outcomes and standard logistic/linear regression to binary/continuous outcomes.
-
- Setting
partial=TRUEensures adjustment for specified confounders in all analyses.
The results for risk estimates obtained from the outcomeAssociation() function are then incorporated into the TPObject.
Show the code
TPObject3 <- addOutcome(TPObject = TPObject2,
Risk = Risks_object)
Adding risk to TPO
------------------
TPO has 23 attached correlations.
TPO has 5 attached risks.
Step 4: Meet-in-the-middle Triplot Visualization
The TPObject created from all steps can be used to generate a Triplot using the TriplotGUI() function. The function serves as a wrapper for PCA_TriplotGUI() and WGCNA_TriplotGUI(). You can check the functions respectively here: TriplotGUI(), PCA_TriplotGUI(), WGCNA_TriplotGUI().
Show the code
TPO_plots3 <- TriplotGUI(TPObject3,
firstPC = 1, ## The first PC to map
secondPC = 2, ## The first PC to map
plotLoads = TRUE, ##Whether to plot loadings (TRUE; default) or suppress them (FALSE)
plotScores = FALSE, ##Whether to plot scores (TRUE) or suppress them (FALSE; default)
plotCorr = TRUE, ##Whether to plot correlations (TRUE; default) or suppress them (FALSE)
plotRisk = TRUE, ##Whether to plot risk estimates (TRUE; default) or suppress them (FALSE)
###############################################
##For loadings
loadLabels = TRUE, ###Whether to plot variable loading labels (TRUE; default) or not (FALSE)
loadArrowLength = 0.02, ###Length of arrow tip , set it as 0 if you want to remove it
loadCut = 0, ###lower limit Loadings below the cut are plotted in light grey and without label
loadLim = NULL, ##higher limit,Plot range for loadings
###############################################
##For correlations
colCorr = "blue", ##Color vector for correlations
pchCorr = 16, ##Plotting character for correlations
whichCorr = NULL, ##Which correlations to plot (vector of numbers)
corLim = NULL, ##Plot range for correlations
corrLabels = TRUE,
###############################################
##For risks
colRisk = "red", ##Color vector for risk estimates
pchRisk = 15, ##Plotting character for risk estimates
whichRisk = NULL, ##Which risk estimates to plot (vector of numbers)
riskLim = NULL, ##Plot range for risks
riskWhiskerPercentage = 0.1, ## whisker length is how many percentage of confidence interval
riskLabels = TRUE, size = 3, # the size of points ont he plot
riskOR = TRUE
# plot the risk estimates as odds ratio
)We then visualize the triplot.
In the triplot visualization, black arrows represent loadings of Omics variables, forming patterns in different modules (axes). Blue circle points show exposure correlations with the modules, while red square points indicate outcome associations with the modules.
Note that we use first and second module for our example here. In reality, you may use the checkTPO() function in TriplotGUI to view which modules may be more of your interest (that are associated with specific exposures and outcomes).
Based on the triplot visualization of the TriplotGUI interface, we can interpret the following key points:
The first (top right) and third (bottom left) dimension generally represents a spectrum of dietary healthiness. Healthier foods (e.g.,
"Fruits","Vegetables") and higher healthy food indices ("HFI","BSDS") are positioned on one side in the first dimension, while less healthy foods (e.g.,"Hamburger","Sausage","Margarine") are on the opposite side in the third dimension.The second (top left) and fourth (bottom right) dimensions are associated with health outcomes, including the numeric outcome
"BMI", binary outcome"T2D", and one-hot-encoded categorical"BMI_cat"outcome.-
Explanation for some variables in the plot:
-
"HFI"stands for healthy food index. A higher"HFI"suggests healthier diet."HFI"ranges from 0 to 6. You can see that there are HFI_0, HFI_1…HFI_6 in the figure and that is because"HFI"is used as a factor exposure variable in exposure’s correlation and is one-hot-encoded to the same number of binary variables as its number of levels. For example, HFI_6 is a binary variable where individuals with"HFI"as 6 (most healthy diet people) will be labeled as 1 and the rest of individuals will be labelled as 0. HFI_0 is a binary variable where individuals with"HFI"as 0 (most unhealthy diet people) will be labeled as 1 and the rest of individuals will be labelled as 0. HFI_1…HFI_5 has limited use since it only separates the group of people with certain"HFI"value from to the rest, which is a mixture of people with healthier and less healthier diet. -
"BSDS"stands for Baltic Sea Diet Score ranging. A higher"BSDS"suggests healthier diet."BSDS"is used as a numeric variable ranging from 2 to 25 in the correlation. -
BMI_cat_obese, BMI_cat_overweight, BMI_cat_underweight are the odds ratio generated from the multinomial regression, since
"BMI_cat"is a categorical outcome and multinomial regression is performed in outcome’s association in this example. Normal weight is uses as a reference and odds ratio of obese, overweight and underweight is produced. (Note that the first level of the factor variable is set as the default reference, in this case it is normal).
-
Adjusting for
"Age","Gender","Smoking","Education"and"FastingGlucose", the first module (on x-axis) correlates positively with generally healthy food (e.g."fruits","vegatables") and food index (e.g."BSDS") and negatively with unhealthy food (e.g."Hamburger","pizza"). It also reversely associated with"T2D"and"BMI". From the result from"BMI_cat", this components also reversely associated with being obese.Adjusting for
"Age","Gender","Smoking","Education"and"FastingGlucose", the second module (on y-axis) associated positively with generally healthy food (e.g."fruits","vegatables") and food index (e.g."BSDS") and negatively with unhealthy food (e.g."Hamburger","pizza"). However, the module associated also positively with"T2D","BMI", obese, overweight and negatively associated with underweight.
Based on what we summarized above, the two modules may be associated with a overall healthy eating pattern. And in this healthy eating pattern, the first module is associated more with the beneficial effect on health and the second module represents the adverse health effect.
Step 5: Mediation analysis and visualization
In mediation analysis in TriplotGUI, we focus on a specific set of variables for mediation analysis: we select "BSDS" (Baltic Sea Diet Score) and "Hamburger" as our exposures, "BMI" and "T2D" as outcome to enter our next step. We choose first and second modules as our mediators.
This time, we perform the mediation analysis with the getMediationCounterfactual() function, using the counterfactual/potential outcome method implemented through the mediation package in R on our exposures (i.e. "BSDS", "Hamburger"), mediators (i.e. the first and second modules) and outcome (i.e. "T2D", "BMI") of interest, adjusting for "Age", "Gender", "Smoking", "Education" and "FastingGlucose" for both exposure-mediator and mediator-outcome relationship.
Show the code
mediation_object3 <-
getMediationCounterfactual (mediator = TPObject3$scores[,c(1:2),drop = FALSE],
# Specfiying at least 2 components so that there can be a 2-dimensional plot
exposure=exposure2[,c("Hamburger","BSDS"),drop = FALSE],
outcome = outcome2[,c("BMI","T2D"),drop = FALSE],
exposureTreatment = c(25,0),
exposureControl = c(2,19),
confounderME = covariates2,
confounderOE = covariates2)In the counterfactual mediation analysis, contrast values of treatment and controls for each exposure needs to be specified. For each individual, the analysis considers what would happen if they received the treatment versus if they received the control. In brief, the algorithm uses these contrast values to calculate the potential outcomes under each scenario to estimate the direct and indirect effects. For a detailed explanation please refer to this paper of the mediation R package.
If an exposure variable is continuous (numeric variable), it is recommended that the treatment and control contrast values are chosen between the range of the exposure variable. If an exposure variable is categorical (factor variable), the treatment and control contrast values should be chosen from the levels of the exposure variable.
Then we can also Look at the barplot for mediations, which provides a clear visual representation of the direct, indirect, and total effects for each combination of exposure, mediator (principal component), and outcome. This visualization allows for a more intuitive understanding of the mediation analysis results.
Show the code
mediation_plots3 <- mediationBarplot(mediationObject = mediation_object3,
cex = 2#,
# size of the text
#by_row = "one_column"
)The barplot displays direct, indirect, and total effects for each exposure-mediator-outcome combination, offering a convenient tool to assess the direction and magnitude of mediation estimates.
- Color coding enhances interpretation: red indicates significant positive effects (p<0.05), cyan shows significant negative effects (p<0.05), and grey represents insignificant effects.
- Significance levels are denoted by stars: one star for p<0.05, two stars for p<0.01, and three stars for p<0.001.
We will make an example interpretation of the mediation barplot using BSDS as exposure, module1 as mediator (the module on the x axis in the triplot in step 4) and BMI as outcome.
In the shown mediation barplot, we can observe the direction and magnitude of the indirect effect and proportion mediated. We observed that with all effects significant, indirect effect is negative, direct and total effect is positive. The negative indirect effect is consistent with the its visualization in the triplot, i.e, the module1 on the x axis associative with BSDS positively and with BMI negatively.
However, the figure also shows that BSDS is positively associated with BMI through a larger direct effect. This seems counterintuitive but this could be due to a true direct effect or indirect effect mediated by other modules.
Based on the mediation barplot, we’ve also observed significant proportion mediated with for BSDS-module1-BMI mediations, suggesting a robust mediation effect.
\[PM=\frac{IE}{IE+DE}\]
\[APM=\frac{IE}{|IE|+|DE|}\cdot\frac{|IE+DE|}{IE+DE}=\frac{IE}{IE+DE}\cdot\frac{|IE+DE|}{|IE|+|DE|}=\frac{IE}{|IE|+|DE|}\cdot\frac{IE+DE}{|IE+DE|}\]
- Notably, the APM for BSDS (Baltic Sea Diet Score) as the exposure and T2D (Type 2 Diabetes) as the outcome is smaller than their corresponding PM (considering the abosolute value). This difference occurs because the indirect and direct effects for this mediation pathway are in opposite directions. This opposition leads to a smaller total effect and consequently a larger PM (considering the abosolute value). However, the APM is not affected by the directionality of the direct and indirect effects, providing a more stable measure of mediation.
- The APM is always smaller than 1, even when direct and indirect effects are in opposite directions. This is due to the calculation method of APM, which uses the sum of the absolute values of indirect and direct effects as the denominator.
To further explore the indirect, direct, and total effects of different combinations of mediations, you can run more mediation analysis at Step 5 in the TriplotGUI interface .
Add the mediation result from getMediationCounterfactual() into the TPObject.
Show the code
TPObject4 <- addMediation(TPObject = TPObject3,
mediationObject = mediation_object3)
Adding mediation to TPO
-------------------------
TPO mediation has used 2 mediators, 2 exposures, 2 outcomes
Mediators are module1, module2
Exposures are Hamburger, BSDS
Outcomes are BMI, T2D
Comparative Visualization
The checkTPO() function in TriplotGUI allows users to generate heatmaps from the TPObject to visualize correlations, risk estimations, and mediation results.
Show the code
checkTPObject4 <- checkTPO(TPObject4,
mediatorEstimates = c("IE","TE","PM","APM"),
## What type of mediation estimate or proportion mediated to show on the heatmap
mediators = c("module1", "module2"),
## What mediators (principal component) you want to show
exposures = c("Hamburger","BSDS"),
## What exposure variables you want to show
outcomes = c("T2D","BMI")
## What outcome variables you want to show
)Correlation coefficients and p values between modules and exposures:
Risk estimates(beta coefficients) and p values between modules and outcomes:
Mediation estimates and p values for indirect effect(IE) and total effect(TE), proportion mediated(PM) and adjusted proportion mediated(APM): Read Manual for more details:
- Significance is indicated as follows. One star for p<0.05; Two stars for p<0.01; Three stars for p<0.001.
- The first two plots, correlation coefficients with exposures and risk estimates of outcomes will show all the exposure and outcome variables that are not removed from visualization.
- Note that in the plot of
checkTPbject4$med_coefficients, nly the selected exposures, mediators and outcomes that we use to do mediation on will show up in the mediation results. The rows in the heatmaps are mediators and the column represents exposure-outcome pairs. For each exposure-outcome pair, 4 results are shown: IE (indirect effect), TE (total effect), PM(Proportion Mediated) and APM(Adjusted Proportion Mediated). Please read Manual for more information. The “CF” before the names of exposures-outcome pairs means that this mediation is using the Counterfactual method.
Data download
To save all your output, including data, results, and visualization output as an .rda file, you can use the save() function as an rda file.
Show the code
save(exposure2,Omics2,outcome2,covariates2,auxilariy2,
reduced_Omics2,Correlations_object,Risks_object,
mediation_object3,mediation_plots3,
TPObject1,TPObject2,TPObject3,TPObject4,
checkTPObject4,
"Tutorial_complex_output.rda")







