moDiNA
Submodules
modina.context_net_inference module
- modina.context_net_inference.calculate_association_scores(ord_data, nom_data, cont_data, bi_data, test_type='nonparametric', num_workers=1, nan_value=-89)
- Return type:
DataFrame
- modina.context_net_inference.compute_context_scores(context_data, meta_file, test_type='nonparametric', correction='bh', num_workers=1, path=None, nan_value=None)
Compute association scores for a given context.
- Parameters:
context_data (DataFrame) – The raw context data (rows: samples, columns: variables).
meta_file (DataFrame) – Metadata file containing a ‘label’ and ‘type’ column to specify the data type of each variable.
test_type (str) – Type of tests to use for network inference. Defaults to ‘nonparametric’.
correction (str) – Correction method for multiple testing. Defaults to ‘bh’.
num_workers (int) – Number of workers for parallel processing. Defaults to 1.
path (Optional[str]) – Optional path to save the computed scores as a CSV file. Defaults to None.
nan_value (Optional[int]) – Numerical value used for NaN values in the context data. If None, an error will be raised if such values are present. Defaults to None.
- Return type:
DataFrame
- Returns:
A pd.DataFrame containing the computed association scores.
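The correction parameter defaults to ‘bh’, which is assumed here to denote the Benjamini-Hochberg procedure. The following standard-library sketch shows how such an adjustment is commonly computed; the function name benjamini_hochberg and the example p-values are illustrative and not part of moDiNA.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Illustrative helper, not moDiNA's implementation.
    """
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # made-up p-values
adjusted = benjamini_hochberg(raw)
for p, q in zip(raw, adjusted):
    print(f"p={p:.2f} -> adjusted={q:.4f}")
```

Each adjusted value is the smallest false discovery rate at which the corresponding test would still be called significant.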
- modina.context_net_inference.napy_bi_cont(cont_phenotypes, bi_phenotypes, test='nonparametric', num_workers=8, nan_value=-89)
- modina.context_net_inference.napy_bi_nom(nom_phenotypes, bi_phenotypes, num_workers=8, nan_value=-89)
- modina.context_net_inference.napy_bi_ord(ord_phenotypes, bi_phenotypes, num_workers=8, nan_value=-89)
- modina.context_net_inference.napy_cont_cont(cont_phenotypes, test='nonparametric', num_workers=8, nan_value=-89)
- modina.context_net_inference.napy_nom_cont(cont_phenotypes, nom_phenotypes, test='nonparametric', num_workers=8, nan_value=-89)
modina.context_simulation module
- modina.context_simulation.simulate_copula(path=None, name1='context1', name2='context2', n_bi=50, n_cont=50, n_cat=50, n_samples=500, n_shift_cont=0, n_shift_bi=0, n_shift_cat=0, n_corr_cont_cont=0, n_corr_bi_bi=0, n_corr_cat_cat=0, n_corr_bi_cont=0, n_corr_bi_cat=0, n_corr_cont_cat=0, n_both_cont_cont=0, n_both_bi_bi=0, n_both_cat_cat=0, n_both_bi_cont=0, n_both_bi_cat=0, n_both_cont_cat=0, shift=0.5, corr=0.7)
Simulate two contexts with binary, continuous and categorical nodes using a Gaussian copula.
- Parameters:
path – Path to save the simulated contexts, the meta file and the ground truth information. If None, files are not saved.
name1 – Name of the first context.
name2 – Name of the second context.
n_bi – Number of binary nodes to simulate.
n_cont – Number of continuous nodes to simulate.
n_cat – Number of categorical nodes to simulate.
n_samples – Number of samples per context.
n_shift_cont – Number of continuous nodes with an artificially introduced mean shift.
n_shift_bi – Number of binary nodes with an artificially introduced mean shift.
n_shift_cat – Number of categorical nodes with an artificially introduced mean shift.
n_corr_cont_cont – Number of continuous node pairs with an artificially introduced correlation difference.
n_corr_bi_bi – Number of binary node pairs with an artificially introduced correlation difference.
n_corr_cat_cat – Number of categorical node pairs with an artificially introduced correlation difference.
n_corr_bi_cat – Number of binary-categorical node pairs with an artificially introduced correlation difference.
n_corr_cont_cat – Number of continuous-categorical node pairs with an artificially introduced correlation difference.
n_corr_bi_cont – Number of binary-continuous node pairs with an artificially introduced correlation difference.
n_both_cont_cont – Number of continuous node pairs with both an artificially introduced mean shift and correlation difference.
n_both_bi_bi – Number of binary node pairs with both an artificially introduced mean shift and correlation difference.
n_both_cat_cat – Number of categorical node pairs with both an artificially introduced mean shift and correlation difference.
n_both_bi_cat – Number of binary-categorical node pairs with both an artificially introduced mean shift and correlation difference.
n_both_cont_cat – Number of continuous-categorical node pairs with both an artificially introduced mean shift and correlation difference.
n_both_bi_cont – Number of binary-continuous node pairs with both an artificially introduced mean shift and correlation difference.
shift – Magnitude of the mean shift.
corr – Magnitude of the correlation difference (measured as correlation coefficient between 0 and 1).
- Returns:
A tuple (context1, context2, meta, ground_truth) containing the two simulated contexts, a meta file and the ground truth nodes.
context1 – pd.DataFrame of the first simulated context.
context2 – pd.DataFrame of the second simulated context.
meta – pd.DataFrame containing the data type for each simulated variable.
ground_truth – A tuple containing three lists of ground truth nodes: (shift_nodes, corr_nodes, shift_corr_nodes).
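To make the idea behind the shift and corr parameters concrete, here is a minimal standard-library sketch of Gaussian copula sampling for one continuous and one binary variable. It is not the simulate_copula implementation: the function names, the uniform marginal, and the 0.5 threshold used for binarisation are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist


def simulate_pair(n_samples, corr, shift=0.0, seed=0):
    """Draw one continuous and one binary variable linked by a Gaussian copula.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    cont, binary = [], []
    for _ in range(n_samples):
        # Correlated standard normal latent variables.
        z1 = rng.gauss(0.0, 1.0)
        z2 = corr * z1 + math.sqrt(1.0 - corr ** 2) * rng.gauss(0.0, 1.0)
        # Map the latent variables to uniforms (the copula step).
        u1, u2 = std_normal.cdf(z1), std_normal.cdf(z2)
        # Continuous marginal: uniform on [0, 1] plus an optional mean shift.
        cont.append(u1 + shift)
        # Binary marginal: threshold the second uniform at 0.5.
        binary.append(1 if u2 > 0.5 else 0)
    return cont, binary


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


# Context 1 has no latent correlation; context 2 has a latent correlation of 0.7.
cont1, bin1 = simulate_pair(500, corr=0.0, seed=1)
cont2, bin2 = simulate_pair(500, corr=0.7, seed=1)
print(round(pearson(cont1, bin1), 2), round(pearson(cont2, bin2), 2))
```

The pair is only weakly associated in the first context and clearly associated in the second, which is the kind of ground truth correlation difference a differential network analysis is meant to recover.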
modina.diff_net_construction module
- modina.diff_net_construction.compute_diff_edges(scores1, scores2, edge_metric, max_path_length=2, path=None)
Compute differential edge scores based on the specified edge metric.
- Parameters:
scores1 (DataFrame) – Statistical association scores of Context 1, rescaled and potentially filtered.
scores2 (DataFrame) – Statistical association scores of Context 2, rescaled and potentially filtered.
edge_metric (str) – Edge metric to compute the differential edge scores.
max_path_length (int) – Maximum length of paths to consider in the computation of integrated interaction scores. Defaults to 2.
path (Optional[str]) – Optional path to save the differential edge scores as a CSV file. Defaults to None.
- Return type:
DataFrame
- Returns:
A DataFrame containing the computed differential edge scores.
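As a rough illustration of what a differential edge score can look like, the sketch below takes the absolute difference between two contexts' association scores for each node pair. This particular metric, the dictionary-based data layout, and the name abs_score_difference are assumptions made for the example; the edge metrics moDiNA actually supports are not listed in this reference.

```python
def abs_score_difference(scores1, scores2):
    """Per-edge absolute difference between two score dictionaries.

    Hypothetical edge metric for illustration; edges missing from one
    context are treated as having a score of 0.0.
    """
    edges = set(scores1) | set(scores2)
    return {
        edge: abs(scores1.get(edge, 0.0) - scores2.get(edge, 0.0))
        for edge in sorted(edges)
    }


# Made-up rescaled association scores, keyed by (node, node) pairs.
scores_context1 = {("age", "bmi"): 0.75, ("age", "smoker"): 0.25}
scores_context2 = {("age", "bmi"): 0.25, ("bmi", "smoker"): 0.5}

for edge, score in abs_score_difference(scores_context1, scores_context2).items():
    print(edge, score)
```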
- modina.diff_net_construction.compute_diff_network(scores1, scores2, context1, context2, edge_metric=None, node_metric=None, max_path_length=2, correction='bh', path=None, format='csv', meta_file=None, test_type='nonparametric', nan_value=None)
Compute a differential network defined by a node metric and an edge metric.
- Parameters:
scores1 (DataFrame) – Statistical association scores of Context 1, rescaled and potentially filtered.
scores2 (DataFrame) – Statistical association scores of Context 2, rescaled and potentially filtered.
context1 (DataFrame) – Observed data of Context 1, potentially filtered.
context2 (DataFrame) – Observed data of Context 2, potentially filtered.
edge_metric (Optional[str]) – Edge metric used to construct the differential network.
node_metric (Optional[str]) – Node metric used to construct the differential network.
max_path_length (int) – Maximum length of paths to consider in the computation of integrated interaction scores. Defaults to 2.
correction (str) – Correction method for multiple testing. Defaults to ‘bh’.
path (Optional[str]) – Optional path to save the differential scores as CSV files. Defaults to None.
format (str) – File format to save the differential network. Options are ‘csv’ and ‘graphml’. Defaults to ‘csv’.
meta_file (Optional[DataFrame]) – Meta file containing the node types. Only needed if node_metric is ‘STC’. Defaults to None.
test_type (str) – Test type to use for continuous nodes in the ‘STC’ node metric. Defaults to ‘nonparametric’.
nan_value (Optional[int]) – Numerical value used for NaN values in the context data. If None, an error will be raised if such values are present. Defaults to None.
- Return type:
Tuple[Optional[DataFrame], Optional[DataFrame]]
- Returns:
A tuple (edges_diff, nodes_diff) containing the computed differential edges and nodes.
- modina.diff_net_construction.compute_diff_nodes(scores1, scores2, context1, context2, node_metric, correction='bh', meta_file=None, test_type='nonparametric', nan_value=None, path=None)
Compute differential node scores based on the specified node metric.
- Parameters:
scores1 (DataFrame) – Statistical association scores of Context 1, rescaled and potentially filtered.
scores2 (DataFrame) – Statistical association scores of Context 2, rescaled and potentially filtered.
context1 (DataFrame) – Observed data of Context 1, potentially filtered.
context2 (DataFrame) – Observed data of Context 2, potentially filtered.
node_metric (str) – Node metric to compute the differential node scores.
correction (str) – Correction method for multiple testing. Only needed if node_metric is ‘STC’. Defaults to ‘bh’.
meta_file (Optional[DataFrame]) – Meta file containing the node types. Only needed if node_metric is ‘STC’. Defaults to None.
test_type (str) – Test type to compare continuous variables across contexts for the ‘STC’ node metric. Defaults to ‘nonparametric’.
nan_value (Optional[int]) – Numerical value used for NaN values in the context data. If None, an error will be raised if such values are present. Defaults to None.
path (Optional[str]) – Optional path to save the differential node scores as a CSV file. Defaults to None.
- Return type:
DataFrame
- Returns:
A DataFrame containing the computed differential node scores.
- modina.diff_net_construction.degree_centrality(nodes_diff, scores1, scores2, metric='pre-P', weighted=False)
modina.edge_filtering module
- modina.edge_filtering.filter(scores1, scores2, context1, context2, filter_method=None, filter_param=0.0, filter_metric=None, filter_rule=None, path=None)
Filter association scores and context data based on the specified filtering configurations.
- Parameters:
scores1 (DataFrame) – Statistical association scores of Context 1.
scores2 (DataFrame) – Statistical association scores of Context 2.
context1 (DataFrame) – The first context for the differential network analysis.
context2 (DataFrame) – The second context for the differential network analysis.
filter_method (Optional[str]) – Method used for filtering. Defaults to None.
filter_param (float) – Parameter for the specified filtering method. Defaults to 0.0.
filter_metric (Optional[str]) – Edge metric used for filtering. Defaults to None.
filter_rule (Optional[str]) – Rule to integrate the networks during filtering. Defaults to None.
path (Optional[str]) – Optional path to save the filtered scores and context data as CSV files. Defaults to None.
- Return type:
Tuple[DataFrame, DataFrame, DataFrame, DataFrame]
- Returns:
A tuple containing the filtered scores and context data.
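One way to picture a filtering rule that integrates both networks is a threshold applied per edge, combined across contexts by union or intersection. The sketch below assumes exactly that; the rule names ‘union’ and ‘intersection’, the threshold semantics, and the function filter_edges are hypothetical and may not match the options moDiNA accepts for filter_method and filter_rule.

```python
def filter_edges(scores1, scores2, threshold, rule="union"):
    """Keep edges whose score reaches the threshold in one or both contexts.

    Hypothetical helper: 'union' keeps an edge that passes in either
    context, 'intersection' only if it passes in both.
    """
    if rule not in ("union", "intersection"):
        raise ValueError(f"unknown rule: {rule!r}")
    keep = set()
    for edge in set(scores1) | set(scores2):
        pass1 = scores1.get(edge, 0.0) >= threshold
        pass2 = scores2.get(edge, 0.0) >= threshold
        if (pass1 or pass2) if rule == "union" else (pass1 and pass2):
            keep.add(edge)
    # Both contexts are restricted to the same edge set.
    filtered1 = {e: s for e, s in scores1.items() if e in keep}
    filtered2 = {e: s for e, s in scores2.items() if e in keep}
    return filtered1, filtered2


# Made-up association scores keyed by (node, node) pairs.
s1 = {("a", "b"): 0.9, ("a", "c"): 0.2, ("b", "c"): 0.6, ("c", "d"): 0.1}
s2 = {("a", "b"): 0.8, ("a", "c"): 0.7, ("b", "c"): 0.1, ("c", "d"): 0.3}

union1, union2 = filter_edges(s1, s2, threshold=0.5, rule="union")
inter1, inter2 = filter_edges(s1, s2, threshold=0.5, rule="intersection")
print(sorted(union1), sorted(inter1))
```

Restricting both score sets to a shared edge list keeps the two networks comparable in the later differential steps.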
modina.pipeline module
- modina.pipeline.diffnet_analysis(context1, context2, meta_file, edge_metric=None, node_metric=None, ranking_alg='PageRank+', filter_method=None, filter_param=0.0, filter_metric=None, filter_rule=None, max_path_length=2, test_type='nonparametric', nan_value=None, correction='bh', num_workers=1, project_path=None, name1='context1', name2='context2')
Wrapper function to perform an end-to-end differential network analysis following the moDiNA pipeline.
- Parameters:
context1 (DataFrame) – Observed data of Context 1 (rows: samples, columns: variables).
context2 (DataFrame) – Observed data of Context 2 (rows: samples, columns: variables).
meta_file (DataFrame) – Metadata file containing a ‘label’ and ‘type’ column to specify the data type of each variable.
test_type (str) – Type of statistical tests to use for association score calculation. Defaults to ‘nonparametric’.
nan_value (Optional[int]) – Numerical value used for NaN values in the context data. If None, an error will be raised if such values are present. Defaults to None.
correction (str) – Correction method for multiple testing. Defaults to ‘bh’.
num_workers (int) – Number of workers for parallel processing. Defaults to 1.
filter_method (Optional[str]) – Method used for filtering. Defaults to None.
filter_param (float) – Parameter for the specified filtering method. Defaults to 0.0.
filter_metric (Optional[str]) – Edge metric used for filtering. Defaults to None.
filter_rule (Optional[str]) – Rule to integrate the networks during filtering. Defaults to None.
edge_metric (Optional[str]) – Edge metric used to construct the differential network.
node_metric (Optional[str]) – Node metric used to construct the differential network.
max_path_length (int) – Maximum length of paths to consider in the computation of integrated interaction scores. Defaults to 2.
ranking_alg (str) – Ranking algorithm to compute. Options are ‘PageRank+’, ‘PageRank’, ‘absDimontRank’, ‘DimontRank’, ‘direct_node’ and ‘direct_edge’. Defaults to ‘PageRank+’.
name1 (str) – Name of Context 1. Used for saving files. Defaults to ‘context1’.
name2 (str) – Name of Context 2. Used for saving files. Defaults to ‘context2’.
project_path (Optional[str]) – Optional path to save results. Defaults to None.
- Return type:
Tuple[list, dict, Optional[DataFrame], Optional[DataFrame], dict]
- Returns:
A tuple containing the computed ranking (the list of ranked nodes and the dictionary of ranked nodes per data type), the differential edges, the differential nodes, and the configuration parameters.
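The meta_file argument is described as a table with a ‘label’ and a ‘type’ column. The sketch below builds and checks such a table with the standard library only, to show the expected shape before data is handed to the pipeline. The type strings (‘continuous’, ‘binary’), the variable names, and both helper functions are placeholders, not values or functions confirmed by moDiNA.

```python
import csv
import io

# Hypothetical meta file content: one row per variable in the context data.
META_CSV = """label,type
age,continuous
smoker,binary
"""


def load_meta(text):
    """Parse meta file text into a {label: type} mapping (illustrative helper)."""
    reader = csv.DictReader(io.StringIO(text))
    fields = reader.fieldnames or []
    if "label" not in fields or "type" not in fields:
        raise ValueError("meta file needs a 'label' and a 'type' column")
    return {row["label"]: row["type"] for row in reader}


def check_coverage(columns, meta):
    """Raise if a context column has no entry in the meta mapping."""
    missing = [col for col in columns if col not in meta]
    if missing:
        raise ValueError(f"variables without a type: {missing}")


meta = load_meta(META_CSV)
check_coverage(["age", "smoker"], meta)  # passes silently
print(meta)
```

Checking coverage up front surfaces untyped variables before any association tests are selected for them.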
modina.ranking module
- modina.ranking.compute_ranking(nodes_diff, edges_diff, ranking_alg, path=None, meta_file=None)
Compute a ranking based on the specified ranking algorithm.
- Parameters:
nodes_diff (Optional[DataFrame]) – Differential node scores.
edges_diff (Optional[DataFrame]) – Differential edge scores.
ranking_alg (str) – Ranking algorithm to compute. Options are ‘PageRank+’, ‘PageRank’, ‘absDimontRank’, ‘DimontRank’, ‘direct_node’ and ‘direct_edge’.
meta_file (Optional[DataFrame]) – Metadata file containing a ‘label’ and ‘type’ column to specify the data type of each variable.
path (Optional[str]) – Optional path to save the ranking as a CSV file.
- Return type:
Tuple[list, dict]
- Returns:
A tuple containing the list of ranked nodes and a dictionary with ranked nodes per data type.
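For orientation on the ‘PageRank’ option, here is a generic weighted PageRank computed by power iteration on an undirected graph whose edge weights stand in for differential edge scores. It is a textbook sketch, not moDiNA's implementation, and it does not attempt to reproduce ‘PageRank+’; the function name, the damping factor of 0.85, and the example weights are assumptions.

```python
def pagerank(weights, damping=0.85, n_iter=100):
    """Weighted PageRank on an undirected graph via power iteration.

    `weights` maps (node, node) pairs to non-negative edge weights.
    Generic illustration only.
    """
    nodes = sorted({node for edge in weights for node in edge})
    # Build a symmetric weighted adjacency structure.
    adj = {node: {} for node in nodes}
    for (u, v), w in weights.items():
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(n_iter):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for u in nodes:
            total = sum(adj[u].values())
            if total == 0.0:
                # Nodes with zero total weight spread their rank uniformly.
                for v in nodes:
                    new_rank[v] += damping * rank[u] / n
                continue
            for v, w in adj[u].items():
                new_rank[v] += damping * rank[u] * w / total
        rank = new_rank
    return rank


# Made-up differential edge scores; node "a" is involved in the strongest changes.
edges = {("a", "b"): 0.9, ("a", "c"): 0.8, ("a", "d"): 0.7, ("b", "c"): 0.1}
scores = pagerank(edges)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Sorting nodes by their score yields a ranked list comparable in shape to the first element of the returned tuple.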
modina.statistics_utils module
Module contents
- modina.compute_context_scores(context_data, meta_file, test_type='nonparametric', correction='bh', num_workers=1, path=None, nan_value=None)
Compute association scores for a given context.
- Parameters:
context_data (DataFrame) – The raw context data (rows: samples, columns: variables).
meta_file (DataFrame) – Metadata file containing a ‘label’ and ‘type’ column to specify the data type of each variable.
test_type (str) – Type of tests to use for network inference. Defaults to ‘nonparametric’.
correction (str) – Correction method for multiple testing. Defaults to ‘bh’.
num_workers (int) – Number of workers for parallel processing. Defaults to 1.
path (Optional[str]) – Optional path to save the computed scores as a CSV file. Defaults to None.
nan_value (Optional[int]) – Numerical value used for NaN values in the context data. If None, an error will be raised if such values are present. Defaults to None.
- Return type:
DataFrame
- Returns:
A pd.DataFrame containing the computed association scores.
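The nan_value parameter can be read as a numeric sentinel that stands in for missing entries, with an error raised when missing values exist but no sentinel is given. The sketch below illustrates that reading on plain lists; the helper encode_missing, the exact error type, and the replacement step itself are assumptions, and moDiNA's handling may differ.

```python
import math


def encode_missing(rows, nan_value=None):
    """Replace NaN entries with a numeric sentinel, or fail if none is given.

    Hypothetical helper illustrating one reading of `nan_value`.
    """
    encoded = []
    for row in rows:
        new_row = []
        for value in row:
            if isinstance(value, float) and math.isnan(value):
                if nan_value is None:
                    raise ValueError("missing values present but nan_value is None")
                new_row.append(nan_value)
            else:
                new_row.append(value)
        encoded.append(new_row)
    return encoded


samples = [[1.0, float("nan")], [0.5, 2.0]]  # made-up context data
print(encode_missing(samples, nan_value=-89))
```

A sentinel such as -89 only works if it cannot occur as a genuine measurement in the data.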
- modina.compute_diff_edges(scores1, scores2, edge_metric, max_path_length=2, path=None)
Compute differential edge scores based on the specified edge metric.
- Parameters:
scores1 (DataFrame) – Statistical association scores of Context 1, rescaled and potentially filtered.
scores2 (DataFrame) – Statistical association scores of Context 2, rescaled and potentially filtered.
edge_metric (str) – Edge metric to compute the differential edge scores.
max_path_length (int) – Maximum length of paths to consider in the computation of integrated interaction scores. Defaults to 2.
path (Optional[str]) – Optional path to save the differential edge scores as a CSV file. Defaults to None.
- Return type:
DataFrame
- Returns:
A DataFrame containing the computed differential edge scores.
- modina.compute_diff_network(scores1, scores2, context1, context2, edge_metric=None, node_metric=None, max_path_length=2, correction='bh', path=None, format='csv', meta_file=None, test_type='nonparametric', nan_value=None)
Compute a differential network defined by a node metric and an edge metric.
- Parameters:
scores1 (DataFrame) – Statistical association scores of Context 1, rescaled and potentially filtered.
scores2 (DataFrame) – Statistical association scores of Context 2, rescaled and potentially filtered.
context1 (DataFrame) – Observed data of Context 1, potentially filtered.
context2 (DataFrame) – Observed data of Context 2, potentially filtered.
edge_metric (Optional[str]) – Edge metric used to construct the differential network.
node_metric (Optional[str]) – Node metric used to construct the differential network.
max_path_length (int) – Maximum length of paths to consider in the computation of integrated interaction scores. Defaults to 2.
correction (str) – Correction method for multiple testing. Defaults to ‘bh’.
path (Optional[str]) – Optional path to save the differential scores as CSV files. Defaults to None.
format (str) – File format to save the differential network. Options are ‘csv’ and ‘graphml’. Defaults to ‘csv’.
meta_file (Optional[DataFrame]) – Meta file containing the node types. Only needed if node_metric is ‘STC’. Defaults to None.
test_type (str) – Test type to use for continuous nodes in the ‘STC’ node metric. Defaults to ‘nonparametric’.
nan_value (Optional[int]) – Numerical value used for NaN values in the context data. If None, an error will be raised if such values are present. Defaults to None.
- Return type:
Tuple[Optional[DataFrame], Optional[DataFrame]]
- Returns:
A tuple (edges_diff, nodes_diff) containing the computed differential edges and nodes.
- modina.compute_diff_nodes(scores1, scores2, context1, context2, node_metric, correction='bh', meta_file=None, test_type='nonparametric', nan_value=None, path=None)
Compute differential node scores based on the specified node metric.
- Parameters:
scores1 (DataFrame) – Statistical association scores of Context 1, rescaled and potentially filtered.
scores2 (DataFrame) – Statistical association scores of Context 2, rescaled and potentially filtered.
context1 (DataFrame) – Observed data of Context 1, potentially filtered.
context2 (DataFrame) – Observed data of Context 2, potentially filtered.
node_metric (str) – Node metric to compute the differential node scores.
correction (str) – Correction method for multiple testing. Only needed if node_metric is ‘STC’. Defaults to ‘bh’.
meta_file (Optional[DataFrame]) – Meta file containing the node types. Only needed if node_metric is ‘STC’. Defaults to None.
test_type (str) – Test type to compare continuous variables across contexts for the ‘STC’ node metric. Defaults to ‘nonparametric’.
nan_value (Optional[int]) – Numerical value used for NaN values in the context data. If None, an error will be raised if such values are present. Defaults to None.
path (Optional[str]) – Optional path to save the differential node scores as a CSV file. Defaults to None.
- Return type:
DataFrame
- Returns:
A DataFrame containing the computed differential node scores.
- modina.compute_ranking(nodes_diff, edges_diff, ranking_alg, path=None, meta_file=None)
Compute a ranking based on the specified ranking algorithm.
- Parameters:
nodes_diff (Optional[DataFrame]) – Differential node scores.
edges_diff (Optional[DataFrame]) – Differential edge scores.
ranking_alg (str) – Ranking algorithm to compute. Options are ‘PageRank+’, ‘PageRank’, ‘absDimontRank’, ‘DimontRank’, ‘direct_node’ and ‘direct_edge’.
meta_file (Optional[DataFrame]) – Metadata file containing a ‘label’ and ‘type’ column to specify the data type of each variable.
path (Optional[str]) – Optional path to save the ranking as a CSV file.
- Return type:
Tuple[list, dict]
- Returns:
A tuple containing the list of ranked nodes and a dictionary with ranked nodes per data type.
- modina.diffnet_analysis(context1, context2, meta_file, edge_metric=None, node_metric=None, ranking_alg='PageRank+', filter_method=None, filter_param=0.0, filter_metric=None, filter_rule=None, max_path_length=2, test_type='nonparametric', nan_value=None, correction='bh', num_workers=1, project_path=None, name1='context1', name2='context2')
Wrapper function to perform an end-to-end differential network analysis following the moDiNA pipeline.
- Parameters:
context1 (DataFrame) – Observed data of Context 1 (rows: samples, columns: variables).
context2 (DataFrame) – Observed data of Context 2 (rows: samples, columns: variables).
meta_file (DataFrame) – Metadata file containing a ‘label’ and ‘type’ column to specify the data type of each variable.
test_type (str) – Type of statistical tests to use for association score calculation. Defaults to ‘nonparametric’.
nan_value (Optional[int]) – Numerical value used for NaN values in the context data. If None, an error will be raised if such values are present. Defaults to None.
correction (str) – Correction method for multiple testing. Defaults to ‘bh’.
num_workers (int) – Number of workers for parallel processing. Defaults to 1.
filter_method (Optional[str]) – Method used for filtering. Defaults to None.
filter_param (float) – Parameter for the specified filtering method. Defaults to 0.0.
filter_metric (Optional[str]) – Edge metric used for filtering. Defaults to None.
filter_rule (Optional[str]) – Rule to integrate the networks during filtering. Defaults to None.
edge_metric (Optional[str]) – Edge metric used to construct the differential network.
node_metric (Optional[str]) – Node metric used to construct the differential network.
max_path_length (int) – Maximum length of paths to consider in the computation of integrated interaction scores. Defaults to 2.
ranking_alg (str) – Ranking algorithm to compute. Options are ‘PageRank+’, ‘PageRank’, ‘absDimontRank’, ‘DimontRank’, ‘direct_node’ and ‘direct_edge’. Defaults to ‘PageRank+’.
name1 (str) – Name of Context 1. Used for saving files. Defaults to ‘context1’.
name2 (str) – Name of Context 2. Used for saving files. Defaults to ‘context2’.
project_path (Optional[str]) – Optional path to save results. Defaults to None.
- Return type:
Tuple[list, dict, Optional[DataFrame], Optional[DataFrame], dict]
- Returns:
A tuple containing the computed ranking (the list of ranked nodes and the dictionary of ranked nodes per data type), the differential edges, the differential nodes, and the configuration parameters.
- modina.filter(scores1, scores2, context1, context2, filter_method=None, filter_param=0.0, filter_metric=None, filter_rule=None, path=None)
Filter association scores and context data based on the specified filtering configurations.
- Parameters:
scores1 (DataFrame) – Statistical association scores of Context 1.
scores2 (DataFrame) – Statistical association scores of Context 2.
context1 (DataFrame) – The first context for the differential network analysis.
context2 (DataFrame) – The second context for the differential network analysis.
filter_method (Optional[str]) – Method used for filtering. Defaults to None.
filter_param (float) – Parameter for the specified filtering method. Defaults to 0.0.
filter_metric (Optional[str]) – Edge metric used for filtering. Defaults to None.
filter_rule (Optional[str]) – Rule to integrate the networks during filtering. Defaults to None.
path (Optional[str]) – Optional path to save the filtered scores and context data as CSV files. Defaults to None.
- Return type:
Tuple[DataFrame, DataFrame, DataFrame, DataFrame]
- Returns:
A tuple containing the filtered scores and context data.
- modina.simulate_copula(path=None, name1='context1', name2='context2', n_bi=50, n_cont=50, n_cat=50, n_samples=500, n_shift_cont=0, n_shift_bi=0, n_shift_cat=0, n_corr_cont_cont=0, n_corr_bi_bi=0, n_corr_cat_cat=0, n_corr_bi_cont=0, n_corr_bi_cat=0, n_corr_cont_cat=0, n_both_cont_cont=0, n_both_bi_bi=0, n_both_cat_cat=0, n_both_bi_cont=0, n_both_bi_cat=0, n_both_cont_cat=0, shift=0.5, corr=0.7)
Simulate two contexts with binary, continuous and categorical nodes using a Gaussian copula.
- Parameters:
path – Path to save the simulated contexts, the meta file and the ground truth information. If None, files are not saved.
name1 – Name of the first context.
name2 – Name of the second context.
n_bi – Number of binary nodes to simulate.
n_cont – Number of continuous nodes to simulate.
n_cat – Number of categorical nodes to simulate.
n_samples – Number of samples per context.
n_shift_cont – Number of continuous nodes with an artificially introduced mean shift.
n_shift_bi – Number of binary nodes with an artificially introduced mean shift.
n_shift_cat – Number of categorical nodes with an artificially introduced mean shift.
n_corr_cont_cont – Number of continuous node pairs with an artificially introduced correlation difference.
n_corr_bi_bi – Number of binary node pairs with an artificially introduced correlation difference.
n_corr_cat_cat – Number of categorical node pairs with an artificially introduced correlation difference.
n_corr_bi_cat – Number of binary-categorical node pairs with an artificially introduced correlation difference.
n_corr_cont_cat – Number of continuous-categorical node pairs with an artificially introduced correlation difference.
n_corr_bi_cont – Number of binary-continuous node pairs with an artificially introduced correlation difference.
n_both_cont_cont – Number of continuous node pairs with both an artificially introduced mean shift and correlation difference.
n_both_bi_bi – Number of binary node pairs with both an artificially introduced mean shift and correlation difference.
n_both_cat_cat – Number of categorical node pairs with both an artificially introduced mean shift and correlation difference.
n_both_bi_cat – Number of binary-categorical node pairs with both an artificially introduced mean shift and correlation difference.
n_both_cont_cat – Number of continuous-categorical node pairs with both an artificially introduced mean shift and correlation difference.
n_both_bi_cont – Number of binary-continuous node pairs with both an artificially introduced mean shift and correlation difference.
shift – Magnitude of the mean shift.
corr – Magnitude of the correlation difference (measured as correlation coefficient between 0 and 1).
- Returns:
A tuple (context1, context2, meta, ground_truth) containing the two simulated contexts, a meta file and the ground truth nodes.
context1 – pd.DataFrame of the first simulated context.
context2 – pd.DataFrame of the second simulated context.
meta – pd.DataFrame containing the data type for each simulated variable.
ground_truth – A tuple containing three lists of ground truth nodes: (shift_nodes, corr_nodes, shift_corr_nodes).