MergedSumStats¶
-
class
pysumstats.
MergedSumStats
(data, phenotypes, merge_info, variables, xy, low_ram=False, tmpdir='sumstats_temporary', allign=True)¶ Class containing merged summary statistics. In general you will not create a MergedSumStats object manually.
Parameters: - data (dict) – dataset containing merged summary statistics
- phenotypes (list) – list of phenotype names.
- merge_info (dict) – Dict with information on the merge
- variables (list) – list of variables contained in the data.
- xy (list) – x and y suffixes (to be used in _allign)
- low_ram (bool) – Whether to use the low_ram option for this MergedSumStats object (passed down from SumStats). Use this only when running into MemoryErrors. Enabling this option will read/write data from local storage rather than RAM. It will save a lot of RAM, but it will greatly decrease processing speed.
- tmpdir (str) – Which directory to store the temporary files if low_ram is enabled (passed down from SumStats).
- allign (bool) – Enable to auto-align SNPs
-
afplot
(ref_phenotypes=None, other_phenotypes=None, filename=None, nrows=None, ncols=None, figsize=None, dpi=300, **kwargs)¶ Generates AF comparison plots for merged GWAS data.
Parameters: - ref_phenotypes (list) – List of phenotypes to use as reference (defaults to all phenotypes)
- other_phenotypes (list) – List of phenotypes to compare reference to (defaults to all phenotypes, overlapping plots will be dropped)
- filename (str) – Target file to save the resulting figure to (if no name is specified, fig and axes are returned)
- nrows (int) – Specify number of rows in the figure ( defaults to int(ceil(n_plots/ncols)) )
- ncols (int) – Specify number of columns in the figure ( defaults to int(sqrt(n_plots)))
- figsize ((int, int)) – Specify width and height of figure in inches ( defaults to (ncols*5, nrows*5) )
- dpi (int) – DPI setting to use when saving the figure.
- kwargs – Other keyword arguments to be passed to
pysumstats.plot.manhattan()
Returns: (fig, axes) or None.
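The documented defaults for nrows, ncols and figsize can be reproduced with a few lines of arithmetic. The sketch below is a hypothetical helper (default_grid is not part of pysumstats) that only mirrors the formulas stated above; how the library itself counts n_plots from the phenotype lists is not shown here and is taken as an input.

```python
import math


def default_grid(n_plots, nrows=None, ncols=None):
    """Hypothetical helper mirroring the documented layout defaults."""
    if n_plots < 1:
        raise ValueError("n_plots must be at least 1")
    if ncols is None:
        # documented default: int(sqrt(n_plots))
        ncols = int(math.sqrt(n_plots))
    if nrows is None:
        # documented default: int(ceil(n_plots/ncols))
        nrows = int(math.ceil(n_plots / ncols))
    # documented default figure size: (ncols*5, nrows*5) inches
    figsize = (ncols * 5, nrows * 5)
    return nrows, ncols, figsize


print(default_grid(3))   # (3, 1, (5, 15))
print(default_grid(10))  # (4, 3, (15, 20))
```

With these defaults a small number of plots is stacked vertically (three plots give a 3x1 grid), so passing ncols explicitly may be preferable for wide figures.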
-
close
()¶ Close the connection to the HDF5 file if low_ram is enabled
Returns: None
-
copy
()¶ Returns: a deepcopy of the existing object
-
describe
(columns=None, per_chromosome=False)¶ Get a summary of the data.
Returns: pd.DataFrame, or list of pd.DataFrames
-
groupby
(*args, **kwargs)¶ Compatibility function to create pandas grouped object
Parameters: - args – arguments to be passed to pandas groupby function
- kwargs – keyword arguments to be passed to pandas groupby function
Returns: a full grouped pandas dataframe object
-
gwama
(cov_matrix=None, h2_snp=None, name='gwama')¶ Multivariate meta-analysis as described in Baselmans et al. (2019).
Parameters: - cov_matrix (pd.DataFrame) – Covariance matrix, defaults to generating a correlation matrix of Z-scores
- h2_snp (dict) – Dict of SNP heritabilities per GWAS, to use as additional weights. Defaults to all 1’s.
- name (str) – New phenotype name to use in the new SumStats object (default: ‘gwama’)
Returns: pysumstats.SumStats
object
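To give a feel for what a covariance matrix and per-GWAS weights contribute to a multivariate meta-analysis, here is a generic sketch for a single SNP. It is not taken from the pysumstats source: the function name combine_z and the weight choice sqrt(N * h2) are assumptions made for illustration. The only fact relied on is that a weighted sum of Z-scores with correlation matrix C has variance w'Cw.

```python
import math


def combine_z(z, weights, corr):
    """Combine correlated Z-scores for one SNP (illustrative, hypothetical helper).

    z, weights: sequences of length k; corr: k x k correlation matrix.
    """
    k = len(z)
    numerator = sum(w * zi for w, zi in zip(weights, z))
    # variance of the weighted sum: sum_i sum_j w_i * w_j * corr[i][j]
    variance = sum(weights[i] * weights[j] * corr[i][j]
                   for i in range(k) for j in range(k))
    return numerator / math.sqrt(variance)


# Assumed weights: sqrt(sample size * SNP heritability), with h2 = 1 for both GWAS
sample_sizes = [10000, 40000]
h2_snp = [1.0, 1.0]
weights = [math.sqrt(n * h) for n, h in zip(sample_sizes, h2_snp)]

z_scores = [2.0, 3.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
overlap = [[1.0, 0.5], [0.5, 1.0]]

print(combine_z(z_scores, weights, identity))  # uncorrelated GWAS
print(combine_z(z_scores, weights, overlap))   # correlated GWAS give a smaller Z
```

A positive off-diagonal entry inflates the denominator, so evidence from overlapping or correlated GWAS is not counted twice.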
-
head
(n=10, n_chromosomes=1, **kwargs)¶ Prints (n_chromosomes) dataframes with the first n rows.
Returns: None
-
manhattan
(filename=None, phenotypes=None, nrows=None, ncols=None, figsize=None, dpi=300, **kwargs)¶ Generates manhattan plots for each phenotype (or specified phenotypes) in merged GWAS data.
Parameters: - filename (str) – Target file to save the resulting figure to (if no name is specified, fig and axes are returned)
- phenotypes (list) – List of phenotype names to plot manhattans for (defaults to plotting all phenotypes)
- nrows (int) – Specify number of rows in the figure ( defaults to int(ceil(len(phenotypes)/ncols)) )
- ncols (int) – Specify number of columns in the figure ( defaults to int(log2(len(phenotypes)/2)) )
- figsize ((int, int)) – Specify width and height of figure in inches ( defaults to (ncols*8, nrows*4) )
- dpi (int) – DPI setting to use when saving the figure.
- kwargs – Other keyword arguments to be passed to
pysumstats.plot.manhattan()
Returns: (fig, axes) or None.
-
merge
(other, inplace=False, how='inner', low_memory=False)¶ Merge with other SumStats or MergedSumStats object(s).
Parameters: - other (pysumstats.SumStats, pysumstats.MergedSumStats, or list) – pysumstats.SumStats or pysumstats.MergedSumStats object, or a list of SumStats and MergedSumStats objects
- inplace (bool) – Enable to store the new data in the current MergedSumStats object (currently not supported when low_ram is enabled).
- how (str) – Type of merge, for now only implemented for merges with pysumstats.SumStats objects
- low_memory (bool) – Enable to use a more RAM-efficient merging method (WARNING: still untested)
Returns: None, or pysumstats.MergedSumStats object.
-
meta_analyze
(name='meta', method='ivw', debug=False)¶ Meta analyze all GWAS summary statistics contained in this object. WARNING: There appears to be an error somewhere in this function that causes incorrect results. For now, running .meta_analyze() will instead run .gwama() with an identity matrix (functionally identical to an IVW meta-analysis).
Parameters: - name (str) – New phenotype name to use for the new SumStats object (default: ‘meta’)
- method (str) – Meta-analysis method to use, should be one of [‘ivw’, ‘samplesize’] (default: ‘ivw’)
- debug (bool) – Run the meta_analyze function instead of .gwama() for debugging purposes
Returns:
pysumstats.SumStats
object.
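For reference, the two method names correspond to standard fixed-effect schemes. The sketch below shows the textbook formulas for a single SNP in plain Python; the function names ivw_meta and samplesize_meta are hypothetical, and this is not the pysumstats implementation (which, per the warning above, currently delegates to .gwama()).

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted estimate for one SNP."""
    weights = [1.0 / se ** 2 for se in ses]   # w_i = 1 / se_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se                # pooled beta, se and Z


def samplesize_meta(zs, ns):
    """Sample-size weighted Z-score for one SNP."""
    return sum(math.sqrt(n) * z for z, n in zip(zs, ns)) / math.sqrt(sum(ns))


beta, se, z = ivw_meta(betas=[0.10, 0.20], ses=[0.05, 0.10])
print(round(beta, 4), round(se, 4), round(z, 4))

print(round(samplesize_meta(zs=[2.0, 3.0], ns=[10000, 40000]), 4))
```

In the IVW scheme the more precise study (smaller standard error) dominates the pooled estimate; in the sample-size scheme the larger study does.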
-
plot_all
(dest='.', prefix='SumStatsPlots', kwargs={})¶ Runs all attached plot functions
Parameters: - dest (str) – Folder to save resulting files to. File names will be: {prefix}_{plottype}_{YEAR-MONTH-DAY}.png
- prefix (str) – prefix to use when saving files.
- kwargs (dict) – Nested dictionary of other keyword arguments to be passed to each function (keys of the top-level dictionary should be function names). Use the ‘all’ key in the top-level dictionary to pass keyword arguments to every function.
Returns: None
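The nested kwargs dictionary and the file name pattern can be illustrated as follows. Both helpers are hypothetical and not part of pysumstats. Two points are assumptions: that function-specific entries override the ‘all’ entries when both set the same key, and that the date is zero-padded ISO style.

```python
import datetime
import os


def resolve_kwargs(kwargs, function_name):
    """Hypothetical: merge the 'all' entries with those for one plot function."""
    merged = dict(kwargs.get('all', {}))
    # assumption: function-specific values take precedence over 'all'
    merged.update(kwargs.get(function_name, {}))
    return merged


def output_path(dest, prefix, plottype, today=None):
    """Hypothetical: build {prefix}_{plottype}_{YEAR-MONTH-DAY}.png inside dest."""
    today = today or datetime.date.today()
    filename = '{}_{}_{}.png'.format(prefix, plottype, today.isoformat())
    return os.path.join(dest, filename)


plot_kwargs = {
    'all': {'dpi': 150},
    'qqplot': {'dpi': 300, 'figsize': (5, 5)},
}
print(resolve_kwargs(plot_kwargs, 'qqplot'))     # {'dpi': 300, 'figsize': (5, 5)}
print(resolve_kwargs(plot_kwargs, 'manhattan'))  # {'dpi': 150}
print(output_path('.', 'SumStatsPlots', 'qqplot', datetime.date(2020, 1, 31)))
```

A dictionary shaped like plot_kwargs is what the kwargs parameter described above expects: top-level keys are plot function names, plus the optional ‘all’ key.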
-
prep_for_mr
(exposure, outcome, filename=None, p_cutoff=None, bidirectional=False, **kwargs)¶ Save a pre-formatted file to use with the MendelianRandomization package in R.
Parameters: - exposure (str) – phenotype name to use as exposure.
- outcome (str) – phenotype name to use as outcome.
- filename (str, list or None) – Path to where the resulting file(s) should be stored, or list of paths if bidirectional=True
- p_cutoff (float) – Optional p-value cut-off to apply. Will include SNPs where P > p_cutoff
- bidirectional (bool) – Enable to store two files (exposure=exposure, outcome=outcome), and (exposure=outcome, outcome=exposure)
- kwargs – Additional keyword arguments to be passed to pandas to_csv function.
Returns: None
-
pzplot
(filename=None, phenotypes=None, nrows=None, ncols=None, figsize=None, dpi=300, **kwargs)¶ Generates PZ-plots for each phenotype (or specified phenotypes) in merged GWAS data.
Parameters: - filename (str) – Target file to save the resulting figure to (if no name is specified, fig and axes are returned)
- phenotypes (list) – List of phenotype names to plot PZ-plots for (defaults to plotting all phenotypes)
- nrows (int) – Specify number of rows in the figure ( defaults to int(sqrt(len(phenotypes))) )
- ncols (int) – Specify number of columns in the figure ( defaults to int(ceil(len(phenotypes)/nrows)) )
- figsize ((int, int)) – Specify width and height of figure in inches ( defaults to (ncols*5, nrows*5) )
- dpi (int) – DPI setting to use when saving the figure.
- kwargs – Other keyword arguments to be passed to
pysumstats.plot.pzplot()
Returns: (fig, axes) or None.
-
qqplot
(filename=None, phenotypes=None, nrows=None, ncols=None, figsize=None, dpi=300, **kwargs)¶ Generates QQ-plots for each phenotype (or specified phenotypes) in merged GWAS data.
Parameters: - filename (str) – Target file to save the resulting figure to (if no name is specified, fig and axes are returned)
- phenotypes (list) – List of phenotype names to plot QQ-plots for (defaults to plotting all phenotypes)
- nrows (int) – Specify number of rows in the figure ( defaults to int(sqrt(len(phenotypes))) )
- ncols (int) – Specify number of columns in the figure ( defaults to int(ceil(len(phenotypes)/nrows)) )
- figsize ((int, int)) – Specify width and height of figure in inches ( defaults to (ncols*5, nrows*5) )
- dpi (int) – DPI setting to use when saving the figure.
- kwargs – Other keyword arguments to be passed to
pysumstats.plot.qqplot()
Returns: (fig, axes) or None.
-
reset_index
()¶ Reset the index of the data.
Returns: None
-
save
(path, per_chromosome=False, per_phenotype=False, phenotype=None, **kwargs)¶ Save the data held in this object to local storage.
Parameters: - path (str) – Relative or full path to the target file to store the data or object in. Paths ending in .pickle will save a pickled version of the full object. Note that with low_ram enabled this will not store the data. When per_phenotype is specified, add {} to the path where the phenotype name should be; if {} is not in the string, the filename will be prefixed with the phenotype name.
- per_chromosome (bool) – Whether to save separate files for each chromosome.
- per_phenotype (bool) – Set to True to create a separate file for each phenotype in MergedSumStats objects
- phenotype (str) – Only save a file for a specific phenotype in MergedSumStats objects
- kwargs – Keyword arguments to be passed to the pandas to_csv() function.
Returns: None
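The {} placeholder rule for per_phenotype paths can be sketched as below. The helper phenotype_path is hypothetical, and the underscore used as separator when prefixing the file name is an assumption; the documentation only states that the file name is prefixed with the phenotype name.

```python
import os


def phenotype_path(path, phenotype):
    """Hypothetical: derive the per-phenotype file path from a path template."""
    if '{}' in path:
        # placeholder present: substitute the phenotype name in place
        return path.replace('{}', phenotype)
    # no placeholder: prefix the file name (separator '_' is an assumption)
    directory, filename = os.path.split(path)
    return os.path.join(directory, '{}_{}'.format(phenotype, filename))


print(phenotype_path('out/{}_merged.csv', 'height'))  # out/height_merged.csv
print(phenotype_path('merged.csv', 'height'))         # height_merged.csv
```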
-
sort_values
(by, inplace=True, **kwargs)¶ Sorts values in the dataframe. Note: Sorting by chromosome (chr) will have no effect as data is already structured by chromosome.
Returns: None
-
tail
(n=10, n_chromosomes=1, **kwargs)¶ Prints (n_chromosomes) dataframes with the last n rows.
Returns: None
-
zzplot
(ref_phenotypes=None, other_phenotypes=None, filename=None, nrows=None, ncols=None, figsize=None, dpi=300, **kwargs)¶ Generates ZZ comparison plots for merged GWAS data.
Parameters: - ref_phenotypes (list) – List of phenotypes to use as reference (defaults to all phenotypes)
- other_phenotypes (list) – List of phenotypes to compare reference to (defaults to all phenotypes, overlapping plots will be dropped)
- filename (str) – Target file to save the resulting figure to (if no name is specified, fig and axes are returned)
- nrows (int) – Specify number of rows in the figure ( defaults to int(ceil(n_plots/ncols)) )
- ncols (int) – Specify number of columns in the figure ( defaults to int(sqrt(n_plots)))
- figsize ((int, int)) – Specify width and height of figure in inches ( defaults to (ncols*5, nrows*5) )
- dpi (int) – DPI setting to use when saving the figure.
- kwargs – Other keyword arguments to be passed to
pysumstats.plot.zzplot()
Returns: (fig, axes) or None.