MergedSumStats¶

class pysumstats.MergedSumStats(data, phenotypes, merge_info, variables, xy, low_ram=False, tmpdir='sumstats_temporary', allign=True)¶

Class containing merged summary statistics. In general you will not create a MergedSumStats object manually.

Parameters:

data (dict) – dataset containing merged summary statistics
phenotypes (list) – list of phenotype names.
merge_info (dict) – Dict with information on the merge
variables (list) – list of variables contained in the data.
xy (list) – x and y suffixes (to be used in _allign)
low_ram (bool) – Whether to use the low_ram option for this MergedSumStats object (passed down from SumStats). Use this only when running into MemoryErrors. Enabling this option reads and writes data from local storage rather than RAM. This saves a lot of RAM but greatly decreases processing speed.
tmpdir (str) – Which directory to store the temporary files if low_ram is enabled (passed down from SumStats).
allign (bool) – Enable to auto-align SNPs

afplot(ref_phenotypes=None, other_phenotypes=None, filename=None, nrows=None, ncols=None, figsize=None, dpi=300, **kwargs)¶

Generates AF comparison plots for merged GWAS data.

Parameters:

ref_phenotypes (list) – List of phenotypes to use as reference (defaults to all phenotypes)
other_phenotypes (list) – List of phenotypes to compare reference to (defaults to all phenotypes, overlapping plots will be dropped)
filename (str) – Target file to save the resulting figure to (if no name is specified, fig and axes are returned)
nrows (int) – Specify number of rows in the figure ( defaults to int(ceil(n_plots/ncols)) )
ncols (int) – Specify number of columns in the figure ( defaults to int(sqrt(n_plots)))
figsize ((int, int)) – Specify width and height of figure in inches ( defaults to (ncols*5, nrows*5) )
dpi (int) – DPI setting to use when saving the figure.
kwargs – Other keyword arguments to be passed to pysumstats.plot.afplot()

Returns:

(fig, axes) or None.
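
The default grid described above can be reproduced with a few lines of arithmetic. This is a minimal sketch of the documented defaults only, not the library's own code; the helper name default_comparison_grid and the n_plots argument are hypothetical.

```python
import math


def default_comparison_grid(n_plots):
    """Return (nrows, ncols, figsize) following the documented defaults.

    Hypothetical helper: it only mirrors the formulas quoted in the
    parameter list and is not part of pysumstats.
    """
    if n_plots < 1:
        raise ValueError("n_plots must be at least 1")
    ncols = int(math.sqrt(n_plots))            # documented default for ncols
    nrows = int(math.ceil(n_plots / ncols))    # documented default for nrows
    figsize = (ncols * 5, nrows * 5)           # width, height in inches
    return nrows, ncols, figsize


print(default_comparison_grid(6))
```

With six comparison plots this gives two columns and three rows, so the last row is always filled before a new row is added.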

close()¶

Close the connection to the HDF5 file if low_ram is enabled.

Returns:	None

copy()¶

Returns:	a deepcopy of the existing object

describe(columns=None, per_chromosome=False)¶

Get a summary of the data.

Parameters:

columns (list) – List of column names to print summary for (default: [‘b’, ‘se’, ‘p’])
per_chromosome (bool) – Enable to return a list of summary dataframes per chromosome

Returns:

pd.DataFrame, or list of pd.DataFrames

groupby(*args, **kwargs)¶

Compatibility function to create pandas grouped object

Parameters:

args – arguments to be passed to pandas groupby function
kwargs – keyword arguments to be passed to pandas groupby function

Returns:

a full grouped pandas dataframe object

gwama(cov_matrix=None, h2_snp=None, name='gwama')¶

Multivariate meta analysis as described in Baselmans, et al. 2019.

Parameters:

cov_matrix (pd.DataFrame) – Covariance matrix, defaults to generating a correlation matrix of Z-scores
h2_snp (dict) – Dict of SNP heritabilities per GWAS, to use as additional weights. Defaults to all 1’s.
name (str) – New phenotype name to use in the new SumStats object (default: ‘gwama’)

Returns:

`pysumstats.SumStats` object
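
To make the roles of cov_matrix and h2_snp more concrete, the following sketch combines per-phenotype Z-scores for a single SNP as w'z / sqrt(w'Cw). This illustrates a generic weighted Z-score combination and is not the library's implementation: the weights sqrt(N * h2), the function name weighted_z, and the use of plain lists instead of dataframes are all assumptions.

```python
import math


def weighted_z(z, n, h2, cov):
    """Combine Z-scores for one SNP across phenotypes.

    z, n, h2: per-phenotype Z-scores, sample sizes, SNP heritabilities.
    cov: square matrix (list of lists) of Z-score covariances.
    Assumption: weights are sqrt(N * h2); the exact weighting used by
    gwama() is not specified in this reference.
    """
    w = [math.sqrt(n_i * h2_i) for n_i, h2_i in zip(n, h2)]
    numerator = sum(w_i * z_i for w_i, z_i in zip(w, z))
    # Variance of the weighted sum: w' C w
    variance = sum(
        w[i] * w[j] * cov[i][j]
        for i in range(len(w))
        for j in range(len(w))
    )
    return numerator / math.sqrt(variance)


identity = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 0.5], [0.5, 1.0]]
print(weighted_z([1.0, 1.0], [100, 100], [1.0, 1.0], identity))
print(weighted_z([1.0, 1.0], [100, 100], [1.0, 1.0], correlated))
```

A positive off-diagonal covariance enlarges the denominator, so correlated inputs yield a smaller combined Z-score than independent ones.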

head(n=10, n_chromosomes=1, **kwargs)¶

Prints (n_chromosomes) dataframes with the first n rows.

Parameters:

n (int) – number of rows to show
n_chromosomes (int) – number of chromosomes to show.
kwargs – keyword arguments to be passed to pandas head function

Returns:

None

manhattan(filename=None, phenotypes=None, nrows=None, ncols=None, figsize=None, dpi=300, **kwargs)¶

Generates manhattan plots for each phenotype (or specified phenotypes) in merged GWAS data.

Parameters:

filename (str) – Target file to save the resulting figure to (if no name is specified, fig and axes are returned)
phenotypes (list) – List of phenotype names to plot manhattans for (defaults to plotting all phenotypes)
nrows (int) – Specify number of rows in the figure ( defaults to int(ceil(len(phenotypes)/ncols)) )
ncols (int) – Specify number of columns in the figure ( defaults to int(log2(len(phenotypes)/2)) )
figsize ((int, int)) – Specify width and height of figure in inches ( defaults to (ncols*8, nrows*4) )
dpi (int) – DPI setting to use when saving the figure.
kwargs – Other keyword arguments to be passed to pysumstats.plot.manhattan()

Returns:

(fig, axes) or None.

merge(other, inplace=False, how='inner', low_memory=False)¶

Merge with other SumStats or MergedSumStats object(s).

Parameters:

other (pysumstats.SumStats, pysumstats.MergedSumStats, or list) – pysumstats.SumStats or pysumstats.MergedSumStats object, or a list of SumStats or MergedSumStats objects
inplace (bool) – Enable to store the new data in the current MergedSumStats object. (currently not supported when low_ram is enabled)
how (str) – Type of merge, for now only implemented for merges with pysumstats.SumStats objects
low_memory (bool) – Enable to use a more RAM-efficient merging method (WARNING: still untested)

Returns:

None, or pysumstats.MergedSumStats object.

meta_analyze(name='meta', method='ivw', debug=False)¶

Meta-analyze all GWAS summary statistics contained in this object. WARNING: There appears to be an error somewhere in this function that causes incorrect results. For now, running .meta_analyze() will instead run .gwama() with an identity matrix (functionally identical to an IVW meta-analysis).

Parameters:

name (str) – New phenotype name to use for the new SumStats object (default: ‘meta’)
method (str) – Meta-analysis method to use, should be one of [‘ivw’, ‘samplesize’] (default: ‘ivw’)
debug (bool) – Run the meta_analyze function instead of .gwama() for debugging purposes

Returns:

pysumstats.SumStats object.
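
For readers unfamiliar with the ‘ivw’ method, the sketch below shows the standard fixed-effect inverse-variance-weighted combination for a single SNP. It is a textbook formula written out in plain Python, not the code that meta_analyze() or gwama() runs; the function name ivw_meta is hypothetical.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis for one SNP.

    betas, ses: effect sizes and standard errors from each GWAS.
    Returns (beta, se, z) of the combined estimate.
    """
    weights = [1.0 / se ** 2 for se in ses]   # weight = 1 / variance
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se


beta, se, z = ivw_meta([0.1, 0.3], [0.1, 0.1])
print(beta, se, z)
```

With equal standard errors the combined effect is the plain average of the inputs, and the combined standard error shrinks by a factor of sqrt(2) for two studies.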

plot_all(dest='.', prefix='SumStatsPlots', kwargs={})¶

Runs all attached plot functions

Parameters:

dest (str) – Folder to save resulting files to. File names will be: {prefix}_{plottype}_{YEAR-MONTH-DAY}.png
prefix (str) – prefix to use when saving files.
kwargs (dict) – Nested dictionary of other keyword arguments to be passed to each function (keys of the top-level dictionary should be function names). Use the ‘all’ key in the top-level dictionary to pass keyword arguments to every function.

Returns:

None
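
The file naming pattern and the nested kwargs dictionary can be illustrated as follows. This is a sketch under stated assumptions, not the library's code: the helper names plot_filename and kwargs_for are hypothetical, the date is assumed to be zero-padded (ISO format), and function-specific entries are assumed to override entries under the ‘all’ key.

```python
import os
from datetime import date


def plot_filename(dest, prefix, plottype, today=None):
    """Build {prefix}_{plottype}_{YEAR-MONTH-DAY}.png inside dest.

    Assumption: the date is written in ISO format (YYYY-MM-DD).
    """
    today = today or date.today()
    return os.path.join(dest, f"{prefix}_{plottype}_{today.isoformat()}.png")


def kwargs_for(function_name, kwargs):
    """Resolve keyword arguments for one plot function.

    Assumption: values under the function's own key take precedence
    over values under the 'all' key.
    """
    merged = dict(kwargs.get('all', {}))
    merged.update(kwargs.get(function_name, {}))
    return merged


nested = {'all': {'dpi': 150}, 'manhattan': {'dpi': 300, 'figsize': (8, 4)}}
print(plot_filename('.', 'SumStatsPlots', 'qqplot', date(2020, 1, 5)))
print(kwargs_for('manhattan', nested))
print(kwargs_for('qqplot', nested))
```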

prep_for_mr(exposure, outcome, filename=None, p_cutoff=None, bidirectional=False, **kwargs)¶

Save a pre-formatted file to use with the MendelianRandomization package in R.

Parameters:

exposure (str) – phenotype name to use as exposure.
outcome (str) – phenotype name to use as outcome.
filename (str, list or None) – Path to where the resulting file(s) should be stored, or list of paths if bidirectional=True
p_cutoff (float) – Optional p-value cut-off to apply. Will include SNPs where P > p_cutoff
bidirectional (bool) – Enable to store two files (exposure=exposure, outcome=outcome), and (exposure=outcome, outcome=exposure)
kwargs – Additional keyword arguments to be passed to pandas to_csv function.

Returns:

None

pzplot(filename=None, phenotypes=None, nrows=None, ncols=None, figsize=None, dpi=300, **kwargs)¶

Generates PZ-plots for each phenotype (or specified phenotypes) in merged GWAS data.

Parameters:

filename (str) – Target file to save the resulting figure to (if no name is specified, fig and axes are returned)
phenotypes (list) – List of phenotype names to plot PZ-plots for (defaults to plotting all phenotypes)
nrows (int) – Specify number of rows in the figure ( defaults to int(sqrt(len(phenotypes))) )
ncols (int) – Specify number of columns in the figure ( defaults to int(ceil(len(phenotypes)/nrows)) )
figsize ((int, int)) – Specify width and height of figure in inches ( defaults to (ncols*5, nrows*5) )
dpi (int) – DPI setting to use when saving the figure.
kwargs – Other keyword arguments to be passed to pysumstats.plot.pzplot()

Returns:

(fig, axes) or None.

qqplot(filename=None, phenotypes=None, nrows=None, ncols=None, figsize=None, dpi=300, **kwargs)¶

Generates QQ-plots for each phenotype (or specified phenotypes) in merged GWAS data.

Parameters:

filename (str) – Target file to save the resulting figure to (if no name is specified, fig and axes are returned)
phenotypes (list) – List of phenotype names to plot QQ-plots for (defaults to plotting all phenotypes)
nrows (int) – Specify number of rows in the figure ( defaults to int(sqrt(len(phenotypes))) )
ncols (int) – Specify number of columns in the figure ( defaults to int(ceil(len(phenotypes)/nrows)) )
figsize ((int, int)) – Specify width and height of figure in inches ( defaults to (ncols*5, nrows*5) )
dpi (int) – DPI setting to use when saving the figure.
kwargs – Other keyword arguments to be passed to pysumstats.plot.qqplot()

Returns:

(fig, axes) or None.

reset_index()¶

Reset the index of the data.

Returns:	None

save(path, per_chromosome=False, per_phenotype=False, phenotype=None, **kwargs)¶

Save the data held in this object to local storage.

Parameters:

path (str) – Relative or full path to the target file to store the data or object in. Paths ending in .pickle will save a pickled version of the full object. Note that with low_ram enabled this will not store the data. When per_phenotype is specified, add {} to the path where the phenotype name should be; if {} is not in the string, the filename will be prefixed with the phenotype name.
per_chromosome (bool) – Whether to save separate files for each chromosome.
per_phenotype (bool) – Set to True to create a separate file for each phenotype in MergedSumStats objects
phenotype (str) – Only save a file for a specific phenotype in MergedSumStats objects
kwargs – Keyword arguments to be passed to pandas to_csv() function.

Returns:

None
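
The {} placeholder rule for per-phenotype paths can be sketched as below. This only illustrates the documented behaviour; the helper name per_phenotype_path is hypothetical, and the underscore separator used when prefixing the filename is an assumption, since the reference does not state the exact prefix format.

```python
import os


def per_phenotype_path(path, phenotype):
    """Return the output path for one phenotype.

    If the path contains '{}', the phenotype name is substituted there.
    Otherwise the filename is prefixed with the phenotype name
    (assumption: joined with an underscore).
    """
    if '{}' in path:
        return path.replace('{}', phenotype)
    directory, filename = os.path.split(path)
    return os.path.join(directory, f"{phenotype}_{filename}")


print(per_phenotype_path('results_{}.csv', 'bmi'))
print(per_phenotype_path('results.csv', 'bmi'))
```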

sort_values(by, inplace=True, **kwargs)¶

Sorts values in the dataframe. Note: Sorting by chromosome (chr) will have no effect as data is already structured by chromosome.

Parameters:

by (str) – label of the column to sort values by
inplace (bool) – Whether to return the sorted object or sort values within the existing object. (Currently only inplace sorting is supported)
kwargs – Other keyword arguments to be passed to pandas sort_values function

Returns:

None

tail(n=10, n_chromosomes=1, **kwargs)¶

Prints (n_chromosomes) dataframes with the last n rows.

Parameters:

n (int) – number of rows to show
n_chromosomes (int) – number of chromosomes to show.
kwargs – keyword arguments to be passed to pandas tail function

Returns:

None

zzplot(ref_phenotypes=None, other_phenotypes=None, filename=None, nrows=None, ncols=None, figsize=None, dpi=300, **kwargs)¶

Generates ZZ comparison plots for merged GWAS data.

Parameters:

ref_phenotypes (list) – List of phenotypes to use as reference (defaults to all phenotypes)
other_phenotypes (list) – List of phenotypes to compare reference to (defaults to all phenotypes, overlapping plots will be dropped)
filename (str) – Target file to save the resulting figure to (if no name is specified, fig and axes are returned)
nrows (int) – Specify number of rows in the figure ( defaults to int(ceil(n_plots/ncols)) )
ncols (int) – Specify number of columns in the figure ( defaults to int(sqrt(n_plots)))
figsize ((int, int)) – Specify width and height of figure in inches ( defaults to (ncols*5, nrows*5) )
dpi (int) – DPI setting to use when saving the figure.
kwargs – Other keyword arguments to be passed to pysumstats.plot.zzplot()

Returns:

(fig, axes) or None.