MergedSumStats

class pysumstats.MergedSumStats(data, phenotypes, merge_info, variables, xy, low_ram=False, tmpdir='sumstats_temporary', allign=True)

Class containing merged summary statistics. In general you will not create a MergedSumStats object manually.

Parameters:
  • data (dict) – dataset containing merged summary statistics
  • phenotypes (list) – list of phenotype names.
  • merge_info (dict) – Dict with information on the merge
  • variables (list) – list of variables contained in the data.
  • xy (list) – x and y suffixes (to be used in _allign)
  • low_ram (bool) – Whether to use the low_ram option for this MergedSumStats object (passed down from SumStats). Use this only when running into MemoryErrors. Enabling this option reads and writes data from local storage rather than RAM. It saves a lot of RAM but greatly decreases processing speed.
  • tmpdir (str) – Which directory to store the temporary files if low_ram is enabled (passed down from SumStats).
  • allign (bool) – Enable to auto-align SNPs

afplot(ref_phenotypes=None, other_phenotypes=None, filename=None, nrows=None, ncols=None, figsize=None, dpi=300, **kwargs)

Generates AF comparison plots for merged GWAS data.

Parameters:
  • ref_phenotypes (list) – List of phenotypes to use as reference (defaults to all phenotypes)
  • other_phenotypes (list) – List of phenotypes to compare reference to (defaults to all phenotypes, overlapping plots will be dropped)
  • filename (str) – Target file to save the resulting figure to (if no name is specified, fig and axes are returned)
  • nrows (int) – Specify number of rows in the figure ( defaults to int(ceil(n_plots/ncols)) )
  • ncols (int) – Specify number of columns in the figure ( defaults to int(sqrt(n_plots)))
  • figsize ((int, int)) – Specify width and height of figure in inches ( defaults to (ncols*5, nrows*5) )
  • dpi (int) – DPI setting to use when saving the figure.
  • kwargs – Other keyword arguments to be passed to pysumstats.plot.afplot()
Returns:

(fig, axes) or None.
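
The default grid layout documented for afplot can be sketched as below. This is only an illustration of the formulas listed in the parameter descriptions, not the library's actual code; the helper name default_grid is hypothetical.

```python
import math


def default_grid(n_plots, nrows=None, ncols=None, figsize=None):
    """Hypothetical helper mirroring the documented afplot defaults."""
    if ncols is None:
        # Documented default: int(sqrt(n_plots))
        ncols = int(math.sqrt(n_plots))
    if nrows is None:
        # Documented default: int(ceil(n_plots / ncols))
        nrows = int(math.ceil(n_plots / ncols))
    if figsize is None:
        # Documented default: (ncols * 5, nrows * 5), in inches
        figsize = (ncols * 5, nrows * 5)
    return nrows, ncols, figsize


# Six comparison plots end up on a grid of 3 rows by 2 columns.
print(default_grid(6))
```

Passing nrows, ncols, or figsize explicitly skips the corresponding default, which matches how the parameters are described above.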

close()

Close the connection to the HDF5 file if low_ram is enabled.

Returns:

None

copy()

Returns:

a deepcopy of the existing object

describe(columns=None, per_chromosome=False)

Get a summary of the data.

Parameters:
  • columns (list) – List of column names to print summary for (default: [‘b’, ‘se’, ‘p’])
  • per_chromosome (bool) – Enable to return a list of summary dataframes per chromosome
Returns:

pd.DataFrame, or list of pd.DataFrames

groupby(*args, **kwargs)

Compatibility function to create pandas grouped object

Parameters:
  • args – arguments to be passed to pandas groupby function
  • kwargs – keyword arguments to be passed to pandas groupby function
Returns:

a full grouped pandas dataframe object

gwama(cov_matrix=None, h2_snp=None, name='gwama')

Multivariate meta analysis as described in Baselmans, et al. 2019.

Parameters:
  • cov_matrix (pd.DataFrame) – Covariance matrix, defaults to generating a correlation matrix of Z-scores
  • h2_snp (dict) – Dict of SNP heritabilities per GWAS, to use as additional weights. Defaults to all 1’s.
  • name – New phenotype name to use in the new SumStats object (default: ‘gwama’)
Returns:

pysumstats.SumStats object
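
To make the default cov_matrix more concrete, the sketch below computes Z-scores as b / se and builds a pairwise correlation matrix across phenotypes. It assumes a plain Pearson correlation over all shared SNPs, which may differ from what the library does internally; the function names and the toy numbers are hypothetical.

```python
import math


def z_scores(betas, ses):
    """Z-score per SNP: effect size divided by its standard error."""
    return [b / se for b, se in zip(betas, ses)]


def pearson(x, y):
    """Pearson correlation of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def z_correlation_matrix(z_by_phenotype):
    """Nested dict {phenotype: {phenotype: correlation}} (assumed layout)."""
    names = list(z_by_phenotype)
    return {
        a: {b: pearson(z_by_phenotype[a], z_by_phenotype[b]) for b in names}
        for a in names
    }


# Toy data: three SNPs measured for two hypothetical phenotypes.
z = {
    "pheno1": z_scores([0.10, 0.20, 0.30], [0.1, 0.1, 0.1]),
    "pheno2": z_scores([0.30, 0.20, 0.10], [0.1, 0.1, 0.1]),
}
matrix = z_correlation_matrix(z)
```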

head(n=10, n_chromosomes=1, **kwargs)

Prints (n_chromosomes) dataframes with the first n rows.

Parameters:
  • n (int) – number of rows to show
  • n_chromosomes (int) – number of chromosomes to show.
  • kwargs – keyword arguments to be passed to pandas head function
Returns:

None

manhattan(filename=None, phenotypes=None, nrows=None, ncols=None, figsize=None, dpi=300, **kwargs)

Generates manhattan plots for each phenotype (or specified phenotypes) in merged GWAS data.

Parameters:
  • filename (str) – Target file to save the resulting figure to (if no name is specified, fig and axes are returned)
  • phenotypes (list) – List of phenotype names to plot manhattans for (defaults to plotting all phenotypes)
  • nrows (int) – Specify number of rows in the figure ( defaults to int(ceil(len(phenotypes)/ncols)) )
  • ncols (int) – Specify number of columns in the figure ( defaults to int(log2(len(phenotypes)/2)) )
  • figsize ((int, int)) – Specify width and height of figure in inches ( defaults to (ncols*8, nrows*4) )
  • dpi (int) – DPI setting to use when saving the figure.
  • kwargs – Other keyword arguments to be passed to pysumstats.plot.manhattan()
Returns:

(fig, axes) or None.

merge(other, inplace=False, how='inner', low_memory=False)

Merge with other SumStats or MergedSumStats object(s).

Parameters:
Returns:

None, or pysumstats.MergedSumStats object.

meta_analyze(name='meta', method='ivw', debug=False)

Meta-analyze all GWAS summary statistics contained in this object. WARNING: There appears to be an error somewhere in this function that causes incorrect results. For now, running .meta_analyze() will instead run .gwama() with an identity matrix (functionally identical to an ivw meta-analysis).

Parameters:
  • name (str) – New phenotype name to use for the new SumStats object (default: ‘meta’)
  • method (str) – Meta-analysis method to use, should be one of [‘ivw’, ‘samplesize’] (default: ‘ivw’)
  • debug (bool) – Run the meta_analyze function instead of .gwama() for debugging purposes
Returns:

pysumstats.SumStats object
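
For orientation, a textbook fixed-effect inverse-variance weighted (ivw) meta-analysis of a single SNP can be sketched as below. This is the standard formula, not code taken from pysumstats; the function name ivw_meta and the example values are hypothetical.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect IVW estimate for one SNP across several studies."""
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se  # pooled effect, standard error, Z-score


# Two studies with equal precision: the pooled effect is their average.
beta, se, z = ivw_meta([0.1, 0.3], [0.1, 0.1])
```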

plot_all(dest='.', prefix='SumStatsPlots', kwargs={})

Runs all attached plot functions

Parameters:
  • dest (str) – Folder to save resulting files to. File names will be: {prefix}_{plottype}_{YEAR-MONTH-DAY}.png
  • prefix (str) – prefix to use when saving files.
  • kwargs (dict) – Nested dictionary of other keyword arguments to be passed to each function (keys of the top-level dictionary should be function names). Use the ‘all’ key in the top-level dictionary to pass keyword arguments to every function.
Returns:

None
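
The nested kwargs dictionary and the file name pattern can be illustrated with the sketch below. It assumes that function-specific entries override the ‘all’ entries and that the date is zero-padded; neither detail is stated above, and the helper names are hypothetical.

```python
import os
from datetime import date


def resolve_kwargs(kwargs, func_name):
    """Merge the 'all' entry with the entry for one plot function."""
    merged = dict(kwargs.get("all", {}))
    # Assumption: function-specific settings take precedence over 'all'.
    merged.update(kwargs.get(func_name, {}))
    return merged


def plot_filename(dest, prefix, plottype, day):
    """Build {prefix}_{plottype}_{YEAR-MONTH-DAY}.png inside dest."""
    return os.path.join(dest, f"{prefix}_{plottype}_{day.isoformat()}.png")


plot_kwargs = {
    "all": {"dpi": 150},
    "qqplot": {"dpi": 300, "figsize": (5, 5)},
}
qq_settings = resolve_kwargs(plot_kwargs, "qqplot")
manhattan_settings = resolve_kwargs(plot_kwargs, "manhattan")
example_name = plot_filename(".", "SumStatsPlots", "qqplot", date(2020, 1, 31))
```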

prep_for_mr(exposure, outcome, filename=None, p_cutoff=None, bidirectional=False, **kwargs)

Save a pre-formatted file to use with the MendelianRandomization package in R.

Parameters:
  • exposure (str) – phenotype name to use as exposure.
  • outcome (str) – phenotype name to use as outcome.
  • filename (str, list or None) – Path to where the resulting file(s) should be stored, or list of paths if bidirectional=True
  • p_cutoff (float) – Optional p-value cut-off to apply. Will include SNPs where P > p_cutoff
  • bidirectional (bool) – Enable to store two files (exposure=exposure, outcome=outcome), and (exposure=outcome, outcome=exposure)
  • kwargs – Additional keyword arguments to be passed to pandas to_csv function.
Returns:

None

pzplot(filename=None, phenotypes=None, nrows=None, ncols=None, figsize=None, dpi=300, **kwargs)

Generates PZ-plots for each phenotype (or specified phenotypes) in merged GWAS data.

Parameters:
  • filename (str) – Target file to save the resulting figure to (if no name is specified, fig and axes are returned)
  • phenotypes (list) – List of phenotype names to plot PZ-plots for (defaults to plotting all phenotypes)
  • nrows (int) – Specify number of rows in the figure ( defaults to int(sqrt(len(phenotypes))) )
  • ncols (int) – Specify number of columns in the figure ( defaults to int(ceil(len(phenotypes)/nrows)) )
  • figsize ((int, int)) – Specify width and height of figure in inches ( defaults to (ncols*5, nrows*5) )
  • dpi (int) – DPI setting to use when saving the figure.
  • kwargs – Other keyword arguments to be passed to pysumstats.plot.pzplot()
Returns:

(fig, axes) or None.

qqplot(filename=None, phenotypes=None, nrows=None, ncols=None, figsize=None, dpi=300, **kwargs)

Generates QQ-plots for each phenotype (or specified phenotypes) in merged GWAS data.

Parameters:
  • filename (str) – Target file to save the resulting figure to (if no name is specified, fig and axes are returned)
  • phenotypes (list) – List of phenotype names to plot QQ-plots for (defaults to plotting all phenotypes)
  • nrows (int) – Specify number of rows in the figure ( defaults to int(sqrt(len(phenotypes))) )
  • ncols (int) – Specify number of columns in the figure ( defaults to int(ceil(len(phenotypes)/nrows)) )
  • figsize ((int, int)) – Specify width and height of figure in inches ( defaults to (ncols*5, nrows*5) )
  • dpi (int) – DPI setting to use when saving the figure.
  • kwargs – Other keyword arguments to be passed to pysumstats.plot.qqplot()
Returns:

(fig, axes) or None.

reset_index()

Reset the index of the data.

Returns:

None

save(path, per_chromosome=False, per_phenotype=False, phenotype=None, **kwargs)

Save the data held in this object to local storage.

Parameters:
  • path (str) – Relative or full path to the target file to store the data or object in. Paths ending in .pickle will save a pickled version of the full object. Note that with low_ram enabled this will not store the data. When per_phenotype is specified, add {} to the path where the phenotype name should be, if {} is not in the string, the filename will be prefixed with phenotype name.
  • per_chromosome (bool) – Whether to save separate files for each chromosome.
  • per_phenotype (bool) – Set to True to create a separate file for each phenotype in MergedSumStats objects
  • phenotype (str) – Only save a file for a specific phenotype in MergedSumStats objects
  • kwargs – keyword arguments to be passed to pandas to_csv() function.
Returns:

None
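
The {} placeholder rule for per_phenotype paths can be sketched as below. The underscore separator used when prefixing the file name is an assumption, and phenotype_path is a hypothetical helper rather than part of pysumstats.

```python
import os


def phenotype_path(path, phenotype):
    """Return the target path for one phenotype."""
    if "{}" in path:
        # The placeholder marks where the phenotype name should go.
        return path.format(phenotype)
    # No placeholder: prefix the file name (not the folder) with the name.
    folder, filename = os.path.split(path)
    return os.path.join(folder, f"{phenotype}_{filename}")


with_placeholder = phenotype_path("results/{}_merged.csv", "bmi")
without_placeholder = phenotype_path(os.path.join("results", "merged.csv"), "bmi")
```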

sort_values(by, inplace=True, **kwargs)

Sorts values in the dataframe. Note: Sorting by chromosome (chr) will have no effect, as the data is already structured by chromosome.

Parameters:
  • by (str) – label of the column to sort values by
  • inplace (bool) – Whether to return the sorted object or sort values within existing object. (Currently only inplace sorting is supported)
  • kwargs – Other keyword arguments to be passed to pandas sort_values function
Returns:

None

tail(n=10, n_chromosomes=1, **kwargs)

Prints (n_chromosomes) dataframes with the last n rows.

Parameters:
  • n (int) – number of rows to show
  • n_chromosomes (int) – number of chromosomes to show.
  • kwargs – keyword arguments to be passed to pandas tail function
Returns:

None

zzplot(ref_phenotypes=None, other_phenotypes=None, filename=None, nrows=None, ncols=None, figsize=None, dpi=300, **kwargs)

Generates ZZ comparison plots for merged GWAS data.

Parameters:
  • ref_phenotypes (list) – List of phenotypes to use as reference (defaults to all phenotypes)
  • other_phenotypes (list) – List of phenotypes to compare reference to (defaults to all phenotypes, overlapping plots will be dropped)
  • filename (str) – Target file to save the resulting figure to (if no name is specified, fig and axes are returned)
  • nrows (int) – Specify number of rows in the figure ( defaults to int(ceil(n_plots/ncols)) )
  • ncols (int) – Specify number of columns in the figure ( defaults to int(sqrt(n_plots)))
  • figsize ((int, int)) – Specify width and height of figure in inches ( defaults to (ncols*5, nrows*5) )
  • dpi (int) – DPI setting to use when saving the figure.
  • kwargs – Other keyword arguments to be passed to pysumstats.plot.zzplot()
Returns:

(fig, axes) or None.