SumStats
-
class
pysumstats.
SumStats
(path, phenotype=None, gwas_n=None, column_names=None, data=None, low_ram=False, tmpdir='sumstats_temporary', **kwargs) Class for summary statistics of a single GWAS.
Parameters: - path (str) – Path to the file containing summary statistics. Should be a csv, or tab-delimited txt-file (.gz supported).
- phenotype (str) – Phenotype name
- gwas_n (int) – Optional N subjects in the GWAS, for N-based meta analysis (if there is no N column in the summary statistics)
- column_names (dict) – Optional dictionary of column names, if these are not automatically recognised. Keys should be: [‘rsid’, ‘chr’, ‘bp’, ‘ea’, ‘oa’, ‘maf’, ‘b’, ‘se’, ‘p’, ‘hwe’, ‘info’, ‘n’, ‘eaf’, ‘oaf’]
- data (dict) – Dataset for the new SumStats object. In general, do not specify this.
- low_ram (bool) – Whether to use the low_ram option for this SumStats object. Use this only when running into MemoryErrors. Enabling this option will read/write data from local storage rather than RAM. It will save a lot of RAM, but it will significantly decrease processing speed.
- tmpdir (str) – Which directory to store the temporary files if low_ram is enabled.
- kwargs – other keyword arguments to be passed to pandas.read_csv() method
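As a minimal sketch of the column_names argument, the helper below checks that a dictionary uses only the keys listed above. The helper check_column_names is hypothetical and not part of pysumstats. The file headers used as values ('SNP', 'CHR', 'POS', 'PVAL'), the file name and the phenotype are assumptions for illustration, as is the idea that values hold the header names found in the file.

```python
# Keys accepted by column_names, as listed in the parameter description.
ALLOWED_KEYS = {'rsid', 'chr', 'bp', 'ea', 'oa', 'maf', 'b', 'se', 'p',
                'hwe', 'info', 'n', 'eaf', 'oaf'}


def check_column_names(column_names):
    """Hypothetical helper: return the sorted keys that are not recognised."""
    return sorted(set(column_names) - ALLOWED_KEYS)


# Assumed file headers; replace these with the headers of your own file.
column_names = {'rsid': 'SNP', 'chr': 'CHR', 'bp': 'POS', 'p': 'PVAL'}
unknown = check_column_names(column_names)
print(unknown)

# If no unknown keys are reported, the dictionary could be passed on, e.g.
# (file name and phenotype are placeholders):
# import pysumstats
# s = pysumstats.SumStats('gwas.txt.gz', phenotype='height',
#                         column_names=column_names)
```

Checking the keys up front is only a convenience; it surfaces a misspelled key before a large file is read.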
-
close
() Close the connection to the HDF5 file if low_ram is enabled.
Returns: None
-
copy
() Returns: a deepcopy of the existing object
-
describe
(columns=None, per_chromosome=False) Get a summary of the data.
Parameters: - columns (list) – List of column names to print summary for (default: [‘b’, ‘se’, ‘p’])
- per_chromosome (bool) – Enable to return a list of summary dataframes per chromosome
Returns: pd.DataFrame, or list
-
groupby
(*args, **kwargs) Compatibility function to create pandas grouped object
Parameters: - args – arguments to be passed to pandas groupby function
- kwargs – keyword arguments to be passed to pandas groupby function
Returns: a full grouped pandas dataframe object
-
head
(n=10, n_chromosomes=1, **kwargs) Prints (n_chromosomes) dataframes with the first n rows.
Returns: None
-
manhattan
(**kwargs) Generate a Manhattan plot using this sumstats data
Parameters: kwargs – keyword arguments to be passed to pysumstats.plot.manhattan()
Returns: None, or (fig, ax)
-
merge
(other, how='inner', low_memory=False) Merge with other SumStats object(s).
Returns: pysumstats.plot.MergedSumStats object
-
plot_all
(dest='.', prefix='SumStatsPlots', kwargs={}) Runs all attached plot functions
Parameters: - dest (str) – Folder to save resulting files to. File names will be: {prefix}_{plottype}_{YEAR-MONTH-DAY}.png
- prefix (str) – prefix to use when saving files.
- kwargs (dict) – Nested dictionary of other keyword arguments to be passed to each function (keys of the top-level dictionary should be function names). Use the ‘all’ key in the top-level dictionary to pass keyword arguments to every function.
Returns: None
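The nested kwargs dictionary and the output file name pattern described above can be sketched as follows. Both helpers are hypothetical and not part of pysumstats. The option names 'dpi' and 'title', the use of 'qqplot' as the plot type in the file name, the zero-padded date, and the rule that function-specific entries override the 'all' entry are assumptions.

```python
from datetime import date


def resolve_kwargs(kwargs, function_name):
    """Merge the shared 'all' entry with the entry for one plot function.

    Letting function-specific values override 'all' is an assumption.
    """
    merged = dict(kwargs.get('all', {}))
    merged.update(kwargs.get(function_name, {}))
    return merged


def output_name(prefix, plottype, day):
    """Build {prefix}_{plottype}_{YEAR-MONTH-DAY}.png for a given date."""
    return '{}_{}_{}.png'.format(prefix, plottype, day.isoformat())


plot_kwargs = {
    'all': {'dpi': 300},        # hypothetical option applied to every plot
    'qqplot': {'title': 'QQ'},  # hypothetical option for qqplot only
}
print(resolve_kwargs(plot_kwargs, 'qqplot'))
print(output_name('SumStatsPlots', 'qqplot', date(2020, 1, 31)))
```

A dictionary shaped like plot_kwargs is what the kwargs parameter of plot_all expects: function names (or ‘all’) at the top level, keyword arguments one level down.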
-
pzplot
(**kwargs) Generate a PZ-plot using this sumstats data
Parameters: kwargs – keyword arguments to be passed to pysumstats.plot.pzplot()
Returns: None, or (fig, ax)
-
qc
(maf=None, hwe=None, info=None, **kwargs) Basic GWAS quality control function.
Parameters: - maf (float or None) – Minor allele frequency cutoff, will drop SNPs where MAF < cutoff. Default: 0.1
- hwe (float or None) – Hardy-Weinberg Equilibrium cutoff, will drop SNPs where HWE < cutoff, if specified and HWE column is present in the data.
- info (float or None) – Imputation quality cutoff, will drop SNPs where Info < cutoff, if specified and Info column is present in the data.
- kwargs – Other columns to filter on. The keyword should be the column name; SNPs will be dropped where the value < argument.
Returns: None
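The filtering rule described above (drop a SNP when its value is below the cutoff for any filtered column) can be illustrated with plain Python. This is a sketch of the rule only, not the library's implementation: qc_filter is a hypothetical function that returns a new list, whereas qc() returns None. The example SNPs are made up, and skipping columns that are absent from a row is an assumption.

```python
def qc_filter(rows, **cutoffs):
    """Keep rows whose value is >= the cutoff for every given column.

    rows: list of dicts, one per SNP. Cutoffs set to None and columns
    absent from a row are skipped (assumption for this sketch).
    """
    kept = []
    for row in rows:
        drop = any(
            cutoff is not None and col in row and row[col] < cutoff
            for col, cutoff in cutoffs.items()
        )
        if not drop:
            kept.append(row)
    return kept


snps = [
    {'rsid': 'rs1', 'maf': 0.25, 'info': 0.95},
    {'rsid': 'rs2', 'maf': 0.005, 'info': 0.99},  # below the MAF cutoff
    {'rsid': 'rs3', 'maf': 0.40, 'info': 0.50},   # below the info cutoff
]
passed = qc_filter(snps, maf=0.01, info=0.8)
print([row['rsid'] for row in passed])
```

Extra keyword arguments work the same way as maf, hwe and info here: the keyword names the column and the value is the lower bound.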
-
qqplot
(**kwargs) Generate a QQ-plot using this sumstats data
Parameters: kwargs – keyword arguments to be passed to pysumstats.plot.qqplot()
Returns: None, or (fig, ax)
-
reset_index
() Reset the index of the data.
Returns: None
-
save
(path, per_chromosome=False, per_phenotype=False, phenotype=None, **kwargs) Save the data held in this object to local storage.
Parameters: - path (str) – Relative or full path to the target file to store the data or object in. Paths ending in .pickle will save a pickled version of the full object. Note that with low_ram enabled this will not store the data. When per_phenotype is specified, add {} to the path where the phenotype name should be; if {} is not in the string, the filename will be prefixed with the phenotype name.
- per_chromosome (bool) – Whether to save separate files for each chromosome.
- per_phenotype – Set to True to create a separate file for each phenotype in MergedSumStats objects
- phenotype (str) – Only save a file for a specific phenotype in MergedSumStats objects
- kwargs – keyword arguments to be passed to pandas to_csv() function.
Returns: None
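The path rule for per_phenotype can be sketched as below. The helper phenotype_path is hypothetical and not part of pysumstats, and the underscore between the phenotype name and the file name is an assumption, since the description only says the filename is prefixed.

```python
import posixpath


def phenotype_path(path, phenotype):
    """Hypothetical helper mirroring the documented per_phenotype path rule."""
    if '{}' in path:
        # Placeholder present: substitute the phenotype name.
        return path.replace('{}', phenotype)
    # No placeholder: prefix the file name (underscore separator is assumed).
    folder, filename = posixpath.split(path)
    return posixpath.join(folder, phenotype + '_' + filename)


print(phenotype_path('out/{}_stats.csv', 'height'))
print(phenotype_path('out/stats.csv', 'bmi'))
```

Putting {} in the path gives full control over where the phenotype name appears, which is useful when the name should come after a fixed prefix.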
-
sort_values
(by, inplace=True, **kwargs) Sorts values in the dataframe. Note: Sorting by chromosome (chr) will have no effect as data is already structured by chromosome.
Parameters: Returns: Non