models
peaklist
-
class
dimspy.models.peaklist.
PeakList
(ID: str, mz: Sequence[float], intensity: Sequence[float], **metadata) Bases:
object
The PeakList class.
Stores mass spectrometry peak list data. It requires an ID, mz values, and intensities. It can also store extra peak attributes, e.g., SNRs, as well as peaklist tags and metadata. It uses automatically managed flags to “remove” or “retain” peaks without actually deleting them, so filtering operations on the peaks remain traceable.
- Parameters
ID – The ID of the peaklist data; a unique string or integer value is recommended
mz – The mz values of all the peaks. Must be in ascending order
intensity – Intensities of all the peaks. Must have the same size as mz
kwargs – Key-value pairs of the peaklist metadata
>>> import numpy as np
>>> from dimspy.models.peaklist import PeakList
>>> mz_values = np.sort(np.random.uniform(100, 1200, size = 100))
>>> int_values = np.random.normal(60, 10, size = 100)
>>> peaks = PeakList('dummy', mz_values, int_values, description = 'a dummy peaklist')
Internally, the peaklist data is stored in a NumPy structured array, the attribute table (this may change in the future):
mz      intensity  snr   snr_flag  …  flags*
102.5   21.7       10.5  True      …  True
111.7   12.3       5.1   False        False
126.3   98.1       31.7  True         True
133.1   68.9       12.6  True         True
…
Each column is called an attribute. The first two attributes are fixed as “mz” and “intensity”; unlike the others, they cannot be added or removed. The last “attribute”, “flags”, is in fact stored separately. The “flags” column is calculated automatically from all the manually set flag attributes, e.g., “snr_flag”, and can only be changed by the class itself. Unflagged peaks are considered “removed”; they are kept internally mainly for visualization and tracing purposes.
Warning
Removing a flag attribute may change the “flags” column and cause unflagged peaks to be flagged again. Because most processes are applied only to flagged peaks, these re-flagged peaks may hold incorrect values if the other peaks have since gone through such processes.
In principle, setting a flag attribute should be considered as an irreversible process.
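The relationship between the flag attributes and “flags” can be sketched in plain Python. This is a minimal sketch assuming, as the table above suggests, that a peak stays flagged only when every flag attribute is True; the helper combine_flags is hypothetical and works on lists rather than the internal structured array. It also reproduces the effect described in the warning: dropping a flag attribute can flag previously removed peaks again.

```python
from typing import Dict, List


def combine_flags(flag_columns: Dict[str, List[bool]], n_peaks: int) -> List[bool]:
    """Hypothetical helper: a peak is flagged only if every flag attribute is True."""
    flags = [True] * n_peaks  # with no flag attributes, every peak is retained
    for values in flag_columns.values():
        flags = [f and v for f, v in zip(flags, values)]
    return flags


# Hypothetical flag attributes for four peaks.
flag_columns = {
    "snr_flag": [True, False, True, True],
    "intensity_flag": [True, True, False, True],
}
flags = combine_flags(flag_columns, 4)
print(flags)  # [True, False, False, True]

# Dropping a flag attribute recalculates the flags and can "revive" a peak:
del flag_columns["intensity_flag"]
print(combine_flags(flag_columns, 4))  # [True, False, True, True]
```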
-
property
ID
Property of the peaklist ID.
- Getter
Returns the peaklist ID
- Setter
Set the peaklist ID
- Type
Same as input ID
-
add_attribute
(attr_name: str, attr_value: Sequence, attr_dtype: Optional[Union[Type, str]] = None, is_flag: bool = False, on_index: Optional[int] = None, flagged_only: bool = True, invalid_value=nan) Adds a new attribute to the PeakList attribute table.
- Parameters
attr_name – The name of the new attribute, must be a string
attr_value – The values of the new attribute. Its size must equal PeakList.size (if flagged_only == True) or PeakList.full_size (if flagged_only == False)
attr_dtype – The data type of the new attribute. If it is set to None, the PeakList will try to detect the data type from attr_value; if detection fails, the “object” type is used. Default = None
is_flag – Whether the new attribute is a flag attribute, i.e., will be used in flags calculation. Default = False
on_index – Inserts the new attribute at a specific column. It can’t be 0 or 1, as the first two attributes are fixed as mz and intensity. None appends it as the last column. Default = None
flagged_only – Whether the attr_value is set to the flagged peaks or all peaks. Default = True
invalid_value – If flagged_only is set to True, this value will be assigned to the unflagged peaks. The actual value depends on the attribute data type. For instance, on a boolean attribute invalid_value = 0 will be converted to False. Default = numpy.nan
- Return type
PeakList object (self)
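When flagged_only is True, the supplied values cover the flagged peaks only, and the unflagged positions receive invalid_value. A minimal plain-Python sketch of that expansion; expand_to_full_size is a hypothetical helper, and the real method additionally converts invalid_value to the attribute data type.

```python
import math
from typing import Any, List, Sequence


def expand_to_full_size(values: Sequence[Any], flags: Sequence[bool],
                        invalid_value: Any = math.nan) -> List[Any]:
    """Hypothetical helper: place values at flagged positions, fill the rest."""
    if len(values) != sum(flags):
        raise ValueError("values must have one entry per flagged peak (PeakList.size)")
    it = iter(values)
    return [next(it) if flagged else invalid_value for flagged in flags]


flags = [True, False, True, True]  # full_size = 4, size = 3
snr = expand_to_full_size([10.5, 31.7, 12.6], flags)
print(snr)  # [10.5, nan, 31.7, 12.6]
```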
-
property
attributes
Property of the attribute names.
- Getter
Returns a tuple of the attribute names
- Type
tuple
-
calculate_flags
() Re-calculates the flags according to the flag attributes.
- Return type
numpy array
Note
This method will be called automatically every time a flag attribute is added, removed, or changed.
-
cleanup_unflagged_peaks
(flag_name: Optional[str] = None) Removes unflagged peaks.
- Parameters
flag_name – Remove peaks unflagged by this flag attribute. None removes the peaks unflagged by the overall flags. Default = None
- Return type
PeakList object (self)
>>> print(peaks)
mz, intensity, intensity_flag, snr, snr_flag, flags
10, 70, True, 10, False, False
20, 60, True, 20, True, True
30, 50, False, 30, True, False
40, 40, False, 40, True, False
>>> print(peaks.cleanup_unflagged_peaks('snr_flag'))
mz, intensity, intensity_flag, snr, snr_flag, flags
20, 60, True, 20, True, True
30, 50, False, 30, True, False
40, 40, False, 40, True, False
>>> print(peaks.cleanup_unflagged_peaks())
mz, intensity, intensity_flag, snr, snr_flag, flags
20, 60, True, 20, True, True
-
drop_attribute
(attr_name: str) Drops an existing attribute.
- Parameters
attr_name – The attribute name to drop. It cannot be mz, intensity, or flags
- Return type
PeakList object (self)
-
property
dtable
Property of the overall attribute table.
- Getter
Returns the original attribute table
- Type
numpy structured array
Warning
This property directly accesses the internal attribute table. Take care when manipulating the data, and pay particular attention to potential side effects.
-
property
flag_attributes
Property of the flag attribute names.
- Getter
Returns a tuple of the flag attribute names
- Type
tuple
-
property
flags
Property of the flags.
- Getter
Returns a deep copy of the flags array
- Type
numpy array
-
property
full_shape
Property of the peaklist full attributes table shape.
- Getter
Returns the full attributes table shape, including the unflagged peaks
- Type
tuple
-
property
full_size
Property of the peaklist full size.
- Getter
Returns the full peaklist size, i.e., including the unflagged peaks
- Type
int
-
get_attribute
(attr_name: str, flagged_only: bool = True) Gets the values of an existing attribute.
- Parameters
attr_name – The attribute whose values to get
flagged_only – Whether to return the values of flagged peaks or all peaks. Default = True
- Return type
numpy array
-
get_peak
(peak_index: Union[int, Sequence[int]], flagged_only: bool = True) Gets the values of one or more peaks.
- Parameters
peak_index – The index (or indices) of the peak(s) whose values to get
flagged_only – Whether the values are taken from the index of flagged peaks or all peaks. Default = True
- Return type
numpy array
-
has_attribute
(attr_name: str) Checks whether there exists an attribute in the table.
- Parameters
attr_name – The attribute name for checking
- Return type
bool
-
insert_peak
(peak_value: Sequence) Inserts a new peak.
- Parameters
peak_value – The values of the new peak. Must contain values for all the attributes. Its position is determined by the mz value, i.e., the first value of the input
- Return type
PeakList object (self)
-
property
metadata
Property of the peaklist metadata.
- Getter
Returns an access interface to the peaklist metadata object
- Type
PeakList_Metadata object
-
property
peaks
Property of the attribute table.
- Getter
Returns a deep copy of the flagged attribute table
- Type
numpy structured array
-
remove_peak
(peak_index: Union[int, Sequence[int]], flagged_only: bool = True) Removes one or more existing peaks.
- Parameters
peak_index – The index (or indices) of the peak(s) to remove
flagged_only – Whether the index is for flagged peaks or all peaks. Default = True
- Return type
PeakList object (self)
-
set_attribute
(attr_name: str, attr_value: Sequence, flagged_only: bool = True, unsorted_mz: bool = False) Sets the values of an existing attribute.
- Parameters
attr_name – The attribute whose values to set
attr_value – The new attribute values. Its size must equal PeakList.size (if flagged_only == True) or PeakList.full_size (if flagged_only == False)
flagged_only – Whether the attr_value is set to the flagged peaks or all peaks. Default = True
unsorted_mz – Whether the attr_value contains unsorted mz values. This parameter is valid only when attr_name == “mz”. Default = False
- Return type
PeakList object (self)
-
set_peak
(peak_index: int, peak_value: Sequence, flagged_only: bool = True) Sets the values of a peak.
- Parameters
peak_index – The index of the peak to set values
peak_value – The new peak values. Must contain values for all the attributes (not including flags)
flagged_only – Whether the peak_value is set to the index of flagged peaks or all peaks. Default = True
- Return type
PeakList object (self)
>>> print(peaks)
mz, intensity, snr, flags
10, 10, 10, True
20, 20, 20, True
30, 30, 30, False
40, 40, 40, True
>>> print(peaks.set_peak(2, [50, 50, 50], flagged_only = True))
mz, intensity, snr, flags
10, 10, 10, True
20, 20, 20, True
30, 30, 30, False
50, 50, 50, True
>>> print(peaks.set_peak(2, [40, 40, 40], flagged_only = False))
mz, intensity, snr, flags
10, 10, 10, True
20, 20, 20, True
40, 40, 40, False
50, 50, 50, True
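The two set_peak calls above differ only in how peak_index is interpreted. A minimal sketch of the mapping from a flagged-only index to a full-table index, assuming flagged-only indices count just the peaks whose flag is True; to_full_index is a hypothetical helper.

```python
from typing import List, Sequence


def to_full_index(peak_index: int, flags: Sequence[bool]) -> int:
    """Hypothetical helper: translate an index over flagged peaks into one over all peaks."""
    flagged_positions: List[int] = [i for i, f in enumerate(flags) if f]
    return flagged_positions[peak_index]


flags = [True, True, False, True]  # the flags from the set_peak example
print(to_full_index(2, flags))  # 3: the third flagged peak is the fourth row overall
```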
-
property
shape
Property of the peaklist attribute table shape.
- Getter
Returns the attribute table shape, i.e., number of peaks x number of attributes. The “flags” column is not counted
- Type
tuple
-
property
size
Property of the peaklist size.
- Getter
Returns the flagged peaklist size
- Type
int
-
sort_peaks_order
() Sorts peaklist mz values into ascending order.
Note
This method will be called automatically every time the mz values are changed.
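Keeping rows aligned while sorting means applying one permutation to every column, flag attributes included. A minimal plain-Python sketch of that idea, assuming a column-oriented dict in place of the internal NumPy structured array; sort_by_mz is a hypothetical helper, not the library's method.

```python
from typing import Dict, List


def sort_by_mz(table: Dict[str, List]) -> Dict[str, List]:
    """Hypothetical helper: reorder all columns by ascending mz, keeping rows aligned."""
    order = sorted(range(len(table["mz"])), key=lambda i: table["mz"][i])
    return {name: [col[i] for i in order] for name, col in table.items()}


table = {
    "mz": [126.3, 102.5, 111.7],
    "intensity": [98.1, 21.7, 12.3],
    "snr_flag": [True, True, False],
}
print(sort_by_mz(table))
# {'mz': [102.5, 111.7, 126.3], 'intensity': [21.7, 12.3, 98.1], 'snr_flag': [True, False, True]}
```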
-
property
tags
Property of the peaklist tags.
- Getter
Returns an access interface to the peaklist tags object
- Type
PeakList_Tags object
-
to_df
() Exports peaklist attribute table to Pandas DataFrame, including the flags.
- Return type
pd.DataFrame
-
to_dict
(dict_type: Callable[[Sequence], Mapping] = <class 'collections.OrderedDict'>) → Mapping Exports the peaklist attribute table, including the flags, to a dictionary (mappable object).
- Parameters
dict_type – Result dictionary type. Default = OrderedDict
- Return type
Mapping
peaklist_metadata
-
class
dimspy.models.peaklist_metadata.
PeakList_Metadata
Bases:
dict
The PeakList_Metadata class.
Dictionary-like container for PeakList metadata storage.
- Parameters
args – Iterable object of key-value pairs
kwargs – Metadata key-value pairs
>>> PeakList_Metadata([('name', 'sample_1'), ('qc', False)])
>>> PeakList_Metadata(name = 'sample_1', qc = False)
Metadata attributes can be accessed in both dictionary-like and property-like manners.
>>> meta = PeakList_Metadata(name = 'sample_1', qc = False)
>>> meta['name']
'sample_1'
>>> meta.qc
False
>>> del meta.qc
>>> 'qc' in meta
False
Warning
The __getattr__, __setattr__, and __delattr__ methods are overridden. DO NOT assign a metadata object as an attribute of another metadata object, e.g., metadata.metadata.attr = value.
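Dictionary-like and property-like access of this kind is commonly implemented by routing the attribute hooks to item access. The sketch below shows that general technique on a hypothetical AttrDict; it is not the PeakList_Metadata source. Because __setattr__ turns every attribute assignment into a dictionary key, chaining metadata objects through attributes, as the warning cautions against, quickly becomes confusing.

```python
class AttrDict(dict):
    """Hypothetical dict whose keys can also be read, set, and deleted as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails (dict methods still work).
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value  # every attribute assignment becomes a key

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name) from None


meta = AttrDict(name="sample_1", qc=False)
print(meta["name"], meta.qc)  # sample_1 False
del meta.qc
print("qc" in meta)  # False
```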
peaklist_tags
-
class
dimspy.models.peaklist_tags.
PeakList_Tags
Bases:
object
The PeakList_Tags class.
Container for both typed and untyped tags. This class is mainly used in the PeakList and PeakMatrix classes for sample filtering. Within a PeakList, tag types must be unique, but tag values need not be (unless they are untyped). For instance, a PeakList can have the tags batch = 1 and plate = 1, but not batch = 1 and batch = 2, nor (untyped) 1 and (untyped) 1. A single value is treated as an untyped tag.
- Parameters
args – List of untyped tags
kwargs – List of typed tags. Only one tag value can be assigned to a specific tag type
>>> PeakList_Tags('untyped_tag1', Tag('untyped_tag2'), Tag('typed_tag', 'tag_type'))
>>> PeakList_Tags(tag_type1 = 'tag_value1', tag_type2 = 'tag_value2')
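A minimal sketch of the uniqueness rules described above, using a dict keyed by tag type for typed tags and a list of distinct values for untyped tags. TagStore is hypothetical, and raising ValueError on a duplicate is an assumption; PeakList_Tags may report the conflict differently.

```python
from typing import Any, Dict, List, Optional


class TagStore:
    """Hypothetical tag container enforcing PeakList_Tags-style uniqueness."""

    def __init__(self) -> None:
        self.typed: Dict[str, Any] = {}  # one value per tag type
        self.untyped: List[Any] = []     # distinct untyped values

    def add_tag(self, value: Any, tag_type: Optional[str] = None) -> None:
        if tag_type is None:
            if value in self.untyped:
                raise ValueError(f"untyped tag {value!r} already exists")
            self.untyped.append(value)
        else:
            if tag_type in self.typed:
                raise ValueError(f"tag type {tag_type!r} already exists")
            self.typed[tag_type] = value


tags = TagStore()
tags.add_tag(1, "batch")
tags.add_tag(1, "plate")  # allowed: same value, different types
tags.add_tag("qc")        # untyped
try:
    tags.add_tag(2, "batch")  # rejected: 'batch' already has a value
except ValueError as e:
    print(e)  # tag type 'batch' already exists
```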
-
add_tag
Adds a typed or untyped tag.
- Parameters
tag – Tag or tag value to add
tag_type – Type of the tag value
>>> tags = PeakList_Tags()
>>> tags.add_tag('untyped_tag1')
>>> tags.add_tag(Tag('typed_tag1', 'tag_type1'))
>>> tags.add_tag(tag_type2 = 'typed_tag2')
Drops all tags, both typed and untyped.
-
drop_tag
Drops a typed or untyped tag.
- Parameters
tag – Tag or tag value to drop
tag_type – Type of the tag value
>>> tags = PeakList_Tags('untyped_tag1', tag_type1 = 'tag_value1')
>>> tags.drop_tag(Tag('tag_value1', 'tag_type1'))
>>> print(tags)
untyped_tag1
Drops the tag with the given type.
- Parameters
tag_type – Tag type to drop, None (untyped) may drop multiple tags
-
has_tag
Checks whether a specific tag exists.
- Parameters
tag – The tag for checking
tag_type – The type of the tag
- Return type
bool
>>> tags = PeakList_Tags('untyped_tag1', Tag('tag_value1', 'tag_type1'))
>>> tags.has_tag('untyped_tag1')
True
>>> tags.has_tag('typed_tag1')
False
>>> tags.has_tag(Tag('tag_value1', 'tag_type1'))
True
>>> tags.has_tag('tag_value1', 'tag_type1')
True
Checks whether there exists a specific tag type.
- Parameters
tag_type – The tag type for checking, None indicates untyped tags
- Return type
bool
Returns tag value of the given tag type, or tuple of untyped tags if tag_type is None.
- Parameters
tag_type – Valid tag type, None for untyped tags
- Return type
Tag, or None if tag_type does not exist
Property of the included tag types. None indicates that untyped tags are included.
- Getter
Returns a set containing all the tag types of the typed tags
- Type
set
Property of the included tag values. Identical tag values are merged.
- Getter
Returns a set containing all the tag values, both typed and untyped tags
- Type
set
Property of all included tags.
- Getter
Returns a tuple containing all the tags, both typed and untyped
- Type
tuple
-
to_list
Exports tags to a list. Each element is a tuple of (tag value, tag type).
>>> tags = PeakList_Tags('untyped_tag1', tag_type1 = 'tag_value1')
>>> tags.to_list()
[('untyped_tag1', None), ('tag_value1', 'tag_type1')]
- Return type
list
Exports tags to a string. It is also used implicitly when the tags are printed:
>>> tags = PeakList_Tags('untyped_tag1', tag_type1 = 'tag_value1')
>>> print(tags)
untyped_tag1, tag_type1:tag_value1
- Return type
str
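The string form shown above appears to render untyped tags as their value and typed tags as type:value, joined by ", ". A sketch inferred from that example output only; tags_to_str is hypothetical and takes the (tag value, tag type) pairs produced by to_list().

```python
from typing import Any, List, Optional, Tuple


def tags_to_str(tag_list: List[Tuple[Any, Optional[str]]]) -> str:
    """Hypothetical helper: untyped tags as 'value', typed tags as 'type:value'."""
    parts = [str(v) if t is None else f"{t}:{v}" for v, t in tag_list]
    return ", ".join(parts)


print(tags_to_str([("untyped_tag1", None), ("tag_value1", "tag_type1")]))
# untyped_tag1, tag_type1:tag_value1
```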
Property of included typed tags.
- Getter
Returns a tuple containing all the typed tags
- Type
tuple
Property of included untyped tags.
- Getter
Returns a tuple containing all the untyped tags
- Type
tuple
Bases:
object
The Tag class.
This class is mainly used in PeakList and PeakMatrix classes for sample filtering.
- Parameters
value – Tag value; must be a number (int, float), a string (ascii, unicode), or a Tag object (in which case the ttype setting is ignored)
ttype – Tag type, must be string or None (untyped), default = None
A single value is treated as an untyped tag:
>>> tag = Tag(1)
>>> tag == 1
True
>>> tag = Tag(1, 'batch')
>>> tag == 1
False
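A minimal sketch of the equality rule illustrated above, assuming an untyped tag compares equal to its bare value, while a typed tag only equals another tag with the same value and type. SimpleTag is hypothetical; its hash is chosen so that objects that compare equal also hash equally, which matters when tags are stored in sets.

```python
from typing import Any, Optional


class SimpleTag:
    """Hypothetical tag with a value and an optional type (None = untyped)."""

    def __init__(self, value: Any, ttype: Optional[str] = None) -> None:
        if isinstance(value, SimpleTag):  # copy another tag, ignoring ttype
            value, ttype = value.value, value.ttype
        self.value = value
        self.ttype = ttype

    @property
    def typed(self) -> bool:
        return self.ttype is not None

    def __eq__(self, other: Any) -> bool:
        if isinstance(other, SimpleTag):
            return (self.value, self.ttype) == (other.value, other.ttype)
        return not self.typed and self.value == other

    def __hash__(self) -> int:
        # Untyped tags hash like their bare value so equal objects hash equally.
        return hash((self.value, self.ttype)) if self.typed else hash(self.value)


print(SimpleTag(1) == 1)                               # True
print(SimpleTag(1, "batch") == 1)                      # False
print(SimpleTag(1, "batch") == SimpleTag(1, "batch"))  # True
```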
Property of tag type. None indicates untyped tag.
- Getter
Returns the type of the tag
- Setter
Set the tag type, must be None or string
- Type
None, str, unicode
Property indicating whether the tag is typed or untyped.
- Getter
Returns typed status of the tag
- Type
bool
Property of tag value.
- Getter
Returns the value of the tag
- Setter
Set the tag value, must be number or string
- Type
int, float, str, unicode
peak_matrix
-
class
dimspy.models.peak_matrix.
PeakMatrix
(peaklist_ids: Sequence[str], peaklist_tags: Sequence[dimspy.models.peaklist_tags.PeakList_Tags], peaklist_attributes: Sequence[Tuple[str, Any]]) Bases:
object
The PeakMatrix class.
Stores aligned mass spectrometry peak matrix data. It requires the IDs, tags, and attributes of the source peak lists. It uses a tag-based mask to “hide” unrelated samples for convenient processing, and automatically managed flags to “remove” peaks without actually deleting them, so filtering operations on the peaks remain traceable. Normally, a PeakMatrix object is created by functions such as align_peaks() rather than manually.
- Parameters
peaklist_ids – The IDs of the source peak lists
peaklist_tags – The tags (PeakList_Tags) of the source peak lists
peaklist_attributes – The attributes of the source peak lists. Must be a list or tuple in the format [(attr_name, attr_matrix), …], where attr_name is the name of the attribute, and attr_matrix is the vertically stacked attribute values in the shape samples x peaks. The order of the attributes is kept in the PeakMatrix. The first two attributes must be “mz” and “intensity”.
>>> import numpy as np
>>> pids = [pl.ID for pl in peaklists]
>>> tags = [pl.tags for pl in peaklists]
>>> attrs = [(attr_name, np.vstack([pl.get_attribute(attr_name) for pl in peaklists])) for attr_name in peaklists[0].attributes]
>>> pm = PeakMatrix(pids, tags, attrs)
Internally, the attribute data is stored in an OrderedDict that maps each attribute name to its matrix. An attribute matrix can be illustrated as follows; the mask and flags are shared by all attributes. The final row, “flags”, is calculated automatically from the manually added flags and decides which peaks are “removed”, i.e., unflagged. A “–” indicates that no peak in that sample could be aligned to that mz value.
attribute: “mz”
mask    peak_1  peak_2  peak_3  …
False   12.7    14.9    21.0    …
True    –       15.1    21.1
False   12.1    14.7    –
False   12.9    14.8    20.9
…
flag_1  True    False   True    …
flag_2  True    True    False
flags*  True    False   False
Warning
Removing a flag may change the overall “flags” and cause unflagged peaks to be flagged again. Because most processes are applied only to flagged peaks, these re-flagged peaks may hold incorrect values if the other peaks have since gone through such processes.
In principle, setting a flag attribute should be considered as an irreversible process.
Unlike the flags, the mask should be considered a more temporary way to hide unrelated samples. A masked sample (row) is not used for processing, but its data remain in the attribute matrix. For this reason, the mask_peakmatrix, unmask_peakmatrix, mask_all_peakmatrix, and unmask_all_peakmatrix statements are provided as a more flexible way to set / unset the mask.
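Mask and flags act on different axes: the mask hides rows (samples) and the flags hide columns (peaks). A plain-Python sketch of the resulting view, assuming True in the mask means masked (as with pm.mask = [True] * pm.full_shape[0] in the unmask_tags example) and that the overall flags are the logical AND of the individual flags, which matches the illustration above. visible_matrix is hypothetical, and None stands in for “–”.

```python
from typing import List, Optional, Sequence

Row = List[Optional[float]]


def visible_matrix(matrix: Sequence[Row], mask: Sequence[bool],
                   flags: Sequence[bool]) -> List[Row]:
    """Hypothetical helper: keep unmasked rows (mask False) and flagged columns (flags True)."""
    return [
        [value for value, keep in zip(row, flags) if keep]
        for row, masked in zip(matrix, mask)
        if not masked
    ]


# The "mz" matrix illustrated above; None stands for "–" (no aligned peak).
mz = [
    [12.7, 14.9, 21.0],
    [None, 15.1, 21.1],
    [12.1, 14.7, None],
    [12.9, 14.8, 20.9],
]
mask = [False, True, False, False]
flag_1 = [True, False, True]
flag_2 = [True, True, False]
flags = [a and b for a, b in zip(flag_1, flag_2)]  # overall flags: [True, False, False]
print(visible_matrix(mz, mask, flags))  # [[12.7], [12.1], [12.9]]
```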
-
add_flag
(flag_name: str, flag_values: Sequence[bool], flagged_only: bool = True) Adds a flag to the peak matrix peaks.
- Parameters
flag_name – name of the flag, it must be unique and not equal to “flags”
flag_values – values of the flag. It must have a length of pm.shape[1] if flagged_only = True, or pm.full_shape[1] if flagged_only = False
flagged_only – whether to set the flagged peaks only. Default = True, and the values of the unflagged peaks are set to False
The overall flags property will be automatically recalculated.
-
attr_matrix
(attr_name: str, flagged_only: bool = True) Obtains an existing attribute matrix.
- Parameters
attr_name – name of the target attribute
flagged_only – whether to return the flagged values only. Default = True
- Return type
numpy array
-
attr_mean_vector
(attr_name: str, flagged_only: bool = True) Obtains the mean array of an existing attribute matrix.
- Parameters
attr_name – name of the target attribute
flagged_only – whether to return the mean array of the flagged values only. Default = True
- Return type
numpy array
Note that only the “present” peaks are used to calculate the mean values. If the attribute matrix has a string / unicode data type, the values in each column are concatenated.
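A sketch of a mean over “present” peaks only, assuming missing entries are represented as None and that a column with no present value yields NaN; the library's internal representation of missing peaks and its handling of empty columns may differ. mean_vector is a hypothetical helper.

```python
import math
from typing import List, Optional, Sequence


def mean_vector(matrix: Sequence[Sequence[Optional[float]]]) -> List[float]:
    """Hypothetical helper: column means over present values only; NaN if none."""
    means = []
    for column in zip(*matrix):
        present = [v for v in column if v is not None]
        means.append(sum(present) / len(present) if present else math.nan)
    return means


intensity = [
    [10.0, 20.0, None],
    [30.0, None, None],
    [20.0, 40.0, None],
]
print(mean_vector(intensity))  # [20.0, 30.0, nan]
```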
-
property
attributes
Property of the attribute names.
- Getter
returns a tuple including the names of the attribute matrix
- Type
tuple
-
drop_flag
(flag_name: str) Drops an existing flag from the peak matrix.
- Parameters
flag_name – name of the flag to drop. It must exist and not equal to “flags”
The overall flags property will be automatically recalculated.
-
extract_peaklist
(peaklist_id: str) Extracts one peaklist from the peak matrix.
- Parameters
peaklist_id – ID of the peaklist to extract
- Return type
PeakList object
Only the “present” peaks will be included in the result peaklist.
-
property
flag_names
Property of the flag names.
- Getter
returns a tuple including the names of the manually set flags
- Type
tuple
-
flag_values
(flag_name: str) Obtains values of an existing flag.
- Parameters
flag_name – name of the target flag. It must exist and not equal to “flags”
- Return type
numpy array
-
property
flags
Property of the flags.
- Getter
returns a deep copy of the flags array
- Type
numpy array
-
property
fraction
Property of the fraction array.
- Getter
returns the fraction array, indicating the ratio of present peaks on each mz value
- Type
numpy array
>>> pm.present
array([3, 4, 2, 3, 3])
>>> pm.shape[0]
4
>>> pm.fraction
array([0.75, 1.0, 0.5, 0.75, 0.75])
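present and fraction can both be derived from the present matrix: present is the column-wise count of True values, and fraction divides it by the number of samples, pm.shape[0]. A plain-Python sketch using the present matrix from the present_matrix example in this section; present_counts and fractions are hypothetical helpers.

```python
from typing import List, Sequence


def present_counts(present_matrix: Sequence[Sequence[bool]]) -> List[int]:
    """Hypothetical helper: samples (rows) with a peak aligned to each mz value (column)."""
    return [sum(column) for column in zip(*present_matrix)]


def fractions(present_matrix: Sequence[Sequence[bool]]) -> List[float]:
    """Hypothetical helper: present counts divided by the number of samples."""
    n_samples = len(present_matrix)
    return [count / n_samples for count in present_counts(present_matrix)]


present_matrix = [
    [True, True, True, True, False],
    [True, True, False, False, True],
    [True, True, True, True, True],
    [False, True, False, True, True],
]
print(present_counts(present_matrix))  # [3, 4, 2, 3, 3]
print(fractions(present_matrix))       # [0.75, 1.0, 0.5, 0.75, 0.75]
```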
-
property
full_shape
¶ Property of the peak matrix full shape.
- Getter
returns the full shape of the attribute matrix, i.e., ignore mask and flags
- Type
tuple
-
property
intensity_matrix
¶ Property of the intensity matrix.
- Getter
returns the intensity attribute matrix, unmasked and flagged values only
- Type
numpy array
-
property
intensity_mean_vector
¶ Property of the intensity mean values array.
- Getter
returns the mean values array of the intensity attribute matrix, unmasked and flagged values only
- Type
numpy array
-
is_empty
() Checks whether the peak matrix is empty under the current mask and flags.
- Return type
bool
-
property
mask
Property of the mask.
- Getter
returns a deep copy of the mask array
- Setter
sets the mask array. Provide None to unmask all samples
- Type
numpy array
-
mask_tags
Masks samples with particular tags.
- Parameters
args – tags or untyped tag values for masking
kwargs – typed tags for masking
override – whether to override the current mask, default = False
- Return type
PeakMatrix object (self)
This function masks samples that carry ALL the given tags. To mask samples that carry ANY of the tags, use the cascaded form instead.
>>> pm.mask_tags('qc', plate = 1) # masks all QC samples on plate 1
>>> pm.mask_tags('qc').mask_tags(plate = 1) # masks QC samples and all samples on plate 1
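The difference between the combined and the cascaded calls is set logic: a single call selects samples carrying all the given tags, while chaining calls (each keeping the current mask, as override = False does) accumulates the union of the individual selections. A plain-Python sketch with hypothetical samples whose typed tags are written as "type:value" strings; mask_with_all is not part of dimspy.

```python
from typing import List, Sequence, Set


def mask_with_all(mask: Sequence[bool], sample_tags: Sequence[Set[str]],
                  *required: str) -> List[bool]:
    """Hypothetical helper: add every sample carrying ALL the required tags to the mask."""
    return [m or set(required) <= tags for m, tags in zip(mask, sample_tags)]


# Hypothetical samples; typed tags are written as "type:value".
sample_tags = [{"qc", "plate:1"}, {"sample", "plate:1"}, {"qc", "plate:2"}]
none_masked = [False] * len(sample_tags)

print(mask_with_all(none_masked, sample_tags, "qc", "plate:1"))
# [True, False, False]  (QC samples on plate 1 only)
print(mask_with_all(mask_with_all(none_masked, sample_tags, "qc"), sample_tags, "plate:1"))
# [True, True, True]  (QC samples plus every sample on plate 1)
```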
-
property
missing_values
¶ Property of the missing values array.
- Getter
returns the missing values array, indicating the number of unaligned peaks on each sample
- Type
numpy array
>>> pm.present_matrix
array([[ True,  True,  True,  True, False],
       [ True,  True, False, False,  True],
       [ True,  True,  True,  True,  True],
       [False,  True, False,  True,  True]])
>>> pm.missing_values
array([1, 2, 0, 2])
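missing_values is the row-wise counterpart of present: the number of False entries of the present matrix in each sample. A minimal sketch with a hypothetical missing_values helper, using the present matrix from the example above.

```python
from typing import List, Sequence


def missing_values(present_matrix: Sequence[Sequence[bool]]) -> List[int]:
    """Hypothetical helper: mz values (columns) with no aligned peak, per sample (row)."""
    return [sum(not present for present in row) for row in present_matrix]


present_matrix = [
    [True, True, True, True, False],
    [True, True, False, False, True],
    [True, True, True, True, True],
    [False, True, False, True, True],
]
print(missing_values(present_matrix))  # [1, 2, 0, 2]
```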
-
property
mz_matrix
¶ Property of the mz matrix.
- Getter
returns the mz attribute matrix, unmasked and flagged values only
- Type
numpy array
-
property
mz_mean_vector
¶ Property of the mz mean values array.
- Getter
returns the mean values array of the mz attribute matrix, unmasked and flagged values only
- Type
numpy array
-
property
occurrence
¶ Property of the occurrence array.
- Getter
returns the occurrence array, indicating the total number of peaks (including peaks in the same sample) aliged in each mz value. This property is valid only when the intra_count attribute matrix is available
- Type
numpy array
>>> pm.attr_matrix('intra_count')
array([[2, 1, 1, 1, 0],
       [1, 1, 0, 0, 1],
       [1, 3, 1, 2, 1],
       [0, 1, 0, 1, 1]])
>>> pm.occurrence
array([4, 6, 2, 4, 3])
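Since intra_count records how many peaks from each sample were aligned to each mz value, occurrence is its column sum. A minimal sketch with a hypothetical occurrence helper, using the intra_count matrix from the example above.

```python
from typing import List, Sequence


def occurrence(intra_count: Sequence[Sequence[int]]) -> List[int]:
    """Hypothetical helper: total peaks aligned to each mz value, summed over samples."""
    return [sum(column) for column in zip(*intra_count)]


intra_count = [
    [2, 1, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 3, 1, 2, 1],
    [0, 1, 0, 1, 1],
]
print(occurrence(intra_count))  # [4, 6, 2, 4, 3]
```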
-
property
peaklist_ids
¶ Property of the source peaklist IDs.
- Getter
returns a tuple including the IDs of the source peaklists
- Type
tuple
-
property
peaklist_tag_types
¶ Property of the source peaklist tag types.
- Getter
returns a tuple including the types of the typed tags of the source peaklists
- Type
set
-
property
peaklist_tag_values
¶ Property of the source peaklist tag values.
- Getter
returns a tuple including the values of the source peaklists tags, both typed and untyped
- Type
set
-
property
peaklist_tags
Property of the source peaklist tags.
- Getter
returns a tuple including the PeakList_Tags objects of the source peaklists
- Type
tuple
-
property
present
¶ Property of the present array.
- Getter
returns the present array, indicating how many peaks are aligned in each mz value
- Type
numpy array
-
property
present_matrix
¶ Property of the present matrix.
- Getter
returns the present matrix, indicating whether a sample has peak(s) aligned in each mz value
- Type
numpy array
>>> pm.present_matrix
array([[ True,  True,  True,  True, False],
       [ True,  True, False, False,  True],
       [ True,  True,  True,  True,  True],
       [False,  True, False,  True,  True]])
>>> pm.present
array([3, 4, 2, 3, 3])
-
property
(prop_name: str, flagged_only: bool = True) Obtains the values of an existing property.
- Parameters
prop_name – name of the target property. Valid properties include ‘present’, ‘present_matrix’, ‘fraction’, ‘missing_values’, ‘occurrence’, and ‘purity’
flagged_only – whether to return the flagged values only. Default = True
- Return type
numpy array
-
property
purity
¶ Property of the purity level array.
- Getter
returns the purity array, indicating the ratio of only one peak in each sample being aligned in each mz value. This property is valid only when the intra_count attribute matrix is available
- Type
numpy array
>>> pm.attr_matrix('intra_count')
array([[2, 1, 1, 1, 0],
       [1, 1, 0, 0, 1],
       [1, 3, 1, 2, 1],
       [0, 1, 0, 1, 1]])
>>> pm.purity
array([0.667, 0.75, 1.0, 0.667, 1.0])
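Reading the example above, purity divides, per column, the number of samples with exactly one aligned peak by the number of samples with any aligned peak (for the first column, 2 / 3 ≈ 0.667). A minimal sketch under that reading; purity here is a hypothetical helper, and returning NaN for a column with no aligned peaks is an assumption.

```python
import math
from typing import List, Sequence


def purity(intra_count: Sequence[Sequence[int]]) -> List[float]:
    """Hypothetical helper: samples with exactly one peak / samples with any peak, per column."""
    result = []
    for column in zip(*intra_count):
        present = [c for c in column if c > 0]
        single = sum(1 for c in present if c == 1)
        result.append(single / len(present) if present else math.nan)
    return result


intra_count = [
    [2, 1, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 3, 1, 2, 1],
    [0, 1, 0, 1, 1],
]
print([round(p, 3) for p in purity(intra_count)])  # [0.667, 0.75, 1.0, 0.667, 1.0]
```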
-
remove_empty_peaks
() Removes empty peaks from the peak matrix.
Empty peaks are peaks with no valid m/z or intensity value in any sample. They may occur after an entire sample is removed from the peak matrix, e.g., when the blank samples are removed in the blank filter.
- Return type
PeakMatrix object (self)
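A minimal sketch of the idea: after a sample is removed, a column whose remaining entries are all missing is empty and can be dropped. Missing entries are assumed to be None, and empty_peak_indices is a hypothetical helper that checks a single matrix; the library checks m/z and intensity values on its own internal representation.

```python
from typing import List, Optional, Sequence


def empty_peak_indices(matrix: Sequence[Sequence[Optional[float]]]) -> List[int]:
    """Hypothetical helper: columns with no present value (None = no aligned peak)."""
    return [j for j, column in enumerate(zip(*matrix)) if all(v is None for v in column)]


# Hypothetical mz matrix after the only sample containing peak_2 was removed.
mz = [
    [12.7, None, 21.0],
    [12.1, None, 20.9],
]
empty = empty_peak_indices(mz)
print(empty)  # [1]
cleaned = [[v for j, v in enumerate(row) if j not in empty] for row in mz]
print(cleaned)  # [[12.7, 21.0], [12.1, 20.9]]
```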
-
remove_peaks
(peak_ids, flagged_only: bool = True) Removes peaks from the peak matrix.
- Parameters
peak_ids – the indices of the peaks to remove
flagged_only – whether the indices are for flagged peaks or all peaks. Default = True
- Return type
PeakMatrix object (self)
-
remove_samples
(sample_ids, masked_only: bool = True) Removes samples from the peak matrix.
- Parameters
sample_ids – the indices of the samples to remove
masked_only – whether the indices are for unmasked samples or all samples. Default = True
- Return type
PeakMatrix object (self)
-
rsd
(*args, **kwargs) Calculates the relative standard deviation (RSD) array.
- Parameters
args – tags or untyped tag values for RSD calculation, no value = calculate over all samples
kwargs – typed tags for RSD calculation, no value = calculate over all samples
on_attr – calculate RSD on given attribute. Default = “intensity”
flagged_only – whether to calculate on flagged peaks only. Default = True
- Return type
numpy array
The RSD is calculated as:
>>> rsd = np.std(pm.intensity_matrix, axis = 0, ddof = 1) / np.mean(pm.intensity_matrix, axis = 0) * 100
Note that the delta degrees of freedom (ddof) is set to 1 for the standard deviation calculation. Moreover, only the “present” peaks are used in the calculation. If a column has fewer than 2 present peaks, the corresponding RSD value is set to np.nan.
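The same rule can be sketched without NumPy: statistics.stdev uses the n − 1 denominator (equivalent to ddof = 1), only present values enter the calculation, and columns with fewer than two present values give NaN. Missing peaks are assumed to be None; rsd_vector is a hypothetical helper.

```python
import math
import statistics
from typing import List, Optional, Sequence


def rsd_vector(matrix: Sequence[Sequence[Optional[float]]]) -> List[float]:
    """Hypothetical helper: percent RSD per column over present values; NaN if fewer than 2."""
    result = []
    for column in zip(*matrix):
        present = [v for v in column if v is not None]
        if len(present) < 2:
            result.append(math.nan)
        else:
            result.append(statistics.stdev(present) / statistics.mean(present) * 100)
    return result


intensity = [
    [100.0, 50.0, 10.0],
    [110.0, None, None],
    [90.0, 70.0, None],
]
print([r if math.isnan(r) else round(r, 2) for r in rsd_vector(intensity)])
# [10.0, 23.57, nan]
```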
-
property
shape
Property of the peak matrix shape.
- Getter
returns the shape of the attribute matrix
- Type
tuple
Obtains tags of the peaklist_tags with particular tag type.
- Parameters
tag_type – the type of the returning tags. Provide None to obtain untyped tags
- Return type
tuple
-
to_peaklist
(ID: str) Averages the peak matrix into a single peaklist.
- Parameters
ID – ID of the merged peaklist
- Return type
PeakList object
Only the “present” peaks are included in the resulting peaklist. The new peaklist contains only the following attributes: mz, intensity, present, fraction, rsd, occurrence, and purity.
Use the unmask statement to calculate the peaklist for a particular group of samples:
>>> with unmask_peakmatrix(pm, 'Sample') as m:
...     pkl = m.to_peaklist('averaged_peaklist')
Or use the mask statement to exclude a particular group of samples:
>>> with mask_peakmatrix(pm, 'QC') as m:
...     pkl = m.to_peaklist('averaged_peaklist')
-
to_str
(attr_name: str = 'intensity', delimiter: str = '\t', samples_in_rows: bool = True, comprehensive: bool = True, rsd_tags: Sequence = ()) Exports the peak matrix to a string.
- Parameters
attr_name – name of the attribute matrix for exporting. Default = ‘intensity’
delimiter – delimiter to separate the matrix. Default = '\t' (tab), i.e., TSV format
samples_in_rows – whether or not the samples are stored in rows. Default = True
comprehensive – whether to include comprehensive info, e.g., mask, flags, present, rsd etc. Default = True
rsd_tags – peaklist tags for RSD calculation. Default = (), indicating only the overall RSD is included
- Return type
str
-
unmask_tags
Unmasks samples with particular tags.
- Parameters
args – tags or untyped tag values for unmasking
kwargs – typed tags for unmasking
override – whether to override the current mask, default = False
- Return type
PeakMatrix object (self)
This function unmasks samples that carry ALL the given tags. To unmask samples that carry ANY of the tags, use the cascaded form instead.
>>> pm.mask = [True] * pm.full_shape[0]
>>> pm.unmask_tags('qc', plate = 1) # unmasks all QC samples on plate 1
>>> pm.unmask_tags('qc').unmask_tags(plate = 1) # unmasks QC samples and all samples on plate 1
-
class
dimspy.models.peak_matrix.
mask_all_peakmatrix
(pm: dimspy.models.peak_matrix.PeakMatrix) Bases:
object
The mask_all_peakmatrix statement.
Temporarily masks all samples of the peak matrix. Within the statement, samples can be modified or removed. On leaving the statement, the original mask is restored.
- Parameters
pm – the target peak matrix
- Return type
PeakMatrix object
>>> print(pm.peaklist_ids)
('sample_1', 'sample_2', 'qc_1', 'sample_3', 'sample_4', 'qc_2')
>>> with mask_all_peakmatrix(pm) as m:
...     print(m.peaklist_ids)
()
>>> print(pm.peaklist_ids)
('sample_1', 'sample_2', 'qc_1', 'sample_3', 'sample_4', 'qc_2')
-
class
dimspy.models.peak_matrix.
mask_peakmatrix
(pm: dimspy.models.peak_matrix.PeakMatrix, *args, **kwargs) Bases:
object
The mask_peakmatrix statement.
Temporarily masks the peak matrix samples with particular tags. Within the statement, samples can be modified or removed. On leaving the statement, the original mask is restored.
- Parameters
pm – the target peak matrix
override – whether to override the current mask, default = True
args – target tag values, both typed and untyped
kwargs – target typed tag types and values
- Return type
PeakMatrix object
>>> print(pm.peaklist_ids)
('sample_1', 'sample_2', 'qc_1', 'sample_3', 'sample_4', 'qc_2')
>>> with mask_peakmatrix(pm, 'qc') as m:
...     print(m.peaklist_ids)
('sample_1', 'sample_2', 'sample_3', 'sample_4')
>>> print(pm.peaklist_ids)
('sample_1', 'sample_2', 'qc_1', 'sample_3', 'sample_4', 'qc_2')
-
class
dimspy.models.peak_matrix.
unmask_all_peakmatrix
(pm: dimspy.models.peak_matrix.PeakMatrix) Bases:
object
The unmask_all_peakmatrix statement.
Temporarily unmasks all samples of the peak matrix. Within the statement, samples can be modified or removed. On leaving the statement, the original mask is restored.
- Parameters
pm – the target peak matrix
- Return type
PeakMatrix object
>>> print(pm.peaklist_ids)
('sample_1', 'sample_2', 'qc_1', 'sample_3', 'sample_4', 'qc_2')
>>> with unmask_all_peakmatrix(pm) as m:
...     print(m.peaklist_ids)
('sample_1', 'sample_2', 'qc_1', 'sample_3', 'sample_4', 'qc_2')
>>> print(pm.peaklist_ids)
('sample_1', 'sample_2', 'qc_1', 'sample_3', 'sample_4', 'qc_2')
-
class
dimspy.models.peak_matrix.
unmask_peakmatrix
(pm: dimspy.models.peak_matrix.PeakMatrix, *args, **kwargs) Bases:
object
The unmask_peakmatrix statement.
Temporarily unmasks the peak matrix samples with particular tags. Within the statement, samples can be modified or removed. On leaving the statement, the original mask is restored.
- Parameters
pm – the target peak matrix
override – whether to override the current mask, default = True
args – target tag values, both typed and untyped
kwargs – target typed tag types and values
- Return type
PeakMatrix object
>>> print(pm.peaklist_ids)
('sample_1', 'sample_2', 'qc_1', 'sample_3', 'sample_4', 'qc_2')
>>> with unmask_peakmatrix(pm, 'qc') as m:
...     print(m.peaklist_ids) # no need to set pm.mask to True
('qc_1', 'qc_2')
>>> print(pm.peaklist_ids)
('sample_1', 'sample_2', 'qc_1', 'sample_3', 'sample_4', 'qc_2')
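These statements follow Python's context-manager protocol: save the mask on entry, change it, and restore it on exit, even if the body raises. A minimal sketch with contextlib.contextmanager; FakeMatrix and temporarily_masked are hypothetical stand-ins for PeakMatrix and mask_all_peakmatrix, and unlike the real statements this sketch does not handle samples removed inside the block.

```python
from contextlib import contextmanager
from typing import Iterator, List


class FakeMatrix:
    """Hypothetical stand-in holding sample IDs and a mask (True = masked)."""

    def __init__(self, ids: List[str]) -> None:
        self.ids = ids
        self.mask = [False] * len(ids)

    @property
    def peaklist_ids(self) -> tuple:
        return tuple(i for i, m in zip(self.ids, self.mask) if not m)


@contextmanager
def temporarily_masked(pm: FakeMatrix) -> Iterator[FakeMatrix]:
    """Mask every sample inside the block; restore the original mask afterwards."""
    saved = list(pm.mask)
    pm.mask = [True] * len(pm.mask)
    try:
        yield pm
    finally:
        pm.mask = saved  # restored even if the block raises


pm = FakeMatrix(["sample_1", "qc_1", "sample_2"])
with temporarily_masked(pm) as m:
    print(m.peaklist_ids)  # ()
print(pm.peaklist_ids)     # ('sample_1', 'qc_1', 'sample_2')
```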