2.7. khipu utils

Utility functions and m/z patterns. Adduct rules may be learned better in the future.

khipu.utils.add_data_to_tag(trees, len_limit=20)[source]

Append the relation note in data to tree.tag, for a more informative display.

khipu.utils.assign_masstrack_ids_in_khipu(feature_dict, mz_tolerance_ppm=5)[source]

Assign mass track ids if they are not included in peak dict. They should be if features were processed by asari.

Parameters:
  • feature_dict – feature dictionary indexed by feature ids, based on input network to khipu, thus very limited size.

  • mz_tolerance_ppm – ppm tolerance in examining m/z groups.

Return type:

Updated feature_dict with ‘parent_masstrack_id’.

Note

Sort by m/z; separate by mz_tolerance_ppm. m/z has to be checked by ppm because 1) minor variation may exist between peaks; and 2) float numbers are bad for dictionary keys. This method can occasionally be problematic in data that were not processed well, because it may accidentally merge m/z regions and thereby cause problems in the downstream determination of ion relations. (This is why the mass track in asari is useful.)
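The sort-and-separate idea in the note can be sketched as below. This is a minimal illustration, not the khipu implementation: the function name group_by_ppm and the rule of comparing each m/z with its immediate predecessor are assumptions; only the ‘parent_masstrack_id’ key comes from the documentation above.

```python
def group_by_ppm(feature_dict, mz_tolerance_ppm=5):
    """Assign a group id to each feature by walking the m/z-sorted list.

    Hypothetical helper: a new group starts whenever the gap to the
    previous m/z exceeds the ppm tolerance.
    """
    ordered = sorted(feature_dict.items(), key=lambda kv: kv[1]['mz'])
    group_id = 0
    previous_mz = None
    for _, feature in ordered:
        mz = feature['mz']
        if previous_mz is not None:
            gap_ppm = (mz - previous_mz) / previous_mz * 1e6
            if gap_ppm > mz_tolerance_ppm:
                group_id += 1
        feature['parent_masstrack_id'] = group_id
        previous_mz = mz
    return feature_dict


features = {
    'F1': {'mz': 133.09702},
    'F2': {'mz': 133.09710},  # about 0.6 ppm above F1, so same group
    'F3': {'mz': 134.10038},  # far above F2, so a new group
}
group_by_ppm(features)
```

Because each value is compared only with its neighbor, a chain of closely spaced m/z values can end up in one group even when its ends are further apart than the tolerance, which is one way the accidental merging mentioned in the note can arise.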

khipu.utils.compute_multichaged_patterns(pattern=[(21.982, 'Na/H'), (41.026549, 'ACN')], charge=2)[source]

Compute and return mass difference patterns for multiply charged ions.
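For an ion of charge z, a mass difference appears in the spectrum as that difference divided by z. The sketch below applies this to a pattern list; the function name scale_patterns_by_charge and the way the label is rewritten are assumptions made for illustration, and the actual return format of the khipu function may differ.

```python
def scale_patterns_by_charge(pattern, charge=2):
    """Divide each mass difference by the charge (hypothetical helper).

    The label suffix ', z=<charge>' is an illustrative choice only.
    """
    return [(mass_diff / charge, f"{name}, z={charge}")
            for mass_diff, name in pattern]


scaled = scale_patterns_by_charge([(21.982, 'Na/H'), (41.026549, 'ACN')], charge=2)
```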

khipu.utils.find_trees_by_datatag_list(trees, datatag_list=['13C/12C', '13C/12C*2', '13C/12C*3', '13C/12C*4', '13C/12C*5', '13C/12C*6'])[source]

Return a list of [tree roots] corresponding to the datatag_list. Note 13C/12C is not limited to *1 but all inclusive.

khipu.utils.get_adduct_edge_pairs(list_peaks, mztree, search_patterns=[(1.0078, 'H'), (21.982, 'Na/H'), (41.026549, 'ACN')], mz_tolerance_ppm=5, rt_tolerance=2)[source]

Find all pairs of adducts. Fragments and neutral losses can be accommodated by using negative m/z differences.

Input:
  • list_peaks – [{‘parent_masstrace_id’: 1670, ‘mz’: 133.09702315984987, ‘rtime’: 654, ‘height’: 14388.0, ‘id’: 555}, …]

  • mztree – indexed list_peaks

  • mz_tolerance_ppm – ppm tolerance in examining m/z patterns.

  • search_patterns – a list in the format of [(mz difference, notion), …]

  • rt_tolerance – tolerance threshold for deviation in retention time, arbitrary unit depending on input data. Default intended as 2 seconds.

Returns:

  • list of lists of peak pairs that match search_patterns, e.g. [ (195, 206, ‘H/Na’), …].
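The matching rule (an m/z difference within the ppm tolerance, and retention times within rt_tolerance) can be sketched as follows. This is a brute-force illustration under stated assumptions: the function name find_pairs is hypothetical, and it compares every pair of peaks rather than using the indexed mztree that the real function takes.

```python
def find_pairs(list_peaks, search_patterns, mz_tolerance_ppm=5, rt_tolerance=2):
    """Return (lower_id, higher_id, name) for peak pairs matching a pattern.

    Hypothetical O(n^2) sketch; no m/z index is used.
    """
    pairs = []
    for first in list_peaks:
        for second in list_peaks:
            if abs(second['rtime'] - first['rtime']) > rt_tolerance:
                continue
            for mz_diff, name in search_patterns:
                expected = first['mz'] + mz_diff
                if abs(second['mz'] - expected) <= expected * mz_tolerance_ppm * 1e-6:
                    pairs.append((first['id'], second['id'], name))
    return pairs


peaks = [
    {'id': 1, 'mz': 133.09702, 'rtime': 654},
    {'id': 2, 'mz': 155.07900, 'rtime': 655},  # Na/H partner of peak 1
    {'id': 3, 'mz': 155.07900, 'rtime': 700},  # same m/z, but elutes too late
]
pairs = find_pairs(peaks, [(1.0078, 'H'), (21.982, 'Na/H')])
```

A negative m/z difference in search_patterns works with the same comparison, since the expected partner is then simply lighter than the first peak.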

khipu.utils.get_isotope_pattern_name(mz, isotope_search_patterns, mz_tolerance=0.01)[source]

Get the isotope with the closest m/z match. If the error > mz_tolerance, return Unknown, which can happen if isotope_search_patterns does not cover all possible labeled atoms. The mz_tolerance need not be very precise, as the input value comes from isotopic_edges. Used by realign_isotopes.

Return type:

A name in isotope_search_patterns or ‘Unknown’.
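A closest-match lookup with a tolerance cutoff can be sketched as below. The function name closest_pattern_name is hypothetical, and the sketch assumes the first argument is an m/z difference relative to M0, which the documentation above does not state explicitly.

```python
def closest_pattern_name(mz_diff, isotope_search_patterns, mz_tolerance=0.01):
    """Return the name of the nearest pattern, or 'Unknown' if too far.

    Assumes mz_diff is a difference relative to M0 (an assumption).
    """
    best = min(isotope_search_patterns, key=lambda p: abs(p[0] - mz_diff))
    if abs(best[0] - mz_diff) > mz_tolerance:
        return 'Unknown'
    return best[1]


patterns = [
    (1.003355, '13C/12C', (0, 0.8)),
    (2.00671, '13C/12C*2', (0, 0.8)),
    (3.010065, '13C/12C*3', (0, 0.8)),
]
name = closest_pattern_name(2.0068, patterns)
unmatched = closest_pattern_name(1.5, patterns)
```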

khipu.utils.get_isotopic_edge_pairs(list_peaks, mztree, search_patterns=[(1.003355, '13C/12C', (0, 0.8))], mz_tolerance_ppm=5, isotope_rt_tolerance=2, check_isotope_ratio=False)[source]

To find all isotope pairs. Similar to search.get_seed_empCpd_signatures, but return unidirectional pairs only. If input peaks have overlaps/duplicates, result will contain the redundant overlap peaks.

Input:
  • list_peaks – [{‘parent_masstrace_id’: 1670, ‘mz’: 133.09702315984987, ‘rtime’: 654, ‘height’: 14388.0, ‘id’: 555}, …]

  • mztree – indexed list_peaks

  • mz_tolerance_ppm – ppm tolerance in examining m/z patterns.

  • search_patterns – a list in the format of [(mz difference, notion, (ratio low limit, ratio high limit)), ..] This can be obtained through search.isotopic_patterns. The ratios are optional, because 1) naturally occurring constraints are based on chemical formula; 2) rules are different when isotope tracers are introduced to the experiments. But it’s important to have a comprehensive list here for isotope tracing experiments.

  • isotope_rt_tolerance – tolerance threshold for deviation in retention time, arbitrary unit depending on input data. Default intended as 2 seconds.

Returns:

  • list of lists of peak pairs that match search_patterns, e.g. [ (195, 206, ‘13C/12C’), …].
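The optional ratio limits in search_patterns could be applied with a check like the one below. This is only a sketch: the function name ratio_in_range is hypothetical, and defining the ratio as the height of the heavier peak divided by the height of the lighter peak is an assumption, not something the documentation above confirms.

```python
def ratio_in_range(light_height, heavy_height, ratio_limits=(0, 0.8)):
    """Check heavy/light intensity ratio against (low, high) limits.

    The heavy/light definition of the ratio is an assumption.
    """
    low, high = ratio_limits
    if light_height <= 0:
        return False
    return low <= heavy_height / light_height <= high


plausible = ratio_in_range(14388.0, 1500.0)   # ratio about 0.10
implausible = ratio_in_range(1000.0, 900.0)   # ratio 0.9, above the 0.8 limit
```

In an isotope tracing experiment the labeled peak can be more intense than M0, which is consistent with the ratio check being switched off by default (check_isotope_ratio=False).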

khipu.utils.make_edge_tag(edge)[source]

Concatenate sorted str edges by underscore
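A minimal sketch of such a tag, assuming edge is a pair of node ids; the function name edge_tag is hypothetical.

```python
def edge_tag(edge):
    """Join the string forms of the edge's nodes, sorted, with underscores."""
    return '_'.join(sorted(str(node) for node in edge))


tag = edge_tag(('F20', 'F1606'))
same_tag = edge_tag(('F1606', 'F20'))
```

Sorting makes the tag independent of edge direction, so both orderings of a pair map to the same key.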

khipu.utils.make_expected_adduct_index(mode='pos', pattern=[(21.982, 'Na/H'), (41.026549, 'ACN')], charge=1)[source]

Construct the adduct list for a core list of adduct m/z difference patterns. Neutral mass is used as the 0 offset, so that later regression will compute the neutral mass. The adduct tag is not modified, so that the edges can be mapped correctly later.

khipu.utils.make_peak_dict(peak_list)[source]

Same as search.build_peak_id_dict but uses either ‘id’ or ‘id_number’.

khipu.utils.make_peak_tag(peak)[source]

peak format: {‘id’: ‘F1’, ‘mz’: 60.0808, ‘rtime’: 117.7, …, ‘intensities’: [250346.0], ‘representative_intensity’: 250346.0}

khipu.utils.peaks_to_networks(peak_list, isotope_search_patterns=[(1.003355, '13C/12C', (0, 0.8)), (2.00671, '13C/12C*2', (0, 0.8)), (3.010065, '13C/12C*3', (0, 0.8)), (4.01342, '13C/12C*4', (0, 0.8)), (5.016775, '13C/12C*5', (0, 0.8)), (6.02013, '13C/12C*6', (0, 0.8))], adduct_search_patterns=[(21.982, 'Na/H'), (41.026549, 'ACN')], mz_tolerance_ppm=5, rt_tolerance=2)[source]

Search peak_list for patterns of isotopes and adducts, form a network and get connected subnetworks.

Parameters:
  • peak_list – [{‘mz’: 133.09702315984987, ‘rtime’: 654, ‘id’: 555}, …]

  • isotope_search_patterns – exact list used to retrieve the subnetworks. E.g. [ (1.003355, ‘13C/12C’, (0, 0.8)), (2.00671, ‘13C/12C*2’, (0, 0.8)), (3.010065, ‘13C/12C*3’, (0, 0.8)), (4.01342, ‘13C/12C*4’, (0, 0.8)), (5.016775, ‘13C/12C*5’, (0, 0.8)), (6.02013, ‘13C/12C*6’, (0, 0.8)),]

  • adduct_search_patterns – exact list used to retrieve the subnetworks. It’s not recommended to have a long list here, as it’s better to search additional in-source modifications after empCpds are seeded. Example adduct_search_patterns list: [ (1.0078, ‘H’), (21.9820, ‘Na/H’), (41.026549, ‘Acetonitrile’)]

  • mz_tolerance_ppm – ppm tolerance in examining m/z patterns.

  • rt_tolerance – tolerance threshold for deviation in retention time, arbitrary unit depending on input data. Default intended as 2 seconds.

Returns:

  • subnetwork – undirected graph. Example edges: [(‘F1606’, ‘F20’, {‘type’: ‘modification’, ‘tag’: ‘H’}), (‘F3533’, ‘F20’, {‘type’: ‘modification’, ‘tag’: ‘Na/H’}), (‘F195’, ‘F20’, {‘type’: ‘modification’, ‘tag’: ‘Acetonitrile’}), (‘F20’, ‘F807’, {‘type’: ‘modification’, ‘tag’: ‘Acetonitrile’}), (‘F20’, ‘F53’, {‘type’: ‘isotope’, ‘tag’: ‘13C/12C’}), (‘F874’, ‘F808’, {‘type’: ‘isotope’, ‘tag’: ‘13C/12C’})]

  • peak_dict – JSON peaks indexed by ID

  • edge_dict – edge_tag is str sorted, but the dict values preserve the direction, which is missed in nx.subnetwork.

Note

Features of low abundance may not have detectable isotopes, but can have multiple adducts. Do not include too many adducts in the initial search. Do not include neutral losses and fragments in the initial search. They are better handled after a list of khipus is constructed.
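Once isotope and adduct edges are collected, the connected subnetworks are the connected components of the resulting undirected graph. The documentation refers to nx for the graph; the sketch below uses only the standard library instead, and the function name connected_subnetworks is hypothetical. The edges are taken from the example in the Returns section above.

```python
from collections import defaultdict, deque


def connected_subnetworks(edges):
    """Group node ids into connected components using breadth-first search."""
    neighbors = defaultdict(set)
    for a, b, _attributes in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for start in list(neighbors):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


edges = [
    ('F1606', 'F20', {'type': 'modification', 'tag': 'H'}),
    ('F3533', 'F20', {'type': 'modification', 'tag': 'Na/H'}),
    ('F195', 'F20', {'type': 'modification', 'tag': 'Acetonitrile'}),
    ('F20', 'F807', {'type': 'modification', 'tag': 'Acetonitrile'}),
    ('F20', 'F53', {'type': 'isotope', 'tag': '13C/12C'}),
    ('F874', 'F808', {'type': 'isotope', 'tag': '13C/12C'}),
]
components = connected_subnetworks(edges)
```

Here the six edges fall into two subnetworks: one centered on F20 and one containing only F874 and F808.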

khipu.utils.read_features_from_text(text_table, id_col=0, mz_col=1, rtime_col=2, intensity_cols=(3, 6), delimiter='\t')[source]

Read a text feature table into a list of features.

Input:
  • text_table – Tab delimited feature table read as text. First line as header. Recommended col 0 for ID, col 1 for m/z, col 2 for rtime.

  • id_col – column for id. If feature ID is not given, row_number is used as ID.

  • mz_col – column for m/z.

  • rtime_col – column for retention time.

  • intensity_cols – range of columns for intensity values. E.g. (3,5) includes only col 3 and 4.

Returns:

[{‘id’: ‘’, ‘mz’: 0, ‘rtime’: 0, ‘intensities’: [], ‘representative_intensity’: 0, …}, …], where representative_intensity is the mean value.

Return type:

List of features
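The column conventions above can be sketched with a small parser. This is an illustration, not the khipu code: the function name parse_feature_table is hypothetical, and the fallback to row_number when no feature ID is given is left out because its exact form is not documented here.

```python
def parse_feature_table(text_table, id_col=0, mz_col=1, rtime_col=2,
                        intensity_cols=(3, 6), delimiter='\t'):
    """Parse a delimited feature table (first line is the header)."""
    start, stop = intensity_cols  # half-open range, e.g. (3, 5) -> cols 3 and 4
    features = []
    for line in text_table.splitlines()[1:]:
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(delimiter)
        intensities = [float(x) for x in fields[start:stop]]
        mean = sum(intensities) / len(intensities) if intensities else 0
        features.append({
            'id': fields[id_col],
            'mz': float(fields[mz_col]),
            'rtime': float(fields[rtime_col]),
            'intensities': intensities,
            'representative_intensity': mean,
        })
    return features


table = (
    "id\tmz\trtime\ts1\ts2\ts3\n"
    "F1\t60.0808\t117.7\t100.0\t200.0\t300.0\n"
)
parsed = parse_feature_table(table)
```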

khipu.utils.realign_isotopes(sorted_mz_peak_ids, isotope_search_patterns, mz_tolerance=0.01)[source]

Snap an isotopic branch to the grid. Assume the lowest m/z is M0, and re-align the other features against M0.

Edges in input_network can be relationships between any pairs, so re-alignment makes them consistent on the grid. No redundant features are allowed here; they are handled in khipu.clean().

Parameters:
  • sorted_mz_peak_ids – [(mz, peak_id), …]; must be unique per m/z. khipu.khipu.clean() takes care of that.

  • isotope_search_patterns – [ (1.003355, ‘13C/12C’, (0, 0.8)), (3.010065, ‘13C/12C*3’, (0, 0.8)),..]

Returns:

A dictionary of {‘M0’: F1, ‘13C/12C*2’: F11, …}
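The re-alignment against M0 can be sketched as follows: take the lowest m/z as M0, then name every other feature by the pattern closest to its m/z difference from M0. The function name realign_to_m0 is hypothetical and the sketch assumes one feature per isotope position, as required above.

```python
def realign_to_m0(sorted_mz_peak_ids, isotope_search_patterns, mz_tolerance=0.01):
    """Map isotope names to peak ids, using the lowest m/z as M0."""
    m0_mz, m0_id = sorted_mz_peak_ids[0]
    aligned = {'M0': m0_id}
    for mz, peak_id in sorted_mz_peak_ids[1:]:
        diff = mz - m0_mz
        best = min(isotope_search_patterns, key=lambda p: abs(p[0] - diff))
        # Fall back to 'Unknown' when no pattern is within tolerance.
        name = best[1] if abs(best[0] - diff) <= mz_tolerance else 'Unknown'
        aligned[name] = peak_id
    return aligned


patterns = [
    (1.003355, '13C/12C', (0, 0.8)),
    (2.00671, '13C/12C*2', (0, 0.8)),
    (3.010065, '13C/12C*3', (0, 0.8)),
]
aligned = realign_to_m0([(180.0634, 'F1'), (182.0701, 'F11')], patterns)
```

Even if the original edge linked F11 to some feature other than M0, naming it by its distance from M0 places it on the same grid as the rest of the branch.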

khipu.utils.realign_isotopes_reverse(sorted_mz_peak_ids, isotope_search_patterns, mz_tolerance=0.01)[source]

Snap an isotopic branch to the grid. Assume the lowest m/z is M0, and re-align the other features against M0.

Edges in g can be relationships between any pairs, so re-alignment makes them consistent on the grid. No redundant features are allowed here; they are handled in khipu.clean().

Parameters:
  • sorted_mz_peak_ids – [(mz, peak_id), …]; unique per m/z not required, different from realign_isotopes.

  • isotope_search_patterns – [ (1.003355, ‘13C/12C’, (0, 0.8)), (3.010065, ‘13C/12C*3’, (0, 0.8)),..]

Returns:

A dictionary of {F0: ‘M0’, F1: ‘13C/12C*2’, …}