A paper entitled 'An Improved Archaeology Algorithm Based on Integrated Multi-Source Biological Information for Yeast Protein Interaction Network' was recently published in IEEE Access. The paper is joint work with colleagues at Embry-Riddle Aeronautical University, USA, and Dr Yuan Zhang is the corresponding author. The paper can be accessed here.

Abstract: With the development of high-throughput interaction detection techniques such as tandem affinity purification (TAP) and yeast two-hybrid (Y2H), the available genome-wide protein–protein interaction (PPI) data have been increasing in recent years. Using mathematical, physical, and artificial-intelligence methods, some researchers in computational biology have focused on uncovering the evolutionary ages of proteins from present-day PPI networks (PINs), but improving the accuracy of these methods has been challenging. A plausible explanation is that they solved biological problems with non-biological techniques and paid little attention to the biological background and meaning of proteins or their relationships. In this paper, we propose two ways to improve the accuracy of age prediction by skillfully "embedding" multi-source biological information in each iteration of an archaeology algorithm for the yeast PIN. On the one hand, we reduce the probability of reversing errors by decreasing the number of non-duplication protein pairs, which are obtained from 460 gene trees constructed by means of multiple sequence alignment and the neighbor-joining algorithm. On the other hand, a reliable crossover standard drawn from different biological information sources can decrease the local random errors of alternative treatment. Applying the novel algorithm to simulation data and real yeast PINs shows a marked improvement in accuracy. Our research strongly suggests that putting non-biological methods into the 'biological context' will yield more favorable results.

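Recently, USLab, in collaboration with colleagues at Embry-Riddle Aeronautical University in the USA, published the paper "An Improved Archaeology Algorithm Based on Integrated Multi-Source Biological Information for Yeast Protein Interaction Network" in the journal IEEE Access. Dr Yuan Zhang is the corresponding author. By "embedding" multi-source biological information, the paper proposes a method for improving the accuracy of protein age prediction. The full text is available here.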

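Abstract: High-throughput interaction detection techniques, typified by tandem affinity purification and yeast two-hybrid, have developed steadily in recent years, and the genome-wide protein-protein interaction data they produce keep growing. Using methods from mathematics, physics, and artificial intelligence, many researchers in computational biology have started from existing protein interaction networks to uncover the evolutionary ages of proteins, but improving the accuracy of protein age prediction is undoubtedly a considerable challenge. One possible reason is that these researchers relied entirely on non-biological techniques or tools to solve purely biological problems, overlooking the biological background of the proteins themselves and of the relationships between them. In this paper, we skillfully "embed" multi-source biological information in two ways, improving the execution efficiency of a classic archaeology algorithm for protein interaction networks and raising the accuracy of protein age prediction. On the one hand, we reduce the error of the algorithm's backtracking operations by reducing the number of protein pairs in a non-duplication relationship; the data on these pairs come from 460 gene trees built with multiple sequence alignment and the NJ algorithm. On the other hand, we use biological information from different perspectives to generate a reliable crossover standard, and use this standard to reduce the random-selection error that arises in each iteration of the algorithm. Numerical experiments show that, on both simulated networks and real protein interaction networks, the new algorithm is markedly more accurate than the old one in determining node ages. Our work demonstrates that applying non-biological techniques within a biological context, or in combination with the necessary biological information, yields more fruitful results.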
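
For readers unfamiliar with the neighbor-joining (NJ) algorithm mentioned in the abstract, the sketch below shows only its pair-selection step: given pairwise distances between sequences, NJ computes a Q value for every pair and joins the pair with the smallest one. This is a generic textbook illustration, not the paper's pipeline; the function name `nj_pick_pair` and the five-sequence toy distance matrix are assumptions made for this example and are not data from the paper.

```python
from itertools import combinations


def nj_pick_pair(names, dist):
    """Return the pair neighbor joining would merge first, and its Q value.

    names: list of sequence labels (hypothetical, for illustration only).
    dist:  symmetric matrix of pairwise distances, dist[i][j] >= 0.
    """
    n = len(names)
    if n < 3:
        raise ValueError("neighbor joining needs at least three sequences")

    # Total distance from each sequence to all others.
    row_sums = [sum(row) for row in dist]

    best_pair, best_q = None, float("inf")
    for i, j in combinations(range(n), 2):
        # Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
        q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
        if q < best_q:
            best_pair, best_q = (names[i], names[j]), q
    return best_pair, best_q


# Toy distances between five hypothetical sequences (not from the paper).
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

pair, q = nj_pick_pair(names, dist)
print(pair, q)  # ('a', 'b') -50
```

A full NJ run would replace the chosen pair with a new internal node, recompute distances to that node, and repeat until the tree is resolved. In practice the distances would come from a multiple sequence alignment rather than a hand-written matrix.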