
An automated workflow by using KNIME Analytical Platform: a case study for modelling and predicting HIV-1 protease inhibitors

Ramtin Ranji, Chanat Thanavanich, Sri Devi Sukumaran, Sila Kittiwachana, Sharifuddin Md Zain, Chee Sun Liew, Vannajan Sanghiran Lee


In this study, we demonstrate an automated workflow, built on the KNIME Analytics Platform, for modelling and predicting potential HIV-1 protease (HIVP) inhibitors. The workflow is reduced to three steps: (1) retrieving inhibitors for the target disease from the ChEMBL database and well-known drugs from the DrugBank database, (2) generating molecular descriptors, and (3) selecting the optimal number of features after training the machine learning models. Our results indicate that the random forest model with the auto prediction validation method is the most reliable, giving the best R² value of 0.9394. The workflow can easily be adapted to other diseases, and the resulting quantitative structure-activity relationship (QSAR) model can accurately predict in silico how chemical modifications might influence biological behaviour. Overall, the automated workflow presented in this study may significantly reduce the time, cost and effort needed to design or develop potential HIVP inhibitors.
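The abstract reports model quality as an R² value and describes choosing the number of features after model training. The library-free Python sketch below illustrates both ideas under stated assumptions; it is not the authors' KNIME workflow, which is assembled from KNIME nodes rather than written as code. The names `r_squared`, `select_feature_count` and the `fit_and_predict` callback are hypothetical, and the sketch assumes the descriptors have already been ranked by importance (for example, by random forest feature importance).

```python
from typing import Callable, List, Sequence, Tuple


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared error of the model's predictions.
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    # Total sum of squares: spread of the observed values around their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def select_feature_count(
    ranked_features: Sequence[str],
    fit_and_predict: Callable[[Sequence[str]], Tuple[List[float], List[float]]],
) -> Tuple[int, float]:
    """Try the top-k ranked features for k = 1..n and keep the k with the best R^2.

    fit_and_predict is a hypothetical callback: given a feature subset, it trains
    a model (e.g. a random forest) and returns (observed, predicted) activity
    values on a validation set.
    """
    best_k, best_r2 = 0, float("-inf")
    for k in range(1, len(ranked_features) + 1):
        observed, predicted = fit_and_predict(ranked_features[:k])
        score = r_squared(observed, predicted)
        if score > best_r2:  # keep the smallest k reaching the best score
            best_k, best_r2 = k, score
    return best_k, best_r2
```

For example, `r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])` gives approximately 0.98. Sweeping k over a ranked list, rather than searching all feature subsets, keeps the number of model fits linear in the number of descriptors.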


Copyright (c) 2019 Vannajan Sanghiran Lee, Ramtin Ranji, Chanat Thanavanich, Sri Devi Sukumaran, Sila Kittiwachana, Sharifuddin Md Zain, Chee Sun Liew

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.