YEARS

2013-2016

AUTHORS

Ajay N Jain

TITLE

Binding-Site Modeling with Multiple-Instance Machine-Learning

ABSTRACT

DESCRIPTION (provided by applicant): This proposal is entitled Binding-Site Modeling with Multiple-Instance Machine-Learning. One of the most challenging and longest studied problems in computer-aided drug design has been affinity prediction of small molecule ligands for their cognate protein targets. Despite decades of work, quantitative structure-activity relationship (QSAR) prediction approaches still suffer from poor accuracy, especially when predicting outside of closely related series of molecules. Even with high-quality structures of target proteins, approaches grounded in physics are also far from robust and accurate enough for reliable use in drug lead optimization. This proposal will build upon a foundation in multiple-instance machine learning applied to computer-aided drug design problems and develop a robust, accurate, and practically applicable affinity prediction methodology. The methodology requires only ligand structures and associated activity data for training, and it induces a virtual protein binding site composed of molecular fragments. The virtual binding pocket (or pocketmol) is used in conjunction with a scoring function developed originally for molecular docking. The pocketmol configuration is chosen such that the optimal conformation and alignment of a ligand (based on the docking scoring function) yields scores for training ligands that are close to the known experimental values. Feasibility has been demonstrated in papers involving both membrane-bound receptors and enzymes. However, multiple challenges remain and are the subject of the proposed research. There are three key issues. First, there exist many pocketmols that satisfy the requirements of fitting the training data, so general solutions must be developed to address the inductive bias of the learning procedure as well as model selection after the procedure.
Second, since any particular model is the product of a learning process, it will have some domain of applicability, with some new molecules likely to be predicted well and others poorly. Further, the model will be better informed by learning with certain new molecules but not others. We must develop solutions for estimating confidence of predictions for new molecules as well as for identifying particular molecules that will be highly informative. Third, the operational application of these methods involves model building, guided chemical synthesis, and iterative refinement of models. Convincing validation will require application on temporal series of molecules synthesized for multiple targets of pharmaceutical interest. The proposed work will develop novel methods to address these challenges and will establish extensive validation on multiple pharmaceutically relevant temporal series of small molecules that were the subject of real-world lead-optimization exercises.
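
The fitting criterion described in the abstract (choose a pocket so that each training ligand's best-scoring pose comes close to its measured activity) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the tuple-based poses and pockets, the `toy_score` function, the squared-error objective, and the made-up numbers are hypothetical and are not the proposal's scoring function or learning procedure.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]  # hypothetical 2-D stand-in for a pose or a pocket


def toy_score(pose: Point, pocket: Point) -> float:
    """Toy stand-in for a docking score: negative squared distance to the pocket."""
    return -sum((p - q) ** 2 for p, q in zip(pose, pocket))


def predicted_affinity(poses: Sequence[Point], pocket: Point,
                       score: Callable[[Point, Point], float] = toy_score) -> float:
    """Multiple-instance prediction: the score of the best pose in the ligand's bag."""
    return max(score(pose, pocket) for pose in poses)


def fit_error(bags: Sequence[Sequence[Point]], activities: Sequence[float],
              pocket: Point) -> float:
    """Sum of squared gaps between best-pose scores and measured activities."""
    return sum((predicted_affinity(poses, pocket) - y) ** 2
               for poses, y in zip(bags, activities))


# Two hypothetical ligands, each represented by a bag of candidate poses.
bags = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(2.0, 0.0), (3.0, 3.0)],
]
activities = [-1.0, -1.0]  # made-up activity values on the toy score scale

# Pick the candidate pocket whose best-pose scores fit the activities most closely.
candidate_pockets = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
best_pocket = min(candidate_pockets, key=lambda p: fit_error(bags, activities, p))
print(best_pocket, fit_error(bags, activities, best_pocket))
```

In this toy setting only one candidate fits exactly. With a richer pocket space, many candidates can fit the training data equally well, which is the model-selection problem the abstract raises as its first key issue.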

FUNDED PUBLICATIONS

  • A structure-guided approach for protein pocket modeling and affinity prediction.
  • Chemical and protein structural basis for biological crosstalk between PPARα and COX enzymes
  • Does your model weigh the same as a duck?
  • Iterative refinement of a binding pocket model: active computational steering of lead optimization.
  • Physical binding pocket induction for affinity prediction.
  • Knowledge-guided docking: accurate prospective prediction of bound configurations of novel ligands using Surflex-Dock
  • Protein function annotation by local binding site surface similarity.

    28 TRIPLES      17 PREDICATES      29 URIs      9 LITERALS

    Subject: grants:e873c361f52eeaa41513dfc641976e7f

    #   Predicate                     Object
    1   sg:abstract                   (abstract text, identical to the ABSTRACT section above)
    2   sg:endYear                    2016
    3   sg:fundingAmount              1162362.0
    4   sg:fundingCurrency            USD
    5   sg:hasContribution            contributions:26eefe1a80d11a51156872136cccc032
    6   sg:hasFieldOfResearchCode     anzsrc-for:08
    7                                 anzsrc-for:0801
    8   sg:hasFundedPublication       articles:2e7fb497e17fc08f1736713785f35a98
    9                                 articles:87fba4af72fd9af971e422284850042e
    10                                articles:8e3a771c86c232c021bf72392386698f
    11                                articles:8f414cbc2571ae433ba733932a0b7642
    12                                articles:93104b6a76aa77165be26833fbeea0b3
    13                                articles:a3379eb2cadac63a39b7beac318ed2d0
    14                                articles:a645d2c66297251f04220d6ed1307d17
    15                                articles:bd4dd9cdbd71977ed1833495bcde89bb
    16                                articles:d06409f4b2d84f202a06483117d2a115
    17                                articles:da31ea99ac0b02fc0e0099f275509516
    18                                articles:f6c8354f948743fa26d303b727fb4ccf
    19  sg:hasFundingOrganization     grid-institutes:grid.280785.0
    20  sg:hasRecipientOrganization   grid-institutes:grid.266102.1
    21  sg:language                   English
    22  sg:license                    http://scigraph.springernature.com/explorer/license/
    23  sg:scigraphId                 e873c361f52eeaa41513dfc641976e7f
    24  sg:startYear                  2013
    25  sg:title                      Binding-Site Modeling with Multiple-Instance Machine-Learning
    26  sg:webpage                    http://projectreporter.nih.gov/project_info_description.cfm?aid=8987578
    27  rdf:type                      sg:Grant
    28  rdfs:label                    Grant: Binding-Site Modeling with Multiple-Instance Machine-Learning

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular JSON format for linked data.

    curl -H 'Accept: application/ld+json' 'http://scigraph.springernature.com/things/grants/e873c361f52eeaa41513dfc641976e7f'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'http://scigraph.springernature.com/things/grants/e873c361f52eeaa41513dfc641976e7f'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'http://scigraph.springernature.com/things/grants/e873c361f52eeaa41513dfc641976e7f'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'http://scigraph.springernature.com/things/grants/e873c361f52eeaa41513dfc641976e7f'
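
    The four curl commands above differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. This is a minimal sketch, not an official client: the helper names `build_request` and `fetch` and the short format labels are assumptions made for illustration, and whether the endpoint still responds is not verified here.

```python
import urllib.request

GRANT_URL = (
    "http://scigraph.springernature.com/things/grants/"
    "e873c361f52eeaa41513dfc641976e7f"
)

# Short labels (hypothetical) mapped to the media types used in the curl commands.
ACCEPT_HEADERS = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = GRANT_URL) -> urllib.request.Request:
    """Build a GET request whose Accept header selects the RDF serialization."""
    if fmt not in ACCEPT_HEADERS:
        raise ValueError(f"unknown format: {fmt!r}")
    return urllib.request.Request(url, headers={"Accept": ACCEPT_HEADERS[fmt]})


def fetch(fmt: str, url: str = GRANT_URL, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

    Calling `fetch("turtle")` would then correspond to the Turtle curl command above; nothing is sent over the network until `fetch` is called.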





