Installing RDKit (a cheminformatics toolkit)

In [ ]:
!pip install rdkit
Requirement already satisfied: rdkit in /usr/local/lib/python3.12/dist-packages (2025.9.1)
Requirement already satisfied: numpy in /usr/local/lib/python3.12/dist-packages (from rdkit) (2.0.2)
Requirement already satisfied: Pillow in /usr/local/lib/python3.12/dist-packages (from rdkit) (11.3.0)

Importing RDKit

In [ ]:
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.Draw import IPythonConsole

Reading and writing MDL Mol files

Let's build a 1-propoxyhexane molecule from its SMILES string.

In [ ]:
theMol = Chem.MolFromSmiles('CCCCCCOCCC')
theMol

Generate an MDL Molfile string (a MolBlock)

In [ ]:
theMolBlock = Chem.MolToMolBlock(theMol)
print(theMolBlock)
     RDKit          2D

 10  9  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.5981   -0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.8971    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.1962   -0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.4952    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    7.7942   -0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    9.0933    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   10.3923   -0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   11.6913    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
  3  4  1  0
  4  5  1  0
  5  6  1  0
  6  7  1  0
  7  8  1  0
  8  9  1  0
  9 10  1  0
M  END
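The MolBlock above uses the fixed-width V2000 layout: three header lines (name, program line, comment), a counts line whose first two 3-character fields give the atom and bond counts, then one line per atom and one line per bond, ended by M  END. The sketch below reads that layout with plain string slicing only to illustrate the format; the dimethyl ether block is a hand-written, hypothetical example, and real code should let Chem.MolFromMolBlock do the parsing.

```python
def parse_v2000(molblock):
    """Return (atoms, bonds) from a V2000 MolBlock using its fixed-width columns."""
    lines = molblock.splitlines()
    # Lines 0-2: name, program/timestamp line, comment; line 3: counts line
    counts = lines[3]
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        # x, y, z occupy three 10-character fields; the symbol sits in columns 32-34
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        atoms.append((line[31:34].strip(), x, y, z))
    bonds = []
    start = 4 + n_atoms
    for line in lines[start:start + n_bonds]:
        # first atom, second atom (both 1-based), bond order
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


# Hypothetical, hand-written block for dimethyl ether (C-O-C)
DIMETHYL_ETHER = "\n".join([
    "dimethyl ether",
    "     RDKit          2D",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.2990    0.7500    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.5981   -0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  2  3  1  0",
    "M  END",
])

atoms, bonds = parse_v2000(DIMETHYL_ETHER)
print([a[0] for a in atoms])  # ['C', 'O', 'C']
print(bonds)                  # [(1, 2, 1), (2, 3, 1)]
```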

Set the molecule name (stored in the _Name property and written as the first line of the MolBlock)

In [ ]:
theMol.SetProp('_Name','1-Propoxyhexane')
print(Chem.MolToMolBlock(theMol))
1-Propoxyhexane
     RDKit          2D

 10  9  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.5981   -0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.8971    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.1962   -0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.4952    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    7.7942   -0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    9.0933    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   10.3923   -0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   11.6913    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
  3  4  1  0
  4  5  1  0
  5  6  1  0
  6  7  1  0
  7  8  1  0
  8  9  1  0
  9 10  1  0
M  END

Save the molecule as an MDL Mol file

In [ ]:
theMolName = '1-Propoxyhexane.mol'
# MolToMolFile writes the MolBlock and closes the file, so it can be read back immediately
Chem.MolToMolFile(theMol, theMolName)

Load the MDL Mol file

In [ ]:
theAnotherMol = Chem.MolFromMolFile(theMolName)
theAnotherMol
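A minimal round-trip check, assuming a temporary directory is an acceptable place to write (the notebook writes to the working directory instead): Chem.MolToMolFile handles opening and closing the file, and Chem.MolFromMolFile restores the _Name property from the header line. Comparing canonical SMILES is one simple way to confirm that the connectivity survived.

```python
import os
import tempfile

from rdkit import Chem

mol = Chem.MolFromSmiles('CCCCCCOCCC')
mol.SetProp('_Name', '1-Propoxyhexane')

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, '1-Propoxyhexane.mol')
    # Write the MolBlock to disk, then parse it back into a new Mol object
    Chem.MolToMolFile(mol, path)
    reloaded = Chem.MolFromMolFile(path)

# The header line comes back as the _Name property
print(reloaded.GetProp('_Name'))
# Identical canonical SMILES means the atoms and bonds were preserved
print(Chem.MolToSmiles(reloaded) == Chem.MolToSmiles(mol))
```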

Reading an invalid structure logs an error message, and the function returns None instead of a Mol object (valence error)

In [ ]:
theInvalidMolecule1 = Chem.MolFromSmiles('CO(C)C')
theInvalidMolecule1 is None
[14:08:05] Explicit valence for atom # 1 O, 3, is greater than permitted
Out[ ]:
True

Reading an invalid structure logs an error message, and the function returns None instead of a Mol object (kekulization error)

In [ ]:
theInvalidMolecule2 = Chem.MolFromSmiles('c1cc1')
theInvalidMolecule2 is None
[14:08:05] Can't kekulize mol.  Unkekulized atoms: 0 1 2
Out[ ]:
True
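When the reason for a failure matters (for example, when cleaning a large input set), one option is to parse with sanitize=False and ask RDKit which checks would fail via Chem.DetectChemistryProblems. The helper name diagnose_smiles is hypothetical; the RDKit calls themselves are standard API.

```python
from rdkit import Chem, RDLogger

# Silence the RDKit error log; problems are reported through the return value instead
RDLogger.DisableLog('rdApp.*')


def diagnose_smiles(smiles):
    """Return (mol or None, list of problem type names) for a SMILES string."""
    # Parse without sanitization so valence/aromaticity errors do not abort parsing
    raw = Chem.MolFromSmiles(smiles, sanitize=False)
    if raw is None:
        return None, ['SMILES syntax error']
    problems = [p.GetType() for p in Chem.DetectChemistryProblems(raw)]
    if problems:
        return None, problems
    # No problems detected, so the normal (sanitized) parse is safe
    return Chem.MolFromSmiles(smiles), []


for smi in ['CCO', 'CO(C)C', 'c1cc1']:
    mol, problems = diagnose_smiles(smi)
    print(smi, mol is not None, problems)
```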

Working with RDKit Mol objects

Count the atoms in the molecule (hydrogens are implicit here, so only the heavy atoms are counted)

In [ ]:
theNumOfAtoms = theMol.GetNumAtoms()
theNumOfAtoms
Out[ ]:
10

Count the bonds (bonds to implicit hydrogens are not counted either)

In [ ]:
theNumOfBonds = theMol.GetNumBonds()
theNumOfBonds
Out[ ]:
9

Add explicit hydrogen atoms to the molecule

In [ ]:
theMolWithHAtoms = Chem.AddHs(theMol)
theMolWithHAtoms
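Implicit hydrogens are not atoms in the graph until AddHs is called, which is why the counts above are 10 atoms and 9 bonds. A short check of the counts before and after, and of the molecular formula (which includes implicit hydrogens either way):

```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

mol = Chem.MolFromSmiles('CCCCCCOCCC')   # 1-propoxyhexane
mol_h = Chem.AddHs(mol)                  # returns a new molecule; mol is unchanged

# Implicit hydrogens are not counted as atoms or bonds
print(mol.GetNumAtoms(), mol.GetNumBonds())      # 10 9
print(mol_h.GetNumAtoms(), mol_h.GetNumBonds())  # 30 29
# The formula already accounts for implicit hydrogens
print(rdMolDescriptors.CalcMolFormula(mol))      # C9H20O
```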

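Generate 3D coordinates for the structure (embedding). Without a fixed random seed, the coordinates differ from run to run.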

In [ ]:
theEmbededMolWithHAtoms = Chem.AddHs(theMol)
AllChem.EmbedMolecule(theEmbededMolWithHAtoms)
print(Chem.MolToMolBlock(theEmbededMolWithHAtoms))
theEmbededMolWithHAtoms
1-Propoxyhexane
     RDKit          3D

 30 29  0  0  0  0  0  0  0  0999 V2000
    5.0450    0.0205   -0.2500 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.6184    0.1887    0.2668 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.7104   -0.7744   -0.4439 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2812   -0.6901   -0.0080 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7155    0.6718   -0.2508 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7113    0.8229    0.1635 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.5880   -0.0201   -0.4723 O   0  0  0  0  0  0  0  0  0  0  0  0
   -2.8602    0.2477    0.0237 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.9197   -0.6224   -0.6033 C   0  0  0  0  0  0  0  0  0  0  0  0
   -5.2430   -0.2291    0.0217 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.2396   -1.0595   -0.3418 H   0  0  0  0  0  0  0  0  0  0  0  0
    5.1990    0.5994   -1.1805 H   0  0  0  0  0  0  0  0  0  0  0  0
    5.7520    0.4428    0.5126 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.3505    1.2424    0.1689 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.6869   -0.0240    1.3564 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.7894   -0.6215   -1.5245 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.0580   -1.7945   -0.1797 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.7221   -1.3924   -0.6787 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.1322   -0.9645    1.0423 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.8959    1.0137   -1.2943 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.2841    1.4153    0.3761 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.9765    1.8994   -0.0534 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7982    0.7251    1.2462 H   0  0  0  0  0  0  0  0  0  0  0  0
   -3.1496    1.3010   -0.1627 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.9214    0.0744    1.1215 H   0  0  0  0  0  0  0  0  0  0  0  0
   -3.9203   -0.5049   -1.7079 H   0  0  0  0  0  0  0  0  0  0  0  0
   -3.7756   -1.7050   -0.3312 H   0  0  0  0  0  0  0  0  0  0  0  0
   -5.5778    0.6775   -0.5523 H   0  0  0  0  0  0  0  0  0  0  0  0
   -5.9903   -1.0452   -0.0312 H   0  0  0  0  0  0  0  0  0  0  0  0
   -5.0484    0.1049    1.0527 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
  3  4  1  0
  4  5  1  0
  5  6  1  0
  6  7  1  0
  7  8  1  0
  8  9  1  0
  9 10  1  0
  1 11  1  0
  1 12  1  0
  1 13  1  0
  2 14  1  0
  2 15  1  0
  3 16  1  0
  3 17  1  0
  4 18  1  0
  4 19  1  0
  5 20  1  0
  5 21  1  0
  6 22  1  0
  6 23  1  0
  8 24  1  0
  8 25  1  0
  9 26  1  0
  9 27  1  0
 10 28  1  0
 10 29  1  0
 10 30  1  0
M  END

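Because EmbedMolecule uses a random seed by default, the 3D coordinates printed above will not be reproduced exactly on another run. A sketch of a reproducible embedding, assuming ETKDGv3 parameters and an arbitrary seed of 42 are acceptable; the return value is checked because embedding can fail (it returns -1 in that case).

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.AddHs(Chem.MolFromSmiles('CCCCCCOCCC'))

# ETKDGv3 parameters with a fixed seed make the coordinates reproducible between runs
params = AllChem.ETKDGv3()
params.randomSeed = 42

conf_id = AllChem.EmbedMolecule(mol, params)
if conf_id == -1:
    raise RuntimeError('embedding failed')

# Optional MMFF94 clean-up; 0 means converged, 1 means more iterations were needed
status = AllChem.MMFFOptimizeMolecule(mol, maxIters=200)

conf = mol.GetConformer(conf_id)
print(conf_id, conf.Is3D(), status)
```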
In [ ]:
!pip install py3Dmol
Collecting py3Dmol
  Downloading py3dmol-2.5.3-py2.py3-none-any.whl.metadata (2.1 kB)
Downloading py3dmol-2.5.3-py2.py3-none-any.whl (7.2 kB)
Installing collected packages: py3Dmol
Successfully installed py3Dmol-2.5.3
In [ ]:
import py3Dmol

def show3DMol(theMol, style='stick'):
    mblock = Chem.MolToMolBlock(theMol)

    view = py3Dmol.view(width=400, height=400)
    view.addModel(mblock, 'mol')
    view.setStyle({style:{}})
    view.zoomTo()
    view.show()

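def show3DMolWithOptimization(theMol, style='stick'):
    # AddHs returns a new molecule, so the caller's molecule is not modified
    mol = Chem.AddHs(theMol)
    # EmbedMolecule returns the new conformer id, or -1 if embedding failed;
    # a fixed randomSeed makes the geometry reproducible
    if AllChem.EmbedMolecule(mol, randomSeed=42) == -1:
        raise ValueError('3D embedding failed for this molecule')
    # Clean up the embedded geometry with the MMFF94 force field
    AllChem.MMFFOptimizeMolecule(mol, maxIters=200)
    mblock = Chem.MolToMolBlock(mol)

    view = py3Dmol.view(width=400, height=400)
    view.addModel(mblock, 'mol')
    view.setStyle({style:{}})
    view.zoomTo()
    view.show()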
In [ ]:
show3DMol(theMolWithHAtoms)

In [ ]:
show3DMol(theEmbededMolWithHAtoms)

In [ ]:
show3DMolWithOptimization(theEmbededMolWithHAtoms)

In [ ]:
show3DMolWithOptimization(theMolWithHAtoms)

Generate 2D coordinates for the structure

In [ ]:
AllChem.Compute2DCoords(theMolWithHAtoms)
print(Chem.MolToMolBlock(theMolWithHAtoms))
theMolWithHAtoms
1-Propoxyhexane
     RDKit          2D

 30 29  0  0  0  0  0  0  0  0999 V2000
   -6.0666    0.8406    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.6309    0.4061    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.1952   -0.0283    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.7595   -0.4628    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3238   -0.8973    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.1119   -1.3318    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.5476   -1.7663    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.6417   -0.7402    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.7358    0.2860    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.8300    1.3121    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -7.5023    1.2751    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -5.6321    2.2763    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -6.5011   -0.5951    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -5.0654   -1.0296    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -4.1964    1.8418    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.7607    1.4073    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -3.6297   -1.4640    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.1940   -1.8985    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.3250    0.9729    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.1107    0.5384    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7583   -2.3330    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.6774   -2.7675    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.5464    0.1039    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.6156    0.3540    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.6678   -1.8343    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.7097    1.3801    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    5.7620   -0.8082    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    6.9241    2.3382    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    6.8561    0.2179    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.8039    2.4062    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
  3  4  1  0
  4  5  1  0
  5  6  1  0
  6  7  1  0
  7  8  1  0
  8  9  1  0
  9 10  1  0
  1 11  1  0
  1 12  1  0
  1 13  1  0
  2 14  1  0
  2 15  1  0
  3 16  1  0
  3 17  1  0
  4 18  1  0
  4 19  1  0
  5 20  1  0
  5 21  1  0
  6 22  1  0
  6 23  1  0
  8 24  1  0
  8 25  1  0
  9 26  1  0
  9 27  1  0
 10 28  1  0
 10 29  1  0
 10 30  1  0
M  END


Remove the hydrogen atoms

In [ ]:
theMol2 = Chem.RemoveHs(theMolWithHAtoms)
print(Chem.MolToMolBlock(theMol2))
theMol2
1-Propoxyhexane
     RDKit          2D

 10  9  0  0  0  0  0  0  0  0999 V2000
   -6.0666    0.8406    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.6309    0.4061    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.1952   -0.0283    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.7595   -0.4628    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3238   -0.8973    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.1119   -1.3318    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.5476   -1.7663    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.6417   -0.7402    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.7358    0.2860    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.8300    1.3121    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
  3  4  1  0
  4  5  1  0
  5  6  1  0
  6  7  1  0
  7  8  1  0
  8  9  1  0
  9 10  1  0
M  END


Working with atoms and bonds

Get an individual Atom object

In [ ]:
theFirstAtomOfMol = theMol.GetAtomWithIdx(0)
theFirstAtomOfMol
Out[ ]:
<rdkit.Chem.rdchem.Atom at 0x7b0298bb2c00>
In [ ]:
theFirstAtomOfMol.GetAtomicNum()
Out[ ]:
6
In [ ]:
theFirstAtomOfMol.GetMass()
Out[ ]:
12.011
In [ ]:
theFirstAtomOfMol.GetSymbol()
Out[ ]:
'C'
In [ ]:
theNeighbors = theFirstAtomOfMol.GetNeighbors()
theNeighbors
Out[ ]:
(<rdkit.Chem.rdchem.Atom at 0x7b0298b28580>,)

Print the atomic number and element symbol of each atom

In [ ]:
#GetAtoms()

for index, ithAtom in enumerate(theMolWithHAtoms.GetAtoms()):
  print(str(index+1).zfill(2), 'atomic number: {0}, symbol: {1}'.format(ithAtom.GetAtomicNum(), ithAtom.GetSymbol()))
01 atomic number: 6, symbol: C
02 atomic number: 6, symbol: C
03 atomic number: 6, symbol: C
04 atomic number: 6, symbol: C
05 atomic number: 6, symbol: C
06 atomic number: 6, symbol: C
07 atomic number: 8, symbol: O
08 atomic number: 6, symbol: C
09 atomic number: 6, symbol: C
10 atomic number: 6, symbol: C
11 atomic number: 1, symbol: H
12 atomic number: 1, symbol: H
13 atomic number: 1, symbol: H
14 atomic number: 1, symbol: H
15 atomic number: 1, symbol: H
16 atomic number: 1, symbol: H
17 atomic number: 1, symbol: H
18 atomic number: 1, symbol: H
19 atomic number: 1, symbol: H
20 atomic number: 1, symbol: H
21 atomic number: 1, symbol: H
22 atomic number: 1, symbol: H
23 atomic number: 1, symbol: H
24 atomic number: 1, symbol: H
25 atomic number: 1, symbol: H
26 atomic number: 1, symbol: H
27 atomic number: 1, symbol: H
28 atomic number: 1, symbol: H
29 atomic number: 1, symbol: H
30 atomic number: 1, symbol: H
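Beyond element identity, each Atom exposes its connectivity. A small sketch using GetDegree (explicit neighbours) and GetTotalNumHs (including implicit hydrogens), plus an element tally with collections.Counter:

```python
from collections import Counter

from rdkit import Chem

mol = Chem.MolFromSmiles('CCCCCCOCCC')  # 1-propoxyhexane, hydrogens implicit

for atom in mol.GetAtoms():
    # GetDegree counts explicit neighbours; GetTotalNumHs includes implicit hydrogens
    print(atom.GetIdx(), atom.GetSymbol(), atom.GetDegree(), atom.GetTotalNumHs())

# Element tally after making the hydrogens explicit
element_counts = Counter(a.GetSymbol() for a in Chem.AddHs(mol).GetAtoms())
print(element_counts)
```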

Get an individual Bond object

In [ ]:
theFirstBond = theMol.GetBondWithIdx(0)
theFirstBond
Out[ ]:
<rdkit.Chem.rdchem.Bond at 0x7b0298b29540>
In [ ]:
theFirstBond.GetBeginAtomIdx()
Out[ ]:
0
In [ ]:
theFirstBond.GetEndAtomIdx()
Out[ ]:
1
In [ ]:
theFirstBond.GetBondType()
Out[ ]:
rdkit.Chem.rdchem.BondType.SINGLE

Print bond information (atom indices are 0-based)

In [ ]:
#GetBonds()

for index, ithBond in enumerate(theMolWithHAtoms.GetBonds()):
  print(str(index+1).zfill(2), '\tbegin: {0}, end: {1}, Type: {2}'.format(
      str(ithBond.GetBeginAtomIdx()).zfill(2),
      str(ithBond.GetEndAtomIdx()).zfill(2),
      ithBond.GetBondType()))
01 	begin: 00, end: 01, Type: SINGLE
02 	begin: 01, end: 02, Type: SINGLE
03 	begin: 02, end: 03, Type: SINGLE
04 	begin: 03, end: 04, Type: SINGLE
05 	begin: 04, end: 05, Type: SINGLE
06 	begin: 05, end: 06, Type: SINGLE
07 	begin: 06, end: 07, Type: SINGLE
08 	begin: 07, end: 08, Type: SINGLE
09 	begin: 08, end: 09, Type: SINGLE
10 	begin: 00, end: 10, Type: SINGLE
11 	begin: 00, end: 11, Type: SINGLE
12 	begin: 00, end: 12, Type: SINGLE
13 	begin: 01, end: 13, Type: SINGLE
14 	begin: 01, end: 14, Type: SINGLE
15 	begin: 02, end: 15, Type: SINGLE
16 	begin: 02, end: 16, Type: SINGLE
17 	begin: 03, end: 17, Type: SINGLE
18 	begin: 03, end: 18, Type: SINGLE
19 	begin: 04, end: 19, Type: SINGLE
20 	begin: 04, end: 20, Type: SINGLE
21 	begin: 05, end: 21, Type: SINGLE
22 	begin: 05, end: 22, Type: SINGLE
23 	begin: 07, end: 23, Type: SINGLE
24 	begin: 07, end: 24, Type: SINGLE
25 	begin: 08, end: 25, Type: SINGLE
26 	begin: 08, end: 26, Type: SINGLE
27 	begin: 09, end: 27, Type: SINGLE
28 	begin: 09, end: 28, Type: SINGLE
29 	begin: 09, end: 29, Type: SINGLE
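1-Propoxyhexane has only single bonds, so the loop above only ever prints SINGLE. For an aromatic molecule such as 1-phenylethanol (used in the next section), the sanitized bond types include AROMATIC, even though a MolBlock writes the ring with alternating 1/2 orders. The bonds can also be turned into an adjacency list:

```python
from collections import Counter

from rdkit import Chem

mol = Chem.MolFromSmiles('CC(O)c1ccccc1')  # 1-phenylethanol

# Sanitization perceives aromaticity, so the six ring bonds are typed AROMATIC
bond_types = Counter(str(b.GetBondType()) for b in mol.GetBonds())
print(bond_types)

# Adjacency list built from the bonds (0-based atom indices)
adjacency = {a.GetIdx(): [] for a in mol.GetAtoms()}
for b in mol.GetBonds():
    i, j = b.GetBeginAtomIdx(), b.GetEndAtomIdx()
    adjacency[i].append(j)
    adjacency[j].append(i)
print(adjacency[1])  # neighbours of the carbinol carbon
```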

Working with SMILES

  • Representing chirality (the @ / @@ marks in SMILES)
In [ ]:
theChiralMol = Chem.MolFromSmiles('C[C@H](O)c1ccccc1')
print(Chem.MolToMolBlock(theChiralMol))
theChiralMol
     RDKit          2D

  9  9  0  0  0  0  0  0  0  0999 V2000
    3.7500   -1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7500   -1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7500   -1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7500    1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7500    1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  1
  2  3  1  0
  2  4  1  0
  4  5  2  0
  5  6  1  0
  6  7  2  0
  7  8  1  0
  8  9  2  0
  9  4  1  0
M  END

In [ ]:
show3DMol(theChiralMol)

In [ ]:
show3DMolWithOptimization(theChiralMol)

Removing chirality (write non-isomeric SMILES and parse it again)

In [ ]:
theRemovedChiralMolSmiles = Chem.MolToSmiles(theChiralMol,isomericSmiles=False)
theRemovedChiralMol = Chem.MolFromSmiles(theRemovedChiralMolSmiles)
print(Chem.MolToMolBlock(theRemovedChiralMol))
print(theRemovedChiralMolSmiles)
theRemovedChiralMol
     RDKit          2D

  9  9  0  0  0  0  0  0  0  0999 V2000
    3.7500   -1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7500   -1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7500   -1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7500    1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7500    1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
  2  4  1  0
  4  5  2  0
  5  6  1  0
  6  7  2  0
  7  8  1  0
  8  9  2  0
  9  4  1  0
M  END

CC(O)c1ccccc1
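The chiral molecule above, C[C@H](O)c1ccccc1, is (S)-1-phenylethanol. Chem.FindMolChiralCenters reports the CIP label of each stereocentre, and Chem.RemoveStereochemistry strips stereo information in place, an alternative to the non-isomeric SMILES round trip just shown. A sketch:

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('C[C@H](O)c1ccccc1')

# (atom index, CIP label) for every assigned stereocentre
print(Chem.FindMolChiralCenters(mol))

# Strip stereo flags in place, working on a copy so the chiral molecule is kept
flat = Chem.Mol(mol)
Chem.RemoveStereochemistry(flat)
print(Chem.MolToSmiles(mol), Chem.MolToSmiles(flat))
```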
  • MolToSmiles returns canonical SMILES by default, so different input SMILES for the same molecule produce the same output
In [ ]:
print(Chem.MolToSmiles(Chem.MolFromSmiles('C1=CC=CN=C1')))
Chem.MolFromSmiles('C1=CC=CN=C1')
c1ccncc1
In [ ]:
print(Chem.MolToSmiles(Chem.MolFromSmiles('c1cccnc1')))
Chem.MolFromSmiles('c1cccnc1')
c1ccncc1
In [ ]:
print(Chem.MolToSmiles(Chem.MolFromSmiles('n1ccccc1')))
Chem.MolFromSmiles('n1ccccc1')
c1ccncc1
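Because canonical SMILES are unique per molecule (for a given RDKit version), they can serve as a key for spotting duplicates written in different ways. A sketch that groups inputs by canonical form; the helper canonical is hypothetical, and toluene is added as a non-duplicate:

```python
from rdkit import Chem


def canonical(smiles):
    """Canonical SMILES, or None if RDKit cannot parse the input."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None


inputs = ['C1=CC=CN=C1', 'c1cccnc1', 'n1ccccc1', 'Cc1ccccc1']

# Group the input strings by their canonical form
groups = {}
for smi in inputs:
    groups.setdefault(canonical(smi), []).append(smi)

for key, members in groups.items():
    print(key, members)
```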

Reading MDL SD files (reading sets of molecules)

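An MDL SD file bundles multiple Mol blocks, each followed by optional data fields (molecular properties) and terminated by a line containing $$$$.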

In [ ]:
from urllib.request import urlopen
theSdfUrl = 'https://raw.githubusercontent.com/youngmook/cheminfo-python/main/in-stock%2Bfor-sale.sdf'

with urlopen(theSdfUrl) as theStream:
  theSdf = theStream.read().decode()
  pass

print(theSdf.split('$$$$')[0])
     RDKit          3D

 25 28  0  0  0  0  0  0  0  0999 V2000
   -1.9187   -1.7530    0.7656 O   0  0  0  0  0  0  0  0  0  0  0  0
   -2.5058   -0.7929    0.2316 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.9590   -0.8400    0.1022 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.7112    0.2651   -0.1854 C   0  0  0  0  0  0  0  0  0  0  0  0
   -6.0778    0.2194   -0.3091 C   0  0  0  0  0  0  0  0  0  0  0  0
   -6.7477   -0.9842   -0.1404 C   0  0  0  0  0  0  0  0  0  0  0  0
   -6.0009   -2.1023    0.1500 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.6121   -2.0397    0.2720 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.8376    0.3831   -0.2808 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.3843    1.6333   -0.3153 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.7124    2.7450   -0.8045 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.4384    2.5527   -1.2690 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.1658    1.3305   -1.2621 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.5292    0.2366   -0.7684 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0654   -0.9881   -0.7639 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.2716   -1.4549   -1.1554 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.4622   -0.8591   -0.4642 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.5676   -0.3829   -1.0172 N   0  0  0  0  0  0  0  0  0  0  0  0
    4.5252    0.0899   -0.2664 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.7454    0.6131   -0.7337 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.6695    1.0678    0.1409 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.5293    1.0695    1.5033 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.3151    0.5506    1.9759 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.3407    0.0729    1.0865 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.7774   -0.6324    1.1998 S   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0
  2  3  1  0
  3  4  2  0
  4  5  1  0
  5  6  2  0
  6  7  1  0
  7  8  2  0
  2  9  1  0
  9 10  2  0
 10 11  1  0
 11 12  2  0
 12 13  1  0
 13 14  2  0
 14 15  1  0
 15 16  1  0
 16 17  1  0
 17 18  2  0
 18 19  1  0
 19 20  2  0
 20 21  1  0
 21 22  2  0
 22 23  1  0
 23 24  2  0
 24 25  1  0
  8  3  1  0
 14  9  1  0
 25 17  1  0
 24 19  1  0
M  END
>  <zinc_id>  (1) 
ZINC000000035284

>  <smiles>  (1) 
O=C(c1ccccc1)c1ccccc1OCc1nc2ccccc2s1


In [ ]:
with open('in-stock+for-sale.sdf', 'w') as theWriter:
  theWriter.write(theSdf)
  pass
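The record printed above shows the SD data-field convention: a header line such as >  <zinc_id>  (1), the value on the following line(s), a blank line, and $$$$ closing the record. A pure-Python sketch that extracts those fields, for illustration only (Chem.SDMolSupplier handles this, together with the Mol block, in the next cells); the sample record is hypothetical and has an empty Mol block.

```python
import re

# Matches a data header such as ">  <zinc_id>  (1)" and captures the field name
HEADER = re.compile(r">.*<([^>]+)>")


def read_sdf_fields(sdf_text):
    """Return one {field: value} dict per record in an SD file string."""
    records = []
    for block in sdf_text.split("$$$$"):
        if not block.strip():
            continue  # whitespace after the final $$$$ delimiter
        lines = block.splitlines()
        fields = {}
        i = 0
        while i < len(lines):
            match = HEADER.match(lines[i])
            if match:
                values = []
                i += 1
                # A field value runs until the next blank line
                while i < len(lines) and lines[i].strip():
                    values.append(lines[i])
                    i += 1
                fields[match.group(1)] = "\n".join(values)
            i += 1
        records.append(fields)
    return records


sample = (
    "example\n     RDKit          2D\n\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
    ">  <zinc_id>  (1) \nZINC000000035284\n\n"
    ">  <smiles>  (1) \nO=C(c1ccccc1)c1ccccc1OCc1nc2ccccc2s1\n\n"
    "$$$$\n"
)
print(read_sdf_fields(sample))
```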
In [ ]:
theSDMolSupplier = Chem.SDMolSupplier('in-stock+for-sale.sdf')

theZincMolList = []

for ithMol in theSDMolSupplier :
  theZincMolList.append(ithMol)
  pass

theZincMolList[0:10]
Out[ ]:
[<rdkit.Chem.rdchem.Mol at 0x7b0298b2adc0>,
 <rdkit.Chem.rdchem.Mol at 0x7b0298b2ae30>,
 <rdkit.Chem.rdchem.Mol at 0x7b0298b2af10>,
 <rdkit.Chem.rdchem.Mol at 0x7b0298b2af80>,
 <rdkit.Chem.rdchem.Mol at 0x7b0298b2aff0>,
 <rdkit.Chem.rdchem.Mol at 0x7b0298b2b140>,
 <rdkit.Chem.rdchem.Mol at 0x7b0298b2b1b0>,
 <rdkit.Chem.rdchem.Mol at 0x7b0298b2b220>,
 <rdkit.Chem.rdchem.Mol at 0x7b0298b2b290>,
 <rdkit.Chem.rdchem.Mol at 0x7b0298b2b370>]
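Chem.SDMolSupplier yields None for any record it cannot parse, and the loop above appends such entries unchanged, so a later GetProp call would fail on them. A sketch that writes a small SD file with Chem.SDWriter and reads it back while filtering out None; the DEMO-1/DEMO-2 IDs and the temporary file are hypothetical.

```python
import os
import tempfile

from rdkit import Chem

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.sdf')

    # Write two molecules, each carrying a zinc_id data field
    writer = Chem.SDWriter(path)
    for mol_id, smi in [('DEMO-1', 'c1ccccc1O'), ('DEMO-2', 'CC(=O)O')]:
        mol = Chem.MolFromSmiles(smi)
        mol.SetProp('zinc_id', mol_id)   # written as a ">  <zinc_id>" field
        writer.write(mol)
    writer.close()

    # Keep only the records that were parsed successfully
    supplier = Chem.SDMolSupplier(path)
    mols = [m for m in supplier if m is not None]
    ids = [m.GetProp('zinc_id') for m in mols]

print(len(mols), ids)
```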
In [ ]:
print(theZincMolList[0].GetProp("zinc_id"))
ZINC000000035284
In [ ]:
print(Chem.MolToMolBlock(theZincMolList[0]))
     RDKit          3D

 25 28  0  0  0  0  0  0  0  0999 V2000
   -1.9187   -1.7530    0.7656 O   0  0  0  0  0  0  0  0  0  0  0  0
   -2.5058   -0.7929    0.2316 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.9590   -0.8400    0.1022 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.7112    0.2651   -0.1854 C   0  0  0  0  0  0  0  0  0  0  0  0
   -6.0778    0.2194   -0.3091 C   0  0  0  0  0  0  0  0  0  0  0  0
   -6.7477   -0.9842   -0.1404 C   0  0  0  0  0  0  0  0  0  0  0  0
   -6.0009   -2.1023    0.1500 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.6121   -2.0397    0.2720 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.8376    0.3831   -0.2808 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.3843    1.6333   -0.3153 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.7124    2.7450   -0.8045 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.4384    2.5527   -1.2690 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.1658    1.3305   -1.2621 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.5292    0.2366   -0.7684 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0654   -0.9881   -0.7639 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.2716   -1.4549   -1.1554 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.4622   -0.8591   -0.4642 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.5676   -0.3829   -1.0172 N   0  0  0  0  0  0  0  0  0  0  0  0
    4.5252    0.0899   -0.2664 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.7454    0.6131   -0.7337 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.6695    1.0678    0.1409 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.5293    1.0695    1.5033 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.3151    0.5506    1.9759 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.3407    0.0729    1.0865 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.7774   -0.6324    1.1998 S   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0
  2  3  1  0
  3  4  2  0
  4  5  1  0
  5  6  2  0
  6  7  1  0
  7  8  2  0
  2  9  1  0
  9 10  2  0
 10 11  1  0
 11 12  2  0
 12 13  1  0
 13 14  2  0
 14 15  1  0
 15 16  1  0
 16 17  1  0
 17 18  2  0
 18 19  1  0
 19 20  2  0
 20 21  1  0
 21 22  2  0
 22 23  1  0
 23 24  2  0
 24 25  1  0
  8  3  1  0
 14  9  1  0
 25 17  1  0
 24 19  1  0
M  END

In [ ]:
theZincMolList[0]
Out[ ]:
zinc_id: ZINC000000035284
smiles: O=C(c1ccccc1)c1ccccc1OCc1nc2ccccc2s1
In [ ]:
show3DMol(theZincMolList[0])

In [ ]:
show3DMolWithOptimization(theZincMolList[0])

Creating molecule image files

In [ ]:
from rdkit.Chem import Draw
In [ ]:
Draw.MolToFile(theZincMolList[0], 'zinc-001.png')
theZincMolList[0]
Out[ ]:
zinc_id: ZINC000000035284
smiles: O=C(c1ccccc1)c1ccccc1OCc1nc2ccccc2s1

Compute 2D depiction coordinates (on a copy, so the original 3D coordinates are kept) and save the image

In [ ]:
import copy
theFirstZincMol = copy.deepcopy(theZincMolList[0])
AllChem.Compute2DCoords(theFirstZincMol)
Out[ ]:
0
In [ ]:
Draw.MolToFile(theFirstZincMol, 'zinc-001-2D.png')
theFirstZincMol
Out[ ]:
zinc_id: ZINC000000035284
smiles: O=C(c1ccccc1)c1ccccc1OCc1nc2ccccc2s1

Save multiple molecules as a grid image

In [ ]:
theGridImage = Draw.MolsToGridImage(theZincMolList[:8],molsPerRow=4,subImgSize=(200,200),legends=[x.GetProp("zinc_id") for x in theZincMolList[:8]], returnPNG=False)
theGridImage.save('zinc-grid-001-008.png')
theGridImage
In [ ]:
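# Compute2DCoords replaces each molecule's 3D conformer in place;
# make copies first (as with copy.deepcopy above) if the 3D coordinates are still needed
for ithMol in theZincMolList:
  AllChem.Compute2DCoords(ithMol)
  pass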

theGridImage = Draw.MolsToGridImage(theZincMolList[:8],molsPerRow=4,subImgSize=(200,200),legends=[x.GetProp("zinc_id") for x in theZincMolList[:8]], returnPNG=False)
theGridImage.save('zinc-grid-001-008-2D.png')
theGridImage

Find the compounds that contain a substructure and save them as an image (aligned to the common core)

In [ ]:
theCommonCoreMol = Chem.MolFromSmiles('Nc1ccccc1')
theCommonCoreMol
In [ ]:
#theSubZincMolList = [x for x in theZincMolList if x.HasSubstructMatch(theCommonCoreMol)]

theSubMatchedMolList = []
for ithMol in theZincMolList:
  if (ithMol.HasSubstructMatch(theCommonCoreMol)):
    theSubMatchedMolList.append(ithMol)
    pass
  pass

print('# of total molecule list : ' + str(len(theZincMolList)))
print('# of matched molecules : ' + str(len(theSubMatchedMolList)))
# of total molecule list : 100
# of matched molecules : 48
In [ ]:
AllChem.Compute2DCoords(theCommonCoreMol)

for ithMatchedMol in theSubMatchedMolList:
  _ = AllChem.GenerateDepictionMatching2DStructure(ithMatchedMol,theCommonCoreMol)
In [ ]:
# Slice the legends the same way as the molecules so each drawn molecule gets its own ID
theMatchedGridImage = Draw.MolsToGridImage(theSubMatchedMolList[:12],molsPerRow=4,subImgSize=(300,300),legends=[x.GetProp("zinc_id") for x in theSubMatchedMolList[:12]], returnPNG=False)
theMatchedGridImage.save('zinc-matched-grid.png')
theMatchedGridImage

Substructure search

In [ ]:
theMolecule = Chem.MolFromSmiles('c1ccccc1O')
theMolecule
In [ ]:
thePattern = Chem.MolFromSmarts('ccO')
thePattern
In [ ]:
theMolecule.HasSubstructMatch(thePattern)
Out[ ]:
True
In [ ]:
theMolecule.GetSubstructMatch(thePattern)
Out[ ]:
(0, 5, 6)
In [ ]:
theMolecule.GetSubstructMatches(thePattern)
Out[ ]:
((0, 5, 6), (4, 5, 6))
In [ ]:
theMatchedMolList = []
for ithZincMol in theZincMolList:
  if ithZincMol.HasSubstructMatch(thePattern):
    theMatchedMolList.append(ithZincMol)

print(len(theMatchedMolList))
41
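The SMARTS ccO matches any aliphatic oxygen bonded to an aromatic carbon, so the 41 hits above can include aryl ethers as well as phenols; the first ZINC record, for example, matches through its ether oxygen without having an OH group. If only phenolic OH groups are wanted, c[OX2H] (an oxygen with two connections, one of them a hydrogen) is one stricter pattern:

```python
from rdkit import Chem

loose = Chem.MolFromSmarts('ccO')        # any O on an aromatic carbon
phenol = Chem.MolFromSmarts('c[OX2H]')   # O with exactly one H, bonded to an aromatic carbon

examples = [
    ('phenol', 'Oc1ccccc1'),
    ('anisole', 'COc1ccccc1'),
    ('ZINC000000035284', 'O=C(c1ccccc1)c1ccccc1OCc1nc2ccccc2s1'),
]
for name, smi in examples:
    mol = Chem.MolFromSmiles(smi)
    print(name, mol.HasSubstructMatch(loose), mol.HasSubstructMatch(phenol))
```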
In [ ]:
for ithMol in theMatchedMolList:
  AllChem.Compute2DCoords(ithMol)
  pass

theGridImage = Draw.MolsToGridImage(theMatchedMolList,molsPerRow=6,subImgSize=(200,200),legends=[x.GetProp("zinc_id") for x in theMatchedMolList], returnPNG=False)
theGridImage.save('zinc-substr-matched-grid-2D.png')
theGridImage

Chemical Transformations

Substructure-based Transformations

  • Deleting substructure
In [ ]:
theMol = Chem.MolFromSmiles('CC(=O)O')
theMol
In [ ]:
thePattern = Chem.MolFromSmarts('C(=O)[OH]')
thePattern
In [ ]:
theRemovedMol = AllChem.DeleteSubstructs(theMol,thePattern)
theRemovedMol
  • Replacing substructure
In [ ]:
theReplaceMol = Chem.MolFromSmiles('OC')
theReplaceMol
In [ ]:
thePattern = Chem.MolFromSmarts('[$(NC(=O))]')
thePattern
In [ ]:
theMol = Chem.MolFromSmiles('CC(=O)N')
theMol
In [ ]:
AllChem.ReplaceSubstructs(theMol,thePattern,theReplaceMol)[0]
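Both transformation functions return new molecules rather than modifying their input, and ReplaceSubstructs returns a tuple of products (one per match unless replaceAll=True). Converting the results to SMILES makes the outcome easy to check: deleting the carboxylic acid group from acetic acid leaves methane, and replacing the amide nitrogen of acetamide with a methoxy group gives methyl acetate.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

acid = Chem.MolFromSmiles('CC(=O)O')
amide = Chem.MolFromSmiles('CC(=O)N')

# Remove every match of the carboxylic acid pattern
deleted = AllChem.DeleteSubstructs(acid, Chem.MolFromSmarts('C(=O)[OH]'))

# Replace the amide N; the first atom of the replacement (O) takes its place
products = AllChem.ReplaceSubstructs(amide,
                                     Chem.MolFromSmarts('[$(NC(=O))]'),
                                     Chem.MolFromSmiles('OC'))

print(Chem.MolToSmiles(deleted))       # C
print(Chem.MolToSmiles(products[0]))   # COC(C)=O
# The input molecules are left untouched
print(Chem.MolToSmiles(acid), Chem.MolToSmiles(amide))
```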

Fingerprinting and Molecular Similarity

In [ ]:
from rdkit import DataStructs

Compute the similarity between the first molecule and each of the others (Tanimoto similarity of RDKit fingerprints)

In [ ]:
theFingerprintList = [Chem.RDKFingerprint(x) for x in theZincMolList]
for idx, ithFingerprint in enumerate(theFingerprintList):
  if idx == 0 : continue
  ithSimilarity = DataStructs.FingerprintSimilarity(theFingerprintList[0], theFingerprintList[idx])
  print(idx, ithSimilarity)
1 0.16019417475728157
2 0.32099758648431215
3 0.23923923923923923
4 0.17231075697211157
5 0.24149659863945577
6 0.29102384291725103
7 0.20123565754633715
8 0.1726479146459748
9 0.2333984375
10 0.21395348837209302
11 0.3170266836086404
12 0.24427480916030533
13 0.24572317262830481
14 0.14798694232861806
15 0.1914257228315055
16 0.2785425101214575
17 0.3434547908232119
18 0.30604982206405695
19 0.32371794871794873
20 0.3335419274092616
21 0.2857142857142857
22 0.24226415094339623
23 0.3796825396825397
24 0.24062772449869224
25 0.3014065639651708
26 0.21099290780141844
27 0.22090059473237042
28 0.30455153949129854
29 0.3102766798418972
30 0.2858187134502924
31 0.34825174825174826
32 0.2917547568710359
33 0.2869496855345912
34 0.29353562005277045
35 0.2564575645756458
36 0.3215767634854772
37 0.3315614617940199
38 0.27180966113914923
39 0.29862306368330466
40 0.33941605839416056
41 0.2719298245614035
42 0.267221801665405
43 0.3039426523297491
44 0.3253652058432935
45 0.3108359133126935
46 0.2939501779359431
47 0.2525096525096525
48 0.24326833797585887
49 0.29554043839758126
50 0.32117920868890615
51 0.29735849056603775
52 0.29183187946074546
53 0.34238683127572017
54 0.21972318339100347
55 0.24272727272727274
56 0.24606462303231152
57 0.32234432234432236
58 0.058385093167701865
59 0.252129471890971
60 0.31554677206851117
61 0.3116076970825574
62 0.30514939605848695
63 0.2858176555716353
64 0.223717409587889
65 0.36026200873362446
66 0.30659767141009053
67 0.28471683475562454
68 0.3298835705045278
69 0.3027834351663272
70 0.3059467918622848
71 0.30463096960926195
72 0.2531752751905165
73 0.2899860917941586
74 0.30421909696521093
75 0.28652886671418387
76 0.2664601084430674
77 0.2745961820851689
78 0.29927007299270075
79 0.3292367399741268
80 0.2841880341880342
81 0.29423328964613366
82 0.29322813938198555
83 0.2669456066945607
84 0.31485148514851485
85 0.2791762013729977
86 0.25040387722132473
87 0.29968203497615264
88 0.2783053323593864
89 0.21894005212858383
90 0.19038817005545286
91 0.20425138632162662
92 0.19718309859154928
93 0.2281144781144781
94 0.13908872901678657
95 0.10610932475884244
96 0.10588235294117647
97 0.22489626556016598
98 0.11818181818181818
99 0.3115610711952972
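DataStructs.FingerprintSimilarity uses the Tanimoto coefficient by default: the number of bits set in both fingerprints divided by the number set in either. The pure-Python sketch below computes it on sets of on-bit indices and ranks a small, hypothetical library against a query. With RDKit fingerprints, DataStructs.BulkTanimotoSimilarity(theFingerprintList[0], theFingerprintList[1:]) should give the same values as the loop above in a single call.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient |A & B| / |A | B| for two sets of on-bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def rank_by_similarity(query_bits, library):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [(name, tanimoto(query_bits, bits)) for name, bits in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical on-bit sets standing in for real fingerprints
query = {1, 4, 7, 9}
library = {'mol_a': {1, 4, 7, 10}, 'mol_b': {2, 3}, 'mol_c': {1, 4, 7, 9}}
print(rank_by_similarity(query, library))
# [('mol_c', 1.0), ('mol_a', 0.6), ('mol_b', 0.0)]
```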

Drawing fingerprint bits

In [ ]:
from rdkit.Chem import Draw
mol = Chem.MolFromSmiles('c1ccccc1CC1CC1')
bi = {}
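# Legacy API: recent RDKit versions emit a deprecation warning and point to the MorganGenerator API
fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, bitInfo=bi)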
bi[872]
[14:08:18] DEPRECATION WARNING: please use MorganGenerator
Out[ ]:
((6, 2),)
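The deprecation warning above refers to the fingerprint-generator API. A sketch of the same radius-2, 2048-bit Morgan fingerprint with rdFingerprintGenerator, assuming an RDKit version that provides AdditionalOutput (the 2025.9 release installed above should); the resulting bit-info dictionary maps each bit to (atom index, radius) pairs, like bi.

```python
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

mol = Chem.MolFromSmiles('c1ccccc1CC1CC1')

# Radius 2 and 2048 bits, matching the legacy call's settings
generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

# AdditionalOutput collects bit -> ((atom index, radius), ...) information
additional = rdFingerprintGenerator.AdditionalOutput()
additional.AllocateBitInfoMap()
fp = generator.GetFingerprint(mol, additionalOutput=additional)
bit_info = additional.GetBitInfoMap()

print(fp.GetNumOnBits(), sorted(bit_info)[:5])
```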
In [ ]:
mfp2_svg = Draw.DrawMorganBit(mol, 872, bi, useSVG=True)
mfp2_svg
In [ ]:
rdkbi = {}
rdkfp = Chem.RDKFingerprint(mol, maxPath=5, bitInfo=rdkbi)
rdkbi[1553]
Out[ ]:
[[0, 1, 9, 5, 4], [2, 3, 4, 9, 5]]
In [ ]:
rdk_svg = Draw.DrawRDKitBit(mol, 1553, rdkbi, useSVG=True)
rdk_svg
In [ ]:
import requests
theLogSDataFileUrl = 'https://raw.githubusercontent.com/youngmook/cheminfo-python/main/logS-data.sdf'

theResponse = requests.get(theLogSDataFileUrl, allow_redirects=True)
# Stop on an HTTP error instead of saving an error page as the SD file
theResponse.raise_for_status()
with open('logS-data.sdf', 'wb') as theWriter:
  theWriter.write(theResponse.content)
In [ ]:
theSDMolSupplier = Chem.SDMolSupplier('logS-data.sdf')

theMolList = []

for ithMol in theSDMolSupplier :
  theMolList.append(ithMol)
  pass

theMolList[0:10]
Out[ ]:
[<rdkit.Chem.rdchem.Mol at 0x7b0298c7dd20>,
 <rdkit.Chem.rdchem.Mol at 0x7b0298c7dd90>,
 <rdkit.Chem.rdchem.Mol at 0x7b0298c7de00>,
 <rdkit.Chem.rdchem.Mol at 0x7b0298c7de70>,
 <rdkit.Chem.rdchem.Mol at 0x7b0298c7dee0>,
 <rdkit.Chem.rdchem.Mol at 0x7b0298c7df50>,
 <rdkit.Chem.rdchem.Mol at 0x7b0298c7dfc0>,
 <rdkit.Chem.rdchem.Mol at 0x7b0298c7e030>,
 <rdkit.Chem.rdchem.Mol at 0x7b0298c7e0a0>,
 <rdkit.Chem.rdchem.Mol at 0x7b0298c7e110>]
In [ ]:
from rdkit import Chem
from rdkit.Chem.EState import Fingerprinter
from rdkit.Chem import Descriptors
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
# Import all Keras components from one namespace (tensorflow.keras)
# so they come from the same Keras installation
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense, Dropout
from tensorflow.keras.optimizers import SGD
In [ ]:
def molToFingerprintList(mol):
  # Feature vector: EState atom-type counts (FingerprintMol(mol)[0]) followed by molecular weight
  return np.append(Fingerprinter.FingerprintMol(mol)[0], Descriptors.MolWt(mol))
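`Fingerprinter.FingerprintMol` returns a pair of arrays over the same EState atom types. The first holds how many atoms of each type occur, and the second holds the summed EState indices per type. `molToFingerprintList` keeps only the counts and appends the molecular weight. The sketch below checks the resulting length on 1-propoxyhexane from earlier in the notebook. The 80 columns match the first Dense layer's 41,472 parameters in the model summary, since (80 + 1) × 512 = 41,472.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors
from rdkit.Chem.EState import Fingerprinter

mol = Chem.MolFromSmiles('CCCCCCOCCC')  # 1-propoxyhexane

# [0]: atom count per EState type, [1]: summed EState index per type
counts, sums = Fingerprinter.FingerprintMol(mol)

# Same construction as molToFingerprintList: counts + molecular weight
features = np.append(counts, Descriptors.MolWt(mol))
print(len(counts), len(sums), features.shape)
```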
In [ ]:
X = []
y = []
for ithMol in theMolList:
  X.append(molToFingerprintList(ithMol))
  y.append(float(ithMol.GetProp('logS')))
X = np.array(X)
y = np.array(y)
X
Out[ ]:
array([[  0.   ,   0.   ,   0.   , ...,   0.   ,   0.   , 665.733],
       [  0.   ,   0.   ,   0.   , ...,   0.   ,   0.   , 589.64 ],
       [  0.   ,   0.   ,   0.   , ...,   0.   ,   0.   , 528.582],
       ...,
       [  0.   ,   0.   ,   0.   , ...,   0.   ,   0.   , 206.266],
       [  0.   ,   0.   ,   0.   , ...,   0.   ,   0.   , 218.321],
       [  0.   ,   0.   ,   0.   , ...,   0.   ,   0.   , 141.086]])
In [ ]:
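# Standardize each feature column (zero-variance columns become all zeros);
# note the scaler is fit on all of X, before the train/test split
theStandardScaler = StandardScaler()
X = theStandardScaler.fit_transform(X)
# Hold out 25% of the molecules as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)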
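Because the cell above fits `StandardScaler` on the full `X` before splitting, the test molecules contribute to the means and standard deviations used for scaling. The effect is often small, but it is a form of data leakage. A minimal sketch of the leakage-free order is shown below: split first, then fit the scaler on the training rows only. The random data is a hypothetical stand-in with the same 80 feature columns.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 200 "molecules" x 80 features
rng = np.random.default_rng(42)
X_raw = rng.normal(loc=5.0, scale=2.0, size=(200, 80))
y_raw = rng.normal(size=200)

# 1) split, 2) learn mean/std from training rows only, 3) apply to both splits
X_tr, X_te, y_tr, y_te = train_test_split(X_raw, y_raw, test_size=0.25, random_state=42)
scaler = StandardScaler().fit(X_tr)
X_tr_scaled = scaler.transform(X_tr)
X_te_scaled = scaler.transform(X_te)  # test rows reuse the training statistics
```

`sklearn.pipeline.make_pipeline(StandardScaler(), model)` enforces the same order automatically when the estimator is a scikit-learn model.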
In [ ]:
X_train
Out[ ]:
array([[ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  1.99449515],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.36520251],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.08054434],
       ...,
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        , -0.86357368],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.62097486],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        , -1.04418619]])
In [ ]:
from keras import Input

model = Sequential()
# Declare the input width with an Input layer, as Keras recommends,
# instead of passing input_shape to the first Dense layer
model.add(Input(shape=(X.shape[1],)))
model.add(Dense(units=512, activation='relu'))
model.add(Dense(units=512, activation='relu'))
model.add(Dense(units=1024, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(units=256, activation='relu'))
model.add(Dense(units=1024, activation='relu'))
model.add(Dense(units=512, activation='relu'))
model.add(Dense(units=1, activation='linear'))  # single linear output for logS regression
In [ ]:
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense (Dense)                   │ (None, 512)            │        41,472 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 512)            │       262,656 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 1024)           │       525,312 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout (Dropout)               │ (None, 1024)           │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_3 (Dense)                 │ (None, 256)            │       262,400 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_4 (Dense)                 │ (None, 1024)           │       263,168 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_5 (Dense)                 │ (None, 512)            │       524,800 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_6 (Dense)                 │ (None, 1)              │           513 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 1,880,321 (7.17 MB)
 Trainable params: 1,880,321 (7.17 MB)
 Non-trainable params: 0 (0.00 B)
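The parameter counts in the summary follow from one rule: a Dense layer with n_in inputs and n_out units has n_in × n_out weights plus n_out biases, that is (n_in + 1) × n_out. Dropout has no parameters. The short check below recomputes the table from the layer widths, assuming 80 input features.

```python
# Layer widths from the model definition; 80 = 79 EState counts + MolWt
widths = [80, 512, 512, 1024, 256, 1024, 512, 1]

# (n_in + 1) * n_out per Dense layer: weights plus one bias per unit
params = [(n_in + 1) * n_out for n_in, n_out in zip(widths[:-1], widths[1:])]
total = sum(params)

print(params)  # [41472, 262656, 525312, 262400, 263168, 524800, 513]
print(total)   # 1880321
print(round(total * 4 / 1024**2, 2))  # 4-byte float32 weights -> 7.17 (MB)
```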
In [ ]:
model.compile(loss='mean_squared_error', optimizer=SGD(learning_rate=0.001, momentum=0.9, nesterov=True))
In [ ]:
# batch_size defaults to 32; the test set doubles as validation data here
history = model.fit(
    X_train, y_train, epochs=50, verbose=1, validation_data=(X_test, y_test)
)
Epoch 1/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 6s 75ms/step - loss: 7.4712 - val_loss: 1.9101
Epoch 2/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 1.6193 - val_loss: 1.1393
Epoch 3/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 0.8356 - val_loss: 0.9632
Epoch 4/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.7753 - val_loss: 0.8497
Epoch 5/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.5801 - val_loss: 0.7433
Epoch 6/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.4566 - val_loss: 0.7036
Epoch 7/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.4438 - val_loss: 0.6493
Epoch 8/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.3776 - val_loss: 0.6636
Epoch 9/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.3570 - val_loss: 0.6118
Epoch 10/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.3327 - val_loss: 0.5915
Epoch 11/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.2897 - val_loss: 0.6221
Epoch 12/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.3153 - val_loss: 0.5269
Epoch 13/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.2601 - val_loss: 0.5467
Epoch 14/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.2535 - val_loss: 0.5301
Epoch 15/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.2185 - val_loss: 0.5017
Epoch 16/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.2311 - val_loss: 0.5127
Epoch 17/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.2181 - val_loss: 0.5086
Epoch 18/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.2118 - val_loss: 0.5184
Epoch 19/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1916 - val_loss: 0.5086
Epoch 20/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.2086 - val_loss: 0.5394
Epoch 21/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.2007 - val_loss: 0.5155
Epoch 22/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1812 - val_loss: 0.4820
Epoch 23/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1628 - val_loss: 0.5074
Epoch 24/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1443 - val_loss: 0.5006
Epoch 25/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1700 - val_loss: 0.4729
Epoch 26/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1354 - val_loss: 0.4841
Epoch 27/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1425 - val_loss: 0.4729
Epoch 28/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1491 - val_loss: 0.4687
Epoch 29/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1315 - val_loss: 0.4994
Epoch 30/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1313 - val_loss: 0.4781
Epoch 31/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1353 - val_loss: 0.4767
Epoch 32/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1280 - val_loss: 0.4755
Epoch 33/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1364 - val_loss: 0.4851
Epoch 34/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1332 - val_loss: 0.4583
Epoch 35/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1248 - val_loss: 0.4679
Epoch 36/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1036 - val_loss: 0.4884
Epoch 37/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1164 - val_loss: 0.4797
Epoch 38/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1217 - val_loss: 0.4701
Epoch 39/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1134 - val_loss: 0.4780
Epoch 40/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1132 - val_loss: 0.4636
Epoch 41/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1280 - val_loss: 0.4719
Epoch 42/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1108 - val_loss: 0.4670
Epoch 43/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1124 - val_loss: 0.4845
Epoch 44/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1194 - val_loss: 0.4615
Epoch 45/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1115 - val_loss: 0.4800
Epoch 46/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1030 - val_loss: 0.4790
Epoch 47/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 0.1060 - val_loss: 0.4876
Epoch 48/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 0.0991 - val_loss: 0.4826
Epoch 49/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 0.1005 - val_loss: 0.4735
Epoch 50/50
33/33 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 0.0890 - val_loss: 0.4874
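In the log, the training loss keeps falling (to 0.0890 at epoch 50), but val_loss bottoms out at 0.4583 in epoch 34 and then drifts around 0.46 to 0.49. That pattern suggests the network is overfitting. One common remedy is Keras's `EarlyStopping` callback with `restore_best_weights=True`. The sketch below uses `validation_split` so that the monitored data comes from the training set rather than from `X_test`. The random data, the smaller network, and `patience=5` are illustrative assumptions, not tuned values.

```python
import numpy as np
from keras import Input
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import SGD
from keras.callbacks import EarlyStopping

# Hypothetical stand-in data with the notebook's 80 feature columns
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 80)).astype('float32')
y_demo = (X_demo[:, 0] - 0.5 * X_demo[:, 1] + rng.normal(scale=0.1, size=400)).astype('float32')

model_es = Sequential([
    Input(shape=(80,)),
    Dense(64, activation='relu'),
    Dropout(0.2),
    Dense(1, activation='linear'),
])
model_es.compile(loss='mean_squared_error',
                 optimizer=SGD(learning_rate=0.001, momentum=0.9, nesterov=True))

# Stop after 5 epochs without val_loss improvement and roll back to the best weights
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
history_es = model_es.fit(X_demo, y_demo, epochs=100, batch_size=32,
                          validation_split=0.25, callbacks=[early_stop], verbose=0)
print(len(history_es.history['val_loss']), min(history_es.history['val_loss']))
```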
In [ ]:
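import matplotlib.pyplot as plt

# Predict once per split and flatten the (n, 1) outputs to 1-D
y_train_pred = model.predict(X_train).ravel()
y_test_pred = model.predict(X_test).ravel()

plt.scatter(y_train, y_train_pred, label='Train', c='blue')
plt.scatter(y_test, y_test_pred, c='lightgreen', label='Test', alpha=0.8)
# Dashed y = x line: a perfect predictor would put every point on it
lims = [min(y.min(), y_train_pred.min(), y_test_pred.min()), max(y.max(), y_train_pred.max(), y_test_pred.max())]
plt.plot(lims, lims, 'k--', linewidth=1)
plt.title('Neural Network Predictor')
plt.xlabel('Measured Solubility (logS)')
plt.ylabel('Predicted Solubility (logS)')
plt.legend(loc=4)
plt.show()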
33/33 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step
(Image: measured vs. predicted solubility scatter plot, train in blue and test in light green)
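The scatter plot is easier to compare across runs when it is summarized as numbers. The final val_loss is a mean squared error, so 0.4874 corresponds to an RMSE of about 0.70 log units. A minimal sketch of reporting RMSE and R² with scikit-learn's metrics is shown below. The helper name `regression_report` is hypothetical, and the toy numbers exist only to make the example checkable. In the notebook it would be called as `regression_report(y_test, model.predict(X_test))`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def regression_report(y_true, y_pred):
    """Hypothetical helper: RMSE and R^2, flattening Keras's (n, 1) output first."""
    y_true = np.ravel(y_true)
    y_pred = np.ravel(y_pred)
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    r2 = float(r2_score(y_true, y_pred))
    return rmse, r2

# Toy numbers for illustration only
rmse, r2 = regression_report([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(rmse, r2)  # RMSE = sqrt(1/3) ~ 0.577, R^2 = 1 - 1/2 = 0.5
```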
In [ ]: