Installing RDKit (a cheminformatics toolkit)¶
!pip install rdkit
Requirement already satisfied: rdkit in /usr/local/lib/python3.12/dist-packages (2025.9.1)
Requirement already satisfied: numpy in /usr/local/lib/python3.12/dist-packages (from rdkit) (2.0.2)
Requirement already satisfied: Pillow in /usr/local/lib/python3.12/dist-packages (from rdkit) (11.3.0)
Importing RDKit¶
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.Draw import IPythonConsole
Reading and writing MDL Mol files¶
Build the 1-propoxyhexane molecule from its SMILES string
theMol = Chem.MolFromSmiles('CCCCCCOCCC')
theMol
Generate an MDL Molfile string
theMolBlock = Chem.MolToMolBlock(theMol)
print(theMolBlock)
RDKit 2D
10 9 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.2990 0.7500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.5981 -0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.8971 0.7500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
5.1962 -0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
6.4952 0.7500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
7.7942 -0.0000 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
9.0933 0.7500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
10.3923 -0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
11.6913 0.7500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0
2 3 1 0
3 4 1 0
4 5 1 0
5 6 1 0
6 7 1 0
7 8 1 0
8 9 1 0
9 10 1 0
M END
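The line `10 9 0 ... 0999 V2000` in the output above is the counts line of the V2000 connection table: its first two three-character fields hold the atom and bond counts. Below is a minimal sketch of reading those fields by hand, assuming the fixed-width V2000 layout. `parse_counts_line` is a hypothetical helper written for illustration, not part of RDKit.

```python
def parse_counts_line(line):
    """Return (atom count, bond count, version tag) from a V2000 counts line."""
    # V2000 counts lines are fixed width: columns 0-2 hold the atom count,
    # columns 3-5 the bond count, and the last six characters the version tag.
    num_atoms = int(line[0:3])
    num_bonds = int(line[3:6])
    version = line[33:39].strip()
    return num_atoms, num_bonds, version


# Rebuild the counts line printed above with its original column widths.
counts_line = f"{10:3d}{9:3d}" + "  0" * 8 + "999 V2000"
print(parse_counts_line(counts_line))
```

In practice `Chem.MolFromMolBlock` does this parsing for you. The sketch only shows where the numbers in the printed block come from.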
Set the molecule name
theMol.SetProp('_Name','1-Propoxyhexane')
print(Chem.MolToMolBlock(theMol))
1-Propoxyhexane
RDKit 2D
10 9 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.2990 0.7500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.5981 -0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.8971 0.7500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
5.1962 -0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
6.4952 0.7500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
7.7942 -0.0000 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
9.0933 0.7500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
10.3923 -0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
11.6913 0.7500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0
2 3 1 0
3 4 1 0
4 5 1 0
5 6 1 0
6 7 1 0
7 8 1 0
8 9 1 0
9 10 1 0
M END
Save the molecule as an MDL Mol file
theMolName = '1-Propoxyhexane.mol'
with open(theMolName, 'w') as theWriter:
    print(Chem.MolToMolBlock(theMol), file=theWriter)
Load an MDL Mol file
theAnotherMol = Chem.MolFromMolFile(theMolName)
theAnotherMol
When RDKit is asked to read an invalid structure, it logs an error message and returns None instead of a Mol object
theInvalidMolecule1 = Chem.MolFromSmiles('CO(C)C')
theInvalidMolecule1 is None
[14:08:05] Explicit valence for atom # 1 O, 3, is greater than permitted
True
An aromatic ring that cannot be kekulized is rejected the same way: RDKit logs the error and returns None
theInvalidMolecule2 = Chem.MolFromSmiles('c1cc1')
theInvalidMolecule2 is None
[14:08:05] Can't kekulize mol. Unkekulized atoms: 0 1 2
True
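Because a failed parse returns None rather than raising, a later call such as `GetNumAtoms()` would fail with an unrelated `AttributeError`. One possible guard is sketched below. `smiles_to_mol` is a hypothetical wrapper, not an RDKit function.

```python
from rdkit import Chem


def smiles_to_mol(smiles):
    """Parse a SMILES string, raising instead of silently returning None."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse SMILES: {smiles!r}")
    return mol


for candidate in ["CCCCCCOCCC", "CO(C)C", "c1cc1"]:
    try:
        parsed = smiles_to_mol(candidate)
        print(candidate, "->", parsed.GetNumAtoms(), "atoms")
    except ValueError as err:
        print(err)
```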
Working with RDKit Mol objects¶
Count the atoms in the molecule (implicit hydrogens are not included)
theNumOfAtoms = theMol.GetNumAtoms()
theNumOfAtoms
10
Count the bonds in the molecule
theNumOfBonds = theMol.GetNumBonds()
theNumOfBonds
9
Add hydrogen atoms to the molecule
theMolWithHAtoms = Chem.AddHs(theMol)
theMolWithHAtoms
Generate 3D coordinates for the structure
theEmbededMolWithHAtoms = Chem.AddHs(theMol)
AllChem.EmbedMolecule(theEmbededMolWithHAtoms)
print(Chem.MolToMolBlock(theEmbededMolWithHAtoms))
theEmbededMolWithHAtoms
1-Propoxyhexane
RDKit 3D
30 29 0 0 0 0 0 0 0 0999 V2000
5.0450 0.0205 -0.2500 C 0 0 0 0 0 0 0 0 0 0 0 0
3.6184 0.1887 0.2668 C 0 0 0 0 0 0 0 0 0 0 0 0
2.7104 -0.7744 -0.4439 C 0 0 0 0 0 0 0 0 0 0 0 0
1.2812 -0.6901 -0.0080 C 0 0 0 0 0 0 0 0 0 0 0 0
0.7155 0.6718 -0.2508 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.7113 0.8229 0.1635 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.5880 -0.0201 -0.4723 O 0 0 0 0 0 0 0 0 0 0 0 0
-2.8602 0.2477 0.0237 C 0 0 0 0 0 0 0 0 0 0 0 0
-3.9197 -0.6224 -0.6033 C 0 0 0 0 0 0 0 0 0 0 0 0
-5.2430 -0.2291 0.0217 C 0 0 0 0 0 0 0 0 0 0 0 0
5.2396 -1.0595 -0.3418 H 0 0 0 0 0 0 0 0 0 0 0 0
5.1990 0.5994 -1.1805 H 0 0 0 0 0 0 0 0 0 0 0 0
5.7520 0.4428 0.5126 H 0 0 0 0 0 0 0 0 0 0 0 0
3.3505 1.2424 0.1689 H 0 0 0 0 0 0 0 0 0 0 0 0
3.6869 -0.0240 1.3564 H 0 0 0 0 0 0 0 0 0 0 0 0
2.7894 -0.6215 -1.5245 H 0 0 0 0 0 0 0 0 0 0 0 0
3.0580 -1.7945 -0.1797 H 0 0 0 0 0 0 0 0 0 0 0 0
0.7221 -1.3924 -0.6787 H 0 0 0 0 0 0 0 0 0 0 0 0
1.1322 -0.9645 1.0423 H 0 0 0 0 0 0 0 0 0 0 0 0
0.8959 1.0137 -1.2943 H 0 0 0 0 0 0 0 0 0 0 0 0
1.2841 1.4153 0.3761 H 0 0 0 0 0 0 0 0 0 0 0 0
-0.9765 1.8994 -0.0534 H 0 0 0 0 0 0 0 0 0 0 0 0
-0.7982 0.7251 1.2462 H 0 0 0 0 0 0 0 0 0 0 0 0
-3.1496 1.3010 -0.1627 H 0 0 0 0 0 0 0 0 0 0 0 0
-2.9214 0.0744 1.1215 H 0 0 0 0 0 0 0 0 0 0 0 0
-3.9203 -0.5049 -1.7079 H 0 0 0 0 0 0 0 0 0 0 0 0
-3.7756 -1.7050 -0.3312 H 0 0 0 0 0 0 0 0 0 0 0 0
-5.5778 0.6775 -0.5523 H 0 0 0 0 0 0 0 0 0 0 0 0
-5.9903 -1.0452 -0.0312 H 0 0 0 0 0 0 0 0 0 0 0 0
-5.0484 0.1049 1.0527 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0
2 3 1 0
3 4 1 0
4 5 1 0
5 6 1 0
6 7 1 0
7 8 1 0
8 9 1 0
9 10 1 0
1 11 1 0
1 12 1 0
1 13 1 0
2 14 1 0
2 15 1 0
3 16 1 0
3 17 1 0
4 18 1 0
4 19 1 0
5 20 1 0
5 21 1 0
6 22 1 0
6 23 1 0
8 24 1 0
8 25 1 0
9 26 1 0
9 27 1 0
10 28 1 0
10 29 1 0
10 30 1 0
M END
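The embedding call above discards its return value. `EmbedMolecule` returns the ID of the new conformer, or -1 when embedding fails, and `MMFFOptimizeMolecule` returns a status code. The sketch below checks both. The fixed `randomSeed` is an assumption added here so that repeated runs give the same coordinates.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.AddHs(Chem.MolFromSmiles("CCCCCCOCCC"))

# EmbedMolecule returns the new conformer ID, or -1 if embedding failed.
conf_id = AllChem.EmbedMolecule(mol, randomSeed=42)
if conf_id == -1:
    raise RuntimeError("3D embedding failed")

# MMFFOptimizeMolecule returns 0 on convergence, 1 if more iterations
# are needed, and -1 if the force field could not be set up.
status = AllChem.MMFFOptimizeMolecule(mol, maxIters=200)
print("conformer:", conf_id, "optimizer status:", status)
print("is 3D:", mol.GetConformer(conf_id).Is3D())
```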
!pip install py3Dmol
Collecting py3Dmol
Downloading py3dmol-2.5.3-py2.py3-none-any.whl.metadata (2.1 kB)
Downloading py3dmol-2.5.3-py2.py3-none-any.whl (7.2 kB)
Installing collected packages: py3Dmol
Successfully installed py3Dmol-2.5.3
import py3Dmol
def show3DMol(theMol, style='stick'):
    # Render a molecule in an interactive py3Dmol viewer.
    mblock = Chem.MolToMolBlock(theMol)
    view = py3Dmol.view(width=400, height=400)
    view.addModel(mblock, 'mol')
    view.setStyle({style: {}})
    view.zoomTo()
    view.show()

def show3DMolWithOptimization(theMol, style='stick'):
    # Add hydrogens, embed 3D coordinates, relax with MMFF, then render.
    mol = Chem.AddHs(theMol)
    AllChem.EmbedMolecule(mol)
    AllChem.MMFFOptimizeMolecule(mol, maxIters=200)
    show3DMol(mol, style)
show3DMol(theMolWithHAtoms)
show3DMol(theEmbededMolWithHAtoms)
show3DMolWithOptimization(theEmbededMolWithHAtoms)
show3DMolWithOptimization(theMolWithHAtoms)
Generate 2D coordinates for the structure
AllChem.Compute2DCoords(theMolWithHAtoms)
print(Chem.MolToMolBlock(theMolWithHAtoms))
theMolWithHAtoms
1-Propoxyhexane
RDKit 2D
30 29 0 0 0 0 0 0 0 0999 V2000
-6.0666 0.8406 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-4.6309 0.4061 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-3.1952 -0.0283 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.7595 -0.4628 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.3238 -0.8973 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.1119 -1.3318 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.5476 -1.7663 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
3.6417 -0.7402 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
4.7358 0.2860 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
5.8300 1.3121 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-7.5023 1.2751 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
-5.6321 2.2763 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
-6.5011 -0.5951 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
-5.0654 -1.0296 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
-4.1964 1.8418 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
-2.7607 1.4073 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
-3.6297 -1.4640 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
-2.1940 -1.8985 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
-1.3250 0.9729 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
0.1107 0.5384 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
-0.7583 -2.3330 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
0.6774 -2.7675 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
1.5464 0.1039 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
2.6156 0.3540 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
4.6678 -1.8343 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
3.7097 1.3801 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
5.7620 -0.8082 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
6.9241 2.3382 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
6.8561 0.2179 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
4.8039 2.4062 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0
2 3 1 0
3 4 1 0
4 5 1 0
5 6 1 0
6 7 1 0
7 8 1 0
8 9 1 0
9 10 1 0
1 11 1 0
1 12 1 0
1 13 1 0
2 14 1 0
2 15 1 0
3 16 1 0
3 17 1 0
4 18 1 0
4 19 1 0
5 20 1 0
5 21 1 0
6 22 1 0
6 23 1 0
8 24 1 0
8 25 1 0
9 26 1 0
9 27 1 0
10 28 1 0
10 29 1 0
10 30 1 0
M END
Remove the hydrogen atoms
theMol2 = Chem.RemoveHs(theMolWithHAtoms)
print(Chem.MolToMolBlock(theMol2))
theMol2
1-Propoxyhexane
RDKit 2D
10 9 0 0 0 0 0 0 0 0999 V2000
-6.0666 0.8406 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-4.6309 0.4061 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-3.1952 -0.0283 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.7595 -0.4628 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.3238 -0.8973 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.1119 -1.3318 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.5476 -1.7663 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
3.6417 -0.7402 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
4.7358 0.2860 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
5.8300 1.3121 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0
2 3 1 0
3 4 1 0
4 5 1 0
5 6 1 0
6 7 1 0
7 8 1 0
8 9 1 0
9 10 1 0
M END
Working with atoms and bonds¶
Get an individual Atom object
theFirstAtomOfMol = theMol.GetAtomWithIdx(0)
theFirstAtomOfMol
<rdkit.Chem.rdchem.Atom at 0x7b0298bb2c00>
theFirstAtomOfMol.GetAtomicNum()
6
theFirstAtomOfMol.GetMass()
12.011
theFirstAtomOfMol.GetSymbol()
'C'
theNeighbors = theFirstAtomOfMol.GetNeighbors()
theNeighbors
(<rdkit.Chem.rdchem.Atom at 0x7b0298b28580>,)
Print the atomic number and element symbol of every atom
#GetAtoms()
for index, ithAtom in enumerate(theMolWithHAtoms.GetAtoms()):
    print(str(index+1).zfill(2), 'atomic number: {0}, symbol: {1}'.format(ithAtom.GetAtomicNum(), ithAtom.GetSymbol()))
01 atomic number: 6, symbol: C
02 atomic number: 6, symbol: C
03 atomic number: 6, symbol: C
04 atomic number: 6, symbol: C
05 atomic number: 6, symbol: C
06 atomic number: 6, symbol: C
07 atomic number: 8, symbol: O
08 atomic number: 6, symbol: C
09 atomic number: 6, symbol: C
10 atomic number: 6, symbol: C
11 atomic number: 1, symbol: H
12 atomic number: 1, symbol: H
13 atomic number: 1, symbol: H
14 atomic number: 1, symbol: H
15 atomic number: 1, symbol: H
16 atomic number: 1, symbol: H
17 atomic number: 1, symbol: H
18 atomic number: 1, symbol: H
19 atomic number: 1, symbol: H
20 atomic number: 1, symbol: H
21 atomic number: 1, symbol: H
22 atomic number: 1, symbol: H
23 atomic number: 1, symbol: H
24 atomic number: 1, symbol: H
25 atomic number: 1, symbol: H
26 atomic number: 1, symbol: H
27 atomic number: 1, symbol: H
28 atomic number: 1, symbol: H
29 atomic number: 1, symbol: H
30 atomic number: 1, symbol: H
Get an individual Bond object
theFirstBond = theMol.GetBondWithIdx(0)
theFirstBond
<rdkit.Chem.rdchem.Bond at 0x7b0298b29540>
theFirstBond.GetBeginAtomIdx()
0
theFirstBond.GetEndAtomIdx()
1
theFirstBond.GetBondType()
rdkit.Chem.rdchem.BondType.SINGLE
Print bond information
#GetBonds()
for index, ithBond in enumerate(theMolWithHAtoms.GetBonds()):
    print(str(index+1).zfill(2), '\tbegin: {0}, end: {1}, Type: {2}'.format(
        str(ithBond.GetBeginAtomIdx()).zfill(2),
        str(ithBond.GetEndAtomIdx()).zfill(2),
        ithBond.GetBondType()))
01 begin: 00, end: 01, Type: SINGLE
02 begin: 01, end: 02, Type: SINGLE
03 begin: 02, end: 03, Type: SINGLE
04 begin: 03, end: 04, Type: SINGLE
05 begin: 04, end: 05, Type: SINGLE
06 begin: 05, end: 06, Type: SINGLE
07 begin: 06, end: 07, Type: SINGLE
08 begin: 07, end: 08, Type: SINGLE
09 begin: 08, end: 09, Type: SINGLE
10 begin: 00, end: 10, Type: SINGLE
11 begin: 00, end: 11, Type: SINGLE
12 begin: 00, end: 12, Type: SINGLE
13 begin: 01, end: 13, Type: SINGLE
14 begin: 01, end: 14, Type: SINGLE
15 begin: 02, end: 15, Type: SINGLE
16 begin: 02, end: 16, Type: SINGLE
17 begin: 03, end: 17, Type: SINGLE
18 begin: 03, end: 18, Type: SINGLE
19 begin: 04, end: 19, Type: SINGLE
20 begin: 04, end: 20, Type: SINGLE
21 begin: 05, end: 21, Type: SINGLE
22 begin: 05, end: 22, Type: SINGLE
23 begin: 07, end: 23, Type: SINGLE
24 begin: 07, end: 24, Type: SINGLE
25 begin: 08, end: 25, Type: SINGLE
26 begin: 08, end: 26, Type: SINGLE
27 begin: 09, end: 27, Type: SINGLE
28 begin: 09, end: 28, Type: SINGLE
29 begin: 09, end: 29, Type: SINGLE
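The atom and bond accessors above can be combined into a simple connectivity map. The sketch below builds an adjacency list for the heavy-atom version of the molecule. The variable names are illustrative and not part of the notebook.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("CCCCCCOCCC")

# Map each atom index to the sorted indices of its bonded neighbors.
adjacency = {
    atom.GetIdx(): sorted(nbr.GetIdx() for nbr in atom.GetNeighbors())
    for atom in mol.GetAtoms()
}

oxygen_idx = next(a.GetIdx() for a in mol.GetAtoms() if a.GetSymbol() == "O")
print("oxygen index:", oxygen_idx)
print("oxygen neighbors:", adjacency[oxygen_idx])
```

Note that RDKit indices are zero-based, while the atom numbers in a Mol block are one-based.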
Working with SMILES¶
- Representing chirality
theChiralMol = Chem.MolFromSmiles('C[C@H](O)c1ccccc1')
print(Chem.MolToMolBlock(theChiralMol))
theChiralMol
RDKit 2D
9 9 0 0 0 0 0 0 0 0999 V2000
3.7500 -1.2990 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.7500 1.2990 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
1.5000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.7500 -1.2990 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.7500 -1.2990 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.5000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.7500 1.2990 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.7500 1.2990 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 1
2 3 1 0
2 4 1 0
4 5 2 0
5 6 1 0
6 7 2 0
7 8 1 0
8 9 2 0
9 4 1 0
M END
show3DMol(theChiralMol)
show3DMolWithOptimization(theChiralMol)
- Removing chirality
theRemovedChiralMolSmiles = Chem.MolToSmiles(theChiralMol,isomericSmiles=False)
theRemovedChiralMol = Chem.MolFromSmiles(theRemovedChiralMolSmiles)
print(Chem.MolToMolBlock(theRemovedChiralMol))
print(theRemovedChiralMolSmiles)
theRemovedChiralMol
RDKit 2D
9 9 0 0 0 0 0 0 0 0999 V2000
3.7500 -1.2990 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.7500 1.2990 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
1.5000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.7500 -1.2990 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.7500 -1.2990 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.5000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.7500 1.2990 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.7500 1.2990 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0
2 3 1 0
2 4 1 0
4 5 2 0
5 6 1 0
6 7 2 0
7 8 1 0
8 9 2 0
9 4 1 0
M END
CC(O)c1ccccc1
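The round trip through a non-isomeric SMILES string creates a new molecule. As an alternative sketch, `Chem.RemoveStereochemistry` clears the stereo information in place; here it is applied to a copy so that the original keeps its chiral center.

```python
from rdkit import Chem

chiral = Chem.MolFromSmiles("C[C@H](O)c1ccccc1")
centers = Chem.FindMolChiralCenters(chiral)  # list of (atom index, label)
print(centers)

flat = Chem.Mol(chiral)           # copy, so the original keeps its stereo
Chem.RemoveStereochemistry(flat)  # clears stereo flags in place
print(Chem.MolToSmiles(flat))
print(Chem.MolToSmiles(chiral))
```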
- By default, MolToSmiles returns the canonical SMILES
print(Chem.MolToSmiles(Chem.MolFromSmiles('C1=CC=CN=C1')))
Chem.MolFromSmiles('C1=CC=CN=C1')
c1ccncc1
print(Chem.MolToSmiles(Chem.MolFromSmiles('c1cccnc1')))
Chem.MolFromSmiles('c1cccnc1')
c1ccncc1
print(Chem.MolToSmiles(Chem.MolFromSmiles('n1ccccc1')))
Chem.MolFromSmiles('n1ccccc1')
c1ccncc1
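Since all three inputs canonicalize to the same string, canonical SMILES can serve as a simple identity key, for example to deduplicate a list. A short sketch of that idea:

```python
from rdkit import Chem

inputs = ["C1=CC=CN=C1", "c1cccnc1", "n1ccccc1"]

# Three different spellings of pyridine collapse to one canonical string.
canonical = {Chem.MolToSmiles(Chem.MolFromSmiles(s)) for s in inputs}
print(canonical)
```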
Reading MDL SDF files (reading sets of molecules)¶
An MDL SD file is a collection of Mol file records, each followed by its molecular properties.
from urllib.request import urlopen
theSdfUrl = 'https://raw.githubusercontent.com/youngmook/cheminfo-python/main/in-stock%2Bfor-sale.sdf'
with urlopen(theSdfUrl) as theStream:
    theSdf = theStream.read().decode()
print(theSdf.split('$$$$')[0])
RDKit 3D
25 28 0 0 0 0 0 0 0 0999 V2000
-1.9187 -1.7530 0.7656 O 0 0 0 0 0 0 0 0 0 0 0 0
-2.5058 -0.7929 0.2316 C 0 0 0 0 0 0 0 0 0 0 0 0
-3.9590 -0.8400 0.1022 C 0 0 0 0 0 0 0 0 0 0 0 0
-4.7112 0.2651 -0.1854 C 0 0 0 0 0 0 0 0 0 0 0 0
-6.0778 0.2194 -0.3091 C 0 0 0 0 0 0 0 0 0 0 0 0
-6.7477 -0.9842 -0.1404 C 0 0 0 0 0 0 0 0 0 0 0 0
-6.0009 -2.1023 0.1500 C 0 0 0 0 0 0 0 0 0 0 0 0
-4.6121 -2.0397 0.2720 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.8376 0.3831 -0.2808 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.3843 1.6333 -0.3153 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.7124 2.7450 -0.8045 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.4384 2.5527 -1.2690 C 0 0 0 0 0 0 0 0 0 0 0 0
0.1658 1.3305 -1.2621 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.5292 0.2366 -0.7684 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0654 -0.9881 -0.7639 O 0 0 0 0 0 0 0 0 0 0 0 0
1.2716 -1.4549 -1.1554 C 0 0 0 0 0 0 0 0 0 0 0 0
2.4622 -0.8591 -0.4642 C 0 0 0 0 0 0 0 0 0 0 0 0
3.5676 -0.3829 -1.0172 N 0 0 0 0 0 0 0 0 0 0 0 0
4.5252 0.0899 -0.2664 C 0 0 0 0 0 0 0 0 0 0 0 0
5.7454 0.6131 -0.7337 C 0 0 0 0 0 0 0 0 0 0 0 0
6.6695 1.0678 0.1409 C 0 0 0 0 0 0 0 0 0 0 0 0
6.5293 1.0695 1.5033 C 0 0 0 0 0 0 0 0 0 0 0 0
5.3151 0.5506 1.9759 C 0 0 0 0 0 0 0 0 0 0 0 0
4.3407 0.0729 1.0865 C 0 0 0 0 0 0 0 0 0 0 0 0
2.7774 -0.6324 1.1998 S 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0
2 3 1 0
3 4 2 0
4 5 1 0
5 6 2 0
6 7 1 0
7 8 2 0
2 9 1 0
9 10 2 0
10 11 1 0
11 12 2 0
12 13 1 0
13 14 2 0
14 15 1 0
15 16 1 0
16 17 1 0
17 18 2 0
18 19 1 0
19 20 2 0
20 21 1 0
21 22 2 0
22 23 1 0
23 24 2 0
24 25 1 0
8 3 1 0
14 9 1 0
25 17 1 0
24 19 1 0
M END
> <zinc_id> (1)
ZINC000000035284
> <smiles> (1)
O=C(c1ccccc1)c1ccccc1OCc1nc2ccccc2s1
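The `> <name>` lines after `M END` are the record's data items, which RDKit later exposes through `GetProp`. To make the layout concrete, here is a minimal sketch that pulls single-line data items out of a record's text with a regular expression. The `record` string is a shortened stand-in with the connection table elided, and `parse_data_items` is a hypothetical helper that does not handle multi-line values.

```python
import re

# Shortened, hypothetical record: only the data items printed above are kept.
record = """\
M  END
> <zinc_id> (1)
ZINC000000035284

> <smiles> (1)
O=C(c1ccccc1)c1ccccc1OCc1nc2ccccc2s1

"""


def parse_data_items(text):
    """Collect each `> <name>` header and the value line that follows it."""
    pattern = re.compile(r"^>\s*<([^>]+)>[^\n]*\n([^\n]*)", re.MULTILINE)
    return {name: value for name, value in pattern.findall(text)}


print(parse_data_items(record))
```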
with open('in-stock+for-sale.sdf', 'w') as theWriter:
    theWriter.write(theSdf)
theSDMolSupplier = Chem.SDMolSupplier('in-stock+for-sale.sdf')
theZincMolList = []
for ithMol in theSDMolSupplier:
    theZincMolList.append(ithMol)
theZincMolList[0:10]
[<rdkit.Chem.rdchem.Mol at 0x7b0298b2adc0>, <rdkit.Chem.rdchem.Mol at 0x7b0298b2ae30>, <rdkit.Chem.rdchem.Mol at 0x7b0298b2af10>, <rdkit.Chem.rdchem.Mol at 0x7b0298b2af80>, <rdkit.Chem.rdchem.Mol at 0x7b0298b2aff0>, <rdkit.Chem.rdchem.Mol at 0x7b0298b2b140>, <rdkit.Chem.rdchem.Mol at 0x7b0298b2b1b0>, <rdkit.Chem.rdchem.Mol at 0x7b0298b2b220>, <rdkit.Chem.rdchem.Mol at 0x7b0298b2b290>, <rdkit.Chem.rdchem.Mol at 0x7b0298b2b370>]
print(theZincMolList[0].GetProp("zinc_id"))
ZINC000000035284
print(Chem.MolToMolBlock(theZincMolList[0]))
RDKit 3D
25 28 0 0 0 0 0 0 0 0999 V2000
-1.9187 -1.7530 0.7656 O 0 0 0 0 0 0 0 0 0 0 0 0
-2.5058 -0.7929 0.2316 C 0 0 0 0 0 0 0 0 0 0 0 0
-3.9590 -0.8400 0.1022 C 0 0 0 0 0 0 0 0 0 0 0 0
-4.7112 0.2651 -0.1854 C 0 0 0 0 0 0 0 0 0 0 0 0
-6.0778 0.2194 -0.3091 C 0 0 0 0 0 0 0 0 0 0 0 0
-6.7477 -0.9842 -0.1404 C 0 0 0 0 0 0 0 0 0 0 0 0
-6.0009 -2.1023 0.1500 C 0 0 0 0 0 0 0 0 0 0 0 0
-4.6121 -2.0397 0.2720 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.8376 0.3831 -0.2808 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.3843 1.6333 -0.3153 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.7124 2.7450 -0.8045 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.4384 2.5527 -1.2690 C 0 0 0 0 0 0 0 0 0 0 0 0
0.1658 1.3305 -1.2621 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.5292 0.2366 -0.7684 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0654 -0.9881 -0.7639 O 0 0 0 0 0 0 0 0 0 0 0 0
1.2716 -1.4549 -1.1554 C 0 0 0 0 0 0 0 0 0 0 0 0
2.4622 -0.8591 -0.4642 C 0 0 0 0 0 0 0 0 0 0 0 0
3.5676 -0.3829 -1.0172 N 0 0 0 0 0 0 0 0 0 0 0 0
4.5252 0.0899 -0.2664 C 0 0 0 0 0 0 0 0 0 0 0 0
5.7454 0.6131 -0.7337 C 0 0 0 0 0 0 0 0 0 0 0 0
6.6695 1.0678 0.1409 C 0 0 0 0 0 0 0 0 0 0 0 0
6.5293 1.0695 1.5033 C 0 0 0 0 0 0 0 0 0 0 0 0
5.3151 0.5506 1.9759 C 0 0 0 0 0 0 0 0 0 0 0 0
4.3407 0.0729 1.0865 C 0 0 0 0 0 0 0 0 0 0 0 0
2.7774 -0.6324 1.1998 S 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0
2 3 1 0
3 4 2 0
4 5 1 0
5 6 2 0
6 7 1 0
7 8 2 0
2 9 1 0
9 10 2 0
10 11 1 0
11 12 2 0
12 13 1 0
13 14 2 0
14 15 1 0
15 16 1 0
16 17 1 0
17 18 2 0
18 19 1 0
19 20 2 0
20 21 1 0
21 22 2 0
22 23 1 0
23 24 2 0
24 25 1 0
8 3 1 0
14 9 1 0
25 17 1 0
24 19 1 0
M END
theZincMolList[0]
| zinc_id | ZINC000000035284 |
|---|---|
| smiles | O=C(c1ccccc1)c1ccccc1OCc1nc2ccccc2s1 |
show3DMol(theZincMolList[0])
show3DMolWithOptimization(theZincMolList[0])
Creating molecule image files¶
from rdkit.Chem import Draw
Draw.MolToFile(theZincMolList[0], 'zinc-001.png')
theZincMolList[0]
| zinc_id | ZINC000000035284 |
|---|---|
| smiles | O=C(c1ccccc1)c1ccccc1OCc1nc2ccccc2s1 |
Recompute the 2D coordinates, then save the image
import copy
theFirstZincMol = copy.deepcopy(theZincMolList[0])
AllChem.Compute2DCoords(theFirstZincMol)
0
Draw.MolToFile(theFirstZincMol, 'zinc-001-2D.png')
theFirstZincMol
| zinc_id | ZINC000000035284 |
|---|---|
| smiles | O=C(c1ccccc1)c1ccccc1OCc1nc2ccccc2s1 |
Save several molecules as a grid image
theGridImage = Draw.MolsToGridImage(theZincMolList[:8],molsPerRow=4,subImgSize=(200,200),legends=[x.GetProp("zinc_id") for x in theZincMolList[:8]], returnPNG=False)
theGridImage.save('zinc-grid-001-008.png')
theGridImage
for ithMol in theZincMolList:
    AllChem.Compute2DCoords(ithMol)
theGridImage = Draw.MolsToGridImage(theZincMolList[:8],molsPerRow=4,subImgSize=(200,200),legends=[x.GetProp("zinc_id") for x in theZincMolList[:8]], returnPNG=False)
theGridImage.save('zinc-grid-001-008-2D.png')
theGridImage
Find the compounds that contain a substructure and save them as an image
theCommonCoreMol = Chem.MolFromSmiles('Nc1ccccc1')
theCommonCoreMol
#theSubZincMolList = [x for x in theZincMolList if x.HasSubstructMatch(theCommonCoreMol)]
theSubMatchedMolList = []
for ithMol in theZincMolList:
    if ithMol.HasSubstructMatch(theCommonCoreMol):
        theSubMatchedMolList.append(ithMol)
print('# of total molecule list : ' + str(len(theZincMolList)))
print('# of matched molecules : ' + str(len(theSubMatchedMolList)))
# of total molecule list : 100
# of matched molecules : 48
AllChem.Compute2DCoords(theCommonCoreMol)
for ithMatchedMol in theSubMatchedMolList:
    _ = AllChem.GenerateDepictionMatching2DStructure(ithMatchedMol,theCommonCoreMol)
theMatchedGridImage = Draw.MolsToGridImage(theSubMatchedMolList[:12],molsPerRow=4,subImgSize=(300,300),legends=[x.GetProp("zinc_id") for x in theSubMatchedMolList[:12]], returnPNG=False)
theMatchedGridImage.save('zinc-matched-grid.png')
theMatchedGridImage
Substructure search¶
theMolecule = Chem.MolFromSmiles('c1ccccc1O')
theMolecule
thePattern = Chem.MolFromSmarts('ccO')
thePattern
theMolecule.HasSubstructMatch(thePattern)
True
theMolecule.GetSubstructMatch(thePattern)
(0, 5, 6)
theMolecule.GetSubstructMatches(thePattern)
((0, 5, 6), (4, 5, 6))
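Each tuple returned by `GetSubstructMatches` lists atom indices of the molecule in the order of the atoms in the SMARTS pattern. The sketch below maps the indices back to element symbols to make that correspondence visible.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("c1ccccc1O")
pattern = Chem.MolFromSmarts("ccO")

matches = mol.GetSubstructMatches(pattern)
for match in matches:
    # Position i in the tuple corresponds to atom i of the pattern.
    symbols = [mol.GetAtomWithIdx(i).GetSymbol() for i in match]
    print(match, symbols)
```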
theMatchedMolList = []
for ithZincMol in theZincMolList:
    if ithZincMol.HasSubstructMatch(thePattern):
        theMatchedMolList.append(ithZincMol)
print(len(theMatchedMolList))
41
for ithMol in theMatchedMolList:
    AllChem.Compute2DCoords(ithMol)
theGridImage = Draw.MolsToGridImage(theMatchedMolList,molsPerRow=6,subImgSize=(200,200),legends=[x.GetProp("zinc_id") for x in theMatchedMolList], returnPNG=False)
theGridImage.save('zinc-substr-matched-grid-2D.png')
theGridImage
Chemical Transformations¶
Substructure-based Transformations
- Deleting substructure
theMol = Chem.MolFromSmiles('CC(=O)O')
theMol
thePattern = Chem.MolFromSmarts('C(=O)[OH]')
thePattern
theRemovedMol = AllChem.DeleteSubstructs(theMol,thePattern)
theRemovedMol
- Replacing substructure
theReplaceMol = Chem.MolFromSmiles('OC')
theReplaceMol
thePattern = Chem.MolFromSmarts('[$(NC(=O))]')
thePattern
theMol = Chem.MolFromSmiles('CC(=O)N')
theMol
AllChem.ReplaceSubstructs(theMol,thePattern,theReplaceMol)[0]
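The cells above display the results only as drawings. Printing the SMILES of each product makes the transformations easier to check; the sketch below repeats both operations with the same inputs. `ReplaceSubstructs` returns a tuple of products, one per replaced match.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Deleting the carboxylic acid group from acetic acid leaves a single carbon.
acid = Chem.MolFromSmiles("CC(=O)O")
removed = AllChem.DeleteSubstructs(acid, Chem.MolFromSmarts("C(=O)[OH]"))
print(Chem.MolToSmiles(removed))

# Replacing the amide nitrogen of acetamide with a methoxy group.
amide = Chem.MolFromSmiles("CC(=O)N")
products = AllChem.ReplaceSubstructs(
    amide, Chem.MolFromSmarts("[$(NC(=O))]"), Chem.MolFromSmiles("OC")
)
print(len(products), Chem.MolToSmiles(products[0]))
```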
Fingerprinting and Molecular Similarity¶
from rdkit import DataStructs
Compute the similarity between the first molecule and each of the others
theFingerprintList = [Chem.RDKFingerprint(x) for x in theZincMolList]
for idx, ithFingerprint in enumerate(theFingerprintList):
    if idx == 0:
        continue
    ithSimilarity = DataStructs.FingerprintSimilarity(theFingerprintList[0], ithFingerprint)
    print(idx, ithSimilarity)
1 0.16019417475728157
2 0.32099758648431215
3 0.23923923923923923
4 0.17231075697211157
5 0.24149659863945577
6 0.29102384291725103
7 0.20123565754633715
8 0.1726479146459748
9 0.2333984375
10 0.21395348837209302
11 0.3170266836086404
12 0.24427480916030533
13 0.24572317262830481
14 0.14798694232861806
15 0.1914257228315055
16 0.2785425101214575
17 0.3434547908232119
18 0.30604982206405695
19 0.32371794871794873
20 0.3335419274092616
21 0.2857142857142857
22 0.24226415094339623
23 0.3796825396825397
24 0.24062772449869224
25 0.3014065639651708
26 0.21099290780141844
27 0.22090059473237042
28 0.30455153949129854
29 0.3102766798418972
30 0.2858187134502924
31 0.34825174825174826
32 0.2917547568710359
33 0.2869496855345912
34 0.29353562005277045
35 0.2564575645756458
36 0.3215767634854772
37 0.3315614617940199
38 0.27180966113914923
39 0.29862306368330466
40 0.33941605839416056
41 0.2719298245614035
42 0.267221801665405
43 0.3039426523297491
44 0.3253652058432935
45 0.3108359133126935
46 0.2939501779359431
47 0.2525096525096525
48 0.24326833797585887
49 0.29554043839758126
50 0.32117920868890615
51 0.29735849056603775
52 0.29183187946074546
53 0.34238683127572017
54 0.21972318339100347
55 0.24272727272727274
56 0.24606462303231152
57 0.32234432234432236
58 0.058385093167701865
59 0.252129471890971
60 0.31554677206851117
61 0.3116076970825574
62 0.30514939605848695
63 0.2858176555716353
64 0.223717409587889
65 0.36026200873362446
66 0.30659767141009053
67 0.28471683475562454
68 0.3298835705045278
69 0.3027834351663272
70 0.3059467918622848
71 0.30463096960926195
72 0.2531752751905165
73 0.2899860917941586
74 0.30421909696521093
75 0.28652886671418387
76 0.2664601084430674
77 0.2745961820851689
78 0.29927007299270075
79 0.3292367399741268
80 0.2841880341880342
81 0.29423328964613366
82 0.29322813938198555
83 0.2669456066945607
84 0.31485148514851485
85 0.2791762013729977
86 0.25040387722132473
87 0.29968203497615264
88 0.2783053323593864
89 0.21894005212858383
90 0.19038817005545286
91 0.20425138632162662
92 0.19718309859154928
93 0.2281144781144781
94 0.13908872901678657
95 0.10610932475884244
96 0.10588235294117647
97 0.22489626556016598
98 0.11818181818181818
99 0.3115610711952972
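`DataStructs.FingerprintSimilarity` uses the Tanimoto metric by default: the number of bits set in both fingerprints divided by the number of bits set in either. The pure-Python sketch below computes the same ratio on sets of on-bit indices. The `tanimoto` helper and its toy inputs are illustrative only.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Two bits in common, four bits set overall.
print(tanimoto({1, 2, 3}, {2, 3, 4}))
```

With RDKit bit vectors, the on-bit indices can be obtained from `fp.GetOnBits()`.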
Draw fingerprint bits as images
from rdkit.Chem import Draw
mol = Chem.MolFromSmiles('c1ccccc1CC1CC1')
bi = {}
fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, bitInfo=bi)
bi[872]
[14:08:18] DEPRECATION WARNING: please use MorganGenerator
((6, 2),)
mfp2_svg = Draw.DrawMorganBit(mol, 872, bi, useSVG=True)
mfp2_svg
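The deprecation warning above recommends MorganGenerator. Below is a sketch of the equivalent call through `rdFingerprintGenerator`, assuming a recent RDKit release. The `fpSize=2048` argument is set explicitly here, and the bit information is collected through an `AdditionalOutput` object instead of a `bitInfo` dictionary.

```python
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

mol = Chem.MolFromSmiles("c1ccccc1CC1CC1")

# Build a reusable generator instead of calling the deprecated function.
generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

# AdditionalOutput collects the bit -> ((atom index, radius), ...) map.
extra = rdFingerprintGenerator.AdditionalOutput()
extra.AllocateBitInfoMap()

fp = generator.GetFingerprint(mol, additionalOutput=extra)
bit_info = extra.GetBitInfoMap()
print(fp.GetNumOnBits(), "bits set")
```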
rdkbi = {}
rdkfp = Chem.RDKFingerprint(mol, maxPath=5, bitInfo=rdkbi)
rdkbi[1553]
[[0, 1, 9, 5, 4], [2, 3, 4, 9, 5]]
rdk_svg = Draw.DrawRDKitBit(mol, 1553, rdkbi, useSVG=True)
rdk_svg
import requests
theLogSDataFileUrl = 'https://raw.githubusercontent.com/youngmook/cheminfo-python/main/logS-data.sdf'
theResponse = requests.get(theLogSDataFileUrl, allow_redirects=True)
with open('logS-data.sdf', 'wb') as theWriter:
    theWriter.write(theResponse.content)
theSDMolSupplier = Chem.SDMolSupplier('logS-data.sdf')
theMolList = []
for ithMol in theSDMolSupplier:
    theMolList.append(ithMol)
theMolList[0:10]
[<rdkit.Chem.rdchem.Mol at 0x7b0298c7dd20>, <rdkit.Chem.rdchem.Mol at 0x7b0298c7dd90>, <rdkit.Chem.rdchem.Mol at 0x7b0298c7de00>, <rdkit.Chem.rdchem.Mol at 0x7b0298c7de70>, <rdkit.Chem.rdchem.Mol at 0x7b0298c7dee0>, <rdkit.Chem.rdchem.Mol at 0x7b0298c7df50>, <rdkit.Chem.rdchem.Mol at 0x7b0298c7dfc0>, <rdkit.Chem.rdchem.Mol at 0x7b0298c7e030>, <rdkit.Chem.rdchem.Mol at 0x7b0298c7e0a0>, <rdkit.Chem.rdchem.Mol at 0x7b0298c7e110>]
from rdkit import Chem
from rdkit.Chem.EState import Fingerprinter
from rdkit.Chem import Descriptors
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import SGD
def molToFingerprintList(mol):
    # Feature vector: EState fingerprint counts plus the molecular weight.
    return np.append(Fingerprinter.FingerprintMol(mol)[0],Descriptors.MolWt(mol))
X = []
y = []
for ithMol in theMolList:
    X.append(molToFingerprintList(ithMol))
    y.append(float(ithMol.GetProp('logS')))
X = np.array(X)
y = np.array(y)
X
array([[ 0. , 0. , 0. , ..., 0. , 0. , 665.733],
[ 0. , 0. , 0. , ..., 0. , 0. , 589.64 ],
[ 0. , 0. , 0. , ..., 0. , 0. , 528.582],
...,
[ 0. , 0. , 0. , ..., 0. , 0. , 206.266],
[ 0. , 0. , 0. , ..., 0. , 0. , 218.321],
[ 0. , 0. , 0. , ..., 0. , 0. , 141.086]])
theStandardScaler = StandardScaler()
X= theStandardScaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
X_train
array([[ 0. , 0. , 0. , ..., 0. ,
0. , 1.99449515],
[ 0. , 0. , 0. , ..., 0. ,
0. , 0.36520251],
[ 0. , 0. , 0. , ..., 0. ,
0. , 0.08054434],
...,
[ 0. , 0. , 0. , ..., 0. ,
0. , -0.86357368],
[ 0. , 0. , 0. , ..., 0. ,
0. , 0.62097486],
[ 0. , 0. , 0. , ..., 0. ,
0. , -1.04418619]])
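In the cell above the scaler is fitted on all of `X` before the split, so statistics of the test rows influence the scaling of the training rows. A common alternative is to compute the scaling parameters from the training rows only and reuse them for the test rows. The NumPy sketch below illustrates this on toy data. `fit_scaler` and the arrays are hypothetical, and zero-variance columns are left unscaled, which mirrors how `StandardScaler` treats them.

```python
import numpy as np


def fit_scaler(train):
    """Return per-column mean and scale computed from the training rows."""
    mean = train.mean(axis=0)
    scale = train.std(axis=0)
    scale[scale == 0.0] = 1.0  # constant columns: avoid dividing by zero
    return mean, scale


# Toy data (hypothetical): column 0 is constant, like the all-zero columns above.
X_train_raw = np.array([[0.0, 100.0], [0.0, 200.0], [0.0, 300.0]])
X_test_raw = np.array([[0.0, 400.0]])

mean, scale = fit_scaler(X_train_raw)
X_train_scaled = (X_train_raw - mean) / scale
X_test_scaled = (X_test_raw - mean) / scale  # reuses the training statistics
print(X_train_scaled)
print(X_test_scaled)
```

With scikit-learn the same effect is obtained by calling `fit_transform` on the training split and `transform` on the test split.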
from keras.layers import Input
model = Sequential()
model.add(Input(shape=(X.shape[1],)))
model.add(Dense(units=512, activation='relu'))
model.add(Dense(units=512, activation='relu'))
model.add(Dense(units=1024, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(units=256, activation='relu'))
model.add(Dense(units=1024, activation='relu'))
model.add(Dense(units=512, activation='relu'))
model.add(Dense(units=1, activation='linear'))
model.summary()
Model: "sequential"
| Layer (type) | Output Shape | Param # |
|---|---|---|
| dense (Dense) | (None, 512) | 41,472 |
| dense_1 (Dense) | (None, 512) | 262,656 |
| dense_2 (Dense) | (None, 1024) | 525,312 |
| dropout (Dropout) | (None, 1024) | 0 |
| dense_3 (Dense) | (None, 256) | 262,400 |
| dense_4 (Dense) | (None, 1024) | 263,168 |
| dense_5 (Dense) | (None, 512) | 524,800 |
| dense_6 (Dense) | (None, 1) | 513 |
Total params: 1,880,321 (7.17 MB)
Trainable params: 1,880,321 (7.17 MB)
Non-trainable params: 0 (0.00 B)
model.compile(loss='mean_squared_error', optimizer=SGD(learning_rate=0.001, momentum=0.9, nesterov=True))
#history = model.fit(X_train, y_train, nb_epoch=500, batch_size=32)
history = model.fit(
X_train, y_train, epochs=50, verbose=1, validation_data=(X_test, y_test)
)
| Epoch | Steps | Time | loss | val_loss |
|---|---|---|---|---|
| 1/50 | 33/33 | 6s 75ms/step | 7.4712 | 1.9101 |
| 2/50 | 33/33 | 0s 6ms/step | 1.6193 | 1.1393 |
| 3/50 | 33/33 | 0s 6ms/step | 0.8356 | 0.9632 |
| 4/50 | 33/33 | 0s 5ms/step | 0.7753 | 0.8497 |
| 5/50 | 33/33 | 0s 5ms/step | 0.5801 | 0.7433 |
| 6/50 | 33/33 | 0s 4ms/step | 0.4566 | 0.7036 |
| 7/50 | 33/33 | 0s 4ms/step | 0.4438 | 0.6493 |
| 8/50 | 33/33 | 0s 4ms/step | 0.3776 | 0.6636 |
| 9/50 | 33/33 | 0s 4ms/step | 0.3570 | 0.6118 |
| 10/50 | 33/33 | 0s 5ms/step | 0.3327 | 0.5915 |
| 11/50 | 33/33 | 0s 5ms/step | 0.2897 | 0.6221 |
| 12/50 | 33/33 | 0s 5ms/step | 0.3153 | 0.5269 |
| 13/50 | 33/33 | 0s 4ms/step | 0.2601 | 0.5467 |
| 14/50 | 33/33 | 0s 4ms/step | 0.2535 | 0.5301 |
| 15/50 | 33/33 | 0s 4ms/step | 0.2185 | 0.5017 |
| 16/50 | 33/33 | 0s 5ms/step | 0.2311 | 0.5127 |
| 17/50 | 33/33 | 0s 5ms/step | 0.2181 | 0.5086 |
| 18/50 | 33/33 | 0s 5ms/step | 0.2118 | 0.5184 |
| 19/50 | 33/33 | 0s 5ms/step | 0.1916 | 0.5086 |
| 20/50 | 33/33 | 0s 5ms/step | 0.2086 | 0.5394 |
| 21/50 | 33/33 | 0s 5ms/step | 0.2007 | 0.5155 |
| 22/50 | 33/33 | 0s 5ms/step | 0.1812 | 0.4820 |
| 23/50 | 33/33 | 0s 5ms/step | 0.1628 | 0.5074 |
| 24/50 | 33/33 | 0s 5ms/step | 0.1443 | 0.5006 |
| 25/50 | 33/33 | 0s 5ms/step | 0.1700 | 0.4729 |
| 26/50 | 33/33 | 0s 5ms/step | 0.1354 | 0.4841 |
| 27/50 | 33/33 | 0s 5ms/step | 0.1425 | 0.4729 |
| 28/50 | 33/33 | 0s 5ms/step | 0.1491 | 0.4687 |
| 29/50 | 33/33 | 0s 5ms/step | 0.1315 | 0.4994 |
| 30/50 | 33/33 | 0s 5ms/step | 0.1313 | 0.4781 |
| 31/50 | 33/33 | 0s 5ms/step | 0.1353 | 0.4767 |
| 32/50 | 33/33 | 0s 5ms/step | 0.1280 | 0.4755 |
| 33/50 | 33/33 | 0s 5ms/step | 0.1364 | 0.4851 |
| 34/50 | 33/33 | 0s 5ms/step | 0.1332 | 0.4583 |
| 35/50 | 33/33 | 0s 5ms/step | 0.1248 | 0.4679 |
| 36/50 | 33/33 | 0s 5ms/step | 0.1036 | 0.4884 |
| 37/50 | 33/33 | 0s 5ms/step | 0.1164 | 0.4797 |
| 38/50 | 33/33 | 0s 5ms/step | 0.1217 | 0.4701 |
| 39/50 | 33/33 | 0s 5ms/step | 0.1134 | 0.4780 |
| 40/50 | 33/33 | 0s 5ms/step | 0.1132 | 0.4636 |
| 41/50 | 33/33 | 0s 5ms/step | 0.1280 | 0.4719 |
| 42/50 | 33/33 | 0s 5ms/step | 0.1108 | 0.4670 |
| 43/50 | 33/33 | 0s 5ms/step | 0.1124 | 0.4845 |
| 44/50 | 33/33 | 0s 5ms/step | 0.1194 | 0.4615 |
| 45/50 | 33/33 | 0s 5ms/step | 0.1115 | 0.4800 |
| 46/50 | 33/33 | 0s 5ms/step | 0.1030 | 0.4790 |
| 47/50 | 33/33 | 0s 7ms/step | 0.1060 | 0.4876 |
| 48/50 | 33/33 | 0s 7ms/step | 0.0991 | 0.4826 |
| 49/50 | 33/33 | 0s 7ms/step | 0.1005 | 0.4735 |
| 50/50 | 33/33 | 0s 7ms/step | 0.0890 | 0.4874 |
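The logged loss is a mean squared error, so its square root gives an error in the same logS units as the data. The sketch below converts the final validation loss and defines two small evaluation helpers in plain Python. `rmse`, `r_squared`, and the toy values are illustrative and not part of the notebook.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Final validation MSE from the training log, converted to an RMSE.
final_val_rmse = round(math.sqrt(0.4874), 3)
print(final_val_rmse)

# Toy values (hypothetical) to exercise the helpers.
measured = [-1.0, -2.0, -3.0, -4.0]
predicted = [-1.5, -2.0, -2.5, -4.0]
print(rmse(measured, predicted), r_squared(measured, predicted))
```

The gap between the final training loss and validation loss in the log suggests some overfitting, which these metrics can help quantify on the held-out split.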
import matplotlib.pyplot as plt
plt.scatter(y_train,model.predict(X_train), label = 'Train', c='blue')
plt.title('Neural Network Predictor')
plt.xlabel('Measured Solubility')
plt.ylabel('Predicted Solubility')
plt.scatter(y_test,model.predict(X_test),c='lightgreen', label='Test', alpha = 0.8)
plt.legend(loc=4)
plt.show()
33/33 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step