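(Seoul=Yonhap News) Reporter Han Sung-gan = A cell-culture experiment has found that ivermectin, an antiparasitic drug approved by the U.S. Food and Drug Administration (FDA) and regarded as safe, kills the virus that causes the novel coronavirus disease (COVID-19) within 48 hours.
Dr. Kylie Wagstaff of the Biomedicine Discovery Institute at Monash University in Australia has reported experimental results showing that when cell-cultured COVID-19 virus was exposed to ivermectin, all of its genetic material was eliminated within 48 hours, Science Daily reported on the 4th.
Even a single dose substantially reduced the RNA of the COVID-19 virus after 24 hours, and after 48 hours all of the RNA had disappeared completely, Dr. Wagstaff said.
However, because this result comes from a cell-culture experiment, clinical trials in which the drug is given directly to COVID-19 patients are needed, Dr. Wagstaff explained.
Ivermectin is a widely used and safe drug, but the first step is to determine what dose would be effective in patients infected with COVID-19, Dr. Wagstaff stressed.
The mechanism by which ivermectin acts on the COVID-19 virus is not known, but judging from the way it acts on other viruses, it blocks the virus from 'weakening' the host cell's defenses, Dr. Wagstaff said.
Although ivermectin is approved as an antiparasitic, in vitro experiments have shown it to be effective against a broad range of viruses, including those behind AIDS, dengue fever, influenza, and Zika, Dr. Wagstaff added.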
compared with a mortality rate of less than 1% from influenza. There is an urgent need for effective treatment. Current focus has been on the development of novel therapeutics, including antivirals and vaccines. Accumulating evidence suggests that a subgroup of patients with severe COVID-19 might have a cytokine storm syndrome. We recommend identification and treatment of hyperinflammation using existing, approved therapies with proven safety profiles to address the immediate need to reduce the rising mortality.
Current management of COVID-19 is supportive, and respiratory failure from acute respiratory distress syndrome (ARDS) is the leading cause of mortality.
Secondary haemophagocytic lymphohistiocytosis (sHLH) is an under-recognised, hyperinflammatory syndrome characterised by a fulminant and fatal hypercytokinaemia with multiorgan failure. In adults, sHLH is most commonly triggered by viral infections
Cardinal features of sHLH include unremitting fever, cytopenias, and hyperferritinaemia; pulmonary involvement (including ARDS) occurs in approximately 50% of patients.
A cytokine profile resembling sHLH is associated with COVID-19 disease severity, characterised by increased interleukin (IL)-2, IL-7, granulocyte-colony stimulating factor, interferon-γ inducible protein 10, monocyte chemoattractant protein 1, macrophage inflammatory protein 1-α, and tumour necrosis factor-α.
Predictors of fatality from a recent retrospective, multicentre study of 150 confirmed COVID-19 cases in Wuhan, China, included elevated ferritin (mean 1297·6 ng/ml in non-survivors vs 614·0 ng/ml in survivors; p<0·001) and IL-6 (p<0·0001),
As during previous pandemics (severe acute respiratory syndrome and Middle East respiratory syndrome), corticosteroids are not routinely recommended and might exacerbate COVID-19-associated lung injury.
However, in hyperinflammation, immunosuppression is likely to be beneficial. Re-analysis of data from a phase 3 randomised controlled trial of IL-1 blockade (anakinra) in sepsis, showed significant survival benefit in patients with hyperinflammation, without increased adverse events.
A multicentre, randomised controlled trial of tocilizumab (IL-6 receptor blockade, licensed for cytokine release syndrome), has been approved in patients with COVID-19 pneumonia and elevated IL-6 in China (ChiCTR2000029765).
All patients with severe COVID-19 should be screened for hyperinflammation using laboratory trends (eg, increasing ferritin, decreasing platelet counts, or erythrocyte sedimentation rate) and the HScore (table) to identify the subgroup of patients for whom immunosuppression could improve mortality. Therapeutic options include steroids, intravenous immunoglobulin, selective cytokine blockade (eg, anakinra or tocilizumab) and JAK inhibition.
Table: HScore for secondary HLH, by clinical parameter
The HScore generates a probability for the presence of secondary HLH. HScores greater than 169 are 93% sensitive and 86% specific for HLH. Note that bone marrow haemophagocytosis is not mandatory for a diagnosis of HLH. HScores can be calculated using an online HScore calculator.
*Defined as either haemoglobin concentration of 9·2 g/dL or less (≤5·71 mmol/L), a white blood cell count of 5000 white blood cells per mm³ or less, or platelet count of 110 000 platelets per mm³ or less, or all of these criteria combined.
PM is a clinical training fellow within the Experimental Medicine Initiative to Explore New Therapies network and receives project funding unrelated to this Correspondence. PM also receives co-funding by the National Institute for Health Research (NIHR) University College London Hospitals Biomedical Research Centre. DFM chairs the NIHR and Medical Research Council funding committee for COVID-19 for therapeutics and vaccines. DFM reports personal fees from consultancy for ARDS for GlaxoSmithKline, Boehringer Ingelheim, and Bayer; in addition, his institution has received funds from grants from the UK NIHR, Wellcome Trust, Innovate UK, and others, all unrelated to this Correspondence. DFM also has a patent issued to his institution for a treatment for ARDS. DFM is a Director of Research for the Intensive Care Society and NIHR Efficacy and Mechanism Evaluation Programme Director. All other authors declare no competing interests.
…clearly, the coronavirus has changed its internal structure to adapt to the new species of their host (to be more precise, about 20% of the internal structure of the coronavirus was mutated), but maintained enough such that it is still true to its origin species.
In fact, research has shown COVID-19 has mutated repeatedly in ways to boost its survival. In our fight to defeat the coronavirus, we need to find not just how the virus can be destroyed, but how the virus mutates and how those mutations can be addressed.
In this article, I will…
Provide a surface-level explanation of what RNA nucleotide sequences are
Use K-Means to create genome information clusters
Use PCA to visualize the clusters
…and derive insights from each of the analytics procedures we perform.
What are genome sequences?
Feel free to skip over this part if you have a basic understanding of RNA nucleotide sequences.
Genome sequencing, commonly compared to “decoding,” is the process of analyzing deoxyribonucleic acid (DNA) taken from a sample. Within every normal human cell are 23 pairs of chromosomes, structures that house DNA.
The curled double helix structure of DNA allows it to unwind into a ladder shape. This ladder is made out of paired chemical letters called bases. There are only four of these present in DNA: adenine, thymine, guanine, and cytosine. Adenine joins only with thymine, and guanine joins only with cytosine. These bases are represented with A, T, G, and C, respectively.
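The pairing rule above (A with T, G with C) can be written as a simple lookup. This is only a minimal sketch: the function name `complement` and the eight-base fragment are made up for illustration and do not come from the dataset used later.

```python
# Pairing rules from the paragraph above: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complementary strand (hypothetical helper)."""
    return "".join(PAIRS[base] for base in strand.upper())


# Hypothetical 8-base fragment, invented for this example.
fragment = "ATGCGTAC"
print(complement(fragment))  # TACGCATG
```

Applying the function twice returns the original strand, which is a quick way to check that the lookup table is symmetric.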
These bases form a code of sorts that instructs the organism how to construct proteins. For the coronavirus, whose genome is made of RNA rather than DNA, it is this sequence that essentially controls how the virus acts.
Figure: The process of DNA into RNA into protein creation.
Using specialized equipment, including sequencing instruments and specialized tags, the DNA sequences of specific fragments are revealed. Information obtained from this undergoes further analysis and comparison to allow researchers to identify changes in genes, associations with diseases and phenotypes, and identify potential drug targets.
The genome sequence, a long string of ‘A’s, ‘T’s, ‘G’s, and ‘C’s, represents how the organism reacts to its environment. Mutations to an organism are created by altering the DNA. Looking at the genome sequence is a strong way to analyze coronavirus mutations.
Get to know the data.
The data, which can be found on Kaggle here, looks like this:
Each one of the rows represents one mutation of the bat virus. First, just take a minute to admire how incredible nature is — within a few weeks, the coronavirus has already created 262 mutations of itself to increase survival rates.
Some important columns:
query acc.ver represents the original virus identifier.
subject acc.ver is the identifier for a virus mutation.
% identity represents what percent of the aligned sequence is the same as the original virus.
alignment length represents how many positions in the sequence were aligned between the two.
mismatches represents the number of aligned positions at which the mutation and the original differ.
bit score measures how good an alignment is; the higher the score, the better the alignment.
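To make the relationship between these columns concrete, here is a rough sketch that derives an alignment length, a mismatch count, and a percent identity from two short sequences. It assumes the sequences are already aligned, equal in length, and free of gaps, which is a simplification of what an alignment tool does. The function name `alignment_stats` and both sequences are hypothetical, and the bit score is left out because it depends on the tool's scoring scheme.

```python
def alignment_stats(query: str, subject: str) -> dict:
    """Compare two pre-aligned, gap-free sequences position by position."""
    if len(query) != len(subject):
        raise ValueError("sequences must already be aligned to equal length")
    length = len(query)
    # Count the positions where the two sequences carry different bases.
    mismatches = sum(1 for q, s in zip(query, subject) if q != s)
    identity = 100.0 * (length - mismatches) / length
    return {
        "alignment length": length,
        "mismatches": mismatches,
        "% identity": identity,
    }


# Two made-up 10-base sequences that differ at positions 4 and 10.
stats = alignment_stats("ATGCGTACGA", "ATGAGTACGT")
print(stats)
```

With two mismatches over ten positions, the sketch reports an identity of 80 percent.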
Some statistical measures of each of the columns (this can be handily called in Python with data.describe()):
Looking at the % identity column, it is interesting to see the minimum alignment percent a mutation has with the original virus: about 77.6 percent. The rather large standard deviation of 7 percent for % identity means that there is a wide range of mutation. This is supported by a massive standard deviation in bit score: the standard deviation is larger than the mean!
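A standard deviation larger than the mean usually signals a heavily skewed column, where a few very large values sit far from the rest. The sketch below illustrates this with Python's standard library; the six bit scores are invented for the example and are not taken from the Kaggle table.

```python
import statistics

# Made-up bit scores: most are small, two are very large (assumption for illustration).
bit_scores = [120.0, 150.0, 180.0, 200.0, 5200.0, 54000.0]

mean = statistics.mean(bit_scores)
# Sample standard deviation (divides by n - 1), the same convention pandas' describe() uses.
std = statistics.stdev(bit_scores)

print(f"mean={mean:.1f} std={std:.1f} std_exceeds_mean={std > mean}")
```

Here the two large values pull the standard deviation well above the mean, the same pattern described for the bit score column.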
A good way to visualize data is through a correlation heatmap. Each cell represents how correlated one feature is with another.
Many of the columns are highly correlated with each other. This makes sense since most of the measures are variations of each other. One thing to take note of is alignment length’s high correlation with bit score.
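Each cell of such a heatmap is a Pearson correlation coefficient between two columns. As a hedged sketch of what sits behind the plot, the code below computes the coefficients by hand for three columns. The column names match the table, but the values are invented, and the helper `pearson` is a hypothetical name rather than anything from the original notebook.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (spread_x * spread_y)


# Invented values for three of the columns (not the real data).
columns = {
    "alignment length": [100, 250, 400, 1200, 29000],
    "bit score": [90, 240, 380, 1100, 52000],
    "mismatches": [5, 30, 60, 200, 10],
}

# Full correlation matrix as a nested dict: corr[a][b] is the coefficient for columns a and b.
corr = {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}

for name, row in corr.items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

The matrix is symmetric with ones on the diagonal, which is why heatmaps of it are often shown with only one triangle filled in.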
Using K-Means to Create Mutation Clusters
K-Means is an algorithm for clustering, a method in machine learning to find groups of data points in the feature space. The goal of our K-Means is to find clusters of mutations, so we can derive insights on the nature of the mutations and how to address them.
However, we still need to choose the number of clusters k. While this is as simple as plotting out the points in two dimensions, this is unachievable in higher dimensions (if we want to retain the most information). Methods like the elbow method to choose k are subjective and can be inaccurate, so instead, we will use the silhouette method.
The silhouette method assigns a score to a clustering with k clusters, based on how well the clusters suit the data. The sklearn library in Python makes implementing both K-Means and the silhouette method very simple.
It seems that 5 clusters is the best choice for the data. Now, we can determine the cluster centers. These are the points around which each cluster is centered, and they represent a numerical evaluation of (in this case) the 5 main types of mutations.
Note: The features have been standardized to put them all on the same scale. Otherwise, columns would not be comparable.
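The steps described in this section (standardize, score each candidate k with the silhouette method, then read off the cluster centers) can be sketched with sklearn roughly as follows. The data here is synthetic, generated around five hand-picked centers as a stand-in for the mutation table, and the range of k values tried is an assumption; the original notebook may have differed on both points.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the mutation table: 5 well-separated groups (assumption).
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, 20]], dtype=float)
X, _ = make_blobs(n_samples=250, centers=centers, cluster_std=0.5, random_state=0)

# Standardize so every column is on the same scale before clustering.
X_scaled = StandardScaler().fit_transform(X)

# Score each candidate k; a higher mean silhouette means tighter, better-separated clusters.
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)

# Refit with the chosen k and read off the cluster centers (in standardized units).
model = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X_scaled)
cluster_centers = model.cluster_centers_
print(best_k, cluster_centers.shape)
```

Because the centers are expressed in standardized units, they are useful for comparing clusters column by column rather than for reading off raw values.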
This heatmap represents each cluster’s attributes, by column. Because the points were scaled, the actual annotated values do not quantitatively mean anything. However, scaled values in each column can be compared, so you can get a visual sense of the relative attributes of each of the mutation clusters. If scientists were to develop a vaccine, it should address these main clusters of viruses.
In the next section, we will visualize the clusters using PCA.
PCA for Cluster Visualization
PCA, or Principal Component Analysis, is a method of dimensionality reduction. It selects orthogonal vectors in multidimensional space to represent axes, such that the most information (variance) is retained.
With the popular Python library sklearn, implementing PCA can be done in two lines. First, we can check the explained variance ratio. This is the fraction of the original dataset's variance that is retained. The explained variance ratio, in this case, is 0.9838548580740327, which is very high: about 98.4 percent of the variance survives the reduction to two components. We can be reasonably confident that whatever analyses we take from PCA will be true to the data.
Each new feature (principal component) is a linear combination of several other columns. We can visualize how important a column is to one of the two principal components with a heatmap.
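A minimal sketch of this step with sklearn is shown below. The four-column dataset is synthetic, built from one hidden factor plus a little noise so that the columns are strongly correlated, loosely mimicking the correlation seen earlier; it is not the article's data, so the retained variance it prints will not match the figure quoted above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: 4 correlated columns driven by one hidden factor plus small noise.
factor = rng.normal(size=(200, 1))
X = factor @ np.array([[1.0, 0.9, -0.8, 1.1]]) + 0.1 * rng.normal(size=(200, 4))

X_scaled = StandardScaler().fit_transform(X)

# The "two lines": build the PCA object, then fit it and project the data.
pca = PCA(n_components=2)
components = pca.fit_transform(X_scaled)

# Fraction of the total variance kept by the two components together.
retained = pca.explained_variance_ratio_.sum()

# Each row holds the weights of the original columns in one principal component.
loadings = pca.components_
print(components.shape, round(float(retained), 4), loadings.shape)
```

The `loadings` array is what a component heatmap displays: one row per principal component and one column per original feature.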
It is important to understand what having a high value in the first component means. In this case, component 1 is characterized by a higher alignment length (closer to the original virus), while component 2 is largely characterized by a shorter alignment length (mutated farther from the original virus). This is also reflected by the larger difference in bit score.
It is clear that there are 5 main strands of the virus mutation. We can take away lots of insights.
Four of the virus mutation clusters are on the left side of the first principal component, and one is on the right side. A signature of the first principal component is a high alignment length. This means that a higher value for the first principal component means a higher alignment length (closer to the original virus). Lower values of component 1, thus, are farther genetically from the original virus. Most of the virus clusters vary largely from the original virus. Hence, scientists attempting to create a vaccine should be aware that the virus mutates a lot.
Conclusion
Using K-Means and PCA, we were able to identify five main clusters of mutations in the coronavirus. Scientists developing vaccines for the coronavirus can use the cluster centers to gain knowledge about characteristics of each cluster. We were able to visualize the clusters in two dimensions using principal component analysis, and found that the coronavirus has a very high rate of mutation. This may be what makes it so deadly.
Note from the editors: Towards Data Science is a Medium publication primarily based on the study of data science and machine learning. We are not health professionals or epidemiologists, and the opinions of this article should not be interpreted as professional advice. To learn more about the coronavirus pandemic, you can click here.