VEP output SIFT_score unclear



We have been experimenting with VEP (Variant Effect Predictor). One of the annotations we are interested in is the SIFT score: when we apply the dbNSFP plugin, we get a column containing the scores (named SIFT_score). However, I don't understand why the fields sometimes contain dots or multiple values. For example, the gene ENSG00000196924 below has 5 transcripts:



[Image: DataFrame of VEP output]



The SIFT_score column contains several values, not one per transcript/rs number.



Here is another example that confuses me (this time I added the SIFT_pred column):



[Image: DataFrame of another gene with mutations]



There are two mutations. The lower one affects 4 transcripts, so I understand that there can be 4 SIFT scores, but why are all four given in every row? Is the first one the SIFT_score for the first transcript?



One last example: again 4 transcripts, but now 2 of the scores are dots. What does that mean?



[Image: another pd.DataFrame]



I have spent quite some time trying to work out how to interpret this data; any help is appreciated.



Tags: ngs, variant-calling, vep, variant-effect-predictor



asked Apr 10 at 8:17 by Freek



2 Answers



The dbNSFP plugin for VEP looks up each variant in the dbNSFP data tables and pulls out the requested values. dbNSFP provides its SIFT scores in exactly that format: one score for every transcript affected by the variant, all on one line. The lookup is keyed on the variant alone, not on the variant/transcript combination, so every row for that variant carries the scores for all of its transcripts. You can also request a column listing the transcripts or proteins (Ensembl_transcriptid or Ensembl_proteinid) in the same order, so you know which score goes with which transcript.
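To make that pairing concrete, here is a minimal Python sketch, assuming you asked the plugin for both SIFT_score and Ensembl_transcriptid and have the two raw field strings for one variant. The separator is an assumption: as far as I know, dbNSFP's own files separate per-transcript values with ";", but the character may differ in your VEP output, so check it and adjust sep. The function name and the example values are hypothetical.

```python
def pair_scores_with_transcripts(scores_field, transcripts_field, sep=";"):
    """Pair each per-transcript SIFT score with its transcript ID.

    Both fields are assumed to list their values in the same order, as
    dbNSFP does for one variant. Values are kept as raw strings, so
    placeholders such as "." pass through unchanged.
    """
    scores = scores_field.split(sep)
    transcripts = transcripts_field.split(sep)
    if len(scores) != len(transcripts):
        raise ValueError(
            f"{len(scores)} scores but {len(transcripts)} transcript IDs; "
            "check that both columns come from the same dbNSFP record"
        )
    # A list of pairs (not a dict) keeps the original order and any repeats.
    return list(zip(transcripts, scores))


# Hypothetical placeholder values for a single variant.
pairs = pair_scores_with_transcripts("0.01;0.03;.", "tx1;tx2;tx3")
for transcript, score in pairs:
    print(transcript, score)
```

A length mismatch usually means the two columns were not read from the same record, so the sketch raises an error rather than silently pairing the wrong values.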



A better way to get SIFT scores with VEP is to take them directly from VEP rather than from dbNSFP. That way, each line carries the SIFT score for the transcript on that line.
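With VEP's own annotation, each line already refers to a single transcript. As far as I know, running VEP with --sift b reports the prediction and the score together as prediction(score), for example deleterious(0.01). The sketch below splits such a field, assuming that format and assuming "-" marks an empty field (VEP's usual placeholder in tab output); parse_vep_sift is a hypothetical helper name.

```python
import re

# Assumed shape of VEP's SIFT field when run with --sift b:
# "prediction(score)", e.g. "deleterious(0.01)".
SIFT_PATTERN = re.compile(r"^(?P<pred>[A-Za-z_]+)\((?P<score>[0-9.]+)\)$")


def parse_vep_sift(field):
    """Return (prediction, score) for one transcript, or None if empty."""
    field = field.strip()
    if field in ("", "-", "."):
        return None  # no SIFT annotation for this transcript
    match = SIFT_PATTERN.match(field)
    if match is None:
        raise ValueError(f"unexpected SIFT field: {field!r}")
    return match.group("pred"), float(match.group("score"))


print(parse_vep_sift("deleterious(0.01)"))  # ('deleterious', 0.01)
print(parse_vep_sift("-"))                  # None
```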



answered Apr 10 at 10:05 by Emily_Ensembl (edited Apr 10 at 10:45)



• I am guessing the dots are there for cases where dbNSFP doesn't have a value for the relevant transcript, right? – terdon, Apr 10 at 12:18

• Yes, that's it. Could be that the variant isn't missense in that transcript. – Emily_Ensembl, Apr 10 at 13:00



The first gene you mention, ENSG00000196924, actually has 6 transcripts, not 5 (see the VarSome.com page for variant rs371839875). It's just that one of them is non-coding:



[Image: VarSome genome browser showing transcripts]



So the SIFT scores you see are indeed one per transcript; there are 6 of them because dbNSFP also includes a score for the gene's non-coding transcript.



The dots are just placeholders: they mean there was no value for that transcript. Many tools show some symbol instead of an empty field, both for clarity and for practical technical reasons; the VCF format itself, for example, uses "." to mark a missing value.
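When loading these fields yourself, it helps to turn the placeholder into an explicit missing value before doing any arithmetic, rather than dropping it, so the remaining scores stay aligned with their transcripts. A minimal sketch, assuming "." is the placeholder and ";" the separator (adjust sep to your output); parse_sift_scores is a hypothetical helper.

```python
def parse_sift_scores(field, sep=";", missing="."):
    """Split a multi-transcript SIFT_score field into floats.

    Placeholders (and empty values) become None instead of being dropped,
    so position i still lines up with the i-th transcript listed in the
    matching Ensembl_transcriptid field.
    """
    return [None if value.strip() in ("", missing) else float(value)
            for value in field.split(sep)]


scores = parse_sift_scores("0.02;.;0.45;.")
print(scores)  # [0.02, None, 0.45, None]
scored = sum(s is not None for s in scores)
print(f"{scored} of {len(scores)} transcripts have a SIFT score")
```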



Visiting the variant's page on VarSome gives a clearer picture, since we collapse identical scores and also show the converted rankscore provided by dbNSFP, so you have a single number for your variant:



[Image: VarSome screenshot showing the SIFT score]
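If you want to collapse the raw dbNSFP column into one number yourself, one option (an assumption for illustration, not necessarily what VarSome or dbNSFP do) is to drop the placeholders, collapse identical scores, and keep the most damaging one. SIFT scores range from 0 to 1, with lower values predicted to be more damaging and 0.05 as the conventional cutoff. summarize_sift and its return shape are hypothetical.

```python
def summarize_sift(field, sep=";", missing=".", cutoff=0.05):
    """Collapse a multi-transcript SIFT_score field into one summary.

    Returns (worst_score, distinct_scores, prediction), or None if no
    transcript has a score. Lower SIFT scores are more damaging, so the
    "worst" score is the minimum.
    """
    values = [float(v) for v in field.split(sep)
              if v.strip() not in ("", missing)]
    if not values:
        return None
    distinct = sorted(set(values))  # identical scores collapsed
    worst = distinct[0]
    # Sources differ on whether exactly 0.05 counts as deleterious;
    # this sketch uses <=, adjust if your pipeline uses <.
    prediction = "deleterious" if worst <= cutoff else "tolerated"
    return worst, distinct, prediction


print(summarize_sift("0.03;0.03;.;0.2"))  # (0.03, [0.03, 0.2], 'deleterious')
print(summarize_sift(".;."))              # None
```

Taking the minimum is a deliberately conservative choice; if you care about one specific transcript (for example the canonical one), pair the scores with Ensembl_transcriptid and pick that entry instead.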




Disclaimer: I work for the company behind VarSome. It is free to use as a lookup tool for single variants, but annotating VCF files requires payment (unlike VEP, which is 100% free).



answered Apr 10 at 12:06 by terdon (edited Apr 10 at 12:13)