How to penalize for empty fields in a DataFrame?



I have to calculate the consistency of racing car drivers over a whole season. My DataFrame has 10 columns (one per circuit), and each column holds the standard deviation of the lap times the driver posted at that circuit, i.e. how consistent the driver is from lap to lap. For races the driver did not finish, the field is blank.

So far I have calculated each driver's average season consistency by averaging all 10 columns. However, not finishing a race should affect a driver's consistency negatively, and I do not know how to implement that.
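
To make the problem concrete, here is a minimal sketch with made-up numbers. It assumes one row per driver, one column per circuit, and blank fields stored as NaN; the driver and circuit names are hypothetical and only three circuits are shown. Because pandas' `mean` skips NaN by default, a plain row-wise average leaves unfinished races out instead of penalizing them.

```python
import numpy as np
import pandas as pd

# Hypothetical data: lap-time standard deviation per driver (rows) and circuit (columns).
# NaN marks a race the driver did not finish.
df = pd.DataFrame(
    {
        "circuit_a": [0.5, 1.0, 1.25],
        "circuit_b": [0.5, np.nan, 2.0],
        "circuit_c": [0.5, 1.0, np.nan],
    },
    index=["driver_1", "driver_2", "driver_3"],
)

# Row-wise mean; skipna=True is the default, so NaN entries are simply left out.
season_avg = df.mean(axis=1)

# Number of unfinished races per driver, which the plain average ignores.
dnf_count = df.isna().sum(axis=1)

print(season_avg)
print(dnf_count)
```

In this toy data, `driver_2` is averaged over two races only, so the missing race has no effect on the score.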







Tags: pandas, data

asked 2 days ago by jatrp5

1 Answer
This depends heavily on domain knowledge. A general approach would be to replace each null value with one of the following:

1. A multiple of the worst or average consistency at each circuit $c$, i.e. $(1 + m)\text{max}(\sigma_c)$ or $(1 + m)\text{avg}(\sigma_c)$ respectively, for the null values at that circuit, or

2. A multiple of the worst or average consistency of each driver $d$, i.e. $(1 + m)\text{max}(\sigma_d)$ or $(1 + m)\text{avg}(\sigma_d)$ respectively, for their unfinished races, or

3. A multiple of the mean of the driver and circuit average consistencies, i.e. $(1 + m)[\text{avg}(\sigma_d) + \text{avg}(\sigma_c)]/2$, for the unfinished race of driver $d$ at circuit $c$, or some other combination.
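
The three options above could be sketched in pandas roughly as follows. This is only an illustration under stated assumptions: rows are drivers, columns are circuits, unfinished races are NaN, and the function name `penalized_fill`, its `how` option names, and the sample data are all hypothetical.

```python
import numpy as np
import pandas as pd

def penalized_fill(df, m=0.1, how="circuit_max"):
    """Fill NaN cells with (1 + m) times a reference consistency.

    `how` (hypothetical option names): "circuit_max", "circuit_avg",
    "driver_max", "driver_avg", or "combined".
    """
    circuit_max = df.max(axis=0).to_numpy()           # one value per column
    circuit_avg = df.mean(axis=0).to_numpy()
    driver_max = df.max(axis=1).to_numpy()[:, None]   # one value per row
    driver_avg = df.mean(axis=1).to_numpy()[:, None]
    options = {
        "circuit_max": circuit_max,                   # option 1, worst
        "circuit_avg": circuit_avg,                   # option 1, average
        "driver_max": driver_max,                     # option 2, worst
        "driver_avg": driver_avg,                     # option 2, average
        "combined": (driver_avg + circuit_avg) / 2,   # option 3
    }
    # Expand the chosen reference to the full (drivers x circuits) shape.
    base = np.broadcast_to(options[how], df.shape)
    fill = pd.DataFrame((1 + m) * base, index=df.index, columns=df.columns)
    # Keep observed values; use the penalized estimate only where data is missing.
    return df.where(df.notna(), fill)

# Hypothetical data: NaN marks an unfinished race.
df = pd.DataFrame(
    {
        "circuit_a": [0.5, 1.0, 1.25],
        "circuit_b": [0.5, np.nan, 2.0],
        "circuit_c": [0.5, 1.0, np.nan],
    },
    index=["driver_1", "driver_2", "driver_3"],
)

filled = penalized_fill(df, m=0.2, how="combined")
season_avg = filled.mean(axis=1)
print(filled)
print(season_avg)
```

The maxima and averages are computed from the finished races only, since pandas skips NaN in `max` and `mean` by default. A driver with no finished race at all would still get NaN under the driver-based options and would need separate handling.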


Whichever approach is chosen, the coefficient $m$ affects the final ranking. It could be determined either

1. Subjectively, by looking at the rankings from an expert point of view and selecting the value that makes the most sense, or

2. By trying a range of values like $m \in \{-0.2, -0.1, 0, 0.1, 0.2, \dots, 0.5\}$ and averaging the consistencies $\sigma_d$ or rankings $R_d$ for each driver $d$. An advantage of this approach is that when a driver's rank has low variance across the values of $m$, the rank is insensitive to the choice of $m$, i.e. it is less controversial; when the rank changes a lot with $m$, the average rank is more controversial.
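
The second option, sweeping over $m$, might look like the sketch below. It assumes the worst-at-circuit fill from the first list, a step of 0.1 between the values of $m$, and the same hypothetical layout as before (rows are drivers, columns are circuits, NaN is an unfinished race). The helper name `season_scores` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: NaN marks an unfinished race.
df = pd.DataFrame(
    {
        "circuit_a": [0.5, 1.0, 1.25],
        "circuit_b": [0.5, np.nan, 2.0],
        "circuit_c": [0.5, 1.0, np.nan],
    },
    index=["driver_1", "driver_2", "driver_3"],
)

def season_scores(df, m):
    """Season average after filling NaN with (1 + m) * worst consistency at that circuit."""
    filled = df.fillna((1 + m) * df.max(axis=0))  # a Series fills column by column
    return filled.mean(axis=1)

# Assumed grid for m, in steps of 0.1.
m_values = [-0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5]

# One column of ranks per value of m (rank 1 = lowest score = most consistent).
ranks = pd.DataFrame({m: season_scores(df, m).rank(method="min") for m in m_values})

avg_rank = ranks.mean(axis=1)     # average rank R_d over all m
rank_spread = ranks.var(axis=1)   # low variance: rank is insensitive to m

print(ranks)
print(avg_rank)
print(rank_spread)
```

In this toy data `driver_1` keeps the same rank for every $m$, so its spread is zero, while `driver_2` and `driver_3` swap places as $m$ grows, which shows up as a nonzero spread.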







answered 2 days ago by Esmailian (edited 2 days ago)