Flag only first row where condition is met in a DataFrame

I have the following DataFrame df, which can be created as follows:

import numpy as np
import pandas as pd
from datetime import datetime, timedelta

date_today = datetime.now().date()
days = pd.date_range(date_today, date_today + timedelta(19), freq='D')
x = np.arange(0, 2*np.pi, 0.1*np.pi)  # start, stop, step
y = np.sin(x)
df = pd.DataFrame({'dates': days, 'vals': y, 'is_hit': abs(y) > 0.9})
df = df.set_index('dates')

And which looks like this:

 is_hit vals
dates 
2019-03-27 False 0.000000e+00
2019-03-28 False 3.090170e-01
2019-03-29 False 5.877853e-01
2019-03-30 False 8.090170e-01
2019-03-31 True 9.510565e-01
2019-04-01 True 1.000000e+00
2019-04-02 True 9.510565e-01
2019-04-03 False 8.090170e-01
2019-04-04 False 5.877853e-01
2019-04-05 False 3.090170e-01
2019-04-06 False 1.224647e-16
2019-04-07 False -3.090170e-01
2019-04-08 False -5.877853e-01
2019-04-09 False -8.090170e-01
2019-04-10 True -9.510565e-01
2019-04-11 True -1.000000e+00
2019-04-12 True -9.510565e-01
2019-04-13 False -8.090170e-01
2019-04-14 False -5.877853e-01
2019-04-15 False -3.090170e-01

I want to flag only the first row of each consecutive run in which is_hit is True, so that the expected new column hit_first would be:

 is_hit vals hit_first
dates 
2019-03-27 False 0.000000e+00 False
2019-03-28 False 3.090170e-01 False
2019-03-29 False 5.877853e-01 False
2019-03-30 False 8.090170e-01 False
2019-03-31 True 9.510565e-01 True
2019-04-01 True 1.000000e+00 False
2019-04-02 True 9.510565e-01 False
2019-04-03 False 8.090170e-01 False
2019-04-04 False 5.877853e-01 False
2019-04-05 False 3.090170e-01 False
2019-04-06 False 1.224647e-16 False
2019-04-07 False -3.090170e-01 False
2019-04-08 False -5.877853e-01 False
2019-04-09 False -8.090170e-01 False
2019-04-10 True -9.510565e-01 True
2019-04-11 True -1.000000e+00 False
2019-04-12 True -9.510565e-01 False
2019-04-13 False -8.090170e-01 False
2019-04-14 False -5.877853e-01 False
2019-04-15 False -3.090170e-01 False

asked 18 hours ago

JejeBelfort

python pandas dataframe

4 Answers

My suggestion:

df['hit_first'] = df['is_hit'] & (~df['is_hit']).shift(1)
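To see what this expression is doing, here is a minimal sketch on a small made-up Series (the name `s` and the sample values are assumptions for illustration, not from the question). It uses `shift(1, fill_value=False)` so the shifted Series stays boolean instead of picking up a NaN in the first row. Note that this variant would flag a True in the very first row as a first hit, which is an assumption about the desired edge-case behaviour.

```python
import pandas as pd

# Hypothetical sample data: two runs of True values.
s = pd.Series([False, True, True, False, True])

# Previous row's value; the first row has no predecessor, so fill with False.
prev = s.shift(1, fill_value=False)

# A row is a "first hit" when it is True and the previous row was not.
hit_first = s & ~prev

print(hit_first.tolist())
```

Only the rows that open a run of True values come out as True.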

answered 17 hours ago

ecortazar

Compare the column with its shifted copy using Series.ne and Series.shift, then chain with & for bitwise AND:

df['hit_first'] = df['is_hit'].ne(df['is_hit'].shift()) & df['is_hit']
print(df)
 vals is_hit hit_first
dates 
2019-03-27 0.000000e+00 False False
2019-03-28 3.090170e-01 False False
2019-03-29 5.877853e-01 False False
2019-03-30 8.090170e-01 False False
2019-03-31 9.510565e-01 True True
2019-04-01 1.000000e+00 True False
2019-04-02 9.510565e-01 True False
2019-04-03 8.090170e-01 False False
2019-04-04 5.877853e-01 False False
2019-04-05 3.090170e-01 False False
2019-04-06 1.224647e-16 False False
2019-04-07 -3.090170e-01 False False
2019-04-08 -5.877853e-01 False False
2019-04-09 -8.090170e-01 False False
2019-04-10 -9.510565e-01 True True
2019-04-11 -1.000000e+00 True False
2019-04-12 -9.510565e-01 True False
2019-04-13 -8.090170e-01 False False
2019-04-14 -5.877853e-01 False False
2019-04-15 -3.090170e-01 False False
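One detail that may matter is the first row. The sketch below, on a made-up Series `s` that starts with True (an assumption for illustration), suggests that `ne(shift())` marks the first row as changed, because it is compared against the NaN that `shift()` introduces. A True in the first row is therefore flagged as a first hit.

```python
import pandas as pd

# Hypothetical sample data that starts inside a run of True values.
s = pd.Series([True, True, False, True])

# True wherever the value differs from the previous row;
# the first row is compared against NaN, so it counts as changed.
changed = s.ne(s.shift())

# Keep only the changes that land on a True value.
hit_first = changed & s

print(changed.tolist())
print(hit_first.tolist())
```

If the first row should never be flagged, that case would need to be handled separately.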

edited 17 hours ago

answered 17 hours ago

jezrael

I think you can also do it this way:

df['is_hit'].astype(int).diff() == 1

Output:

dates
2019-03-27 False
2019-03-28 False
2019-03-29 False
2019-03-30 False
2019-03-31 True
2019-04-01 False
2019-04-02 False
2019-04-03 False
2019-04-04 False
2019-04-05 False
2019-04-06 False
2019-04-07 False
2019-04-08 False
2019-04-09 False
2019-04-10 True
2019-04-11 False
2019-04-12 False
2019-04-13 False
2019-04-14 False
2019-04-15 False
Name: is_hit, dtype: bool

Timings:

%timeit df['is_hit'] & (~df['is_hit']).shift(1)
1.13 ms ± 5.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df['is_hit'].ne(df['is_hit'].shift()) & df['is_hit']
908 µs ± 9.53 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df['is_hit'].astype(int).diff() == 1
689 µs ± 8.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
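The timings above are for the 20-row frame from the question. A rough sketch for repeating the comparison on a larger input, outside IPython, could look like the following. The Series size, the random data, and the names `s` and `candidates` are all assumptions for illustration, and the first expression uses `fill_value=False` to keep the shifted Series boolean, so it is a variant of the first answer rather than the exact code. No results are claimed here; the numbers will depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger input: 200,000 random booleans.
rng = np.random.default_rng(0)
s = pd.Series(rng.random(200_000) > 0.5)

# The three approaches from the answers, wrapped as callables.
candidates = {
    "and_not_shift": lambda: s & ~s.shift(1, fill_value=False),
    "ne_shift": lambda: s.ne(s.shift()) & s,
    "diff": lambda: s.astype(int).diff() == 1,
}

for name, fn in candidates.items():
    # Average wall-clock time over 5 calls.
    seconds = timeit.timeit(fn, number=5) / 5
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```

The three expressions can differ on the very first row (the `diff` version never flags it), so compare results from the second row onward when checking that they agree.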

edited 17 hours ago

answered 17 hours ago

Scott Boston

Nice, it would be interesting to see the performance on large data.

– jezrael
17 hours ago

This can also be done by taking the difference between the series and its copy shifted by 1 period. The difference equals 1 only where the value goes from False to True:

df['hit_first'] = (df['is_hit'] - df['is_hit'].shift()) == 1
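As a sketch of why the subtraction works, the example below casts the booleans to integers first (an assumption added here, so the result does not depend on how `-` behaves for boolean or object dtypes) and prints the intermediate differences. The Series `s` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: two runs of True values.
s = pd.Series([False, True, True, False, True])

# True -> 1, False -> 0, so the row-to-row difference is
# +1 on False -> True, -1 on True -> False, and 0 otherwise.
as_int = s.astype(int)
step = as_int - as_int.shift()

# The first row has no predecessor, so its difference is NaN and it is never flagged.
hit_first = step == 1

print(step.tolist())
print(hit_first.tolist())
```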

edited 7 hours ago

answered 17 hours ago

Loochie

The use of np.where here is quite pointless.

– miradulo
11 hours ago

Yes I understood. Thanks :)

– Loochie
9 hours ago

While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value.

– DebanjanB
9 hours ago
