
Rayon Scheduler Deep Dive (2020.01.23)

So let's see. I figured I would just try to walk through what's happening, and we'll go where the conversation naturally goes. Let me get the branch pulled up.

Probably the place to start... well, I guess we'll start way back, just to make sure we're all on the same page. Real briefly, the goals I was trying to achieve: first, "wake one", which is basically that when a thread is blocked on some other task completing, and that task completes, we want to wake up just that one thread, which somewhat amazingly Rayon doesn't do today. And second, when new work arrives, we want to start a relatively small number of workers in response, a proportional number of workers, or something like that. Basically Rayon's scheduler today just goes all out and wakes everyone, which is done in the name of being fast but is probably bad; we want to make things more proportional.

So the key thing in the code is this; I'm going to navigate in here: rayon-core, src, sleep, counters. Actually, hold on. Well, I'll describe the code as it is; I thought of one small change I would probably make as I was reviewing the algorithm, but we can discuss it.

This is kind of the key data structure, this AtomicCounters. It's the central state for the scheduler, and it has four things packed into one 64-bit word. The reason for that is we want to be able to modify them all atomically, at once. The easiest ones to explain are the idle and sleeping thread counts. Basically these are counting the number of threads in each state: an idle thread is one that is blocked on some latch and looking for work, and a sleeping thread is one that is actually asleep. In this case idle is a superset of sleeping, at least in the data structure; it doesn't have to be that way, but it just is. In other words, a thread increments the idle counter when it goes idle, and then if it goes to sleep it also increments the sleeping counter, but it doesn't decrement the idle counter until it finds work. So: the idle count is incremented when you start searching for work and decremented when you find it, and the sleeping count is incremented when you go to sleep (we'll talk about that process) and decremented when you wake. One minor detail we'll see later: when you go to sleep, you increment the counter that says you're asleep, but the only way to wake up from being asleep is for somebody else to wake you, so when you get woken, the one who wakes you does the decrement. That's a probably-irrelevant micro-optimization to keep the counter a little more in sync with who is actually awake. Does that make sense so far? Okay.

The last part of the word is these two counters, the jobs and sleepy counters. They're basically a mechanism for detecting, in a lightweight way, when new work is published. I'll cover them in a bit, but the high-level idea is that the sleepy counter gets incremented when a thread gets sleepy, which means it's planning to go to sleep but hasn't yet; it's going to search one more time or so before it does. The jobs counter gets synchronized with the sleepy counter when new work is published. So: if the jobs and sleepy counters are equal, nobody has gotten sleepy since work was last published. But if someone has announced they're sleepy, then when we publish new work we increase the jobs counter to match. In short, if the two counters are equal, nobody has gotten sleepy since work was published; if they are not equal, then when new work is published they are made equal. Okay, that's one of the key data structures.
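To make the packing concrete, here's a minimal sketch of the idea. The field widths, positions, and names here are assumptions for illustration; the real counters.rs differs in detail. The point is that four 16-bit fields live in one AtomicU64, so a single atomic operation can read or update all of them together:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Assumed layout, low bits to high: idle, sleeping, sleepy clock, jobs clock.
const ONE_IDLE: u64 = 1; // bits 0..16
const ONE_SLEEPING: u64 = 1 << 16; // bits 16..32
const SLEEPY_SHIFT: u64 = 32; // bits 32..48
const JOBS_SHIFT: u64 = 48; // bits 48..64

pub struct AtomicCounters {
    word: AtomicU64,
}

#[derive(Copy, Clone)]
pub struct Counters {
    word: u64,
}

impl Counters {
    pub fn idle_threads(self) -> u16 {
        self.word as u16
    }
    pub fn sleeping_threads(self) -> u16 {
        (self.word >> 16) as u16
    }
    pub fn sleepy_counter(self) -> u16 {
        (self.word >> SLEEPY_SHIFT) as u16
    }
    pub fn jobs_counter(self) -> u16 {
        (self.word >> JOBS_SHIFT) as u16
    }
}

impl AtomicCounters {
    pub fn new() -> Self {
        AtomicCounters { word: AtomicU64::new(0) }
    }

    // One consistent snapshot of all four fields.
    pub fn load(&self, ordering: Ordering) -> Counters {
        Counters { word: self.word.load(ordering) }
    }

    // Record that one more thread is idle (searching for work).
    pub fn add_idle_thread(&self) {
        self.word.fetch_add(ONE_IDLE, Ordering::SeqCst);
    }

    // Record that an idle thread has additionally gone to sleep.
    pub fn add_sleeping_thread(&self) {
        self.word.fetch_add(ONE_SLEEPING, Ordering::SeqCst);
    }
}
```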

The other key structure is here in sleep/mod: the worker sleep states. It's basically an array with one entry per worker; each entry has a mutex and a boolean that is set to true when the worker goes to sleep (we'll come back to that) and set to false when the worker is woken up. We used to not have per-worker state, but having a mutex per worker means that if you want to wake up a specific worker, you can do that very easily, and if two workers are being woken up at once, the wakers don't contend with one another.
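As a rough sketch of that shape, with names assumed for illustration (the real struct also carries a condvar so the sleeping worker has something to block on):

```rust
use std::sync::{Condvar, Mutex};

// One of these per worker thread, held in a Vec in the registry.
pub struct WorkerSleepState {
    // True while this worker is asleep; only read or written under the lock.
    pub is_asleep: Mutex<bool>,
    // The sleeping worker blocks here; a waker notifies it.
    pub condvar: Condvar,
}

impl WorkerSleepState {
    pub fn new() -> Self {
        WorkerSleepState {
            is_asleep: Mutex::new(false),
            condvar: Condvar::new(),
        }
    }
}
```

Because each worker has its own mutex, waking worker 3 never blocks on whoever is waking worker 5.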

Anything else? Oh, the latches. A "latch" in Rayon terminology (I don't know if other people use this term or if I just made it up; I can't remember, but probably somebody uses it) is like a one-time barrier: something that won't happen multiple times. It starts out unset, and then it gets set, and "set" means something interesting has happened. So a latch just goes from unset to set, and each latch is created by some thread that is waiting for that event. There are various kinds of latches, and they all wrap CoreLatch, more or less. CoreLatch is just a wrapper around an AtomicUsize. It has four states, which we'll talk about in a bit: unset, sleepy, sleeping, and set, which are used to track what's happening. Unset means the event has not happened and the owning thread is awake. Sleepy means the owning thread is going to sleep but hasn't yet. Sleeping means the owning thread is blocked and must be awoken when the latch is set. And set means the latch has been set: the event has happened. The idea is that when you come to set the latch, if it's unset you don't have to do anything, but if it's in one of the two other states you might have something extra to do, especially sleeping. So those are the data structures.
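A compact sketch of those states and the interesting transitions; the constant values and method names here are assumptions for illustration:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

const UNSET: usize = 0; // event not yet happened; owner is awake
const SLEEPY: usize = 1; // owner has announced it intends to sleep
const SLEEPING: usize = 2; // owner is blocked; the setter must wake it
const SET: usize = 3; // event happened

pub struct CoreLatch {
    state: AtomicUsize,
}

impl CoreLatch {
    pub fn new() -> Self {
        CoreLatch { state: AtomicUsize::new(UNSET) }
    }

    // Called by the owner before trying to sleep (UNSET -> SLEEPY).
    pub fn get_sleepy(&self) -> bool {
        self.state
            .compare_exchange(UNSET, SLEEPY, Ordering::SeqCst, Ordering::SeqCst)
            .is_ok()
    }

    // Called by whoever triggers the event. Returns true if the owning
    // thread was SLEEPING, in which case the caller must go wake it up.
    pub fn set(&self) -> bool {
        self.state.swap(SET, Ordering::SeqCst) == SLEEPING
    }

    // Called by the owner to check whether the event has happened.
    pub fn probe(&self) -> bool {
        self.state.load(Ordering::SeqCst) == SET
    }
}
```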

Q: You're using AtomicU64, which is not supported by all targets. How do we feel about that?

It's a good point. I don't know. I don't care? Maybe I do. We could plausibly squeeze things into fewer bits, but I'm not sure. Are there any realistic targets that don't support it?

Q: 32-bit MIPS. I think Thumb also does not.

For that kind of tiny architecture, how many threads are you expecting to have?

Q: Probably not 2^16 threads. But we need to either conditionalize the code for that, or decide how we want to deal with it in some way.

Yeah, it seems like we could use fewer bits and be fine. Let's table it, but at minimum, the most obvious thing would be to use 8 bits for each field, which still allows 256 threads, which is plenty. The jobs and sleepy counters could potentially get even fewer, though I don't think it makes a meaningful difference. Rollover is an interesting question we'll want to come back to; fewer bits would make rollover more common, but I don't think rollover is particularly expensive. Okay, it's a good catch.

Q: RISC-V 32 is another one.

Right. And of the 32-bit targets, i686 and the main ARM targets do support 64-bit atomics, but it may also be that 32-bit atomics are faster in some cases; I don't know if that's actually true. Probably. Okay, let's keep going, but come back to that. I think we could switch to 32 bits without too much pain if we're willing to put a lower limit on the number of threads, or maybe use a 32-bit word on 32-bit targets and let it be 64 bits on others.

So let's keep going.

Q: Hi, it's Thomas.

Hey, yep. You missed some of the new structures, but I think we'll be going back over them, so it's fine.

Q: I did also read the code, or bits of it.

So let's see. We can start with what happens when a thread goes idle. It's the most complicated part, but it's probably, in some ways, the main thread of the design, and I think the other pieces fit into it, so it's the best place to start. The main routine for this is wait_until, which basically says: I've got a latch, and I'm going to wait, looking for work, until the latch is set.

I noticed one minor thing about the definition of "idle" that is kind of interesting. If you're using join, it doesn't call wait_until right away; it first drains the local set of tasks, the local deque, and then calls wait_until. Whereas the scope method calls wait_until right away, even if the local deque has tasks. So there's a slight difference between the two, and I lean more towards join's definition of idle. In other words, with scope, if you create a scope and push two tasks, you consider yourself idle even before you've tried to work through those two tasks, whereas join doesn't consider itself idle until the work it created locally is done. Does that make sense?

Q: Does wait_until still try to steal from its own queue?

It does, so it will do the same things, starting with taking from its own queue. The only difference is when the idle counter gets incremented, so it might cause different behavior in terms of (we'll see this later) whether we wake someone from sleep or not. But all the benchmarks that were actually having problems were doing the join behavior, not the scope behavior, and join's is the one I think is more correct, so I don't actually think it's that big a deal.

Q: What is the scope behavior used for in the Rayon code?

What do you mean by "used for"? Do you mean which benchmarks use scope? The parallel iterators are built on join; scope is used in some things we care about, like the Firefox stuff, and the TSP benchmark uses scope.

So what does wait_until do? Let me write out the flow; it's a list of steps.

Step one is entering the idle state: when you call start_looking, we just add to the counter to indicate we have an idle thread. That's all that really happens there. The idle state is represented by this IdleState value, which gets consumed and produced and is deliberately not Copy; that's leaning on the type system a tiny bit: while that value is live, you're in the idle state, so to speak.

Step two: search for some number of rounds. Each search is the thing you were asking about, Josh: it takes a local job first, then it tries to steal, then it pops an injected job. If we find work, we leave the idle state, execute the work, and go back to step one.

Step three: each time around, if we don't find work, we call no_work_found, which basically just counts how many rounds we've been searching, and at some point we get to the sleepy state. The idea of sleepy is: we're going to do one more search, and then we're going to start to actually go to sleep.

Q: Remind me whether there's a reason you have to have a sleepy state. I can't find the function that gets called from no_work_found; some scheduler yield or something?

This is the function that announces sleepy. What it does is basically increment the sleepy counter and store the old value. (This is something I would like to change, actually; I won't go into the change I want to make just yet.) The point is we've now signaled our intent; the only effect that incrementing the sleepy counter has, as we'll see later, is on what happens when work is published.

Step four: search some more.
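Putting those steps together, here's a self-contained sketch of the loop's shape. All of the names, thresholds, and helpers here are stand-ins for illustration, not Rayon's actual API:

```rust
struct Job;

const ROUNDS_UNTIL_SLEEPY: u32 = 32; // assumed threshold
const ROUNDS_UNTIL_SLEEPING: u32 = 64; // assumed threshold

fn find_work(_worker: usize) -> Option<Job> {
    // 1. pop from the local deque, 2. try to steal, 3. pop an injected job
    None
}

fn execute(_job: Job) {}

fn latch_is_set() -> bool {
    true // stand-in for latch.probe()
}

fn wait_until(worker: usize) {
    // Step one: enter the idle state (increment the idle-threads counter).
    let mut rounds = 0;
    while !latch_is_set() {
        if let Some(job) = find_work(worker) {
            // Leave the idle state, run the job, then re-enter it. In the
            // real code the idle counter is decremented and re-incremented,
            // and leaving may wake another thread (see later).
            execute(job);
            rounds = 0;
        } else {
            rounds += 1;
            if rounds == ROUNDS_UNTIL_SLEEPY {
                // Announce sleepy: bump the sleepy counter, remember it.
            } else if rounds > ROUNDS_UNTIL_SLEEPING {
                // Step five: try to actually go to sleep (next section).
                rounds = 0;
            }
        }
    }
    // Latch was set: leave the idle state and return.
}
```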

Step five: if we still didn't find anything, we call the sleep routine. This is where we do the actual work of figuring out how to go to sleep.

One change that happened here that I didn't mention explicitly: when we become idle, we pass in the latch that we're waiting on. We didn't used to do this in the old code, but now we know the latch that we're blocked on, so that whatever triggers it can wake us up.

So in step five we go through a bunch of steps; there are kind of two separate synchronization things happening. The first is that we set the latch to the sleepy state. I realize now that this is a little confusing, because I use "sleepy" in two ways; here we're telling the latch that our thread is getting sleepy. Then we acquire our thread's lock, and then we try to set the latch to the sleeping state. The reason it's "try" is that someone may have set the latch in the meantime, before we acquired our lock or while we were acquiring it; in that case they would have changed the state to "set". So if the latch was changed, we abort falling asleep; basically, we wake up from the idle state. The reason we'd wake up is that there's no more reason to block: the thing we were waiting for has happened, so we can stop. But if we succeed, then the latch is now in the sleeping state, and that means the next person who sets the latch knows they have to actually wake us up. You have to do it in this kind of two-phase way; if you don't, it doesn't work. There might be some other way, but this is how we detect a race between the setter and the sleeper.

Q: Why do you need the sleepy state on the latch?

There's some kind of race that isn't immediately obvious... well, you need to know if somebody set it.

Q: I guess you don't want a work item to be submitted as the thread is going to sleep, because then nobody would wake it up?

Yeah.

Q: But the work-item case is handled by the counters; for the latch itself, couldn't it go straight to "I'm going to sleep"? Maybe you don't actually need the latch's sleepy state.

I feel like you do; I feel like there's an obvious problem, and I rediscover it periodically. Let's look at the code. To wake the latch's owner, somebody has to call set, and what happens when you call set, say on a spin latch or something, is that it calls CoreLatch::set. What that does is swap in the SET value and then check whether the old state was sleeping. So it sets the value to SET atomically, and if you were already in the sleeping state, then they'll come and wake you. What I'm trying to figure out now is why we need the latch's sleepy state; I'm not sure, maybe you don't need it. What's definitely required is that you acquire the lock before you set the latch to the sleeping state, because otherwise the setter might observe the sleeping state, acquire the lock, and "wake you up" before you've even gotten around to sleeping, and the notification would be lost. So maybe the sleepy state isn't actually required; I'm not sure. I think we talked about it in the RFC. But regardless...

The point is: we get the lock on our thread, we set the latch to sleeping, and if that succeeds we can keep going. Now we want to check whether there's been any work published since we got sleepy. So what we do here is this little loop (I'm going to put a link in the doc, just to keep the bits of code lined up): we load the current value of the counters and compare whether the jobs counter has changed since we got sleepy. I haven't talked yet about what happens on the publishing side, but the idea is that when new work is published, if the sleepy counter is higher than the jobs counter, the jobs counter gets bumped up to match. So if the jobs counter has changed since we got sleepy, new work has been published since we got sleepy, and we can return back to step three. If we look here: if new work was published, we call this function that partly resets our round counter back to the sleepy level, so we'll become sleepy again, do some searching, and if we don't find work we'll become sleepy again and keep going.

Q: So is this where it wakes everything? Once you add new work, it wakes every thread so each can update these counters?

It doesn't; that's the new model of the thing: it never wakes everything.

Q: Okay, so it wakes something like the number of additional jobs it needs, the difference?

That's exactly right, yes. This is the code for going to sleep; I'll show you the code for waking up in a bit, but when someone comes to publish work, they'll make an estimate of how many threads they should wake, and wake that number.

The last part is: we increment the number of sleeping threads. That's what happens here, and you'll notice it's conditional, with a possible loop back around. The reason is that this is an atomic compare-and-swap: we read the counters here and then try to atomically increment the sleeping-threads count against the value we read. What that means is that when we exit this loop, either somebody publishing new work adjusted the jobs counter first, or we adjusted our sleeping-threads count first; the two can't invisibly overlap with one another. So either we see that new work was posted, or the people posting the work see that we're asleep. That's important, because otherwise they might fail to wake us up, which would be bad. At least, that's my contention.

At that point we've successfully registered ourselves as asleep: we've both marked the latch as sleeping and recorded that there is one more sleeping thread. Those are the two things we have to do. Then we can set the boolean to true and actually go to sleep.

When we're done, we wake up: somebody signals us, as we'll see in a second. When we wake, we undo some of what we did: we reset the latch state back to unset (unless it became set in the meantime) and we reset our round counter back to zero. We don't, as it happens, decrement the number of sleeping threads, because the thread that woke us has already done it; but otherwise you would want to do that too. Okay, so that's the sleeping procedure.

will say what happens when do a lot

let's do that job is published is that's

the most interesting one I'm assuming

you'll stop me when you have questions

please do so there's two ways jobs can

get published I originally only had a

slightly different code at some point in

this evolution now I don't so they're

wrappers around to come and help her um

you could probably go away but the idea

is when new work is coming in you know

you know the job that posted it although

I don't think that's relevant for

anything except debugging logs you know

how many jobs were posted usually it's

one there might be two and you know

whether the queue of jobs for that

thread was empty before the job was

posted that's kind of a heuristic the

idea is if the queue was empty then

there was kind of enough threads active

well either were at the beginning or

there are enough threads that they're

keeping the queue empty thirsty whatever

work had

they're got stolen whereas if the queue

is got stuck waiting in it then clearly

there's not enough threads because they

haven't stolen the work you published

yet um well not clearly it's a heuristic

but that's the idea so you know the

number of jobs that were created and you

know was the queue empty if not then

there are not enough idle threads to

steal work so when this gets called what

do we do okay actually before this is

called something happens before a new

jobs is called the jobs are pushed to

the queue that's important but they're

already in the queue otherwise bad

things can happen you might miss them um

but so what will happen here is we're

gonna load the counters do we know how

many sleeping threads or whatever else

they are we check if the jobs counter

and the sleeping counter are not equal

that means that that then some thread

got sleepy since the last job was posted

and we're gonna synchronize them so what

this sinc jobs counter does is it just

sets them to be equal and does it atomic

exchange and you notice it updates

counters with that might take a few

rounds so it has to read kind of reread

the value so when we're done basically

when we exit this loop or we exit this

line at this point we have an

instantaneous read of counters where we

know the number of sleeping threads etc

or idle sleeping threads and we have

notified any sleepy threads

that haven't yet become sleeping that

new work was published and we did that

by setting the job's counter equal to

the sleepy counter so now we can go look

and see okay how much work do we have so

if there's nobody asleep there's nothing

for us to do because there's no one we

couldn't even wake anyone up if we

wanted to

otherwise we look and see if the queue

is not empty then but this was some

tweaking I did I don't know how

important this is I've tweaked these

heuristics a lot but it makes some

difference it's basically you can kind

of tune this is where we decide how many

threads to wake up you can kind of tune

how aggressive you want to be but it's

always a bit of a guessing game

this current heuristic is it says okay

if there's stuff piled up in the queue

then I want to lean towards waking more

people up because they're not doing a

good job at keeping my queue empty so I

look at how many jobs I published and

the number of people that are asleep

take the minimum and wake up that many

people um otherwise my queue is empty so

maybe they're doing a good job then I'll

look at how many idle threads there are

and I'm going to assume that if there's

idle threads they're gonna pick off this

they're gonna they're gonna pick off

this job I don't need to wake up more

threads I already have an idle one who

is searching for work so I'll leave it

to that to them um and so I'll only wake

up more threads if there's not enough

idle threads to take care of the work

that I have now there's an obvious I
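A sketch of that publish path, with assumed helper names and the heuristic paraphrased from the discussion above (the branch's exact arithmetic may differ):

```rust
// Stand-ins for the real counter operations.
fn load_and_sync_jobs_counter() -> (u32, u32) {
    // Atomically make jobs == sleepy if they differ, and return a snapshot
    // of (idle_threads, sleeping_threads) from the same counter word.
    (0, 0)
}
fn wake_any_threads(_num: u32) {}

fn new_jobs(num_jobs: u32, queue_was_empty: bool) {
    // Precondition: the jobs are already pushed onto the deque.
    let (num_idle, num_sleepers) = load_and_sync_jobs_counter();
    if num_sleepers == 0 {
        return; // nobody we could wake anyway
    }
    // Idle-but-awake threads (recall idle is a superset of sleeping).
    let num_awake_but_idle = num_idle - num_sleepers;
    if !queue_was_empty {
        // Work is piling up: the awake threads aren't keeping pace, so
        // lean aggressive and wake up to one sleeper per posted job.
        wake_any_threads(num_jobs.min(num_sleepers));
    } else if num_awake_but_idle < num_jobs {
        // Only wake enough sleepers to cover what the idle threads can't.
        wake_any_threads((num_jobs - num_awake_but_idle).min(num_sleepers));
    }
    // else: enough idle threads are already searching; wake nobody.
}
```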

Q: Now, there's an obvious issue. I know this is a heuristic, but there could be multiple threads trying to do this push at the same time.

Yes, exactly. There are a couple of ways this can go wrong.

Q: Specifically in deciding how many threads to wake.

Yeah; you only have a partial picture. And I think that was exactly why I added the queue-is-empty heuristic: if there are multiple threads pushing all at the same time, they don't both get to keep claiming all the idle threads. Well, they can, sort of, but over time it should correct itself. Your first push happens while your queue is empty; you push the job on, you see an idle thread, and you think: okay, great, he's going to take it. Then on your next push, it turns out that idle thread got stuck taking somebody else's work you weren't aware of; but your job still isn't taken, so now you don't trust the "somebody is idle" signal anymore, and you just go wake somebody up, because something's not right. That was the idea, and I think I did notice it helping; I don't remember, we'd have to rerun.

Q: It would basically go wrong if an idle thread has woken up but has not updated the counters yet, which is possible.

Sorry, what would happen in that case? This would only be inaccurate where a thread is woken up but hasn't yet updated the counters...

Q: The other problem is that there isn't just one thread involved; there are many. The case Josh was raising is: if two of us are posting jobs, it can happen that we both see one idle thread, and each of us thinks "I only have one job, so it's fine, that idle thread will take it." This whole function is basically concurrent, right? It could be executing on every thread at once.

Yeah, exactly. It's maddening; I don't know how you get rid of this trade-off. You can't get a global view, so you have to take a guess.

Q: I think the pathological case would be if you don't push again, so you never see that the queue stayed non-empty.

Yes. If your jobs are very big it matters: a few big jobs matter a lot more than many smaller jobs, because we're only looking one push at a time. It's not great. There is another trick that helps with that case, which I'll show you in a second; actually, maybe I'll show you right now, I forgot to mention it earlier. Well, maybe let's take a few notes first.

maybe let's take a few notes first of

all so so have only the problem is

we have only an isolated picture it is

possible for example that two threads

will post work each thread sees or when

there is only one idle thread each

thread sees one idle thread decides not

to wake anybody now we have one thread

and two jobs um so there is there is it

there is one other hat to help with that

which I think I found when we go

scheduler and I thought was pretty

clever which is here if when you find

work that means you're transitioning out

of idle and into busy at that point you

decrement the idle counter but you look

if you were the last idle thread then

you wake up another thread and the idea

is exactly for this case that okay maybe

two threads came and so I'm just gonna

wake up one more and if it happens that

that thread doesn't find any work it'll

go back to sleep so you kind of get like

a little ripple but if if it happens

that you have work waiting then it's

gonna be a smoother transition so yeah

that's important then and it it means

there's gonna be a chain of latency for

this to take effect but it will

eventually take effect and that's right

probably there's two things to consider

I guess so let's see what happens when

you stop being idle if you are the last

idles guys wake up one more thread just

in case there's there's like a deadlock

condition and then the efficiency

condition and the deadlock condition is

that you never want to get you never

want to inject the job and nobody

notices um like everybody's asleep and

that we don't I think is impossible

that's where I kind of emphasize some of

these you know atomic conditions but the

other case of just there you inject the

job and people are awake but not as many

as you would like that's definitely

possible and that's what this is aimed

at reducing that so sort of meta comment
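A tiny sketch of that "last idle thread" trick, with stand-in helpers (in the real code this happens as part of consuming the IdleState when work is found):

```rust
// Stand-ins for the real counter and wakeup operations.
fn decrement_idle_threads() -> u32 {
    1 // returns the idle count from before the decrement
}
fn wake_any_threads(_num: u32) {}

fn work_found() {
    // Leaving the idle state: the idle count goes down by one.
    let idle_before = decrement_idle_threads();
    if idle_before == 1 {
        // We were the last idle thread. Two publishers may each have seen
        // "one idle thread" and stayed quiet, so ripple one extra wakeup;
        // if that thread finds nothing, it simply goes back to sleep.
        wake_any_threads(1);
    }
}
```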

Q: Sort of a meta comment: I think a lot of the things you're bringing up and putting in the document here, the important points and caveats and interactions, aren't fully reflected in comments in the code. That needs to be translated over.

That's part of why I'm putting them in the document. I totally agree; this is the kind of code I always chide people for submitting in a PR, where a lot of the logic isn't written down, but it's exactly what you write when you're just hacking it up. So yes, that's why I'm taking these notes.

I guess the last point; we could move on from this, but the last thing I want to show you is the latch code. Actually, let me show you wake_any_threads first, before we do anything else. How does wake_any_threads actually work? It's not the most efficient, but I don't think it matters; at least, I never saw it show up in any benchmarks. It iterates through all the threads, because right now we don't have a central list of which ones are asleep; we just know from the counter that there are sleeping threads, not which ones they are. So we iterate through, and for each one we invoke wake_specific_thread, which acquires the lock on that thread, checks whether it's asleep, and if so notifies the condvar. And this is what I keep referring to: this is what actually does the decrement of the number of sleeping threads. The reason I put it here (I did comment it, in this case) is a micro-optimization. There's another possible race where, if you decrement the sleeping count only when the woken thread itself actually runs again, and two people are publishing jobs, one of them might think there are more threads available to be woken than there actually are. So the sooner you decrement this counter, the faster you advertise to everybody else who's considering whether to loop through looking for threads to wake that there isn't anyone left to find.
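A sketch of that wakeup path, with assumed names; the counter decrement is shown as a comment, since the counter word lives elsewhere:

```rust
use std::sync::{Condvar, Mutex};

pub struct WorkerSleepState {
    is_asleep: Mutex<bool>,
    condvar: Condvar,
}

// No central list of sleepers exists, so scan every worker until we have
// woken as many as requested (or run out of workers to check).
fn wake_any_threads(states: &[WorkerSleepState], mut num_to_wake: u32) {
    for state in states {
        if num_to_wake == 0 {
            return;
        }
        if wake_specific_thread(state) {
            num_to_wake -= 1;
        }
    }
}

fn wake_specific_thread(state: &WorkerSleepState) -> bool {
    let mut is_asleep = state.is_asleep.lock().unwrap();
    if *is_asleep {
        *is_asleep = false;
        state.condvar.notify_one();
        // Decrement the sleeping-threads counter HERE, on behalf of the
        // woken thread, so concurrent publishers stop seeing it as a
        // wakeable sleeper as early as possible.
        true
    } else {
        false
    }
}
```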

It seems to me that this loop is probably fine if you have 8 or 16 threads; if you had 256 threads it might start to be a bottleneck, and we might want to replace it with some central vector, or some way to know which indices you should wake up. That's kind of annoying to maintain, though.

Q: Sort of a related point: the thread that wakes up doesn't know who woke it up, right?

No.

Q: It seems like, especially with a higher number of threads, that's something we could optimize: if it knew who woke it up, it would also know whose queue to try to steal from first.

That's a good point. Let me note both: first, searching through all threads to find those that are asleep is not great; and second, when a thread wakes, it could be told where the work that's waiting for it is, as a heuristic for where to start searching. It wouldn't be a guarantee, since someone else may have stolen it by then, but let me write this down.

Q: Does this hold a lock over... you said it has a lock over the particular thread it's waking, but that thread could still be stolen from by other people, right?

Right, it's just the per-thread lock. This is one of the parts you missed, but there's something called the worker sleep states: a vector with one entry per thread, and each entry has a lock; here it is. It's just recording whether they're asleep or not; it's not used for anything else and doesn't affect who they might steal from.

Q: And that's not touched by the worker at all? It's only set by the scheduler-level stuff?

It is actually set by the worker, but by the worker thread running in the scheduler code: when the worker thread didn't find work and is going to sleep, it grabs the mutex and sets the boolean to true, and then in this code the waker sets it back. It's kind of a meta-scheduling variable; it's not used by user code or visible to users.

All right, so let's see, back to where we were: the heuristic for how many threads to wake, and the code that does the wakeup. Let me check that against my notes... right, "decrements number of sleepers", okay. The only other thing I haven't shown you is latching; although latching is basically the way to wake someone up, setting a latch, so we've kind of covered it, and it's pretty simple conceptually. There were a few refactorings I had to do along the way in order to guarantee this property, but basically, since every latch now has an owner, the idea is:

CoreLatch isn't really exposed publicly; it just contains the core machinery, this flag that goes from unset to set. It doesn't track other stuff, like who the owner is; that gets tracked by the outer wrappers, each in a slightly different way. The SpinLatch is the most basic one: it keeps hold of the registry and also the worker index; I guess I call it the target worker rather than the owner. So the idea is that when the latch gets set, you never call CoreLatch's set directly (that's private); you go through the outer layers, and they call this notify_worker_latch_is_set method, which winds up going down to wake the specific worker.
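A sketch of that layering, with assumed type shapes (the real SpinLatch and Registry carry more than this):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

const SLEEPING: usize = 2; // see the CoreLatch sketch earlier
const SET: usize = 3;

pub struct CoreLatch {
    state: AtomicUsize,
}

impl CoreLatch {
    // Private in the real code: returns true if the owner must be woken.
    fn set(&self) -> bool {
        self.state.swap(SET, Ordering::SeqCst) == SLEEPING
    }
}

pub struct Registry;

impl Registry {
    fn notify_worker_latch_is_set(&self, target_worker: usize) {
        // ...wake_specific_thread(&self.sleep_states[target_worker])...
        let _ = target_worker;
    }
}

// The outer wrapper knows which worker owns (waits on) this latch.
pub struct SpinLatch<'r> {
    core_latch: CoreLatch,
    registry: &'r Registry,
    target_worker: usize,
}

impl<'r> SpinLatch<'r> {
    pub fn set(&self) {
        if self.core_latch.set() {
            // The owner had committed to sleeping; route a wakeup to it.
            self.registry.notify_worker_latch_is_set(self.target_worker);
        }
    }
}
```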

Q: Is there a race here between being woken for your latch and being woken for general work?

There's a kind of race, in that someone could set your latch and wake you up at the same time that someone else is posting work and counting you as asleep in their calculation, because you haven't yet decremented the sleeping-threads counter. But all that would do is cause them to overestimate how many sleeping threads there are, which might then lead to them searching for a sleeper they never find.

Q: I'm actually wondering more about the other direction: you get woken, as a sleeping thread, for available work, and then your latch gets set at nearly the same time.

I see; so you might wake up for new work but actually go process your latch instead, because you also see that it's set. That could definitely happen, yeah. You could be awoken for new work, then have the latch get set, and then you exit the idle loop because your latch is set, and never steal. I don't think there's much mitigation against that, except for this mechanism of waking another idle thread if you happen to be the last one.

Q: Okay, so that case will still go through the last-idle-worker thingy?

Yes, I think so; let's verify it, but I think so. What happens is, in that case, because you fell asleep inside the no_work_found callback, your IdleState is still a live value. So when we get woken up we'll come around here, see that the latch is set, exit the while loop, and invoke work_found, which is exactly where that last-idle code lives. It's not ideal; I guess I don't have a good sense of exactly how effective the last-idle-worker trick is there. You might have another idle worker out there, and so you might not be the last one, even if you were the one designated to pick up that work. I think it falls into the same trap: there might be slightly worse latency than we could have.

Q: Right, but we won't actually stall because of this.

That's right.

So let me talk about the logging a little. We can talk about interesting races too, but I want to mention the logging because it was designed to let me test some of the hypotheses around what kinds of racing might be occurring. It's not too complicated; it's very simple-minded. If logging is enabled, it starts up a thread which does the actual work: a separate, dedicated logging thread. Each worker thread sends events over a crossbeam channel, one of those multi-producer multi-consumer channels, one per thread pool (or for the global pool). I was originally going to do something much more clever, where you would accumulate events in thread-local storage and then flush them, but this seemed fast enough; in particular, running with logging enabled did not affect the benchmarks I was looking at, so it seemed good enough, though I did later find other benchmarks that do get affected. Running disabled is going to be the general case and matters more. Right now, I think it gets compiled out if you build for release: this log-enabled flag is a hard constant, a compile-time config, not an environment variable. That's maybe a little overly conservative, in the sense that I've never seen any impact when it's not enabled even if it's compiled in, but that's how it is right now. It's kind of annoying, actually, because you have to recompile to use it.

The point is, it sends these messages; there are a few different variants, but the interesting mode is the profile mode. What that does is receive the events and simulate the state of the world at each point. The events say things like: I started a new thread, I terminated a thread, I posted work, my thread went to sleep; the major state changes. So it adjusts various counters, kind of reproducing the scheduler's logic, and at each point it knows how many threads were asleep and how many were awake. The idea was that you can then produce statistics like: whenever a new job is posted, how often does it happen that no threads are idle and able to look for it? Or, what's the average number of idle threads when a new job is posted? Things like that. It's not 100% perfect, it's always an approximation, but I was gathering statistics like that. I think the analysis is in a different crate, but here's the simulator I'm talking about: for each event it adjusts the thread state (it was idle, now it's working, now it's terminated), the local queue size (when a job gets popped, adjust the local queue size), the injector size, and so on.

The other thing I was curious about was: how long is it between the time a thread is woken, in the sense that its condvar is notified, and the time it actually wakes up and starts to do work? Is that a noticeable amount of time? And the answer is: sometimes.

Q: That's a lot of timestamps, though.

I did that originally; I think I removed that code, because I found the timestamps weren't that interesting. Originally I was using an x86 RDTSC abstraction and logging timestamps, but in practice the difference in the statistics when I just used the index in the event vector versus the timestamp was nil, so I took it out; one less dependency. The actual logic accumulates events in a vector; here it is, right here. It builds a vector up to a capacity, and when the vector gets full it dumps everything into a file and starts again. I don't know how important that is, but that was the idea: most of the time you're just pushing onto a vector, and every so often you gather a bunch of data and dump it out all at once, hopefully off the hot path.
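A sketch of that design, using std's mpsc channel where the branch uses a crossbeam channel, and with event variants that are illustrative rather than the branch's actual set:

```rust
use std::sync::mpsc;
use std::thread;

// Illustrative event set; the real log has more variants.
pub enum Event {
    ThreadStarted { worker: usize },
    ThreadSleeping { worker: usize },
    JobPosted { worker: usize, queue_was_empty: bool },
}

pub fn start_logging_thread() -> mpsc::Sender<Event> {
    let (sender, receiver) = mpsc::channel();
    thread::spawn(move || {
        // Buffer events so the common case is a cheap Vec push; flush in
        // batches once the buffer fills up.
        let mut buffer: Vec<Event> = Vec::with_capacity(100);
        while let Ok(event) = receiver.recv() {
            buffer.push(event);
            if buffer.len() == buffer.capacity() {
                // Dump `buffer` to a file, and/or feed the simulator that
                // replays scheduler state (idle/sleeping counts, queue
                // sizes) to compute statistics; then start over.
                buffer.clear();
            }
        }
    });
    sender
}
```

Workers would each hold a clone of the Sender and fire events at the state-change points; in the branch, the whole thing sits behind that hard compile-time constant when disabled.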

So I did generate these statistics, and unfortunately I don't recall what the values were; maybe I have my notes, and I think I left some comments on the RFC. But we could rerun, and it might be interesting to test some of these hypotheses about where time might be going. I think my conclusion in the end was that I couldn't find any obvious culprit. It felt to me like what was happening was that, in the test cases that got slower, we were doing a good job of keeping idle threads awake, just not as many of them, which is what we want: the main difference between this and normal Rayon was the number of idle threads, and normal Rayon just had more, which is exactly what we're trying to reduce. And you can easily imagine that if you have more threads all searching, the latency until one steals a new job is going to be lower, because there are more of them and it's more likely that one will find it; they're all going round-robin from different starting points. That seemed to be what accounted for the difference, and I felt like that's kind of inherent.

Q: But the old behavior was storming every time there was a new job anywhere, right?

Yes; at the maximum you're over-provisioned up to a certain value. I'm curious where my plots went; I was also producing plots, but maybe I never pushed that code. I should go find my old laptop; hopefully it's not gone forever. I guess I could rewrite it, it wasn't super amazing, but it's totally plausible that it was living on my server, which is now completely dead, which is unfortunate.

Q: Can we take a quick break?

Yeah, sure.

Welcome back.

Q: Thank you. Now I'm wondering... wait a second, does this race condition mean that we could have a job in multiple queues?

Which race condition, this last one? No. What happens is the job gets pushed onto the queue of the thread that produced it, and then the question is only whether that thread will wake up other threads to steal the job. It might opt not to, because it sees idle threads around, and it has to sort of guesstimate how many other jobs are getting pushed at that exact time. But in any case, the job only gets pushed into one deque, the local deque, and then it can only get stolen once, no matter what happens, because that's a separate atomic mechanism.

Q: So wait, a job can be stolen just once? Any given job is executed at most once?

Exactly once: it's either stolen by some other worker, or it's executed by the thread that posted it, but only once.

Okay, now I'm wondering, as we say this, whether there's some way... no, I don't know how it would work. The general way to address this last race condition would have to be some way to estimate how many other threads are pushing, kind of measuring the overall system's velocity of pushing new tasks and taking a guess based on that. I don't know how that would work. There might be people who have tried these heuristics; actually, there was a paper I read once, I think on this topic, and the heuristic of checking your local thread queue to see whether it's empty came from there. I can't remember the name of the paper, but I think their conclusion was basically that they tried a lot of complicated stuff, and this simple heuristic worked much better than all of it. I think they called it being "hungry": a thread was hungry if its local queue wasn't empty, or maybe the opposite.

Q: Which platforms have you been evaluating on?

Only my desktop, which has, you know, 14 cores and 28 hyperthreads. I probably ran it on my laptop too.

Q: Are those both Linux?

No; my laptop was a Mac. It's now running WSL on Windows; I haven't done any performance measurements on there. I still have the Mac somewhere, I could whip it out.

Q: On the Windows one, I'm not too interested in WSL, actually, because emulated kernel scheduling is going to be weird, but native Windows would be good to know.

Yeah, I should do that. I haven't set up Rust on native Windows yet, which is a sad testament to my laziness.

Q: Do you know if your WSL is the old one or the new one?

It's WSL 2.

Q: Okay; that is actually a Linux kernel.

I haven't tried to measure it; I was also kind of skeptical of how meaningful the performance would be. Better to test it on real Windows; I know that WSL 1 is weird.

to think I think this covers all the

major changes there were a few other

incidental refactorings like we used to

have one target latch I made it one per

thread precisely so that it hadn't each

one had an owner like a well-known owner

before there wasn't such a thing I was a

pretty minor thing though I think one

thing about vlogging I think we should

make it clear that this is a debugging

tool and not part of

semantic interface yeah I would agree

with that

I think the configuration yeah sort of

debugging a measurement tool not

something to be relied on

or maybe just debugging even I mean I

actually like clarifying for users that

the semantic interface part of it that

like this is something we will feel free

to change in later versions yes

Nope; I considered, vaguely, whether we would commit to the idea that there will be some flag that emits data and some other tool that can parse that data and do things with it. But I'd rather not, at this time; there's no reason to even say that. At the moment we have this available; we do not promise it will remain available, or remain in its current form.

Q: It seems logical that we would want something available without committing to what exactly that is.

Right. I guess it does seem like a useful thing to be able to say: take your app, set this environment variable, and then, if you want to reason about your parallelism, you can do it easily enough, and we have tools to help you understand the output. That seems like a cool future.

Q: That's kind of why I asked about the timer: for debugging an app, the timer is more useful, because I care about whether I have parallelism for most of the run, and not so much about the scheduler internals.

Yeah. And in that case, I'm not sure that what I built here is what you would want for debugging your app; it could be that we want to limit the scope of this. Frederic Wagner's thing might be closer.

Q: Frederic Wagner's, yes. We currently use a thing that basically logs the start and the end of every task we have, and that does let us look at things directly instead of gathering summary statistics. I can tell you from ours that the "multiple threads waking up for the same work at the same time" case never happens in our use case; we always get the right number of threads woken up.

Yeah. It seemed like the main use cases that suffered all had this characteristic: it was the quicksort and the parallel sort, maybe. The characteristic of just dumping a lot of jobs... actually, I think to trigger this you'd have to dump a lot of jobs, but not all at the same time. If you basically dumped a ton of jobs at once, it did a very good job. And a constant stream of jobs would be fine too, I think, because then things just won't go to sleep. It was more that the jobs came in bursts, with enough time between bursts that things start to go to sleep, and then it just doesn't scale back up fast enough.

Q: The Firefox stuff might actually fit that sort of bursty profile. It would definitely be worth measuring.

Yeah; regardless, we should measure the Firefox case.

Q: So I want to go back to the sleepy state. That heuristic where you compare the jobs counter to the sleepy counter: is that the only place it's used, outside of the worker, in terms of scheduling?

Yes. And I would be happy to entertain alternative designs, but the goal was that most of the time, when you publish work, you're not doing any writes. Or I shouldn't say most of the time; rather: if there are no sleepy threads, publishing work does no writes. If you're just constantly publishing work, the counters simply remain equal and you keep going. But if there are sleepy threads, then you do one write.

Q: I'm a little confused about that comparison.

that's comparison is that it makes the

sense to me like the case we're like you

know job job counter is zero and sleepy

counter is zero or at like job you know

a sleepy corner or zero and job counter

is something but like it all it also is

false if it's like they're just

mismatched like job counters one in 63

is two and that seems that's you know it

seems to be a super rare case that that

would ever happen well it sounds kind of

should never get higher than the sleepy

counter except for a rollover which we

should discuss but the idea is that it

goes up to the sleepy counter but never

above so there really is only kind of a

binary state it's either not if if the

would there be pretty much any cases

where the job counter so the sleepy

counter says it there's there's up to

there's four threads and the sleepy

counters at two and then the jobs

counter is also a two is that in any way

we need a fourth isn't it's equal when

they're both two and I don't see how

that it's not it's meaningful in that it

means nobody has it's not it's not

really counting the number of threads

it's more like a clock so it's like two

threads went to sleep two threads got

sleepy over the course of this execution

and since that and jobs were posted at

some point after that happened after the

second thread custody

we also it's maybe necessarily on a

sleepy counter does it go back down on

the thread

actually sleep no okay I makes way more

sense okay I get it now I thought I

thought it was only counted when it was

Yeah, it's poorly named; I think it should be called a clock, because it's monotonically increasing. So there are two things I think I would do. First, I would rename it to clock, if that helps. Second, I think the procedure should be: when you become sleepy, you read the counters and store the old jobs counter (incrementing the sleepy counter, rolling over to zero if necessary), and when you're trying to sleep, you check whether the jobs counter has changed. Right now we store the old sleepy counter instead. The reason I make this distinction is that I think it's important for rollover: you could be in a state where the two counters were equal at the max value, and then you roll the sleepy counter over to zero, so if a job hasn't been published yet, the jobs counter is numerically higher. But that doesn't matter; all that matters is whether they're equal or not, and when the job gets published, they will become equal, always.
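The key property, restated as a tiny sketch (field width assumed): because publishing work copies the sleepy clock into the jobs clock, the sleeper's test can be pure equality, and equality survives wraparound where an ordering test would not:

```rust
// my_sleepy_stamp is the sleepy-clock value recorded when this thread
// announced sleepy. Publishing work sets jobs = sleepy, so equality means
// "work was published after I got sleepy", even across a wrap from MAX to 0.
fn new_work_since_i_got_sleepy(jobs_clock: u16, my_sleepy_stamp: u16) -> bool {
    // A test like `jobs_clock >= my_sleepy_stamp` would go wrong at
    // rollover; `==` does not care where zero is.
    jobs_clock == my_sleepy_stamp
}
```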

Q: Wait, have we talked about rollover? It seems like something we should be avoiding.

We hadn't talked about it; I deferred it, which is why I'm bringing it up now. But I don't think you can avoid it, because, especially in the 32-bit case, there just aren't enough bits.

Q: But you only need as many values as the number of threads, right? It never gets decremented, so... wait.

Yeah, there are two sets of counters. The jobs and sleepy ones are really clocks, not counts: they never go down, they only go up; they're more like "how many events have happened". The other two track the number of idle threads and sleeping threads, and those do get incremented and decremented. We could of course use other schemes that are not as tricky, but they'd involve atomic writes when posting work in the common case; that's why I didn't do them, as far as I can tell.

Q: It seems like handling rollover should be doable, right? I don't really see a problem; it's just that the numbers wrap around.

Right. I don't know if I tested it; I think I did, because I think I made the max quite low at some point just to see what would happen, but it would be worth testing it explicitly, maybe writing a test for it. The important thing is that the check is only ever "equal or not equal". And there are two important constraints, I think. One is that you don't want the clock to wrap all the way around without a job being posted. The maximum number of events that can occur without a job being posted is all of the threads getting sleepy, so you need at least as many clock values as you have threads, I guess; as long as you have enough, it should be okay. So we should document that maximum, especially if we limit 32-bit builds to, I guess, 256 or maybe slightly fewer threads.

Otherwise... well, actually, even that may not be the end of the world. If there were too few bits, I think it just means you'll spuriously believe a job was posted and go around one more time before sleeping: the thread becomes sleepy, and then when it comes to go to sleep it says "oh, the jobs clock equals my sleepy stamp" and searches again. I'm not sure.

Q: What about the idle counter or the sleeping counter? I don't think you can take those modulo 256 or some smaller number of bits, because you could end up at zero when there are actually a lot of threads sleeping.

Right, but those are the two different kinds again: the clocks only count how many threads went to sleep since the beginning, modulo rollover, whereas things like the "last idle" check assume you have an actual, exact count of how many are idle. Yes, that's right; those are two different things. So maybe we need different names; clearly these names are confusing. There's what I called the jobs and sleepy counters, and then there's the number of idle threads and sleeping threads, and those latter two are exact: they need to have enough room that they can never overflow, whereas the clocks are allowed to wrap.

Q: Calling the sleepy one an "event counter" would be much clearer: an event counter rather than a thread counter or something.

Yeah. Let me note that down here so I remember to do it.

I will, yeah. And we should write out exactly what happens with rollover; I think it works, but I'd have to go through it.

To loop back to 32 bits for a second: if we had 32 bits, you could imagine imposing a limit like 128 worker threads or something, which is still quite a lot. On 64-bit we could allow more; the current code has 2^16. I mean, sorry, my branch does.

Q: How is this limited in practice? What does the existing code do?

I don't think the existing code has a meaningful limit; in practice there's some bound, usize::MAX or something at least, but it's a pretty big one. 2^16 seemed pretty reasonable to me; I doubt Rayon is going to scale to that many threads anyway.

Q: So do you error above that, or just cap it?

I don't know; I don't think I even detect it, I just assumed it. We should probably add something; I would probably just cap it.

Well, in the 32-bit case you could have 8 bits for the jobs event counter, 8 bits for the sleepy event counter, 8 bits for idle threads, and 8 bits for sleeping threads. What I don't know, and I'd have to think about it, is whether there's a problem with that: if you want the event counter to be able to be incremented once by every thread without wrapping into ambiguity, it might be that the thread limit has to be 7 bits' worth, or 2^8 minus 1, or something, rather than the full 2^8. You see my point.
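Something like this sketch, where the shifts, masks, and accessor names are all assumptions rather than rayon's actual layout:

```rust
// Hypothetical 8/8/8/8 packing of the four fields into one u32 word.
const JOBS_SHIFT: u32 = 24;    // jobs event counter (rolls over, equality-only)
const SLEEPY_SHIFT: u32 = 16;  // sleepy event counter (rolls over, equality-only)
const IDLE_SHIFT: u32 = 8;     // idle threads (exact count, must not overflow)
const SLEEPING_SHIFT: u32 = 0; // sleeping threads (exact count, must not overflow)
const FIELD: u32 = 0xFF;       // each field is 8 bits wide

fn jobs_counter(word: u32) -> u32 {
    (word >> JOBS_SHIFT) & FIELD
}
fn sleepy_counter(word: u32) -> u32 {
    (word >> SLEEPY_SHIFT) & FIELD
}
fn idle_threads(word: u32) -> u32 {
    (word >> IDLE_SHIFT) & FIELD
}
fn sleeping_threads(word: u32) -> u32 {
    (word >> SLEEPING_SHIFT) & FIELD
}
```

With 8-bit exact counts the hard ceiling is 255 threads, and if every thread must be able to bump an event counter without it lapping itself, the safe limit may be one less again; that is the "2^8 minus 1" worry.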

That's still quite a lot of threads, especially on 32-bit. Yeah, right; it seems unlikely that 254 threads is all that useful right now.

It could also... yeah, I was thinking it would be plausible to swap the mechanism out. The counter mechanism is probably not completely encapsulated today, but it could be, such that you could have a different mechanism, doing different things on 32-bit versus 64-bit, if we found that the limits were problematic. It might be somewhat slower, involving more communication.

I think there's a limit to what sort of effort is worth applying to 32-bit targets right now. Yes, I agree; or if people care more about 32-bit, they can come and help us do that work. Once we do something reasonable, I think we're okay.

So, I couldn't find anything: I was hoping that reading through the code I would see a bug that might lead to segfaults, and I did not.

Are you modifying any unsafe code here? Well, rayon-core as a whole is kind of the unsafe part. Yes, but the unsafe stuff has to do with lifetime erasure and type erasure and things like that. True, but I could imagine it causing a segfault indirectly; for example, if we got the latches wrong, so that things started to execute before what they depend on had actually happened. Or the atomics: you're doing SeqCst for a lot of them, right?

That was another thing I wanted to check. If I could reproduce it... I'd love for this to be a little easier to test. There's one thing that always bothers me about the C++11 memory model, which I wanted to double-check before we got on this call but failed to: the interaction of sequentially consistent operations with other kinds of operations. I'm trying to remember the details of the case that frightened me, but roughly: there's some additional ordering imposed on sequentially consistent operations, and the question is how far it applies transitively. Say thread A has an acquire/release relationship with thread B, so some writes propagated to B that way, and then B does a sequentially consistent operation that synchronizes with thread C. Is thread C guaranteed to see the writes from A? That's the case I'm usually concerned about, and I sort of forget the answer. I do remember I loosened a few things in the latch code from sequentially consistent to acquire/release after convincing myself it was fine, and maybe I was wrong.

Latches seem like the poster child for acquire/release; they're a one-way communication. Yeah, although when I look at the code here, I am doing acquire/release in the set, I guess, but in some other places I'm doing SeqCst; I'm a little inconsistent. Anyway, I could just change it all to SeqCst to be consistent.
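For reference, the acquire/release pairing on a latch looks roughly like this sketch; it's the shape of the idea, not rayon's actual Latch type:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Minimal one-shot latch: set once by one side, probed by the other.
struct Latch {
    flag: AtomicBool,
}

impl Latch {
    fn new() -> Latch {
        Latch { flag: AtomicBool::new(false) }
    }

    // The setter publishes everything it wrote before this store...
    fn set(&self) {
        self.flag.store(true, Ordering::Release);
    }

    // ...and once this load returns true, the prober is guaranteed to see
    // those writes, because the Acquire load synchronizes-with the Release store.
    fn probe(&self) -> bool {
        self.flag.load(Ordering::Acquire)
    }
}
```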

If Wagner can reproduce it, that'll be good; he couldn't reproduce it before, and I've been seeing it, though not on my system. I do remember seeing one at some point in the git history, and I fixed a bug, so maybe that was it; but maybe it didn't really fix it, and I can't reproduce it now, even though it was happening pretty reliably. Segfaults are nasty that way, too, because you could still have the memory issue without it ever causing a segfault, right? Anyway.

Although I kind of wonder whether, if we know we're not doing the wake-up storm, the behavior during development will be a little nicer. Hmm, possibly. I also wonder about plugging in loom and seeing what we see. I have no idea how that would work; I haven't really looked. It's not super simple to do; you have to mock out the interfaces. Yeah, it looks like a cool library, but not something you can just plug and play. And Miri would be nice too, but I think it just doesn't support threading at all yet.

No. So, what do we need to do to land this? We've got about 30 minutes, assuming you're still keen on landing it. Yeah. So: addressing the 32-bit stuff matters, and then definitely adding some of the knowledge that's in this document as code comments; both comments and maybe updating the RFC as well. I would find that helpful, in the sense that, well, the document got long, but also the act of writing tends to force me to think through things I might have been a little blasé about beforehand. This all seems doable.

I don't know, Luka and Thomas, how much you're interested in taking part. We've tried it quite a bit, and for our use case it's working really well; we have exactly this kind of problem. Okay, yeah. So what part do you want me to take, basically? That's what I'm wondering.

So one thing might be the 32-bit support. It's not super hard; in theory it's a compile-time change of that type, and then mostly the tests, which needed to be there anyway and which are more likely to overflow now. Is there an easy way to detect whether you're running 32-bit or 64-bit? I suggested above that we could just change it to AtomicUsize, and then you get 32 or 64 bits automatically. That's not enough, though, because we need to assign the bit fields inside of it. Yeah, so you can use cfg(target_pointer_width) and just choose 32-bit and 64-bit values.

Yeah, and you need to cfg the shifts and all that annoying stuff too. It'll be an annoying block of cfgs, but we could even use cfg-if if we want. Probably it's just a matter of tweaking constants, and this atomic type has getters and setters, so it's all going to be inside that, right? I tried to encapsulate that logic.
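A minimal sketch of what that cfg block might look like; the module and constant names here are hypothetical:

```rust
// Per-target word type and field width behind the getters and setters.
#[cfg(target_pointer_width = "64")]
mod counter_layout {
    pub type Word = u64;
    pub const FIELD_BITS: u32 = 16; // four 16-bit fields on 64-bit targets
}

#[cfg(target_pointer_width = "32")]
mod counter_layout {
    pub type Word = u32;
    pub const FIELD_BITS: u32 = 8; // four 8-bit fields on 32-bit targets
}
```

If the shifts and masks are derived from `FIELD_BITS`, the cfgs stay confined to this one block, which is the encapsulation point being made.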

For the segfault thing, there might be some malloc tuning options: we could tell malloc, glibc malloc even, or maybe switch to jemalloc for testing, to use more aggressive freeing options. If you can get malloc to discard pages that have been freed, it might give us a better chance of hitting a real segfault. Right, or do people still use Electric Fence or something like that? I guess the first thing would be to, um... or maybe we can get a login on a system where it reproduces more readily, and report back.

I think that was Wagner. I guess that's another question: can you go find the code for that run? That would be nice. It was the workload that was emitting SVG, I think. Yeah, okay. Well, we'll have to see if we can reproduce it; if we can't reproduce it, oh well, we pretend it doesn't exist until somebody reports it.

I realize I have one question, not quite related to this particular branch: the sleepy state of the worker that you mentioned. Basically it loops, I think it's set to 32 times, like a spin lock, and the only purpose of that state is this spinning? You just increase the event counter when you start wanting to do this spin-locking before the sleep, and that's its only role? That reminds me...

I'm not sure exactly what you're saying, but I would point out that I totally changed those constants; I think they're much smaller now. Yeah. The one thing I know the loop is doing is calling the scheduler's yield function every time through the spin loop, and if that actually context-switches, as far as I know it's the same cost as actually just sleeping.

Yeah, this kind of came up in that recent storm about spin locks versus mutexes, where Torvalds got involved; Linus basically made the statement that if you're calling sched_yield, you're probably not being as effective as you think you are. So it might be worth just forgetting the rounds here and making it a one-step transition from sleepy to sleeping, or potentially doing a loop that doesn't call yield at all.
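A hypothetical sketch of that second option, spinning on the CPU's spin-wait hint instead of calling sched_yield; the round count and both closures are invented for illustration:

```rust
use std::hint;

// How many times to spin before taking the expensive sleep path (assumed).
const ROUNDS: usize = 32;

fn idle_wait(found_work: impl Fn() -> bool, fall_asleep: impl FnOnce()) {
    for _ in 0..ROUNDS {
        if found_work() {
            return; // lightweight wake-up, no syscall involved
        }
        hint::spin_loop(); // pause the core without yielding to the OS
    }
    fall_asleep(); // the expensive, blocking path
}
```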

But I'm not sure; then the sleepy period is going to be very short. I don't actually know how long a thread stays sleepy before it sleeps; in terms of microseconds, not very long. The current version, at least, I have tuned to just one round. Okay. I think I tried other variations and they didn't make much difference; well, that's not entirely true. As I recall, for the benchmarks where we were seeing performance effects, it didn't matter much, because for those it's basically directly proportional: the more time your threads spend searching around, the better.

One other issue we know of, at least with the master branch (we'll see whether this branch affects it; I haven't tried it much on this branch yet, though it already helps with the slow start): our software by default chooses a number of threads equal to the number of CPUs, but if you run, say, two instances of the program, you have of course allocated twice as many threads. There can be potentially bad behavior then: if you get into this spin lock and you're calling yield, and you're yielding between two threads that are both spin-locking, it becomes really bad, and it makes the sleepy state last a lot longer than it otherwise would. So that's a case that isn't addressed right now. It isn't biting us, because when we run multiple instances of a program we always configure each one to use a subset of the CPUs' threads, but it would be nice if it didn't have that bad behavior, is what I mean.

I'm not entirely sure why; I think I just put in yield_now because... I'm not sure that was the correct decision. Yeah, that's kind of separate from this branch; I just thought I'd mention it.

Now I'm trying to think about how important the sleepy state is at all, but my brain does not want to turn on right now. I think there was a reason we needed a sleepy state to get the synchronization right, though I don't know that we need 32 rounds of it.

Well, I think you need the sleepy state because you at least want to increase the event counter the moment you start that final check, and not just before you grab the final sleep lock. And one function of it is to create this period of lightweight wake-up, right? Where we notice new work in a lighter-weight way than having to go through the whole wake-up path. Yeah.
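In very rough terms, something like this sketch; it deliberately skips the real jobs/sleepy synchronization trick, and every name in it is invented:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Bumped whenever new work is published; sleepers only compare for equality.
static JOBS_EVENTS: AtomicU64 = AtomicU64::new(0);

// Poster: publishing work is just a counter bump (wrapping is fine).
fn announce_new_job() {
    JOBS_EVENTS.fetch_add(1, Ordering::SeqCst);
}

// Sleeper: record the counter when getting sleepy, search once more,
// and only block if no job was announced in the meantime.
fn try_to_sleep(block: impl FnOnce()) {
    let observed = JOBS_EVENTS.load(Ordering::SeqCst); // getting sleepy
    // ... one more search round would happen here ...
    if JOBS_EVENTS.load(Ordering::SeqCst) == observed {
        block(); // nothing changed; take the expensive sleep path
    }
    // otherwise a job arrived inside the window: the cheap wake-up
}
```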

Potentially, I guess, if you only have one pass through there before you sleep, it could just look like part of the sleep itself; you could remove the stage, or fold it into the other state with exactly one check, which is what I do now. You know, I'm not going to work on that first, because I think I would rather work on improving that corner case after this branch is done.

That's my take. I might also try benchmarking dropping that count down; I'm sure it's strongly dependent on what your particular OS's yield-thread function does.

Also, I think the other reason we needed the sleepy state is that, if I recall, we were limiting things so that only one thread would make the sleepy-to-sleeping transition at a time. In the older code, yeah. Is that not true anymore? No, not anymore; as many as you want can do it. Okay.

I think that's valid; I think that was a reason we had it before, because threads were basically serialized in transitioning to sleep. Right; I was trying to solve a bunch of different things there, but among them, I didn't want any limits on how many threads could do this at once, because that doesn't make any sense. Maybe it's something to revisit after this lands, but one of my big goals was definitely that the threads act more independently in this branch, and they're not serialized to go to sleep one at a time. Okay.

So yeah, I think generally I'm happy with the tour I've gotten here. I haven't done the actual code review, comparing side by side what the actual changes are; I'm basing this on what you've shown me. I don't expect surprises, because you're an experienced developer, but without taking care I could overlook something.

Yeah. So what I'll actually do is take this branch and make a PR right now: I'll pull all the logging out, take all the undo/redo patches out, and put up a PR against upstream.

I think I already did that, actually. If you look at the second branch, the one with the logging pulled out, the commit history is how you get a nicer view: it's only about six commits, so it's way better.

Yeah, I put the logging in one place, but there are some commits at the end that just kind of do all the rest; I couldn't cleanly separate it out. I guess it's this one, "add new thread pool". I think it's going to be like that, just because you can't change half the thread pool at a time, right? It didn't seem like it, and sometimes even when you can, I don't personally find it easier to read if the same code is changing again and again versus jumping to the final state. Certainly if any tests fail in an intermediate state, that's really irritating.

Alright, well, let's cut it off; we can discuss on Gitter, I guess. I think the plan is trying to get the 32-bit support in there and moving some of the document into the comments. The most immediate step is that I'll try to move comments over, and if other people want to do some of it too, by the way, I can add you as collaborators so you don't have to make PRs. And I guess we're going to try to land it on master, right? We were discussing the idea of landing on master and calling for people to test it and see if they can hit any problems.

Yeah. Another option could be, if we have the ability, to run a whole crater run, or just over the stuff that depends on rayon; that might be possible. Can we do a crater run that patches in the rayon version? I don't know if the machinery is able to do that, but it would help us at least automate the process. Otherwise, calling for feedback is probably the other easy solution. At the least, testing with the Firefox code is a good idea.

Yeah, if we bumped the version in Firefox Nightly, that would be pretty cool. I'm not sure about that last part; running it in Firefox Nightly may be too much, but I think we could do a local run at some point. Bob posted instructions, on a wiki or something, for how to run the tests locally, at least.

Yeah. One of the things I remember about his testing was that he seemed to be measuring things like how often they hit their caches, which is a very high-level metric of performance for them. I don't expect any of that to be much affected by what we have here, so I think what we actually care about is raw performance numbers, real timings, as an effect of this branch.

to like interesting hardware test guns

if we have yeah maybe under triplet this

well and we have a beaker lab which has

all kinds of interesting machines but

the really fun ones are often in use

right but also like I guess now I think

about it we should test on crazy but the

most relevant is probably smaller

unconventional setups another question

Another question: what to do when and if we notice that we're in an overcommitted state. Sorry, what? More threads than CPU cores. Yes. I mean, there's no way for us to detect that, in the sense that we can't detect another thread pool or another rayon running. So I think it's something I would just test: maybe write a test for it and then see if I can change things to make that case better. But there's no way to detect it, as far as I know; it would still be out of scope for rayon.

It's out of scope for this branch, at minimum. We talked about it at some other point: there are people who have explored architectures where they can make changes of this kind, where they can detect this and grow and shrink the number of worker threads, but it would require a lot of work.

Given my interest in machines, I should try some other architectures too; I can get this running on s390 or PowerPC pretty easily. Luca and I both have a POWER9 box under our desks. Oh, that would be good. ARM is probably a more important target, and Luca said that as well.

32-bit ARM is another one of those cases where we only have AtomicU32. Alright, cool. I'll try to motivate myself to do some of this too.

I've been, I don't know, just wanting to code from the couch; I'm not working right now, which is not a bad thing. Well, you can all poke me for patch reviews, at least that. Thanks, everybody. Thank you.