What makes this function run much slower?

IT Share you


shareyou 2020. 11. 20. 17:19


I tried an experiment to see whether the local variables in functions are stored on a stack.

So I wrote a little performance test:

function test(fn, times){
    // Call fn `times` times and return the elapsed wall-clock time in ms
    var i = times;
    var t = Date.now();
    while(i--){
        fn();
    }
    return Date.now() - t;
}
function straight(){
    var a = 1
    var b = 2
    var c = 3
    var d = 4
    var e = 5
    a = a * 5
    b = Math.pow(b, 10)
    c = Math.pow(c, 11)
    d = Math.pow(d, 12)
    e = Math.pow(e, 25)
}
function inversed(){
    var a = 1
    var b = 2
    var c = 3
    var d = 4
    var e = 5
    e = Math.pow(e, 25)
    d = Math.pow(d, 12)
    c = Math.pow(c, 11)
    b = Math.pow(b, 10)
    a = a * 5
}

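I expected the inversed function to run much faster. Instead, I got a surprising result.

Whichever function I test first runs about 10 times faster, until I test the second one; from then on, both are slow.

Example: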

> test(straight, 10000000)
30
> test(straight, 10000000)
32
> test(inversed, 10000000)
390
> test(straight, 10000000)
392
> test(inversed, 10000000)
390

The same thing happens when testing in the opposite order:

> test(inversed, 10000000)
25
> test(straight, 10000000)
392
> test(inversed, 10000000)
394

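I tested it both in Chrome and in Node.js, and I have absolutely no idea why this happens. The effect persists until I refresh the current page or restart the Node REPL.

What could be the cause of such a significant (~12 times worse) performance drop?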

P.S. It seems to happen only in some environments, so please mention the environment you use for testing.

Mine:

OS: Ubuntu 14.04
Node v0.10.37
Chrome 43.0.2357.134 (Official Build) (64-bit)

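/edit
In Firefox 39, each test takes ~5500 ms regardless of the order. It seems to happen only with specific engines.

/edit2
Inlining the function into the test function makes it always run in the same time.
Is it possible that there is an optimization that inlines the function parameter if it's always the same function?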
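For reference, "inlining the function into the test function" from the second edit would look roughly like the following. This is a minimal sketch: `testStraightInlined` is a hypothetical name, and its loop body is simply the body of `straight` pasted in place of the `fn()` call.

```javascript
// Hypothetical harness with the body of `straight` pasted into the loop,
// so there is no fn() call site whose target could change between runs.
function testStraightInlined(times) {
  let i = times;
  const t = Date.now();
  while (i--) {
    let a = 1, b = 2, c = 3, d = 4, e = 5;
    a = a * 5;
    b = Math.pow(b, 10);
    c = Math.pow(c, 11);
    d = Math.pow(d, 12);
    e = Math.pow(e, 25);
  }
  return Date.now() - t; // elapsed wall-clock time in ms
}
```

With no `fn()` call site left, there is no call target for the engine to specialize on, so calling `testStraightInlined(10000000)` should not be affected by whichever other function was benchmarked before it.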

Once you call test with two different functions, the fn() call site inside it becomes megamorphic and V8 is unable to inline at it.

In V8, function calls (as opposed to method calls such as o.m(...)) are accompanied by a one-element inline cache instead of a true polymorphic inline cache.
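The behavior of a one-element cache at a call site can be modeled in a few lines. This is a conceptual sketch, not V8's actual data structure; `CallSiteCache` and the two stub functions are hypothetical.

```javascript
// Toy model of a one-element call-site cache: it remembers a single target
// and stays "monomorphic" while every call goes to that same target.
// The first call to a different target makes it give up for good.
class CallSiteCache {
  constructor() {
    this.target = null; // the one cached function
    this.state = "uninitialized";
  }

  call(fn) {
    if (this.state === "uninitialized") {
      this.target = fn;
      this.state = "monomorphic"; // an optimizing compiler could inline `fn` here
    } else if (this.state === "monomorphic" && this.target !== fn) {
      this.target = null;
      this.state = "megamorphic"; // no single target left to inline
    }
    return fn();
  }
}

const site = new CallSiteCache();
const straightStub = () => 1;
const inversedStub = () => 2;

site.call(straightStub);
site.call(straightStub);
console.log(site.state); // "monomorphic"
site.call(inversedStub);
console.log(site.state); // "megamorphic"
```

Nothing in the model moves a site back from megamorphic to monomorphic, which is consistent with the question's observation that the slowdown lasts until the page or REPL is reloaded.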

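Because V8 is unable to inline at the fn() call site, it cannot apply a variety of optimizations to your code. If you look at your code in IRHydra (I uploaded the compilation artifacts for your convenience), you will notice that the first optimized version of test (specialized for fn = straight) has a completely empty main loop.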

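V8 simply inlined straight and removed all the code you hoped to benchmark using dead code elimination (DCE). Older versions of V8, instead of applying DCE, would hoist the code out of the loop via loop-invariant code motion (LICM), because the code is completely loop invariant.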
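To picture what the two optimizations do, here is the loop in `test` after `straight` has been inlined, written back out as JavaScript with the constant initial values folded in, followed by hand-applied LICM and DCE. This is only an illustration: V8 performs these transformations on its internal graph, not on source code, and the function names are hypothetical.

```javascript
// `test`'s loop after inlining `straight` (timing code omitted):
// every computation is loop invariant and no result is ever used.
function inlinedLoop(times) {
  let i = times;
  while (i--) {
    const a = 1 * 5;
    const b = Math.pow(2, 10);
    const c = Math.pow(3, 11);
    const d = Math.pow(4, 12);
    const e = Math.pow(5, 25);
  }
}

// After loop-invariant code motion (LICM): the invariant work is hoisted
// out of the loop and done once instead of `times` times. Math.pow has no
// side effects, so running it even when times is 0 changes nothing observable.
function afterLICM(times) {
  const a = 1 * 5;
  const b = Math.pow(2, 10);
  const c = Math.pow(3, 11);
  const d = Math.pow(4, 12);
  const e = Math.pow(5, 25);
  let i = times;
  while (i--) {}
}

// After dead code elimination (DCE): none of the results is observable,
// so the work disappears and only the empty counting loop is left.
function afterDCE(times) {
  let i = times;
  while (i--) {}
}
```

All three functions have the same observable behavior (they return `undefined` and have no side effects), which is exactly what allows the compiler to replace the first with either of the others.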

When straight is not inlined, V8 can't apply these optimizations, hence the performance difference. Newer versions of V8 would still apply DCE to straight and inversed themselves, turning them into empty functions, so the performance difference is not that big (around 2-3x). Older V8 was not aggressive enough with DCE, and that would show up as a bigger difference between the inlined and non-inlined cases, because the peak performance of the inlined case was solely the result of aggressive loop-invariant code motion (LICM).

On a related note, this shows why benchmarks should never be written like this: their results are of no use, because you end up measuring an empty loop.
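One way to keep the measured work from being optimized away is to make every call depend on a changing input and to use every result. The sketch below assumes that approach; `bench`, `straightUsed`, and `inversedUsed` are hypothetical variants of the question's code, not a standard benchmarking API.

```javascript
// Hypothetical harness: each call gets a varying input and returns a value,
// and every result is folded into `sink`, which is returned to the caller.
function bench(fn, times) {
  let sink = 0;
  const t = Date.now();
  for (let i = 0; i < times; i++) {
    sink += fn(i); // the result is used, so the call cannot simply be dropped
  }
  return { ms: Date.now() - t, sink };
}

// Variant of `straight` whose values depend on the argument (so they are
// no longer loop invariant) and whose results are returned.
function straightUsed(x) {
  const k = x & 1; // 0 or 1, changes every iteration
  let a = 1 + k, b = 2 + k, c = 3 + k, d = 4 + k, e = 5 + k;
  a = a * 5;
  b = Math.pow(b, 10);
  c = Math.pow(c, 11);
  d = Math.pow(d, 12);
  e = Math.pow(e, 25);
  return a + b + c + d + e;
}

// Same computation with the statements in the question's reversed order.
function inversedUsed(x) {
  const k = x & 1;
  let a = 1 + k, b = 2 + k, c = 3 + k, d = 4 + k, e = 5 + k;
  e = Math.pow(e, 25);
  d = Math.pow(d, 12);
  c = Math.pow(c, 11);
  b = Math.pow(b, 10);
  a = a * 5;
  return a + b + c + d + e;
}
```

For example, `bench(straightUsed, 10000000)` and `bench(inversedUsed, 10000000)` can be compared, printing `sink` as well as `ms`. This alone does not remove the call-site issue described above: if one `bench` function is used for both variants, its `fn(i)` call site sees two targets, so running each variant in a separate `node` process is the more careful option.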

If you are interested in polymorphism and its implications in V8, check out my post "What's up with monomorphism" (the section "Not all caches are the same" talks about the caches associated with function calls). I also recommend one of my talks about the dangers of microbenchmarking, e.g. the most recent "Benchmarking JS" talk from GOTO Chicago 2015 (video); it might help you avoid common pitfalls.

You're misunderstanding the stack.

While the "real" stack indeed only has Push and Pop operations, this doesn't really apply to the kind of stack used for execution. Apart from Push and Pop, you can also access any variable at random, as long as you have its address. This means that the order of locals doesn't matter, even if the compiler doesn't reorder them for you. In pseudo-assembly, you seem to think that

var x = 1;
var y = 2;

x = x + 1;
y = y + 1;

translates to something like

push 1 ; x
push 2 ; y

; get y and save it
pop tmp
; get x and put it in the accumulator
pop a
; add 1 to the accumulator
add a, 1
; store the accumulator back in x
push a
; restore y
push tmp
; ... and add 1 to y

In truth, the real code is more like this:

push 1 ; x
push 2 ; y

add [bp], 1
add [bp+4], 1

If the thread stack really were a strict stack, this would indeed be impossible, and the order of operations and locals would matter much more than it does now. Instead, allowing random access to values on the stack saves a lot of work for both the compiler and the CPU.
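The difference between a strict stack and an execution stack can be modeled in a few lines. This is a toy sketch: `ExecStack` is hypothetical, its offsets count slots rather than bytes (the `[bp+4]` above is a byte offset), and a real machine stack usually grows toward lower addresses.

```javascript
// Toy model of an execution stack: values are pushed and popped, but any
// slot can also be read or written directly by its offset from a base
// pointer (bp), like `add [bp], 1` in the pseudo-assembly above.
class ExecStack {
  constructor() {
    this.slots = [];
    this.bp = 0; // base pointer: index of the current frame's first slot
  }
  push(v) { this.slots.push(v); }
  pop() { return this.slots.pop(); }
  // Random access relative to bp: no pop/push shuffling is needed.
  get(offset) { return this.slots[this.bp + offset]; }
  set(offset, v) { this.slots[this.bp + offset] = v; }
}

const s = new ExecStack();
s.push(1); // x at slot bp + 0
s.push(2); // y at slot bp + 1
s.set(0, s.get(0) + 1); // x = x + 1, y stays where it is
s.set(1, s.get(1) + 1); // y = y + 1
console.log(s.slots); // [ 2, 3 ]
```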

To answer your actual question, I suspect neither of the functions actually does anything. You only ever modify locals, and your functions don't return anything, so it's perfectly legal for the compiler to drop the function bodies completely, and possibly even the function calls. If that's indeed the case, whatever performance difference you're observing is probably just a measurement artifact, or something related to the inherent cost of calling a function or iterating.

Inlining the function to the test function makes it run always the same time.
Is it possible that there is an optimization that inlines the function parameter if it's always the same function?

Yes, this seems to be exactly what you are observing. As @Luaan already mentioned, the compiler likely drops the bodies of your straight and inversed functions anyway, because they have no side effects and only manipulate some local variables.

When you call test(…, 10000000) for the first time, the optimising compiler realises after some iterations that the fn() being called is always the same, and inlines it, avoiding the costly function call. All it does now is decrement a variable 10 million times and test it against 0.

But when you then call test with a different fn, it has to deoptimise. It may later apply some other optimisations again, but now that it knows there are two different functions to be called, it cannot inline them any more.

Since the only thing you're really measuring is the function call, that leads to the large differences in your results.

An experiment to see if the local variables in functions are stored on a stack

Regarding your actual question: no, individual variables are not stored on a stack (as in a stack machine) but in registers (as in a register machine). It doesn't matter in which order they are declared or used in your function.

Still, they are stored on the stack, as part of so-called "stack frames". There is one frame per function call, storing the variables of its execution context. In your case, the stack might look like this:

[straight: a, b, c, d, e]
[test: fn, times, i, t]
…
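The frame layout in the diagram above can be mimicked with a small model in which each call pushes a frame holding its locals and pops it on return. It is a sketch for illustration only: `callStack` and `withFrame` are hypothetical, and a real engine keeps frames in machine memory (and often keeps locals in registers), not in JavaScript objects.

```javascript
// Toy model of a call stack made of frames: each call pushes one frame
// holding that call's locals, and the frame is popped when the call returns.
const callStack = [];

function withFrame(name, locals, body) {
  callStack.push({ name, locals });
  try {
    return body(locals);
  } finally {
    callStack.pop(); // the frame goes away when the call returns
  }
}

let snapshot = null;
withFrame("test", { fn: "straight", times: 1, i: 1, t: 0 }, () =>
  withFrame("straight", { a: 1, b: 2, c: 3, d: 4, e: 5 }, () => {
    // innermost frame first, like the diagram
    snapshot = callStack.slice().reverse().map((f) => f.name);
  })
);

console.log(snapshot);         // [ 'straight', 'test' ]
console.log(callStack.length); // 0
```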

Reference URL: https://stackoverflow.com/questions/31698296/what-makes-this-function-run-much-slower
