Faster Float to DWORD conversions

Programming Reality Factory and Genesis3D.
paradoxnj
RF2 Dev Team
Posts: 1328
Joined: Wed Mar 01, 2006 7:37 pm
Location: Brick, NJ

Faster Float to DWORD conversions

Post by paradoxnj »

If you place the following code in the D3D7 driver, you will gain about 1-2 FPS.

Replace:

Code: Select all

inline DWORD F2DW( FLOAT f ) { return *((DWORD*)&f); }
With:

Code: Select all

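__inline DWORD F2DW(float f)
{
	DWORD				retval = 0;

	_asm {
		fld				f                  // push f onto the FPU stack
		lea				eax, [retval]      // eax = address of retval
		fistp			dword ptr[eax]     // store as a 32-bit integer (current rounding mode) and pop
	}

	return retval;
}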
This code has been tested. I noticed quite a few float to DWORD conversions in the D3D7 and OpenGL drivers that are not using a conversion method (they are using straight casts). Every time you use a cast, the compiler generates code that calls _ftol(). _ftol() changes the FPU rounding mode and restores it every time it is called, which causes many delays in the code. If you run the code through a performance analyzer such as VTune, you will see this. I'm sure if you put this in the Genesis engine itself, you will gain some more FPS.
Many Bothans died to bring you this signature....
madness
Posts: 91
Joined: Mon Feb 11, 2008 5:56 pm

Re: Faster Float to DWORD conversions

Post by madness »

I have very little idea what this is about but thanks for helping make the engine better! :)
Somewhere in Nevada...
Jay
RF Dev Team
Posts: 1232
Joined: Fri Jul 08, 2005 1:56 pm
Location: Germany

Re: Faster Float to DWORD conversions

Post by Jay »

@paradoxnj: just a question out of interest: Would it matter if you didn't initialize the variable at the beginning? (It only saves 1 machine code instruction and would not matter in the end, I know.) I mean the variable's value would be set to the 'float' value anyway... Or is this not advisable, because the value is set by the assembler code?
Everyone can see the difficult, but only the wise can see the simple.
-----
paradoxnj
RF2 Dev Team
Posts: 1328
Joined: Wed Mar 01, 2006 7:37 pm
Location: Brick, NJ

Re: Faster Float to DWORD conversions

Post by paradoxnj »

Jay wrote:
@paradoxnj: just a question out of interest: Would it matter if you didn't initialize the variable at the beginning? (It only saves 1 machine code instruction and would not matter in the end, I know.) I mean the variable's value would be set to the 'float' value anyway... Or is this not advisable, because the value is set by the assembler code?

Well...it's good practice to initialize all your variables. I can't count how many times I ran into an issue and wasted 4 hours debugging it only to find out it was an uninitialized variable. In this case, if the float to DWORD conversion fails for some reason, the function will return 0 instead of 0xcccccccc. As for the extra instruction, every little bit counts. Genesis and RF are not built for speed. All game engines and shells should be built for speed. For example, the structs in Genesis and RF are not 4-byte aligned. This makes the processor very angry because it has to work harder. Struct sizes should be in multiples of 4. For example:

Code: Select all

struct Test
{
    float   a;   // 4 bytes
    DWORD  b;  // 4 bytes
    char  c; // 1 byte
};
This is not a properly aligned struct as 4 + 4 + 1 = 9. It should be:

Code: Select all

struct Test
{
    float   a;   // 4 bytes
    DWORD  b;  // 4 bytes
    int  c; // 4 bytes
};
Now the struct is 12 bytes, a multiple of 4, so the processor is happy and will process this struct faster.
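
As a rough check on those sizes, here is a minimal sketch that only compares sizeof values. It assumes a typical platform where float is 4 bytes, uses std::uint32_t as a stand-in for DWORD, and the struct and function names are made up for illustration. Note that most compilers already pad a struct to its natural alignment by default, so the 9-byte layout only shows up when packing is forced to 1 (done here with #pragma pack, which MSVC, GCC, and Clang accept).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names; std::uint32_t stands in for DWORD.
#pragma pack(push, 1)
struct PackedTest      // packing forced to 1: no padding is inserted
{
    float         a;   // 4 bytes
    std::uint32_t b;   // 4 bytes
    char          c;   // 1 byte
};
#pragma pack(pop)

struct DefaultTest     // default packing: the compiler pads after c
{
    float         a;
    std::uint32_t b;
    char          c;
};

struct AlignedTest     // every member is 4 bytes wide, so no padding is needed
{
    float         a;
    std::uint32_t b;
    std::int32_t  c;
};

// Small helpers so the sizes can be inspected at run time.
std::size_t PackedSize()  { return sizeof(PackedTest); }
std::size_t DefaultSize() { return sizeof(DefaultTest); }
std::size_t AlignedSize() { return sizeof(AlignedTest); }
```

With default packing the char version and the int version come out the same size, so the practical gain depends on whether the project overrides the packing setting.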

Consider this....Genesis works with float values for colors. Direct3D works with DWORD values for colors. The entire D3D7 driver is riddled with float to DWORD conversions using the old method. I calculated a 1-2 FPS gain, but it will probably be more. The OpenGL driver works with unsigned chars for colors. This is worse because you have to convert a float to a DWORD, then a DWORD to an unsigned char. Casting from a DWORD to an unsigned char is not too bad, but that float to DWORD conversion was killing the GL driver. I have been working on the GL driver a bit and was able to make it a little more modern by making it conform to OpenGL 1.5 standards. Basically, it uses vertex buffer objects and a poly cache instead of rendering the polys inefficiently.

Another change that would affect speed is to use SIMD instructions for the vector and matrix math operations. Intel's SSE instruction set is now supported on AMD processors as well. Luckily, Visual Studio 2005 has an "Enable Enhanced Instruction Set" option (/arch:SSE or /arch:SSE2) that makes this an easy change. If that is not an option for compatibility reasons, I can supply the vector operations in SIMD.

One more change is to access public variables through inline functions. An inline function asks the compiler to insert the contents of the function body wherever the function is called, so the accessor should cost no more than touching the variable directly. So prefer:

Code: Select all

class MyClass
{
public:
    int my_var;

    // Returns a reference so the caller can assign through the accessor.
    inline int& GetMyVar()
    {
        return my_var;
    }
};

MyClass cls;

cls.GetMyVar() = 0;
Over this:

Code: Select all

class MyClass
{
public:
    int my_var;
};

MyClass cls;

cls.my_var = 0;
Here's a pretty good reference on optimizing C++ code.
Many Bothans died to bring you this signature....
Jay
RF Dev Team
Posts: 1232
Joined: Fri Jul 08, 2005 1:56 pm
Location: Germany

Re: Faster Float to DWORD conversions

Post by Jay »

Thanks. The thing with the pre/post-incrementers is interesting. I didn't know that ++i is so much faster than i++, because ++i would be 1 machine instruction (add(i,1)), while i++ is more, because the value is saved before it is even incremented.

so this
for(int i=0; i<x; ++i)
would be faster than
for(int i=0; i<x; i++)

right?
Everyone can see the difficult, but only the wise can see the simple.
-----
paradoxnj
RF2 Dev Team
Posts: 1328
Joined: Wed Mar 01, 2006 7:37 pm
Location: Brick, NJ

Re: Faster Float to DWORD conversions

Post by paradoxnj »

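Yep. Funny how something so small makes a difference. To be fair, for a plain int loop counter whose result is not used, an optimizing compiler generates the same code either way; the difference shows up with class types such as iterators, where i++ has to make a copy.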
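
To make the pre/post-increment point concrete, here is a minimal sketch with a made-up Counter type that records how often it gets copied. The type and function names are hypothetical and only there for illustration; nothing like this exists in Genesis or RF.

```cpp
// Hypothetical counter type that records how often it is copied.
struct Counter
{
    int value = 0;
    static int copies;

    Counter() = default;
    Counter(const Counter& other) : value(other.value) { ++copies; }
    Counter& operator=(const Counter&) = default;

    // Pre-increment: modify in place and return a reference, no copy.
    Counter& operator++()
    {
        ++value;
        return *this;
    }

    // Post-increment: must keep the old value around to return it.
    Counter operator++(int)
    {
        Counter old = *this;   // this copy is the extra work
        ++value;
        return old;
    }
};

int Counter::copies = 0;

// Returns the number of copies made by n pre-increments.
int CopiesFromPreIncrement(int n)
{
    Counter::copies = 0;
    Counter c;
    for (int i = 0; i < n; ++i) { ++c; }
    return Counter::copies;
}

// Returns the number of copies made by n post-increments.
int CopiesFromPostIncrement(int n)
{
    Counter::copies = 0;
    Counter c;
    for (int i = 0; i < n; ++i) { c++; }
    return Counter::copies;
}
```

Pre-increment never copies, while post-increment copies at least once per call. For built-in types there is nothing to copy, which is why the habit matters mostly for iterators and other class types.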
Many Bothans died to bring you this signature....
QuestOfDreams
Site Admin
Posts: 1520
Joined: Sun Jul 03, 2005 11:12 pm
Location: Austria

Re: Faster Float to DWORD conversions

Post by QuestOfDreams »

Just a really late reply and a warning to all other programmers not to blindly copy and paste code. :wink:
It has turned out that the bump mapping problem in RF 0.76 was caused by replacing

Code: Select all

inline DWORD F2DW( FLOAT f ) { return *((DWORD*)&f); }
with paradoxnj's code. But the above line does not do a conversion from float to DWORD but a reinterpretation. (This is needed for the DirectX SetTextureStageState function, which generally expects a DWORD as 3rd parameter but for bump mapping this needs to be a float value :roll:)
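
The difference between reinterpreting and converting can be sketched in portable C++ as follows. This is only an illustration, assuming a 32-bit IEEE 754 float; std::uint32_t stands in for DWORD, and the function names FloatBits and FloatValue are made up for this example. std::memcpy is used here in place of the pointer cast because it copies the same bits without the aliasing concerns.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpretation: copy the float's 32 bits unchanged into an integer.
// This is what the original F2DW one-liner is meant to do.
std::uint32_t FloatBits(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &f, sizeof(bits));
    return bits;
}

// Conversion: produce the integer whose numeric value matches the float
// (a C++ cast truncates toward zero).
std::uint32_t FloatValue(float f)
{
    return static_cast<std::uint32_t>(f);
}
```

For 1.0f, FloatBits returns 0x3F800000 while FloatValue returns 1, so a function that expects the raw bits of a float gets a completely different number if it is handed the converted value.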
paradoxnj
RF2 Dev Team
Posts: 1328
Joined: Wed Mar 01, 2006 7:37 pm
Location: Brick, NJ

Re: Faster Float to DWORD conversions

Post by paradoxnj »

Interesting issue. I wonder if my lightmap issue in Jet3D is related to this. I will try it out.

The issue is not that it is doing a reinterpretation; the issue is that it is rounding the float when the fistp is called. This might throw off the calculations in D3D.
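
For what it's worth, rounding is a separate effect from reinterpretation, and it can be sketched on its own. In the snippet below, std::lrint is used as a portable stand-in for fistp, since both honor the current rounding mode; treating them as equivalent is an assumption of this sketch, and the function names are made up for illustration.

```cpp
#include <cmath>

// Truncation: a C++ cast always drops the fractional part.
long TruncateToLong(float f)
{
    return static_cast<long>(f);
}

// Rounding: std::lrint honors the current rounding mode, which is
// round-to-nearest unless the program has changed it.
long RoundToLong(float f)
{
    return std::lrint(f);
}
```

With the default rounding mode, 2.7f truncates to 2 but rounds to 3, so even code that really does want a numeric conversion can see different results after swapping a cast for fistp.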
Many Bothans died to bring you this signature....