B) PowerPC Support in C or C++
==============================

Note: If you are using vbcc-WarpOS, and not StormC, then you should also
read vbcc.doc !!! Coding with vbcc differs in some parts from coding with
StormC !!!

In principle, PPC development in C/C++ runs in 5 phases:

1) Rewrite all 68k ASM Stuff in C
2) Adapt Source to ANSI/StormC
3) Adapt to PPC
4) Contextswitch-Optimizing
5) Further Adaptions

Contrary to what you might believe, 3) is only a very small step, the big
step is 2). And yes, you can do most of this already, even if you do not
own a PPC.

I will now explain the different steps of development in more detail.

It is advisable to do steps 1) and 2) while developing a 68k version,
even if no PPC version is planned at first. It will simplify the PPC
development a lot, and in fact it does not need much extra work...

Note also that things are not this easy with the PPC software from
Phase 5. That things can be this easy is a special feature of the WarpOS
software.

I won't discuss rewriting 68k ASM to C source here, you should be able to
do this yourself.

2) Adapt Source to ANSI/StormC
------------------------------

Most of the work is not the adaption to PPC, but the adaption from SAS/C
or GNU C to StormC. StormC is a strict ANSI compiler, so it knows only
the Standard-C-Functions that are contained in the ANSI standard. Some of
the unsupported functions can be emulated using the not-yet-released
UnixLib, though.

A program that compiles on SAS/C with the STRICT ANSI mode set is already
most of the way there: you can think of StormC as a compiler that ALWAYS
runs in STRICT ANSI mode.

The following SAS/C Functions are not contained in ANSI, and thus not supported by StormC (most of them are quite exotic functions, and it is possible that you do not even know a lot of them, even if you are a proficient C Coder) : astcsma isascii iscsym iscsymf toascii scdir stcpm stcpma stcsma stccpy stpcpy stcis stcisn stclen stpbrk stpchr stpchrn strcmpi strnset strset stcarg stpsym stptok stpblk strbpl strdup strins strmid stcd_i stcd_l ecvt fcvt gcvt stch_i stch_l stci_d stci_h stci_o stcl_d stcl_h stcl_o stco_i stco_l stcu_d stcul_d toascii stpdate stptime __datecvt __timecvt utpack utunpk cot iabs max min pow2 __emit getreg putreg geta isatty ovlyMgr dqsort fqsort lqsort sqsort strsrt tqsort drand48 erand48 jrand8 lcong48 lrand48 mrand8 nrand48 seed48 srand48 __autoopenfail chkabort Chk_Abort _CXBRK __exit onexit _XCEXIT forkl forkv onbreak wait waitm bldmem rstmem sizmem chkml getmem getml halloc lsbrk sbrk _MemCleanup rbrk rlsmem rlsml memccpy movmem repmem setmem swmem except __matherr poserr datecmp timer __tzset getch fgetchar fputchar _dread _dwrite read write clrerr close _dclose fcloseall creat _dcreat _dcreatx fdopen fileno fmode iomode open _dopen flushall mkstemp mktemp setnbf _dseek lseek tell access chkufb chmod fstat getfa getft stat stcgfe stcgfn stcgfp strmfe strmfn strmfp strsfn unlink argopt chgclk dos_packet getclk getasn getdfs putenv rawcon stackavail stacksize stackused chdir closedir dfind dnext findpath getcd getcwd getfnl getpath mkdir opendir readdir rmdir seekdir rewinddir telldir readlocale scr_beep scr_bs scr_cdelete scr_cinsert scr_clear scr_cr scr_curs scr_cursrt scr_cursup scr_eol scl_home scr_ldelete scr_lf scr_linsert scr_tab _CXFERR _CXOVF _EPILOG _PROLOG The most important of the "not allowed" functions are the Level 0 I/O functions (open,close,read,write). Use fopen,fclose,fread,fwrite instead. 
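
To illustrate the replacement of Level 0 I/O, here is a minimal sketch that reads a file block by block using only ANSI stdio calls. The helper name sum_file and the 512 byte buffer size are assumptions made for this example, they do not come from any Amiga or StormC header.

```c
#include <stdio.h>

/* Hypothetical helper: sums all bytes of a file using ANSI stdio only.
   Returns 0 on success, -1 if the file cannot be opened or read. */
static int sum_file(const char *name, unsigned long *sum)
{
    unsigned char buf[512];                 /* block buffer (size is arbitrary) */
    size_t n, i;
    int ok;
    FILE *fp = fopen(name, "rb");           /* instead of open()  */

    if (fp == NULL)
        return -1;

    *sum = 0;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)   /* instead of read() */
        for (i = 0; i < n; i++)
            *sum += buf[i];

    ok = !ferror(fp);                       /* distinguish EOF from a read error */
    fclose(fp);                             /* instead of close() */
    return ok ? 0 : -1;
}
```

Reading in blocks like this also avoids per-byte library calls, which matters again later for the PPC version.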
Note: Some of these functions might in fact be supported. In the first
version of this text I mistakenly listed stricmp and strnicmp as not
included (which is wrong), so there might be more errors in the list :)
But probably not many... probably none...

But STRICT ANSI does not only limit the functions. There are also some
things that cause a warning from SAS/C, but an error from a strict ANSI
compiler. Things like:

    char *string=malloc(300);

cause an error from StormC. Correct would be:

    char *string=(char *)malloc(300);

StormC insists on STRONG TYPING. If you do not own StormC, but want your
code to compile as easily as possible with StormC PPC later, compile with
STRICT ANSI. Problems appear especially with function pointers. If you
are not sure how to cast something for STRICT ANSI, try void *, it often
works for source that is not strongly typed.

You should also replace all K&R syntax (example)

    void main(argc,argv)
    int argc;
    char **argv;

by the normal syntax (example)

    void main(int argc,char **argv)

Also, code like

    int a=5;
    int stuff[a];

is not legal in ANSI C. Array dimensions have to be constants. If you
need them to be variable, use dynamic allocation with malloc.

A good method to convert to "Strict ANSI" is the following:

1. Just compile it, and look at every warning and error
2. Typecast everything that looks like a pointer (and causes an error) to
   void *, and everything else that causes problems to an int, long or
   double.
3. If some things still don't work, have a closer look at them now.

Some sources (like the source of Doom) require parts of the Unix/TCP
includes. If you need such things, please contact me, I have converted
the needed things to StormC (contact address see below).

Now we are nearly done with the ANSI/StormC adaption.
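
The two points above about casting the result of malloc and avoiding variable array dimensions can be combined in one small sketch. The function name make_table is hypothetical and only serves this illustration.

```c
#include <stdlib.h>

/* Hypothetical helper: returns an array of n ints filled with 0..n-1,
   or NULL if n is not positive or memory runs out.
   The caller has to free() the result. */
static int *make_table(int n)
{
    int *stuff;
    int i;

    if (n <= 0)
        return NULL;

    /* Instead of "int stuff[n];": allocate dynamically and cast the
       result explicitly, as a strictly typed compiler expects. */
    stuff = (int *)malloc((size_t)n * sizeof(int));
    if (stuff == NULL)
        return NULL;

    for (i = 0; i < n; i++)
        stuff[i] = i;
    return stuff;
}
```

The explicit cast is harmless on compilers that do not need it, so the same source keeps compiling on SAS/C and GNU C.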
Finally, some keywords have to be defined differently:

    #define __stdargs
    #define __regargs
    #define __asm
    #define __far FAR
    #define __inline inline
    #define __volatile volatile

__chip, __fast and __interrupt do not exist on StormC, they have to be
replaced by the appropriate OS functions.

Some programmers also use combinations that won't work with StormC
(static inline is one of them, get it Unix-coders :) !!! Use static OR
inline, but not both of them !!!)

And while we are at "bad coding style": be careful with bitfields. ANSI C
only guarantees them for the types int, signed int and unsigned int, and
their layout in memory is up to the compiler.

Ah, and one word to those fclose-always-works-fans. No, fclose does not
work if the file is NOT OPEN !!! You crash your task if you try to close
a file that is not open. Do

    if (file) fclose(file);

Some words about __attribute__ ((packed)). It does not exist in StormC,
and it is a feature that would slow down the PPC *much* if it did exist.
Please do not use __attribute__ ((packed)). The PPC needs a certain
alignment to get optimal speed.

About text constants longer than a line: It is legal to write:

    char *bla="...."\
              "...."\
              "....";

But the last character before the \ should be a \ here. The notation

    char bla[]={"..."\
                "..."};

is not legal (this is sometimes used in GNU C sources).

If you have done all this, you now (should) have a working StormC 68k
source. Now we go to the PPC stuff. Most of the work is done now, only
small things remain. PPC handling is mostly done internally by the
compiler.

3) Adapt to PPC
---------------

At first we have to change register parameters:

    void test(register __a0 mytest);

has to be changed (for example) to

    void test(register mytest);

The PPC does not know a register a0. But you can tell the compiler to use
a register by using the keyword "register", without specifying a register
number.

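
If one source has to serve both the 68k and the PPC build, the register change can be hidden behind a macro. The following is only a sketch: the macro name REG and the function sum_bytes are made up for this example, and it assumes that SAS/C identifies itself through the predefined macro __SASC.

```c
/* Hypothetical portability macro: on SAS/C for 68k it names the 68k
   register, everywhere else (including PPC) it is a plain "register".
   Assumption: __SASC is predefined by SAS/C. */
#if defined(__SASC) && !defined(__PPC__)
#define REG(r) register __##r
#else
#define REG(r) register
#endif

/* Example function: adds up "count" bytes starting at "data". */
static unsigned long sum_bytes(REG(a0) const unsigned char *data,
                               REG(d0) unsigned long count)
{
    unsigned long sum = 0;

    while (count-- > 0)
        sum += *data++;
    return sum;
}
```

This only covers C functions. Functions that are called from 68k assembler with fixed registers still need their own treatment.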
Next we have to make some changes to the OS includes. Up to now,
depending on which compiler you used, you did (example):

    #include
    #include

or

    #include
    #include

or

    #include
    #include

or

    #include

For StormC PPC you do:

    #include

Do not include any pragmas/pragma files, or you will be swamped by error
messages. Also do not include any proto/ files.

If you want to compile your source for both 68k and PPC (without changing
the source) you do:

    #include
    #ifndef __PPC__
    #include
    #endif

__PPC__ is always set correctly.

Yet another difference between 68k and PPC concerns the usage of
subtasks. If you want to run the subtask as a PPC task (recommended) you
have to replace functions like CreateTask() by CreateTaskPPC() of the
powerpc.library. I won't go into detail here. Most of the time the API is
identical to the usual functions, except for a PPC at the end of the
function name. Read the documentation of WarpOS for more information.

The other method would be to run the subtask as a 68k task and call
CreateTask(). To do so you would have to make your program a mixed
binary, though, and you also would not get the full PPC speedup. So
usually (unless the subtask does many OS calls) the CreateTaskPPC()
approach is the better method. It is also recommended not to use 68k
subtasks in PPC programs, so that your program will get optimal speed on
a 100% PPC Amiga system (that surely will appear some time in the
future).

Earlier versions of the compiler had problems with the Tags versions of
OS functions. This has been fixed for quite some time now. I did not
notice, which is why earlier versions of this document said that you
would have to change such code. I had not tested it for quite some time.

Then we come to the BeginIO function. On the PPC compiler this function
only exists in a form that takes a Library Base.
You can use the following code (example is for audio.device):

    #include
    #include

    void BeginIOAudioPPC(struct IORequest *arg1)
    {
        extern struct Library *AudioBase;
        ULONG regs[16];               /* apparently d0-d7, then a0-a7 */

        regs[9] = (ULONG) arg1;       /* a1 = the IORequest */
        __CallLibrary(AudioBase,-30,regs);   /* -30 = BeginIO vector */
    }

An example of how this can be used (out of the sound code of ZhaDoom...):

    AudioBase = (struct Library *)audio_io->ioa_Request.io_Device;
    c = &channel_info[cnum];
    c->audio_io->ioa_Request.io_Command = CMD_WRITE;
    c->audio_io->ioa_Request.io_Flags = ADIOF_PERVOL;
    c->audio_io->ioa_Data =
        &chip_cache_info[cache_chip_data (id)].chip_data[8];
    c->audio_io->ioa_Length = lengths[id] - 8;
    c->audio_io->ioa_Period = period_table[pitch];
    c->audio_io->ioa_Volume = vol << 2;
    c->audio_io->ioa_Cycles = 1;
    #ifdef __PPC__
    BeginIOAudioPPC((struct IORequest *)c->audio_io);
    #else
    BeginIO ((struct IORequest *)c->audio_io);
    #endif

You see? You always have to read out the LibraryBase of a device to do a
BeginIO on PPC...

Some readers probably ask themselves now what about the famous
"Context-Switch". Well, the truth is, under StormC the compiler deals
with the Contextswitch automatically. You won't have to think about it...
I will say a few words about it anyway.

There are two sorts of Contextswitches:

a) Function-Contextswitches

   You have to compile with Debugging-Information the first time you
   compile the source. Then the compiler handles the Contextswitches
   automatically. Later you can compile without Debugging-Information, if
   you want.

b) Library-Contextswitches

   These need so-called "function-stubs". ppcamiga.lib already contains
   the function-stubs for all Amiga-OS-functions, and for the
   68k-functions of rtgmaster (but for rtgmaster PPC-functions also
   exist, and it is advised to use these). To create a stub for a library
   that is not yet supported, you do:

       genppcstub mylib_protos.h mylib.fd VERBOSE

   You need the proto file and the FD file to create the stub. The stub
   is a C source file that you link together with your source.
   The Contextswitch itself then works automatically.

4) Contextswitch-Optimizing
---------------------------

With WarpOS a Contextswitch needs about 0.5 milliseconds (with a 200 MHz
PPC 604e board...). You should avoid doing many Contextswitches per
second (BTW: The Phase 5 software needs about 1 millisecond for a
Contextswitch).

Examples of things to avoid:

- Loading files byte by byte with fgetc (use fread instead and load into
  a Fastram buffer, from which you then fetch the data byte by byte)
- WritePixel (work on a Fastram buffer instead)
- OS calls that are called often per second

Graphics can be handled completely PPC native by using rtgmaster.
rtgmaster is a PPC Shared Library.

Notice that some of the Standard-C-Functions do Contextswitches. I think
clock() is among them, but I am not sure about it. One way to do timing
that is sure to avoid Contextswitches is to use the PPC timer directly,
in PPC ASM. First the scaling factors (the time base counts at a quarter
of the bus clock, and 35 is the number of Doom tics per second):

    double tb_scale_lo = ((double)(bus_clock >> 2)) / 35.0;
    double tb_scale_hi = (4.294967296E9 / (double)(bus_clock >> 2)) * 35.0;

bus_clock is set to the bus clock in Hz, for example 50000000 for a
150 MHz board, 66000000 for a 200 MHz board.

Measuring time is then done like this (example of the I_GetTime function
of Doom):

    int I_GetTime (void)
    {
        unsigned int clock[2];   /* [0] = upper, [1] = lower time base */
        double currtics;
        static double basetics=0.0;

        ppctimer (clock);
        if (basetics == 0.0)
            basetics = ((double) clock[0])*tb_scale_hi
                     + ((double) clock[1])/tb_scale_lo;
        currtics = ((double) clock[0])*tb_scale_hi
                 + ((double) clock[1])/tb_scale_lo;
        return (int) (currtics-basetics);
    }

ppctimer looks like this (object code for people who do not have
StormPowerASM is contained inside this archive):

            vea
            XDEF    _ppctimer
    _ppctimer:
            mftbu   r4            ; upper half of the time base
            mftbl   r5            ; lower half
            mftbu   r6            ; upper half again
            cmpw    r4,r6
            bne     _ppctimer     ; retry if the upper half changed
            stw     r4,0(r3)
            stw     r5,4(r3)
            blr

But well, as I said, I am not sure whether clock() uses Contextswitches
or not.
I only had the feeling that ZhaDoom sped up after I replaced the usage of
clock() by the usage of ppctimer().

5) Further Adaptions
--------------------

Note: The following is fully optional !!! (But it might speed up some
things)

It is possible to declare large memory areas as non-cachable using the
BAT registers of the PPC. For how exactly this is done, read the
documentation of WarpOS.

Another optimization would be rewriting parts of the code in PPC
Assembler. As to this, see below.

In some newsgroups it was discussed to run parts of a program
asynchronously on the 68k. Some people even claimed this would only be
possible with the Phase 5 software. This is not true. If you want to
implement it, you would use the PPC-native message system of WarpOS
(keyword "AllocXMsg", refer to the WarpOS documentation). But I want to
point out the disadvantages of this "parallel" method:

1) On PPC-only machines such code would have serious disadvantages. And
   such systems will come...

2) The PowerUP hardware is not good for true multiprocessing. As soon as
   your 68k/PPC tasks share memory, you will get serious problems. I
   won't go into detail, it was discussed enough in the newsgroup. And it
   really is not worth the effort.

I seriously recommend working only synchronously, running subtasks only
on the same CPU the main program is running on.

Sometimes it is also useful to do a manual Contextswitch to a 68k ASM
function, for example if the ASM function contains tons of OS calls. But
if you have such code, I recommend using a Mixed Binary anyway. It makes
things easier.

PowerPC ASM Optimization
------------------------

At last this one. Again I have to say that it makes no sense to implement
the whole thing in PPC ASM.
You start like this:

1) Implement everything in C
2) Compile it for 68k and use the Profiler of StormC (the profiler
   currently only exists for 68k, but its data is also useful for PPC)

When you use the profiler you run the program, and it collects statistics
about which functions use how much CPU time. Then you implement the
functions that take the most CPU time in PPC ASM. It is that simple. You
have to keep in mind, though:

- even ASM can't speed up massive numbers of Context-Switches
- ASM also can't speed up the slow GFX bus of the Amiga
  (even Zorro III is slow by today's standards...)

Always remember: Doing a fast implementation in C and then using a
profiler to find out which functions are worth an ASM optimization is
much more clever than doing everything in PPC ASM.

Of course the StormC profiler is only available if you own StormC, SAS/C
and GNU C do not include it.

Now, what do you do if your "original" source is in ASM, not in C? Well,
you insert timing checks and write some timing data to a file ("manual
profiling") at the places where you think the most time is wasted. Of
course, real profiling (using StormC) is much easier.

Also remember that C defines its functions like:

    _Functionname

So if you want to profile ASM stuff you have to add a leading _ to all
function names, and XDEF them all.

Example:

stuff.asm
---------

    start:
            jsr     morestuff
            ; lots of code
            rts

    ; lots of functions

    morestuff:
            ; lots of code
            rts

Would have to be changed to:

startit.c
---------

    extern void start(void);

    void main()
    {
        start();
    }

stuff.asm
---------

            XDEF    _start
            XDEF    _morestuff
            ;... lots of functions

    _start:
            jsr     _morestuff
            ;lots of code
            rts

    _morestuff:
            ;lots of code
            rts

Well, and now you can start profiling... the C part simply starts the ASM
main function...
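
The "manual profiling" mentioned above could look roughly like the following sketch. The type ProfSlot and the prof_ functions are hypothetical names invented for this example. It uses the ANSI function clock() for brevity. On PPC, where clock() might cost a Contextswitch, a time base reader such as ppctimer would be the more suitable time source.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical manual profiling slot: one per code region to measure. */
typedef struct {
    const char   *name;    /* label written to the report       */
    clock_t       start;   /* time stamp of the last prof_begin */
    clock_t       total;   /* accumulated time in this region   */
    unsigned long calls;   /* how often the region was entered  */
} ProfSlot;

/* Call at the start of the region you suspect to be slow. */
static void prof_begin(ProfSlot *p)
{
    p->start = clock();
}

/* Call at the end of the region: adds the elapsed time. */
static void prof_end(ProfSlot *p)
{
    p->total += clock() - p->start;
    p->calls++;
}

/* Writes one report line to an open file, returns the fprintf result. */
static int prof_write(const ProfSlot *p, FILE *out)
{
    return fprintf(out, "%s: %lu calls, %.3f s\n", p->name, p->calls,
                   (double)p->total / CLOCKS_PER_SEC);
}
```

You would wrap each call of a suspect routine (for example the C side of a call to _morestuff) in prof_begin/prof_end and call prof_write for every slot once before the program exits.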