
LdB's FreeRTOS example

Posted: Thu Nov 15, 2018 3:32 pm
by macload1
Hi LdB,
I saw your post in the USB thread and the new FreeRTOS examples you shared in your github account. The 4 core switcher interests me a lot!

Would it be possible for you to start from a newer FreeRTOS implementation? Some time ago I updated James Walmsley's examples myself (twice, in fact: first to version 8.0.0, then to 10.0.0). You can check that out on my GitHub:

Actually, they are at version 10.1.1 now, but that should be the final version, as the library has been bought and renamed by Amazon.

Many thanks in advance!

Best regards,

Re: LdB's FreeRTOS example

Posted: Fri Nov 16, 2018 4:30 am
by LdB
I am doing 10.1.1 at the moment; if you frequent the FreeRTOS forums you will have seen that :-)

The AArch64 version has held me up. The context switching has been driving me nuts, specifically when you allow critical nesting. You can turn it off with the config setting, but I really don't want to because of what I want to do with inter-core messaging.

I see you already did what I only realized I needed to do as I got further into the port: the IRQ handler section has to come out on its own so other stuff can access it. You will probably be able to get your current system going later tonight, as I am publishing the multicore IRQ block code tonight. That is really all the complication there is in porting the FreeRTOS code to multicore.

With your background you probably don't need much more than that, so here is what I did with the IRQ code to make it multicore ... r_MacLoad1

So the interrupt handlers now take core numbers, as there are four sets of tables, one for each core.

The multicore QA7 pending interrupts are available at these addresses as an array of 4, one for each core:

Code:

#define QA7_CORE_IRQ_PENDING		((volatile __attribute__((aligned(4))) uint32_t*)(uintptr_t)(0x40000060)) // Array of 4 .. one for each core
#define QA7_CORE_FIQ_PENDING		((volatile __attribute__((aligned(4))) uint32_t*)(uintptr_t)(0x40000070)) // Array of 4 .. one for each core
The bottom twelve bits (0 to 11) hold the valid IRQ pending sources as per the QA7 datasheet, so I merge them into the empty bits above the Basic 8. That gives you 84 IRQ numbers in total, 0 to 83.
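As a rough illustration of that numbering (a sketch, not LdB's actual code): the helper names `mergePending` and `qa7BitToIrq` and the three macros are made up for this example. Only the mask `0xAFF` and the bases 64 and 72 come from the handler in this post.

```c
#include <assert.h>
#include <stdint.h>

#define QA7_VALID_MASK  0xAFFu  /* mask used by the handler: keeps bits 0-7, 9 and 11 */
#define IRQ_BASE_BASIC  64u     /* IRQ 64-71: bottom 8 bits of the basic pending register */
#define IRQ_BASE_QA7    72u     /* IRQ 72-83: bits 0-11 of the per-core QA7 pending register */

/* Build the combined status word for the third range (IRQ 64 and up):
 * basic bits 0-7 stay in place, the QA7 bits are shifted up by 8. */
static inline uint32_t mergePending(uint32_t basicPending, uint32_t qa7Pending)
{
    return (basicPending & 0xFFu) | ((qa7Pending & QA7_VALID_MASK) << 8);
}

/* IRQ number that a given QA7 pending bit ends up with after the merge. */
static inline unsigned qa7BitToIrq(unsigned bit)
{
    assert(bit < 12u);
    return IRQ_BASE_QA7 + bit;
}
```

With this numbering QA7 bit 11 becomes IRQ 83, and bit 0 of the basic register stays IRQ 64.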

So this is what the IrqHandler looks like

Code:

void irqHandler (void)
{
	uint32_t coreId;
	uint32_t ulMaskedStatus = (*BCM2835_INTC_IRQ_BASIC);			// Read the interrupt basic register
	__asm("mrc  p15, 0, %0, c0, c0, 5 \t\n" : "=r" (coreId));		// Read Core ID from CPU register
	coreId &= 3;													// Mask off all but core number

	// Bit 8 in IRQBasic indicates interrupts in Pending1 (interrupts 31-0):
	if (ulMaskedStatus & (1 << 8))
		handleRange(coreId, (*BCM2835_IRQ_PENDING1) & coreICB[coreId].enabled[0], 0);

	// Bit 9 in IRQBasic indicates interrupts in Pending2 (interrupts 63-32):
	if (ulMaskedStatus & (1 << 9))
		handleRange(coreId, (*BCM2835_IRQ_PENDING2) & coreICB[coreId].enabled[1], 32);

	// Bits 0 to 7 in IRQBasic represent interrupts 64-71
	ulMaskedStatus &= 0xFF;											// Clear all but bottom 8 bits
	// Bits 0 to 11 in Core Irq pending represent interrupts 72-83
	ulMaskedStatus |= ((QA7_CORE_IRQ_PENDING[coreId] & 0xAFF) << 8);	// Add the QA7 irq bits above the IRQbasic bits (QA7 bits 8 & 10 are ignored)
	if (ulMaskedStatus != 0)
		handleRange(coreId, ulMaskedStatus & coreICB[coreId].enabled[2], 64);
}
Compare it to your source: all that happens is that the core ID is used to direct it to the right struct, no more complex than that ... terrupts.c
You should be able to use that instead of your current one and it should function on your current code on core0. Alternatively you should be able to move your code to execute on another core and core0 should still function correctly with interrupts.
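The handler relies on `handleRange` and `coreICB`, which are not shown in this post. A minimal sketch of what they could look like follows. The struct layout, the `IrqFn` type and the `registerIrq` and `recordIrq` helpers are assumptions made for illustration, and `__builtin_ctz` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CORES 4u
#define MAX_IRQ   84u                     /* IRQ numbers 0..83 */

typedef void (*IrqFn)(unsigned irq, void *param);    /* hypothetical handler type */

typedef struct {
    uint32_t enabled[3];                  /* enable masks for IRQ 0-31, 32-63 and 64-83 */
    struct { IrqFn fn; void *param; } vector[MAX_IRQ];
} CoreIrqControlBlock;                    /* assumed layout, one table set per core */

static CoreIrqControlBlock coreICB[NUM_CORES];

/* Call the registered handler for every pending bit, lowest bit first. */
static void handleRange(uint32_t coreId, uint32_t pending, unsigned base)
{
    assert(coreId < NUM_CORES);
    while (pending != 0u) {
        unsigned bit = (unsigned)__builtin_ctz(pending);  /* index of lowest set bit */
        unsigned irq = base + bit;
        if (irq < MAX_IRQ && coreICB[coreId].vector[irq].fn != NULL)
            coreICB[coreId].vector[irq].fn(irq, coreICB[coreId].vector[irq].param);
        pending &= pending - 1u;          /* clear the lowest set bit */
    }
}

/* Register a handler for one IRQ on one core and enable it in that core's mask. */
static void registerIrq(uint32_t coreId, unsigned irq, IrqFn fn, void *param)
{
    assert(coreId < NUM_CORES && irq < MAX_IRQ);
    coreICB[coreId].vector[irq].fn = fn;
    coreICB[coreId].vector[irq].param = param;
    coreICB[coreId].enabled[irq / 32u] |= 1u << (irq % 32u);
}

/* Example handler: stores the IRQ number it was called with. */
static void recordIrq(unsigned irq, void *param)
{
    *(unsigned *)param = irq;
}
```

Because every core indexes its own table, registering IRQ 83 on core 1 has no effect on what core 0 dispatches.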

Now finally, to deal with FreeRTOS, grab all the internal data and simply place it into a struct, like I did with the interrupt handler above.
You add a core number to that structure and make an array of the struct, one entry for each system you want.
Then go through the code and, for any access to the internal variables, use the core number as the index into the array to pick the struct to use. It is just a lot of typing. All you then need is a call which sets the coreId. I just made a call FreeRTOS_Init, which needs to be called before anything else, and you know the rest from there. I just use static structs because there isn't much data, but if you really wanted to push the envelope you could malloc a FreeRTOS struct's worth of data on demand.

You now have a version of FreeRTOS that can run on any core or multiple at once ... enjoy :-)
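To make the pattern concrete, here is a cut-down sketch under stated assumptions. `SchedulerState` is a stand-in with three fields rather than the real set of FreeRTOS internals, the signature of `FreeRTOS_Init` is a guess (only its name is given in this post), and `tickIncrement` and `tickCount` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CORES 4u

/* Stand-in for the scheduler's private data: what used to be file-scope
 * globals becomes fields of a struct, and there is one struct per core. */
typedef struct {
    uint32_t coreId;
    volatile uint32_t xTickCount;
    volatile uint32_t uxCurrentNumberOfTasks;
} SchedulerState;

static SchedulerState sched[NUM_CORES];

/* Must run on a core before anything else touches that core's state. */
void FreeRTOS_Init(uint32_t coreId)       /* assumed signature */
{
    assert(coreId < NUM_CORES);
    sched[coreId].coreId = coreId;
    sched[coreId].xTickCount = 0u;
    sched[coreId].uxCurrentNumberOfTasks = 0u;
}

/* Before the change this was "xTickCount++" on a single global. */
void tickIncrement(uint32_t coreId)
{
    assert(coreId < NUM_CORES);
    sched[coreId].xTickCount++;
}

uint32_t tickCount(uint32_t coreId)
{
    assert(coreId < NUM_CORES);
    return sched[coreId].xTickCount;
}
```

Each core only ever touches its own array entry, so ticks counted on core 2 leave the tick count of core 0 unchanged.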

One thing you could possibly help me with: do you know all the IRQ numbers for that system? I am trying to make defines for all the common ones. I am going to have to map them all, and it could save time if you know some already. The local core timer is 83, by the way, if you do decide to play, and 64 is the peripheral timer, which you already know.

Re: LdB's FreeRTOS example

Posted: Fri Nov 16, 2018 6:06 am
by LdB
Oh, if you are interested, this is my minimalist version of the FreeRTOS struct.
I had to bring configTICK_RATE_HZ in as a variable TICK_RATE_HZ because I wanted a different rate on each core.

Code:

/*
 * FreeRTOS control block (FCB) is allocated to each FreeRTOS implementation.
 * A single core system usually has only one, but a multicore may have multiple.
 * It binds together all the variable data that a FreeRTOS system uses in one place
 */
typedef struct FreeRTOSControlBlock
{
	tskTCB * volatile pxCurrentTCB;		/*< The currently active task control block */
	xList pxReadyTasksLists[configMAX_PRIORITIES]; /*< Prioritised ready tasks. */
	xList xDelayedTaskList1;		/*< Delayed tasks. */
	xList xDelayedTaskList2;		/*< Delayed tasks (two lists are used - one for delays that have overflowed the current tick count). */
	xList* volatile pxDelayedTaskList;		/*< Points to the delayed task list currently being used. */
	xList* volatile pxOverflowDelayedTaskList;	/*< Points to the delayed task list currently being used to hold tasks that have overflowed the current tick count. */
	xList xPendingReadyList;			/*< Tasks that have been readied while the scheduler was suspended. They will be moved to the ready queue when scheduler resumes. */
	xList xTasksWaitingTermination;		/*< Tasks that have been deleted - but their memory not yet freed. */
	volatile portUBASE_TYPE uxTasksDeleted;	/*< Tasks deleted count */
	xList xSuspendedTaskList;			/*< Tasks that are currently suspended. */

	portUBASE_TYPE TICK_RATE_HZ;		/*< Ticks per second of the FreeRTOS context switch system (Max is usually 1000) */

	/* This was all private data FreeRTOS used */
	volatile portUBASE_TYPE uxCurrentNumberOfTasks;
	volatile portTickType xTickCount;
	portUBASE_TYPE uxTopUsedPriority;
	volatile portUBASE_TYPE uxMissedTicks;
	volatile portBASE_TYPE xNumOfOverflows;
	portTickType xNextTaskUnblockTime;
	volatile unsigned int uxPercentLoadCPU;
	volatile unsigned int uxIdleTickCount;
	volatile unsigned int uxCPULoadCount;		/*< Last count of ticks that were the idle task .. which is used to calc CPU load */
	struct {
		volatile unsigned uxTopReadyPriority : 7;
		volatile unsigned xSchedulerRunning : 1;
		volatile unsigned uxSchedulerSuspended : 1;
		volatile unsigned xMissedYield : 1;
	};
} FreeRTOS_CB;
Then there is an array of them, one for each core. That is all there is to it, plus a lot of typing so that each core uses its own data :-)

Code:

FreeRTOS_CB freeRTOS[4] = {0 };
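One consequence of moving the tick rate into a per-core variable is that any time-to-tick conversion has to know which core it is for. The sketch below uses the same arithmetic as FreeRTOS's `pdMS_TO_TICKS` macro, but reads the rate from a per-core table. The table name, the example rates and the `msToTicks` helper are assumptions, not part of the code in this post.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CORES 4u

/* Example rates only: cores 0 and 1 tick at 1 kHz, cores 2 and 3 at 100 Hz. */
static uint32_t tickRateHz[NUM_CORES] = { 1000u, 1000u, 100u, 100u };

/* Convert milliseconds to ticks for a given core. The 64-bit intermediate
 * avoids overflow for large delays; the result is rounded down. */
uint32_t msToTicks(uint32_t coreId, uint32_t ms)
{
    assert(coreId < NUM_CORES);
    return (uint32_t)(((uint64_t)ms * tickRateHz[coreId]) / 1000u);
}
```

With these example rates a 250 ms delay is 250 ticks on core 0 but only 25 ticks on core 2.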

Re: LdB's FreeRTOS example

Posted: Fri Nov 16, 2018 9:09 am
by macload1
Hi LdB,

Many thanks for the explanations. I didn't check your code in detail yet, but if I understand correctly, you will have 4 schedulers running, one on each core, rather than one scheduler that handles and distributes the tasks across the 4 cores?

As for the interrupt numbers, I haven't touched any of the multicore Pis yet (I don't even own one yet). The form factor (dimensions) of the 3A+ interests me a lot, so I wanted to buy one and get my hands dirty with it. So I'm sorry to say that I can't help you with that one yet; in some weeks maybe, but I suppose you will be faster than me!

I have a lot more background on small MCUs (PIC16, 18, 24 and 32MX or MZ, 8051, the Cortex-M4F, and more recently the Cortex-M7), so the ARMv6 is still heavy going for me. For the multicore I need to study the datasheet in more detail to get comfortable with it, but I'll do my best to bring my future knowledge of those parts back to the community!

Re: LdB's FreeRTOS example

Posted: Fri Nov 16, 2018 10:02 am
by LdB
macload1 wrote:
Fri Nov 16, 2018 9:09 am
Many thanks for the explanations. I didn't check your code in detail yet, but if I understand correctly, you will have 4 schedulers running, one on each core, rather than one scheduler that handles and distributes the tasks across the 4 cores?
That is where things get interesting and you can have lots of fun.

So yes, on the first pass I ran 4 schedulers and set up a simple IPC message so I could pass tasks around. On the simple setup I gave you, that is as easy as changing the core number in a task; remember that the core number determines which RTOS system is used. The issue is that your semaphores and actual stack space belong to the core you launched from, so now you also need to remember the original core your resources came from. This gets messy very fast. One of many solutions is to centralize all the resources so all cores use the same subsystem.

You brought up another option, which is to merge all the cores into one scheduler. That makes the resources easy to deal with, but the issue is that it's hard to balance the load on a live running RTOS. This is not like Linux or Xinu (which I did this on), where you can hold everything off to do some balancing calculation; you are on a live RTOS. That means if you decide to migrate a task from one core to another, you need to leave the old task running while you duplicate it on the new core. When the new core is ready, you can suspend the task on the old core and start it on the new one, and still meet your RTOS requirements. Only then can you delete the now suspended task, because the new task is running. You can imagine the impact if you start flip-flopping tasks back and forth between cores.
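The order of those migration steps can be written down as plain bookkeeping. This is only a model of the sequence described above, not working scheduler code: the types and the `migrateTask` function are hypothetical, and a real port would copy the stack and TCB in step 1 and perform the handover in step 2 inside a critical section.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CORES 4u

typedef enum { TASK_ABSENT, TASK_PREPARED, TASK_RUNNING, TASK_SUSPENDED } TaskState;

typedef struct {
    TaskState stateOnCore[NUM_CORES];     /* state of this task's copy on each core */
} MigratableTask;

typedef enum { MIG_OK, MIG_BAD_ARG, MIG_NOT_RUNNING } MigResult;

MigResult migrateTask(MigratableTask *t, unsigned from, unsigned to)
{
    if (t == NULL || from >= NUM_CORES || to >= NUM_CORES || from == to)
        return MIG_BAD_ARG;
    if (t->stateOnCore[from] != TASK_RUNNING)
        return MIG_NOT_RUNNING;

    /* 1. Duplicate the task on the new core; the old copy keeps running. */
    t->stateOnCore[to] = TASK_PREPARED;

    /* 2. New core is ready: suspend the old copy and start the new one. */
    t->stateOnCore[from] = TASK_SUSPENDED;
    t->stateOnCore[to] = TASK_RUNNING;

    /* 3. Only now is it safe to delete the suspended original. */
    t->stateOnCore[from] = TASK_ABSENT;
    return MIG_OK;
}
```

The old copy is only given up in step 2, after the new one has been prepared, which is what keeps the task available throughout the move.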

An alternative to copying all the data around is to virtualize all data access using the MMU, but then you get into a cache problem: you must make sure everything is coherent before you switch from one virtualization on one core to the other. Given the huge macro blocks in the FreeRTOS code, that is not easy to do; you need to unravel all those macros. I am sure the macros gave you hell as well when you were migrating the code.

There are lots of games to be played in this space, which is why I have started the articles, and there is no perfect solution. On my weekend hit list is an XTASK type setup. It's all rather fun and I am just playing around with different setups.

I also tried to compile your code but got huge numbers of errors about duplicate definitions; it went on and on.
I looked at the makefile, but there was nothing obvious like a platform command or something I was supposed to use.
In file included from src/CC3100/simplelink/include/simplelink.h:402:0,
from src/Demo/main.c:20:
src/CC3100/simplelink/include/socket.h:347:0: warning: "FD_SETSIZE" redefined

In file included from d:\gcc-arm-none-eabi-7\arm-none-eabi\include\sys\types.h:68:0,
from d:\gcc-arm-none-eabi-7\arm-none-eabi\include\stdio.h:61,
from src/Demo/main.c:2:
d:\gcc-arm-none-eabi-7\arm-none-eabi\include\sys\select.h:31:0: note: this is the location of the previous definition
# define FD_SETSIZE 64

In file included from src/CC3100/simplelink/include/simplelink.h:402:0,
from src/Demo/main.c:20:
src/CC3100/simplelink/include/socket.h:415:0: warning: "FD_SET" redefined
#define FD_SET SL_FD_SET

In file included from d:\gcc-arm-none-eabi-7\arm-none-eabi\include\sys\types.h:68:0,
from d:\gcc-arm-none-eabi-7\arm-none-eabi\include\stdio.h:61,
from src/Demo/main.c:2:
d:\gcc-arm-none-eabi-7\arm-none-eabi\include\sys\select.h:48:0: note: this is the location of the previous definition
# define FD_SET(n, p) ((p)->fds_bits[(n)/NFDBITS] |= (1L << ((n) % NFDBITS)))

Re: LdB's FreeRTOS example

Posted: Fri Nov 16, 2018 10:43 am
by macload1
You just wrote down all the things I expected to discover when I saw your GitHub example about FreeRTOS on a multicore platform!

And you are right, balancing and switching tasks between the different cores is the hard way to do it. I thought a more lightweight approach could be possible, where you define the core for each task at creation and never move it from core to core (but maybe that is just the same as every core running its own FreeRTOS implementation).

Damn, you're right, I forgot about that. I left my project in a non-working state when I switched to another one. To compile it, you need to throw away the whole CC3100 stuff. I should have tagged the working states of the project, but I was not aware of all the possibilities of GitHub at that time. I had some low-level communication with the CC3100 working, but when I wanted to include the whole WiFi stack provided by TI, those definition problems gave me a lot of trouble. If I remember correctly, TI put some abstraction layers over the RTOS, and I didn't manage to adapt them fully to my project. I never found some of the definitions in the libraries...

If you're interested, I can clean up my code this weekend to get it working again (without WiFi), but as I said, just throw away all CC3100-related things and it should compile (and launch!).

Re: LdB's FreeRTOS example

Posted: Fri Nov 16, 2018 1:59 pm
by LdB
It's up to you, but what you have done is useful. As I showed, it's scarily close to what I ended up doing, and very usable. I guess it's the only logical way to pull the code apart, because we both got there in different ways and for different reasons. I ended up getting frustrated with the source code organization of FreeRTOS, so I just totally tore it apart. However, I threw away stuff I wasn't interested in (the streams etc.), so your port is technically better and I can just add the multicore features to it. So if you want to clean it up, I will add what I can, and perhaps it serves you as a starting point, so it could be a win-win. I won't get a chance to do anything this weekend: I have to do a pile of source documentation on stuff I have already done (before I forget) and a fair bit of code cleaning.

Re: LdB's FreeRTOS example

Posted: Fri Nov 16, 2018 6:37 pm
by macload1
I cleaned up my code to make it usable. If you want to show something on your screen, you just need to put in your own config file (and maybe comment out the GPIO-related stuff at the beginning of main, but I think that's not absolutely needed; in fact, I use an LCD connected through DPI).

I will not have the time this w-e either, but I'll probably check out your code next week.

As I already said, the heavy stuff has been borrowed mainly from James Walmsley. I just updated FreeRTOS, ported the uGFX library (very nice embedded library!) and added some stuff like malloc, ...

Have a nice w-e!

Re: LdB's FreeRTOS example

Posted: Tue Nov 20, 2018 5:49 pm
by LdB
Finished the 10.1.1 port. Technically it is 10.1.2, because I added the 10 lines to give CPU load, which they don't have. ... TOSv10.1.1

I have all but finished the AArch64 version. I just have an issue with the stack switching: I need to elevate to EL2 to do it, so that will happen in the next few days. There are a couple of FreeRTOS ports that also let you add the FPU registers to the task contexts, so they are saved and restored on a switch. I will add that feature to the port as well, so you can have hard FPU on in tasks.
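The actual ten lines are not shown in this thread. One common way to get a CPU load figure from the tick interrupt is to count how many ticks in a fixed window found the idle task running. The sketch below does that; the `CpuLoad` struct, its field names, the window length and `cpuLoadOnTick` are all assumptions and may differ from what the port really does.

```c
#include <assert.h>
#include <stddef.h>

#define LOAD_WINDOW_TICKS 100u    /* hypothetical measurement window */

typedef struct {
    unsigned percentLoad;         /* result of the last completed window, 0..100 */
    unsigned idleTicks;           /* ticks in the current window spent in the idle task */
    unsigned windowTicks;         /* ticks elapsed in the current window */
} CpuLoad;

/* Call once per tick; idleTaskRunning is nonzero when the tick
 * interrupted the idle task. */
void cpuLoadOnTick(CpuLoad *c, int idleTaskRunning)
{
    assert(c != NULL);
    if (idleTaskRunning)
        c->idleTicks++;
    if (++c->windowTicks >= LOAD_WINDOW_TICKS) {
        c->percentLoad = 100u - (c->idleTicks * 100u) / LOAD_WINDOW_TICKS;
        c->idleTicks = 0u;
        c->windowTicks = 0u;
    }
}
```

In a multicore build there would be one such counter set per core, in the same way as the rest of the per-core data.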

I have a funny multicore one which will be up next :-)

Re: LdB's FreeRTOS example

Posted: Tue Dec 04, 2018 5:20 pm
by LdB
I had a bit of time over the weekend to finish the Pi3 AArch64 version of FreeRTOS 10.1.1 ... TOSv10.1.1

It should now work on any model Pi in 32-bit, and on either Pi3 model in 32-bit or 64-bit.