GNU Arm Embedded Toolchain

Byte by byte access to 32bit peripheral registers causes issues

Bug #1738730 reported by Murat Ursavas on 2017-12-18

This bug affects 3 people

Affects		Status	Importance	Assigned to	Milestone
	GNU Arm Embedded Toolchain	New	Undecided	Unassigned

Bug Description

Hi,

Normally I am working on our application project with ARM-GCC 4.9. While adding a bootloader, I switched to 5.4 and things started to go out of track. Trace narrowed down the problem to an assignment into a peripheral register.

I've seen that the 4.9 version creates 6 lines of assembly but the same simple statement was creating 15 lines of assembly with 5.4. Then I tried the most recent one, 6-2017-q2-update. The result was the same. Although version 6 creates substantially smaller code compared to 5.4, the particular peripheral was not working at all.

Here's the working output:

101 NVM_SPI->CTRL = USART_CTRL_SYNC | USART_CTRL_MSBF;
00023fd6: ldr r2,[pc,#0x74] ; 0x24048
00023fd8: ldr r3,[r2]
00023fda: movs r3,#0x0
00023fdc: orr r3,r3,#0x400
00023fe0: orr r3,r3,#0x1
00023fe4: str r3,[r2]

And here's the problematic output (both 5 and 6 branch):

101 NVM_SPI->CTRL = USART_CTRL_SYNC | USART_CTRL_MSBF;
20001682: ldr r3,[pc,#0xf8] ; 0x20001778
20001684: ldrb r2,[r3]
20001686: movs r2,#0x0
20001688: orr r2,r2,#0x1
2000168c: strb r2,[r3]
2000168e: ldrb r2,[r3,#0x1]
20001690: movs r2,#0x0
20001692: orr r2,r2,#0x4
20001696: strb r2,[r3,#0x1]
20001698: ldrb r2,[r3,#0x2]
2000169a: movs r2,#0x0
2000169c: strb r2,[r3,#0x2]
2000169e: ldrb r2,[r3,#0x3]
200016a0: movs r2,#0x0
200016a2: strb r2,[r3,#0x3]

****
NVM_SPI: (void*) (0x4000C400UL)
CTRL: volatile uint32_t at offset 0
USART_CTRL_SYNC: (0x1UL << 0)
USART_CTRL_MSBF: (0x1UL << 10)
****

There is no optimization (-O0). Things could get better with other optimization options due to assignment simplification, but I haven't tried it, yet.

Regards,

Additional Info:

Compilers: 5.4.3 and 6-2017-q2-update
OS: Ubuntu 17.10

P.S: Second assembly could be seen to work on RAM. The result is the same when set to work on flash.

Revision history for this message

john (jkovach) wrote on 2017-12-18:

Looks like the compiler is accessing NVM_SPI->CTRL one byte at a time. There must be something in the declaration of NVM_SPI->CTRL and maybe in the compiler command line options that make it do it.

Revision history for this message

Murat Ursavas (murat-ursavas) wrote on 2017-12-18:

Could be. I'm switching back and forth between 4.9 and 5.4, using the same options. The thing is, why the newer versions try to access a 32bit variable byte by byte on a 32bit CPU? Let's say they have their own causes. So why is the functionality broken when accessing one byte at a time?

Revision history for this message

john (jkovach) wrote on 2017-12-18:

Changes in major compiler version number are known to change the way some pragmas, attributes and so on are interpreted. You should examine your code and see if anything in the changelog from gcc 4 to gcc 5 affects it.

Revision history for this message

Murat Ursavas (murat-ursavas) wrote on 2017-12-18:

Couldn't find something definite. Just have seen this one:

****************************

2015-12-17 Thomas Preud'homme <email address hidden>

  * config/arm/arm.h (TARGET_USE_MOVT): Check MOVT/MOVW availability
  with TARGET_HAVE_MOVT.
  (TARGET_HAVE_MOVT): Define.
  * config/arm/arm.c (const_ok_for_op): Check MOVT/MOVW
  availability with TARGET_HAVE_MOVT.
  * config/arm/arm.md (arm_movt): Use TARGET_HAVE_MOVT to check movt
  availability.
  (addsi splitter): Use TARGET_USE_MOVT to check whether to use
  movt + movw.
  (symbol_refs movsi splitter): Remove TARGET_32BIT check.
  (arm_movtas_ze): Use TARGET_HAVE_MOVT to check movt availability.
  * config/arm/constraints.md (define_constraint "j"): Use
  TARGET_HAVE_MOVT to check movt availability.
***********************

Maybe removing TARGET_32BIT could have an effect on this issue. It's a long shot, but I don't have further ideas based on this document:

https://gcc.gnu.org/svn/gcc/branches/ARM/embedded-5-branch/gcc/ChangeLog.arm

Revision history for this message

john (jkovach) wrote on 2017-12-18:

Maybe you should show us the declarations of the structure and the variable in your code. Compiler options too.

Revision history for this message

Murat Ursavas (murat-ursavas) wrote on 2017-12-18:

Sure,

This is the most obvious failing line:

NVM_SPI->CTRL = USART_CTRL_SYNC | USART_CTRL_MSBF;

NVM_SPI is defined in a header file like this:

#define NVM_SPI USART1

USART1 is defined in a vendor library like this:

#define USART1 ((USART_TypeDef *) USART1_BASE) /**< USART1 base pointer */

USART1_BASE is defined in another file like this:

#define USART1_BASE (0x4000C400UL) /**< USART1 base address */

Here's the struct definition for USART_TypeDef:

typedef struct
{
__IOM uint32_t CTRL; __IOM uint32_t FRAME; __IOM uint32_t TRIGCTRL; __IOM uint32_t CMD; __IM uint32_t STATUS; __IOM uint32_t CLKDIV; __IM uint32_t RXDATAX; __IM uint32_t RXDATA; __IM uint32_t __IM uint32_t RXDOUBLE; __IM uint32_t RXDATAXP; __IM uint32_t __IOM uint32_t TXDATAX; __IOM uint32_t TXDATA; __IOM uint32_t TXDOUBLEX; __IOM uint32_t TXDOUBLE; __IM uint32_t IF; __IOM uint32_t IFS; __IOM uint32_t IFC; __IOM uint32_t IEN; __IOM uint32_t IRCTRL; __IOM uint32_t ROUTE; __IOM uint32_t INPUT; __IOM uint32_t I2SCTRL; } USART_TypeDef; /** @} */ /**< Control Register */
/**< USART Frame Format Register */
/**< USART Trigger Control register */
/**< Command Register */
/**< USART Status Register */
/**< Clock Control Register */
/**< RX Buffer Data Extended Register */
/**< RX Buffer Data Register */
RXDOUBLEX; /**< RX Buffer Double Data Extended Register */
/**< RX FIFO Double Data Register */
/**< RX Buffer Data Extended Peek Register */
RXDOUBLEXP; /**< RX Buffer Double Data Extended Peek Register */
/**< TX Buffer Data Extended Register */
/**< TX Buffer Data Register */
/**< TX Buffer Double Data Extended Register */
/**< TX Buffer Double Data Register */
/**< Interrupt Flag Register */
/**< Interrupt Flag Set Register */
/**< Interrupt Flag Clear Register */
/**< Interrupt Enable Register */
/**< IrDA Control Register */
/**< I/O Routing Register */
/**< USART Input Register */
/**< I2S Control Register */

And __IOM is defined as:

#define __IOM volatile

The C++ compiler options are:

-g3 -gdwarf-2 -mcpu=cortex-m3 -mthumb -std=c++1y '-DDEBUG=1' -O0 -pedantic -Wall -Wextra -c -fmessage-length=0 -fno-rtti -fno-exceptions -mno-sched-prolog -fno-builtin -fpack-struct -fshort-enums -ffunction-sections -fdata-sections

Note:I've removed folder inclusions and some simple preprocessor definitions to hide some sensitive data about our project. I'm sure they are not related.

Do you need anything else?

Sure,

This is the most obvious failing line:

NVM_SPI->CTRL = USART_CTRL_SYNC | USART_CTRL_MSBF;

NVM_SPI is defined in a header file like this:

#define NVM_SPI                 USART1

USART1 is defined in a vendor library like this:

#define USART1       ((USART_TypeDef *) USART1_BASE)        /**< USART1 base pointer */

USART1_BASE is defined in another file like this:

#define USART1_BASE       (0x4000C400UL) /**< USART1 base address  */

Here's the struct definition for USART_TypeDef:

typedef struct
{
  __IOM uint32_t CTRL;       /**< Control Register  */
  __IOM uint32_t FRAME;      /**< USART Frame Format Register  */
  __IOM uint32_t TRIGCTRL;   /**< USART Trigger Control register  */
  __IOM uint32_t CMD;        /**< Command Register  */
  __IM uint32_t  STATUS;     /**< USART Status Register  */
  __IOM uint32_t CLKDIV;     /**< Clock Control Register  */
  __IM uint32_t  RXDATAX;    /**< RX Buffer Data Extended Register  */
  __IM uint32_t  RXDATA;     /**< RX Buffer Data Register  */
  __IM uint32_t  RXDOUBLEX;  /**< RX Buffer Double Data Extended Register  */
  __IM uint32_t  RXDOUBLE;   /**< RX FIFO Double Data Register  */
  __IM uint32_t  RXDATAXP;   /**< RX Buffer Data Extended Peek Register  */
  __IM uint32_t  RXDOUBLEXP; /**< RX Buffer Double Data Extended Peek Register  */
  __IOM uint32_t TXDATAX;    /**< TX Buffer Data Extended Register  */
  __IOM uint32_t TXDATA;     /**< TX Buffer Data Register  */
  __IOM uint32_t TXDOUBLEX;  /**< TX Buffer Double Data Extended Register  */
  __IOM uint32_t TXDOUBLE;   /**< TX Buffer Double Data Register  */
  __IM uint32_t  IF;         /**< Interrupt Flag Register  */
  __IOM uint32_t IFS;        /**< Interrupt Flag Set Register  */
  __IOM uint32_t IFC;        /**< Interrupt Flag Clear Register  */
  __IOM uint32_t IEN;        /**< Interrupt Enable Register  */
  __IOM uint32_t IRCTRL;     /**< IrDA Control Register  */
  __IOM uint32_t ROUTE;      /**< I/O Routing Register  */
  __IOM uint32_t INPUT;      /**< USART Input Register  */
  __IOM uint32_t I2SCTRL;    /**< I2S Control Register  */
} USART_TypeDef;             /** @} */

And __IOM is defined as:

#define     __IOM    volatile

The C++ compiler options are:

Note:I've removed folder inclusions and some simple preprocessor definitions to hide some sensitive data about our project. I'm sure they are not related.

Do you need anything else?

Revision history for this message

john (jkovach) wrote on 2017-12-18:

-fpack-struct must be responsible for this. Why would you want this option anyway? If you really insist, you might want to have a look at the option "-munaligned-access" as well. The manual says " If unaligned access is not enabled then words in packed data structures are accessed a byte at a time."

Revision history for this message

Murat Ursavas (murat-ursavas) wrote on 2017-12-18:

John,

You're right. -fpack-struct option definitely changes the behavior for 6.3.1.

Here's enabled and disabled assembly output (got with 6 branch):

Enabled:
ldr r3, .L11+4
ldrb r2, [r3]
movs r2, #0
orr r2, r2, #1
strb r2, [r3]
ldrb r2, [r3, #1]
movs r2, #0
orr r2, r2, #4
strb r2, [r3, #1]
ldrb r2, [r3, #2]
movs r2, #0
strb r2, [r3, #2]
ldrb r2, [r3, #3]
movs r2, #0
strb r2, [r3, #3]

Disabled:

ldr r3, .L11+4
movw r2, #1025
str r2, [r3]

Looks like unaligned access is enabled for v7-m architecture by default.

One thing remains. OK, it has to access byte by byte to the packed struct members, but why with wrong values? This smells like a bug, isn't it?

[ After some short time ]

By the way, I've looked at the code again and looks like it has to set the register with correct values even with -fpacked-struct option. So why I'm seeing different values in the debug session? Let's say GDB reports wrong values, why the peripheral is not working as expected.

Note: I need packed structs for serializing communication and storage.

Revision history for this message

Murat Ursavas (murat-ursavas) wrote on 2017-12-19:

And additionally, why is still accessing words one byte at a time even unaligned access is enabled? This behavior is correct on 4.9 but not correct on 5 and 6 branch.

Revision history for this message

Leo Havmøller (leh-p) wrote on 2017-12-19:

#10

New tools can reveal old problems, in this case the use of -fpacked-struct on the entire project.
> Note: I need packed structs for serializing communication and storage.
We all do. Use #pragma pack(push, 1) and #pragma pack(pop) in the affected header files.

Revision history for this message

Murat Ursavas (murat-ursavas) wrote on 2017-12-19:

#11

Hi Leo, yes I was going to revise the project and put pragma's for the necessary structs. But this still does not clarify what bothers me, byte by byte method is used even with enabled unaligned access. What's worse is, this creates issues on the real hardware, with an unknown cause, not just making the final image longer.

Clearly there's a regression on 5+ but I don't know the internals enough to find the root cause/commit.

Revision history for this message

john (jkovach) wrote on 2017-12-19:

#12

I think you should try to construct a minimal example demonstrating the problem. Who knows, maybe you'll find something in your setup that causes this.

Revision history for this message

Murat Ursavas (murat-ursavas) wrote on 2017-12-19:

#13

I guess the problem lies under the -fpack-struct option. No matter unaligned access state is, the compiler switches one byte at a time mode with seeing -fpack-struct option.

I tried this on a small file with the options I said before. Therefore my setup should not be the cause. I will try to build a really small reproducible example, tomorrow.

Revision history for this message

Murat Ursavas (murat-ursavas) wrote on 2017-12-20:

#14

Minimum test case Edit (401 bytes, text/x-c++src)

I've found the holy grail. Problem is happening if you are using a pointer to a struct. Normally, while accessing to that struct members, nothing is wrong, it accesses word by word. But if you declare a pointer to that struct and do your business with it, problem arises and GCC starts using byte by byte method even while unaligned access is enabled.

Here's worse part: It doesn't matter whether you use -fpack-struct flag or #pragma pack(push, 1). You'll get the same result.

Please find the simplest example as attachment. And use this line (don't forget to add your toolchains path in the beginning)

arm-none-eabi-g++ -gdwarf-2 -mcpu=cortex-m3 -mthumb -std=c++1y '-DDEBUG=1' -O0 -pedantic -Wall -Wextra -c -fmessage-length=0 -fno-rtti -fno-exceptions -mno-sched-prolog -fno-builtin -fshort-enums -ffunction-sections -fdata-sections -MMD -MP -MF"main.d" -MT"main.o" -S -o "main.s" "main.cpp"

Here's the output of the important part:

.LCFI2:
.cfi_def_cfa_register 7
.loc 1 17 0
ldr r2, .L3
mov r3, r7
str r3, [r2]
.loc 1 19 0
ldr r3, .L3
ldr r3, [r3]
ldrb r2, [r3]
movs r2, #0
orr r2, r2, #1
strb r2, [r3]
ldrb r2, [r3, #1]
movs r2, #0
orr r2, r2, #4
strb r2, [r3, #1]
ldrb r2, [r3, #2]
movs r2, #0
strb r2, [r3, #2]
ldrb r2, [r3, #3]
movs r2, #0
strb r2, [r3, #3]

I would be glad if anyone can pinpoint the root cause. Things should be gone bad between 4.9 and 5.4.

Regards,

Revision history for this message

Murat Ursavas (murat-ursavas) wrote on 2017-12-20:

#15

Update: Problem still exists on the fresh release, 7-2017-q4-major.

Revision history for this message

Murat Ursavas (murat-ursavas) wrote on 2017-12-20:

#16

Update: The problem also exists on the first official 5 branch release, 5.2 (5-2015-q4-major). Now I'm trying to isolate the change from 4.9 to 5.2, with my non-existing GCC infrastructure knowledge :) I feel like I'm entering adark cave with full of bats :)

Revision history for this message

john (jkovach) wrote on 2017-12-20:

#17

Just tested with the latest release. GCC: (GNU Tools for Arm Embedded Processors 7-2017-q4-major) 7.2.1 20170904 (release) [ARM/embedded-7-branch revision 255204

gcc -mcpu=cortex-m3 -mthumb -O0 -munaligned-access -S test.c -o test.s

test.c:

#pragma pack(1)
struct s { short i; } volatile *p;
void f(void) { p->i = 0; }

output:

ldr r3, .L2
ldr r3, [r3]
ldrb r2, [r3]
movs r2, #0
strb r2, [r3]
ldrb r2, [r3, #1]
movs r2, #0
strb r2, [r3, #1]

There two problems:
1) Byte-wise access. One would expect "-munaligned-access" to help avoid this.
2) The variable is read before it is written for no apparent reason. This is in violation of the volatile semantics, as far as I can see. And this problem goes away when I remove "volatile" from source.

Revision history for this message

Murat Ursavas (murat-ursavas) wrote on 2017-12-20:

#18

Hi John,

Thanks for your interest and effort. I appreciate it. By the way, I've checked the output with 5.2 and the result is the same.

And here's the 4.9 output:

ldr r3, .L2
ldr r3, [r3]
ldrh r2, [r3] @ movhi
movs r2, #0
strh r2, [r3] @ movhi
mov sp, r7

Another boring thing, there is a redundant nop instruction before popping sp, in 5,6 and 7 versions. I don't know the necessity.

For me, the most suspicious thing is the addition of ARM-v8 architec. This is the major change for the 5. Even though the change is announced in 4.7, I've seen significant changes in 5.2 comparing to 4.9. Maybe mixing 64bit logic with 32bit logic could have an effect on this issue. Who knows? (Definitely not me :)

Regards,

Revision history for this message

john (jkovach) wrote on 2017-12-20:

#19

My guess is, the nop is there to make sure the literal pool is word-aligned.
As for the bug, our hope is that the guys from GNU ARM Embedded pick it up and report upstream. I don't think there is a point for you to try and pinpoint the exact commit that introduced it.

Revision history for this message

Murat Ursavas (murat-ursavas) wrote on 2017-12-20:

#20

Today, I really tried, but fell short. Meld is still open for the source directories, 4.9 and 5.2.

I guess the time is not right. Maybe in the new year. Will be waiting for the team.

Thanks again.

Revision history for this message

Murat Ursavas (murat-ursavas) wrote on 2017-12-20:

#21

By the way, could you click on "affects me" link to heat the issue up? This way the team may give it a better priority?

Revision history for this message

Murat Ursavas (murat-ursavas) wrote on 2018-10-01:

#22

Hi,

I've opened a bug record on GCC Bugzilla (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87373) and was able to trace what is going on under the hood.

You can find all the details on that bug record.

I'm not sure about the GCC teams willingness to fix this but there is a issue with packed structs anyway. Maybe that could be fixed by the Linaro team via reverting back to 4.9 style.

summary:	- Simple OR'ed assignment breaks the code silently + Byte by byte access to peripheral registers causes issues
summary:	- Byte by byte access to peripheral registers causes issues + Byte by byte access to peripheral 32bit registers causes issues
summary:	- Byte by byte access to peripheral 32bit registers causes issues + Byte by byte access to 32bit peripheral registers causes issues

Revision history for this message

Thomas Preud'homme (thomas-preudhomme) wrote on 2018-10-05:

#23

Hi Murat,

I've read the thread and I agree with the final conclusion from Richard Earnshaw: doing byte by byte is the safest option from GCC's perspective when doing a volatile access as you cannot guarantee that unaligned access do. I think the source code should be changed to explicitely mark the first field as being aligned, or to exclude the volatile field from being packed.

Best regards.

Revision history for this message

Murat Ursavas (murat-ursavas) wrote on 2018-10-05:

#24

Hi Thomas,

I've applied the workaround and it is working as designed. But the library developers were unaware from this situation and I've informed them to track this internally.

My conclusion is; never use project wide pack struct option, because you'd never know when it would hit you.

Regards,

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Bug attachments

Minimum test case Edit

Add attachment

Remote bug watches

gcc-bugzilla #87373
[RESOLVED WONTFIX] Edit

Bug watches keep track of this bug in other bug trackers.