
This question comes mainly from curiosity, and I'm not quite sure how to phrase it best, especially in a title. But I'm wondering: if you have one thread writing to a variable of an essentially primitive type and another thread reading it at the same time, is there any likelihood of the read happening while the variable is only half written, causing either weird values or undefined behavior?

Take something like an 8-bit value being changed from 00010101 to 11101000.

I'm imagining that if, say, only 4 of the bits have been written by the time we read it, the result could be something like

11100101

To play around, I made this small Rust sample. It ran without producing garbage, printing first a bunch of lines saying "String = Hello!" and then "String = Hi!", without any weirdness or issues. I had half expected something like "String = #æé¼¨A" or a segfault.

use std::thread::{self, JoinHandle, sleep};
use std::time::Duration;

const HELLO: &str = "Hello!";
const HI: &str = "Hi!";

// Wrapper that (unsoundly) promises the compiler this type is safe to share
// between threads, so it can live in a static that both threads touch.
struct ExemptSyncStringSlice<'a>(&'a str);

unsafe impl Sync for ExemptSyncStringSlice<'_> {}

fn print_ptr(pointer: *const ExemptSyncStringSlice) {
    for _ in 1..500 {
        unsafe {
            println!("String = {}", (*pointer).0);
        }
    }
}

fn main() {
    static mut DESYNC_POINTER: ExemptSyncStringSlice = ExemptSyncStringSlice(HELLO);

    let join_handle: JoinHandle<()> = thread::spawn(|| {
        print_ptr(&raw const DESYNC_POINTER);
    });

    sleep(Duration::from_millis(1));

    // Swap the string while the other thread is (probably) still reading it.
    unsafe {
        DESYNC_POINTER.0 = HI;
    }

    join_handle.join().unwrap();
}
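
For reference, here's roughly what I would write normally, with a Mutex so the update can't be observed half done (so my question is really about what happens when you skip that). Just a sketch, I haven't thought hard about whether a Mutex is overkill here:

use std::sync::Mutex;
use std::thread;
use std::time::Duration;

const HELLO: &str = "Hello!";
const HI: &str = "Hi!";

// A static Mutex works because Mutex::new is const.
// Readers lock before reading, so they can never see a half-updated value.
static MESSAGE: Mutex<&str> = Mutex::new(HELLO);

fn main() {
    let handle = thread::spawn(|| {
        for _ in 1..500 {
            println!("String = {}", *MESSAGE.lock().unwrap());
        }
    });

    thread::sleep(Duration::from_millis(1));
    *MESSAGE.lock().unwrap() = HI;

    handle.join().unwrap();
}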
[–] e0qdk@reddthat.com 7 points 1 week ago (1 children)

In general, yes, it's possible to end up with half-written variables -- mutexes (and/or other synchronization primitives) are used to avoid that kind of problem.

Whether you can encounter it in practice depends on the specific programming language, CPU, compiler, and actual instructions involved. Some operations that seem like they should be atomic from the perspective of a high level language are actually implemented as multiple machine code instructions and the thread could be interrupted between them, causing problems, unless steps are deliberately taken to avoid issues with concurrency.
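
A rough sketch of what I mean, in the language from your post (I barely write Rust, so treat this as illustrative rather than definitive): even when every individual access is atomic, an increment done as a separate load and store can be interrupted in the middle and lose updates, while a single read-modify-write operation can't.

use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

static NAIVE: AtomicUsize = AtomicUsize::new(0);
static ATOMIC: AtomicUsize = AtomicUsize::new(0);

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..100_000 {
                    // Load, then store: another thread can sneak in between the
                    // two steps, so some increments get overwritten and lost --
                    // the same failure mode as a plain "x += 1" that compiles
                    // to separate load/add/store instructions.
                    NAIVE.store(NAIVE.load(Ordering::Relaxed) + 1, Ordering::Relaxed);

                    // One indivisible read-modify-write: never loses an increment.
                    ATOMIC.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // ATOMIC ends up at exactly 400000; NAIVE usually comes up short.
    println!("naive = {}, atomic = {}",
             NAIVE.load(Ordering::Relaxed),
             ATOMIC.load(Ordering::Relaxed));
}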

I have minimal experience with Rust, so I'm not sure how bad the footguns you can run into with unsafe are there specifically, but you can definitely blow your leg off in C/C++...

[–] Killercat103@slrpnk.net 2 points 1 week ago (1 children)

Hm. Good to know. As far as I know, Rust's unsafe is kind of like sudo, where more tools are granted to you with fewer checks and balances (those checks being the borrow checker). So I think unsafe is more like saying "I don't want the extra protection against bad memory use" (like, say, using memory you already freed). Now, I'm not familiar with what makes a variable atomic or not, but if a variable can be read or written with a single instruction, I'm interpreting that as making it safe?

[–] e0qdk@reddthat.com 4 points 1 week ago

Whether something's atomic or not depends on the language you use -- and if the language was vague about it (like old C) then also how the CPU works.

At the CPU instruction level, there are other factors like how an instruction interacts with memory. Go look up CMPXCHG (from x86) for example, if you want to go down the rabbit hole. There's a StackOverflow answer here that you might find interesting about using that instruction in combination with the LOCK prefix.
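
I've barely touched Rust, but I believe the portable wrapper around that kind of instruction there (and in C++'s std::atomic) is compare_exchange; very rough sketch, untested by me:

use std::sync::atomic::{AtomicU32, Ordering};

// Only writes `new` if the value is still `expected`, as one indivisible step.
fn try_claim(slot: &AtomicU32, expected: u32, new: u32) -> bool {
    slot.compare_exchange(expected, new, Ordering::AcqRel, Ordering::Acquire)
        .is_ok()
}

fn main() {
    let slot = AtomicU32::new(0);
    assert!(try_claim(&slot, 0, 42));  // succeeds: slot really was 0
    assert!(!try_claim(&slot, 0, 7));  // fails: the slot already changed to 42
    println!("slot = {}", slot.load(Ordering::Acquire));
}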

At the language level, there are usually either guarantees (or a lack of guarantees...) about what is safe to do. C++11 (and later) have std::atomic for defining variables that are accessible from multiple threads safely without manually using a mutex, for example. You generally cannot assume that writing to a variable will be safe between threads otherwise (even if you think the operation should compile to a single CPU instruction) without using a supported concurrency mechanism that the compiler knows how to deal with. Consider the case where the compiler chooses to store a value in a register during a loop as an optimization and only write the value back to RAM at the end of the loop -- while that value is changed in RAM by another thread! If you use an atomic variable or protect access with a mutex, then the program will behave coherently. If not, you can end up with inconsistent state between threads and then who the fuck knows what will happen. This SA answer might also be interesting to you.
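
The register situation looks something like the sketch below (Rust to match your post, but std::atomic<bool> in C++ is essentially the same idea): with a plain non-atomic flag the compiler would be allowed to read it once, keep it in a register, and spin forever; with an atomic it has to actually re-check.

use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

static STOP: AtomicBool = AtomicBool::new(false);

fn main() {
    let worker = thread::spawn(|| {
        let mut iterations: u64 = 0;
        // The load can't be hoisted out of the loop; every iteration re-reads
        // the flag, so the store from the main thread is eventually observed.
        while !STOP.load(Ordering::Acquire) {
            iterations += 1;
        }
        println!("stopped after {iterations} iterations");
    });

    thread::sleep(Duration::from_millis(10));
    STOP.store(true, Ordering::Release);
    worker.join().unwrap();
}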

In Python (specifically the cpython implementation), there's the Global Interpreter Lock (GIL). Some things are safe to do there in that language implementation that aren't safe to do in C because of the GIL. (You still generally shouldn't depend on them though since people are trying to remove the GIL to get better performance out of Python...) Basically, cpython guarantees that only one thread can run Python byte code at a time so interactions are serialized at that level even if the OS swaps threads in the middle of cpython computing the behavior of an instruction.

Hope that helps a bit.