matthiaserb.dev

Returning Strings

Memory Management in C: How to Return Strings from Functions?

Working with strings in C requires a solid grasp of memory management and how data is passed into and out of functions. In this article, we explore different methods for returning strings, their implications, and best practices.

Comparing Different String Return Approaches

Let’s examine three different ways to return a string from a function:

// Version 1: Returning a pointer to a stack-allocated array
char *foo() {
    char ptr[] = "test";
    return ptr;
}

// Version 2: Returning a pointer to a string literal
char *foo() {
    char *ptr = "test";
    return ptr;
}

// Version 3: Directly returning a string literal
char *foo() {
    return "test";
}

Version 1: Stack-Allocated Array

This first version is incorrect and dangerous. Despite its name, ptr is not a pointer but a local array allocated on the stack and initialized with a copy of “test”. When foo returns, its stack frame is released and the array's lifetime ends. The returned pointer now points to memory that is no longer valid: a classic “dangling pointer”. GCC and Clang both warn about this by default.

Attempting to use this pointer in the calling function constitutes undefined behavior. While it might seem to work if the memory isn’t immediately overwritten, this is unreliable, unsafe and must be avoided.

Version 2 & 3: String Literals

Versions 2 and 3 are functionally identical. In both cases, the function returns a pointer to a string literal. String literals in C exist for the entire lifetime of the process.

The third version is simply a more concise form of the second: the compiler performs the array-to-pointer decay automatically, without the need for an intermediate variable.

These approaches are safe from a memory management perspective because the string literal persists throughout the program’s lifetime. But there’s a critical limitation: string literals must not be modified. Attempting to change the string in the calling function (e.g., s[0] = 'b') is undefined behavior; in the best case it results in a segmentation fault that crashes the program. More about that later.
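A common safeguard, not used in the versions above, is to declare the return type as const char *. In C the characters of a string literal have type char rather than const char, which is why char *foo() compiles without complaint even though writing through the returned pointer is undefined. Adding const turns such an accidental write into a compile-time error. A minimal sketch (foo_const and print_foo_const are hypothetical names):

```c
#include <stdio.h>

/* Hypothetical name: same as Version 3, but with a const return type
   that documents and enforces "do not write through this pointer". */
const char *foo_const(void) {
    return "test";   /* static storage duration, never to be modified */
}

/* Hypothetical caller showing what the const qualifier buys us. */
void print_foo_const(void) {
    const char *s = foo_const();
    printf("%s\n", s);   /* prints "test" */
    /* s[0] = 'b';  would now be rejected at compile time,
       because s points to const char. */
}
```

Callers that really need a modifiable string still have to copy it into memory they own.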

How String Literals Work in C

According to the C11 standard (Section 6.4.5), string literals are stored as arrays of characters with static storage duration:

… an array of static storage duration and length just sufficient to contain the sequence …

When a string literal appears in an expression context (like a return statement), it undergoes array-to-pointer decay (Section 6.3.2.1), implicitly converting it from an “array of type” to a pointer to its first element:

… an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue

This is why return "test"; returns a pointer to the character 't', the first element of the static array containing “test”. In other words, it is shorthand for:

    char *ptr = "test";
    return ptr;
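The full rule in Section 6.3.2.1 also lists exceptions, among them the operands of sizeof and the unary & operator, and a string literal used to initialize an array. That last exception is what separates Version 1 from Version 2: char ptr[] = "test" copies the characters into a new local array, whereas char *ptr = "test" only stores the address of the static one. A small sketch, with hypothetical function names, makes the difference measurable through sizeof:

```c
#include <stddef.h>

/* Version 1 style: the literal initializes an array, so the characters
   are copied into a new, writable local array of 5 chars. */
size_t array_size(void) {
    char arr[] = "test";
    arr[0] = 'b';            /* fine: arr is our own copy, now "best" */
    return sizeof arr;       /* no decay under sizeof: 5 bytes */
}

/* Version 2 style: the literal decays to a pointer to its first element. */
size_t pointer_size(void) {
    char *p = "test";
    return sizeof p;         /* size of a pointer, e.g. 8 on x86-64 */
}

/* sizeof applied directly to the literal: no decay, 4 chars + '\0'. */
size_t literal_size(void) {
    return sizeof "test";
}
```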

Integer and Float Literals vs. String Literals

A bit off-topic, but unlike string literals, which decay to a pointer, integer and floating-point literals are returned directly by value:

int foo() {
    return 42;
}

float foo() {
    return 3.14f;  // f suffix makes this a float constant (3.14 alone is a double)
}

The actual number (42 or 3.14) is copied to the caller; no pointers are involved.
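Strictly speaking, C returns every value by value; when a function returns a string literal, what gets copied to the caller is the pointer, not the characters. Every call therefore hands back the address of the same static array. A minimal sketch (foo_literal and same_address are hypothetical names):

```c
/* Each call copies the same pointer value to the caller; the characters
   themselves stay in the single static array created for "test". */
char *foo_literal(void) {
    return "test";
}

int same_address(void) {
    char *a = foo_literal();
    char *b = foo_literal();
    return a == b;   /* 1: both point to the same static array */
}
```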

Best Practices for Returning Modifiable Strings

Back to returning strings from functions. Our two string-literal versions are both correct and work, and with optimizations enabled (tested with gcc -O2) they even compile to exactly the same machine code: the address of the literal is loaded into the return-value register (%rax on x86-64) before returning:

<+0>:    lea     0x2b99(%rip),%rax
<+7>:    ret

The issue is that in both cases any later attempt to modify the string in the calling function is undefined behavior and, on typical platforms, ends in a segmentation fault. The literal is stored in the binary's read-only data section (.rodata in ELF binaries, .rdata in Windows PE binaries), so it must be treated as immutable.

#include <stdio.h>

// Returning a pointer to a string literal
char *foo() {
    char *ptr = "test";
    return ptr;
}

// Directly returning a string literal, functionally identical
// (renamed so that both versions can live in one file)
char *foo_direct() {
    return "test";
}

int main(void) {
    char *s = foo();
    printf("%s\n", s); // Prints "test"
    s[0] = 'b'; // Undefined behavior: typically a segmentation fault!
    return 0;
}

So if that’s not the solution, and returning a character array created on the stack didn’t work either, the string has to live in writable memory that outlives the call to foo(). The usual answer is dynamic memory allocation on the heap; the two approaches below differ only in where that allocation happens:

Approach 1: Dynamic Allocation in the Called Function

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *foo() {
    char *ptr = malloc(5 * sizeof(char));  // Four characters + '\0'
    if (ptr == NULL) {
        return NULL;  // Handle allocation failure
    }
    strcpy(ptr, "test");  // Copy string into allocated memory
    return ptr;
}

int main() {
    char *s = foo();
    if (s != NULL) {
        printf("%s\n", s);  // Prints "test"
        s[0] = 'b';  // Safe to modify
        printf("%s\n", s);  // Prints "best"
        free(s);  // No memory leaks!
    }
    return 0;
}

This approach allocates memory on the heap, which remains valid after foo() returns and can be safely modified by the caller.

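The memory is allocated in the called function, but the caller takes ownership of it and is responsible for freeing it. Because allocation and deallocation happen in different places, this ownership rule has to be documented and respected at every call site, which can make the code harder to maintain.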
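The hard-coded malloc(5 * sizeof(char)) only works for this one string. A slightly more general sketch, assuming the function should return a modifiable copy of any input string, computes the size with strlen instead. The name copy_string is hypothetical; it behaves like strdup, which is available in POSIX and was added to the C standard library in C23.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated, modifiable copy of src.
   The caller owns the result and must free() it. Returns NULL on failure. */
char *copy_string(const char *src) {
    size_t len = strlen(src);
    char *copy = malloc(len + 1);        /* +1 for the terminating '\0' */
    if (copy == NULL) {
        return NULL;                     /* allocation failed */
    }
    memcpy(copy, src, len + 1);          /* copies the '\0' as well */
    return copy;
}
```

Used as char *s = copy_string("test");, the caller then modifies and frees s exactly as in main() above.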

Approach 2: Pre-allocating Memory in the Caller

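#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void foo(char *buffer, size_t size) {
    if (size == 0) {
        return;  // No room, not even for '\0'
    }
    strncpy(buffer, "test", size);
    buffer[size - 1] = '\0';  // Ensure null termination if truncated
}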

int main() {
    char *s = malloc(5 * sizeof(char));
    if (s != NULL) {
        foo(s, 5);
        printf("%s\n", s);  // Prints "test"
        s[0] = 'b';  // Safe to modify
        printf("%s\n", s);  // Prints "best"
        free(s);  // No memory leaks!
    }
    return 0;
}

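Here, the caller allocates the memory and passes a pointer to it, together with its size, to the function. foo() writes the requested string into the buffer but does not manage its allocation or deallocation. This approach keeps allocation and deallocation in one place, the caller, and the size parameter lets foo() avoid writing past the end of the buffer.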
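Because foo() only writes into whatever buffer it receives, the caller is not limited to the heap: a local array in the caller's own stack frame works too, since it stays alive for as long as the caller uses it. A sketch of that variant, keeping foo()'s contract and guarding against size == 0 (caller_uses_stack_buffer is a hypothetical name):

```c
#include <string.h>

/* Same contract as foo() in Approach 2: writes at most size bytes and
   always null-terminates, as long as size > 0. */
void foo(char *buffer, size_t size) {
    if (size == 0) {
        return;                          /* nothing can be written */
    }
    strncpy(buffer, "test", size);
    buffer[size - 1] = '\0';             /* terminate if truncated */
}

/* Hypothetical caller: the buffer lives in this function's stack frame,
   so it is valid here and needs no free(). */
int caller_uses_stack_buffer(void) {
    char s[5];
    foo(s, sizeof s);                    /* sizeof s == 5 */
    s[0] = 'b';
    return strcmp(s, "best") == 0;       /* 1 on success */
}
```

The only rule is the one from Version 1 turned around: the buffer must not be used after the function that declared it has returned.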

Choosing the Right Approach

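Both approaches have their place: allocating inside the called function is convenient when only that function knows how much memory the result needs, while caller-provided buffers keep ownership in a single place.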

The second approach is often considered more idiomatic in C, mirroring standard library functions such as strncpy and snprintf, which write into a buffer supplied by the caller.
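snprintf goes one step further: it also returns the length the complete result needs, and with a size of 0 it writes nothing and accepts a null buffer. That lets the caller query the size first and allocate exactly enough. A sketch of foo() following this convention (foo_snprintf and make_string are hypothetical names, not from the article):

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical snprintf-style variant: writes at most size bytes
   (including '\0') and returns the length of the full string,
   excluding '\0', or a negative value on error. */
int foo_snprintf(char *buffer, size_t size) {
    return snprintf(buffer, size, "%s", "test");
}

/* Caller: ask for the length first, then allocate exactly enough. */
char *make_string(void) {
    int needed = foo_snprintf(NULL, 0);  /* NULL is allowed when size is 0 */
    if (needed < 0) {
        return NULL;
    }
    char *s = malloc((size_t)needed + 1);
    if (s == NULL) {
        return NULL;
    }
    foo_snprintf(s, (size_t)needed + 1);
    return s;                            /* caller must free() */
}
```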

Conclusion

In C, seemingly trivial tasks like returning a string from a function require a thorough understanding of memory management. When returning strings from functions:

  1. Pointers to stack-allocated (local) arrays must never be returned
  2. String literals are safe to return and persist throughout the program’s lifetime but cannot be modified
  3. For modifiable strings, either:
    • Allocate memory in the function and have the caller free it, or
    • Have the caller allocate memory and pass a pointer to it to the function

The best approach depends on the specifics of the situation. No matter which method you choose, don’t forget to free()!