What does it mean for a language to be memory-safe?

Programming languages are often described by overlapping properties such as static typing or memory safety. Static typing is easy to explain: a variable is declared with a particular type at compile time, and that type is preserved and cannot change at runtime. If a variable was an integer at compile time, it must still be an integer at runtime. But what about memory safety? What does it mean to say that language X is memory-safe? In this post, let’s look at some examples and see how different languages take their own approach to memory safety.

Before going to a formal definition, imagine that we allocate memory for an integer array of size 4. Let’s say we use Java here, where an int takes 4 bytes, so the array’s elements essentially occupy a contiguous block of 4 * 4 = 16 bytes in memory. Because Java is memory-safe, you can perform any permitted operation inside this memory segment, but not anywhere else! You can read the first element and you can update the last one, but if you try to access a fifth element (index 4, which doesn’t exist in the array), you get an ArrayIndexOutOfBoundsException at runtime. The language stops you from accessing a location that you never allocated memory for.

Let’s look at another example. We’re going to calculate the NCD (normalized compression distance) of a pair of strings in Java:

float computeNCD(String a, String b) {
   Integer sizeA = compress(a);
   Integer sizeB = compress(b);
   Integer sizeBoth = compress(a + b);
   return (sizeBoth - Math.min(sizeA, sizeB)) / 
           (float)Math.max(sizeA, sizeB);
}

Let’s assume we call this computeNCD function 10,000 times in a short period with different inputs. One natural question is whether the memory used by its local variables stays allocated after each call has finished. In the snippet above it does not: the local variables disappear when the method returns, and Java’s garbage collector reclaims the objects they referred to once nothing references them anymore. Conversely, memory is never deallocated while you can still reach it, so you cannot reference memory that has already been freed. That is essentially the second property of a memory-safe language.
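
The snippet above leaves compress undefined. As a minimal sketch, we could assume it returns the size in bytes of the DEFLATE-compressed input, using the JDK’s java.util.zip.Deflater. The class name NcdSketch, the helper body, and the sample strings are all assumptions made for illustration, not the original implementation:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class NcdSketch {
    // Hypothetical helper: size in bytes of the DEFLATE-compressed input.
    static int compress(String s) {
        byte[] input = s.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[4096];
            int total = 0;
            // Drain the compressor; only the number of bytes produced matters here.
            while (!deflater.finished()) {
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end(); // releases the compressor's native resources
        }
    }

    static float computeNCD(String a, String b) {
        int sizeA = compress(a);
        int sizeB = compress(b);
        int sizeBoth = compress(a + b);
        return (sizeBoth - Math.min(sizeA, sizeB)) / (float) Math.max(sizeA, sizeB);
    }

    public static void main(String[] args) {
        float last = 0f;
        // Each call creates temporary arrays and strings; none of them is freed by hand.
        for (int i = 0; i < 10_000; i++) {
            last = computeNCD("the quick brown fox " + i, "jumps over the lazy dog " + i);
        }
        System.out.println("Last NCD value: " + last);
    }
}
```

The loop never frees the byte arrays or the concatenated strings explicitly. They become unreachable when each call returns, and the garbage collector is free to reclaim them.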

In a memory-unsafe language such as C, you have to allocate and deallocate memory yourself. If you dereference a pointer after the memory it points to has been freed, you violate one of the memory safety properties. The outcome is not deterministic (it is undefined behavior in C): the program might crash immediately, return garbage data, appear to work correctly, or corrupt other data. Here is an example where we rewrote the NCD formula in C and introduced a memory-safety issue:

float computeNCD(char* a, char* b) {
  int* sizeA = (int*)malloc(sizeof(int));
  int* sizeB = (int*)malloc(sizeof(int));

  *sizeA = compress(a);
  *sizeB = compress(b);
  
  char* combined = (char*)malloc(strlen(a) + strlen(b) + 1);
  strcpy(combined, a);
  strcat(combined, b);
  
  int* sizeBoth = (int*)malloc(sizeof(int));
  *sizeBoth = compress(combined);
   
  float result = (*sizeBoth - MIN(*sizeA, *sizeB)) / 
                 (float)MAX(*sizeA, *sizeB);

  free(sizeA);
  free(sizeB);
  free(combined);
  free(sizeBoth);
  
  // USE AFTER FREE - accessing freed memory
  printf("Compressed value was: %d\n", *sizeBoth);

  return result;
}

We freed the memory allocated for sizeBoth and then tried to access that address again. As discussed above, this can lead to different unwanted behaviors: the program might crash, or the location might now hold a different piece of data. This kind of access is called “use-after-free”. Attackers can exploit use-after-free vulnerabilities to execute arbitrary code on our system.

From a formal perspective, a programming language is considered “memory-safe” when it guarantees all the memory accesses in programs written in this language are well-defined and cannot violate the intended memory model. This means the language prevents:

Spatial memory safety violations: Accessing memory outside the bounds of allocated objects (e.g., accessing an index that doesn’t belong to an array).
Temporal memory safety violations: Accessing memory that has already been deallocated or has not yet been allocated (e.g., accessing a variable after it’s freed).

As demonstrated, you cannot violate these properties in Java, but you can in memory-unsafe languages like C and C++.

One question we might ask: if a language has a garbage collector, is it memory-safe? A garbage collector alone is not sufficient. Looking back at the formal constraints above, the garbage collector prevents temporal memory safety violations, but it doesn’t guarantee that the other property also holds. As a memory-safe language, Java also prevents spatial memory safety violations. It provides a range of features to achieve this, such as automatic bounds checking for arrays, null-dereference checking, no direct pointer manipulation, and automatic string bounds management. Some of these features are demonstrated in the example below:

// Bound-checking for arrays
int[] arr = new int[5]; 
try {
   arr[10] = 42; // Attempt to access out of bound index, will not succeed
} catch(ArrayIndexOutOfBoundsException e) {
  System.out.println("Prevented spatial memory violation: " + e.getMessage());
}

// Null-dereference checking
String str = null;
try {
   int len = str.length(); // will throw NullPointerException
} catch(NullPointerException e) {
   System.out.println("Prevented null dereference: " + e.getMessage());
}

// No direct pointer manipulation

// In Java, this is impossible
// int[] arr = new int[10];
// int* ptr = arr + 5; // no pointer arithmetic 
// *(ptr + 10) = 42; // No ability to move outside of bounds


// Automatic string bound management

// In Java, you don't have to manually allocate the size for the string

String str1 = "Hello";
String str2 = ", World!";
// Java automatically allocates a sufficiently large buffer for string concatenation
String helloWorld = str1 + str2; // no need to pre-allocate or manipulate the buffer size
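
The string part can also be exercised at runtime. Here is a small sketch showing that an out-of-range character access is rejected and that a StringBuilder grows on its own. The class StringBoundsSketch and the helper charAtOrDefault are hypothetical names introduced only for this illustration:

```java
class StringBoundsSketch {
    // Hypothetical helper: returns the character at index, or '?' when the index is out of range.
    static char charAtOrDefault(String s, int index) {
        try {
            return s.charAt(index);
        } catch (IndexOutOfBoundsException e) {
            // The runtime refused the access instead of reading unrelated memory.
            return '?';
        }
    }

    public static void main(String[] args) {
        String helloWorld = "Hello" + ", World!";
        System.out.println(helloWorld.length());             // 13
        System.out.println(charAtOrDefault(helloWorld, 0));  // H
        System.out.println(charAtOrDefault(helloWorld, 50)); // ?

        // Start with a capacity of 1; the builder resizes its internal buffer as needed.
        StringBuilder sb = new StringBuilder(1);
        for (int i = 0; i < 1000; i++) {
            sb.append('x');
        }
        System.out.println(sb.length()); // 1000
    }
}
```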

Interestingly, while Java achieves memory safety at runtime through garbage collection, bounds checking, and the absence of direct pointer manipulation, Rust takes a fundamentally different approach: it shifts most checks from runtime to compile time through its ownership system.

The Rust memory-safety feature revolves around these key concepts in its memory model:

Ownership: Every value in Rust has a single owner variable.
Borrowing: References to values can be “borrowed” under strict rules.
Lifetimes: The compiler checks how long references are valid.

Here is how Rust’s ownership system prevents memory safety issues:

fn main() {
   // Ownership example
   let s1 = String::from("hello"); // s1 owns the string
   let s2 = s1; // the ownership now moves to s2
   println!("{}", s2);

   // this would cause a compile error, since s1 no longer owns the string
   // println!("{}", s1);
   
   // Borrowing example
   let mut s3 = String::from("World");
   println!("{}", &s3);  // borrowing s3 (immutable reference)
   
   // compile error: mutate_str tries to mutate through an immutable reference
   // mutate_str(&s3);
   
   // only one mutable borrow can exist at a time, which prevents data races
   mutate_mutable_str(&mut s3);
   println!("{}", s3);
}

fn mutate_mutable_str(s: &mut String) {
  s.push_str("OK"); // compiles just fine
}

// Does not compile: cannot borrow `*s` as mutable, as it is behind a `&` reference
// fn mutate_str(s: &String) {
//    s.push_str("!!!");
// }

Here, Rust’s compiler analyzes the code and ensures:

No use-after-move: Trying to use s1 after it has been moved to s2 is caught at compile time.
No dangling references: References cannot outlive the data they refer to.
No data races: There cannot be more than one mutable reference to the same piece of data at a time.

Security vulnerabilities related to memory-safety issues

Critical security vulnerability example: Heartbleed

Now, let’s walk through some critical examples where attackers exploit memory-safety issues in a program, potentially exposing sensitive information. Here is a simplified version of the Heartbleed vulnerability, which affected the OpenSSL library written in C. It is essentially a buffer over-read: attackers can read more memory than intended, potentially exposing sensitive information such as private keys:

// Vulnerable C code similar to Heartbleed
void process_heartbeat(unsigned char *request, int length) {
    // Request contains: [type(1 byte)][payload_length(2 bytes)][payload][...]
    unsigned short payload_length = *(unsigned short*)(request + 1);
    unsigned char* payload = request + 3; 

    // No validation that payload_length matches actual data length!
    
    unsigned char* response = malloc(3 + payload_length);
    response[0] = 1; // type;
    *(unsigned short*)(response + 1) = payload_length;
    
    // copying payload_length bytes regardless of the size of the actual data
    memcpy(response + 3, payload, payload_length);
    
    // send response...
}

The problem in this code is that it blindly trusts that payload_length matches the actual length of the request data, and it allocates and copies that many bytes for the response. A malicious request can declare a payload_length larger than the data it actually sends. The memcpy call then reads beyond the bounds of the request buffer, potentially exposing sensitive server memory.

Some would argue that this is just a programming error and that a more careful version of the program would not have the problem. However, it is indeed a memory-safety issue, since a malicious request can read memory locations it isn’t supposed to. C has no built-in bounds checking, and arrays don’t store their own length, so the language cannot stop the over-read.

To reiterate: in Java it’s easy to check the length, and an out-of-bounds array access always results in an ArrayIndexOutOfBoundsException, while the same action in C might result in different behaviors:

Reading sensitive data from other parts of the memory (as with Heartbleed)
Overwriting critical memory structures
Remote code execution
Silent data corruption

In Java, we could simply do some checks:

void processHeartbeat(byte[] request) {
    if (request.length < 3) {
        throw new IllegalArgumentException("Request too short");
    }
    
    // Extract payload length (assuming big-endian)
    int payloadLength = ((request[1] & 0xFF) << 8) | (request[2] & 0xFF);
    
    // Validate actual length
    if (payloadLength + 3 > request.length) {
        throw new IllegalArgumentException("Invalid payload length");
    }
    
    // Safe copy - Java ensures we can't read past the end of arrays
    byte[] payload = Arrays.copyOfRange(request, 3, 3 + payloadLength);
    
    // Create response...
}
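
To see these checks in action, here is a hedged, self-contained sketch that feeds an honest and a malicious request through the same validation. The class HeartbeatSketch, the method extractPayload, and the sample byte arrays are hypothetical and exist only for this illustration. The last step relies on the documented behavior of Arrays.copyOfRange, which pads with zeros when the requested range runs past the end of the source array:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class HeartbeatSketch {
    // Hypothetical variant of processHeartbeat that returns the validated payload.
    static byte[] extractPayload(byte[] request) {
        if (request.length < 3) {
            throw new IllegalArgumentException("Request too short");
        }
        // Declared payload length, big-endian, taken from bytes 1 and 2.
        int payloadLength = ((request[1] & 0xFF) << 8) | (request[2] & 0xFF);
        if (payloadLength + 3 > request.length) {
            throw new IllegalArgumentException("Invalid payload length");
        }
        return Arrays.copyOfRange(request, 3, 3 + payloadLength);
    }

    public static void main(String[] args) {
        // Honest request: declares 2 payload bytes and sends 2.
        byte[] honest = {1, 0, 2, (byte) 'h', (byte) 'i'};
        System.out.println(new String(extractPayload(honest), StandardCharsets.US_ASCII)); // hi

        // Malicious request: declares 65535 payload bytes but sends only 2.
        byte[] malicious = {1, (byte) 0xFF, (byte) 0xFF, (byte) 'h', (byte) 'i'};
        try {
            extractPayload(malicious);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage()); // Rejected: Invalid payload length
        }

        // Even without the validation, an over-long range is zero-padded, not filled
        // with whatever happens to sit next to the array in memory.
        byte[] padded = Arrays.copyOfRange(malicious, 3, 3 + 8);
        System.out.println(Arrays.toString(padded)); // [104, 105, 0, 0, 0, 0, 0, 0]
    }
}
```

The explicit length check is still worth keeping: it turns a malformed request into a clear error instead of a silently padded response.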

Another example: Remote code execution (RCE)

Another exploitation technique relies on dangling pointers (use-after-free): memory is accessed after it has been freed, potentially allowing attackers to execute arbitrary code (remote code execution). This technique is particularly dangerous in web browsers and has been exploited in many zero-day attacks.

Here is one simplified example of this type of attack:

class Document {
public:
    void removeElement(Element* element) {
        // remember which element was removed, then remove it from the tree and free the memory
        lastRemovedElement = element;
        delete element; 
        
        // Update counters, send events, etc...
        updateAfterRemoval();
    }

    void updateAfterRemoval() {
        // lastRemovedElement now points to freed memory, and observers receive it
        for(auto& observer: observers) {
           observer->onElementRemoved(lastRemovedElement); 
        }
    }
    
    Element* lastRemovedElement = nullptr;
    std::vector<Observer*> observers;
};

In this code, lastRemovedElement could be a dangling pointer after the element is deleted. If the observer tries to access this during the notification event, it’s then accessing the freed memory, which could be exploited by attackers.

Java simply prevents this behavior from happening in the first place:

class Document {
   private Element lastRemovedElement;
   private List<Observer> observers;

   public void removeElement(Element element) {
       // Store reference before removal
       lastRemovedElement = element;

       // Remove the element from the tree (but the memory is not immediately freed)
       removeFromTree(element);
       
       // Update and notify 
       updateAfterRemoval();

       // Even after this method ends, the garbage collector won't free the element's memory
       // until there are no more references to it
   } 

   public void updateAfterRemoval() {
       // Safe to access lastRemovedElement here
       for(Observer observer: observers) {
           observer.onElementRemoved(lastRemovedElement);
       }
   }
}

The garbage collector in Java only reclaims memory when no more references to an object exist. Since lastRemovedElement still holds a reference, the memory isn’t freed, which prevents the use-after-free issue.
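
This reachability rule can be observed with a weak reference, which does not keep its target alive by itself. The sketch below is only an illustration under stated assumptions: ReachabilitySketch and survivesWhileReferenced are hypothetical names, and System.gc() is merely a hint to the JVM. It requires Java 9 or later for Reference.reachabilityFence:

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

class ReachabilitySketch {
    // Returns true if the object is still available after a GC request
    // while a strong reference to it is being held.
    static boolean survivesWhileReferenced() {
        StringBuilder element = new StringBuilder("element"); // strong reference
        WeakReference<StringBuilder> observer = new WeakReference<>(element);

        System.gc(); // only a request; the JVM may or may not collect now

        boolean stillThere = observer.get() != null;

        // Keeps `element` strongly reachable up to this point, so the collector
        // was not allowed to reclaim it before the check above.
        Reference.reachabilityFence(element);
        return stillThere;
    }

    public static void main(String[] args) {
        System.out.println(survivesWhileReferenced()); // true
    }
}
```

Once the last strong reference is gone, the weak reference may be cleared at some later collection, but the program can only ever observe a valid object or null, never freed memory.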

Why should we care about memory safety issues?

As the examples above show, memory safety is a serious problem that goes well beyond small code snippets. The real-world impact of memory safety issues is staggering:

Microsoft: 70% of all vulnerabilities in Microsoft products over the last decade were memory safety issues.
Google Chrome: Around 70% of high-severity security bugs were caused by memory safety problems.
Android: Approximately 90% of security vulnerabilities were related to memory safety
