OSError: [Errno 30] Cannot create directory '/efs'. Detail: [errno 30] Read-only file system #57

Open
opened 2025-11-02 00:02:42 -05:00 by GiteaMirror · 5 comments

Originally created by @bhavya-giri on GitHub (Oct 26, 2023).

@GokuMohandas can you help me figure this out


@bhavya-giri commented on GitHub (Oct 30, 2023):

<img width="1414" alt="Screenshot 2023-10-30 at 7 58 46 AM" src="https://github.com/GokuMohandas/Made-With-ML/assets/102273412/1669d9be-f8c6-4201-9f7f-0769a0efeeb2">
<img width="1414" alt="Screenshot 2023-10-30 at 7 59 08 AM" src="https://github.com/GokuMohandas/Made-With-ML/assets/102273412/9a57482d-1cb3-4f80-9cd5-204f5f101413">
<img width="1414" alt="Screenshot 2023-10-30 at 7 59 18 AM" src="https://github.com/GokuMohandas/Made-With-ML/assets/102273412/ff9aade6-d574-404f-9178-e3ec8b8a2a36">

@Meryl-Fang commented on GitHub (Nov 12, 2023):

Same error here — have you managed to resolve it?


@taaha commented on GitHub (Nov 12, 2023):

I am having the same issue and no idea why. Basically it is unable to load the function from the madewithml/data module. A hack that worked for me is to create and run the following code cell above the erroneous one:

```python
import re
from typing import Dict, List, Tuple

import numpy as np
import pandas as pd
import ray
from ray.data import Dataset
from sklearn.model_selection import train_test_split
from transformers import BertTokenizer

def stratify_split(
    ds: Dataset,
    stratify: str,
    test_size: float,
    shuffle: bool = True,
    seed: int = 1234,
) -> Tuple[Dataset, Dataset]:
    """Split a dataset into train and test splits with equal
    amounts of data points from each class in the column we
    want to stratify on.

    Args:
        ds (Dataset): Input dataset to split.
        stratify (str): Name of column to split on.
        test_size (float): Proportion of dataset to split for test set.
        shuffle (bool, optional): whether to shuffle the dataset. Defaults to True.
        seed (int, optional): seed for shuffling. Defaults to 1234.

    Returns:
        Tuple[Dataset, Dataset]: the stratified train and test datasets.
    """

    def _add_split(df: pd.DataFrame) -> pd.DataFrame:  # pragma: no cover, used in parent function
        """Naively split a dataframe into train and test splits.
        Add a column specifying whether it's the train or test split."""
        train, test = train_test_split(df, test_size=test_size, shuffle=shuffle, random_state=seed)
        train["_split"] = "train"
        test["_split"] = "test"
        return pd.concat([train, test])

    def _filter_split(df: pd.DataFrame, split: str) -> pd.DataFrame:  # pragma: no cover, used in parent function
        """Filter by data points that match the split column's value
        and return the dataframe with the _split column dropped."""
        return df[df["_split"] == split].drop("_split", axis=1)

    # Train, test split with stratify
    grouped = ds.groupby(stratify).map_groups(_add_split, batch_format="pandas")  # group by each unique value in the column we want to stratify on
    train_ds = grouped.map_batches(_filter_split, fn_kwargs={"split": "train"}, batch_format="pandas")  # combine
    test_ds = grouped.map_batches(_filter_split, fn_kwargs={"split": "test"}, batch_format="pandas")  # combine

    # Shuffle each split (required)
    train_ds = train_ds.random_shuffle(seed=seed)
    test_ds = test_ds.random_shuffle(seed=seed)

    return train_ds, test_ds
```

Basically, instead of importing the function (which fails, no idea why), we define and use it directly in the notebook.
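For anyone who wants to check the splitting logic without a running Ray cluster, here is a pandas-only sketch of the same idea (everything here — `stratify_split_pandas`, the toy `tag` column — is illustrative and not from the repo): split each class group separately with `train_test_split`, then concatenate, so both splits preserve the class proportions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def stratify_split_pandas(df, stratify, test_size, seed=1234):
    """Per-class train/test split: split each class group separately,
    then concatenate, so both splits keep the class proportions."""
    train_parts, test_parts = [], []
    for _, group in df.groupby(stratify):
        train, test = train_test_split(group, test_size=test_size, random_state=seed)
        train_parts.append(train)
        test_parts.append(test)
    return pd.concat(train_parts), pd.concat(test_parts)

# Toy data: two classes of 10 samples each.
df = pd.DataFrame({"text": [f"sample {i}" for i in range(20)],
                   "tag": ["a"] * 10 + ["b"] * 10})
train_df, test_df = stratify_split_pandas(df, stratify="tag", test_size=0.2)
# Each class contributes 8 rows to train and 2 to test.
print(train_df["tag"].value_counts().to_dict())
```

The Ray version in the cell above does the same thing at scale, with `map_groups` playing the role of the per-group loop.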


@bhavya-giri commented on GitHub (Nov 12, 2023):

But the same error would come in training, check this repo https://github.com/GokuMohandas/mlops-course


@gOsuzu commented on GitHub (Dec 3, 2023):

As the error message indicates, this error is caused by permissions on the /efs folder you are creating.
I assume you are using your own local machine. I edited as below, and it worked in my local environment, macOS 14.1.2 with Python 3.10.11. The path will differ depending on where your directory is located. I hope this helps.

  1. config.py
    Change line 13:
    EFS_DIR = Path(f"/Users/<your_user_name>/efs/shared_storage/madewithml/{os.environ.get('GITHUB_USERNAME', '')}")

  2. madewithml.ipynb
    Change the code in the Setup section:
    EFS_DIR = f"/Users/<your_user_name>/efs/shared_storage/madewithml/{os.environ['GITHUB_USERNAME']}"
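A more portable variant of this fix (my own sketch, not from the course repo) is to anchor `EFS_DIR` under the user's home directory with `pathlib.Path.home()`, so the path works on any OS without hard-coding `/Users/<your_user_name>`:

```python
import os
from pathlib import Path

# Hypothetical portable alternative: build EFS_DIR under the home directory
# instead of the root-level /efs, which is read-only on local machines.
EFS_DIR = Path.home() / "efs" / "shared_storage" / "madewithml" / os.environ.get("GITHUB_USERNAME", "")
EFS_DIR.mkdir(parents=True, exist_ok=True)  # succeeds without root privileges
print(EFS_DIR)
```

The root filesystem is read-only from an unprivileged process on most setups, which is exactly why the original `Path("/efs/...")` raises `OSError: [Errno 30]`; anything under `Path.home()` is writable by the current user.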

Reference: github-starred/Made-With-ML#57