[GH-ISSUE #297] CTGan Synthetic Data generation #5310

Closed
opened 2026-04-21 21:18:34 -05:00 by GiteaMirror · 3 comments

Originally created by @emmanuel2406 on GitHub (Jun 24, 2024).
Original GitHub issue: https://github.com/harvard-edge/cs249r_book/issues/297

#Ex. 5.3
This is for the external Colab notebook attached for that exercise.

Suggestion: Perhaps copying this notebook and changing a few things would make it a smoother experience for the user.

I have found two things:

  • Downloading the dataset patients.csv proves to be quite nontrivial. I bypassed the terminal command and just copied the file into my Google Drive.
  • CTGANSynthesizer does not seem to exist in the latest version of ctgan. I have seen online references to the library sdv, which has a class by this name, but switching to it means changing all the subsequent commands.
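
For the first point, here is a minimal sketch of how a notebook could locate a manually copied patients.csv instead of relying on the download command. The candidate paths are assumptions for illustration: the notebook's working directory, and `/content/drive/MyDrive/`, where a Google Drive usually appears once it is mounted in Colab. The helper name `find_dataset` is hypothetical and not part of the original notebook.

```python
from pathlib import Path


def find_dataset(candidates):
    """Return the first candidate path that exists as a file, or None."""
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if path.is_file():
            return path
    return None


# Hypothetical locations: the working directory first, then a mounted Drive.
CANDIDATES = [
    "patients.csv",
    "/content/drive/MyDrive/patients.csv",
]

dataset_path = find_dataset(CANDIDATES)
if dataset_path is None:
    print("patients.csv not found; copy it into one of:", CANDIDATES)
else:
    print("Loading data from", dataset_path)
```

The resulting path can then be passed to whatever loading call the notebook already uses.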
GiteaMirror added the area: book and type: improvement labels 2026-04-21 21:18:34 -05:00

@profvjreddi commented on GitHub (Jun 26, 2024):

@emmanuel2406 Thanks for reporting this issue. Would you be able to share your Colab so that we can take a look at it? @shanzehbatool if you have some time to take a look at this it would be great!


@emmanuel2406 commented on GitHub (Jun 26, 2024):

> @emmanuel2406 Thanks for reporting this issue. Would you be able to share your Colab so that we can take a look at it? @shanzehbatool if you have some time to take a look at this it would be great!

Sure, though it seems it wasn't serious. To run the cells you just need to replace CTGANSynthesizer with CTGAN. Here is a copy of the notebook:
https://colab.research.google.com/drive/1jUBiyhW4_PaWt0RSGk2ewnBUDrnF2d6J#scrollTo=cpV1FWHevaWO
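
The rename described above can be absorbed with a small compatibility shim, sketched here under stated assumptions: the installed ctgan package exposes the model class as either `CTGAN` (newer releases) or `CTGANSynthesizer` (the name the notebook was written against). The helper name `resolve_ctgan_class` is hypothetical and is not part of ctgan or of the notebook.

```python
import importlib


def resolve_ctgan_class(module_name="ctgan"):
    """Return the CTGAN model class under whichever name the package exposes.

    Returns None if the package is not installed or exposes neither name.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Try the newer name first, then the older one used by the notebook.
    for name in ("CTGAN", "CTGANSynthesizer"):
        cls = getattr(module, name, None)
        if cls is not None:
            return cls
    return None


CTGANModel = resolve_ctgan_class()
if CTGANModel is None:
    print("ctgan is not installed, or exposes neither CTGAN nor CTGANSynthesizer")
else:
    print(f"Using ctgan.{CTGANModel.__name__}")
```

Later cells can then instantiate `CTGANModel` in place of the hard-coded class name, so the notebook keeps running on either side of the rename.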


@shanzehbatool commented on GitHub (Jul 1, 2024):

Just saw this. I'm not able to access the Colab shared here, but yes, CTGAN should work in place of CTGANSynthesizer. The dataset would need to be obtained directly from Synthea (https://synthetichealth.github.io/synthea/). If the shared Colab works, great; otherwise I could look into an alternative.
Reference: github-starred/cs249r_book#5310