Open Source Heartbeat

vsoch created a new tag, 0.2.28 at oras-project/oras-py

View Repository

vsoch opened a pull request to converged-computing/state-machine-operator

View Pull Request

vsoch pushed to conda-forge/oras-py-feedstock

updated v0.2.28 (#33)

View Commit

vsoch pushed to sciworks/spack-updater

compilers are now packages

View Commit

vsoch pushed to rseng/software

Merge pull request #416 from rseng/update/software-2025-03-30

Update from update/software-2025-03-30

View Commit

vsoch pushed to converged-computing/performance-study

Merge pull request #85 from converged-computing/azure-osu-reruns

osu re-runs - not a success

View Commit

vsoch pushed to converged-computing/state-machine-operator

wip: add support for workflow events (#27)

  • wip: add support for workflow events

This adds support for ending the workflow early based on a count of successes, a count of failures, or a job duration metric. Next we need to add the ability to grow or shrink (which needs some thought, since we want a cloud-agnostic solution), and then decide how to handle application-specific metrics.

Signed-off-by: vsoch vsoch@users.noreply.github.com

  • feat: add support for minicluster

If we really want to test scale (shrink and grow) of a job and have it work with the cluster autoscaler, plus collecting metrics from an HPC app, we can most easily do that with the flux operator. This feature adds support for specifying a minicluster property to convert the previous indexed job into a MiniCluster. The flux operator needs to be installed.

Signed-off-by: vsoch vsoch@users.noreply.github.com

  • feat: shrink with flux minicluster example working.

Signed-off-by: vsoch vsoch@users.noreply.github.com

  • save state

Signed-off-by: vsoch vsoch@users.noreply.github.com

  • feat: support for custom metrics

In this example, the user can provide a custom script that is run against the log and must return a dictionary of values (the custom metrics). These are passed back to the manager from the state machine step and can influence workflow behavior (e.g., stop early, grow, or shrink).

Signed-off-by: vsoch vsoch@users.noreply.github.com


Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com
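The custom metrics idea in the commit message above can be sketched roughly as follows. This is an illustration only, not the operator's actual interface: the function names `parse_metrics` and `should_stop_early`, the `key: value` log format, and the `loss` threshold are all hypothetical.

```python
import re

# Hypothetical line format: "name: 1.23" or "name = 1.23" on its own line.
METRIC_LINE = re.compile(r"^\s*(\w+)\s*[:=]\s*([-+]?\d+(?:\.\d+)?)\s*$")


def parse_metrics(log: str) -> dict:
    """Scan a job log and return a dictionary of numeric metrics."""
    metrics = {}
    for line in log.splitlines():
        match = METRIC_LINE.match(line)
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics


def should_stop_early(metrics: dict, key: str = "loss", threshold: float = 0.1) -> bool:
    """Example decision a manager could make from the returned metrics (assumed policy)."""
    return key in metrics and metrics[key] <= threshold
```

A manager receiving the dictionary could apply the same pattern to decide on growing or shrinking instead of stopping.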

View Commit

vsoch pushed to singularityhub/shpc-registry

Merge pull request #305 from singularityhub/update/containers-2025-03-27

[bot] update/containers-2025-03-27

View Commit

vsoch pushed to converged-computing/state-machine-operator

feat: shrink with flux minicluster example working.

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue oras-project/oras-py#185.

I’d be happy to review a PR that adds this functionality then….

View Comment

vsoch pushed to converged-computing/state-machine-operator

feat: analysis and plotting functions (#26)

  • feat: analysis and plotting functions
  • ensure x axis is same scale
  • add analysis libfuncs

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue vsoch/oci-python#23.

Thanks! I remember this bit me for other projects, I appreciate the catch here….

View Comment

vsoch commented on spack/spack

View Comment

vsoch pushed to converged-computing/state-machine-operator

feat: allow multiple node jobs

The Kubernetes tracker has a bug: it treats failed/succeeded as boolean (0/1) when each is actually a count of indices. We have not run experiments with more than one node, so this has not been an issue (or been caught). This change fixes it.

Signed-off-by: vsoch vsoch@users.noreply.github.com
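To illustrate the difference between a boolean and a count, here is a minimal sketch. It assumes a Kubernetes Job status where `succeeded` and `failed` are integer counts (or unset), and a `completions` value for the number of indices; the function `job_state` and its return strings are hypothetical and not taken from the operator's code.

```python
from typing import Optional


def job_state(succeeded: Optional[int], failed: Optional[int], completions: int) -> str:
    """Classify a job from its status counts rather than treating them as 0/1 flags."""
    succeeded = succeeded or 0  # an unset count is treated as zero
    failed = failed or 0
    if failed > 0:
        return "failed"
    # With more than one node, a single success does not mean the job is done.
    if succeeded >= completions:
        return "succeeded"
    return "running"
```

With a boolean check, a four-node job would be reported as finished as soon as the first index succeeded; comparing the count against `completions` avoids that.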

View Commit

vsoch pushed to converged-computing/performance-study

Merge pull request #84 from converged-computing/redo-osu

osu: fix runs for gpu 128 GKE and CE

View Commit

vsoch commented on issue oras-project/oras-py#185.

Is this supported for the oras client in Go?…

View Comment

vsoch opened a pull request to spack/spack

View Pull Request

vsoch commented on issue skypilot-org/skypilot#3777.

Closing for no interest….

View Comment

vsoch pushed to rseng/software

Merge pull request #415 from rseng/update/software-2025-03-23

Update from update/software-2025-03-23

View Commit

vsoch pushed to flux-framework/spack

bug: cffi needs to be present for link (configure) (#308)

Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue oras-project/oras-py#164.

Are we good to close here?…

View Comment

vsoch pushed to converged-computing/state-machine-operator

feat: save kubernetes logs.

We have been saving artifacts for everything, relying on the application to take the burden of saving its own logging retrieved from the registry. For experiments with gpu selection we just need one little value, and I think it would be easier to save all the logs instead of using oras. This feature supports that, where the user adds a properties -> save-path, and under that path “logs” is created that is named by the job, step, and pod index.

Signed-off-by: vsoch vsoch@users.noreply.github.com
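The log layout described above might look something like the sketch below. The exact file naming is an assumption: the commit message only says the entry under "logs" is named by job, step, and pod index, so the `job-step-index.log` pattern and the `log_path` helper are hypothetical.

```python
from pathlib import Path


def log_path(save_path: str, job: str, step: str, pod_index: int) -> Path:
    """Build a per-pod log location under <save_path>/logs (naming pattern assumed)."""
    return Path(save_path) / "logs" / f"{job}-{step}-{pod_index}.log"


def save_log(save_path: str, job: str, step: str, pod_index: int, content: str) -> Path:
    """Write one pod's log, creating the logs directory if it does not exist."""
    path = log_path(save_path, job, step, pod_index)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)
    return path
```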

View Commit

vsoch pushed to conda-forge/oras-py-feedstock

oras-py v0.2.27 (#32)

  • updated v0.2.27

  • MNT: Re-rendered with conda-build 25.1.2, conda-smithy 3.47.0, and conda-forge-pinning 2025.03.21.21.56.39

View Commit

vsoch pushed to singularityhub/shpc-registry

Merge pull request #304 from singularityhub/update/containers-2025-03-20

[bot] update/containers-2025-03-20

View Commit

vsoch pushed to flux-framework/spack

re-enable flux checks

View Commit

vsoch pushed to converged-computing/state-machine-operator

bug: flux failed jobs do not have status COMPLETED, they are FAILED

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue pydicom/deid#275.

Closed with #276 …

View Comment

vsoch commented on issue flux-framework/flux-core#6713.

I couldn’t say now - I wound up sending a kill signal to the job, and didn’t save the data because I considered the run erroneous!…

View Comment

vsoch pushed to conda-forge/deid-feedstock

Merge pull request #48 from regro-cf-autotick-bot/0.4.1_h47750b

deid v0.4.1

View Commit

vsoch commented on issue pydicom/deid#276.

I can see the output above and the logic in the code, so no need. I think this is good to go - if you could please bump the version in version.py and add a corresponding note in the CHANGELOG.md we should be good….

View Comment

vsoch pushed to converged-computing/state-machine-operator

feat: add more resource specs to flux tracker job submit

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch opened a pull request to spack/spack

View Pull Request

vsoch merged a pull request to singularityhub/shpc-registry

View Pull Request

vsoch commented on issue pydicom/deid#274.

Yes! Generally speaking you’d want to have a regular expression that matches that case here: https://github.com/pydicom/deid/blob/14d1e4eb70f2c9fda43fca411794be9d8a5a8516/deid/utils/actions.py#L32 and then throw an error when the particular name for the function is missing, or if the name is not found in “item.” I’d be happy to review a PR for that, and a test could go here….

View Comment

vsoch pushed to flux-framework/spack

Update from update-package/flux-sched-2025-03-12 (#306)

  • Automated deployment to update package flux-sched 2025-03-12

  • Add 0.43 back


Co-authored-by: github-actions github-actions@users.noreply.github.com Co-authored-by: Vanessasaurus <814322+vsoch@users.noreply.github.com>

View Commit

vsoch pushed to converged-computing/state-machine-operator

make more resilient to error

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to converged-computing/state-machine-operator

add support for oras arch for arm, etc. (#19)

  • add support for oras arch for arm, etc.

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch merged a pull request to converged-computing/state-machine-operator

View Pull Request

vsoch pushed to converged-computing/state-machine-operator

values are always strings

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch reviewed a snakemake/snakemake-storage-plugin-gcs pull request

Looks good to me, as long as @johanneskoester gives a :+1: as well….

View Review

vsoch closed issue flux-framework/flux-python#9.

Potential issue with Flux 0.58 or the 0.57 python bindings installed with pip

Ran into some import issues with the latest flux 0.58 install (the public systems that are flux native at llnl), and the 0.57 python bindings from pypi: it seems the python bindings install isn’t quite finding the right things? Importing flux fails with a missing function in the c-python bindings (stack trace below, but doesn’t seem like that function’s a particular problem, just the first to get hit): …

View Comment

vsoch commented on issue flux-framework/flux-python#13.

Please test / compare with the system flux, and look for differences. First, if the system level flux import doesn’t work, the issue is there. If there is a difference in the install structure, then we likely need an update to the logic. Let me know what you find….

View Comment

vsoch pushed to converged-computing/state-machine-operator

nit: rename cores per task to cores_per_task

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch merged a pull request to converged-computing/state-machine-operator

View Pull Request

vsoch commented on issue snakemake/snakemake-executor-plugin-googlebatch#57.

Sure, have fun!…

View Comment

vsoch pushed to rseng/software

Merge pull request #413 from rseng/update/software-2025-03-09

Update from update/software-2025-03-09

View Commit

vsoch pushed to converged-computing/state-machine-operator

final tweak

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to flux-framework/spack

Fix missing hipBlas symbol (#49298)

Co-authored-by: Eric B. Chin chin23@llnl.gov Co-authored-by: Greg Becker becker33@llnl.gov

View Commit

vsoch pushed to converged-computing/state-machine-operator

bug: additional active jobs added

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to converged-computing/mummi-experiments

notes from meeting and workflow updates

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to singularityhub/shpc-registry

Merge pull request #302 from singularityhub/update/containers-2025-03-06

[bot] update/containers-2025-03-06

View Commit

vsoch pushed to flux-framework/flux-framework.github.io

Merge pull request #144 from flux-framework/release-docs-2025-03-06

Update from release-docs-2025-03-06

View Commit

vsoch opened a pull request to converged-computing/state-machine-operator

View Pull Request

vsoch opened a pull request to spack/spack

View Pull Request

vsoch commented on issue jbms/sphinx-immaterial#412.

Excellent, thank you!…

View Comment

vsoch commented on issue jbms/sphinx-immaterial#412.

Thank you!…

View Comment

vsoch pushed to flux-framework/spack

QtPackage: modify QT_ADDITIONAL_PACKAGES_PREFIX_PATH handling (#49297)

  • QtPackage: mv QT_ADDITIONAL_PACKAGES_PREFIX_PATH handling

  • geomodel: support Qt6

  • qt-base: rm import re

View Commit

vsoch pushed to flux-framework/flux-framework.github.io

Merge pull request #143 from flux-framework/release-docs-2025-03-05

Update from release-docs-2025-03-05

View Commit

vsoch commented on issue rootless-containers/usernetes#368.

Confirmed just now that increasing the uid range in those files fixes all the issues. Are there other options to that? I don’t think we could do that on a production system….

View Comment

vsoch opened issue jbms/sphinx-immaterial#412.

sphinx_immaterial.nav_adapt.MkdocsNavEntry object' has no attribute 'parent'

I haven’t built my docs for a while, and a new error has popped up! Does it look familiar?…

View Comment

vsoch commented on issue huggingface/gpu-fryer#3.

Gotcha, thanks for the speedy response! …

View Comment

vsoch merged a pull request to flux-framework/flux-operator

View Pull Request

vsoch pushed to converged-computing/aws-performance-study

add gpu-fryer note - only intended for single nodes

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue containerd/nerdctl#2020.

+1, this would be useful for me as well. I’m having a hard time with podman on our HPC clusters, and the underlying issue is the uid mappings (and how many there are in the rootless kind container, there would be no reasonable way to give those kinds of ranges to each user). But I think I could map specific ones to the user on the system but need this exposed. I was trying nerdctl and it failed with not being able to extract layers because of permissions….

View Comment

vsoch commented on issue cloudmercato/ai-benchmark#2.

@Oil3 did you ever test this on more than one GPU or node? I ran the benchmark today on one node, one GPU and only one test didn’t run (a version of keras too new) and I’m wondering if it can extend beyond that. From a quick glance it seems like maybe it would work on >1 GPU but not more than one node?…

View Comment

vsoch pushed to flux-framework/Tutorials

Merge pull request #47 from flux-framework/hpcic-2024-tutorial-slides

hpcic 2024: adding tutorial slides

View Commit

vsoch pushed to converged-computing/flux-tutorials

Add notebook tutorial (#9)

  • add notebook tutorial and ci

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to converged-computing/aws-performance-study

add gpu burn

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to vsoch/ai-benchmark

readme: tensorflow dependency update

The tensorflow-gpu library is deprecated (and no longer pip installable).

View Commit

vsoch pushed to rseng/software

Merge pull request #412 from rseng/update/software-2025-03-02

Update from update/software-2025-03-02

View Commit

vsoch pushed to flux-framework/spack

py-pymc3: not compatible with numpy 2 (#49225)

View Commit

vsoch pushed to converged-computing/google-performance-study

add initial test mnist data here

This is being removed from flux-usernetes and I do not want to lose it

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to converged-computing/flux-usernetes

Merge pull request #23 from converged-computing/google-cloud-gpu-experiment

experiment: gke/usernetes on compute engine v100 1:1 gpu:node

View Commit

vsoch opened a pull request to converged-computing/flux-apps-helm

View Pull Request

vsoch commented on issue cloudmercato/ai-benchmark#2.

It looks like the code no longer has this (although the pip install does) but maybe this would work?…

View Comment

vsoch pushed to converged-computing/flux-usernetes

gke: sizes 4,8,16,32 pytorch mnist

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch opened a pull request to spack/spack

View Pull Request

vsoch commented on hpc-social/good-first-issues

View Comment

vsoch commented on issue flux-framework/spack#303.

This will be closed by https://github.com/spack/spack/pull/49230…

View Comment

vsoch pushed to flux-framework/flux-go

Merge pull request #4 from flux-framework/add-python-grpc

feat: add python grpc service

View Commit

vsoch pushed to flux-framework/flux-framework.github.io

Merge pull request #142 from flux-framework/release-docs-2025-02-28

Update from release-docs-2025-02-28

View Commit

vsoch opened issue converged-computing/state-machine-operator#6.

A few TODO for state machine operator

These are from my personal notes - not high priority so putting them here….

View Comment

vsoch commented on issue singularityhub/shpc-registry#301.

I think we would want to make sure that the path is derived as simply the digest with sif. If you want to open a PR to work on it I’d be happy to review….

View Comment

vsoch commented on issue spack/spack#49197.

I’m OK not being a maintainer here, but of course if you run into issues (I’m guessing this is for Mummi?) please come to me first!…

View Comment

vsoch closed a pull request to flux-framework/spack

View Pull Request

vsoch pushed to flux-framework/spack

py-sympy: add v1.13.1 (#48951)

  • py-sympy: add v1.13.1

View Commit

vsoch commented on issue snakemake/snakemake-executor-plugin-googlebatch#56.

Is it possible something cached your directory state that needs to be reset / cleaned?…

View Comment

vsoch commented on issue rootless-containers/usernetes#366.

oh wow, this is really interesting!…

View Comment

vsoch pushed to researchapps/usernetes

ci: add test for rootful docker

This is important to run on multi-node

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue NeuroVault/neurovault_collection_downloader#3.

> I believe https://github.com/NeuroVault/pynv was the intended replacement. …

View Comment

vsoch pushed to rseng/software

Merge pull request #411 from rseng/update/software-2025-02-23

Update from update/software-2025-02-23

View Commit

vsoch commented on issue converged-computing/performance-study#78.

This likely won’t be merged, but I’ll add the results (from when I ran them) for transparency. This thread is from December 15th 2024. …

View Comment

vsoch commented on issue NVIDIA/nvidia-container-toolkit#56.

Figured it out….

View Comment

vsoch pushed to vsoch/vsoch.github.io

rename AKS to Azure Kubernetes Service

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue rootless-containers/usernetes#242.

I got this fully working in rootless mode - I’ll put together a quick writeup soon….

View Comment

vsoch pushed to converged-computing/flux-usernetes

gpu pytorch add dockerfile

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue rootless-containers/usernetes#242.

I made some progress in the 11 hours since I posted, and I think now the issue is in the space of nvidia. What is almost working is to set no-cgroups = true in the nvidia container runtime config.toml, but then there are issues with containerd on the kubelet. I posted more here: https://github.com/NVIDIA/nvidia-container-toolkit/issues/56#issuecomment-2673830806…

View Comment

vsoch pushed to researchapps/usernetes

ci: add test for rootful docker

This is important to run on multi-node

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to hpc-social/jobs

run jobs updater once a day

View Commit

vsoch commented on issue NVIDIA/nvidia-container-toolkit#85.

@elezar is CDI supported for Docker 28.0.0 now? I am having this specific issue (where I can’t use no-cgroups) and would like to test CDI - my setup is using rootless docker and docker compose….

View Comment