Open Source Heartbeat

vsoch pushed to flux-framework/flux-framework.github.io

Merge pull request #157 from flux-framework/release-docs-2025-06-06

Update from release-docs-2025-06-06

View Commit

vsoch opened a pull request to converged-computing/flux-apps-helm

View Pull Request

vsoch pushed to converged-computing/aws-performance-study

wip: testing ml containers (#3)

  • wip: testing ml containers

Signed-off-by: vsoch vsoch@users.noreply.github.com

  • add gpu-fryer note - only intended for single nodes

Signed-off-by: vsoch vsoch@users.noreply.github.com


Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to singularityhub/shpc-registry

Merge pull request #338 from singularityhub/update/containers-2025-06-05

[bot] update/containers-2025-06-05

View Commit

vsoch commented on issue oras-project/oras-py#204.

> Do you mean to use one decorator to handle both …

View Comment

vsoch pushed to rseng/software

Merge pull request #425 from rseng/update/software-2025-06-01

Update from update/software-2025-06-01

View Commit

vsoch commented on issue oras-project/oras-py#204.

> Because the ‘container’ arg’s position is different in get and put like methods, the two kinds of decorator allow user call with positional args. …

View Comment

vsoch pushed to singularityhub/shpc-registry

Merge pull request #336 from singularityhub/update/containers-2025-06-02

[bot] update/containers-2025-06-02

View Commit

vsoch pushed to singularityhub/singularity-hpc

Test updated typos action (#692)

  • Test updated typos action
  • nit: spelling error exported
  • nit: spelling error install guide
  • example: remove global cli from google example

View Commit

vsoch commented on issue oras-project/oras-py#204.

The call that triggers the error doesn’t provide positional args; it provides keyword arguments (“kwargs”) because it explicitly says target=….

View Comment
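As a rough illustration of the positional-versus-keyword point raised in this thread (and not oras-py's actual code), here is a minimal Python sketch of one decorator that finds an argument by name, however the caller passed it. The names `normalize_arg`, `push`, and `pull` are hypothetical, and `str.lower` only stands in for whatever processing a real decorator would do.

```python
import functools
import inspect


def normalize_arg(name, transform):
    """Apply `transform` to the argument called `name`.

    Hypothetical helper: it binds the call to the function signature, so it
    works whether the argument arrives positionally or as a keyword, and
    regardless of its position in the parameter list.
    """

    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Map every passed value to its parameter name.
            bound = sig.bind(*args, **kwargs)
            if name in bound.arguments:
                bound.arguments[name] = transform(bound.arguments[name])
            return func(*bound.args, **bound.kwargs)

        return wrapper

    return decorator


# Two made-up functions where `target` sits at different positions.
@normalize_arg("target", str.lower)
def push(files, target):
    return target


@normalize_arg("target", str.lower)
def pull(target, outdir="."):
    return target


print(push(["a.txt"], "GHCR.IO/Example"))          # positional
print(push(files=["a.txt"], target="GHCR.IO/Example"))  # keyword
print(pull(target="GHCR.IO/Example"))              # keyword, first parameter
```

A decorator that reads `args[0]` directly would fail on the keyword calls above, because `args` is empty when everything is passed by name.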

vsoch pushed to singularityhub/guts

feat: add support for trace with ldd (#9)

  • feat: add support for trace with ldd
  • dev: bump black version requirements
  • remove centos

Signed-off-by: vsoch vsoch@users.noreply.github.com


Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue oras-project/oras-py#204.

Why are you calling this in one off functions instead of at the class init?…

View Comment

vsoch pushed to converged-computing/google-performance-study

analysis: file access differences

the ebpf open information is likely most interesting when looking at differences between environments. E.g., for each command, cpu vs gpu, and then rocky vs ubuntu (holding mpi variant constant) and mpich vs. openmpi (holding base os constant).

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch opened a pull request to converged-computing/google-performance-study

View Pull Request

vsoch pushed to compspec/ocifit

bump guts version

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch opened issue flux-framework/flux-accounting#650.

limits: look into adding support for quota/queue limits based on compute hours

In cloud, it’s common to want to limit a user’s ability to run jobs on a specific instance type (e.g., one using GPUs). Speaking with @cmoussa1, we think the core metrics are already collecting / existing, and a little bit of extra exposure of configuration might make it work. Here are notes from our discussion:…

View Comment

vsoch pushed to conda-forge/deid-feedstock

Merge pull request #50 from regro-cf-autotick-bot/0.4.3_hf289fb

deid v0.4.3

View Commit

vsoch pushed to compspec/ocifit

Merge pull request #2 from compspec/add-guts

add support for mpi paths

View Commit

vsoch pushed to singularityhub/shpc-registry

Merge pull request #335 from singularityhub/update/containers-2025-05-29

[bot] update/containers-2025-05-29

View Commit

vsoch commented on issue pydicom/deid#278.

https://pypi.org/project/deid/0.4.3/…

View Comment

vsoch pushed to flux-framework/Tutorials

Merge pull request #50 from milroy/isc25-k8s

ISC2025 K8s configurations, setup, and instructions

View Commit

vsoch pushed to flux-framework/Tutorials

Merge pull request #49 from flux-framework/update-jupyter-slim-containers

update jupyter to 4.2.0

View Commit

vsoch pushed to converged-computing/descriptive-pixi

environment: llama

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch created a new branch, main at compspec/ocifit

View Repository

vsoch commented on issue pydicom/deid#279.

This looks good! Please bump the version in version.py and add a corresponding note to the CHANGELOG.md and we can merge….

View Comment

vsoch commented on oras-project/oras-py

View Comment

vsoch commented on issue oras-project/oras-py#201.

Totally ok! Glad you found the issue….

View Comment

vsoch commented on issue google/dranet#93.

We can definitely add support for the Flux Operator! How are the devices (?) exposed to the applications in a pod?…

View Comment

vsoch created a new branch, main at converged-computing/descriptive-pixi

View Repository

vsoch pushed to singularityhub/shpc-registry

Merge pull request #329 from singularityhub/update/containers-2025-05-26

[bot] update/containers-2025-05-26

View Commit

vsoch pushed to singularityhub/shpc-registry

Merge pull request #330 from singularityhub/update/containers-2025-05-27

[bot] update/containers-2025-05-27

View Commit

vsoch commented on oras-project/oras-py

View Comment

vsoch commented on issue oras-project/oras-py#201.

I would look into exactly how the (working) oras in Go is making the call - it could be a nuanced difference in a header or similar. And see if the response turns anything back that hints about the issue….

View Comment

vsoch pushed to converged-computing/flux-apps-helm

Merge pull request #35 from converged-computing/add-device-device-to-osu

osu example for device to device

View Commit

vsoch pushed to singularityhub/shpc-registry

manual update to remove 404 container digests for jupyter data science notebook

Signed-off-by: Vanessasaurus <814322+vsoch@users.noreply.github.com>

View Commit

vsoch opened a pull request to converged-computing/flux-tutorials

View Pull Request

vsoch commented on issue singularityhub/shpc-registry#327.

The issue here is your pull rate limit - you are the one doing the pulling. This is not under the jurisdiction of something I can control….

View Comment

vsoch pushed to rseng/software

Merge pull request #424 from rseng/update/software-2025-05-25

Update from update/software-2025-05-25

View Commit

vsoch pushed to hpc-social/events

Add --ignore-installed

View Commit

vsoch pushed to converged-computing/google-performance-study

mfem and samurai runs

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to converged-computing/flux-apps-helm

test: amg2023

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to singularityhub/shpc-registry

Merge pull request #328 from singularityhub/update/containers-2025-05-24

[bot] update/containers-2025-05-24

View Commit

vsoch opened issue rootless-containers/usernetes#373.

Infiniband for older kernel

We’ve been able to get Infiniband working with Usernetes, primarily using UCX and then having the devices /dev/infiniband bound from the host. We have an on-premises setup of usernetes (our first on a production cluster and not in VMs alongside) and what I’ve found is the avenue to bind devices and then use ibverbs and ucx works up until the point it needs ulimit -l to be unlimited:…

View Comment

vsoch pushed to hpc-social/jobs

Update update-jobs.yaml

View Commit

vsoch pushed to converged-computing/usernetes-python

add user level services

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue singularityhub/shpc-registry#327.

What do you suggest the shpc library do about a registry rate limit? I’m not sure there is anything in our control to change or help….

View Comment

vsoch commented on issue mfem/mfem#4848.

@v-dobrev good news! I have credits left and started testing today - I have data for 3 iterations for sizes 4, 8, and 16, but the pod was OOMKilled at size 32….

View Comment

vsoch commented on issue kubernetes/enhancements#4671.

> Would anyone be interested in pursuing the possibility of having filter extension point to take a group of pods and a group of nodes and perform the searching for the combination of nodes for a list of pods at once? The current filter extension point is not built for this. Yet, some plugins might be extended this way and if a proper implementation is provided this might be a path forward. Has anyone already concluded this is no-go? Or, this path has not been explored in more detail due to its complexity?…

View Comment

vsoch commented on issue hpc-maths/samurai#322.

I would basically do:…

View Comment

vsoch commented on issue hpc-maths/samurai#322.

Do you have an example with strong scaling? Or just use min and max == 14 and give more resources?…

View Comment

vsoch pushed to flux-framework/flux-python

Merge pull request #15 from TauferLab/search_path_fix

Fixes the search for Flux in setup.py to eliminate symlinks

View Commit

vsoch pushed to flux-framework/flux-framework.github.io

Merge pull request #154 from flux-framework/release-docs-2025-05-22

Update from release-docs-2025-05-22

View Commit

vsoch opened a pull request to converged-computing/flux-apps-helm

View Pull Request

vsoch pushed to converged-computing/converged-computing.github.io

feat: community survey 25

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue singularityhub/singularity-docker#25.

You are very welcome! The slim images are done as well: https://github.com/singularityhub/singularity-docker/actions/runs/15131013108. Definitely ping me anytime you want an updated version….

View Comment

vsoch opened issue flux-framework/spack#336.

Spack - reorganized modules

Spack reorganized their modules again, e.g.,…

View Comment

vsoch pushed to converged-computing/jobspec-conversion

gemma is terrible

I also did not check for both zero, so we have a set that are incorrect! But the accuracy is still pretty good. Of course now I am anxious about the API pricing… :X

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue E3SM-Project/codesign-kernels#9.

These are the mins and maxes for each cluster size:…

View Comment

vsoch pushed to singularityhub/shpc-registry

Merge pull request #324 from singularityhub/update/containers-2025-05-19

[bot] update/containers-2025-05-19

View Commit

vsoch commented on issue rootless-containers/usernetes#372.

@AkihiroSuda - I sent the above to our admin asking to enable br_netfilter and he enabled all the extra kernel modules, and it is working now. So this was a lot of work trying to debug, but ultimately we didn’t have the basic requirements needed for the setup (and I had no way to tell, so I went down a rabbit hole)….

View Comment

vsoch commented on issue rootless-containers/usernetes#372.

A little further - I manually added that file /run/flannel/subnet.env

View Comment

vsoch pushed to converged-computing/google-performance-study

add ebpf gpu runs for size 64

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to converged-computing/google-performance-study

fixed run - was not using correct problem size

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to converged-computing/flux-apps-helm

test laghos with jammy view

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue NVIDIA/nvidia-container-toolkit#85.

Sorry I didn’t report back! I got this working end of February, wrote up details here….

View Comment

vsoch commented on issue spack/spackbot#106.

I noticed this for my spack package updater - it’s definitely nice that they are modules now, but the need to do the transform (and errors that result for cases where that is forgotten) is

vsoch pushed to rseng/software

Merge pull request #423 from rseng/update/software-2025-05-18

Update from update/software-2025-05-18

View Commit

vsoch commented on issue mfem/mfem#4848.

@v-dobrev I wanted to leave you a quick note that I haven’t forgotten about this - I’ve been working around the clock to develop and finish eBPF experiments in Kubernetes with remaining cloud credits that expire on May 30th. I think I’m in good shape, and (if we have enough left) I’m going to finish setting this up and (hopefully) get to run. And if we don’t get to run the mfem benchmarks for this paper, I’ll still include the application with automated testing in our set, and we are planning on doing another round on AWS with slightly tweaked containers (libfabric) so it 100% will be included, either sooner or later! And we already have plenty of apps that use mfem, so rest assured you are in there!

vsoch pushed to converged-computing/google-performance-study

one more iteration for size 64, ebpf

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch merged a pull request to converged-computing/flux-apps-helm

View Pull Request

vsoch commented on issue rootless-containers/usernetes#372.

It’s called TOSS, and it’s a derivative of RHEL 8.10…

View Comment

vsoch pushed to flux-framework/spack

kokkos ecosystem: release 4.6.01 (#50271)

View Commit

vsoch pushed to flux-framework/flux-operator

bug: volumes across containers cannot be duplicated (#242)

  • bug: volumes across containers cannot be duplicated

We currently create mounts and allow for duplication. We can add a simple map with a boolean (emulating a set) to ensure the same mount is not added twice.

Signed-off-by: vsoch vsoch@users.noreply.github.com

  • pre push

Signed-off-by: vsoch vsoch@users.noreply.github.com


Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com

View Commit
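The fix described in this commit message (a map with boolean values standing in for a set) can be sketched roughly as follows. This is an illustrative Python version, not the operator's Go code; the `unique_mounts` helper and the `name` and `path` fields are assumptions made for the example.

```python
def unique_mounts(mounts):
    """Return mounts in their original order, skipping repeats."""
    seen = {}  # (name, path) -> True; a dict of booleans used as a set
    unique = []
    for mount in mounts:
        key = (mount["name"], mount["path"])
        if seen.get(key, False):
            continue  # the same mount was already added once
        seen[key] = True
        unique.append(mount)
    return unique


# Hypothetical input: two containers both request the "data" volume.
mounts = [
    {"name": "data", "path": "/data"},
    {"name": "scratch", "path": "/tmp/scratch"},
    {"name": "data", "path": "/data"},
]
print([m["name"] for m in unique_mounts(mounts)])  # prints ['data', 'scratch']
```

In Python a plain `set()` would be the more usual choice; the dictionary is kept here only to mirror the map-with-boolean idea from the commit message.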

vsoch pushed to converged-computing/google-performance-study

ebpf lammps runs size 128

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to converged-computing/flux-apps-helm

working to collect 5 at once!

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to oras-project/oras-py

Merge pull request #198 from Sojamann/tls-verification

fix: method signature to allow custom CA-Bundles

View Commit

vsoch merged a pull request to oras-project/oras-py

View Pull Request

vsoch commented on issue flux-framework/flux-docs#302.

New output type - job stories!…

View Comment

vsoch opened a pull request to converged-computing/google-performance-study

View Pull Request

vsoch pushed to converged-computing/flux-apps-helm

wip: adding working programs

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to singularityhub/shpc-registry

Merge pull request #323 from singularityhub/update/containers-2025-05-15

[bot] update/containers-2025-05-15

View Commit

vsoch closed a pull request to flux-framework/flux-framework.github.io

View Pull Request

vsoch commented on issue rootless-containers/usernetes#372.

That is my impression as well after debugging - I don’t have permissions to look at firewalls and iptables, but I’ve pinged our admins to check on that. Once this block is open, we will have the first (fully functioning) on premises setup of usernetes at the lab! I will report back what we find….

View Comment

vsoch pushed to converged-computing/flux-apps-helm

three programs.

and tears. today was hard.

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch opened issue rootless-containers/usernetes#372.

resolv.conf has incorrect address

Hi @AkihiroSuda ! We have our first on-premises setup of usernetes (this is a huge deal, getting all the plumbing setup!) and worked through an issue today with the pod not having any network connectivity (but the usernetes node did). What I noticed is that the coredns pod had these addresses:…

View Comment

vsoch pushed to flux-framework/flux-operator

feat: expose dnsPolicy (#241)

  • feat: expose dnsPolicy

Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to converged-computing/usernetes-python

hours of failure. fun.

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to converged-computing/usernetes-python

organize Dockerfile alongside crd

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to converged-computing/flux-lima

simplify ebpf image

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to converged-computing/flux-apps-helm

add ebpf test program

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue hpc-maths/samurai#322.

Thank you! I should be able to test again this week! I’m epically flailing with ebpf in containers at the moment.

vsoch opened a pull request to converged-computing/flux-apps-helm

View Pull Request

vsoch pushed to compspec/compat-lib

do not require provided mount path

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue DLR-AMR/t8code#1615.

Thanks @Davknapp ! I can definitely try again, although I don’t have many ideas, at least at the moment. We run these with the flux operator, which is deploying flux framework (an HPC workload manager and scheduler) within a closed space of pods in a Kubernetes cluster, so that means (depending on the cloud) we can use networks like Infiniband (Azure) and EFA (AWS), and the performance isn’t bad. For this set of tests we are in Google Cloud, which unfortunately is just optimized ethernet (they call it “Titanium” and the details aren’t revealed), but I’ve run over 25 applications and (at least for small sizes) they scale generally OK up until about 64 nodes. …

View Comment

vsoch pushed to rseng/software

Merge pull request #422 from rseng/update/software-2025-05-11

Update from update/software-2025-05-11

View Commit

vsoch pushed to flux-framework/spack

Automated deployment to update flux-sched versions 2025-05-11 (#335)

Signed-off-by: github-actions github-actions@users.noreply.github.com Co-authored-by: github-actions github-actions@users.noreply.github.com

View Commit

vsoch pushed to converged-computing/google-performance-study

update lammps plots

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to sciworks/spack-updater

Package underscore (#50)

  • ensure we install with an underscore
  • do not use zlib
  • remove openslide, takes too long
  • use package name to install

Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue mpi4jax/mpi4jax#280.

Will be testing on multiple nodes later today - in the meantime I built the container and made the animation, it’s gorgeous!

vsoch pushed to flux-framework/spack

test branch that renames with underscore (#334)

  • test branch that renames with underscore
  • restore to main
  • quick test

Signed-off-by: vsoch vsoch@users.noreply.github.com


Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com

View Commit