vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #157 from flux-framework/release-docs-2025-06-06
Update from release-docs-2025-06-06
vsoch opened a pull request to converged-computing/flux-apps-helm
vsoch pushed to converged-computing/aws-performance-study
wip: testing ml containers (#3)
- wip: testing ml containers
Signed-off-by: vsoch vsoch@users.noreply.github.com
- add gpu-fryer note - only intended for single nodes
Signed-off-by: vsoch vsoch@users.noreply.github.com
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to singularityhub/shpc-registry
Merge pull request #338 from singularityhub/update/containers-2025-06-05
[bot] update/containers-2025-06-05
vsoch commented on issue oras-project/oras-py#204.
> Do you mean to use one decorator to handle both …
vsoch pushed to rseng/software
Merge pull request #425 from rseng/update/software-2025-06-01
Update from update/software-2025-06-01
vsoch commented on issue oras-project/oras-py#204.
> Because the ‘container’ arg’s position is different in get and put like methods, the two kinds of decorator allow user call with positional args. …
vsoch pushed to singularityhub/shpc-registry
Merge pull request #336 from singularityhub/update/containers-2025-06-02
[bot] update/containers-2025-06-02
vsoch pushed to singularityhub/singularity-hpc
Test updated typos action (#692)
- Test updated typos action
- nit: spelling error exported
- nit: spelling error install guide
- example: remove global cli from google example
vsoch commented on issue oras-project/oras-py#204.
The call that triggers the error doesn’t provide args; it provides keyword arguments (“kwargs”) because it explicitly says target=….
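A minimal sketch of the failure mode this comment describes, assuming a decorator that looks for the target only among positional arguments. The names `positional_only_lookup`, `keyword_aware_lookup`, and `Client` are hypothetical and are not taken from oras-py; they only illustrate why a call written as `target=...` leaves `args` empty, and one way a wrapper could accept both call styles.

```python
import functools


def positional_only_lookup(func):
    """Hypothetical decorator that assumes the target is args[0]."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        # Raises IndexError when the caller writes target=..., because
        # the value then lives in kwargs and args is empty.
        target = args[0]
        return func(self, target, *args[1:], **kwargs)
    return wrapper


def keyword_aware_lookup(func):
    """Hypothetical decorator that accepts either call style."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if "target" in kwargs:
            target = kwargs.pop("target")
        elif args:
            target, args = args[0], args[1:]
        else:
            raise TypeError("target is required")
        return func(self, target, *args, **kwargs)
    return wrapper


class Client:
    """Illustrative stand-in for a registry client."""

    @positional_only_lookup
    def pull_fragile(self, target):
        return f"pull {target}"

    @keyword_aware_lookup
    def pull(self, target):
        return f"pull {target}"
```

With these definitions, `Client().pull_fragile(target="example")` fails inside the wrapper, while `Client().pull(target="example")` and `Client().pull("example")` both reach the wrapped method.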
vsoch pushed to singularityhub/guts
feat: add support for trace with ldd (#9)
- feat: add support for trace with ldd
- dev: bump black version requirements
- remove centos
Signed-off-by: vsoch vsoch@users.noreply.github.com
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue oras-project/oras-py#204.
Why are you calling this in one off functions instead of at the class init?…
vsoch pushed to converged-computing/google-performance-study
analysis: file access differences
the ebpf open information is likely most interesting when looking at differences between environments. E.g., for each command, cpu vs gpu, and then rocky vs ubuntu (holding mpi variant constant) and mpich vs. openmpi (holding base os constant).
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch opened a pull request to converged-computing/google-performance-study
vsoch pushed to compspec/ocifit
bump guts version
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch open issue flux-framework/flux-accounting#650.
limits: look into adding support for quota/queue limits based on compute hours
In cloud, it’s common to want to limit a user’s ability to run jobs on a specific instance type (e.g., one using GPUs). Speaking with @cmoussa1, we think the core metrics are already collecting / existing, and a little bit of extra exposure of configuration might make it work. Here are notes from our discussion:…
vsoch pushed to conda-forge/deid-feedstock
Merge pull request #50 from regro-cf-autotick-bot/0.4.3_hf289fb
deid v0.4.3
vsoch pushed to compspec/ocifit
Merge pull request #2 from compspec/add-guts
add support for mpi paths
vsoch pushed to singularityhub/shpc-registry
Merge pull request #335 from singularityhub/update/containers-2025-05-29
[bot] update/containers-2025-05-29
vsoch commented on issue pydicom/deid#278.
https://pypi.org/project/deid/0.4.3/…
vsoch pushed to flux-framework/Tutorials
Merge pull request #50 from milroy/isc25-k8s
ISC2025 K8s configurations, setup, and instructions
vsoch pushed to flux-framework/Tutorials
Merge pull request #49 from flux-framework/update-jupyter-slim-containers
update jupyter to 4.2.0
vsoch pushed to converged-computing/descriptive-pixi
environment: llama
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch created a new branch, main at compspec/ocifit
vsoch commented on issue pydicom/deid#279.
This looks good! Please bump the version in version.py and add a corresponding note to the CHANGELOG.md and we can merge….
vsoch commented on issue oras-project/oras-py#201.
Totally ok! Glad you found the issue….
vsoch commented on issue google/dranet#93.
We can definitely add support for the Flux Operator! How are the devices (?) exposed to the applications in a pod?…
vsoch created a new branch, main at converged-computing/descriptive-pixi
vsoch pushed to singularityhub/shpc-registry
Merge pull request #329 from singularityhub/update/containers-2025-05-26
[bot] update/containers-2025-05-26
vsoch pushed to singularityhub/shpc-registry
Merge pull request #330 from singularityhub/update/containers-2025-05-27
[bot] update/containers-2025-05-27
vsoch commented on issue oras-project/oras-py#201.
I would look into exactly how the (working) oras in Go is making the call - it could be a nuanced difference in a header or similar. And see if the response turns anything back that hints about the issue….
vsoch pushed to converged-computing/flux-apps-helm
Merge pull request #35 from converged-computing/add-device-device-to-osu
osu example for device to device
vsoch pushed to singularityhub/shpc-registry
manual update to remove 404 container digests for jupyter data science notebook
Signed-off-by: Vanessasaurus <814322+vsoch@users.noreply.github.com>
vsoch opened a pull request to converged-computing/flux-tutorials
vsoch commented on issue singularityhub/shpc-registry#327.
The issue here is your pull rate limit - you are the one doing the pulling. This is not under the jurisdiction of something I can control….
vsoch pushed to rseng/software
Merge pull request #424 from rseng/update/software-2025-05-25
Update from update/software-2025-05-25
vsoch pushed to hpc-social/events
Add --ignore-installed
vsoch pushed to converged-computing/google-performance-study
mfem and samurai runs
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/flux-apps-helm
test: amg2023
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to singularityhub/shpc-registry
Merge pull request #328 from singularityhub/update/containers-2025-05-24
[bot] update/containers-2025-05-24
vsoch open issue rootless-containers/usernetes#373.
Infiniband for older kernel
We’ve been able to get Infiniband working with Usernetes, primarily using UCX and then having the devices /dev/infiniband bound from the host. We have an on-premises setup of usernetes (our first on a production cluster and not in VMs alongside) and what I’ve found is that the avenue to bind devices and then use ibverbs and ucx works up until the point it needs ulimit -l to be unlimited:…
vsoch pushed to hpc-social/jobs
Update update-jobs.yaml
vsoch pushed to converged-computing/usernetes-python
add user level services
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue singularityhub/shpc-registry#327.
What do you suggest the shpc library do about a registry rate limit? I’m not sure there is anything in our control to change or help….
vsoch commented on issue mfem/mfem#4848.
@v-dobrev good news! I have credits left and started testing today - I have data for 3 iterations for sizes 4, 8, and 16, but the pod was OOMKilled at size 32….
vsoch commented on issue kubernetes/enhancements#4671.
> Would anyone be interested in pursuing the possibility of having filter extension point to take a group of pods and a group of nodes and perform the searching for the combination of nodes for a list of pods at once? The current filter extension point is not built for this. Yet, some plugins might be extended this way and if a proper implementation is provided this might be a path forward. Has anyone already concluded this is no-go? Or, this path has not been explored in more detail due to its complexity?…
vsoch commented on issue hpc-maths/samurai#322.
I would basically do:…
vsoch commented on issue hpc-maths/samurai#322.
Do you have an example with strong scaling? Or just use min and max == 14 and give more resources?…
vsoch pushed to flux-framework/flux-python
Merge pull request #15 from TauferLab/search_path_fix
Fixes the search for Flux in setup.py to eliminate symlinks
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #154 from flux-framework/release-docs-2025-05-22
Update from release-docs-2025-05-22
vsoch opened a pull request to converged-computing/flux-apps-helm
vsoch pushed to converged-computing/converged-computing.github.io
feat: community survey 25
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue singularityhub/singularity-docker#25.
You are very welcome! The slim images are done as well: https://github.com/singularityhub/singularity-docker/actions/runs/15131013108. Definitely ping me anytime you want an updated version….
vsoch open issue flux-framework/spack#336.
Spack - reorganized modules
Spack reorganized their modules again, e.g.,…
vsoch pushed to converged-computing/jobspec-conversion
gemma is terrible
I also did not check for both zero, so we have a set that are incorrect! But the accuracy is still pretty good. Of course now I am anxious about the API pricing… :X
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue E3SM-Project/codesign-kernels#9.
These are the mins and maxes for each cluster size:…
vsoch pushed to singularityhub/shpc-registry
Merge pull request #324 from singularityhub/update/containers-2025-05-19
[bot] update/containers-2025-05-19
vsoch commented on issue rootless-containers/usernetes#372.
@AkihiroSuda - I sent the above to our admin asking to enable br_netfilter and he enabled all the extra kernel modules, and it is working now. So this was a lot of work trying to debug, but ultimately we didn’t have the basic requirements needed for the setup (and I had no way to tell, so I went down a rabbit hole)….
vsoch commented on issue rootless-containers/usernetes#372.
A little further - I manually added that file /run/flannel/subnet.env…
vsoch pushed to converged-computing/google-performance-study
add ebpf gpu runs for size 64
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/google-performance-study
fixed run - was not using correct problem size
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/flux-apps-helm
test laghos with jammy view
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue NVIDIA/nvidia-container-toolkit#85.
Sorry I didn’t report back! I got this working end of February, wrote up details here….
vsoch commented on issue spack/spackbot#106.
I noticed this for my spack package updater - it’s definitely nice that they are modules now, but the need to do the transform (and errors that result for cases where that is forgotten) is
vsoch pushed to rseng/software
Merge pull request #423 from rseng/update/software-2025-05-18
Update from update/software-2025-05-18
vsoch commented on issue mfem/mfem#4848.
@v-dobrev I wanted to leave you a quick note that I haven’t forgotten about this - I’ve been working around the clock to develop and finish eBPF experiments in Kubernetes with remaining cloud credits that expire on May 30th. I think I’m in good shape, and (if we have enough left) I’m going to finish setting this up and (hopefully) get to run. And if we don’t get to run the mfem benchmarks for this paper, I’ll still include the application with automated testing in our set, and we are planning on doing another round on AWS with slightly tweaked containers (libfabric) so it 100% will be included, either sooner or later! And we already have plenty of apps that use mfem, so rest assured you are in there!
vsoch pushed to converged-computing/google-performance-study
one more iteration for size 64, ebpf
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch merged a pull request to converged-computing/flux-apps-helm
vsoch commented on issue rootless-containers/usernetes#372.
It’s called TOSS, and it’s a derivative of RHEL 8.10…
vsoch pushed to flux-framework/spack
kokkos ecosystem: release 4.6.01 (#50271)
vsoch pushed to flux-framework/flux-operator
bug: volumes across containers cannot be duplicated (#242)
- bug: volumes across containers cannot be duplicated
We currently create mounts and allow for duplication. We can add a simple map with a boolean (emulating a set) to ensure the same mount is not added twice.
Signed-off-by: vsoch vsoch@users.noreply.github.com
- pre push
Signed-off-by: vsoch vsoch@users.noreply.github.com
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com
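The deduplication idea in the commit message above (a map of booleans standing in for a set) can be sketched as follows. This is an illustration in Python, not the operator's own code, which is written in Go. The `Mount` type, the `unique_mounts` helper, and the choice of the mount name as the identity key are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mount:
    """Hypothetical volume mount: a name plus the path it mounts at."""
    name: str
    path: str


def unique_mounts(containers: list[list[Mount]]) -> list[Mount]:
    """Collect mounts across containers, keeping the first of each name."""
    # A dict of booleans used as a set, mirroring the pattern the commit
    # describes. Keying on the name is an assumption for this sketch.
    seen: dict[str, bool] = {}
    result: list[Mount] = []
    for mounts in containers:
        for mount in mounts:
            if seen.get(mount.name):
                continue  # already added by an earlier container
            seen[mount.name] = True
            result.append(mount)
    return result
```

In Python a plain `set` would be the more idiomatic choice; the dict is kept here only to mirror the wording of the commit.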
vsoch pushed to converged-computing/google-performance-study
ebpf lammps runs size 128
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/flux-apps-helm
working to collect 5 at once!
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to oras-project/oras-py
Merge pull request #198 from Sojamann/tls-verification
fix: method signature to allow custom CA-Bundles
vsoch merged a pull request to oras-project/oras-py
vsoch commented on issue flux-framework/flux-docs#302.
New output type - job stories!…
vsoch opened a pull request to converged-computing/google-performance-study
vsoch pushed to converged-computing/flux-apps-helm
wip: adding working programs
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to singularityhub/shpc-registry
Merge pull request #323 from singularityhub/update/containers-2025-05-15
[bot] update/containers-2025-05-15
vsoch closed a pull request to flux-framework/flux-framework.github.io
vsoch commented on issue rootless-containers/usernetes#372.
That is my impression as well after debugging - I don’t have permissions to look at firewalls and iptables, but I’ve pinged our admins to check on that. Once this block is open, we will have the first (fully functioning) on premises setup of usernetes at the lab! I will report back what we find….
vsoch pushed to converged-computing/flux-apps-helm
three programs.
and tears. today was hard.
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch open issue rootless-containers/usernetes#372.
resolv.conf has incorrect address
Hi @AkihiroSuda ! We have our first on-premises setup of usernetes (this is a huge deal, getting all the plumbing setup!) and worked through an issue today with the pod not having any network connectivity (but the usernetes node did). What I noticed is that the coredns pod had these addresses:…
vsoch pushed to flux-framework/flux-operator
feat: expose dnsPolicy (#241)
- feat: expose dnsPolicy
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/usernetes-python
hours of failure. fun.
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/usernetes-python
organize Dockerfile alongside crd
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/flux-lima
simplify ebpf image
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/flux-apps-helm
add ebpf test program
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue hpc-maths/samurai#322.
Thank you! I should be able to test again this week! I’m epically flailing with ebpf in containers at the moment.
vsoch opened a pull request to converged-computing/flux-apps-helm
vsoch pushed to compspec/compat-lib
do not require provided mount path
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue DLR-AMR/t8code#1615.
Thanks @Davknapp ! I can definitely try again, although I don’t have many ideas, at least at the moment. We run these with the flux operator, which is deploying flux framework (an HPC workload manager and scheduler) within a closed space of pods in a Kubernetes cluster, so that means (depending on the cloud) we can use networks like Infiniband (Azure) and EFA (AWS), and the performance isn’t bad. For this set of tests we are in Google Cloud, which unfortunately is just optimized ethernet (they call it “Titanium” and the details aren’t revealed), but I’ve run over 25 applications and (at least for small sizes) they scale generally OK up until about 64 nodes. …
vsoch pushed to rseng/software
Merge pull request #422 from rseng/update/software-2025-05-11
Update from update/software-2025-05-11
vsoch pushed to flux-framework/spack
Automated deployment to update flux-sched versions 2025-05-11 (#335)
Signed-off-by: github-actions github-actions@users.noreply.github.com Co-authored-by: github-actions github-actions@users.noreply.github.com
vsoch pushed to converged-computing/google-performance-study
update lammps plots
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to sciworks/spack-updater
Package underscore (#50)
- ensure we install with an underscore
- do not use zlib
- remove openslide, takes too long
- use package name to install
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue mpi4jax/mpi4jax#280.
Will be testing on multiple nodes later today - in the meantime I built the container and made the animation, it’s gorgeous!
vsoch pushed to flux-framework/spack
test branch that renames with underscore (#334)
- test branch that renames with underscore
- restore to main
- quick test
Signed-off-by: vsoch vsoch@users.noreply.github.com
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com