vsoch commented on issue singularityhub/singularity-hpc#670.
If you have interest! I didn’t comment because it’s marked as a Draft. Let me know when it’s ready for review….
vsoch opened issue flannel-io/flannel#2254.
bug: br_netfilter requirement prevents startup - not required in user space container
This is similar to https://github.com/flannel-io/flannel/issues/2068, but a different environment….
vsoch pushed to converged-computing/usernetes
on-premises: infiniband fully working with this setup
Infiniband is working on TOSS 4.18.0-553.56.1.1toss.t4 based on RHEL 8.10. Most of the issues were with network firewalls, kernel modules, and system security. Fixes here include creating unique CNI names for podman, adding a flag to ignore preflight errors (for the old kernel), and pinning the flannel install to a version before 0.25.x, when a check for br_netfilter was added. This check used to be part of kubeadm, and it was removed with K8s 1.30. The module is not technically needed in the podman container (it is needed on the physical host), but since the check is done in the container, it prevents flannel from starting up. For the time being, we will use an older flannel, and I will open an issue on the repository to ask for the ability to disable the check.
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue zenodo/zenodo#1606.
You know, it’s been enough time that this would be a fairly straightforward task for an LLM. As long as you provide it the structure and data you need to populate it….
vsoch pushed to singularityhub/shpc-registry
Merge pull request #346 from singularityhub/update/containers-2025-06-26
[bot] update/containers-2025-06-26
vsoch opened a pull request to rootless-containers/usernetes
vsoch opened issue rootless-containers/usernetes#376.
Flannel doesn't see br_netfilter - expected?
Hi @AkihiroSuda. We are testing the latest (current master) of Usernetes, and flannel fails on deployment not seeing br_netfilter….
vsoch commented on issue rootless-containers/usernetes#374.
And I think it would be unlikely for multiple users to be using the same physical node with Usernetes. …
vsoch pushed to researchapps/usernetes
docs: note on order of starting components
flannel requires an annotation to use a host external ip for a multi-node setup. If the ip addresses that are in the private space can be routed between nodes (possible in some clouds) this is not an issue. It is only an issue in an HPC or similar environment where the private 10.x address might go to a router and not be understood (and dropped). We ran into this issue on our HPC system, and I realized it was because of the order of operations - we should make sync-external-ip first (adding the annotation) and then make install-flannel to use it. This would only be a bug for specific, multi-node environments.
Signed-off-by: vsoch vsoch@users.noreply.github.com
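The ordering described in this commit can be sketched as a small Python wrapper. This is only an illustration: the two target names come from the commit message, while `build_commands`, `run_in_order`, and the dry-run default are hypothetical helpers, and the sketch assumes it would be run from a Usernetes checkout where those `make` targets exist.

```python
import subprocess

# Targets named in the commit message above. The annotation must exist
# before flannel is installed so that flannel can use the external IP.
ORDERED_TARGETS = ["sync-external-ip", "install-flannel"]


def build_commands(targets=ORDERED_TARGETS):
    """Return one `make` invocation per target, preserving order."""
    return [["make", target] for target in targets]


def run_in_order(commands, dry_run=True):
    """Run commands sequentially, stopping at the first failure.

    With dry_run=True (the default) nothing is executed and the
    commands are only returned as strings.
    """
    executed = []
    for command in commands:
        if not dry_run:
            # check=True raises CalledProcessError, so install-flannel
            # never runs if sync-external-ip failed.
            subprocess.run(command, check=True)
        executed.append(" ".join(command))
    return executed
```

Calling `run_in_order(build_commands(), dry_run=False)` would execute both targets for real; the default only reports what would run.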
vsoch created a new branch, unique-network-name at researchapps/usernetes
vsoch opened issue kubernetes-sigs/node-feature-discovery#2185.
Question: how to expose all features?
Hi! I’m hoping this is a simple question to answer. How do I get NFD to expose all labels in Kubernetes? When I install out of the box, there are somewhere around 70 labels, however when you just export the raw features there are close to 10k. I’d like to get all 10K exported for my cluster. And I assume these are a set that were deemed important (but was this experimentally derived)? …
vsoch pushed to singularityhub/shpc-registry
Merge pull request #345 from singularityhub/update/containers-2025-06-23
[bot] update/containers-2025-06-23
vsoch pushed to rseng/software
Merge pull request #428 from rseng/update/software-2025-06-22
Update from update/software-2025-06-22
vsoch pushed to converged-computing/aws-performance-study
interface: tweak descriptions
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to singularityhub/shpc-registry
Merge pull request #344 from singularityhub/update/containers-2025-06-20
[bot] update/containers-2025-06-20
vsoch pushed to converged-computing/aws-performance-study
add fom per dollar
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue kubernetes-sigs/node-feature-discovery#2170.
The nfd worker generates a lot of logs (and other unexpected output) so I wasn’t sure how to cleanly do that. Whereas with writing the file explicitly, I can control exactly the contents….
vsoch commented on issue expfactory-experiments/attention-network-test#2.
What sticks out to me is defining “blocks” that are then looped through for trials. So possibly reduce the sections there, and within each entity added to blocks, you can also delete….
vsoch pushed to flux-framework/Tutorials
isc 2025: add links to readme
vsoch commented on issue kubernetes/website#51164.
@graz-dev do you need any more clarification from us? …
vsoch pushed to converged-computing/aws-performance-study
data: heatmaps and models
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to singularityhub/shpc-registry
Merge pull request #342 from singularityhub/update/containers-2025-06-16
[bot] update/containers-2025-06-16
vsoch created a new branch, gpu-variant-hpcg at converged-computing/flux-apps-helm
vsoch pushed to rseng/software
Merge pull request #427 from rseng/update/software-2025-06-15
Update from update/software-2025-06-15
vsoch pushed to converged-computing/aws-performance-study
data: add more instance types
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/flux-apps-helm
docs: add zenodo link
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/aws-performance-study
Merge pull request #6 from converged-computing/add-hpcg-runs
update ui data
vsoch pushed to converged-computing/aws-performance-study
results: d3.4xlarge
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue vsoch/oci-python#24.
Thanks! …
vsoch commented on issue flux-framework/Tutorials#54.
It also looks like there is a “Slides coming soon!” message that can now be updated….
vsoch pushed to conda-forge/oras-py-feedstock
updated v0.2.37 (#40)
vsoch commented on issue vsoch/oci-python#24.
Apologies @CormickKneey I must have missed this email! Let’s test, and then we will want to bump the version and add a note to the corresponding CHANGELOG.md….
vsoch pushed to singularityhub/shpc-registry
Merge pull request #341 from singularityhub/update/containers-2025-06-12
[bot] update/containers-2025-06-12
vsoch pushed to converged-computing/aws-performance-study
experiment t3a.2xlarge
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/aws-performance-study
experiment: m6i.12xlarge
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue oras-project/oras-py#208.
@tarilabs I do too! It’s a great (almost) colleague for someone that works mostly on their own. Instead of asking Google questions (and having still to parse through a bunch of links) it comes back in one consolidated response. The answers aren’t always great (it’s good to double check) but I find it a really handy tool. …
vsoch pushed to conda-forge/oras-py-feedstock
updated v0.2.35 (#39)
vsoch pushed to singularityhub/shpc-registry
Merge pull request #340 from singularityhub/update/containers-2025-06-09
[bot] update/containers-2025-06-09
vsoch commented on issue oras-project/oras-py#209.
Sure, I’d be open to a PR that sets the lower bounds. …
vsoch pushed to converged-computing/aws-performance-study
bug: move location of docs
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch closed a pull request to conda-forge/oras-py-feedstock
## What’s Changed
- [tributors] contributors/update-2025-05-31 by @github-actions in https://github.com/oras-project/oras-py/pull/203
- Add support for Docker credsStore and credHelpers by @rasmusfaber in https://github.com/oras-project/oras-py/pull/206
- [tributors] contributors/update-2025-06-08 by @github-actions in https://github.com/oras-project/oras-py/pull/207
- Ecr auth support by @rasmusfaber in https://github.com/oras-project/oras-py/pull/205
## New Contributors
- @rasmusfaber made their first contribution in https://github.com/oras-project/oras-py/pull/206
Full Changelog: https://github.com/oras-project/oras-py/compare/0.2.33...0.2.34
vsoch pushed to compspec/ocifit-k8s
Merge pull request #4 from compspec/add-unmarshall-spec
fix: unmarshall spec customization
vsoch pushed to vsoch/node-feature-discovery
feat: add ability to export feature labels for static node
For the HPC use case, we want to be able to export static features without Kubernetes.
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch opened a pull request to kubernetes-sigs/node-feature-discovery
vsoch created a new branch, main at compspec/ocifit-k8s
vsoch pushed to singularityhub/shpc-registry
Merge pull request #339 from HasseJohansen/latest-should-also-be-in-tags
latest should also be defined in tags section
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #156 from flux-framework/release-docs-2025-06-05
Update from release-docs-2025-06-05
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #157 from flux-framework/release-docs-2025-06-06
Update from release-docs-2025-06-06
vsoch opened a pull request to converged-computing/flux-apps-helm
vsoch pushed to converged-computing/aws-performance-study
wip: testing ml containers (#3)
- wip: testing ml containers
Signed-off-by: vsoch vsoch@users.noreply.github.com
- add gpu-fryer note - only intended for single nodes
Signed-off-by: vsoch vsoch@users.noreply.github.com
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to singularityhub/shpc-registry
Merge pull request #338 from singularityhub/update/containers-2025-06-05
[bot] update/containers-2025-06-05
vsoch commented on issue oras-project/oras-py#204.
> Do you mean to use one decorator to handle both …
vsoch pushed to rseng/software
Merge pull request #425 from rseng/update/software-2025-06-01
Update from update/software-2025-06-01
vsoch commented on issue oras-project/oras-py#204.
> Because the ‘container’ arg’s position is different in get and put like methods, the two kinds of decorator allow user call with positional args. …
vsoch pushed to singularityhub/shpc-registry
Merge pull request #336 from singularityhub/update/containers-2025-06-02
[bot] update/containers-2025-06-02
vsoch pushed to singularityhub/singularity-hpc
Test updated typos action (#692)
- Test updated typos action
- nit: spelling error exported
- nit: spelling error install guide
- example: remove global cli from google example
vsoch commented on issue oras-project/oras-py#204.
The call that triggers the error doesn’t provide args, it provides the keyword arguments “kwargs” because it explicitly says target=….
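To illustrate the distinction in this comment, here is a minimal sketch of a decorator that converts an argument whether it arrives positionally (in `args`) or by keyword (in `kwargs`). The names `Container`, `normalize`, and `get_tags` are hypothetical stand-ins and are not the oras-py API; the approach (binding the call against the function signature) is one possible option, not necessarily what the project does.

```python
import functools
import inspect


class Container:
    """Hypothetical parsed reference, e.g. 'ghcr.io/org/repo:tag'."""

    def __init__(self, uri: str):
        self.uri = uri


def normalize(param: str):
    """Convert the named parameter from str to Container, however it is passed."""

    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # bind() maps positional and keyword arguments onto parameter
            # names, so the lookup works for both calling styles.
            bound = sig.bind(*args, **kwargs)
            value = bound.arguments.get(param)
            if isinstance(value, str):
                bound.arguments[param] = Container(value)
            return func(*bound.args, **bound.kwargs)

        return wrapper

    return decorator


@normalize("target")
def get_tags(target, limit=None):
    # Return the type name so the conversion is easy to observe.
    return type(target).__name__
```

A decorator that only inspects `args[0]` would miss a call such as `get_tags(target="ghcr.io/org/repo:tag")`, because the value then lives in `kwargs`; binding against the signature handles both forms with one code path.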
vsoch pushed to singularityhub/guts
feat: add support for trace with ldd (#9)
- feat: add support for trace with ldd
- dev: bump black version requirements
- remove centos
Signed-off-by: vsoch vsoch@users.noreply.github.com
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue oras-project/oras-py#204.
Why are you calling this in one off functions instead of at the class init?…
vsoch pushed to converged-computing/google-performance-study
analysis: file access differences
the ebpf open information is likely most interesting when looking at differences between environments. E.g., for each command, cpu vs gpu, and then rocky vs ubuntu (holding mpi variant constant) and mpich vs. openmpi (holding base os constant).
Signed-off-by: vsoch vsoch@users.noreply.github.com
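The pairwise comparison described in this commit could look roughly like the following sketch. It assumes the eBPF open events have already been reduced to one set of paths per environment; `compare_opens` is a hypothetical helper, and the paths in `example` are made up for illustration and are not results from the study.

```python
from itertools import combinations


def compare_opens(opens_by_env):
    """For each pair of environments, list files opened in only one of them.

    opens_by_env maps an environment label to the set of paths opened.
    """
    report = {}
    for left, right in combinations(sorted(opens_by_env), 2):
        report[(left, right)] = {
            "only_" + left: sorted(opens_by_env[left] - opens_by_env[right]),
            "only_" + right: sorted(opens_by_env[right] - opens_by_env[left]),
        }
    return report


# Made-up paths for illustration only, not data from the repository.
example = {
    "rocky": {"/etc/hosts", "/usr/lib64/libmpi.so"},
    "ubuntu": {"/etc/hosts", "/usr/lib/x86_64-linux-gnu/libmpi.so"},
}
```

Holding one factor constant, as the commit suggests, amounts to choosing which sets go into `opens_by_env` (for example, only runs that share the same MPI variant).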
vsoch opened a pull request to converged-computing/google-performance-study
vsoch pushed to compspec/ocifit
bump guts version
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch opened issue flux-framework/flux-accounting#650.
limits: look into adding support for quota/queue limits based on compute hours
In cloud, it’s common to want to limit a user’s ability to run jobs on a specific instance type (e.g., one using GPUs). Speaking with @cmoussa1, we think the core metrics are already being collected, and a little extra exposure of configuration might make it work. Here are notes from our discussion:…
vsoch pushed to conda-forge/deid-feedstock
Merge pull request #50 from regro-cf-autotick-bot/0.4.3_hf289fb
deid v0.4.3
vsoch pushed to compspec/ocifit
Merge pull request #2 from compspec/add-guts
add support for mpi paths
vsoch pushed to singularityhub/shpc-registry
Merge pull request #335 from singularityhub/update/containers-2025-05-29
[bot] update/containers-2025-05-29
vsoch commented on issue pydicom/deid#278.
https://pypi.org/project/deid/0.4.3/…
vsoch pushed to flux-framework/Tutorials
Merge pull request #50 from milroy/isc25-k8s
ISC2025 K8s configurations, setup, and instructions
vsoch pushed to flux-framework/Tutorials
Merge pull request #49 from flux-framework/update-jupyter-slim-containers
update jupyter to 4.2.0
vsoch pushed to converged-computing/descriptive-pixi
environment: llama
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch created a new branch, main at compspec/ocifit
vsoch commented on issue pydicom/deid#279.
This looks good! Please bump the version in version.py and add a corresponding note to the CHANGELOG.md and we can merge….
vsoch commented on issue oras-project/oras-py#201.
Totally ok! Glad you found the issue….
vsoch commented on issue google/dranet#93.
We can definitely add support for the Flux Operator! How are the devices (?) exposed to the applications in a pod?…
vsoch created a new branch, main at converged-computing/descriptive-pixi
vsoch pushed to singularityhub/shpc-registry
Merge pull request #329 from singularityhub/update/containers-2025-05-26
[bot] update/containers-2025-05-26
vsoch pushed to singularityhub/shpc-registry
Merge pull request #330 from singularityhub/update/containers-2025-05-27
[bot] update/containers-2025-05-27
vsoch commented on issue oras-project/oras-py#201.
I would look into exactly how the (working) oras in Go is making the call - it could be a nuanced difference in a header or similar. And see if the response returns anything that hints at the issue….
vsoch pushed to converged-computing/flux-apps-helm
Merge pull request #35 from converged-computing/add-device-device-to-osu
osu example for device to device
vsoch pushed to singularityhub/shpc-registry
manual update to remove 404 container digests for jupyter data science notebook
Signed-off-by: Vanessasaurus <814322+vsoch@users.noreply.github.com>
vsoch opened a pull request to converged-computing/flux-tutorials
vsoch commented on issue singularityhub/shpc-registry#327.
The issue here is your pull rate limit - you are the one doing the pulling. This is not under the jurisdiction of something I can control….
vsoch pushed to rseng/software
Merge pull request #424 from rseng/update/software-2025-05-25
Update from update/software-2025-05-25
vsoch pushed to hpc-social/events
Add --ignore-installed
vsoch pushed to converged-computing/google-performance-study
mfem and samurai runs
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/flux-apps-helm
test: amg2023
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to singularityhub/shpc-registry
Merge pull request #328 from singularityhub/update/containers-2025-05-24
[bot] update/containers-2025-05-24
vsoch opened issue rootless-containers/usernetes#373.
Infiniband for older kernel
We’ve been able to get Infiniband working with Usernetes, primarily using UCX and then having the devices /dev/infiniband bound from the host. We have a setup of Usernetes on-premises (our first on a production cluster and not in VMs alongside) and what I’ve found is that the avenue to bind devices and then use ibverbs and UCX works up until the point it needs ulimit -l to be unlimited:…
vsoch pushed to hpc-social/jobs
Update update-jobs.yaml
vsoch pushed to converged-computing/usernetes-python
add user level services
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch commented on issue singularityhub/shpc-registry#327.
What do you suggest the shpc library do about a registry rate limit? I’m not sure there is anything in our control to change or help….